Search This Blog

Thursday, April 12, 2018

How to access PostgreSQL with Python: a chemistry example

PostgreSQL is an open-source database. How can you connect to your local PostgreSQL database and access data in PostgreSQL tables by scripting in Python? Answer: by using the Psycopg PostgreSQL adapter.

I herein demonstrate data extraction with an example solving a common task in chemistry: the calculation of the molar mass for a given molecular formula by retrieving atomic weights from a PostgreSQL table.

Let's begin with the atomic-weight table. We will use published standard atomic weights (DOI: dx.doi.org/10.1351/pac200678112051): Atomic Weights of the Elements 2005 (IUPAC Technical Report). We use the following implementation for table awe2005:

Column Type Constraints Comment or Description
id serial primary key
elem varchar(12) not null, UNIQUE Element name
symb varchar(3) not null, UNIQUE Element symbol
numb smallint not null, UNIQUE Element number
awr double precision
Atomic weight, rounded
awu varchar(20)
Atomic weight with uncertainty
footnotes varchar(5)
Footnotes
stbliso boolean
Stable isotope existence

You are welcome to download the atomic-weight table in CSV format—avalaible as awe2005.csv—and import it into your PostgreSQL database. In our demonstration, we only access values of the columns symb and awr from this table.

Now let's write the Python function calc_molar_mass that will calculate the molar mass for a molecular formula, which is passed as argument molform. This variable is a list of pairs. Each pair is a tuple containing an element symbol and its subscript. For example, the molecular formula for ascorbic acid, C6H8O6, would be passed as
molform = [('C',6), ('H',8), ('O',6)]
Function  calc_molar_mass needs to perform the following steps:
  • Connect with database
  • Extract atomic weight values and update molar mass
  • Close database
  • Return result

 

Begin communication with the database

To connect to an existing PostgreSQL database on a local computer, the database password and the name of the database that contains table awe2005 need to be provided (by replacing my_password and my_database, respectively):

  try:    
     conn = psycopg2.connect(database="my_database",
              user="postgres", password="my_password",
              host="127.0.0.1", port="5432")        
 except:
     print("NOT connected!")

     return {}
 cur = conn.cursor()


If the connection to the database cannot be established, an empty dictionary is returned; otherwise, a cursor is initialized to perform database operations. Then, you are all set to query table awe2005.

Use the cursor to obtain atomic weight values for given symbols 

Formulate the query, execute it and fetch the table row:

   query = "SELECT * FROM awe2005 WHERE symb = '%s'" % atsymb
   cur.execute(query)
   row = cur.fetchone()  


The string variable atsymb contains the atomic symbol, for which we want to obtain the atomic weight value. The latter is in column awr, which is column 4 (start counting with 0 in the above table and take the column number as field index for row):

   atweight = row[4]

The complete molar mass calculation requires a loop over all atomic symbol/subscript pairs:

   molmass = 0.0
   for pair in molform:
      atsymb    = pair[0]
      subscript = pair[1]
      query = "SELECT * FROM awe2005 WHERE symb = '%s'" % atsymb
      cur.execute(query)
      row = cur.fetchone()  
      if row is None:
         # unknown atomic symbol
      else:
         atweight = row[4]
         if atweight is None:
            # no atomic weight value for this symbol
         else:
            molmass += atweight * float(subscript)

           

The appendix contains an executable script that takes care of  those cases where a symbols or value is not found. The script can also be downloaded: molmass.py (right-click and use "Save link as..." option).  For a molecular formula input,  it generates a list of lines recording both successful and unsuccessful calculation steps. 

Close database

Close the communication with the database:

   cur.close()
   conn.close()


Depending on your script design, connecting with and disconnecting from the database may occur elsewhere—independent of the need in particular functions—and you will simply pass the cursor to database-accessing functions.

Appendix: Completing the script

The following is a complete script that displays the molar mass calculation step-by-step (for each atomic symbol) and reports the occurrence of unrecognized symbols and missing values:

import argparse
import psycopg2
#======================================================================#
# Calculate molar mass for given molecular formula                     #
#======================================================================#
def calc_molar_mass(molform):

   # Begin communication with the database
   dbname = 'my_database'
   try:     
      conn = psycopg2.connect(database=dbname,
                   user='postgres', password='my_password',
                   host='127.0.0.1', port='5432')         
   except:
      print('NOT connected to %s!' % dbname)
      return {}
   cur = conn.cursor()

   # Perform calculation of molar mass
   molmass = 0.0
   steps = []
   cntUnknownSymbols = 0
   cntMissingValues  = 0
   for pair  in molform:
      atsymb    = pair[0]
      subscript = pair[1]
      query = "SELECT * FROM awe2005 WHERE symb = '%s'" % atsymb
      cur.execute(query)
      row = cur.fetchone()  

      if row is None:
         cntUnknownSymbols += 1
         steps.append("            ?")
      else:
         atweight = row[4]
         if atweight is None:
            cntMissingValues += 1
            step = "            %-3s %6s            ?" %\
                                  (atsymb,subscript)
            steps.append(step)
         else:
            contribution = atweight * float(subscript)
            step = "            %-3s %6s %12s" % (atsymb,subscript,\
                                    "%.3f" % contribution)
            steps.append(step)
            molmass += contribution

   # End communication with the database
   cur.close()
   conn.close()

   # Finish by formatting report lines
   status = ':'
   resultline = ''
   if cntUnknownSymbols == 0 and cntMissingValues == 0:
      resultline = "Molar mass in g/mol:   %12s" % ("%.3f" % molmass)
   else:
      status = resultline = "Molar mass calculation failed:     "
   steps.append('-' * len(resultline))
   steps.append(resultline)
   if cntUnknownSymbols > 0:
      steps.append("Unknown symbols:%3i " % cntUnknownSymbols)
   if cntMissingValues > 0:
      steps.append("Missing values: %3i " % cntMissingValues)
   steps.append('-' * len(resultline))

   # Return results as dictioary with three entries
   mmdict = {'molmass': molmass, 'steps': steps, 'status': status}
   return mmdict


#======================================================================#
# Run molar mass calculation                                           #
#======================================================================#
if __name__ == '__main__':

   #===================================================================#
   # Get molecular formula from command line                           #
   #===================================================================#
   parser = argparse.ArgumentParser()
   parser.add_argument('entered', nargs='*')
   args = parser.parse_args()
   linearMolform = args.entered[0]

   #===================================================================#
   # Loop over atomic symbol/subscript pairs and calculate molar mass  #
   #===================================================================#
   tokens = linearMolform.split()
   molform = []
   i = 0
   previous = ''
   for token in tokens:
      i += 1
      if i % 2 == 0: # when even number of tokens, add symbol/value pair
                     # to molform
         molform.append((previous,token))
      else:
         previous = token # hold atomic symbol until next iteration step
   mmdict = calc_molar_mass(molform)

   #===================================================================#
   # Display results                                                   #
   #===================================================================#
   print('Result:')
   steps = mmdict['steps']
   for step in steps:
      print(step)



Get the script as file molmass.py (right-click and use "Save link as..." option).

Execution example: Run the script via command line submission 

   python.exe molmass.py "C 6 H 8 O 6"

to calculate the molar mass for the formula C6H8O6, which, for example, is the molecular formula of ascorbic acid with a molar mass of 176.12 g/mol.

The provided script was tested with Python 3.4.2 under Windows 10. The following screen snippet shows the script performance within a PowerShell window:



No comments:

Post a Comment