Thursday, April 12, 2018

How to access PostgreSQL with Python: a chemistry example

PostgreSQL is an open-source database. How can you connect to your local PostgreSQL database and access data in PostgreSQL tables by scripting in Python? Answer: by using the Psycopg PostgreSQL adapter.

I herein demonstrate data extraction with an example solving a common task in chemistry: the calculation of the molar mass for a given molecular formula by retrieving atomic weights from a PostgreSQL table.

Let's begin with the atomic-weight table. We will use published standard atomic weights (DOI: dx.doi.org/10.1351/pac200678112051): Atomic Weights of the Elements 2005 (IUPAC Technical Report). We use the following implementation for table awe2005:

Column Type Constraints Comment or Description
id serial primary key
elem varchar(12) not null, UNIQUE Element name
symb varchar(3) not null, UNIQUE Element symbol
numb smallint not null, UNIQUE Element number
awr double precision
Atomic weight, rounded
awu varchar(20)
Atomic weight with uncertainty
footnotes varchar(5)
Footnotes
stbliso boolean
Stable isotope existence

You are welcome to download the atomic-weight table in CSV format—avalaible as awe2005.csv—and import it into your PostgreSQL database. In our demonstration, we only access values of the columns symb and awr from this table.

Now let's write the Python function calc_molar_mass that will calculate the molar mass for a molecular formula, which is passed as argument molform. This variable is a list of pairs. Each pair is a tuple containing an element symbol and its subscript. For example, the molecular formula for ascorbic acid, C6H8O6, would be passed as
molform = [('C',6), ('H',8), ('O',6)]
Function  calc_molar_mass needs to perform the following steps:
  • Connect with database
  • Extract atomic weight values and update molar mass
  • Close database
  • Return result

 

Begin communication with the database

To connect to an existing PostgreSQL database on a local computer, the database password and the name of the database that contains table awe2005 need to be provided (by replacing my_password and my_database, respectively):

  try:    
     conn = psycopg2.connect(database="my_database",
              user="postgres", password="my_password",
              host="127.0.0.1", port="5432")        
 except:
     print("NOT connected!")

     return {}
 cur = conn.cursor()


If the connection to the database cannot be established, an empty dictionary is returned; otherwise, a cursor is initialized to perform database operations. Then, you are all set to query table awe2005.

Use the cursor to obtain atomic weight values for given symbols 

Formulate the query, execute it and fetch the table row:

   query = "SELECT * FROM awe2005 WHERE symb = '%s'" % atsymb
   cur.execute(query)
   row = cur.fetchone()  


The string variable atsymb contains the atomic symbol, for which we want to obtain the atomic weight value. The latter is in column awr, which is column 4 (start counting with 0 in the above table and take the column number as field index for row):

   atweight = row[4]

The complete molar mass calculation requires a loop over all atomic symbol/subscript pairs:

   molmass = 0.0
   for pair in molform:
      atsymb    = pair[0]
      subscript = pair[1]
      query = "SELECT * FROM awe2005 WHERE symb = '%s'" % atsymb
      cur.execute(query)
      row = cur.fetchone()  
      if row is None:
         # unknown atomic symbol
      else:
         atweight = row[4]
         if atweight is None:
            # no atomic weight value for this symbol
         else:
            molmass += atweight * float(subscript)

           

The appendix contains an executable script that takes care of  those cases where a symbols or value is not found. The script can also be downloaded: molmass.py (right-click and use "Save link as..." option).  For a molecular formula input,  it generates a list of lines recording both successful and unsuccessful calculation steps. 

Close database

Close the communication with the database:

   cur.close()
   conn.close()


Depending on your script design, connecting with and disconnecting from the database may occur elsewhere—independent of the need in particular functions—and you will simply pass the cursor to database-accessing functions.

Appendix: Completing the script

The following is a complete script that displays the molar mass calculation step-by-step (for each atomic symbol) and reports the occurrence of unrecognized symbols and missing values:

import argparse
import psycopg2
#======================================================================#
# Calculate molar mass for given molecular formula                     #
#======================================================================#
def calc_molar_mass(molform):

   # Begin communication with the database
   dbname = 'my_database'
   try:     
      conn = psycopg2.connect(database=dbname,
                   user='postgres', password='my_password',
                   host='127.0.0.1', port='5432')         
   except:
      print('NOT connected to %s!' % dbname)
      return {}
   cur = conn.cursor()

   # Perform calculation of molar mass
   molmass = 0.0
   steps = []
   cntUnknownSymbols = 0
   cntMissingValues  = 0
   for pair  in molform:
      atsymb    = pair[0]
      subscript = pair[1]
      query = "SELECT * FROM awe2005 WHERE symb = '%s'" % atsymb
      cur.execute(query)
      row = cur.fetchone()  

      if row is None:
         cntUnknownSymbols += 1
         steps.append("            ?")
      else:
         atweight = row[4]
         if atweight is None:
            cntMissingValues += 1
            step = "            %-3s %6s            ?" %\
                                  (atsymb,subscript)
            steps.append(step)
         else:
            contribution = atweight * float(subscript)
            step = "            %-3s %6s %12s" % (atsymb,subscript,\
                                    "%.3f" % contribution)
            steps.append(step)
            molmass += contribution

   # End communication with the database
   cur.close()
   conn.close()

   # Finish by formatting report lines
   status = ':'
   resultline = ''
   if cntUnknownSymbols == 0 and cntMissingValues == 0:
      resultline = "Molar mass in g/mol:   %12s" % ("%.3f" % molmass)
   else:
      status = resultline = "Molar mass calculation failed:     "
   steps.append('-' * len(resultline))
   steps.append(resultline)
   if cntUnknownSymbols > 0:
      steps.append("Unknown symbols:%3i " % cntUnknownSymbols)
   if cntMissingValues > 0:
      steps.append("Missing values: %3i " % cntMissingValues)
   steps.append('-' * len(resultline))

   # Return results as dictioary with three entries
   mmdict = {'molmass': molmass, 'steps': steps, 'status': status}
   return mmdict


#======================================================================#
# Run molar mass calculation                                           #
#======================================================================#
if __name__ == '__main__':

   #===================================================================#
   # Get molecular formula from command line                           #
   #===================================================================#
   parser = argparse.ArgumentParser()
   parser.add_argument('entered', nargs='*')
   args = parser.parse_args()
   linearMolform = args.entered[0]

   #===================================================================#
   # Loop over atomic symbol/subscript pairs and calculate molar mass  #
   #===================================================================#
   tokens = linearMolform.split()
   molform = []
   i = 0
   previous = ''
   for token in tokens:
      i += 1
      if i % 2 == 0: # when even number of tokens, add symbol/value pair
                     # to molform
         molform.append((previous,token))
      else:
         previous = token # hold atomic symbol until next iteration step
   mmdict = calc_molar_mass(molform)

   #===================================================================#
   # Display results                                                   #
   #===================================================================#
   print('Result:')
   steps = mmdict['steps']
   for step in steps:
      print(step)



Get the script as file molmass.py (right-click and use "Save link as..." option).

Execution example: Run the script via command line submission 

   python.exe molmass.py "C 6 H 8 O 6"

to calculate the molar mass for the formula C6H8O6, which, for example, is the molecular formula of ascorbic acid with a molar mass of 176.12 g/mol.

The provided script was tested with Python 3.4.2 under Windows 10. The following screen snippet shows the script performance within a PowerShell window:



Sunday, November 19, 2017

How to fix the wrong-start-page-problem when occurring with Firefox browser

Recently, my Firefox browser always opened with a start page (or startup page) displaying content from a third-party provider I never had invited to do so.

I fixed this problem in three quick steps:
  1. Open the desired start page in sabotaged Firefox browser.
  2. Select and drag the URL string from the address bar to the Home button in the toolbar. 
  3. When asked “Do you want this document to be your new home page?”, click Yes.
Now, my desired start page comes up each time, when I launch Firefox.

Note: I experienced some confusion about the terms “home page” and “start(up) page” when consulting different tutorial or support websites. My understanding is that one may set one's home page as a browser start page, but does not have to. Personally, I like to have a search engine page as my browser start page; neither my home page nor a browser-enforced page.

Friday, November 10, 2017

Upgrading to pgAdmin 4 on Windows: issue with contacting the local server solved!

Recently, I upgraded to PostgreSQL 10 and pgAdmin 4 on Windows 10. When I started pgAdmin 4 from the All apps list in the Start menu, pgAdmin did not start as expected and a message was displayed that the PostgreSQL server could not be contacted!

How to fix that? Fortunately, it turned out to be easy. I inspected the AppData\Roaming\pgAdmin folder within my Windows user account home folder and realized that it contained various files and folders. After moving all of them into the Recycle Bin, I was able to restart pgAdmin successfully.

Maybe, data stored and dumped while working with a prior pgAdmin version interfere with version 4. Then, to avoid pgAdmin4 start-up errors has a simple one-step fix (the way it worked for me as described above):

Remove all files and folders from your AppData\Roaming\pgAdmin folder. This folder has the path

C:\Users\YourUserName\AppData\Roaming\pgAdmin

with YourUserName being—well—your user name. If you still want to switch back to an older pgAdmin version previously installed and run on your computer, you should keep the removed files in storage—just in case..

Thursday, June 1, 2017

How to tweet the URL of a web page successfully with a “Large Image Summary Card”

Example of a “Large Image Summary Card for an HTML document featuring Meeks Creek Falls, a waterfall in California's Desolation Wilderness

On Twitter users tweet short text—microposts.

You want to enhance your tweet with an image? Post it along with your tweet (see Posting photos or GIFs on Twitter).

You want to enhance your tweet with an URL of an exciting web page plus an illustrating image?  Use Twitter Cards.

Frequently, I design an image page (photo page): a web page (HTML document) that contains an image annotated with some informative text. I like to include such pages in Twitter posts by presenting the targeted image with descriptive text—microannotation. This stirs up reader's curiosity and invites them to click through. To achieve this, I employ Twitter's “Large Image Summary Card” technology (see Summary Card with Large Image).

Here, I'll demonstrate how it works (how it worked for me, anyway).  Follow these straightforward steps:

1. Add specific HTML meta tags to your web page.
2. Run the URL of the page through Twitter's card validator tool.
3. Tweet the URL

As an example page, I am using my Meeks Creek Falls picture and page. I inserted the following metadata lines into the head section of the respective HTML document:

<meta name="twitter:card" content="summary_large_image"/>
<meta name="twitter:site" content="@TravelingAhead" />
<meta name="twitter:creator" content="@TravelingAhead" />
<meta name="twitter:title" content="Meeks Creek Falls" />
<meta name="twitter:description" content="White water tumbling, splashing and spraying downcreek between naturally sculptured rocks" />
<meta name="twitter:image" content="http://www.axeleratio.com/pic/waterfall/meekscreek/img/Meeks_Creek_Falls.jpg">
<meta name="twitter:image:alt" content="Meeks Creek Falls next to Tahoe-Yosemite Trail">

Then, the document was tested with the Card validator at https://cards-dev.twitter.com/validator. After pressing the Preview Card button, I got the following preview display including an info log indicating successful card mark-up:



I submitted the following message with link and hashtags via @TravelingAhead:

#TYT Refresh yourself on a Tahoe-Yosemite Trail outing http://www.axeleratio.com/pic/waterfall/meekscreek/meeks_creek_falls.htm #MeeksCreekFalls #DesolationWilderness #Waterfall
 
The tweet showed up in the feed (https://twitter.com/TravelingAhead/status/869294403973660672) as shown at the top of this post.

Keywords: microblogging, microdata, meta tags, image collecting, annotated image.

Saturday, March 4, 2017

How can I edit the PATH variable under Windows 10?

The PATH variable is an environment variable of an operating system. The value of this variable is a list of folder (directory) paths. These specified folders are the names of installation paths—also called system paths—of executable programs.

Here, we are interested in the accessibility of programs installed in Windows 10. The list of folder names is searched when a program is called from the Run dialog box, a Command Prompt or Powershell command line and also when requested by a running process. Thus, inclusion of a program's directory path in the PATH list enables a program launch by program name without typing or providing the complete path.

Updated Windows 10 versions now contain a PATH editor that facilitates modification, addition and deletion of path names. Using this editor infrequently, I am finding it difficult to memorize the step sequence to access it. In the following, I am sharing how to tab down to the PATH editing dialog:

  1. Right-click the Windows icon (Start button) and, from the pop-up menu, select System to open the System window.
  2. From the menu items on the left side of the basic computer information, click Advanced system settings to open the System Properties window.
  3. Click the Environment Variables... button to open the Environment Variables window containing a scrollable System variables pane.
  4. In the pane, double-click the line with the system variable Path to open the PATH Editor (with the heading Edit environment variable). 
In the editor, select a line with a given installation path. Click the Edit button to modify the path or the Delete button to remove the path. To add a new path, click the Browse... button to open a dialog that assists you in finding the folder path you want to add. No typing required! Leave the editor by clicking the OK button to save your changes.

Sunday, September 25, 2016

A hands-on OpenOffice™ Base introduction by example using some mineral data

A contribution by Axel Drefahl, axeleratio.com, September 25, 2016.

Apache OpenOffice™ is an open-source office suite including word processing, spreadsheet, presentation, graphics and database software [1]. Its database tool is called OpenOffice™ Base, shorter ooBase or simply Base. OpenOffice started out as an open-sourced version of  StarOffice (1985-1999) [2]. Later, the suite development fractured into the Apache OpenOffice project and the LibreOffice project.

Today, two separate office suites, which were built on the same original code, exist [3]. Both, OpenOffice and LibreOffice can be downloaded for free for Windows, Linux and Mac. The two projects share most of their code. Interfaces and feature are similar. Without going into any further detail of difference, we will here get started with OpenOffice to get a hands-on experience for a desk top database management system that save your data and metadata in a single file. If you are using Microsoft Access, you will find many similarities. Otherwise, learning ooBase provides the basic skills to continue with Access later—if you want to or need to. Data can be imported/exported both ways.  

You are going to learn how to handle the followings tasks:
  1. Table design  (table layout),
  2. Table creation (data entry),
  3. Relationship between tables,
  4. Building queries in design view,
  5. Export of data table via OpenOffice Calc.
The provided examples use data from mineralogy [4]. Here are the lessons:
  1. Exploring OpenOffice Base: an example involving rare-earth mineral data  
  2. Structuring rare-earth mineral data in an OpenOffice Base table
  3. Modifying the column layout of an existing OpenOffice Base table  
  4. Defining and creating relationships between OpenOffice Base tables  
  5. Creating a query in OpenOffice Base
  6. Combinatorial querying-with OpenOffice Base  
  7. Exporting an Openoffice Base table into a file with Comma Separated Values (CSV) format
Apache OpenOffice 4.1.1 was used on Window 10 to test the examples and create the sample files RareEarthMinerals_2tables.odb and RareEarthMinerals_query.odb.

It should go without saying that these lessons provide a quick start, but cover only a few selected features and capabilities of ooBase.


References and more to learn
[1] Apache OpenOffice™: Why should I use Apache OpenOffice? [www.openoffice.org/why/index.html].
[2] Wikipedia: OpenOffice.org [en.wikipedia.org/wiki/OpenOffice.org],
[3] How-To Geeks: OpenOffice vs. LibreOffice: What's the Difference and Which Should You Use?  [www.howtogeek.com/187663/openoffice-vs.-libreoffice-whats-the-difference-and-which-should-you-use/].
[4] Appendix of the contribution “Mineralogy of the Rare-Earth Elements” by F. P. Cesborn in: P. Möller, P. Černý and F. Saupé. Lanthanides, Tantalum and Niobium. Springer-Verlag, Berlin Heidelberg, Germany, 1989.

Saturday, September 24, 2016

Exporting an OpenOffice™ Base table into a file with Comma Separated Values (CSV) format

A contribution by Axel Drefahl, axeleratio.com, September 24, 2016.

This post continues my preceding OpenOffice™ Base (ooBase) lessons:
  1. Exploring OpenOffice Base: an example involving rare-earth mineral data  
  2. Structuring rare-earth mineral data in an OpenOffice Base table
  3. Modifying the column layout of an existing OpenOffice Base table  
  4. Defining and creating relationships between OpenOffice Base tables  
  5. Creating a query in OpenOffice Base
  6. Combinatorial querying-with OpenOffice Base 
A common database task is to export data into a plain-text file and adhering to a well-defined format such that the data can automatically be imported and resourced by other applications. In the following exercise we export our CesbronList into a CSV file:
  1. Open RareEarthMinerals_query.odb and click Tables in the Database pane bring the Tables pane in view.
  2. From the menu, select File > New > Spreadsheet to open an empty Calc sheet..
  3. In the Base window, right-click CesbronList and choose Copy from the context menu.
  4. In the Calc window, click cell A1 and Edit > Paste. The Base field names will show up as column headings in row 1 (cells A1 to U1) and the data will populate the cells of the rectangle with the diagonal-lined corners A2 and U168.
  5. In the Base window select File > Close to close RareEarthMinerals_query.odb.
  6. Continue with Calc. Select File > Save from the menu to save the imported data as CesbronList.ods (temporarily or longer,;you are the database administrator).
  7. Then, select File > Save As to open the dialog for outsourcing from OpenOffice™.
  8. Click the drop-down menu next to “Save as type” to view the export options. Select Text CSV (.csv)(*.csv) .
  9. Name the export file CesbronList.csv.
  10. Click Save to finalize the export.
CesbronList table exported into an OpenOffice Calc sheet

CesbronList table exported into a CSV file

As you certainly have noticed, the drop-down menu of the “Save as type” text field displayed a list of various export options, some of which you also may want to try. Have fun!