
Sunday, August 23, 2020

Axeleratio's Context Dictionary (AxCD) structure and application

Axeleratio uses a customizable format, the Axeleratio's Context Dictionary (AxCD) structure, to flexibly integrate complex, diverse and context-interconnected data items and text snippets.

AxCD structure

The metalevel AxCD structure is, as its name suggests, a dictionary. The AxCD keys are names, terms or text phrases that are to be captured, looked up and further processed. The content associated with a particular AxCD key is a list of lists. Each list consists of at least six fields:

  1. Identifier or reference: a secondary key or a reference code to cross-relate entries in AxCDs or identify a publication source. 
  2. Line dictionary: a lower-level, programming-environment-independent dictionary representation of contextual data in line format. Because the whole dictionary is one string, it can be explored with a substring search without iterating through a key-value pair structure.
  3. Associated interest: a word, code or text that complements the AxCD key within the context domain of the AxCD.
  4. Descriptor 1: optional first descriptive symbol or notation characterizing the interest entry.
  5. Descriptor 2: optional second descriptive symbol or notation characterizing the interest entry.
  6. Descriptor 3: optional third descriptive symbol or notation characterizing the interest entry.
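Here is a minimal sketch of the six-field layout as a plain Python dictionary. The entries are copied from the examples in this post. The field names in AXCD_FIELDS and the helper entries() are hypothetical labels added for readability and are not part of the AxCD format itself.

```python
# Hypothetical names for the six AxCD fields, in their fixed order.
AXCD_FIELDS = ("identifier", "line_dictionary", "interest",
               "descriptor1", "descriptor2", "descriptor3")

# AxCD: key -> list of entries; each entry is a list of (at least) six fields.
axcd = {
    "aceto-acetic acid": [
        ["O=C(C)CC(=O)O",
         "chemform:C4H6O3;CASRN:541-50-4;IUPAC:3-oxobutanoic acid",
         "Acetessigsäure", "f", "-", ""],
    ],
    "American crocodile": [
        ["Crocodylus acutus", "", "Spitzkrokodil", "n", "-s", "-e"],
    ],
}

def entries(axcd, key):
    """Return the entries for key as dicts with named fields.

    Fields beyond the sixth are ignored here. An unknown key gives []."""
    return [dict(zip(AXCD_FIELDS, entry)) for entry in axcd.get(key, [])]

for e in entries(axcd, "American crocodile"):
    print(e["identifier"], "->", e["interest"])  # Crocodylus acutus -> Spitzkrokodil
```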

We have found this type of structure convenient for presenting various collections of information, such as data tables, vocabularies, glossaries, chronologies, directories, inventories, annotation notebooks and encyclopedic contributions. The AxCD format is particularly well suited to domain-specific information whose items go by different names in different languages.


AxCD applications

In the following we give a few examples of how we use AxCDs to represent data. We use the dictionary data structure of the Python programming language. Our modules of wrapped, Python-encoded AxCDs are either manually entered or generated by export routines from databases and spreadsheet applications.
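The export routines themselves are not shown in this post. The sketch below is only one possible way to build such a module from spreadsheet data. It assumes a hypothetical CSV export whose first column is the AxCD key and whose remaining columns are the entry fields. Rows that share a key become several entries under that key. The dictionary is then rendered as Python source with pprint, which yields a valid literal.

```python
import csv
import io
import pprint

def rows_to_axcd(rows):
    """Group CSV rows (key, field1, field2, ...) into an AxCD-style dict.

    Rows sharing a key become several entries (lists) under that key."""
    axcd = {}
    for row in rows:
        if not row:                      # skip blank lines
            continue
        key, fields = row[0], list(row[1:])
        # Pad short rows so every entry has at least the six AxCD fields.
        fields += [""] * (6 - len(fields))
        axcd.setdefault(key, []).append(fields)
    return axcd

def to_module_source(name, axcd):
    """Render the dict as Python source for a wrapped AxCD module."""
    return "%s = %s\n" % (name, pprint.pformat(axcd))

# Hypothetical spreadsheet export; the fields are taken from this post.
sample = io.StringIO(
    "American crocodile,Crocodylus acutus,,Spitzkrokodil,n,-s,-e\n"
    "aceto-acetic acid,O=C(C)CC(=O)O,"
    "chemform:C4H6O3;CASRN:541-50-4;IUPAC:3-oxobutanoic acid,"
    "Acetessigsäure,f,-,\n"
)
axcd = rows_to_axcd(csv.reader(sample))
src = to_module_source("AXCD", axcd)
print(src)
```

Generating the module text, rather than pickling, keeps the result readable and editable by hand, which matches the mix of manual and generated modules mentioned above.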

 

Chemical and biological materials notes.

'sea-silk': [ 

     ['HelenScales2015','',
            'The origins of sea-silk remain stubbornly mysterious, and no one knows for sure who first thought to pluck hairs from giant seashells and turn them into threads and fabric','q','153','axel'] 

]

The AxCD key is the name of a material. The identifier field contains the reference from which the following information was extracted. The line dictionary is not used here. The interest field contains a quotation, and the letter q in the first descriptor field marks the text as a quotation. The second descriptor field gives the page number in the specified reference where the original text can be found. The third descriptor indicates who quoted the text.
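The descriptor convention makes quotations easy to collect across an AxCD. The following is a small sketch that assumes only what is stated above: q in descriptor 1, the page in descriptor 2, and the quoting person in descriptor 3. The function name quotations() is hypothetical, and the quotation is shortened here.

```python
def quotations(axcd):
    """Yield (key, quote, page, quoted_by, source) for entries whose
    first descriptor is 'q'."""
    for key, entries in axcd.items():
        # Unpack the six fields; any extra fields are ignored.
        for source, _linedict, text, d1, d2, d3, *_extra in entries:
            if d1 == "q":
                yield key, text, d2, d3, source

axcd = {
    "sea-silk": [
        ["HelenScales2015", "",
         "The origins of sea-silk remain stubbornly mysterious",  # shortened
         "q", "153", "axel"],
    ],
}

found = list(quotations(axcd))
for key, text, page, who, source in found:
    print('%s: "%s" (%s, p. %s; quoted by %s)' % (key, text, source, page, who))
```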

 

Chemistry vocabulary English-German.

'aceto-acetic acid': [ 

     ['O=C(C)CC(=O)O','chemform:C4H6O3;CASRN:541-50-4;IUPAC:3-oxobutanoic acid',
            'Acetessigsäure','f','-',''] 

]

The AxCD key is the English name of a chemical compound. The identifier field contains the SMILES notation for this compound. The line dictionary encodes the chemical formula, the Chemical Abstracts Service Registry Number (CASRN) and the name recommended by the International Union of Pure and Applied Chemistry (IUPAC). The interest field gives the German name for this compound. It is followed by the grammatical gender of the noun (f for feminine), the genitive form (dash for none) and the plural form (empty for not applicable).
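The line dictionary above appears to use "key:value" items separated by semicolons. Assuming that convention, which is inferred from this single example, two hypothetical helpers cover both uses. parse_line_dictionary() expands the string into a Python dict when structured access is needed. keys_matching() performs the plain substring search over whole line dictionaries mentioned in the field description.

```python
def parse_line_dictionary(linedict):
    """Split 'key:value;key:value' into a dict; surrounding whitespace is ignored."""
    result = {}
    for item in linedict.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _sep, value = item.partition(":")   # split at the first colon only
        result[key.strip()] = value.strip()
    return result

def keys_matching(axcd, substring):
    """Return AxCD keys whose line dictionary (field 2) contains substring."""
    return [key for key, entries in axcd.items()
            if any(substring in entry[1] for entry in entries)]

axcd = {
    "aceto-acetic acid": [
        ["O=C(C)CC(=O)O",
         "chemform:C4H6O3;CASRN:541-50-4;IUPAC:3-oxobutanoic acid",
         "Acetessigsäure", "f", "-", ""],
    ],
    "American crocodile": [
        ["Crocodylus acutus", "", "Spitzkrokodil", "n", "-s", "-e"],
    ],
}

print(parse_line_dictionary(axcd["aceto-acetic acid"][0][1])["IUPAC"])  # 3-oxobutanoic acid
print(keys_matching(axcd, "CASRN:541-50-4"))  # ['aceto-acetic acid']
```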

 

Biology vocabulary English-German.

'American crocodile': [

    ['Crocodylus acutus','','Spitzkrokodil','n','-s','-e']

]

The AxCD key is the English name of an animal species. The identifier field contains the scientific species name. The line dictionary field is empty. The German name for this species is followed by the grammatical gender of the noun (n for neuter), the genitive form (ending in -s) and the plural form (ending in -e).
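In both vocabulary examples, the three descriptors are enough to print a conventional German dictionary line. The sketch below is an assumption built on those examples. The function german_entry() and the article map are hypothetical, and the letter m for masculine is assumed because only f and n appear above. Empty descriptor fields are simply left out.

```python
# Definite articles for the gender letters (m is assumed; f and n appear above).
ARTICLES = {"m": "der", "f": "die", "n": "das"}

def german_entry(entry):
    """Format an AxCD vocabulary entry as 'article noun, genitive, plural'."""
    _ident, _linedict, noun, gender, genitive, plural = entry[:6]
    parts = ["%s %s" % (ARTICLES.get(gender, "?"), noun)]
    parts += [form for form in (genitive, plural) if form]   # omit empty fields
    return ", ".join(parts)

crocodile = german_entry(["Crocodylus acutus", "", "Spitzkrokodil", "n", "-s", "-e"])
acid = german_entry(["O=C(C)CC(=O)O", "", "Acetessigsäure", "f", "-", ""])
print(crocodile)  # das Spitzkrokodil, -s, -e
print(acid)       # die Acetessigsäure, -
```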


Conclusion and outlook

We use our currently available and frequently updated AxCDs in connection with other data collections and archives to generate novel dictionaries for customized client-desired search functions and to abstract knowledge for context- and task-specific requests.



Sunday, March 10, 2019

Curvilinear regression in R: a parabolic-model example

Simple linear regression (SLR) and multiple linear regression (MLR) analyses are frequently applied when modeling data. If your visual or statistical analysis during model building suggests a non-linear relationship, you may want to try curvilinear (polynomial) modeling.
R programming for SLR, MLR and curvilinear regression (CLR) analysis is very similar. CLR model building in R is done in four basic steps:
  1. Structure your data in a data frame, for example, via import from a CSV file.
  2. Calculate the desired powers for the independent variable(s) and add them to the data frame. 
  3. Derive the curvilinear model by using the lm() function.
  4. Review the results displayed with the summary() function.
In the case of a parabolic model (quadratic model), only square values need to be calculated and included in the data frame at step 2.

The derived model—if found to satisfactorily fit the data—can then be applied to estimate new values for the dependent variable (response values) by calling the predict() function, which needs to receive a data frame object with new values for the independent variables.
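To show what the four R steps compute for a parabolic model, here is a plain-Python sketch with no NumPy. It adds the squared predictor as an extra column (step 2) and solves the least-squares normal equations. This is a swapped-in illustration: R's lm() solves the same problem through a QR decomposition, which is numerically more robust. The names fit_parabola() and predict() are hypothetical, and the data are synthetic rather than the woodward83.csv values.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_parabola(xs, ys):
    """Least-squares coefficients [b0, b1, b2] of y = b0 + b1*x + b2*x**2."""
    rows = [[1.0, x, x * x] for x in xs]                # intercept, x, x squared
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)

def predict(coef, x):
    """Response value of the fitted parabola at a new x (like R's predict())."""
    b0, b1, b2 = coef
    return b0 + b1 * x + b2 * x * x

xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x * x for x in xs]    # exact parabola for checking
coef = fit_parabola(xs, ys)
print([round(c, 6) for c in coef])          # [1.0, 2.0, 3.0]
print(round(predict(coef, 5), 6))           # 86.0
```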

Below are the tutorial-style documents and associated CSV files with example data for SLR, MLR and CLR (parabolic) modeling with R.

SLR modeling

Document: www.axeleratio.com/math/comp/linreg/linregways.pdf   
Data: www.axeleratio.com/math/comp/linreg/csv/woodward71.csv
 

MLR modeling

Document: www.axeleratio.com/math/comp/linreg/multilinreg.pdf   
Data: www.axeleratio.com/math/comp/linreg/csv/woodward82.csv

CLR modeling (parabolic modeling example)

Document: www.axeleratio.com/math/comp/linreg/curvilinreg.pdf   
Data: www.axeleratio.com/math/comp/linreg/csv/woodward83.csv



Example of curvilinear model building in R: details are given in my document “How to perform curvilinear regression analysis with R”.


Keywords: statistical analysis, linear modeling, non-linear modeling, machine learning, testing relationships, model building, R programming.

Tuesday, March 5, 2019

Multiple linear regression in the R software environment

Carrying out multiple linear regression (MLR) in the freely available R software environment is not very different from performing simple linear regression (SLR) in R. The same basic steps can be followed when working on an MLR problem:
  1. Structure your data in a data frame, for example, via import from a CSV file.
  2. Derive the linear model by using the lm() function.
  3. Review the results displayed with the summary() function.
The derived model can then be applied to estimate new values for the dependent variable (response values) by calling the predict() function, which needs to receive a data frame object with new values for the independent variables.
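To illustrate what lm() estimates in the MLR case, the following plain-Python sketch fits y = b0 + b1*x1 + ... + bk*xk with the normal equations. As with any such sketch, this is a swapped-in technique: lm() itself uses a QR decomposition. The helpers fit_linear() and predict() are hypothetical. The two-predictor data are synthetic and generated from y = 1 + 2*x1 - x2, so the fit can be checked.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear(columns, ys):
    """Least-squares coefficients [b0, b1, ..., bk]; columns holds one list
    of values per independent variable."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]   # prepend intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, xs):
    """Response value for one new observation xs = [x1, ..., xk]."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], xs))

x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 3, 2, 5, 4]
y = [1 + 2 * a - b for a, b in zip(x1, x2)]
coef = fit_linear([x1, x2], y)
print([round(c, 6) for c in coef])          # [1.0, 2.0, -1.0]
print(round(predict(coef, [6, 2]), 6))      # 11.0
```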

Data and code to get started with MLR in R:


CSV file with sample dataset at
www.axeleratio.com/math/comp/linreg/csv/woodward82.csv.

Tutorial-style document with title “How to perform multiple linear regression analysis with R” at
www.axeleratio.com/math/comp/linreg/multilinreg.pdf.

MLR in R using the woodward82.csv dataset, as explained in the article “How to perform multiple linear regression analysis with R”

Wednesday, February 20, 2019

Simple linear regression with Python and R: Getting started

Linear modeling in R
Development of a linear model in R using physical property values of rubber samples. Explore the use of R for linear modeling in a detailed document.
Python and R are open-source programming languages. A large community of scientific software developers uses Python with its NumPy and SciPy libraries. While Python is a general-purpose language, R is mainly focused on statistical and predictive analysis. Both languages are currently popular choices for designing algorithms for big data problems and machine learning projects. They are also employed by researchers in diverse fields whenever the need arises for data fitting, complex calculations, simulations and modeling.

Evaluating the relationship between two variables is a frequently occurring task, for example, in calibrating measurement instruments and modeling experimental data. Here is a Getting Started document: “Simple linear regression with Python and R: three ways to begin with”. Therein, linear modeling in Python and R is demonstrated and compared. You will learn how

  • to import CSV-formatted data in Python and R,
  • to use NumPy arrays in SLR computation,
  • to derive regression and correlation coefficients with SciPy's stats.linregress() function,
  • to use R's data.frame container with the lm() function to fit a linear model representing your data.
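As a reminder of what scipy.stats.linregress() returns (slope, intercept and the correlation coefficient, among others), here is a standard-library-only sketch. It computes the same three quantities from the least-squares formulas. The CSV text and its column names x and y are hypothetical and are not the rubber-sample data from the document. simple_linregress() is a hypothetical name chosen to avoid confusion with the SciPy function.

```python
import csv
import io
import math

def simple_linregress(xs, ys):
    """Slope, intercept and Pearson r of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical CSV content; in practice, open the CSV file instead.
csv_text = "x,y\n1,3.1\n2,4.9\n3,7.2\n4,8.8\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
xs = [float(row["x"]) for row in rows]
ys = [float(row["y"]) for row in rows]

slope, intercept, r = simple_linregress(xs, ys)
print("slope=%.3f intercept=%.3f r=%.4f" % (slope, intercept, r))
# slope=1.940 intercept=1.150 r=0.9978
```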


Generation of scatter diagram in R
R instructions producing a scatter diagram for the rubber-sample data used in the linear model development
  
Keywords: linear regression, Python, R, statistical description, data analysis, machine learning.


Sunday, February 3, 2019

Threats or no threats? What is a harmful website?

The McAfee security scan is supposed to identify threats on and to a computer [1]. Recently, after a scan, McAfee gave me a list of harmful websites. For example, the list included the following sites:


What does this mean? I don't recall visiting the last two sites in the list. The first two I visit frequently, as many of us do! So, I am not really expecting them to be a threat. Nor do I expect my Trailingahead blog, hosted by Blogger (a Google service), to be one.

I checked the URLs above with VirusTotal (www.virustotal.com/#/home/url). They came out clean. I didn't see flags or URL/domain blacklisting.

If McAfee flags a website as harmful, then, so goes the claim, the scanner has detected some evidence of misbehavior such as spamming, malware activity or a server problem [2]. But obviously this website-threat connection does not always hold or, at least, is not made transparent. Some people even advise ignoring such “harmful website” warnings or uninstalling McAfee [3].

Now, I am not sure whether this information was helpful. But I hope it was not harmful!


References


[1] Lynn Burbeck: How to Remove Threats Detected by McAfee. It Still Works. Link: https://itstillworks.com/remove-threats-detected-mcafee-8572308.html.
[2] What does it mean if McAfee scan finds an "issue" of "harmful website" for a site I visited in the past, but no other issues? Quora. Link: www.quora.com/What-does-it-mean-if-McAfee-scan-finds-an-issue-of-harmful-website-for-a-site-I-visited-in-the-past-but-no-other-issues.
[3] How do I get rid of a "Harmful Website" threat? Yahoo! Answers. Link: answers.yahoo.com/question/index?qid=20131218145638AA5JdP3.

Thursday, April 12, 2018

How to access PostgreSQL with Python: a chemistry example

PostgreSQL is an open-source database. How can you connect to your local PostgreSQL database and access data in PostgreSQL tables by scripting in Python? Answer: by using the Psycopg PostgreSQL adapter.

I herein demonstrate data extraction with an example solving a common task in chemistry: the calculation of the molar mass for a given molecular formula by retrieving atomic weights from a PostgreSQL table.

Let's begin with the atomic-weight table. We will use published standard atomic weights (DOI: dx.doi.org/10.1351/pac200678112051): Atomic Weights of the Elements 2005 (IUPAC Technical Report). We use the following implementation for table awe2005:

Column     Type              Constraints        Comment or description
id         serial            primary key
elem       varchar(12)       not null, UNIQUE   Element name
symb       varchar(3)        not null, UNIQUE   Element symbol
numb       smallint          not null, UNIQUE   Element number
awr        double precision                     Atomic weight, rounded
awu        varchar(20)                          Atomic weight with uncertainty
footnotes  varchar(5)                           Footnotes
stbliso    boolean                              Stable isotope existence

You are welcome to download the atomic-weight table in CSV format, available as awe2005.csv, and import it into your PostgreSQL database. In our demonstration, we only access values of the columns symb and awr from this table.

Now let's write the Python function calc_molar_mass that will calculate the molar mass for a molecular formula, which is passed as argument molform. This variable is a list of pairs. Each pair is a tuple containing an element symbol and its subscript. For example, the molecular formula for ascorbic acid, C6H8O6, would be passed as
molform = [('C',6), ('H',8), ('O',6)]
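Typing the pairs by hand is error-prone for longer formulas. Here is a small sketch of a hypothetical helper, parse_formula(), that builds molform from a compact formula string with a regular expression. It assumes simple formulas only, with no parentheses, hydrate dots or charges. A missing subscript counts as 1.

```python
import re

def parse_formula(formula):
    """Turn a simple formula such as 'C6H8O6' into [('C', 6), ('H', 8), ('O', 6)].

    Parentheses, hydrates and charges are not supported."""
    # The whole string must consist of symbol/subscript groups.
    if not re.fullmatch(r"(?:[A-Z][a-z]{0,2}\d*)+", formula):
        raise ValueError("unsupported formula: %r" % formula)
    return [(symb, int(sub) if sub else 1)
            for symb, sub in re.findall(r"([A-Z][a-z]{0,2})(\d*)", formula)]

molform = parse_formula("C6H8O6")
print(molform)  # [('C', 6), ('H', 8), ('O', 6)]
```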
The function calc_molar_mass needs to perform the following steps:
  • Connect with database
  • Extract atomic weight values and update molar mass
  • Close database
  • Return result

 

Begin communication with the database

To connect to an existing PostgreSQL database on a local computer, the database password and the name of the database that contains table awe2005 need to be provided (by replacing my_password and my_database, respectively):

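   try:
      conn = psycopg2.connect(database="my_database",
                              user="postgres", password="my_password",
                              host="127.0.0.1", port="5432")
   except psycopg2.OperationalError:
      # raised by psycopg2 when the server cannot be reached or login fails
      print("NOT connected!")
      return {}   # inside calc_molar_mass: give up with an empty result
   cur = conn.cursor()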


If the connection to the database cannot be established, an empty dictionary is returned; otherwise, a cursor is initialized to perform database operations. Then, you are all set to query table awe2005.

Use the cursor to obtain atomic weight values for given symbols 

Formulate the query, execute it and fetch the table row:

   query = "SELECT * FROM awe2005 WHERE symb = %s"   # psycopg2 fills in %s safely
   cur.execute(query, (atsymb,))
   row = cur.fetchone()


The string variable atsymb contains the atomic symbol for which we want to obtain the atomic weight value. This value is in column awr, which is column 4 (start counting with 0 in the above table and use the column number as the field index for row):

   atweight = row[4]
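An alternative is to select only the awr column, so the code no longer depends on its position in the table. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The two-column table and the helper get_atomic_weight() are hypothetical. The atomic weights are the 2005 IUPAC values, and Tc is stored as NULL because it has no standard atomic weight. Note that sqlite3 uses ? placeholders, whereas psycopg2 uses %s.

```python
import sqlite3

def get_atomic_weight(cur, atsymb):
    """Return column awr for atsymb, or None if the symbol is unknown
    or the table holds no atomic weight for it."""
    # Parameterized query: the driver passes atsymb as a value, with no
    # hand-made quoting. (psycopg2 would use "... WHERE symb = %s".)
    cur.execute("SELECT awr FROM awe2005 WHERE symb = ?", (atsymb,))
    row = cur.fetchone()
    return None if row is None else row[0]

# In-memory stand-in holding only two columns of awe2005.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE awe2005 (symb TEXT NOT NULL UNIQUE, awr REAL)")
cur.executemany("INSERT INTO awe2005 VALUES (?, ?)",
                [("H", 1.00794), ("C", 12.0107), ("O", 15.9994), ("Tc", None)])
carbon = get_atomic_weight(cur, "C")     # 12.0107
unknown = get_atomic_weight(cur, "Xx")   # None: symbol not in the table
missing = get_atomic_weight(cur, "Tc")   # None: row exists, awr is NULL
conn.close()
```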

The complete molar mass calculation requires a loop over all atomic symbol/subscript pairs:

   molmass = 0.0
   for atsymb, subscript in molform:
      query = "SELECT * FROM awe2005 WHERE symb = %s"
      cur.execute(query, (atsymb,))
      row = cur.fetchone()
      if row is None:
         pass   # unknown atomic symbol: nothing to add
      else:
         atweight = row[4]   # column awr
         if atweight is None:
            pass   # no atomic weight value for this symbol
         else:
            molmass += atweight * float(subscript)

           

The appendix contains an executable script that handles the cases in which a symbol or value is not found. The script can also be downloaded: molmass.py (right-click and use the "Save link as..." option). For a molecular formula input, it generates a list of lines recording both successful and unsuccessful calculation steps.

Close database

Close the communication with the database:

   cur.close()
   conn.close()


Depending on your script design, connecting with and disconnecting from the database may occur elsewhere—independent of the need in particular functions—and you will simply pass the cursor to database-accessing functions.
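Here is a sketch of that design. A hypothetical variant of calc_molar_mass receives an open cursor, and the caller opens and closes the connection once. To keep it runnable without a server, sqlite3 with ? placeholders again stands in for psycopg2 with %s. contextlib.closing() works with any object that has a close() method, including psycopg2 connections. The variant returns None as soon as a symbol or value is missing, instead of producing the step report of the appendix script.

```python
import sqlite3
from contextlib import closing

def calc_molar_mass(cur, molform):
    """Sum atomic weight * subscript over molform using an open cursor.

    Returns None if any symbol is unknown or has no atomic weight."""
    molmass = 0.0
    for atsymb, subscript in molform:
        cur.execute("SELECT awr FROM awe2005 WHERE symb = ?", (atsymb,))
        row = cur.fetchone()
        if row is None or row[0] is None:
            return None
        molmass += row[0] * float(subscript)
    return molmass

# The caller owns the connection: open once, pass the cursor, close once.
with closing(sqlite3.connect(":memory:")) as conn:
    cur = conn.cursor()
    cur.execute("CREATE TABLE awe2005 (symb TEXT NOT NULL UNIQUE, awr REAL)")
    cur.executemany("INSERT INTO awe2005 VALUES (?, ?)",
                    [("H", 1.00794), ("C", 12.0107), ("O", 15.9994)])
    ascorbic = calc_molar_mass(cur, [("C", 6), ("H", 8), ("O", 6)])
    water = calc_molar_mass(cur, [("H", 2), ("O", 1)])
    unknown = calc_molar_mass(cur, [("Xx", 1)])

print(round(ascorbic, 2))  # 176.12
```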

Appendix: Completing the script

The following is a complete script that displays the molar mass calculation step-by-step (for each atomic symbol) and reports the occurrence of unrecognized symbols and missing values:

import argparse
import psycopg2
#======================================================================#
# Calculate molar mass for given molecular formula                     #
#======================================================================#
def calc_molar_mass(molform):

   # Begin communication with the database
   dbname = 'my_database'
   try:     
      conn = psycopg2.connect(database=dbname,
                   user='postgres', password='my_password',
                   host='127.0.0.1', port='5432')         
   except psycopg2.OperationalError:
      print('NOT connected to %s!' % dbname)
      return {}
   cur = conn.cursor()

   # Perform calculation of molar mass
   molmass = 0.0
   steps = []
   cntUnknownSymbols = 0
   cntMissingValues  = 0
   for pair  in molform:
      atsymb    = pair[0]
      subscript = pair[1]
      query = "SELECT * FROM awe2005 WHERE symb = %s"   # parameterized
      cur.execute(query, (atsymb,))
      row = cur.fetchone()  

      if row is None:
         cntUnknownSymbols += 1
         steps.append("            ?")
      else:
         atweight = row[4]
         if atweight is None:
            cntMissingValues += 1
            step = "            %-3s %6s            ?" %\
                                  (atsymb,subscript)
            steps.append(step)
         else:
            contribution = atweight * float(subscript)
            step = "            %-3s %6s %12s" % (atsymb,subscript,\
                                    "%.3f" % contribution)
            steps.append(step)
            molmass += contribution

   # End communication with the database
   cur.close()
   conn.close()

   # Finish by formatting report lines
   status = ':'
   resultline = ''
   if cntUnknownSymbols == 0 and cntMissingValues == 0:
      resultline = "Molar mass in g/mol:   %12s" % ("%.3f" % molmass)
   else:
      status = resultline = "Molar mass calculation failed:     "
   steps.append('-' * len(resultline))
   steps.append(resultline)
   if cntUnknownSymbols > 0:
      steps.append("Unknown symbols:%3i " % cntUnknownSymbols)
   if cntMissingValues > 0:
      steps.append("Missing values: %3i " % cntMissingValues)
   steps.append('-' * len(resultline))

   # Return results as dictionary with three entries
   mmdict = {'molmass': molmass, 'steps': steps, 'status': status}
   return mmdict


#======================================================================#
# Run molar mass calculation                                           #
#======================================================================#
if __name__ == '__main__':

   #===================================================================#
   # Get molecular formula from command line                           #
   #===================================================================#
   parser = argparse.ArgumentParser()
   parser.add_argument('entered', nargs='+',
                       help='symbol/subscript tokens, e.g. "C 6 H 8 O 6"')
   args = parser.parse_args()
   linearMolform = ' '.join(args.entered)  # accepts quoted or unquoted input

   #===================================================================#
   # Loop over atomic symbol/subscript pairs and calculate molar mass  #
   #===================================================================#
   tokens = linearMolform.split()
   molform = []
   i = 0
   previous = ''
   for token in tokens:
      i += 1
      if i % 2 == 0: # when even number of tokens, add symbol/value pair
                     # to molform
         molform.append((previous,token))
      else:
         previous = token # hold atomic symbol until next iteration step
   mmdict = calc_molar_mass(molform)
   if not mmdict:   # empty dict: no database connection, nothing to report
      raise SystemExit(1)

   #===================================================================#
   # Display results                                                   #
   #===================================================================#
   print('Result:')
   steps = mmdict['steps']
   for step in steps:
      print(step)



Get the script as file molmass.py (right-click and use "Save link as..." option).

Execution example: Run the script via command line submission 

   python.exe molmass.py "C 6 H 8 O 6"

to calculate the molar mass for the formula C6H8O6, which, for example, is the molecular formula of ascorbic acid with a molar mass of 176.12 g/mol.
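The token loop in the script pairs symbols and subscripts with a counter. Here is a more compact sketch of the same step using slicing and zip(). The function name tokens_to_molform() is hypothetical. One difference is a design choice: the script silently drops a trailing symbol without a subscript, whereas this version raises a ValueError so that the mistake becomes visible.

```python
def tokens_to_molform(linear_molform):
    """Turn 'C 6 H 8 O 6' into [('C', 6), ('H', 8), ('O', 6)]."""
    tokens = linear_molform.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected symbol/subscript pairs, got %d tokens"
                         % len(tokens))
    # Even positions hold symbols, odd positions hold subscripts.
    return [(symb, int(sub)) for symb, sub in zip(tokens[::2], tokens[1::2])]

print(tokens_to_molform("C 6 H 8 O 6"))  # [('C', 6), ('H', 8), ('O', 6)]
```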

The provided script was tested with Python 3.4.2 under Windows 10. The following screen snippet shows the script's output in a PowerShell window:



Sunday, November 19, 2017

How to fix the wrong-start-page-problem when occurring with Firefox browser

Recently, my Firefox browser always opened with a start page (or startup page) displaying content from a third-party provider that I had never invited to do so.

I fixed this problem in three quick steps:
  1. Open the desired start page in the sabotaged Firefox browser.
  2. Select and drag the URL string from the address bar to the Home button in the toolbar. 
  3. When asked “Do you want this document to be your new home page?”, click Yes.
Now my desired start page comes up each time I launch Firefox.

Note: I experienced some confusion about the terms “home page” and “start(up) page” when consulting different tutorial or support websites. My understanding is that one may set one's home page as the browser start page but does not have to. Personally, I like to have a search engine page as my browser start page: neither my home page nor a browser-enforced page.