pyMez Format Handling

This notebook documents some of the formats currently handled by pyMez (the backend library) and the progress toward integrating formats for data curation and analysis. The plan to promote interoperability is to create transformations linking these formats. As of 10/2016, all formats are imported using

import pyMez

or

from pyMez import *

To prevent loading the full pyMez application programming interface (API), add the folder pyMez to the sys.path variable and use

from Code.DataHandlers.<*Models> import <Model Name>
# example
from Code.DataHandlers.TouchstoneModels import SNP

Currently all data handlers share the signature class(file_path=None, **options), where the options vary from class to class.
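The shared signature can be illustrated with a minimal sketch. The class below is hypothetical and is not part of pyMez; the option names `delimiter` and `comment_prefix` are assumptions chosen only to show how per-class defaults and caller-supplied options might be combined.

```python
class ExampleModel(object):
    """Hypothetical data handler following the class(file_path=None, **options) pattern."""

    def __init__(self, file_path=None, **options):
        # Defaults for this made-up class; other classes would define their own.
        defaults = {"delimiter": ",", "comment_prefix": "#"}
        self.options = dict(defaults)
        # Keyword arguments passed by the caller override the defaults.
        self.options.update(options)
        self.path = file_path
        self.data = []
        # With no file path the model starts empty and can be filled later.
        if file_path is not None:
            self.read(file_path)

    def read(self, file_path):
        """Read delimited rows, skipping blank lines and comment lines."""
        with open(file_path, "r") as in_file:
            for line in in_file:
                line = line.strip()
                if not line or line.startswith(self.options["comment_prefix"]):
                    continue
                self.data.append(line.split(self.options["delimiter"]))


# An empty model, and one whose delimiter option overrides the default.
empty_model = ExampleModel()
tab_model = ExampleModel(delimiter="\t")
```

Keeping `file_path` optional lets the same class act both as a reader for existing files and as a container for building new ones.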

All of the classes to handle these formats can be found at pyMez.Code.DataHandlers.NISTModels

  1. Raw Ascii Formats
  2. These formats are a result of transforming files saved as BDAT data files using Ron Ginley's Converter (modified Calrep) HP Basic program. Together, these files currently form the history tables for the check standard database.
    1. pyMez:class:OnePortRawModel
    2. Model that deals with one-port raw measurements. Currently this is restricted to check standard data but could be expanded to include raw one-port measurements of customer data.
    3. pyMez:class:TwoPortRawModel
    4. Model that deals with two-port raw measurements. Currently this is restricted to check standard data but could be expanded to include raw two-port measurements of customer data.
    5. pyMez:class:TwoPortNRRawModel
    6. Model that deals with two-port non-reciprocal raw measurements. Currently this is restricted to check standard data but could be expanded to include raw non-reciprocal two-port measurements of customer data.
    7. pyMez:class:PowerRawModel
    8. Model that deals with power raw measurements. Currently this is restricted to check standard data but could be expanded to include raw power measurements of customer data.
    9. Conversion to a CSV table and inclusion of SAS database
    10. Data before ~2002 was stored in a summarized form in a SAS-based database. It has been added to the history file using the script found in Sparameter_Power_Data_Transformation_20160502_001.html. There is currently an error with the columns known as correction factor and KP: I have joined them, but they are not the same, because KP has Gamma in it.
  3. Data Formats Processed With Calrep
  4. These formats are a result of running Calrep on a set of raw files to produce a measurement with uncertainty. These formats are saved as .asc files, .dut files, or as tables with a, b, c appended to the device name and a .txt extension. All of the above are currently handled. They typically live in the ascii.dut folder on the impedance drive. There are files with sm in the name that are not raw and not fully analyzed for the calorimeter; these are not currently handled.
    1. pyMez:class:OnePortCalrepModel
    2. Model that deals with analyzed one-port measurements. Currently this handles .asc files and, if the directory is the same, any file in the a,b,c table format. The .asc file format is preferred because it includes an analysis date.
    3. pyMez:class:OnePortDUTModel
    4. Model that deals with analyzed one-port measurements. Currently this handles .dut files. It was added to search for one-port gamma measurements that will be loaded to a direct comparison system.
    5. pyMez:class:TwoPortCalrepModel
    6. Model that deals with analyzed two-port measurements. Currently this handles .asc files and, if the directory is the same, any file in the a,b,c table format. The .asc file format is preferred because it includes an analysis date.
    7. pyMez:class:PowerCalrepModel
    8. Model that deals with analyzed power measurements. Currently this handles .asc files and, if the directory is the same, any file in the a,b,c table format. The .asc file format is preferred because it includes an analysis date. This model handles both types of power formats: those with four error components and those with three.
    9. Conversion to a CSV table and inclusion of SAS database
    10. A script that walks through a directory tree and adds all files to a single database file (currently csv format) can be found in Sparameter_Power_Data_Transformation_20160502_001.html .
  5. Other formats
    1. .res files
    2. The .res files are csv files with round robin measurements. They do not have a class of their own to date but can be used to plot a comparison with the function one_port_robin_comparision_plot(.asc file,.res file).
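The idea of walking a directory tree and collecting every file into a single csv database can be sketched as follows. This is not the script in Sparameter_Power_Data_Transformation_20160502_001.html; the function name, the output columns, and the default extension list are assumptions, and the format-specific parsing done by the pyMez models is left out. The sketch assumes Python 3.

```python
import csv
import os


def collect_files_to_csv(top_directory, csv_path, extensions=(".asc", ".dut", ".txt")):
    """Walk top_directory and write one csv row per line of every matching file.

    Each row is [source file path, line number, raw line text]. Returns the
    number of data rows written (the header row is not counted).
    """
    extensions = tuple(extensions)  # str.endswith needs a tuple, not a list
    rows_written = 0
    with open(csv_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["source_file", "line_number", "text"])
        for directory, _subdirectories, file_names in os.walk(top_directory):
            for file_name in sorted(file_names):
                if not file_name.lower().endswith(extensions):
                    continue
                file_path = os.path.join(directory, file_name)
                with open(file_path, "r") as in_file:
                    for line_number, line in enumerate(in_file, start=1):
                        writer.writerow([file_path, line_number, line.rstrip("\n")])
                        rows_written += 1
    return rows_written
```

In a fuller version each file would be opened with the appropriate model, so that the csv rows hold parsed columns instead of raw text.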

These classes deal with file formats that materials and on-wafer experiments have developed in parallel. During development the classes are spread across pyMez.Code.DataHandlers.NISTModels, pyMez.Code.DataHandlers.TouchstoneModels and pyMez.Code.DataHandlers.RadiCALModels, but they may later be combined. Currently there are three basic formats: the ascii table format used to store data from the experiment, that data converted into s2p, and the output from the Radical program.

  1. Raw Ascii Formatted Data
  2. The class to handle this is pyMez:class:JBSparameter. This class does not yet have a formatted metadata handler, because the format has drifted over time and some metadata is not uniform. There is a conversion to s2p in translations.
  3. Radical Data
  4. Radical data is stored in a single .mat file after processing. If saved in V7.3, this file is actually an HDF5 file and can be opened with the h5py package. However, the format stores the matlab type cell in a convoluted way, and it has to be extracted carefully. The model to process these files for data archiving is pyMez:class:RadicalDataModel, and it is incomplete. It does currently have automatic retrieval and conversion for calibrated and uncalibrated_short, calibrated and uncalibrated_Rs, calibrated_DUT, and the frequency vector. All the other information is imported but not reorganized or converted.
  5. s2p see Touchstone

StatistiCAL is a program to create calibrations and uncertainties for calibrations for two-port measurements. pyMez has both a COM (Component Object Model) wrapper for the program and a series of classes to deal with the files that StatistiCAL requires to run and the files it outputs.

  1. The StatistiCAL wrapper
  2. pyMez:class:StatistiCALWrapper provides a python wrapper for the statistiCAL program. It can open a menu, run a calibration and save resulting files. StatistiCAL must be already installed.
  3. pyMez:class:StatistiCALMenuModel
  4. A class for dealing with StatistiCAL Menus. It supports printing, saving, clearing DUTs, and other manipulations of the menu. It still needs a change_file_directories() method that rewrites all files to a selected directory.
  5. pyMez:class:TwelveTermErrorModel
  6. A class for using the 12-term VNA calibration coefficients output by StatistiCAL. It lives in pyMez.Code.DataHandlers.NISTModels, in the anticipation that other programs may output this format. It provides an attribute complex_data that is used in corrections.
  7. pyMez:class:StatistiCALSolutionModel
  8. A class for opening and manipulating the "Solution_Plus.txt" files generated by StatistiCAL. These files contain a large amount of information, but are used primarily for the 8-term error correction inside. This model provides the attribute eight_term_correction, a list of complex numbers and a frequency, which varies on initialization based on the option reciprocal=(True or False).
  9. The four-port error adapter can be opened and used for correction using pyMez:class:SNP, by accessing the attribute sparameter_complex (see Touchstone).

The Microwave Uncertainty Framework (MUF) is a program written by Dylan Williams to create Monte Carlo based uncertainties on VNA-based measurements. It is written in VB.NET and can be accessed at a base level using the package pythonnet. In addition, it has several data formats that can be separated by type. The first type consists of XML-based menu formats that populate the GUI used to manipulate them (.vnauncert and .meas, for example). The next type consists of ascii formats with extensions that denote the type of information they hold (.eps, .iso, .switch). Finally, the remaining types belong to the Touchstone family (see Touchstone). Currently the DataHandlers are being written to match the logic and meaning of the file formats, but all the formats can be accessed through the base classes.

  1. XML Based Menus
  2. pyMez:class:XMLBase. This class needs to be extended to handle the particulars of the XML menus. To remotely operate the MUF we can either manipulate the XML or run the program directly.
  3. Ascii Based Formats
  4. This should be an extension of AsciiDataTable with an appropriate __read_and_fix__. By using the AsciiDataTable class we will have a record of the metadata in the form of a schema
  5. SNP file types (s2p,s4p)
  6. The error adapter is of type S4p and the results files are stored as directories with lots of s2p files in them. The .meas file type has a set of file names that act as pointers to these files

Touchstone is a series of formats for saving s-parameters and related data for network analyzer measurements. It is an ascii-based format that can have several extensions associated with it. For a given number of ports it may have the file extension .snp, where n is the number of ports. The most common extension is s2p, but other port numbers can exist, and .ts can also represent a touchstone file of unknown port number. There are two versions of touchstone files; however, version 1 is much more prevalent and currently all support in pyMez is based on version 1. All models that handle touchstone files can be found in pyMez.Code.DataHandlers.TouchstoneModels.

  1. General SNP files (ports 1-100)
  2. pyMez:class:SNP handles all snp files that do not have noise parameters stored along with S-parameters. This general handler supports plotting, saving, printing, changing the format among RI, DB, and MA, and changing the frequency units. It also handles comments properly. It has been tested with .s1p, .s2p, .s3p and .s4p files.
  3. The special case S2P
  4. pyMez:class:S2PV1 handles s2p with noise parameters included.
  5. The special case S1P
  6. pyMez:class:S1PV1 handles s1p files.
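Changing a Touchstone file among the RI, DB, and MA formats amounts to re-expressing each complex value as a different pair of numbers. The sketch below illustrates that conversion with the standard library only; the function names `to_complex` and `from_complex` are hypothetical and do not reflect how pyMez:class:SNP implements it. It assumes the usual Touchstone version 1 conventions: angles in degrees and DB defined as 20*log10(magnitude).

```python
import cmath
import math


def to_complex(first, second, number_format="MA"):
    """Convert one Touchstone value pair to a complex number.

    RI: (real, imaginary); MA: (magnitude, angle in degrees);
    DB: (20*log10(magnitude), angle in degrees).
    """
    number_format = number_format.upper()
    if number_format == "RI":
        return complex(first, second)
    if number_format == "MA":
        return cmath.rect(first, math.radians(second))
    if number_format == "DB":
        return cmath.rect(10.0 ** (first / 20.0), math.radians(second))
    raise ValueError("Unknown format: {0}".format(number_format))


def from_complex(value, number_format="MA"):
    """Convert a complex number to a Touchstone value pair in the given format."""
    number_format = number_format.upper()
    if number_format == "RI":
        return (value.real, value.imag)
    magnitude = abs(value)
    angle = math.degrees(cmath.phase(value))
    if number_format == "MA":
        return (magnitude, angle)
    if number_format == "DB":
        # A magnitude of zero has no finite dB value.
        decibels = 20.0 * math.log10(magnitude) if magnitude > 0 else float("-inf")
        return (decibels, angle)
    raise ValueError("Unknown format: {0}".format(number_format))
```

Storing the data internally as complex numbers and converting only on output keeps a format change from accumulating rounding error across repeated conversions.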

One of the primary ways to create a project file with a very portable GUI is to create an XML file and use XSLT to transform the data to an HTML page. This allows a lot of flexibility and a nice way to integrate it into CALNET. As such, there are XML models that represent Logs, InstrumentStates, DataTables, and a whole host of other files following specific patterns. Currently the test website displays data interactively by reading it using the appropriate model, transforming it to a similar XML model, then transforming it to HTML using an XSLT transform. All XMLModels are currently found in the module pyMez.Code.DataHandlers.XMLModels, and the XSL (style sheets that transform the XML) can be found in the folder pyMez/Code/DataHandlers/XSL.

  1. pyMez:class:XMLBase
  2. A parent class that deals with general XML files. It supports printing, transforming to HTML, saving and other general manipulations.

  3. pyMez:class:XMLLog
  4. An XML log that supports a date, entry structure. It has attributes and methods that make it ready for integration into a GUI that is not HTML based. In addition, several models are children of this model: pyMez:class:ChangeXMLLog, pyMez:class:EndOfDayXMLLog, pyMez:class:ErrorXMLLog and pyMez:class:ServiceXMLLog are meant to be more specific instances of pyMez:class:XMLLog.

  5. pyMez:class:DataTable
  6. The primary class for transforming Ascii based data tables to XML based data tables. It should be noted that the MUF uses a very similar algorithm when mapping s2p to XML (it just has slightly different tag and attribute names). This should be renamed.

  7. pyMez:class:FileRegister
  8. Class meant to hold a list of files and some information about their location. This forms an entity list for a specific ECPV pattern known as an arbitrary data base, or a list of files and metadata on those files. This should be renamed.

  9. pyMez:class:Metadata
  10. Class that holds the metadata descriptions for an arbitrary database. This should be renamed.

  11. pyMez:class:InstrumentSheet
  12. XML model for holding information about an instrument, used by instrument control to specify basic information

  13. pyMez:class:InstrumentState
  14. XML model for holding instrument states for GPIB devices
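A date, entry log of the kind described above can be sketched with the standard library's xml.etree.ElementTree. The tag name `Entry` and the attribute names `Index` and `Date` are assumptions made for illustration and are not necessarily the schema used by pyMez:class:XMLLog. The XSLT step is omitted because it needs a third-party package.

```python
import datetime
import xml.etree.ElementTree as ET


def new_log():
    """Create an empty log document (root element only)."""
    return ET.Element("Log")


def add_entry(log, text, date=None):
    """Append an entry with an index and a date to the log and return it."""
    if date is None:
        date = datetime.datetime.now().isoformat()
    index = len(log) + 1  # entries are numbered from 1 in insertion order
    entry = ET.SubElement(log, "Entry", {"Index": str(index), "Date": date})
    entry.text = text
    return entry


log = new_log()
add_entry(log, "Started calibration", date="2016-10-01T09:00:00")
add_entry(log, "Saved results")
# Serialize to a string; this text is what a style sheet would transform to HTML.
xml_text = ET.tostring(log, encoding="unicode")
```

More specific logs (change, end of day, error, service) could then be subclasses or thin wrappers that fix the entry attributes they require.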

One of the most important themes in both pyMez and Calnet is the creation and management of "projects", or collections of files with a description of that collection. Most programs do this implicitly, but it is our goal to make this explicit so that we can exchange data between users and programs effectively. There are several strategies for creating projects; the ones that pyMez will focus on are:

  1. Arbitrary Database Based Projects
  2. This type of project lets the directory structure of the file system handle the files and relies on a registry of files and a description of those files to do its business. The descriptions of the files can be thought of as an instance of a closed Encyclopedia, and the registry as an Entity Table (in the ECPV framework). pyMez:class:FileRegister and pyMez:class:Metadata are designed to handle these.

  3. ZIP based projects
  4. Similar to an Arbitrary Database Based Project but with all the files collected, compressed and saved into a single file. pyMez.Code.DataHandlers.ZipModels, in particular pyMez:class:ZipArchive, is designed to handle these. It should be noted that objects that can be displayed as a string can be saved into an archive without saving them to disk first.

  5. XML Based Projects
  6. All the data in a project can be cast into XML and distributed that way. This is the principle behind the MUF archive data formats and is handy for transformation into HTML of very complex heterogeneous data types. The basic idea would be to transform the constituents of the project into XML, join them and then define an XSL that transforms them into an HTML page. The UI for the check standard web interface will work this way.

  7. Binary Projects
  8. For projects that should be saved in binary for processing speed, HDF5 is the preferred format. Among other things, matlab uses this for its V7.3 .mat files. It has standard mappings to XML, and all files could potentially be converted to HDF5. It is my goal to create an HDF5 data model that mirrors .meas_archive files. Radical data can be thought of as this type of project; however, the metadata is not as explicit as it should be.
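The ZIP based strategy, including saving string-like objects without writing them to disk first, can be sketched with the standard library's zipfile module. The function name `add_objects_to_archive` and the member names are hypothetical; pyMez:class:ZipArchive may work differently.

```python
import io
import zipfile


def add_objects_to_archive(archive_file, named_objects):
    """Write str(obj) for every (member name, obj) pair into a new zip archive.

    archive_file may be a path or a file-like object such as io.BytesIO.
    """
    with zipfile.ZipFile(archive_file, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, obj in named_objects.items():
            # writestr stores the text directly, so no temporary file is needed.
            archive.writestr(name, str(obj))


# Build a small project entirely in memory.
buffer = io.BytesIO()
add_objects_to_archive(buffer, {
    "description.txt": "Example project with one data table",
    "data/table.csv": "frequency,magnitude\n1.0,0.5\n",
})

# Read it back to confirm the members were stored.
with zipfile.ZipFile(buffer, mode="r") as archive:
    names = archive.namelist()
    restored = archive.read("description.txt").decode("utf-8")
```

Because any object with a string representation can be stored this way, a model only needs a sensible `__str__` method to take part in a ZIP based project.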

One of the most popular ways to store data is to create an Ascii file with a header followed by a set of columns and potentially a footer. This general data pattern, along with options for delimiters and other separators, is found in pyMez.Code.DataHandlers.GeneralModels. The primary class is pyMez:class:AsciiDataTable, which gives the user the ability to save an ad-hoc schema by pickling (python-specific saving) the options. Most of the classes for sparameter/power are derived from this class. Touchstone models have a slightly different format (they can save the data for a single frequency in multiple rows or have a different type of data present), so they inherit from a different base class. The AsciiDataTable is fairly general and can handle different data types, headers with different structures and changing units, along with saving in different formats and retrieving and printing different logical units. This class needs to be updated with more ways to save the schema and more robust error handling. In addition, an algorithm to guess at the format would be very useful. The rectangular portion of this object (the data attribute) can easily be converted to many different formats (Excel, csv, matlab, HDF5); however, the header must have a structure specified to be parsed as anything other than text.
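The header, columns, footer pattern and the idea of pickling the options as an ad-hoc schema can be sketched as follows. This is not the AsciiDataTable implementation; the function `parse_ascii_table` and the option names `comment_prefix`, `delimiter`, `column_names` and `column_types` are assumptions made for the example.

```python
import pickle


def parse_ascii_table(text, **options):
    """Split text into header, data and footer using a small options dictionary."""
    schema = {"comment_prefix": "#", "delimiter": ",",
              "column_names": None, "column_types": None}
    schema.update(options)
    header, data, footer = [], [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(schema["comment_prefix"]):
            comment = stripped[len(schema["comment_prefix"]):].strip()
            # Comment lines before any data form the header, later ones the footer.
            if data:
                footer.append(comment)
            else:
                header.append(comment)
            continue
        row = [item.strip() for item in stripped.split(schema["delimiter"])]
        if schema["column_types"] is not None:
            row = [convert(item) for convert, item in zip(schema["column_types"], row)]
        data.append(row)
    return {"header": header, "data": data, "footer": footer, "schema": schema}


example_text = """# Device: example
# Operator: unknown
1.0, 0.5
2.0, 0.25
# End of file
"""
table = parse_ascii_table(example_text,
                          column_names=["Frequency", "Magnitude"],
                          column_types=[float, float])

# Pickle the options so the same schema can be reused on another file.
saved_schema = pickle.dumps(table["schema"])
restored_schema = pickle.loads(saved_schema)
same_table = parse_ascii_table(example_text, **restored_schema)
```

Pickling is python specific, which is one reason the text above calls for more ways to save the schema; a format such as XML or JSON would make the schema readable from other languages.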

Django is the python web framework of choice for CALNET and the check standard database. It uses pyMez to analyze and track data. A django model is a specific data model that is directly cast into an SQL-compliant database such as SQLite or MySQL. The models therefore shadow SQL column modelling and have attributes that are columns with the type specified by the class definition. Certain models are converted directly to these types and then stored inside the website's database. There is a module, pyMez.Code.DataHandlers.AbstractDjangoModules, that stores basic patterns for reuse. Currently UserFile is the most important of these.

Currently matlab uses HDF5 with the extension .mat to store data (V7.3 and greater). Older versions of matlab variables can be accessed by scipy.io.loadmat() and scipy.io.savemat(), but it is my intention to only support HDF5-based files to reduce the work load and circumvent a bug in the python 2.7 Anaconda distribution. The ability to translate to matlab variables will be added to promote sharing of data; in addition, the binary project model supports this type of information exchange.

The guiding principle for pyMez will be one of data transformations and not a single data format. The transformations will follow a network approach that treats formats holding the same content as network nodes and the transformations between them as edges. Any time content is changed, it can be thought of as an off-graph transformation (jump). The basic data patterns that will have graphs defined are

  1. string (already defined)
  2. rectangular data table
  3. A column modeled data table (looks like SQL or pandas dataframe)
  4. data table
  5. A data table with header, footer and a column modeled portion
  6. project
  7. see Project Models for better description
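The network approach can be sketched as a small directed graph in which nodes are format names and each edge holds a function that converts data from one format to the next. The class `FormatGraph`, its method names and the three example formats are hypothetical and are only meant to illustrate the idea; a breadth-first search is used here to find a chain of transformations, which is one possible choice.

```python
from collections import deque


class FormatGraph(object):
    """Directed graph: nodes are format names, edges are transformation functions."""

    def __init__(self):
        self.edges = {}  # {source format: {target format: function}}

    def add_edge(self, source, target, function):
        self.edges.setdefault(source, {})[target] = function
        self.edges.setdefault(target, {})  # make sure the target node exists

    def find_path(self, source, target):
        """Breadth-first search; returns the shortest list of nodes or None."""
        queue = deque([[source]])
        visited = {source}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                return path
            for neighbor in self.edges.get(path[-1], {}):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(path + [neighbor])
        return None

    def transform(self, data, source, target):
        """Apply the edge functions along the path from source to target."""
        path = self.find_path(source, target)
        if path is None:
            raise ValueError("No path from {0} to {1}".format(source, target))
        for start, end in zip(path, path[1:]):
            data = self.edges[start][end](data)
        return data


graph = FormatGraph()
graph.add_edge("csv_string", "rows",
               lambda text: [line.split(",") for line in text.splitlines()])
graph.add_edge("rows", "csv_string",
               lambda rows: "\n".join(",".join(row) for row in rows))
graph.add_edge("rows", "records",
               lambda rows: [dict(zip(rows[0], row)) for row in rows[1:]])

# No direct csv_string -> records edge exists, so the graph routes through "rows".
records = graph.transform("frequency,magnitude\n1.0,0.5", "csv_string", "records")
```

With this structure, adding support for a new format only requires one edge into and one edge out of the existing graph, not a converter for every other format.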

The future set of models that need to be supported:

  1. JSON
  2. JavaScript Object Notation. This can be thought of as a lightweight version of XML. It is currently very popular and is gaining momentum as a web standard.
  3. Images
  4. Really we just need to convert them between project types; there are plenty of existing packages.
  5. DOM
  6. This is partially supported through XML, giving us the ability to create reports