Commit d70245b6 authored by syz

updated from master

parents 99556b27 7ffa2f29
@@ -56,3 +56,21 @@ Examples using ``pycroscopy.hdf_utils.reshape_to_Ndims``
.. only:: not html

 * :ref:`sphx_glr_auto_tutorials_plot_tutorial_03_multidimensional_data.py`

.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="11/11/2017">

.. only:: html

    .. figure:: /auto_publications/images/thumb/sphx_glr_plot_tutorial_01_interacting_w_h5_files_thumb.png

        :ref:`sphx_glr_auto_publications_plot_tutorial_01_interacting_w_h5_files.py`

.. raw:: html

    </div>

.. only:: not html

 * :ref:`sphx_glr_auto_publications_plot_tutorial_01_interacting_w_h5_files.py`
@@ -92,3 +92,21 @@ Examples using ``pycroscopy.ioHDF5``
.. only:: not html

 * :ref:`sphx_glr_auto_tutorials_plot_tutorial_02_writing_to_h5.py`

.. raw:: html

    <div class="sphx-glr-thumbcontainer" tooltip="11/11/2017">

.. only:: html

    .. figure:: /auto_publications/images/thumb/sphx_glr_plot_tutorial_01_interacting_w_h5_files_thumb.png

        :ref:`sphx_glr_auto_publications_plot_tutorial_01_interacting_w_h5_files.py`

.. raw:: html

    </div>

.. only:: not html

 * :ref:`sphx_glr_auto_publications_plot_tutorial_01_interacting_w_h5_files.py`
==========
Pycroscopy
==========
**Scientific analysis of nanoscale materials imaging data**
What?
--------------------
pycroscopy is a `python <http://www.python.org/>`_ package for image processing and scientific analysis of imaging modalities such as multi-frequency scanning probe microscopy, scanning tunneling spectroscopy, x-ray diffraction microscopy, and transmission electron microscopy. pycroscopy uses a data-centric model wherein the raw data collected from the microscope, along with the results of all analysis and processing routines, are written to standardized hierarchical data format (HDF5) files for traceability, reproducibility, and provenance.
With pycroscopy we aim to:
1. Serve as a hub for collaboration across scientific domains (microscopists, materials scientists, biologists...)
2. Provide a community-developed, open standard for data formatting
3. Provide a framework for developing data analysis routines
4. Significantly lower the barrier to advanced data analysis procedures by simplifying I/O, processing, visualization, etc.
To learn more about the motivation, general structure, and philosophy of pycroscopy, please read this `short introduction <https://github.com/pycroscopy/pycroscopy/blob/master/docs/pycroscopy_2017_07_11.pdf>`_.
Who?
-----------
This project began largely as an effort by scientists and engineers at the **C**\enter for **N**\anophase
**M**\aterials **S**\ciences (`CNMS <https://www.ornl.gov/facility/cnms>`_) to implement a python library
that can support the I/O, processing, and analysis of the gargantuan stream of images that their microscopes
generate (thanks to the large CNMS users community!).
By sharing our methodology and code for analyzing materials imaging data, we hope to benefit the wider
materials science / physics community. We also hope, quite ardently, that other materials scientists will
follow suit.
**The (core) pycroscopy team:**
* `@ssomnath <https://github.com/ssomnath>`_ (Suhas Somnath),
* `@CompPhysChris <https://github.com/CompPhysChris>`_ (Chris R. Smith),
* `@nlaanait <https://github.com/nlaanait>`_ (Numan Laanait),
* `@stephenjesse <https://github.com/stephenjesse>`_ (Stephen Jesse)
* and many more...
Why?
---------------
There is that little thing called open science...
As we see it, there are a few opportunities in microscopy / imaging and materials science:
**1. Growing data sizes**
* Cannot use desktop computers for analysis
* *Need: High performance computing, storage resources and compatible, scalable file structures*
**2. Increasing data complexity**
* Sophisticated imaging and spectroscopy modes resulting in 5,6,7... dimensional data
* *Need: Robust software and generalized data formatting*
**3. Multiple file formats**
* Each instrument generates data in a different format, proprietary in most cases
* Incompatible for correlation
* *Need: Open, instrument independent data format*
**4. Disjoint communities**
* Similar analysis routines written by each community (SPM, STEM, TOF SIMS, XRD...) *independently*!
* *Need: Centralized repository, instrument agnostic analysis routines that bring communities together*
**5. Expensive analysis software**
* Software supplied with instruments is often insufficient for, or incapable of, custom analysis routines
* Commercial software (e.g. Matlab, Origin) is often prohibitively expensive
* *Need: Free, powerful, open source, user-friendly software*
How?
-----------------
* pycroscopy uses an **instrument agnostic data structure** that facilitates the storage of data, regardless
  of dimensionality (conventional 2D images to 9D multispectral SPM datasets) or instrument of origin (AFMs,
  STMs, STEMs, TOF SIMS, and many more).
* This general definition of data allows us to write a single, generalized version of analysis and
  processing functions that can be applied to any kind of data.
* The data is stored in `hierarchical
  data format (HDF5) <http://extremecomputingtraining.anl.gov/files/2015/03/HDF5-Intro-aug7-130.pdf>`_
  files (see the sketch after this list) which:

  * Allow easy and open access to data from any programming language.
  * Accommodate datasets ranging from kilobytes (kB) to petabytes (PB)
  * Are readily compatible with supercomputers and support parallel I/O
  * Allow storage of relevant parameters along with data for improved traceability and reproducibility of
    analysis
* Scientific workflows are developed and disseminated through `jupyter notebooks <http://jupyter.org/>`_
  that are interactive and portable web applications containing text, images, code / scripts, and text-based
  and graphical results
* Once a user converts their microscope's data format into the HDF5 format, by simply extending some of the
  classes in ``io``, the user gains access to the rest of the utilities present in ``pycroscopy.*``.
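
Below is a minimal sketch, using plain ``h5py``, of the central idea: every measurement is flattened to a
two dimensional matrix of positions x spectral values, stored alongside ancillary datasets and attributes
that describe how to unfold each axis. The group and dataset names mirror those used in the tutorials on
this page, but the layout pycroscopy actually writes is richer than shown here; treat this as an
illustration, not a specification.

.. code-block:: python

    import numpy as np
    import h5py

    num_rows, num_cols, spec_len = 100, 100, 500  # 100 x 100 grid, 500-point spectra

    # One row per spatial position, one column per spectral step:
    raw_2d = np.random.rand(num_rows * num_cols, spec_len)

    with h5py.File('illustration.h5', 'w') as h5_f:
        chan = h5_f.create_group('Measurement_000/Channel_000')
        h5_main = chan.create_dataset('Raw_Data', data=raw_2d)

        # Ancillary dataset describing the spectral axis (e.g. bias in volts):
        spec_vals = np.linspace(-1.0, 1.0, spec_len)
        chan.create_dataset('Spectroscopic_Values', data=spec_vals[None, :])

        # Relevant parameters ride along as attributes for provenance:
        chan.attrs['data_type'] = 'STS'
        chan.attrs['quantity'] = 'Current'
        chan.attrs['units'] = 'nA'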
Package Structure
-----------------
The package structure is simple, with 4 main modules:
1. **io**: Reading and writing to HDF5 files + translating data from custom & proprietary microscope formats to HDF5.
2. **processing**: multivariate statistics, machine learning, and signal filtering.
3. **analysis**: model-dependent analysis of information.
4. **viz**: Plotting functions and interactive jupyter widgets to visualize multidimensional data.
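
The sketch below shows, using only calls that appear in the examples on this page, where these modules and
the companion utilities (``hdf_utils``, ``plot_utils``) fit in a typical session. The file name and dataset
shape are illustrative, borrowed from the STS translation tutorial.

.. code-block:: python

    import h5py
    import matplotlib.pyplot as plt
    import pycroscopy as px

    # io: translators move raw instrument files into pycroscopy HDF5 files
    # (constructed here only to indicate the module; see the translation
    # tutorial for the full argument list)
    tran = px.io.NumpyTranslator()

    # Once the data is in HDF5, the same utilities work for any instrument:
    with h5py.File('temp_1_.h5', mode='r') as h5_file:
        px.hdf_utils.print_tree(h5_file)  # inspect the file layout
        h5_main = h5_file['Measurement_000/Channel_000/Raw_Data']

        # viz: quick plotting helpers for 2D maps
        fig, axis = plt.subplots()
        px.plot_utils.plot_map(axis, h5_main[:, 0].reshape(100, 100))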
Acknowledgements
----------------
Besides the packages used in pycroscopy, we would like to thank the developers of the following software
packages:
+ `Python <https://www.python.org>`_
+ `Anaconda Python <https://www.continuum.io/anaconda-overview>`_
+ `jupyter <http://jupyter.org/>`_
+ `PyCharm <https://www.jetbrains.com/pycharm/>`_
+ `GitKraken <https://www.gitkraken.com/>`_
Pycroscopy API Reference
========================
0. Description
--------------
A python package for image processing and scientific analysis of imaging modalities such as multi-frequency scanning probe microscopy,
scanning tunneling spectroscopy, x-ray diffraction microscopy, and transmission electron microscopy.
Classes implemented here are ported to a high performance computing platform at Oak Ridge National Laboratory (ORNL).
1. Package Structure
--------------------
The package structure is simple, with 4 main modules:
1. `io`: Input/Output from custom & proprietary microscope formats to HDF5.
2. `processing`: Multivariate Statistics, Machine Learning, and Filtering.
3. `analysis`: Model-dependent analysis of image information.
4. `viz`: Visualization and interactive slicing of high-dimensional data by lightweight Qt viewers.
Once a user converts their microscope's data format into the HDF5 format, by simply extending some of the classes in `io`, they gain access to the rest of the utilities present in `pycroscopy.*`, for example:
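
For example, a translator for a new file format might look roughly like the skeleton below. This is a
hypothetical sketch: the translator base classes in `io` and their required methods vary between pycroscopy
versions, and the `NumpyTranslator` call is abbreviated from the translation tutorial elsewhere on this page.

.. code-block:: python

    import numpy as np
    import pycroscopy as px

    class MyFormatTranslator(object):
        """Hypothetical translator: proprietary raw files -> pycroscopy HDF5."""

        def translate(self, file_path):
            # 1. Format-specific parsing: build a parameter dictionary and a
            #    2D (positions x spectrum) array from the raw file
            parm_dict, raw_data_2d = self._read_raw(file_path)
            num_rows = int(parm_dict['y-pixels'])
            num_cols = int(parm_dict['x-pixels'])
            volt_vec = np.linspace(-1.0, 1.0, raw_data_2d.shape[1])

            # 2. Delegate all HDF5 writing (main dataset, ancillary datasets,
            #    attributes) to the NumpyTranslator
            tran = px.io.NumpyTranslator()
            return tran.translate(file_path + '.h5', raw_data_2d, num_rows, num_cols,
                                  qty_name='Current', data_unit='nA',
                                  spec_name='Bias', spec_unit='V', spec_val=volt_vec,
                                  data_type='STS', translator_name='MyFormat',
                                  parms_dict=parm_dict)

        def _read_raw(self, file_path):
            # Parsing of the proprietary format goes here; the tutorial below
            # walks through a worked example with Omicron .asc files
            raise NotImplementedError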
.. currentmodule:: pycroscopy
.. automodule:: pycroscopy
......
{
"metadata": {
"language_info": {
"mimetype": "text/x-python",
"name": "python",
"version": "3.5.2",
"file_extension": ".py",
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n\n======================================================================================\nTutorial 1: Data Translation\n======================================================================================\n\n**Suhas Somnath**\n8/8/2017\n\nThis set of tutorials will serve as examples for developing end-to-end workflows for, and using, pycroscopy.\n\n**In this example, we extract data and parameters from a Scanning Tunnelling Spectroscopy (STS) raw data file, as\nobtained from an Omicron STM, and write these to a pycroscopy compatible data file.**\n\n\nPrerequisites:\n==============\n\nBefore proceeding with this example series, we recommend reading the previous documents to learn more about:\n\n1. Data and file formats\n * Why you should care about data formats\n * Current state of data formats in microscopy\n * Structuring data in pycroscopy\n\n2. HDF5 file format\n\n\nIntroduction to Data Translation\n================================\n\nBefore any data analysis, we need to access data stored in the raw file(s) generated by the microscope. Often, the\ndata and parameters in these files are **not** straightforward to access. In certain cases, additional / dedicated\nsoftware packages are necessary to access the data, while in many other cases, it is possible to extract the necessary\ninformation using built-in **numpy** or similar python packages included with **anaconda**.\n\nPycroscopy aims to make data access, storage, curation, etc. simple by storing the data along with all\nrelevant parameters in a single **.hdf5** or **.h5** file.\n\nThe process of copying data from the original format to **pycroscopy compatible hdf5 files** is called\n**Translation**, and the classes available in pycroscopy that perform these operations are called **Translators**.\n\n\nWriting Your First Data Translator\n==================================\n\n**The goal in this section is to translate the .asc file obtained from an Omicron microscope into a pycroscopy\ncompatible .h5 file.**\n\nWhile there is an **AscTranslator** available in pycroscopy that can translate these files in just a **single** line,\nwe will intentionally assume that no such translator is available. Using a handful of useful functions in pycroscopy,\nwe will translate the files from the source **.asc** format to the pycroscopy compatible **.h5** in just a few lines.\nThe code developed below is essentially the **AscTranslator**. The same methodology can be used to translate other data\nformats.\n\n\nSetting up the notebook\n=======================\n\nThere are a few setup procedures that need to be followed before any code is written. In this step, we simply load a\nfew python packages that will be necessary in the later steps.\n\n\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Ensure python 3 compatibility:\nfrom __future__ import division, print_function, absolute_import, unicode_literals\n\n# The package for accessing files in directories, etc.:\nimport os\n\n# Warning package in case something goes wrong\nfrom warnings import warn\n\n# Package for downloading online files:\ntry:\n # This package is not part of anaconda and may need to be installed.\n import wget\nexcept ImportError:\n warn('wget not found. Will install with pip.')\n import pip\n pip.main(['install', 'wget'])\n import wget\n\n# The mathematical computation package:\nimport numpy as np\n\n# The package used for creating and manipulating HDF5 files:\nimport h5py\n\n# Packages for plotting:\nimport matplotlib.pyplot as plt\n\n# Finally import pycroscopy for certain scientific analysis:\ntry:\n import pycroscopy as px\nexcept ImportError:\n warn('pycroscopy not found. Will install with pip.')\n import pip\n pip.main(['install', 'pycroscopy'])\n import pycroscopy as px"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"0. Select the Raw Data file\n===========================\nDownload the data file from Github:\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"url = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/STS.asc'\ndata_file_path = 'temp_1.asc'\nif os.path.exists(data_file_path):\n os.remove(data_file_path)\n_ = wget.download(url, data_file_path, bar=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Exploring the Raw Data File\n==============================\n\nInherently, one may not know how to read these **.asc** files. One option is to try to read the file as a text file,\none line at a time.\n\nIt turns out that these .asc files are effectively standard **ASCII** text files.\n\nHere is how we tested whether the **asc** files could be interpreted as text files. Below, we read just the first 10\nlines of the file.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"with open(data_file_path, 'r') as file_handle:\n for lin_ind in range(10):\n print(file_handle.readline())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Loading the data\n===================\nNow that we know that these files are simple text files, we can manually go through the file to find out which lines\nare important, at which line the data starts, etc.\nManual investigation of such .asc files revealed that these files are always formatted in the same way. Also, they\ncontain parameters in the first 403 lines and then contain data arranged as one pixel per row.\nSTS experiments result in 3 dimensional datasets (X, Y, current). In other words, a 1D array of current data (as a\nfunction of excitation bias) is sampled at every location on a two dimensional grid of points on the sample.\nBy knowing where the parameters are located and how the data is structured, it is possible to extract the necessary\ninformation from these files.\nSince we know that the data sizes (<200 MB) are much smaller than the physical memory of most computers, we can\nsafely load the contents of the entire file into memory.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Extracting the raw data into memory\nfile_handle = open(data_file_path, 'r')\nstring_lines = file_handle.readlines()\nfile_handle.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Read the parameters\n======================\nThe parameters in these files are present in the first few lines of the file\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Reading parameters stored in the first few rows of the file\nparm_dict = dict()\nfor line in string_lines[3:17]:\n line = line.replace('# ', '')\n line = line.replace('\\n', '')\n temp = line.split('=')\n test = temp[1].strip()\n try:\n test = float(test)\n # convert those values that should be integers:\n if test % 1 == 0:\n test = int(test)\n except ValueError:\n pass\n parm_dict[temp[0].strip()] = test\n\n# Print out the parameters extracted\nfor key in parm_dict.keys():\n print(key, ':\\t', parm_dict[key])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3.a Prepare to read the data\n============================\nBefore we read the data, we need to make an empty array to store all this data. In order to do this, we need to read\nthe dictionary of parameters we made in step 2 and extract necessary quantities\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"num_rows = int(parm_dict['y-pixels'])\nnum_cols = int(parm_dict['x-pixels'])\nnum_pos = num_rows * num_cols\nspectra_length = int(parm_dict['z-points'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3.b Read the data\n=================\nData is present after the first 403 lines of parameters.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# num_headers = len(string_lines) - num_pos\nnum_headers = 403\n\n# Extract the STS data from subsequent lines\nraw_data_2d = np.zeros(shape=(num_pos, spectra_length), dtype=np.float32)\nfor line_ind in range(num_pos):\n this_line = string_lines[num_headers + line_ind]\n string_spectrum = this_line.split('\\t')[:-1] # omitting the new line\n raw_data_2d[line_ind] = np.array(string_spectrum, dtype=np.float32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4.a Preparing some necessary parameters\n=======================================\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"max_v = 1 # This is the one parameter we are not sure about\n\nfolder_path, file_name = os.path.split(data_file_path)\nfile_name = file_name[:-4] + '_'\n\n# Generate the x / voltage / spectroscopic axis:\nvolt_vec = np.linspace(-1 * max_v, 1 * max_v, spectra_length)\n\nh5_path = os.path.join(folder_path, file_name + '.h5')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"4b. Calling the NumpyTranslator to create the pycroscopy data file\n==================================================================\nThe NumpyTranslator simplifies the creation of pycroscopy compatible datasets. It handles the file creation,\ndataset creation and writing, creation of ancillary datasets, data group creation, writing parameters, linking\nancillary datasets to the main dataset, etc. With a single call to the NumpyTranslator, we complete the translation\nprocess.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"tran = px.io.NumpyTranslator()\nh5_path = tran.translate(h5_path, raw_data_2d, num_rows, num_cols,\n qty_name='Current', data_unit='nA', spec_name='Bias',\n spec_unit='V', spec_val=volt_vec, scan_height=100,\n scan_width=200, spatial_unit='nm', data_type='STS',\n translator_name='ASC', parms_dict=parm_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notes on pycroscopy translation\n===============================\n* Steps 1-3 would be performed anyway in order to begin data analysis\n* The actual pycroscopy translation steps are reduced to just 3-4 lines in step 4\n* While this approach is feasible and encouraged for simple and small data, it may be necessary to use lower level\n calls to write efficient translators\n\nVerifying the newly written H5 file:\n====================================\n* We will only perform some simple and quick verification to show that the data has indeed been translated correctly.\n* Please see the next notebook in the example series to learn more about reading and accessing data.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"with h5py.File(h5_path, mode='r') as h5_file:\n # See if a tree has been created within the hdf5 file:\n px.hdf_utils.print_tree(h5_file)\n\n h5_main = h5_file['Measurement_000/Channel_000/Raw_Data']\n fig, axes = plt.subplots(ncols=2, figsize=(11, 5))\n spat_map = np.reshape(h5_main[:, 100], (100, 100))\n px.plot_utils.plot_map(axes[0], spat_map, origin='lower')\n axes[0].set_title('Spatial map')\n axes[0].set_xlabel('X')\n axes[0].set_ylabel('Y')\n axes[1].plot(np.linspace(-1.0, 1.0, h5_main.shape[1]),\n h5_main[250])\n axes[1].set_title('IV curve at a single pixel')\n axes[1].set_xlabel('Tip bias [V]')\n axes[1].set_ylabel('Current [nA]')\n\n# Remove both the original and translated files:\nos.remove(h5_path)\nos.remove(data_file_path)"
]
}
],
"nbformat": 4,
"nbformat_minor": 0
}
\ No newline at end of file
@@ -249,20 +249,20 @@ The parameters in these files are present in the first few lines of the file
Out::
x-pixels : 100
y-pixels : 100
x-length : 29.7595
y-length : 29.7595
x-offset : -967.807
y-offset : -781.441
z-points : 500
z-section : 491
z-unit : nV
z-range : 2000000000
z-offset : 1116.49
value-unit : nA
scanspeed : 59519000000
voidpixels : 0
3.a Prepare to read the data
@@ -418,7 +418,7 @@ Verifying the newly written H5 file:
Measurement_000/Channel_000/Spectroscopic_Values
**Total running time of the script:** ( 5 minutes 40.598 seconds)
......
@@ -500,22 +500,22 @@ operation being performed on the same dataset. The index will then be updated accordingly
Measurement_000/Channel_000/Raw_Data-Cluster_/Label_Spectroscopic_Values
Writing the following attributes to the group:
num_clusters : 9
cluster_algorithm : KMeans
timestamp : 2017_11_29-10_00_47
n_init : 10
n_jobs : 1
max_iter : 300
precompute_distances : auto
algorithm : auto
random_state : None
verbose : 0
machine_id : challtdow-ThinkPad-T530
copy_x : True
n_clusters : 9
tol : 0.0001
num_samples : 10000
init : k-means++
Write to H5 and access the written objects
@@ -546,21 +546,21 @@ Once the tree is prepared (previous cell), ioHDF5 will handle all the file writing
Out::
Created group /Measurement_000/Channel_000/Raw_Data-Cluster_000
Writing attribute: timestamp with value: 2017_11_29-10_00_47
Writing attribute: n_init with value: 10
Writing attribute: n_jobs with value: 1
Writing attribute: algorithm with value: auto
Writing attribute: num_clusters with value: 9
Writing attribute: verbose with value: 0
Writing attribute: num_samples with value: 10000
Writing attribute: tol with value: 0.0001
Writing attribute: init with value: k-means++
Writing attribute: cluster_algorithm with value: KMeans
Writing attribute: max_iter with value: 300
Writing attribute: precompute_distances with value: auto
Writing attribute: machine_id with value: challtdow-ThinkPad-T530
Writing attribute: copy_x with value: True
Writing attribute: n_clusters with value: 9
Wrote attributes to group: Raw_Data-Cluster_000
Created Dataset /Measurement_000/Channel_000/Raw_Data-Cluster_000/Labels
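
For reference, the log above is produced by a single write call. A minimal sketch of the pattern, assuming
the legacy ``ioHDF5`` interface and the ``tree`` of groups and datasets assembled in the previous cell
(method names may differ between pycroscopy versions):

.. code-block:: python

    import pycroscopy as px

    # ioHDF5 wraps the HDF5 file; handed an in-memory tree of groups and
    # datasets, it performs all of the group / dataset / attribute writes
    # reported in the log above and returns references to the new objects.
    hdf = px.ioHDF5(h5_path)
    h5_refs = hdf.writeData(tree)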
@@ -724,7 +724,7 @@ Deletes the temporary files created in the example