Commit d70245b6 authored by syz

updated from master

parents 99556b27 7ffa2f29
@@ -407,7 +407,7 @@ a simple for loop.
Out::
-   Serial computation took 32.51 seconds
+   Serial computation took 31.07 seconds
2. Parallel Computation
@@ -452,7 +452,7 @@ pycroscopy. The below parallel computation is reduced to a single line with this
Out::
-   Parallel computation took 22.05 seconds
+   Parallel computation took 17.36 seconds
Compare the results
@@ -531,7 +531,7 @@ Please see another example on how to write a Process class for Pycroscopy
-**Total running time of the script:** ( 1 minutes 37.320 seconds)
+**Total running time of the script:** ( 1 minutes 22.369 seconds)
......
{
"metadata": {
"language_info": {
"mimetype": "text/x-python",
"name": "python",
"version": "3.5.2",
"file_extension": ".py",
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n=================================================================\nTutorial 5: Formalizing Data Processing\n=================================================================\n\n**Suhas Somnath**\n\n9/8/2017\n\n\nThis set of tutorials will serve as examples for developing end-to-end workflows for and using pycroscopy.\n\n**In this example, we will learn how to write a simple yet formal pycroscopy class for processing data.**\n\nIntroduction\n============\n\nData processing / analysis typically involves a few basic tasks:\n1. Reading data from file\n2. Computation\n3. Writing results to disk\n\nThis example is based on the parallel computing example where we fit a dataset containing spectra at each location to a\nfunction. While the previous example focused on comparing serial and parallel computing, we will focus on the framework\nthat needs to be built around a computation for robust data processing. As the example will show below, the framework\nessentially deals with careful file reading and writing.\n\nThe majority of the code for this example is based on the BESHOModel Class under pycroscopy.analysis\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import necessary packages\n=========================\n\nEnsure python 3 compatibility:\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"from __future__ import division, print_function, absolute_import, unicode_literals\n\n# The package for accessing files in directories, etc.:\nimport os\n\n# Warning package in case something goes wrong\nfrom warnings import warn\n\n# Package for downloading online files:\ntry:\n # This package is not part of anaconda and may need to be installed.\n import wget\nexcept ImportError:\n warn('wget not found. Will install with pip.')\n import pip\n pip.main(['install', 'wget'])\n import wget\n\n# The mathematical computation package:\nimport numpy as np\nfrom numpy import exp, abs, sqrt, sum, real, imag, arctan2, append\n\n# The package used for creating and manipulating HDF5 files:\nimport h5py\n\n# Packages for plotting:\nimport matplotlib.pyplot as plt\n\n# Finally import pycroscopy for certain scientific analysis:\ntry:\n import pycroscopy as px\nexcept ImportError:\n warn('pycroscopy not found. Will install with pip.')\n import pip\n pip.main(['install', 'pycroscopy'])\n import pycroscopy as px\n\n\nfield_names = ['Amplitude [V]', 'Frequency [Hz]', 'Quality Factor', 'Phase [rad]']\nsho32 = np.dtype({'names': field_names,\n 'formats': [np.float32 for name in field_names]})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Build the class\n===============\n\nEvery process class consists of the same basic functions:\n1. __init__ - instantiates a 'Process' object of this class after validating the inputs\n2. _create_results_datasets - creates the HDF5 datasets and datagroups to store the results.\n3. _unit_function - this is the operation that will per be performed on each element in the dataset.\n4. compute - This function essentially applies the unit function to every single element in the dataset.\n5. _write_results_chunk - writes the computed results back to the file\n\nNote that:\n\n* Only the code specific to this process needs to be implemented. However, the generic portions common to most\n Processes will be handled by the Process class.\n* The other functions such as the sho_function, sho_fast_guess function are all specific to this process. These have\n been inherited directly from the BE SHO model.\n* While the class appears to be large, remember that the majority of it deals with the creation of the datasets to store\n the results and the actual function that one would have anyway regardless of serial / parallel computation of the\n function. The additional code to turn this operation into a Pycroscopy Process is actually rather minimal. As\n described earlier, the goal of the Process class is to modularize and compartmentalize the main sections of the code\n in order to facilitate faster and more robust implementation of data processing algorithms.\n\n\n"
]
},
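{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following minimal skeleton is an illustrative sketch only: MyProcess is a hypothetical name and the method bodies are placeholders, not part of the pycroscopy API. It shows just the five functions listed above; the full, working ShoGuess class follows in the next cell.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Illustrative skeleton only - MyProcess is a hypothetical placeholder.\n# The full, working ShoGuess class follows in the next cell.\nclass MyProcess(px.Process):\n\n    def __init__(self, h5_main, cores=None):\n        # 1. Validate the inputs; the parent class handles the generic checks\n        super(MyProcess, self).__init__(h5_main, cores)\n\n    def _create_results_datasets(self):\n        # 2. Create the HDF5 datasets and datagroups that will hold the results\n        pass\n\n    @staticmethod\n    def _unit_function():\n        # 3. Return the function that operates on each element of the dataset\n        pass\n\n    def compute(self, *args, **kwargs):\n        # 4. Apply the unit function to every element; the parent class\n        #    handles reading the data in chunks\n        return super(MyProcess, self).compute()\n\n    def _write_results_chunk(self):\n        # 5. Write one chunk of computed results back to the file\n        pass"
]
},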
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"class ShoGuess(px.Process):\n\n def __init__(self, h5_main, cores=None):\n \"\"\"\n Validate the inputs and set some parameters\n\n Parameters\n ----------\n h5_main - dataset to compute on\n cores - Number of CPU cores to use for computation - Optional\n \"\"\"\n super(ShoGuess, self).__init__(h5_main, cores)\n\n # find the frequency vector\n h5_spec_vals = px.hdf_utils.getAuxData(h5_main, 'Spectroscopic_Values')[-1]\n self.freq_vec = np.squeeze(h5_spec_vals.value) * 1E-3\n\n def _create_results_datasets(self):\n \"\"\"\n Creates the datasets an datagroups necessary to store the results.\n Just as the raw data is stored in the pycroscopy format, the results also need to conform to the same\n standards. Hence, the create_datasets function can appear to be a little longer than one might expect.\n \"\"\"\n h5_spec_inds = px.hdf_utils.getAuxData(self.h5_main, auxDataName=['Spectroscopic_Indices'])[0]\n h5_spec_vals = px.hdf_utils.getAuxData(self.h5_main, auxDataName=['Spectroscopic_Values'])[0]\n\n self.step_start_inds = np.where(h5_spec_inds[0] == 0)[0]\n self.num_udvs_steps = len(self.step_start_inds)\n \n ds_guess = px.MicroDataset('Guess', data=[],\n maxshape=(self.h5_main.shape[0], self.num_udvs_steps),\n chunking=(1, self.num_udvs_steps), dtype=sho32)\n\n not_freq = px.hdf_utils.get_attr(h5_spec_inds, 'labels') != 'Frequency'\n\n ds_sho_inds, ds_sho_vals = px.hdf_utils.buildReducedSpec(h5_spec_inds, h5_spec_vals, not_freq,\n self.step_start_inds)\n\n dset_name = self.h5_main.name.split('/')[-1]\n sho_grp = px.MicroDataGroup('-'.join([dset_name, 'SHO_Fit_']), self.h5_main.parent.name[1:])\n sho_grp.addChildren([ds_guess, ds_sho_inds, ds_sho_vals])\n sho_grp.attrs['SHO_guess_method'] = \"pycroscopy BESHO\"\n\n h5_sho_grp_refs = self.hdf.writeData(sho_grp)\n\n self.h5_guess = px.hdf_utils.getH5DsetRefs(['Guess'], h5_sho_grp_refs)[0]\n self.h5_results_grp = self.h5_guess.parent\n h5_sho_inds = px.hdf_utils.getH5DsetRefs(['Spectroscopic_Indices'],\n h5_sho_grp_refs)[0]\n h5_sho_vals = px.hdf_utils.getH5DsetRefs(['Spectroscopic_Values'],\n h5_sho_grp_refs)[0]\n\n # Reference linking before actual fitting\n px.hdf_utils.linkRefs(self.h5_guess, [h5_sho_inds, h5_sho_vals])\n # Linking ancillary position datasets:\n aux_dsets = px.hdf_utils.getAuxData(self.h5_main, auxDataName=['Position_Indices', 'Position_Values'])\n px.hdf_utils.linkRefs(self.h5_guess, aux_dsets)\n print('Finshed creating datasets')\n\n def compute(self, *args, **kwargs):\n \"\"\"\n Apply the unit_function to the entire dataset. Here, we simply extend the existing compute function and only\n pass the parameters for the unit function. In this case, the only parameter is the frequency vector.\n\n Parameters\n ----------\n args\n kwargs\n\n Returns\n -------\n\n \"\"\"\n return super(ShoGuess, self).compute(w_vec=self.freq_vec)\n\n def _write_results_chunk(self):\n \"\"\"\n Write the computed results back to the H5 file\n \"\"\"\n # converting from a list to a 2D numpy array\n self._results = np.array(self._results, dtype=np.float32)\n self.h5_guess[:, 0] = px.io_utils.realToCompound(self._results, sho32)\n\n # Now update the start position\n self._start_pos = self._end_pos\n # this should stop the computation.\n\n @staticmethod\n def _unit_function():\n\n return px.be_sho.SHOestimateGuess"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the dataset\n================\n\nFor this example, we will be working with a Band Excitation Piezoresponse Force Microscopy (BE-PFM) imaging dataset\nacquired from advanced atomic force microscopes. In this dataset, a spectra was collected for each position in a two\ndimensional grid of spatial locations. Thus, this is a three dimensional dataset that has been flattened to a two\ndimensional matrix in accordance with the pycroscopy data format.\n\n"
]
},
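{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of this flattening, here is a toy sketch with made-up dimensions, independent of the actual dataset downloaded below:\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Toy sketch with made-up dimensions: pycroscopy flattens a\n# (rows x cols x spectral) dataset into a (positions x spectral) matrix.\ntoy_rows, toy_cols, toy_spec = 4, 5, 100\ntoy_3d = np.random.rand(toy_rows, toy_cols, toy_spec)\ntoy_2d = toy_3d.reshape(toy_rows * toy_cols, toy_spec)\nprint(toy_2d.shape)  # (20, 100) - one spectrum per row"
]
},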
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# download the raw data file from Github:\nh5_path = 'temp.h5'\nurl = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/BELine_0004.h5'\nif os.path.exists(h5_path):\n os.remove(h5_path)\n_ = wget.download(url, h5_path, bar=None)"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Open the file in read-only mode\nh5_file = h5py.File(h5_path, mode='r+')\n\n# Get handles to the the raw data along with other datasets and datagroups that contain necessary parameters\nh5_meas_grp = h5_file['Measurement_000']\nnum_rows = px.hdf_utils.get_attr(h5_meas_grp, 'grid_num_rows')\nnum_cols = px.hdf_utils.get_attr(h5_meas_grp, 'grid_num_cols')\n\n# Getting a reference to the main dataset:\nh5_main = h5_meas_grp['Channel_000/Raw_Data']\n\n# Extracting the X axis - vector of frequencies\nh5_spec_vals = px.hdf_utils.getAuxData(h5_main, 'Spectroscopic_Values')[-1]\nfreq_vec = np.squeeze(h5_spec_vals.value) * 1E-3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the ShoGuess class, defined earlier, to calculate the four\nparameters of the complex gaussian.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"fitter = ShoGuess(h5_main, cores=1)\nh5_results_grp = fitter.compute()\nh5_guess = h5_results_grp['Guess']\n\nrow_ind, col_ind = 103, 19\npix_ind = col_ind + row_ind * num_cols\nresp_vec = h5_main[pix_ind]\nnorm_guess_parms = h5_guess[pix_ind]\n\n# Converting from compound to real:\nnorm_guess_parms = px.io_utils.compound_to_scalar(norm_guess_parms)\nprint('Functional fit returned:', norm_guess_parms)\nnorm_resp = px.be_sho.SHOfunc(norm_guess_parms, freq_vec)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the Amplitude and Phase of the gaussian versus the raw data.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(5, 10))\nfor axis, func, title in zip(axes.flat, [np.abs, np.angle], ['Amplitude (a.u.)', 'Phase (rad)']):\n axis.scatter(freq_vec, func(resp_vec), c='red', label='Measured')\n axis.plot(freq_vec, func(norm_resp), 'black', lw=3, label='Guess')\n axis.set_title(title, fontsize=16)\n axis.legend(fontsize=14)\n\naxes[1].set_xlabel('Frequency (kHz)', fontsize=14)\naxes[0].set_ylim([0, np.max(np.abs(resp_vec)) * 1.1])\naxes[1].set_ylim([-np.pi, np.pi])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Delete the temporarily downloaded file**\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"h5_file.close()\nos.remove(h5_path)"
]
}
],
"nbformat": 4,
"nbformat_minor": 0,
"nbformat": 4
"metadata": {
"language_info": {
"codemirror_mode": {
"version": 3,
"name": "ipython"
},
"nbconvert_exporter": "python",
"file_extension": ".py",
"version": "3.5.2",
"pygments_lexer": "ipython3",
"name": "python",
"mimetype": "text/x-python"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
}
}
\ No newline at end of file
@@ -345,7 +345,7 @@ Plot the Amplitude and Phase of the gaussian versus the raw data.
-**Total running time of the script:** ( 0 minutes 49.348 seconds)
+**Total running time of the script:** ( 0 minutes 57.835 seconds)
......
@@ -48,9 +48,9 @@ Pycroscopy Examples
<div style='clear:both'></div>
-==============================
-Pycroscopy Developer Tutorials
-==============================
+===================
+Developer Tutorials
+===================
.. raw:: html
@@ -164,9 +164,9 @@ Pycroscopy Publications
<div style='clear:both'></div>
-=========================
-Pycroscopy User Tutorials
-=========================
+==============
+User Tutorials
+==============
.. raw:: html
......
{
"metadata": {
"language_info": {
"mimetype": "text/x-python",
"name": "python",
"version": "3.5.2",
"file_extension": ".py",
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Writing to hdf5 using the Microdata objects\n\n\n\n\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Code source: Chris Smith -- cq6@ornl.gov\n# Liscense: MIT\n\nimport os\nimport numpy as np\nimport pycroscopy as px"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create some MicroDatasets and MicroDataGroups that will be written to the file.\nWith h5py, groups and datasets must be created from the top down,\nbut the Microdata objects allow us to build them in any order and link them later.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# First create some data\ndata1 = np.random.rand(5, 7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now use the array to build the dataset. This dataset will live\ndirectly under the root of the file. The MicroDataset class also implements the\ncompression and chunking parameters from h5py.Dataset.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"ds_main = px.MicroDataset('Main_Data', data=data1, parent='/')"
]
},
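{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch of those h5py-style parameters: the values below are arbitrary assumptions chosen for demonstration, and Compressed_Data is not used again in this example.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Illustrative sketch: 'compression' and 'chunking' mirror the corresponding\n# h5py.Dataset parameters, as noted above. The values are arbitrary choices\n# for demonstration; this dataset is not used later in this example.\nds_compressed = px.MicroDataset('Compressed_Data', data=data1,\n                                compression='gzip', chunking=(1, 7))"
]
},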
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also create an empty dataset and write the values in later\nWith this method, it is neccessary to specify the dtype and maxshape kwarg parameters.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"ds_empty = px.MicroDataset('Empty_Data', data=[], dtype=np.float32, maxshape=[7, 5, 3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also create groups and add other MicroData objects as children.\nIf the group's parent is not given, it will be set to root.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"data_group = px.MicroDataGroup('Data_Group', parent='/')\n\nroot_group = px.MicroDataGroup('/')\n\n# After creating the group, we then add an existing object as its child.\ndata_group.addChildren([ds_empty])\nroot_group.addChildren([ds_main, data_group])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The showTree method allows us to view the data structure before the hdf5 file is\ncreated.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"root_group.showTree()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have created the objects, we can write them to an hdf5 file\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# First we specify the path to the file\nh5_path = 'microdata_test.h5'\n\n# Then we use the ioHDF5 class to build the file from our objects.\nhdf = px.ioHDF5(h5_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The writeData method builds the hdf5 file using the structure defined by the\nMicroData objects. It returns a list of references to all h5py objects in the\nnew file.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"h5_refs = hdf.writeData(root_group, print_log=True)\n\n# We can use these references to get the h5py dataset and group objects\nh5_main = px.io.hdf_utils.getH5DsetRefs(['Main_Data'], h5_refs)[0]\nh5_empty = px.io.hdf_utils.getH5DsetRefs(['Empty_Data'], h5_refs)[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare the data in our dataset to the original\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"print(np.allclose(h5_main[()], data1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, we can now write to the Empty_Data object\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"data2 = np.random.rand(*h5_empty.shape)\nh5_empty[:] = data2[:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we are using h5py objects, we must use flush to write the data to file\nafter it has been altered.\nWe need the file object to do this. It can be accessed as an attribute of the\nhdf object.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"h5_file = hdf.file\nh5_file.flush()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we are done, we should close the file so that it can be accessed elsewhere.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"h5_file.close()\nos.remove(h5_path)"
]
}