Commit d70245b6 authored by syz

updated from master

parents 99556b27 7ffa2f29
{
"metadata": {
"language_info": {
"mimetype": "text/x-python",
"name": "python",
"version": "3.5.2",
"file_extension": ".py",
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n=================================================================\nTutorial 4: Parallel Computing\n=================================================================\n\n**Suhas Somnath, Chris R. Smith**\n\n9/8/2017\n\n\nThis set of tutorials will serve as examples for developing end-to-end workflows for and using pycroscopy.\n\n**In this example, we will learn how to compute using all available cores on a computer.** Note, that this is\napplicable only for a single CPU. Please refer to another advanced example for multi-CPU computing.\n\nQuite often, we need to perform the same operation on every single component in our data. One of the most popular\nexamples is functional fitting applied to spectra collected at each location on a grid. While, the operation itself\nmay not take very long, computing this operation thousands of times, once per location, using a single CPU core can\ntake a long time to complete. Most personal computers today come with at least two cores, and in many cases, each of\nthese cores is represented via two logical cores, thereby summing to a total of at least four cores. Thus, it is\nprudent to make use of these unused cores whenever possible. Fortunately, there are a few python packages that\nfacilitate the efficient use of all CPU cores with minimal modifications to the existing code.\n\n**Here, we show how one can fit the thousands of spectra, one at each location, using multiple CPU cores.**\n\n\n"
]
},
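{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick preview of the idea (a minimal sketch, not tied to the dataset used below): a serial `for` loop maps almost verbatim onto **Joblib**. The names `my_function` and `my_data` here are hypothetical placeholders:\n\n.. code-block:: python\n\n    import multiprocessing\n    import joblib\n\n    print('Logical cores available:', multiprocessing.cpu_count())\n\n    # Serial version:\n    results = [my_function(item) for item in my_data]\n\n    # Parallel version - the same operation spread across all cores:\n    results = joblib.Parallel(n_jobs=multiprocessing.cpu_count())(\n        joblib.delayed(my_function)(item) for item in my_data)\n\n"
]
},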
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Ensure python 3 compatibility:\nfrom __future__ import division, print_function, absolute_import, unicode_literals\n\n# The package for accessing files in directories, etc.:\nimport os\n\n# Warning package in case something goes wrong\nfrom warnings import warn\n\n# Package for downloading online files:\ntry:\n # This package is not part of anaconda and may need to be installed.\n import wget\nexcept ImportError:\n warn('wget not found. Will install with pip.')\n import pip\n pip.main(['install', 'wget'])\n import wget\n\n# The mathematical computation package:\nimport numpy as np\n\n# The package used for creating and manipulating HDF5 files:\nimport h5py\n\n# Packages for plotting:\nimport matplotlib.pyplot as plt\n\n# Parallel computation library:\ntry:\n import joblib\nexcept ImportError:\n warn('joblib not found. Will install with pip.')\n import pip\n pip.main(['install', 'joblib'])\n import joblib\n\n# Timing\nimport time\n\n# Finally import pycroscopy for certain scientific analysis:\ntry:\n import pycroscopy as px\nexcept ImportError:\n warn('pycroscopy not found. Will install with pip.')\n import pip\n pip.main(['install', 'pycroscopy'])\n import pycroscopy as px"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the dataset\n================\n\nFor this example, we will be working with a Band Excitation Piezoresponse Force Microscopy (BE-PFM) imaging dataset\nacquired from advanced atomic force microscopes. In this dataset, a spectra was collected for each position in a two\ndimensional grid of spatial locations. Thus, this is a three dimensional dataset that has been flattened to a two\ndimensional matrix in accordance with the pycroscopy data format.\n\n"
]
},
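{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the flattening concrete, here is a minimal sketch using a hypothetical `data_2d` array: a measurement over `num_rows` x `num_cols` positions with `spectrum_length` points per spectrum is stored as a `(num_rows * num_cols, spectrum_length)` matrix, so the spectrum at a given `row` and `col` lives at position index `col + row * num_cols`:\n\n.. code-block:: python\n\n    # data_2d is the flattened (positions x spectrum) matrix, as stored in the file\n    data_3d = data_2d.reshape(num_rows, num_cols, -1)\n\n    # Both views address the same spectrum:\n    assert np.allclose(data_3d[row, col], data_2d[col + row * num_cols])\n\n"
]
},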
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# download the raw data file from Github:\nh5_path = 'temp.h5'\nurl = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/BELine_0004.h5'\nif os.path.exists(h5_path):\n os.remove(h5_path)\n_ = wget.download(url, h5_path)"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Open the file in read-only mode\nh5_file = h5py.File(h5_path, mode='r')\n\n# Get handles to the the raw data along with other datasets and datagroups that contain necessary parameters\nh5_meas_grp = h5_file['Measurement_000']\n\n# Getting a reference to the main dataset:\nh5_main = h5_meas_grp['Channel_000/Raw_Data']\nprint('\\nThe main dataset:\\n------------------------------------')\nprint(h5_main)\n\nnum_rows = px.hdf_utils.get_attr(h5_meas_grp, 'grid_num_rows')\nnum_cols = px.hdf_utils.get_attr(h5_meas_grp, 'grid_num_cols')\n\n# Extracting the X axis - vector of frequencies\nh5_spec_vals = px.hdf_utils.getAuxData(h5_main, 'Spectroscopic_Values')[-1]\nfreq_vec = np.squeeze(h5_spec_vals.value) * 1E-3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Visualize the data\n==================\n\nVisualize the spectra at each of the locations using the interactive jupyter widgets below:\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"px.viz.be_viz_utils.jupyter_visualize_be_spectrograms(h5_main)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The operation\n=============\nWe will be computing the parameters that would best describe these complex-valued spectra using a simple harmonic\noscillator model in the functions below. These functions have been taken from the BESHOFit submodule available in\npycroscopy.analysis.\n\nThe specifics of the functions are not of interest for this example. Instead, all we need to know\nis that we need to apply a function (SHOestimateGuess in our case) on each element in our dataset.\nthe functions below\n\n.. code-block:: python\n\n def SHOfunc(parms, w_vec):\n \"\"\"\n Generates the SHO response over the given frequency band\n\n Parameters\n -----------\n parms : list or tuple\n SHO parameters=(Amplitude, frequency ,Quality factor, phase)\n w_vec : 1D numpy array\n Vector of frequency values\n \"\"\"\n return parms[0] * exp(1j * parms[3]) * parms[1] ** 2 / (w_vec ** 2 - 1j * w_vec * parms[1] / parms[2] - parms[1] ** 2)\n\n\n def SHOestimateGuess(resp_vec, w_vec=None, num_points=5):\n \"\"\"\n Generates good initial guesses for fitting\n\n Parameters\n ------------\n resp_vec : 1D complex numpy array or list\n BE response vector as a function of frequency\n w_vec : 1D numpy array or list, Optional\n Vector of BE frequencies\n num_points : (Optional) unsigned int\n Quality factor of the SHO peak\n\n Returns\n ---------\n retval : tuple\n SHO fit parameters arranged as amplitude, frequency, quality factor, phase\n \"\"\"\n if w_vec is None:\n # Some default value\n w_vec = np.linspace(300E+3, 350E+3, resp_vec.size)\n\n ii = np.argsort(abs(resp_vec))[::-1]\n\n a_mat = np.array([])\n e_vec = np.array([])\n\n for c1 in range(num_points):\n for c2 in range(c1 + 1, num_points):\n w1 = w_vec[ii[c1]]\n w2 = w_vec[ii[c2]]\n X1 = real(resp_vec[ii[c1]])\n X2 = real(resp_vec[ii[c2]])\n Y1 = imag(resp_vec[ii[c1]])\n Y2 = imag(resp_vec[ii[c2]])\n\n denom = (w1 * (X1 ** 2 - X1 * X2 + Y1 * (Y1 - Y2)) + w2 * (-X1 * X2 + X2 ** 2 - Y1 * Y2 + Y2 ** 2))\n if denom > 0:\n a = ((w1 ** 2 - w2 ** 2) * (w1 * X2 * (X1 ** 2 + Y1 ** 2) - w2 * X1 * (X2 ** 2 + Y2 ** 2))) / denom\n b = ((w1 ** 2 - w2 ** 2) * (w1 * Y2 * (X1 ** 2 + Y1 ** 2) - w2 * Y1 * (X2 ** 2 + Y2 ** 2))) / denom\n c = ((w1 ** 2 - w2 ** 2) * (X2 * Y1 - X1 * Y2)) / denom\n d = (w1 ** 3 * (X1 ** 2 + Y1 ** 2) -\n w1 ** 2 * w2 * (X1 * X2 + Y1 * Y2) -\n w1 * w2 ** 2 * (X1 * X2 + Y1 * Y2) +\n w2 ** 3 * (X2 ** 2 + Y2 ** 2)) / denom\n\n if d > 0:\n a_mat = append(a_mat, [a, b, c, d])\n\n A_fit = abs(a + 1j * b) / d\n w0_fit = sqrt(d)\n Q_fit = -sqrt(d) / c\n phi_fit = arctan2(-b, -a)\n\n H_fit = A_fit * w0_fit ** 2 * exp(1j * phi_fit) / (\n w_vec ** 2 - 1j * w_vec * w0_fit / Q_fit - w0_fit ** 2)\n\n e_vec = append(e_vec,\n sum((real(H_fit) - real(resp_vec)) ** 2) +\n sum((imag(H_fit) - imag(resp_vec)) ** 2))\n if a_mat.size > 0:\n a_mat = a_mat.reshape(-1, 4)\n\n weight_vec = (1 / e_vec) ** 4\n w_sum = sum(weight_vec)\n\n a_w = sum(weight_vec * a_mat[:, 0]) / w_sum\n b_w = sum(weight_vec * a_mat[:, 1]) / w_sum\n c_w = sum(weight_vec * a_mat[:, 2]) / w_sum\n d_w = sum(weight_vec * a_mat[:, 3]) / w_sum\n\n A_fit = abs(a_w + 1j * b_w) / d_w\n w0_fit = sqrt(d_w)\n Q_fit = -sqrt(d_w) / c_w\n phi_fit = np.arctan2(-b_w, -a_w)\n\n H_fit = A_fit * w0_fit ** 2 * exp(1j * phi_fit) / (w_vec ** 2 - 1j * w_vec * w0_fit / Q_fit - w0_fit ** 2)\n\n if np.std(abs(resp_vec)) / np.std(abs(resp_vec - H_fit)) < 1.2 or w0_fit < np.min(w_vec) or w0_fit > np.max(\n w_vec):\n p0 = sho_fast_guess(w_vec, resp_vec)\n else:\n p0 = np.array([A_fit, w0_fit, Q_fit, phi_fit])\n else:\n p0 = 
sho_fast_guess(resp_vec, w_vec)\n\n return p0\n\n\n def sho_fast_guess(resp_vec, w_vec, qual_factor=200):\n \"\"\"\n Default SHO guess from the maximum value of the response\n\n Parameters\n ------------\n resp_vec : 1D complex numpy array or list\n BE response vector as a function of frequency\n w_vec : 1D numpy array or list\n Vector of BE frequencies\n qual_factor : float\n Quality factor of the SHO peak\n\n Returns\n -------\n retval : 1D numpy array\n SHO fit parameters arranged as [amplitude, frequency, quality factor, phase]\n \"\"\"\n amp_vec = abs(resp_vec)\n i_max = int(len(resp_vec) / 2)\n return np.array([np.mean(amp_vec) / qual_factor, w_vec[i_max], qual_factor, np.angle(resp_vec[i_max])])\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Testing the function\n====================\nLet's see what the operation on an example spectra returns. The function essentially returns four parameters that can\ncapture the the shape of the spectra.\n\nA single call to the function does not take substantial time. However, performing the same operation on each of the\n16,384 pixels can take substantial time\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"row_ind, col_ind = 103, 19\nresp_vec = h5_main[col_ind + row_ind*num_cols]\nnorm_guess_parms = px.analysis.be_sho.SHOestimateGuess(resp_vec, freq_vec)\nprint('Functional fit returned:', norm_guess_parms)\nnorm_resp = px.analysis.be_sho.SHOfunc(norm_guess_parms, freq_vec)\n\n\nfig, axes = plt.subplots(ncols=2, figsize=(10, 5))\nfor axis, func, title in zip(axes.flat, [np.abs, np.angle], ['Amplitude (a.u.)', 'Phase (rad)']):\n axis.scatter(freq_vec, func(resp_vec), c='red', label='Measured')\n axis.plot(freq_vec, func(norm_resp), 'black', lw=3, label='Guess')\n axis.set_title(title, fontsize=16)\n axis.legend(fontsize=14)\n axis.set_xlabel('Frequency (kHz)', fontsize=14)\n\naxes[0].set_ylim([0, np.max(np.abs(resp_vec))*1.1])\naxes[1].set_ylim([-np.pi, np.pi])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Applying the function to the entire dataset\n===========================================\n\nWe will be comparing the:\n1. Traditional - serial computation approach\n2. Parallel computation\n\nIn an effort to avoid reading / writing to the data files, we will read the entire dataset to memory.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"raw_data = h5_main[()]\n\nserial_results = np.zeros((raw_data.shape[0], 4), dtype=np.float)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Serial Computation\n---------------------\nThe simplest method to compute the paramters for each spectra in the dataset is by looping over each position using\na simple for loop.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"t_0 = time.time()\nfor pix_ind in range(raw_data.shape[0]):\n serial_results[pix_ind] = px.analysis.be_sho.SHOestimateGuess(raw_data[pix_ind], freq_vec)\nprint('Serial computation took', np.round(time.time()-t_0, 2), ' seconds')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Parallel Computation\n-----------------------\n\nThere are several libraries that can utilize multiple CPU cores to perform the same computation in parallel. Popular\nexamples are **Multiprocessing**, **Mutiprocess**, **Dask**, **Joblib** etc. Each of these has their own\nstrengths and weaknesses. An installation of **Anaconda** comes with **Multiprocessing** by default and could be\nthe example of choice. However, in our experience we found **Joblib** to offer the best balance of efficiency,\nsimplicity, portabiity, and ease of installation.\n\nFor illustrative purposes, we will only be demonstrating how the above serial computation can be made parallel using\n**Joblib**. We only need two lines to perform the parallel computation. The first line sets up the computational\njobs while the second performs the computation.\n\nNote that the first argument to the function **MUST** be the data vector itself. The other arguments (parameters),\nsuch as the frequency vector in this case, must come after the data argument. This approach allows the specification\nof both required arguments and optional (keyword) arguments.\n\nParallel computing has been made more accessible via the parallel_compute() function in the `process` module in\npycroscopy. The below parallel computation is reduced to a single line with this function.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"func = px.analysis.be_sho.SHOestimateGuess\ncores = 4\nargs = freq_vec\n\nt_0 = time.time()\nvalues = [joblib.delayed(func)(x, args) for x in raw_data]\nparallel_results = joblib.Parallel(n_jobs=cores)(values)\nprint('Parallel computation took', np.round(time.time()-t_0, 2), ' seconds')"
]
},
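{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the single-line version with parallel_compute() would look roughly like the sketch below. The exact import path and keyword names here are assumptions and may differ between pycroscopy versions, so consult the documentation of your installed version:\n\n.. code-block:: python\n\n    # Hypothetical call signature: the dataset first, then the unit function,\n    # followed by the number of cores and the extra arguments for the function\n    parallel_results = px.parallel_compute(raw_data, px.analysis.be_sho.SHOestimateGuess,\n                                           cores=cores, func_args=[freq_vec])\n\n"
]
},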
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare the results\n-------------------\n\nBy comparing the run-times for the two approaches, we see that the parallel computation is substantially faster than\nthe serial computation. Note that the numbers will differ between computers. Also, the computation was performed on\na relatively small dataset for illustrative purposes. The benefits of using such parallel computation will be far\nmore apparent for much larger datasets.\n\nLet's compare the results from both the serial and parallel methods to ensure they give the same results:\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"row_ind, col_ind = 103, 19\npix_ind = col_ind + row_ind * num_cols\nprint('Parallel and serial computation results matching:',\n np.all(np.isclose(serial_results[pix_ind], parallel_results[pix_ind])))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Best practices for parallel computing\n=====================================\n\nWhile it may seem tempting to do everything in parallel, it is important to be aware of some of the trade-offs and\nbest-practices for parallel computing (multiple CPU cores) when compared to traditional serial computing (single\nCPU core):\n* There is noticable time overhead involved with setting up each parallel computing job. For very simple or small\ncomputations, this overhead may outweigh the speed-up gained with using multiple cores.\n* Parallelizing computations that read and write to files at each iteration may be actually be noticably __slower__\nthan serial computation since each core will compete with all other cores for rights to read and write to the file(s)\nand these input/output operations are by far the slowest components of the computation. Instead, it makes sense to\nread large amounts of data from the necessary files once, perform the computation, and then write to the files once\nafter all the computation is complete. In fact, this is what we automatically do in the **Analysis** and\n**Process** class in **pycroscopy**\n\nProcess class - Formalizing data processing\n-------------------------------------------\n\nData processing / analysis typically involves a few basic operations:\n1. Reading data from file\n2. Parallel computation\n3. Writing results to disk\n\nThe Process class in pycroscopy aims to modularize these operations for faster development of standardized,\neasy-to-debug code. Common operations can be inherited from this class and only the operation-specific functions\nneed to be extended in your class.\nPlease see another example on how to write a Process class for Pycroscopy\n\n"
]
},
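{
"cell_type": "markdown",
"metadata": {},
"source": [
"The read-compute-write pattern described above can be summarized in three steps. This is a sketch; `h5_results` stands in for a hypothetical, pre-allocated HDF5 results dataset:\n\n.. code-block:: python\n\n    # 1. Read once: pull the entire dataset into memory\n    raw_data = h5_main[()]\n\n    # 2. Compute in parallel - no file access inside the loop\n    results = joblib.Parallel(n_jobs=cores)(\n        joblib.delayed(px.analysis.be_sho.SHOestimateGuess)(vec, freq_vec) for vec in raw_data)\n\n    # 3. Write once: a single bulk write after all computation is complete\n    h5_results[:] = np.array(results)\n\n"
]
},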
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Delete the temporarily downloaded file**\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"h5_file.close()\nos.remove(h5_path)"
]
}
],
"nbformat": 4,
"nbformat_minor": 0,
"nbformat": 4
"metadata": {
"language_info": {
"codemirror_mode": {
"version": 3,
"name": "ipython"
},
"nbconvert_exporter": "python",
"file_extension": ".py",
"version": "3.5.2",
"pygments_lexer": "ipython3",
"name": "python",
"mimetype": "text/x-python"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
}
}
@@ -407,7 +407,7 @@ a simple for loop.
 Out::
-    Serial computation took 32.51 seconds
+    Serial computation took 31.07 seconds
 2. Parallel Computation
@@ -452,7 +452,7 @@ pycroscopy. The below parallel computation is reduced to a single line with this
 Out::
-    Parallel computation took 22.05 seconds
+    Parallel computation took 17.36 seconds
 Compare the results
@@ -531,7 +531,7 @@ Please see another example on how to write a Process class for Pycroscopy
-**Total running time of the script:** ( 1 minutes 37.320 seconds)
+**Total running time of the script:** ( 1 minutes 22.369 seconds)
{
"metadata": {
"language_info": {
"mimetype": "text/x-python",
"name": "python",
"version": "3.5.2",
"file_extension": ".py",
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"cells": [
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n=================================================================\nTutorial 5: Formalizing Data Processing\n=================================================================\n\n**Suhas Somnath**\n\n9/8/2017\n\n\nThis set of tutorials will serve as examples for developing end-to-end workflows for and using pycroscopy.\n\n**In this example, we will learn how to write a simple yet formal pycroscopy class for processing data.**\n\nIntroduction\n============\n\nData processing / analysis typically involves a few basic tasks:\n1. Reading data from file\n2. Computation\n3. Writing results to disk\n\nThis example is based on the parallel computing example where we fit a dataset containing spectra at each location to a\nfunction. While the previous example focused on comparing serial and parallel computing, we will focus on the framework\nthat needs to be built around a computation for robust data processing. As the example will show below, the framework\nessentially deals with careful file reading and writing.\n\nThe majority of the code for this example is based on the BESHOModel Class under pycroscopy.analysis\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import necessary packages\n=========================\n\nEnsure python 3 compatibility:\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"from __future__ import division, print_function, absolute_import, unicode_literals\n\n# The package for accessing files in directories, etc.:\nimport os\n\n# Warning package in case something goes wrong\nfrom warnings import warn\n\n# Package for downloading online files:\ntry:\n # This package is not part of anaconda and may need to be installed.\n import wget\nexcept ImportError:\n warn('wget not found. Will install with pip.')\n import pip\n pip.main(['install', 'wget'])\n import wget\n\n# The mathematical computation package:\nimport numpy as np\nfrom numpy import exp, abs, sqrt, sum, real, imag, arctan2, append\n\n# The package used for creating and manipulating HDF5 files:\nimport h5py\n\n# Packages for plotting:\nimport matplotlib.pyplot as plt\n\n# Finally import pycroscopy for certain scientific analysis:\ntry:\n import pycroscopy as px\nexcept ImportError:\n warn('pycroscopy not found. Will install with pip.')\n import pip\n pip.main(['install', 'pycroscopy'])\n import pycroscopy as px\n\n\nfield_names = ['Amplitude [V]', 'Frequency [Hz]', 'Quality Factor', 'Phase [rad]']\nsho32 = np.dtype({'names': field_names,\n 'formats': [np.float32 for name in field_names]})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Build the class\n===============\n\nEvery process class consists of the same basic functions:\n1. __init__ - instantiates a 'Process' object of this class after validating the inputs\n2. _create_results_datasets - creates the HDF5 datasets and datagroups to store the results.\n3. _unit_function - this is the operation that will per be performed on each element in the dataset.\n4. compute - This function essentially applies the unit function to every single element in the dataset.\n5. _write_results_chunk - writes the computed results back to the file\n\nNote that:\n\n* Only the code specific to this process needs to be implemented. However, the generic portions common to most\n Processes will be handled by the Process class.\n* The other functions such as the sho_function, sho_fast_guess function are all specific to this process. These have\n been inherited directly from the BE SHO model.\n* While the class appears to be large, remember that the majority of it deals with the creation of the datasets to store\n the results and the actual function that one would have anyway regardless of serial / parallel computation of the\n function. The additional code to turn this operation into a Pycroscopy Process is actually rather minimal. As\n described earlier, the goal of the Process class is to modularize and compartmentalize the main sections of the code\n in order to facilitate faster and more robust implementation of data processing algorithms.\n\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"class ShoGuess(px.Process):\n\n def __init__(self, h5_main, cores=None):\n \"\"\"\n Validate the inputs and set some parameters\n\n Parameters\n ----------\n h5_main - dataset to compute on\n cores - Number of CPU cores to use for computation - Optional\n \"\"\"\n super(ShoGuess, self).__init__(h5_main, cores)\n\n # find the frequency vector\n h5_spec_vals = px.hdf_utils.getAuxData(h5_main, 'Spectroscopic_Values')[-1]\n self.freq_vec = np.squeeze(h5_spec_vals.value) * 1E-3\n\n def _create_results_datasets(self):\n \"\"\"\n Creates the datasets an datagroups necessary to store the results.\n Just as the raw data is stored in the pycroscopy format, the results also need to conform to the same\n standards. Hence, the create_datasets function can appear to be a little longer than one might expect.\n \"\"\"\n h5_spec_inds = px.hdf_utils.getAuxData(self.h5_main, auxDataName=['Spectroscopic_Indices'])[0]\n h5_spec_vals = px.hdf_utils.getAuxData(self.h5_main, auxDataName=['Spectroscopic_Values'])[0]\n\n self.step_start_inds = np.where(h5_spec_inds[0] == 0)[0]\n self.num_udvs_steps = len(self.step_start_inds)\n \n ds_guess = px.MicroDataset('Guess', data=[],\n maxshape=(self.h5_main.shape[0], self.num_udvs_steps),\n chunking=(1, self.num_udvs_steps), dtype=sho32)\n\n not_freq = px.hdf_utils.get_attr(h5_spec_inds, 'labels') != 'Frequency'\n\n ds_sho_inds, ds_sho_vals = px.hdf_utils.buildReducedSpec(h5_spec_inds, h5_spec_vals, not_freq,\n self.step_start_inds)\n\n dset_name = self.h5_main.name.split('/')[-1]\n sho_grp = px.MicroDataGroup('-'.join([dset_name, 'SHO_Fit_']), self.h5_main.parent.name[1:])\n sho_grp.addChildren([ds_guess, ds_sho_inds, ds_sho_vals])\n sho_grp.attrs['SHO_guess_method'] = \"pycroscopy BESHO\"\n\n h5_sho_grp_refs = self.hdf.writeData(sho_grp)\n\n self.h5_guess = px.hdf_utils.getH5DsetRefs(['Guess'], h5_sho_grp_refs)[0]\n self.h5_results_grp = self.h5_guess.parent\n h5_sho_inds = px.hdf_utils.getH5DsetRefs(['Spectroscopic_Indices'],\n h5_sho_grp_refs)[0]\n h5_sho_vals = px.hdf_utils.getH5DsetRefs(['Spectroscopic_Values'],\n h5_sho_grp_refs)[0]\n\n # Reference linking before actual fitting\n px.hdf_utils.linkRefs(self.h5_guess, [h5_sho_inds, h5_sho_vals])\n # Linking ancillary position datasets:\n aux_dsets = px.hdf_utils.getAuxData(self.h5_main, auxDataName=['Position_Indices', 'Position_Values'])\n px.hdf_utils.linkRefs(self.h5_guess, aux_dsets)\n print('Finshed creating datasets')\n\n def compute(self, *args, **kwargs):\n \"\"\"\n Apply the unit_function to the entire dataset. Here, we simply extend the existing compute function and only\n pass the parameters for the unit function. In this case, the only parameter is the frequency vector.\n\n Parameters\n ----------\n args\n kwargs\n\n Returns\n -------\n\n \"\"\"\n return super(ShoGuess, self).compute(w_vec=self.freq_vec)\n\n def _write_results_chunk(self):\n \"\"\"\n Write the computed results back to the H5 file\n \"\"\"\n # converting from a list to a 2D numpy array\n self._results = np.array(self._results, dtype=np.float32)\n self.h5_guess[:, 0] = px.io_utils.realToCompound(self._results, sho32)\n\n # Now update the start position\n self._start_pos = self._end_pos\n # this should stop the computation.\n\n @staticmethod\n def _unit_function():\n\n return px.be_sho.SHOestimateGuess"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the dataset\n================\n\nFor this example, we will be working with a Band Excitation Piezoresponse Force Microscopy (BE-PFM) imaging dataset\nacquired from advanced atomic force microscopes. In this dataset, a spectra was collected for each position in a two\ndimensional grid of spatial locations. Thus, this is a three dimensional dataset that has been flattened to a two\ndimensional matrix in accordance with the pycroscopy data format.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# download the raw data file from Github:\nh5_path = 'temp.h5'\nurl = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/BELine_0004.h5'\nif os.path.exists(h5_path):\n os.remove(h5_path)\n_ = wget.download(url, h5_path, bar=None)"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"# Open the file in read-only mode\nh5_file = h5py.File(h5_path, mode='r+')\n\n# Get handles to the the raw data along with other datasets and datagroups that contain necessary parameters\nh5_meas_grp = h5_file['Measurement_000']\nnum_rows = px.hdf_utils.get_attr(h5_meas_grp, 'grid_num_rows')\nnum_cols = px.hdf_utils.get_attr(h5_meas_grp, 'grid_num_cols')\n\n# Getting a reference to the main dataset:\nh5_main = h5_meas_grp['Channel_000/Raw_Data']\n\n# Extracting the X axis - vector of frequencies\nh5_spec_vals = px.hdf_utils.getAuxData(h5_main, 'Spectroscopic_Values')[-1]\nfreq_vec = np.squeeze(h5_spec_vals.value) * 1E-3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the ShoGuess class, defined earlier, to calculate the four\nparameters of the complex gaussian.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"fitter = ShoGuess(h5_main, cores=1)\nh5_results_grp = fitter.compute()\nh5_guess = h5_results_grp['Guess']\n\nrow_ind, col_ind = 103, 19\npix_ind = col_ind + row_ind * num_cols\nresp_vec = h5_main[pix_ind]\nnorm_guess_parms = h5_guess[pix_ind]\n\n# Converting from compound to real:\nnorm_guess_parms = px.io_utils.compound_to_scalar(norm_guess_parms)\nprint('Functional fit returned:', norm_guess_parms)\nnorm_resp = px.be_sho.SHOfunc(norm_guess_parms, freq_vec)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the Amplitude and Phase of the gaussian versus the raw data.\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"fig, axes = plt.subplots(nrows=2, sharex=True, figsize=(5, 10))\nfor axis, func, title in zip(axes.flat, [np.abs, np.angle], ['Amplitude (a.u.)', 'Phase (rad)']):\n axis.scatter(freq_vec, func(resp_vec), c='red', label='Measured')\n axis.plot(freq_vec, func(norm_resp), 'black', lw=3, label='Guess')\n axis.set_title(title, fontsize=16)\n axis.legend(fontsize=14)\n\naxes[1].set_xlabel('Frequency (kHz)', fontsize=14)\naxes[0].set_ylim([0, np.max(np.abs(resp_vec)) * 1.1])\naxes[1].set_ylim([-np.pi, np.pi])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Delete the temporarily downloaded file**\n\n"
]
},
{
"execution_count": null,
"cell_type": "code",
"outputs": [],
"metadata": {
"collapsed": false
},
"source": [
"h5_file.close()\nos.remove(h5_path)"
]
}
],
"nbformat": 4,
"nbformat_minor": 0,
"nbformat": 4
"metadata": {
"language_info": {
"codemirror_mode": {
"version": 3,
"name": "ipython"
},
"nbconvert_exporter": "python",
"file_extension": ".py",
"version": "3.5.2",
"pygments_lexer": "ipython3",
"name": "python",
"mimetype": "text/x-python"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
}
}
@@ -345,7 +345,7 @@ Plot the Amplitude and Phase of the gaussian versus the raw data.
-**Total running time of the script:** ( 0 minutes 49.348 seconds)
+**Total running time of the script:** ( 0 minutes 57.835 seconds)
@@ -48,9 +48,9 @@ Pycroscopy Examples
     <div style='clear:both'></div>
-==============================
-Pycroscopy Developer Tutorials
-==============================
+===================
+Developer Tutorials
+===================
 .. raw:: html
@@ -164,9 +164,9 @@ Pycroscopy Publications
     <div style='clear:both'></div>
-=========================
-Pycroscopy User Tutorials
-=========================
+==============
+User Tutorials
+==============