"# Loading, reshaping, visualizing data using pycroscopy\n",
"### Suhas Somnath, Chris R. Smith and Stephen Jesse\n",
"The Center for Nanophase Materials Science and The Institute for Functional Imaging for Materials <br>\n",
"Oak Ridge National Laboratory<br>\n",
"8/01/2017\n",
"\n",
"Here, we will demonstrate how to load, reshape, and visualize multidimensional imaging datasets. For this example, we will load a three dimensional Band Excitation imaging dataset acquired from an atomic force microscope. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Make sure pycroscopy and wget are installed\n",
"# set up notebook to show plots within the notebook\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load pycroscopy compatible file\n",
"\n",
"For simplicity, we will use a dataset that has already been translated from its original data format into a pycroscopy-compatible hierarchical data format (HDF5 or H5) file.\n",
"\n",
"#### HDF5 or H5 files:\n",
"* are like smart containers that can store matrices with data, folders to organize these datasets, images, metadata like experimental parameters, links or shortcuts to datasets, etc.\n",
"* are readily compatible with high-performance computing facilities\n",
"* scale very efficiently from a few kilobytes to several terabytes\n",
"* can be read and modified from many languages, including Python, Matlab, C/C++, Java, Fortran, Igor Pro, etc.\n",
"\n",
"Python uses the h5py library to read, write, and access HDF5 files"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Downloading the example file from the pycroscopy Github project\n",
"# Here, h5_file is an active handle to the open file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Inspect the contents of this h5 data file\n",
"The file contents are stored in a tree structure, just like files on a contemporary computer. The file contains datagroups (similar to file folders) and datasets (similar to spreadsheets). \n",
"\n",
"There are several datasets in the file and these store:\n",
"* the actual measurements collected from the experiment\n",
"* the spatial location on the sample where each measurement was collected\n",
"* information to support and explain the spectral data collected at each location\n",
"* results from processing and analyses performed on the data, since pycroscopy stores these in the same file as additional datasets and datagroups\n",
"* any other relevant ancillary information"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Datasets and datagroups within the file:\\n------------------------------------')\n",
"px.hdf_utils.print_tree(h5_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Accessing datasets and datagroups\n",
"\n",
"Datasets and datagroups can be accessed by specifying the path, just like a webpage or a file in a directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Datagroup corresponding to a channel of information:')\n",
"The output above shows that the \"Raw_Data\" dataset is a two-dimensional dataset with a complex-valued (a + bi) entry at each element of the 2D matrix.\n",
"\n",
"This dataset is contained in a datagroup called \"Channel_000\" which itself is contained in a datagroup called \"Measurement_000\"\n",
"\n",
"A datagroup can contain several \"members\", which may be datasets such as \"Raw_Data\" or other datagroups such as \"Channel_000\"\n",
"\n",
"### Attributes\n",
"HDF5 datasets and datagroups can also store metadata such as experimental parameters. These metadata can be text, numbers, or small lists of numbers or text. They can be very important for understanding the datasets and for guiding the analysis routines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('\\nMetadata or attributes in a datagroup\\n------------------------------------')\n",
"for key in h5_file['/Measurement_000'].attrs:\n",
"In the case of the spectral dataset under investigation, a spectrum with a single peak was collected at each spatial location on a two-dimensional grid of points. Thus, this dataset has two position dimensions and one spectroscopic dimension (spectra). \n",
"\n",
"In pycroscopy, all spatial dimensions are collapsed to a single dimension and similarly, all spectroscopic dimensions are also collapsed to a single dimension. Thus, the data is stored as a two-dimensional (N x P) matrix with N spatial locations each with P spectroscopic datapoints.\n",
"\n",
"This general and intuitive format allows imaging data from any instrument, measurement scheme, size, or dimensionality to be represented in the same way.\n",
"\n",
"Such an instrument-independent data format enables a single set of analysis and processing functions to be reused for multiple image formats or modalities. "
"axis.set_title('Spectra at position {}'.format(pixel_ind));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Inspecting the spatial distribution of the amplitude at a single frequency\n",
"\n",
"If the frequency is fixed, the spatial distribution would result in a 2D spatial map.\n",
"\n",
"Note that the spatial dimensions are collapsed to a single dimension in all pycroscopy datasets. Thus, the 1D vector at the specified frequency needs to be reshaped back to a 2D map "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# specify a frequency index of interest\n",
"freq_ind = 40\n",
"\n",
"# ensuring that this index is within the bounds of the dataset\n",
"axis.set_title('Amplitude at frequency {} kHz '.format(np.round(freq_vec[freq_ind], 2)));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reshaping data back to N dimensions\n",
"\n",
"There are several utility functions in pycroscopy that make it easy to access and reshape datasets. Here we show how to return your data to its N-dimensional form in one easy step.\n",
"\n",
"While this data is a simple example and can be reshaped manually, such reshape operations become especially useful for datasets with five, six, seven, or more dimensions. "
" print('Collapsed dataset originally of shape: ', h5_main.shape)\n",
" print('Reshaped dataset of shape: ', ndim_data.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The same data investigation can be performed on the N dimensional dataset:\n",
"\n",
"Here we will plot the spatial maps of the sample at a given frequency again. The reshape operation is no longer necessary and we get the same spatial map again."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# specify a frequency index of interest\n",
"freq_ind = 40\n",
"\n",
"# ensuring that this index is within the bounds of the dataset\n",
"axis.set_title('Amplitude at frequency {} kHz '.format(np.round(freq_vec[freq_ind], 2)));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Closing the HDF5 file after data processing or visualization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"h5_file.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Removing the temporary data file:\n",
"remove(h5_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
%% Cell type:markdown id: tags:
# Loading, reshaping, visualizing data using pycroscopy
### Suhas Somnath, Chris R. Smith and Stephen Jesse
The Center for Nanophase Materials Science and The Institute for Functional Imaging for Materials <br>
Oak Ridge National Laboratory<br>
8/01/2017
Here, we will demonstrate how to load, reshape, and visualize multidimensional imaging datasets. For this example, we will load a three dimensional Band Excitation imaging dataset acquired from an atomic force microscope.
%% Cell type:code id: tags:
``` python
# Make sure pycroscopy and wget are installed
# set up notebook to show plots within the notebook
%matplotlib inline
```
%% Cell type:markdown id: tags:
## Load pycroscopy compatible file
For simplicity, we will use a dataset that has already been translated from its original data format into a pycroscopy-compatible hierarchical data format (HDF5 or H5) file.
#### HDF5 or H5 files:
* are like smart containers that can store matrices with data, folders to organize these datasets, images, metadata like experimental parameters, links or shortcuts to datasets, etc.
* are readily compatible with high-performance computing facilities
* scale very efficiently from a few kilobytes to several terabytes
* can be read and modified from many languages, including Python, Matlab, C/C++, Java, Fortran, Igor Pro, etc.
Python uses the h5py library to read, write, and access HDF5 files
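
As a rough sketch of what working with h5py looks like (this is not the notebook's own loading code): the file name `example.h5` and the dataset shape are made up for illustration, and the file is kept in memory so that nothing is written to disk. The group and dataset names simply mirror the ones used later in this notebook.

```python
import h5py
import numpy as np

# Hypothetical in-memory file: driver='core' with backing_store=False
# keeps everything in RAM, so no file is left behind on disk.
h5_demo = h5py.File('example.h5', 'w', driver='core', backing_store=False)

# Groups behave like folders; datasets behave like matrices stored inside them.
# Intermediate groups in the path are created automatically.
demo_grp = h5_demo.create_group('Measurement_000/Channel_000')
demo_grp.create_dataset('Raw_Data', data=np.zeros((4, 3), dtype=np.complex64))

# Members are addressed with slash-separated paths.
demo_shape = h5_demo['/Measurement_000/Channel_000/Raw_Data'].shape
demo_members = list(h5_demo['/Measurement_000'].keys())
print(demo_shape)
print(demo_members)

h5_demo.close()
```

For a file on disk, the same calls apply once the file is opened with a real path and a suitable mode.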
%% Cell type:code id: tags:
``` python
# Downloading the example file from the pycroscopy Github project
# Here, h5_file is an active handle to the open file
```
%% Cell type:markdown id: tags:
## Inspect the contents of this h5 data file
The file contents are stored in a tree structure, just like files on a contemporary computer. The file contains datagroups (similar to file folders) and datasets (similar to spreadsheets).
There are several datasets in the file and these store:
* the actual measurements collected from the experiment
* the spatial location on the sample where each measurement was collected
* information to support and explain the spectral data collected at each location
* results from processing and analyses performed on the data, since pycroscopy stores these in the same file as additional datasets and datagroups
* any other relevant ancillary information
%% Cell type:code id: tags:
``` python
print('Datasets and datagroups within the file:\n------------------------------------')
px.hdf_utils.print_tree(h5_file)
```
%% Cell type:markdown id: tags:
#### Accessing datasets and datagroups
Datasets and datagroups can be accessed by specifying the path, just like a webpage or a file in a directory
%% Cell type:code id: tags:
``` python
print('Datagroup corresponding to a channel of information:')
print(h5_file['/Measurement_000/Channel_000/'])
print('\nDataset containing the raw data collected from the microscope:')
```
%% Cell type:markdown id: tags:
The output above shows that the "Raw_Data" dataset is a two-dimensional dataset with a complex-valued (a + bi) entry at each element of the 2D matrix.
This dataset is contained in a datagroup called "Channel_000" which itself is contained in a datagroup called "Measurement_000"
A datagroup can contain several "members", which may be datasets such as "Raw_Data" or other datagroups such as "Channel_000"
### Attributes
HDF5 datasets and datagroups can also store metadata such as experimental parameters. These metadata can be text, numbers, or small lists of numbers or text. They can be very important for understanding the datasets and for guiding the analysis routines.
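
A minimal sketch of how attributes behave in h5py, assuming a throwaway in-memory file: the file name and the attribute values below are invented for illustration, and only the attribute names (`grid_num_rows`, `grid_num_cols`, `num_bins`) are borrowed from this notebook.

```python
import h5py

# Hypothetical in-memory file, used only to illustrate attributes.
h5_demo = h5py.File('attrs_example.h5', 'w', driver='core', backing_store=False)
demo_grp = h5_demo.create_group('Measurement_000')

# .attrs behaves like a dictionary attached to a group or dataset.
# The values here are made up; real files carry the experimental values.
demo_grp.attrs['grid_num_rows'] = 3
demo_grp.attrs['grid_num_cols'] = 4
demo_grp.attrs['num_bins'] = 5

# Copy the attributes into a plain dict before closing the file.
demo_attrs = {key: demo_grp.attrs[key] for key in demo_grp.attrs}
for key, val in demo_attrs.items():
    print('{} : {}'.format(key, val))

h5_demo.close()
```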
%% Cell type:code id: tags:
``` python
print('\nMetadata or attributes in a datagroup\n------------------------------------')
```
%% Cell type:markdown id: tags:
In the case of the spectral dataset under investigation, a spectrum with a single peak was collected at each spatial location on a two-dimensional grid of points. Thus, this dataset has two position dimensions and one spectroscopic dimension (spectra).
In pycroscopy, all spatial dimensions are collapsed to a single dimension and similarly, all spectroscopic dimensions are also collapsed to a single dimension. Thus, the data is stored as a two-dimensional (N x P) matrix with N spatial locations each with P spectroscopic datapoints.
This general and intuitive format allows imaging data from any instrument, measurement scheme, size, or dimensionality to be represented in the same way.
Such an instrument-independent data format enables a single set of analysis and processing functions to be reused for multiple image formats or modalities.
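
The collapse from three dimensions to an (N x P) matrix can be sketched with plain NumPy. The grid sizes below are made up, and the example assumes row-major ordering of the positions, which may not match how a given file was written.

```python
import numpy as np

# Hypothetical sizes: a 3 x 4 grid of positions with 5 spectral points each
num_rows, num_cols, num_bins = 3, 4, 5
data_3d = np.arange(num_rows * num_cols * num_bins).reshape(num_rows, num_cols, num_bins)

# Collapse the two position dimensions into one: (rows, cols, bins) -> (N, P)
data_2d = data_3d.reshape(num_rows * num_cols, num_bins)
print(data_2d.shape)

# Under the row-major assumption, position (row, col) lands at
# flat index row * num_cols + col in the collapsed matrix.
row, col = 2, 1
pixel_ind = row * num_cols + col
same_spectrum = np.array_equal(data_2d[pixel_ind], data_3d[row, col])
print(same_spectrum)
```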
%% Cell type:code id: tags:
``` python
h5_chan_grp = h5_file['/Measurement_000/']
h5_main = h5_chan_grp['Channel_000/Raw_Data']
print('\nThe main dataset:\n------------------------------------')
print(h5_main)
print('Original three dimensional matrix had {} rows and {} columns \
each having {} spectral points'.format(h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins']))
print('Collapsing the position dimensions: ({}x{}, {}) -> ({}, {})'.format(
axis.set_title('Spectra at position {}'.format(pixel_ind));
```
%% Cell type:markdown id: tags:
## Inspecting the spatial distribution of the amplitude at a single frequency
If the frequency is fixed, the spatial distribution would result in a 2D spatial map.
Note that the spatial dimensions are collapsed to a single dimension in all pycroscopy datasets. Thus, the 1D vector at the specified frequency needs to be reshaped back to a 2D map
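
A small sketch of that reshape step, using random complex numbers as a stand-in for the main dataset. The grid sizes, the frequency index, and the row-major position ordering are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical sizes: a 3 x 4 grid of positions with 5 frequency bins each
num_rows, num_cols, num_bins = 3, 4, 5

# Stand-in for the complex-valued (N x P) main dataset
rng = np.random.default_rng(0)
flat_data = (rng.standard_normal((num_rows * num_cols, num_bins))
             + 1j * rng.standard_normal((num_rows * num_cols, num_bins)))

# Fix the frequency: one amplitude per position, as a 1D vector of length N
freq_ind = 2
amp_vec = np.abs(flat_data[:, freq_ind])

# Reshape the 1D vector back into a 2D spatial map (rows x columns)
amp_map = amp_vec.reshape(num_rows, num_cols)
print(amp_vec.shape, '->', amp_map.shape)
```

The resulting `amp_map` is an ordinary 2D array and could be passed to an image plotting routine such as matplotlib's `imshow`.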