Commit f6678f32 authored by syz's avatar syz
Browse files

Replaced with a new tutorial

parent 5d869e9b
%% Cell type:markdown id: tags:
# Loading, reshaping, visualizing data using pycroscopy
### Suhas Somnath, Chris R. Smith and Stephen Jesse
The Center for Nanophase Materials Science and The Institute for Functional Imaging for Materials <br>
Oak Ridge National Laboratory<br>
8/01/2017
Here, we will demonstrate how to load, reshape, and visualize multidimensional imaging datasets. For this example, we will load a three dimensional Band Excitation imaging dataset acquired from an atomic force microscope.
%% Cell type:code id: tags:
``` python
# Make sure pycroscopy and wget are installed
!pip install pycroscopy
!pip install -U wget
```
%% Cell type:code id: tags:
``` python
# Ensure python 3 compatibility
from __future__ import division, print_function, absolute_import
# Import necessary libraries:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from IPython.display import display
from os import remove
import pycroscopy as px
# set up notebook to show plots within the notebook
% matplotlib inline
```
%% Cell type:markdown id: tags:
## Load pycroscopy compatible file
For simplicity we will use a dataset that has already been transalated form its original data format into a pycroscopy compatible hierarchical data format (HDF5 or H5) file
#### HDF5 or H5 files:
* are like smart containers that can store matrices with data, folders to organize these datasets, images, metadata like experimental parameters, links or shortcuts to datasets, etc.
* are readily compatible with high-performance computing facilities
* scale very efficiently from few kilobytes to several terabytes
* can be read and modified using any language including Python, Matlab, C/C++, Java, Fortran, Igor Pro, etc.
Python uses the h5py libaray to read, write, and access HDF5 files
%% Cell type:code id: tags:
``` python
# Downloading the example file from the pycroscopy Github project
url = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/BELine_0004.h5'
h5_path = 'temp.h5'
_ = wget.download(url, h5_path)
print('Working on:\n' + h5_path)
# Open the file in read-only mode
h5_file = h5py.File(h5_path, mode='r')
# Here, h5_file is an active handle to the open file
```
%% Cell type:markdown id: tags:
## Inspect the contents of this h5 data file
The file contents are stored in a tree structure, just like files on a contemporary computer. The file contains datagroups (similar to file folders) and datasets (similar to spreadsheets).
There are several datasets in the file and these store:
* the actual measurement collected from the experiment,
* spatial location on the sample where each measurement was collected,
* information to support and explain the spectral data collected at each location
* Since pycroscopy stores results from processing and analyses performed on the data in the same file, these datasets and datagroups are present as well
* any other relevant ancillary information
%% Cell type:code id: tags:
``` python
print('Datasets and datagroups within the file:\n------------------------------------')
px.hdf_utils.print_tree(h5_file)
```
%% Cell type:markdown id: tags:
#### Accessing datasests and datagroups
Datasets and datagroups can be accessed by specifying the path, just like a webpage or a file in a directory
%% Cell type:code id: tags:
``` python
print('Datagroup corresponding to a channel of information:')
print(h5_file['/Measurement_000/Channel_000/'])
print('\nDataset containing the raw data collected from the microscope:')
print(h5_file['/Measurement_000/Channel_000/Raw_Data'])
```
%% Cell type:markdown id: tags:
The output above shows that the "Raw_Data" dataset is a two dimensional dataset, and has complex value (a +bi) entries at each element in the 2D matrix.
This dataset is contained in a datagroup called "Channel_000" which itself is contained in a datagroup called "Measurement_000"
The datagroup "Channel_000" contains several "members", where these members could be datasets like "Raw_Data" or datagroups like "Channel_000"
### Attributes
HDF5 datasets and datagroups can also store metadata such as experimental parameters. These metadata can be text, numbers, small lists of numbers or text etc. These metadata can be very important for understanding the datasets and guide the analysis routines
%% Cell type:code id: tags:
``` python
print('\nMetadata or attributes in a datagroup\n------------------------------------')
for key in h5_file['/Measurement_000'].attrs:
print('{} : {}'.format(key, px.hdf_utils.get_attr(h5_file['/Measurement_000'], key)))
```
%% Cell type:markdown id: tags:
In the case of the spectral dataset under investigation, a spectra with a single peak was collected at each spatial location on a two dimensional grid of points. Thus, this dataset has two position dimensions and one spectroscopic dimension (spectra).
In pycroscopy, all spatial dimensions are collapsed to a single dimension and similarly, all spectroscopic dimensions are also collapsed to a single dimension. Thus, the data is stored as a two-dimensional (N x P) matrix with N spatial locations each with P spectroscopic datapoints.
This general and intuitive format allows imaging data from any instrument, measurement scheme, size, or dimensionality to be represented in the same way.
Such an instrument independent data format enables a single set of anaysis and processing functions to be reused for multiple image formats or modalities.
%% Cell type:code id: tags:
``` python
h5_chan_grp = h5_file['/Measurement_000/']
h5_main = h5_chan_grp['Channel_000/Raw_Data']
print('\nThe main dataset:\n------------------------------------')
print(h5_main)
print('Original three dimensional matrix had {} rows and {} columns \
each having {} spectral points'.format(h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins']))
print('Collapsing the position dimensions: ({}x{}, {}) -> ({}, {})'.format(
h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins'],
h5_chan_grp.attrs['grid_num_rows'] * h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins']))
```
%% Cell type:markdown id: tags:
Each main dataset is always accompanied by four ancillary datasets that explain:
* the position and spectroscopic value of any given element in the dataset
* the original dimensionality of the dataset
* how to reshape the data back to its N dimensional form
In the case of the 3d dataset under investigation, the positions will be arranged as row0-col0, row0-col1.... row0-colN, row1-col0....
The spectroscopic information remains unchanged.
%% Cell type:code id: tags:
``` python
print('\nThe ancillary datasets:\n------------------------------------')
print(h5_file['/Measurement_000/Channel_000/Position_Indices'])
print(h5_file['/Measurement_000/Channel_000/Position_Values'])
print(h5_file['/Measurement_000/Channel_000/Spectroscopic_Indices'])
print(h5_file['/Measurement_000/Channel_000/Spectroscopic_Values'])
print('\nSpatial dimensions:', px.hdf_utils.get_attr(
h5_file['/Measurement_000/Channel_000/Position_Values'], 'labels'))
print('Spectroscopic dimensions:', px.hdf_utils.get_attr(
h5_file['/Measurement_000/Channel_000/Spectroscopic_Values'], 'labels'))
```
%% Cell type:markdown id: tags:
## Visualizing the position dimensions
%% Cell type:code id: tags:
``` python
fig, axis = plt.subplots(figsize=(6,6))
axis.plot(h5_file['/Measurement_000/Channel_000/Position_Indices'][:, 0],
'orange', label='column')
axis.plot(h5_file['/Measurement_000/Channel_000/Position_Indices'][:, 1],
'black', label='row')
axis.legend()
axis.set_title('Position Indices');
```
%% Cell type:markdown id: tags:
## Inspecting the measurement at a single spatial pixel:
%% Cell type:code id: tags:
``` python
# specify a pixel index of interest
pixel_ind = 128
# ensuring that this index is within the bounds of the dataset
pixel_ind = max(0, min(int(pixel_ind), h5_main.shape[0]))
# Extracting the frequency vector (x-axis) to plot the spectra against
freq_vec = h5_file['/Measurement_000/Channel_000/Bin_Frequencies'][()] * 1E-3
fig, axis = plt.subplots(figsize=(6,6))
axis.plot(freq_vec, np.abs(h5_main[pixel_ind]))
axis.set_xlabel('Frequency (kHz)')
axis.set_ylabel('Amplitude (a. u.)')
axis.set_title('Spectra at position {}'.format(pixel_ind));
```
%% Cell type:markdown id: tags:
## Inspecting the spatial distribution of the amplitude at a single frequency
If the frequency is fixed, the spatial distribution would result in a 2D spatial map.
Note that the spatial dimensions are collapsed to a single dimension in all pycroscopy datasets. Thus, the 1D vector at the specified frequency needs to be reshaped back to a 2D map
%% Cell type:code id: tags:
``` python
# specify a pixel index of interest
freq_ind = 40
# ensuring that this index is within the bounds of the dataset
freq_ind = max(0, min(int(freq_ind), h5_main.shape[1]))
# extracting the position data (1D) at the spcified frequency index
data_vec = np.abs(h5_main[:, freq_ind])
# Constructing the 2D spatial map from the 1D vector:
spat_map = np.reshape(data_vec, (h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols']))
fig, axis = plt.subplots(figsize=(6,6))
axis.imshow(spat_map, cmap='inferno')
axis.set_xlabel('Columns')
axis.set_ylabel('Rows')
freq_vec = h5_file['/Measurement_000/Channel_000/Bin_Frequencies'][()] * 1E-3
axis.set_title('Amplitude at frequency {} kHz '.format(np.round(freq_vec[freq_ind], 2)));
```
%% Cell type:markdown id: tags:
## Reshaping data back to N dimensions
There are several utility functions in pycroscopy that make it easy to access and reshape datasets. Here we show you how to return your dat to the N dimensional form in one easy step.
While this data is a simple example and can be reshaped manually, such reshape operations become especially useful for 5,6,7 or larger dimensional datasets.
%% Cell type:code id: tags:
``` python
ndim_data, success = px.hdf_utils.reshape_to_Ndims(h5_main)
if not success:
print('There was a problem automatically reshaping the dataset. \
Attempting to reshape manually')
ndim_data = np.reshape(h5_main[()], (h5_chan_grp.attrs['grid_num_rows'],
h5_chan_grp.attrs['grid_num_cols'],
h5_chan_grp.attrs['num_bins']))
else:
print('Collapsed dataset originally of shape: ', h5_main.shape)
print('Reshaped dataset of shape: ', ndim_data.shape)
```
%% Cell type:markdown id: tags:
## The same data investigation can be performed on the N dimensional dataset:
Here we will plot the spatial maps of the sample at a given frequency again. The reshape operation is no longer necessary and we get the same spatial map again.
%% Cell type:code id: tags:
``` python
# specify a pixel index of interest
freq_ind = 40
# ensuring that this index is within the bounds of the dataset
freq_ind = max(0, min(int(freq_ind), h5_main.shape[1]))
# Constructing the 2D spatial map from the 3D dataset
spat_map = np.abs(ndim_data[:, :, freq_ind])
fig, axis = plt.subplots(figsize=(6,6))
axis.imshow(spat_map, cmap='inferno')
axis.set_xlabel('Columns')
axis.set_ylabel('Rows')
freq_vec = h5_file['/Measurement_000/Channel_000/Bin_Frequencies'][()] * 1E-3
axis.set_title('Amplitude at frequency {} kHz '.format(np.round(freq_vec[freq_ind], 2)));
```
%% Cell type:markdown id: tags:
## Closing the HDF5 file after data processing or visualization
%% Cell type:code id: tags:
``` python
h5_file.close()
```
%% Cell type:code id: tags:
``` python
# Removing the temporary data file:
remove(h5_path)
```
%% Cell type:code id: tags:
``` python
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment