Commit 6b80e5f3 authored by syz

Merge branch 'cades_dev' of https://github.com/pycroscopy/pycroscopy into cades_dev_local

parents 4aa6b15d ae28137b
@@ -2,9 +2,9 @@
Roadmap / Milestones
--------------------
1. Mid Sep - better BE notebook + visualization for users
2. Mid-Oct - Cleaned versions of the main modules (Analysis pending) + enough documentation for users and developers
3. End of Oct - Multi-node compute capability
1. better BE notebook + visualization for users
2. Cleaned versions of the main modules (Analysis pending) + enough documentation for users and developers
3. Multi-node compute capability
New features
------------
@@ -21,9 +21,7 @@ Core development
* Right hand side for spectral
* 1D spectra or 2D images.
* Users will be asked to slice N-1 or N-2 spectral dimensions
* (re)Write at least one existing processing function as a class.
* Josh can redo fft filtering
* Rama can redo gIV Bayesian inference
* Rewrite fft filtering as a Process class.
* Simplify and demystify analysis / optimize. Use parallel_compute instead of optimize and guess_methods and fit_methods (a sketch of the pattern follows this list)
* multi-node computing capability in parallel_compute
* Data Generators
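A minimal sketch of the parallel_compute pattern referenced above, assuming a joblib backend and invented helper names (the real pycroscopy signature may differ)::

    import numpy as np
    from joblib import Parallel, delayed

    def parallel_compute(data, func, cores=2):
        # map func over every row (position) of a 2D main dataset in parallel
        return Parallel(n_jobs=cores)(delayed(func)(row) for row in data)

    def peak_height(spectrum):
        # stand-in for a guess / fit method that reduces one spectrum to one number
        return np.abs(spectrum).max()

    results = parallel_compute(np.random.rand(16, 256), peak_height)

Multi-node capability would then amount to swapping the joblib backend for an MPI-aware one without changing the per-pixel functions.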
@@ -38,11 +36,105 @@ External user contributions
* Port everything from IFIM Matlab -> Python translation exercises
* Other workflows/functions that already exist as scripts or notebooks
Plotting updates
----------------
GUI
----
* Switch to using plot.ly and dash for interactive elements
* Possibly use MayaVi for 3d plotting
Plot Utils
----------
* Define default font sizes that will be adopted in all functions wherever applicable using rcParams (a sketch follows at the end of this section). See https://matplotlib.org/users/customizing.html
* label_fontsize=16
* tick_fontsize=14
* title_fontsize=18
* Override wherever necessary.
* _add_loop_parameters - is BE specific and should be moved out of plot_utils
* rainbow_plot -
1. pop cmap from kwargs instead of specifying cmap as a separate argument.
2. Rename parameters from ax to axis, ao_vec to x_values, ai_vec to y_values.
3. Use same methodology from single_img_cbar_plot to add color bar. You will need to expect the figure handle as well for this.
* plot_line_family -
1. Rename x_axis parameter to something more sensible like x_values
2. Remove cmap as one of the arguments. It should come from kwargs
3. Optional color bar (don’t show legend in this case)
* plot_map - combine this with single_img_cbar_plot (see the sketch at the end of this section)
* single_img_cbar_plot - It is OK to spend a lot of time on single_img_cbar_plot and plot_map since these will be used HEAVILY for papers.
1. Combine with plot_map
2. allow the tick labels to be specified instead of just the x_size and y_size.
3. Rename this function to something more sensible
4. Color bar should be shown by default
* plot_loops
1. Allow excitation_waveform to also be a list - this will allow different x resolutions for each line family.
2. Apply appropriate x, y, label font sizes etc. This should look very polished and ready for publications
3. Enable use of kwargs - to specify line widths etc.
4. Ensure that the title is not crammed somewhere behind the subtitles
* plot_complex_map_stack
1. allow kwargs.
2. Use plot_map
3. Respect font sizes for x, y labels, titles - use new kwargs wherever necessary
4. Remove map as a kwarg
5. Show color bars
6. Possibly allow horizontal / vertical configurations? (Optional)
* plot_complex_loop_stack
1. Respect font sizes for x, y labels, titles - use new kwargs wherever necessary
2. Allow individual plots sizes to be specified
3. Allow **kwargs and pass two plot functions
* plotScree
1. rename to plot_scree
2. Use **kwargs on the plot function
* plot_map_stack:
1. Do something about the super title getting hidden behind the subtitles
2. Respect tick, x label, y label, title, etc font sizes
3. Add ability to manually specify x and y tick labels - see plot_cluster_results_together for inspiration
4. See all other changes that were made for the image cleaning paper
* plot_cluster_results_together
1. Use plot_map and its cleaner color bar option
2. Respect font sizes
3. Option to use a color bar for the centroids instead of a legend - especially if number of clusters > 7
4. See mode IV paper to see other changes
* plot_cluster_results_separate
1. Use same guidelines as above
* plot_cluster_dendrogram - this function has not worked recently to my knowledge. Fortunately, it is not one of the more popular functions so it gets low priority for now. Use inspiration from image cleaning paper
* plot_1d_spectrum
1. Respect font sizes
2. Do not save figure here. This should be done in the place where this function is called
3. Use **kwargs and pass to the plot functions
4. Title should be optional
* plot_2d_spectrogram
1. Respect font sizes
2. Use plot_map - show color bar
3. Don’t allow specification of figure_path here. Save elsewhere
* plot_histograms - not used frequently. Can be ignored for this pass
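To make the guidelines above concrete, here is a minimal sketch (hypothetical names, simplified behavior) of the proposed merged plot_map / single_img_cbar_plot: default font sizes come from rcParams, cmap is popped from kwargs rather than being a named argument, and a color bar is shown by default via an axes divider::

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.axes_grid1 import make_axes_locatable

    # proposed package-wide defaults (values from the list above)
    plt.rcParams.update({'axes.labelsize': 16,   # label_fontsize
                         'xtick.labelsize': 14,  # tick_fontsize
                         'ytick.labelsize': 14,
                         'axes.titlesize': 18})  # title_fontsize

    def plot_map(fig, axis, img, show_cbar=True, **kwargs):
        cmap = kwargs.pop('cmap', plt.cm.viridis)  # cmap arrives via kwargs
        img_handle = axis.imshow(img, cmap=cmap, origin='lower', **kwargs)
        if show_cbar:
            # same approach as single_img_cbar_plot: steal space from the axis
            divider = make_axes_locatable(axis)
            cax = divider.append_axes('right', size='5%', pad=0.05)
            fig.colorbar(img_handle, cax=cax)
        return img_handle

    fig, axis = plt.subplots()
    plot_map(fig, axis, np.random.rand(64, 64))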
Examples / Tutorials
--------------------
Short tutorials on how to use pycroscopy
@@ -57,7 +149,6 @@ Short tutorials on how to use pycroscopy
Longer examples (via specific scientific use cases)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* How to write your own parallel computing function using the process module - add more documentation
* How to use the PycroDataset object
* A tour of the many functions in hdf_utils and io_utils since these functions need data to show / explain them.
* How to write your own analysis class based on the (to-be simplified) Model class
* pycroscopy package organization - a short writeup on what is where and differences between the process / analysis submodules
@@ -218,19 +218,84 @@ print('Positions:', pos_dim_sizes, '\nSpectroscopic:', spec_dim_sizes)
# Let's assume that we are interested in visualizing the spectrograms at the first field of the second cycle at
# position - row 3, column 2. There are three ways of accessing the data:
#
# 1. The hard method - find the spectroscopic and position indices of interest and slice the 2D dataset
# 1. The easiest method - use the PycroDataset class to slice the data
#
# * This method will only work for ``main`` datasets. We recommend using method 2 for slicing all others.
#
# 2. The easier method - reshape the data to N dimensions and slice the dataset
#
# * This approach, while easy, may not be suitable for large datasets which may or may not fit in memory
#
# 3. The easiest method - use the PycroDataset class to slice the data
# 3. The hard method - find the spectroscopic and position indices of interest and slice the 2D dataset
#
# * This method will only work for ``main`` datasets. We recommend using method 2 for slicing all others.
#########################################################################
# Approach 1 - Using the PycroDataset
# -----------------------------------
# We will use the new PycroDataset class to create an N dimensional slice directly from the two dimensional
# data in the file.
#
# First we convert from an HDF5 Dataset to a PycroDataset
pd_main = px.PycroDataset(h5_main)
print(pd_main.shape)
#########################################################################
# As you can see, the data is still two dimensional. The PycroDataset has several attributes that will help with
# the slicing.
#
# Let's check the names and sizes of each dimension
print(pd_main.n_dim_labels)
print(pd_main.n_dim_sizes)
#########################################################################
# With this information, we can now get our data slice.
#
slice_dict = dict(X=[2], Y=[3], Field=[0], Cycle=[1])
nd_spec, success = pd_main.slice(slice_dict=slice_dict)
print(success)
print(nd_spec.shape)
#########################################################################
# The slice is returned already in the N dimensional form. We just need to remove all the
# dimensions with length one, transpose it like in method 2, and plot.
#
spectrogram3 = nd_spec.squeeze().T
# Now the spectrogram is of order (DC_Offset x frequency)
fig, axis = plt.subplots()
axis.imshow(np.abs(spectrogram3), origin='lower')
axis.set_xlabel('Frequency Index')
axis.set_ylabel('DC Offset Index')
axis.set_title('Spectrogram Amplitude');
#########################################################################
# Approach 2 - N-dimensional form
# -------------------------------
# We will use a convenient pycroscopy function that safely reshapes the data to its N dimensional form with a single
# line. Note that while this approach appears simple on the surface, there are a fair number of lines of code that
# make up this function.
ds_nd, success, labels = px.hdf_utils.reshape_to_Ndims(h5_main, get_labels=True)
print('Shape of the N-dimensional dataset:', ds_nd.shape)
print(labels)
#########################################################################
# Now that we have the data in its original N dimensional form, we can easily slice the dataset:
spectrogram2 = ds_nd[2, 3, :, :, 0, 1]
# Now the spectrogram is of order (frequency x DC_Offset).
spectrogram2 = spectrogram2.T
# Now the spectrogram is of order (DC_Offset x frequency)
fig, axis = plt.subplots()
axis.imshow(np.abs(spectrogram2), origin='lower')
axis.set_xlabel('Frequency Index')
axis.set_ylabel('DC Offset Index')
axis.set_title('Spectrogram Amplitude');
#########################################################################
# Approach 1 - slicing the 2D matrix
# Approach 3 - slicing the 2D matrix
# ----------------------------------
#
# This approach is hands-on and requires that we be very careful with the indexing and slicing. Nonetheless,
@@ -335,71 +400,6 @@ axis.set_xlabel('Frequency Index')
axis.set_ylabel('DC Offset Index')
axis.set_title('Spectrogram Amplitude')
#########################################################################
# Approach 2 - N-dimensional form
# -------------------------------
# We will use a convenient pycroscopy function that safely reshapes the data to its N dimensional form with a single
# line. Note that while this approach appears simple on the surface, there are a fair number of lines of code that
# make up this function.
ds_nd, success, labels = px.hdf_utils.reshape_to_Ndims(h5_main, get_labels=True)
print('Shape of the N-dimensional dataset:', ds_nd.shape)
print(labels)
#########################################################################
# Now that we have the data in its original N dimensional form, we can easily slice the dataset:
spectrogram2 = ds_nd[2, 3, :, :, 0, 1]
# Now the spectrogram is of order (frequency x DC_Offset).
spectrogram2 = spectrogram2.T
# Now the spectrogram is of order (DC_Offset x frequency)
fig, axis = plt.subplots()
axis.imshow(np.abs(spectrogram2), origin='lower')
axis.set_xlabel('Frequency Index')
axis.set_ylabel('DC Offset Index')
axis.set_title('Spectrogram Amplitude');
#########################################################################
# Approach 3 - Using the PycroDataset
# -----------------------------------
# We will use the new PycroDataset class to create an N dimensional slice directly from the two dimensional
# data in the file.
#
# First we convert from an HDF5 Dataset to a PycroDataset
pd_main = px.PycroDataset(h5_main)
print(pd_main.shape)
#########################################################################
# As you can see, the data is still two dimensional. The PycroDataset has several attributes that will help with
# the slicing.
#
# Let's check the names and sizes of each dimension
print(pd_main.n_dim_labels)
print(pd_main.n_dim_sizes)
#########################################################################
# With this information, we can now get our data slice.
#
slice_dict = dict(X=[2], Y=[3], Field=[0], Cycle=[1])
nd_spec, success = pd_main.slice(**slice_dict)
print(success)
print(nd_spec.shape)
#########################################################################
# The slice is returned already in the N dimensional form. We just need to remove all the
# dimensions with length one, transpose it like in method 2, and plot.
#
spectrogram3 = nd_spec.squeeze().T
# Now the spectrogram is of order (DC_Offset x frequency)
fig, axis = plt.subplots()
axis.imshow(np.abs(spectrogram3), origin='lower')
axis.set_xlabel('Frequency Index')
axis.set_ylabel('DC Offset Index')
axis.set_title('Spectrogram Amplitude');
# Close and delete the h5_file
h5_file.close()
os.remove(h5_path)
version = '0.56.01'
date = '9/11/2017'
time = '11:42:47'
version = '0.58.2'
date = '10/19/2017'
time = '1:05:51'
@@ -1137,10 +1137,13 @@ class LoopOptimize(Optimize):
Subclass of Optimize with BE Loop specific changes
"""
def _initiateSolverAndObjFunc(self):
if self.solver_type in scipy.optimize.__dict__.keys():
solver = scipy.optimize.__dict__[self.solver_type]
if self.obj_func is None:
fm = BE_Fit_Methods()
func = fm.__getattribute__(self.obj_func_name)(self.obj_func_xvals)
return solver, self.solver_options, func
def _initiateSolverAndObjFunc(self, obj_func):
fm = BE_Fit_Methods()
if obj_func['class'] is None:
self.obj_func = obj_func['obj_func']
else:
self.obj_func_name = obj_func.pop('obj_func')
self.obj_func = fm.__getattribute__(self.obj_func_name)
self.obj_func_class = obj_func.pop('class')
self.obj_func_args = obj_func.values()
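# Hypothetical illustration of the descriptor consumed by the reworked method
# above. 'class' and 'obj_func' are the keys the code reads; the leftover
# entries (names invented here) are what end up in self.obj_func_args:
#
#   obj_func = {'class': 'BE_Fit_Methods',  # None would mean 'obj_func' is already a callable
#               'obj_func': 'BE_LOOP',      # static-method name looked up on BE_Fit_Methods
#               'xvals': dc_offset_vec}     # extra entries become positional arguments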
@@ -7,9 +7,10 @@ from __future__ import division, print_function, absolute_import, unicode_litera
from warnings import warn
import numpy as np
from .model import Model
from ..io.pycro_data import PycroDataset
from ..io.be_hdf_utils import isReshapable, reshapeToNsteps, reshapeToOneStep
from ..io.hdf_utils import buildReducedSpec, copyRegionRefs, linkRefs, getAuxData, getH5DsetRefs, \
copyAttributes, get_attr
create_empty_dataset
from ..io.microdata import MicroDataset, MicroDataGroup
'''
@@ -57,8 +58,8 @@ class BESHOmodel(Model):
"""
# Create all the ancillary datasets and allocate space
h5_spec_inds = getAuxData(self.h5_main, auxDataName=['Spectroscopic_Indices'])[0]
h5_spec_vals = getAuxData(self.h5_main, auxDataName=['Spectroscopic_Values'])[0]
h5_spec_inds = self.h5_main.h5_spec_inds
h5_spec_vals = self.h5_main.h5_spec_vals
self.step_start_inds = np.where(h5_spec_inds[0] == 0)[0]
self.num_udvs_steps = len(self.step_start_inds)
@@ -72,7 +73,7 @@ class BESHOmodel(Model):
maxshape=(self.h5_main.shape[0], self.num_udvs_steps),
chunking=(1, self.num_udvs_steps), dtype=sho32)
not_freq = get_attr(h5_spec_inds, 'labels') != 'Frequency'
not_freq = np.array(self.h5_main.spec_dim_labels) != 'Frequency'
ds_sho_inds, ds_sho_vals = buildReducedSpec(h5_spec_inds, h5_spec_vals, not_freq, self.step_start_inds)
@@ -101,6 +102,8 @@ class BESHOmodel(Model):
copyRegionRefs(self.h5_main, self.h5_guess)
self.h5_guess = PycroDataset(self.h5_guess)
def _create_fit_datasets(self):
"""
Creates the HDF5 fit dataset. pycroscopy requires that the h5 group, guess dataset,
@@ -114,7 +117,7 @@ class BESHOmodel(Model):
return
if self.step_start_inds is None:
h5_spec_inds = getAuxData(self.h5_main, auxDataName=['Spectroscopic_Indices'])[0]
h5_spec_inds = self.h5_main.h5_spec_inds
self.step_start_inds = np.where(h5_spec_inds[0] == 0)[0]
if self.num_udvs_steps is None:
@@ -124,24 +127,11 @@ class BESHOmodel(Model):
self._get_frequency_vector()
h5_sho_grp = self.h5_guess.parent
h5_sho_grp.attrs['SHO_fit_method'] = "pycroscopy BESHO"
sho_grp = MicroDataGroup(h5_sho_grp.name.split('/')[-1],
h5_sho_grp.parent.name[1:])
# dataset size is same as guess size
ds_result = MicroDataset('Fit', data=[], maxshape=(self.h5_guess.shape[0], self.h5_guess.shape[1]),
chunking=self.h5_guess.chunks, dtype=sho32)
sho_grp.addChildren([ds_result])
sho_grp.attrs['SHO_fit_method'] = "pycroscopy BESHO"
h5_sho_grp_refs = self.hdf.writeData(sho_grp)
self.h5_fit = getH5DsetRefs(['Fit'], h5_sho_grp_refs)[0]
'''
Copy attributes of the fit guess
'''
copyAttributes(self.h5_guess, self.h5_fit, skip_refs=False)
# Create the fit dataset as an empty dataset of the same size and dtype as the guess.
# Also automatically links in the ancillary datasets.
self.h5_fit = PycroDataset(create_empty_dataset(self.h5_guess, dtype=sho32, dset_name='Fit'))
def _get_frequency_vector(self):
"""
@@ -149,8 +139,8 @@ class BESHOmodel(Model):
This assumes that the data is reshape-able.
"""
h5_spec_vals = getAuxData(self.h5_main, auxDataName=['Spectroscopic_Values'])[0]
freq_dim = np.argwhere(get_attr(h5_spec_vals, 'labels') == 'Frequency').squeeze()
h5_spec_vals = self.h5_main.h5_spec_vals
freq_dim = np.argwhere('Frequency' == np.array(self.h5_main.spec_dim_labels)).squeeze()
if len(self.step_start_inds) == 1: # BE-Line
end_ind = h5_spec_vals.shape[1]
@@ -252,7 +242,7 @@ class BESHOmodel(Model):
h5_guess : h5py.Dataset
Dataset object containing the guesses
"""
h5_spec_inds = getAuxData(self.h5_main, auxDataName=['Spectroscopic_Indices'])[0]
h5_spec_inds = self.h5_main.h5_spec_inds
self.step_start_inds = np.where(h5_spec_inds[0] == 0)[0]
self.num_udvs_steps = len(self.step_start_inds)
@@ -22,7 +22,7 @@ class Fit_Methods(object):
self.methods = ['SHO']
@staticmethod
def SHO(freq_vector, *args):
def SHO(guess, data_vec, freq_vector, *args):
"""
Generates the single Harmonic Oscillator response over the given vector
@@ -42,23 +42,20 @@ class Fit_Methods(object):
SHO_func: callable function.
"""
def SHOFunc(guess, data_vec):
if guess.size < 4:
raise ValueError('Error: The Single Harmonic Oscillator requires 4 parameter guesses!')
# data_vec, guess = args
data_mean = np.mean(data_vec)
if guess.size < 4:
raise ValueError('Error: The Single Harmonic Oscillator requires 4 parameter guesses!')
# data_vec, guess = args
data_mean = np.mean(data_vec)
Amp, w_0, Q, phi = guess[:4]
func = Amp * np.exp(1.j * phi) * w_0 ** 2 / (freq_vector ** 2 - 1j * freq_vector * w_0 / Q - w_0 ** 2)
Amp, w_0, Q, phi = guess[:4]
func = Amp * np.exp(1.j * phi) * w_0 ** 2 / (freq_vector ** 2 - 1j * freq_vector * w_0 / Q - w_0 ** 2)
ss_tot = sum(abs(data_vec - data_mean) ** 2)
ss_res = sum(abs(data_vec - func) ** 2)
ss_tot = sum(abs(data_vec - data_mean) ** 2)
ss_res = sum(abs(data_vec - func) ** 2)
r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0
r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0
return 1-r_squared
return SHOFunc
return 1-r_squared
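# Illustrative sanity check of the flattened objective above (a sketch: the
# (guess, data_vec, freq_vector) signature matches this method, the numbers
# are invented). Data generated exactly from the guess scores 1 - R^2 = 0:
#
#   freq_vector = np.linspace(300e3, 320e3, 128)   # drive frequencies (Hz)
#   amp, w_0, q_fac, phi = 1.0e-3, 310e3, 150.0, 0.0
#   guess = np.array([amp, w_0, q_fac, phi])
#   data_vec = amp * np.exp(1j * phi) * w_0 ** 2 / \
#       (freq_vector ** 2 - 1j * freq_vector * w_0 / q_fac - w_0 ** 2)
#   print(Fit_Methods.SHO(guess, data_vec, freq_vector))  # ~0.0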
class BE_Fit_Methods(object):
@@ -69,9 +66,9 @@ class BE_Fit_Methods(object):
self.methods = ['BE_LOOP']
@staticmethod
def BE_LOOP(dc_vec, *args):
def BE_LOOP(coef_vec, data_vec, dc_vec, *args):
"""
Parameters
----------
dc_vec : numpy.ndarray
@@ -82,22 +79,52 @@ class BE_Fit_Methods(object):
-------
"""
def loop_func(coef_vec, data_vec):
if coef_vec.size < 9:
raise ValueError('Error: The Loop Fit requires 9 parameter guesses!')
data_mean = np.mean(data_vec)
func = loop_fit_function(dc_vec, coef_vec)
ss_tot = sum(abs(data_vec - data_mean) ** 2)
ss_res = sum(abs(data_vec - func) ** 2)
r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0
return 1 - r_squared
return loop_func
if coef_vec.size < 9:
raise ValueError('Error: The Loop Fit requires 9 parameter guesses!')
data_mean = np.mean(data_vec)
func = loop_fit_function(dc_vec, coef_vec)
ss_tot = sum(abs(data_vec - data_mean) ** 2)
ss_res = sum(abs(data_vec - func) ** 2)
r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0
return 1 - r_squared
# @staticmethod
# def BE_LOOP(dc_vec, *args):
# """
#
# Parameters
# ----------
# dc_vec : numpy.ndarray
# The DC offset vector
# args : list
#
# Returns
# -------
#
# """
# def loop_func(coef_vec, data_vec):
# if coef_vec.size < 9:
# raise ValueError('Error: The Loop Fit requires 9 parameter guesses!')
#
# data_mean = np.mean(data_vec)
#
# func = loop_fit_function(dc_vec, coef_vec)
#
# ss_tot = sum(abs(data_vec - data_mean) ** 2)
# ss_res = sum(abs(data_vec - func) ** 2)
#
# r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else 0
#
# return 1 - r_squared
#
# return loop_func
class forc_iv_fit_methods(Fit_Methods):
@@ -26,21 +26,20 @@ class GuessMethods(object):
self.methods = ['wavelet_peaks', 'relative_maximum', 'gaussian_processes', 'complex_gaussian']
@staticmethod
def wavelet_peaks(*args, **kwargs):
def wavelet_peaks(vector, *args, **kwargs):
"""
This is a wrapper over scipy.signal.find_peaks_cwt() that finds peaks in the data using wavelet convolution.
This is the function that will be mapped by multiprocess. It derives the wavelet_widths parameter
from the peak width kwargs before calling the scipy function.
Parameters
----------
args: dictionary
List of optional parameters for this function - not used.
kwargs: dictionary
Passed to find_peaks_cwt().
vector : 1D numpy array
Feature vector containing peaks
Returns
-------
wpeaks: callable function.
peak_indices : list
List of indices of peaks within the prescribed peak widths
"""
try:
peak_width_bounds = kwargs.get('peak_widths')
@@ -50,47 +49,29 @@ class GuessMethods(object):
# The below numpy array configures the wavelet widths passed to find_peaks_cwt()
wavelet_widths = np.linspace(peak_width_bounds[0], peak_width_bounds[1], peak_width_step)
def wpeaks(vector):
"""
This is the function that will be mapped by multiprocess. This is a wrapper around the scipy function.
It uses a parameter - wavelet_widths that is configured outside this function.
Parameters
----------
vector : 1D numpy array
Feature vector containing peaks
Returns
-------
peak_indices : list
List of indices of peaks within the prescribed peak widths
"""
peak_indices = find_peaks_cwt(np.abs(vector), wavelet_widths, **kwargs)
return peak_indices
return wpeaks
peak_indices = find_peaks_cwt(np.abs(vector), wavelet_widths, **kwargs)
return peak_indices
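# Illustrative usage sketch of the flattened wavelet_peaks above. The kwarg
# names peak_widths / peak_step are assumptions inferred from the truncated
# try-block; adjust them if the real names differ:
#
#   x = np.linspace(0, 10, 1000)
#   two_gaussians = np.exp(-(x - 3) ** 2 / 0.02) + np.exp(-(x - 7) ** 2 / 0.02)
#   print(GuessMethods.wavelet_peaks(two_gaussians,
#                                    peak_widths=[10, 60], peak_step=20))
#   # expect indices near 300 and 700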