plot_spectral_unmixing.rst 12.6 KB
Newer Older
CompPhysChris's avatar
CompPhysChris committed
1
2


Unknown's avatar
Unknown committed
3
.. _sphx_glr_auto_examples_data_analysis_plot_spectral_unmixing.py:
CompPhysChris's avatar
CompPhysChris committed
4
5
6
7
8
9


=================================================================
Spectral Unmixing
=================================================================

Unknown's avatar
Unknown committed
10
11
12
13
S. Somnath\ :sup:`1,2`,  R. K. Vasudevan\ :sup:`1,3`
* :sup:`1` Institute for Functional Imaging of Materials
* :sup:`2` Advanced Data and Workflows Group
* :sup:`3` Center for Nanophase Materials Sciences
CompPhysChris's avatar
CompPhysChris committed
14
15
16
17
18
19
20
21
22
23
24
25
26

Oak Ridge National Laboratory, Oak Ridge TN 37831, USA

In this notebook we load some spectral data, and perform basic data analysis, including:
========================================================================================
* KMeans Clustering
* Non-negative Matrix Factorization
* Principal Component Analysis

Software Prerequisites:
=======================
* Standard distribution of **Anaconda** (includes numpy, scipy, matplotlib and sci-kit learn)
* **pycroscopy** : Though pycroscopy is mainly used here for plotting purposes only, it's true capabilities
Unknown's avatar
Unknown committed
27
28
  are realized through the ability to seamlessly perform these analyses on any imaging dataset (regardless
  of origin, size, complexity) and storing the results back into the same dataset among other things
CompPhysChris's avatar
CompPhysChris committed
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61




.. code-block:: python


    #Import packages

    # Ensure that this code works on both python 2 and python 3
    from __future__ import division, print_function, absolute_import, unicode_literals

    # basic numeric computation:
    import numpy as np

    # The package used for creating and manipulating HDF5 files:
    import h5py

    # Plotting and visualization:
    import matplotlib.pyplot as plt
    from mpl_toolkits.axes_grid1 import make_axes_locatable

    # for downloading files:
    import wget
    import os

    # multivariate analysis:
    from sklearn.cluster import KMeans
    from sklearn.decomposition import NMF

    # finally import pycroscopy:
    import pycroscopy as px

Unknown's avatar
Unknown committed
62
63
64
    """
  
    """
CompPhysChris's avatar
CompPhysChris committed
65
66


67
68
69
70
71





CompPhysChris's avatar
CompPhysChris committed
72
73
74
The Data
========

Unknown's avatar
Unknown committed
75
76
In this example, we will work on a **Band Excitation Piezoresponse Force Microscopy (BE-PFM)** imaging dataset
acquired from advanced atomic force microscopes. In this dataset, a spectra was colllected for each position in a two
Unknown's avatar
Unknown committed
77
dimensional grid of spatial locations. Thus, this is a three dimensional dataset that has been flattened to a two
Unknown's avatar
Unknown committed
78
dimensional matrix in accordance with the pycroscopy data format.
CompPhysChris's avatar
CompPhysChris committed
79

Unknown's avatar
Unknown committed
80
81
Fortunately, all statistical analysis, machine learning, spectral unmixing algorithms, etc. only accept data that is
formatted in the same manner of [position x spectra] in a two dimensional matrix.
CompPhysChris's avatar
CompPhysChris committed
82

Unknown's avatar
Unknown committed
83
We will begin by downloading the BE-PFM dataset from Github
CompPhysChris's avatar
CompPhysChris committed
84
85
86



Unknown's avatar
Unknown committed
87

CompPhysChris's avatar
CompPhysChris committed
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
.. code-block:: python


    data_file_path = 'temp_um.h5'
    # download the data file from Github:
    url = 'https://raw.githubusercontent.com/pycroscopy/pycroscopy/master/data/BELine_0004.h5'
    _ = wget.download(url, data_file_path, bar=None)

    hdf = px.ioHDF5(data_file_path)
    h5_file = hdf.file

    print('Contents of data file:')
    print('----------------------')
    px.hdf_utils.print_tree(h5_file)
    print('----------------------')

    h5_meas_grp = h5_file['Measurement_000']

    # Extracting some basic parameters:
    num_rows = px.hdf_utils.get_attr(h5_meas_grp,'grid_num_rows')
    num_cols = px.hdf_utils.get_attr(h5_meas_grp,'grid_num_cols')

    # Getting a reference to the main dataset:
    h5_main = h5_meas_grp['Channel_000/Raw_Data']

    # Extracting the X axis - vector of frequencies
114
    h5_spec_vals = px.hdf_utils.get_auxillary_datasets(h5_main,'Spectroscopic_Values')[-1]
CompPhysChris's avatar
CompPhysChris committed
115
116
117
118
119
120
121
122
    freq_vec = np.squeeze(h5_spec_vals.value) * 1E-3

    print('Data currently of shape:', h5_main.shape)

    x_label = 'Frequency (kHz)'
    y_label = 'Amplitude (a.u.)'


123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157



.. rst-class:: sphx-glr-script-out

 Out::

    Contents of data file:
    ----------------------
    /
    Measurement_000
    Measurement_000/Channel_000
    Measurement_000/Channel_000/Bin_FFT
    Measurement_000/Channel_000/Bin_Frequencies
    Measurement_000/Channel_000/Bin_Indices
    Measurement_000/Channel_000/Bin_Step
    Measurement_000/Channel_000/Bin_Wfm_Type
    Measurement_000/Channel_000/Excitation_Waveform
    Measurement_000/Channel_000/Noise_Floor
    Measurement_000/Channel_000/Position_Indices
    Measurement_000/Channel_000/Position_Values
    Measurement_000/Channel_000/Raw_Data
    Measurement_000/Channel_000/Spatially_Averaged_Plot_Group_000
    Measurement_000/Channel_000/Spatially_Averaged_Plot_Group_000/Bin_Frequencies
    Measurement_000/Channel_000/Spatially_Averaged_Plot_Group_000/Mean_Spectrogram
    Measurement_000/Channel_000/Spatially_Averaged_Plot_Group_000/Spectroscopic_Parameter
    Measurement_000/Channel_000/Spatially_Averaged_Plot_Group_000/Step_Averaged_Response
    Measurement_000/Channel_000/Spectroscopic_Indices
    Measurement_000/Channel_000/Spectroscopic_Values
    Measurement_000/Channel_000/UDVS
    Measurement_000/Channel_000/UDVS_Indices
    ----------------------
    Data currently of shape: (16384, 119)


CompPhysChris's avatar
CompPhysChris committed
158
159
160
161
162
163
164
165
166
167
168
169
170
Visualize the Amplitude Data
============================
Note that we are not hard-coding / writing any tick labels / axis labels by hand.
All the necessary information was present in the H5 file



.. code-block:: python


    px.viz.be_viz_utils.jupyter_visualize_be_spectrograms(h5_main)


171
172


Unknown's avatar
Unknown committed
173
.. image:: /auto_examples/data_analysis/images/sphx_glr_plot_spectral_unmixing_001.png
174
175
176
177
178
179
180
181
182
183
184
185
    :align: center


.. rst-class:: sphx-glr-script-out

 Out::

    No position datasets found as attributes of /Measurement_000/Channel_000/Spectroscopic_Values
    HBox(children=(Text(value='temp_um.h5', description='Output Filename:', layout=Layout(width='50%'), placeholder='Type something'), Button(description='Save figure', style=ButtonStyle())))
    interactive(children=(IntSlider(value=59, description='step', max=118), Output()), _dom_classes=('widget-interact',))


CompPhysChris's avatar
CompPhysChris committed
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
1. Singular Value Decomposition (SVD)
=====================================

SVD is an eigenvector decomposition that is defined statistically, and therefore typically produces
non-physical eigenvectors. Consequently, the interpretation of eigenvectors and abundance maps from
SVD requires care and caution in interpretation. Nontheless, it is a good method for quickly
visualizing the major trends in the dataset since the resultant eigenvectors are sorted in descending
order of variance or importance. Furthermore, SVD is also very well suited for data cleaning through
the reconstruction of the dataset using only the first N (most significant) components.

SVD results in three matrices:
* V - Eigenvectors sorted by variance in descending order
* U - corresponding bundance maps
* S - Variance or importance of each of these components

Advantage of pycroscopy:
------------------------
Notice that we are working with a complex valued dataset. Passing the complex values as is to SVD would result in
complex valued eigenvectors / endmembers as well as abundance maps. Complex valued abundance maps are not physical.
Thus, one would need to restructure the data such that it is real-valued only.

One solution is to stack the real value followed by the magnitude of the imaginary component before passing to SVD.
After SVD, the real-valued eigenvectors would need to be treated as the concatenation of the real and imaginary
components. So, the eigenvectors would need to be restructured to get back the complex valued eigenvectors.

**Pycroscopy handles all these data transformations (both for the source dataset and the eigenvectors)
automatically.**  In general, pycroscopy handles compund / complex valued datasets everywhere possible

Furthermore, while it is not discussed in this example, pycroscopy also writes back the results from SVD back to
the same source h5 file including all relevant links to the source dataset and other ancillary datasets



.. code-block:: python


Unknown's avatar
Unknown committed
222
223
    do_svd = px.processing.svd_utils.SVD(h5_main, num_components=256)
    h5_svd_group = do_svd.compute()
CompPhysChris's avatar
CompPhysChris committed
224
225
226
227
228
229
230
231
232

    h5_u = h5_svd_group['U']
    h5_v = h5_svd_group['V']
    h5_s = h5_svd_group['S']

    # Since the two spatial dimensions (x, y) have been collapsed to one, we need to reshape the abundance maps:
    abun_maps = np.reshape(h5_u[:,:25], (num_rows, num_cols, -1))

    # Visualize the variance / statistical importance of each component:
Unknown's avatar
Unknown committed
233
    px.plot_utils.plot_scree(h5_s, title='Note the exponential drop of variance with number of components')
CompPhysChris's avatar
CompPhysChris committed
234
235
236
237
238

    # Visualize the eigenvectors:
    first_evecs = h5_v[:9, :]

    px.plot_utils.plot_loops(freq_vec, np.abs(first_evecs), x_label=x_label, y_label=y_label, plots_on_side=3,
Unknown's avatar
Unknown committed
239
                             subtitle_prefix='Component', title='SVD Eigenvectors (Amplitude)', evenly_spaced=False)
CompPhysChris's avatar
CompPhysChris committed
240
    px.plot_utils.plot_loops(freq_vec, np.angle(first_evecs), x_label=x_label, y_label='Phase (rad)', plots_on_side=3,
Unknown's avatar
Unknown committed
241
                             subtitle_prefix='Component', title='SVD Eigenvectors (Phase)', evenly_spaced=False)
CompPhysChris's avatar
CompPhysChris committed
242
243
244
245
246
247

    # Visualize the abundance maps:
    px.plot_utils.plot_map_stack(abun_maps, num_comps=9, heading='SVD Abundance Maps',
                                 color_bar_mode='single', cmap='inferno')


248
249
250
251
252
253
254


.. rst-class:: sphx-glr-horizontal


    *

Unknown's avatar
Unknown committed
255
      .. image:: /auto_examples/data_analysis/images/sphx_glr_plot_spectral_unmixing_002.png
256
257
258
259
            :scale: 47

    *

Unknown's avatar
Unknown committed
260
      .. image:: /auto_examples/data_analysis/images/sphx_glr_plot_spectral_unmixing_003.png
261
262
263
264
            :scale: 47

    *

Unknown's avatar
Unknown committed
265
      .. image:: /auto_examples/data_analysis/images/sphx_glr_plot_spectral_unmixing_004.png
266
267
268
269
            :scale: 47

    *

Unknown's avatar
Unknown committed
270
      .. image:: /auto_examples/data_analysis/images/sphx_glr_plot_spectral_unmixing_005.png
271
272
273
274
275
276
277
278
            :scale: 47


.. rst-class:: sphx-glr-script-out

 Out::

    Performing SVD decomposition
Unknown's avatar
Unknown committed
279
    SVD took 0.91 seconds.  Writing results to file.
280
281


CompPhysChris's avatar
CompPhysChris committed
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
2. KMeans Clustering
====================

KMeans clustering is a quick and easy method to determine the types of spectral responses present in the
data. It is not a decomposition method, but a basic clustering method. The user inputs the number of
clusters (sets) to partition the data into. The algorithm proceeds to find the optimal labeling
(ie., assignment of each spectra as belonging to the k<sup>th</sup> set) such that the within-cluster
sum of squares is minimized.

Set the number of clusters below



.. code-block:: python


    num_clusters = 4

    estimators = px.Cluster(h5_main, 'KMeans', n_clusters=num_clusters)
    h5_kmeans_grp = estimators.do_cluster(h5_main)
    h5_kmeans_labels = h5_kmeans_grp['Labels']
    h5_kmeans_mean_resp = h5_kmeans_grp['Mean_Response']

    px.plot_utils.plot_cluster_h5_group(h5_kmeans_grp)


308
309


Unknown's avatar
Unknown committed
310
.. image:: /auto_examples/data_analysis/images/sphx_glr_plot_spectral_unmixing_006.png
311
312
313
314
315
316
317
318
319
320
321
322
    :align: center


.. rst-class:: sphx-glr-script-out

 Out::

    Performing clustering on /Measurement_000/Channel_000/Raw_Data.
    Calculated the Mean Response of each cluster.
    Writing clustering results to file.


CompPhysChris's avatar
CompPhysChris committed
323
324
325
326
327
328
329
330
331
3. Non-negative Matrix Factorization (NMF)
===========================================

NMF, or non-negative matrix factorization, is a method that is useful towards unmixing of spectral
data. It only works on data with positive real values. It operates by approximate determination of
factors (matrices) W and H, given a matrix V, as shown below

.. image:: https://upload.wikimedia.org/wikipedia/commons/f/f9/NMF.png

Unknown's avatar
Unknown committed
332
333
334
Unlike SVD and k-Means that can be applied to complex-valued datasets, NMF only works on non-negative datasets.
For illustrative purposes, we will only take the amplitude component of the spectral data

CompPhysChris's avatar
CompPhysChris committed
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357


.. code-block:: python


    num_comps = 4

    # get the non-negative portion of the dataset
    data_mat = np.abs(h5_main)

    model = NMF(n_components=num_comps, init='random', random_state=0)
    model.fit(data_mat)

    fig, axis = plt.subplots(figsize=(5.5, 5))
    px.plot_utils.plot_line_family(axis, freq_vec, model.components_, label_prefix='NMF Component #')
    axis.set_xlabel(x_label, fontsize=12)
    axis.set_ylabel(y_label, fontsize=12)
    axis.set_title('NMF Components', fontsize=14)
    axis.legend(bbox_to_anchor=[1.0, 1.0], fontsize=12)




Unknown's avatar
Unknown committed
358
.. image:: /auto_examples/data_analysis/images/sphx_glr_plot_spectral_unmixing_007.png
359
    :align: center
CompPhysChris's avatar
CompPhysChris committed
360
361


Unknown's avatar
Unknown committed
362

CompPhysChris's avatar
CompPhysChris committed
363
364
365
366
367


.. code-block:: python


368
369
370
    # Close and delete the h5_file
    h5_file.close()
    os.remove(data_file_path)
CompPhysChris's avatar
CompPhysChris committed
371
372
373
374
375
376






Unknown's avatar
Unknown committed
377
**Total running time of the script:** ( 0 minutes  12.822 seconds)
CompPhysChris's avatar
CompPhysChris committed
378
379
380



381
.. only :: html
CompPhysChris's avatar
CompPhysChris committed
382

383
 .. container:: sphx-glr-footer
CompPhysChris's avatar
CompPhysChris committed
384
385
386
387
388
389
390
391
392
393
394
395
396


  .. container:: sphx-glr-download

     :download:`Download Python source code: plot_spectral_unmixing.py <plot_spectral_unmixing.py>`



  .. container:: sphx-glr-download

     :download:`Download Jupyter notebook: plot_spectral_unmixing.ipynb <plot_spectral_unmixing.ipynb>`


397
398
399
400
401
.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.readthedocs.io>`_