Ptychography.ipynb 16 KB
Newer Older
Chris Smith's avatar
Chris Smith committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Analyzing Ptychography data using pycroscopy\n",
    "### Stephen Jesse, Suhas Somnath, and Chris R. Smith, \n",
    "The Center for Nanophase Materials Science and The Institute for Functional Imaging for Materials <br>\n",
    "Oak Ridge National Laboratory<br>\n",
    "1/19/2017\n",
    "\n",
    "Here, we will be working with ptychography datasets acquired using a scanning transmission electron microscope (STEM). These ptychography datsets have four dimensions - two (x, y) dimensions from the position of the electron beam and each spatial pixel contains a two dimensional (u, v) image, called a ronchigram, recorded by the detector. Though the ronchigrams are typically averaged to two values (bright field, dark field), retaining the raw ronchigrams enables deeper investigation of data to reveal the existence of different phases in the material and other patterns that would not be visible in the averaged data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure the notebook first"
   ]
  },
  {
   "cell_type": "code",
Unknown's avatar
Unknown committed
25
26
   "execution_count": null,
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
27
28
   "outputs": [],
   "source": [
Unknown's avatar
Unknown committed
29
30
    "!pip install -U pycroscopy scipy numpy matplotlib h5py\n",
    "\n",
31
    "# Ensure python 3 compatibility\n",
32
    "from __future__ import division, print_function, absolute_import\n",
Chris Smith's avatar
Chris Smith committed
33
34
35
36
37
38
    "\n",
    "# Import necessary libraries:\n",
    "import h5py\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.patches as patches\n",
Unknown's avatar
Unknown committed
39
    "from IPython.display import display, HTML\n",
Chris Smith's avatar
Chris Smith committed
40
    "import ipywidgets as widgets\n",
41
42
    "from sklearn.cluster import KMeans\n",
    "    \n",
43
44
45
    "import pycroscopy as px\n",
    "\n",
    "# set up notebook to show plots within the notebook\n",
Unknown's avatar
Unknown committed
46
47
48
49
50
51
52
53
54
55
    "% matplotlib notebook\n",
    "\n",
    "# Make Notebook take up most of page width\n",
    "display(HTML(data=\"\"\"\n",
    "<style>\n",
    "    div#notebook-container    { width: 95%; }\n",
    "    div#menubar-container     { width: 65%; }\n",
    "    div#maintoolbar-container { width: 99%; }\n",
    "</style>\n",
    "\"\"\"))"
Chris Smith's avatar
Chris Smith committed
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load pycroscopy compatible ptychography dataset\n",
    "\n",
    "For simplicity we will use a dataset that has already been transalated form its original data format into a pycroscopy compatible hierarchical data format (HDF5 or H5) file\n",
    "\n",
    "#### H5 files:\n",
    "* are like smart containers that can store matrices with data, folders to organize these datasets, images, metadata like experimental parameters, links or shortcuts to datasets, etc.\n",
    "* are readily compatible with high-performance computing facilities\n",
    "* scale very efficiently from few kilobytes to several terabytes\n",
    "* can be read and modified using any language including Python, Matlab, C/C++, Java, Fortran, Igor Pro, etc."
   ]
  },
  {
   "cell_type": "code",
Unknown's avatar
Unknown committed
75
76
77
   "execution_count": null,
   "metadata": {},
   "outputs": [],
Chris Smith's avatar
Chris Smith committed
78
79
80
   "source": [
    "# Select a file to work on:\n",
    "# h5_path = px.io_utils.uiGetFile('*.h5', 'pycroscopy formatted Ptychography dataset')\n",
Unknown's avatar
Unknown committed
81
    "h5_path = \"/home/challtdow/workspace/pycroscopy_data/Raw_Data/20120212_21_GB/20120212_21_GB.h5\"\n",
Chris Smith's avatar
Chris Smith committed
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
    "print('Working on:\\n' + h5_path)\n",
    "# Open the file\n",
    "h5_file = h5py.File(h5_path, mode='r+')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect the contents of this h5 data file\n",
    "The file contents are stored in a tree structure, just like files on a contemporary computer.\n",
    "The data is stored as a 2D matrix (position, spectroscopic value) regardless of the dimensionality of the data.  \n",
    "In the case of these 4D ptychography datasets, the data is stored as a N x P dataset where N is the number of spatial positions of the electron beam and P is the number of pixels in the detector. \n",
    "\n",
    "The main dataset is always accompanied by four ancillary datasets that explain the position and spectroscopic value of any given element in the dataset.\n",
    "In the case of the 2d images, the positions will be arranged as row0-col0, row0-col1.... row0-colN, row1-col0....\n",
    "The spectroscopic information is trivial since the data at any given pixel is just a scalar value"
   ]
  },
  {
   "cell_type": "code",
Unknown's avatar
Unknown committed
103
104
105
   "execution_count": null,
   "metadata": {},
   "outputs": [],
Chris Smith's avatar
Chris Smith committed
106
   "source": [
107
108
    "print('Datasets and datagroups within the file:\\n------------------------------------')\n",
    "px.hdf_utils.print_tree(h5_file)\n",
Chris Smith's avatar
Chris Smith committed
109
    " \n",
110
    "print('\\nThe main dataset:\\n------------------------------------')\n",
Chris Smith's avatar
Chris Smith committed
111
    "print(h5_file['/Measurement_000/Channel_000/Raw_Data'])\n",
112
    "print('\\nThe ancillary datasets:\\n------------------------------------')\n",
Chris Smith's avatar
Chris Smith committed
113
114
115
116
117
    "print(h5_file['/Measurement_000/Channel_000/Position_Indices'])\n",
    "print(h5_file['/Measurement_000/Channel_000/Position_Values'])\n",
    "print(h5_file['/Measurement_000/Channel_000/Spectroscopic_Indices'])\n",
    "print(h5_file['/Measurement_000/Channel_000/Spectroscopic_Values'])\n",
    "\n",
118
    "print('\\nMetadata or attributes in a datagroup\\n------------------------------------')\n",
Chris Smith's avatar
Chris Smith committed
119
120
121
122
123
124
125
126
127
128
129
130
131
    "for key in h5_file['/Measurement_000'].attrs:\n",
    "    print('{} : {}'.format(key, h5_file['/Measurement_000'].attrs[key]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read some basic parameters for visualization"
   ]
  },
  {
   "cell_type": "code",
Unknown's avatar
Unknown committed
132
133
134
   "execution_count": null,
   "metadata": {},
   "outputs": [],
Chris Smith's avatar
Chris Smith committed
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
   "source": [
    "# Select the dataset containing the raw data to start working with:\n",
    "h5_main = px.hdf_utils.getDataSet(h5_file, 'Raw_Data')[-1]\n",
    "\n",
    "# Read some necessary parameters:\n",
    "h5_pos_inds = px.hdf_utils.getAuxData(h5_main, auxDataName=['Position_Indices'])[0]\n",
    "num_rows = len(np.unique(h5_pos_inds[:, 0]))\n",
    "num_cols = len(np.unique(h5_pos_inds[:, 1]))\n",
    "h5_spec_inds = px.hdf_utils.getAuxData(h5_main, auxDataName=['Spectroscopic_Indices'])[0]\n",
    "num_sensor_rows = len(np.unique(h5_spec_inds[0, :]))\n",
    "num_sensor_cols = len(np.unique(h5_spec_inds[1, :]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the Raw Ronchigrams"
   ]
  },
  {
   "cell_type": "code",
Unknown's avatar
Unknown committed
157
158
159
   "execution_count": null,
   "metadata": {},
   "outputs": [],
Chris Smith's avatar
Chris Smith committed
160
   "source": [
161
162
163
    "coarse_row = int(0.5*num_rows)\n",
    "coarse_col = int(0.5*num_cols)\n",
    "coarse_pos = coarse_row * num_rows + coarse_col\n",
Chris Smith's avatar
Chris Smith committed
164
    "\n",
165
    "current_ronch = np.reshape(h5_main[coarse_pos], (num_sensor_rows, num_sensor_cols))\n",
Chris Smith's avatar
Chris Smith committed
166
167
168
    "\n",
    "fig, axes = plt.subplots(ncols=2, figsize=(14,7))\n",
    "axes[0].hold(True)\n",
169
    "axes[0].set_title('Mean Response')\n",
Unknown's avatar
Unknown committed
170
171
    "main_map = axes[0].imshow(np.reshape(h5_main.parent['Spectroscopic_Mean'], (num_rows, num_cols)), \n",
    "                          cmap=px.plot_utils.cmap_jet_white_center(), origin='lower')\n",
172
173
174
175
    "main_vert_line = axes[0].axvline(x=coarse_col, color='k')\n",
    "main_hor_line = axes[0].axhline(y=coarse_row, color='k')\n",
    "axes[1].set_title('Ronchigram at current pixel')\n",
    "img_zoom = axes[1].imshow(current_ronch,cmap=px.plot_utils.cmap_jet_white_center(), origin='lower')\n",
Chris Smith's avatar
Chris Smith committed
176
    "\n",
Unknown's avatar
Unknown committed
177
178
179
180
181
182
    "def move_zoom_box(event):\n",
    "    if not main_map.axes.in_axes(event):\n",
    "        return\n",
    "    \n",
    "    coarse_col = int(round(event.xdata))\n",
    "    coarse_row = int(round(event.ydata))\n",
183
184
185
186
187
188
189
190
    "    main_vert_line.set_xdata(coarse_col)\n",
    "    main_hor_line.set_ydata(coarse_row)\n",
    "    \n",
    "    coarse_pos = coarse_row * num_rows + coarse_col\n",
    "    current_ronch = np.reshape(h5_main[coarse_pos], (num_sensor_rows, num_sensor_cols))\n",
    "\n",
    "    img_zoom.set_data(current_ronch)\n",
    "    #img_zoom.set_clim(vmax=ronch_max, vmin=ronch_min)\n",
Unknown's avatar
Unknown committed
191
192
    "    fig.canvas.draw()\n",
    "    \n",
Chris Smith's avatar
Chris Smith committed
193
    "\n",
Unknown's avatar
Unknown committed
194
195
196
    "cid = main_map.figure.canvas.mpl_connect('button_press_event', move_zoom_box)\n",
    "# widgets.interact(move_zoom_box, coarse_row=(0, num_rows, 1), \n",
    "#                  coarse_col=(0, num_cols, 1));"
Chris Smith's avatar
Chris Smith committed
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performing Singular Value Decompostion (SVD)\n",
    "SVD decomposes data (arranged as position x value) into a sequence of orthogonal components arranged in descending order of variance. The first component contains the most significant trend in the data. The second component contains the next most significant trend orthogonal to all previous components (just the first component). Each component consists of the trend itself (eigenvector), the spatial variaion of this trend (eigenvalues), and the variance (statistical importance) of the component.\n",
    "\n",
    "Here, SVD essentially compares every single ronchigram with every other ronchigram to find statistically significant trends in the dataset. Such correlation would be infeasible if the ronchigrams were averaged to bright-field and dark-field scalar values. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
Unknown's avatar
Unknown committed
212
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
213
214
   "outputs": [],
   "source": [
215
216
    "proc = px.SVD(h5_main, num_comps=256)\n",
    "\n",
Chris Smith's avatar
Chris Smith committed
217
    "# First check if SVD was already computed on this dataset:\n",
218
    "if proc.duplicate_h5_groups is None:\n",
Chris Smith's avatar
Chris Smith committed
219
    "    print('No prior SVD results found - doing SVD now')\n",
220
    "    h5_svd_group = proc.compute()\n",
Chris Smith's avatar
Chris Smith committed
221
222
    "else:\n",
    "    print('Taking previous SVD results already present in file')\n",
223
    "    h5_svd_group = proc.duplicate_h5_groups[-1]\n",
Chris Smith's avatar
Chris Smith committed
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
    "    \n",
    "h5_u = h5_svd_group['U']\n",
    "h5_v = h5_svd_group['V']\n",
    "h5_s = h5_svd_group['S']\n",
    "num_comps = 16"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the SVD results\n",
    "\n",
    "##### S (variance):\n",
    "The plot below shows the variance or statistical significance of the SVD components. The first few components contain the most significant information while the last few components mainly contain noise. \n",
    "\n",
    "Note also that the plot below is a log-log plot. The importance of each subsequent component drops exponentially."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
Unknown's avatar
Unknown committed
246
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
   "outputs": [],
   "source": [
    "# Visualize variance of the principal components\n",
    "fig, axes = px.plot_utils.plotScree(h5_s, title='Variance')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### U (Eigenvalues or loading maps):\n",
    "The plot below shows the spatial distribution of each SVD component"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
Unknown's avatar
Unknown committed
265
    "scrolled": false
Chris Smith's avatar
Chris Smith committed
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
   },
   "outputs": [],
   "source": [
    "# Visualize the eigenvalues or loading maps from SVD:\n",
    "loadings = np.reshape(h5_u[:, :num_comps], (num_rows, num_cols, -1))\n",
    "fig, axes = px.plot_utils.plot_map_stack(loadings, num_comps=num_comps, heading='Eigenvalues',\n",
    "                                         cmap=px.plot_utils.cmap_jet_white_center())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### V (Eigenvectors)\n",
    "The V dataset contains the eigenvectors for each component"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
Unknown's avatar
Unknown committed
287
    "scrolled": false
Chris Smith's avatar
Chris Smith committed
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
   },
   "outputs": [],
   "source": [
    "# Visualize the eigenvectors from SVD:\n",
    "eigenvectors = np.reshape(h5_v[:num_comps], (-1, num_sensor_rows, num_sensor_cols))\n",
    "eigenvectors = np.transpose(eigenvectors, (1, 2, 0))\n",
    "fig, axes = px.plot_utils.plot_map_stack(eigenvectors, num_comps=num_comps, heading='Eigenvectors',\n",
    "                                         cmap=px.plot_utils.cmap_jet_white_center())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clustering\n",
    "Clustering divides data into k clusters such that the variance within each cluster is minimized. \n",
    "\n",
    "In principle, clustering can be perfomed on any dataset that has some spectral values (eg. - ronchgirams in the case of the raw dataset or an array of values for each SVD component) for each position. However, the computation time increases with the size of the dataset.\n",
    "\n",
    "Here, we will be performing k-means clustering on the U matrix from SVD. \n",
    "We want a large enough number of clusters so that K-means identifies fine nuances in the data. At the same time, we want to minimize computational time by reducing the number of clusters. We recommend 32 clusters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
Unknown's avatar
Unknown committed
314
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
315
316
317
   "outputs": [],
   "source": [
    "# Attempt to find any previous computation\n",
318
319
320
321
322
323
324
    "num_clusters = 32\n",
    "spectral_components = 128\n",
    "estimator = KMeans(n_clusters=num_clusters)\n",
    "\n",
    "proc = px.Cluster(h5_u, estimator, num_comps=spectral_components)\n",
    "\n",
    "if proc.duplicate_h5_groups is None:\n",
Chris Smith's avatar
Chris Smith committed
325
    "    print('No k-Means computation found. Doing K-Means now')\n",
326
    "    h5_kmeans_group = proc.compute()\n",
Chris Smith's avatar
Chris Smith committed
327
328
    "else:\n",
    "    print('Taking existing results of previous K-Means computation')\n",
329
    "    h5_kmeans_group = proc.duplicate_h5_groups[-1]\n",
Chris Smith's avatar
Chris Smith committed
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
    "    \n",
    "h5_labels = h5_kmeans_group['Labels']\n",
    "h5_centroids = h5_kmeans_group['Mean_Response']\n",
    "\n",
    "# In case we take existing results, we need to get these parameters\n",
    "num_clusters = h5_centroids.shape[0]\n",
    "num_comps_for_clustering = h5_centroids.shape[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Visualize k-means results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
Unknown's avatar
Unknown committed
349
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
   "outputs": [],
   "source": [
    "label_mat = np.reshape(h5_labels, (num_rows, num_cols))\n",
    "fig, axis = plt.subplots(figsize=(7,7))\n",
    "axis.imshow(label_mat, cmap=px.plot_utils.cmap_jet_white_center())\n",
    "axis.set_title('k-Means Cluster labels');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Visualize the hierarchical clustering\n",
    "The vertical length of the branches indicates the relative separation between neighboring clusters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
Unknown's avatar
Unknown committed
369
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
370
371
   "outputs": [],
   "source": [
372
    "e_vals = np.reshape(h5_u[:, :spectral_components], \n",
Chris Smith's avatar
Chris Smith committed
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
    "                    (num_rows, num_cols, -1))\n",
    "fig = px.plot_utils.plot_cluster_dendrogram(label_mat, e_vals, \n",
    "                                            num_comps_for_clustering, \n",
    "                                            num_clusters, \n",
    "                                            last=num_clusters);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save and close\n",
    "* Save the .h5 file that we are working on by closing it. <br>\n",
    "* Also, consider exporting this notebook as a notebook or an html file. <br> To do this, go to File >> Download as >> HTML\n",
    "* Finally consider saving this notebook if necessary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
Unknown's avatar
Unknown committed
393
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
394
395
396
397
398
399
400
401
   "outputs": [],
   "source": [
    "h5_file.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
Unknown's avatar
Unknown committed
402
   "metadata": {},
Chris Smith's avatar
Chris Smith committed
403
   "outputs": [],
404
   "source": []
Chris Smith's avatar
Chris Smith committed
405
406
407
408
409
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
Unknown's avatar
Unknown committed
410
   "display_name": "Python 2",
Chris Smith's avatar
Chris Smith committed
411
412
413
414
415
416
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
Unknown's avatar
Unknown committed
417
    "version": 3
Chris Smith's avatar
Chris Smith committed
418
419
420
421
422
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
Unknown's avatar
Unknown committed
423
424
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
425
426
427
428
429
430
  },
  "widgets": {
   "state": {
    "74a037e0ed7f4854a0c8e337ac2d6798": {
     "views": [
      {
431
       "cell_index": 10.0
432
433
434
435
436
      }
     ]
    }
   },
   "version": "1.2.0"
Chris Smith's avatar
Chris Smith committed
437
438
439
  }
 },
 "nbformat": 4,
Unknown's avatar
Unknown committed
440
 "nbformat_minor": 1
441
}