.. contents::

v 1.0 goals
-----------
1. Test utils - 2+ weeks
2. Good utilities for interrogating data - PycroDataset - done
3. Good documentation for both users and developers

  * Need more on dealing with data and, for developers, an explanation of what is where and why
4. Generic visualizer - mostly complete
5. Settle on a structure for process and analysis - moderate ~ 1 day

  * Process should implement some checks. 
  * Model needs to catch up with Process
6. Good utils for generating publishable plots - easy ~ 1 day
7. Promote / demote lesser-used utilities to processes / analyses.

Short-term goals
--------------------
* Multi-node compute capability
* More documentation to help users / developers + PAPER
* Cleaned versions of the main modules (Analysis pending) + enough documentation for users and developers

Documentation
-------------
* Upload clean exports of paper notebooks - Stephen and Chris
* Organize papers by instrument / technique
* Include examples in documentation
* Links to references for all functions and methods used in our workflows.

Fundamental tutorials on how to use pycroscopy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* A tour of the hdf_utils functions used for writing h5 files, since these functions need data to demonstrate them.
  
  * Chunking the main dataset (see the sketch after this list)
* A tour of the io_utils functions, since these too need data to demonstrate them.
* A tour of plot_utils
* pycroscopy package organization - a short writeup on what is where and the differences between the process / analysis submodules
* How to write your own analysis class based on the (to-be simplified) Model class
* Links to tutorials on how to use PyCharm and Git
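
A minimal sketch of the chunking item above, assuming a 2D position-by-spectral main dataset; the file name, shapes, and chunk sizes here are illustrative, not pycroscopy defaults:

.. code-block:: python

   import h5py
   import numpy as np

   with h5py.File('example.h5', 'w') as h5_file:
       # 1024 positions x 256 spectral points
       data = np.random.rand(1024, 256)
       # Chunk by whole spectra so reading one position touches one chunk
       h5_main = h5_file.create_dataset('Raw_Data', data=data,
                                        chunks=(1, 256), compression='gzip')
       print(h5_main.chunks)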

Rama's (older and more applied / specific) tutorial goals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Open a translated and fitted FORC-PFM file, and plot the SHO Fit from cycle k corresponding to voltage p, along with the raw spectrogram for that location and the SHO guess. Plot both real and imaginary, and do so for both on and off-field.
2. Continuing above, determine the average of the quality factor coming from cycles 1,3,4 for spatial points stored in vector b for the on-field part for a predetermined voltage range given by endpoints [e,f]. Compare the results with the SHO guess and fit for the quality factor.
3. After opening an h5 file containing results from a relaxation experiment, plot the response at a particular point and voltage, run exponential fitting (see the sketch after this list), and then store the results of the fit in the same h5 file using iohdf and/or numpy translators.
4. Take a FORC IV ESM dataset and break it up into forward and reverse branches, along with positive and negative branches. Do correlation analysis between PFM and IV for different branches and store the results in the file, and readily access them for plotting again.
5. A guide to using the model fitter for parallel fitting of numpy array-style datasets. This one can be merged with number 
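
A minimal sketch of the exponential fitting step in goal 3 above, using scipy.optimize.curve_fit; the model and the synthetic relaxation response are placeholders:

.. code-block:: python

   import numpy as np
   from scipy.optimize import curve_fit

   def exp_decay(t, amplitude, tau, offset):
       """Single exponential decay toward a constant offset."""
       return amplitude * np.exp(-t / tau) + offset

   t = np.linspace(0, 1, 100)
   response = exp_decay(t, 2.5, 0.2, 0.3) + 0.05 * np.random.randn(t.size)
   popt, _ = curve_fit(exp_decay, t, response, p0=(1.0, 0.1, 0.0))
   print('amplitude=%.2f, tau=%.3f, offset=%.2f' % tuple(popt))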

New features
------------
Core development
~~~~~~~~~~~~~~~~
* EVERY process tool should implement two new features:
  
  1. Check whether the same process has already been performed with the same parameters; if so, throw an exception when initializing the process. This is better than checking at the notebook stage (see the sketch at the end of this list).
  2. (Gracefully) Abort and resume processing.
  
* Legacy processes **MUST** extend Process:
  
  * sklearn wrapper classes:
  
    * Cluster
    * Decomposition
    * The computation will continue to be performed by sklearn. No need to use parallel_compute() or resume computation.
  
  * Own classes:
  
    * Image Windowing
    * Image Cleaning
    * As time permits, ensure that these can resume processing
  * All these MUST implement the check for previous computations at the very least
  
* Absorb functionality from Process into Model
* Multi-node computing capability in parallel_compute
* Image cleaning should be (or at the very least resemble) a Process
* Bayesian GIV should actually be an analysis
* Demystify analysis / optimize. Use parallel_compute instead of optimize, guess_methods, and fit_methods
* Data Generators
* Consistency in the naming and placement of attributes (channel or measurement group) in all translators - some put attributes at the measurement level, others at the channel level! hyperspy appears to create data groups solely to organize metadata in a tree structure!
* Consider developing a generic curve fitting class a la `hyperspy <http://nbviewer.jupyter.org/github/hyperspy/hyperspy-demos/blob/master/Fitting_tutorial.ipynb>`_
* Improve visualization of file contents in print_tree() like hyperspy's `metadata <http://hyperspy.org/hyperspy-doc/current/user_guide/metadata_structure.html>`_
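
A minimal sketch of the duplicate-computation check requested at the top of this list, assuming each results group stores its parameters as HDF5 attributes; all names here are hypothetical, not the current Process API:

.. code-block:: python

   import h5py

   def raise_if_already_computed(h5_parent_group, tool_name, parms):
       """Raise if a results group already exists with identical parameters."""
       for name, item in h5_parent_group.items():
           if not (isinstance(item, h5py.Group) and tool_name in name):
               continue
           # Scalar comparison only; array attributes would need np.array_equal
           if all(key in item.attrs and item.attrs[key] == val
                  for key, val in parms.items()):
               raise ValueError('%s already run with parameters %s in %s'
                                % (tool_name, parms, item.name))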

GUI
~~~~~~~~~~~
* Make the generic interactive visualizer for 3D and 4D float numpy arrays ROBUST

  * Allow slicing at the PycroDataset level to handle > 4D datasets - 20 mins (see the sketch after this list)
  * Need to handle appropriate reference values for the tick marks in 2D plots - 20 mins
  * Handle situation when only one position and one spectral axis are present. - low priority - 20 mins
* TRULY generic visualizer in plot.ly / dash? that can use the PycroDataset class
* Switch to using plot.ly and dash for interactive elements
* Possibly use MayaVi for 3D plotting
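
A minimal sketch of the > 4D slicing idea above: fix all but two axes of an N-dimensional array to recover a 2D image for display (shapes and axis choices are illustrative only):

.. code-block:: python

   import numpy as np
   import matplotlib.pyplot as plt

   # e.g. (rows, cols, cycle, voltage step, spectral index)
   data_5d = np.random.rand(10, 10, 3, 4, 64)
   # Fix every axis except the two position axes to get an image slice
   img = data_5d[:, :, 1, 2, 30]
   fig, axis = plt.subplots()
   mappable = axis.imshow(img, origin='lower')
   fig.colorbar(mappable, ax=axis)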

Plot Utils
~~~~~~~~~~
* _add_loop_parameters - BE-specific and should be moved out of plot_utils

* rainbow_plot - 

  1. Pop cmap from kwargs instead of specifying cmap as a separate argument (see the sketch at the end of this section).
  2. Rename parameters from ax to axis, ao_vec to x_values, ai_vec to y_values. 
  3. Use same methodology from single_img_cbar_plot to add color bar. You will need to expect the figure handle as well for this.

* plot_line_family - 

  1. Rename x_axis parameter to something more sensible like x_values
  2. Remove cmap as one of the arguments. It should come from kwargs
  3. Optional color bar (don’t show legend in this case)

* plot_map - combine this with single_img_cbar_plot

* single_img_cbar_plot - It is OK to spend a lot of time on single_img_cbar_plot and plot_map since these will be used HEAVILY for papers.

  1. Combine with plot_map
  2. Allow the tick labels to be specified instead of just the x_size and y_size.
  3. Rename this function to something more sensible
  4. Color bar should be shown by default

* plot_loops

  1. Allow excitation_waveform to also be a list - this will allow different x resolutions for each line family. 
  2. Apply appropriate x, y, label font sizes etc. This should look very polished and ready for publications
  3. Enable use of kwargs - to specify line widths etc.
  4. Ensure that the title is not crammed somewhere behind the subtitles

* plot_complex_map_stack

  1. Allow kwargs.
  2. Use plot_map 
  3. Respect font sizes for x, y labels, titles - use new kwargs wherever necessary 
  4. Remove map as a kwarg
  5. Show color bars
  6. Possibly allow horizontal / vertical configurations? (Optional)

* plot_complex_loop_stack

  1. Respect font sizes for x, y labels, titles - use new kwargs wherever necessary 
  2. Allow individual plots sizes to be specified
  3. Allow **kwargs and pass to the plot functions

* plotScree

  1. Rename to plot_scree
  2. Use **kwargs on the plot function

* plot_map_stack:

  1. Respect tick, x label, y label, title, etc font sizes
  2. Add ability to manually specify x and y tick labels - see plot_cluster_results_together for inspiration
  3. See all other changes that were made for the image cleaning paper

* plot_cluster_results_together

  1. Use plot_map and its cleaner color bar option
  2. Respect font sizes
  3. Option to use a color bar for the centroids instead of a legend - especially if number of clusters > 7
  4. See the mode IV paper for other changes

* plot_cluster_results_separate
  
  1. Use same guidelines as above

* plot_cluster_dendrogram - this function has not worked recently to my knowledge. Fortunately, it is not one of the more popular functions so it gets low priority for now. Use inspiration from image cleaning paper

* plot_1d_spectrum

  1. Respect font sizes
  2. Do not save figure here. This should be done in the place where this function is called
  3. Use **kwargs and pass to the plot functions
  4. Title should be optional

* plot_2d_spectrogram

  1. Respect font sizes
  2. Use plot_map - show color bar
  3. Don’t allow specification of figure_path here. Save elsewhere

* plot_histograms - not used frequently. Can be ignored for this pass
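
A minimal sketch of the "pop cmap from kwargs" pattern requested for rainbow_plot (and several other functions above); the function body is a simplified stand-in, not the current plot_utils implementation:

.. code-block:: python

   import numpy as np
   import matplotlib.pyplot as plt

   def rainbow_plot(axis, x_values, y_values, **kwargs):
       """Line plot whose color sweeps through a colormap."""
       cmap = kwargs.pop('cmap', plt.cm.jet)  # default only if caller omits it
       num_pts = len(x_values)
       for ind in range(num_pts - 1):
           axis.plot(x_values[ind:ind + 2], y_values[ind:ind + 2],
                     color=cmap(255 * ind // (num_pts - 1)), **kwargs)

   fig, axis = plt.subplots()
   t = np.linspace(0, 2 * np.pi, 128)
   rainbow_plot(axis, t, np.sin(t), linewidth=2)
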
Examples / Tutorials
--------------------

External user contributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Li Xin classification code 
* Ondrej Dyck’s atom finding code – written but needs work before it is fully integrated
* Nina Wisinger’s processing code (Tselev) – in progress
* Sabine Neumeyer's cKPFM code
* Iaroslav Gaponenko's distortcorrect code from https://github.com/paruch-group/distortcorrect
* Port everything from IFIM Matlab -> Python translation exercises
* Other workflows/functions that already exist as scripts or notebooks

Formatting changes
------------------
* Fix remaining PEP8 problems
* Ensure code and documentation are standardized
* Classes and major functions should check whether the results already exist

Notebooks
---------
* Investigate using JupyterLab

Testing
-------
* Write test code
* Unit tests for simple functions (see the sketch after this list)
* Longer tests using data (real or generated) for the workflow tests
* Measure coverage using codecov.io and the codecov package
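
A minimal pytest-style sketch of a unit test for a simple utility function (pytest is also the runner suggested in the next section); the normalize helper is a hypothetical stand-in:

.. code-block:: python

   import numpy as np

   def normalize(vec):
       """Scale a 1D vector to span the [0, 1] range."""
       vec = np.asarray(vec, dtype=float)
       return (vec - vec.min()) / (vec.max() - vec.min())

   def test_normalize_range():
       result = normalize([3.0, 7.0, 11.0])
       assert result.min() == 0.0
       assert result.max() == 1.0
       assert np.allclose(result, [0.0, 0.5, 1.0])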

Software Engineering
--------------------
* Consider releasing bug fixes (to onsite CNMS users) via git instead of rapid pypi releases 
   * example release steps (incl. git tagging): https://github.com/cesium-ml/cesium/blob/master/RELEASE.txt
* Use https://docs.pytest.org/en/latest/ instead of nose (nose is no longer maintained)
* Add requirements.txt
* Consider facilitating conda installation in addition to pypi

Scaling to clusters
-------------------
We have two kinds of large computational jobs and one kind of large I/O job:

* I/O - reading and writing large amounts of data
   * Dask and MPI are compatible; Spark is probably not.
* Computation
   1. Machine learning and Statistics
   
      1.1. Use custom algorithms developed for BEAM
         * Advantage - Optimized (and tested) for various HPC environments
         * Disadvantages:
            * Need to integrate non-python code
            * We only have a handful of these. NOT future-proof
      1.2. OR continue using a single FAT node for these jobs
         * Advantages:
            * No optimization required
            * Continue using the same scikit-learn packages
         * Disadvantage - not optimized for HPC
      1.3. OR use pbdR / write pbdPy (wrappers around pbdR)
         * Advantages:
            * Already optimized / mature project
            * In-house project (good support) 
         * Disadvantages:
            * Dependent on pbdR for implementing new algorithms
            
   2. Parallel parametric search - the analysis subpackage and some user-defined functions in processing. Can be extended using:
   
      * Dask - a drop-in replacement for multiprocessing that will work on laptops and clusters. More elegant and easier to write and maintain than MPI, at the cost of some efficiency (see the sketch below)
         * simple dask netcdf example: http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3
      * MPI - Need alternatives to Optimize / Process classes - Better efficiency but a pain to implement
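
A minimal sketch of the Dask option above, mapping a per-position function over spectra with dask.distributed; the unit function, dataset shape, and the LocalCluster standing in for a real cluster are all placeholders:

.. code-block:: python

   import numpy as np
   from dask.distributed import Client, LocalCluster

   def unit_function(spectrum):
       """Toy per-position computation: peak-to-peak amplitude."""
       return spectrum.max() - spectrum.min()

   if __name__ == '__main__':
       cluster = LocalCluster()  # swap for a real scheduler address on a cluster
       client = Client(cluster)
       data = np.random.rand(64, 256)  # 64 positions x 256 spectral points
       futures = client.map(unit_function, list(data))
       results = client.gather(futures)
       client.close()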