.. contents::

Roadmap / Milestones
--------------------
1. End of September 2017 - cleaned versions of the main modules (analysis pending) + enough documentation for users and developers
2. Mid-October 2017 - multi-node compute capability

New features
------------
Core development
~~~~~~~~~~~~~~~~
* Finish PycroDataset and test the many slicing and referencing operations on **main** datasets. The goal is to turn **main** datasets into powerful Python objects that obviate the need for users to dig into the ancillary datasets in order to slice or understand their data (see the slicing sketch at the end of this list).
* Generic visualizer in plot.ly / dash? that can use the PycroDataset class
   * One suggestion is two (or more) panes:
         * Left-hand side for positions
               * 1D lines or 2D images
               * Ability to select individual pixels or points within a polygon
               * Which quantity should be displayed in these images? For compound datasets, let the user select one of the fields. Perhaps sliders / dropdowns for all spectral dimensions so that the user can choose the slices?
         * Right-hand side for spectral data
               * 1D spectra or 2D images
               * Users will be asked to slice away N-1 or N-2 spectral dimensions
* Simplify and demystify analysis / optimize. Use parallel_compute instead of optimize and guess_methods and fit_methods (see the parallel_compute sketch at the end of this list)
* Multi-node computing capability in parallel_compute
* Data Generators
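
For the PycroDataset item above, a rough sketch of the kind of slicing the finished class should enable. The file path, internal dataset path, and dimension name below are hypothetical, and the ``slice()`` signature is an assumption for illustration, not the final API:

.. code:: python

   import h5py
   from pycroscopy import PycroDataset

   # open a pycroscopy-formatted HDF5 file (path is hypothetical)
   with h5py.File('data.h5', mode='r') as h5_f:

       # wrap a main dataset (internal path is an example)
       pd_main = PycroDataset(h5_f['Measurement_000/Channel_000/Raw_Data'])

       # the object should describe its N-dimensional form without the user
       # having to read the ancillary position / spectroscopic datasets
       print(pd_main)

       # assumed slicing call: fix one spectral dimension by name
       data_slice, success = pd_main.slice({'Frequency': 42})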
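
For the parallel_compute consolidation, a minimal stand-in showing the intended pattern with ``concurrent.futures``; this is not the pycroscopy implementation, just the shape of the idea:

.. code:: python

   import numpy as np
   from concurrent.futures import ProcessPoolExecutor

   def peak_position(spectrum):
       """Toy per-position function: index of the strongest response."""
       return int(np.argmax(spectrum))

   def parallel_compute(data, func, cores=2):
       """Apply func to every position (row) of a 2D main dataset."""
       with ProcessPoolExecutor(max_workers=cores) as pool:
           return list(pool.map(func, data))

   if __name__ == '__main__':
       raw_2d = np.random.rand(128, 1024)   # positions x spectral points
       results = parallel_compute(raw_2d, peak_position)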

External user contributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Li Xin's classification code
* Ondrej Dyck’s atom finding code – written, but needs work before it can be fully integrated
* Nina Wisinger’s processing code (Tselev) – in progress
* Sabine Neumeyer's cKPFM code
* Iaroslav Gaponenko's distortion correction code from https://github.com/paruch-group/distortcorrect
* Port everything from IFIM Matlab -> Python translation exercises
* Other workflows/functions that already exist as scripts or notebooks

Plotting updates
----------------
*	Switch to using plot.ly and dash for interactive elements
*	Possibly use MayaVi for 3d plotting

Examples / Tutorials
--------------------
Short tutorials on how to use pycroscopy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Access h5 files (see the h5py sketch at the end of this list)
* Find a specific dataset/group in the file
* Select data within a dataset in various ways
* MicroDataset / MicroDataGroup objects
* Chunking the main dataset
* Links to tutorials on how to use PyCharm, Git, etc.
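
A minimal sketch of the first two items, using plain h5py (the file name and the dataset-name filter below are hypothetical):

.. code:: python

   import h5py

   # open an HDF5 file (path is hypothetical)
   with h5py.File('my_data.h5', mode='r') as h5_f:

       # walk the full tree and collect every dataset whose name matches
       matches = []

       def _checker(name, obj):
           if isinstance(obj, h5py.Dataset) and name.endswith('Raw_Data'):
               matches.append(name)

       h5_f.visititems(_checker)
       print(matches)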

Longer examples (via specific scientific use cases)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Done:

* Data formatting in pycroscopy
* How to write a Translator
* How to write (back) to H5
* Spectral Unmixing with pycroscopy
* Basic introduction to loading data in pycroscopy
* Handling multidimensional (6D) datasets
* Visualizing data interactively using widgets (needs some minor automation at the end)

Pending:

* How to write your own parallel computing function using the process module - add more documentation (see the skeleton after this list)
* How to write your own analysis class based on the (to-be-simplified) Model class
* How to use the PycroDataset object
* A tour of the many functions in hdf_utils and io_utils, since these functions need data to show / explain them.
* pycroscopy package organization - a short writeup on what is where and the differences between the process / analysis submodules
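
A rough skeleton of what the parallel-computing tutorial could build. The class and method names below are illustrative placeholders, not the actual Process API:

.. code:: python

   import numpy as np

   class ExampleProcess(object):
       """Illustrative skeleton only - the real pycroscopy Process class
       and its method names may differ."""

       def __init__(self, h5_main, cores=2):
           self.h5_main = h5_main   # a pycroscopy main dataset
           self.cores = cores

       def _unit_computation(self, spectrum):
           # the per-position computation, e.g. a peak search
           return int(np.argmax(spectrum))

       def compute(self):
           # loop over positions; a real implementation would chunk the
           # reads and parallelize them (e.g. via parallel_compute)
           return [self._unit_computation(vec) for vec in self.h5_main]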

Rama's (older and more applied / specific) tutorial goals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Open a translated and fitted FORC-PFM file, and plot the SHO fit from cycle k corresponding to voltage p, along with the raw spectrogram for that location and the SHO guess. Plot both real and imaginary components, and do so for both on- and off-field (a skeleton for this goal is sketched after this list).
2. Continuing from above: determine the average of the quality factor coming from cycles 1, 3, and 4 for the spatial points stored in vector b, for the on-field part, over a predetermined voltage range given by endpoints [e, f]. Compare the results with the SHO guess and fit for the quality factor.
3. After opening an h5 file containing results from a relaxation experiment, plot the response at a particular point and voltage, run exponential fitting, and then store the results of the fit in the same h5 file using iohdf and/or the numpy translators.
4. Take a FORC IV ESM dataset and break it up into forward and reverse branches, along with positive and negative branches. Do correlation analysis between PFM and IV for the different branches, store the results in the file, and readily access them for plotting again.
5. A guide to using the model fitter for parallel fitting of numpy array-style datasets. This one can be merged with number 
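
A bare-bones skeleton of goal 1. Every file name and internal path below is a hypothetical placeholder; the actual layout depends on the translator and the SHO fitter:

.. code:: python

   import h5py
   import matplotlib.pyplot as plt

   # file name and internal paths are hypothetical placeholders
   with h5py.File('forc_pfm.h5', mode='r') as h5_f:
       raw = h5_f['Measurement_000/Channel_000/Raw_Data']

       pixel = 0                 # location of interest
       spectrum = raw[pixel]     # complex-valued raw spectrogram slice

       # the corresponding SHO guess / fit datasets would be read the same
       # way and overlaid on these axes
       fig, axes = plt.subplots(ncols=2, figsize=(8, 4))
       axes[0].plot(spectrum.real)
       axes[0].set_title('real')
       axes[1].plot(spectrum.imag)
       axes[1].set_title('imaginary')
       plt.show()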

Documentation
-------------
*	Switch from static examples to dynamic, jupyter-notebook-like examples (see the sphinx-gallery sketch at the end of this list):
   * http://scikit-image.org/docs/dev/auto_examples/ 
   * http://scikit-learn.org/stable/auto_examples/index.html 
   * More complicated analyses - http://nipy.org/dipy/examples_index.html
   * Done for existing documentation
   * Work will be needed after examples are done
*	Include examples in documentation
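
The linked scikit galleries are built with sphinx-gallery. A minimal conf.py sketch, assuming the example scripts live in an ``examples/`` directory at the repository root (the directory names are assumptions):

.. code:: python

   # in docs/conf.py
   extensions = [
       'sphinx.ext.autodoc',
       'sphinx_gallery.gen_gallery',   # builds .py example scripts into galleries
   ]

   sphinx_gallery_conf = {
       'examples_dirs': '../examples',   # where the .py example scripts live
       'gallery_dirs': 'auto_examples',  # where the rendered gallery is written
   }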

Formatting changes
------------------
*	Fix remaining PEP8 problems
*	Ensure code and documentation is standardized
*	Switch to standard version formatting
*	Classes and major functions should check whether the results already exist

Notebooks
---------
*	Direct downloading of notebooks (ipynb and html)
   * nbviewer?
   * Host somewhere other than github?
*	Investigate using JupyterLab

Testing
-------
*	Write test code (see the pytest sketch at the end of this list)
*	Unit tests for simple functions
*	Longer tests using data (real or generated) for the workflow tests
*	Measure coverage using codecov.io and the codecov package
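
A minimal pytest-style unit test, using a toy function as a stand-in for the real utilities (all names here are hypothetical):

.. code:: python

   # test_utils.py - run with:  pytest test_utils.py
   import numpy as np

   def normalize(vec):
       """Toy stand-in for a pycroscopy utility: scale a vector to [0, 1]."""
       vec = np.asarray(vec, dtype=float)
       return (vec - vec.min()) / (vec.max() - vec.min())

   def test_normalize_bounds():
       result = normalize([1, 2, 5])
       assert result.min() == 0.0
       assert result.max() == 1.0

   def test_normalize_preserves_ordering():
       result = normalize([3, 1, 2])
       assert list(np.argsort(result)) == [1, 2, 0]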

Software Engineering
--------------------
* Consider releasing bug fixes (to onsite CNMS users) via git instead of rapid pypi releases 
   * example release steps (incl. git tagging): https://github.com/cesium-ml/cesium/blob/master/RELEASE.txt
* Use https://docs.pytest.org/en/latest/ instead of nose (nose is no longer maintained)
* Add requirements.txt
* Consider facilitating conda installation in addition to pypi

Scaling to clusters
-------------------
We have two kinds of large computational jobs and one kind of large I/O job:

* I/O - reading and writing large amounts of data
   * Dask and MPI are compatible; Spark probably is not
* Computation
   1. Machine learning and Statistics
   
      1.1. Use custom algorithms developed for BEAM
         * Advantage - Optimized (and tested) for various HPC environments
         * Disadvantages:
            * Need to integrate non-python code
            * We only have a handful of these; NOT future-compatible
      1.2. OR continue using a single FAT node for these jobs
         * Advantages:
            * No optimization required
            * Continue using the same scikit learn packages
         * Disadvantage - not optimized for HPC
      1.3. OR use pbdR / write pbdPy (wrappers around pbdR)
         * Advantages:
            * Already optimized / mature project
            * In-house project (good support)
         * Disadvantages:
            * Dependent on pbdR for implementing new algorithms
            
   2. Parallel parametric search - the analysis subpackage and some user-defined functions in processing. Can be extended using:

      * Dask - a drop-in replacement for multiprocessing that will work on laptops and clusters. More elegant and easier to write and maintain than MPI, at the cost of some efficiency (see the sketch below)
         * Simple dask netcdf example: http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3
      * MPI - needs alternatives to the Optimize / Process classes - better efficiency, but a pain to implement
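
A minimal dask.distributed sketch of the drop-in pattern mentioned above. The same code runs on a laptop (LocalCluster) or on a cluster by pointing Client at a scheduler address; the worker count and the toy per-position function are assumptions:

.. code:: python

   import numpy as np
   from dask.distributed import Client, LocalCluster

   def fit_one_position(spectrum):
       """Toy per-position computation standing in for a real fit."""
       return float(np.max(spectrum))

   if __name__ == '__main__':
       # on a cluster, replace LocalCluster with the scheduler address,
       # e.g. Client('tcp://scheduler:8786')
       client = Client(LocalCluster(n_workers=4))

       data = np.random.rand(256, 1024)           # positions x spectral points
       futures = client.map(fit_one_position, list(data))
       results = client.gather(futures)           # one value per position

       client.close()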