.. contents::

Roadmap / Milestones
--------------------
1. End of Sep 2017 - Cleaned versions of the main modules (analysis pending), plus enough documentation for users and developers
2. Mid Oct 2017 - Multi-node compute capability

New features
------------
Core development
~~~~~~~~~~~~~~~~
* A new class (pycro_data?) for simplifying the many data slicing and referencing operations on **main** datasets.
    * Essentially, the goal is to turn the **main** datasets into powerful python objects that obviate the need for users to dig into ancillary datasets in order to slice or understand the data. Pycroscopy chooses a rather generalized representation of data at the cost of simplicity; this object should bring back the simplicity of accessing the data.
    * In the process of enabling greater insight into a dataset, this class would read and analyze the ancillary datasets once and reuse this knowledge when the user requests another operation (which most likely also requires references to the ancillary datasets anyway).
    * Nearly all the functionality has already been implemented in hdf_utils, and some in io_utils, so this class can simply reuse those general functions. A sketch of one possible interface follows this list.
* Generic visualizer in plot.ly / dash? that can use the pycro_data class
* Simplify and demystify analysis / optimize. Use parallel_compute (joblib instead of multiprocessing); a joblib sketch follows this list.
* Multi-node computing capability in parallel_compute
* Data Generators
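
A minimal sketch of the interface the pycro_data bullet above describes, assuming the class wraps an h5py dataset and that the ancillary datasets are linked via attribute references as in current pycroscopy files (the class name and method are hypothetical, not a final API):

.. code-block:: python

    import h5py


    class PycroData(h5py.Dataset):
        """Hypothetical wrapper that makes a **main** dataset self-describing."""

        def __init__(self, h5_main):
            # Bind this object to the existing HDF5 dataset
            super(PycroData, self).__init__(h5_main.id)
            # Read the ancillary index matrices once, cache them as numpy
            # arrays, and reuse them for every later slicing request
            self.spec_inds = self.file[self.attrs['Spectroscopic_Indices']][()]
            self.pos_inds = self.file[self.attrs['Position_Indices']][()]

        def n_dim_form(self):
            """Reshape the flattened 2D matrix to its N-dimensional form,
            assuming the position / spectroscopic dimensions are full grids."""
            pos_dims = [len(set(col)) for col in self.pos_inds.T]
            spec_dims = [len(set(row)) for row in self.spec_inds]
            return self[()].reshape(pos_dims + spec_dims)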
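
For the parallel_compute item above, a joblib-based version could be as small as this sketch (the function name matches the bullet, but the signature shown here is an assumption):

.. code-block:: python

    import numpy as np
    from joblib import Parallel, delayed


    def parallel_compute(data, func, cores=2, func_args=None):
        """Apply func to every position (row) of data on multiple cores."""
        func_args = func_args if func_args is not None else []
        results = Parallel(n_jobs=cores)(
            delayed(func)(row, *func_args) for row in data)
        return np.array(results)


    # Example: find the index of the strongest response at each position
    raw_data = np.random.rand(1024, 256)  # positions x spectral steps
    peak_inds = parallel_compute(raw_data, np.argmax)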

External user contributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Li Xin's classification code
* Ondrej Dyck’s atom finding code – written, but needs work before it is fully integrated
* Nina Wisinger’s processing code (Tselev) – in progress
* Sabine Neumeyer's cKPFM code
* Iaroslav Gaponenko's distortion correction code from https://github.com/paruch-group/distortcorrect
* Port everything remaining from the IFIM Matlab codebase as Matlab -> Python translation exercises
* Other workflows/functions that already exist as scripts or notebooks

Plotting updates
----------------
* Switch to using plot.ly and dash for interactive elements
* Possibly use MayaVi for 3d plotting

Examples / Tutorials
--------------------
Short tutorials on how to use pycroscopy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Access h5 files
* Find a specific dataset/group in the file
* Select data within a dataset in various ways (a combined h5py sketch for these first three bullets follows this list)
* Micro datasets / micro data groups
* Chunking the main dataset
* Links to tutorials on how to use PyCharm, Git, etc.
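
A minimal h5py walk-through covering the first three bullets (the file name and dataset name are placeholders):

.. code-block:: python

    import h5py

    with h5py.File('measurement.h5', mode='r') as h5_file:

        # Find every dataset named 'Raw_Data' anywhere in the file
        main_dsets = []

        def collect(name, obj):
            if isinstance(obj, h5py.Dataset) and name.endswith('Raw_Data'):
                main_dsets.append(obj)

        h5_file.visititems(collect)

        # Select data in various ways: a single position, a spectral
        # window, or an arbitrary (increasing) list of positions
        h5_main = main_dsets[0]
        one_pixel = h5_main[0]
        spec_window = h5_main[:, 10:100]
        some_pixels = h5_main[[0, 5, 42], :]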

Longer examples (via specific scientific use cases)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Done:

* Data formatting in pycroscopy
* How to write a Translator
* How to write (back) to H5
* Spectral Unmixing with pycroscopy
* Basic introduction to loading data in pycroscopy
* Handling multidimensional (6D) datasets
* Visualizing data (interactively, using widgets) (still needs some small automation at the end)
* How to write your own parallel computing function using the process module

Pending:

* How to write your own analysis class
* A tour of the many functions in hdf_utils and io_utils, since these functions need data to demonstrate / explain them
* pycroscopy package organization - a short writeup on what is where and the differences between the process / analysis submodules

Rama's (older and more applied / specific) tutorial goals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Open a translated and fitted FORC-PFM file, and plot the SHO fit from cycle k corresponding to voltage p, along with the raw spectrogram for that location and the SHO guess. Plot both the real and imaginary components, and do so for both on- and off-field.
2. Continuing the above, determine the average quality factor from cycles 1, 3, and 4 for the spatial points stored in vector b, for the on-field part, over a predetermined voltage range given by the endpoints [e, f]. Compare the results with the SHO guess and fit for the quality factor.
3. After opening an h5 file containing results from a relaxation experiment, plot the response at a particular point and voltage, run exponential fitting, and then store the results of the fit in the same h5 file using the iohdf and/or numpy translators (see the fitting sketch after this list).
4. Take a FORC IV ESM dataset and break it up into forward and reverse branches, along with positive and negative branches. Do correlation analysis between PFM and IV for the different branches, store the results in the file, and readily access them for plotting again.
5. A guide to using the model fitter for parallel fitting of numpy array-style datasets. This one can be merged with number 
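
For item 3, the exponential fitting step might look like this sketch (the file path, dataset path, decay model, and time step are all assumptions for illustration):

.. code-block:: python

    import h5py
    import numpy as np
    from scipy.optimize import curve_fit


    def exp_decay(t, amp, tau, offset):
        """Simple single-exponential relaxation model."""
        return amp * np.exp(-t / tau) + offset

    with h5py.File('relaxation.h5', mode='r+') as h5_file:
        h5_main = h5_file['Measurement_000/Channel_000/Raw_Data']
        response = h5_main[42, :]                 # one position, all steps
        time = np.arange(response.size) * 1.0E-3  # assumed 1 ms per step

        popt, _ = curve_fit(exp_decay, time, response, p0=[1.0, 0.01, 0.0])

        # Write the fit coefficients back into the same file
        h5_file.create_dataset('Measurement_000/Channel_000/Exp_Fit', data=popt)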

Documentation
-------------
* Switch from static examples to dynamic, Jupyter-notebook-like examples (a sphinx-gallery configuration sketch follows this list):
   * http://scikit-image.org/docs/dev/auto_examples/ 
   * http://scikit-learn.org/stable/auto_examples/index.html 
   * more complicated analyses -  http://nipy.org/dipy/examples_index.html
   * Done for existing documentation
   * Work will be needed after examples are done
* Include examples in documentation
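
The scikit-image and scikit-learn galleries linked above are built with sphinx-gallery; assuming pycroscopy adopts the same tool, the conf.py additions would look roughly like this (directory names are assumptions):

.. code-block:: python

    # conf.py -- additions for sphinx-gallery
    extensions = [
        'sphinx.ext.autodoc',
        'sphinx_gallery.gen_gallery',
    ]

    sphinx_gallery_conf = {
        'examples_dirs': 'examples',      # annotated example scripts live here
        'gallery_dirs': 'auto_examples',  # generated rst / notebooks land here
    }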

Formatting changes
------------------
* Fix remaining PEP8 problems
* Ensure code and documentation are standardized
* Switch to standard version formatting
* Classes and major functions should check whether the results already exist

Notebooks
---------
* Direct downloading of notebooks (ipynb and html)
  * nbviewer?
  * Host somewhere other than github?
* Investigate using JupyterLab

Testing
-------
* Write test code
* Unit tests for simple functions
* Longer tests using data (real or generated) for the workflow tests
* Measure coverage using codecov.io and the codecov package (a minimal pytest example follows this list)
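
A unit test for a simple function might look like the sketch below, written for pytest as suggested under Software Engineering (`stack_real_to_complex` is a hypothetical stand-in for any small pycroscopy utility function):

.. code-block:: python

    # test_utils.py -- run with `pytest`
    import numpy as np
    import pytest


    def stack_real_to_complex(arr):
        """Treat the first half of arr as real parts, the second as imaginary."""
        arr = np.asarray(arr)
        if arr.size % 2:
            raise ValueError('array must have an even number of elements')
        half = arr.size // 2
        return arr[:half] + 1j * arr[half:]


    def test_round_trip():
        comp = stack_real_to_complex([1, 3, 2, -4])
        assert np.allclose(comp, [1 + 2j, 3 - 4j])


    def test_odd_length_rejected():
        with pytest.raises(ValueError):
            stack_real_to_complex([1, 2, 3])

Coverage can then be collected with the pytest-cov plugin (``pytest --cov=pycroscopy``) and uploaded to codecov.io.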

Software Engineering
--------------------
* Consider releasing bug fixes (to onsite CNMS users) via git instead of rapid pypi releases 
   * example release steps (incl. git tagging): https://github.com/cesium-ml/cesium/blob/master/RELEASE.txt
* Use https://docs.pytest.org/en/latest/ instead of nose (nose is no longer maintained)
* Add requirements.txt
* Consider facilitating conda installation in addition to pypi

Scaling to clusters
-------------------
We have two kinds of large computational jobs and one kind of large I/O job:

* I/O - reading and writing large amounts of data
   * Dask and MPI are compatible; Spark is probably not.
* Computation
   1. Machine learning and Statistics
   
      1.1. Use custom algorithms developed for BEAM
         * Advantage - Optimized (and tested) for various HPC environments
         * Disadvantages:
            * Need to integrate non-python code
            * We only have a handful of these. NOT future-compatible
      1.2. OR continue using a single FAT node for these jobs
         * Advantages:
            * No optimization required
            * Continue using the same scikit-learn packages
         * Disadvantage - Not optimized for HPC
      1.3. OR use pbdR / write pbdPy (wrappers around pbdR)
         * Advantages:
            * Already optimized / mature project
            * In-house project (good support)
         * Disadvantages:
            * Dependent on pbdR for implementing new algorithms
            
   2. Parallel parametric search - the analyze subpackage and some user-defined functions in processing. Can be extended using:
   
      * Dask - An in-place replacement for multiprocessing that will work on laptops and on clusters. More elegant and easier to write and maintain than MPI, at the cost of some efficiency (see the sketch at the end of this section)
         * simple dask netcdf example: http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3
      * MPI - Needs alternatives to the Optimize / Process classes - better efficiency, but a pain to implement
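
As a feasibility sketch of the Dask route above (the worker function, array sizes, and scheduler setup are placeholders), a multiprocessing-style map over positions becomes:

.. code-block:: python

    import numpy as np
    from dask.distributed import Client


    def fit_pixel(spectrum):
        """Placeholder per-pixel analysis: index of the strongest response."""
        return int(np.argmax(spectrum))

    if __name__ == '__main__':
        # Client() starts a local scheduler; pointing it at a scheduler
        # address ('tcp://...') is all that changes on a real cluster
        client = Client()

        raw_data = np.random.rand(512, 1024)  # positions x spectral steps
        futures = client.map(fit_pixel, list(raw_data))  # one task per pixel
        results = client.gather(futures)
        client.close()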