This project is mirrored from https://github.com/mantidproject/mantid.git.
- Jan 20, 2020
-
-
Adam J. Jackson authored
See commit notes for CASTEP data update
-
Adam J. Jackson authored
Reference files for data import need updating for new advanced params
-
Adam J. Jackson authored
Test data needs updating due to changes in the Advanced Parameters. Have eyeballed some of the file changes and the changes seem fine. (There's also an increase in the written precision of atom positions, but that will be related to the new test data writer.) CASTEP data updated here, other codes to follow.
-
Adam J. Jackson authored
Abins uses a base class for Instruments, which defines a couple of methods that return None. None is not the right response when these methods have not been overridden in a child class; raising NotImplementedError marks them more clearly.
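A minimal sketch of the pattern (class and method names here are illustrative, not the actual Abins API):

    class Instrument(object):
        """Hypothetical base class standing in for the Abins Instrument."""

        def calculate_sigma(self, frequencies):
            # A child class must override this; returning None would let a
            # missing implementation pass silently, so fail loudly instead.
            raise NotImplementedError()


    class Tosca(Instrument):
        def calculate_sigma(self, frequencies):
            # A real instrument supplies its resolution function here
            # (illustrative expression only).
            return frequencies * 0.01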
-
Adam J. Jackson authored
-
Adam J. Jackson authored
(And a typo in one of the test cases.)
-
Adam J. Jackson authored
The initial implementation assumed that the input series of sigma values would monotonically increase, which is reasonable for TOSCA but not for other instruments. The logic is modified so that, instead of the input data, the list of sigma values used for convolution is required to be sorted. As this list is generated within the same function and spans (min(sigma), max(sigma)), the condition is always satisfied. The unit tests are not affected, but there may be a small numerical difference as another implementation detail has been fixed in the process: the interpolation x-values ('sigma factors') should be the ratio between sigma and sigma at the lower bound of the current window. In the initial implementation this was mistakenly taken as sigma_chunk[0], the left-most value in the block _out of the actual input sigma values_; it should be the exact convolved width from `sigma_samples`.
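A hedged sketch of the corrected behaviour (only `sigma_samples` is named above; everything else here is illustrative):

    import numpy as np

    def get_sigma_factors(sigma, spacing=np.sqrt(2)):
        # sigma_samples is generated here from the range of the input widths,
        # so it is sorted by construction; the input sigma need not be monotonic.
        n_samples = max(int(np.ceil(np.log(sigma.max() / sigma.min())
                                    / np.log(spacing))) + 1, 2)
        sigma_samples = sigma.min() * spacing ** np.arange(n_samples)

        # Index of the convolved width at the lower bound of each window
        lower = np.clip(np.searchsorted(sigma_samples, sigma, side='right') - 1,
                        0, n_samples - 2)

        # Interpolation x-value: ratio of the target sigma to the exact
        # convolved width sigma_samples[lower], not to the left-most value of
        # the corresponding block of input sigma values.
        return sigma / sigma_samples[lower]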
-
Adam J. Jackson authored
-
Adam J. Jackson authored
A new method is provided in AbinsModules.CalculateS to generate a test data file for the AbinsCalculateSPowder test. This format isn't great and should be refined to something like standard JSON, but at least we now have a standard way of generating it. The input options are a few calculation parameters and a precalculated AbinsData object. To make these objects easier to instantiate, a from_calculation_data() classmethod is added. The logic for identifying and calling a file loader has been moved from the main Abins algorithm into AbinsData to support this without redundancy. It may make sense to move this again to somewhere else, e.g. a new *AbinsModules/Loaders/__init__.py* but for now this is close to the point of use. Equivalence of this method's output to the existing test file has been verified manually for the Si2 order-1 test data. The reference file does not need to be updated, and tests should continue to pass.
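A hypothetical usage sketch (the argument names and file name are guesses for illustration; the real signature of from_calculation_data() may differ):

    from AbinsModules import AbinsData

    # The loader is identified and called internally, so a ready-to-use
    # AbinsData object comes straight from the ab initio output file.
    abins_data = AbinsData.from_calculation_data(filename='Si2-sc.phonon',
                                                 ab_initio_program='CASTEP')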
-
Adam J. Jackson authored
The sample data for the CalculateSPowder test needs to be updated as part of the work on broadening, because Abins now produces different results. Specifically: the instrumental broadening function applied to the S data has been replaced and is now smooth; the reference frequencies are now defined at the middle of the histogram bins rather than at the edges. This difference in binning conventions has actually changed the size of the results array (by 1!). The format of the sample data for CalculateSPowder is not very pretty; it is some kind of raw text dump. It would be nice to avoid such things with a proper JSON serialisation setup, but for now the priority is to fix the test in a way that is comparable with previous behaviour. The process used to generate the replacement file was to obtain the calculation data with AbinsModules.CalculateS.init(...args...).get_formatted_data() and dump the text representation to a file:

    numpy.set_printoptions(threshold=numpy.nan)
    with open('Si2-sc-CalculateSPowder_S.txt', 'w') as f:
        f.write(str(calculated_data.extract()))

The resulting file appears more neatly formatted than its predecessor (4b77bac3f8c1dc54f63bd41ca3075c48), but is still acceptable to the test parser.
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
Normalise Gaussians by multiplying by the bin width instead of dividing by sum(values). The sum approach can go badly wrong if values lie near or beyond the sampling range, as a tail fragment (or noise floor!) would be amplified to unity. Scaling by the theoretical value as we do here has a different downside: truncated broadening kernels will not quite sum up to 1, so a little intensity is lost in broadening. This error is more tolerable, especially as it can be decreased by extending the truncation range.
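A small illustrative sketch of the two normalisation strategies (not the actual Abins code):

    import numpy as np

    def gaussian_kernel(sigma, points, bin_width):
        values = np.exp(-points ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

        # Old approach: values /= values.sum()
        # If the sampling grid only catches a tail of the peak (or a noise
        # floor), that fragment is amplified up to unit weight.

        # New approach: scale the analytic form by the bin width. A
        # well-sampled kernel then sums to ~1; a truncated kernel loses a
        # little intensity, an error which shrinks as the truncation range
        # is extended.
        return values * bin_width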
-
Adam J. Jackson authored
-
Adam J. Jackson authored
To avoid polluting the dev-docs with algorithm-specific info and to avoid storing/updating many image files, the Abins development notes are moved back into Concepts. The fast broadening method is broken out into a separate document to avoid too much wading through plots.
-
Adam J. Jackson authored
Move deprecation plans to the top of this document to make them more visible. The broadening section is a little rambling and will probably remain so until the theory is laid out more formally. Images could use a few tweaks (more consistent sizing, remove unnecessary x values) but I don't want to fill the Git history with small changes to image files.
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
Figures are included as image files and the code to generate them is included as a comment. This doesn't seem ideal as they may be revised and bloat the repo (or become outdated); is there any workflow to generate images on the fly when building the docs? The section on the interpolation method is a bit long and should perhaps be spun out into another document.
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
- The 'legacy' broadening method is removed. It is not recommended for use in any circumstance.
- A 'none' option is provided which simply resamples, for reference/test purposes.
- Some other broadening schemes are consolidated:
  - The 'gaussian_trunc' method is removed and the superior 'gaussian_trunc_windowed' is renamed to 'gaussian_truncated'.
  - The different summation approaches are no longer exposed to the choice of broadening method; a default 'auto' choice selects between the two summation schemes based on the size of the data.
- An 'auto' broadening scheme is introduced, and from here all Abins Instruments are expected to provide one. For TOSCA it switches between 'gaussian_truncated' and 'interpolate' depending on the number of input frequencies (a sketch of this selection follows below).
- Responsibility for deciding to resample is best located at the Instrument level, as this needs to trigger the calculation of an appropriate set of broadening widths (sigma).
- The broadening method selection is now drawn from AbinsParameters during SPowderSemiEmpiricalCalculator; the motivation here is to move 'Abins state' information as high up in the program as appropriate, so that information about which modes/settings are used is generally determined by function arguments. This is more maintainable as it allows lower-level objects/functions to be tested independently of a full Abins run.

Performance tests are interesting at this point. Testing _broaden_spectrum_ with some synthetic data:

| Scheme             | Average time | Max time |
|--------------------+--------------+----------|
| none               |       0.0002 |   0.0005 |
| interpolate        |       0.0035 |   0.0046 |
| interpolate_coarse |       0.0029 |   0.0034 |
| gaussian_truncated |       0.0241 |   0.0269 |
| normal_truncated   |       0.0245 |   0.0257 |
| gaussian           |       0.2760 |   0.2794 |
| normal             |       0.2422 |   0.2525 |

So there is a clear hierarchy of methods with order-of-magnitude performance differences. Comparing full runs of Abins on crystalline benzene up to 3rd order:

| Scheme             | Time (s) |
|--------------------+----------|
| none               |    12.46 |
| interpolate        |    21.85 |
| interpolate_coarse |    21.28 |
| auto               |    21.78 |
| gaussian_truncated |    28.22 |
| normal_truncated   |    27.94 |

If we assume that 'none' is essentially pure overhead, then 'interpolate' costs around 10s... and the other methods are less than twice as expensive. Where has the performance difference gone? Try another system: a toluene molecule up to 4th order.

| Scheme             | Time (s) |
|--------------------+----------|
| none               |    1.905 |
| interpolate        |    3.188 |
| interpolate_coarse |    2.844 |
| auto               |    2.723 |
| gaussian_truncated |    5.887 |
| normal_truncated   |    5.943 |
| gaussian           |    57.81 |
| normal             |    50.53 |

It is interesting that "normal" is quicker than "gaussian" in this case; the "max time" column of the faster test has generally pointed to this method being more _consistently_ fast than Gaussian, even if slower on _average_. I suspect some kind of caching magic. "Auto" has beaten "interpolate" this time; this could be due to a saving on the first order, which doesn't really need binning. In any case, it is disappointing that the factor-10 improvement from interpolation seems to reduce to a factor of 2 in practice. Perhaps further optimisation is possible, but there may also be a system-dependence at work. (The smaller test uses data randomly distributed over the whole spectrum, while real data will cluster.) In any case, with 'auto' we seem to have a strategy that brings the cost of broadening roughly into line with other overheads in the program.
There is a known route by which it might be possible to optimise 'interpolate' further: limiting the range over which each convolution is performed. This could be quite complicated in practice, and there are likely to be more fruitful avenues for improving the performance of Abins. (e.g. combining data from different atoms before broadening.)
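The selection sketch mentioned above, as an illustrative guess at how 'auto' might choose for TOSCA (the actual crossover criterion and threshold are not stated here):

    def select_broadening_scheme(frequencies, threshold=50):
        # Illustrative only: for a small number of discrete peaks a direct
        # sum of truncated Gaussians is cheap and accurate; for dense spectra
        # the interpolated convolution approach is much faster.
        if len(frequencies) <= threshold:
            return 'gaussian_truncated'
        return 'interpolate'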
-
Adam J. Jackson authored
The number of kernels was accidentally rounded up twice, leading to an empty chunk which needed to be accounted for. Removing this seems to have no impact on the results.
-
Adam J. Jackson authored
Fast interpolated broadening is now moved into a more general/extensible function, including ascii-art documentation. sqrt(2) spacing is implemented, and initial testing shows that this noticeably improves accuracy for little extra cost. Set this as the default, and for now leave in the coarser option for test purposes. A flag is also provided to skip rebinning inside this function; this potentially avoids an expensive redundant operation but makes for convoluted logic when this rebin is also considered inside the Instrument. A cleaner scheme should be possible, in which it is obvious where rebinning should happen.
-
Adam J. Jackson authored
Clean up some of the experimental broadening schemes into a more systematic framework. Add comments and ASCII art to explain the windowing procedure.
-
Adam J. Jackson authored
A normalisation term was found in the Gaussian function which uses sigma and the FWHM to fix the maximum value of the peak as sigma is varied. This does not appear to be consistent with the function defined in the Abins paper, and leads to an increasing weight for high-energy terms. This term is removed. Instead, a normalisation feature is added which normalises the sum of the output values to one. This is required for a convolution kernel that does not change the overall intensity, and removes any dependence on the bin width. The cost of this process is not insignificant; it may be more efficient to use the analytic integral ("normal" function), which is 'pre-normalised'. Testing with different bin widths finds that the differences in the resulting curves are negligible at typical bin widths; with large bins the results, while different, are not obviously superior or inferior. Work has started on refactoring the broadening methods that use truncated functions, by providing a truncation function that accepts the `gaussian` or `normal` function as an input.
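A hedged sketch of that refactoring direction (the function names mirror those mentioned above; bodies and signatures are illustrative, not the Abins code):

    import numpy as np
    from scipy.special import erf

    def gaussian(sigma, points, center=0.0):
        # Point-sampled Gaussian, renormalised so the output sums to one.
        values = np.exp(-(points - center) ** 2 / (2 * sigma ** 2))
        return values / values.sum()

    def normal(sigma, points, center=0.0):
        # Analytic integral over each bin ("pre-normalised"), so no extra
        # normalisation step is needed.
        half_width = (points[1] - points[0]) / 2
        edges = np.concatenate([points - half_width, [points[-1] + half_width]])
        cdf = 0.5 * (1 + erf((edges - center) / (np.sqrt(2) * sigma)))
        return cdf[1:] - cdf[:-1]

    def truncated(function, sigma, points, center=0.0, limit=3.0):
        # Evaluate `function` only within +/- limit * sigma of the peak centre.
        window = np.abs(points - center) < limit * sigma
        values = np.zeros(len(points))
        values[window] = function(sigma=sigma, points=points[window], center=center)
        return values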
-
Adam J. Jackson authored
Work in progress on the broadening features of Abins. The Instrument class is becoming bloated with generic broadening functions, so these are moved into their own module. In contrast with most of Abins, this module is not object-oriented and consists of top-level functions; these are primarily for use by Instrument implementations. The "legacy" broadening logic is removed entirely and this label is now used for the implementation of the legacy behaviour within the new broadening logic (replacing the "legacy_plus" label). Several redundant broadening implementations are currently in place while their performance/scaling is tested.
-
Adam J. Jackson authored
This is a rough implementation of a new scheme for frequency-dependent broadening. The instrumental broadening function for TOSCA has an energy-dependent width; while there are several ways of sequencing and implementing the process, it generally requires a fresh Gaussian to be evaluated at every frequency bin and those Gaussians to be combined in a weighted sum. We note that a Gaussian function may be approximated by a linear combination of Gaussians with width parameters (sigma) that bracket the target and are not too far away. An error of ~1% may be reached by limiting the sigma span to a factor of sqrt(2), while a factor of two gives an error of ~5%. Given the relevant range of widths for instrumental broadening, a suitable collection of broadening functions may be obtained with just a few logarithmically-spaced function evaluations. In this proof-of-concept, the spectrum is convolved with five Gaussian kernels, logarithmically spaced by factors of two. At each bin, the broadened value is drawn from a corresponding broad spectrum, obtained by interpolation from those at neighbouring sigma values. This interpolation involves some "magic numbers" in the form of a cubic function fitted to reproduce a Gaussian as closely as possible from wider and narrower functions. The main assumption made in this approach is that the broadening function is short-ranged relative to the rate of change in width. Compared to a sum of functions centered at each bin, this method introduces a slight asymmetry to the peaks. The benefit is a drastically reduced cost of evaluation; the implementation here is far from optimal (as it convolves larger spectral regions than necessary) and reduces the runtime of Abins by almost half compared to the fastest implementation of a full sum.
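A rough illustrative sketch of the scheme (a simple linear mix of the bracketing spectra is used here; the actual proof-of-concept uses the fitted cubic "magic numbers" mentioned above):

    import numpy as np

    def interpolated_broadening(hist, sigma, bin_width, spacing=2.0):
        # `hist` is the binned spectrum and `sigma` the instrumental width at
        # each bin centre. Illustrative only; not the Abins implementation.

        # A few logarithmically-spaced kernel widths spanning the sigma range
        n_kernels = max(int(np.ceil(np.log(sigma.max() / sigma.min())
                                    / np.log(spacing))) + 1, 2)
        sigma_samples = sigma.min() * spacing ** np.arange(n_kernels)

        # Convolve the whole spectrum once per kernel width (kernels are
        # assumed to be much shorter than the spectrum)
        points = np.arange(-3 * sigma.max(), 3 * sigma.max() + bin_width, bin_width)
        kernels = [bin_width * np.exp(-points ** 2 / (2 * s ** 2))
                   / (s * np.sqrt(2 * np.pi)) for s in sigma_samples]
        spectra = np.array([np.convolve(hist, kernel, mode='same')
                            for kernel in kernels])

        # At each bin, mix the two broadened spectra whose widths bracket the
        # local sigma; a linear mix stands in for the fitted cubic weighting.
        lower = np.clip(np.searchsorted(sigma_samples, sigma) - 1, 0, n_kernels - 2)
        mix = (np.log(sigma / sigma_samples[lower])
               / np.log(sigma_samples[lower + 1] / sigma_samples[lower]))
        rows = np.arange(len(hist))
        return (1 - mix) * spectra[lower, rows] + mix * spectra[lower + 1, rows]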
-
Adam J. Jackson authored
A considerable speedup is obtained by limiting the frequency range of the Gaussian functions. Performance is comparable to the legacy implementation - if a similar number of samples is used! 3*max(sigma) corresponds to about 150 points, while the default number of samples in the legacy implementation is just 50 points spread over the same distance. If these numbers are brought into line, the performance is quite similar. Interestingly, the for-loop summation and np.histogram summation change their ranking for speed depending on the number of samples. We should keep both approaches while developing this method, and see which is fastest for production calculations. The names aren't great and there's a lot of redundancy - this should be cleaned up once the best approaches are established. There isn't much reason to keep the legacy mode around either, as it gives objectively wrong results.
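An illustrative sketch of the two summation routes being compared (the 3*sigma range and 50 samples come from the note above; normalisation is omitted for brevity and the function names are not the real ones):

    import numpy as np

    def sum_with_loop(frequencies, s_dft, sigma, bins, n_samples=50):
        # One truncated Gaussian per peak, binned and accumulated in a loop
        total = np.zeros(len(bins) - 1)
        for freq, weight, s in zip(frequencies, s_dft, sigma):
            points = np.linspace(freq - 3 * s, freq + 3 * s, n_samples)
            values = weight * np.exp(-(points - freq) ** 2 / (2 * s ** 2))
            chunk, _ = np.histogram(points, bins=bins, weights=values)
            total += chunk
        return total

    def sum_with_histogram(frequencies, s_dft, sigma, bins, n_samples=50):
        # Build every sample point up front and bin them in one histogram call
        offsets = np.linspace(-3, 3, n_samples)
        points = frequencies[:, None] + offsets[None, :] * sigma[:, None]
        values = s_dft[:, None] * np.exp(-offsets[None, :] ** 2 / 2)
        total, _ = np.histogram(points.ravel(), bins=bins, weights=values.ravel())
        return total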
-
Adam J. Jackson authored
In order to work on the broadening in Abins, some adjustments are being made to how these functions are called, and it is now possible to request different broadening schemes. A reimplementation of the existing broadening, labelled 'legacy', is planned for ease of comparison. At this stage 'legacy' is the actual legacy implementation, while 'legacy_plus' is a similar implementation in the new framework; this will replace the 'legacy' code as soon as possible. The 'legacy' and 'legacy_plus' schemes give very similar results and timings, so while 'legacy_plus' does not provide full compatibility it is a fair basis for comparison with new methods.
-
Adam J. Jackson authored
-
Gigg, Martyn Anthony authored
Remove Confusion from SaveNISTDAT example
-
Gigg, Martyn Anthony authored
Create total scattering pdf rebin output
-
- Jan 18, 2020
-
-
Gigg, Martyn Anthony authored
Windows: Fix switching between Python 2/3 without clean
-
Gigg, Martyn Anthony authored
Also uses find_package rather than handwritten code to find Python libraries, so that the unsetting code works as expected.
-
- Jan 17, 2020
-
-
DanielMurphy22 authored
I've added the link back in, as the saveNISTDAT_data.nxs file was already in the DocTest data and had just been removed from the usage data, so this was the simplest thing to do. This file is now in the UsageData.
-
DanielMurphy22 authored
-