This project is mirrored from https://github.com/mantidproject/mantid.git.
- Jan 20, 2020
-
-
Adam J. Jackson authored
See commit notes for CASTEP data update
-
Adam J. Jackson authored
Reference files for data import need updating for new advanced params
-
Adam J. Jackson authored
Test data needs updating due to changes in the Advanced Parameters. Have eyeballed some of the file changes and the changes seem fine. (There's also an increase in the written precision of atom positions, but that will be related to the new test data writer.) CASTEP data updated here, other codes to follow.
-
Adam J. Jackson authored
Abins uses a base class for Instruments, which defines a couple of methods that return None. None is not the right response when these methods have not been overridden in a child class; raising NotImplementedError marks them more clearly.
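A minimal sketch of the pattern (class and method names here are illustrative, not the actual Abins API):

    class Instrument(object):
        """Hypothetical base class standing in for the Abins Instrument."""

        def calculate_sigma(self, frequencies):
            # A child class must override this; returning None would let a
            # missing implementation pass silently, so fail loudly instead.
            raise NotImplementedError()


    class Tosca(Instrument):
        def calculate_sigma(self, frequencies):
            # A real instrument supplies its resolution function here
            # (illustrative expression only).
            return frequencies * 0.01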
-
Adam J. Jackson authored
-
Adam J. Jackson authored
(And a typo in one of the test cases.)
-
Adam J. Jackson authored
The initial implementation assumed that the input series of sigma values would monotonically increase, which is reasonable for TOSCA but not for other instruments. The logic is modified so that, instead of the input data, the list of sigma values used for convolution is required to be sorted. As this list is generated within the same function and spans (min(sigma), max(sigma)), the condition is always satisfied. The unit tests are not affected, but there may be a small numerical difference as another implementation detail has been fixed in the process: the interpolation x-values ('sigma factors') should be the ratio between sigma and sigma at the lower bound of the current window. In the initial implementation this was mistakenly taken as sigma_chunk[0], the left-most value in the block _out of the actual input sigma values_; it should be the exact convolved width from `sigma_samples`.
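A hedged sketch of the corrected behaviour (only `sigma_samples` is named above; everything else here is illustrative):

    import numpy as np

    def get_sigma_factors(sigma, spacing=np.sqrt(2)):
        # sigma_samples is generated here from the range of the input widths,
        # so it is sorted by construction; the input sigma need not be monotonic.
        n_samples = max(int(np.ceil(np.log(sigma.max() / sigma.min())
                                    / np.log(spacing))) + 1, 2)
        sigma_samples = sigma.min() * spacing ** np.arange(n_samples)

        # Index of the convolved width at the lower bound of each window
        lower = np.clip(np.searchsorted(sigma_samples, sigma, side='right') - 1,
                        0, n_samples - 2)

        # Interpolation x-value: ratio of the target sigma to the exact
        # convolved width sigma_samples[lower], not to the left-most value of
        # the corresponding block of input sigma values.
        return sigma / sigma_samples[lower]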
-
Adam J. Jackson authored
-
Adam J. Jackson authored
A new method is provided in AbinsModules.CalculateS to generate a test data file for the AbinsCalculateSPowder test. This format isn't great and should be refined to something like standard JSON, but at least we now have a standard way of generating it. The input options are a few calculation parameters and a precalculated AbinsData object. To make these objects easier to instantiate, a from_calculation_data() classmethod is added. The logic for identifying and calling a file loader has been moved from the main Abins algorithm into AbinsData to support this without redundancy. It may make sense to move this again to somewhere else, e.g. a new *AbinsModules/Loaders/__init__.py* but for now this is close to the point of use. Equivalence of this method's output to the existing test file has been verified manually for the Si2 order-1 test data. The reference file does not need to be updated, and tests should continue to pass.
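A hypothetical usage sketch (the argument names and file name are guesses for illustration; the real signature of from_calculation_data() may differ):

    from AbinsModules import AbinsData

    # The loader is identified and called internally, so a ready-to-use
    # AbinsData object comes straight from the ab initio output file.
    abins_data = AbinsData.from_calculation_data(filename='Si2-sc.phonon',
                                                 ab_initio_program='CASTEP')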
-
Adam J. Jackson authored
The sample data for the CalculateSPowder test needs to be updated as part of the work on broadening, because Abins now produces different results. Specifically: the instrumental broadening function applied to the S data has been replaced and is now smooth; the reference frequencies are now defined at the middle of the histogram bins rather than at the edges. This difference in binning conventions has actually changed the size of the results array (by 1!). The format of the sample data for CalculateSPowder is not very pretty; it is some kind of raw text dump. It would be nice to avoid such things with a proper JSON serialisation setup, but for now the priority is to fix the test in a way that is comparable with previous behaviour. The process used to generate the replacement file was to obtain the calculation data with AbinsModules.CalculateS.init(...args...).get_formatted_data() and dump the text representation to a file:

    numpy.set_printoptions(threshold=numpy.nan)
    with open('Si2-sc-CalculateSPowder_S.txt', 'w') as f:
        f.write(str(calculated_data.extract()))

The resulting file appears more neatly formatted than its predecessor (4b77bac3f8c1dc54f63bd41ca3075c48), but is still acceptable to the test parser.
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
Normalise Gaussians by multiplying by the bin width instead of dividing by sum(values). The sum approach can go badly wrong if values lie near or beyond the sampling range, as a tail fragment (or noise floor!) would be amplified to unity. Scaling by the theoretical value as we do here has a different downside: truncated broadening kernels will not quite sum up to 1, so a little intensity is lost in broadening. This error is more tolerable, especially as it can be decreased by extending the truncation range.
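A small illustrative sketch of the two normalisation strategies (not the actual Abins code):

    import numpy as np

    def gaussian_kernel(sigma, points, bin_width):
        values = np.exp(-points ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

        # Old approach: values /= values.sum()
        # If the sampling grid only catches a tail of the peak (or a noise
        # floor), that fragment is amplified up to unit weight.

        # New approach: scale the analytic form by the bin width. A
        # well-sampled kernel then sums to ~1; a truncated kernel loses a
        # little intensity, an error which shrinks as the truncation range
        # is extended.
        return values * bin_width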
-
Adam J. Jackson authored
-
Adam J. Jackson authored
To avoid polluting the dev-docs with algorithm-specific info and to avoid storing/updating many image files, the Abins development notes are moved back into Concepts. The fast broadening method is broken out into a separate document to avoid too much wading through plots.
-
Adam J. Jackson authored
Move deprecation plans to the top of this document to make them more visible. The broadening section is a little rambling and will probably remain so until the theory is laid out more formally. Images could use a few tweaks (more consistent sizing, remove unnecessary x values) but I don't want to fill the Git history with small changes to image files.
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
Figures are included as image files and the code to generate them is included as a comment. This doesn't seem ideal as they may be revised and bloat the repo (or become outdated); is there any workflow to generate images on the fly when building the docs? The section on the interpolation method is a bit long and should perhaps be spun out into another document.
-
Adam J. Jackson authored
-
Adam J. Jackson authored
-
Adam J. Jackson authored
- The 'legacy' broadening method is removed. It is not recommended for use in any circumstance.
- A 'none' option is provided which simply resamples, for reference/test purposes.
- Some other broadening schemes are consolidated:
  - The 'gaussian_trunc' method is removed and the superior 'gaussian_trunc_windowed' is renamed to 'gaussian_truncated'.
  - The different summation approaches are no longer exposed to the choice of broadening method; a default 'auto' choice selects between the two summation schemes based on the size of the data.
- An 'auto' broadening scheme is introduced, and from here all Abins Instruments are expected to provide one. For TOSCA it switches between 'gaussian_truncated' and 'interpolate' depending on the number of input frequencies (a sketch of this selection follows below).
- Responsibility for deciding to resample is best located at the Instrument level, as this needs to trigger the calculation of an appropriate set of broadening widths (sigma).
- The broadening method selection is now drawn from AbinsParameters during SPowderSemiEmpiricalCalculator; the motivation here is to move 'Abins state' information as high up in the program as appropriate, so that information about which modes/settings are used is generally determined by function arguments. This is more maintainable as it allows lower-level objects/functions to be tested independently of a full Abins run.

Performance tests are interesting at this point. Testing _broaden_spectrum_ with some synthetic data:

| Scheme             | Average time | Max time |
|--------------------+--------------+----------|
| none               |       0.0002 |   0.0005 |
| interpolate        |       0.0035 |   0.0046 |
| interpolate_coarse |       0.0029 |   0.0034 |
| gaussian_truncated |       0.0241 |   0.0269 |
| normal_truncated   |       0.0245 |   0.0257 |
| gaussian           |       0.2760 |   0.2794 |
| normal             |       0.2422 |   0.2525 |

So there is a clear hierarchy of methods with order-of-magnitude performance differences. Comparing full runs of Abins on crystalline benzene up to 3rd order:

| Scheme             | Time (s) |
|--------------------+----------|
| none               |    12.46 |
| interpolate        |    21.85 |
| interpolate_coarse |    21.28 |
| auto               |    21.78 |
| gaussian_truncated |    28.22 |
| normal_truncated   |    27.94 |

If we assume that 'none' is essentially pure overhead, then 'interpolate' costs around 10s... and the other methods are less than twice as expensive. Where has the performance difference gone? Try another system: a toluene molecule up to 4th order.

| Scheme             | Time (s) |
|--------------------+----------|
| none               |    1.905 |
| interpolate        |    3.188 |
| interpolate_coarse |    2.844 |
| auto               |    2.723 |
| gaussian_truncated |    5.887 |
| normal_truncated   |    5.943 |
| gaussian           |    57.81 |
| normal             |    50.53 |

It is interesting that "normal" is quicker than "gaussian" in this case; the "max time" column of the faster test has generally pointed to this method being more _consistently_ fast than Gaussian, even if slower on _average_. I suspect some kind of caching magic. "Auto" has beaten "interpolate" this time; this could be due to a saving on the first order, which doesn't really need binning. In any case, it is disappointing that the factor-10 improvement from interpolation seems to reduce to a factor of 2 in practice. Perhaps further optimisation is possible, but there may also be a system-dependence at work. (The smaller test uses data randomly distributed over the whole spectrum, while real data will cluster.) In any case, with 'auto' we seem to have a strategy that brings the cost of broadening roughly into line with other overheads in the program.
There is a known route by which it might be possible to optimise 'interpolate' further: limiting the range over which each convolution is performed. This could be quite complicated in practice, and there are likely to be more fruitful avenues for improving the performance of Abins. (e.g. combining data from different atoms before broadening.)
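The selection sketch mentioned above, as an illustrative guess at how 'auto' might choose for TOSCA (the actual crossover criterion and threshold are not stated here):

    def select_broadening_scheme(frequencies, threshold=50):
        # Illustrative only: for a small number of discrete peaks a direct
        # sum of truncated Gaussians is cheap and accurate; for dense spectra
        # the interpolated convolution approach is much faster.
        if len(frequencies) <= threshold:
            return 'gaussian_truncated'
        return 'interpolate'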
-
Adam J. Jackson authored
The number of kernels was accidentally rounded up twice, leading to an empty chunk which needed to be accounted for. Removing this seems to have no impact on the results.
-
Adam J. Jackson authored
Fast interpolated broadening is now moved into a more general/extensible function, including ascii-art documentation. sqrt(2) spacing is implemented, and initial testing shows that this noticeably improves accuracy for little extra cost. Set this as the default, and for now leave in the coarser option for test purposes. A flag is also provided to skip rebinning inside this function; this potentially avoids an expensive redundant operation but makes for convoluted logic when this rebin is also considered inside the Instrument. A cleaner scheme should be possible, in which it is obvious where rebinning should happen.
-
Adam J. Jackson authored
Clean up some of the experimental broadening schemes into a more systematic framework. Add comments and ASCII art to explain the windowing procedure.
-
Adam J. Jackson authored
A normalisation term was found in the Gaussian function which uses sigma and the FWHM to fix the maximum value of the peak as sigma is varied. This does not appear to be consistent with the function defined in the Abins paper, and leads to an increasing weight for high-energy terms. This term is removed. Instead, a normalisation feature is added which normalises the sum of the output values to one. This is required for a convolution kernel that does not change the overall intensity, and removes any dependence on the bin width. The cost of this process is not insignificant; it may be more efficient to use the analytic integral ("normal" function), which is 'pre-normalised'. Testing with different bin widths finds that the differences in the resulting curves are negligible at typical bin widths; with large bins the results, while different, are not obviously superior or inferior. Work has started on refactoring the broadening methods that use truncated functions, by providing a truncation function that accepts the `gaussian` or `normal` function as an input.
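A hedged sketch of that refactoring direction (the function names mirror those mentioned above; bodies and signatures are illustrative, not the Abins code):

    import numpy as np
    from scipy.special import erf

    def gaussian(sigma, points, center=0.0):
        # Point-sampled Gaussian, renormalised so the output sums to one.
        values = np.exp(-(points - center) ** 2 / (2 * sigma ** 2))
        return values / values.sum()

    def normal(sigma, points, center=0.0):
        # Analytic integral over each bin ("pre-normalised"), so no extra
        # normalisation step is needed.
        half_width = (points[1] - points[0]) / 2
        edges = np.concatenate([points - half_width, [points[-1] + half_width]])
        cdf = 0.5 * (1 + erf((edges - center) / (np.sqrt(2) * sigma)))
        return cdf[1:] - cdf[:-1]

    def truncated(function, sigma, points, center=0.0, limit=3.0):
        # Evaluate `function` only within +/- limit * sigma of the peak centre.
        window = np.abs(points - center) < limit * sigma
        values = np.zeros(len(points))
        values[window] = function(sigma=sigma, points=points[window], center=center)
        return values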
-
Adam J. Jackson authored
Work in progress on the broadening features of Abins. The Instrument class is becoming bloated with generic broadening functions, so these are moved into their own module. In contrast with most of Abins, this module is not object-oriented and consists of top-level functions; these are primarily for use by Instrument implementations. The "legacy" broadening logic is removed entirely and this label is now used for the implementation of the legacy behaviour within the new broadening logic (replacing the "legacy_plus" label). Several redundant broadening implementations are currently in place while their performance/scaling is tested.
-
Adam J. Jackson authored
This is a rough implementation of a new scheme for frequency-dependent broadening. The instrumental broadening function for TOSCA has an energy-dependent width; while there are several ways of sequencing and implementing the process, it generally requires a fresh Gaussian to be evaluated at every frequency bin and those Gaussians to be combined in a weighted sum. We note that a Gaussian function may be approximated by a linear combination of Gaussians with width parameters (sigma) that bracket the target and are not too far away. An error of ~1% may be reached by limiting the sigma span to a factor of sqrt(2), while a factor of two gives an error of ~5%. Given the relevant range of widths for instrumental broadening, a suitable collection of broadening functions may be obtained with just a few logarithmically-spaced function evaluations. In this proof-of-concept, the spectrum is convolved with five Gaussian kernels, logarithmically spaced by factors of two. At each bin, the broadened value is drawn from a corresponding broad spectrum, obtained by interpolation from those at neighbouring sigma values. This interpolation involves some "magic numbers" in the form of a cubic function fitted to reproduce a Gaussian as closely as possible from wider and narrower functions. The main assumption made in this approach is that the broadening function is short-ranged relative to the rate of change in width. Compared to a sum of functions centered at each bin, this method introduces a slight asymmetry to the peaks. The benefit is a drastically reduced cost of evaluation; the implementation here is far from optimal (as it convolves larger spectral regions than necessary) and reduces the runtime of Abins by almost half compared to the fastest implementation of a full sum.
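A rough illustrative sketch of the scheme (a simple linear mix of the bracketing spectra is used here; the actual proof-of-concept uses the fitted cubic "magic numbers" mentioned above):

    import numpy as np

    def interpolated_broadening(hist, sigma, bin_width, spacing=2.0):
        # `hist` is the binned spectrum and `sigma` the instrumental width at
        # each bin centre. Illustrative only; not the Abins implementation.

        # A few logarithmically-spaced kernel widths spanning the sigma range
        n_kernels = max(int(np.ceil(np.log(sigma.max() / sigma.min())
                                    / np.log(spacing))) + 1, 2)
        sigma_samples = sigma.min() * spacing ** np.arange(n_kernels)

        # Convolve the whole spectrum once per kernel width (kernels are
        # assumed to be much shorter than the spectrum)
        points = np.arange(-3 * sigma.max(), 3 * sigma.max() + bin_width, bin_width)
        kernels = [bin_width * np.exp(-points ** 2 / (2 * s ** 2))
                   / (s * np.sqrt(2 * np.pi)) for s in sigma_samples]
        spectra = np.array([np.convolve(hist, kernel, mode='same')
                            for kernel in kernels])

        # At each bin, mix the two broadened spectra whose widths bracket the
        # local sigma; a linear mix stands in for the fitted cubic weighting.
        lower = np.clip(np.searchsorted(sigma_samples, sigma) - 1, 0, n_kernels - 2)
        mix = (np.log(sigma / sigma_samples[lower])
               / np.log(sigma_samples[lower + 1] / sigma_samples[lower]))
        rows = np.arange(len(hist))
        return (1 - mix) * spectra[lower, rows] + mix * spectra[lower + 1, rows]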
-
Adam J. Jackson authored
A considerable speedup is obtained by limiting the frequency range of the Gaussian functions. Performance is comparable to the legacy implementation - if a similar number of samples is used! 3*max(sigma) corresponds to about 150 points, while the default number of samples in the legacy implementation is just 50 points spread over the same distance. If these numbers are brought into line, the performance is quite similar. Interestingly, the for-loop summation and np.histogram summation change their ranking for speed depending on the number of samples. We should keep both approaches while developing this method, and see which is fastest for production calculations. The names aren't great and there's a lot of redundancy - this should be cleaned up once the best approaches are established. There isn't much reason to keep the legacy mode around either, as it gives objectively wrong results.
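An illustrative sketch of the two summation routes being compared (the 3*sigma range and 50 samples come from the note above; normalisation is omitted for brevity and the function names are not the real ones):

    import numpy as np

    def sum_with_loop(frequencies, s_dft, sigma, bins, n_samples=50):
        # One truncated Gaussian per peak, binned and accumulated in a loop
        total = np.zeros(len(bins) - 1)
        for freq, weight, s in zip(frequencies, s_dft, sigma):
            points = np.linspace(freq - 3 * s, freq + 3 * s, n_samples)
            values = weight * np.exp(-(points - freq) ** 2 / (2 * s ** 2))
            chunk, _ = np.histogram(points, bins=bins, weights=values)
            total += chunk
        return total

    def sum_with_histogram(frequencies, s_dft, sigma, bins, n_samples=50):
        # Build every sample point up front and bin them in one histogram call
        offsets = np.linspace(-3, 3, n_samples)
        points = frequencies[:, None] + offsets[None, :] * sigma[:, None]
        values = s_dft[:, None] * np.exp(-offsets[None, :] ** 2 / 2)
        total, _ = np.histogram(points.ravel(), bins=bins, weights=values.ravel())
        return total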
-
Adam J. Jackson authored
In order to work on the broadening in Abins, some adjustments are being made to how these functions are called, and it is now possible to request different broadening schemes. A reimplementation of the existing broadening, labelled 'legacy', is planned for ease of comparison. At this stage 'legacy' is the actual legacy implementation, while 'legacy_plus' is a similar implementation in the new framework; this will replace the 'legacy' code as soon as possible. The 'legacy' and 'legacy_plus' schemes give very similar results and timings, so while 'legacy_plus' does not provide full compatibility it is a fair basis for comparison with new methods.
-
Adam J. Jackson authored
-
Gigg, Martyn Anthony authored
Remove Confusion from SaveNISTDAT example
-
Gigg, Martyn Anthony authored
Create total scattering pdf rebin output
-
- Jan 18, 2020
-
-
Gigg, Martyn Anthony authored
Windows: Fix switching between Python 2/3 without clean
-
Gigg, Martyn Anthony authored
Also uses find_package rather than handwritten code to find Python libraries, so that the unsetting code works as expected.
-
- Jan 17, 2020
-
-
DanielMurphy22 authored
I've added the link back in, as the saveNISTDAT_data.nxs file was already in the DocTest data and had just been removed from the usage data, so this was the simplest thing to do. This file is now in the UsageData.
-
DanielMurphy22 authored
-