Commit a79c5e90 authored by Diy2 Upstream's avatar Diy2 Upstream Committed by Sujin Philip
Browse files

diy2 2016-01-22 (8c89e567)

Code extracted from:

    https://gitlab.kitware.com/third-party/diy2.git

at commit 8c89e5677ce3d29f621049d736167cf386648ecc (for/vtk).
parents
* -whitespace
Copyright Notice
DIY2, Copyright (c) 2015, The Regents of the University of California, through
Lawrence Berkeley National Laboratory (subject to receipt of any required
approvals from the U.S. Dept. of Energy). All rights reserved.
If you have questions about your rights to use or distribute this software,
please contact Berkeley Lab's Technology Transfer Department at TTD@lbl.gov.
NOTICE. This software is owned by the U.S. Department of Energy. As such, the
U.S. Government has been granted for itself and others acting on its behalf a
paid-up, nonexclusive, irrevocable, worldwide license in the Software to
reproduce, prepare derivative works, and perform publicly and display publicly.
Beginning five (5) years after the date permission to assert copyright is
obtained from the U.S. Department of Energy, and subject to any subsequent five
(5) year renewals, the U.S. Government is granted for itself and others acting
on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the
Software to reproduce, prepare derivative works, distribute copies to the
public, perform publicly and display publicly, and to permit others to do so.
License Agreement
"DIY2, Copyright (c) 2015, The Regents of the University of California, through
Lawrence Berkeley National Laboratory (subject to receipt of any required
approvals from the U.S. Dept. of Energy). All rights reserved."
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
(1) Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
(3) Neither the name of the University of California, Lawrence Berkeley National
Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used
to endorse or promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You are under no obligation whatsoever to provide any bug fixes, patches, or
upgrades to the features, functionality or performance of the source code
("Enhancements") to anyone; however, if you choose to make your Enhancements
available either publicly, or directly to Lawrence Berkeley National Laboratory,
without imposing a separate written license agreement for such Enhancements,
then you hereby grant the following license: a non-exclusive, royalty-free
perpetual license to install, use, modify, prepare derivative works, incorporate
into other computer software, distribute, and sublicense such enhancements or
derivative works thereof, in binary and source code form.
## DIY2 is a data-parallel out-of-core library
DIY2 is a data-parallel library for implementing scalable algorithms that can execute both
in-core and out-of-core. The same program can be executed with one or more threads per MPI
process, seamlessly combining distributed-memory message passing with shared-memory thread
parallelism. The abstraction enabling these capabilities is block-based parallelism; blocks
and their message queues are mapped onto processing elements (MPI processes or threads) and are
migrated between memory and storage by the DIY2 runtime. Complex communication patterns,
including neighbor exchange, merge reduction, swap reduction, and all-to-all exchange, are
possible in- and out-of-core in DIY2.
## Licensing
DIY2 is released as open source software under a BSD style [license](./LICENSE.txt).
## Dependencies
DIY2 requires an MPI installation. We recommend [MPICH](http://www.mpich.org/).
## Download, build, install
- You can clone this repository, or
- You can download the [latest tarball](https://github.com/diatomic/diy2/archive/master.tar.gz).
DIY2 is a header-only library. It does not need to be built; you can simply
include it in your project. The examples can be built using `cmake` from the
top level directory.
## Documentation
[Doxygen pages](https://diatomic.github.io/diy2)
## Example
A simple DIY2 program, shown below, consists of the following components:
- `struct`s called blocks,
- a diy object called the `master`,
- a set of callback functions performed on each block by `master.foreach()`,
- optionally one or more message exchanges between the blocks by `master.exchange()`, and
- there may be other collectives and global reductions not shown below.
The callback functions (`enqueue_block()` and `average()` in the example below) are given the block
pointer and a communication proxy for the message exchange between blocks. It is usual for the
callback functions to enqueue or dequeue messages from the proxy, so that information can be
received and sent during rounds of message exchange.
```cpp
// --- main program --- //
struct Block { float local, average; }; // define your block structure
Master master(world); // world = MPI_Comm
... // populate master with blocks
master.foreach<Block>(&enqueue_local); // call enqueue_local() for each block
master.exchange(); // exchange enqueued data between blocks
master.foreach<Block>(&average); // call average() for each block
// --- callback functions --- //
// enqueue block data prior to exchanging it
void enqueue_local(Block* b, // one block
const Proxy& cp, // communication proxy
// i.e., the neighbor blocks with which
// this block communicates
void* aux) // user-defined additional arguments
{
for (size_t i = 0; i < cp.link()->size(); i++) // for all neighbor blocks
cp.enqueue(cp.link()->target(i), b->local); // enqueue the data to be sent
// to this neighbor block in the next
// exchange
}
// use the received data after exchanging it, in this case compute its average
void average(Block* b, // one block
const Proxy& cp, // communication proxy
// i.e., the neighbor blocks with which
// this block communicates
void* aux) // user-defined additional arguments
{
float x, average = 0;
for (size_t i = 0; i < cp.link()->size(); i++) // for all neighbor blocks
{
cp.dequeue(cp.link()->target(i).gid, x); // dequeue the data received from this
// neighbor block in the last exchange
average += x;
}
b->average = average / cp.link()->size();
}
```
#ifndef DIY_ALGORITHMS_HPP
#define DIY_ALGORITHMS_HPP
#include <vector>
#include "master.hpp"
#include "assigner.hpp"
#include "reduce.hpp"
#include "reduce-operations.hpp"
#include "partners/swap.hpp"
#include "detail/algorithms/sort.hpp"
#include "detail/algorithms/kdtree.hpp"
namespace diy
{
/**
* \ingroup Algorithms
* \brief sample sort `values` of each block, store the boundaries between blocks in `samples`
*/
template<class Block, class T, class Cmp>
void sort(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
std::vector<T> Block::* values, //!< all values to sort
std::vector<T> Block::* samples, //!< (output) boundaries of blocks
size_t num_samples, //!< desired number of samples
const Cmp& cmp, //!< comparison function
int k = 2, //!< k-ary reduction will be used
bool samples_only = false) //!< false: results will be all_to_all exchanged; true: only sort but don't exchange results
{
bool immediate = master.immediate();
master.set_immediate(false);
// NB: although sorter will go out of scope, its member functions sample()
// and exchange() will return functors whose copies get saved inside reduce
detail::SampleSort<Block,T,Cmp> sorter(values, samples, cmp, num_samples);
// swap-reduce to all-gather samples
RegularSwapPartners partners(1, assigner.nblocks(), k);
reduce(master, assigner, partners, sorter.sample(), detail::SkipIntermediate(partners.rounds()));
// all_to_all to exchange the values
if (!samples_only)
all_to_all(master, assigner, sorter.exchange(), k);
master.set_immediate(immediate);
}
/**
* \ingroup Algorithms
* \brief sample sort `values` of each block, store the boundaries between blocks in `samples`
* shorter version of above sort algorithm with the default less-than comparator used for T
* and all_to_all exchange included
*/
template<class Block, class T>
void sort(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
std::vector<T> Block::* values, //!< all values to sort
std::vector<T> Block::* samples, //!< (output) boundaries of blocks
size_t num_samples, //!< desired number of samples
int k = 2) //!< k-ary reduction will be used
{
sort(master, assigner, values, samples, num_samples, std::less<T>(), k);
}
/**
* \ingroup Algorithms
* \brief build a kd-tree and sort a set of points into it
*/
template<class Block, class Point>
void kdtree(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
int dim, //!< dimensionality
const ContinuousBounds& domain, //!< global data extents
std::vector<Point> Block::* points, //!< input points to sort into kd-tree
size_t bins, //!< number of histogram bins for splitting a dimension
bool wrap = false)//!< periodic boundaries in all dimensions
{
if (assigner.nblocks() & (assigner.nblocks() - 1))
{
fprintf(stderr, "KD-tree requires a number of blocks that's a power of 2, got %d\n", assigner.nblocks());
std::abort();
}
typedef diy::RegularContinuousLink RCLink;
for (int i = 0; i < master.size(); ++i)
{
RCLink* link = static_cast<RCLink*>(master.link(i));
link->core() = domain;
link->bounds() = domain;
}
detail::KDTreePartition<Block,Point> kdtree_partition(dim, points, bins);
detail::KDTreePartners partners(dim, assigner.nblocks(), wrap, domain);
reduce(master, assigner, partners, kdtree_partition);
// update master.expected to match the links
int expected = 0;
for (int i = 0; i < master.size(); ++i)
expected += master.link(i)->size_unique();
master.set_expected(expected);
}
}
#endif
#ifndef DIY_ASSIGNER_HPP
#define DIY_ASSIGNER_HPP
#include <vector>
namespace diy
{
// Derived types should define
// int rank(int gid) const
// that converts a global block id to a rank that it's assigned to.
class Assigner
{
public:
/**
* \ingroup Assignment
* \brief Manages how blocks are assigned to processes
*/
Assigner(int size, //!< total number of processes
int nblocks //!< total (global) number of blocks
):
size_(size), nblocks_(nblocks) {}
//! returns the total number of process ranks
int size() const { return size_; }
//! returns the total number of global blocks
int nblocks() const { return nblocks_; }
//! sets the total number of global blocks
void set_nblocks(int nblocks) { nblocks_ = nblocks; }
//! gets the local gids for a given process rank
virtual void local_gids(int rank, std::vector<int>& gids) const =0;
//! returns the process rank of the block with global id gid (need not be local)
virtual int rank(int gid) const =0;
private:
int size_; // total number of ranks
int nblocks_; // total number of blocks
};
class ContiguousAssigner: public Assigner
{
public:
/**
* \ingroup Assignment
* \brief Assigns blocks to processes in contiguous gid (block global id) order
*/
ContiguousAssigner(int size, //!< total number of processes
int nblocks //!< total (global) number of blocks
):
Assigner(size, nblocks) {}
using Assigner::size;
using Assigner::nblocks;
int rank(int gid) const
{
int div = nblocks() / size();
int mod = nblocks() % size();
int r = gid / (div + 1);
if (r < mod)
{
return r;
} else
{
return mod + (gid - (div + 1)*mod)/div;
}
}
inline
void local_gids(int rank, std::vector<int>& gids) const;
};
class RoundRobinAssigner: public Assigner
{
public:
/**
* \ingroup Assignment
* \brief Assigns blocks to processes in cyclic or round-robin gid (block global id) order
*/
RoundRobinAssigner(int size, //!< total number of processes
int nblocks //!< total (global) number of blocks
):
Assigner(size, nblocks) {}
using Assigner::size;
using Assigner::nblocks;
int rank(int gid) const { return gid % size(); }
inline
void local_gids(int rank, std::vector<int>& gids) const;
};
}
void
diy::ContiguousAssigner::
local_gids(int rank, std::vector<int>& gids) const
{
int div = nblocks() / size();
int mod = nblocks() % size();
int from, to;
if (rank < mod)
from = rank * (div + 1);
else
from = mod * (div + 1) + (rank - mod) * div;
if (rank + 1 < mod)
to = (rank + 1) * (div + 1);
else
to = mod * (div + 1) + (rank + 1 - mod) * div;
for (int gid = from; gid < to; ++gid)
gids.push_back(gid);
}
void
diy::RoundRobinAssigner::
local_gids(int rank, std::vector<int>& gids) const
{
int cur = rank;
while (cur < nblocks())
{
gids.push_back(cur);
cur += size();
}
}
#endif
#ifndef DIY_COLLECTION_HPP
#define DIY_COLLECTION_HPP
#include <vector>
#include "serialization.hpp"
#include "storage.hpp"
#include "thread.hpp"
namespace diy
{
class Collection
{
public:
typedef void* Element;
typedef std::vector<Element> Elements;
typedef critical_resource<int, recursive_mutex> CInt;
typedef void* (*Create)();
typedef void (*Destroy)(void*);
typedef detail::Save Save;
typedef detail::Load Load;
public:
Collection(Create create,
Destroy destroy,
ExternalStorage* storage,
Save save,
Load load):
create_(create),
destroy_(destroy),
storage_(storage),
save_(save),
load_(load),
in_memory_(0) {}
size_t size() const { return elements_.size(); }
const CInt& in_memory() const { return in_memory_; }
inline void clear();
int add(Element e) { elements_.push_back(e); external_.push_back(-1); ++(*in_memory_.access()); return elements_.size() - 1; }
void* release(int i) { void* e = get(i); elements_[i] = 0; return e; }
void* find(int i) const { return elements_[i]; } // possibly returns 0, if the element is unloaded
void* get(int i) { if (!find(i)) load(i); return find(i); } // loads the element first, and then returns its address
int available() const { int i = 0; for (; i < (int)size(); ++i) if (find(i) != 0) break; return i; }
inline void load(int i);
inline void unload(int i);
Create creator() const { return create_; }
Destroy destroyer() const { return destroy_; }
Load loader() const { return load_; }
Save saver() const { return save_; }
void* create() const { return create_(); }
void destroy(int i) { if (find(i)) { destroy_(find(i)); elements_[i] = 0; } else if (external_[i] != -1) storage_->destroy(external_[i]); }
bool own() const { return destroy_ != 0; }
ExternalStorage* storage() const { return storage_; }
private:
Create create_;
Destroy destroy_;
ExternalStorage* storage_;
Save save_;
Load load_;
Elements elements_;
std::vector<int> external_;
CInt in_memory_;
};
}
void
diy::Collection::
clear()
{
if (own())
for (size_t i = 0; i < size(); ++i)
destroy(i);
elements_.clear();
external_.clear();
*in_memory_.access() = 0;
}
void
diy::Collection::
unload(int i)
{
//BinaryBuffer bb;
void* e = find(i);
//save_(e, bb);
//external_[i] = storage_->put(bb);
external_[i] = storage_->put(e, save_);
destroy_(e);
elements_[i] = 0;
--(*in_memory_.access());
}
void
diy::Collection::
load(int i)
{
//BinaryBuffer bb;
//storage_->get(external_[i], bb);
void* e = create_();
//load_(e, bb);
storage_->get(external_[i], e, load_);
elements_[i] = e;
external_[i] = -1;
++(*in_memory_.access());
}
#endif
#ifndef DIY_COMMUNICATOR_HPP
#define DIY_COMMUNICATOR_HPP
#warning "diy::Communicator (in diy/communicator.hpp) is deprecated, use diy::mpi::communicator directly"
#include "mpi.hpp"
namespace diy
{
typedef mpi::communicator Communicator;
}
#endif
#ifndef DIY_CONSTANTS_H
#define DIY_CONSTANTS_H
// Default DIY_MAX_DIM to 4, unless provided by the user
// (used for static array size in various Bounds (bb_t in types.h))
#ifndef DIY_MAX_DIM
#define DIY_MAX_DIM 4
#endif
/* DIY directions: neighbor direction enumeration
used to identify direction of one neighbor block for both regular and
wrapround neighbors
can be bitwise ORed, eg., maximum-side neighboring block in 3 dimensions
would be DIY_X1 | DIY_Y1 | DIY_Z1
each use identifies exactly one block
eg. DIY_X0 is the (one) left neighbor block, not the entire left plane */
typedef enum
{
DIY_X0 = 0x01, /* minimum-side x (left) neighbor */
DIY_X1 = 0x02, /* maximum-side x (right) neighbor */
DIY_Y0 = 0x04, /* minimum-side y (bottom) neighbor */
DIY_Y1 = 0x08, /* maximum-side y (top) neighbor */
DIY_Z0 = 0x10, /* minimum-side z (back) neighbor */
DIY_Z1 = 0x20, /* maximum-side z (front)neighbor */
DIY_T0 = 0x40, /* minimum-side t (earlier) neighbor */
DIY_T1 = 0x80 /* maximum-side t (later) neighbor */
} dir_t;
#endif
#ifndef DIY_CRITICAL_RESOURCE_HPP
#define DIY_CRITICAL_RESOURCE_HPP
namespace diy
{
template<class T, class Mutex>
class resource_accessor
{
public:
resource_accessor(T& x, Mutex& m):
x_(x), lock_(m) {}
T& operator*() { return x_; }
T* operator->() { return &x_; }
const T& operator*() const { return x_; }
const T* operator->() const { return &x_; }
private:
T& x_;
lock_guard<Mutex> lock_;
};
template<class T, class Mutex = fast_mutex>
class critical_resource
{
public:
typedef resource_accessor<T, Mutex> accessor;
typedef resource_accessor<const T, Mutex> const_accessor; // eventually, try shared locking
public:
critical_resource() {}
critical_resource(const T& x):
x_(x) {}
accessor access() { return accessor(x_, m_); }
const_accessor const_access() const { return const_accessor(x_, m_); }
private:
T x_;
mutable Mutex m_;
};
}
#endif
#ifndef DIY_DECOMPOSITION_HPP
#define DIY_DECOMPOSITION_HPP
#include <vector>
#include <algorithm>
#include <iostream>
#include <cmath>
#include "link.hpp"
#include "assigner.hpp"
#include "master.hpp"