Commit c4f6394a authored by Dmitry I. Lyakh

1. Limited the max number of tensor operations in flight; 2. Revised the tensor existence domain policy.


Signed-off-by: Dmitry I. Lyakh <quant4me@gmail.com>
parent af2869b5
Pipeline #166931 failed in 5 minutes and 42 seconds
/** ExaTN::Numerics: General client header (free function API)
REVISION: 2021/09/27
REVISION: 2021/09/29
Copyright (C) 2018-2021 Dmitry I. Lyakh (Liakh)
Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
@@ -25,8 +25,10 @@ Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
(c) Tensor signature is an ordered tuple of {space_id,subspace_id} pairs
for each tensor dimension. In case space_id = SOME_SPACE, subspace_id
is simply the base offset in the anonymous vector space (min = 0).
(d) Additionally, a subset of tensor dimensions can be assigned an isometry property;
any tensor may have no more than two disjoint isometric dimension groups.
(d) Additionally, a subset of tensor dimensions can be assigned an isometry property.
Contraction over such a subset of isometric dimensions of a tensor with its
conjugate produces a Kronecker Delta tensor. Any tensor may have no more than
two disjoint isometric dimension subsets.
4. Tensor operation [tensor_operation.hpp]:
(a) Tensor operation is a mathematical operation on one or more tensor arguments.
(b) Evaluating a tensor operation means computing the value of all its output tensors,
@@ -84,19 +86,28 @@ Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
(a) A tensor can be allocated storage and processed at any time after its formal definition.
(b) Tensor storage allocation is called tensor creation. A tensor can either be created across all
MPI processes or within a specified group of them. The subset of MPI processes participating
in the tensor creation operation defines its domain of existence, meaning that only these
in the tensor creation operation defines its Domain of Existence, meaning that only these
MPI processes are aware of the existence of the created tensor. Note that the concrete
physical distribution of the tensor body among the MPI processes is hidden from the user
(either fully replicated or fully distributed or a mix of the two).
(c) A subset of MPI processes participating in a given tensor operation defines
(either fully replicated or fully distributed or a mix of the two). A contiguous subset
of MPI processes from the tensor existence domain that contains all tensor elements is
called the Subdomain of Full Presence.
(c) A set of MPI processes participating in a given tensor operation defines
its execution domain. The tensor operation execution domain must be compatible
with the existence domains of its tensor operands:
(1) The existence domains of all output tensor operands must be the same;
(2) The tensor operation execution domain must coincide with the existence
domains of all output tensor operands;
(3) The existence domain of each input tensor operand must include
the tensor operation execution domain AND each input tensor operand
must be fully available within the tensor operation execution domain.
with the existence/presence domains of its tensor operands:
(1) The existence domains of all tensor operands must be properly nested,
that is, there should exist an order of their placement such that
each previous domain is contained in or congruent to the next one:
D_i <= D_j <= D_k <= ...,
where D_i is the existence domain of tensor operand i.
(2) The tensor operation execution domain is the smallest of
the tensor operand existence domains;
(3) The tensor operation execution domain must be a subdomain
of full presence for all tensor operands;
(4) If any of the output tensor operands has a larger existence domain
than the execution domain of the tensor operation, it is the user's
responsibility to update the tensor value outside the tensor operation
execution domain, otherwise the code is non-compliant.
(d) By default, the tensor body is replicated across all MPI processes in its domain of existence.
The user also has an option to create a distributed tensor by specifying which dimensions of
this tensor to split into segments, thus inducing a block-wise decomposition of the tensor body.
@@ -260,11 +271,13 @@ inline bool withinTensorExistenceDomain(Args&&... tensor_names) //in: tensor nam
/** Returns the process group associated with the given tensors, that is,
the overlap of existence domains of the given tensors. Note that the
existence domains of the given tensors must be properly nested,
the intersection of existence domains of the given tensors. Note that
the existence domains of the given tensors must be properly nested,
tensorA <= tensorB <= tensorC <= ... <= tensorZ,
otherwise the code will result in an undefined behavior. As a useful
rule, always place output tensors in front of input tensors. **/
for some order of the tensors; otherwise the behavior is undefined.
It is the user's responsibility to ensure that the returned process
group is also a subdomain of full presence for all participating
tensors. **/
template <typename... Args>
inline const ProcessGroup & getTensorProcessGroup(Args&&... tensor_names) //in: tensor names
{return numericalServer->getTensorProcessGroup(std::forward<Args>(tensor_names)...);}
......
/** ExaTN::Numerics: Numerical server
REVISION: 2021/09/27
REVISION: 2021/09/29
Copyright (C) 2018-2021 Dmitry I. Lyakh (Liakh)
Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
@@ -630,7 +630,6 @@ bool NumServer::submit(const ProcessGroup & process_group,
if(logging_ > 0) logfile_ << "[" << std::fixed << std::setprecision(6) << exatn::Timer::timeInSecHR(getTimeStampStart())
<< "]: Found a contraction sequence candidate locally (caching = " << contr_seq_caching_
<< ")" << std::endl;
#ifdef MPI_ENABLED
//Synchronize on the best tensor contraction sequence across processes:
if(num_procs > 1 && num_input_tensors > 2){
@@ -706,7 +705,9 @@ bool NumServer::submit(const ProcessGroup & process_group,
std::dynamic_pointer_cast<numerics::TensorOpTransform>(op1)->
resetFunctor(std::shared_ptr<TensorMethod>(new numerics::FunctorInitVal(0.0)));
submitted = submit(op1,tensor_mapper); if(!submitted) return false;
//Submit all tensor operations for tensor network evaluation:
std::size_t num_tens_ops_in_fly = 0;
const auto num_split_indices = network.getNumSplitIndices(); //total number of indices that were split
if(logging_ > 0) logfile_ << "Number of split indices = " << num_split_indices << std::endl << std::flush;
std::size_t num_items_executed = 0; //number of tensor sub-networks executed
@@ -805,6 +806,7 @@ bool NumServer::submit(const ProcessGroup & process_group,
std::dynamic_pointer_cast<numerics::TensorOpCreate>(create_slice)->
resetTensorElementType(tensor->getElementType());
submitted = submit(create_slice,tensor_mapper); if(!submitted) return false;
++num_tens_ops_in_fly;
//Extract the slice contents from the input/output tensor:
if(tensor_is_output){ //make sure the output tensor slice only shows up once
//assert(tensor == output_tensor);
@@ -815,6 +817,7 @@ bool NumServer::submit(const ProcessGroup & process_group,
extract_slice->setTensorOperand(tensor_slice);
extract_slice->setTensorOperand(tensor);
submitted = submit(extract_slice,tensor_mapper); if(!submitted) return false;
++num_tens_ops_in_fly;
}
}else{
if(debugging && logging_ > 1) logfile_ << " without split indices" << std::endl; //debug
@@ -822,12 +825,14 @@ bool NumServer::submit(const ProcessGroup & process_group,
} //loop over tensor operands
//Submit the primary tensor operation with the current slices:
submitted = submit(tens_op,tensor_mapper); if(!submitted) return false;
++num_tens_ops_in_fly;
//Insert the output tensor slice back into the output tensor:
if(output_tensor_slice){
std::shared_ptr<TensorOperation> insert_slice = tensor_op_factory_->createTensorOp(TensorOpCode::INSERT);
insert_slice->setTensorOperand(output_tensor);
insert_slice->setTensorOperand(output_tensor_slice);
submitted = submit(insert_slice,tensor_mapper); if(!submitted) return false;
++num_tens_ops_in_fly;
output_tensor_slice.reset();
}
//Destroy temporary input tensor slices:
@@ -835,8 +840,12 @@ bool NumServer::submit(const ProcessGroup & process_group,
std::shared_ptr<TensorOperation> destroy_slice = tensor_op_factory_->createTensorOp(TensorOpCode::DESTROY);
destroy_slice->setTensorOperand(input_slice);
submitted = submit(destroy_slice,tensor_mapper); if(!submitted) return false;
++num_tens_ops_in_fly;
}
if(serialize || num_tens_ops_in_fly > exatn::runtime::TensorRuntime::MAX_RUNTIME_DAG_SIZE){
sync(process_group);
num_tens_ops_in_fly = 0;
}
if(serialize) sync(process_group); //sync for serialization
input_slices.clear();
} //loop over tensor operations
//Erase intermediate tensor slices once all tensor operations have been executed:
@@ -851,10 +860,12 @@ bool NumServer::submit(const ProcessGroup & process_group,
allreduce->setTensorOperand(output_tensor);
std::dynamic_pointer_cast<numerics::TensorOpAllreduce>(allreduce)->resetMPICommunicator(process_group.getMPICommProxy());
submitted = submit(allreduce,tensor_mapper); if(!submitted) return false;
++num_tens_ops_in_fly;
}
}else{ //only a single tensor (sub-)network executed redundantly by all processes
for(auto op = op_list.begin(); op != op_list.end(); ++op){
submitted = submit(*op,tensor_mapper); if(!submitted) return false;
++num_tens_ops_in_fly;
}
++num_items_executed;
}
......
/** ExaTN::Numerics: Numerical server
REVISION: 2021/09/27
REVISION: 2021/09/29
Copyright (C) 2018-2021 Dmitry I. Lyakh (Liakh)
Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
@@ -427,11 +427,6 @@ public:
bool sync(const ProcessGroup & process_group,
const Tensor & tensor,
bool wait = true);
/** Synchronizes execution of a specific tensor operation.
Changing wait to FALSE will only test for completion.
This method has local synchronization semantics! **/
bool sync(TensorOperation & operation,
bool wait = true);
/** Synchronizes execution of a specific tensor network.
Changing wait to FALSE, only tests for completion.
If ProcessGroup is not provided, defaults to the local process. **/
@@ -489,19 +484,26 @@ public:
bool withinTensorExistenceDomain(const std::string & tensor_name) const; //in: tensor name
/** Returns the process group associated with the given tensors, that is,
the overlap of existence domains of the given tensors. Note that the
existence domains of the given tensors must be properly nested,
the intersection of existence domains of the given tensors. Note that
the existence domains of the given tensors must be properly nested,
tensorA <= tensorB <= tensorC <= ... <= tensorZ,
otherwise the code will result in an undefined behavior. As a useful
rule, always place output tensors in front of input tensors. **/
for some order of the tensors; otherwise the behavior is undefined.
It is the user's responsibility to ensure that the returned process
group is also a subdomain of full presence for all participating
tensors. **/
template <typename... Args>
const ProcessGroup & getTensorProcessGroup(const std::string & tensor_name, Args&&... tensor_names) const //in: tensor names
{
const auto & tensor_domain = getTensorProcessGroup(tensor_name);
const auto & other_tensors_domain = getTensorProcessGroup(std::forward<Args>(tensor_names)...);
if(!tensor_domain.isContainedIn(other_tensors_domain)){
if(tensor_domain.isContainedIn(other_tensors_domain)){
return tensor_domain;
}else if(other_tensors_domain.isContainedIn(tensor_domain)){
return other_tensors_domain;
}else{
std::cout << "#ERROR(exatn::getTensorProcessGroup): Tensor operand existence domains must be properly nested: "
<< "Tensor " << tensor_name << " violates this requirement!" << std::endl;
<< "Tensor " << tensor_name << " is not properly nested w.r.t. tensors ";
print_variadic_pack(std::forward<Args>(tensor_names)...); std::cout << std::endl;
const auto & tensor_domain_ranks = tensor_domain.getProcessRanks();
const auto & other_tensors_domain_ranks = other_tensors_domain.getProcessRanks();
std::cout << tensor_name << ":" << std::endl;
@@ -512,7 +514,7 @@ public:
std::cout << std::endl;
assert(false);
};
return tensor_domain;
return getDefaultProcessGroup();
}
const ProcessGroup & getTensorProcessGroup(const std::string & tensor_name) const; //tensor name
@@ -985,6 +987,12 @@ protected:
/** Submits an individual tensor operation for processing. **/
bool submitOp(std::shared_ptr<TensorOperation> operation); //in: tensor operation for numerical evaluation
/** Synchronizes execution of a specific tensor operation.
Changing wait to FALSE will only test for completion.
This method has local synchronization semantics! **/
bool sync(TensorOperation & operation, //in: previously submitted tensor operation
bool wait = true);
/** Destroys orphaned tensors (garbage collection). **/
void destroyOrphanedTensors();
......
@@ -45,8 +45,8 @@
#define EXATN_TEST24
#define EXATN_TEST25
#define EXATN_TEST26
//#define EXATN_TEST27 //requires input file from source
//#define EXATN_TEST28 //requires input file from source
#define EXATN_TEST27 //requires input file from source
#define EXATN_TEST28 //requires input file from source
#define EXATN_TEST30
......