ORNL Quantum Computing Institute / exatn / Commits

Commit c4f6394a, authored Sep 29, 2021 by Dmitry I. Lyakh

    1. Limited max number of tensor ops in fly; 2. Revised tensor domain existence policy.

    Signed-off-by: Dmitry I. Lyakh <quant4me@gmail.com>

Parent: af2869b5
Pipeline #166931 failed with stage in 5 minutes and 42 seconds
Changes: 4   Pipelines: 3
src/exatn/exatn_numerics.hpp
View file @ c4f6394a
 /** ExaTN::Numerics: General client header (free function API)
-REVISION: 2021/09/27
+REVISION: 2021/09/29
 Copyright (C) 2018-2021 Dmitry I. Lyakh (Liakh)
 Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
...
...
@@ -25,8 +25,10 @@ Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
 (c) Tensor signature is an ordered tuple of {space_id,subspace_id} pairs
     for each tensor dimension. In case space_id = SOME_SPACE, subspace_id
     is simply the base offset in the anonymous vector space (min = 0).
-(d) Additionally, a subset of tensor dimensions can be assigned an isometry property;
-    any tensor may have no more than two disjoint isometric dimension groups.
+(d) Additionally, a subset of tensor dimensions can be assigned an isometry property.
+    Contraction over such a subset of isometric dimensions of a tensor with its
+    conjugate produces a Kronecker Delta tensor. Any tensor may have no more than
+    two disjoint isometric dimension subsets.
 4. Tensor operation [tensor_operation.hpp]:
 (a) Tensor operation is a mathematical operation on one or more tensor arguments.
 (b) Evaluating a tensor operation means computing the value of all its output tensors,
...
...
@@ -84,19 +86,28 @@ Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
 (a) A tensor can be allocated storage and processed at any time after its formal definition.
 (b) Tensor storage allocation is called tensor creation. A tensor can either be created across all
     MPI processes or within a specified group of them. The subset of MPI processes participating
-    in the tensor creation operation defines its domain of existence, meaning that only these
+    in the tensor creation operation defines its Domain of Existence, meaning that only these
     MPI processes are aware of the existence of the created tensor. Note that the concrete
     physical distribution of the tensor body among the MPI processes is hidden from the user
-    (either fully replicated or fully distributed or a mix of the two).
-(c) A subset of MPI processes participating in a given tensor operation defines
+    (either fully replicated or fully distributed or a mix of the two). A contiguous subset
+    of MPI processes from the tensor existence domain that contains all tensor elements is
+    called the Subdomain of Full Presence.
+(c) A set of MPI processes participating in a given tensor operation defines
     its execution domain. The tensor operation execution domain must be compatible
-    with the existence domains of its tensor operands:
-    (1) The existence domains of all output tensor operands must be the same;
-    (2) The tensor operation execution domain must coincide with the existence
-        domains of all output tensor operands;
-    (3) The existence domain of each input tensor operand must include
-        the tensor operation execution domain AND each input tensor operand
-        must be fully available within the tensor operation execution domain.
+    with the existence/presence domains of its tensor operands:
+    (1) The existence domains of all tensor operands must be properly nested,
+        that is, there must exist an order of their placement in which each
+        previous domain is contained in or congruent to the next one:
+        D_i <= D_j <= D_k <= ..., where D_i is the existence domain of tensor operand i;
+    (2) The tensor operation execution domain is the smallest of
+        the tensor operand existence domains;
+    (3) The tensor operation execution domain must be a subdomain
+        of full presence for all tensor operands;
+    (4) If any of the output tensor operands has a larger existence domain
+        than the execution domain of the tensor operation, it is the user's
+        responsibility to update the tensor value outside the tensor operation
+        execution domain; otherwise the code is non-compliant.
 (d) By default, the tensor body is replicated across all MPI processes in its domain of existence.
     The user also has an option to create a distributed tensor by specifying which dimensions of
     this tensor to split into segments, thus inducing a block-wise decomposition of the tensor body.
...
...
@@ -260,11 +271,13 @@ inline bool withinTensorExistenceDomain(Args&&... tensor_names) //in: tensor nam
 /** Returns the process group associated with the given tensors, that is,
-    the overlap of existence domains of the given tensors. Note that
+    the intersection of existence domains of the given tensors. Note that
     the existence domains of the given tensors must be properly nested,
     tensorA <= tensorB <= tensorC <= ... <= tensorZ,
-    otherwise the code will result in an undefined behavior. As a useful
-    rule, always place output tensors in front of input tensors. **/
+    for some order of the tensors, otherwise the code will result in
+    undefined behavior. It is the user's responsibility to ensure that
+    the returned process group is also a subdomain of full presence
+    for all participating tensors. **/
 template<typename... Args>
 inline const ProcessGroup & getTensorProcessGroup(Args&&... tensor_names) //in: tensor names
 {return numericalServer->getTensorProcessGroup(std::forward<Args>(tensor_names)...);}
...
...
src/exatn/num_server.cpp
View file @ c4f6394a
 /** ExaTN::Numerics: Numerical server
-REVISION: 2021/09/27
+REVISION: 2021/09/29
 Copyright (C) 2018-2021 Dmitry I. Lyakh (Liakh)
 Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
...
...
@@ -630,7 +630,6 @@ bool NumServer::submit(const ProcessGroup & process_group,
 if(logging_ > 0) logfile_ << "[" << std::fixed << std::setprecision(6)
  << exatn::Timer::timeInSecHR(getTimeStampStart())
  << "]: Found a contraction sequence candidate locally (caching = " << contr_seq_caching_ << ")" << std::endl;
 #ifdef MPI_ENABLED
 //Synchronize on the best tensor contraction sequence across processes:
 if(num_procs > 1 && num_input_tensors > 2){
...
...
@@ -706,7 +705,9 @@ bool NumServer::submit(const ProcessGroup & process_group,
 std::dynamic_pointer_cast<numerics::TensorOpTransform>(op1)->
  resetFunctor(std::shared_ptr<TensorMethod>(new numerics::FunctorInitVal(0.0)));
 submitted = submit(op1,tensor_mapper); if(!submitted) return false;
 //Submit all tensor operations for tensor network evaluation:
+std::size_t num_tens_ops_in_fly = 0;
 const auto num_split_indices = network.getNumSplitIndices(); //total number of indices that were split
 if(logging_ > 0) logfile_ << "Number of split indices = " << num_split_indices << std::endl << std::flush;
 std::size_t num_items_executed = 0; //number of tensor sub-networks executed
...
...
@@ -805,6 +806,7 @@ bool NumServer::submit(const ProcessGroup & process_group,
 std::dynamic_pointer_cast<numerics::TensorOpCreate>(create_slice)->
  resetTensorElementType(tensor->getElementType());
 submitted = submit(create_slice,tensor_mapper); if(!submitted) return false;
+ ++num_tens_ops_in_fly;
 //Extract the slice contents from the input/output tensor:
 if(tensor_is_output){ //make sure the output tensor slice only shows up once
  //assert(tensor == output_tensor);
...
...
@@ -815,6 +817,7 @@ bool NumServer::submit(const ProcessGroup & process_group,
  extract_slice->setTensorOperand(tensor_slice);
  extract_slice->setTensorOperand(tensor);
  submitted = submit(extract_slice,tensor_mapper); if(!submitted) return false;
+ ++num_tens_ops_in_fly;
  }
 }else{
  if(debugging && logging_ > 1) logfile_ << " without split indices" << std::endl; //debug
...
...
@@ -822,12 +825,14 @@ bool NumServer::submit(const ProcessGroup & process_group,
 } //loop over tensor operands
 //Submit the primary tensor operation with the current slices:
 submitted = submit(tens_op,tensor_mapper); if(!submitted) return false;
+ ++num_tens_ops_in_fly;
 //Insert the output tensor slice back into the output tensor:
 if(output_tensor_slice){
  std::shared_ptr<TensorOperation> insert_slice = tensor_op_factory_->createTensorOp(TensorOpCode::INSERT);
  insert_slice->setTensorOperand(output_tensor);
  insert_slice->setTensorOperand(output_tensor_slice);
  submitted = submit(insert_slice,tensor_mapper); if(!submitted) return false;
+ ++num_tens_ops_in_fly;
  output_tensor_slice.reset();
 }
 //Destroy temporary input tensor slices:
...
...
@@ -835,8 +840,12 @@ bool NumServer::submit(const ProcessGroup & process_group,
 std::shared_ptr<TensorOperation> destroy_slice = tensor_op_factory_->createTensorOp(TensorOpCode::DESTROY);
 destroy_slice->setTensorOperand(input_slice);
 submitted = submit(destroy_slice,tensor_mapper); if(!submitted) return false;
+ ++num_tens_ops_in_fly;
 }
-if(serialize) sync(process_group); //sync for serialization
+if(serialize || num_tens_ops_in_fly > exatn::runtime::TensorRuntime::MAX_RUNTIME_DAG_SIZE){
+ sync(process_group);
+ num_tens_ops_in_fly = 0;
+}
 input_slices.clear();
 } //loop over tensor operations
 //Erase intermediate tensor slices once all tensor operations have been executed:
...
...
@@ -851,10 +860,12 @@ bool NumServer::submit(const ProcessGroup & process_group,
 allreduce->setTensorOperand(output_tensor);
 std::dynamic_pointer_cast<numerics::TensorOpAllreduce>(allreduce)->
  resetMPICommunicator(process_group.getMPICommProxy());
 submitted = submit(allreduce,tensor_mapper); if(!submitted) return false;
+ ++num_tens_ops_in_fly;
 }
 }else{ //only a single tensor (sub-)network executed redundantly by all processes
 for(auto op = op_list.begin(); op != op_list.end(); ++op){
  submitted = submit(*op,tensor_mapper); if(!submitted) return false;
+ ++num_tens_ops_in_fly;
 }
 ++num_items_executed;
 }
...
...
src/exatn/num_server.hpp
View file @ c4f6394a
 /** ExaTN::Numerics: Numerical server
-REVISION: 2021/09/27
+REVISION: 2021/09/29
 Copyright (C) 2018-2021 Dmitry I. Lyakh (Liakh)
 Copyright (C) 2018-2021 Oak Ridge National Laboratory (UT-Battelle) **/
...
...
@@ -427,11 +427,6 @@ public:
 bool sync(const ProcessGroup & process_group,
           const Tensor & tensor,
           bool wait = true);
-/** Synchronizes execution of a specific tensor operation.
-    Changing wait to FALSE will only test for completion.
-    This method has local synchronization semantics! **/
-bool sync(TensorOperation & operation,
-          bool wait = true);
 /** Synchronizes execution of a specific tensor network.
     Changing wait to FALSE will only test for completion.
     If ProcessGroup is not provided, defaults to the local process. **/
...
...
@@ -489,19 +484,26 @@ public:
 bool withinTensorExistenceDomain(const std::string & tensor_name) const; //in: tensor name
 /** Returns the process group associated with the given tensors, that is,
-    the overlap of existence domains of the given tensors. Note that
+    the intersection of existence domains of the given tensors. Note that
     the existence domains of the given tensors must be properly nested,
     tensorA <= tensorB <= tensorC <= ... <= tensorZ,
-    otherwise the code will result in an undefined behavior. As a useful
-    rule, always place output tensors in front of input tensors. **/
+    for some order of the tensors, otherwise the code will result in
+    undefined behavior. It is the user's responsibility to ensure that
+    the returned process group is also a subdomain of full presence
+    for all participating tensors. **/
 template<typename... Args>
 const ProcessGroup & getTensorProcessGroup(const std::string & tensor_name, Args&&... tensor_names) const //in: tensor names
 {
  const auto & tensor_domain = getTensorProcessGroup(tensor_name);
  const auto & other_tensors_domain = getTensorProcessGroup(std::forward<Args>(tensor_names)...);
- if(!tensor_domain.isContainedIn(other_tensors_domain)){
+ if(tensor_domain.isContainedIn(other_tensors_domain)){
+  return tensor_domain;
+ }else if(other_tensors_domain.isContainedIn(tensor_domain)){
+  return other_tensors_domain;
+ }else{
   std::cout << "#ERROR(exatn::getTensorProcessGroup): Tensor operand existence domains must be properly nested: "
-            << "Tensor " << tensor_name << " violates this requirement!" << std::endl;
+            << "Tensor " << tensor_name << " is not properly nested w.r.t. tensors ";
+  print_variadic_pack(std::forward<Args>(tensor_names)...);
+  std::cout << std::endl;
   const auto & tensor_domain_ranks = tensor_domain.getProcessRanks();
   const auto & other_tensors_domain_ranks = other_tensors_domain.getProcessRanks();
   std::cout << tensor_name << ":" << std::endl;
...
...
@@ -512,7 +514,7 @@ public:
   std::cout << std::endl;
   assert(false);
  };
- return tensor_domain;
+ return getDefaultProcessGroup();
 }
 const ProcessGroup & getTensorProcessGroup(const std::string & tensor_name) const; //in: tensor name
...
...
@@ -985,6 +987,12 @@ protected:
 /** Submits an individual tensor operation for processing. **/
 bool submitOp(std::shared_ptr<TensorOperation> operation); //in: tensor operation for numerical evaluation
+/** Synchronizes execution of a specific tensor operation.
+    Changing wait to FALSE will only test for completion.
+    This method has local synchronization semantics! **/
+bool sync(TensorOperation & operation, //in: previously submitted tensor operation
+          bool wait = true);
 /** Destroys orphaned tensors (garbage collection). **/
 void destroyOrphanedTensors();
...
...
src/exatn/tests/NumServerTester.cpp
View file @ c4f6394a
...
...
@@ -45,8 +45,8 @@
 #define EXATN_TEST24
 #define EXATN_TEST25
 #define EXATN_TEST26
-//#define EXATN_TEST27 //requires input file from source
-//#define EXATN_TEST28 //requires input file from source
+#define EXATN_TEST27 //requires input file from source
+#define EXATN_TEST28 //requires input file from source
 #define EXATN_TEST30
...
...