Commit 43c0c7ec authored by Simon Spannagel

Merge branch 'fix_tref_object_count' into 'master'

Reset ObjectCount to Avoid Exploding TProcessID Table

Closes #190

See merge request allpix-squared/allpix-squared!952
parents edaf9136 1448c9c4
+63 −0
@@ -64,3 +64,66 @@ to handle parallelization internally which violates the Allpix Squared design. F
reproducibility between its multithreaded and sequential run managers. Modules that would like to use the Geant4 library
shall not use the run managers provided by Geant4. Instead, they must use the custom run managers provided by Allpix Squared
as described in [Section 14.1](../14_additional/01_tools.md#geant4-interface).

### Object History, TRefs and PointerWrappers

Allpix Squared uses ROOT's `TRef` objects to store the persistent links of the simulation object history. These references
behave much like C pointers and allow the referenced object to be accessed directly, without additional bookkeeping or index
lookups. They also persist when written to a ROOT file and read back into memory. ROOT implements this via a central lookup
table that keeps track of the referenced objects and their location in memory, as described
[in the ROOT documentation](https://root.cern.ch/doc/master/classTRef.html).

This approach comes with some drawbacks, especially in multithreaded environments. Most importantly, the lookup table is a
global object, which means that a mutex is required for accessing it. Multiple threads generating or using `TRef` references
have to share this mutex and consequently spend significant time waiting for the lock to be released. Furthermore, generating
more and more `TRef` relations over the course of a simulation increases the size of the central reference table. This table is
initialized with a fixed size, and once the number of `TRef` objects outgrows this pre-allocated space, new memory has to be
acquired, leading to a reallocation of the entire table at its new size. With potentially millions of entries, this
quickly becomes computationally very expensive and slows down the simulation significantly.

Allpix Squared solves these limitations by wrapping the `TRef` objects into a class called `PointerWrapper`. It contains both
a direct but transient C pointer and a `TRef` to the referenced object. The latter, however, is only generated when
required, i.e. if the object holding the `PointerWrapper` as well as the referenced object are going to be written to file.
This is achieved by first going through all relevant objects and marking them for storage:

```cpp
for(auto& object : objects) {
    object.markForStorage();
}
```

Now, the required history references can be identified and `TRef` objects are generated *only* for relations between two objects
that are both marked for storage:

```cpp
for(auto& object : objects) {
    object.petrifyHistory();
}
```

Objects can now be written to file and will contain the persistent reference to the related object.
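
The two passes can be illustrated with a small, self-contained sketch. It does not use ROOT and is not the Allpix Squared implementation: `Registry`, `Wrapper`, `Obj` and `writeEvent` are hypothetical names, and a plain integer ID stands in for the `TRef`. It only shows why marking has to be completed before the references are created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ROOT's central reference table: it only hands out IDs.
struct Registry {
    std::size_t next_id = 1; // 0 is reserved for "no persistent reference"
    std::size_t assign() { return next_id++; }
};

struct Obj;

// Illustrative wrapper: a transient raw pointer plus a persistent ID created on demand.
struct Wrapper {
    Obj* ptr = nullptr;     // direct pointer, only valid in memory
    std::size_t ref_id = 0; // stand-in for the TRef, 0 until petrified
};

struct Obj {
    bool stored = false; // set in the first pass
    std::size_t uid = 0; // assigned when the object is first referenced persistently
    Wrapper parent;      // history relation to another object

    void markForStorage() { stored = true; }

    // Second pass: create the persistent reference only if both ends are stored.
    void petrifyHistory(Registry& registry) {
        assert(stored); // only objects that will be written are petrified
        if(parent.ptr == nullptr || !parent.ptr->stored) {
            return; // relation keeps no persistent reference
        }
        if(parent.ptr->uid == 0) {
            parent.ptr->uid = registry.assign();
        }
        parent.ref_id = parent.ptr->uid;
    }
};

// Runs both passes over the objects selected for writing.
// Returns the number of IDs taken from the registry.
std::size_t writeEvent(const std::vector<Obj*>& selected, Registry& registry) {
    const auto before = registry.next_id;
    for(auto* object : selected) {
        object->markForStorage();
    }
    for(auto* object : selected) {
        object->petrifyHistory(registry);
    }
    return registry.next_id - before;
}
```

In this sketch, an object whose parent was not selected for writing ends up with `ref_id == 0`, and the registry only grows by one entry per referenced object that is actually stored.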

This approach solves the above problems. File writing has to be performed single-threaded anyway, so generating `TRef` objects
on the same thread does not lead to additional locking of the central reference table mutex in ROOT. In addition, `TRef` entries
are only generated and stored in the table for objects that require it. All references to objects that are not stored will be
`nullptr` in either case, since the target object is not available anymore when reading in the data. Since the generation of
`TRef` objects and the access to the reference table are now performed by a single thread and for one event at a time, it is
also possible to reset the ROOT-internal object ID of `TRef` references after the event has been processed. The subsequent event
will reuse the same IDs, preventing a continuous growth of the reference table and the related memory reallocation issues.
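
The effect of resetting the counter can be sketched without ROOT. The `RefTable` type and the `tableSizeAfter` function below are hypothetical and only mimic a table that grows whenever an ID beyond its current size is used. They are not ROOT's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reference table and its running object counter.
struct RefTable {
    std::vector<const void*> slots; // grows when an ID beyond its size is used
    std::size_t object_count = 0;   // last ID handed out

    std::size_t add(const void* object) {
        assert(object != nullptr);
        ++object_count;
        if(object_count > slots.size()) {
            slots.resize(object_count); // the costly growth described above
        }
        slots[object_count - 1] = object;
        return object_count;
    }
};

// Simulates `events` events with `refs_per_event` references each and
// returns the final table size, with or without resetting the counter.
std::size_t tableSizeAfter(std::size_t events, std::size_t refs_per_event, bool reset) {
    RefTable table;
    int dummy = 0;
    for(std::size_t e = 0; e < events; ++e) {
        const auto saved = table.object_count; // remember the count before the event
        for(std::size_t r = 0; r < refs_per_event; ++r) {
            table.add(&dummy);
        }
        if(reset) {
            table.object_count = saved; // the next event reuses the same IDs
        }
    }
    return table.slots.size();
}
```

With the reset, the table size in this sketch is bounded by the largest number of references in a single event. Without it, the size scales with the total number of references over the whole run.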

As a consequence, when reading objects back from file in a multithreaded environment, the `TRef` has to be converted back to a
C pointer in the reading thread, both to prevent mixing of reused `TRef` object IDs from different events and to avoid
locking access to the central reference table when looking up the memory location there. This is performed similarly to the
generation of history relations. Only relations with valid `TRef`s are loaded, while all other relations will hold a `nullptr`:

```cpp
for(auto& object : objects) {
    object.loadHistory();
}
```

For single-threaded applications such as ROOT analysis macros, this step is not necessary and the reference will be lazy-loaded
when accessed, i.e. the `TRef` reference will be converted to a direct raw pointer only when actually used. Since events are
processed sequentially and memory is freed between events, no mixing of IDs occurs.
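
The lazy-loading behaviour can be pictured with the following sketch. It is an illustration under assumptions rather than the `PointerWrapper` code: `LazyRef`, `Target` and `Lookup` are hypothetical names, and an `std::unordered_map` stands in for ROOT's reference table.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct Target {
    int value;
};

// Hypothetical stand-in for the central table mapping IDs to memory locations.
using Lookup = std::unordered_map<std::size_t, Target*>;

// Holds a persistent ID and resolves it to a raw pointer on first access only.
class LazyRef {
public:
    LazyRef(std::size_t id, const Lookup* table) : id_(id), table_(table) { assert(table_ != nullptr); }

    Target* get() const {
        if(ptr_ == nullptr && id_ != 0) {
            auto it = table_->find(id_);
            if(it != table_->end()) {
                ptr_ = it->second; // cache the raw pointer for later calls
                ++lookups_;
            }
        }
        return ptr_; // stays nullptr for invalid or unknown references
    }

    // Number of successful table lookups performed so far.
    std::size_t lookups() const { return lookups_; }

private:
    std::size_t id_; // 0 means "no reference"
    const Lookup* table_;
    mutable Target* ptr_ = nullptr;
    mutable std::size_t lookups_ = 0;
};
```

Repeated calls to `get()` hit the table once and afterwards return the cached pointer. A reference with ID 0 never touches the table and yields `nullptr`.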

+8 −2
@@ -110,6 +110,10 @@ bool ROOTObjectWriterModule::filter(const std::shared_ptr<BaseMessage>& message,
void ROOTObjectWriterModule::run(Event* event) {
    auto root_lock = root_process_lock();

    // Retrieve current object count:
    auto object_count = TProcessID::GetObjectCount();

    // Fetch filtered messages
    auto messages = messenger_->fetchFilteredMessages(this, event);

    // Mark objects to be stored:
@@ -192,8 +196,6 @@ void ROOTObjectWriterModule::run(Event* event) {
        // Fill the branch vector
        for(Object& object : object_array) {
            // Trigger the creation of TRefs for cross-object references to be able to store them to file.
            // We can reset the TObject count after processing this event because the TRef creation is only done here locally
            // in one worker thread instead of framew-work wide.
            object.petrifyHistory();
            ++write_cnt_;
            write_list_[index_tuple]->push_back(&object);
@@ -212,6 +214,10 @@ void ROOTObjectWriterModule::run(Event* event) {
    for(auto& index_data : write_list_) {
        index_data.second->clear();
    }

    // We can reset the TObject count after processing this event because the TRef creation is only done here locally
    // in one worker thread instead of framework wide.
    TProcessID::SetObjectCount(object_count);
}

void ROOTObjectWriterModule::finalize() {