Commit 672d544b authored by felix.tomski, committed by Joachim Jenke

[TSan] Ignore reads if not stored early

As documented in https://publications.rwth-aachen.de/record/840022/files/840022.pdf,
we traced a significant runtime overhead in certain HPC/scientific applications
back to concurrent shared read accesses.
A typical scenario for such read accesses is matrix-vector multiplication, which is
frequently used to solve linear equation systems (see the sketch below). Incidentally,
similar operations also appear in various machine learning algorithms.
The performance issue typically arises when the code executes with more than 4 threads
and gets worse when the threads are spread across different NUMA domains / sockets.
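
For illustration, a minimal sketch of that access pattern (hypothetical example
code, not part of this patch): every thread reads the whole shared vector x, so
under TSan each element of x receives concurrent shared read accesses from all
threads on every iteration.

    #include <cstddef>
    #include <vector>

    // y must be pre-sized to A.size(); A is a dense row-major matrix.
    void MatVec(const std::vector<std::vector<double>>& A,
                const std::vector<double>& x, std::vector<double>& y) {
    #pragma omp parallel for
      for (std::ptrdiff_t i = 0;
           i < static_cast<std::ptrdiff_t>(A.size()); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j)
          sum += A[i][j] * x[j];  // concurrent shared reads of x[j]
        y[i] = sum;
      }
    }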

The proposed change is to skip logging of reads if they cannot be stored early. This
means that previous reads by the current thread will still be updated, and empty
shadow cells will still be used for logging (see the condensed sketch below).
The change also prevents previous writes from being randomly overwritten by a read access.
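
Condensed, the resulting store policy looks roughly as follows (a hypothetical
helper with simplified parameters, not the literal CheckRaces code):

    // Simplified decision logic of the patched CheckRaces slow path.
    // `stored` mirrors the flag set when the access was already written into
    // a slot previously owned by the same thread.
    bool StoreAccess(bool stored, bool is_read, bool have_empty_slot) {
      if (stored)
        return true;   // Access was stored early; nothing more to do.
      if (have_empty_slot)
        return true;   // Empty shadow cells are still used for reads and writes.
      if (is_read)
        return false;  // New behavior: never evict an occupied slot for a read.
      return true;     // Writes still evict a pseudo-random slot as before.
    }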

Under review as #74575
parent f8575ff4
+7 −1
@@ -224,6 +224,8 @@ bool CheckRaces(ThreadState* thr, RawShadow* shadow_mem, Shadow cur,
   // the current access info, so we are done.
   if (LIKELY(stored))
     return false;
+  if (LIKELY(typ & kAccessRead))
+    return false;
   // Choose a random candidate slot and replace it.
   uptr index =
       atomic_load_relaxed(&thr->trace_pos) / sizeof(Event) % kShadowCnt;
@@ -345,9 +347,13 @@ STORE : {
     const m128 empty = _mm_cmpeq_epi32(shadow, zero);
     const int empty_mask = _mm_movemask_epi8(empty);
     index = __builtin_ffs(empty_mask);
-    if (UNLIKELY(index == 0))
+    if (UNLIKELY(index == 0)) {
+      // If we reach here, we give up storing reads
+      if (typ & kAccessRead)
+        return false;
       index = (atomic_load_relaxed(&thr->trace_pos) / 2) % 16;
+    }
   }
   StoreShadow(&shadow_mem[index / 4], cur.raw());
   // We could zero other slots determined by rewrite_mask.
   // That would help other threads to evict better slots,
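
For reference, the SSE idiom used in the STORE block above can be sketched in
isolation as follows (a standalone hypothetical example; it assumes, as in TSan,
that shadow cells are 32-bit values and that zero marks an empty cell):

    #include <emmintrin.h>  // SSE2 intrinsics
    #include <cstdint>
    #include <cstdio>

    // Returns the index (0..3) of the first empty 32-bit shadow cell,
    // or -1 if all four cells are occupied.
    int FirstEmptySlot(const uint32_t shadow_mem[4]) {
      const __m128i shadow = _mm_loadu_si128(
          reinterpret_cast<const __m128i*>(shadow_mem));
      const __m128i zero = _mm_setzero_si128();
      // Each empty 32-bit lane becomes all-ones (four 0xFF bytes).
      const __m128i empty = _mm_cmpeq_epi32(shadow, zero);
      // One mask bit per byte: lane i empty => bits 4i..4i+3 are set.
      const int empty_mask = _mm_movemask_epi8(empty);
      // 1-based position of the lowest set bit; 0 if no lane is empty.
      const int index = __builtin_ffs(empty_mask);
      if (index == 0)
        return -1;       // No empty slot: the patch gives up here for reads.
      return index / 4;  // Map the byte position back to a 32-bit cell index.
    }

    int main() {
      uint32_t cells[4] = {0xdeadbeef, 0, 0x12345678, 0};
      std::printf("first empty slot: %d\n", FirstEmptySlot(cells));  // prints 1
    }

This is why the patched code checks UNLIKELY(index == 0): __builtin_ffs returning
zero means no empty cell exists, which is exactly the point at which a read would
otherwise have to evict an occupied slot.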