Commit 672d544b authored by felix.tomski, committed by Joachim Jenke

[TSan] Ignore reads if not stored early

As documented in https://publications.rwth-aachen.de/record/840022/files/840022.pdf,
we traced a significant runtime overhead in certain HPC/scientific applications
back to concurrent shared read accesses.
A typical scenario for such read accesses is matrix-vector multiplication, which is
frequently used to solve linear equation systems (see the sketch below). Incidentally,
similar operations also appear in various machine learning algorithms.
The performance issue typically arises when the code executes with more than 4 threads
and gets worse when the threads are spread across different NUMA domains / sockets.
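
For illustration, a minimal sketch of that access pattern (hypothetical example
code, not part of this patch): every thread reads the whole shared vector x, so
under TSan each element of x receives concurrent shared read accesses from all
threads on every iteration.

    #include <cstddef>
    #include <vector>

    // y must be pre-sized to A.size(); A is a dense row-major matrix.
    void MatVec(const std::vector<std::vector<double>>& A,
                const std::vector<double>& x, std::vector<double>& y) {
    #pragma omp parallel for
      for (std::ptrdiff_t i = 0;
           i < static_cast<std::ptrdiff_t>(A.size()); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j)
          sum += A[i][j] * x[j];  // concurrent shared reads of x[j]
        y[i] = sum;
      }
    }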

The proposed change is to skip logging of reads if they cannot be stored early. This
means that previous reads by the current thread will still be updated, and empty
shadow cells will still be used for logging (see the condensed sketch below).
The change also prevents previous writes from being randomly overwritten by a read access.
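
Condensed, the resulting store policy looks roughly as follows (a hypothetical
helper with simplified parameters, not the literal CheckRaces code):

    // Simplified decision logic of the patched CheckRaces slow path.
    // `stored` mirrors the flag set when the access was already written into
    // a slot previously owned by the same thread.
    bool StoreAccess(bool stored, bool is_read, bool have_empty_slot) {
      if (stored)
        return true;   // Access was stored early; nothing more to do.
      if (have_empty_slot)
        return true;   // Empty shadow cells are still used for reads and writes.
      if (is_read)
        return false;  // New behavior: never evict an occupied slot for a read.
      return true;     // Writes still evict a pseudo-random slot as before.
    }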

Under review as #74575
parent f8575ff4
+7 −1
@@ -224,6 +224,8 @@ bool CheckRaces(ThreadState* thr, RawShadow* shadow_mem, Shadow cur,
   // the current access info, so we are done.
   if (LIKELY(stored))
     return false;
+  if (LIKELY(typ & kAccessRead))
+    return false;
   // Choose a random candidate slot and replace it.
   uptr index =
       atomic_load_relaxed(&thr->trace_pos) / sizeof(Event) % kShadowCnt;
@@ -345,9 +347,13 @@ STORE : {
     const m128 empty = _mm_cmpeq_epi32(shadow, zero);
     const int empty_mask = _mm_movemask_epi8(empty);
     index = __builtin_ffs(empty_mask);
-    if (UNLIKELY(index == 0))
+    if (UNLIKELY(index == 0)) {
+      // If we reach here, we give up storing reads
+      if (typ & kAccessRead)
+        return false;
       index = (atomic_load_relaxed(&thr->trace_pos) / 2) % 16;
+    }
   }
   StoreShadow(&shadow_mem[index / 4], cur.raw());
   // We could zero other slots determined by rewrite_mask.
   // That would help other threads to evict better slots,
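
For reference, the SSE idiom used in the STORE block above can be sketched in
isolation as follows (a standalone hypothetical example; it assumes, as in TSan,
that shadow cells are 32-bit values and that zero marks an empty cell):

    #include <emmintrin.h>  // SSE2 intrinsics
    #include <cstdint>
    #include <cstdio>

    // Returns the index (0..3) of the first empty 32-bit shadow cell,
    // or -1 if all four cells are occupied.
    int FirstEmptySlot(const uint32_t shadow_mem[4]) {
      const __m128i shadow = _mm_loadu_si128(
          reinterpret_cast<const __m128i*>(shadow_mem));
      const __m128i zero = _mm_setzero_si128();
      // Each empty 32-bit lane becomes all-ones (four 0xFF bytes).
      const __m128i empty = _mm_cmpeq_epi32(shadow, zero);
      // One mask bit per byte: lane i empty => bits 4i..4i+3 are set.
      const int empty_mask = _mm_movemask_epi8(empty);
      // 1-based position of the lowest set bit; 0 if no lane is empty.
      const int index = __builtin_ffs(empty_mask);
      if (index == 0)
        return -1;       // No empty slot: the patch gives up here for reads.
      return index / 4;  // Map the byte position back to a 32-bit cell index.
    }

    int main() {
      uint32_t cells[4] = {0xdeadbeef, 0, 0x12345678, 0};
      std::printf("first empty slot: %d\n", FirstEmptySlot(cells));  // prints 1
    }

This is why the patched code checks UNLIKELY(index == 0): __builtin_ffs returning
zero means no empty cell exists, which is exactly the point at which a read would
otherwise have to evict an occupied slot.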