  1. 23 Jan, 2022 1 commit
      [Support] Simplify parallelForEach{,N} · 8e382ae9
      Fangrui Song authored
      * Merge parallel_for_each into parallelForEach (this removes 1 `Fn(...)` call)
      * Change parallelForEach to use parallelForEachN
      * Move parallelForEachN into Parallel.cpp
      My x86-64 `lld` executable is 100KiB smaller.
      No noticeable difference in performance.
      Reviewed By: lattner
      Differential Revision:
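The layering described in this commit (an iterator-based `parallelForEach` that forwards to an index-based `parallelForEachN` living in a single translation unit) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `parallelForEachNSketch` and `parallelForEachSketch` are hypothetical, the worker logic uses plain `std::thread` instead of LLVM's executor, and `std::function` stands in for whatever callable wrapper the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical stand-in for parallelForEachN: calls Fn(I) for every I in
// [Begin, End). It is not a template, so it could be defined once in a .cpp
// file instead of being instantiated in every caller.
void parallelForEachNSketch(std::size_t Begin, std::size_t End,
                            const std::function<void(std::size_t)> &Fn) {
  assert(Begin <= End);
  std::size_t N = End - Begin;
  if (N == 0)
    return;
  std::size_t NumThreads = std::thread::hardware_concurrency();
  if (NumThreads == 0)
    NumThreads = 1;
  if (NumThreads > N)
    NumThreads = N;
  std::vector<std::thread> Workers;
  Workers.reserve(NumThreads);
  for (std::size_t T = 0; T < NumThreads; ++T) {
    // Each worker handles indices Begin+T, Begin+T+NumThreads, ...
    Workers.emplace_back([=, &Fn] {
      for (std::size_t I = Begin + T; I < End; I += NumThreads)
        Fn(I);
    });
  }
  for (std::thread &W : Workers)
    W.join();
}

// Hypothetical stand-in for parallelForEach: a thin template wrapper that
// maps each index back to an element of a random-access range.
template <class RandomIt, class Func>
void parallelForEachSketch(RandomIt First, RandomIt Last, Func Fn) {
  std::size_t N = static_cast<std::size_t>(std::distance(First, Last));
  parallelForEachNSketch(0, N, [&](std::size_t I) { Fn(First[I]); });
}
```

Keeping the template wrapper this thin is one plausible reason a change of this kind could shrink an executable: only the small forwarding lambda is instantiated per call site.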
  2. 18 Sep, 2021 1 commit
  3. 08 Jul, 2020 1 commit
  4. 31 Mar, 2020 1 commit
      [lld][COFF][ELF][WebAssembly] Replace --[no-]threads /threads[:no] with... · eb4663d8
      Fangrui Song authored
      [lld][COFF][ELF][WebAssembly] Replace --[no-]threads /threads[:no] with --threads={1,2,...} /threads:{1,2,...}
      --no-threads is a name copied from gold.
      gold has --no-threads, --thread-count and several other --thread-count-*.
      There is a need to customize the number of threads (running several lld
      processes concurrently or customizing the number of LTO threads).
      Having a single --threads=N is a straightforward replacement of gold's
      --no-threads + --thread-count.
      --no-threads is used rarely. So just delete --no-threads instead of
      keeping it for compatibility for a while.
      If --threads= is specified (ELF,wasm; COFF /threads: is similar),
      --thinlto-jobs= defaults to --threads=,
      otherwise all available hardware threads are used.
      There is currently no way to override a --threads={1,2,...}. It is still
      a debate whether we should use --threads=all.
      Reviewed By: rnk, aganea
      Differential Revision:
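The defaulting rule in this commit message (`--thinlto-jobs=` falls back to `--threads=`, and otherwise to all hardware threads) can be written as a small resolution function. This is a sketch only: the struct, its field names, and the convention that 0 means "option not given" are assumptions made for the example and are not taken from lld.

```cpp
#include <cassert>

// Hypothetical container for the two options discussed above.
// Assumption for this sketch: 0 means the option was not given.
struct LinkerThreadOptionsSketch {
  unsigned Threads = 0;     // value of --threads=N
  unsigned ThinLTOJobs = 0; // value of --thinlto-jobs=N
};

// Resolve the number of ThinLTO jobs following the rule in the commit
// message: explicit --thinlto-jobs wins, then --threads, then all
// available hardware threads.
unsigned resolveThinLTOJobs(const LinkerThreadOptionsSketch &Opts,
                            unsigned HardwareThreads) {
  assert(HardwareThreads > 0);
  if (Opts.ThinLTOJobs != 0)
    return Opts.ThinLTOJobs;
  if (Opts.Threads != 0)
    return Opts.Threads;
  return HardwareThreads;
}
```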
  5. 14 Feb, 2020 1 commit
      [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups · 8404aeb5
      Alexandre Ganea authored
      The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, at most 64 hardware threads could be used, in some cases dispatched only on one CPU socket.
      == Background ==
      Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, a hyper-thread if SMT is active; a core otherwise. There's a limit of 32 processors on older 32-bit versions of Windows, which was later raised to 64 processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is a pointer-sized integer (sizeof(void*)). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
      By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
      This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64 cores, 128 hyper-threads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
      == The problem ==
      The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, the original intention is lost; only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std::thread::hardware_concurrency() -- which can only return processors from the current "processor group".
      == The changes in this patch ==
      To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
      When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above the maximum number of threads will have no effect; the maximum will be used instead.
      The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
      When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
      Differential Revision:
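The idea of carrying the caller's intention until the threads are actually created can be illustrated with a small strategy object. This is a hedged sketch, not LLVM's ThreadPoolStrategy: the struct name, its fields, and the `computeThreadCount` signature (which takes the machine topology as plain parameters instead of querying the OS) are assumptions made so the example stays self-contained.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical strategy object: it records what the caller asked for and
// defers the final thread count until the hardware limits are known.
struct StrategySketch {
  unsigned ThreadsRequested = 0; // 0 means "use everything available"
  bool UseHyperThreads = true;   // false: one thread per physical core

  // Resolve the thread count as late as possible. The topology is passed in
  // explicitly here; a real implementation would query the OS instead.
  unsigned computeThreadCount(unsigned PhysicalCores,
                              unsigned ThreadsPerCore) const {
    assert(PhysicalCores > 0 && ThreadsPerCore > 0);
    unsigned Max =
        UseHyperThreads ? PhysicalCores * ThreadsPerCore : PhysicalCores;
    if (ThreadsRequested == 0)
      return Max;
    // A request above the maximum has no effect; the maximum is used.
    return std::min(ThreadsRequested, Max);
  }
};
```

For the machine described in the commit message (2 sockets with 18 cores each, 2 hyper-threads per core), this sketch resolves a default request to 72 threads when hyper-threads are allowed and to 36 when only one thread per core is wanted.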
  6. 10 Jan, 2020 1 commit
      [Support] ThreadPoolExecutor fixes for Windows/MinGW · 564481ae
      Andrew Ng authored
      Changed ThreadPoolExecutor to no longer use detached threads and instead
      to join threads on destruction. This is to prevent intermittent crashing
      on Windows when doing a normal full exit, e.g. via exit().
      Changed ThreadPoolExecutor to be a ManagedStatic so that it can be
      stopped on llvm_shutdown(). Without this, it would only be stopped in
      the destructor when doing a full exit. This is required to avoid
      intermittent crashing on Windows due to a race condition between the
      ThreadPoolExecutor starting up threads and the process doing a fast
      exit, e.g. via _exit().
      The Windows crashes appear to only occur with the MSVC static runtimes
      and are more frequent with the debug static runtime.
      These changes also prevent intermittent deadlocks on exit with the MinGW
      Differential Revision:
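The change from detached threads to threads joined on destruction can be sketched with a minimal executor. This is an illustration only: the class name `JoiningExecutorSketch` and its interface are hypothetical, it drains the queue before stopping (which the real ThreadPoolExecutor may not do), and it does not model the ManagedStatic or llvm_shutdown() part of the commit.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical executor that owns its worker threads and joins them in the
// destructor instead of detaching them.
class JoiningExecutorSketch {
public:
  explicit JoiningExecutorSketch(unsigned NumThreads) {
    assert(NumThreads > 0);
    for (unsigned I = 0; I < NumThreads; ++I)
      Workers.emplace_back([this] { work(); });
  }

  ~JoiningExecutorSketch() {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      Stop = true;
    }
    Cond.notify_all();
    // Joining guarantees no worker is still running when the process
    // continues with its exit sequence.
    for (std::thread &T : Workers)
      T.join();
  }

  void add(std::function<void()> Task) {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      Tasks.push(std::move(Task));
    }
    Cond.notify_one();
  }

private:
  void work() {
    for (;;) {
      std::function<void()> Task;
      {
        std::unique_lock<std::mutex> Lock(Mutex);
        Cond.wait(Lock, [this] { return Stop || !Tasks.empty(); });
        if (Tasks.empty())
          return; // Stop was requested and the queue is drained.
        Task = std::move(Tasks.front());
        Tasks.pop();
      }
      Task();
    }
  }

  std::mutex Mutex;
  std::condition_variable Cond;
  std::queue<std::function<void()>> Tasks;
  bool Stop = false;
  std::vector<std::thread> Workers;
};
```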
  7. 10 Oct, 2019 1 commit
      win: Move Parallel.h off concrt to cross-platform code · d4960032
      Nico Weber authored
      r179397 added Parallel.h and implemented it in terms of concrt in 2013.
      In 2015, a cross-platform implementation of the functions has appeared
      and is in use everywhere but on Windows (r232419).  r246219 hints that
      <thread> had issues in MSVC2013, but r296906 suggests they've been fixed
      now that we require 2015+.
      So remove the concrt code. It's less code, and it sounds like concrt has
      conceptual and performance issues, see PR41198.
      I built blink_core.dll in a debug component build with full symbols and
      in a release component build without any symbols.  I couldn't measure a
      performance difference for linking blink_core.dll before and after this
      Differential Revision:
      llvm-svn: 374421
  8. 25 Apr, 2019 1 commit
      Parallel: only allow the first TaskGroup to run tasks parallelly · f6a62909
      Fangrui Song authored
      Concurrent (e.g. nested) llvm::parallel::for_each() may lead to
      deadlocks. See PR35788 (fixed by rLLD322041) and PR41508 (fixed by D60757).
      When parallel_for_each() is about to return, in ~Latch() called by
      ~TaskGroup(), a thread (in the default executor) may block in
      Latch::sync() waiting for Count to become zero. If all threads in the
      default executor are blocked, it is a deadlock.
      To fix this, force serial execution if the current TaskGroup is not the
      first one. For a nested llvm::parallel::for_each(), this parallelizes
      the outermost loop and serializes inner loops.
      Differential Revision:
      llvm-svn: 359182
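The rule "only the first TaskGroup runs tasks in parallel" can be sketched with a counter of live groups. This is a minimal illustration under assumptions: the class name `TaskGroupSketch`, the counter name, and the `isParallel()` accessor are chosen for the example, and the sketch omits the executor and the latch entirely.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical task group: the first live instance is allowed to run tasks
// in parallel; any group created while another one is alive (for example
// from a nested parallel loop) falls back to serial execution.
class TaskGroupSketch {
public:
  TaskGroupSketch() : Parallel(ActiveInstances++ == 0) {}
  ~TaskGroupSketch() { --ActiveInstances; }

  TaskGroupSketch(const TaskGroupSketch &) = delete;
  TaskGroupSketch &operator=(const TaskGroupSketch &) = delete;

  // True if tasks spawned on this group may be handed to worker threads;
  // false if they should run inline on the calling thread.
  bool isParallel() const { return Parallel; }

private:
  static std::atomic<int> ActiveInstances;
  const bool Parallel;
};

std::atomic<int> TaskGroupSketch::ActiveInstances{0};
```

With this scheme the outermost loop keeps its parallelism, while inner loops never wait on workers that may themselves be blocked, which is the deadlock the commit message describes.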
  9. 19 Jan, 2019 1 commit
      Update the file headers across all of the LLVM projects in the monorepo · 2946cd70
      Chandler Carruth authored
      to reflect the new license.
      We understand that people may be surprised that we're moving the header
      entirely to discuss the new license. We checked this carefully with the
      Foundation's lawyer and we believe this is the correct approach.
      Essentially, all code in the project is now made available by the LLVM
      project under our new license, so you will see that the license headers
      include that license only. Some of our contributors have contributed
      code under our old license, and accordingly, we have retained a copy of
      our old license notice in the top-level files in each project and
      llvm-svn: 351636
  10. 11 May, 2018 1 commit
  11. 01 May, 2018 1 commit
      Remove \brief commands from doxygen comments. · 5f8f34e4
      Adrian Prantl authored
      We've been running doxygen with the autobrief option for a couple of
      years now. This makes the \brief markers in our comments
      redundant. Since they are a visual distraction and we don't want to
      encourage more \brief markers in new code either, this patch removes
      them all.
      Patch produced by
        for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
      Differential Revision:
      llvm-svn: 331272
  12. 04 Oct, 2017 2 commits
      Bring r314809 back. · 8c0ff950
      Rafael Espindola authored
      But now include a check for CPU_COUNT so we still build on 10-year-old
      versions of glibc.
      Original message:
      Use sched_getaffinity instead of std::thread::hardware_concurrency.
      The issue with std::thread::hardware_concurrency is that it forwards
      to libc and some implementations (like glibc) don't take thread
      affinity into consideration.
      With this change an LLVM program that can execute on only 2 cores will
      use 2 threads, even if the machine has 32 cores.
      This makes benchmarking a lot easier, but should also help if someone
      doesn't want to use all cores for compilation for example.
      llvm-svn: 314931
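An affinity-aware thread count with a guard for CPU_COUNT can be sketched as below. This is an illustration under assumptions: the function name `availableThreadsSketch` is hypothetical, the guard is a preprocessor check rather than the CMake check the commit refers to, and the fallback to std::thread::hardware_concurrency() is a choice made for this example.

```cpp
#include <cassert>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: returns the number of CPUs the current thread is
// allowed to run on, falling back to the standard library's estimate.
unsigned availableThreadsSketch() {
#if defined(__linux__) && defined(CPU_COUNT)
  cpu_set_t Set;
  CPU_ZERO(&Set);
  // pid 0 means "the calling thread"; on success the mask holds the CPUs
  // this thread may be scheduled on.
  if (sched_getaffinity(0, sizeof(Set), &Set) == 0) {
    int N = CPU_COUNT(&Set);
    if (N > 0)
      return static_cast<unsigned>(N);
  }
#endif
  // Fallback: may ignore affinity, and may return 0 if unknown.
  unsigned N = std::thread::hardware_concurrency();
  return N ? N : 1;
}
```

Running a program that uses this helper under something like `taskset -c 0,1` should then yield 2, regardless of how many cores the machine has.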
      Revert D38481 due to missing cmake check for CPU_COUNT · bef94bcb
      Daniel Neilson authored
      This reverts D38481. The change breaks systems with older versions of glibc. It
      injects a use of CPU_COUNT() from sched.h without checking to ensure that the
      function exists first.
      llvm-svn: 314922
  13. 03 Oct, 2017 1 commit
      Use sched_getaffinity instead of std::thread::hardware_concurrency. · 6e182fba
      Rafael Espindola authored
      The issue with std::thread::hardware_concurrency is that it forwards
      to libc and some implementations (like glibc) don't take thread
      affinity into consideration.
      With this change an LLVM program that can execute on only 2 cores will
      use 2 threads, even if the machine has 32 cores.
      This makes benchmarking a lot easier, but should also help if someone
      doesn't want to use all cores for compilation for example.
      llvm-svn: 314809
  14. 11 May, 2017 5 commits
  15. 10 May, 2017 1 commit
  16. 05 May, 2017 3 commits