This project is mirrored from https://github.com/llvm-doe-org/llvm-project.git.
  1. 23 Jan, 2022 1 commit
    • [Support] Simplify parallelForEach{,N} · 8e382ae9
      Fangrui Song authored
      * Merge parallel_for_each into parallelForEach (this removes 1 `Fn(...)` call)
      * Change parallelForEach to use parallelForEachN
      * Move parallelForEachN into Parallel.cpp
      
      My x86-64 `lld` executable is 100KiB smaller.
      No noticeable difference in performance.
      
      Reviewed By: lattner
      
      Differential Revision: https://reviews.llvm.org/D117510
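The first two bullets of this message can be pictured with a minimal sketch. It assumes, as the message says, that the element-wise loop is expressed through the index-based one. `forEachNSketch`, `forEachSketch`, and `doubleAndSum` are hypothetical names, and the index loop runs serially here rather than on LLVM's executor.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for parallelForEachN: calls Fn(I) for I in [Begin, End).
// It runs serially here; the real function hands index ranges to worker threads.
template <class FuncTy>
void forEachNSketch(std::size_t Begin, std::size_t End, FuncTy Fn) {
  for (std::size_t I = Begin; I < End; ++I)
    Fn(I);
}

// Hypothetical stand-in for parallelForEach: forwards to the index-based loop,
// so only one loop implementation has to exist.
template <class RandomIt, class FuncTy>
void forEachSketch(RandomIt Begin, RandomIt End, FuncTy Fn) {
  const std::size_t N = static_cast<std::size_t>(std::distance(Begin, End));
  forEachNSketch(0, N, [&](std::size_t I) { Fn(Begin[I]); });
}

// Doubles every element in place and returns the sum of the results.
int doubleAndSum(std::vector<int> &V) {
  int Sum = 0;
  forEachSketch(V.begin(), V.end(), [&](int &X) {
    X *= 2;
    Sum += X; // Safe only because this sketch is serial.
  });
  return Sum;
}
```

Because the element-wise wrapper only adapts an iterator range to indices, it can stay a thin template while the index-based loop lives in one place.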
  2. 18 Sep, 2021 1 commit
  3. 03 May, 2021 1 commit
    • [Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each. · 5fa9d416
      Chris Lattner authored
      This avoids the non-trivial overhead of creating a TaskGroup in these degenerate
      cases, but also exposes parallelism.  It turns out that the default executor
      underlying TaskGroup prevents recursive parallelism - so an instance of a task
      group being alive will make nested ones become serial.
      
      This is a big issue in MLIR in some dialects, if they have a single instance of
      an outer op (e.g. a firrtl.circuit) that has many parallel ops within it (e.g.
      a firrtl.module).  This patch side-steps the problem by avoiding creating the
      TaskGroup in the unneeded case.  See this issue for more details:
      https://github.com/llvm/circt/issues/993
      
      Note that this isn't a really great solution for the general case of nested
      parallelism.  A redesign of the TaskGroup stuff would be better, but would be
      a much more invasive change.
      
      Differential Revision: https://reviews.llvm.org/D101699
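The special case described above can be sketched as an early return before any group is created. This is an illustration under assumptions, not the patch itself: `Route`, `forEachFastPath`, and `visit` are hypothetical names, and the general path is a plain loop standing in for the code that would create a TaskGroup.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Which route a call took; hypothetical, for illustration only.
enum class Route { Empty, Inline, Grouped };

template <class IterTy, class FuncTy>
Route forEachFastPath(IterTy Begin, IterTy End, FuncTy Fn) {
  const auto N = std::distance(Begin, End);
  if (N == 0)
    return Route::Empty; // Nothing to do, so no group is created.
  if (N == 1) {
    Fn(*Begin); // Run on the caller's thread, no group is created.
    return Route::Inline;
  }
  // General case: a real implementation would create a task group here.
  // This sketch just loops so that it stays self-contained.
  for (; Begin != End; ++Begin)
    Fn(*Begin);
  return Route::Grouped;
}

// Counts visited elements and reports the route taken.
Route visit(const std::vector<int> &V, int &Visited) {
  Visited = 0;
  return forEachFastPath(V.begin(), V.end(), [&](int) { ++Visited; });
}
```

With a single outer item, the callback runs without a live group, so a loop nested inside it is free to become the first group and run in parallel.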
  4. 03 Nov, 2020 1 commit
    • Add parallelTransformReduce and parallelForEachError · c0a922b3
      Reid Kleckner authored
      parallelTransformReduce is modelled on the C++17 pstl API of
      std::transform_reduce, except our wrappers do not use execution policy
      parameters.
      
      parallelForEachError allows loops that contain potentially failing
      operations to propagate errors out of the loop. This was one of the
      major challenges I encountered while parallelizing PDB type merging in
      LLD. Parallelizing a loop with parallelForEachError is not behavior
      preserving: the loop will no longer stop on the first error, it will
      continue working and report all errors it encounters in a list.
      
      I plan to use this to propagate errors out of LLD's
      coff::TpiSource::remapTpiWithGHashes, which currently stores an
      error in the TpiSource object.
      
      Differential Revision: https://reviews.llvm.org/D90639
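The behavior change described for parallelForEachError (keep going and report every failure instead of stopping at the first) can be sketched as follows. This is a serial illustration under assumptions: `ErrorSketch` is a hypothetical string-based error type rather than llvm::Error, and `stopOnFirstError`, `collectAllErrors`, and `checkNonNegative` are made-up names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical error type: an empty string means success.
using ErrorSketch = std::string;

// A plain sequential loop stops at the first failure.
template <class RangeTy, class FuncTy>
std::vector<ErrorSketch> stopOnFirstError(const RangeTy &R, FuncTy Fn) {
  for (const auto &Item : R) {
    ErrorSketch E = Fn(Item);
    if (!E.empty())
      return {E};
  }
  return {};
}

// Modelled on the behavior described above: every item is processed and all
// failures are collected. Serial here; a threaded version would need to
// synchronize access to Errors.
template <class RangeTy, class FuncTy>
std::vector<ErrorSketch> collectAllErrors(const RangeTy &R, FuncTy Fn) {
  std::vector<ErrorSketch> Errors;
  for (const auto &Item : R) {
    ErrorSketch E = Fn(Item);
    if (!E.empty())
      Errors.push_back(E);
  }
  return Errors;
}

// Example operation: negative inputs fail.
ErrorSketch checkNonNegative(int X) {
  return X < 0 ? "negative input: " + std::to_string(X) : ErrorSketch();
}
```

For the input `{1, -2, 3, -4}`, the first helper reports one error and the second reports two, which is the difference callers have to account for when converting a loop.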
  5. 05 May, 2020 1 commit
    • [Support] Move LLD's parallel algorithm wrappers to support · 932f0276
      Reid Kleckner authored
      Essentially takes the lld/Common/Threads.h wrappers and moves them to
      the llvm/Support/Parallel.h algorithm header.
      
      The changes are:
      - Remove policy parameter, since all clients use `par`.
      - Rename the methods to `parallelSort` etc to match LLVM style, since
        they are no longer C++17 pstl compatible.
      - Move algorithms from llvm::parallel:: to llvm::, since they have
        "parallel" in the name and are no longer overloads of the regular
        algorithms.
      - Add range overloads
      - Use the sequential algorithm directly when 1 thread is requested
        (skips task grouping)
      - Fix the index type of parallelForEachN to size_t. Nobody in LLVM was
        using any other parameter, and it made overload resolution hard for
        for_each_n(par, 0, foo.size(), ...) because 0 is int, not size_t.
      
      Remove Threads.h and update LLD for that.
      
      This is a prerequisite for parallel public symbol processing in the PDB
      library, which is in LLVM.
      
      Reviewed By: MaskRay, aganea
      
      Differential Revision: https://reviews.llvm.org/D79390
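The last bullet (fixing the index type to size_t) can be illustrated with a small sketch. It assumes the difficulty shows up as a template deduction conflict, which is one way a literal `0` next to `foo.size()` can go wrong. `forEachNGeneric`, `forEachNFixed`, and `sumByIndex` are hypothetical names, and both loops are serial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before (sketch): one template parameter for both bounds. A call such as
//   forEachNGeneric(0, V.size(), Fn)
// does not compile, because IndexTy is deduced as int from 0 and as
// std::size_t from V.size().
template <class IndexTy, class FuncTy>
void forEachNGeneric(IndexTy Begin, IndexTy End, FuncTy Fn) {
  for (IndexTy I = Begin; I < End; ++I)
    Fn(I);
}

// After (sketch): the index type is fixed, so the literal 0 converts implicitly.
template <class FuncTy>
void forEachNFixed(std::size_t Begin, std::size_t End, FuncTy Fn) {
  for (std::size_t I = Begin; I < End; ++I)
    Fn(I);
}

// Sums V by index using the fixed-index version.
int sumByIndex(const std::vector<int> &V) {
  int Sum = 0;
  forEachNFixed(0, V.size(), [&](std::size_t I) { Sum += V[I]; });
  return Sum;
}
```

The generic version still works when both bounds have the same type, but callers then have to write `std::size_t(0)` by hand.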
  6. 31 Mar, 2020 1 commit
    • [lld][COFF][ELF][WebAssembly] Replace --[no-]threads /threads[:no] with --threads={1,2,...} /threads:{1,2,...} · eb4663d8
      Fangrui Song authored
      --no-threads is a name copied from gold.
      gold has --no-threads, --thread-count and several other --thread-count-*.
      
      There is a need to customize the number of threads (running several lld
      processes concurrently or customizing the number of LTO threads).
      Having a single --threads=N is a straightforward replacement of gold's
      --no-threads + --thread-count.
      
      --no-threads is used rarely. So just delete --no-threads instead of
      keeping it for compatibility for a while.
      
      If --threads= is specified (ELF,wasm; COFF /threads: is similar),
      --thinlto-jobs= defaults to --threads=,
      otherwise all available hardware threads are used.
      
      There is currently no way to override a --threads={1,2,...}. It is still
      under debate whether we should use --threads=all.
      
      Reviewed By: rnk, aganea
      
      Differential Revision: https://reviews.llvm.org/D76885
  7. 10 Oct, 2019 1 commit
    • win: Move Parallel.h off concrt to cross-platform code · d4960032
      Nico Weber authored
      r179397 added Parallel.h and implemented it in terms of concrt in 2013.
      
      In 2015, a cross-platform implementation of the functions has appeared
      and is in use everywhere but on Windows (r232419).  r246219 hints that
      <thread> had issues in MSVC2013, but r296906 suggests they've been fixed
      now that we require 2015+.
      
      So remove the concrt code. It's less code, and it sounds like concrt has
      conceptual and performance issues, see PR41198.
      
      I built blink_core.dll in a debug component build with full symbols and
      in a release component build without any symbols.  I couldn't measure a
      performance difference for linking blink_core.dll before and after this
      patch.
      
      Differential Revision: https://reviews.llvm.org/D68820
      
      llvm-svn: 374421
  8. 25 Apr, 2019 1 commit
    • Parallel: only allow the first TaskGroup to run tasks parallelly · f6a62909
      Fangrui Song authored
      Summary:
      Concurrent (e.g. nested) llvm::parallel::for_each() may lead to
      deadlocks. See PR35788 (fixed by rLLD322041) and PR41508 (fixed by D60757).
      
      When parallel_for_each() is about to return, in ~Latch() called by
      ~TaskGroup(), a thread (in the default executor) may block in
      Latch::sync() waiting for Count to become zero. If all threads in the
      default executor are blocked, it is a deadlock.
      
      To fix this, force serial execution if the current TaskGroup is not the
      first one. For a nested llvm::parallel::for_each(), this parallelizes
      the outermost loop and serializes inner loops.
      
      Differential Revision: https://reviews.llvm.org/D61115
      
      llvm-svn: 359182
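The rule described above (only the first live group runs in parallel, nested ones fall back to serial execution) can be sketched with an instance counter. This is a minimal model under assumptions, not the patch itself: `GroupSketch`, `ActiveGroups`, and `nestedModes` are hypothetical names, and no threads are started, since only the mode decision is modelled.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Number of GroupSketch objects currently alive (hypothetical counter).
static std::atomic<int> ActiveGroups{0};

// A group may run in parallel only if no other group was alive when it was
// constructed. Nested groups therefore see Parallel == false.
class GroupSketch {
  bool Parallel;

public:
  GroupSketch() : Parallel(ActiveGroups.fetch_add(1) == 0) {}
  ~GroupSketch() { ActiveGroups.fetch_sub(1); }
  GroupSketch(const GroupSketch &) = delete;
  GroupSketch &operator=(const GroupSketch &) = delete;
  bool isParallel() const { return Parallel; }
};

// Records the mode chosen for an outer group and for one nested inside it.
std::vector<std::string> nestedModes() {
  std::vector<std::string> Modes;
  GroupSketch Outer;
  Modes.push_back(Outer.isParallel() ? "parallel" : "serial");
  {
    GroupSketch Inner; // Created while Outer is still alive.
    Modes.push_back(Inner.isParallel() ? "parallel" : "serial");
  }
  return Modes;
}
```

Since an inner group never hands work to the shared pool, it cannot end up waiting on pool threads that are themselves blocked, which is the deadlock scenario the message describes.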
  9. 19 Jan, 2019 1 commit
    • Update the file headers across all of the LLVM projects in the monorepo · 2946cd70
      Chandler Carruth authored
      to reflect the new license.
      
      We understand that people may be surprised that we're moving the header
      entirely to discuss the new license. We checked this carefully with the
      Foundation's lawyer and we believe this is the correct approach.
      
      Essentially, all code in the project is now made available by the LLVM
      project under our new license, so you will see that the license headers
      include that license only. Some of our contributors have contributed
      code under our old license, and accordingly, we have retained a copy of
      our old license notice in the top-level files in each project and
      repository.
      
      llvm-svn: 351636
  10. 01 May, 2018 1 commit
    • Remove \brief commands from doxygen comments. · 5f8f34e4
      Adrian Prantl authored
      We've been running doxygen with the autobrief option for a couple of
      years now. This makes the \brief markers in our comments
      redundant. Since they are a visual distraction and we don't want to
      encourage more \brief markers in new code either, this patch removes
      them all.
      
      Patch produced by
      
        for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
      
      Differential Revision: https://reviews.llvm.org/D46290
      
      llvm-svn: 331272
  11. 01 Apr, 2018 1 commit
    • [include] Change std::sort to llvm::sort in response to r327219 · ba8033be
      Mandeep Singh Grang authored
      Summary:
      r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
      This will help in uncovering non-determinism caused by undefined sorting
      order of objects having the same key.
      
      To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
      
      Note: This patch is one of a series of patches to replace *all* std::sort with llvm::sort.
      Refer to the comments section in D44363 for a list of all the required patches.
      
      Reviewers: echristo, zturner, mzolotukhin, lhames
      
      Reviewed By: echristo
      
      Subscribers: llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D45135
      
      llvm-svn: 328940
  12. 22 Mar, 2018 1 commit
  13. 21 Aug, 2017 1 commit
    • [Support/Parallel] - Do not use a task group for a very small task. · d7305ef0
      George Rimar authored
      parallel_for_each_n splits a given task into small pieces of tasks and then
      passes them to background threads managed by a thread pool to process them
      in parallel. TaskGroup then waits for all tasks to be done, which is done by
      TaskGroup's destructor.
      
      In the previous code, all tasks were passed to background threads, and the
      main thread just waited for them to finish their jobs. This patch changes
      the logic so that the main thread processes a task just like other
      worker threads instead of just waiting for workers.
      
      This patch improves the performance of parallel_for_each_n for a task that
      is so small that we do not split it into multiple tasks. Previously, such a task
      was submitted to another thread and the main thread waited for its completion.
      That involves multiple inter-thread synchronizations, which are not cheap for
      small tasks. Now, such a task is processed by the main thread, so no inter-thread
      communication is necessary.
      
      Differential revision: https://reviews.llvm.org/D36607
      
      llvm-svn: 311312
  14. 11 May, 2017 3 commits
  15. 10 May, 2017 2 commits
  16. 05 May, 2017 2 commits
  17. 07 Apr, 2017 1 commit
    • [Core] Fix parallel_for for Linux · 8abda20a
      James Henderson authored
      r299635 exposed a latent bug in the Linux implementation of parallel_for, which
      caused it to call the function outside of the requested range and later led
      to a segmentation fault. This change fixes this issue and adds a unit test.
      
      llvm-svn: 299748
  18. 05 Dec, 2016 1 commit
  19. 27 Nov, 2016 1 commit
  20. 16 Nov, 2016 1 commit
    • Reduce number of tasks in parallel_for_each. · 87ff6fef
      Rui Ueyama authored
      TaskGroup has a fairly high overhead, so we don't want to partition
      the work into tasks that are too small. This patch partitions the work
      into up to 1024 tasks.
      
      I compared this patch with the original LLD's parallel_for_each.
      I reverted r287042 locally for comparison.
      
      With this patch, time to self-link lld with debug info changed from
      6.23 seconds to 4.62 seconds (-25.8%), with -threads and without -build-id.
      With both -threads and -build-id, it improved from 11.71 seconds
      to 4.94 seconds (-57.8%). Full results are below.
      
      BTW, GNU gold takes 11.65 seconds to link the same binary.
      
      NOW
      
      --no-threads --build-id=none
             6789.847776 task-clock (msec)         #    1.000 CPUs utilized            ( +-  1.86% )
                     685 context-switches          #    0.101 K/sec                    ( +-  2.82% )
                       4 cpu-migrations            #    0.001 K/sec                    ( +- 31.18% )
               1,424,690 page-faults               #    0.210 M/sec                    ( +-  1.07% )
          21,339,542,522 cycles                    #    3.143 GHz                      ( +-  1.49% )
          13,092,260,230 stalled-cycles-frontend   #   61.35% frontend cycles idle     ( +-  2.23% )
         <not supported> stalled-cycles-backend
          21,462,051,828 instructions              #    1.01  insns per cycle
                                                   #    0.61  stalled cycles per insn  ( +-  0.41% )
           3,955,296,378 branches                  #  582.531 M/sec                    ( +-  0.39% )
              75,699,909 branch-misses             #    1.91% of all branches          ( +-  0.08% )
      
             6.787630744 seconds time elapsed                                          ( +-  1.86% )
      
      --threads --build-id=none
            14767.148697 task-clock (msec)         #    3.196 CPUs utilized            ( +-  2.56% )
                  28,891 context-switches          #    0.002 M/sec                    ( +-  1.99% )
                     905 cpu-migrations            #    0.061 K/sec                    ( +-  5.49% )
               1,262,122 page-faults               #    0.085 M/sec                    ( +-  1.68% )
          43,116,163,217 cycles                    #    2.920 GHz                      ( +-  3.07% )
          33,690,171,242 stalled-cycles-frontend   #   78.14% frontend cycles idle     ( +-  3.67% )
         <not supported> stalled-cycles-backend
          22,836,731,536 instructions              #    0.53  insns per cycle
                                                   #    1.48  stalled cycles per insn  ( +-  1.13% )
           4,382,712,998 branches                  #  296.788 M/sec                    ( +-  1.33% )
              78,622,295 branch-misses             #    1.79% of all branches          ( +-  0.54% )
      
             4.621228056 seconds time elapsed                                          ( +-  1.90% )
      
      --threads --build-id=sha1
            24594.457135 task-clock (msec)         #    4.974 CPUs utilized            ( +-  1.78% )
                  29,902 context-switches          #    0.001 M/sec                    ( +-  2.62% )
                   1,097 cpu-migrations            #    0.045 K/sec                    ( +-  6.29% )
               1,313,947 page-faults               #    0.053 M/sec                    ( +-  2.36% )
          70,516,415,741 cycles                    #    2.867 GHz                      ( +-  0.78% )
          47,570,262,296 stalled-cycles-frontend   #   67.46% frontend cycles idle     ( +-  0.86% )
         <not supported> stalled-cycles-backend
          73,124,599,029 instructions              #    1.04  insns per cycle
                                                   #    0.65  stalled cycles per insn  ( +-  0.33% )
          10,495,266,104 branches                  #  426.733 M/sec                    ( +-  0.41% )
              91,444,149 branch-misses             #    0.87% of all branches          ( +-  0.83% )
      
             4.944291711 seconds time elapsed                                          ( +-  1.72% )
      
      PREVIOUS
      
      --threads --build-id=none
             7307.437544 task-clock (msec)         #    1.160 CPUs utilized            ( +-  2.34% )
                   3,128 context-switches          #    0.428 K/sec                    ( +-  4.37% )
                     352 cpu-migrations            #    0.048 K/sec                    ( +-  5.98% )
               1,354,450 page-faults               #    0.185 M/sec                    ( +-  2.20% )
          22,081,733,098 cycles                    #    3.022 GHz                      ( +-  1.46% )
          13,709,991,267 stalled-cycles-frontend   #   62.09% frontend cycles idle     ( +-  1.77% )
         <not supported> stalled-cycles-backend
          21,634,468,895 instructions              #    0.98  insns per cycle
                                                   #    0.63  stalled cycles per insn  ( +-  0.86% )
           3,993,062,361 branches                  #  546.438 M/sec                    ( +-  0.83% )
              76,188,819 branch-misses             #    1.91% of all branches          ( +-  0.19% )
      
             6.298101157 seconds time elapsed                                          ( +-  2.03% )
      
      --threads --build-id=sha1
            12845.420265 task-clock (msec)         #    1.097 CPUs utilized            ( +-  1.95% )
                   4,020 context-switches          #    0.313 K/sec                    ( +-  2.89% )
                     369 cpu-migrations            #    0.029 K/sec                    ( +-  6.26% )
               1,464,822 page-faults               #    0.114 M/sec                    ( +-  1.37% )
          40,668,449,813 cycles                    #    3.166 GHz                      ( +-  0.96% )
          18,863,982,388 stalled-cycles-frontend   #   46.38% frontend cycles idle     ( +-  1.82% )
         <not supported> stalled-cycles-backend
          71,560,499,058 instructions              #    1.76  insns per cycle
                                                   #    0.26  stalled cycles per insn  ( +-  0.14% )
          10,044,152,441 branches                  #  781.925 M/sec                    ( +-  0.19% )
              87,835,773 branch-misses             #    0.87% of all branches          ( +-  0.09% )
      
            11.711773314 seconds time elapsed                                          ( +-  1.51% )
      
      llvm-svn: 287140
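The partitioning idea from this message (bound the number of tasks so that per-task overhead stays small) can be sketched as a chunk computation. This is one possible way to cap the chunk count, not a reproduction of the patch: `chunkRanges` is a hypothetical helper, and rounding the chunk size up is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits [0, N) into at most MaxTasks half-open chunks [first, second).
// Ceiling division is an assumption of this sketch; the commit's exact
// arithmetic is not reproduced here.
std::vector<std::pair<std::size_t, std::size_t>>
chunkRanges(std::size_t N, std::size_t MaxTasks = 1024) {
  std::vector<std::pair<std::size_t, std::size_t>> Chunks;
  if (N == 0 || MaxTasks == 0)
    return Chunks;
  const std::size_t ChunkSize = (N + MaxTasks - 1) / MaxTasks; // Always >= 1.
  for (std::size_t B = 0; B < N; B += ChunkSize)
    Chunks.emplace_back(B, std::min(B + ChunkSize, N)); // Clamp the last chunk.
  return Chunks;
}
```

Each chunk would become one task. Clamping the final chunk to `N` keeps the callback from ever being invoked outside the requested range.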
  21. 15 Nov, 2016 1 commit
    • Use one task per iteration in parallel_for_loop. · e5cd5ecd
      Rafael Espindola authored
      This seems far more natural. A user can create larger chunks if the
      overhead is too large.
      
      With this, linking xul with "--threads --build-id=sha1" goes from
      13.938177535 to 11.035953538 seconds on Linux.
      
      llvm-svn: 287042
  22. 20 Oct, 2016 1 commit
  23. 28 Feb, 2016 2 commits
    • Remove dead code for ELF. · 8cca07ea
      Rui Ueyama authored
      The preload feature was so buggy that we had disabled it even for ELF.
      
      llvm-svn: 262194
    • Remove lld/Core/range.h. · 18c6d313
      Rui Ueyama authored
      IIUC, range was an experiment to see how N3350 would work in LLD.
      It turned out it didn't get traction, and it is basically a duplicate
      of iterator_range in ADT. We have only two occurrences of range,
      and both of them are easily rewritten without it.
      
      http://reviews.llvm.org/D17687
      
      llvm-svn: 262171
  24. 10 Sep, 2015 1 commit
  25. 31 Aug, 2015 1 commit
    • Attempt to unbreak buildbots. · a41a3670
      Rui Ueyama authored
      It is currently failing with "'__uncaught_exception': identifier not found"
      error. I guess it is due to r246219 because after that change, eh.h is
      included only when threading is enabled.
      
      llvm-svn: 246416
  26. 27 Aug, 2015 1 commit
  27. 13 Jul, 2015 1 commit
    • Fix lld tests with LLVM_ENABLE_THREADS disabled. · 0e80816f
      Nico Weber authored
      With LLVM_ENABLE_THREADS disabled, all the llvm code assumes that it runs on
      a single thread and doesn't use any mutexes.  lld still spawned lots of threads
      in that case and called into llvm, assuming that llvm is thread-safe.
      
      As a fix, let lld use only a single thread if LLVM_ENABLE_THREADS is disabled.
      I left in all the mutexes in lld. That means lld is a bit slower than
      necessary in single-thread mode, but that's probably worth the simpler code.
      
      llvm-svn: 242004
  28. 16 Mar, 2015 1 commit
  29. 13 Mar, 2015 1 commit
  30. 03 Mar, 2015 1 commit
  31. 30 Jan, 2015 1 commit
  32. 26 Jan, 2015 1 commit
    • Use parallel_sort in the LayoutPass. · 87d10ef3
      Rui Ueyama authored
      Time to link lld using lld improved from 5.7s to 5.4s on Windows.
      It's not a significant improvement but not bad for a one-line change.
      
      This patch includes a bug fix for Parallel.h as the original code
      uses operator< instead of a compare function there.
      
      llvm-svn: 227132
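The class of bug mentioned in the second paragraph (a sort helper that compares with `operator<` although the caller supplied a comparison function) can be illustrated with a pivot-selection helper of the kind a quicksort uses. The message does not say where the original bug was, so `medianOf3` and `medianByAbs` below are hypothetical and only show why every comparison has to go through the comparator.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Returns an iterator to the median of *A, *B, *C under the ordering Less.
// Every comparison goes through Less; writing *A < *B anywhere here would
// silently ignore the caller's ordering.
template <class IterTy, class CompTy>
IterTy medianOf3(IterTy A, IterTy B, IterTy C, const CompTy &Less) {
  if (Less(*A, *B)) {
    if (Less(*B, *C))
      return B;                  // A < B < C
    return Less(*A, *C) ? C : A; // C <= B, so the median is C or A.
  }
  if (Less(*A, *C))
    return A;                    // B <= A < C
  return Less(*B, *C) ? C : B;   // C <= A, so the median is C or B.
}

// Median of the first three elements of V, ordered by absolute value.
int medianByAbs(const std::vector<int> &V) {
  assert(V.size() >= 3);
  auto ByAbs = [](int L, int R) { return std::abs(L) < std::abs(R); };
  return *medianOf3(V.begin(), V.begin() + 1, V.begin() + 2, ByAbs);
}
```

For `{-5, 1, 3}` the median by absolute value is 3, whereas comparing with `operator<` would pick 1, so a helper that ignores the comparator partitions around the wrong pivot.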
  33. 18 Oct, 2014 1 commit
  34. 06 Mar, 2014 1 commit
  35. 10 Dec, 2013 1 commit