Commit 5c97f01a authored by Robert Hensing, committed by GitHub

Merge pull request #255025 from tweag/fileset.union

`lib.fileset.union`, `lib.fileset.unions`: init
parents f35534ca 94e103ee
+1 −1
@@ -9,7 +9,7 @@ File sets are easy and safe to use, providing obvious and composable semantics w
These sections apply to the entire library.
See the [function reference](#sec-functions-library-fileset) for function-specific documentation.

The file set library is currently very limited but is being expanded to include more functions over time.
The file set library is currently somewhat limited but is being expanded to include more functions over time.

## Implicit coercion from paths to file sets {#sec-fileset-path-coercion}

+13 −11
@@ -41,13 +41,21 @@ An attribute set with these values:
- `_type` (constant string `"fileset"`):
  Tag to indicate this value is a file set.

- `_internalVersion` (constant string equal to the current version):
  Version of the representation
- `_internalVersion` (constant `2`, the current version):
  Version of the representation.

- `_internalBase` (path):
  Any files outside of this path cannot influence the set of files.
  This is always a directory.

- `_internalBaseRoot` (path):
  The filesystem root of `_internalBase`, same as `(lib.path.splitRoot _internalBase).root`.
  This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.

- `_internalBaseComponents` (list of strings):
  The path components of `_internalBase`, same as `lib.path.subpath.components (lib.path.splitRoot _internalBase).subpath`.
  This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.

- `_internalTree` ([filesetTree](#filesettree)):
  A tree representation of all included files under `_internalBase`.

@@ -59,8 +67,8 @@ An attribute set with these values:
One of the following:

- `{ <name> = filesetTree; }`:
  A directory with a nested `filesetTree` value for every directory entry.
  Even entries that aren't included are present as `null` because it improves laziness and allows using this as a sort of `builtins.readDir` cache.
  A directory with a nested `filesetTree` value for directory entries.
  Entries not included may either be omitted or set to `null`, as necessary to improve efficiency or laziness.

- `"directory"`:
  A directory with all its files included recursively, allowing early cutoff for some operations.
@@ -169,15 +177,9 @@ Arguments:
## To update in the future

Here's a list of places in the library that need to be updated in the future:
- > The file set library is currently very limited but is being expanded to include more functions over time.
- > The file set library is currently somewhat limited but is being expanded to include more functions over time.

  in [the manual](../../doc/functions/fileset.section.md)
- > Currently the only way to construct file sets is using implicit coercion from paths.

  in [the `toSource` reference](./default.nix)
- > For now filesets are always paths

  in [the `toSource` implementation](./default.nix), also update the variable name there
- Once a tracing function exists, `__noEval` in [internal.nix](./internal.nix) should mention it
- If/Once a function to convert `lib.sources` values into file sets exists, the `_coerce` and `toSource` functions should be updated to mention that function in the error when such a value is passed
- If/Once a function exists that can optionally include a path depending on whether it exists, the error message for the path not existing in `_coerce` should mention the new function
+93 −47
#!/usr/bin/env bash
#!/usr/bin/env nix-shell
#!nix-shell -i bash -p sta jq bc nix -I nixpkgs=../..
# shellcheck disable=SC2016

# Benchmarks lib.fileset
# Run:
@@ -28,38 +30,6 @@ work="$tmp/work"
mkdir "$work"
cd "$work"

# Create a fairly populated tree
touch f{0..5}
mkdir d{0..5}
mkdir e{0..5}
touch d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}

bench() {
    NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
        nix-instantiate --eval --strict --show-trace >/dev/null \
        --expr '(import <nixpkgs/lib>).fileset.toSource { root = ./.; fileset = ./.; }'
    cat "$tmp/stats.json"
}

echo "Running benchmark on index" >&2
bench "$nixpkgs" > "$tmp/new.json"
(
    echo "Checking out $compareTo" >&2
    git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
    trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
    echo "Running benchmark on $compareTo" >&2
    bench "$tmp/worktree" > "$tmp/old.json"
)

declare -a stats=(
    ".envs.elements"
    ".envs.number"
@@ -77,6 +47,57 @@ declare -a stats=(
    ".values.number"
)

runs=10

run() {
    # Empty the file
    : > cpuTimes

    for i in $(seq 0 "$runs"); do
        NIX_PATH=nixpkgs=$1 NIX_SHOW_STATS=1 NIX_SHOW_STATS_PATH=$tmp/stats.json \
            nix-instantiate --eval --strict --show-trace >/dev/null \
            --expr 'with import <nixpkgs/lib>; with fileset; '"$2"

        # Only record the time after the first run, which serves as warmup
        if (( i > 0 )); then
            jq '.cpuTime' "$tmp/stats.json" >> cpuTimes
        fi
    done

    # Compute mean and standard deviation
    read -r mean sd < <(sta --mean --sd --brief <cpuTimes)

    jq --argjson mean "$mean" --argjson sd "$sd" \
        '.cpuTimeMean = $mean | .cpuTimeSd = $sd' \
        "$tmp/stats.json"
}

bench() {
    echo "Benchmarking expression $1" >&2
    #echo "Running benchmark on index" >&2
    run "$nixpkgs" "$1" > "$tmp/new.json"
    (
        #echo "Checking out $compareTo" >&2
        git -C "$nixpkgs" worktree add --quiet "$tmp/worktree" "$compareTo"
        trap 'git -C "$nixpkgs" worktree remove "$tmp/worktree"' EXIT
        #echo "Running benchmark on $compareTo" >&2
        run "$tmp/worktree" "$1" > "$tmp/old.json"
    )

    read -r oldMean oldSd newMean newSd percentageMean percentageSd < \
        <(jq -rn --slurpfile old "$tmp/old.json" --slurpfile new "$tmp/new.json" \
        ' $old[0].cpuTimeMean as $om
        | $old[0].cpuTimeSd as $os
        | $new[0].cpuTimeMean as $nm
        | $new[0].cpuTimeSd as $ns
        | (100 / $om * $nm) as $pm
        # Copied from https://github.com/sharkdp/hyperfine/blob/b38d550b89b1dab85139eada01c91a60798db9cc/src/benchmark/relative_speed.rs#L46-L53
        | ($pm * pow(pow($ns / $nm; 2) + pow($os / $om; 2); 0.5)) as $ps
        | [ $om, $os, $nm, $ns, $pm, $ps ]
        | @sh')

    echo -e "Mean CPU time $newMean (σ = $newSd) for $runs runs is \e[0;33m$percentageMean% (σ = $percentageSd%)\e[0m of the old value $oldMean (σ = $oldSd)" >&2

    different=0
    for stat in "${stats[@]}"; do
        oldValue=$(jq "$stat" "$tmp/old.json")
@@ -92,3 +113,28 @@ for stat in "${stats[@]}"; do
        fi
    done
    echo "$different stats differ between the current tree and $compareTo"
    echo ""
}

# Create a fairly populated tree
touch f{0..5}
mkdir d{0..5}
mkdir e{0..5}
touch d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/f{0..5}
mkdir -p d{0..5}/d{0..5}/d{0..5}/d{0..5}
mkdir -p e{0..5}/e{0..5}/e{0..5}/e{0..5}
touch d{0..5}/d{0..5}/d{0..5}/d{0..5}/f{0..5}

bench 'toSource { root = ./.; fileset = ./.; }'

rm -rf -- *

touch {0..1000}
bench 'toSource { root = ./.; fileset = unions (mapAttrsToList (name: value: ./. + "/${name}") (builtins.readDir ./.)); }'
rm -rf -- *
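
The percentage that `bench` reports, and its uncertainty, follow the usual rule that the relative errors of a quotient add in quadrature. The following is a minimal sketch of that arithmetic outside `jq`, assuming a POSIX `awk` is available. The helper name `relative_percentage` and the sample numbers are hypothetical and not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of benchmark.sh): print the new mean CPU time
# as a percentage of the old mean, followed by the propagated standard deviation.
relative_percentage() {
    # $1 = old mean, $2 = old sd, $3 = new mean, $4 = new sd
    awk -v om="$1" -v os="$2" -v nm="$3" -v ns="$4" 'BEGIN {
        # Same expression as in the jq filter: 100 / old mean * new mean
        pm = 100 / om * nm
        # Relative errors of both means combined in quadrature
        ps = pm * sqrt((ns / nm) ^ 2 + (os / om) ^ 2)
        printf "%.2f %.2f\n", pm, ps
    }'
}

# Made-up numbers for illustration only
relative_percentage 2.0 0.1 1.5 0.1
# prints: 75.00 6.25
```

A value below 100 means the new tree needed less CPU time than the revision it is compared against. The second number indicates how much of that difference could be run-to-run noise.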
+168 −18
@@ -3,15 +3,22 @@ let

  inherit (import ./internal.nix { inherit lib; })
    _coerce
    _coerceMany
    _toSourceFilter
    _unionMany
    ;

  inherit (builtins)
    isList
    isPath
    pathExists
    typeOf
    ;

  inherit (lib.lists)
    imap0
    ;

  inherit (lib.path)
    hasPrefix
    splitRoot
@@ -29,6 +36,10 @@ let
    cleanSourceWith
    ;

  inherit (lib.trivial)
    pipe
    ;

in {

  /*
@@ -51,16 +62,51 @@ in {
      } -> SourceLike

    Example:
      # Import the current directory into the store but only include files under ./src
      toSource { root = ./.; fileset = ./src; }
      # Import the current directory into the store
      # but only include files under ./src
      toSource {
        root = ./.;
        fileset = ./src;
      }
      => "/nix/store/...-source"

      # The file set coerced from path ./bar could contain files outside the root ./foo, which is not allowed
      toSource { root = ./foo; fileset = ./bar; }
      # Import the current directory into the store
      # but only include ./Makefile and all files under ./src
      toSource {
        root = ./.;
        fileset = union
          ./Makefile
          ./src;
      }
      => "/nix/store/...-source"

      # Trying to include a file outside the root will fail
      toSource {
        root = ./.;
        fileset = unions [
          ./Makefile
          ./src
          ../LICENSE
        ];
      }
      => <error>

      # The root needs to point to a directory that contains all the files
      toSource {
        root = ../.;
        fileset = unions [
          ./Makefile
          ./src
          ../LICENSE
        ];
      }
      => "/nix/store/...-source"

      # The root has to be a local filesystem path
      toSource { root = "/nix/store/...-source"; fileset = ./.; }
      toSource {
        root = "/nix/store/...-source";
        fileset = ./.;
      }
      => <error>
  */
  toSource = {
@@ -69,7 +115,7 @@ in {
      Paths in [strings](https://nixos.org/manual/nix/stable/language/values.html#type-string), including Nix store paths, cannot be passed as `root`.
      `root` has to be a directory.

<!-- Ignore the indentation here, this is a nixdoc rendering bug that needs to be fixed -->
<!-- Ignore the indentation here, this is a nixdoc rendering bug that needs to be fixed: https://github.com/nix-community/nixdoc/issues/75 -->
:::{.note}
Changing `root` only affects the directory structure of the resulting store path, it does not change which files are added to the store.
The only way to change which files get added to the store is by changing the `fileset` attribute.
@@ -78,25 +124,32 @@ The only way to change which files get added to the store is by changing the `fi
    root,
    /*
      (required) The file set whose files to import into the store.
      Currently the only way to construct file sets is using [implicit coercion from paths](#sec-fileset-path-coercion).
      File sets can be created using other functions in this library.
      This argument can also be a path,
      which gets [implicitly coerced to a file set](#sec-fileset-path-coercion).

<!-- Ignore the indentation here, this is a nixdoc rendering bug that needs to be fixed: https://github.com/nix-community/nixdoc/issues/75 -->
:::{.note}
If a directory does not recursively contain any file, it is omitted from the store path contents.
:::

    */
    fileset,
  }:
    let
      # We cannot rename matched attribute arguments, so let's work around it with an extra `let in` statement
      # For now filesets are always paths
      filesetPath = fileset;
      filesetArg = fileset;
    in
    let
      fileset = _coerce "lib.fileset.toSource: `fileset`" filesetPath;
      fileset = _coerce "lib.fileset.toSource: `fileset`" filesetArg;
      rootFilesystemRoot = (splitRoot root).root;
      filesetFilesystemRoot = (splitRoot fileset._internalBase).root;
      sourceFilter = _toSourceFilter fileset;
    in
    if ! isPath root then
      if isStringLike root then
        throw ''
          lib.fileset.toSource: `root` "${toString root}" is a string-like value, but it should be a path instead.
          lib.fileset.toSource: `root` ("${toString root}") is a string-like value, but it should be a path instead.
              Paths in strings are not supported by `lib.fileset`, use `lib.sources` or derivations instead.''
      else
        throw ''
@@ -105,27 +158,124 @@ The only way to change which files get added to the store is by changing the `fi
    # See also ../path/README.md
    else if rootFilesystemRoot != filesetFilesystemRoot then
      throw ''
        lib.fileset.toSource: Filesystem roots are not the same for `fileset` and `root` "${toString root}":
        lib.fileset.toSource: Filesystem roots are not the same for `fileset` and `root` ("${toString root}"):
            `root`: root "${toString rootFilesystemRoot}"
            `fileset`: root "${toString filesetFilesystemRoot}"
            Different roots are not supported.''
    else if ! pathExists root then
      throw ''
        lib.fileset.toSource: `root` ${toString root} does not exist.''
        lib.fileset.toSource: `root` (${toString root}) does not exist.''
    else if pathType root != "directory" then
      throw ''
        lib.fileset.toSource: `root` ${toString root} is a file, but it should be a directory instead. Potential solutions:
        lib.fileset.toSource: `root` (${toString root}) is a file, but it should be a directory instead. Potential solutions:
            - If you want to import the file into the store _without_ a containing directory, use string interpolation or `builtins.path` instead of this function.
            - If you want to import the file into the store _with_ a containing directory, set `root` to the containing directory, such as ${toString (dirOf root)}, and set `fileset` to the file path.''
    else if ! hasPrefix root fileset._internalBase then
      throw ''
        lib.fileset.toSource: `fileset` could contain files in ${toString fileset._internalBase}, which is not under the `root` ${toString root}. Potential solutions:
        lib.fileset.toSource: `fileset` could contain files in ${toString fileset._internalBase}, which is not under the `root` (${toString root}). Potential solutions:
            - Set `root` to ${toString fileset._internalBase} or any directory higher up. This changes the layout of the resulting store path.
            - Set `fileset` to a file set that cannot contain files outside the `root` ${toString root}. This could change the files included in the result.''
            - Set `fileset` to a file set that cannot contain files outside the `root` (${toString root}). This could change the files included in the result.''
    else
      builtins.seq sourceFilter
      cleanSourceWith {
        name = "source";
        src = root;
        filter = _toSourceFilter fileset;
        filter = sourceFilter;
      };

  /*
    The file set containing all files that are in either of two given file sets.
    This is the same as [`unions`](#function-library-lib.fileset.unions),
    but takes just two file sets instead of a list.
    See also [Union (set theory)](https://en.wikipedia.org/wiki/Union_(set_theory)).

    The given file sets are evaluated as lazily as possible,
    with the first argument being evaluated first if needed.

    Type:
      union :: FileSet -> FileSet -> FileSet

    Example:
      # Create a file set containing the file `Makefile`
      # and all files recursively in the `src` directory
      union ./Makefile ./src

      # Create a file set containing the file `Makefile`
      # and the LICENSE file from the parent directory
      union ./Makefile ../LICENSE
  */
  union =
    # The first file set.
    # This argument can also be a path,
    # which gets [implicitly coerced to a file set](#sec-fileset-path-coercion).
    fileset1:
    # The second file set.
    # This argument can also be a path,
    # which gets [implicitly coerced to a file set](#sec-fileset-path-coercion).
    fileset2:
    _unionMany
      (_coerceMany "lib.fileset.union" [
        {
          context = "first argument";
          value = fileset1;
        }
        {
          context = "second argument";
          value = fileset2;
        }
      ]);

  /*
    The file set containing all files that are in any of the given file sets.
    This is the same as [`union`](#function-library-lib.fileset.union),
    but takes a list of file sets instead of just two.
    See also [Union (set theory)](https://en.wikipedia.org/wiki/Union_(set_theory)).

    The given file sets are evaluated as lazily as possible,
    with earlier elements being evaluated first if needed.

    Type:
      unions :: [ FileSet ] -> FileSet

    Example:
      # Create a file set containing selected files
      unions [
        # Include the single file `Makefile` in the current directory
        # This errors if the file doesn't exist
        ./Makefile

        # Recursively include all files in the `src/code` directory
        # If this directory is empty this has no effect
        ./src/code

        # Include the files `run.sh` and `unit.c` from the `tests` directory
        ./tests/run.sh
        ./tests/unit.c

        # Include the `LICENSE` file from the parent directory
        ../LICENSE
      ]
  */
  unions =
    # A list of file sets.
    # Must contain at least 1 element.
    # The elements can also be paths,
    # which get [implicitly coerced to file sets](#sec-fileset-path-coercion).
    filesets:
    if ! isList filesets then
      throw "lib.fileset.unions: Expected argument to be a list, but got a ${typeOf filesets}."
    else if filesets == [ ] then
      # TODO: This could be supported, but requires an extra internal representation for the empty file set, which would be special for not having a base path.
      throw "lib.fileset.unions: Expected argument to be a list with at least one element, but it contains no elements."
    else
      pipe filesets [
        # Annotate the elements with context, used by _coerceMany for better errors
        (imap0 (i: el: {
          context = "element ${toString i}";
          value = el;
        }))
        (_coerceMany "lib.fileset.unions")
        _unionMany
      ];

}
+166 −58

File changed.

Preview size limit exceeded, changes collapsed.
