Commit d767cd1c authored by benaryorg

ceph: 19.2.3 -> 20.2.0

Moving ceph to its own scope allows separate packages for individual components.
Among other benefits, this makes overriding a single package for the entire build easier.
The primary reason, however, is to allow calling code to access *overrideScope* to change internal packages such as *ceph-src*.
Since the source code is used independently by multiple packages, notably the C++ and the Python code, any patch that touches both (such as the PyO3 patch introduced herein) needs to apply to both.
Using just packages with *patches* set makes this harder to control.

Material changes:

- Python: 3.11 -> 3.12
- Boost: 1.83 -> 1.87
- Ceph: 19.2.3 -> 20.2.0
- fmt: 9 -> 12
- changes in patches (new and old) due to the above updates
- Python dependencies on *jmespath* and *xmltodict*
- `--replace-fail` for *substituteInPlace*
- UADK disabled by default (otherwise aarch64 fails to compile)
- *ceph-mgr* wrapper now adds ceph binaries to *PATH*
- ZFS integration on the packaging level has been removed

In-depth information for most of this can be found in [#494583](https://redirect.github.com/NixOS/nixpkgs/pull/494583).

Signed-off-by: benaryorg <binary@benary.org>
parent 7682b4f8
+75 −0
# Ceph

Ceph is a distributed storage system providing object-based (RADOS and S3), block-based (RBD and iSCSI), and filesystem-based (CephFS, NFS, Samba) storage.

Because Ceph is used for massive storage clusters in the wild, any changes can lead to actual data loss.
Besides requiring that changes, in particular breaking changes, be handled carefully, it also means that Ceph development moves slowly.
It should be stressed here that slow release pacing is not a bad thing!

If you are looking at this page because a change elsewhere, say in Python packaging, broke the build of *qemu_full*, and you are not actually using Ceph for anything[^krbd], it may be easier to mask out Ceph from your build[^mask].
A common way for Ceph to be pulled in by users who do not actually use it is the `enableCephFS` flag of *samba4Full*.

[^krbd]: Or are only using implicit *krbd* which lives in the kernel and thus doesn't require this package.
[^mask]: You can achieve this in several ways, including an overlay setting *ceph* to *null*, or something like `qemu_full.override { cephSupport = false; }`.
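
As a minimal sketch of the masking approach from the footnote above, an overlay along these lines turns off Ceph support in the two packages named in this section. The argument names `cephSupport` and `enableCephFS` are the ones mentioned above; the overlay as a whole is illustrative and assumes you do not rely on the Ceph features of either package.

```nix
# Illustrative overlay: build qemu_full and samba4Full without Ceph.
final: prev: {
  qemu_full = prev.qemu_full.override { cephSupport = false; };
  samba4Full = prev.samba4Full.override { enableCephFS = false; };
}
```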

## Patches

Historically, Ceph often needed patches to be compatible with newer versions of *gcc*, Python, and other related software.
On occasion, patches also need to be backported from the development branch of the current release, because they have been merged upstream but no release containing them has been made yet.
Because Ceph uses embedded Python interpreters in some daemons, these patches need to be applied to multiple builds.
To keep drift from happening and to make this more visible, [the source code itself is being patched](./src.nix) before passing it on as *src* for other derivations.
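
The idea of patching the source once and reusing it can be sketched as follows. This is a hypothetical illustration and not the content of `./src.nix`: `upstreamSrc`, `./example.patch`, and the two consumer derivations are placeholder names, and `applyPatches` is only one possible way to produce a patched source tree.

```nix
# Hypothetical sketch of the idea only; this is not the content of ./src.nix.
# `upstreamSrc` and ./example.patch are placeholder names.
{
  applyPatches,
  stdenv,
  upstreamSrc,
}:
let
  # Patch the source tree exactly once.
  ceph-src = applyPatches {
    name = "ceph-src-patched";
    src = upstreamSrc;
    patches = [ ./example.patch ];
  };
in
{
  inherit ceph-src;

  # Every consumer builds from the same patched tree, so the C++ and
  # Python builds cannot drift apart.
  cppPart = stdenv.mkDerivation {
    pname = "example-cpp-part";
    version = "0";
    src = ceph-src;
  };
  pythonPart = stdenv.mkDerivation {
    pname = "example-python-part";
    version = "0";
    src = ceph-src;
  };
}
```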

## Scope

This package often requires patches or vendored packages so that an older version of a library can be kept around specifically for Ceph.
For this reason Ceph is being maintained [within its own scope](./scope.nix), which allows us to inject such vendored packages more easily.
An example at time of writing would be *RocksDB*, the key-value store that the Ceph storage daemons (OSDs) rely on when storing data directly on block devices.
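
How a scope makes a vendored library visible only to Ceph can be sketched with `lib.makeScope`. This is a hypothetical outline and the real `./scope.nix` may be organised differently; `./example-package.nix` and the empty override body are placeholders.

```nix
# Hypothetical sketch; the real ./scope.nix may be organised differently.
{
  lib,
  newScope,
  rocksdb,
}:
lib.makeScope newScope (self: {
  # Scope-local rocksdb: packages called through `self.callPackage`
  # receive this one instead of the top-level rocksdb.
  rocksdb = rocksdb.overrideAttrs (old: {
    # placeholder for a version pin or extra patches
  });

  # ./example-package.nix is a placeholder path.
  example = self.callPackage ./example-package.nix { };
})
```

A scope built this way carries an `overrideScope` function, which is the mechanism that lets callers replace individual members later.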

If for whatever reason you want or need to patch Ceph, you can use the *passthru* attribute `overrideScope` to get access to the full Ceph scope.
For instance, an overlay can be used to override dependencies as well as patches.

<details><summary>Example overlay for overriding Ceph</summary>

```nix
globalFinal: globalPrev: {
  ceph = builtins.getAttr "ceph" (
    globalPrev.ceph.overrideScope (
      final: prev: {
        # `prev.rocksdb` here is the vendored version of rocksdb
        # this means you can apply patches to the otherwise hidden packages

        ceph-src = prev.ceph-src.overrideAttrs (
          {
            patches ? [ ],
            ...
          }:
          {
            patches = patches ++ [
              # your patch goes here
              (globalFinal.fetchurl {
                # ...
              })
            ];
          }
        );
      }
    )
  );
}
```

</details>
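
The same mechanism can be used for a dependency rather than the source. The following is a minimal sketch, assuming the scope exposes the vendored library as `rocksdb` (as the comment in the example above suggests); `./rocksdb-local-fix.patch` is a hypothetical file name.

```nix
globalFinal: globalPrev: {
  ceph =
    (globalPrev.ceph.overrideScope (
      final: prev: {
        # Patch the vendored rocksdb that only Ceph sees; the top-level
        # rocksdb package is left untouched.
        rocksdb = prev.rocksdb.overrideAttrs (old: {
          patches = (old.patches or [ ]) ++ [
            ./rocksdb-local-fix.patch # hypothetical patch file
          ];
        });
      }
    )).ceph;
}
```

Selecting `.ceph` from the result is equivalent to the `builtins.getAttr "ceph"` call used above.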

## Python

Ceph is tightly intertwined with Python because some of its daemons embed Python subinterpreters.
The [*nixpkgs* Ceph PyO3 tracking issue](https://github.com/NixOS/nixpkgs/issues/380823) has some background on this, and several issues surfaced during the [*nixpkgs* Ceph 20.2.0 merge](https://github.com/NixOS/nixpkgs/pull/494583).
When using Ceph packaged via *nixpkgs* you should not run into PyO3 issues; if you do, please file a bug report.

There are two MGR modules which attempt to load libraries incompatible with PyO3: *cephadm* and *diskprediction_local*.
*cephadm* conceptually does not work on NixOS since NixOS intentionally makes *systemd* configuration read-only.
However, the *cephadm* MGR module is enabled by default, and it has not been patched to be less prone to PyO3 issues.
Similarly, *diskprediction_local* will attempt to load Numpy/SciPy, leading to PyO3 errors.
These errors are not fatal; however, they do render the modules unusable.
You can disable the modules using `ceph mgr module disable` to silence the errors.
+0 −353
# This is arrow-cpp < 20 used as a workaround for
# Ceph not supporting >= 20 yet, taken from nixpkgs commit
#     97ae53798f6a7c7c3c259ad8c2cbcede6ca34b2a~
# This should be entirely removed when upstream bug
#     https://tracker.ceph.com/issues/71269
# is fixed.
{
  stdenv,
  lib,
  fetchurl,
  fetchpatch2,
  fetchFromGitHub,
  fixDarwinDylibNames,
  autoconf,
  aws-sdk-cpp,
  aws-sdk-cpp-arrow ? aws-sdk-cpp.override {
    apis = [
      "cognito-identity"
      "config"
      "identity-management"
      "s3"
      "sts"
      "transfer"
    ];
  },
  boost,
  brotli,
  bzip2,
  cmake,
  crc32c,
  curl,
  flatbuffers,
  gflags,
  glog,
  google-cloud-cpp,
  grpc,
  gtest,
  libbacktrace,
  lz4,
  minio,
  ninja,
  nlohmann_json,
  openssl,
  perl,
  pkg-config,
  protobuf_32,
  python3,
  rapidjson,
  re2,
  snappy,
  sqlite,
  thrift,
  tzdata,
  utf8proc,
  which,
  zlib,
  zstd,
  testers,
  enableShared ? !stdenv.hostPlatform.isStatic,
  enableFlight ? stdenv.buildPlatform == stdenv.hostPlatform,
  # Disable also on RiscV
  # configure: error: cannot determine number of significant virtual address bits
  enableJemalloc ?
    !stdenv.hostPlatform.isDarwin && !stdenv.hostPlatform.isAarch64 && !stdenv.hostPlatform.isRiscV64,
  enableS3 ? true,
  # google-cloud-cpp fails to build on RiscV
  enableGcs ? !stdenv.hostPlatform.isDarwin && !stdenv.hostPlatform.isRiscV64,
}:

let
  arrow-testing = fetchFromGitHub {
    name = "arrow-testing";
    owner = "apache";
    repo = "arrow-testing";
    rev = "4d209492d514c2d3cb2d392681b9aa00e6d8da1c";
    hash = "sha256-IkiCbuy0bWyClPZ4ZEdkEP7jFYLhM7RCuNLd6Lazd4o=";
  };

  parquet-testing = fetchFromGitHub {
    name = "parquet-testing";
    owner = "apache";
    repo = "parquet-testing";
    rev = "c7cf1374cf284c0c73024cd1437becea75558bf8";
    hash = "sha256-DThjyZ34LajHwXZy1IhYKUGUG/ejQ9WvBNuI8eUKmSs=";
  };

  version = "19.0.1";
in
stdenv.mkDerivation (finalAttrs: {
  pname = "arrow-cpp";
  inherit version;

  src = fetchFromGitHub {
    owner = "apache";
    repo = "arrow";
    rev = "apache-arrow-${version}";
    hash = "sha256-toHwUIOZRpgR0K7pQtT5nqWpO9G7AuHYTcvA6UVg9lA=";
  };

  sourceRoot = "${finalAttrs.src.name}/cpp";

  patches = [
    (fetchpatch2 {
      name = "protobuf-30-compat.patch";
      url = "https://github.com/apache/arrow/pull/46136.patch";
      hash = "sha256-WTpe/eT3himlCHN/R78w1sF0HG859mE2ZN70U+9N8Ag=";
      stripLen = 1;
    })
    (fetchpatch2 {
      name = "cmake-fix.patch";
      url = "https://github.com/apache/arrow/commit/48c0bbbd4a2eedcca518caeb7f7547c7988dc740.patch?full_index=1";
      hash = "sha256-i/vZy/61VYP+mo1AxfoiBSjTip04vhFOh3hGjHCJy6g=";
      stripLen = 1; # applying patch from within `cpp/` subdirectory
    })
  ];

  # versions are all taken from
  # https://github.com/apache/arrow/blob/apache-arrow-${version}/cpp/thirdparty/versions.txt

  env =
    lib.optionalAttrs enableJemalloc {
      # jemalloc: arrow uses a custom prefix to prevent default allocator symbol
      # collisions as well as custom build flags
      ARROW_JEMALLOC_URL = fetchurl {
        url = "https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2";
        hash = "sha256-LbgtHnEZ3z5xt2QCGbbf6EeJvAU3mDw7esT3GJrs/qo=";
      };
    }
    // {
      # mimalloc: arrow uses custom build flags for mimalloc
      ARROW_MIMALLOC_URL = fetchFromGitHub {
        owner = "microsoft";
        repo = "mimalloc";
        rev = "v2.0.6";
        hash = "sha256-u2ITXABBN/dwU+mCIbL3tN1f4c17aBuSdNTV+Adtohc=";
      };

      ARROW_XSIMD_URL = fetchFromGitHub {
        owner = "xtensor-stack";
        repo = "xsimd";
        rev = "13.0.0";
        hash = "sha256-qElJYW5QDj3s59L3NgZj5zkhnUMzIP2mBa1sPks3/CE=";
      };

      ARROW_SUBSTRAIT_URL = fetchFromGitHub {
        owner = "substrait-io";
        repo = "substrait";
        rev = "v0.44.0";
        hash = "sha256-V739IFTGPtbGPlxcOi8sAaYSDhNUEpITvN9IqdPReug=";
      };
    }
    // lib.optionalAttrs finalAttrs.doInstallCheck {
      ARROW_TEST_DATA = "${arrow-testing}/data";
      PARQUET_TEST_DATA = "${parquet-testing}/data";
      GTEST_FILTER =
        let
          # Upstream Issue: https://issues.apache.org/jira/browse/ARROW-11398
          filteredTests =
            lib.optionals stdenv.hostPlatform.isAarch64 [
              "TestFilterKernelWithNumeric/3.CompareArrayAndFilterRandomNumeric"
              "TestFilterKernelWithNumeric/7.CompareArrayAndFilterRandomNumeric"
              "TestCompareKernel.PrimitiveRandomTests"
            ]
            ++ lib.optionals enableS3 [
              "S3OptionsTest.FromUri"
              "S3RegionResolutionTest.NonExistentBucket"
              "S3RegionResolutionTest.PublicBucket"
              "S3RegionResolutionTest.RestrictedBucket"
              "TestMinioServer.Connect"
              "TestS3FS.*"
              "TestS3FSGeneric.*"
            ]
            ++ lib.optionals stdenv.hostPlatform.isDarwin [
              # TODO: revisit at 12.0.0 or when
              # https://github.com/apache/arrow/commit/295c6644ca6b67c95a662410b2c7faea0920c989
              # is available, see
              # https://github.com/apache/arrow/pull/15288#discussion_r1071244661
              "ExecPlanExecution.StressSourceSinkStopped"
            ];
        in
        "-${lib.concatStringsSep ":" filteredTests}";
    };

  nativeBuildInputs = [
    cmake
    pkg-config
    ninja
    autoconf # for vendored jemalloc
    flatbuffers
  ]
  ++ lib.optional stdenv.hostPlatform.isDarwin fixDarwinDylibNames;
  buildInputs = [
    boost
    brotli
    bzip2
    flatbuffers
    gflags
    glog
    gtest
    libbacktrace
    lz4
    nlohmann_json # alternative JSON parser to rapidjson
    protobuf_32 # substrait requires protobuf
    rapidjson
    re2
    snappy
    thrift
    utf8proc
    zlib
    zstd
  ]
  ++ lib.optionals enableFlight [
    grpc
    openssl
    sqlite
  ]
  ++ lib.optionals enableS3 [
    aws-sdk-cpp-arrow
    openssl
  ]
  ++ lib.optionals enableGcs [
    crc32c
    curl
    google-cloud-cpp
    grpc
    nlohmann_json
  ];

  preConfigure = ''
    patchShebangs build-support/
    substituteInPlace "src/arrow/vendored/datetime/tz.cpp" \
      --replace-fail 'discover_tz_dir();' '"${tzdata}/share/zoneinfo";'
  '';

  cmakeFlags = [
    "-DCMAKE_FIND_PACKAGE_PREFER_CONFIG=ON"
    "-DARROW_BUILD_SHARED=${if enableShared then "ON" else "OFF"}"
    "-DARROW_BUILD_STATIC=${if enableShared then "OFF" else "ON"}"
    "-DARROW_BUILD_TESTS=${if enableShared then "ON" else "OFF"}"
    "-DARROW_BUILD_INTEGRATION=ON"
    "-DARROW_BUILD_UTILITIES=ON"
    "-DARROW_EXTRA_ERROR_CONTEXT=ON"
    "-DARROW_VERBOSE_THIRDPARTY_BUILD=ON"
    "-DARROW_DEPENDENCY_SOURCE=SYSTEM"
    "-Dxsimd_SOURCE=AUTO"
    "-DARROW_DEPENDENCY_USE_SHARED=${if enableShared then "ON" else "OFF"}"
    "-DARROW_COMPUTE=ON"
    "-DARROW_CSV=ON"
    "-DARROW_DATASET=ON"
    "-DARROW_FILESYSTEM=ON"
    "-DARROW_FLIGHT_SQL=${if enableFlight then "ON" else "OFF"}"
    "-DARROW_HDFS=ON"
    "-DARROW_IPC=ON"
    "-DARROW_JEMALLOC=${if enableJemalloc then "ON" else "OFF"}"
    "-DARROW_JSON=ON"
    "-DARROW_USE_GLOG=ON"
    "-DARROW_WITH_BACKTRACE=ON"
    "-DARROW_WITH_BROTLI=ON"
    "-DARROW_WITH_BZ2=ON"
    "-DARROW_WITH_LZ4=ON"
    "-DARROW_WITH_NLOHMANN_JSON=ON"
    "-DARROW_WITH_SNAPPY=ON"
    "-DARROW_WITH_UTF8PROC=ON"
    "-DARROW_WITH_ZLIB=ON"
    "-DARROW_WITH_ZSTD=ON"
    "-DARROW_MIMALLOC=ON"
    "-DARROW_SUBSTRAIT=ON"
    "-DARROW_FLIGHT=${if enableFlight then "ON" else "OFF"}"
    "-DARROW_FLIGHT_TESTING=${if enableFlight then "ON" else "OFF"}"
    "-DARROW_S3=${if enableS3 then "ON" else "OFF"}"
    "-DARROW_GCS=${if enableGcs then "ON" else "OFF"}"
    # Parquet options:
    "-DARROW_PARQUET=ON"
    "-DPARQUET_BUILD_EXECUTABLES=ON"
    "-DPARQUET_REQUIRE_ENCRYPTION=ON"
  ]
  ++ lib.optionals (!enableShared) [ "-DARROW_TEST_LINKAGE=static" ]
  ++ lib.optionals stdenv.hostPlatform.isDarwin [
    "-DCMAKE_INSTALL_RPATH=@loader_path/../lib" # needed for tools executables
  ]
  ++ lib.optionals (!stdenv.hostPlatform.isx86_64) [ "-DARROW_USE_SIMD=OFF" ]
  ++ lib.optionals enableS3 [
    "-DAWSSDK_CORE_HEADER_FILE=${aws-sdk-cpp-arrow}/include/aws/core/Aws.h"
  ];

  doInstallCheck = true;

  __darwinAllowLocalNetworking = true;

  nativeInstallCheckInputs = [
    perl
    which
    sqlite
  ]
  ++ lib.optionals enableS3 [ minio ]
  ++ lib.optionals enableFlight [ python3 ];

  installCheckPhase =
    let
      disabledTests = [
        # flaky
        "arrow-flight-test"
        # requires networking
        "arrow-gcsfs-test"
        "arrow-flight-integration-test"
      ];
    in
    ''
      runHook preInstallCheck

      ctest -L unittest --exclude-regex '^(${lib.concatStringsSep "|" disabledTests})$'

      runHook postInstallCheck
    '';

  __structuredAttrs = true;

  meta = {
    description = "Cross-language development platform for in-memory data";
    homepage = "https://arrow.apache.org/docs/cpp/";
    license = lib.licenses.asl20;
    platforms = lib.platforms.unix;
    maintainers = with lib.maintainers; [
      tobim
      veprbl
      cpcloud
    ];
    pkgConfigModules = [
      "arrow"
      "arrow-acero"
      "arrow-compute"
      "arrow-csv"
      "arrow-dataset"
      "arrow-filesystem"
      "arrow-flight"
      "arrow-flight-sql"
      "arrow-flight-testing"
      "arrow-json"
      "arrow-substrait"
      "arrow-testing"
      "parquet"
    ];
  };
  passthru = {
    inherit
      enableFlight
      enableJemalloc
      enableS3
      enableGcs
      ;
    tests.pkg-config = testers.testMetaPkgConfig finalAttrs.finalPackage;
  };
})
+0 −69
Backported from <https://github.com/ceph/ceph/commit/857eedbe6c9ed80ed0625bd0aa27b1a1e85f8d59>.

Original author: Adam Emerson <aemerson@redhat.com>

diff --git a/CMakeLists.txt b/CMakeLists.txt
index bbd63a6a006..bbd7c737feb 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -666,7 +666,7 @@ option(WITH_SYSTEM_BOOST "require and build with system Boost" OFF)
 # Boost::thread depends on Boost::atomic, so list it explicitly.
 set(BOOST_COMPONENTS
   atomic chrono thread system regex random program_options date_time
-  iostreams context coroutine)
+  iostreams context coroutine url)
 set(BOOST_HEADER_COMPONENTS container)
 
 if(WITH_MGR)
diff --git a/src/mds/BoostUrlImpl.cc b/src/mds/BoostUrlImpl.cc
deleted file mode 100644
index 479f4c6d75d..00000000000
--- a/src/mds/BoostUrlImpl.cc
+++ /dev/null
@@ -1,8 +0,0 @@
-/*
- * https://www.boost.org/doc/libs/1_82_0/libs/url/doc/html/url/overview.html#url.overview.requirements
- *
- * To use the library as header-only; that is, to eliminate the requirement 
- * to link a program to a static or dynamic Boost.URL library, 
- * simply place the following line in exactly one source file in your project.
- */
-#include <boost/url/src.hpp>
diff --git a/src/mds/CMakeLists.txt b/src/mds/CMakeLists.txt
index 0c6c31a3c51..5c98db76e4d 100644
--- a/src/mds/CMakeLists.txt
+++ b/src/mds/CMakeLists.txt
@@ -45,12 +45,12 @@ set(mds_srcs
   QuiesceDbManager.cc
   QuiesceAgent.cc
   MDSRankQuiesce.cc
-  BoostUrlImpl.cc
   ${CMAKE_SOURCE_DIR}/src/common/TrackedOp.cc
   ${CMAKE_SOURCE_DIR}/src/common/MemoryModel.cc
   ${CMAKE_SOURCE_DIR}/src/osdc/Journaler.cc
   ${CMAKE_SOURCE_DIR}/src/mgr/MDSPerfMetricTypes.cc)
 add_library(mds STATIC ${mds_srcs})
 target_link_libraries(mds PRIVATE
+  Boost::url
   heap_profiler cpu_profiler osdc ${LUA_LIBRARIES})
 target_include_directories(mds PRIVATE "${LUA_INCLUDE_DIR}")
diff --git a/src/test/mds/CMakeLists.txt b/src/test/mds/CMakeLists.txt
index f80abe75083..18ebb648e68 100644
--- a/src/test/mds/CMakeLists.txt
+++ b/src/test/mds/CMakeLists.txt
@@ -18,11 +18,10 @@ target_link_libraries(unittest_mds_sessionfilter mds osdc ceph-common global ${B
 add_executable(unittest_mds_quiesce_db
   TestQuiesceDb.cc
   ../../../src/mds/QuiesceDbManager.cc
-  ../../../src/mds/BoostUrlImpl.cc
   $<TARGET_OBJECTS:unit-main>
 )
 add_ceph_unittest(unittest_mds_quiesce_db)
-target_link_libraries(unittest_mds_quiesce_db ceph-common global)
+target_link_libraries(unittest_mds_quiesce_db ceph-common global Boost::url)
 
 # unittest_mds_quiesce_agent
 add_executable(unittest_mds_quiesce_agent
-- 
2.47.0
+0 −20
Excerpted from <https://aur.archlinux.org/cgit/aur.git/commit/?h=ceph&id=8c5cc7d8deec002f7596b6d0860859a0a718f12b>.

Original author: Bazaah <github@luxolus.com>

diff --git a/src/mgr/PyModule.cc b/src/mgr/PyModule.cc
index 084cf3ffc1e..010a1177a88 100644
--- a/src/mgr/PyModule.cc
+++ b/src/mgr/PyModule.cc
@@ -36,6 +36,11 @@ std::string PyModule::mgr_store_prefix = "mgr/";
 
 // Courtesy of http://stackoverflow.com/questions/1418015/how-to-get-python-exception-text
 #define BOOST_BIND_GLOBAL_PLACEHOLDERS
+// Fix instances of "'BOOST_PP_ITERATION_02' was not declared in this scope; did you mean 'BOOST_PP_ITERATION_05'"
+// and related macro error bullshit that spans 300 lines of errors
+//
+// Apparently you can't include boost/python stuff _and_ have this header defined
+#undef BOOST_MPL_CFG_NO_PREPROCESSED_HEADERS
 // Boost apparently can't be bothered to fix its own usage of its own
 // deprecated features.
 #include <boost/python/extract.hpp>
+399 −0

File added.
