This document lists locations in the ACORN codebase where `clone()` calls could be replaced with `Arc` for better performance and memory management, organized by impact level.
> **Note:** High Impact #1 (Bucket/BucketOptions in `acorn-lib/src/io/config.rs`) has already been implemented.
---
## HIGH IMPACT
These are the most impactful changes: large data structures being cloned in parallel contexts, or structs cloned many times across the codebase.
---
### 2. ResearchActivity in parallel schema validation
**File:** `acorn-lib/src/analyzer/mod.rs`
**Lines:** 129, 265
**What is cloned:**
- Line 129: `data.clone()` -- `Cff` struct passed to `collect_validation_checks`
- Line 265: `data.clone()` -- `ResearchActivity` struct passed to `collect_validation_checks`
Both occur inside `.par_iter().flat_map()` closures. `ResearchActivity` is a very large struct containing nested `ResearchActivityMetadata`, `Sections`, `ContactPoint`, `AspectFramework`, and `Other` -- all with many `String`, `Vec<String>`, and nested struct fields.
**Why Arc is better:** These are parallel iterations over file paths. Each path is read, deserialized into a potentially large `ResearchActivity` or `Cff`, then immediately cloned for validation. Using `Arc<ResearchActivity>` would allow the validation checks to share ownership without deep-copying the entire struct. The `collect_validation_checks` function only needs read access.
**Estimated impact:** **HIGH** -- `ResearchActivity` is the central data model; cloning it in parallel processing is one of the most expensive operations in the codebase.
**Suggested approach:**
```rust
// Before
.par_iter().flat_map(|path| {
    let data = ResearchActivity::read(path)?;
    collect_validation_checks(data.clone())
})
// After
.par_iter().flat_map(|path| {
    let data = Arc::new(ResearchActivity::read(path)?);
    collect_validation_checks(Arc::clone(&data))
})
```
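Because `collect_validation_checks` only needs read access, a plain borrow may be enough whenever the call finishes inside the closure. The following is a minimal sketch of that idea, not the ACORN implementation: `Record`, `collect_checks`, and `validate_all` are hypothetical names, and scoped threads from the standard library stand in for rayon's `par_iter`.

```rust
use std::thread;

/// Hypothetical stand-in for a large model such as `ResearchActivity`.
#[derive(Debug)]
struct Record {
    title: String,
    keywords: Vec<String>,
}

/// Hypothetical validation step: it only reads, so it borrows.
fn collect_checks(record: &Record) -> Vec<String> {
    let mut issues = Vec::new();
    if record.title.is_empty() {
        issues.push("missing title".to_string());
    }
    if record.keywords.is_empty() {
        issues.push("missing keywords".to_string());
    }
    issues
}

/// Validates every record on its own scoped thread without cloning any of them.
fn validate_all(records: &[Record]) -> Vec<String> {
    thread::scope(|scope| {
        let handles: Vec<_> = records
            .iter()
            .map(|record| scope.spawn(move || collect_checks(record)))
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("validation thread panicked"))
            .collect()
    })
}
```

If the validation work has to outlive the closure (for example, when it is handed to a spawned task), the borrow no longer suffices and `Arc` becomes the better fit.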
---
### 3. ResearchActivity triple-cloned in website checks
**File:** `acorn-lib/src/analyzer/mod.rs`
**Lines:** 284, 292, 300
**What is cloned:**
- Line 284: `data.clone().meta.doi`
- Line 292: `data.clone().meta.websites`
- Line 300: `data.clone().contact.url`
Three separate clones of the full `ResearchActivity` struct within the same async closure, each to access a single nested field.
**Why Arc is better:** Each of the three clones deep-copies a large struct just to read one field. With `Arc<ResearchActivity>`, you'd clone only the `Arc` pointer (8 bytes on 64-bit targets) instead of the entire struct. Alternatively, borrow the fields directly, since they're only read.
**Estimated impact:** **HIGH** -- Three unnecessary full struct clones in a loop over paths.
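To make the cost difference concrete, the sketch below uses a hypothetical `Activity` struct (not the real `ResearchActivity`) to show that an `Arc` handle to a sized type is one pointer wide and that `Arc::clone` shares the existing allocation rather than copying it. All names here are assumptions made for illustration.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical stand-in for a large model struct.
#[derive(Clone)]
struct Activity {
    doi: String,
    websites: Vec<String>,
    notes: Vec<String>,
}

/// Size in bytes of an `Arc` handle to `Activity` (a single pointer).
fn handle_size() -> usize {
    size_of::<Arc<Activity>>()
}

/// Returns true when cloning the `Arc` yields a handle to the same allocation.
fn clone_shares_allocation(shared: &Arc<Activity>) -> bool {
    let second = Arc::clone(shared); // bumps the reference count only
    Arc::ptr_eq(shared, &second)
}

/// Reads one field without cloning the parent struct at all.
fn doi_of(activity: &Activity) -> &str {
    &activity.doi
}
```

`doi_of` also accepts `&Arc<Activity>` through deref coercion, so switching a value to `Arc` does not force changes on read-only helpers.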
**Suggested approach:**
```rust
// Before
async move {
    let doi = data.clone().meta.doi;
    let websites = data.clone().meta.websites;
    let url = data.clone().contact.url;
    // ...
}
// After (Option 1: `data` is an Arc<ResearchActivity>; clone only the fields that are needed)
async move {
    let doi = data.meta.doi.clone();
    let websites = data.meta.websites.clone();
    let url = data.contact.url.clone();
    // ...
}
// After (Option 2: References if lifetimes allow)
// Borrow fields directly without cloning the parent struct
```
- Line 106: `paths.clone()` -- `Vec<PathBuf>` cloned just for `.len()`
- Line 108: `paths.clone()` -- same `Vec` cloned again for `.iter()`
- Line 110: `path.clone()` -- per-item `PathBuf` clone
- Line 112: `path.clone()` -- same path cloned again for context
- Line 125: `path.clone()` -- options field cloned
- Line 145: `reference_path.clone()`
- Line 155: `reference_extract_path.clone()` -- per slide in iteration
- Line 156: `paths.clone()` -- same `Vec` cloned a third time
- Line 170: `data.clone()` -- `ResearchActivity` cloned per item in parallel iteration
- Line 174: `paths[0].clone()` -- yet another clone
- Line 179: `reference_extract_path.clone()` -- another clone
**Why Arc is better:** `paths` is cloned 4 times in this function. `CommandOptions` is cloned for destructuring. `ResearchActivity` is cloned inside a `par_iter` block. Using `Arc<PathBuf>` for paths and `&ResearchActivity` (or `Arc<ResearchActivity>`) would eliminate most of these. The `reference_extract_path` is cloned once per slide -- an `Arc<PathBuf>` would reduce this to a pointer copy.
**Estimated impact:** **HIGH** -- Many redundant clones in the same function, plus `ResearchActivity` clone in parallel context.
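The clones made only to call `.len()` and `.iter()` can usually be dropped without any `Arc`, because both methods take `&self`. A minimal sketch under that assumption follows; `progress_labels` and `label` are hypothetical helpers, not functions from the ACORN codebase.

```rust
use std::path::{Path, PathBuf};

/// Builds a short progress label per path. `paths` is borrowed as a slice,
/// so neither `.len()` nor `.iter()` needs a clone of the vector.
fn progress_labels(paths: &[PathBuf]) -> Vec<String> {
    let total = paths.len();
    paths
        .iter()
        .enumerate()
        .map(|(index, path)| label(index + 1, total, path))
        .collect()
}

/// Takes `&Path`, so callers can pass `&PathBuf` without cloning.
fn label(position: usize, total: usize, path: &Path) -> String {
    format!("[{position}/{total}] {}", path.display())
}
```

The caller keeps ownership of the vector and can reuse it afterwards, which is what the repeated `paths.clone()` calls were working around.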
**Suggested approach:**
```rust
// Wrap paths in Arc once
let paths = Arc::new(paths);
// Use Arc::clone instead of .clone() for Vec<PathBuf>
```
- Line 632: `vale.clone()` -- cloned again inside the `.map()` closure for each path
**Why Arc is better:** `Vale` is cloned once, then cloned again per path in the async iterator. The cloned `Vale` is only used to call `.run()`, which takes `&self`. Wrapping in `Arc<Vale>` would reduce per-path cloning to a pointer copy.
**Estimated impact:** **HIGH** -- `Vale` cloning happens for every file being prose-checked, and prose checking is one of the more expensive operations.
**Suggested approach:**
```rust
// Before
let vale = Vale::new(...);
items.iter().map(|path| {
    let vale = vale.clone();
    async move { vale.run(path).await }
})
// After
let vale = Arc::new(Vale::new(...));
items.iter().map(|path| {
    let vale = Arc::clone(&vale);
    async move { vale.run(path).await }
})
```
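The same pattern can be tried in isolation. The sketch below is only an illustration: `Linter` is a hypothetical stand-in for `Vale`, and `std::thread::spawn` replaces the async runtime, since both require the closure to own everything it captures.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `Vale`; `run` only needs `&self`.
struct Linter {
    banned: Vec<String>,
}

impl Linter {
    /// Counts how many banned words occur in `text`.
    fn run(&self, text: &str) -> usize {
        text.split_whitespace()
            .filter(|word| self.banned.iter().any(|b| b.as_str() == *word))
            .count()
    }
}

/// Runs the linter over every input on its own thread, sharing one `Linter`.
fn lint_all(linter: Linter, inputs: Vec<String>) -> Vec<usize> {
    let linter = Arc::new(linter);
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|text| {
            let linter = Arc::clone(&linter); // pointer copy, not a deep clone
            thread::spawn(move || linter.run(&text))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("lint thread panicked"))
        .collect()
}
```

Only the reference count changes per input; the `banned` list is allocated once regardless of how many inputs are processed.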
---
## MEDIUM IMPACT
Structs that are cloned repeatedly but are smaller, or clones that occur less frequently.
---
### 6. CheckOptions cloned 3x in check command
**File:** `acorn-cli/src/commands/check/mod.rs`
**Lines:** 80, 113, 199
**What is cloned:**
- Line 80: `options.clone()` -- `CheckOptions` in `apply_early_exit_policy` (11 fields including `Vec<String>`)
- Line 113: `options.clone()` -- `CheckOptions` in `handle()`
- Line 199: `options.clone()` -- `CheckOptions` in `resolve_skipped_categories()`
**Why Arc is better:** `CheckOptions` is cloned 3 separate times in the check flow. While individual clones are cheap (primitives + one `Vec<String>`), the cumulative cost across many files adds up. More importantly, `CheckOptions` is passed by reference to many async operations that clone it internally. Sharing via `Arc<CheckOptions>` would allow all functions to share a single allocation.
**Estimated impact:** **MEDIUM-HIGH** -- Moderate struct size, but cloned frequently across the check command's hot path.
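For the synchronous parts of the flow, threading one borrow through the helpers may already remove the clones. This is a sketch under that assumption: `Options`, `should_stop`, `skipped_categories`, and `handle` are hypothetical and only loosely modeled on the description above.

```rust
/// Hypothetical options struct, loosely modeled on the description of `CheckOptions`.
struct Options {
    exit_on_first_error: bool,
    skip: Vec<String>,
}

/// Read-only helper: borrows the options instead of taking an owned clone.
fn should_stop(options: &Options, error_count: usize) -> bool {
    options.exit_on_first_error && error_count > 0
}

/// Returns borrowed category names; no `String` is copied.
fn skipped_categories(options: &Options) -> Vec<&str> {
    options.skip.iter().map(String::as_str).collect()
}

/// Entry point that threads one borrow through every helper.
fn handle(options: &Options, error_count: usize) -> (bool, usize) {
    let stop = should_stop(options, error_count);
    let skipped = skipped_categories(options).len();
    (stop, skipped)
}
```

Where an async operation must own its options, an `Arc<Options>` can be created once at the top and passed as `&Options` to helpers like these via deref coercion.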
---
### 7. CommandOptions in export
**File:** `acorn-cli/src/commands/export/mod.rs`
**Lines:** 141, 165
**What is cloned:**
- Line 141: `CommandOptions::init().path(path.clone()).maybe_output(output.clone()).build()` -- creates new options but clones inputs
- Line 165: `options.clone()` inside async closure for parallel processing
**Why Arc is better:** The options struct is rebuilt and cloned multiple times within the export flow. Using `Arc` for the shared fields would reduce allocations, especially for the parallel processing closure at line 165.
**Estimated impact:** **MEDIUM** -- `CommandOptions` is moderate size (~5 Option fields), but cloned in a parallel context.
---
### 8. BucketOptions in download command
**File:** `acorn-cli/src/commands/download/mod.rs`
**Lines:** 51, 105
**What is cloned:**
- Line 51: `options.clone()` -- `BucketOptions`
- Line 105: Same pattern repeated for URL-based download
**Why Arc is better:** In the loop over buckets, `options.clone()` happens for each bucket. If a config file defines many buckets, this creates repeated allocations. `Arc<BucketOptions>` would share the options struct across all buckets.
**Estimated impact:** **MEDIUM** -- Depends on number of buckets; typically small but can grow.
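A minimal sketch of sharing one options value across a loop over buckets is shown below. `DownloadOptions`, `Job`, and `plan_jobs` are hypothetical names chosen for illustration; the real `BucketOptions` fields are not shown in this document.

```rust
use std::sync::Arc;

/// Hypothetical download settings shared by every bucket.
struct DownloadOptions {
    output_dir: String,
    retries: u32,
}

/// Hypothetical unit of work: one bucket plus a handle to the shared options.
struct Job {
    bucket: String,
    options: Arc<DownloadOptions>,
}

/// Creates one job per bucket; each job holds a pointer to the same options.
fn plan_jobs(buckets: &[&str], options: DownloadOptions) -> Vec<Job> {
    let options = Arc::new(options);
    buckets
        .iter()
        .map(|bucket| Job {
            bucket: bucket.to_string(),
            options: Arc::clone(&options),
        })
        .collect()
}
```

The number of allocations for the options stays at one no matter how many buckets the config file defines.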
---
### 9. GitLab Options cloned 4x
**File:** `acorn-cli/src/commands/gather/mod.rs`
**Lines:** 60, 64, 78, 80
**What is cloned:**
- Line 60: `options.clone()` -- `gitlab::Options` passed to `runners()`
- Line 64: `options.clone()` -- cloned again with modified identifier
- Line 78: `options.clone()` -- passed to `create_runner()`
- Line 80: `options.clone()` -- passed to `groups()`
**Why Arc is better:** `gitlab::Options` is cloned 4 times in quick succession. If this struct contains HTTP client state or authentication tokens, the clone cost could be non-trivial. Using `Arc` would share the options across API calls.
**Estimated impact:** **MEDIUM** -- Small number of clones, but potentially expensive struct.
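The clone at line 64 differs from the others because the copy is modified afterwards. One way to handle that case with shared options is `Arc::make_mut`, which copies the inner value only when other handles still point at it. The sketch below assumes a hypothetical `ApiOptions` struct and `with_identifier` helper; neither reflects the actual `gitlab::Options` definition.

```rust
use std::sync::Arc;

/// Hypothetical API options; the real `gitlab::Options` fields are not shown in this document.
#[derive(Clone, Debug, PartialEq)]
struct ApiOptions {
    token: String,
    identifier: String,
}

/// Returns a handle whose `identifier` is replaced. `Arc::make_mut` clones the
/// inner value only if other handles still point at it.
fn with_identifier(mut options: Arc<ApiOptions>, identifier: &str) -> Arc<ApiOptions> {
    Arc::make_mut(&mut options).identifier = identifier.to_string();
    options
}
```

Calls that leave the options unchanged keep sharing the original allocation, and only the modified variant pays for a copy.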
---
### 10. Cff in website checks
**File:** `acorn-lib/src/analyzer/mod.rs`
**Lines:** 144, 174
**What is cloned:**
- Line 144: `path.clone()` -- in async closure for website checks
- Line 174: `value.clone()` -- `Identifier.value` String cloned
**Why Arc is better:** The `Cff` struct contains many `Vec` fields (`identifiers`, `references`) that are cloned when the struct is moved into the async closure. Using `Arc<Cff>` for the website check would avoid deep-cloning these collections.
**Estimated impact:** **MEDIUM** -- `Cff` is smaller than `ResearchActivity` but still contains nested collections.
---
### 11. Bucket in file_paths method
**File:** `acorn-lib/src/io/config.rs`
**Lines:** 397-401
**What is cloned:**
- Line 397: `self.clone()` -- full `Bucket` struct destructured
- Line 398: `name.clone()` -- `Option<String>` cloned from the destructured bucket
- Line 401: `code_repository.clone()` -- cloned from the same bucket
**Why Arc is better:** The entire `Bucket` is cloned just to access `name` and `code_repository`. Using `&self` would eliminate this clone entirely since both fields are only read.
**Estimated impact:** **MEDIUM** -- Single clone but of a struct containing `Repository` enum.
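Destructuring works on a reference as well as on an owned value, which is what makes the `&self` approach possible. A minimal sketch follows, using a simplified, hypothetical `Bucket` and a made-up `label` method rather than the real `file_paths`.

```rust
/// Hypothetical, simplified bucket; the real `Bucket` has more fields.
struct Bucket {
    name: Option<String>,
    code_repository: Option<String>,
    description: String,
}

impl Bucket {
    /// Destructures `&self`, so `name` and `code_repository` are references
    /// and the struct is never cloned.
    fn label(&self) -> String {
        let Bucket { name, code_repository, .. } = self;
        match (name.as_deref(), code_repository.as_deref()) {
            (Some(name), Some(repo)) => format!("{name} ({repo})"),
            (Some(name), None) => name.to_string(),
            (None, Some(repo)) => repo.to_string(),
            (None, None) => "unnamed".to_string(),
        }
    }
}
```

The bucket remains usable after the call because nothing was moved out of it.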
---
### 12. Vale cloned 7x across trait implementations
**File:** `acorn-lib/src/analyzer/mod.rs`
**Lines:** 413, 498, 517, 547, 567, 609, 686
**What is cloned:**
- Line 413: `self.clone().command()` -- `Vale` cloned to call `.command()`
- Line 498: `self.clone().download_checksums()` -- `Vale` cloned for async call
- Line 517: `self.clone().extract()` -- `Vale` cloned
- Line 547: `self.clone().command()` -- cloned again
- Line 567: `self.clone().command()` -- cloned again
- Line 609: `self.clone().command()` -- cloned again
- Line 686: `self.clone().command()` -- cloned again (in `with_system_command`)
**Why Arc is better:** `Vale` is cloned 7 times throughout its trait implementations, mostly just to call `.command()`. The `.command()` method takes `self` by value but only reads fields. Changing the signature to take `&self` would eliminate all these clones.
**Estimated impact:** **MEDIUM** -- Many clones but struct is small; the pattern is the issue (consuming `self` for a read-only operation).
**Suggested approach:**
```rust
// Before
fn command(self) -> Command {
    // ...
}
// After
fn command(&self) -> Command {
    // ...
}
```
---
## LOW IMPACT
Small structs, single clones, or clones of types that are cheap in context (e.g., a `PathBuf` handle is 24 bytes on 64-bit targets plus a short heap buffer, `String` in trivial contexts).
---
### 13. PathBuf cloned in format command (per file, parallel)
**File:** `acorn-cli/src/commands/format/mod.rs`
**Lines:** 37, 47, 57
**What is cloned:**
- Line 37: `path.clone()` -- `PathBuf` in `par_iter`
- Line 47: `path.clone()` -- same path for context
- Line 57: `path.clone()` -- for display string
**Why Arc is better:** The `PathBuf` handle itself is small (24 bytes on 64-bit targets), but Rust's `PathBuf` has no small-string optimization, so each clone of a non-empty path allocates and copies its buffer. Here it is cloned 3 times per file in a parallel context. Using a reference or `Arc<PathBuf>` would reduce allocations when processing thousands of files.
**Estimated impact:** **LOW** -- Small type, but frequent in parallel context.
---
### 14. PathBuf cloned in link command (per file, parallel)
**File:** `acorn-cli/src/commands/link/mod.rs`
**Lines:** 29, 39, 49, 55
**What is cloned:**
- Line 29: `path.clone()` -- in `par_iter`
- Line 39: `path.clone()` -- for context
- Line 49: `path.clone()` -- for display
- Line 55: `path.clone()` -- for filepath conversion
**Why Arc is better:** Same pattern as format command. 4 clones of `PathBuf` per file in parallel processing.
**Estimated impact:** **LOW** -- Small type but frequent.
---
### 15. Options cloned in resolve_paths
**File:** `acorn-cli/src/cli/mod.rs`
**Lines:** 96
**What is cloned:**
- Line 96: `ignore.clone()` and `filter.clone().map(regex_inverse)` -- both `Option<String>` cloned
**Why Arc is better:** Minor; these are `Option<String>` clones that could be avoided with references.
**Estimated impact:** **LOW** -- Small strings, single location.
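`Option::as_deref` is one way to avoid these clones: it turns a borrowed `Option<String>` into an `Option<&str>` without copying. In the sketch below, `invert` is a placeholder for `regex_inverse` (whose real signature and behavior are not shown in this document) and `patterns` is a hypothetical helper.

```rust
/// Placeholder for `regex_inverse`; the real signature is an assumption.
fn invert(pattern: &str) -> String {
    format!("not:{pattern}")
}

/// Borrows both options. `as_deref` turns `&Option<String>` into `Option<&str>`,
/// so only the transformed filter allocates a new `String`.
fn patterns<'a>(
    ignore: &'a Option<String>,
    filter: &Option<String>,
) -> (Option<&'a str>, Option<String>) {
    (ignore.as_deref(), filter.as_deref().map(invert))
}
```

Whether this applies directly depends on whether the consumer of `ignore` accepts a borrowed string; if it needs an owned `String`, the clone has to stay.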
---
### 16. Check struct cloned in render function
**File:** `acorn-cli/src/commands/check/mod.rs`
**Lines:** 171, 176
**What is cloned:**
- Line 171: `issue.clone()` -- `Check` struct cloned to access `.uri`
- Line 176: `issue.clone()` -- `Check` cloned to call `.with_index().report()`
**Why Arc is better:** The `Check` struct contains several `Option` fields and an `ErrorKind` enum, and it is cloned twice per issue in the render function. Since the `Check` is only being read and then transformed, a reference-based approach or `&self` methods would avoid cloning.
**Estimated impact:** **LOW** -- Small struct, limited to render path.
---
### 17. Check struct cloned in filter_by_visibility
**File:** `acorn-cli/src/commands/check/mod.rs`
**Line:** 151
**What is cloned:**
- Line 151: `.cloned().collect()` -- every `Check` in the slice is cloned to create a new `Vec`
**Why Arc is better:** If `Check` were `Arc<Check>`, this would become a cheap pointer copy instead of a full struct clone for each element. Alternatively, using references and changing downstream code to accept `&[Check]` would eliminate the clone entirely.
**Estimated impact:** **LOW** -- One-time clone per check cycle, but of all results.
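The reference-based alternative can be sketched as follows. `Check` here is a simplified, hypothetical struct with a `visible` flag, and `filter_visible` is an illustrative stand-in for `filter_by_visibility`, whose real filtering criteria are not shown in this document.

```rust
/// Hypothetical, simplified check result.
#[derive(Debug, PartialEq)]
struct Check {
    message: String,
    visible: bool,
}

/// Returns references to the visible checks instead of cloned copies.
fn filter_visible(checks: &[Check]) -> Vec<&Check> {
    checks.iter().filter(|check| check.visible).collect()
}
```

The returned vector holds pointers into the original slice, so downstream code would need to accept `&[&Check]` or an iterator of `&Check` for this to fit.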
---
### 18. Database path cloned in main.rs
**File:** `acorn-cli/src/main.rs`
**Line:** 169
**What is cloned:**
- Line 169: `database_path.clone()` -- `Option<PathBuf>`
**Why Arc is better:** Single clone at startup, trivial impact.
**Estimated impact:** **LOW** -- One-time clone at startup.
//! `acorn-lib` is a one-stop-shop for everything related to building and maintaining research activity data (RAD)-related technology, including the Accessible Content Optimization for Research Needs (ACORN) tool.
//! The modules, structs, enums and constants found here support the ACORN CLI, which checks, analyzes, and exports research activity data into useable formats.