Commit 7cd557da authored Aug 21, 2025 by Carson, Audrey

fix: additional copyedits and citations

parent 06647a72

paper.bib

+6 −0

		@@ -60,3 +60,9 @@
		@article{Wu:2023, title={An Analysis of Crosswalks from Research Data Schemas to Schema.org}, volume={5}, ISSN={2641-435X}, url={https://doi.org/10.1162/dint_a_00186}, DOI={10.1162/dint_a_00186}, number={1}, journal={Data Intelligence}, author={Wu, Mingfang and Richard, Stephen M. and Verhey, Chantelle and Castro, Leyla Jael and Cecconi, Baptiste and Juty, Nick}, year={2023}, month=mar, pages={100–121} }
		@article{Xia:2024, title={Improving Retrieval Augmented Language Model with Self-Reasoning}, url={http://arxiv.org/abs/2407.19813}, note={arXiv:2407.19813 [cs]}, number={arXiv:2407.19813}, publisher={arXiv}, author={Xia, Yuan and Zhou, Jingbo and Shi, Zhenhui and Chen, Jun and Huang, Haifeng}, year={2024}, month=jul, language={en} }
		@article{Yang:2025, title={A Survey of AI Agent Protocols}, url={http://arxiv.org/abs/2504.16736}, DOI={10.48550/arXiv.2504.16736}, note={arXiv:2504.16736 [cs]}, number={arXiv:2504.16736}, publisher={arXiv}, author={Yang, Yingxuan and Chai, Huacan and Song, Yuanyi and Qi, Siyuan and Wen, Muning and Li, Ning and Liao, Junwei and Hu, Haoyi and Lin, Jianghao and Chang, Gaowei and Liu, Weiwen and Wen, Ying and Yu, Yong and Zhang, Weinan}, year={2025}, month=apr }
		@article{van_Dalen:2012, title={Intended and unintended consequences of a publish-or-perish culture: A worldwide survey}, volume={63}, rights={© 2012 ASIS&T}, ISSN={1532-2890}, url={https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.22636}, DOI={10.1002/asi.22636}, number={7}, journal={Journal of the American Society for Information Science and Technology}, author={van Dalen, Hendrik P. and Henkens, Kène}, year={2012}, pages={1282–1293}, language={en} }
		@article{Sochat:2018, title={The Scientific Filesystem}, volume={7}, ISSN={2047-217X}, url={https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giy023/4931737}, DOI={10.1093/gigascience/giy023}, number={5}, journal={GigaScience}, author={Sochat, Vanessa}, year={2018}, month=may, language={en} }
		@article{Puebla:2024, title={Building Trust: Data Metrics as a Focal Point for Responsible Data Stewardship}, ISSN={2644-2353, 2688-8513}, url={https://hdsr.mitpress.mit.edu/pub/l3g0j3bk/release/1}, DOI={10.1162/99608f92.e1f349c2}, number={Special Issue 4}, journal={Harvard Data Science Review}, publisher={The MIT Press}, author={Puebla, Iratxe and Lowenberg, Daniella}, year={2024}, month=apr, language={en} }
		@book{OSTP:2022, title={Desirable Characteristics of Data Repositories for Federally Funded Research}, rights={https://creativecommons.org/publicdomain/zero/1.0/}, url={https://repository.si.edu/handle/10088/113528}, DOI={10.5479/10088/113528}, institution={Executive Office of the President of the United States}, author={White House Office of Science and Technology Policy (OSTP)}, year={2022}, month=may, language={en} }
		@article{Lin:2020, title={The TRUST Principles for digital repositories}, volume={7}, rights={2020 This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply}, ISSN={2052-4463}, url={https://www.nature.com/articles/s41597-020-0486-7}, DOI={10.1038/s41597-020-0486-7}, number={1}, journal={Scientific Data}, publisher={Nature Publishing Group}, author={Lin, Dawei and Crabtree, Jonathan and Dillo, Ingrid and Downs, Robert R. and Edmunds, Rorie and Giaretta, David and De Giusti, Marisa and L’Hours, Hervé and Hugo, Wim and Jenkyns, Reyna and Khodiyar, Varsha and Martone, Maryann E. and Mokrane, Mustapha and Navale, Vivek and Petters, Jonathan and Sierman, Barbara and Sokolova, Dina V. and Stockhause, Martina and Westbrook, John}, year={2020}, month=may, pages={144}, language={en} }

paper.md

+3 −5

		@@ -31,13 +31,11 @@ bibliography: paper.bib
		Accessible Content Optimization for Research Needs (ACORN) applies standardization, automation, linked data, and institutional knowledge to research activity data (RAD) to create actionable insights and ultimately enable new research. ACORN is a command line multitool that creates analysis-ready data from RAD. It can also run on remote continuous integration servers for shared RAD repositories. ACORN employs automated processes for informing and/or enforcing defined content schemas to create standardized, highly structured data. Because of its standardized data source, ACORN easily applies computer automation to generate communication assets such as PDFs, PPTs, and web pages. Built using memory-safe Rust, ACORN is portable and accessible for use on any Windows, Mac, or Linux machine. ACORN's standardized approach ingests and maintains data in a consistent format to enable immediate analysis and use, building progressively more powerful datasets.

		# Statement of need
		Communicating research can be difficult — from the high-level scope of a science-focused organization, down to singular projects within that organization. Science data systems created to help communicate research are often isolated and/or specialized to individual suborganizations, teams, or domains. True innovation requires reusable systems that can standardize data across domain boundaries and serve as a nexus for scientists, developers, and communicators.
		Communicating research can be challenging, from the high-level overview of a research institution down to single projects within that institution. Science data systems created to help communicate research are often isolated and/or specialized to individual suborganizations, teams, or domains. Research communication is further complicated by a lack of consistency and documentation in research data and metadata, preventing external audiences, such as jobseekers, policymakers, funders, and the general public, from finding the information they need, despite federal guidance for clear, consistent documentation.[@Lin:2020],[@OSTP:2022] True innovation requires reusable systems that can standardize data across domain boundaries and serve as a nexus for scientists, developers, and communicators.[@Sochat:2018],[@Puebla:2024]

		Research communication is further complicated by a lack of consistency and documentation in research data and metadata, preventing external audiences, such as jobseekers, policymakers, funders, and the general public from finding the information they need.
		Traditional research practices are built on antiquated, habitual processes. These are particularly dangerous in an environment where publishing is critical to career survival.[@Grimes:2018] Researchers may be tempted to do the bare minimum, skip steps, and pursue sensational or novel paths in the name of journal acceptance and credibility.[@van_Dalen:2012] These practices have led to the twin reproducibility [@Baker:2016] and replicability [@Camerer:2018] crises.

		Traditional research practices are built on antiquated processes that have become old habits. These are particularly dangerous in an environment in which getting published is critical to career livelihood.[@Grimes:2018] Researchers may be tempted to do the bare minimum, skip steps, and pursue sensational or novel paths in the name of journal acceptance and gaining credibility.[@van_Dalen:2012] These practices have led to the twin reproducibility [@Baker:2016] and replicability [@Camerer:2018] crises.

		Conducting trustworthy research is hard work. Cataloging and analyzing collections of research prove another challenge, and institutions should hold research metadata to a high standard of replicability, accessibility, and ensure they are an integral part of data systems.[@Baca:2016] Automation and data architecture, enabled through ACORN, can make it easier.
		Conducting trustworthy research is hard work. Cataloging and analyzing collections of research prove another challenge, and institutions should hold research metadata to a high standard of replicability and accessibility and ensure that it is an integral part of data systems.[@Baca:2016],[@OSTP:2022] Automation and data architecture, enabled through ACORN, can make this work easier.

		ACORN enables quick analysis of research project portfolios, allowing decision-makers to pick and pull solutions for execution, sponsor discussions, and mission applications. ACORN has three main outputs: analysis-ready data applicable to AI/ML research; target artifacts, generated from an ACORN-enabled content process that creates a single source of truth for research activity data; and understanding, gained by maintaining data in a consistent format that supports programmatic analysis and better application of AI/ML practices. This collection of tools allows researchers to leverage the benefits of connected data and automate numerous tasks essential to science and communication.