fix: update future work section (7cf014d3) · Commits · Research Enablement / ACORN

paper.md

+7 −6

Original line number	Diff line number	Diff line
		@@ -26,7 +26,7 @@ bibliography: paper.bib
		---

		# Summary
		> Accessible Content Optimization for Research Needs (ACORN) applies standardization, automation, linked data, and institutional knowledge to research activity data (RAD) to draw insights that benefit multiple audiences and aims. ACORN is a command line multitool that creates analysis-ready data from RAD and can also run on remote continuous integration servers for shared RAD repositories. ACORN employs a set of automated processes for informing and/or enforcing defined content schemas to create standardized and highly structured data. Because of its standardized data source, ACORN easily applies computer automation to generate communication assets such as PDFs, PPTs, and web pages. Built using memory-safe Rust, ACORN is portable and accessible for use on any Windows, Mac, or Linux machine.
		Accessible Content Optimization for Research Needs (ACORN) applies standardization, automation, linked data, and institutional knowledge to research activity data (RAD) to draw insights that benefit multiple audiences and aims. ACORN is a command line multitool that creates analysis-ready data from RAD and can also run on remote continuous integration servers for shared RAD repositories. ACORN employs a set of automated processes for informing and/or enforcing defined content schemas to create standardized and highly structured data. Because of its standardized data source, ACORN easily applies computer automation to generate communication assets such as PDFs, PPTs, and web pages. Built using memory-safe Rust, ACORN is portable and accessible for use on any Windows, Mac, or Linux machine.

		# Statement of need
		Communicating research can be difficult — from the high-level scope of a science-focused organization, down to singular projects within that organization. Science data systems created to help communicate research are often isolated and/or specialized to individual suborganizations, teams, or domains. True innovation requires reusable systems that can standardize data across domain boundaries and serve as a nexus for scientists, developers, and communicators.
		@@ -47,16 +47,17 @@ At the intersection of research and communications, research activity data descr
		![Within an organization, ACORN allows research activity data to be documented, analyzed, and publicized - even for projects without publications. By entering at the project level, ACORN provides unique visibility into a new set of variables that incrementally begin to document the broader scientific ecosystem through widespread adoption.](./figures/acorn_workflow.png)

		# Future Work
		ACORN’s standardization allows great scalability possibilities. By publishing the RAD contents of in a structured way, including description of mission, challenge, approach, and impact, along with metadata, ACORN can help build knowledge graphs of particular research institutions, domains, and even the larger scientific community. Incorporating linked data through machine-readable, standards-based [JSON-LD]( https://json-ld.org/) can further build out a RAD network and build domain-specific ontologies to be queried.
		ACORN’s standardization allows great scalability and accessiblity possibilities. By publishing the RAD contents in a structured way, including description of mission, challenge, approach, and impact, along with metadata, ACORN can help build knowledge graphs of particular research institutions, domains, and even the larger scientific community. Incorporating linked data through machine-readable, standards-based [JSON-LD]( https://json-ld.org/) can further build out a RAD knowledgebase with domain-specific ontologies that enable complex queries and automated inference.

		Another piece of the research communications puzzle to address is representing often-verbalized and much-cited claims to bring provenance to RAD such as journal articles and fact sheets. This process can prove time-consuming and often results in redundant content, as multiple researchers try build novel claims on the same foundation of truth.

		Another piece of the research communications puzzle to address is representing often-verbalized and much-cited claims to bring provenance to RAD such as journal articles and fact sheets. This process can prove time-consuming and often results in redundant content as multiple researchers are trying to lay the same foundation of truth on which they’ll build their novel claims.
		Nanopublications can serve as a standard framework to represent these claims, providing a concentrated source of truth for RAD. Nanopublications can bring finer granularity to RAD-adjacent information, including instrument, device, and physical technology offerings. They’ve also been shown to be an indicator of contradiction between different claims. [@Asif: 2021]

		Bringing an interface to the knowledge graph, chatbots could be the initial implementation of ACORN AI functionality, created by embedding ACORN-formatted RAD in a vector database and querying an off-the-shelf large language model on the database. The LLM would use the query to find the most similar or related context, using RAD to inform retrieval augmented generation (RAG) for extra prompt content and then feed through the LLM again to give the user an answer.
		Bringing an interface to the knowledge graph, chatbots could be the initial implementation of ACORN AI functionality, created by embedding ACORN-formatted RAD in a vector database and querying an off-the-shelf large language model (LLMs) on the database. The LLM would use the query to find the most similar or related context, using RAD to inform retrieval augmented generation (RAG) for extra prompt content and then feed through the LLM again to give the user an answer. Because RAD is already input in JSON format, it's clearly defined, machine friendly, and in a predictable, easy-to-validate format. This makes research data highly accessible for AI applications, which, in turn, makes it highly accessible for the layperson.

		Coupled with RAG methodology, a chatbot could add necessary research data context for truly unique science communication capabilities. Training could allow an existing model to make a base model with desired characteristics (e.g. follow AP style guide) which would then be combined with RAG for the final product.
		Coupled with RAG methodology, a chatbot could add necessary research data context for truly unique science communication capabilities. Training could allow an existing model to make a base model with desired characteristics (e.g. follow AP style guide). Through existing open-source LLMs, a trained RAD chatbot could provide information on the same project to both a fifth-grader writing a science report to a researcher at a partnering national laboratory.

		The chatbot would let the science speak for itself and could become part of an envisioned federated AI agent ecosystem. This could allow users to ask questions of the entire Department of Energy national laboratories, for example. By using model context protocol, an LLM could provide a synopsis answer based on answers from all national lab agents to tell a user about the DOE Office of Science capabilities to address a particular challenge.
		The chatbot would let the science speak for itself and could become part of an envisioned federated AI agent ecosystem. This could allow users to ask questions of the entire Department of Energy national laboratories, for example. Leveraging agent-to-agent communication, such as [MCP](https://modelcontextprotocol.io/docs/getting-started/intro) or [A2A](https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/), could provide a synopsis answer based on answers from all national lab agents to tell a user about the DOE Office of Science capabilities to address a particular challenge.

		# Acknowledgment
		This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the work for publication, acknowledges that the US government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the submitted manuscript version of this work, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the [DOE Public Access Plan](https://energy.gov/doe-public-access-plan).