update paper (42a7b65f) · Commits · Research Enablement / ACORN

paper.bib

+10 −0

		@@ -17,3 +17,13 @@
		Title = {{1,500 scientists lift the lid on reproducibility}},
		Year = 2016
		}

		@inproceedings{Asif:2021,
		Author = {{Asif}, I. and {Tiddi}, I. and {Gray}, A.},
		Booktitle = {2021 IEEE 17th International Conference on eScience (eScience)},
		Keywords = {Nanopublications, knowledge graph, ontology},
		Month = oct,
		Title = {{Using Nanopublications to Detect and Explain Contradictory Research Claims}},
		Url = {https://ieeexplore.ieee.org/document/9582393},
		Year = 2021
		}
		No newline at end of file

paper.md

+10 −5

		@@ -47,11 +47,16 @@ At the intersection of research and communications, research activity data descr
		![Within an organization, ACORN allows research activity data to be documented, analyzed, and publicized - even for projects without publications. By entering at the project level, ACORN provides unique visibility into a new set of variables that incrementally begin to document the broader scientific ecosystem through widespread adoption.](./figures/acorn_workflow.png)

		# Future Work
		- nanopublications
		- knowledgebase = ontology + instances
		- linked data
		- oakley (RAG, agents, etc.)
		- expansion (e.g., DoD, other labs)
		ACORN’s standardization makes it scalable. By publishing RAD contents in a structured way, including descriptions of mission, challenge, approach, and impact along with metadata, ACORN can help build knowledge graphs of particular research institutions, domains, and even the larger scientific community. Incorporating linked data through machine-readable, standards-based [JSON-LD](https://json-ld.org/) can further build out a RAD network and support domain-specific ontologies that can be queried.
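
As a minimal sketch of what a structured RAD entry could look like in JSON-LD, the Python below wraps the four narrative fields named above in a JSON-LD document. The `https://example.org/acorn/vocab#` vocabulary, the `acorn:ResearchActivity` type, the function name, and the sample record are all assumptions made for illustration; ACORN does not define them in this paper.

```python
import json

# Hypothetical vocabulary IRI, used only for illustration.
ACORN_VOCAB = "https://example.org/acorn/vocab#"


def rad_to_jsonld(record_id, title, mission, challenge, approach, impact, keywords=()):
    """Wrap one research activity data (RAD) entry in a JSON-LD document."""
    return {
        "@context": {
            "schema": "https://schema.org/",
            "acorn": ACORN_VOCAB,
            # Short keys on the left map to full vocabulary terms on the right.
            "title": "schema:name",
            "keywords": "schema:keywords",
            "mission": "acorn:mission",
            "challenge": "acorn:challenge",
            "approach": "acorn:approach",
            "impact": "acorn:impact",
        },
        "@id": record_id,
        "@type": "acorn:ResearchActivity",  # hypothetical type
        "title": title,
        "mission": mission,
        "challenge": challenge,
        "approach": approach,
        "impact": impact,
        "keywords": list(keywords),
    }


if __name__ == "__main__":
    doc = rad_to_jsonld(
        "https://example.org/acorn/projects/demo",  # made-up identifier
        "Demo project",
        "Advance energy storage",
        "Batteries degrade quickly",
        "Characterize electrode materials",
        "Longer-lived batteries",
        keywords=["batteries", "materials"],
    )
    print(json.dumps(doc, indent=2))
```

Because every key resolves to an IRI through `@context`, records from different institutions could be merged into one graph without agreeing on local field names first.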

		Another piece of the research communications puzzle is representing frequently stated and widely cited claims in a way that brings provenance to RAD such as journal articles and fact sheets. Establishing these claims is time-consuming and often produces redundant content, because multiple researchers lay the same foundation of truth on which they’ll build their novel claims.
		Nanopublications can serve as a standard framework for representing these claims, providing a concentrated source of truth for RAD. They can also bring finer granularity to RAD-adjacent information, including instrument, device, and physical technology offerings, and they have been shown to help detect contradictions between claims [@Asif:2021].
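
To make the idea concrete, here is a rough sketch of a nanopublication as three groups of triples (assertion, provenance, and publication info), plus a deliberately naive conflict check. The check only flags assertions that share a subject and predicate but differ in object; it is a stand-in for illustration, not the method of the cited work. The class, function, and identifiers such as `np:1` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Nanopub:
    """One claim plus where it came from and who published it."""
    uri: str
    assertion: List[Triple]
    provenance: List[Triple] = field(default_factory=list)
    pubinfo: List[Triple] = field(default_factory=list)


def conflicting_pairs(nanopubs: List[Nanopub]) -> List[Tuple[str, str]]:
    """Naive check: same subject and predicate, different object."""
    seen: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}
    conflicts = []
    for pub in nanopubs:
        for subj, pred, obj in pub.assertion:
            for other_obj, other_uri in seen.get((subj, pred), []):
                if other_obj != obj:
                    conflicts.append((other_uri, pub.uri))
            seen.setdefault((subj, pred), []).append((obj, pub.uri))
    return conflicts


if __name__ == "__main__":
    a = Nanopub(
        uri="np:1",
        assertion=[("ex:CompoundX", "ex:effectOn", "ex:Increase")],
        provenance=[("np:1#assertion", "prov:wasDerivedFrom", "ex:FactSheet1")],
    )
    b = Nanopub(
        uri="np:2",
        assertion=[("ex:CompoundX", "ex:effectOn", "ex:Decrease")],
        provenance=[("np:2#assertion", "prov:wasDerivedFrom", "ex:Article7")],
    )
    print(conflicting_pairs([a, b]))
```

Keeping provenance alongside each assertion means a flagged pair can be traced back to the fact sheet or article each claim was derived from.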

		A chatbot could provide an interface to the knowledge graph and serve as the initial implementation of ACORN AI functionality. It would be created by embedding ACORN-formatted RAD in a vector database and pairing that database with an off-the-shelf large language model (LLM). In this retrieval-augmented generation (RAG) approach, the user’s query is used to retrieve the most similar or related RAD, the retrieved content is added to the prompt as context, and the LLM then generates the answer returned to the user.
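
The retrieval step can be sketched in a few lines of Python. This is a toy under stated assumptions: word counts stand in for a learned embedding model, an in-memory list stands in for the vector database, and the LLM call itself is left out. All function names and sample records are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a real system would use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, records, k=2):
    """Return the k records most similar to the query."""
    q = embed(query)
    ranked = sorted(records, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def build_prompt(query, records, k=2):
    """Augment the user's question with retrieved RAD before it is sent to an LLM."""
    context = "\n".join(f"- {r}" for r in retrieve(query, records, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    rad = [
        "Neutron scattering study of battery materials",
        "Climate model for coastal flooding",
        "Quantum sensor for dark matter searches",
    ]
    print(build_prompt("battery materials research", rad, k=1))
```

The prompt returned by `build_prompt` is what would be passed to the language model; swapping `embed` for a real embedding model and the list for a vector store would leave the rest of the flow unchanged.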

		Coupled with RAG, a chatbot could supply the research data context needed for distinctive science communication capabilities. Additional training could adapt an existing model into a base model with desired characteristics (e.g., following the AP style guide), which would then be combined with RAG for the final product.

		The chatbot would let the science speak for itself and could become part of an envisioned federated ecosystem of AI agents. Such an ecosystem could, for example, let users ask questions of all the Department of Energy (DOE) national laboratories at once. Using the Model Context Protocol (MCP), an LLM could synthesize the answers from each national laboratory’s agent into a synopsis of the DOE Office of Science capabilities that address a particular challenge.
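
The fan-out-and-summarize pattern described here might look roughly like the following. This sketch deliberately avoids any real protocol SDK: each agent is modeled as a plain Python callable, and the lab names, function names, and prompt wording are hypothetical. In practice each callable would be a client talking to a laboratory's agent over whatever transport the ecosystem adopts.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]  # takes a question, returns that lab's answer


def ask_all(question: str, agents: Dict[str, Agent]) -> Dict[str, str]:
    """Send the same question to every agent and collect the answers."""
    answers = {}
    for lab, agent in agents.items():
        try:
            answers[lab] = agent(question)
        except Exception as exc:  # one unreachable lab should not sink the query
            answers[lab] = f"(no answer: {exc})"
    return answers


def synopsis_prompt(question: str, answers: Dict[str, str]) -> str:
    """Build the prompt a summarizing LLM would receive."""
    body = "\n".join(f"{lab}: {text}" for lab, text in sorted(answers.items()))
    return (
        f"Question: {question}\n\n"
        f"Answers from individual laboratories:\n{body}\n\n"
        "Write a short synopsis of the combined capabilities."
    )


if __name__ == "__main__":
    def offline(_question: str) -> str:
        raise RuntimeError("offline")

    agents = {
        "lab_a": lambda q: "Operates a neutron source.",
        "lab_b": offline,
    }
    answers = ask_all("Who can characterize battery materials?", agents)
    print(synopsis_prompt("Who can characterize battery materials?", answers))
```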

		# Acknowledgment
		This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the work for publication, acknowledges that the US government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the submitted manuscript version of this work, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the [DOE Public Access Plan](https://energy.gov/doe-public-access-plan).