Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

Pre-print paper in ‘arXiv’

A new pre-print paper from the Intellegens team explores the effective use of Large Language Models (LLMs) and agentic AI in scientific workflows. Interviews with industrial R&D teams uncover key requirements, and the paper proposes a schema-gated architecture to meet them.

Citation: Strickland J., Vijeta A., Moores C., Bodek O., Nenchev B., Whitehead T., Phillips C., Tassenberg K., Conduit G.J., Pellegrini B. arXiv.

https://doi.org/10.48550/arXiv.2603.06394

Abstract

Large language models (LLMs) can now translate a researcher’s plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface 2 competing requirements–deterministic, constrained execution and conversational flexibility without workflow rigidity–together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action–including cross-step dependencies–validates against a machine-checkable specification.
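
To make the gating idea concrete, the following is a minimal sketch of a workflow-level gate, not the paper's implementation. The tool registry `TOOL_SCHEMAS`, the `Step` and `Ref` types, and the functions `validate_workflow` and `run_gated` are all hypothetical names introduced for illustration. The sketch assumes a plan is an ordered list of steps, and that a parameter declared with an artifact kind (a string such as "table") must be supplied as a reference to an earlier step's output.

```python
from dataclasses import dataclass

# Hypothetical tool registry (an assumption for illustration). A parameter spec
# is either a Python type (a literal value is expected) or a string naming an
# artifact kind that must come from an earlier step's output.
TOOL_SCHEMAS = {
    "load_data": {"params": {"path": str}, "outputs": {"table": "table"}},
    "train_model": {
        "params": {"data": "table", "target": str},
        "outputs": {"model": "model"},
    },
}


@dataclass(frozen=True)
class Ref:
    """Reference to a named output of an earlier step."""
    step: str
    output: str


@dataclass
class Step:
    id: str
    tool: str
    params: dict


def validate_workflow(steps):
    """Check the whole plan, including cross-step references; return error strings."""
    errors = []
    produced = {}  # step id -> {output name: artifact kind}
    for step in steps:
        schema = TOOL_SCHEMAS.get(step.tool)
        if schema is None:
            errors.append(f"{step.id}: unknown tool '{step.tool}'")
            continue
        expected = schema["params"]
        for name in sorted(expected.keys() - step.params.keys()):
            errors.append(f"{step.id}: missing parameter '{name}'")
        for name in sorted(step.params.keys() - expected.keys()):
            errors.append(f"{step.id}: unexpected parameter '{name}'")
        for name, value in step.params.items():
            if name not in expected:
                continue
            want = expected[name]
            if isinstance(want, str):  # artifact kind: must be a valid reference
                if not isinstance(value, Ref):
                    errors.append(f"{step.id}: '{name}' must reference a step output")
                elif value.step not in produced:
                    errors.append(f"{step.id}: '{name}' references unknown step '{value.step}'")
                elif produced[value.step].get(value.output) != want:
                    errors.append(f"{step.id}: '{name}' expects a '{want}' artifact")
            elif not isinstance(value, want):
                errors.append(f"{step.id}: '{name}' must be of type {want.__name__}")
        produced[step.id] = schema["outputs"]
    return errors


def run_gated(steps, executor):
    """Run every step through `executor` only if the complete plan validates."""
    errors = validate_workflow(steps)
    if errors:
        raise ValueError("plan rejected: " + "; ".join(errors))
    return [executor(step) for step in steps]
```

In this sketch an LLM could propose any list of `Step` objects during conversation, but `run_gated` refuses to call the executor at all unless every step and every cross-step reference passes validation, which is one way to read the separation between conversational and execution authority.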

We operationalize the 2 requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol–15 independent sessions across 3 LLM families–yielding substantial-to-near-perfect inter-model agreement (Krippendorff's α = 0.80 for ED and α = 0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment.
The resulting landscape reveals an empirical Pareto front–no reviewed system achieves both high flexibility and high determinism–but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distil 3 operational principles–clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating–to guide adoption.
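
As an illustration of what an empirical Pareto front over the two axes means, the sketch below picks out the systems that no other system matches or beats on both ED and CF. The function name `pareto_front`, the system names, and the scores are invented for this example and are not taken from the paper; the only assumption is that higher is better on both axes.

```python
def pareto_front(scores):
    """Return the names of systems not dominated on (ED, CF).

    `scores` maps a system name to an (ed, cf) pair, higher being better.
    A system is dominated if another system is at least as good on both
    axes and strictly better on at least one.
    """
    front = []
    for name, (ed, cf) in scores.items():
        dominated = any(
            o_ed >= ed and o_cf >= cf and (o_ed > ed or o_cf > cf)
            for other, (o_ed, o_cf) in scores.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Made-up scores for four hypothetical systems (not data from the paper).
example_scores = {
    "workflow_engine": (5, 1),
    "hybrid_planner": (3, 3),
    "free_form_agent": (1, 5),
    "weak_baseline": (2, 2),
}

front = pareto_front(example_scores)
```

With these invented scores the front contains the first three systems, and `weak_baseline` is excluded because `hybrid_planner` is better on both axes. A system that scored high on both ED and CF would dominate every other entry, which is the outcome the abstract reports that none of the reviewed systems achieves.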
