Conversational AI for science: cutting through the noise

Our head of Agentic AI, Joel Strickland, offers a glimpse of the future of conversational AI for science in this, the second in a short series of ‘blogs from the road’ written as he works remotely while travelling through South and Central America.


I’m writing this from Rio de Janeiro, with Carnival just around the corner, so “cutting through the noise” feels especially relevant right now!

Over the past couple of years at Intellegens, we have been exploring conversational AI for scientific work. The aim was never to build a chatbot that gives nice answers. It was to build something that can support real R&D workflows, where inputs need to be valid, steps need to be clear, and results need to be repeatable.

This started with NOA (No Ordinary Assistant), an internal testbed for using natural language to orchestrate validated workflows. In parallel, we have been building toward our broader Alchemite™ Insight roadmap for generative and agentic AI. That work taught us a lot, including where we were wrong, what not to do, and how to combine the flexibility of conversation with the rigour of science. Most importantly, the mistakes were useful, because they shaped what we are building next.

The core problem: flexibility vs rigour

If you want conversational interfaces in science, you have to deal with a basic tension. People want the flexibility of a conversation. They want to explore, ask follow-up questions, and change their minds halfway through. But scientific work needs rigour. You need to know what was run, on what data, with what settings, and you need to be able to reproduce it.

General assistants are fantastic at exploration, but exploration alone is not the objective in science. Science is about being correct, checkable, and repeatable.

The main lesson for us was simple: you do not get rigour from prompting a general chatbot or an LLM. You get it from system design. You need an architecture that combines the conversational abilities of an agent with the robustness of proven scientific workflows.

Figure 1. Schema-gated tool validation: Before a tool is available to the agent, it is added to a validated tool library only after passing schema and compliance checks (for example type checks, documentation completeness, and service availability).
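To make the idea in Figure 1 concrete, here is a minimal sketch of schema-gated registration in Python. The registry, tool name, and the specific checks (type annotations and documentation; a real system would also probe service availability) are illustrative assumptions, not the actual NOA implementation.

```python
import inspect
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class ToolRegistry:
    """Holds only tools that have passed schema and compliance checks."""
    tools: Dict[str, Callable] = field(default_factory=dict)

    def register(self, func: Callable) -> None:
        # Schema check: every parameter must carry a type annotation,
        # so the agent can only call the tool with well-typed arguments.
        sig = inspect.signature(func)
        for name, param in sig.parameters.items():
            if param.annotation is inspect.Parameter.empty:
                raise ValueError(f"{func.__name__}: parameter '{name}' lacks a type annotation")
        # Compliance check: documentation must be present and non-empty.
        if not (func.__doc__ and func.__doc__.strip()):
            raise ValueError(f"{func.__name__}: missing documentation")
        self.tools[func.__name__] = func

registry = ToolRegistry()

def predict_property(formulation_id: str, target: str) -> float:
    """Run a validated model prediction for one target property."""
    return 0.0  # hypothetical placeholder for a real model call

registry.register(predict_property)  # passes both checks, so it joins the library
```

A tool that fails either check never enters the library, so the agent can never call it, however the conversation unfolds.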

What surprised us in practice

Two things stood out once we started speaking to customers and observing real use patterns.

1) Smaller agents beat one big agent, if you orchestrate them well

We had more success breaking tasks into small specialist agents and using an orchestrator to control the overall flow and logic. That made the system easier to reason about. It was clearer what each part was responsible for, and easier to put guardrails in the right place. This is especially important in scientific software, where the cost of a “plausible but wrong” step can be high.
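The pattern above can be sketched in a few lines. This is a deliberately simplified illustration, with hypothetical agent and task names, of how an orchestrator routes each step to exactly one specialist, keeping the guardrail at a single, well-defined boundary.

```python
class Agent:
    """A small specialist agent responsible for one kind of task."""
    def __init__(self, name, handles, run):
        self.name = name        # human-readable label
        self.handles = handles  # the task type this agent owns
        self.run = run          # the agent's logic (a callable here)

class Orchestrator:
    """Routes each step to the one agent responsible for it."""
    def __init__(self, agents):
        self.agents = {a.handles: a for a in agents}

    def dispatch(self, task_type, payload):
        agent = self.agents.get(task_type)
        if agent is None:
            # Guardrail: unknown steps fail loudly instead of being improvised.
            raise ValueError(f"no agent registered for '{task_type}'")
        return agent.run(payload)

explorer = Agent("data-explorer", "explore", lambda p: f"summary of {p}")
designer = Agent("doe-designer", "design", lambda p: f"design for {p}")
orch = Orchestrator([explorer, designer])

orch.dispatch("explore", "dataset-42")  # → "summary of dataset-42"
```

Because each agent owns one task type, it is obvious where to place validation, and a "plausible but wrong" step cannot leak across responsibilities.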

2) Determinism matters, a lot

In scientific workflows, repeatability is the baseline because it lets you check a result and trust that a colleague could reproduce it later. So we have leaned heavily on the Model Context Protocol (MCP), which gives a structured, deterministic way for agents to call tools through our productionised API. The conversation can be flexible, but execution cannot be. If the tool calls are not predictable, debuggable, and traceable, it is hard to trust the outputs and hard to repeat the findings.
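As a rough illustration of what "deterministic, traceable execution" means here, the sketch below shows an MCP-style structured tool call: every call is a canonical, serialisable request with a stable fingerprint that goes into an audit trail. The function, tool names, and arguments are hypothetical, not the Alchemite API.

```python
import hashlib
import json

def call_tool(tool_name: str, args: dict, audit_log: list) -> dict:
    """Execute a tool through a structured, loggable request so every
    call is predictable, debuggable, and traceable."""
    request = {"tool": tool_name, "arguments": args}
    # Canonical serialisation: identical requests always yield the
    # same bytes, hence the same fingerprint in the audit trail.
    canonical = json.dumps(request, sort_keys=True)
    fingerprint = hashlib.sha256(canonical.encode()).hexdigest()[:12]
    audit_log.append({"id": fingerprint, "request": request})
    # ... dispatch to the real, validated tool here ...
    return {"id": fingerprint, "status": "ok"}

trail = []
call_tool("train_model", {"dataset": "alloys.csv", "seed": 7}, trail)
```

Running the same request twice produces the same fingerprint, which is exactly the property that lets a colleague check what was run, on what data, with what settings.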

Figure 2. Validated workflow library: Execution happens through predefined, validated workflows composed of approved tools. This is what makes conversational interaction compatible with deterministic, reproducible scientific execution.
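The gate described in Figure 2 can be reduced to one check: a workflow is runnable only if every step it names is in the approved tool library. The tool names below are hypothetical placeholders, a sketch of the principle rather than the production mechanism.

```python
# Hypothetical library of tools that have already passed validation.
APPROVED_TOOLS = {"load_data", "train_model", "predict"}

def validate_workflow(steps: list) -> bool:
    """Allow execution only for workflows composed entirely of approved tools."""
    unknown = [s for s in steps if s not in APPROVED_TOOLS]
    if unknown:
        raise ValueError(f"unapproved tools in workflow: {unknown}")
    return True

validate_workflow(["load_data", "train_model", "predict"])  # runnable
```

The conversation can propose any sequence of steps, but only sequences built from the validated library ever execute, which is what keeps free-form chat compatible with reproducible science.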

What we are building now: Phase 3 Alchemite Insight Agent

These learnings are feeding directly into the Phase 3 Alchemite Insight Agent – the next stage in our roadmap to integrate generative and agentic AI with the machine learning power of our Alchemite software.

Our core idea is to embed specialist agents across the Alchemite platform, each focused on the page it supports (data exploration, model analysis, design of experiments, and so on). Each agent has access to the right tools, documentation, and context for the questions we expect users to ask on that page. Tool use is deterministic via MCP. Chat remains free-form, with conversation history and setup managed using Google’s Agent Development Kit (ADK).

In practice, this means users would be able to work naturally in language while the system keeps execution anchored to validated tools and workflows. The figures above show the foundation of that approach: tools are validated before they are available, and workflows are validated before they run. Our goal is not to replace scientific judgement. It is to make it quicker and easier for users to do the work they are already trying to do, without lowering the standards of rigour. That means:

  • reducing onboarding effort
  • helping users uncover new insights from their data
  • helping them get more value from the platform.

Where this is going

This new wave of technology has opened up real opportunities to improve the user experience in Alchemite™.

It also gives us new ways to improve the models themselves, for example, by using an LLM to support domain-specific feature engineering. It is a practical form of knowledge transfer from a foundation model into our models.

For me, some of the most important progress has come from things that did not work first time: plenty of dead ends, and plenty of time spent digging into edge cases so that our users do not have to. We have learned quickly by building, testing, and correcting, and that has guided the roadmap.

This post focuses on the key lessons. If you’d like more detail on the NOA architecture and how it connects to our roadmap, you can read the full technical paper here.

Next in this series, I’ll write a longer piece on the NOA architecture and what we learned from building schema-gated conversational workflows for science.

In the meantime, we are continuing Phase 3 development with a simple principle in mind: make scientific workflows easier to use without making them any less rigorous.
