If you pay attention to the world of life science research, you will have come across oligonucleotides. For the uninitiated, even the word itself seems complicated and a little obscure. This perhaps reflects the underlying science which, it turns out, can also be quite difficult. But this is an area that we should all care about, because it holds huge therapeutic potential for everything from genetic disorders to cancer. It also presents some really interesting technical challenges that machine learning is well-positioned to tackle.

Oligonucleotides are single-stranded chains of nucleotides, the building blocks of DNA and RNA. These molecules can be custom-built with any sequence we choose, which means we can design oligonucleotides to precisely interact with and influence gene expression inside cells, whether that’s by targeting DNA or RNA. In other words, oligonucleotides give us a way to directly control biological processes for therapeutic benefits. Even more exciting, they open the door to personalized medicine where treatments are tailored to the unique characteristics of each individual patient.
The challenge is that, even though oligonucleotides are short compared to full strands of DNA or RNA, they’re still larger and more complex than typical drug molecules. Like most biological systems, controlling how they’re synthesized and manufactured, especially at scale, isn’t easy. These production processes often result in high percentages of variations and impurities. In this emerging field, the expertise required to understand and fine-tune the many process parameters involved is concentrated in a relatively small number of organizations. Getting it right, even for tasks that should eventually become routine for large-scale production, currently depend heavily on senior experts.
On the surface this seems like a perfect scenario for machine learning. We have a growing body of data that is likely to hold crucial insights into how different process parameters impact experimental outcomes. Can’t we just build a machine learning model and use it to predict, for example, which parameters will maximize yields and minimize impurities for a given oligonucleotide sequence? The answer is yes, we can; but, of course, it’s easier said than done. This is because available data, while increasing all the time, is still limited and fragmented – and these are systems with many parameters, complex inter-relationships, and specialist data types. A useful machine learning model would need to handle not just numerical data, but elements such as sequence information and the outputs of analytical methods used to characterize oligonucleotides.
Tackling these challenges has been the focus of a two-year project led by Intellegens in collaboration with CPI and six major pharmaceutical and biotechnology companies. Together, the team has developed a software tool that makes it easy to bring together raw synthesis logs, deconvoluted mass spectrometry outputs, and reagent/yield data. Using the Alchemite™ method, the tool builds a machine learning model capable of providing sequence-specific recommendations for process parameters that will reduce the impurity burden, improve crude yield, and cut the number of experiments required. Such guidance reduces the need for senior expert input on more routine aspects of process development, so that valuable resources can be better-focused.


Project results have been impressive, with validation work reducing the experimental burden required to hit defined purity/yield targets, shifting from dozens of trials to single iteration optimization in many cases. In some examples, critical impurities reduced from around 10% to below 2% and there were crude purity gains of 7-12%. The tool is now available as the Alchemite™ for Oligonucleotide Manufacturing solution and it was demonstrated at a recent Intellegens webinar.
Why not take a look? If oligonucleotides is your area of expertise, you may also want to read the information linked below or watch the webinar recording for more detail about the solution. The oligonucleotide field remains a tricky area, but machine learning is making life for oligos researchers a little easier and bringing the associated dream of new therapies a little closer.