Machine learning tech that hunts for plant biomarkers awarded UKRI funding

TraitSeq, developed by Josh Colmer during his PhD at the Earlham Institute, is an end-to-end laboratory and computational pipeline that uses cutting-edge machine learning (ML) methods to identify biomarkers from transcriptomic data.

These biomarkers have the potential to predict useful physiological, biochemical, or metabolic traits and changes.

The technology is the culmination of Colmer’s involvement in a number of projects during his PhD in the Anthony Hall Group at the Earlham Institute.

Colmer and the team behind TraitSeq have received funding from UKRI’s Innovation to Commercialisation of University Research (ICURe) pilot programme, which helps research teams shorten the time it normally takes to move a promising idea from the lab to the point of commercialisation.

The team behind TraitSeq will now spend 12 weeks carrying out market discovery activities to build a clearer picture of how the technology could be applied and the areas of industry with the greatest potential to benefit from it.

Josh Colmer, TraitSeq entrepreneurial lead and PhD student based at the Earlham Institute, said: “TraitSeq was born out of a few projects where we realised how valuable it’d be to have a diagnostic tool for spotting biomarkers.

“These could flag a range of important traits for plant breeders, from obvious benefits such as climate resilience or yield through to more subtle things like taste!”

TraitSeq involves both laboratory and computational approaches, which the Earlham Institute’s facilities are uniquely placed to support.

“The lab component consists of a low-cost, high-throughput RNA extraction and sequencing pipeline optimised for plant material,” said Colmer. “The computational aspect consists of bespoke ML algorithms and bioinformatics tools for detecting biomarkers and producing trait prediction models using the resulting high-dimensional RNA-Seq datasets.”

Dr Liliya Serazetdinova, Head of Business Development and Impact at the Earlham Institute, said: “What makes TraitSeq so innovative is the computational component. This is how we’re able to accurately and robustly predict measurable targets that relate to changes in phenotype, physiology or metabolism under varying environmental conditions.

“The Earlham Institute works to bridge the gap between biology and data science, and this innovation is a perfect example of how data-intensive bioscience could deliver significant impact.”

TraitSeq uses a bespoke set of gene selection algorithms and machine learning models developed by Colmer and colleagues at the Earlham Institute. These are able to identify a specified number of transcriptomic biomarkers for prediction (trait measurement) and inference (gene regulation understanding) purposes.
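To give a rough sense of what "selecting a specified number of biomarkers and predicting a trait from them" can look like in practice, here is a minimal sketch in Python. It is not TraitSeq's method, which has not been published in this article: the signal-to-noise ranking, the nearest-centroid classifier, the simulated expression data, and every function name below are assumptions chosen purely for illustration.

```python
import random
import statistics


def select_top_k_genes(expr, labels, k):
    """Rank genes by a simple signal-to-noise score and return the top k indices.

    Illustrative stand-in only; TraitSeq's real gene selection algorithms differ.
    """
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch handles a two-class trait only"
    scores = []
    for g in range(len(expr[0])):
        a = [row[g] for row, y in zip(expr, labels) if y == classes[0]]
        b = [row[g] for row, y in zip(expr, labels) if y == classes[1]]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


def fit_centroids(expr, labels, genes):
    """Compute the mean expression of the selected genes for each trait class."""
    centroids = {}
    for c in set(labels):
        rows = [row for row, y in zip(expr, labels) if y == c]
        centroids[c] = [statistics.mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(sample, centroids, genes):
    """Assign a sample to the class whose centroid is closest on the selected genes."""
    def squared_distance(c):
        return sum((sample[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
    return min(centroids, key=squared_distance)


def make_toy_data(n_samples=40, n_genes=200, informative=(3, 17, 42), seed=0):
    """Simulate expression values; only the 'informative' genes track the trait."""
    rng = random.Random(seed)
    expr, labels = [], []
    for i in range(n_samples):
        y = i % 2  # hypothetical binary trait, e.g. susceptible vs resistant
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 6.0 * y
        expr.append(row)
        labels.append(y)
    return expr, labels


expr, labels = make_toy_data()
train_x, train_y = expr[:30], labels[:30]
test_x, test_y = expr[30:], labels[30:]

# Ask for a specified number of biomarkers, then predict the trait from them.
biomarkers = select_top_k_genes(train_x, train_y, k=3)
centroids = fit_centroids(train_x, train_y, biomarkers)
hits = sum(predict(s, centroids, biomarkers) == y for s, y in zip(test_x, test_y))
accuracy = hits / len(test_y)
print("selected genes:", sorted(biomarkers), "held-out accuracy:", accuracy)
```

The same small set of selected genes serves both purposes mentioned above: it feeds the prediction model, and it can be inspected directly as a shortlist of candidates for understanding gene regulation.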

With support from ICURe and Earlham Enterprises Ltd, the commercial arm of the Earlham Institute, Colmer will now dedicate time to developing a range of new skills: how to translate research into a commercial venture, how to test it in the market, and how to pitch it to potential investors.

Professor Anthony Hall, Group Leader at the Earlham Institute and science lead on this project, said: “Biomarker-based diagnostics has significantly advanced in precision medicine, yet this approach represents an opportunity for the agriculture sector.

“TraitSeq was initially designed to predict the presence of plant diseases and the circadian clock in plants, but we’ve shown it’s also applicable to human or even livestock trait prediction. The pipeline has already been used in a trial project to predict cancer subtypes in humans at unmatched levels of accuracy.”

Colmer is actively seeking input on the potential of TraitSeq from anyone who thinks the technology may have an application in their area of work. Please get in touch if you’d like to find out more.