Metabolite Identification through Machine Learning — Tackling CASMI Challenge Using FingerID

Huibin Shen, Nicola Zamboni, M. Heinonen, Juho Rousu

Published 2013 in Metabolites

ABSTRACT

Metabolite identification is a major bottleneck in metabolomics due to the number and diversity of the molecules. To alleviate this bottleneck, computational methods and tools that reliably filter the set of candidates are needed for further analysis by human experts. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for developing a new genre of metabolite identification methods that rely on machine learning as the primary vehicle for identification. In this paper we describe the machine learning approach used in FingerID, its application to the CASMI challenges and some results that were not part of our challenge submission. In short, FingerID learns to predict molecular fingerprints from a large collection of MS/MS spectra, and uses the predicted fingerprints to retrieve and rank candidate molecules from a given large molecular database. Furthermore, we introduce a web server for FingerID, which was applied for the first time to the CASMI challenges. The challenge results show that the new machine learning framework produces competitive results on those challenge molecules that were found within the relatively restricted KEGG compound database. Additional experiments on the PubChem database confirm the feasibility of the approach even on a much larger database, although room for improvement still remains.
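The retrieval step described in the abstract (using a predicted fingerprint to rank candidate molecules from a database) can be illustrated with a minimal sketch. It assumes the fingerprint has already been predicted from an MS/MS spectrum and is given as a binary vector. The Tanimoto similarity used here is an illustrative stand-in and is not claimed to be FingerID's actual scoring function. The function names and the toy candidates are hypothetical.

```python
from typing import Dict, List, Tuple


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto (Jaccard) similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


def rank_candidates(
    predicted: List[int], candidates: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the predicted fingerprint, best first."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical predicted fingerprint (one bit per molecular property).
predicted = [1, 0, 1, 1, 0, 0]

# Hypothetical candidate set, standing in for a database such as KEGG or PubChem.
candidates = {
    "candidate_A": [1, 0, 1, 1, 0, 0],
    "candidate_B": [1, 1, 0, 1, 0, 0],
    "candidate_C": [0, 1, 0, 0, 1, 1],
}

ranking = rank_candidates(predicted, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the candidate set would first be filtered, for example by precursor mass, and the predicted bits would carry per-bit reliabilities rather than hard 0/1 values; both refinements are left out of this sketch.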

PUBLICATION RECORD

Publication year
2013
Venue
Metabolites
Publication date
2013-06-01
Fields of study
Medicine, Chemistry, Computer Science
Identifiers
DOI 10.3390/metabo3020484; PMID 24958002; PMCID 3901273
Source metadata
Semantic Scholar, PubMed

REFERENCES

ChemCalc: A Building Block for Tomorrow's Chemical Infrastructure
2013, cited by this paper
Metabolite identification and molecular fingerprint prediction through machine learning
2012, influential reference
CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures
2011, cited by this paper
Open Babel: An open chemical toolbox
2011, cited by this paper
In silico fragmentation for computer assisted identification of metabolite mass spectra
2010, cited by this paper
MassBank: a public repository for sharing mass spectral data for life sciences.
2010, influential reference
Computational mass spectrometry for metabolomics: Identification of metabolites and small molecules
2010, cited by this paper
Computational methods for metabolic reconstruction.
2010, cited by this paper
Computational strategies for metabolite identification in metabolomics.
2009, cited by this paper
FiD: a software for ab initio structural identification of product ions from tandem mass spectrometric data.
2008, cited by this paper
PubChem: Integrated Platform of Small Molecules and Biological Activities
2008, cited by this paper
SIRIUS: decomposing isotope patterns for metabolite identification
2008, cited by this paper
New mass-spectrometry-based strategies for lipids.
2007, cited by this paper
Ab Initio Prediction of Molecular Fragments from Tandem Mass Spectrometry Data
2006, cited by this paper
Isotopomer distribution computation from tandem mass spectrometric data with overlapping fragment spectra
2005, cited by this paper
Probability Product Kernels
2004, cited by this paper
Metabolomics and systems biology: making sense of the soup.
2004, cited by this paper
[Plant metabolomics].
2003, cited by this paper
Computing positional isotopomer distributions from tandem mass spectrometric data.
2002, cited by this paper
KEGG: Kyoto Encyclopedia of Genes and Genomes
2000, cited by this paper
KEGG: Kyoto Encyclopedia of Genes and Genomes
1999, cited by this paper
Support-Vector Networks
1995, cited by this paper
CALCULATION OF ISOTOPE DISTRIBUTIONS IN MASS SPECTROMETRY. A TRIVIAL SOLUTION FOR A NON-TRIVIAL PROBLEM
1991, cited by this paper
A GENERAL APPROACH TO CALCULATING ISOTOPIC DISTRIBUTIONS FOR MASS SPECTROMETRY.
1983, cited by this paper

CITED BY

A novel Transformer-MLP fusion network for metabolite identification from mass spectra.
2025, cites this paper
Bridging Ethnobotanical Knowledge and Multi-Omics Approaches for Plant-Derived Natural Product Discovery
2025, cites this paper
Artificial intelligence in digital health: opportunities, challenges, and future
2025, cites this paper
Navigating common pitfalls in metabolite identification and metabolomics bioinformatics
2024, cites this paper
Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective.
2023, cites this paper
Non-Targeted Metabolomic Analysis of Arabidopsis thaliana (L.) Heynh: Metabolic Adaptive Responses to Stress Caused by N Starvation
2023, cites this paper
CFM-ID 4.0 – a web server for accurate MS-based metabolite identification
2022, cites this paper
Annotating metabolite mass spectra with domain-inspired chemical formula transformers
2022, cites this paper
Emerging computational paradigms to address the complex role of gut microbial metabolism in cardiovascular diseases
2022, cites this paper
CFM-ID 4.0: More Accurate ESI-MS/MS Spectral Prediction and Compound Identification.
2021, cites this paper
A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics
2021, cites this paper
Incorporating structural similarity into a scoring function to enhance the prediction of binding affinities
2021, influential citation
PubChem in 2021: new data content and improved web interfaces
2020, cites this paper
Recent advances on constraint-based models by integrating machine learning.
2019, cites this paper
Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models
2019, cites this paper
Challenges and emergent solutions for LC-MS/MS based untargeted metabolomics in diseases.
2018, cites this paper
Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling
2018, cites this paper
Applications of Deep Learning in Biomedicine
2018, cites this paper
Artificial intelligence used in genome analysis studies
2018, cites this paper
Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics
2018, cites this paper
Automated recommendation of metabolite substructures from mass spectra using frequent pattern mining
2017, cites this paper
A new distance measure for non-identical data with application to image classification
2016, cites this paper
Fast metabolite identification with Input Output Kernel Regression
2016, cites this paper
Dereplication, sequencing and identification of peptidic natural products: from genome mining to peptidogenomics to spectral networks.
2016, cites this paper
Using fragmentation trees and mass spectral trees for identifying unknown compounds in metabolomics.
2015, cites this paper
MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics
2015, cites this paper
Metabolite identification through multiple kernel learning on fragmentation trees
2014, cites this paper
Network biology approaches reveal a link between ribosome biogenesis and metabolic reprogramming in ageing skeletal muscles
2014, cites this paper
Live and learn from mistakes: A lightweight system for document classification
2013, cites this paper
CASMI: And the Winner is ..
2013, cites this paper