https://doi.org/
Note: this is an old version of this dataset; a newer version is available.

Data and Workflows for: Machine learning-based hazard-driven prioritization of features in nontarget screening of environmental high-resolution mass spectrometry data

Nontarget high-resolution mass spectrometry screening (NTS HRMS/MS) can detect thousands of organic substances in environmental samples. However, new strategies are needed to focus time-intensive identification efforts on the features with the highest potential to cause adverse effects rather than on the most abundant ones. To address this challenge, we developed MLinvitroTox, a machine-learning framework that uses molecular fingerprints derived from fragmentation spectra (MS2) to rapidly classify thousands of unidentified HRMS/MS features as toxic/nontoxic based on nearly 400 target-specific and over 100 cytotoxic endpoints from ToxCast/Tox21. Model development results demonstrated that, with customized molecular fingerprints and models, over a quarter of toxic endpoints and the majority of associated mechanistic targets could be accurately predicted with sensitivities exceeding 0.95. Notably, the combination of SIRIUS molecular fingerprints and XGBoost (Extreme Gradient Boosting) models with SMOTE (Synthetic Minority Over-sampling Technique) for handling data imbalance was a universally successful and robust modeling configuration. Validation of MLinvitroTox on MassBank spectra showed that toxicity could be predicted from molecular fingerprints derived from MS2 with an average balanced accuracy of 0.75. By applying MLinvitroTox to environmental HRMS/MS data, we confirmed the experimental results obtained with targeted analysis and narrowed the analytical focus from tens of thousands of detected signals to 783 features linked to potential toxicity, including 109 spectral matches and 30 compounds with confirmed toxic activity.
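The abstract reports model quality as sensitivity and balanced accuracy, which matter here because toxic (active) compounds are typically the minority class. The following is a minimal sketch of how these two metrics are computed for binary toxic/nontoxic labels. It is not taken from the MLinvitroTox workflows; the function names and the example labels are hypothetical and only illustrate the definitions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = toxic, 0 = nontoxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(y_true, y_pred):
    """Fraction of truly toxic features predicted as toxic: TP / (TP + FN)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred):
    """Fraction of truly nontoxic features predicted as nontoxic: TN / (TN + FP)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity, robust to class imbalance."""
    return 0.5 * (sensitivity(y_true, y_pred) + specificity(y_true, y_pred))


# Hypothetical, imbalanced example: 2 toxic and 8 nontoxic features,
# with a trivial model that predicts "nontoxic" for everything.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(sensitivity(y_true, y_pred))        # 0.0
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this made-up case plain accuracy looks acceptable even though no toxic feature is found, whereas sensitivity and balanced accuracy expose the failure. This is one reason imbalance-aware metrics, and resampling methods such as SMOTE, are commonly used for such data.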

Metadata

Author Arturi, Kasia; Hollender, Juliane
Curator Arturi, Kasia
Contact Hollender, Juliane <Juliane.Hollender@eawag.ch>