Untargeted metabolomics experiments suffer from large proportions of unannotated molecules. Using a reference data-driven approach, we increase the spectral annotation rate by assigning potential sources to molecular features. We have applied this approach using a food reference database to generate diet readouts from clinical samples.
By surveying the food-associated compounds detected in clinical samples, we can differentiate patients with specific diet types, such as predominantly animal protein- versus plant-based diets. In addition, we can identify specific foods associated with clinical outcomes in disease cohorts. Given its broad applicability, we envision this approach becoming invaluable in nutrition research, as well as in many other fields once additional reference datasets become available.