A quartet-based approach for inferring phylogenetically informative features from genomic and phenomic data
Brandenburg, Hack, Mosig
Computational and Structural Biotechnology Journal
Aug 27, 2025
Neural networks are widely used in bioinformatics to extract features from morphological, structural, and sequence data of different taxa. A key question is whether such features are compatible with a known phylogenetic tree describing the evolutionary relationships among the taxa. We address this question with a machine learning approach that takes taxon-specific data and a reference tree as input, and trains a neural network to produce a latent feature space whose pairwise distances are consistent with the tree topology. Our approach builds on the established role of quartets in distance-based phylogeny, leading to a quartet-based loss function for neural network training. In a proof-of-concept study using bacterial ribosomal RNA sequences, we show that the learned feature distances closely match the reference phylogeny. This framework can be applied to diverse biological data types, providing a principled way to incorporate phylogenetic constraints into neural network-based feature extraction.
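To make the idea of a quartet-based loss concrete, the following is a minimal sketch and not the loss defined in the paper. It assumes a hinge penalty derived from the four-point condition: for a quartet whose reference topology is ab|cd, the distance sum d(a,b) + d(c,d) should be smaller than both d(a,c) + d(b,d) and d(a,d) + d(b,c). The function names, the Euclidean metric, the `margin` parameter, and the toy embeddings are all illustrative assumptions.

```python
import numpy as np

def pairwise_distances(z):
    """Euclidean distance matrix for embeddings z of shape (n_taxa, dim)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def quartet_hinge_loss(z, quartets, margin=0.0):
    """Mean hinge penalty over quartets (hypothetical loss, not the paper's).

    Each quartet (a, b, c, d) encodes the reference topology ab|cd.
    The penalty is zero when d(a,b) + d(c,d) is smaller, by at least
    `margin`, than both alternative pairings of the four taxa.
    """
    dist = pairwise_distances(z)
    total = 0.0
    for a, b, c, d in quartets:
        within = dist[a, b] + dist[c, d]            # pairing induced by the tree
        across = min(dist[a, c] + dist[b, d],       # the two competing pairings
                     dist[a, d] + dist[b, c])
        total += max(0.0, margin + within - across)
    return total / len(quartets)

# Toy latent features: taxa 0 and 1 lie close together, as do taxa 2 and 3.
z = np.array([[0.0], [1.0], [10.0], [11.0]])

consistent = quartet_hinge_loss(z, [(0, 1, 2, 3)])    # topology 01|23
inconsistent = quartet_hinge_loss(z, [(0, 2, 1, 3)])  # topology 02|13
print(consistent, inconsistent)
```

In this toy case the embedding incurs no penalty for the quartet 01|23 and a positive penalty for 02|13. In an actual training setup the same quantity would be computed on network outputs with an automatic differentiation framework so that gradients can flow back to the network parameters.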