Finding a gene in the stacks

Genetic disease diagnosis can be time-consuming because of the extensive literature searching required. To speed this process, Birgmeier et al. developed AMELIE (Automatic Mendelian Literature Evaluation), an end-to-end machine learning approach with a web interface that finds relevant literature supporting the disease causality of genetic variants and their association with different clinical presentations. The pipeline also parses the literature to rank the candidate causative genes most likely to explain a given patient’s symptoms, and it outperformed similar algorithms in side-by-side comparisons. AMELIE could help clinicians narrow the field of possible causative genes, shortening the time required for expert diagnosis of Mendelian diseases.
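
One common way to compare gene-ranking tools side by side is the top-k rate: the fraction of diagnosed patients whose causative gene appears among the first k ranked candidates. The sketch below is a minimal illustration of that metric and is not taken from AMELIE's code. The function name `top_k_rate` and the example ranks are hypothetical, and the ranks are made-up values rather than study data.

```python
from typing import Sequence


def top_k_rate(causal_ranks: Sequence[int], k: int) -> float:
    """Return the fraction of patients whose causative gene ranks within the top k.

    causal_ranks holds one 1-based rank per patient: the position of the
    known causative gene in that patient's ranked candidate list.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not causal_ranks:
        raise ValueError("need at least one patient")
    hits = sum(1 for rank in causal_ranks if rank <= k)
    return hits / len(causal_ranks)


# Hypothetical ranks for five patients (illustrative only, not study data).
example_ranks = [1, 1, 3, 12, 2]

print(top_k_rate(example_ranks, 1))   # share of patients with the causative gene ranked first
print(top_k_rate(example_ranks, 11))  # share of patients solved by reviewing the top 11 genes
```

Reporting the rate at several values of k shows how many candidate genes a clinician would need to review before reaching the diagnosis in a given share of cases.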



The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts, then downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes a patient’s candidate variants by their likelihood of explaining that patient’s set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at
