
Abstract Detail

Comparative Genomics/Transcriptomics

Trostle, Alex [1], Goyal, Anshu [1], Galuska, Sally [1], Reardon, Chris [1], Tiley, George [2], Ellis, Jake [1], Li, Zheng [3], Sutherland, Brittany [4], Barker, Michael [3].

Machine learning approaches for the inference of WGDs from gene age distributions.

The inference of whole genome duplications (WGDs) from gene age distributions, or Ks plots, is frequently more an art than an exact science. Ancient WGDs leave characteristic peaks of gene duplication in Ks plots that are often relatively easy to identify by eye. However, depending on the data source, Ks estimation method, variation in gene birth and death rates, gene retention rates, and other variables, these peaks are not always prominent. Most statistical approaches applied to this problem either search for a peak of duplication that is statistically significant relative to a null background or fit normal distributions to a range of Ks values. Diagnosing WGDs in these cases can be a fraught exercise because many peaks, frequently spurious, may be identified in Ks plots. Here, we present new machine learning approaches for the inference of ancient WGDs. We simulated millions of gene trees to generate hundreds of thousands of example distributions with and without WGDs of varying ages. Using a variety of classifiers, we achieved greater than 90% accuracy of WGD inference in a large collection of manually curated empirical data. This is a significant improvement over current approaches, which cannot achieve the same level of accuracy without human intervention. Combined with other recommended improvements in the analysis of gene age distributions, these new classifiers provide a rapid, automated, and accurate approach to infer WGDs.
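The simulate-then-classify idea described above can be illustrated with a deliberately small toy. The sketch below is not the authors' pipeline: it does not simulate gene trees, and every name and parameter in it (the exponential background, the Gaussian WGD peak, the bin count, the nearest-centroid classifier) is an assumption chosen only to keep the example short and dependency-free.

```python
import random

N_BINS, KS_MAX = 20, 2.0  # histogram resolution and Ks cutoff (arbitrary choices)


def simulate_ks_histogram(rng, has_wgd, n_pairs=1000):
    """Simulate one toy Ks distribution and return it as a normalized histogram."""
    wgd_ks = rng.uniform(0.4, 1.6)  # hypothetical WGD age, in Ks units
    wgd_frac = rng.uniform(0.2, 0.4) if has_wgd else 0.0
    counts = [0] * N_BINS
    kept = 0
    while kept < n_pairs:
        if rng.random() < wgd_frac:
            ks = rng.gauss(wgd_ks, 0.1)   # paralogs retained from the WGD
        else:
            ks = rng.expovariate(2.0)     # background small-scale duplicates
        if 0.0 <= ks < KS_MAX:            # discard values outside the plotted range
            bin_index = min(int(ks / KS_MAX * N_BINS), N_BINS - 1)
            counts[bin_index] += 1
            kept += 1
    return [c / n_pairs for c in counts]


def centroid(rows):
    """Mean histogram of a list of equally sized histograms."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_demo(seed=0, n_train=200, n_test=100):
    """Train a nearest-centroid classifier on simulated data; return test accuracy."""
    rng = random.Random(seed)
    train = {
        label: [simulate_ks_histogram(rng, label) for _ in range(n_train)]
        for label in (False, True)
    }
    centroids = {label: centroid(rows) for label, rows in train.items()}
    correct = 0
    for label in (False, True):
        for _ in range(n_test):
            x = simulate_ks_histogram(rng, label)
            # Predict the class whose mean histogram is closest to this sample.
            pred = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
            correct += pred == label
    return correct / (2 * n_test)


if __name__ == "__main__":
    print(f"toy test accuracy: {run_demo():.2f}")
```

Because the training and test data come from the same toy simulator, the accuracy printed here says nothing about performance on empirical Ks plots; the point is only the workflow of generating labeled distributions with and without a WGD and fitting a classifier to them.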


1 - University of Arizona, Department of Ecology & Evolutionary Biology, Tucson, AZ, 85721, USA
2 - Duke University, Department of Biology, Durham, NC, 27708, USA
3 - University of Arizona, Department of Ecology & Evolutionary Biology, P.O. Box 210088, Tucson, AZ, 85721, USA
4 - University of Arizona, Biology, Department of Ecology and Evolutionary Biology, P.O. Box 210088, Tucson, AZ, 85721, USA

Keywords: machine learning, gene duplication

Presentation Type: Poster
Session: P, Comparative Genomics and Transcriptomics
Location: Grand Ballroom - Exhibit Hall/Mayo Civic Center
Date: Monday, July 23rd, 2018
Time: 5:30 PM. The Poster Session runs from 5:30 pm to 7:00 pm. Posters with odd poster numbers are presented at 5:30 pm, and posters with even poster numbers are presented at 6:15 pm.
Number: PGT011
Abstract ID: 598
Candidate for Awards: Genetics Section Poster Award

Copyright © 2000-2018, Botanical Society of America. All rights reserved