Jonas Andreas Sibbesen
University of Copenhagen

Accurate genotyping across variant classes and lengths using variant graphs

Monday, November 13, 2017

12:15 — 1:15 PM

Engineering 2, Room 599


Genotype estimates from genome sequencing data are typically based solely on alignments of reads to a reference genome. This works well for simple variation such as SNVs, but reads originating from regions with more complex variation often fail to align or align only partially, which reduces the sensitivity for such variants. This problem can be mitigated by first collecting a set of candidate variants across variant signals, individuals and variant databases, and then realigning the reads to the candidates and the reference in an unbiased way. However, this realignment problem has proven computationally difficult.

Here I will present BayesTyper, a new method that uses exact alignment of read k-mers to a graph representation of the reference and variant sequences to efficiently obtain unbiased information about the read support for any type of variant. The method then uses this information to estimate genotypes across variants and individuals with a probabilistic model. BayesTyper provides superior variant sensitivity and genotyping accuracy relative to existing genotyping methods when used to integrate candidate variants across discovery approaches and individuals. This holds for both high- and low-coverage data and across all variation classes, in particular for more complex variation such as long insertions and deletions. I will also demonstrate that significant further improvements in sensitivity can be obtained by including a “variation-prior” database of already known variants, especially for low-coverage data. Finally, I will show how this method was used, together with de novo assembly, to generate a rich set of structural variation containing many novel deletions and insertions in the GenomeDenmark project.

Hosted by Benedict Paten