Thursday, Mar 16, 2017
3:30pm – 4:30pm, E2-599
I will describe an optimized version of the Broad Institute best practices BWA/GATK pipeline that we have developed to run in the Microsoft Azure cloud as an easy-to-use web service. This pipeline gives similar accuracy to standard BWA/GATK, but can process a 30x whole-genome FASTQ into aligned BAM and VCF in a few hours instead of a day or more. We took the original BWA and GATK algorithms and combined them with the high-performance input/output and compute framework in our SNAP aligner, then rewrote key components of the algorithms for speed while retaining compatibility. We also built a scalable, reliable, secure cloud service to simultaneously process large numbers of genomes across a fleet of machines.

Ravi Pandya works in the Microsoft Genomics group on high-performance algorithms for genome alignment, assembly, and structural variation. He is one of the authors of the SNAP short-read aligner, which can produce high-quality genome alignments up to 10x faster than other state-of-the-art aligners. He is also involved in the BeatAML collaboration led by Brian Druker’s lab at OHSU, which applies machine learning, predictive analytics, and systems biology to recommend personalized, targeted drug combinations for refractory leukemia patients.