giab ashg webinar 160224

Post on 17-Jan-2017

Genome in a Bottle Consortium August 2015 NIST, Gaithersburg, MD

Genome in a Bottle Consortium February 24, 2016
Reference Materials for Human Genome Sequencing

Justin Zook, Ph.D. and Marc Salit, Ph.D.
National Institute of Standards and Technology

genomeinabottle.org

Outline
- Genome in a Bottle (GIAB) products
- Current and future work
- Best practices for using GIAB products to benchmark variant calls

Genome in a Bottle
- Open consortium to develop well-characterized genomes for benchmarking
- 100-150 public, private, and academic participants at workshops

GIAB Scope
The Genome in a Bottle Consortium is developing the reference materials, reference methods, and reference data needed to assess confidence in human whole genome variant calls. Priority is authoritative characterization of human genomes.
GIAB steering committee, Aug 2015

Not in scope: ctDNA, analytical controls for panels

Well-characterized, stable RMs
- Obtain metrics for validation, QC, QA, PT
- Determine sources and types of bias/error
- Learn to resolve difficult structural variants
- Improve reference genome assembly
- Optimization
- Enable regulated applications

Analytical Performance
- Use well-characterized genomic DNA reference materials to benchmark performance
- Tools to facilitate their use
- With the Global Alliance Data Working Group Benchmarking Team

generic measurement process

High-confidence SNP/indel calls
- Methods to develop SNP/indel call set described in manuscript
- Broad and quick adoption of call set for benchmarking
  - struck nerve
Zook et al., Nature Biotechnology, 2014.

Candidate NIST Reference Materials

Genome                PGP ID    Coriell ID  NIST ID  NIST RM #
CEPH Mother/Daughter  N/A       GM12878     HG001    RM8398
AJ Son                huAA53E0  GM24385     HG002    RM8391 (son)/RM8392 (trio)
AJ Father             hu6E4515  GM24149     HG003    RM8392 (trio)
AJ Mother             hu8E87A9  GM24143     HG004    RM8392 (trio)
Asian Son             hu91BD69  GM24631     HG005    RM8393
Asian Father          huCA017E  GM24694     N/A      N/A
Asian Mother          hu38168C  GM24695     N/A      N/A

Note: RMs 8391 to 8393 are planned for release by end of Q2 2016

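Datasets by genome
Columns: AJ Son, AJ Parents, Chinese son, Chinese parents, NA12878. Each X marks one genome column with data; for rows with fewer than five marks, the column positions are not preserved in this transcript.

Dataset                   Marks
Illumina Paired-end       X X X X X
Illumina Long Mate pair   X X X X X
Illumina moleculo         X X X X X
Complete Genomics         X X X X X
Complete Genomics LFR     X X X
Ion exome                 X X X X
BioNano                   X X X X
10X                       X X X X
PacBio                    X X X
SOLiD single end          X X X
Illumina exome            X X X X
Oxford Nanopore           X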

Paper describing the data

Data Release: Real-time, Open, Public Release

Individual Datasets
- Uploaded to GIAB FTP site as data are collected
- Includes raw reads, aligned reads, and variant/reference calls
- 12 datasets described in bioRxiv paper

Integrated High-confidence Calls
- Develop SNP, indel, and homozygous reference calls similar to NA12878
- Developing methods to form high-confidence calls for difficult variant types and regions
- Released calls are versioned
- Preliminary call-sets will be made available to be critiqued

SNP/Indel Integration Method Update
- Implementing refined integration methods
  - Developed so others can readily reproduce results
  - Consistent results for all GIAB genomes
  - Simpler process taking advantage of best practices for each technology
- Validating with released NA12878 RM data
  - Preliminary comparisons show minor changes
- Application to PGP trios
  - Plan to analyze AJ trio by Q2 2016
  - Release of NIST RMs in Q2 2016
- Develop calls for GRCh38

Proposed approach to form high-confidence SV (and non-SV) calls
Timeline labels from the figure: Aug/Dec 2015; Aug 2015-Jan 2016; Planning in Jan-Feb 2016; Feb 2016 and beyond

Preliminary comparisons of 17 Deletion Callsets
Sensitivity to calls in 2 technologies

NOTE: These are preliminary comparisons of data under active development and likely different from true sensitivity of callers
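
One way to picture "sensitivity to calls in 2 technologies" is the fraction of a benchmark set of deletions that a given callset recovers, where the benchmark holds deletions supported by at least two technologies. The sketch below is an illustration only: the slides do not state how calls were matched, so the 50% reciprocal-overlap rule, the function names, and the coordinates are all assumptions.

```python
from typing import List, Tuple

# (chromosome, start, end) with start < end; coordinates are made up for illustration.
Deletion = Tuple[str, int, int]


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if there is no overlap."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def sensitivity(callset: List[Deletion], benchmark: List[Deletion],
                min_ro: float = 0.5) -> float:
    """Fraction of benchmark deletions matched by at least one call.

    min_ro is an assumed matching threshold, not one taken from the slides.
    """
    if not benchmark:
        raise ValueError("benchmark set is empty")
    found = sum(
        1 for b in benchmark
        if any(reciprocal_overlap(b, c) >= min_ro for c in callset)
    )
    return found / len(benchmark)


# Hypothetical benchmark: deletions seen in two or more technologies.
benchmark = [("chr1", 1000, 2000), ("chr1", 5000, 5500), ("chr2", 100, 400)]
# Hypothetical callset from a single caller.
callset = [("chr1", 1100, 2100), ("chr1", 5000, 9000), ("chr2", 100, 400)]

result = sensitivity(callset, benchmark)
print(f"sensitivity = {result:.2f}")
```

The second call covers its benchmark deletion completely but is eight times larger, so it fails the reciprocal test. That failure is a matching artifact, which is one reason preliminary figures like these can differ from a caller's true sensitivity.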

Preliminary comparisons of 17 Deletion Callsets
Difference between predicted size and median predicted size

NOTE: These are preliminary comparisons of data under active development and likely different from true size accuracy

Preliminary comparisons of 17 Deletion Callsets
Number of unique calls

NOTE: These are preliminary comparisons of data under active development without filtering and unique calls may be correct

GeT-RM Browser from NCBI and CDC

http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
Allows visualization of the data underlying each call

Global Alliance for Genomics and Health Benchmarking Task Team
Progress:
- Initial version of standardized definitions for performance metrics like TP, FP, and FN.
- Continued development of sophisticated benchmarking tools
  - vcfeval: Len Trigg
  - hap.py: Peter Krusche
  - vgraph: Kevin Jacobs
- Standardized intermediate and final file formats
- Standardized bed files with difficult genome contexts for stratification
github.com/ga4gh/benchmarking-tools
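
As a reminder of how TP, FP, and FN counts turn into summary metrics, here is a minimal sketch using the conventional formulas for recall (sensitivity), precision, and F1. The function name and the counts are hypothetical, and the metric names and output formats of the benchmarking tools listed above may differ.

```python
import math


def summarize(tp: int, fp: int, fn: int) -> dict:
    """Compute recall, precision, and F1 from raw counts.

    Returns NaN for a metric whose denominator is zero.
    """
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    if precision + recall > 0:  # False when either value is NaN
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = math.nan
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts from comparing a query call set to a benchmark call set.
metrics = summarize(tp=90, fp=10, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

True negatives are left out on purpose: none of these three metrics needs them, which matters because true negatives are hard to define for variant calls.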

Proposed Performance Metrics Definitions
Define TP/FP/FN/TN in 4 ways depending on required stringency of match:
- Loose match: TP if within x bp of a true variant
- Allele match: TP if ALT allele matches
- Genotype match: TP if genotype and ALT allele match
- Phasing match: TP if genotype, ALT allele, and phasing with nearby variants all match
True negatives are difficult to define because an infinite number of potential alleles exist
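
The four stringency levels can be read as a ladder, where each level adds one requirement to the level before it. The sketch below is a simplified illustration, not the GA4GH implementation: the `Variant` class, the `match_level` function, and the default window of 10 bp are all hypothetical. It compares single records by position and allele, whereas real comparison tools also have to handle different representations of the same variant. Phasing is reduced to comparing haplotype order within one assumed shared phase set.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gt: Tuple[int, int]   # allele indices, e.g. (0, 1); haplotype order if phased
    phased: bool = False


def match_level(call: Variant, truth: Variant, x_bp: int = 10) -> Optional[str]:
    """Return the strictest level at which `call` matches `truth`, or None."""
    if call.chrom != truth.chrom or abs(call.pos - truth.pos) > x_bp:
        return None        # not even a loose match
    same_allele = (call.pos, call.ref, call.alt) == (truth.pos, truth.ref, truth.alt)
    if not same_allele:
        return "loose"     # nearby, but a different allele
    if sorted(call.gt) != sorted(truth.gt):
        return "allele"    # same ALT allele, different genotype
    if call.phased and truth.phased and call.gt == truth.gt:
        return "phasing"   # same genotype on the same haplotypes
    return "genotype"      # same genotype, phasing missing or different


truth = Variant("chr1", 100, "A", "G", (0, 1), phased=True)
calls = [
    Variant("chr1", 100, "A", "G", (0, 1), phased=True),
    Variant("chr1", 100, "A", "G", (1, 0), phased=True),
    Variant("chr1", 100, "A", "G", (1, 1)),
    Variant("chr1", 104, "A", "T", (0, 1)),
    Variant("chr1", 500, "A", "G", (0, 1)),
]
levels = [match_level(c, truth) for c in calls]
print(levels)
```

A call counts as a TP at a chosen stringency if its level is at or above that stringency, so the same call can be a TP under allele matching and an FP under genotype matching.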

Approaches to Benchmarking Variant Calling
- Well-characterized whole genome Reference Materials
- Many samples characterized in clinically relevant regions
- Synthetic DNA spike-ins
- Cell lines with engineered mutations
- Simulated reads
- Modified real reads
- Modified reference genomes
- Confirming results found in real samples over time

Challenges in Benchmarking Small Variant Calling
- It is difficult to do robust benchmarking of tests designed to detect many analytes (e.g., many variants)
- Easiest to benchmark only within high-confidence bed file, but
  - Benchmark calls/regions tend to be biased towards easier variants and regions
  - Some clinical tests are enriched for difficult sites
  - Challenges with benchmarking complex variants near boundaries of high-confidence regions
- Always manually inspect a subset of FPs/FNs
- Stratification by variant type and region is important
- Always calculate confidence intervals on performance metrics
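
To illustrate the advice on confidence intervals and stratification, the sketch below computes a 95% Wilson score interval for sensitivity in each stratum. The Wilson interval is one common choice for a binomial proportion; the slides do not name a specific method, and the strata and counts here are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical (TP, FN) counts per stratum; sensitivity = TP / (TP + FN).
strata = {
    "SNPs, all high-confidence regions": (9800, 200),
    "indels, homopolymer regions": (98, 2),
}

for name, (tp, fn) in strata.items():
    low, high = wilson_interval(tp, tp + fn)
    print(f"{name}: sensitivity {tp / (tp + fn):.3f} (95% CI {low:.3f} to {high:.3f})")
```

Both hypothetical strata have the same point estimate of 0.98, but the smaller one has a much wider interval. A single genome-wide number hides that difference.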

Benchmarking on PrecisionFDA

Acknowledgments
- FDA
- Many members of Genome in a Bottle
- New members welcome!
- Sign up on website for email newsletters

GIAB Steering Committee
Marc Salit, Justin Zook, David Mittelman, Andrew Grupe, Michael Eberle, Steve Sherry, Deanna Church, Francisco De La Vega, Christian Olsen, Monica Basehore, Lisa Kalman, Christopher Mason, Elizabeth Mansfield, Liz Kerrigan, Leming Shi, Melvin Limson, Alexander Wait Zaranek, Nils Homer, Fiona Hyland, Steve Lincoln, Don Baldwin, Robyn Temple-Smolkin, Chunlin Xiao, Kara Norman, Luke Hickey

For More Information

www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails

github.com/genome-in-a-bottle Guide to GIAB data & ftp

www.slideshare.net/genomeinabottle

www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - Get-RM Browser

Data: http://biorxiv.org/content/early/2015/09/15/026468

Global Alliance Benchmarking Team: https://github.com/ga4gh/benchmarking-tools

Twice yearly public workshops
- Winter at Stanford University, California, USA
- Summer at NIST, Maryland, USA

Justin Zook: jzook@nist.gov
Marc Salit: salit@nist.gov

tumor-normal-somatic-copy-numbe

Chr    Pos        Len     Qual  Copy Number  LOH
chr1   16901000   70000   3.01  3            FALSE
chr1   142569000  96000   3.03  3            FALSE
chr1   144839000  409000  3.02  3            FALSE
chr3   195409000  66000   3.02  3            FALSE
chr4   190538000  287000  3.01  3            FALSE
chr6   57350000   217000  3.04  3            FALSE
chr10  42371000   34000   3.02  3            FALSE
chr10  46947000   206000  3.05  3            FALSE
chr11  54907000   20000   3.01  3            FALSE
chr14  18998000   96000   3.02  3            FALSE
chr14  19435000   35000   3.02  3            FALSE
chr14  20192000   224000  3.53  3            FALSE
chr15  20392000   123000  3.01  3            FALSE
chr16  32377000   220000  3.01  3            FALSE
chr16  33878000   144000  3.01  3            FALSE
chr19  24509000   92000   3.02  3            FALSE
chr21  9406000    250000  3.02  3            FALSE
chr21  10366000   231000  3.03  3            FALSE
chr21  10732000   126000  3.03  3            FALSE

Sensitivity to calls in 2 technologies (chart data)

Callset                  sens2techlt100  sens2tech100to1k  sens2tech1kto3k  sens2techgt3k  sens2tech
CGCNV                    0.00            0.00              0.00             0.26           0.03
CGSV                     0.00            0.42              0.73             0.57           0.41
CSHLassembly             0.56            0.59              0.56             0.43           0.56
sniffles                 0.00            0.02              0.04             0.03           0.02
BioNano                  0.00            0.01              0.20             0.46           0.08
Spiral                   0.91            0.54              0.43             0.39           0.57
Cortex                   0.29            0.06              0.02             0.00           0.09
CommonLaw                0.00            0.00              0.00             0.00           0.00
PBHoneySpots             0.50            0.62              0.11             0.01           0.47
PBHoneyTails             0.00            0.00              0.35             0.65           0.11
MetaSV                   0.02            0.79              0.85             0.73           0.67
ParliamentPacBio         0.01            0.76              0.86             0.46           0.62
ParliamentAssembly       0.01            0.67              0.51             0.03           0.48
10X                      0.00            0.00              0.00             0.01           0.00
MultibreakSV             0.74            0.77              0.74             0.50           0.73
CNVnator                 0.00            0.35              0.77             0.90           0.40
ParliamentPacBioForce    0.58            0.71              0.36             0.19           0.59
ParliamentAssemblyForce  0.53            0.61              0.12             0.01           0.48
BionanoHaplo             0.00            0.00              0.27             0.55           0.09

bp difference (chart data, callsets in ascending order)

Callset                  bp difference
Spiral                   0
ParliamentAssemblyForce  3
MultibreakSV             3
Cortex                   4
PBHoneySpots             5
CSHLassembly             7
CGSV                     12
MetaSV                   13
ParliamentPacBio         17
ParliamentPacBioForce    17
ParliamentAssembly       20
PBHoneyTails             43
CNVnator                 123
BionanoHaplo             189
BioNano                  207
sniffles                 432
CGCNV                    1748
10X                      4401

proportion different