The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors

Zubin H. Patel, Leah C. Kottyan, Sara Lázaro, Marc S. Williams, D. Ledbetter, H. Tromp, Andrew M Rupert, Mojtaba Kohram, M. Wagner, Ammar Husami, Yaping Qian, C. Valencia, Kejian Zhang, M. Hostetter, J. Harley, K. Kaufman

Published 2014 in Frontiers in Genetics

ABSTRACT

Next-generation sequencing studies generate large quantities of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge for these studies is that current technologies generate sequencing artifacts. To identify and characterize the properties that distinguish false-positive variants from true variants, we sequenced a child and both parents (one trio) using DNA isolated from three sources (blood, buccal cells, and saliva). The trio strategy allowed us to identify variants in the proband that could not have been inherited from the parents (Mendelian errors) and that most likely indicate sequencing artifacts. We examined quality control measurements and found that three of them identified the greatest number of Mendelian errors: read depth, genotype quality score, and alternate allele ratio. Filtering the variants on these measurements, with each filter applied independently, removed ~95% of the Mendelian errors while retaining 80% of the called variants. After filtering, the concordance between identical samples isolated from different sources was 99.99%, compared with 87% before filtering. This high concordance suggests that different sources of DNA can be used in trio studies without affecting the ability to identify causative polymorphisms. To facilitate analysis of next-generation sequencing data, we developed the Cincinnati Analytical Suite for Sequencing Informatics (CASSI) to store sequencing files and metadata (e.g., relatedness information), manage file versioning, filter data, annotate variants, and identify candidate causative polymorphisms that follow de novo, rare recessive homozygous, or compound heterozygous inheritance models.
We conclude that the data cleaning process improves the signal-to-noise ratio among called variants and facilitates the identification of candidate disease-causing polymorphisms.
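
The trio logic described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the CASSI implementation: the `Call` record, the function names, and all threshold values are hypothetical (the abstract names the three measurements but gives no cutoffs), and the sketch assumes biallelic autosomal sites only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Tuple

Genotype = Tuple[int, int]  # allele indices, e.g. (0, 1) for a heterozygote


@dataclass(frozen=True)
class Call:
    """One genotype call. Field names are illustrative, not CASSI's schema."""
    genotype: Genotype
    depth: int       # read depth at the site
    gq: int          # genotype quality score
    alt_reads: int   # reads supporting the alternate allele


# Hypothetical thresholds, chosen only for illustration.
MIN_DEPTH = 15
MIN_GQ = 20
HET_ALT_RATIO = (0.3, 0.7)


def is_mendelian_error(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if no pairing of one maternal and one paternal allele
    reproduces the child's genotype (autosomal sites only)."""
    target = sorted(child)
    return all(sorted((m, f)) != target for m, f in product(mother, father))


def passes_filters(call: Call) -> bool:
    """Apply read depth, genotype quality, and alternate allele ratio checks."""
    if call.depth < MIN_DEPTH or call.gq < MIN_GQ:
        return False
    a, b = call.genotype
    if a != b:  # heterozygous call: require a balanced alternate allele ratio
        ratio = call.alt_reads / call.depth
        return HET_ALT_RATIO[0] <= ratio <= HET_ALT_RATIO[1]
    return True


def concordance(calls_a: Dict[str, Genotype], calls_b: Dict[str, Genotype]) -> float:
    """Fraction of shared sites where two samples carry the same genotype."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    same = sum(sorted(calls_a[s]) == sorted(calls_b[s]) for s in shared)
    return same / len(shared)
```

For example, a heterozygous child with two homozygous-reference parents, `is_mendelian_error((0, 1), (0, 0), (0, 0))`, is flagged as a Mendelian error. The same `concordance` function could be run on blood, buccal, and saliva calls from one individual before and after `passes_filters` to reproduce the kind of comparison the abstract reports.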

PUBLICATION RECORD

Publication year: 2014
Venue: Frontiers in Genetics
Publication date: 2014-02-12
Fields of study: Biology, Medicine
Identifiers: DOI 10.3389/fgene.2014.00016; PMID 24575121; PMCID 3921572
Source metadata: Semantic Scholar, PubMed
