Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2

Published 2020 in Genes

ABSTRACT

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
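
The idea of tabulating unique k-mers to estimate copy number can be illustrated with a toy sketch. This is not the QuicK-mer2 implementation: the function names, the tiny k, the use of a median, and the single control region assumed to have copy number two are all assumptions made for illustration, and reverse-complement k-mers and any depth normalization are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_reference_kmers(reference, k):
    """Map each k-mer that occurs exactly once in the reference to its start position."""
    counts = Counter(kmers(reference, k))
    return {kmer: pos for pos, kmer in enumerate(kmers(reference, k))
            if counts[kmer] == 1}


def kmer_depths(reads, unique_kmers, k):
    """Count how often each unique reference k-mer appears in the reads (no mapping step)."""
    depths = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in unique_kmers:
                depths[kmer] += 1
    return depths


def estimate_copy_number(depths, unique_kmers, region, control, control_copies=2):
    """Scale the median k-mer depth in `region` by the depth of a control region.

    `region` and `control` are hypothetical (start, end) half-open intervals of
    k-mer start positions; `control` is assumed to have `control_copies` copies.
    """
    def depths_in(start, end):
        return [depths[kmer] for kmer, pos in unique_kmers.items()
                if start <= pos < end]

    control_depth = median(depths_in(*control))
    target_depth = median(depths_in(*region))
    return control_copies * target_depth / control_depth


# Toy data: the control half is sampled twice, the target half four times.
reference = "ACGTTGCAGGATCCTA"
reads = ["ACGTTGCA"] * 2 + ["GGATCCTA"] * 4
unique = unique_reference_kmers(reference, k=4)
depths = kmer_depths(reads, unique, k=4)
copy_number = estimate_copy_number(depths, unique, region=(8, 13), control=(0, 5))
```

Because only k-mers that are unique in the reference are counted, a k-mer found in one paralog but not in its duplicates contributes depth to that paralog alone, which is what makes such an estimate paralog-specific.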

PUBLICATION RECORD

Publication year
2020
Venue
Genes
Publication date
2020-01-29
Fields of study
Biology, Medicine, Computer Science
Identifiers
DOI 10.3390/genes11020141 PMID 32013076 PMCID 7073954
Source metadata
Semantic Scholar, PubMed

REFERENCES

Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes
2019, cited by this paper
Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology
2019, cited by this paper
Development of Copy Number Variation Detection Algorithms and Their Application to Genome Diversity Studies
2019, cited by this paper
Paralog buffering contributes to the variable essentiality of genes in cancer cell lines
2019, cited by this paper
Comparison of village dog and wolf genomes highlights the role of the neural crest in dog domestication
2018, influential reference
Human-Specific NOTCH2NL Genes Affect Notch Signaling and Cortical Neurogenesis.
2018, cited by this paper
Long-read sequence and assembly of segmental duplications
2018, cited by this paper
Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates
2017, cited by this paper
The birth of a human-specific neural gene by incomplete duplication and gene fusion
2017, cited by this paper
Evaluation of three read-depth based CNV detection tools using whole-exome sequencing data
2017, cited by this paper
The evolution and population diversity of human-specific segmental duplications
2017, cited by this paper
Resolving Multicopy Duplications de novo Using Polyploid Phasing
2017, cited by this paper
An evaluation of copy number variation detection tools for cancer using whole exome sequencing data
2017, cited by this paper
Gene conversion and linkage: effects on genome evolution and speciation
2017, cited by this paper
Evaluation of somatic copy number estimation tools for whole-exome sequencing data
2016, cited by this paper
The APOBEC Protein Family: United by Structure, Divergent in Function.
2016, cited by this paper
A Single Nucleotide Polymorphism in Human APOBEC3C Enhances Restriction of Lentiviruses
2016, cited by this paper
Near-optimal probabilistic RNA-seq quantification
2016, cited by this paper
Y-Chromosome Structural Diversity in the Bonobo and Chimpanzee Lineages
2016, cited by this paper
Discovery of unfixed endogenous retrovirus insertions in diverse human populations
2016, cited by this paper
Haplotyping germline and cancer genomes using high-throughput linked-read sequencing
2015, cited by this paper
The 1000 Genomes Project: Welcome to a New World
2015, cited by this paper
High level of inbreeding in final phase of 1000 Genomes Project
2015, cited by this paper
Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping
2015, cited by this paper
Global diversity, population stratification, and selection of human copy-number variation
2015, cited by this paper
Large multi-allelic copy number variations in humans
2015, cited by this paper
A global reference for human genetic variation
2015, cited by this paper
Resolving the complexity of the human genome using single-molecule sequencing
2014, cited by this paper
Whole-genome duplication in teleost fishes and its evolutionary consequences
2014, cited by this paper
Natural Polymorphisms in Human APOBEC3H and HIV-1 Vif Combine in Primary T Lymphocytes to Affect Viral G-to-A Mutation Levels and Infectivity
2014, cited by this paper
RNA-Skim: a rapid method for RNA-Seq quantification at transcript level
2014, cited by this paper
Mutation identification by direct comparison of whole-genome sequencing data from mutant and wild-type individuals using k-mers
2013, cited by this paper
Evolution and diversity of copy number variation in the great ape lineage
2013, cited by this paper
Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser
2013, cited by this paper
Accelerating read mapping with FastHASH
2013, cited by this paper
Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives
2013, cited by this paper
Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms
2013, cited by this paper
The Database of Genomic Variants: a curated collection of structural variation in the human genome
2013, cited by this paper
Recessive cancer genes engage in negative genetic interactions with their functional paralogs.
2013, cited by this paper
Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms
2013, cited by this paper
Gene duplication as a mechanism of genomic adaptation to a changing environment
2012, cited by this paper
Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication.
2012, cited by this paper
An integrated map of genetic variation from 1,092 human genomes
2012, cited by this paper
A high-resolution integrated map of copy number polymorphisms within and between breeds of the modern domesticated dog
2011, cited by this paper
Genome structural variation discovery and genotyping
2011, cited by this paper
De novo assembly and genotyping of variants using colored de Bruijn graphs
2011, cited by this paper
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
2011, cited by this paper
Discovery and genotyping of genome structural polymorphism by sequencing on a population scale
2011, cited by this paper
Assembly algorithms for next-generation sequencing data.
2010, cited by this paper
Analysis of copy number variations among diverse cattle breeds.
2010, cited by this paper
Origins and functional impact of copy number variation in the human genome
2010, cited by this paper
Frequent gene conversion events between the X and Y homologous chromosomal regions in primates
2010, cited by this paper
mrsFAST: a cache-oblivious algorithm for short-read mapping
2010, cited by this paper
A Gibbs sampling strategy applied to the mapping of ambiguous short-sequence tags
2010, cited by this paper
Diversity of Human Copy Number Variation and Multicopy Genes
2010, influential reference
Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
2010, cited by this paper
Personalized Copy-Number and Segmental Duplication Maps using Next-Generation Sequencing
2009, cited by this paper
The cancer genome
2009, cited by this paper
Combinatorial Algorithms for Structural Variation Detection in High Throughput Sequenced Genomes
2009, cited by this paper
The Sequence Alignment/Map format and SAMtools
2009, cited by this paper
Distinguishing among evolutionary models for the maintenance of gene duplicates.
2009, cited by this paper
Mouse segmental duplication and copy number variation
2008, cited by this paper
Integrated detection and population-genetic analysis of SNPs and copy number variation
2008, cited by this paper
Population Stratification of a Common APOBEC Gene Deletion Polymorphism
2007, cited by this paper
Gene Family Evolution across 12 Drosophila Genomes
2007, cited by this paper
Gene conversion: mechanisms, evolution and human disease
2007, cited by this paper
Gene duplication: a drive for phenotypic diversity and cause of human disease.
2007, cited by this paper
Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. What is the role of genome duplication in the evolution of complexity and diversity?
2006, cited by this paper
Role and Mechanism of Action of the APOBEC3 Family of Antiretroviral Resistance Factors
2006, cited by this paper
A genome-wide comparison of recent chimpanzee and human segmental duplications
2005, cited by this paper
Duplication and divergence: the evolution of new genes and old ideas.
2004, cited by this paper
Annotating large genomes with exact word matches.
2003, cited by this paper
Abundant gene conversion between arms of palindromes in human and ape Y chromosomes
2003, cited by this paper
Recent Segmental Duplications in the Human Genome
2002, cited by this paper
BIOINFORMATICS APPLICATIONS NOTE
2001, cited by this paper
A whole-genome assembly of Drosophila.
2000, cited by this paper
The evolutionary fate and consequences of duplicate genes.
2000, cited by this paper
Gene duplications and the origins of vertebrate development.
1994, cited by this paper
UDP-glucuronosyltransferases: a family of detoxifying enzymes.
1990, cited by this paper
Evolution by Gene Duplication
1970, cited by this paper

CITED BY

Whole-Genome Sequencing Reveals Individual and Cohort Level Insights into Chromosome 9p Syndromes
2025, cites this paper
m6A‐mRNA Reader YTHDF2 Identified as a Potential Risk Gene in Autism With Disproportionate Megalencephaly
2025, cites this paper
LCPan: efficient variation graph construction using Locally Consistent Parsing.
2025, cites this paper
Whole-genome sequencing reveals individual and cohort level insights into chromosome 9p syndromes
2025, cites this paper
Segmental duplication-mediated rearrangements alter the landscape of mouse genomes
2025, cites this paper
CHALLENGER: Detecting Copy Number Variants in Challenging Regions Using Whole Genome Sequencing Data
2025, cites this paper
Human-specific gene expansions contribute to brain evolution.
2025, cites this paper
CNPI: Rapid Analyses of Human Copy Number Data.
2025, cites this paper
The association between salivary amylase gene copy number and enzyme activity with type 2 diabetes status
2025, cites this paper
Human-specific gene expansions contribute to brain evolution
2024, cites this paper
Duplications and Retrogenes Are Numerous and Widespread in Modern Canine Genomic Assemblies
2024, cites this paper
Genome sequencing of 2000 canids by the Dog10K consortium advances the understanding of demography, genome function and architecture
2023, influential citation
GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads
2023, cites this paper
LRRC37B is a human modifier of voltage-gated sodium channels and axon excitability in cortical neurons
2023, cites this paper
Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing
2022, cites this paper
A Subphenotype-to-Genotype Approach Reveals Disproportionate Megalencephaly Autism Risk Genes
2022, cites this paper
Special Issue: A Tale of Genes and Genomes
2021, cites this paper
From karyotypes to precision genomics in 9p deletion and duplication syndromes
2021, cites this paper
Coding and noncoding variants in EBF3 are involved in HADDS and simplex autism
2021, cites this paper
Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications
2020, cites this paper
Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper
2020, cites this paper
Diverse Molecular Mechanisms Contribute to Differential Expression of Human Duplicated Genes
2020, influential citation
Loop 1 of APOBEC3C regulates its antiviral activity against HIV-1
2020, cites this paper
De Novo Mutation in an Enhancer of EBF3 in simplex autism
2020, cites this paper
Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes
2020, cites this paper
Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes
2019, cites this paper
Gene expansions contributing to human brain evolution
year unknown, cites this paper