Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data

N. Duforet-Frebourg, Keurcien Luu, G. Laval, Eric Bazin, M. Blum

Published 2015 in Molecular Biology and Evolution

ABSTRACT

To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans for natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework for detecting genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we analyze the 1000 Genomes data (phase 1), using 850 individuals from Africa, Asia, and Europe. The number of genetic variants is on the order of 36 million, obtained at a low sequencing depth (3×). The correlations between genetic variation and each principal component recover well-known targets of positive selection (EDAR, SLC24A5, SLC45A2, DARC) and also point to new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation, related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult.
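The core idea of the abstract, correlating each genetic variant with each principal component, can be illustrated with a minimal sketch. This is not the PCAdapt implementation: the function name `pc_correlations`, the simulated two-population data, and the "communality" summary (the sum of squared correlations over the retained components) are assumptions made here for illustration only.

```python
import numpy as np


def pc_correlations(genotypes, n_components):
    """Correlate every SNP with the first principal components.

    genotypes: (individuals x SNPs) array of 0/1/2 allele counts,
    assumed to contain no monomorphic SNP (zero variance).
    Returns (corr, communality): corr[j, k] is the Pearson correlation
    between SNP j and PC k; communality[j] is the sum of corr[j, :] ** 2.
    """
    G = np.asarray(genotypes, dtype=float)
    n = G.shape[0]
    # Standardize each SNP column to mean 0 and variance 1.
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    # PCA via SVD: the columns of U are unit-norm, mean-zero PC scores.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components]
    # Each column of Z has norm sqrt(n) and each score column has norm 1,
    # so this product is exactly the Pearson correlation.
    corr = Z.T @ scores / np.sqrt(n)
    communality = (corr ** 2).sum(axis=1)
    return corr, communality


# Hypothetical toy data: two populations, one strongly differentiated SNP.
rng = np.random.default_rng(0)
n_per_pop, n_snps, planted = 100, 500, 0
ancestral = rng.uniform(0.2, 0.8, size=n_snps)
drift = rng.normal(0.0, 0.1, size=(2, n_snps))
freqs = np.clip(ancestral + drift, 0.05, 0.95)
freqs[:, planted] = [0.05, 0.95]  # planted "locally adapted" variant
genotypes = np.vstack(
    [rng.binomial(2, freqs[p], size=(n_per_pop, n_snps)) for p in range(2)]
)

# Population labels are never passed to the scan.
corr, communality = pc_correlations(genotypes, n_components=1)
print("SNP with largest communality:", int(np.argmax(communality)))
```

In this toy setting the planted variant would be expected to stand out because the first component separates the two simulated populations, even though no population labels are supplied. A real scan would additionally need a null distribution and multiple-testing control, which this sketch omits.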

PUBLICATION RECORD

Publication year: 2015
Venue: Molecular Biology and Evolution
Publication date: 2015-04-08
Fields of study: Biology, Medicine
Identifiers: DOI 10.1093/molbev/msv334; arXiv 1504.04543; PMID 26715629; PMCID 4776707
Source metadata: Semantic Scholar, PubMed

CONCEPTS

1000 genomes data (phase 1)
dataset, genomic dataset

The human whole-genome sequence dataset from the first phase of the 1000 Genomes Project used as the study's main application data.

Aliases: 1000 Genomes phase 1, 1000 Genomes Project phase 1

fst index
statistic, measure

A population-differentiation statistic used here as a reference for explaining variance across principal components.

Aliases: FST, F_ST
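
As a hedged illustration of why FST can be read as a proportion of variance, the textbook (Wright) definition for a single biallelic locus is sketched below. The symbols are introduced here and are not taken from the paper: $p_i$ is the allele frequency in population $i$, $w_i$ is that population's weight (with $\sum_i w_i = 1$), and $\bar{p} = \sum_i w_i p_i$. The estimator actually used in the paper may differ.

```latex
% Total variance of the allele indicator X in {0,1}, split by population
% (law of total variance): within-population part + between-population part.
\bar{p}\,(1-\bar{p})
  \;=\; \sum_i w_i\, p_i (1 - p_i) \;+\; \sum_i w_i\, (p_i - \bar{p})^2

% F_ST is the between-population share of the total variance.
F_{ST} \;=\; \frac{\sum_i w_i\, (p_i - \bar{p})^2}{\bar{p}\,(1-\bar{p})}
```

Read this way, FST is the fraction of the allele indicator's variance explained by population membership, which parallels the fraction of a variant's variance explained by principal components.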

innate immune system
biological pathway

The host defense pathway category whose beta-defensin-related genes are discussed as a polygenic adaptation signal.

Aliases: innate immunity

lipid metabolism
biological pathway

The metabolic pathway category whose fatty-acid omega-oxidation component is discussed as a polygenic adaptation signal.

Aliases: fatty acid omega oxidation

local adaptation
evolutionary process, phenomenon

Adaptation driven by selection in specific environments or populations that leaves locus-specific genetic signals.

Aliases: local adaptation signals

pcadapt
software, package

The R package and open-source software implementing the PCA-based statistics described in the abstract.

Aliases: PCAdapt R package, PCAdapt fast open-source software

positive selection
evolutionary process

Selection that increases the frequency of advantageous genetic variants and can create detectable genomic signatures.

Aliases: selection

principal component analysis
method

A multivariate method that summarizes genetic variation into orthogonal components used here to scan the genome.

Aliases: PCA


