CD-HIT: accelerated for clustering the next-generation sequencing data

Limin Fu, Beifang Niu, Zhengwei Zhu, Sitao Wu, Weizhong Li

Published 2012 in Bioinform.

ABSTRACT

Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by the next-generation sequencing technologies, we have developed a new CD-HIT program accelerated with a novel parallelization strategy and some other techniques to allow efficient clustering of such datasets. Our tests demonstrated very good speedup derived from the parallelization for up to ∼24 cores and a quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT is capable of handling very large datasets in much shorter time than previous versions.

Availability: http://cd-hit.org.

Contact: liwz@sdsc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

PUBLICATION RECORD

Publication year: 2012
Venue: Bioinform.
Publication date: 2012-10-11
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1093/bioinformatics/bts565; PMID 23060610; PMCID 3516142
Source metadata: Semantic Scholar, PubMed
