An approach for clustering gene expression data with error information

Published 2006 in BMC Bioinformatics

ABSTRACT

Background: Clustering of gene expression patterns is a well-studied technique for elucidating trends across large numbers of transcripts and for identifying likely co-regulated genes. Even the best clustering methods, however, are unlikely to provide meaningful results if too much of the data is unreliable. With the maturation of microarray technology, a wealth of research on statistical analysis of gene expression data has encouraged researchers to consider error and uncertainty in their microarray experiments, so that experiments are being performed increasingly with repeat spots per gene per chip and with repeat experiments. One of the challenges is to incorporate the measurement error information into downstream analyses of gene expression data, such as traditional clustering techniques.

Results: In this study, a clustering approach is presented which incorporates both gene expression values and error information about the expression measurements. Using repeat expression measurements, the error of each gene expression measurement in each experiment condition is estimated, and this measurement error information is incorporated directly into the clustering algorithm. The algorithm, CORE (Clustering Of Repeat Expression data), is presented and its performance is validated using statistical measures. By using error information about gene expression measurements, the clustering approach is less sensitive to noise in the underlying data and it is able to achieve more accurate clusterings. Results are described for both synthetic expression data as well as real gene expression data from Escherichia coli and Saccharomyces cerevisiae.

Conclusion: The additional information provided by replicate gene expression measurements is a valuable asset in effective clustering. Gene expression profiles with high errors, as determined from repeat measurements, may be unreliable and may associate with different clusters, whereas gene expression profiles with low errors can be clustered with higher specificity. Results indicate that including error information from repeat gene expression measurements can lead to significant improvements in clustering accuracy.
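
The abstract describes estimating a per-gene, per-condition error from repeat measurements and feeding that error into the clustering step. The sketch below illustrates that general idea only; it is not the CORE algorithm, whose details are not given in this record. It assumes a generic inverse-variance-weighted k-means, and the function names, the variance floor, and the input layout (genes x conditions x repeats) are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_mean_and_error(replicates):
    """Summarize repeat measurements.

    replicates: array of shape (genes, conditions, repeats) -- assumed layout.
    Returns the mean expression and its standard error, each (genes, conditions).
    """
    n_repeats = replicates.shape[2]
    mean = replicates.mean(axis=2)
    sem = replicates.std(axis=2, ddof=1) / np.sqrt(n_repeats)
    return mean, sem


def error_weighted_kmeans(mean, sem, k, n_iter=50, var_floor=1e-3, seed=0):
    """Illustrative k-means in which each measurement is weighted by 1/variance.

    This is an assumed stand-in technique, not the method of the paper.
    var_floor (hypothetical parameter) keeps the weights finite when the
    estimated error is close to zero.
    """
    rng = np.random.default_rng(seed)
    weights = 1.0 / (sem ** 2 + var_floor)
    # Start from k distinct gene profiles chosen at random.
    centers = mean[rng.choice(len(mean), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Weighted squared distance of every gene to every center: (genes, k).
        diff = mean[:, None, :] - centers[None, :, :]
        dist = (weights[:, None, :] * diff ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update each center as the inverse-variance-weighted mean of its genes,
        # so noisy measurements pull the center less than precise ones.
        for c in range(k):
            members = labels == c
            if members.any():
                w = weights[members]
                centers[c] = (w * mean[members]).sum(axis=0) / w.sum(axis=0)
    return labels, centers
```

In this sketch, measurements with large estimated error contribute less both to the distance used for assignment and to the cluster centers, which is one simple way to make a clustering less sensitive to noisy profiles.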

PUBLICATION RECORD

Publication year
2006
Venue
BMC Bioinformatics
Publication date
2006-01-12
Fields of study
Biology, Medicine, Computer Science
Identifiers
DOI 10.1186/1471-2105-7-17 PMID 16409635 PMCID 1360687
Source metadata
Semantic Scholar, PubMed

CITED BY

UdN: A Bio-Inspired Data Network for Significant Pattern Extraction in Cognitive Internet of Things
2025cites this paper
Asymmetrical regression: a cognitively driven approach to advanced forecasting in cognitive internet of things
2025cites this paper
Retroductive reasoning: a data-driven intelligent method for knowledge discovery using cognitive IoT
2025cites this paper
Abductive hypothesis testing in cognitive IoT sensor network
2025cites this paper
Robust discovery of causal gene networks via measurement error estimation and correction
2023cites this paper
Model-Based Clustering with Measurement or Estimation Errors
2020cites this paper
Gaussian mixture modeling and model-based clustering under measurement inconsistency
2020cites this paper
Mining of high dimensional data using enhanced clustering approach
2018cites this paper
Statistical methods for estimation, testing, and clustering with gene expression data
2017cites this paper
Tradeoffs between Dense and Replicate Sampling Strategies for High-Throughput Time Series Experiments.
2016cites this paper
Shall We Dense? Comparing Design Strategies for Time Series Expression Experiments.
2016cites this paper
Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures
2016cites this paper
A Linear Mixed Model Spline Framework for Analysing Time Course ‘Omics’ Data
2015cites this paper
Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies
2015cites this paper
Impact of pixel intensity correlations on statistical inferences of expression levels in cDNA microarray experiments
2015cites this paper
Interpolation based consensus clustering for gene expression time series
2015cites this paper
Clustering gene expression data with a penalized graph-based metric
2011cites this paper
Including Probe-Level Measurement Error in Robust Mixture Clustering of Replicated Microarray Gene Expression
2011cites this paper
A large-scale method to measure absolute protein phosphorylation stoichiometries
2011cites this paper
Visual Clustering Analysis of CIS Logs to Inform Creation of a User-configurable Web CIS Interface
2011cites this paper
[Identification, modeling and simulation of key pathways underlying certain cancers].
2011cites this paper
AP-Based Consensus Clustering for Gene Expression Time Series
2010cites this paper
Dynamics of dendritic cell maturation are identified through a novel filtering strategy applied to biological time-course microarray replicates
2010influential citation
Importance of replication in analyzing time-series gene expression data: Corticosteroid dynamics and circadian patterns in rat liver
2010cites this paper
Statistical inference from large-scale genomic data
2009cites this paper
Recovery Rate of Clustering Algorithms
2009cites this paper
Mining Projected Clusters in High-Dimensional Spaces
2009cites this paper
Clustering of Gene Expression Data Based on Shape Similarity
2009cites this paper
Visual data mining in intrinsic hierarchical complex biodata
2009cites this paper
Transcriptome for Photobiological Hydrogen Production Induced by Sulfur Deprivation in the Green Alga Chlamydomonas reinhardtii
2008cites this paper
Bioinformatics Resources for the Study of Gene Regulation in Bacteria
2008cites this paper
Approximation Algorithms for Biclustering Problems
2008cites this paper
Partial mixture model for tight clustering of gene expression time-course
2008cites this paper
An unsupervised conditional random fields approach for clustering gene expression time series
2008cites this paper
Quality Weighted Mean and T-test in Microarray Analysis Lead to Improved Accuracy in Gene Expression Measurements and Reduced Type I and II Errors in Differential Expression Detection.
2008cites this paper
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient
2008cites this paper
Biomarker clustering to address correlations in proteomic data
2007cites this paper
DWT–CEM: an algorithm for scale-temporal clustering in fMRI
2007cites this paper
FEATURE MINING WITH COMPUTATIONAL INTELLIGENCE AND ITS APPLICATIONS IN IMAGE STEGANALYSIS AND BIOINFORMATICS
2007cites this paper
Interferon-Mediated Immunopathological Events Are Associated with Atypical Innate and Adaptive Immune Responses in Patients with Severe Acute Respiratory Syndrome
2007cites this paper
Unsupervised Clustering of Gene Expression Time Series with Conditional Random Fields
2007cites this paper
CLUSTERING A SERIES OF REPLICATED POLYPLOID GENE EXPRESSION EXPERIMENTS IN MAIZE
2006influential citation
Quality-based distance measures and applications to clustering
2006cites this paper
Approximation Algorithms for Bi-clustering Problems
2006cites this paper