Robust Behrens–Fisher Statistic Based on Trimmed Means and Its Usefulness in Analyzing High-Throughput Data

G. Kang, Sedigheh Mirzaei, Hui Zhang, Liang Zhu, S. Rai, D. Srivastava

Published 2022 in Frontiers in Systems Biology

ABSTRACT

In the context of high-throughput data, differences in continuous markers between two groups are usually assessed by ordering the p-values obtained from the two-sample pooled t-test or the Wilcoxon–Mann–Whitney test and choosing a stringent cutoff, such as 10⁻⁸, to control the family-wise error rate (FWER) or the false discovery rate (FDR). All markers with p-values below the cutoff are declared significantly associated with the phenotype. This practice implicitly assumes that the test procedure yields valid type I error rates in the extreme tails of the null distribution. The aforementioned tests assume homoscedasticity in the two groups, and the t-test further assumes that the underlying distributions are normal. Cao et al. (Biometrika, 2013, 100, 495–502) have shown that, in the context of multiple hypothesis testing, the FDR-based approach may not be valid under non-normality and/or heteroscedasticity. A test statistic that is robust to these violations is therefore needed. In this study, we propose a robust analog of the Behrens–Fisher statistic based on trimmed means, conduct an extensive simulation study to compare its performance with that of competing approaches, and demonstrate its usefulness by applying it to the DNA methylation data used by Teschendorff et al. (Genome Res., 2010, 20, 440–446). An R program implementing the proposed method is provided in the Supplementary Material.
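A classical trimmed-mean analog of Welch's Behrens–Fisher statistic is Yuen's two-sample trimmed t ("The two-sample trimmed t for unequal population variances", 1974, listed in the references). The Python sketch below illustrates that general idea, together with the Benjamini–Hochberg step-up rule for choosing an FDR cutoff. It assumes symmetric 20% trimming, winsorized variances, the standard error T = (trimmed mean 1 − trimmed mean 2) / sqrt(d1 + d2) with d_j = (n_j − 1) s²_wj / (h_j (h_j − 1)), and Welch–Satterthwaite degrees of freedom. It is not the authors' R program, and the statistic proposed in the paper may differ in its standard error, degrees of freedom, or trimming proportion. All function names (trimmed_stats, reg_inc_beta, yuen_welch, benjamini_hochberg) are hypothetical; the t tail probability is computed from the regularized incomplete beta function so that only the standard library is needed.

```python
import math

# Assumption: Yuen (1974) trimmed Welch statistic, used here only to illustrate
# the idea; it is not necessarily the statistic proposed in the paper.


def trimmed_stats(x, gamma=0.2):
    """Trimmed mean, winsorized variance and effective size h for one group.

    g = floor(gamma * n) observations are removed from each tail.
    """
    xs = sorted(x)
    n = len(xs)
    g = int(math.floor(gamma * n))
    h = n - 2 * g
    if h < 2:
        raise ValueError("fewer than two observations remain after trimming")
    tmean = sum(xs[g:n - g]) / h
    # Winsorize: pull the g smallest values up to xs[g], the g largest down to xs[n-g-1].
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    wmean = sum(w) / n
    wvar = sum((v - wmean) ** 2 for v in w) / (n - 1)
    return tmean, wvar, h


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def yuen_welch(x, y, gamma=0.2):
    """Trimmed Welch-type statistic, Welch-Satterthwaite df, two-sided p-value."""
    m1, v1, h1 = trimmed_stats(x, gamma)
    m2, v2, h2 = trimmed_stats(y, gamma)
    # Squared standard error of each trimmed mean, built from the winsorized variance.
    d1 = (len(x) - 1) * v1 / (h1 * (h1 - 1))
    d2 = (len(y) - 1) * v2 / (h2 * (h2 - 1))
    se2 = d1 + d2
    if se2 == 0.0:
        raise ValueError("both winsorized variances are zero")
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    # Two-sided p-value: P(|T_df| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2).
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


def benjamini_hochberg(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:k])
```

To mimic the workflow described in the abstract, one would call yuen_welch once per marker, collect the p-values, and either compare them with a fixed cutoff such as 10⁻⁸ or pass them to benjamini_hochberg. Setting gamma=0.0 removes trimming and recovers Welch's statistic, which is a convenient sanity check; with trimming, a single extreme value in a group leaves the trimmed mean and winsorized variance unchanged.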

PUBLICATION RECORD

Publication year: 2022
Venue: Frontiers in Systems Biology
Publication date: 2022-06-02
DOI: 10.3389/fsysb.2022.877601

REFERENCES

Less Vulnerable Confidence and Significance Procedures for Location Based on a Single Sample: Trimming/Winsorization (2016)
The optimal power puzzle: scrutiny of the monotone likelihood ratio assumption in multiple testing (2013; influential reference)
Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer (2010)
Performance of five two-sample location tests for skewed distributions with unequal variances (2009)
Large-scale multiple testing under dependence (2009)
Assumption adequacy averaging as a concept for developing more robust methods for differential gene expression analysis (2009)
A studentized permutation test for the non-parametric Behrens–Fisher problem (2007)
Multiple Comparison Procedures (2005)
A direct approach to false discovery rates (2002)
Sample size calculations in clinical research (2002)
The Control of the False Discovery Rate in Multiple Testing under Dependency (2001)
The Nonparametric Behrens–Fisher Problem: Asymptotic Theory and a Small-Sample Approximation (2000)
Asymptotic Distribution of P Values in Composite Null Models (2000)
Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens–Fisher problem (1997)
Coefficients of Lee–Gurland two-sample test on normal means (1995; influential reference)
Assessing the significance of difference between two quick estimates of location (1992)
A construction and appraisal of pooled trimmed-t statistics (1991)
Experimental Designs (1990)
Robust statistics: the approach based on influence functions (1986)
Robust Statistics: The Approach Based on Influence Functions (1986)
Robust Rank Procedures for the Behrens–Fisher Problem (1981)
Studentizing Robust Estimates (1975)
Size and Power of Tests for Equality of Means of Two Normal Populations with Unequal Variances (1975)
The two-sample trimmed t for unequal population variances (1974)
On a Comparison of Means of Two Normal Samples (1968)
An Analysis of Variance Test for Normality (Complete Samples) (1965)
Selected Papers in Statistics and Probability (1955)
Non-Normality and Tests on Variances (1953)
Use of Ranks in One-Criterion Variance Analysis (1952)
An examination and further development of a formula arising in the problem of comparing two mean values (1948)
The generalisation of Student's problems when several different population variances are involved (1947; influential reference)
An approximate distribution of estimates of variance components (1946)
The Significance of the Difference between Two Means when the Population Variances Are Unequal (1938)