EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering

Soohyun Lee, Chaehwa Seo, B. Alver, Sanghyuk Lee, P. Park

Published 2015 in BMC Bioinformatics

ABSTRACT

RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consist of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar
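
The two ideas named in the abstract (grouping reads by the set of transcripts they map to, then fitting a joint Poisson model by maximum likelihood) can be illustrated with a toy sketch. This is not EMSAR's implementation: the function names, the input format, the per-segment effective lengths, and the EM update used for the fit are all assumptions made for illustration, and the suffix-array indexing and segment optimization described in the abstract are not covered.

```python
from collections import Counter


def group_reads_by_transcript_set(alignments):
    """Count reads per distinct set of transcripts they map to.

    alignments: dict of read id -> iterable of transcript ids
    (a hypothetical input format, not EMSAR's).
    """
    counts = Counter()
    for transcripts in alignments.values():
        key = frozenset(transcripts)
        if key:  # skip unmapped reads
            counts[key] += 1
    return counts


def poisson_mle(counts, seg_lengths, n_iter=200):
    """Fit n_s ~ Poisson(l_s * sum of theta_t over t in s) by EM.

    counts: dict of frozenset(transcripts) -> read count n_s
    seg_lengths: dict of frozenset(transcripts) -> assumed effective length l_s
    Returns a dict of transcript id -> estimated abundance theta_t.
    """
    transcripts = sorted({t for s in seg_lengths for t in s})
    theta = {t: 1.0 for t in transcripts}
    # Total length over which each transcript can generate reads.
    exposure = {t: 0.0 for t in transcripts}
    for s, length in seg_lengths.items():
        for t in s:
            exposure[t] += length
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        for s in seg_lengths:
            total = sum(theta[t] for t in s)
            if total <= 0:
                continue
            n = counts.get(s, 0)
            # E-step: split the shared count in proportion to current theta.
            for t in s:
                expected[t] += n * theta[t] / total
        # M-step: reads attributed to t divided by its total exposure.
        theta = {
            t: expected[t] / exposure[t] if exposure[t] > 0 else 0.0
            for t in transcripts
        }
    return theta


# Toy example: two transcripts with one unique segment each and one shared segment.
A, B, AB = frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})
toy_counts = {A: 100, AB: 300, B: 200}
toy_lengths = {A: 100.0, AB: 100.0, B: 100.0}
toy_theta = poisson_mle(toy_counts, toy_lengths)
```

In this toy setup the reads falling in the shared segment are not discarded; they contribute to both transcripts in proportion to the current estimates, which is one simple way to use multi-mapped reads in a likelihood-based fit.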

PUBLICATION RECORD

Publication year: 2015
Venue: BMC Bioinformatics
Publication date: 2015-09-03
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1186/s12859-015-0704-z; PMID 26335049; PMCID 4559005
Source metadata: Semantic Scholar, PubMed

REFERENCES

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium
2014, cited by this paper
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
2014, cited by this paper
voom: precision weights unlock linear model analysis tools for RNA-seq read counts
2014, cited by this paper
A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
2013, cited by this paper
Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms
2013, cited by this paper
Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms
2013, cited by this paper
Modelling and simulating generic RNA-Seq experiments with the flux simulator
2012, cited by this paper
Streaming fragment assignment for real-time analysis of sequencing experiments
2012, influential reference
Models for transcript quantification from RNA-Seq
2011, cited by this paper
Differential expression analysis for sequence count data
2011, cited by this paper
A strand-specific library preparation protocol for RNA sequencing.
2011, cited by this paper
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
2011, influential reference
Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments
2010, cited by this paper
Comprehensive comparative analysis of strand-specific RNA sequencing methods
2010, cited by this paper
Estimation of alternative splicing isoform frequencies from RNA-Seq data
2010, cited by this paper
Accurate quantification of transcriptome from RNA-Seq data by effective length normalization
2010, influential reference
Genome sequence of the palaeopolyploid soybean
2010, cited by this paper
Differential expression analysis for sequence count data
2010, cited by this paper
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data
2009, cited by this paper
Statistical inferences for isoform expression in RNA-Seq
2009, cited by this paper
Transcriptome analysis by strand-specific sequencing of complementary DNA
2009, cited by this paper
A Global View of Gene Activity and Alternative Splicing by Deep Sequencing of the Human Transcriptome
2008, cited by this paper
Alternative Isoform Regulation in Human Tissue Transcriptomes
2008, cited by this paper
Mapping and quantifying mammalian transcriptomes by RNA-Seq
2008, cited by this paper
The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements
2006, influential reference
Large scale real-time PCR validation on gene expression measurements from two commercial long-oligonucleotide microarrays
2006, cited by this paper
BMC Bioinformatics
2006, cited by this paper
Initial sequencing and analysis of the human genome
2001, cited by this paper
Suffix arrays: a new method for on-line string searches
1993, cited by this paper
Immunohistological localization of the adhesion molecules L1, N‐CAM, and MAG in the developing and adult optic nerve of mice
1989, cited by this paper

CITED BY

Multidimensional landscape of non‐alcoholic fatty liver disease‐related disease spectrum uncovered by big omics data: Profiling evidence and new perspectives
2023, cites this paper
Quantitative analysis of high‐throughput biological data
2023, influential citation
Deconvolution of expression for nascent RNA-sequencing data (DENR) highlights pre-RNA isoform diversity in human cells
2021, cites this paper
Differential Expression Analysis of RNA-seq Reads: Overview, Taxonomy, and Tools
2020, cites this paper
A comprehensive overview of computational tools for RNA-seq analysis
2020, cites this paper
Bioinformatics services for analyzing massive genomic datasets
2020, cites this paper
Further confirmation of second- and third-generation Eimeria necatrix merozoite DEGs using suppression subtractive hybridization
2019, cites this paper
Understanding the Role of the WRKY Gene Family under Stress Conditions in Pigeonpea (Cajanus Cajan L.)
2019, influential citation
A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs
2018, cites this paper
Event Analysis: Using Transcript Events To Improve Estimates of Abundance in RNA-seq Data
2018, cites this paper
Par-eXpress: A Tool for Analysis of Sequencing Experiments With Ambiguous Assignment of Fragments in Parallel
2017, cites this paper
The genetic basis and evolution of red blood cell sickling in deer
2017, cites this paper
Bioinformatics Tools and Genomic Resources Available in Understanding the Structure and Function of Gossypium
2016, cites this paper
Near-optimal probabilistic RNA-seq quantification
2016, cites this paper
Fast and accurate quantification and differential analysis of transcriptomes
2016, cites this paper
Bioinformatics - Updated Features and Applications
2016, cites this paper
A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs
2015, cites this paper