The plant transcriptome—from integrating observations to models

Published 2013 in Frontiers in Plant Science

ABSTRACT

Transcriptome studies based on either microarrays or next-generation sequencing have produced an unprecedented flood of data on transcript identity and abundance in plant systems. Microarray data have been used extensively over the last 15 years or so, and their evaluation has progressed well beyond early statistical quality assessment and descriptive lists to a mature science in which gene networks and cascades provide mechanistic insight. The development of sensitive quantitative PCR for lowly expressed genes such as transcription factors has allowed a further layer of complexity to be accessed, and modeling transcription factor expression together with that of target genes has met with considerable success. More recently, data from RNAseq studies have greatly improved the coverage of transcript profiling. That said, this technology has further complicated transcriptome analysis by making it possible to identify, among other things, differentially spliced transcripts. In this research topic we provide an “on the fly” portrait of the use of microarray and RNAseq based datasets in contemporary Plant Systems Biology.

Given the relative simplicity of doing so, much information has been gleaned from microarray datasets by assuming guilt-by-association, that is, by assuming that genes with correlated expression profiles share a function. The success of this approach is summarized in the articles by Provart (2012) and Tohge and Fernie (2012), as are recent studies that go beyond transcription and link in physiological and metabolic aspects. As in the legal process from which the approach takes its name, it is important to note that suspects obtained this way require a “fair trial,” since assuming “guilt” is fraught with dangers, as summarized in Usadel et al. (2009a).
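As a rough illustration of guilt-by-association, the sketch below ranks genes by the Pearson correlation of their expression profiles with a chosen bait gene. It is a minimal sketch, not the procedure of any of the cited articles: the function names, the 0.9 threshold, and the toy expression values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpressed_candidates(expression, bait, threshold=0.9):
    """Return (gene, r) pairs with r >= threshold, strongest first.

    The threshold is an arbitrary illustrative cut-off (assumption).
    """
    bait_profile = expression[bait]
    hits = []
    for gene, profile in expression.items():
        if gene == bait:
            continue
        r = pearson(bait_profile, profile)
        if r >= threshold:
            hits.append((gene, r))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: gene name -> expression level in five samples.
toy_expression = {
    "bait_gene": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_a": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with the bait
    "gene_b": [5.0, 4.0, 3.0, 2.0, 1.0],  # mirrors the bait
    "gene_c": [3.0, 3.0, 3.0, 3.0, 3.0],  # flat across samples
}

candidates = coexpressed_candidates(toy_expression, "bait_gene")
for gene, r in candidates:
    print(f"{gene}\t{r:.3f}")
```

A gene that passes the cut-off is only a suspect: in the spirit of the “fair trial” caveat, it would still need independent evidence before a function is assigned.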
Thus, Tohge and Fernie extend the use of the co-expression approach for the annotation of assumed gene function and discuss bringing in further experimental “evidence” as provided by metabolomics, proteomics, or physiological measurements (Tohge et al., 2005; De Bodt et al., 2012). They then delve further into the subject by explaining how to make a more solid case by linking gene functions across multiple species (Mutwil et al., 2011; Obayashi et al., 2011). The review by Provart (2012) also covers novel aspects of visualizing correlations but pays more attention to combining these data with subcellular localization and tissue- or organ-specific networks such as those defined by SeedNet (Kohl et al., 2011), and to overlaying such networks with those derived from protein-protein interaction studies (Geisler-Lee et al., 2007).

Junker et al. (2012a) follow a similar direction, extending ideas put forward in their recent Trends in Biotechnology review (Junker et al., 2012b) and focusing here on visual analysis of the transcriptome. They provide an overview of plant transcriptomics repositories and detail how these can serve as resources for visualization programs such as HIVE. They also detail how the color-coded output from such programs can be integrated with known biological networks, using the analysis of floral homeotic gene expression patterns and seed expression profiles as case studies. They further discuss information visualization standards as suggested by Card et al. (1999) and the eFP browser (Winter et al., 2007). Friedel et al. (2012) and Grene et al. (2012) follow a similar approach, re-analysing data with both visualization and network techniques, and both are interested in abiotic conditions. Whereas Friedel et al. use network approaches and functional categories to investigate stress responses, Grene et al. focus on winter hardening in spruce. Interestingly, Grene et al. (2012) are able to show a reprogramming of cell wall and nucleotide sugar metabolism using MapMan (Usadel et al., 2009b) and Gene Ontology (GO) categories.

When it comes to analysing whole-genome expression datasets, particularly those obtained from complex temporally and/or spatially resolved experiments, visualization helps in finding “the meaning within the noise.” Thus, at present the researcher typically zooms in on a particular subset of the data that excites their biological curiosity, often obtaining such data from public repositories such as Genevestigator (https://www.genevestigator.com/gv). Much information, and potentially knowledge, remains untapped with this approach. This leaves one wondering whether, aided by modern biostatistics and bioinformatics, one should not be able to do better. To improve this situation, Klie et al. (2012) present a computational solution in which recent extensions of principal component analysis, the variants STATIS and dual-STATIS (Lavit et al., 1994; Abdi et al., 2012), are applied to study the time-resolved response of Arabidopsis thaliana to perturbations in the prevailing light and/or temperature conditions. This proof-of-concept study illustrates that these tools can clearly aid in dataset-wide analyses and, furthermore, that they can specify the extent to which either the transcript levels or the experimental treatments reflect these perturbations, thus providing biological insight across the entire datasets obtained.

As is evident from the multitude of manuscripts dealing with microarray data, there is still much to be learned from these datasets. However, time moves on, and whilst it seems difficult to teach old dogs new omics tricks, RNAseq is steadily becoming more popular. Machine learning techniques are already trickling in to help separate signal from noise. Thus, Thieme et al. (2012) try to find the proverbial needle in the haystack by identifying Argonaute sorting signals for miRNAs.
Whilst mutual information did not indicate that any position other than the 5′ position dictates which of the 10 Argonaute proteins processes which miRNA, Thieme et al. address the problem of having only four possible 5′ bases for 10 different proteins by showing that other positions likely play a role as well. Such analyses assume, however, that one actually knows which transcripts to deal with. One of the perceived beauties of RNAseq is that one can learn about the transcriptome on the fly by assembling the reads into transcripts while analysing the data. This seems, however, an ambitious goal, and in their article Schliesky et al. (2012) therefore address the question “RNAseq assembly: are we there yet?” They review plant applications of 454/Roche and Illumina sequencing, which in combination have already been used to assess the transcriptomes of over 50 plant species. Although they argue that these approaches have been useful in downstream applications such as proteomics (Lopez-Casado et al., 2012), and the same can be argued for their use to augment recent genome sequencing efforts (Tomato Genome Consortium, 2012), assemblies may well not accurately reflect the actual plant transcriptomes, especially if not checked carefully. To ameliorate the challenges of transcriptome assembly, they provide a list of quality control parameters and the scripts needed to produce them, most likely an invaluable resource for this burgeoning area of transcriptomics, which brings the old idea of genomeless genomics (Rudd, 2005) within the reach of even the smallest labs. Rose et al. (2012) then round up the uses of RNAseq by providing both insights into how RNAseq has already benefited the plant community and detailed examples in which genomeless genomics was used. Extending beyond this, they show that RNAseq is also valuable for finding small non-coding RNAs, as demonstrated in the article by Thieme et al. (2012).
In addition they demonstrate how important RNAseq can be for bulk segregant analysis and thus for the identification of causal mutations. Alongside these illustrations, they provide the wet bench biologist with comprehensive workflows on how the RNA should be processed for these varied applications. Finally, in his article, Kliebenstein (2012) tries to answer the other burning question of RNA-seq: how deep does deep sequencing need to go to capture the majority of the network or genomic information present in a variety of transcriptomics experiments? To address this question he applied Shannon entropy analysis to existing Arabidopsis transcriptomics data, namely a co-expression network, an expression QTL analysis, and a temporal analysis of the circadian clock. Intriguingly, he concluded that at least 80% of the information present in a transcriptomic study is likely obtainable by measuring only the top 10% of the transcripts within a sample. This rather surprising finding has important consequences for experimental design, particularly with regard to the scale and affordability of large-scale studies.
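To make the quantities in the last paragraph concrete, the following minimal sketch computes the Shannon entropy of a transcript abundance distribution and the share of reads contributed by the most abundant 10% of transcripts. It is an illustration only and does not reproduce the analysis in Kliebenstein (2012), whose exact procedure is not described here. The function names and the read counts are hypothetical.

```python
from math import log2


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the abundance distribution.

    H = -sum(p_i * log2(p_i)), where p_i is the fraction of all reads
    assigned to transcript i. Transcripts with zero reads are skipped.
    """
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def top_fraction_share(counts, fraction=0.1):
    """Share of all reads coming from the most abundant transcripts.

    `fraction` is the proportion of transcripts to keep (0.1 = top 10%);
    at least one transcript is always kept.
    """
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical read counts for ten transcripts in one sample.
toy_counts = [800, 100, 40, 20, 15, 10, 6, 5, 3, 1]

entropy_bits = shannon_entropy(toy_counts)
top_share = top_fraction_share(toy_counts, fraction=0.1)
print(f"entropy: {entropy_bits:.2f} bits, top 10% share of reads: {top_share:.0%}")
```

A strongly skewed distribution such as this toy sample has low entropy compared with a uniform one, which is one intuitive reason why shallow sequencing of abundant transcripts can still capture much of the signal.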

PUBLICATION RECORD

Publication date: 2013-03-11
Venue: Frontiers in Plant Science
Fields of study: Biology, Medicine, Computer Science, Environmental Science
Identifiers: DOI 10.3389/fpls.2013.00048; PMID 23483867; PMCID 3593623

REFERENCES

Catalyzing plant science research with RNA-seq (2013)
Visual Analysis of Transcriptome Data in the Context of Anatomical Structures and Biological Networks (2012)
Correlation networks visualization (2012)
Compromise of Multiple Time-Resolved Transcriptomics Experiments Identifies Tightly Regulated Functions (2012)
Mining and visualization of microarray and metabolomic data reveal extensive cell wall remodeling during winter hardening in Sitka spruce (Picea sitchensis) (2012)
CORNET 2.0: integrating plant coexpression, protein-protein interactions, regulatory interactions, gene associations and functional annotations (2012)
RNA-Seq Assembly – Are We There Yet? (2012)
Enabling proteomic studies with RNA‐Seq: The proteome of tomato pollen as a test case (2012)
The tomato genome sequence provides insights into fleshy fruit evolution (2012)
Wiring diagrams in biology: towards the standardized representation of biological information (2012)
STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling (2012)
Co-expression and co-responses: within and beyond transcription (2012)
Reverse Engineering: A Key Component of Systems Biology to Unravel Global Abiotic Stress Cross-Talk (2012)
Give It AGO: The Search for miRNA-Argonaute Sorting Signals in Arabidopsis thaliana Indicates a Relevance of Sequence Positions Other than the 5′-Position Alone (2012)
Exploring the Shallow End; Estimating Information Content in Transcriptomics Studies (2012)
Cytoscape: software for visualization and analysis of biological networks (2011)
PlaNet: Combined Sequence and Expression Comparisons across Plant Networks Derived from Seven Species (2011)
ATTED-II Updates: Condition-Specific Gene Coexpression to Extend Coexpression Analyses and Applications to a Broad Range of Flowering Plants (2011)
Co-expression tools for plant biology: opportunities for hypothesis generation and caveats (2009)
A guide to using MapMan to visualize and compare Omics data in plants: a case study in the crop species, Maize (2009)
An “Electronic Fluorescent Pictograph” Browser for Exploring and Analyzing Large-Scale Biological Data Sets (2007)
A Predicted Interactome for Arabidopsis (2007)
Engineered allosteric ribozymes that respond to specific divalent metal ions (2005)
Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis plants over-expressing an MYB transcription factor (2005)
openSputnik: a database to ESTablish comparative plant genomics using unsaturated sequence collections (2004)
Readings in information visualization: using vision to think (1999)
Reverse Engineering (1994)
