Database Citation in Full Text Biomedical Articles

Published 2013 in PLoS ONE

ABSTRACT

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full-text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Text mining finds many thousands of new literature-database relationships, since these relationships are also absent from the set of articles cited by database records. We recommend that structured annotation of database records in articles be extended to other databases, such as ArrayExpress and Pfam, whose entries are also widely cited in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services.

PUBLICATION RECORD

Publication year: 2013
Venue: PLoS ONE
Publication date: 2013-05-29
Fields of study: Biology, Medicine, Computer Science
Identifiers: DOI 10.1371/journal.pone.0063184; PMID 23734176; PMCID 3667078
Source metadata: Semantic Scholar, PubMed

REFERENCES

Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE
2012, cited by this paper
The Pfam protein families database
2011, influential reference
Extraction of data deposition statements from the literature: a method for automatically tracking research results
2011, cited by this paper
Annotating genes and genomes with DNA sequences extracted from biomedical articles
2011, influential reference
UKPMC: a full text article resource for the life sciences
2010, cited by this paper
Text processing through Web services: calling Whatizit
2008, cited by this paper
BioLit: integrating biological literature with databases
2008, cited by this paper
The Pfam protein families database
2007, cited by this paper
ArrayExpress—a public repository for microarray gene expression data at the EBI
2004, cited by this paper
InterPro-an integrated documentation resource for protein families, domains and functional sites
2000, cited by this paper
NAR's new requirement for data submission to the EMBL data library: information for authors
1988, cited by this paper
The EMBL data library
1988, cited by this paper

CITED BY

Are Researchers Citing Their Data? A Case Study from The U.S. Geological Survey
2024, cites this paper
Beyond blast: enabling microbiologists to better extract literature, taxonomic distributions and gene neighbourhood information for protein families
2024, cites this paper
Beyond Blast: Enabling Microbiologists to Better Extract Literature, Taxonomic Distributions and Gene Neighborhood Information for Protein Families
2023, cites this paper
Transcriptomics data availability and reusability in the transition from microarray to next-generation sequencing
2021, cites this paper
Data set entity recognition based on distant supervision
2021, cites this paper
Quantitative monitoring of nucleotide sequence data from genetic resources in context of their citation in the scientific literature
2021, cites this paper
The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences
2020, cites this paper
Assigning credit to scientific datasets using article citation networks
2020, cites this paper
Aplicación de técnicas de NLP a la literatura científica biomédica: detección de bases de datos y análisis predictivo de la terminología MeSH
2020, cites this paper
The History and Future of Data Citation in Practice
2019, cites this paper
The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences
2019, cites this paper
Fit for purpose? A metascientific analysis of metabolomics data in public repositories
2019, cites this paper
How research data is cited in scholarly literature: A case study of HINTS
2019, cites this paper
Citations to chemical databases in scholarly articles: to cite or not to cite?
2019, cites this paper
U-Index, a dataset and an impact metric for informatics tools and databases
2018, cites this paper
An Emperical Study of Clustering Algorithms to extract Knowledge from PubMed Articles
2017, cites this paper
Europe PMC in 2017
2017, cites this paper
Theory and practice of data citation
2017, cites this paper
Biomedical Text Mining for Research Rigor and Integrity: Tasks, Challenges, Directions
2017, cites this paper
Assessing and tracing the outcomes and impact of research infrastructures
2017, cites this paper
Automatic Identification of Research Articles Containing Data Usage Statements
2017, cites this paper
Retrieving GPCR data from public databases
2016, cites this paper
Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation
2016, cites this paper
Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources
2016, cites this paper
Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles
2015, influential citation
Making data count
2015, cites this paper
The BioStudies database
2015, cites this paper
Citing a Data Repository: A Case Study of the Protein Data Bank
2015, cites this paper
Extraction of database and software usage patterns from the bioinformatics literature
2015, cites this paper
PubServer: literature searches by homology
2014, cites this paper
Europe PMC: a full-text literature database for the life sciences and platform for innovation
2014, cites this paper
Cell-Line Annotation on Europe PubMed Central
2014, cites this paper
ArrayExpress update—simplifying data submissions
2014, cites this paper
Mining locus tags in PubMed Central to improve microbial gene annotation
2014, cites this paper