Knowledge models from PDF textbooks

Published 2021 in New Rev. Hypermedia Multim.

ABSTRACT

Textbooks are educational documents created, structured and formatted by domain experts with the primary purpose of explaining the knowledge in the domain to a novice. Authors use their understanding of the domain when structuring and formatting the content of a textbook to facilitate this explanation. As a result, the formatting and structural elements of textbooks carry elements of domain knowledge implicitly encoded by their authors. Our paper presents an extensible approach towards automated extraction of knowledge models from textbooks and enrichment of their content with additional links (both internal and external). The textbooks themselves essentially become hypertext documents where individual pages are annotated with important concepts in the domain. The evaluation experiments examine several aspects and stages of the approach: the accuracy of model extraction; the pragmatic quality of the extracted models, assessed through one of their possible applications (semantic linking of textbooks in the same domain); the accuracy of linking models to external knowledge sources; and the effect of integrating multiple textbooks from the same domain. The results indicate high accuracy of model extraction on symbolic, syntactic and structural levels across textbooks and domains, and demonstrate the added value of the extracted models on the semantic level.

PUBLICATION RECORD

Publication year: 2021
Venue: New Rev. Hypermedia Multim.
Publication date: 2021-02-28
Fields of study: Computer Science, Education
Identifiers: DOI 10.1080/13614568.2021.1889692
Source metadata: Semantic Scholar

CITED BY

Intelligent Textbooks (2025)
HiPS: Hierarchical PDF Segmentation of Textbooks (2025)
Formative Feedback on Student-Authored Summaries in Intelligent Textbooks Using Large Language Models (2024)
Extracción de modelos de conocimiento a partir de libros de texto y su aplicación en los negocios (2024)
Empowering Asynchronous Arabic Language Learning Through PDF Hyperlink Media (2024)
Extraction of Knowledge Models from Textbooks (2024)
Harnessing Textbooks for High-Quality Labeled Data: An Approach to Automatic Keyword Extraction (2023)
An unsupervised linguistic-based model for automatic glossary term extraction from a single PDF textbook (2023, influential citation)
Layout and Activity-based Textbook Modeling for Automatic PDF Textbook Extraction (2023)
Grade Level Filtering for Learning Object Search using Entity Linking (2022)
What’s in an Index: Extracting Domain-specific Knowledge Graphs from Textbooks (2022)
LAOps: Learning Analytics with Privacy-aware MLOps (2022)
Enriching Intelligent Textbooks with Interactivity: When Smart Content Allocation Goes Wrong (2022)
The Return of Intelligent Textbooks (2022)
Generation of Assessment Questions from Textbooks Enriched with Knowledge Models (2021)
Integrating Textbooks with Smart Interactive Content for Learning Programming (2021)