A General Approach for Partitioning Web Page Content Based on Geometric and Style Information
Hui Guo, J. Mahmud, Y. Borodin, Amanda Stent, I. Ramakrishnan
Published 2007 in IEEE International Conference on Document Analysis and Recognition
ABSTRACT
In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
PUBLICATION RECORD
- Publication year: 2007
- Venue: IEEE International Conference on Document Analysis and Recognition
- Publication date: 2007-09-01
- Fields of study: Computer Science
- Source metadata: Semantic Scholar