A General Approach for Partitioning Web Page Content Based on Geometric and Style Information

Hui Guo, J. Mahmud, Y. Borodin, Amanda Stent, I. Ramakrishnan

Published 2007 in IEEE International Conference on Document Analysis and Recognition

ABSTRACT

In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
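The abstract names two signals: spatial locality taken from rendered layout, and relaxed matching over presentation style. The sketch below only illustrates how such signals could be combined; it is not the authors' algorithm. Every name in it (`Box`, `style_similarity`, `vertical_gap`, `partition`), the choice of style properties, and both thresholds are assumptions made for this example. A vertical gap larger than `max_gap` stands in for a visual separator, and "relaxed" matching is modeled as the fraction of style properties that agree.

```python
from dataclasses import dataclass, field

# Hypothetical set of style properties to compare; not taken from the paper.
STYLE_KEYS = ("font-family", "font-size", "font-weight", "color")


@dataclass
class Box:
    """A rendered element: position and size in pixels, plus computed style."""
    x: float
    y: float
    width: float
    height: float
    style: dict = field(default_factory=dict)


def style_similarity(a: dict, b: dict, keys=STYLE_KEYS) -> float:
    """Fraction of the compared style properties on which a and b agree."""
    matches = sum(1 for k in keys if a.get(k) == b.get(k))
    return matches / len(keys)


def vertical_gap(upper: Box, lower: Box) -> float:
    """Empty vertical space between the bottom of upper and the top of lower."""
    return lower.y - (upper.y + upper.height)


def partition(boxes, max_gap=20.0, min_similarity=0.75):
    """Group boxes top to bottom.

    A box joins the current group only if it is spatially close to the
    previous box AND its style is similar enough; otherwise a new group starts.
    Both thresholds are illustrative assumptions.
    """
    ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    groups = []
    for box in ordered:
        if groups:
            prev = groups[-1][-1]
            close = vertical_gap(prev, box) <= max_gap
            similar = style_similarity(prev.style, box.style) >= min_similarity
            if close and similar:
                groups[-1].append(box)
                continue
        groups.append([box])
    return groups
```

With `min_similarity` below 1.0, two paragraphs that differ only in text color still land in the same group, which is the sense of "relaxed" used in this sketch; exact matching would split them.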

PUBLICATION RECORD

  • Publication year

    2007

  • Venue

    IEEE International Conference on Document Analysis and Recognition

  • Publication date

    2007-09-01

  • Fields of study

    Computer Science

  • Source metadata

    Semantic Scholar

CITED BY

19 citing papers