Reliable Measures for Aligning Japanese-English News Articles and Sentences

M. Utiyama,H. Isahara

Published 2003 in Annual Meeting of the Association for Computational Linguistics

ABSTRACT

We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignments. To remove these, we propose two measures (scores) that evaluate the validity of alignments. The measure for article alignment uses similarities in sentences aligned by DP matching and that for sentence alignment uses similarities in articles aligned by CLIR. They enhance each other to improve the accuracy of alignment. Using these measures, we have successfully constructed a large-scale article and sentence alignment corpus available to the public.

PUBLICATION RECORD

  • Publication year

    2003

  • Venue

    Annual Meeting of the Association for Computational Linguistics

  • Publication date

    2003-07-07

  • Fields of study

    Computer Science

  • Identifiers
  • External record

    Open on Semantic Scholar

  • Source metadata

    Semantic Scholar

CITATION MAP

EXTRACTION MAP

CLAIMS

  • No claims are published for this paper.

CONCEPTS

  • No concepts are published for this paper.

CITED BY

Showing 1-100 of 240 citing papers · Page 1 of 3