But Dictionaries Are Data Too

P. Brown, S. D. Pietra, V. D. Pietra, Meredith J. Goldsmith, Jan Hajic, R. Mercer, Surya Mohanty

Published 1993 in Human Language Technology - The Baltic Perspective

ABSTRACT

Although empiricist approaches to machine translation depend vitally on data in the form of large bilingual corpora, bilingual dictionaries are also a source of information. We show how to model at least a part of the information contained in a bilingual dictionary so that we can treat a bilingual dictionary and a bilingual corpus as two facets of a unified collection of data from which to extract values for the parameters of a probabilistic machine translation system. We give an algorithm for obtaining maximum likelihood estimates of the parameters of a probabilistic model from this combined data and we show how these parameters are affected by inclusion of the dictionary for some sample words.
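The idea of treating a dictionary and a corpus as one pool of training data can be sketched as follows. The abstract does not say how the paper models dictionary entries, so this sketch makes a loud assumption: each dictionary entry is simply appended to the corpus as one more short aligned pair, and maximum likelihood estimates of word-translation probabilities t(f|e) are obtained with EM for a simplified IBM Model 1 style model (no NULL word). The function name `train_model1`, the toy sentences, and the single dictionary entry are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """EM estimates of t(f|e) from (english_words, french_words) pairs.

    Simplified IBM Model 1 style model without a NULL word. This is an
    illustrative assumption, not necessarily the model used in the paper.
    """
    french_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform distribution over the French vocabulary.
    t = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links leaving e
        for es, fs in pairs:
            for f in fs:
                # E-step: share one unit of count for f among the words in es.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per English word.
        t = defaultdict(
            float, {(f, e): count[(f, e)] / total[e] for (f, e) in count}
        )
    return t


# Hypothetical toy data.
corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "flower"], ["la", "fleur"]),
]
# Assumption: a dictionary entry is treated as an extra aligned pair.
dictionary = [(["house"], ["maison"])]

t_corpus = train_model1(corpus)
t_both = train_model1(corpus + dictionary)

print("corpus only:        ", t_corpus[("maison", "house")])
print("corpus + dictionary:", t_both[("maison", "house")])
```

On this toy data the dictionary entry pulls probability mass for "house" toward "maison", which gives a small picture of how parameters for a sample word can shift once the dictionary is included. A fuller model would likely weight or structure dictionary entries differently from corpus sentences.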

PUBLICATION RECORD

  • Publication year

    1993

  • Venue

    Human Language Technology - The Baltic Perspective

  • Publication date

    1993-03-21

  • Fields of study

    Linguistics, Computer Science

  • Source metadata

    Semantic Scholar
