Although empiricist approaches to machine translation depend vitally on data in the form of large bilingual corpora, bilingual dictionaries are also a source of information. We show how to model at least a part of the information contained in a bilingual dictionary so that we can treat a bilingual dictionary and a bilingual corpus as two facets of a unified collection of data from which to extract values for the parameters of a probabilistic machine translation system. We give an algorithm for obtaining maximum likelihood estimates of the parameters of a probabilistic model from this combined data and we show how these parameters are affected by inclusion of the dictionary for some sample words.
But Dictionaries Are Data Too
P. Brown, S. D. Pietra, V. D. Pietra, Meredith J. Goldsmith, Jan Hajic, R. Mercer, Surya Mohanty
Published 1993 in Human Language Technology - The Baltic Perspectiv
PUBLICATION RECORD
- Publication year: 1993
- Venue: Human Language Technology - The Baltic Perspectiv
- Publication date: 1993-03-21
- Fields of study: Linguistics, Computer Science
- Source metadata: Semantic Scholar