In this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. We detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. We show how sampling can be used to reduce the retrieval time by orders of magnitude with no loss in translation quality.
Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases
Chris Callison-Burch,C. Bannard,Josh Schroeder
Published 2005 in Annual Meeting of the Association for Computational Linguistics
ABSTRACT
PUBLICATION RECORD
- Publication year
2005
- Venue
Annual Meeting of the Association for Computational Linguistics
- Publication date
2005-06-25
- Fields of study
Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-12 of 12 references · Page 1 of 1