Pre-training via Paraphrasing

M. Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida I. Wang, Luke Zettlemoyer

Published 2020 in Neural Information Processing Systems

ABSTRACT

We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization. The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks. For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation. We further show that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date.
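
The retrieve-then-reconstruct idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how relatedness is scored, so cosine similarity over fixed toy embeddings is an assumption here, and the names `cosine`, `retrieve_evidence`, and `reconstruction_nll` are hypothetical. In the paper's setting the retriever and the reconstruction model are learned jointly; in this sketch the embeddings and token probabilities are simply given.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_evidence(target_vec, candidate_vecs, k=2):
    """Return (index, score) pairs for the k candidates most similar to the target."""
    scored = [(i, cosine(target_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def reconstruction_nll(token_probs):
    """Negative log-likelihood of the target tokens.

    token_probs holds the probability a (hypothetical) model assigns to each
    target token when conditioned on the retrieved evidence.
    """
    return -sum(math.log(p) for p in token_probs)


# Toy embeddings, assumed for illustration only.
target = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]

# Pick the two candidates most related to the target document.
evidence = retrieve_evidence(target, candidates, k=2)
```

Minimizing `reconstruction_nll` corresponds to maximizing the likelihood of generating the original text, which is the training signal the abstract describes.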

PUBLICATION RECORD

Publication year: 2020
Venue: Neural Information Processing Systems
Publication date: 2020-06-26
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 2006.15020
Source metadata: Semantic Scholar

REFERENCES

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
2020, cited by this paper
MLSUM: The Multilingual Summarization Corpus
2020, cited by this paper
Scaling Laws for Neural Language Models
2020, cited by this paper
Multilingual Denoising Pre-training for Neural Machine Translation
2020, influential reference
REALM: Retrieval-Augmented Language Model Pre-Training
2020, cited by this paper
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
2020, cited by this paper
A Primer in BERTology: What We Know About How BERT Works
2020, cited by this paper
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
2020, cited by this paper
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the Web
2019, cited by this paper
Unsupervised Cross-lingual Representation Learning at Scale
2019, cited by this paper
MLQA: Evaluating Cross-lingual Extractive Question Answering
2019, cited by this paper
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
2019, cited by this paper
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
2019, cited by this paper
Generalization through Memorization: Nearest Neighbor Language Models
2019, cited by this paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019, influential reference
Cross-lingual Language Model Pretraining
2019, cited by this paper
No Training Required: Exploring Random Encoders for Sentence Classification
2019, cited by this paper
Unified Language Model Pre-training for Natural Language Understanding and Generation
2019, cited by this paper
XLNet: Generalized Autoregressive Pretraining for Language Understanding
2019, cited by this paper
RoBERTa: A Robustly Optimized BERT Pretraining Approach
2019, cited by this paper
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
2019, cited by this paper
Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
2019, cited by this paper
Generating Wikipedia by Summarizing Long Sequences
2018, cited by this paper
A Call for Clarity in Reporting BLEU Scores
2018, cited by this paper
Document-Level Neural Machine Translation with Hierarchical Attention Networks
2018, cited by this paper
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
2018, cited by this paper
Billion-Scale Similarity Search with GPUs
2017, cited by this paper
Generating Sentences by Editing Prototypes
2017, cited by this paper
Unsupervised Neural Machine Translation
2017, cited by this paper
Unsupervised Machine Translation Using Monolingual Corpora Only
2017, cited by this paper
Learned in Translation: Contextualized Word Vectors
2017, cited by this paper
Attention is All you Need
2017, influential reference
Controllable Abstractive Summarization
2017, cited by this paper
Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora
2017, cited by this paper
FastText.zip: Compressing text classification models
2016, cited by this paper
SQuAD: 100,000+ Questions for Machine Comprehension of Text
2016, cited by this paper
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
2016, cited by this paper

CITED BY

SimulRAG: Simulator-based RAG for Grounding LLMs in Long-form Scientific QA
2025, cites this paper
A Systematic Review of Key Retrieval-Augmented Generation (RAG) Systems: Progress, Gaps, and Future Directions
2025, cites this paper
Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web
2025, cites this paper
RWalks: Random Walks as Attribute Diffusers for Filtered Vector Search
2025, cites this paper
CoRet: Improved Retriever for Code Editing
2025, cites this paper
Patchwork: A Unified Framework for RAG Serving
2025, cites this paper
20min-XD: A Comparable Corpus of Swiss News Articles
2025, cites this paper
Rethinking Natural Language Generation with Layer-Wise Multi-View Decoding
2025, cites this paper
PipeRAG: Fast Retrieval-Augmented Generation via Adaptive Pipeline Parallelism
2025, cites this paper
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
2025, influential citation
Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing
2025, cites this paper
Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art
2025, cites this paper
Optimal and Diffusion Transports in Machine Learning
2025, cites this paper
Physical vs. Logical Indexing with IDEA: Inverted Deduplication-Aware Index
2025, cites this paper
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design
2024, influential citation
Pre-training and Diagnosing Knowledge Base Completion Models
2024, cites this paper
A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models
2024, cites this paper
Multi-task learning on mental disorder detection, sentiment analysis, and emotion detection using social media posts
2024, cites this paper
Revisiting the Robustness of Watermarking to Paraphrasing Attacks
2024, cites this paper
Efficient Learned Query Execution over Text and Tables [Technical Report]
2024, cites this paper
Academic Article Recommendation Using Multiple Perspectives
2024, cites this paper
Automation, Trustworthy, Intelligent Prenatal Examinations - I Do!!
2024, cites this paper
Memorization in Deep Learning: A Survey
2024, cites this paper
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
2024, cites this paper
A Lifelong Multilingual Multi-granularity Semantic Alignment Approach via Maximum Co-occurrence Probability
2024, cites this paper
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models
2024, cites this paper
Process Generation Method Based on Retrieval Enhancement
2024, cites this paper
Predicting Fact Contributions from Query Logs with Machine Learning
2024, cites this paper
Reverse Training to Nurse the Reversal Curse
2024, cites this paper
Math Word Problem Generation via Disentangled Memory Retrieval
2024, cites this paper
LAMPAT: Low-Rank Adaption for Multilingual Paraphrasing Using Adversarial Training
2024, cites this paper
The Compute Divide in Machine Learning: A Threat to Academic Contribution and Scrutiny?
2024, cites this paper
Learning to Adapt to Low-Resource Paraphrase Generation
2024, influential citation
Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense
2023, cites this paper
TRIP: Accelerating Document-level Multilingual Pre-training via Triangular Document-level Pre-training on Parallel Data Triplets
2023, cites this paper
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
2023, cites this paper
Auto-Encoding Questions with Retrieval Augmented Decoding for Unsupervised Passage Retrieval and Zero-Shot Question Generation
2023, influential citation
In-Context Pretraining: Language Modeling Beyond Document Boundaries
2023, cites this paper
Automatic Text Summarization Based on Pre-trained Models
2023, cites this paper
Few-shot Named Entity Recognition: Definition, Taxonomy and Research Directions
2023, cites this paper
Complex Reasoning in Natural Language
2023, cites this paper
Revisiting Commonsense Reasoning in Machine Translation: Training, Evaluation and Challenge
2023, cites this paper
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
2023, cites this paper
Building a Chatbot using Natural Language Processing
2023, cites this paper
Retrieval-Augmented Multimodal Language Modeling
2023, influential citation
Decouple knowledge from parameters for plug-and-play language modeling
2023, cites this paper
Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables
2023, cites this paper
Decoupled Context Processing for Context Augmented Language Modeling
2022, influential citation
A Survey of Pretrained Language Models Based Text Generation
2022, cites this paper
A Survey on Retrieval-Augmented Text Generation
2022, cites this paper
Memorizing Transformers
2022, cites this paper
Meta-X_{NLG}: A Meta-Learning Approach Based on Language Clustering for Zero-Shot Cross-Lingual Transfer and Generation
2022, influential citation
Non-Parallel Text Style Transfer with Self-Parallel Supervision
2022, influential citation
kNN-NER: Named Entity Recognition with Nearest Neighbor Search
2022, cites this paper
MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages
2022, cites this paper
Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization
2022, influential citation
Mention Memory: Incorporating Textual Knowledge into Transformers through Entity Mention Attention
2022, influential citation
Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder
2022, cites this paper
Few-shot Mining of Naturally Occurring Inputs and Outputs
2022, cites this paper
OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval
2022, cites this paper
BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla
2022, cites this paper
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
2022, cites this paper
Augmenting Message Passing by Retrieving Similar Graphs
2022, cites this paper
Learning Video-Text Aligned Representations for Video Captioning
2022, cites this paper
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering
2022, influential citation
An Empirical Study of Retrieval-Enhanced Graph Neural Networks
2022, cites this paper
Universal Multi-Modality Retrieval with One Unified Embedding Space
2022, cites this paper
From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough
2022, cites this paper
Towards Multilingual Transitivity and Bidirectional Multilingual Agreement for Multilingual Document-level Machine Translation
2022, influential citation
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering
2022, cites this paper
VTC: Improving Video-Text Retrieval with User Comments
2022, cites this paper
LAFT: Cross-lingual Transfer for Text Generation by Language-Agnostic Finetuning
2022, cites this paper
BARTSmiles: Generative Masked Language Models for Molecular Representations
2022, cites this paper
GNN-SL: Sequence Labeling Based on Nearest Examples via GNN
2022, influential citation
Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models
2022, cites this paper
BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla
2022, influential citation
Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval
2022, cites this paper
Revamping Multilingual Agreement Bidirectionally via Switched Back-translation for Multilingual Neural Machine Translation
2022, cites this paper
Let Your Heart Speak in its Mother Tongue: Multilingual Captioning of Cardiac Signals
2021, influential citation
Hurdles to Progress in Long-form Question Answering
2021, influential citation
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
2021, cites this paper
Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy
2021, cites this paper
DFKI SLT at GermEval 2021: Multilingual Pre-training and Data Augmentation for the Classification of Toxicity in Social Media Comments
2021, cites this paper
RetroNLU: Retrieval Augmented Task-Oriented Semantic Parsing
2021, influential citation
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
2021, cites this paper
Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
2021, cites this paper
The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design
2021, cites this paper
Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research
2021, cites this paper
GNN-LM: Language Modeling based on Global Contexts via GNN
2021, cites this paper
Reusing Monolingual Pre-Trained Models by Cross-Connecting Seq2seq Models for Machine Translation
2021, cites this paper
ODIST: Open World Classification via Distributionally Shifted Instances
2021, influential citation
Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
2021, cites this paper
When Curation Becomes Creation
2021, cites this paper
Structure-inducing pre-training
2021, cites this paper
WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models
2021, cites this paper
RetGen: A Joint Framework for Retrieval and Grounded Text Generation Modeling
2021, influential citation
Towards Unsupervised Dense Information Retrieval with Contrastive Learning
2021, cites this paper
DOCmT5: Document-Level Pretraining of Multilingual Language Models
2021, cites this paper
Analyzing the Limits of Self-Supervision in Handling Bias in Language
2021, cites this paper