SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, H. Karnick

Published 2016 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

We present a feature vector formation technique for documents, Sparse Composite Document Vector (SCDV), which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation. In SCDV, word embeddings are clustered to capture the multiple semantic contexts in which words occur. They are then chained together to form document topic-vectors that can express complex, multi-topic documents. Through extensive experiments on multi-class and multi-label classification tasks, we outperform the previous state-of-the-art method, NTSG. We also show that SCDV embeddings perform well on heterogeneous tasks such as topic coherence, context-sensitive learning, and information retrieval. Moreover, we achieve a significant reduction in training and prediction times compared to other representation methods. SCDV achieves the best of both worlds: better performance with lower time and space complexity.
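The pipeline described in the abstract (softly cluster word embeddings, chain the per-cluster pieces, combine them into a sparse document vector) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the toy 2-d embeddings, the fixed centroids, and the absolute sparsity threshold are all hypothetical. The soft assignment here is a softmax over negative squared distances to fixed centroids, swapped in for whatever clustering model the paper actually fits.

```python
import math

def soft_assign(vec, centroids):
    """Soft cluster membership: softmax over negative squared distances.
    Assumption: this stands in for a fitted soft-clustering model."""
    scores = [-sum((v - c) ** 2 for v, c in zip(vec, cen)) for cen in centroids]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def word_topic_vector(vec, centroids):
    """Chain (concatenate) the word vector scaled by each cluster probability."""
    probs = soft_assign(vec, centroids)
    return [p * v for p in probs for v in vec]

def document_vector(words, embeddings, centroids, threshold=0.2):
    """Sum word-topic vectors over the document, then zero out small entries.
    Assumption: a plain absolute threshold is used to induce sparsity."""
    dim = len(centroids) * len(centroids[0])
    doc = [0.0] * dim
    for word in words:
        if word not in embeddings:
            continue  # skip out-of-vocabulary words
        for i, x in enumerate(word_topic_vector(embeddings[word], centroids)):
            doc[i] += x
    return [x if abs(x) >= threshold else 0.0 for x in doc]

# Toy data, made up for illustration only.
embeddings = {"goal": [1.0, 0.0], "vote": [0.0, 1.0]}
centroids = [[1.0, 0.0], [0.0, 1.0]]
doc = document_vector(["goal", "vote", "unknown"], embeddings, centroids)
```

With K clusters and d-dimensional embeddings, the resulting document vector has K·d entries, which is why zeroing out small values matters for keeping it compact.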

PUBLICATION RECORD

  • Publication year

    2016

  • Venue

    Conference on Empirical Methods in Natural Language Processing

  • Publication date

    2016-12-20

  • Fields of study

    Computer Science

  • Source metadata

    Semantic Scholar
