Computational Measures for Language Similarity Across Time in Online Communities
David A. Huffaker, Joseph Jorgensen, Francisco Iacobelli, Paul Tepper, Justine Cassell
Published 2006 in HLT-NAACL 2006
ABSTRACT
This paper examines language similarity in messages over time in an online community of adolescents from around the world using three computational measures: Spearman's Correlation Coefficient, Zipping, and Latent Semantic Analysis. Results suggest that the participants' language diverges over a six-week period, and that this divergence is not mediated by demographic variables such as leadership status or gender. The divergence may reflect the introduction of more unique words over time, and it is influenced by a continual change in subtopics as well as by community-wide historical events that introduce new vocabulary in later time periods. Our results highlight both the possibilities and the shortcomings of using document similarity measures to assess convergence in language use.
PUBLICATION RECORD
- Publication date
2006-06-08
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar