{"corpus_id":249431834,"paper_sha":"0f72e329d3b0f1cfe388d102ef5fec0677ac7558","doi":"10.1145/3514094.3534162","arxiv_id":"2206.03390","pmid":null,"pmcid":null,"mag_id":null,"dblp_id":"conf/aies/CaliskanACWB22","acl_id":null,"title":"Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics","year":2022,"publication_date":"2022-06-07","venue":"AAAI/ACM Conference on AI, Ethics, and Society","journal":{"name":"Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society","pages":null,"volume":null},"journal_issn":null,"journal_title":null,"publication_types":["JournalArticle","Book"],"pubmed_pub_types":null,"s2_fields_of_study":["Linguistics","Computer Science"],"reference_count":61,"citation_count":74,"influential_citation_count":1,"is_open_access":true,"arxiv_categories":["cs.CY","cs.AI","cs.CL","cs.LG"],"arxiv_license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","arxiv_journal_ref":null,"mesh_headings":null,"chemicals":null,"comments_corrections":null,"source_flags":1,"s2_open_access_pdf_url":"https://dl.acm.org/doi/pdf/10.1145/3514094.3534162","s2_open_access_landing_url":"https://www.semanticscholar.org/paper/0f72e329d3b0f1cfe388d102ef5fec0677ac7558","s2_open_access_license":null,"s2_open_access_status":"BRONZE","pmc_open_access_pdf_url":null,"pmc_open_access_landing_url":null,"pmc_open_access_license":null,"pmc_open_access_status":null,"unpaywall_open_access_pdf_url":null,"unpaywall_open_access_landing_url":null,"unpaywall_open_access_license":null,"unpaywall_open_access_status":null,"abstract":"Word embeddings are numeric representations of meaning derived from word co-occurrence statistics in corpora of human-produced texts. The statistical regularities in language corpora encode well-known social biases into word embeddings (e.g., the word vector for family is closer to the vector women than to men). 
Although efforts have been made to mitigate bias in word embeddings, with the hope of improving fairness in downstream Natural Language Processing (NLP) applications, these efforts will remain limited until we more deeply understand the multiple (and often subtle) ways that social biases can be reflected in word embeddings. Here, we focus on gender to provide a comprehensive analysis of group-based biases in widely-used static English word embeddings trained on internet corpora (GloVe 2014, fastText 2017). While some previous research has helped uncover biases in specific semantic associations between a group and a target domain (e.g., women - family), using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender biases that also show differences in: (a) frequencies of words associated with men versus women; (b) part-of-speech tags in gender-associated words; (c) semantic categories in gender-associated words; and (d) valence, arousal, and dominance in gender-associated words. We leave the analysis of non-binary gender to future work due to the challenges in accurate group representation caused by limitations inherent in data. First, in terms of word frequency: we find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than women, providing direct evidence of a masculine default in the everyday language of the English-speaking world. Second, turning to parts-of-speech: the top male-associated words are typically verbs (e.g., fight, overpower) while the top female-associated words are typically adjectives and adverbs (e.g., giving, emotionally). Gender biases in embeddings also permeate parts-of-speech. Third, for semantic categories: bottom-up, cluster analyses of the top 1,000 words associated with each gender. 
The top male-associated concepts include roles and domains of big tech, engineering, religion, sports, and violence; in contrast, the top female-associated concepts are less focused on roles, including, instead, female-specific slurs and sexual content, as well as appearance and kitchen terms. Fourth, using human ratings of word valence, arousal, and dominance from a ~20,000 word lexicon, we find that male-associated words are higher on arousal and dominance, while female-associated words are higher on valence. Ultimately, these findings move the study of gender bias in word embeddings beyond the basic investigation of semantic relationships to also study gender differences in multiple manifestations in text. Given the central role of word embeddings in NLP applications, it is essential to more comprehensively document where biases exist and may remain hidden, allowing them to persist without our awareness throughout large text corpora.","claims":[{"public_id":"cl_b5f46ce7fdad0556213d2d934b6d2791","status":"active","text":"Cluster analyses of the top 1,000 words associated with each gender identify male-associated concepts involving big tech, engineering, religion, sports, and violence, while female-associated concepts include female-specific slurs, sexual content, appearance, and kitchen terms.","confidence":0.94,"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_b5f46ce7fdad0556213d2d934b6d2791"},{"public_id":"cl_735129dd7472d6b060102e5cfff2a558","status":"active","text":"Male-associated words are higher on arousal and dominance, while female-associated words 
are higher on valence.","confidence":0.97,"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_735129dd7472d6b060102e5cfff2a558"},{"public_id":"cl_746768acd3bdb0cf9c0983b4b694696b","status":"active","text":"Of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than women, supporting a masculine default in everyday English language.","confidence":0.98,"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_746768acd3bdb0cf9c0983b4b694696b"},{"public_id":"cl_13dfae85c97fbcf22ef5f649543d329f","status":"active","text":"Top male-associated words are typically verbs, while top female-associated words are typically adjectives and adverbs.","confidence":0.96,"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale 
(322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_13dfae85c97fbcf22ef5f649543d329f"},{"public_id":"cl_f74806f7e1904d92f0997f95edfad064","status":"active","text":"Widely-used static English word embeddings trained on internet corpora show widespread gender biases across word frequency, part-of-speech tags, semantic categories, and valence, arousal, and dominance.","confidence":0.95,"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/claims/cl_f74806f7e1904d92f0997f95edfad064"}],"concepts":[{"public_id":"co_15d2500c16eb2f354368ce30bfa8cdea","status":"active","name":"internet corpora","description":"Text collections from the internet used to train the analyzed word embeddings.","types":["dataset"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_15d2500c16eb2f354368ce30bfa8cdea"},{"public_id":"co_20b60578d1899c0448a85e4c6b2d8d27","status":"active","name":"GloVe 2014","description":"A widely-used static English word embedding model trained on internet corpora and analyzed for gender 
bias.","types":["model"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_20b60578d1899c0448a85e4c6b2d8d27"},{"public_id":"co_44cb2df7ee3aad86e2bd45e67187ea11","status":"active","name":"Single-Category Word Embedding Association Test","description":"A word embedding association test used to measure associations between one target category and attribute words.","types":["method"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_44cb2df7ee3aad86e2bd45e67187ea11"},{"public_id":"co_4ebc95bed923ff420749a85462c79edd","status":"active","name":"valence, arousal, and dominance","description":"Human-rated affective dimensions used to characterize gender-associated words.","types":["measure"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale 
(322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_4ebc95bed923ff420749a85462c79edd"},{"public_id":"co_5de7032531c02229a026ad63c091f542","status":"active","name":"word embeddings","description":"Numeric representations of word meaning derived from word co-occurrence statistics in human-produced text corpora.","types":["representation"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_5de7032531c02229a026ad63c091f542"},{"public_id":"co_85a44e646e384c3ec90ec44de05c55a0","status":"active","name":"fastText 2017","description":"A widely-used static English word embedding model trained on internet corpora and analyzed for gender bias.","types":["model"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_85a44e646e384c3ec90ec44de05c55a0"},{"public_id":"co_864a900fee386652c85aee011a583aaa","status":"active","name":"semantic categories","description":"Meaning-based groupings of words associated with men or women in the analyzed embeddings.","types":["semantic feature"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous 
(3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_864a900fee386652c85aee011a583aaa"},{"public_id":"co_a78ee239b989f682aa81518f845cfa40","status":"active","name":"static English word embeddings","description":"English word vector models analyzed in fixed form rather than contextualized at use time.","types":["model"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_a78ee239b989f682aa81518f845cfa40"},{"public_id":"co_b00967bae7382ccbb2ed2c0c156262a8","status":"active","name":"masculine default","description":"A pattern in everyday English in which frequent words are more strongly associated with men than women.","types":["phenomenon"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale 
(322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_b00967bae7382ccbb2ed2c0c156262a8"},{"public_id":"co_d238b72d23f49d693e7e8dd2f0ae4f0e","status":"active","name":"part-of-speech tags","description":"Grammatical labels such as verbs, adjectives, and adverbs assigned to gender-associated words.","types":["linguistic feature"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_d238b72d23f49d693e7e8dd2f0ae4f0e"},{"public_id":"co_d8a384b1f6a33b6714767a9749037cc1","status":"active","name":"gender biases","description":"Group-based differences in associations between gender terms and other words within the analyzed embeddings.","types":["phenomenon"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_d8a384b1f6a33b6714767a9749037cc1"},{"public_id":"co_f0583e27e7d04aeb506752502f15a134","status":"active","name":"cluster analyses","description":"Bottom-up analyses used to group highly gender-associated words into semantic concepts.","types":["method"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous 
(3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_f0583e27e7d04aeb506752502f15a134"},{"public_id":"co_fa83e1b7397b4669d800ced171e8ea4d","status":"active","name":"gender-associated words","description":"Words identified as more strongly associated with men or women in the embedding space.","types":["lexical set"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale (322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_fa83e1b7397b4669d800ced171e8ea4d"},{"public_id":"co_fd8f6703378293fbcd7b04862fe08c25","status":"active","name":"word frequency","description":"The frequency with which words occur in the embedding vocabulary or source language data.","types":["measure"],"aliases":[],"contributors":[{"id":136,"public_id":"3c2apqe3ut","public_label":"Anonymous (3c2apqe3ut)","roles":["extraction"],"url":"https://sah.borca.ai/u/3c2apqe3ut"},{"id":2,"public_id":"4715169a40","public_label":"AK (4715169a40)","roles":["review"],"url":"https://sah.borca.ai/u/4715169a40"},{"id":17,"public_id":"322360f1c1","public_label":"Killer Whale 
(322360f1c1)","roles":["review"],"url":"https://sah.borca.ai/u/322360f1c1"}],"url":"https://sah.borca.ai/concepts/co_fd8f6703378293fbcd7b04862fe08c25"}],"external_ids":{"DOI":"10.1145/3514094.3534162","ArXiv":"2206.03390","PubMed":null,"PubMedCentral":null,"MAG":null,"DBLP":"conf/aies/CaliskanACWB22","ACL":null},"open_access":{"is_open_access":true,"pdf_url":"https://dl.acm.org/doi/pdf/10.1145/3514094.3534162","landing_url":"https://www.semanticscholar.org/paper/0f72e329d3b0f1cfe388d102ef5fec0677ac7558","source":"semantic_scholar","pdf_url_source":"semantic_scholar_open_access_pdf","license":null,"status":"BRONZE","reason":null},"reference_availability":{"status":"available","references_indexed":true,"full_text_available":true,"full_text_source":"arxiv","count_basis":"semantic_scholar_metadata","extraction_status":"not_applicable","reason":null},"source":{"provider":"episteme2","base_corpus":"semantic_scholar_dump","freshness_mode":"unknown","basis":["semantic_scholar_metadata","postgres_metadata"],"limits":["paper metadata is based on indexed upstream scholarly datasets","claims and concepts are available only for extracted papers","absence of claims or concepts means no extracted graph data is available in this response"],"status":"available","degraded":false,"degraded_reasons":[],"diagnostics":{"status":"available","degraded":false,"degraded_reasons":[],"metadata_status":"available","graph_status":"available","abstract_status":"available"},"source_flags":1},"paper_id":631369,"paper_uid":"a3ca4232-16a4-430f-b921-519e853c1686","canonical_identity":{"paper_id":631369,"paper_uid":"a3ca4232-16a4-430f-b921-519e853c1686","identity_status":"available","lookup_basis":"semantic_scholar_external_id","compatibility_path":"corpus_id"},"url":"https://sah.borca.ai/papers/249431834"}