Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence

Authors

Navid Rekabsaz,Robert West,James Henderson,Allan Hanbury

Johannes Kepler University, Linz Institute of Technology, AI Lab,EPFL,Idiap Research Institute,TU Wien, Complexity Science Hub Vienna

Proceedings:

Vol. 15 (2021): Fifteenth International AAAI Conference on Web and Social Media

Volume

Issue:

Vol. 15 (2021): Fifteenth International AAAI Conference on Web and Social Media

Track:

Full Papers

Downloads:

Download PDF

Abstract:

Text corpora are widely used resources for measuring societal biases and stereotypes. The common approach to measuring such biases using a corpus is by calculating the similarities between the embedding vector of a word (like nurse) and the vectors of the representative words of the concepts of interest (such as genders). In this study, we show that, depending on what one aims to quantify as bias, this commonly-used approach can introduce non-relevant concepts into bias measurement. We propose an alternative approach to bias measurement utilizing the smoothed first-order co-occurrence relations between the word and the representative concept words, which we derive by reconstructing the co-occurrence estimates inherent in word embedding models. We compare these approaches by conducting several experiments on the scenario of measuring gender bias of occupational words, according to an English Wikipedia corpus. Our experiments show higher correlations of the measured gender bias with the actual gender bias statistics of the U.S. job market – on two collections and with a variety of word embedding models – using the first-order approach in comparison with the vector similarity-based approaches. The first-order approach also suggests a more severe bias towards female in a few specific occupations than the other approaches.

DOI:

10.1609/icwsm.v15i1.18083

ICWSM

Vol. 15 (2021): Fifteenth International AAAI Conference on Web and Social Media

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.