Mascha Kurpicz-Briki, Tomaso Leoni.
Abstract
Publicly available off-the-shelf word embeddings, which are often used in production applications for natural language processing, have been shown to be biased. We have previously shown that this bias can take different forms, depending on the language and the cultural context. In this work, we extend our previous work and further investigate how bias varies across languages. We examine Italian and Swedish word embeddings for gender and origin bias, and demonstrate that German word embeddings contain an origin bias concerning local migration groups in Switzerland. We propose BiasWords, a method to automatically detect new forms of bias. Finally, we discuss how cultural and language aspects are relevant to the impact of bias on the application and to potential mitigation measures.
Keywords: bias; digital ethics; fairness; gender; language models; natural language processing; training data; word embeddings
Year: 2021 PMID: 34151257 PMCID: PMC8209512 DOI: 10.3389/fdata.2021.625290
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
Size comparison of the different fastText word embeddings used in this paper.
| Zip file (GB) | 1.33 | 1.28 | 1.29 | 1.27 | 1.25 | 0.0654 |
| Wikipedia articles | 6′161′892 | 2′481′753 | 2′251′004 | 1′636′253 | 3′675′493 | 3′695 |
| CC share percentage | 43.7532 | 5.5476 | 4.6213 | 2.4019 | 0.7487 | 0.0014 |
The terms from the original WEAT5 experiment (Caliskan et al., 2017) and our adaptations/translations to Italian and Swedish.
| | English (original) | Italian | Swedish |
| --- | --- | --- | --- |
| Group 1 | Brad, Brendan, Geoffrey, Greg, Brett, Jay, Matthew, Neil, Todd, Allison, Anne, Carrie, Emily, Jill, Laurie, Kristen, Meredith, Sarah | Andrea, Francesco, Alessandro, Matteo, Luca, Martina, Alessia, Giulia, Chiara, Sara | Lucas, Liam, William, Elias, Noah, Hugo, Oliver, Oscar, Adam, Alice, Olivia, Astrid, Maja, Vera, Ebba, Ella, Wilma, Alma |
| Group 2 | Darnell, Hakim, Jermaine, Kareem, Jamal, Leroy, Rasheed, Tremayne, Tyrone, Aisha, Ebony, Keisha, Kenya, Latonya, Lakisha, Latoya, Tamika, Tanisha | Kevin, Denise, Thomas, Aaron, Jennifer, Anita, Gabriell, Michael, Thiago, Ivan, Iris, Santiago, Igor, William, Sharon, Abigail | Muhammad, Sai, Advik, Rudra, Arya, Saanvi, Maryam, Amaira, Hussain, Omar, Usman, Khadija, Zuleikha, Fatima, Farid, Hassan, Amira, Iman |
| Pleasant | Joy, love, peace, wonderful, pleasure, friend, laughter, happy | Gioia, amore, pace, incredibile, piacere, amichevole, ridere, felice | glädje, förälskad, fred, ofattbar, nöje, vän, skratt, glad |
| Unpleasant | Agony, terrible, horrible, nasty, evil, war, awful, failure | Agonia, terribile, orribile, sgradevole, crudele, guerra, terrificante, fallimento | kval, hemsk, förskräcklig, otrevlig, ondska, krig, skrämmande, fel |
The terms from the original WEAT6 experiment (Caliskan et al., 2017) and our adaptations/translations to Italian and Swedish.
| | English (original) | Italian | Italian | Swedish |
| --- | --- | --- | --- | --- |
| Group 1 | John, Paul, Mike, Kevin, Steve, Greg, Jeff, Bill | Andrea, Francesco, Alessandro, Matteo, Luca, Lorenzo, Marco, Davide, Simone, Giuseppe | Marco, Luca, Andrea, Giuseppe, Alessandro, Francesco, Antonio, Roberto | Lucas, Liam, William, Elias, Noah, Hugo, Oliver, Oscar |
| Group 2 | Amy, Joan, Lisa, Sarah, Diana, Kate, Ann, Donna | Martina, Alessia, Giulia, Chiara, Sara, Francesca, Frederica, Giorgia, Anna, Elisa | Maria, Anna, Daniela, Sara, Laura, Elena, Francesca, Giulia | Alice, Olivia, Astrid, Maja, Vera, Ebba, Ella, Wilma |
| Career | Executive, management, professional, corporation, salary, office, business, career | dirigente, gestione, professionale, corporazione, salario, ufficio, affari, carriera | dirigente, gestione, professionale, corporazione, salario, ufficio, affari, carriera | ledare, företagsledning, professionell, bolag, lön, byrå, företag, karriär |
| Family | Home, parents, children, family, cousins, marriage, weddings, relatives | casa, genitori, bambini, famiglia, cugini, matrimonio, nozze, parenti | casa, genitori, bambini, famiglia, cugini, matrimonio, nozze, parenti | hem, föräldrar, barn, familj, kusiner, gifte, bröllop, släkt |
For Italian, a Swiss and an Italian version of the experiment were used.
The terms from the original WEAT7 experiment (Caliskan et al., 2017) and our adaptations/translations to Italian and Swedish.
| | English (original) | Italian | Swedish |
| --- | --- | --- | --- |
| Math | Math, algebra, geometry, calculus, equations, computation, numbers, addition | matematica, algebra, geometria, calcolo, equazioni, computo, numeri, addizione | matematik, algebra, geometri, kalkyl, ekvation, beräkning, siffror, addition |
| Arts | Poetry, art, dance, literature, novel, symphony, drama, sculpture | poesia, arte, danza, letteratura, romanzo, sinfonia, dramma, scultura | poesi, konst, dans, litteratur, roman, symfoni, drama, skulptur |
| Male terms | Male, man, boy, brother, he, him, his, son | maschio, uomo, ragazzo, fratello, egli, lui, suo, figlio | manlig, man, pojke, bror, han, honom, hans, son |
| Female terms | Female, woman, girl, sister, she, her, hers, daughter | femmina, donna, ragazza, sorella, ella, lei, suo, figlia | kvinnlig, kvinna, flicka, syster, hon, henne, hennes, dotter |
The terms from the original WEAT8 experiment (Caliskan et al., 2017) and our adaptations/translations to Italian and Swedish.
| | English (original) | Italian | Swedish |
| --- | --- | --- | --- |
| Science | Science, technology, physics, chemistry, Einstein, NASA, experiment, astronomy | scienza, tecnologia, fisica, chimica, Einstein, NASA, esperimento, astronomia | vetenskap, teknologi, fysik, kemi, Einstein, NASA, försök, astronomi |
| Arts | Poetry, art, Shakespeare, dance, literature, novel, symphony, drama | poesia, arte, Shakespeare, danza, letteratura, romanzo, sinfonia, dramma | poesi, konst, Shakespeare, dans, litteratur, roman, symfoni, drama |
| Male terms | Brother, father, uncle, grandfather, son, he, his, him | fratello, padre, zio, nonno, figlio, egli, suo, lui | bror, far, farbror, farfar, son, han, hans, honom |
| Female terms | Sister, mother, aunt, grandmother, daughter, she, hers, her | sorella, madre, zia, nonna, figlia, ella, suo, lei | syster, mor, moster, mormor, dotter, hon, hennes, henne |
The terms from the new experiments investigating origin bias in Switzerland on German word embeddings.
| | WEAT5-origin | WEAT6-origin-m | WEAT6-origin-f |
| --- | --- | --- | --- |
| Group 1 | Peter, Daniel, Hans, Thomas, Andreas, Martin, Markus, Michael, Maria, Anna, Ursula, Ruth, Monika, Elisabeth, Verena, Sandra | Peter, Daniel, Hans, Thomas, Andreas, Martin, Markus, Michael | Maria, Anna, Ursula, Ruth, Monika, Elisabeth, Verena, Sandra |
| Group 2 | Fatime, Bajram, Emine, Bekim, Aferdita, Valon, Egzon, Luljeta, Stojan, Marija, Snežana, Aleksandar, Mehmet, Mustafa, Fatma, Ayşe | Valon, Egzon, Stojan, Aleksandar, Mehmet, Mustafa, Bajram, Bekim | Fatime, Emine, Aferdita, Luljeta, Marija, Snežana, Fatma, Ayşe |
| Positive | Spass, Liebe, Frieden, wunderbar, Freude, Lachen, glücklich | Führungskraft, Verwaltung, beruflich, Konzern, Gehalt, Büro, Geschäft, Werdegang | Führungskraft, Verwaltung, beruflich, Konzern, Gehalt, Büro, Geschäft, Werdegang |
| Negative | Qual, furchtbar, schrecklich, übel, böse, Krieg, scheusslich, Versagen | versagen, Abbruch, Armut, arbeitslos, Sozialhilfe, untätig, unqualifiziert, Last | versagen, Abbruch, Armut, arbeitslos, Sozialhilfe, untätig, unqualifiziert, Last |
Results of the validation for Italian and Swedish.
| Experiment | p | d | |
| --- | --- | --- | --- |
| WEAT5-ita | 0.01209 | 0.870561014 | ✓ |
| WEAT6-ita | 0.01865 | 3.084297864 | ✓ |
| WEAT6-ita-ch | 0.00115 | 3.136910288 | ✓ |
| WEAT7-ita | 0.04995 | 0.155938722 | ✓ |
| WEAT8-ita | 0.4916 | 0.505244366 | × |
| WEAT5-swe | 0.0396 | 1.575493266 | ✓ |
| WEAT6-swe | 0.12559 | 3.74003113 | × |
| WEAT7-swe | < 10⁻³ | 0.23436922 | ✓ |
| WEAT8-swe | 0.00185 | 0.460947275 | ✓ |
We report p-values (p) and absolute value of effect size (d).
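For readers unfamiliar with the two reported quantities, the following is a minimal sketch of how a WEAT effect size and permutation-test p-value can be computed, following the definitions in Caliskan et al. (2017). It is not the authors' implementation: the function names are hypothetical, the toy 2-D vectors stand in for real fastText embeddings, the use of the sample standard deviation (`ddof=1`) is an assumption, and the p-value is approximated by random sampling of partitions rather than full enumeration.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # s(w, A, B): mean similarity of w to attribute set A minus that to B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Difference of mean associations of the two target sets,
    # normalised by the standard deviation over all target words.
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)  # ddof=1 is an assumption

def weat_p_value(X, Y, A, B, n_perm=10_000, seed=0):
    # One-sided permutation test: share of random equal-size re-partitions
    # of X and Y whose test statistic exceeds the observed one.
    rng = np.random.default_rng(seed)
    s = np.array([association(w, A, B) for w in list(X) + list(Y)])
    n = len(X)
    observed = s[:n].sum() - s[n:].sum()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(s))
        if s[perm[:n]].sum() - s[perm[n:]].sum() > observed:
            count += 1
    return count / n_perm

# Toy data (hypothetical): X lies close to A, Y lies close to B.
A = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
B = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
X = [np.array(v) for v in ([1.0, 0.1], [0.9, 0.2], [1.0, 0.05], [0.8, 0.1])]
Y = [np.array(v) for v in ([0.1, 1.0], [0.2, 0.9], [0.05, 1.0], [0.1, 0.8])]

d = weat_effect_size(X, Y, A, B)
p = weat_p_value(X, Y, A, B, n_perm=2000)
```

In the tables, the absolute value of d is reported, so the sign convention (which target set is listed first) does not affect the reported magnitude.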
Results of the new experiments investigating origin bias in Switzerland on German word embeddings.
| Experiment | p | d | |
| --- | --- | --- | --- |
| WEAT5-origin | 0.00192 | 1.027370342 | ✓ |
| WEAT6-origin-m | 0.00026 | 0.021395449 | ✓ |
| WEAT6-origin-f | 0.00302 | 0.047869917 | ✓ |
We report p-values (p) and absolute value of effect size (d).
Results of the new word sets created with the BiasWords method and its validation using the WEAT method.
| Group 1 | Mikrosensorik, Verkehrsingenieurwesen, Mikrosystemtechnik, Experimentalphysik, Kommunikationsinformatik, Chemieingenieurwesens, Biophysik, Wirtschaftsingenieurwesens | Sternenkunde, Nuklearchemie, US-Weltraumbehörde, Techologie, Kernphysik, Goenner, Planetenbeobachtung, Amateurastronomie | Handlungswissen, Alltagswissen, Menschengeist, Moralität, Lebenshauch, Instinkt, Seele, Hirn |
| Group 2 | Pferdemedizin, Ernährungswissenschaften, Entwicklungswissenschaft, Bildungswissenschaft, Bildungswissenschaften, Tierärzte, Kleintiermedizin, Biopsychologie | Volkstanz, Drama, Nicht-Kunst, poetischen, Getanzte, Hip-Hop-Tanz, Comicliteratur, Liebesroman | Gläubigkeit, Affektion, Wahrnehmungsleistungen, Verständigkeit, Sensibilität, Abstraktionsfähigkeit, Volksreligiosität, Wahrnehmungskompetenz |
| Target 1 | Onkel, Bauernjunge, Knabe, Jugendlicher, Jugendliche, Eheman, Schwiegervater, Stiefsohn | Onkel, Cousin, Enkelsohn, Kumpel, Opa, Urgrossonkel, Grossvater, Schulfreund | Jugendlicher, Sohn, Schwager, Schwiegervater, Neffe, Bruder, Schwestersohn, Bauernjunge |
| Target 2 | Halbschwester, Teenager, Christine, Partnerin, Lebensgefährtin, Ärztin, Enkelin, Geschwister | Tochter, Enkeltochter, Ur-Grossmutter, Gattin, Mitschwester, Eltern, Nachbarin, Schwägerin | Freundin, Jungen, Baby-Mädchen, Tochter, Frau, Mutter, Kollegin, Mächen |
| p-value (p) | 0.00757 | 0.04704 | 0.00192 |
| Effect Size (d) | 2.17228637 | 0.71880645 | 0.48409739 |
We report p-values (p) and absolute value of effect size (d).