Abstract
Objective: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format.
Keywords: Chinese; accuracy; gender detection; misclassification; name; name-to-gender; performance
Year: 2022 PMID: 35440899 PMCID: PMC9014919 DOI: 10.5195/jmla.2022.1289
Source DB: PubMed Journal: J Med Libr Assoc ISSN: 1536-5050
Confusion matrix showing six possible classification outcomes
| | Female (predicted) | Male (predicted) | Unknown (predicted) |
|---|---|---|---|
| Female (actual) | ff | fm | fu |
| Male (actual) | mf | mm | mu |
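The four performance metrics reported for the tools can be written in terms of these six cells. The definitions are not stated in this excerpt, so the forms below are an assumption; they do reproduce the reported Gender API values when applied to its confusion matrix counts.

```latex
\begin{align*}
\text{errorCoded} &= \frac{\mathit{fm} + \mathit{mf} + \mathit{fu} + \mathit{mu}}{\mathit{ff} + \mathit{fm} + \mathit{fu} + \mathit{mf} + \mathit{mm} + \mathit{mu}} \\
\text{errorCodedWithoutNA} &= \frac{\mathit{fm} + \mathit{mf}}{\mathit{ff} + \mathit{fm} + \mathit{mf} + \mathit{mm}} \\
\text{naCoded} &= \frac{\mathit{fu} + \mathit{mu}}{\mathit{ff} + \mathit{fm} + \mathit{fu} + \mathit{mf} + \mathit{mm} + \mathit{mu}} \\
\text{errorGenderBias} &= \frac{\mathit{mf} - \mathit{fm}}{\mathit{ff} + \mathit{fm} + \mathit{mf} + \mathit{mm}}
\end{align*}
```

Under these assumed definitions, a negative errorGenderBias means that women are misclassified as men more often than men are misclassified as women.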
Confusion matrices for gender detection tools (n=20,000 given names)
| Gender detection tool | Actual gender | Classified as women, n (%) | Classified as men, n (%) | Not classified, n (%) |
|---|---|---|---|---|
| Gender API | Women | 1,836 (22.8) | 3,066 (38.1) | 3,142 (39.1) |
| Gender API | Men | 1,974 (16.5) | 5,173 (43.3) | 4,809 (40.2) |
| NamSor | Women | 1,545 (19.2) | 4,869 (60.5) | 1,630 (20.3) |
| NamSor | Men | 1,635 (13.7) | 7,950 (66.5) | 2,371 (19.8) |
| Wiki-Gendersort | Women | 806 (10.0) | 771 (9.6) | 6,467 (80.4) |
| Wiki-Gendersort | Men | 941 (7.9) | 1,140 (9.5) | 9,875 (82.6) |
Performance metrics for gender detection tools (n=20,000 given names)
| Gender detection tool | errorCoded | errorCodedWithoutNA | naCoded | errorGenderBias |
|---|---|---|---|---|
| Gender API | 0.6496 | 0.4183 | 0.3976 | −0.0906 |
| NamSor | 0.5253 | 0.4065 | 0.2001 | −0.2021 |
| Wiki-Gendersort | 0.9027 | 0.4680 | 0.8171 | 0.0465 |
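As a rough check on how these metrics relate to the confusion matrices, the sketch below recomputes them for Gender API from its full-sample counts (n=20,000). The formulas are assumed rather than taken from this excerpt, and the `ConfusionMatrix` class and its method names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Six outcome counts: first letter = actual gender, second = prediction.

    f = female, m = male, u = unknown (not classified).
    Hypothetical helper; the metric formulas below are assumed definitions.
    """
    ff: int
    fm: int
    fu: int
    mf: int
    mm: int
    mu: int

    @property
    def total(self) -> int:
        # All names, including those the tool left unclassified.
        return self.ff + self.fm + self.fu + self.mf + self.mm + self.mu

    @property
    def classified(self) -> int:
        # Only names that received a female or male prediction.
        # Note: the ratio methods below fail if this is zero.
        return self.ff + self.fm + self.mf + self.mm

    def error_coded(self) -> float:
        # Misclassifications plus non-classifications, over all names.
        return (self.fm + self.mf + self.fu + self.mu) / self.total

    def error_coded_without_na(self) -> float:
        # Misclassifications over classified names only.
        return (self.fm + self.mf) / self.classified

    def na_coded(self) -> float:
        # Share of names left unclassified.
        return (self.fu + self.mu) / self.total

    def error_gender_bias(self) -> float:
        # Negative when women are misclassified as men more often
        # than men are misclassified as women.
        return (self.mf - self.fm) / self.classified


# Gender API counts from the full-sample confusion matrix (n=20,000).
gender_api = ConfusionMatrix(ff=1836, fm=3066, fu=3142, mf=1974, mm=5173, mu=4809)

metrics = {
    "errorCoded": gender_api.error_coded(),
    "errorCodedWithoutNA": gender_api.error_coded_without_na(),
    "naCoded": gender_api.na_coded(),
    "errorGenderBias": gender_api.error_gender_bias(),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed formulas, the computed values agree with the Gender API row of the table to within rounding. The same class can be filled with the NamSor or Wiki-Gendersort counts to compare tools.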
Confusion matrices for gender detection tools after removing all unisex names we were able to identify (n=9,077 given names)
| Gender detection tool | Actual gender | Classified as women, n (%) | Classified as men, n (%) | Not classified, n (%) |
|---|---|---|---|---|
| Gender API | Women | 347 (13.8) | 544 (21.7) | 1,616 (64.5) |
| Gender API | Men | 472 (7.2) | 2,777 (42.3) | 3,321 (50.5) |
| NamSor | Women | 409 (16.3) | 1,585 (63.2) | 513 (20.5) |
| NamSor | Men | 501 (7.6) | 4,791 (72.9) | 1,278 (19.5) |
| Wiki-Gendersort | Women | 95 (3.8) | 80 (3.2) | 2,332 (93.0) |
| Wiki-Gendersort | Men | 244 (3.7) | 451 (6.9) | 5,875 (89.4) |
Performance metrics for gender detection tools after removing all unisex names we were able to identify (n=9,077 given names)
| Gender detection tool | errorCoded | errorCodedWithoutNA | naCoded | errorGenderBias |
|---|---|---|---|---|
| Gender API | 0.6558 | 0.2454 | 0.5439 | −0.0174 |
| NamSor | 0.4271 | 0.2863 | 0.1973 | −0.1488 |
| Wiki-Gendersort | 0.9400 | 0.3724 | 0.9042 | 0.1885 |