Abstract
OBJECTIVE: To evaluate the performance of gender detection tools that accept uploaded files (e.g., Excel or CSV files) containing first names, are usable by researchers without advanced computer skills, and are at least partially free of charge.
Keywords: accuracy; gender detection; misclassification; name; name-to-gender; performance
Year: 2021 PMID: 34629970 PMCID: PMC8485937 DOI: 10.5195/jmla.2021.1185
Source DB: PubMed Journal: J Med Libr Assoc ISSN: 1536-5050
Confusion matrix showing six possible classification outcomes
| | Female (predicted) | Male (predicted) | Unknown (predicted) |
|---|---|---|---|
| Female (actual) | ff | fm | fu |
| Male (actual) | mf | mm | mu |
Confusion matrices for gender detection tools (n=6,131 physicians)
| Gender detection tool | Classified as female physicians n (%) | Classified as male physicians n (%) | Nonclassified physicians n (%) |
|---|---|---|---|
| Gender API | | | |
| Female physicians | 3006 (97.4) | 67 (2.2) | 12 (0.4) |
| Male physicians | 23 (0.8) | 3014 (98.9) | 9 (0.3) |
| NamSor | | | |
| Female physicians | 3031 (98.2) | 54 (1.8) | 0 (0.0) |
| Male physicians | 70 (2.3) | 2976 (97.7) | 0 (0.0) |
| Wiki-Gendersort | | | |
| Female physicians | 2832 (91.8) | 85 (2.8) | 168 (5.4) |
| Male physicians | 43 (1.4) | 2895 (95.0) | 108 (3.6) |
| genderize.io | | | |
| Female physicians | 2519 (81.7) | 59 (1.9) | 507 (16.4) |
| Male physicians | 17 (0.6) | 2529 (83.0) | 500 (16.4) |
Performance metrics for gender detection tools (n=6,131 physicians)
| Gender detection tool | errorCoded | errorCodedWithoutNA | naCoded | errorGenderBias |
|---|---|---|---|---|
| Gender API | 0.0181 | 0.0147 | 0.0034 | -0.0072 |
| NamSor | 0.0202 | 0.0202 | 0.0000 | 0.0026 |
| Wiki-Gendersort | 0.0659 | 0.0219 | 0.0450 | -0.0072 |
| genderize.io | 0.1766 | 0.0148 | 0.1643 | -0.0082 |
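The four metrics in the table above can be recomputed from the confusion-matrix counts. This page does not reproduce the formulas, so the sketch below assumes the following definitions: errorCoded counts misclassifications and nonclassifications over all names, errorCodedWithoutNA counts misclassifications over classified names only, naCoded is the nonclassified share, and errorGenderBias is (mf − fm) over classified names. Under these assumed definitions the Gender API counts reproduce the published row. The `ConfusionMatrix` class and its method names are illustrative and not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Six-outcome counts; first letter = actual gender, second = prediction."""
    ff: int  # female classified as female
    fm: int  # female classified as male
    fu: int  # female left unclassified
    mf: int  # male classified as female
    mm: int  # male classified as male
    mu: int  # male left unclassified

    @property
    def total(self) -> int:
        return self.ff + self.fm + self.fu + self.mf + self.mm + self.mu

    @property
    def classified(self) -> int:
        # Names that received a female or male prediction.
        return self.ff + self.fm + self.mf + self.mm

    # The formulas below are assumed definitions (see the lead-in).
    def error_coded(self) -> float:
        return (self.fm + self.mf + self.fu + self.mu) / self.total

    def error_coded_without_na(self) -> float:
        return (self.fm + self.mf) / self.classified

    def na_coded(self) -> float:
        return (self.fu + self.mu) / self.total

    def error_gender_bias(self) -> float:
        # Negative values mean more women misclassified as men than the reverse.
        return (self.mf - self.fm) / self.classified


# Gender API counts from the confusion-matrix table (n=6,131).
gender_api = ConfusionMatrix(ff=3006, fm=67, fu=12, mf=23, mm=3014, mu=9)

print(round(gender_api.error_coded(), 4))             # 0.0181
print(round(gender_api.error_coded_without_na(), 4))  # 0.0147
print(round(gender_api.na_coded(), 4))                # 0.0034
print(round(gender_api.error_gender_bias(), 4))       # -0.0072
```

A tool that classifies nothing would make `classified` zero and raise `ZeroDivisionError`; a production version would need to handle that case explicitly.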
Confusion matrices for gender detection tools after removing duplicates (i.e., physicians with identical first names and gender) (n=3,013 physicians)
| Gender detection tool | Classified as female physicians n (%) | Classified as male physicians n (%) | Nonclassified physicians n (%) |
|---|---|---|---|
| Gender API | | | |
| Female physicians | 1551 (96.2) | 49 (3.0) | 12 (0.8) |
| Male physicians | 14 (1.0) | 1379 (98.4) | 8 (0.6) |
| NamSor | | | |
| Female physicians | 1564 (97.0) | 48 (3.0) | 0 (0.0) |
| Male physicians | 44 (3.1) | 1357 (96.9) | 0 (0.0) |
| Wiki-Gendersort | | | |
| Female physicians | 1421 (88.2) | 54 (3.3) | 137 (8.5) |
| Male physicians | 30 (2.1) | 1303 (93.0) | 68 (4.9) |
| genderize.io | | | |
| Female physicians | 1173 (72.8) | 38 (2.3) | 401 (24.9) |
| Male physicians | 16 (1.1) | 992 (70.8) | 393 (28.1) |
Performance metrics for gender detection tools after removing duplicates (i.e., physicians with identical first names and gender) (n=3,013 physicians)
| Gender detection tool | errorCoded | errorCodedWithoutNA | naCoded | errorGenderBias |
|---|---|---|---|---|
| Gender API | 0.0276 | 0.0211 | 0.0066 | -0.0117 |
| NamSor | 0.0305 | 0.0305 | 0.0000 | 0.0013 |
| Wiki-Gendersort | 0.0959 | 0.0299 | 0.0680 | -0.0086 |
| genderize.io | 0.2815 | 0.0243 | 0.2635 | -0.0099 |
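The deduplicated tables above keep one record per distinct combination of first name and gender. A minimal sketch of that step follows. The `drop_duplicates` helper, the sample records, and the choice to ignore case and surrounding whitespace when comparing names are all assumptions for illustration; the article's actual matching rules are not given here.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each (first name, gender) pair.

    Names are compared case-insensitively after trimming whitespace,
    which is an assumption rather than the article's documented rule.
    """
    seen = set()
    unique = []
    for first_name, gender in records:
        key = (first_name.strip().casefold(), gender)
        if key not in seen:
            seen.add(key)
            unique.append((first_name, gender))
    return unique


# Hypothetical sample records, not data from the study.
sample = [
    ("Marie", "female"),
    ("marie", "female"),   # duplicate of the first record
    ("Andrea", "female"),
    ("Andrea", "male"),    # same name, different gender: kept
    ("Paul", "male"),
]

unique = drop_duplicates(sample)
print(len(unique))  # 4
```

Keying on the pair rather than on the name alone means that a name carried by both women and men is evaluated once for each gender.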
Origin of physicians' first names (n=6,131 physicians)
| Origin | n (%) |
|---|---|
| French-speaking country | 1679 (32.2) |
| English-speaking country | 751 (14.4) |
| Spanish-speaking country | 404 (7.7) |
| Asian country | 344 (6.6) |
| Eastern European country | 324 (6.2) |
| Italian-speaking country | 288 (5.5) |
| Western European country | 272 (5.2) |
| Arabic-speaking country | 259 (5.0) |
| German-speaking country | 259 (5.0) |
| Northern European country | 220 (4.2) |
| Southern European country | 217 (4.2) |
| Portuguese-speaking country | 198 (3.8) |
The counts do not add up to 6,131 because no origin could be assigned to 916 physicians (14.9%); the percentages are calculated over the 5,215 physicians with an assigned origin.
Regional groups include a country only if it is not already classified in another group (e.g., the Arabic-speaking country group for some Asian countries).