Poor reproducibility of percentage of normally shaped sperm using the World Health Organization Fifth Edition strict grading criteria.

Karen C Baker, Anne Z Steiner, Karl R Hansen, Kurt T Barnhart, Marcelle I Cedars, Richard S Legro, Michael P Diamond, Stephen A Krawetz, Rebecca Usadi, Valerie L Baker, R Matthew Coward, Fangbai Sun, Robert Wild, Puneet Masson, James F Smith, Nanette Santoro, Heping Zhang.

Abstract

Objective: To determine the reproducibility of the World Health Organization Fifth Edition (WHO5) strict grading methodology by comparing the percentage of morphologically normal sperm (PNS) recorded by the core laboratory with results obtained at the fertility centers participating in a multisite clinical trial. Design: Secondary cohort analysis of data from the Males, Antioxidants, and Infertility trial. Setting: Fertility centers. Patients: Semen values of 171 men participating in a multicenter, double-blind, randomized, placebo-controlled trial evaluating the effect of antioxidants on male fertility. Interventions: Not applicable. Main Outcome Measures: Strict morphology expressed as PNS as determined at each fertility center and the core central laboratory for the same semen sample.
Results: No correlation was found in the PNS values for the same semen sample between the core laboratory and fertility center laboratories either as a group or by individual site. Interobserver agreement was similarly low (κ = 0.05 and 0.15) between the core and fertility laboratories as a group for strict morphology, categorized by the WHO5 lower reference limits of 4% and 0, respectively. Moderate agreement was found between the core and 2 individual fertility laboratories for the cutoff value of 0 (κ = 0.42 and 0.57). The remainder of the comparisons demonstrated poor to fair agreement. Conclusions: Strict morphology grading using the WHO5 methodology demonstrated overall poor reproducibility among a cohort of experienced fertility laboratories. This lack of correlation and agreement in the PNS values calls into question the reproducibility, and thereby the potential applicability, of sperm strict morphology testing.
© 2022 The Authors.

Keywords:  Semen analysis; male factor infertility; quality control; spermatozoa; teratozoospermia

Year:  2022        PMID: 35789726      PMCID: PMC9250115          DOI: 10.1016/j.xfre.2022.03.003

Source DB:  PubMed          Journal:  F S Rep        ISSN: 2666-3341


The semen analysis is the standard test for quantifying male reproductive fitness because of its accessibility, lack of invasiveness, and low cost. In addition to semen volume, sperm concentration, and assessment of sperm motility, the grading of sperm morphology, typically expressed as the percentage of morphologically normal sperm (PNS), is commonly reported as part of a standard semen analysis. In 2010, the World Health Organization (WHO) endorsed the use of “strict” grading criteria and adopted a threshold value of 4% as the lower reference limit for PNS, citing “evidence supporting the relationship between the percentage of normal forms […] and fertilization rates in vivo” (1). The methodology is detailed in the WHO Laboratory Manual for the Examination and Processing of Human Semen, Fifth Edition (WHO5) (1). In brief, semen samples are mixed, fixed on duplicate slides, and stained. A total of 200 individual, randomly selected sperm per slide are graded as normal or abnormal based on the presence, size, and/or appearance of the head, midpiece, principal piece (tail), and excessive retained cytoplasm. The replicate slide is graded in the same way, and the PNS is calculated. The broad overlap of semen parameters between fertile and infertile couples is a recognized limitation of semen analysis. Despite efforts to standardize methodology for grading sperm morphology, publications examining the impact of morphology on spontaneous pregnancy, intrauterine insemination, and in vitro fertilization reach varying conclusions (2, 3, 4, 5, 6, 7, 8, 9, 10). Likewise, outcomes are inconsistent for studies correlating lifestyle and environmental exposures with sperm morphology (11, 12, 13, 14, 15). These divergent results are distressing to couples seeking fertility care and potentially confusing to practitioners.
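The PNS arithmetic described above can be sketched in a few lines of Python. This is only an illustration, assuming the PNS is pooled over the replicate slides; the function name and the counts are hypothetical, and any check that the replicates agree closely enough before pooling is omitted.

```python
def percent_normal(normal_counts, graded_counts):
    """Pooled percentage of morphologically normal sperm (PNS).

    normal_counts[i] is the number of sperm graded normal on slide i;
    graded_counts[i] is the total number of sperm graded on slide i.
    Pooling across replicate slides is an assumption of this sketch.
    """
    if len(normal_counts) != len(graded_counts):
        raise ValueError("one normal count is needed per graded slide")
    total_graded = sum(graded_counts)
    if total_graded == 0:
        raise ValueError("no sperm were graded")
    return 100.0 * sum(normal_counts) / total_graded


# Hypothetical example: two replicate slides, 200 sperm graded on each.
pns = percent_normal([9, 7], [200, 200])

# Compare against the WHO5 lower reference limit of 4%.
below_reference_limit = pns < 4.0
```

With 16 normal forms among 400 graded sperm, the sample sits exactly at the 4% limit, so a difference of a single sperm in either direction would change its classification.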
Additionally, the lay press, government, and scientific community increasingly recognize that inconsistent outcomes have the potential to erode confidence in both the scientific method and investigators (16, 17, 18, 19). One potential cause for variable outcomes reported for sperm morphology is poor reproducibility of the strict grading method. For the purposes of this manuscript, reproducibility is defined as the ability to duplicate the results of a prior study/test using the same material and procedures used by the original investigator (20). This definition aligns with the recommendations of the National Academies of Sciences, Engineering, and Medicine, which defines reproducibility as “obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis” (21). “Reproducibility is the minimum necessary condition for a finding to be believable and informative” (20), and as such, it is appropriate to ask if strict grading methodology delivers consistent results across a panel of experienced fertility laboratories. The primary objective of this study was to investigate the reproducibility of the WHO5 strict sperm morphology grading by examining the degree of agreement between the PNS reported by the core laboratory and the values reported for the same semen sample by the site laboratories during the Reproductive Medicine Network (RMN)’s Males, Antioxidants, and Infertility (MOXI) trial. The secondary objective was to investigate patient factors associated with teratospermia.

Materials and methods

We performed a secondary analysis of data from the RMN MOXI trial (22). The full details of the MOXI trial are available at ClinicalTrials.gov (NCT02421887). In brief, the MOXI trial was a multicenter, randomized, double-blind, placebo-controlled trial analyzing the effect of antioxidants on semen parameters among couples with mild male factor infertility. Participants completed an extensive questionnaire that included tobacco, alcohol, and drug use; presence and laterality of varicocele; occupation; and exposures to pesticides, toxic chemicals, radiation, and heat. Participants provided semen samples to their fertility site at the time of randomization (visit 1) and after 90 days of treatment (visit 3). Participants received standardized instructions for precollection abstinence and collection methods. Standard semen analyses, including sperm morphology, were performed at each clinical site’s andrology laboratory using the WHO5 methodology. Semen smears were then shipped to the RMN core laboratory for centralized grading of sperm morphology using the WHO5 “strict” criteria. All study sites are College of American Pathologists certified and perform internal quality control for sperm morphology. Both local and central laboratories were blinded to the treatment assignment of all participants. Approval for the study was obtained from the University of Pennsylvania, which served as the single institutional review board for each site, with additional local site review. Our study cohort comprises MOXI participants with strict morphology graded by the core laboratory at visit 1. We analyzed pairs of strict morphology values by comparing the PNS as graded by the core (PNScore) to the PNS as graded by the RMN clinical site laboratories (PNSsite) for the semen samples submitted by our cohort for MOXI visits 1 and 3.
In addition to the PNS values, the pairs were also analyzed by the cutoff values of <4% or ≥4% and 0 or >0, which correspond to the threshold values commonly used in clinical practice (4). Continuous data were expressed as mean ± standard deviation and analyzed using the one-way ANOVA if the data were normally distributed. Otherwise, they were expressed as median with interquartile range and analyzed using the Kruskal-Wallis test. Categorical variables were presented as number and frequency. The relationship between the PNScore and PNSsite was investigated using the Pearson’s correlation and Spearman’s correlation for normally and not normally distributed data, respectively. Strong correlation was defined as r ≥ +0.7. Agreement was calculated by simple κ, and meaningful agreement was defined as κ > 0.6. Univariable analyses were performed to evaluate the influence of patient characteristics, lifestyle and occupation exposures, and/or geographic location on the PNScore. Ethnicity, category of baseline semen abnormality, and occupational category were excluded from the overall analysis due to insufficient numbers. Race, pesticide exposure, radiation exposure, and study site were excluded from the analysis of the PNS of 0 due to insufficient numbers. Subsequently, a multivariable logistic regression model was created including the variables found to be significantly associated with the PNScore in the univariable analyses. Multivariable logistic regression analysis was conducted in a stepwise fashion, with a P value of <.2 to enter and P value of <.05 to stay. Due to the relatively large number of sites, the site variable was treated as a random effect in the multivariable analysis and was not significant. Tables are presented with odds ratios and the corresponding 95% confidence intervals for the predictors for the logistic regression analysis. A P value of <.05 was considered statistically significant (23). All statistical tests were two-sided.
Analyses were performed with SAS, version 9.4 (SAS Institute, Cary NC).
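For readers who want to reproduce the two agreement measures on their own paired data, the following Python sketch computes Spearman's correlation and the simple (unweighted) κ after dichotomizing at the <4% cutoff. It is not the authors' SAS code; the function names and the six paired values are hypothetical, and P values are not computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's correlation: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def simple_kappa(a, b):
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    n = len(a)
    observed = sum(1 for u, v in zip(a, b) if u == v) / n
    categories = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1:
        return float("nan")  # undefined when both raters use one category
    return (observed - expected) / (1 - expected)


# Hypothetical paired PNS values (%) for the same six semen samples.
pns_core = [5, 3, 0, 8, 2, 6]
pns_site = [4, 6, 1, 3, 0, 9]

rho = spearman(pns_core, pns_site)

# Dichotomize at the WHO5 lower reference limit before computing kappa.
core_low = [v < 4 for v in pns_core]
site_low = [v < 4 for v in pns_site]
kappa_lt4 = simple_kappa(core_low, site_low)
```

Under the definitions given in the text, `rho` would need to reach 0.7 to count as a strong correlation and `kappa_lt4` would need to exceed 0.6 to count as meaningful agreement.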

Results

Baseline Characteristics of the Cohort

A total of 171 male subjects participated in the MOXI trial, of whom 126 had PNScore recorded in the dataset for visit 1 at the termination of the trial. These 126 subjects constitute the cohort for this study. Detailed descriptive statistics of the cohort are available in Supplemental Table 1 (available online). The mean age and median body mass index of subjects were 33.6 years and 28.6, respectively. The cohort was predominantly white (75.4%), college or higher educated (69.1%), and from households with an annual income of >75K (59.5%). The majority had primary infertility (65.9%), had no previous fertility treatment (74.6%), and did not currently smoke (88.9%). Nineteen percent of subjects reported an occupation potentially associated with gonadotoxic exposure (e.g., mining/extraction, healthcare, and farming). Less than 10% of subjects reported varicocele, recreational drug use, or known exposure to pesticides, toxic chemicals, radiation, or heat. Forty-five percent of subjects had >1 semen parameter below the WHO5 lower reference limit at enrollment in the MOXI trial, whereas 39.7% of subjects had isolated teratospermia (PNS ≤ 4). The median PNScore at visit 1 was 5% (interquartile range, 3%–9%) (Table 1). Thirty-five percent of participants had a PNScore of <4%, and 9.5% had a PNScore of 0. The median PNScore did not differ significantly between the sites. The prevalence of teratospermia, defined by the threshold value of either <4 or 0, did not differ between the sites.
Table 1

Percentage of morphologically normal sperm at visit 1.

| Variables | All sites combined | Site 1 | Site 2 | Site 3 | Site 4 | Site 5 | Site 6 | Site 7 | Site 8 | Site 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N | 126 | 27 | 7 | 15 | 9 | 4 | 12 | 35 | 10 | 7 |
| PNS, median (range) | 5.0 (3.0, 9.0) | 5.0 (3.0, 8.0) | 3.0 (2.0, 7.0) | 8.0 (2.0, 11.0) | 5.0 (4.0, 5.0) | 5.0 (2.5, 6.5) | 4.5 (3.5, 9.5) | 5.0 (2.0, 9.0) | 6.5 (5.0, 10.0) | 8.0 (0.0, 19.0) |
| PNS < 4%, n (%) | 43 (34.1) | 11 (40.7) | 4 (57.1) | 6 (40.0) | 1 (11.1) | 1 (25.0) | 3 (25.0) | 13 (37.1) | 2 (20.0) | 2 (28.6) |
| PNS ≥ 4%, n (%) | 83 (65.9) | 16 (59.3) | 3 (42.9) | 9 (60.0) | 8 (88.9) | 3 (75.0) | 9 (75.0) | 22 (62.9) | 8 (80.0) | 5 (71.4) |
| PNS > 0, n (%) | 114 (90.5) | 26 (96.3) | 6 (85.7) | 14 (93.3) | 9 (100.0) | 3 (75.0) | 12 (100.0) | 29 (82.9) | 10 (100.0) | 5 (71.4) |
| PNS = 0, n (%) | 12 (9.5) | 1 (3.7) | 1 (14.3) | 1 (6.7) | 0 (0.0) | 1 (25.0) | 0 (0.0) | 6 (17.1) | 0 (0.0) | 2 (28.6) |

Note: The percentage of morphologically normal sperm by strict grading criteria as assessed by core laboratory at Males, Antioxidants, and Infertility trial visit 1. The difference among sites did not reach statistical significance. PNS = percentage of morphologically normal sperm.

There were no statistically significant differences in patient characteristics, semen parameters at enrollment, or treatment group assignments across the RMN sites (Supplemental Tables 1 and 2).

Reproducibility of Strict Morphology Grading

At the time the MOXI trial was terminated, the dataset contained 110 pairs of PNS values consisting of 48 pairs of PNScore and PNSsite from visit 1 and 62 pairs of PNScore and PNSsite from visit 3. These pairs represent 77.8% of participants, 43.6% of semen samples, and 6 of the 9 fertility sites. We found no correlation between the paired PNScore and PNSsite scores overall (Table 2). Likewise, analysis by site demonstrated poor and nonsignificant correlation between strict morphology values reported by the core and those reported by individual fertility sites (Table 2).
Table 2

Correlation of the percentage of morphologically normal sperm between the core and site laboratories for all visits.

| Site laboratory | # Specimens | Correlation with core laboratory |
| --- | --- | --- |
| All sites | 110 | r = 0.124 (a), P = .20 |
| Site 1 | 31 | r = 0.071 (b), P = .71 |
| Site 2 | 11 | r = 0.435 (b), P = .18 |
| Site 3 | 29 | r = 0.040 (b), P = .84 |
| Site 7 | 8 | r = 0.025 (a), P = .95 |
| Site 8 | 19 | r = -0.200 (b), P = .41 |
| Site 9 | 12 | r = 0.432 (a), P = .16 |

Note: The results of strict morphology graded at sites 4, 5, and 6 were not incorporated into the dataset before the termination of the Males, Antioxidants, and Infertility trial.

(a) Spearman’s correlation.

(b) Pearson’s correlation.

Teratospermia was then analyzed by the threshold values of PNS of <4% and 0 (Table 3). Interobserver agreement between the core laboratory and the sites as a group was poor for PNS of <4% and 0% (κ = 0.05 and 0.15, respectively). When analyzed by individual fertility center, there was moderate agreement between the core and 2 fertility centers when teratospermia was defined as a PNS of 0 (κ = 0.42 and 0.57). Interobserver agreement was poor to fair for the remainder of the sites and comparisons.
Table 3

Level of agreement for the percentage of morphologically normal sperm between the core and site laboratories for all visits.

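| Site laboratory | # Specimens | Agreement with core laboratory, PNS < 4% | Agreement with core laboratory, PNS = 0 |
| --- | --- | --- | --- |
| All sites | 110 | κ = 0.053, P = .16 | κ = 0.146, P = .04 |
| Site 1 | 31 | κ = 0.073, P = .28 | κ = 0.073, P = .28 |
| Site 2 | 11 | κ = 0.000, P = 1.00 | κ = 0.421, P = .09 |
| Site 3 | 29 | κ = 0.004, P = .96 | κ = 0.074, P = .29 |
| Site 7 | 8 | κ = 0.040, P = .69 | κ = 0.000, P = 1.00 |
| Site 8 | 19 | κ = 0.000, P = 1.00 | κ = 0.000, P = 1.00 |
| Site 9 | 12 | κ = 0.125, P = .37 | κ = 0.571, P = .03 |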

Note: The results of strict morphology graded at sites 4, 5, and 6 were not incorporated into the dataset before the termination of the Males, Antioxidants, and Infertility trial. κ = simple kappa coefficient.

Analysis of Variables Associated with PNS

Univariable and multivariable analyses found no association between the PNScore of <4% and any variable (data not shown). Younger age and self-reported exposure to toxic chemicals were associated with a PNScore of 0 during univariable analysis; however, only age remained significant during multivariable analysis (Supplemental Table 3, available online). Notably, there was no association between fertility center site and teratospermia.

Discussion

This study demonstrates poor reproducibility of strict sperm morphology values between a central core laboratory and a cohort of experienced, licensed andrology laboratories grading the same semen samples. Not only did we find no correlation in strict morphology values between the core laboratory and the fertility center laboratories either as a group or by individual site, but we also found no agreement between the core and site laboratories for teratospermia defined as a PNS of <4% and only poor agreement overall for teratospermia defined as a PNS of 0. Of note, teratospermia defined as a PNScore of <4% was not associated with any patient characteristics in our dataset. Teratospermia defined as a PNScore of 0 was associated with younger age and self-reported toxic chemical exposure during univariable analysis; however, only age remained a significant association during multivariable analysis. In light of the poor reproducibility of morphology grading, the clinical applicability of these associations should be interpreted with caution. Our results show that assessing sperm morphology appears to remain a highly subjective exercise despite adoption of the strict grading criteria by the WHO in 1999 and addition of the WHO5 lower threshold value for normal morphology of ≥4% in 2010. Our finding is consistent with several studies that have documented high interobserver variability when assessing sperm morphology with the strict grading criteria. In 2016, Punjabi et al. (24) published the outcomes of over 100 Belgian laboratories that participated in a thrice-yearly voluntary external quality control (EQC) program for semen analysis over a 15-year period spanning the publication of WHO5. Two centrally prepared air-dried semen smears were sent to each laboratory per EQC event to assess the performance of strict morphology grading. The overall coefficient of variation (CV) for PNS was 79.4% for the duration of the study.
The investigators noted that performance improved after adoption of the WHO5 methods; however, the variation in morphology grading remained “unacceptably high” with an extrapolated PNS CV of ≥40% in the years after 2010. It is notable that 20% of the results were discordant in the final year of the study when analyzed by PNS above or below the WHO5 threshold value. Similarly, Filimberti et al. (25) analyzed the outcomes of 56 Tuscan laboratories that participated in a dedicated training program in the WHO5 methods and found that the PNS CV was 88.6% despite targeted training. The investigators concluded that despite improvements in performance with training “the course was not sufficient to limit variability in the results of morphology, as the overall average CV of the laboratories remained very high.” These studies highlight the marked variation in grading sperm morphology under “real-world” conditions despite adoption of the WHO5 methodology and participation in an EQC program. In 2014, Wang et al. (26) reported good agreement for PNS but moderate or worse agreement for categorization of sperm defects among experienced graders. In brief, high-resolution pictures of 5,296 sperm from healthy donors were sent to 3 experienced graders at 3 different high-volume centers. Each grader individually scored all sperm in accordance with the WHO5 criteria. The investigators found that the mean PNS was 20.87% and the CV was 4.8%. Agreement among the graders for the overall PNS was good (κ = 0.47–0.52). There was marked variation in scoring among graders by defect category (e.g., head defect) and specific defect type (e.g., tapered head), with CV varying between 6.8% and 15.6% for defect category and 11.2% and 133% for specific defect. This large degree of variation underscores the subjectivity inherent in grading sperm morphology even when experienced graders are assessing the same individual spermatozoa.
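To make the CV figures quoted above concrete, the short Python sketch below computes a coefficient of variation for PNS values that several laboratories might report for the same smear. The values are hypothetical, and the use of the sample standard deviation is an assumption of this sketch; the cited studies may have computed CV differently.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical PNS values (%) reported by three laboratories for one smear.
reported_pns = [2, 4, 6]
cv = coefficient_of_variation(reported_pns)
```

Because the mean PNS sits close to zero in most clinical samples, a spread of only a few percentage points between laboratories translates into a large CV.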
Our dataset demonstrated no association between teratospermia and geographic location (as defined by a fertility center). This finding echoes the results of Swan et al. (15) who found no difference in the PNS among 4 US cities despite a difference in concentration and motility. Likewise, Auger et al. (14) compared the morphology values for 1,001 men from 4 European cities and found no difference in the PNS. The investigators did find an association between poorer sperm morphology and occupational and lifestyle exposures (e.g., metal welding, alcohol consumption, and chemical spraying). In contrast, our analysis found no association between lifestyle exposure and a PNS of <4%. Younger age was associated with the absence of any morphologically normal sperm in our dataset during both univariable and multivariable analyses. As previously mentioned, exposure to toxic chemicals was associated with higher odds of a PNScore of 0 during univariable analysis, but this association did not remain significant during multivariable analysis. These associations should be interpreted with caution, however, given that the “true value” for PNS cannot be established due to the poor reproducibility of strict morphology grading between the core and site laboratories.

Limitations

This study is a secondary analysis; the primary trial was not designed to evaluate the reproducibility of morphological grading. Because the original MOXI trial was terminated early in accordance with a prespecified internal pilot study, our dataset was limited by the absence of data pairs because some PNSsite values were not entered into the dataset before data lock. The identity of the interpreting technician was not recorded at the core or site laboratories; therefore, the analysis of the performance of individual technicians is not possible. All sites were certified andrology laboratories with internal quality assurance/quality improvement programs, and there is ample literature assessing individual performance of sperm morphological grading using both the WHO5 and other grading criteria for experienced and inexperienced graders (14, 24, 25, 26, 27). Therefore, we felt it was both timely and valuable to focus on the reproducibility of the WHO5 method, rather than further scrutinize individual laboratory personnel. The broader applicability of our results is based on our presumption that the performance of strict morphology grading at RMN facilities is not inferior to other fertility centers and laboratories. Despite these limitations and presumptions, this study mirrors “real-world” conditions as training and quality control practices differ among laboratories, and the resulting variations in morphological grading may influence both generalizability of reproductive science and its applicability to patient care.

Conclusion

In conclusion, our analysis demonstrates that almost 10 years after the adoption of the WHO5, inconsistencies remain in the scoring of sperm morphology, even among a cohort of highly experienced fertility laboratories grading the same semen smears. Our study found no correlation in PNS as a continuous variable and little to no agreement in PNS for clinically meaningful categories as defined by the WHO threshold value and by the complete absence of morphologically normal sperm. The poor reproducibility of strict sperm morphology grading calls into question the applicability of morphology values between laboratories and, by extension, the generalizability of strict morphology in assessing male reproductive potential and predicting treatment outcomes. Combined with recent publications demonstrating fecundity in the absence of morphologically normal sperm, the clinical relevance of strict sperm morphology seems increasingly uncertain.

Review 1.  The relationship between isolated teratozoospermia and clinical pregnancy after in vitro fertilization with or without intracytoplasmic sperm injection: a systematic review and meta-analysis.

Authors:  James M Hotaling; James F Smith; Mitchell Rosen; Charles H Muller; Thomas J Walsh
Journal:  Fertil Steril       Date:  2010-10-28       Impact factor: 7.329

2.  The impact of sperm morphology on the outcome of intrauterine insemination cycles with gonadotropins in unexplained and male subfertility.

Authors:  Mehmet Erdem; Ahmet Erdem; Mehmet Firat Mutlu; Seckin Ozisik; Sule Yildiz; Ismail Guler; Cengiz Karakaya
Journal:  Eur J Obstet Gynecol Reprod Biol       Date:  2015-12-19       Impact factor: 2.435

3.  Phthalate exposure, even below US EPA reference doses, was associated with semen quality and reproductive hormones: Prospective MARHCS study in general population.

Authors:  Qing Chen; Huan Yang; Niya Zhou; Lei Sun; Huaqiong Bao; Lu Tan; Hongqiang Chen; Xi Ling; Guowei Zhang; Linping Huang; Lianbing Li; Mingfu Ma; Hao Yang; Xiaogang Wang; Peng Zou; Kaige Peng; Taixiu Liu; Xiefei Shi; Dejian Feng; Ziyuan Zhou; Lin Ao; Zhihong Cui; Jia Cao
Journal:  Environ Int       Date:  2017-04-25       Impact factor: 9.621

4.  High variability in results of semen analysis in andrology laboratories in Tuscany (Italy): the experience of an external quality control (EQC) programme.

Authors:  E Filimberti; S Degl'Innocenti; M Borsotti; M Quercioli; P Piomboni; I Natali; M G Fino; C Caglieresi; L Criscuoli; L Gandini; A Biggeri; M Maggi; E Baldi
Journal:  Andrology       Date:  2013-01-11       Impact factor: 3.842

5.  Predictive value of abnormal sperm morphology in in vitro fertilization.

Authors:  T F Kruger; A A Acosta; K F Simmons; R J Swanson; J F Matta; S Oehninger
Journal:  Fertil Steril       Date:  1988-01       Impact factor: 7.329

6.  Fifteen years of Belgian experience with external quality assessment of semen analysis.

Authors:  U Punjabi; C Wyns; A Mahmoud; K Vernelen; B China; G Verheyen
Journal:  Andrology       Date:  2016-07-13       Impact factor: 3.842

7.  Predictive value of sperm morphology and progressively motile sperm count for pregnancy outcomes in intrauterine insemination.

Authors:  Louise Lemmens; Snjezana Kos; Cornelis Beijer; Jacoline W Brinkman; Frans A L van der Horst; Leonie van den Hoven; Dorit C Kieslinger; Netty J van Trooyen-van Vrouwerff; Albert Wolthuis; Jan C M Hendriks; Alex M M Wetzels
Journal:  Fertil Steril       Date:  2016-03-02       Impact factor: 7.329

8.  Sperm morphology as diagnosed by strict criteria: probing the impact of teratozoospermia on fertilization rate and pregnancy outcome in a large in vitro fertilization population.

Authors:  D R Grow; S Oehninger; H J Seltman; J P Toner; R J Swanson; T F Kruger; S J Muasher
Journal:  Fertil Steril       Date:  1994-09       Impact factor: 7.329

9.  Associations between urinary phthalate concentrations and semen quality parameters in a general population.

Authors:  M S Bloom; B W Whitcomb; Z Chen; A Ye; K Kannan; G M Buck Louis
Journal:  Hum Reprod       Date:  2015-09-07       Impact factor: 6.918

10.  Men with a complete absence of normal sperm morphology exhibit high rates of success without assisted reproduction.

Authors:  Jason R Kovac; Ryan P Smith; Miguel Cajipe; Dolores J Lamb; Larry I Lipshultz
Journal:  Asian J Androl       Date:  2017 Jan-Feb       Impact factor: 3.285
