
Prognostic models for high and low ovarian responses in controlled ovarian stimulation using a GnRH antagonist protocol.

Frank J Broekmans¹, Pierre J M Verweij², Marinus J C Eijkemans³, Bernadette M J L Mannaerts², Han Witjes².

Abstract

STUDY QUESTION: Can predictors of low and high ovarian responses be identified in patients undergoing controlled ovarian stimulation (COS) in a GnRH antagonist protocol?

SUMMARY ANSWER: Common prognostic factors for high and low ovarian responses were female age, antral follicle count (AFC) and basal serum FSH and LH.

WHAT IS KNOWN ALREADY: Predictors of ovarian response have been identified in GnRH agonist protocols. With the introduction of GnRH antagonists to prevent premature LH rises during COS, and the gradual shift in use of long GnRH agonist to short GnRH antagonist protocols, there is a need for data on the predictability of ovarian response in GnRH antagonist cycles.

STUDY DESIGN, SIZE, DURATION: A retrospective analysis of data from the Engage trial and validation with the Xpect trial. Prognostic models were constructed for high (>18 oocytes retrieved) and low (<6 oocytes retrieved) ovarian response. Model building was based on the recombinant FSH (rFSH) arm (n = 747) of the Engage trial. Multivariable logistic regression models were constructed in a stepwise fashion (P < 0.15 for entry). Validation based on calibration was performed in patients with equivalent treatment (n = 199) in the Xpect trial.

PARTICIPANTS/MATERIALS, SETTING, METHODS: Infertile women with an indication for COS prior to IVF. The Engage and Xpect trials included patients of similar ethnic origins from North America and Europe who had regular menstrual cycles. The main causes of infertility were male factor, tubal factor and endometriosis.

MAIN RESULTS AND THE ROLE OF CHANCE: In the Engage trial, 18.3% of patients had a high and 12.7% had a low ovarian response. Age, AFC, serum FSH and serum LH at stimulation Day 1 were prognostic for both high and low ovarian responses. Higher AFC and LH were associated with an increased chance of high ovarian response. Older age and higher FSH correlated with an increased chance of low ovarian response. Region (North America/Europe) and BMI were prognostic for high ovarian response, and serum estradiol at stimulation Day 1 was associated with low ovarian response. The area under the receiver operating characteristic (ROC) curve (AUC) for the model for a high ovarian response was 0.82. Sensitivity and specificity were 0.82 and 0.73, respectively; positive and negative predictive values were 0.40 and 0.95, respectively. The AUC for the model for a low ovarian response was 0.80. Sensitivity and specificity were 0.77 and 0.73, respectively; positive and negative predictive values were 0.29 and 0.96, respectively. In Xpect, 19.1% of patients were high ovarian responders and 16.1% were low ovarian responders. The slope of the calibration line was 0.81 and 1.35 for high and low ovarian responses, respectively, both not statistically different from 1.0. In summary, common prognostic factors for high and low ovarian responses were female age, AFC and basal serum FSH and LH. Simple multivariable models are presented that are able to predict either a too low or a too high ovarian response in patients treated with a GnRH antagonist protocol and daily rFSH.

LIMITATIONS, REASONS FOR CAUTION: Anti-Müllerian hormone was not included in the prediction modelling.

WIDER IMPLICATIONS OF THE FINDINGS: The findings will help with the identification of patients at risk of a too high or too low ovarian response and individualization of COS treatment.

STUDY FUNDING/COMPETING INTERESTS: Financial support for this study and the editorial work was provided by Merck, Sharp & Dohme Corp. (MSD), a subsidiary of Merck & Co. Inc., Whitehouse Station, NJ, USA. F.J.B. received a grant from CVZ to his institution; P.J.M.V. and H.W. are employees of MSD, and B.M.J.L.M. was an employee of MSD at the time of development of this manuscript.

TRIAL REGISTRATION NUMBERS: NCT00696800 and NCT00778999.

Keywords: GnRH antagonist; ovarian response; predictive modelling; recombinant FSH

Year: 2014; PMID: 24903202; PMCID: PMC4093990; DOI: 10.1093/humrep/deu090

Source DB: PubMed; Journal: Hum Reprod; ISSN: 0268-1161; Impact factor: 6.918

Introduction

In assisted reproduction treatment (ART) an optimal response to controlled ovarian stimulation (COS) is of crucial importance. Both too low an ovarian response and too high an ovarian response are associated with increased cancellation rates and lower pregnancy rates, and previous literature suggests an optimal range of oocytes below and above which outcomes are compromised (van der Gaast ; Sunkara ). A high ovarian response may also increase the risk of developing ovarian hyperstimulation syndrome (Papanikolaou ). For this reason it is clinically relevant to identify predictors of ovarian response that may enable clinicians to identify patients at risk of a too high or too low ovarian response and to individualize COS treatment for these patients (Fauser ). Moreover, such individualization could be more cost-effective as it could both increase the efficacy and reduce the costs of ART. Many studies have been conducted in the field of ovarian response prediction during the last 10 years (Popovic-Todorovic ) and various predictors for low ovarian response have been proposed (Hendriks ; Verberg ). Broekmans performed a systematic review of these tests and found that antral follicle count (AFC) and basal FSH had the best sensitivity and specificity for predicting low ovarian response, with the recent addition of anti-Müllerian hormone (AMH) as possibly the most reliable predictor (Broer ). More recently, predictors for a high ovarian response have also been identified, with AMH and AFC demonstrating similar sensitivity and specificity (Broer ). However, it should be noted that the majority of this research has been performed in the context of GnRH agonist protocols. The introduction of GnRH antagonists to prevent premature LH rises during COS and the gradual shift of current care from long GnRH agonist to short GnRH antagonist protocols (Kolibianakis ; Al-Inany ) have prompted the need for research on the predictability of ovarian response in GnRH antagonist cycles. 
A recent prospective study including patients with and without oral contraceptive pretreatment indicated that AMH and basal FSH are statistically significant predictors of both the number of oocytes retrieved and the occurrence of an excessive ovarian response, whereas AMH alone was the main predictor for low ovarian response (Nyboe Andersen ). The aim of this paper is to identify prognostic factors for high and low ovarian responses in COS using the GnRH antagonist protocol. With the identified predictors, simple prognostic models for low and excessive response are constructed from which patient-specific probabilities for either outcome can be derived, as the basis for studies on FSH starting dose adjustment.

Methods

The prognostic models for high and low ovarian responses presented in this paper were developed and validated in different data sets: model building was based on data from the Engage trial (Devroey ), whereas model validation was performed using data from the Xpect trial (Nyboe Andersen ). A high ovarian response was defined as the collection of >18 oocytes at retrieval or cycle cancellation due to high ovarian response, according to trial protocol. A low ovarian response was defined as the retrieval of less than six oocytes or cycle cancellation due to low ovarian response, according to trial protocol.

Data sets

Engage [NCT00696800] was a double-blind, randomized, non-inferiority trial assessing the ongoing pregnancy rates after one injection of 150 µg corifollitropin alfa during the first week of stimulation, compared with daily injections of 200 IU recombinant FSH (rFSH; Puregon Pen, N.V. Organon, The Netherlands) using a standard GnRH antagonist protocol (0.25 mg ganirelix, Orgalutran, N.V. Organon). The intention-to-treat population comprised 1506 subjects with a mean age of 31.5 years and body weight of 68.6 kg. Data from the rFSH arm (750 subjects) of this study were used to construct the models for predicting high and low ovarian responses. The data used in the current analyses reflect minor corrections to the previously published Engage trial data (Devroey ) (see corrigendum Devroey ). Xpect [NCT00778999] was a multinational trial to identify prognostic factors for an ovarian response. Subjects were randomized to receive either OC pretreatment or no OC pretreatment prior to their COS cycle. A treatment regimen of 200 IU rFSH and 0.25 mg GnRH antagonist was applied during the COS cycle (i.e. the same as in the daily rFSH arm of the Engage study). The intention-to-treat population consisted of 408 subjects of similar age and body weight as in Engage (mean, 31.7 years and 64.8 kg, respectively). Data from the non-OC arm (199 subjects) were used to validate the models for high and low ovarian responses. The two studies had similar inclusion and exclusion criteria which allowed only patients with regular menstrual cycles to be included and were conducted in the same time frame (2006–2007 for Engage and 2006–2008 for Xpect). Ethnicity was also similar in Engage (86.7% White, 3.6% Black, 2.8% Asian; 6.8% ‘Other’) and Xpect (91.5% White, 2.0% Black, 5.0% Asian; 1.5% ‘Other’). 
Finally, both studies included subjects from Europe (n = 347 and n = 101 in the relevant arms of Engage and Xpect, respectively) as well as North America (n = 403 and n = 98 in Engage and Xpect, respectively). Validated immunoassays were performed at a central laboratory to measure serum levels of FSH, LH, inhibin B, estradiol (E2) and progesterone. Levels of FSH, LH, E2 and progesterone were determined by time-resolved fluoroimmunoassay (AutoDelfia® immunofluorometric assay, PerkinElmer Life and Analytical Sciences, Brussels, Belgium) with a coefficient of variation of 10%. Detection limits were 0.25 IU/l, 0.6 IU/l, 49.9 pmol/l and 0.38 ng/ml for FSH, LH, E2 and progesterone, respectively. Serum inhibin B levels were determined by using a validated immunoassay by Diagnostic Systems Laboratories (DSL; Webster, TX, USA) with a coefficient of variation of 10% and a detection limit of 10.0 pg/ml. AMH was only measured in the Xpect trial. Since it was not measured in the Engage trial, AMH could not be considered for inclusion in the prognostic models in the present study.

Model building

Model building was based on data from the rFSH arm of the Engage trial (Devroey ). Since prognostic factors for a high ovarian response may be different from those for a low ovarian response, separate logistic regression models were constructed for these two end-points. Age was included in both models by default. Other candidate prognostic factors or covariates were as follows: age at menarche (years); average menstrual cycle length (days); duration of infertility (years); alcohol use (self-reported, yes/no); smoking status (self-reported, yes/no); BMI at baseline (kg/m²); FSH at Day 1 of stimulation (IU/l); LH at Day 1 of stimulation (IU/l); E2 at Day 1 of stimulation (pmol/l); progesterone at Day 1 of stimulation (nmol/l); inhibin B at Day 1 of stimulation (pg/ml); AFC at Day 1 of stimulation (number of follicles <11 mm); total ovarian volume (ml); study region (North America versus Europe); and previous IVF/ICSI (yes/no).

For each candidate prognosticator, the association with a high or low ovarian response was assessed using the χ2 test (i.e. the score test in a logistic regression model). After the inclusion of age, covariates were selected using forward selection (P < 0.15 for entry). Backward elimination (P > 0.15 for removal) confirmed the covariate selection for the final model. The number of subjects with missing values for the covariates selected in the final models was limited: 66 in Engage and 26 in Xpect. Missing data were mainly for hormones (54 and 26 subjects in Engage and Xpect, respectively). Whether data were missing was not associated with a high or low ovarian response. All subjects were included in the final models with missing covariate values imputed using linear regression (with covariates for age and region), if applicable. No other imputation of missing data was performed, except for setting hormone levels below the lower limit of detection to 0.5 times the lower limit (as is common practice). First-order interaction terms and quadratic terms were tested, but not found to be statistically significant.

For the final logistic regression model for a high or low ovarian response the receiver operating characteristic (ROC) curve was plotted and the area under the curve (AUC, or c-statistic) was calculated. The ‘optimal’ point on the ROC curve is the one that provides the best trade-off between sensitivity and specificity (i.e. the point that is closest in distance to the upper left-hand corner where sensitivity and specificity are equal to 1). Associated with this point is the ‘optimal’ probability cut-off that provides the best balance between false positives and false negatives for a high (or low) ovarian response. If the predicted probability for a given patient exceeded this optimal cut-off the patient was predicted to become a high (or low) ovarian responder, otherwise not. Sensitivity, specificity, positive predictive value and negative predictive value at the optimal cut-off were calculated. These characteristics are data driven and presumably too optimistic. For this reason the calculated values were denoted as ‘apparent’ AUC, sensitivity, etc. Optimism-corrected values were calculated using leave-one-out cross-validation, i.e. the regression coefficients associated with the ‘final model’ were re-estimated with each subject left out in turn. We then combined the ‘leave-one-out’ regression coefficients with the subject's covariate values in order to mimic the prediction of the outcome for each subject. Finally, a logistic regression model was fitted with the resulting ‘leave-one-out’ prognostic index (PI) as the only covariate in order to obtain the optimism-corrected AUC. Histograms displaying the distribution of the predicted probabilities were plotted separately for high or low ovarian responders and non-high (non-low) responders. Score charts (Hunault ) were constructed for easier application of the two models.
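The choice of the ‘optimal’ cut-off described above can be illustrated with a minimal Python sketch. It is not the SAS code used in the study: the function names and the toy outcomes and probabilities are hypothetical, and the leave-one-out optimism correction is not reproduced. The sketch assumes both outcome classes are present and applies the rule from the text that a patient is a predicted responder when the probability exceeds the cut-off.

```python
import math

def classification_stats(y_true, probs, cutoff):
    """Return sensitivity, specificity, PPV and NPV for the rule 'probability > cutoff'."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p > cutoff)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p > cutoff)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p <= cutoff)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p <= cutoff)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return sens, spec, ppv, npv

def optimal_cutoff(y_true, probs):
    """Pick the cut-off whose ROC point lies closest to the corner (sens = 1, spec = 1)."""
    best_dist, best_cut = None, None
    # Candidate cut-offs: 0 (everyone predicted positive) and every observed probability.
    for cut in [0.0] + sorted(set(probs)):
        sens, spec, _, _ = classification_stats(y_true, probs, cut)
        dist = math.hypot(1.0 - sens, 1.0 - spec)
        if best_dist is None or dist < best_dist:
            best_dist, best_cut = dist, cut
    return best_cut

# Hypothetical toy data: 1 = responder of interest, 0 = other patients.
y_toy = [0, 0, 0, 0, 1, 0, 1, 1]
p_toy = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.80]

cutoff = optimal_cutoff(y_toy, p_toy)
sens, spec, ppv, npv = classification_stats(y_toy, p_toy, cutoff)
print(cutoff, sens, spec, ppv, npv)
```

Because the cut-off is chosen on the same data used to evaluate it, the resulting statistics are ‘apparent’ values in the sense used above.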

Model validation

A vital aspect of prediction is that a model derived from one data set can be transported to another. ‘The idea of validating a prognostic model is generally taken to mean establishing that it works satisfactorily for patients other than those from whose data the model was derived’ (Altman and Royston, 2000). External model validation was based on the non-OC arm of the Xpect study (Nyboe Andersen ) and focused on two aspects: discrimination and calibration (Leushuis ). Discrimination is the ability of the model to distinguish between subjects with and without the event of interest, in this case between patients with a high or low ovarian response and patients without a high or low response. Discrimination was measured by the area under the ROC curve, the c-statistic. This statistic ranges from 0.5 (no discrimination) to 1 (perfect discrimination) and can be interpreted as the probability that for any discordant pair of subjects (i.e. one subject with the event and one without), the subject with the event has a higher predicted probability than the subject without the event (Harrell ). Calibration refers to correspondence between the predicted probabilities for a high or low ovarian response and the observed proportions. Calibration was assessed visually by comparing predicted probabilities and observed proportions after dividing patients in 10 groups based on their predicted probability and, more formally, by fitting a logistic regression model with a single covariate for the so-called PI, a linear combination of the subject's covariate values and the associated regression coefficients. Ideally, the regression coefficient of the PI is close to 1 and the intercept is close to 0. Usually the regression coefficient is <1, indicating that the impact of the prognostic factors is less strong in new data: the well-known shrinkage phenomenon (Copas, 1983). 
An intercept different from 0 indicates that the overall event rate (in this case high and low ovarian responses, respectively) in the new data is different from the old data set. All analyses were performed using SAS PC version 9.1. A P-value of <0.05 was considered statistically significant.
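The interpretation of the c-statistic as a probability over discordant pairs can be made concrete with a small Python sketch. The function name and the toy data are hypothetical, and counting tied predictions as one half is a common convention assumed here rather than something stated in the paper.

```python
def c_statistic(y_true, probs):
    """Proportion of (event, non-event) pairs in which the event has the higher prediction."""
    events = [p for y, p in zip(y_true, probs) if y == 1]
    non_events = [p for y, p in zip(y_true, probs) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5  # assumption: ties count as half-concordant
    return concordant / (len(events) * len(non_events))

# Hypothetical toy data: 3 events and 5 non-events give 15 discordant pairs.
y_pairs = [0, 0, 0, 0, 1, 0, 1, 1]
p_pairs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.80]
print(c_statistic(y_pairs, p_pairs))
```

In this toy set only one pair is discordantly ranked (the event at 0.30 against the non-event at 0.35), so 14 of the 15 pairs are concordant.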

Results

Descriptive statistics for potential predictors are given in Tables I and II for the Engage and Xpect trials, respectively. Three patients in the Engage trial who discontinued their COS cycle due to an adverse event had a missing outcome and were excluded from the analysis, leaving 747 patients for analysis. A total of 137 patients had a high ovarian response and 95 patients had a low ovarian response, according to the definitions. In Xpect (n = 199), there were 38 high responders and 32 low responders. The percentages of a high ovarian response in Engage and Xpect were similar (18.3 versus 19.1%), but the percentages of low responders were slightly different (12.7 versus 16.1%).

Table I

Descriptive statistics of potential predictors (covariates) for ovarian response in the rFSH arm of the Engage study—overall and by ovarian response category.

Covariate	Overall (n = 747)	Low (n = 95)	Normal (n = 515)	High (n = 137)	P-value* (high versus normal/low)	P-value* (low versus normal/high)
Age at baseline (years)
Mean	31.5	32.8	31.7	30.2	<0.001	<0.001
SD	3.2	2.8	3.1	3.4
Age at menarche (years)
Mean	12.7	12.7	12.7	12.7	0.971	0.545
SD	1.3	1.4	1.3	1.3
Average menstrual cycle length (days)
Mean	28.5	28	28.4	28.8	0.020	0.016
SD	1.7	1.7	1.7	1.7
Duration of infertility (years)
Mean	3.2	3.3	3.2	3.2	0.901	0.731
SD	2.2	2.2	2.2	2.4
Alcohol use (%)	42.3	38.9	44.3	37.2	0.148	0.563
Smoking (%)	8.9	7.4	9.1	8.8	0.987	0.584
BMI at baseline (kg/m²)
Mean	24.8	25.1	24.7	25.2	0.199	0.292
SD	2.7	2.9	2.6	2.8
Region (North America) (%)	53.7	54.7	48.9	70.8	<0.001	0.919
Race (White) (%)	86.7	88.4	87.4	83.2	0.579	0.266
Previous IVF/ICSI (%)	57.3	55.8	58.8	52.6	0.256	0.824
Cause of infertility**
Male factor (%)	46.3	47.4	47	43.1	0.448	0.737
Tubal factor (%)	25.4	18.9	25.6	29.2	0.337	0.107
Endometriosis (%)	15.4	15.8	14	20.4	0.111	0.947
FSH at Day 1 of stimulation (IU/l)^a
Median	6.4	7.6	6.5	5.6	<0.001	<0.001
LH at Day 1 of stimulation (IU/l)^a
Median	4.4	4.1	4.5	4.6	0.043	0.608
E₂ at Day 1 of stimulation (pmol/l)^a
Median	119.3	123	119.3	114.9	0.384	0.042
Progesterone at Day 1 of stimulation (nmol/l)^a
Median	1.7	1.7	1.7	1.8	0.053	0.974
Inhibin B at Day 1 of stimulation (pg/ml)^a
Median	50.3	42.1	49.6	61.4	<0.001	0.003
AFC at Day 1 of stimulation (n)
Mean	12.4	9.5	12.3	15.1	<0.001	<0.001
SD	4.5	9.5	12.3	15.1
Total ovarian volume (ml)^b
Mean	13.2	11.9	12.7	15.8	<0.001	0.065
SD	7.1	11.9	12.7	15.8
^a n	693	90	478	125
^b n	627	77	440	120

rFSH, recombinant FSH; E2, estradiol; AFC, antral follicle count.

*From the χ2 score test in a logistic regression model.

**Subjects could have more than one cause.

Table II

Descriptive statistics of potential predictors for an ovarian response in the non-OC arm of the Xpect study (validation set)—overall and by ovarian response category.

Covariate	Overall (n = 199)	Low (n = 32)	Normal (n = 129)	High (n = 38)
Age at baseline (years)
Mean	31.6	33.3	31.6	30.2
SD	4.1	3.3	4.3	3.9
Age at menarche (years)
Mean	12.9	12.6	13.0	12.9
SD	1.5	1.6	1.5	1.5
Average menstrual cycle length (days)
Mean	28.5	27.6	28.5	29.3
SD	1.8	1.4	1.8	1.7
Duration of infertility (years)
Mean	3.7	3.8	3.7	3.4
SD	3.0	3.1	3.1	3.0
Alcohol use (%)	43.2	40.6	47.3	31.6
Smoking (%)	17.1	28.1	14.7	15.8
BMI at baseline (kg/m²)
Mean	23.6	24.0	23.4	23.8
SD	3.4	4.3	3.3	2.9
Region (North America) (%)	49.2	37.5	47.3	65.8
Race (White) (%)	91.5	96.9	90.7	89.5
Previous IVF (%)	63.8	71.9	62.0	63.2
Cause of infertility*
Male factor (%)	55.3	56.3	57.4	47.4
Tubal factor (%)	19.6	15.6	20.2	21.1
Endometriosis (%)	9.0	9.4	10.1	5.3
FSH at Day 1 of stimulation (IU/l)^a
Median	6.7	8.1	6.7	5.5
LH at Day 1 of stimulation (IU/l)^a
Median	5.0	5.0	5.0	4.8
E₂ at Day 1 of stimulation (pmol/l)^a
Median	100.6	107.5	102.2	91.9
Progesterone at Day 1 of stimulation (nmol/l)^a
Median	1.6	1.7	1.6	1.5
Inhibin B at Day 1 of stimulation (pg/ml)^a
Median	47.9	25.3	49.7	57.2
AFC at Day 1 of stimulation (n)
Mean	11.7	8.5	12.1	13.3
SD	5.9	3.3	5.8	6.7
Total ovarian volume (ml)
Mean	12.0	9.4	12.0	14.1
SD	5.8	4.2	5.4	7.2
^a n	173	25	114	34

OC, oral contraceptive.

*Subjects could have more than one cause.


High ovarian response

In the Engage data the following factors had a strong (P < 0.001) association with a high ovarian response (Table I): AFC at Day 1 of stimulation, FSH at Day 1 of stimulation, female age, total ovarian volume, study region and inhibin B. The multivariable logistic regression model (Table III) included female age, AFC Day 1, FSH level Day 1, LH level Day 1, study region and BMI as independent predictors.

Table III

Logistic regression model for a high ovarian response (>18 oocytes): stepwise-built logistic model, each row depicting the cumulative contribution of a variable to a model including all variables from previous rows.

Covariate	OR	95% CI	P-value	AUC^a	AUC^b
Age	0.89	0.83–0.95	0.0003	0.64	0.61
AFC	1.13	1.08–1.20	<0.0001	0.75	0.74
FSH	0.57	0.48–0.69	<0.0001	0.79	0.78
LH	1.26	1.11–1.46	0.0005	0.81	0.80
Region	2.24	1.44–3.49	0.0004	0.82	0.81
BMI	1.07	0.99–1.15	0.0890	0.82	0.81

Odds ratio (OR) for region is USA versus Europe. All other ORs are per unit increase. CI, confidence interval; AUC, area under the curve.

^aApparent.

^bOptimism corrected.

As shown in Table III, some factors that were not, or only marginally, statistically significant in the univariate analysis were still included in the multivariate model (e.g. BMI and LH). On the other hand, factors that were statistically significant when considered univariately (e.g. total ovarian volume and inhibin B) were not included in the multivariate model. The prognostic impact of these factors was apparently captured by other factors already in the model. It appears that higher AFC, LH and BMI increased the chance of a high ovarian response, whereas higher FSH and older age decreased this risk. Also, a high ovarian response was more common in North America than in Europe. More details of the model for a high ovarian response and application are given in the Supplementary data (see Supplementary text ‘Model formulas’ and Supplementary Table SI).

The apparent area under the ROC curve for a high ovarian response (Fig. 1a) was 0.82. The optimism-corrected AUC was only slightly lower (0.81). The optimal probability cut-off for the prediction of a high ovarian response was 17.9%. That is: if the model-based probability is higher than this value, a patient is classified as a ‘predicted’ high ovarian responder. The apparent sensitivity and specificity from this cut-off were 0.82 and 0.73, respectively. The apparent positive and negative predictive values were 0.40 and 0.95, respectively.
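As a rough consistency check, the reported predictive values can be approximated from the reported sensitivity, specificity and the proportion of high responders in Engage (137 of 747). The Python sketch below is an illustration only: the function name is hypothetical, and because the inputs are rounded published figures the results agree with the reported 0.40 and 0.95 only approximately.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Rounded inputs taken from the text: sensitivity 0.82, specificity 0.73,
# and 137 high responders among the 747 analysed Engage patients.
ppv_high, npv_high = predictive_values(0.82, 0.73, 137 / 747)
print(ppv_high, npv_high)
```

The low positive predictive value alongside a high negative predictive value reflects the modest prevalence of a high response: most patients below the cut-off are indeed not high responders, while many patients above it are not high responders either.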

Figure 1

(a) Receiver operating characteristic (ROC) curves for models for a high ovarian response (>18 oocytes) in controlled ovarian stimulation (COS) using a GnRH antagonist protocol. (b) ROC curves for models for a low ovarian response (<6 oocytes) in COS using a GnRH antagonist protocol.

The discrimination achieved by models with fewer predictors was already close to that of the final model. A model with age, AFC, FSH and LH reached an AUC of 0.81. The ROC curve for this model is plotted in Fig. 1a. A model with only age and AFC, however, provided limited discriminatory capacity (AUC 0.75). Histograms displaying the predicted probabilities for a high ovarian response based on the final model are given in the Supplementary data (see Supplementary data, Fig. S1).

To assist in making model-based calculations in daily practice, a score chart was developed, together with a probability plot (Table IV, Fig. 2, for the model with four factors age, AFC, FSH and LH). The use of this chart is best illustrated by an example. Suppose we have a patient aged 36 years with an AFC (2–10 mm) of 16, a basal FSH of 4.9 IU/l and a basal LH of 2.9 IU/l. Using the score chart, the total score for this patient can be calculated as 1 + 10 + 5 + 6 = 22. In the probability plot it can be seen that the predicted probability for this patient to become a high ovarian responder is ∼13%. The ‘optimal’ probability cut-off for a high ovarian response (17.9%) approximately corresponds to a total score of 23. It should be noted that the score chart uses categorized covariates leading to some loss of information (apparent AUC 0.78 versus 0.81 for continuous covariates).

Table IV

Score chart for a high or low ovarian response.

Variable	High response: lower limit^a	High response: upper limit^a	High response: score	Low response: lower limit^a	Low response: upper limit^a	Low response: score
Age (years)	—	28	5	—	24	6
	29	31	4	25	28	7
	32	33	3	29	31	8
	34	35	2	32	33	9
	36	—	1	33	—	10
AFC	—	6	6	—	6	5
	7	8	7	7	7	4
	9	10	8	8	10	3
	11	13	9	11	13	2
	14	—	10	14	—	1
FSH (IU/l)	—	5.5	5	—	6	6
	5.5	6	4	6	6.5	7
	6	6.5	3	6.5	7.5	8
	6.5	7	2	7.5	8	9
	7	—	1	8	—	10
LH (IU/l)	—	4	6	—	4	5
	4	5	7	4	5	4
	5	6	8	5	6.5	3
	6	8	9	6.5	9	2
	8	—	10	9	—	1

aLower limit excluded; upper limit included.
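The score chart in Table IV can be applied programmatically. The following Python sketch is one possible encoding, not part of the original publication: the dictionary layout and function name are hypothetical, and each category is represented by its upper limit (included), which is assumed to match the integer ranges listed for age and AFC. The conversion from total score to probability (Fig. 2) is not reproduced.

```python
from bisect import bisect_left

# For each variable: (upper limits of the first four categories, scores of all five categories).
# A value equal to an upper limit falls in that category (lower limit excluded, upper included).
HIGH_CHART = {
    "age": ([28, 31, 33, 35], [5, 4, 3, 2, 1]),
    "afc": ([6, 8, 10, 13], [6, 7, 8, 9, 10]),
    "fsh": ([5.5, 6, 6.5, 7], [5, 4, 3, 2, 1]),
    "lh": ([4, 5, 6, 8], [6, 7, 8, 9, 10]),
}
LOW_CHART = {
    "age": ([24, 28, 31, 33], [6, 7, 8, 9, 10]),
    "afc": ([6, 7, 10, 13], [5, 4, 3, 2, 1]),
    "fsh": ([6, 6.5, 7.5, 8], [6, 7, 8, 9, 10]),
    "lh": ([4, 5, 6.5, 9], [5, 4, 3, 2, 1]),
}

def total_score(chart, age, afc, fsh, lh):
    """Sum the category scores of the four predictors for one patient."""
    values = {"age": age, "afc": afc, "fsh": fsh, "lh": lh}
    score = 0
    for name, (upper_limits, scores) in chart.items():
        # bisect_left counts the upper limits strictly below the value,
        # which is the index of the category the value belongs to.
        score += scores[bisect_left(upper_limits, values[name])]
    return score

# Worked example from the text: age 36 years, AFC 16, FSH 4.9 IU/l, LH 2.9 IU/l.
high_total = total_score(HIGH_CHART, age=36, afc=16, fsh=4.9, lh=2.9)
low_total = total_score(LOW_CHART, age=36, afc=16, fsh=4.9, lh=2.9)
print(high_total, low_total)
```

For the example patient both charts give a total of 22, matching the sums 1 + 10 + 5 + 6 and 10 + 1 + 6 + 5 reported in the text.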

Figure 2

Probability plot for a high or low ovarian response in COS using a GnRH antagonist protocol.

Interpretation and application of the model would be further simplified if the continuous covariates age, AFC, FSH and LH were classified as ‘high’ or ‘low’, for example by using the median as a cut-off. However, it is well known that dichotomization of continuous covariates leads to loss of information. Indeed, the AUC of the simpler model drops to 0.77 (details not shown). Similarly, if we simply counted the number of risk factors present for each patient (0–6), the AUC of a model based on that count would be only 0.74 (details not shown).

Low ovarian response

In the Engage data, FSH at Day 1 of stimulation, AFC at Day 1 of stimulation and age were strongly (P < 0.001) related to low ovarian response (Table I). In the multivariable logistic regression model (Table V) female age, AFC Day 1, basal FSH level, basal LH level and E2 on Day 1 were included as independent predictors.

Table V

Logistic regression model for a low ovarian response (<6 oocytes): stepwise-built logistic model, each row depicting the cumulative contribution of a variable to a model including all variables from previous rows.

Covariate	OR	95% CI	P-value	AUC^a	AUC^b
Age	1.08	1.00–1.18	0.0560	0.63	0.58
AFC	0.87	0.82–0.93	<0.0001	0.75	0.74
FSH	1.47	1.28–1.68	<0.0001	0.78	0.77
LH	0.81	0.69–0.95	0.0085	0.80	0.78
E₂	1.01	1.00–1.01	0.0454	0.80	0.78

ORs are per unit increase.

^aApparent.

^bOptimism corrected.

Four prognostic factors identified for a low ovarian response were also identified for a high ovarian response. As expected, the direction of the effects was reversed: higher FSH and older age increased the chance of a low ovarian response, whereas higher AFC and LH decreased this risk. More details of the model for a low ovarian response and application are given in the Supplementary data (see Supplementary text ‘Model formulas’ and Supplementary data, Table SII).

The apparent AUC of the ROC curve for the complete model (Fig. 1b) was 0.80. The optimal probability cut-off for the prediction of a low ovarian response was 12.8% (i.e. a patient is classified as a predicted low ovarian responder if the model-based probability is above this value). The apparent sensitivity and specificity for this cut-off level were 0.77 and 0.73, respectively. The apparent positive and negative predictive values were 0.29 and 0.96, respectively.

Again, it appeared that the discrimination achieved by a simpler model was close to that of the complete final model (Table V). A model with age, AFC, FSH and LH already achieved an AUC of 0.80. The ROC curve for this model is plotted in Fig. 1b. Histograms with the predicted probabilities for a low ovarian response are given in the Supplementary data (see Supplementary Fig. S2).

A score chart was also provided for a low ovarian response (Table IV, again for the model with the four factors age, AFC, FSH and LH). It should be noted that for the same variable, the categorizations and scores are different from the score chart for high response. Continuing the example of the 36-year-old patient, the total score for this patient can be calculated as 10 + 1 + 6 + 5 = 22. In the probability plot (Fig. 2) it can be seen that the predicted probability for this patient to become a low ovarian responder is <10%. The ‘optimal’ probability cut-off for a low ovarian response (12.8%) approximately corresponds to a total score of 23. Note, again, that some information is lost due to categorization of covariates in the score chart (apparent AUC 0.78 versus 0.80). Again, the interpretation of the model could be further simplified by classifying the covariates as ‘high’ or ‘low’ based on their median values. However, the AUC of the simpler model would then drop to 0.73 (details not shown). Similarly, the AUC of a model based on the number of risk factors present (0–5) would become 0.71 (details not shown).

A calibration plot for a high ovarian response (see Supplementary Fig. S3) demonstrated that there was reasonable agreement between the observed percentages in the Xpect data and the predicted probabilities based on the model derived from the Engage trial. A logistic regression model for a high ovarian response in the Xpect data with the PI as the only covariate resulted in a regression coefficient of 0.81, smaller than unity but not statistically significantly so (P = 0.26). The intercept was virtually zero (P = 0.98), indicating that, corrected for the PI, the percentage of high responders was well predicted. The associated AUC was 0.78, smaller than the apparent AUC (0.82).

The calibration plot for a low ovarian response (see Supplementary Fig. S4) again showed agreement between predicted and observed percentages, except for one outlier. Surprisingly, the regression coefficient of the PI for a low ovarian response was greater than 1 (1.35), although the difference from unity was not statistically significant (P = 0.18). The associated AUC was 0.84, in fact greater than the apparent AUC of 0.80, suggesting an increased ability to distinguish patients, something that is not observed very often in prognostic modelling. The intercept was 0.77 (P = 0.090), suggesting that, when corrected for the PI, the percentage of low responders in Xpect was underestimated. Apparently, the model could not fully explain the difference in low responder rates between Engage (12.7%) and Xpect (16.1%).
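The calibration regression used here, a logistic model with the PI as the only covariate, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SAS procedure used in the study: the function name, the fixed number of Newton-Raphson iterations and the toy data are hypothetical, and no standard errors or P-values are computed.

```python
import math

def fit_calibration(prognostic_index, outcomes, iterations=25):
    """Fit logit P(event) = intercept + slope * PI by Newton-Raphson; return (intercept, slope)."""
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(prognostic_index, outcomes):
            p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        intercept += (h11 * g0 - h01 * g1) / det
        slope += (h00 * g1 - h01 * g0) / det
    return intercept, slope

# Hypothetical validation data: PI values from the development model and observed 0/1 outcomes.
pi_toy = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
y_obs = [0, 0, 1, 0, 0, 1, 0, 1]
cal_intercept, cal_slope = fit_calibration(pi_toy, y_obs)
print(cal_intercept, cal_slope)
```

A slope close to 1 with an intercept close to 0 would indicate good calibration; a slope below 1 corresponds to the shrinkage described in the Methods, and a non-zero intercept to a difference in overall event rate between the two data sets.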

Model building and validation using a model for a high ovarian response based on the number of follicles

Model building and validation using a definition of a high ovarian response as >18 follicles ≥11 mm diameter on the day of hCG administration are given in the Supplementary data (see Supplementary text ‘Alternative model for a high ovarian response based on the number of follicles’, Supplementary data, Table SIII and Figs S5 and S6).

Discussion

The present study confirms that high and low responders to COS can be predicted in advance when a GnRH antagonist is used for LH rise prevention. The common prognostic factors for high and low ovarian responses were female age, AFC and basal serum FSH and LH. In conjunction, these factors provide sufficiently accurate response prediction models for studies on individualized tailoring of the FSH stimulation dosage. The importance of AFC and basal FSH, as well as female age, is in line with data from long GnRH agonist protocols (Broekmans ; Fauser ; Broer ). Although AFC and basal FSH may both relate to the quantity of FSH-sensitive follicles, their independent contribution to at least the prediction of low response has been demonstrated in several studies (Verhagen ). The estimate of overall sensitivity and specificity of published prediction models for a low ovarian response, based on the summary ROC curve in a published meta-analysis (Verhagen ), clearly matched the findings for the currently presented model. For exaggerated response prediction, formal multifactor prediction models have not been published, as most of the attention has focused on single-test predictors, such as AMH and AFC (Broer ).
The association between LH and ovarian hypo- and hyper-response has not been identified previously. A limited number of studies have included LH levels in an LH/FSH ratio, with the purpose of assessing its value for outcome prediction (Mukherjee ; Shrim ). However, a formal meta-analysis of these studies is lacking, and the value of this ratio seems limited. The association between elevated LH levels and polycystic ovary syndrome may explain the current findings, although a more linear relation with the number of antral follicles is clearly absent for this factor.
The inclusion of study region in the model for a high ovarian response improves predictions, but lacks any biological rationale, other than a possible imbalance in predictive factors between European and North American populations. Therefore, we investigated whether the region effect could be explained by other factors. It appeared that there were differences between regions, but only for covariates that were not included in the model: smoking status (Europe versus North America: 13.6 versus 4.8%), serum progesterone at Day 1 of stimulation (median 1.6 versus 1.8 nmol/l) and total ovarian volume (median 9.5 versus 13.7 ml). Forced inclusion of these factors in the model did not eliminate the effect of study region. The only remaining explanation is that study region captures differences in variables that have not been specifically recorded, for example the oocyte retrieval procedure.
The present findings and those of a previous report (Nyboe Andersen ) clearly confirm the predictability of ovarian response categories in antagonist co-treatment cycles, which is an important finding. In view of the differences in the way the ovaries are exposed to exogenous FSH, the possibility had been raised that submaximal stimulation could undermine the predictability by factors such as AMH and AFC. Assuming that these factors correctly indicate the number of FSH-sensitive follicles, increased variation in the proportion of follicles that will indeed grow and deliver an oocyte in antagonist cycles could be a source of inaccuracy. Apparently, the proportional relation between cohort size at initiation of stimulation and the oocyte yield at the end of stimulation is not different when agonist and antagonist cycles are compared, though a systematic difference in oocyte yield has been firmly demonstrated for these two treatment approaches (Al-Inany ).
No uniform definitions were available for an excessive or a low ovarian response at the time of writing of this paper. We have used >18 and <6 oocytes for high and low ovarian responses, respectively (Ferraretti ). Alternative definitions for high (>15 rather than >18 oocytes) and low ovarian responses (<5 rather than <6 oocytes) were explored, but the same variables were selected with similar regression coefficients (results not shown). The best operative definition for either response type ultimately depends on the way a diagnostic category (for example ‘low responder’) will lead to a certain change in management. Current understanding points towards the range of 6–14 oocytes as the range of optimal response associated with the highest probability of a live birth (Sunkara ). Certainly, the optimal limits may further be affected by the risk of complications, such as ovarian hyperstimulation syndrome, and the likelihood that, in cases with a predicted response outside of this range, adjusted management can alter the outcome to a response in the normal range. Expectations here may be more optimistic regarding prevention of an excessive response than for a low response (Klinkert ; Lekamge ; Olivennes, 2010; Jayaprakasan ; Nelson ).
The strength of the prediction models presented here is that both were validated in an independent study, showing good discrimination and calibration in a cohort of comparable patients. The prediction models included both FSH and LH, which were consistently measured by a central laboratory using the same immunoassays. Due to the well-known differences between commercial gonadotrophin immunoassays, the external value of the models may become slightly different if other commercial FSH and LH assays are applied. A weakness is the absence in the models of AMH, a factor that had a high prognostic value in agonist cycles (Broer ).
When high and low responses were modelled using the Xpect study, in which AMH was measured, this parameter turned out to be predictive for both high and low ovarian responses, replacing AFC in the models (Nyboe Andersen ). Although AMH has appeared to be a solid biomarker of ovarian reserve with a considerable degree of intra- and inter-cycle consistency (Hehenkamp ; van Disseldorp ), the AMH assay suffers from a certain degree of variability that may hamper reliable predictions of ovarian response (Rustamov ). One of the sources of this variation is the between-sample variation during one or subsequent menstrual cycles. This variation has appeared to be quite substantial, specifically in younger women (Overbeek ; Rustamov ), and is believed to represent biological fluctuation parallel to fluctuation in antral follicle numbers (van Disseldorp ). Moreover, nomograms or prognostic models should be based on studies where the samples have been measured by the same AMH immunoassay to ensure accurate predictions (Nelson and La Marca, 2011). Based on the present findings and studies in agonist cycles, AMH and AFC may serve as highly overlapping predictors, with currently no definite conclusion as to the factor with the highest performance (Broer ).
The lack of AMH as a factor in the model may not be permanent. Prognostic models may be updated when new predictors or tests become available, and techniques for quick updating (as opposed to extensive model revisions) exist (Steyerberg ). Another large trial in patients undergoing COS using a GnRH antagonist protocol has been completed recently [Pursue (NCT01144416)]. Since this trial is similar to Engage in design and sample size and includes AMH assessments, an update of the presented models may be indicated in due course.

Implications for practice

The usefulness of ovarian response prediction for clinical practice will depend on two issues. First, the accuracy of the response class prediction needs to limit the number of false predictions. For the models presented here, ∼75% of real low or high responders can be identified; however, at the same time, a positive test will, in some 15% of cases, wrongly suggest that the patient will produce too few or too many oocytes. It is crucial to consider that cases with a normal test will receive standard treatment, while cases with abnormal tests will be managed differently, for example, by dosage increase or dosage reduction. Secondly, dose reduction may create a low response in falsely predicted high responders, while dose increase in falsely predicted low responders may create an excessive response. To what extent this will affect the overall efficacy of prior response prediction and subsequent adjustments in the stimulation regimen must be assessed in well-powered randomized trials. In such trials, both the efficacy of adjusted treatment in normalizing response and the effect of inaccuracies of prediction will be combined. Relevant outcome measures, such as overall programme performance, cancellation rates and costs, will in concert help to determine the true value of treatment individualization based on response prediction. Published scenario studies to date were non-randomized or not well controlled (Olivennes, 2010; Nardo ; Nelson ). Currently executed studies will help to define the desired added value of tailored stimulation protocols (van Tilborg ).
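How test accuracy and the frequency of a response category combine into the share of correct and false predictions can be illustrated with a short calculation. The sketch below applies Bayes' rule to the sensitivity, specificity and Engage response rates reported in the abstract and arrives at the positive and negative predictive values reported there. The function name `predictive_values` is hypothetical, and this is an illustration rather than the authors' own calculation.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(responder | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(non-responder | negative test)
    return ppv, npv

# Values taken from the abstract (Engage trial):
# high response: sensitivity 0.82, specificity 0.73, 18.3% high responders
high_ppv, high_npv = predictive_values(0.82, 0.73, 0.183)
# low response: sensitivity 0.77, specificity 0.73, 12.7% low responders
low_ppv, low_npv = predictive_values(0.77, 0.73, 0.127)

print(f"high response: PPV {high_ppv:.2f}, NPV {high_npv:.2f}")
print(f"low response:  PPV {low_ppv:.2f}, NPV {low_npv:.2f}")
```

Because both response categories are uncommon, the negative predictive values are high while the positive predictive values stay well below the sensitivities, which is why the consequences of false-positive predictions matter for dose adjustment.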

Summary

Prognostic models to predict poor or excessive ovarian response in antagonist co-medicated ovarian hyperstimulation treatment for IVF appear to be as accurate as those in agonist-controlled cycles. This finding opens avenues for trials on individualized treatment protocols.

Supplementary data

Supplementary data are available at http://humrep.oxfordjournals.org/.

Authors’ roles

F.J.B., P.J.M.V., M.J.C.E., B.M.J.L.M. and H.W. took part in the analysis and interpretation of data, writing the manuscript and in the final approval of the version to be published.

Funding

Financial support for this study was provided by Merck, Sharp & Dohme Corp., a subsidiary of Merck & Co. Inc., Whitehouse Station, NJ, USA. Medical writing and editorial assistance was provided by P. Milner, PhD, of PAREXEL, UK. This assistance was funded by Merck, Sharp & Dohme Corp., a subsidiary of Merck & Co. Inc., Whitehouse Station, NJ, USA. Funding to pay the Open Access publication charges for this article was provided by Merck & Co., Inc., Whitehouse Station, NJ.

Conflict of interest

F.J.B.: grant to his institution from CVZ. M.J.C.E.: none. P.J.M.V. and H.W. are employees of Merck, Sharp & Dohme Corp. (MSD) and B.M.J.L.M. was an employee of MSD.

35 in total

1. Anti-Müllerian hormone levels in the spontaneous menstrual cycle do not show substantial fluctuation.

Authors: Wouter J K Hehenkamp; Caspar W N Looman; Axel P N Themmen; Frank H de Jong; E R Te Velde; Frank J M Broekmans
Journal: J Clin Endocrinol Metab Date: 2006-06-27 Impact factor: 5.958

2. Optimum number of oocytes for a successful first IVF treatment cycle.

Authors: M H van der Gaast; M J C Eijkemans; J B van der Net; E J de Boer; C W Burger; F E van Leeuwen; B C J M Fauser; N S Macklon
Journal: Reprod Biomed Online Date: 2006-10 Impact factor: 3.828

3. Incidence and prediction of ovarian hyperstimulation syndrome in women undergoing gonadotropin-releasing hormone antagonist in vitro fertilization cycles.

Authors: Evangelos G Papanikolaou; Cristina Pozzobon; Efstratios M Kolibianakis; Michel Camus; Herman Tournaye; Human M Fatemi; Andre Van Steirteghem; Paul Devroey
Journal: Fertil Steril Date: 2006-01 Impact factor: 7.329

Review 4. A systematic review of tests predicting ovarian reserve and IVF outcome.

Authors: F J Broekmans; J Kwee; D J Hendriks; B W Mol; C B Lambalk
Journal: Hum Reprod Update Date: 2006-08-04 Impact factor: 15.610

5. Elevated day 3 FSH/LH ratio due to low LH concentrations predicts reduced ovarian response.

Authors: A Shrim; S E Elizur; D S Seidman; J Rabinovici; A Wiser; J Dor
Journal: Reprod Biomed Online Date: 2006-04 Impact factor: 3.828

6. What do we mean by validating a prognostic model?

Authors: D G Altman; P Royston
Journal: Stat Med Date: 2000-02-29 Impact factor: 2.373

7. The accuracy of multivariate models predicting ovarian reserve and pregnancy after in vitro fertilization: a meta-analysis.

Authors: T E M Verhagen; D J Hendriks; L F J M M Bancsi; B W J Mol; F J M Broekmans
Journal: Hum Reprod Update Date: 2008 Mar-Apr Impact factor: 15.610

8. Predictors of low response to mild ovarian stimulation initiated on cycle day 5 for IVF.

Authors: M F G Verberg; M J C Eijkemans; N S Macklon; E M E W Heijnen; B C J M Fauser; F J Broekmans
Journal: Hum Reprod Date: 2007-05-07 Impact factor: 6.918

Review 9. Predictors of ovarian response: progress towards individualized treatment in ovulation induction and ovarian stimulation.

Authors: B C J M Fauser; K Diedrich; P Devroey
Journal: Hum Reprod Update Date: 2007-11-15 Impact factor: 15.610

10. Increased gonadotrophin stimulation does not improve IVF outcomes in patients with predicted poor ovarian reserve.

Authors: Dharmawijaya N Lekamge; Michelle Lane; Robert B Gilchrist; Kelton P Tremellen
Journal: J Assist Reprod Genet Date: 2008-10-30 Impact factor: 3.412
