
Standardizing definitions and reporting guidelines for the infertility core outcome set: an international consensus development study.

J M N Duffy^1,2, S Bhattacharya^3, S Bhattacharya^3, M Bofill^4, B Collura^5, C Curtis^6,7, J L H Evers^8, L C Giudice^9,10, R G Farquharson^11, S Franik^12, M Hickey^13, M L Hull^14, V Jordan^4, Y Khalaf^15, R S Legro^16, S Lensen^13, D Mavrelos^17, B W Mol^18, C Niederberger^19, E H Y Ng^20,21, L Puscasiu^22, S Repping^23,24, I Sarris^1, M Showell^25, A Strandell^26, A Vail^27, M van Wely^23, M Vercoe^25, N L Vuong^28, A Y Wang^29, R Wang^18, J Wilkinson^27, M A Youssef^30, C M Farquhar^4,25, Ahmed M. Abou-Setta, Juan J. Aguilera, Hisham AlAhwany, Oluseyi O. A. Atanda, Eva M. E. Balkenende, Kurt T. Barnhart, Yusuf Beebeejaun, Georgina M. Chambers, Abrar A. Chughtai, Irene Cuevas-Sáiz, Cate Curtis, Arianna D'Angelo, Danielle D. Dubois, Kirsten Duckitt, Carlos Encinas, Marie-Odile Gerval, Nhu H. Giang, Ahmed Gibreel, Lynda J. Gingel, Elizabeth J. Glanville, Demian Glujovsky, Ingrid Granne, Georg Griesinger, Devashana Gupta Repromed, Zeinab Hamzehgardeshi, Martin Hirsch, Marcos Horton, Shikha Jain, Marta Jansa Perez, Claire A. Jones, Mohan S. Kamath, José Knijnenburg, Elena Kostova, Antonio La Marca, Tien Khac Le, Arthur Leader, Brigitte Leeviers, Jian Li Chinese, Olabisi M. Loto, Karen L. Marks, Rodrigo M. Martinez-Vazquez, Alison R. McTavish, David J. Mills, Raju R. Nair, Dung Thi Phuong Nguyen, Anne-Sophie Otter, Allan A. Pacey, Satu Rautakallio-Hokkanen, Lynn C. Sadler, Peggy Sagle, Juan-Enrique Schwarze, Heather M. Shapiro, Joe L. Simpson, Charalampos S. Siristatidis, Akanksha Sood, Catherine Strawbridge, Helen L. Torrance, Cam Tu Tran, Emma L. Votteler, Chi Chiu Wang, Andrew Watson, Menem Yossry.

Abstract

STUDY QUESTION: Can consensus definitions for the core outcome set for infertility be identified in order to recommend a standardized approach to reporting?

SUMMARY ANSWER: Consensus definitions for individual core outcomes, contextual statements and a standardized reporting table have been developed.

WHAT IS KNOWN ALREADY: Different definitions exist for individual core outcomes for infertility. This variation increases the opportunities for researchers to engage in selective outcome reporting, which undermines secondary research and compromises clinical practice guideline development.

STUDY DESIGN, SIZE, DURATION: Potential definitions were identified by a systematic review of definition development initiatives and clinical practice guidelines, and by reviewing Cochrane Gynaecology and Fertility Group guidelines. These definitions were discussed in a face-to-face consensus development meeting, which agreed consensus definitions. A standardized approach to reporting was also developed as part of the process.

PARTICIPANTS/MATERIALS, SETTING, METHODS: Healthcare professionals, researchers and people with fertility problems were brought together in an open and transparent process using formal consensus development methods.

MAIN RESULTS AND THE ROLE OF CHANCE: Forty-four potential definitions were inventoried from four definition development initiatives (including the Harbin Consensus Conference Workshop Group and the International Committee for Monitoring Assisted Reproductive Technologies), 12 clinical practice guidelines and Cochrane Gynaecology and Fertility Group guidelines. Twenty-seven participants from 11 countries contributed to the consensus development meeting. Consensus definitions were successfully developed for all core outcomes. Specific recommendations were made to improve reporting.

LIMITATIONS, REASONS FOR CAUTION: We used consensus development methods, which have inherent limitations. There was limited representation from low- and middle-income countries.

WIDER IMPLICATIONS OF THE FINDINGS: A minimum data set should assist researchers in populating protocols, case report forms and other data collection tools. The generic reporting table should provide clear guidance to researchers and improve the reporting of their results within journal publications and conference presentations. Research funding bodies, the Standard Protocol Items: Recommendations for Interventional Trials statement, and over 80 specialty journals have committed to implementing this core outcome set.

STUDY FUNDING/COMPETING INTEREST(S): This research was funded by the Catalyst Fund, Royal Society of New Zealand, Auckland Medical Research Fund and Maurice and Phyllis Paykel Trust. Siladitya Bhattacharya reports being the Editor-in-Chief of Human Reproduction Open and an editor of the Cochrane Gynaecology and Fertility Group. J.L.H.E. reports being the Editor Emeritus of Human Reproduction. R.S.L. reports consultancy fees from Abbvie, Bayer, Ferring, Fractyl, Insud Pharma and Kindex and research sponsorship from Guerbet and Hass Avocado Board. B.W.M. reports consultancy fees from Guerbet, iGenomix, Merck, Merck KGaA and ObsEva. C.N. reports being the Editor-in-Chief of Fertility and Sterility and Section Editor of the Journal of Urology, research sponsorship from Ferring, and a financial interest in NexHand. E.H.Y.N. reports research sponsorship from Merck. A.S. reports consultancy fees from Guerbet. J.W. reports being a statistical editor for the Cochrane Gynaecology and Fertility Group. A.V. reports that he is a Statistical Editor of the Cochrane Gynaecology & Fertility Review Group and of the journal Reproduction; his employing institution has received payment from the Human Fertilisation and Embryology Authority for his advice on review of research evidence to inform its 'traffic light' system for infertility treatment 'add-ons'. N.L.V. reports consultancy and conference fees from Ferring, Merck and Merck Sharp and Dohme. The remaining authors declare no competing interests in relation to the work presented. All authors have completed the disclosure form.

TRIAL REGISTRATION NUMBER: Core Outcome Measures in Effectiveness Trials Initiative: 1023.


Keywords: female infertility; infertility; male infertility; effectiveness; safety; outcomes


Year: 2020; PMID: 33252643; PMCID: PMC7744157; DOI: 10.1093/humrep/deaa243

Source DB: PubMed; Journal: Hum Reprod; ISSN: 0268-1161; Impact factor: 6.918

Introduction

Randomized controlled trials (RCTs) evaluating potential treatments for infertility have reported many different outcomes (Wilkinson ). Such variation contributes to challenges in comparing, contrasting and combining individual trials, limiting the usefulness of research to inform clinical practice (Duffy ). The development, dissemination and implementation of a minimum data set, known as a core outcome set, will help to standardize outcome selection, collection and reporting across future infertility research. A core outcome set for infertility (Fig. 1) has been developed (Duffy ). However, there are inconsistencies in how individual core outcomes are currently defined by fertility trials. For example, definitions of live birth include a viable fetus after 24 weeks of gestation, pregnancy continuation beyond 28 weeks of gestation and delivery of a living baby (Wilkinson ). Such variation makes it possible for researchers to selectively report their results based on statistical significance. For example, researchers can undertake multiple statistical analyses at different gestational thresholds for live birth and selectively report the most favorable result.

Figure 1. A core outcome set for future infertility research.

There are unique challenges when reporting the results of infertility research because of the multistage nature of treatment, particularly in the context of IVF (Wilkinson ). Multiple clinical and procedural events can occur during treatment, and these events can be reported in subgroups containing only those patients who reach a certain milestone, for example, oocyte retrieval, embryo transfer or implantation. Many denominators could therefore be available when reporting individual core outcomes. This enables researchers to undertake multiple analyses using different denominators and selectively report the results.

The variation in definitions and poor reporting practices make comparing and combining individual RCTs challenging. Where these practices are common, the benefits of fertility treatments are likely to be overestimated and their harms underestimated (Duffy ). This undermines secondary research, including individual patient data (IPD) meta-analysis and network meta-analysis, and compromises clinical practice guideline development. Standardizing definitions and improving reporting for individual core outcomes creates an opportunity to improve consistency across future infertility trials and to ensure that secondary research can be undertaken prospectively, efficiently and harmoniously (Duffy ).

No guidelines have established recommendations regarding the development of consensus definitions and reporting guidelines for individual core outcomes (Williamson ). Outside the context of core outcome set development, the Harbin Consensus Conference Workshop Group has developed a standardized definition for live birth (The Harbin Consensus Conference Workshop Group, 2014), and the International Committee for Monitoring Assisted Reproductive Technologies (ICMART) has standardized definitions related to infertility and ART (Zegers-Hochschild ).
Motivated by the desire to maximize the potential of infertility research to inform clinical practice, an international collaboration coordinated by the Cochrane Gynaecology and Fertility Group has brought healthcare professionals, researchers and people with infertility together to standardize definitions for the core outcome set for infertility.

Materials and methods

The study was prospectively registered with the Core Outcome Measures in Effectiveness Trials (COMET) initiative (registration number 1023). An international steering group, including healthcare professionals, researchers and people with fertility problems, was established to inform key methodological decisions. The important work of the Harbin Consensus Conference Workshop Group and ICMART is complementary to this study. A protocol describing the study's consensus methods has been published (Duffy ). The protocol was developed with reference to the COMET initiative handbook (Williamson ). It was also informed by a systematic review of registered, progressing and completed core outcome sets relevant to women's and newborn health (Duffy et al., 2017b) and by the experiences of steering group members involved in other core outcome set development studies (Hirsch ; Khalil ; Webbe ; Whitehouse ; Khalil ).

Potential definitions for individual core outcomes were extracted from definition development initiatives and national and international clinical practice guidelines, and by reviewing the Cochrane Gynaecology and Fertility Group's standardized guidance for infertility reviews. A systematic review of the COMET initiative register, from inception to October 2018, was undertaken to identify definition development initiatives relevant to infertility research. Clinical practice guidelines relevant to infertility were identified by searching bibliographic databases, including Embase, MEDLINE and PubMed, from inception to October 2018. The Cochrane Gynaecology and Fertility Group provided access to its editorial policy, which describes its standardized approach to the selection of outcomes and definitions across Cochrane reviews evaluating potential fertility treatments. Definitions were extracted verbatim from all sources using a pilot-tested, standardized data extraction form.
An inventory was developed by organizing potential definitions within a framework (Supplementary Fig. S1). Steering group members with expertise in statistics and research methodology prepared discussion points related to the analysis and reporting of the core outcome set. The inventory and discussion points were discussed during a face-to-face consensus development meeting held in Auckland, New Zealand.

The consensus development conference is a formal consensus development method developed by the US National Institutes of Health and has been used to reach consensus on definitions, clinical practice recommendations and professional competencies (Ferguson, 1996). The method was designed to incorporate aspects of judicial decision-making, scientific conferences and the town hall meeting. Participants hear evidence on which they will later deliberate and are able to ask questions as the evidence is presented. The chairperson directs the discussion, which follows an informal format.

Healthcare professionals, researchers and people with fertility problems who had participated in the Delphi survey that informed the development of the core outcome set for infertility were invited to participate (Duffy ). The study aimed to recruit between 10 and 15 participants, as groups of this size have yielded sufficient results and assured validity in other studies (Murphy ). Before the meeting started, participants provided demographic details. The group discussion followed an informal format, with the chairperson providing direction. Each core outcome was discussed in turn. Potential definitions were displayed within the definition hierarchy, and each participant was asked to contribute their opinions. Participants were encouraged to suggest other potential definitions or to reformulate individual definitions to improve clarity or comprehension.
Although the group was encouraged to reach consensus, members were able to express minority or alternative views when consensus could not be achieved. Participants were encouraged to agree on contextual statements highlighting important methodological issues that would need to be considered when reporting individual core outcomes. Participants also developed consensus guidance regarding statistical analysis and a reporting table.

Results

Potential definitions were inventoried from four definition development initiatives, namely the Brighton Collaboration (Chen ), the Harbin Consensus Conference Workshop Group, ICMART and the World Health Organization (WHO) (World Health Organization, 2018); 12 clinical practice guidelines (American Urological Association, 2010a,b,c; Jarvi ; Kroon et al., 2011; Koch ; National Institute for Health and Care Excellence, 2013; Loh ; Carranza-Mamane ; Practice Committee of the American Society for Reproductive Medicine, 2017a,b; Jungwirth ); and the standardized methods advocated by the Cochrane Gynaecology and Fertility Group for the preparation of systematic reviews evaluating potential fertility treatments. Forty-four potential definitions were discussed during the consensus development meeting. Twenty-seven participants from 11 countries, including 14 healthcare professionals, 7 researchers and 6 people with fertility problems, took part in the consensus development meeting (Table I).

Table I. Participant characteristics.

	Participants (n = 27)
Stakeholder group, n
Health professionals	14
Researchers	7
People with fertility problems	6
Gender, n
Male	12
Female	15
Age (years), n
Under 29	1
30–39	6
40–49	3
50–59	9
Over 60	5
Prefer not to say	3
Geographical location, n
Africa	0
Asia	3
Australia and New Zealand	9
Europe	12
North America	3
South America	0


Live birth

When considering live birth, participants noted that the Improving the Reporting of Clinical Trials of Infertility Treatments (IMPRINT) statement recommended a gestational age threshold of 20 completed weeks. This statement was developed specifically to improve outcome reporting in infertility trials by modifying the Consolidated Standards of Reporting Trials (CONSORT) statement (Moher ). Given this context, participants agreed that the IMPRINT gestational age threshold should be recommended to ensure consistency across comparable initiatives standardizing outcome reporting in RCTs (Table II).

Table II. Standardized definitions for the core outcome set for infertility.

Core outcome	Definition	Contextual statement
Viable intrauterine pregnancy confirmed by ultrasound	A pregnancy diagnosed by ultrasonographic examination of at least one fetus with a discernible heartbeat.	Researchers should report at which gestation the ultrasound examination was performed. Pregnancies are counted as pregnancy events, for example, a twin pregnancy is counted as one pregnancy event. Effect size estimates and 95% confidence interval should be reported for pregnancy events. The denominator should be per participant randomized. Singleton, twin and higher multiple pregnancy should be reported separately.
Pregnancy loss		When considering twin and higher multiple pregnancies, pregnancy loss should be explicitly accounted for.
Ectopic pregnancy	A pregnancy outside the uterine cavity, diagnosed by ultrasound, surgical visualization or histopathology.
Miscarriage	The spontaneous loss of an intrauterine pregnancy prior to 20 completed weeks of gestational age.	Miscarriage should be reported after a viable pregnancy has been confirmed by ultrasound.
Stillbirth	The death of a fetus prior to the complete expulsion or extraction from its mother after 20 completed weeks of gestational age. The death is determined by the fact that, after such separation, the fetus does not breathe or show any other evidence of life, such as heartbeat, umbilical cord pulsation or definite movement of voluntary muscles.	When considering stillbirth involving twins and higher multiple births they should be reported as a single event.
Termination of pregnancy	Intentional loss of an intrauterine pregnancy, through intervention by medical, surgical or unspecified means.	Selective embryo or fetal reduction should be reported.
Live birth	The complete expulsion or extraction from a woman of a product of fertilization, after 20 completed weeks of gestational age; which, after such separation, breathes or shows any other evidence of life, such as heart beat, umbilical cord pulsation or definite movement of voluntary muscles, irrespective of whether the umbilical cord has been cut or the placenta is attached. A birth weight of 350 g or more can be used if gestational age is unknown.	Live births are counted as birth events, for example, twin live birth is counted as one live birth event. Effect size estimates and 95% confidence interval should be reported for live birth events. The denominator should be per participant randomized. Singletons, twin and higher multiple births should be reported separately.
Gestational age at birth	The age of a fetus is calculated by the best obstetric estimate determined by assessments which may include early ultrasound, and the date of the last menstrual period, and/or perinatal details. In the case of assisted reproductive techniques, it is calculated by adding 14 days to the number of completed weeks since fertilization.	The gestational age of both live births and stillbirths should be reported. Gestational age at birth should be reported as a median and interquartile range. Reporting the mean and standard deviation in addition would support future meta-analysis.
Birthweight	Birth weight should be collected within 24 h of birth and assessed using a calibrated electronic scale with 10-g resolution.	The birthweight of singletons, twins and higher multiples should be reported separately. Birthweight for each newborn infant of the multiple birth set should be reported. Birthweight should not be adjusted for gestational age. The birthweight of stillbirths should be reported.
Neonatal mortality	Death of a live born baby within 28 days of birth. This can be sub-divided into early neonatal mortality, if death occurs in the first 7 days after birth and late neonatal, if death occurs between 8 and 28 days after birth.	Mortality related to preterm infants should be collected up to 28 days beyond their estimated due date. If a member of a multiple birth set dies in the neonatal period this should be explicitly reported.
Major congenital anomaly	Structural, functional and genetic anomalies, that occur during pregnancy, and identified antenatally, at birth, or later in life, and require surgical repair of a defect, or are visually evident, or are life-threatening, or cause death.	Major congenital anomalies should be classified using a standardized taxonomy. Major congenital anomaly should be reported as an infant with at least one major congenital anomaly detected. If a major congenital anomaly is identified in a member of a multiple set this should be explicitly reported.

When considering the reporting of live birth, participants recommended that twin and higher multiple births should be reported as a single live birth event (Table III). Because a twin or higher multiple birth then contributes no more events than a singleton birth, treatments which increase twin and higher multiple births are not favored. Participants agreed that the summary effect size estimate and 95% CI should be calculated for live birth events only, and recommended the number of participants randomized as the most appropriate denominator. In addition to live birth events, singleton, twin and higher multiple births should be reported narratively. When calculating the corresponding percentages for live birth events and for singleton, twin and higher multiple births, the number of participants randomized is the recommended denominator.
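The counting rule for live birth events can be sketched in Python. This is a minimal illustration rather than part of the consensus: it assumes a hypothetical per-participant record with at most one birth per woman randomized, and it uses a risk ratio with a Wald 95% CI on the log scale as the effect size estimate, a choice the recommendations leave open.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical record: one row per woman randomized, at most one birth."""
    arm: str              # e.g. "experimental" or "control"
    plurality: int        # babies delivered at the birth (0 if no birth)
    any_live_born: bool   # at least one baby born alive


def live_birth_summary(participants, arm):
    """Live birth events and singleton/twin/higher multiple counts for one arm.

    A twin or higher multiple live birth counts as ONE live birth event, and
    every percentage uses the number of participants randomized to the arm.
    """
    in_arm = [p for p in participants if p.arm == arm]
    n = len(in_arm)
    births = [p for p in in_arm if p.any_live_born]
    counts = {
        "live birth event": len(births),
        "singleton": sum(1 for p in births if p.plurality == 1),
        "twin": sum(1 for p in births if p.plurality == 2),
        "higher multiples": sum(1 for p in births if p.plurality >= 3),
    }
    return n, {k: (v, 100.0 * v / n) for k, v in counts.items()}


def risk_ratio_ci(e1, n1, e0, n0, z=1.96):
    """Risk ratio with a Wald 95% CI on the log scale (needs e1, e0 > 0)."""
    rr = (e1 / n1) / (e0 / n0)
    se = math.sqrt(1 / e1 - 1 / n1 + 1 / e0 - 1 / n0)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


# Hypothetical data: 10 women per arm; one experimental live birth is twins.
trial = (
    [Participant("experimental", 1, True)] * 3
    + [Participant("experimental", 2, True)]
    + [Participant("experimental", 0, False)] * 6
    + [Participant("control", 1, True)] * 2
    + [Participant("control", 0, False)] * 8
)
n1, exp_counts = live_birth_summary(trial, "experimental")
n0, ctl_counts = live_birth_summary(trial, "control")
rr, lo, hi = risk_ratio_ci(exp_counts["live birth event"][0], n1,
                           ctl_counts["live birth event"][0], n0)
print(exp_counts["live birth event"], round(rr, 2), round(lo, 2), round(hi, 2))
```

Because the twin delivery contributes one event rather than two, the experimental arm records 4 live birth events (40% of those randomized), and the singleton, twin and higher multiple counts are reported alongside using the same denominator.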

Table III

Generic reporting table.

	Experimental	Control	Effect size estimate (95% CI) ^*
	N	N	Effect size estimate (95% CI) ^*
Live birth event, no. (%)^†
Singleton, no. (%)
Twin, no. (%)
Higher multiples, no. (%)
Viable pregnancy confirmed by ultrasound, no. (%)^†
Singleton pregnancy, no. (%)
Twin pregnancy, no. (%)
Higher multiple pregnancy, no. (%)
Pregnancy loss^‡
Ectopic pregnancy, no.
Miscarriage, no.
Stillbirth, no.
Termination of pregnancy, no.
Gestational age at delivery (weeks of gestation), median (IQR)^§
Birthweight
Singleton, g (mean, SD)
Twin, g. (mean, SD)^‖
Higher multiples, g (mean, SD)^‖
Neonatal mortality, no.^¶
Major congenital anomaly, no.^#

^* Effect size estimates and 95% CI should only be reported for live birth event and viable pregnancy confirmed by ultrasound. The remaining data should be summarized narratively.

^† For live birth event and viable pregnancy confirmed by ultrasound, the number of participants randomized should be used as the denominator.

^‡ When considering twin and higher multiple pregnancies, pregnancy loss should be explicitly accounted for within the table footnote.

^§ For gestational age at delivery, reporting the mean and SD within the table footnote would support future meta-analysis.

^‖ The birthweight for each newborn infant of the multiple birth set should be reported.

^¶ If a member of a multiple birth set dies in the neonatal period, this should be explicitly stated within the table footnote.

^# Reported as an infant with at least one major congenital anomaly detected. If a major congenital anomaly is identified in a member of a multiple set, this should be explicitly stated within the table footnote.

g, grams; N, number of randomized participants; no., number of events; IQR, interquartile range.

Carefully selecting an appropriate denominator avoids common issues associated with the analysis of data arising from infertility trials, particularly trials related to ART. These issues are considered further in the Discussion.
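Why the denominator matters can be shown with a worked example in Python using purely hypothetical numbers. Restricting the denominator to women who reach embryo transfer, a milestone that occurs after randomization and may itself be affected by treatment, can reverse the apparent direction of effect relative to the per-randomized analysis recommended here. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmCounts:
    """Hypothetical aggregate counts for one trial arm."""
    randomized: int
    reached_transfer: int
    live_birth_events: int

    def rate_per_randomized(self) -> float:
        # Recommended denominator: everyone randomized to the arm.
        return self.live_birth_events / self.randomized

    def rate_per_transfer(self) -> float:
        # Post-randomization subgroup: only women who reached embryo transfer.
        return self.live_birth_events / self.reached_transfer


# Arm A: fewer women reach transfer, but more succeed per transfer.
arm_a = ArmCounts(randomized=100, reached_transfer=60, live_birth_events=30)
arm_b = ArmCounts(randomized=100, reached_transfer=90, live_birth_events=36)

for name, arm in (("A", arm_a), ("B", arm_b)):
    print(f"Arm {name}: per randomized {arm.rate_per_randomized():.0%}, "
          f"per transfer {arm.rate_per_transfer():.0%}")
```

Per transfer, arm A looks better (50% versus 40%); per participant randomized, arm B does (36% versus 30%). Only the per-randomized comparison preserves the groups created by randomization.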

Viable intrauterine pregnancy confirmed by ultrasound

Participants agreed on a consensus definition, which included visualization of a heartbeat. Participants discussed the reporting of twin and higher multiple pregnancies and recommended that they should be reported as a single pregnancy event. The effect size estimate and 95% CI should be calculated for pregnancy events only. Participants concluded that it was also important for singleton, twin and higher multiple pregnancies to be routinely reported. When calculating the corresponding percentages for pregnancy events and for singleton, twin and higher multiple pregnancies, the number of participants randomized should be used as the denominator. Participants discussed the importance of embedding RCTs within routine clinical practice and were reluctant to insist upon mandatory urinary or serum beta-hCG testing or ultrasonographic examinations in addition to routine care. They discussed the variation in routine ultrasonographic examination between countries; for example, routine ultrasound scans are performed between 6 and 8 weeks in the USA, between 11 and 13 weeks in the UK and after 16 weeks in the Netherlands. Following the discussion, a contextual statement was recommended to ensure that researchers consistently report the gestation at which the ultrasonographic examination diagnosing viable intrauterine pregnancy was performed.

Pregnancy loss

Ectopic pregnancy

Following discussion, consensus was reached to adopt the ICMART definition of ectopic pregnancy.

Miscarriage

Participants discussed the WHO's definition of miscarriage and observed that it was the most widely used definition in an international context. The definition includes a gestational age threshold of 20 completed weeks. Participants observed that such a threshold would align with the IMPRINT statement's definition of live birth, which had already been adopted. Participants unanimously agreed to modify the ICMART definition of late fetal loss to include an estimated gestational age threshold of 20 completed weeks. Within the context of this core outcome set, participants recommended that miscarriage should only be reported after a viable pregnancy has been confirmed by ultrasound.

Stillbirth

Participants discussed a variety of contextual factors, including local cultural influences, legislative frameworks and national and international reporting requirements, which account for the different gestational age thresholds incorporated in different definitions of stillbirth. They highlighted the importance of accounting for all pregnancy losses, noting that the gestational age threshold for stillbirth would need to be consistent with the threshold already agreed for miscarriage. Participants unanimously agreed to modify the ICMART definition to include a gestational age threshold of 20 completed weeks with an appropriate adjustment for birthweight. When considering stillbirth involving twin and higher multiple pregnancies, participants recommended that these should be reported as a single event.
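Because miscarriage, stillbirth and live birth share the threshold of 20 completed weeks, the spontaneous end of an intrauterine pregnancy can be classified with a single cut-off. The Python sketch below is illustrative only: the function and constant names are hypothetical, gestational age is assumed to be recorded in completed days, a viable intrauterine pregnancy is assumed to have already been confirmed by ultrasound, and the 350 g fallback from the live birth definition is applied only to live births of unknown gestational age (the birthweight adjustment for stillbirth is not specified here, so it is left out).

```python
from typing import Optional

THRESHOLD_DAYS = 20 * 7          # 20 completed weeks of gestational age
LIVE_BIRTH_FALLBACK_GRAMS = 350  # live birth criterion when gestational age is unknown


def classify_intrauterine_outcome(
    gestational_age_days: Optional[int],
    born_alive: bool,
    birthweight_grams: Optional[float] = None,
) -> str:
    """Classify the spontaneous end of an intrauterine pregnancy (hypothetical helper).

    Ectopic pregnancy and termination of pregnancy are recorded separately
    and are not handled here.
    """
    if gestational_age_days is None:
        # Only the live birth definition offers a birthweight fallback.
        heavy_enough = (birthweight_grams is not None
                        and birthweight_grams >= LIVE_BIRTH_FALLBACK_GRAMS)
        return "live birth" if born_alive and heavy_enough else "unclassified"
    if gestational_age_days < THRESHOLD_DAYS:
        # Loss before 20 completed weeks.
        return "miscarriage"
    # From 20 weeks 0 days onwards: born alive or fetal death.
    return "live birth" if born_alive else "stillbirth"


print(classify_intrauterine_outcome(139, born_alive=False))
print(classify_intrauterine_outcome(140, born_alive=False))
print(classify_intrauterine_outcome(None, born_alive=True, birthweight_grams=400))
```

Using one shared constant for all three outcomes means no pregnancy ending at a given gestation can fall between categories or be counted twice, which is the consistency participants sought.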

Termination of pregnancy

Following discussion, consensus was reached to adopt the ICMART definition of termination of pregnancy. Participants noted the importance of reporting selective embryo or fetal reduction.

Participants discussed the reporting of pregnancy loss and recommended that ectopic pregnancies, miscarriages, stillbirths and terminations of pregnancy should be reported as numbers of events. Percentages and effect size estimates should not be reported. When considering twin and higher multiple pregnancies, participants recommended that pregnancy losses should be accounted for within the footnotes of the reporting table and summarized narratively within the study report.

Gestational age at delivery

Following discussion, consensus was reached to adopt the ICMART definition of gestational age. Participants recommended that gestational age at delivery should be reported for both live births and stillbirths. Participants agreed gestational age at delivery should be reported as the median and interquartile range. An effect size estimate should not be reported. Participants recommended that researchers should be encouraged to report the mean and SD within the reporting table footnote to support future meta-analysis.
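The ICMART-based rule for ART pregnancies and the recommended summaries for gestational age can be sketched in Python. The sketch assumes the fertilization date of an ART pregnancy is known (for example, the day of insemination) and uses hypothetical function names. Quartiles are computed with `statistics.quantiles` using its "inclusive" method, which is one of several conventions; other software may return slightly different IQRs.

```python
import statistics
from datetime import date


def gestational_age_art(fertilization: date, birth: date) -> tuple[int, int]:
    """Gestational age at birth for an ART pregnancy as (completed weeks, days).

    Adds 14 days to the time elapsed since fertilization, following the
    definition in Table II.
    """
    total_days = (birth - fertilization).days + 14
    return divmod(total_days, 7)


def summarize_gestational_age(weeks: list[float]) -> dict[str, float]:
    """Median and IQR (primary summary), plus mean and SD (to aid meta-analysis)."""
    q1, _, q3 = statistics.quantiles(weeks, n=4, method="inclusive")
    return {
        "median": statistics.median(weeks),
        "q1": q1,
        "q3": q3,
        "mean": statistics.mean(weeks),
        "sd": statistics.stdev(weeks),
    }


print(gestational_age_art(date(2020, 1, 1), date(2020, 9, 1)))
print(summarize_gestational_age([37.0, 38.0, 39.0, 40.0]))
```

Reporting the mean and SD alongside the median and IQR costs one extra line in the table footnote but lets future meta-analyses pool gestational age without having to approximate a mean from quartiles.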

Birthweight

Participants noted that the measurement of birthweight is well characterized. They noted best practice recommendations, which advise collecting birthweight within 24 h of birth using a calibrated electronic scale with 10-g resolution. Where correctly calibrated electronic scales are not available, the type of scale and its calibration should be clearly reported. Participants recommended that birthweight should not be adjusted for gestational age. Participants agreed that birthweight, reported as a mean and SD, should be recorded separately for singleton, twin and higher multiple infants, and that the birthweight of each infant of a multiple birth set should be reported.

Neonatal mortality

Participants noted the consistent use of the WHO definition for neonatal mortality across definition development initiatives, including ICMART, international and national guidelines and Cochrane systematic reviews. A contextual statement was agreed to ensure researchers report any mortality of preterm infants up to 28 days beyond their estimated due date. Participants agreed neonatal mortality should be reported numerically. Percentages and effect size estimates should not be reported. If a member of a multiple birth set dies in the neonatal period this should be stated within the reporting table footnote and summarized narratively within the study report.
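The neonatal mortality definition and the contextual statement for preterm infants translate into two simple date rules, sketched here in Python. The helper names are hypothetical, and the sketch assumes the day of birth counts as day 1 of life; the definition's wording ("first 7 days", "between 8 and 28 days") leaves the day-numbering convention open, so trials should state the convention they use.

```python
from datetime import date, timedelta
from typing import Optional


def neonatal_category(day_of_life_at_death: int) -> Optional[str]:
    """Classify a death by day of life (assumption: day of birth = day 1).

    Days 1-7 are early neonatal and days 8-28 late neonatal; otherwise None.
    """
    if 1 <= day_of_life_at_death <= 7:
        return "early neonatal"
    if 8 <= day_of_life_at_death <= 28:
        return "late neonatal"
    return None


def collection_window_end(birth: date, estimated_due_date: date, preterm: bool) -> date:
    """Last date on which a death should still be collected.

    Preterm infants: 28 days beyond the estimated due date.
    Others: day 28 of life, i.e. 27 days after the day of birth.
    """
    if preterm:
        return estimated_due_date + timedelta(days=28)
    return birth + timedelta(days=27)


def should_collect(death: date, birth: date, estimated_due_date: date, preterm: bool) -> bool:
    """True if the death falls inside the collection window."""
    return birth <= death <= collection_window_end(birth, estimated_due_date, preterm)


print(neonatal_category(8))
print(collection_window_end(date(2021, 1, 1), date(2021, 2, 1), preterm=True))
```

For a preterm infant born a month early, the collection window extends well past the first 28 days of life, which is the purpose of the contextual statement.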

Major congenital anomaly

Participants discussed how congenital anomalies vary in severity, with severe anomalies affecting an infant’s health, development and survival. Participants agreed that future RCTs should consistently report major congenital anomalies. Participants unanimously agreed to modify the ICMART definition to include criteria ensuring that only major congenital anomalies are reported. Participants stated the importance of classifying congenital anomalies using a standardized taxonomy (DeSilva ). Participants agreed that major congenital anomalies should be reported per infant, as the number of infants with at least one major congenital anomaly detected. If a major congenital anomaly is identified in a member of a multiple birth set, this should be stated within the reporting table footnote and summarized narratively within the study report. Percentages and effect size estimates should not be reported.
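The hypothetical function below makes the unit of counting concrete. It counts infants rather than anomalies, so an infant with two major anomalies contributes once. It assumes each detected anomaly has already been classified as major or minor using a standardized taxonomy. The data format is an illustrative assumption.

```python
def count_infants_with_major_anomaly(infants):
    """Number of infants with at least one major congenital anomaly detected.

    `infants` maps a hypothetical infant ID to a list of (description, is_major)
    pairs; `is_major` is assumed to come from a standardized taxonomy.
    Infants, not anomalies, are counted.
    """
    return sum(
        1
        for anomalies in infants.values()
        if any(is_major for _, is_major in anomalies)
    )


infants = {
    "infant-1": [("anomaly a", True), ("anomaly b", True)],  # counted once
    "infant-2": [("anomaly c", False)],  # minor only: not counted
    "infant-3": [],  # no anomaly detected
}
print(count_infants_with_major_anomaly(infants))
```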

Time to pregnancy leading to live birth

Detailed guidance regarding the collection, analysis and reporting of time to pregnancy leading to live birth was approved by the meeting participants and has been provided as Supplementary Data File S1.

Discussion

Definition development initiatives, clinical practice guidelines and Cochrane reviews have defined individual core outcomes in different ways. Using formal consensus methods, 14 healthcare professionals, 7 researchers and 6 people with fertility problems, from 11 countries, have developed consensus definitions for all core outcomes in the core outcome set for infertility. Specific recommendations have been made to improve the reporting of core outcomes. The consensus development conference is a formal consensus method developed by the US National Institutes of Health and has been used to reach consensus on a variety of topics in many countries, including Canada, the UK and Sweden. The study engaged a range of stakeholders, including healthcare professionals, researchers and people with fertility problems, from different countries. Such diversity should support the generalizability of the results and increase their credibility with other researchers. The study has developed clear and concise recommendations to enable future researchers to collect core outcomes in a standardized way and to report their results clearly and transparently. This study is not without limitations. There is significant uncertainty regarding the optimal methods for core outcome set development (Duffy and McManus, 2016; Williamson ; Duffy ). The COMET initiative has made no formal recommendations regarding the development of consensus definitions. It advocates the use of formal consensus development methods in other aspects of core outcome set development, which informed the methodological choices we made in this study. Other formal consensus methods, including the modified Delphi method and the modified Nominal Group Technique, could have been used.
Further methodological research is required to evaluate the most appropriate consensus methods for studies similar to ours. Consideration should also be given to the representativeness of the steering group and consensus meeting participants. Many consensus meeting participants were from European countries (n = 12; 44%) and there was limited representation from low- and middle-income countries, which could have influenced the consensus definitions. Further research should evaluate virtual or blended meeting formats to improve representation while conserving limited resources. Analyses of data arising from infertility trials, particularly trials related to ART, are frequently undermined by the use of an inappropriate denominator (Wilkinson ). Two main issues exist. The first is the use of a post-randomization denominator, for example, when live birth rates are calculated per embryo transferred rather than per woman randomized. Analyses conducted on this basis do not reflect the randomized comparison, because the subgroups being compared (for example, women who reached embryo transfer) may differ in their characteristics, and therefore also in their outcomes (Hirji and Fagerland, 2009). The second issue relates to analyses that commit a unit of analysis error (Vail and Gardener, 2003). This error occurs when proportions are calculated using an inappropriate denominator, for example, the number of oocytes or the number of embryos. Unit of analysis errors commonly occur when researchers calculate the pregnancy rate by dividing the number of gestational sacs seen on ultrasound by the number of embryos transferred. This approach is incorrect because the outcomes of embryos from the same couple are correlated, whereas standard statistical tests assume that observations are independent. To avoid these problems, it is good practice to calculate rates of viable pregnancy confirmed by ultrasound and of live birth using the number of participants randomized as the denominator.
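The contrast between the recommended denominator and a post-randomization one can be shown with a small hypothetical arm of four randomized women, one of whom never reaches embryo transfer. The `Participant` record, the helper functions and the data are illustrative assumptions. The sketch only shows how the choice of denominator changes the reported rate. It does not implement the clustered-data methods mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    arm: str
    embryos_transferred: int  # 0 if no transfer took place after randomization
    live_birth: bool


def _arm(participants, arm):
    """All participants randomized to `arm`, regardless of later events."""
    in_arm = [p for p in participants if p.arm == arm]
    if not in_arm:
        raise ValueError(f"no participants randomized to arm {arm!r}")
    return in_arm


def live_birth_rate_per_randomized(participants, arm):
    """Recommended: women with a live birth / women randomized to the arm."""
    in_arm = _arm(participants, arm)
    return sum(p.live_birth for p in in_arm) / len(in_arm)


def live_birth_rate_per_embryo(participants, arm):
    """Not recommended: post-randomization, embryo-level denominator."""
    in_arm = _arm(participants, arm)
    embryos = sum(p.embryos_transferred for p in in_arm)
    if embryos == 0:
        raise ValueError("no embryos transferred in this arm")
    return sum(p.live_birth for p in in_arm) / embryos


trial = [
    Participant("A", 0, False),  # randomized but never reached transfer
    Participant("A", 2, True),
    Participant("A", 1, True),
    Participant("A", 2, False),
]
print(live_birth_rate_per_randomized(trial, "A"))  # 2 / 4
print(live_birth_rate_per_embryo(trial, "A"))      # 2 / 5
```

This arm has two live births among four randomized women. The per-randomized rate is 2/4 = 0.5. Dividing by the five embryos transferred gives 0.4, a figure that no longer corresponds to the randomized comparison.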
This approach is explicitly stated within the core outcome set recommendations. Sophisticated statistical methods capable of accommodating post-randomization comparisons and clustered data do exist. These could be reported in addition to the core outcome set where researchers have access to the necessary statistical expertise. This study has developed the generic building blocks for future infertility research. A minimum data set allows researchers to easily populate protocols, case report forms and other data collection tools with core outcomes and consensus definitions. The generic reporting table should help researchers to report their results clearly within journal publications and conference presentations. Implementing a standardized approach should reduce poor reporting practices, for example, incomplete reporting, selective reporting based on statistical significance and inappropriate use of denominators (Wilkinson ). Research studies with limited access to methodological and statistical advice are anticipated to benefit the most. Systematic implementation of this core outcome set should ensure that core outcomes are consistently defined across individual trials. Symmetrical application of standardized definitions in all trial arms is known to reduce measurement bias, including observer and verification bias (Mansournia ). Blinding outcome assessors to treatment allocation would further reduce bias (Sterne ), and outcome assessors should also undertake comprehensive training. Other strategies can help to ensure that consensus definitions are applied correctly and consistently. These include standardized data collection tools, internal validation studies and independent adjudication panels. A freely available electronic case report form and data repository are currently being planned to standardize the collection of the core outcome set within future infertility trials (COMMIT-Collection).
The Core Outcomes in Women’s and Newborn Health (CROWN) initiative is supported by over 80 specialty journals, including the Cochrane Gynaecology and Fertility Group, Fertility and Sterility and Human Reproduction. These have resolved to implement the core outcome set for infertility (Core Outcomes in Women's and Newborn Health Initiative, 2014). In future, participating journals will ask researchers to report the definitions of individual core outcomes within published trial reports. When the consensus definition has not been used, researchers will be asked to report this and its implications for their findings. Reporting will be facilitated by the recommendations made within this study. Researchers should anticipate the need to combine the results of individual trials evaluating fertility treatments (Wilkinson ). Implementing the core outcome set, including consensus definitions, should be considered good practice. It could make a significant contribution to improving the coordination, development and delivery of fertility research within regional, national and international settings (Devall ). Standardization will facilitate pairwise meta-analysis and more sophisticated secondary research, including individual participant data (IPD) and network meta-analysis (Duffy ). These approaches could provide unique insights into the effectiveness and safety of fertility treatments. The consensus definitions developed as part of this study could be incorporated into other core outcome sets to promote further harmonization across women’s health. Other core outcome sets have been developed for endometriosis, hyperemesis gravidarum and preterm birth. These share common core outcomes, including live birth, neonatal mortality and major congenital anomalies (van 't Hooft ; Jansen ; Duffy ). Core outcome set developers should be encouraged to use the consensus definitions developed as part of this study.
Standardized consensus definitions are not meant to limit regional, national and international requirements to collect and report core outcomes using specific definitions, including live birth, stillbirth and congenital anomalies. For example, researchers undertaking research in the UK may wish to define stillbirth as occurring after 22 weeks of gestation, in line with national recommendations (Da Silva ). Researchers wishing to collect data using other definitions in the context of their own RCT would continue to be able to do so. Selective reporting should be avoided by presenting findings for both the consensus definition and any other definition used. Researchers would need to carefully consider how these data would be collected to fulfill different definitions and reporting obligations. The ultimate objective of an infertility trial is a healthy baby who develops normally. Developing an objective consensus definition of what constitutes a ‘healthy baby’ is challenging, because contextual factors, including local practices, cultural influences and legal implications, are important considerations. Consensus was reached to define live birth based on a 20-week gestational age threshold, reflecting IMPRINT recommendations and WHO guidelines. The current limit of viability is considered to be 22 weeks of gestation; however, this threshold is constantly challenged as perinatal and neonatal medicine advances. This context was also considered, and a clear threshold was agreed through a robust consensus process to facilitate clear reporting across future infertility research (The Harbin Consensus Conference Workshop Group, 2014). The core outcome set should be reported by all future RCTs evaluating potential fertility treatments, and this context is important when considering the consensus definition developed for pregnancy and miscarriage.
Routine urinary or serum beta-hCG testing is a common feature of IVF research but is less likely when evaluating other interventions. To take this into account, the consensus definition for pregnancy and miscarriage includes ultrasound, which is a common component of antenatal care. An extension to the core outcome set specifically for IVF research (COMMIT-IVF) is currently being developed and includes pregnancy confirmed by urinary or serum beta-hCG testing and early miscarriage. The development of consensus definitions has drawn additional attention to the language researchers commonly use when reporting infertility research. People with fertility problems and the patient organizations involved in this study routinely commented upon terminology. Terms such as missed spontaneous abortion, induced abortion and fetal loss have often been perceived as lacking a patient-centric approach. Researchers should recognize that the language used to report fertility research is important and holds significance for people with fertility problems. The standardized terminology within this core outcome set has been developed in partnership with people with fertility problems and patient organizations, to ensure precision and with consideration of good practice guidelines. The COMMIT initiative has committed to undertaking further research to assess the uptake and implementation of the core outcome set for infertility (COMMIT-Implementation). Uptake of the core outcome set, including the use of consensus definitions, will be assessed by examining registry records, published protocols and RCT reports. Further research is planned to examine and understand why researchers do, and do not, implement the core outcome set for infertility. By identifying perceived barriers to implementation, strategies will be developed to promote implementation of the core outcome set across future infertility research.
In conclusion, ensuring that core outcomes are consistently defined across RCTs evaluating potential fertility treatments will produce more accessible evidence and facilitate the translation of research into clinical practice. Standardized reporting should help limit poor reporting practices. Future researchers should benefit from core outcomes and consensus definitions, which can be included in protocols, case report forms and other data collection tools. The generic reporting table should help researchers to report their results clearly in journal publications and conference presentations.

References

1. Keith Jarvi, Kirk Lo, Anthony Fischer, John Grantmyre, Armand Zini, Victor Chow, Victor Mak. CUA Guideline: The workup of azoospermic males. Can Urol Assoc J. 2010-06.

2. M K Murphy, N A Black, D L Lamping, C M McKee, C F Sanderson, J Askham, T Marteau. Consensus development methods, and their use in clinical guideline development. Health Technol Assess. 1998.

3. J M N Duffy, S Bhattacharya, M Herman, B Mol, A Vail, J Wilkinson, C Farquhar. Reducing research waste in benign gynaecology and fertility research. BJOG. 2017-02.

4. J M N Duffy, M Hirsch, S Ziebland, R J McManus. Methodological decisions influence the identification of potential core outcomes in studies related to pre-eclampsia: an analysis informing the development of recommendations for future core outcome set developers. BJOG. 2019-09-05.

5. Ben Kroon, Neil Johnson, Michael Chapman, Anusch Yazdani, Roger Hart. Fibroids in infertility: consensus statement from ACCEPT (Australasian CREI Consensus Expert Panel on Trial evidence). Aust N Z J Obstet Gynaecol. 2011-03-22.

6. Removal of myomas in asymptomatic patients to improve fertility and/or reduce miscarriage rate: a guideline. Fertil Steril. 2017-09.

7. Richard S Legro, Xiaoke Wu, Kurt T Barnhart, Cynthia Farquhar, Bart C J M Fauser, Ben Mol. Improving the reporting of clinical trials of infertility treatments (IMPRINT): modifying the CONSORT statement. Hum Reprod. 2014-09-12.

8. Fernando Zegers-Hochschild, G David Adamson, Silke Dyer, Catherine Racowsky, Jacques de Mouzon, Rebecca Sokol, Laura Rienzi, Arne Sunde, Lone Schmidt, Ian D Cooke, Joe Leigh Simpson, Sheryl van der Poel. The International Glossary on Infertility and Fertility Care, 2017. Fertil Steril. 2017-07-29.

9. James M N Duffy, Sue Ziebland, Peter von Dadelszen, Richard J McManus. Tackling poorly selected, collected, and reported outcomes in obstetrics and gynecology research. Am J Obstet Gynecol. 2018-09-28.

10. Janneke van 't Hooft, James M N Duffy, Mandy Daly, Paula R Williamson, Shireen Meher, Elizabeth Thom, George R Saade, Zarko Alfirevic, Ben Willem J Mol, Khalid S Khan. A Core Outcome Set for Evaluation of Interventions to Prevent Preterm Birth. Obstet Gynecol. 2016-01.
