Literature DB >> 31206038

A systematic review of database validation studies among fertility populations.

V Bacal^1,2,3, M Russo^4,5, D B Fell^1,6,7, H Shapiro^4,5, M Walker^1,2,3, L M Gaudet^1,2,3.

Abstract

STUDY QUESTION: Are routinely collected data from fertility populations adequately validated? SUMMARY ANSWER: Of the 19 studies included, only one validated a national fertility registry and none reported their results in accordance with recommended reporting guidelines for validation studies. WHAT IS KNOWN ALREADY: Routinely collected data, including administrative databases and registries, are excellent sources of data, particularly for reporting, quality assurance, and research. However, these data are subject to misclassification bias due to misdiagnosis or errors in data entry and therefore need to be validated prior to using for clinical or research purposes. STUDY DESIGN SIZE DURATION: We conducted a systematic review by searching Medline, Embase, and CINAHL from inception to 6 October 2016 to identify validation studies of databases that contain routinely collected data in an ART setting. Webpages of international ART centers were also searched. PARTICIPANTS/MATERIALS SETTING
METHODS: We included studies that compared at least two data sources to validate ART population data. Key words and MeSH terms were adapted from previous systematic reviews investigating routinely collected data (e.g. administrative databases and registries), measures of validity (including sensitivity, specificity, and predictive value), and ART (including infertility, IVF, advanced reproductive age, and diminished ovarian reserve). Only full-text studies in English were considered. Results were synthesized qualitatively. The electronic search yielded 1074 citations, of which 19 met the inclusion criteria. MAIN RESULTS AND THE ROLE OF CHANCE: Two studies validated a fertility database using medical records; seven studies used an IVF registry to validate vital records or maternal questionnaires, and two studies failed to adequately describe their reference standard. Four studies investigated the validity of mode of conception from birth registries; two studies validated diagnoses or treatments in a fertility database; four studies validated a linkage algorithm between a fertility registry and another administrative database; one study created an algorithm in a single database to identify a patient population. Sensitivity was the most commonly reported measure of validity (12 studies), followed by specificity (9 studies). Only three studies reported four or more measures of validation, and five studies presented CIs for their estimates. The prevalence of the variable in the target population (pre-test prevalence) was reported in seven studies; however, only four of the studies had prevalence estimates from the study population (post-test prevalence) within a 2% range of the pre-test estimate. The post-test estimate was largely discrepant from the pre-test value in two studies. LIMITATIONS REASONS FOR CAUTION: The search strategy was limited to the studies and reports published in English, which may not capture validation studies from countries that do not speak English. Furthermore, only three specific fertility-based diagnostic variables (advanced reproductive age, diminished ovarian reserve, and chorionicity) were searched in Medline, Embase, and CINAHL. Consequently, published studies with other diagnoses or conditions relevant to infertility may not have been captured in our review. WIDER IMPLICATIONS OF THE
FINDINGS: There is a paucity of literature on validation of routinely collected data from a fertility population. Furthermore, the prevalence of the markers that have been validated are not being presented, which can lead to biased estimates. Stakeholders rely on these data for monitoring outcomes of treatments and adverse events; therefore, it is essential to ascertain the accuracy of these databases and make the reports publicly available. STUDY FUNDING/COMPETING INTERESTS: This study was supported by Canadian Institutes of Health Research (CIHR) (FDN-148438). There are no competing interests for any of the authors. REGISTRATION NUMBER: International Prospective Register of Systematic Reviews ID: CRD42016048466.

Entities: Chemical

Keywords: ART; database; infertility; quality assurance; reproductive epidemiology; validation

Year: 2019 PMID： 31206038 PMCID： PMC6561328 DOI： 10.1093/hropen/hoz010

Source DB: PubMed Journal: Hum Reprod Open ISSN： 2399-3529

WHAT DOES THIS MEAN FOR PATIENTS?

The World Health Organization recognizes the inability to have a healthy child after 1 year of attempting pregnancy as a disease or a disability. The psychosocial implications of infertility are vast, including depression, discrimination, and ostracism, the latter being of particular importance in lower income countries. Determining the prevalence and burden of infertility, as well as performing regular surveillance on ART treatments and outcomes, is essential to inform policy, conduct research, and counsel patients. For example, the International Committee for Monitoring Assisted Reproductive Technologies relies on large-population data from regional and national ART registries around the world. With these data, they are able to provide reports depicting trends in practice, utilization of health care, and pregnancy outcomes after treatment. Accurate and robust data are paramount to providing such reports. While these reports are reliant on administrative databases, our systematic review demonstrated that the quality assurance practices to establish accurate and reliable data are lacking in the literature. Moreover, where reports were published, adherence to reporting guidelines for studies using administrative data was also insufficient. We have provided a comprehensive review of the current literature, describing current practices, various strategies, and guidelines for which a validation study should adhere to in order to ensure accurate data.

Introduction

Infertility burdens 1.9% to 10.5% of child-seeking women worldwide and was estimated to affect 48.5 million couples in 2010 (Mascarenhas ). According to the International Committee for Monitoring Assisted Reproductive Technologies (ICMART), 1.4 to 1.6 million ART cycles were initiated per year from 2008 to 2010, resulting in approximately 800 000 babies born over this time period (Dyer ). ART is a rapidly evolving field in medicine with new advances in research and technology. From freezing techniques for gametes and embryos (Loutradi ; AbdelHafez ; National Institute for Health and Care Excellence, 2013) to the number of embryos replaced (Practice Committee of the Society for Assisted Reproductive Technology and Practice Committee of the American Society for Reproductive Medicine, 2012; National Institute for Health and Care Excellence, 2013) and utilization of PGD (Harton ), reproductive technologies and guidelines are changing regularly. It is, therefore, prudent to ensure we can adequately monitor treatment outcomes and adverse events. Studies from the USA and Europe estimate that the prevalence of live births born after IVF ranges from 1% to 6% (Sullivan ; Sunderam ). The risk of adverse obstetrical events is significantly higher in ART compared to naturally conceived pregnancies (McGovern ; Sazonova ; Qin ). However, the prevalence of these complications attributed to ART, such as ectopic pregnancies, placenta previa, and congenital anomalies, is low, with estimates 1–2% (Perkins ; Santos-Ribeiro ), 1.6% (Romundstad ), and ~8% (Davies ), respectively. Similarly, other neonatal outcomes, including small for gestational age, preterm delivery, and admission to a critical unit, also occur infrequently (McGovern ; Sazonova ; Qin ). Therefore, in order to adequately understand the implications of such treatment, studies using large sample sizes are required. Collections of routinely collected data, such as administrative databases and registries, are excellent sources of population-level data. These databases often contain sociodemographic information, health care utilization, treatment, and diagnostic information affiliated with health care visits. However, these data are not collected for a specific research question and are prone to error resulting from clerical errors, illegible charts, and documentation problems (Hierholzer Jr, 1991). If not validated adequately, utilization of these data for surveillance, quality improvement, and research can lead to misclassification bias and unmeasured confounding due to missing data (Benchimol ). Many studies that use large administrative and registry databases to identify patients who undergo ART treatments indicate that they are using validated data (Fedder ; Traeger-Synodinos ; De Geyter ). However, the literature is scarce on validation studies and measures performed to ensure accuracy among these databases. There is extensive literature indicating the importance of presenting measures of validity, including sensitivity, specificity, and positive predictive values (PPVs), to reflect whether these data can be reliably used for research and reporting (Sørensen ; Herrett ; Van Walraven ; Benchimol ). Ideally, a validation study uses a gold standard as a measure to guide the accuracy and reliability of the validated variable. Based on the Standards for Reporting of Diagnostic Accuracy Studies guidelines for evaluating diagnostic tests, a gold standard should be the best available test in identifying the condition of interest (Bossuyt ). To this end, the gold standard of determining accuracy of database variables or data elements has not been established (Sørensen ; Juurlink ; Lain ; Benchimol ). In the absence of a true gold standard, some argue that the medical record should serve as the reference standard (Widdifield ; Frosst ). With this in mind, we conducted a systematic review to identify the validation studies of databases that contain these routinely collected data (including administrative data and registry data) in an ART setting. Our primary objective was to assess how ART centers (either databases maintained by a clinic or those managed by a region or country) validate and report their fertility data, their rationale for choosing specific data elements for validation activities, the extent to which a database is considered valid for use, and actions taken when validity was deemed poor. Our secondary objective was to investigate whether ART centers were reporting their validation studies in accordance with the published reporting guidelines for validation studies (Benchimol ) with details pertaining to the method of validation and quality control, the variables chosen to validate the database, and the outcome measures.

Materials and Methods

This review was conducted in accordance with a protocol developed and registered a priori (International Prospective Register of Systematic Reviews ID: CRD42016048466). We selected studies that performed a validation of ART population data, which were based on comparison of at least two data sources (health administrative or registry databases, chart reabstraction, self-reported questionnaires). Large administrative or registry databases were defined as those collecting data routinely without an a priori research question. We included studies that validated specific data elements or variables (e.g. diagnoses and treatments), case-finding algorithms within fertility databases or registries, or linkage studies between two or more databases that include a fertility registry. In this setting, a data element or variable could include (but was not limited to) diagnosis, treatment, and patient characteristic. Case-finding algorithms were defined as combinations of data elements used to identify a patient population. Linkage studies were defined as studies that used two databases that are joined together to identify or create a study population. The outcomes of interest were measures of diagnostic validity including sensitivity, specificity, PPV, negative predictive value (NPV), likelihood ratio, kappa coefficient, area under the receiver operating characteristic curve or c-statistic, accuracy, or agreement of the selected data elements. Only full-text articles published in English were considered. The search strategy was developed with the aid of an information specialist with expertise in clinical research, adapted from previous systematic reviews (Benchimol ; Shiff ). Electronic bibliographic databases, specifically Medline, Embase, and CINAHL, were searched using specific vocabulary and MeSH keywords (see Supplementary Data 1 for Medline search strategy). Reference lists of all included articles and relevant systematic reviews were screened to identify additional studies. Web pages for major international fertility surveillance systems were searched to account for validation activities presented within surveillance reports, which are typically not indexed in bibliographic databases. We also contacted these surveillance programs to request reports that were not publicly available. These programs included https://www.belrap.be/Public/Default.aspx?Lg=En (Belgium), https://www.sart.org/ (USA), https://www.asrm.org/about-us/contact-us/ (American Society of Reproductive Medicine), https://npesu.unsw.edu.au/data-collection/australian-new-zealand-assisted-reproduction-database-anzard (Australia and New Zealand), https://www.hfea.gov.uk/ (UK), https://www.eshre.eu/Home/Contact-us.aspx (ESHRE), and https://www.icmr.gov.in/icmrnews/art/contact_us.htm (India). Citations were imported into EndNote and managed within Covidence (www.covidence.org). This process was recorded using the preferred reporting items for systematic reviews and meta-analysis (PRISMA) flow diagram (Moher ). We performed the final search on 6 October 2016. As validating the dataset or part of the dataset is often a secondary objective of studies using routinely collected data, our search strategy could not capture all relevant validation studies. For example, investigators performing a cohort or case-control study involving a specific diagnosis or treatment may validate that data element prior to its use. While that diagnosis or treatment may be relevant to an infertility population, these studies would not be identified as validation studies and thus would not be included. Screening was performed in two steps by two independent reviewers (V.B. and M.R.) using the eligibility criteria. Title and abstract screening was performed initially, followed by full-text screening. Disagreements were resolved by consensus or through consultation with a senior expert where consensus could not be reached. Reasons for excluding studies in the full-text screening step were documented. We extracted data from each included study on country of origin, year of publication, number of clinics involved, number of treatment records, sample size calculation, variables or algorithms used, method of validation (chart review versus survey of patients versus another validated database), whether datasets were linked, how datasets were linked (probabilistic versus deterministic), prevalence of the variable(s) under investigation estimated both prior to the study from the target population (pre-test prevalence) and from the study population (post-test prevalence), and validation outcome measures (listed above). Two independent reviewers extracted these data (i.e. in duplicate). We used items from previously published reporting guidelines for validation studies as a guide to evaluate whether the included studies used rigorous methodology to conduct their validation (Benchimol ). This checklist was implemented by two independent reviewers. We made a post-hoc decision after protocol registration to adapt the quality assessment tools used by two previously published systematic reviews to assess both reporting and quality of studies (Benchimol ; Grams ). All results were synthesized qualitatively.

Results

The electronic search yielded 1074 citations after removing duplicates. Upon applying the inclusion and exclusion criteria, we identified 65 studies for full-text screening after title and abstract screening. Seven additional studies were identified for full-text screening after reviewing the references of pertinent articles and searching web pages. Of these 72 studies, 53 did not meet inclusion criteria for various reasons including wrong study design, comparator, or patient population (details can be found in Supplementary Data 2). Nineteen studies were included for final analysis (Fig. 1), representing the USA (Sunderam ; Molinaro ; Zhang , 2012, Buck Louis , 2015; Cohen ; Kotelchuck ; Liberman ; Centers for Disease Control and Prevention ; Luke ; Stern , 2016b), Finland (Hemminki ; Gissler ), Denmark (Hvidtjorn ), the Netherlands (Overbeek ), Israel (Rosenfeld and Strulov, 2009a), and the UK (Williams ). Of these studies, four did not use any reference standard (Hemminki ; Gissler ; Sunderam ; Williams ), and the reference was poorly described in two studies (Molinaro ; Rosenfeld and Strulov, 2009a) (Table I). Two studies used medical records to validate a fertility database (Molinaro ; Centers for Disease Control and Prevention ), seven studies used an IVF registry as the reference standard to validate either vital records or maternal questionnaires (Hvidtjorn ; Zhang ; Williams ; Cohen ; Liberman ; Buck Louis ; Luke ), one study utilized maternal report as the reference standard for validation (Buck Louis ), and three studies used vital records (birth and death certificates) as the reference standard (Overbeek ; Kotelchuck ; Stern ). Finally, one study used both IVF registries and vital records as the reference standard depending on the data element validated (Stern ).

Figure 1

Preferred reporting items for systematic reviews and meta-analysis flow diagram.

Table I

Descriptive characteristics of included studies.

Authors	Year	Country	Data source being validated	Reference standard	Population	Sample size
Buck Louis GM and Druschel C	2015	USA	Questionnaire (Upstate New York Infant Development Screening Program Study)	IVF registry (SART CORS)	Mothers who had live births in Upstate New York between July 2008 and May 2010 in whom `Infertility treatment’ was checked on birth certificate and multiple births matched to singleton infants whose treatment box was not checked	5034
Buck Louis GM and Hediger ML	2014	USA	Administrative database (Perinatal Data System)	Questionnaire	Mothers who had live births in Upstate New York between July 2008 and May 2010 in whom `Infertility treatment’ was checked on birth certificate and multiple births matched to singleton infants whose treatment box was not checked	4989
Centers for Disease Control and Prevention	2016	USA	Fertility database (SART)	Medical record	ART cycle data from 458 fertility clinics in the US during the 2014 cycle year. A random selection of 34 clinics were selected	1996
Cohen B	2014	USA	Administrative database (birth certificates)	IVF registry (NASS)	Live births to Florida or Massachusetts resident mothers that occurred in state from March 2004 to December 2006	856 165
Gissler M	2004	Finland	Administrative database (medical birth record)	NA (compared ad hoc IVF research and IVF statistics, no reference standard)	Newborns from fertility treatments from 1996 to 1998	176 698
Hemminki E	2003	Finland	Administrative database (Drug Reimbursement Register)	Internal examination of data and linkage to Birth Register	Women exposed to ART between 1996 and 1998	24 318
Hvidtjørn D	2009	Denmark	Administrative database	IVF Registry	Women who participated in the first Danish National Birth Cohort (study) interview with a pregnancy resulting in a live born child between October 2007 and June 2003	88 151
Kotelchuck M	2014	USA	IVF registry (SART)	Administrative database (PELL)	Children born to Massachusetts resident women in MA hospitals from July 2004 to December 2008 conceived by ART	10 138
Liberman RF	2014	USA	Questionnaire (National Birth Defects Prevention Study)	IVF registry	Women who completed the NBDPS with in-state deliveries between September 2004 and December 2008	77
Luke B	2016	USA	Administrative database (birth certificates)	IVF registry	Live births in Florida, Massachusetts, New York, Pennsylvania, Texas, California, Ohio, and Colorado between 2004 and 2009. IVF cycles from SART CORS were linked to birth certificates.	716 103
Molinaro TA	2009	USA	IVF registry	Medical records	IVF patients enrolled for other studies at the University of Pennsylvania between December 2003 and June 2006	590
Overbeek A	2013	Netherlands	Questionnaire (DCOG LATER-VEVO Study—nationwide cohort study)	Administrative database (Netherlands Perinatal Registry)	Childhood cancer survivors who achieved pregnancy and their sibling controls	524
Rosenfeld Y	2009a	Israel	IVF reporting system	Medical record	Women who receive fertility treatment in the District of Haifa and Western Galilee of the General Health Services	108
Stern JE andGopal D	2016a	USA	IVF registry (SART)	Administrative database (Massachusetts BDMP Registry)	ART deliveries from 1 July 2004 to 31 December 2008 in Massachusetts	9092
Stern JE and McLain AC	2016b	USA	Questionnaire (Upstate New York Infant Development Screening Program Study)	SART database for current cycle; Questionnaire for prior treatment information	Mothers who participated in Upstate KIDS Study linked with SART CORS	617
Sunderam S	2006	USA	Administrative database	IVF registry	Infants born in 1997 and 1998 in MA, RI, NH, CT to MA-resident mothers who used ART clinics in MA or RI	2703
Williams CL	2013	UK	Administrative database (National Registry of Childhood Tumours)	IVF registry (HFEA)	Children born between 1 January 1992 and 31 December 2008	106 013
Zhang Y	2012	USA	Administrative database	IVF registry (NASS)	Live births to MA-resident mothers that occurred in MA during 1997-2000	6139
Zhang Z	2010	USA	Administrative database (Massachusetts Registry of Vital Records and Statistics-MBC)	IVF registry (NASS)	Live births to MA-resident mothers that occurred in MA during 1997–2000	5190

BDMP, Birth Defects Monitoring Program; NASS, National ART Surveillance System; NBDP, National Birth Defects Prevention; NBPDS, National Birth Defects Prevention Study; PELL, Pregnancy to Early Life Longitudinal data system; SART CORS, Society for Assisted Reproductive Technology Clinical Outcomes Reporting System.

Preferred reporting items for systematic reviews and meta-analysis flow diagram. Descriptive characteristics of included studies. BDMP, Birth Defects Monitoring Program; NASS, National ART Surveillance System; NBDP, National Birth Defects Prevention; NBPDS, National Birth Defects Prevention Study; PELL, Pregnancy to Early Life Longitudinal data system; SART CORS, Society for Assisted Reproductive Technology Clinical Outcomes Reporting System. Four studies validated method of conception from birth registries (Gissler ; Zhang ; Cohen ; Luke ), two validated diagnoses or treatment variables within the fertility database (Molinaro ; Centers for Disease Control and Prevention ), one study created an algorithm to identify a patient population (Hemminki ), and four studies validated linkage algorithms between a fertility and a second administrative database (Sunderam ; Zhang ; Williams ; Kotelchuck ). Sensitivity was the most commonly reported validation measure. Twelve studies reported sensitivity (Hvidtjorn ; Zhang , 2012; Overbeek ; Buck Louis , 2015; Cohen ; Kotelchuck ; Liberman ; Luke ; Stern , 2016b), nine reported specificity (Hvidtjorn ; Zhang ; Overbeek ; Buck Louis , 2015; Cohen ; Kotelchuck ; Liberman ; Luke ), six reported PPV (Hvidtjorn ; Zhang ; Overbeek ; Cohen ; Kotelchuck ; Buck Louis ), one reported NPV (Buck Louis ), five reported the Kappa coefficient (Gissler ; Overbeek ; Buck Louis ; Kotelchuck ; Stern ), and seven reported percentage agreement (Gissler ; Hvidtjorn ; Zhang ; Overbeek ; Buck Louis ; Stern , 2016b) (Table II). The data quality measures are presented in Supplementary Data 3. Only three studies reported four or more measures of validation (Hvidtjorn ; Buck Louis , 2015). Nine studies presented 95% CIs with the estimates (Gissler ; Zhang , 2012; Overbeek ; Cohen ; Liberman ; Buck Louis ; Centers for Disease Control and Prevention ; Stern ), of which five reported CIs for all estimates (Zhang ; Cohen ; Liberman ; Buck Louis ; Centers for Disease Control and Prevention ).

Table II

Summary of reported validity measures.

Study	Sensitivity	Specificity	PPV	NPV	Kappa	% Agreement	ICC	AUC/c-statistic	Likelihood ratios	Four or more measures of validity	Number of measures	95% CI
Buck Louis et al. (2015)	10/10	10/10	10/10	10/10	No	No	No	No	No	10/10	4	10/10
Buck Louis et al. (2014)	1/4	1/4	No	No	1/4	4/4	No	No	No	1/4	4	0/4
Centers for Disease Control and Prevention et al. (2016)	No	No	No	No	No	No	No	No	No	No	1	18/18
Cohen et al. (2014)	2/2	2/2	2/2	No	No	No	No	No	No	No	3	2/2
Gissler et al. (2004)	No	No	No	No	1/2	2/2	No	No	No	No	2	1/2
Hemminki et al. (2003)	No	No	No	No	No	No	No	No	No	No	0	NA
Hvidtjørn et al. (2009)	3/3	3/3	3/3	No	No	3/3	No	No	No	3/3	4	0/3
Kotelchuck et al. (2014)	3/3	1/3	3/3	No	3/3	No	No	No	No	No	4	0/3
Liberman et al. (2014)	5/5	5/5	No	No	No	No	No	No	No	No	2	5/5
Luke et al. (2016)	1/1	1/1	No	No	No	No	No	No	No	No	2	No
Molinaro et al. (2009)	No	No	No	No	No	No	No	No	No	No	0	NA
Overbeek et al. (2013)	10/26	10/26	10/26	No	16/26	16/26	No	No	No	No	4	16/26
Rosenfeld and Strulov (2009)	No	No	No	No	No	No	No	No	No	No	2	No
Stern et al. (2016a)	6/11	No	No	No	2/11	2/11	No	No	No	No	3	6/11
Stern et al. (2016b)	13/13	No	No	No	No	3/13	No	No	No	No	5	No
Sunderam et al. (2006)	No	No	No	No	No	No	No	No	No	No	0	NA
Williams et al. (2013)	No	No	No	No	No	No	No	No	No	No	1	No
Zhang et al. (2012)	1/1	No	No	No	No	No	No	No	No	No	2	1/2
Zhang et al. (2010)	1/1	1/1	1/1	No	No	No	No	No	No	No	3	3/3

AUC, area under the curve; ICC, intraclass correlation coefficient; NPV, negative predictive value; PPV, positive predictive value.

Numerator: number of validated variables; denominator: total number of variables considered for validation in each study.

Summary of reported validity measures. AUC, area under the curve; ICC, intraclass correlation coefficient; NPV, negative predictive value; PPV, positive predictive value. Numerator: number of validated variables; denominator: total number of variables considered for validation in each study. The elements of data quality are summarized in Tables III and IV. Sixteen studies (84.2%) adequately described their data source, and all but one described the type of patient records from which data were extracted (Rosenfeld and Strulov, 2009a). The studies predominantly described inclusion and exclusion criteria and their methods for determining the validity of the data. Fifteen studies adequately described their method of patient sampling while 14 studies sampled the entire population in the database (Hemminki ; Sunderam ; Hvidtjorn ; Zhang , 2012; Williams ; Overbeek ; Buck Louis , 2015; Cohen ; Kotelchuck ; Liberman ; Luke ; Stern ); one study performed a random sampling strategy (Centers for Disease Control and Prevention ). Only one group performed their study using an a priori sample size (Centers for Disease Control and Prevention ), and none provided statistical justification for their sample size.

Table III

Reporting quality of methodology of included studies.

Methods	Frequency	%
Describes the data source
Yes	16/19	84.2
Incomplete	2/19	10.5
Unclear	1/19	5.3
Describes type of records (inpatient, outpatient, linked records)
Yes	18/19	94.7
Unclear	1/19	5.3
Describes setting and locations where data were collected
Yes	18/19	94.7
Incomplete	1/19	5.3
Reports a priori sample size
Yes	1/19	5.3
Provides statistical justification for the sample size
Yes	0/19	0.0
Describe recruitment procedure of validation cohort (from a database, based on diagnostic codes)
Yes	17/19	89.5
Unclear	2/19	10.5
Describe patient sampling (Random, consecutive, all)
Random sampling	1/19	5.3
All	14/19	73.7
Unclear	2/19	10.5
Incomplete	2/19	10.5
Describe how participants were chosen for data collection and analysis
Yes	15/19	78.9
Unclear	2/19	10.5
Describes inclusion/exclusion criteria
Yes	14/19	73.7
Incomplete	1/19	5.3
Describes who identified patients (for patients identified from medical records)
Yes	1/19	5.3
Incomplete	1/19	5.3
Describes who collected data
Yes	3/19	15.8
Describes use of a priori data collection form
Yes	13/19	68.4
Unclear	1/19	5.3
Use of a split sample or an independent sample (revalidation using a separate cohort)
Yes	1/19	5.3
Describes the reference standard
Yes	13/17	76.5
Reports the number of persons reading the reference standard
Yes	2/17	11.8
Describes the training or expertise of persons reading reference standard
Yes	1/17	5.9
Readers of the reference standard were blinded to the results of the classification by routinely collected data for that patient (reference standard: medical records)
Yes	1/17	5.9
Reports a measure of concordance if >1 persons reading the reference standard
Yes	0/17	0.0
Describes the linkage procedure, if done (probabilistic/deterministic)
Yes	8/15	50.0
Incomplete	6/15	37.5
Describes the methods of linkage quality evaluation
Yes	7/15	46.7
Incomplete	2/15	13.3
Describes explicit methods for calculating or comparing measures of accuracy and statistical methods used to quantify uncertainty
Yes	13/19	68.4

Table IV

Reporting quality of the results of included studies.

	Frequency	%
Reports the number of participants satisfying the inclusion/exclusion criteria
Yes	13/18	68.4
Incomplete	1/18	5.6
Describes the characteristics of misclassified patients (false positives and/or false negatives)
Yes	13/18	68.4
Unclear	2/18	11.1
Provides a study flow diagram
Yes	4/19	21.1
Reports the number of records unable to link
Yes	11/12	91.7
Incomplete	1/12	8.3
Reports missing medical records or reports the number of patients unwilling to participate
Yes	10/19	52.6
Reports incomplete records
Yes	13/19	68.4
Presents a cross tabulation of results of the validated source to the reference standard
Yes	11/19	57.9
Incomplete	1/19	5.3
Reports the pretest prevalence in the study sample
Yes	5/19	26.3
Incomplete	2/19	10.5
Tests and reports results of multiple algorithms
Yes	6/15	40.0
Reports estimates of test reproducibility of the split or independent sample if done
Yes	0/19	0.0

Reporting quality of methodology of included studies. Reporting quality of the results of included studies. Where multiple databases were linked using a common patient identifier, the linkage procedures were adequately described in eight (53.3%) of the studies (Sunderam ; Zhang , 2012; Williams ; Cohen ; Kotelchuck ; Stern , 2016b). The quality of these procedures was described in only seven studies (46.7%) (Hemminki ; Sunderam ; Zhang , 2012; Williams ; Kotelchuck ; Stern ). The pre-test prevalence of the validated variables was provided in seven studies (Sunderam ; Zhang ; Buck Louis ; Cohen ; Kotelchuck ; Liberman ; Luke ) (Table V). The post-test prevalence of these variables was within a 2% range of the pre-test values for four of the studies (Zhang ; Cohen ; Kotelchuck ; Liberman ); however, in two studies, the post-test prevalence was largely discrepant from pre-test values (Buck Louis ; Luke ).

Table V

Description of the pre- and post-test prevalence of measured estimates of validity in included studies.

Study	Prevalence estimate reported	Pre-test prevalence (%)	Post-test prevalence^*(%)
Buck Louis et al. (2015)	No	—	—
Buck Louis et al. (2014)	ART conceived infant	1.40	14.0
Centers for Disease Control and Prevention (2016)	No	—	—
Cohen et al. (2014)	ART conceived infant	1.40	0.45
Gissler et al. (2004)	No	—	—
Hemminki et al. (2003)	No	—	—
Hvidtjørn et al. (2009)	No	—	—
Kotelchuck et al. (2014)	ART conceived infant	1.60	2.72
Liberman et al. (2014)	ART conceived infant in MA	4.30	5.30
Luke et al. (2016)	ART conceived infant	1.70	9.80
Molinaro et al. (2009)	No	—	—
Overbeek et al. (2013)	No	—	—
Rosenfeld and Strulov (2009)	No	—	—
Stern et al. (2016a)	Incomplete	—	—
Stern et al. (2016b)	No	—	—
Sunderam et al. (2006)	Yes	3.00	—
Williams et al. (2013)	No	—	—
Zhang et al. (2012)	No	—	—
Zhang et al. (2010)	ART Live birth deliveries	3.00	1.70

*Based on reference standard.

Description of the pre- and post-test prevalence of measured estimates of validity in included studies. *Based on reference standard.

Discussion

This study demonstrates that there is a paucity of the literature on the validation of data elements within fertility databases and registries. There were numerous studies that validated ART information derived from maternal report or birth and death certificates by comparing those data to the reference standard of a fertility registry; however, we only identified one study that assessed the validity of a fertility registry by comparing data elements from the database to the reference standard of the patient record (Centers for Disease Control and Prevention ). Furthermore, only seven studies published the baseline prevalence of the data element being validated (Sunderam ; Zhang ; Buck Louis ; Cohen ; Kotelchuck ; Liberman ; Luke ), of which only four studies’ sample prevalence approximated that of the population (Buck Louis ; Luke ). There are three commonly cited validation study designs: ecological studies, reabstraction studies, and gold standard studies (Van Walraven and Austin, 2012). Ecological studies compare measures of disease prevalence in the database to those obtained from more reliable methods, like those published elsewhere. Reabstraction studies compare the database variable or element to the medical record. Finally, gold standard studies compare the database variable to a case definition, either based on clinical or laboratory values or clinical consensus (Van Walraven and Austin, 2012). Hemminki ) and Gissler ) both performed ecological studies using national statistics. Hemminki ) and Gissler ) created a case-finding algorithm using data from a drug reimbursement register and a physician examination and intervention register to identify an infertility population in Finland. They subsequently compared these data to national statistics to validate their algorithm. Gissler ) compared prevalence estimates both from a birth registry and from aggregate IVF statistics to estimates generated from Hemminki’s study to assess the completeness and validity of these routinely collected data sources. Firstly, these reference standards rely on the accuracy of the national statistics, which were not established and should not be implicitly assumed. Secondly, as the comparison is based on aggregate data rather than patient-level data, identifying specific differences and agreements is impossible. Of the 19 studies included in our review, only 2 used the medical record as the reference standard (Molinaro ; Centers for Disease Control and Prevention ), and only 1 presented measures of validation (Centers for Disease Control and Prevention ). The others used either another database or patient report as the reference. Molinaro ) attempted to validate diagnosis variables in The Society of American Reproductive Technologies (SART) using case definitions based on clinical values in the patients’ charts rather than relying on the expertise of clinicians. They did not report their measures of validity, making it challenging to determine if this method is superior. Using objective measures, such as laboratory tests and strict diagnostic criteria, for validation compared to documentation may be more reliable, though such approaches were not identified by our review of ART validation studies. The study performed by SART assessed multiple patient variables at one time, comparing SART data to patient charts (Centers for Disease Control and Prevention ). However, due to the presentation of discrepancy rates without other important measures of validity, such as sensitivity, kappa coefficients, or PPVs, it is difficult to determine how reliable these data are. A subgroup evaluation by the size of the clinic or geography would be useful to investigate whether specific variables are largely problematic or if there is an issue at a specific clinic. A Canadian study investigating the validity of diagnostic codes in 10 major hospitals found that the sensitivity and specificity were highly dependent on the hospital, where some had a high accuracy and others demonstrated poor sensitivity (Juurlink ). Clinics may have specific expertise with respect to their patient populations, and the prevalence of certain conditions or treatments may vary based on health care provider. Predictive tests (PPV, likelihood ratios) are highly dependent on the baseline prevalence of the specific treatment or disease (Altman and Bland, 1994). Furthermore, in certain cases, the sensitivity and specificity may vary with the prevalence (Brenner and Gefeller, 1997). While the accuracy of those records would not necessarily be influenced, the metrics such as PPV, NPV, and sensitivity will be affected. Only four of the included studies presented post-test prevalence estimates that approximated the reported pre-test prevalence; it, therefore, puts into question the degree of bias in the estimates presented. As such, it is essential to describe both the source of data and prevalence of the variable of interest to adequately interpret the results. There is insufficient documentation in the literature with respect to how national fertility registries are validating their databases. SART publishes a publicly available report on an annual basis indicating which variables are discrepant between the medical chart and the database (Centers for Disease Control and Prevention ). According to ICMART’s world report, there were 61 countries that submitted nationwide ART data for surveillance (Dyer ). Unfortunately, none of the other national databases have generated such reports or have made them easily accessible. The Human Fertilisation and Embryology Authority in the UK, Australian & New Zealand Assisted Reproduction Database (ANZARD), and the Belgian Register for Assisted Procreation endorse strict adherence to quality assurance practices; however, no reports were available describing their data-validation processes (written communication with Belgium and ANZARD). As all stakeholders, including patients, health care practitioners, researchers, and policy makers, rely on these data to understand the implications of fertility treatments, including the prevalence of disease, practice patterns, and complications and outcomes of ART, it is essential that these reports are made publicly available (Butler, 2003; Chambers ; Canadian Fertility Andrology Society, 2014; Harris ; Human Fertilisation and Embryology Authority, 2016). It is clear from this review that databases are audited, but tracking that process and determining which data elements are reliable are challenging. Therefore, a gold standard from this source should not be implicitly accepted. More studies investigating the accuracy of routinely collected data in local or national registries need to be performed and published, with adherence to reporting guidelines. Upon demonstration of data validity, research can be performed utilizing these databases with measures to reduce bias. Finally, patient report is subject to recall bias, particularly as increasing time has passed from the event to the survey (Leong ). Our review has several limitations. We restricted our inclusion criteria to published reports in English. As many of the internal processes are likely to occur in the primary language of the registry or organization, it is possible that we were unable to capture validation processes from registries. A comprehensive search on the internet did not yield any results, even in other languages, however. Moreover, only four studies were excluded from our database search due to language restriction (Lidegaard and Hammerum, 2002; Rosenfeld and Strulov, 2009b; Ameri and Alizadeh, 2014; Pierron ). Our study was also limited by the search strategy developed for Medline, Embase, and CINAHL. While the strategy was quite general for routinely collected databases, the list was not exhaustive for specific diagnoses relevant to infertility. Consequently, it is probable that other published studies were not captured in our review. In spite of these limitations, our study is strengthened by the systematic and comprehensive approach to searching the articles and analyzing the measures of validity. This is the first study to our knowledge to assess the utility of validation tools for fertility registries. Although many of these reports were not published in indexed bibliographic databases, numerous attempts were made to contact ART surveillance database managers in the UK, Denmark, Belgium, Australia, New Zealand, and the USA to obtain unpublished or ad hoc reports on data maintenance and quality assurance. This review highlights an important gap in the field of fertility research where the validation of widely utilized databases has not been well described. Big data are increasingly used for research, quality assurance, and policy; therefore, the accuracy of these data is essential. Furthermore, during the validation process, the prevalence of the variables and the statistical estimates need to be adequately measured and compared to the prevalence from the drawn study population. This would allow the reader to assess the generalizability of the study population to the general population. As the prevalence of the condition varies based on health care provider or geographic location, so will these measures. Future studies need to be conducted and published using rigorous methodology that will allow for greater transparency and accuracy of research within this rapidly evolving field of medicine and research. Click here for additional data file. Click here for additional data file. Click here for additional data file.

56 in total

1. Identifying exposures of in vitro fertilization from drug reimbursement files: a case study from Finland.

Authors: Elina Hemminki; Reija Klemetti; Maria Rinta-Paavola; Jaana Martikainen
Journal: Med Inform Internet Med Date: 2003-12

2. Increased risk of preterm birth in singleton pregnancies resulting from in vitro fertilization-embryo transfer or gamete intrafallopian transfer: a meta-analysis.

Authors: Peter G McGovern; Amaury J Llorens; Joan H Skurnick; Gerson Weiss; Laura T Goldsmith
Journal: Fertil Steril Date: 2004-12 Impact factor: 7.329

3. Increased risk of placenta previa in pregnancies following IVF/ICSI; a comparison of ART and non-ART pregnancies in the same mother.

Authors: Liv Bente Romundstad; Pål R Romundstad; Arne Sunde; Vidar von Düring; Rolv Skjaerven; Lars J Vatten
Journal: Hum Reprod Date: 2006-05-25 Impact factor: 6.918

4. Linking birth and infant death records with assisted reproductive technology data: Massachusetts, 1997-1998.

Authors: Saswati Sunderam; Laura A Schieve; Bruce Cohen; Zi Zhang; Gary Jeng; Meredith Reynolds; Victoria Wright; Christopher Johnson; Maurizio Macaluso
Journal: Matern Child Health J Date: 2005-11-22

5. Improvement of accuracy of clinical reports--the case of IVF cycle rank.

Authors: Yaakov Rosenfeld; Avshalom Strulov
Journal: J Assist Reprod Genet Date: 2009-02-21 Impact factor: 3.412

6. Health care data, the epidemiologist's sand: comments on the quantity and quality of data.

Authors: W J Hierholzer
Journal: Am J Med Date: 1991-09-16 Impact factor: 4.965

Review 7. Cryopreservation of human embryos by vitrification or slow freezing: a systematic review and meta-analysis.

Authors: Kalliopi E Loutradi; Efstratios M Kolibianakis; Christos A Venetis; Evangelos G Papanikolaou; George Pados; Ioannis Bontis; Basil C Tarlatzis
Journal: Fertil Steril Date: 2007-11-05 Impact factor: 7.329

8. [The National Patient Registry as a tool for continuous production and quality control].

Authors: Øjvind Lidegaard; Mette Schou Hammerum
Journal: Ugeskr Laeger Date: 2002-09-16

Review 9. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Standards for Reporting of Diagnostic Accuracy.

Authors: Patrick M Bossuyt; Johannes B Reitsma; David E Bruns; Constantine A Gatsonis; Paul P Glasziou; Les M Irwig; Jeroen G Lijmer; David Moher; Drummond Rennie; Henrica C W de Vet
Journal: Clin Chem Date: 2003-01 Impact factor: 8.327

10. Monitoring of IVF birth outcomes in Finland: a data quality study.

Authors: Mika Gissler; Reija Klemetti; Tiina Sevón; Elina Hemminki
Journal: BMC Med Inform Decis Mak Date: 2004-03-10 Impact factor: 2.796

2 in total

1. Self-reported infertility diagnoses and treatment history approximately 20 years after fertility treatment initiation.

Authors: Alesia M Jung; Stacey A Missmer; Daniel W Cramer; Elizabeth S Ginsburg; Kathryn L Terry; Allison F Vitonis; Leslie V Farland
Journal: Fertil Res Pract Date: 2021-03-12

2. 20 years of the European IVF-monitoring Consortium registry: what have we learned? A comparison with registries from two other regions.

Authors: Ch De Geyter; C Wyns; C Calhaz-Jorge; J de Mouzon; A P Ferraretti; M Kupka; A Nyboe Andersen; K G Nygren; V Goossens
Journal: Hum Reprod Date: 2020-12-01 Impact factor: 6.918

2 in total