Literature DB >> 33576824

Guidelines for biomarker discovery in endometrium: correcting for menstrual cycle bias reveals new genes associated with uterine disorders.

Almudena Devesa-Peiro1,2, Patricia Sebastian-Leon1, Antonio Pellicer1,2,3, Patricia Diaz-Gimeno1.   

Abstract

Transcriptomic approaches are increasingly used in reproductive medicine to identify candidate endometrial biomarkers. However, it is known that endometrial progression in the molecular biology of the menstrual cycle is a main factor that could affect the discovery of disorder-related genes. Therefore, the aim of this study was to systematically review current practices for considering the menstrual cycle effect and to demonstrate its bias in the identification of potential biomarkers. From the 35 studies meeting the criteria, 31.43% did not register the menstrual cycle phase. We analysed the menstrual cycle effect in 11 papers (including 12 studies) from Gene Expression Omnibus: three evaluating endometriosis, two evaluating recurrent implantation failure, one evaluating recurrent pregnancy loss, one evaluating uterine fibroids and five control studies, which collected endometrial samples throughout menstrual cycle. An average of 44.2% more genes were identified after removing menstrual cycle bias using linear models. This effect was observed even if studies were balanced in the proportion of samples collected at different endometrial stages or only in the mid-secretory phase. Our bias correction method increased the statistical power by retrieving more candidate genes than per-phase independent analyses. Thanks to this practice, we discovered 544 novel candidate genes for eutopic endometriosis, 158 genes for ectopic ovarian endometriosis and 27 genes for recurrent implantation failure. In conclusion, we demonstrate that menstrual cycle progression masks molecular biomarkers, provides new guidelines to unmask them and proposes a new classification that distinguishes between biomarkers of disorder or/and menstrual cycle progression.
© The Author(s) 2021. Published by Oxford University Press on behalf of European Society of Human Reproduction and Embryology.

Entities:  

Keywords:  confounding variable; differential expression; endometrial pathologies; endometriosis; gene expression; menstrual cycle progression; recurrent implantation failure; recurrent pregnancy loss; transcriptomic analysis; uterine fibroids

Mesh:

Year:  2021        PMID: 33576824      PMCID: PMC8063681          DOI: 10.1093/molehr/gaab011

Source DB:  PubMed          Journal:  Mol Hum Reprod        ISSN: 1360-9947            Impact factor:   4.025


Introduction

The human endometrium is hormonally regulated and changes throughout the menstrual cycle (Noyes ; Murphy, 2004; Talbi ). During most of the menstrual cycle, the endometrium is not receptive to embryonic implantation; it becomes receptive during a period of two to four-five days within the mid-secretory phase known as the window of implantation (Harper, 1992; Wilcox ). To aid in assisted reproduction, endometrial dating methods have been exhaustively investigated over the past 40 years, as have potential reliable biomarkers of receptive status, including days from luteinising hormone (LH) peak or exogenous progesterone administration, morphological changes detected by ultrasounds, histological features and endometrial gene expression profiles (Noyes ; Díaz-Gimeno ; Niederberger ; Craciunas ). Suboptimal endometrial receptivity and altered embryo-endometrial dialogue are considered to be responsible for one third of implantation failures in in-vitro fertilisation (IVF) cycles (Somigliana ). Uterine disorders are complex, polygenic and multifactorial gynecological alterations affecting endometrial gene expression. Endometrial pathologies such as endometriosis, uterine fibroids and adenomyosis are associated with infertility, and may impact endometrial receptivity (Devlieger ; Dunselman ; Zepiridis ; Devesa-Peiro ). Some disorders remain undiagnosed in IVF patients and current treatments are either invasive (e.g. surgical removal) or have low-to-moderate efficiency (Wang ; Dunselman ; Harada ; Zepiridis ; Tanbo and Fedorcsak, 2017). Moreover, issues such as recurrent implantation failure or pregnancy loss of endometrial origin remain incompletely understood with neither an efficient diagnosis nor effective infertility treatment (Jauniaux ; Diaz-Gimeno ; Sebastian-Leon ). Therefore, identifying reliable biomarkers of uterine disorders is a priority to understand the molecular bases of the pathology and to improve diagnosis, prognosis and treatment. With the advent of high-throughput technologies, uterine disorders are evaluated by transcriptomic analysis to identify genes (Burney ; Hawkins ; Lédée ; Tamaresis ; Koot ; Pathare ) that could be useful as diagnostic biomarkers and/or therapeutic targets. Although many genes are reported as potential uterine disorder biomarkers, the results of individual studies overlap poorly (Altmäe ; Lessey and Kim, 2017; Miravet-Valenciano ; Sebastian-Leon ), and reliable biomarkers and potential therapeutic targets that could be clinically meaningful to address suboptimal endometrial receptivity remain elusive. This lack of reproducibility between studies may arise from low sample sizes, sample heterogeneity, undetected confounding variables in gene expression experimental procedures or differing experimental designs and data analysis protocols (Gurevitch ; Suhorutshenko ; Craciunas ; Devesa-Peiro ). Menstrual cycle progression has a profound influence on gene expression (Talbi ; Koot ; Diaz-Gimeno ; Sebastian-Leon ; Saare ). This effect could mask the discovery of candidate uterine disorder biomarkers whose expression responds to both endometrial progression and alterations in uterine disorders. Consequently, it is unclear whether the observed changes in transcriptomic studies reporting differentially expressed genes (DEGs) reflect variations related to the disorder, to menstrual cycle progression or to both. To address this research gap, we quantified the menstrual cycle effect on biomarkers associated with uterine disorders and described current practices in endometrial transcriptomic analysis. Based on our findings, we propose new guidelines to correct for menstrual cycle bias in biomarker identification.

Materials and methods

Ethics statement

This is a retrospective study using case and control data from multiple studies of endometrial gene expression in women with uterine disorders and women with no endometrial pathology. The raw gene expression data and patient meta-data were downloaded from National Center for Biotechnology Information (NCBI) functional genomics data repository Gene Expression Omnibus (GEO), where data are made available for the scientific community. Following GEO policies, in these repository patient data are anonymised and encrypted, and no additional institutional review board is required for downloading (Edgar, 2002).

Study search and selection

A systematic search and review of individual studies was conducted between October 2016 and January 2019 at the NCBI repository GEO (Edgar, 2002). The search identified experiments involving human transcriptomic case versus control raw data evaluating uterine disorders. Keywords searched included endometrium, endometriosis, uterine fibroids, recurrent implantation failure (RIF) and recurrent pregnancy loss (RPL), among others (see Supplementary Table SI for a full list of search terms). No restrictions were placed on publication date or language. The final inclusion criteria were that: there was at least one uterine disorder evaluated in the study design; RNA was extracted directly from human endometrial biopsies; information regarding the menstrual cycle at the time of biopsy was available for all samples; sample sizes were greater than three for both case and control groups belonging to the same study; microarray or RNA sequencing data were obtained using Affymetrix, Illumina or Agilent gene expression platforms; and raw gene expression data were made freely available to download from GEO. Studies evaluating endometrial gene expression at different times of the menstrual cycle in women with normal endometrium were retrieved from GEO using the same keywords and criteria.

Pre-processing and exploratory analysis

Pre-processing and exploratory analyses were completed according to the gene expression platform used: raw data were downloaded and pre-processed using the affy v. 1.52.0 R package (Gautier ) for studies measuring endometrial gene expression with Affymetrix microarray platforms and the limma v.3.30.13 R package (Ritchie ) was used for studies using Agilent or Illumina devices. Normalisation between samples was applied using quantile normalisation (limma R package v.3.30.13); (Ritchie ) and annotation from probeset to gene symbol was established with the biomaRt R package v. 2.30.0 (Durinck ). For studies evaluating gene expression through RNA-Seq, low-count filtering and normalisation was achieved with the edgeR R package v. 3.16.5 (Robinson ). Exploratory analyses were performed to detect batch effects such as sequencing run or microarray slide. Detected effects were treated using linear models [limma R package v.3.30.13 (Ritchie )]. Afterward, menstrual cycle effect was evaluated through principal component analysis plots drawn with the ggplot2 R package v. 3.2.0 (Wickham, 2016). The proportion of endometrial biopsies collected at different stages of the menstrual cycle were compared between case and control groups using Fisher’s exact test (Fisher, 1922) implemented in the R environment (R Core Team, 2016).

Menstrual cycle effect correction and differential expression analysis

The effect of menstrual cycle progression on endometrial biopsy collection was removed from gene expression data using the removeBatcheffect function based on linear models implemented in the limma R package v.3.30.13 (Ritchie ) because this function enables correcting the bias from a known batch effect (e.g., menstrual cycle) while indicating the group differences to be retained in the data (e.g., uterine disorder versus control). Specifically, we used removeBatcheffect for being a slightly safer option than Combat (Espín-Pérez ), specifying the menstrual cycle phase of endometrial biopsy collection as the batch to remove, and defining the design matrix in relation to the condition to be preserved (case versus control samples). Case versus control differential expression analyses were applied (limma R package v.3.30.13 (Ritchie )) with and without removing the menstrual cycle effect; the proportion of differentially expressed genes (false discovery rate; FDR < 0.05) were compared for demonstrating the menstrual cycle bias. To highlight the advantage of this method and validate its statistical power, an alternative approach was compared evaluating case versus control DEGs within each menstrual cycle phase using the limma R package v.3.30.13 (Ritchie ). The proportion of DEGs (FDR < 0.05) obtained was compared with those obtained after menstrual cycle effect correction. All comparisons of DEG proportions were performed with one-sided Fisher’s exact test [adjusted by FDR (Benjamini and Hochberg, 1995)]. The statistical power of both methods was calculated with sizepower R package v.1.52.0 (Qiu ). Additionally, to validate the reliability and robustness of the method used to remove the menstrual cycle effect, the aforementioned approaches of differential expression analysis (with and without menstrual cycle correction) were applied to GEO individual studies comparing endometrial gene expression profiles between different menstrual cycle phases. If the method properly removed the menstrual cycle effect from transcriptomic data, the differential expression analysis between endometrial phases after correcting this effect should indicate no DEGs. The study design is detailed in Fig. 1A.
Figure 1.

(A) Study design and flowchart for transcriptomic studies evaluating uterine disorders. Systematic search and review of (i) transcriptomic case versus control studies evaluating uterine disorders and (ii) transcriptomic studies evaluating menstrual cycle progression in control women with normal endometrium were performed at the Gene Expression Omnibus (GEO) database. After filtering and selection, raw data were downloaded and pre-processed. For each included study, a differential expression analysis was applied with and without removing the menstrual cycle effect on the data using linear models. For case versus control studies (i), the proportions of differentially expressed genes (DEGs; FDR < 0.05) obtained under both approaches were compared using a Fisher’s Exact test and distinct types of endometrial biomarkers were established. Control studies evaluating menstrual cycle progression (ii) were used to evaluate the reliability of menstrual cycle effect correction; if the method properly removed the menstrual cycle effect from transcriptomic data, the differential expression analysis between endometrial phases after correcting this effect should indicate no differentially expressed genes. (B) Flowchart of the selection process of case versus control transcriptomic studies evaluating uterine disorders. The selection of suitable individual transcriptomic studies at GEO and the number of individual studies excluded and remaining after each filtering step are shown. GEO, Gene Expression Omnibus. n, number of studies. N, sample size. lncRNA, long noncoding RNA.

(A) Study design and flowchart for transcriptomic studies evaluating uterine disorders. Systematic search and review of (i) transcriptomic case versus control studies evaluating uterine disorders and (ii) transcriptomic studies evaluating menstrual cycle progression in control women with normal endometrium were performed at the Gene Expression Omnibus (GEO) database. After filtering and selection, raw data were downloaded and pre-processed. For each included study, a differential expression analysis was applied with and without removing the menstrual cycle effect on the data using linear models. For case versus control studies (i), the proportions of differentially expressed genes (DEGs; FDR < 0.05) obtained under both approaches were compared using a Fisher’s Exact test and distinct types of endometrial biomarkers were established. Control studies evaluating menstrual cycle progression (ii) were used to evaluate the reliability of menstrual cycle effect correction; if the method properly removed the menstrual cycle effect from transcriptomic data, the differential expression analysis between endometrial phases after correcting this effect should indicate no differentially expressed genes. (B) Flowchart of the selection process of case versus control transcriptomic studies evaluating uterine disorders. The selection of suitable individual transcriptomic studies at GEO and the number of individual studies excluded and remaining after each filtering step are shown. GEO, Gene Expression Omnibus. n, number of studies. N, sample size. lncRNA, long noncoding RNA. Finally, we compared the fold change (FC), P-value, gene expression average, and standard deviation (SD) before and after menstrual cycle effect correction for understanding the aetiology of the potential endometrial biomarkers. All statistical analyses were run under R software v.3.3.2 (2016-10-31) (R Core Team, 2016).

Results

Current practices for transcriptomic studies evaluating the endometrium in uterine disorders

Of the endometrial studies found in GEO (n = 694), 35 studies met the inclusion criteria (Fig. 1B). Of these, 31.43% (11 studies) did not register the menstrual cycle phase at the time of endometrial biopsy collection and 37.14% (13 studies) collected all endometrial samples at only the proliferative or secretory phases, with no further subdivision for secretory samples into early-, mid- or late-secretory endometrial stages (Fig. 1B). Only six papers with seven case versus control transcriptomic studies reporting the time point in which the biopsy was collected were suitable for analysis: two of eutopic endometrium in stages I–IV endometriosis (n = 37, n = 81), one of ectopic endometrium in ovarian endometriosis (n = 14), one of uterine fibroids (n = 43), two of RIF (n = 115, n = 18) and one of RPL (n = 20) (Table IA; detailed filtering steps in Fig. 1B).
Table I.

Characterization of transcriptomic studies evaluating uterine disorders. A) Clinical characterization of participants and study designs.

GEO ID (Study name)UDDiagnostic and Sub-classification of the studied uterine disorderCycle typeCycle phase dating methodN° Samples per cycle phaseAgeBMI
GSE6364 (Burney 2007)EULaparoscopy proven, surgically documented and histologically validated. Subtype: Ovaric, peritoneal, rectovaginal. Stage: III–IV (rAF) (The American Fertility Society)Natural; Regular; 3 months without hormonal treatment4 blind histopathologists (Noyes et al., 1975).PF (n = 11), ESE (n = 9), MSE (n = 17)

D: 22–44.

C: N/A

N/A
GSE23339 (Hawkins 2011)ECSurgical pathology reports. Subtype: Ovaric.Regular; 30 days without hormonal treatmentLast menstrual period confirmed by pathologyPF (n = 12), S (n = 2)

D : 20–43.

C : 39–48

N/A

GSE51981

(Tamaresis 2014)

EULaparoscopy proven. Subtype: Ovaric, peritoneal, pelvic, vaginal, rectovaginal. Stage: I–II (n = 16), III–IV (n = 37) (rAF) (American Society for Reproductive Medicine, 1997)3 months without hormonal treatment2 pathologists (Noyes et al., 1975), confirmed by serum estradiol and P4 levels and corroborated by 2 independent bioinformatics methods: clustering in unsupervised whole-transcriptome principal component analysis and cycle phase assignment classifier analysis

PF (n = 34), ESE (n = 15), MSE (n = 32)

PF (n = 23), ESE (n = 7), MSE (n = 12), LSE (n = 1).

E : 20–48. UF : 40–50.

C : 23–40.

N/A
UFParticipants’ operative and pathology reportsN/A

GSE58144

(Koot 2016)

RIF

≥ 3 failed IVF/ICSI or ≥10 good quality transferred embryos without pregnancy after IVF/ICSI. P: Previous implantations: 0–1. Embryo implantations: 3–12. Embryos replaced: 3–1.

C: Previous implantations: 1. Embryo implantations: 1–7, 18. Embryos replaced: 1–8, 30.

Natural. Regular (25–35 days), 30 days without hormonal treatmentUrinary LH ovulation predictor kitLH + 5 (n = 8), LH + 6 (n = 27), LH + 7 (n = 70), LH + 8 (n = 10).D: 27–38 C: 26–39

D: 19–37

C: 19–53

GSE92324

(Pathare 2017)

RIF

≥ 2 IVF cycles/ET with good quality

embryos without previous conception (Polanski et al., 2014)

Controlled ovarian stimulationDays from first administration of hCGhCG + 6 (n = 13), hCG + 7 (n = 5)

D : 27–40

C : 21–30

N/A

GSE65099

(Lucas et al., 2016)

RPL≥ 3 1st trimester lossesNatural. 3 months without hormonal treatmentLH surgeLH + 6 (n = 3), LH + 7 (n = 5), LH + 8 (n = 6), LH + 9 (n = 4), LH + 10 (n = 2).D: 31–40 C: 31–44D: 22–32 C: 18–33

(A) Clinical characterisation of participants and study designs. The GEO identifier, study name given in this work, uterine disorder and clinical information about participants including diagnostic method and sub-classification of patients belonging to the case group, cycle type, endometrial dating method, cycle phase in which the endometrial biopsies were collected along with number of samples collected at each menstrual cycle phase, age, BMI, ethnicity and number of samples for both case and control groups are presented for each study. The transcriptomic platform used to measure gene expression and the publication in which data were initially employed are also presented together with the number of evaluated genes. Tamaresis 2014 includes samples from both endometriosis and uterine fibroid patients along with controls. N/A, not available. D, patients belonging to the case the control group. GEO ID, Gene Expression Omnibus identifier. UD, uterine disorder. BMI, body mass index. RIF, recurrent implantation failure. RPL, recurrent pregnancy loss. EU, eutopic endometriosis. EC, ectopic endometriosis. UF, uterine fibroids. rAF, revised American Fertility Society classification system. LH, luteinizing hormone. AMH, anti-müllerian hormone. ET, embryo transfer. FSH, follicle-stimulating hormone. PF, proliferative. ESE, early secretory. MSE, mid-secretory. LSE, late secretory. S, secretory. ICSI, intracytoplasmic sperm injection. IVF, in vitro fertilisation. PCOS, polycystic ovary syndrome. (B) For each study, the number of samples collected at each stage of the menstrual cycle is indicated independently for cases and controls, together with the two-sided Fisher’s exact test P-value obtained after evaluating whether the proportion of samples collected at each endometrial stage significantly differed between groups.

While studies evaluating endometriosis and uterine fibroids mainly used Noyes histopathological criteria and collected biopsies along both the proliferative and secretory phases (Table IA), those evaluating RIF and RPL used days from LH peak or first administration of human chorionic gonadotropin (hCG) for endometrial biopsies collected in the secretory phase (Table IA). For all studies, case and control groups were balanced in terms of the proportion of samples collected at each endometrial stage (P > 0.05; Table IB). Characterization of transcriptomic studies evaluating uterine disorders. A) Clinical characterization of participants and study designs. D: 22–44. C: N/A D : 20–43. C : 39–48 GSE51981 (Tamaresis 2014) PF (n = 34), ESE (n = 15), MSE (n = 32) PF (n = 23), ESE (n = 7), MSE (n = 12), LSE (n = 1). E : 20–48. UF : 40–50. C : 23–40. GSE58144 (Koot 2016) ≥ 3 failed IVF/ICSI or ≥10 good quality transferred embryos without pregnancy after IVF/ICSI. P: Previous implantations: 0–1. Embryo implantations: 3–12. Embryos replaced: 3–1. C: Previous implantations: 1. Embryo implantations: 1–7, 18. Embryos replaced: 1–8, 30. D: 19–37 C: 19–53 GSE92324 (Pathare 2017) ≥ 2 IVF cycles/ET with good quality embryos without previous conception (Polanski ) D : 27–40 C : 21–30 GSE65099 (Lucas ) D: Caucasian (n = 13), Asian (n = 4), Black (n = 1), Asian Indian (n = 1), unknown (n = 2). C: N/A D: Latina (n = 5), Caucasian (n = 2). C: Latina (n = 5), African American (n = 2) GSE51981 (Tamaresis 2014) E: Caucasian (n = 38), Asian (n = 4), Black (n = 1), Asian Indian (n = 1), Hispanic (n = 1), unknown (n = 8). UF: Caucasian (n = 11), Asian (n = 1), Black (n = 3). C: Caucasian (n = 19), Asian (n = 4), Black (n = 3), Hispanic (n = 1), unknown (n = 1) GSE58144 (Koot 2016) GSE92324 (Pathare 2017) GSE65099 (Lucas ) EU PF (n = 6), ESE (n = 6), MSE (n = 9) PF (n = 5), ESE (n = 3), MSE (n = 8) 00.834 (A) Clinical characterisation of participants and study designs. The GEO identifier, study name given in this work, uterine disorder and clinical information about participants including diagnostic method and sub-classification of patients belonging to the case group, cycle type, endometrial dating method, cycle phase in which the endometrial biopsies were collected along with number of samples collected at each menstrual cycle phase, age, BMI, ethnicity and number of samples for both case and control groups are presented for each study. The transcriptomic platform used to measure gene expression and the publication in which data were initially employed are also presented together with the number of evaluated genes. Tamaresis 2014 includes samples from both endometriosis and uterine fibroid patients along with controls. N/A, not available. D, patients belonging to the case the control group. GEO ID, Gene Expression Omnibus identifier. UD, uterine disorder. BMI, body mass index. RIF, recurrent implantation failure. RPL, recurrent pregnancy loss. EU, eutopic endometriosis. EC, ectopic endometriosis. UF, uterine fibroids. rAF, revised American Fertility Society classification system. LH, luteinizing hormone. AMH, anti-müllerian hormone. ET, embryo transfer. FSH, follicle-stimulating hormone. PF, proliferative. ESE, early secretory. MSE, mid-secretory. LSE, late secretory. S, secretory. ICSI, intracytoplasmic sperm injection. IVF, in vitro fertilisation. PCOS, polycystic ovary syndrome. (B) For each study, the number of samples collected at each stage of the menstrual cycle is indicated independently for cases and controls, together with the two-sided Fisher’s exact test P-value obtained after evaluating whether the proportion of samples collected at each endometrial stage significantly differed between groups.

Menstrual cycle effect on transcriptomic studies searching for uterine disorder biomarkers

All studies collecting endometrial biopsies at different cycle time points had a menstrual cycle effect on gene expression (see PCAs on Figs 2 and 3).
Figure 2.

Menstrual cycle effect on endometrial gene expression in transcriptomic studies by , . For each study (A–D), the result of the principal component analysis is plotted for the first three components before and after applying the menstrual cycle effect correction to evaluate whether the samples are primarily grouped by the menstrual cycle phase of endometrial biopsy collection (colour code) and/or by the uterine disorder (shape code) based on their gene expression profiles. PC, principal component. PF, proliferative. ESE, early secretory. MSE, mid-secretory. SE, secretory. LH, luteinising hormone. RIF, recurrent implantation failure.

Figure 3.

Menstrual cycle effect on endometrial gene expression for . For each study (A and B), the result of the principal component analysis is plotted for the first three components before (right) and after (left) applying the menstrual cycle effect correction to evaluate whether the samples are primarily grouped by the menstrual cycle phase of endometrial biopsy collection (colour code) and/or by the uterine disorder (shape code) based on their gene expression profiles. PC, principal component. hCG, human chorionic gonadotrophin. LH, luteinising hormone. RIF, recurrent implantation failure. RPL, recurrent pregnancy loss.

Menstrual cycle effect on endometrial gene expression in transcriptomic studies by , . For each study (A–D), the result of the principal component analysis is plotted for the first three components before and after applying the menstrual cycle effect correction to evaluate whether the samples are primarily grouped by the menstrual cycle phase of endometrial biopsy collection (colour code) and/or by the uterine disorder (shape code) based on their gene expression profiles. PC, principal component. PF, proliferative. ESE, early secretory. MSE, mid-secretory. SE, secretory. LH, luteinising hormone. RIF, recurrent implantation failure. Menstrual cycle effect on endometrial gene expression for . For each study (A and B), the result of the principal component analysis is plotted for the first three components before (right) and after (left) applying the menstrual cycle effect correction to evaluate whether the samples are primarily grouped by the menstrual cycle phase of endometrial biopsy collection (colour code) and/or by the uterine disorder (shape code) based on their gene expression profiles. PC, principal component. hCG, human chorionic gonadotrophin. LH, luteinising hormone. RIF, recurrent implantation failure. RPL, recurrent pregnancy loss. Samples from Burney 2007, Hawkins 2011, Tamaresis 2014 and Koot 2016 were grouped by endometrial phase before menstrual cycle effect correction (Fig. 2). However, samples were mainly grouped by the uterine disorder (Fig. 2) and a significant average of 44.19% more biomarkers were detected (one-sided Fisher’s exact tests FDR ≤ 5.53 × 10–06) when the effect of the menstrual cycle was corrected (Table IIA). These newly revealed uterine disorder biomarkers that were previously masked by the menstrual cycle effect were, in fact, identified despite balanced endometrial cycle stages between case and control groups (Table IB). Masked uterine disorder biomarkers were also detected in Koot 2016, among endometrial biopsies collected at different time points within the mid-secretory phase (LH + 5-LH + 8) (Table IB).
Table II.

Uterine disorder biomarkers after removing the menstrual cycle bias.

A) Differential expression analysis with and without correcting the menstrual cycle effect on endometrial transcriptomic data.

StudyUDN° DEGs without menstrual cycle correctionN° DEGs with menstrual cycle correction% of newly detected disorder biomarkersFisher’s test FDRMenstrual cycle biomarkersUD biomarkers not masked by the menstrual cycleUD biomarkers masked by the menstrual cycle
Burney 2007 EU12781284.36%6.6 × 10–160127685
Hawkins 2011 EC903120525.06%1.92 × 10–110903302
Tamaresis 2014 EU13 39713 7976.05%5.53 × 10–0643512 962835
UF971510 90911.71%6.6 × 10–168396321277
Koot 2016 RIF23293.75%5.15 × 10–080230
Pathare 2017 RIF27832492−2.29%1
Lucas et al., 2016 RPL000%

(A) Differential expression analysis with and without correcting the menstrual cycle effect on endometrial transcriptomic data. For each study, this table presents the number of differentially expressed genes (DEGs) obtained with and without menstrual cycle correction, the % of disorder-specific genes newly identified when correcting the menstrual cycle effect, the false discovery rate (FDR) associated with the different proportions of DEGs detected with and without menstrual cycle correction (one-sided Fisher’s exact test, as we expected to identify a higher number of DEGs with menstrual cycle correction), and the number of genes belonging to the three types of endometrial biomarkers identified in the study: biomarkers of the menstrual cycle alone (DEGs only detected without correcting the menstrual cycle effect), biomarkers of the uterine disorder that are masked by the menstrual cycle (DEGs only detected with the menstrual cycle correction) and uterine disorder biomarkers that are not masked by the menstrual cycle (intersected DEGs between both approaches). UD, uterine disorder. EU, eutopic endometriosis. EC, ectopic endometriosis. UF, uterine fibroids. RIF, recurrent implantation failure. RPL, recurrent pregnancy loss. (B) Newly discovered uterine disorder biomarkers unmasked by the menstrual cycle effect correction. From the total number of uterine disorder biomarkers that were unmasked after applying the menstrual cycle effect correction method (DEGs only detected with the menstrual cycle correction, type C biomarkers in Fig. 3), the table shows the number of biomarkers that had been previously associated with the uterine disorder either in the original articles or in the databases Disgenet v.6 (42), Phenopedia v.6.2.3 (43), and/or GeneCards v.4.14.0 (41), together with the number of uterine disorder biomarkers not previously reported and thus newly discovered in this work. Keywords used in database searches: ‘endometriosis’, ‘uterine fibroids OR leiomyoma OR myoma’, and ‘recurrent implantation failure’. (C) Differential expression analysis (DEA) for each menstrual cycle phase. This table presents the number of differentially expressed genes (DEGs) in each study and the statistical power (%) obtained when the analysis was performed for samples collected at each menstrual cycle phase separately. The statistical power (%) of the analysis with menstrual cycle phase correction is also indicated for comparison. For both approaches, sample sizes (n) are indicated between parentheses. PF, proliferative. ESE, early secretory. MSE, mid-secretory. S, secretory. LH, luteinizing hormone.

Uterine disorder biomarkers after removing the menstrual cycle bias. A) Differential expression analysis with and without correcting the menstrual cycle effect on endometrial transcriptomic data. (A) Differential expression analysis with and without correcting the menstrual cycle effect on endometrial transcriptomic data. For each study, this table presents the number of differentially expressed genes (DEGs) obtained with and without menstrual cycle correction, the % of disorder-specific genes newly identified when correcting the menstrual cycle effect, the false discovery rate (FDR) associated with the different proportions of DEGs detected with and without menstrual cycle correction (one-sided Fisher’s exact test, as we expected to identify a higher number of DEGs with menstrual cycle correction), and the number of genes belonging to the three types of endometrial biomarkers identified in the study: biomarkers of the menstrual cycle alone (DEGs only detected without correcting the menstrual cycle effect), biomarkers of the uterine disorder that are masked by the menstrual cycle (DEGs only detected with the menstrual cycle correction) and uterine disorder biomarkers that are not masked by the menstrual cycle (intersected DEGs between both approaches). UD, uterine disorder. EU, eutopic endometriosis. EC, ectopic endometriosis. UF, uterine fibroids. RIF, recurrent implantation failure. RPL, recurrent pregnancy loss. (B) Newly discovered uterine disorder biomarkers unmasked by the menstrual cycle effect correction. From the total number of uterine disorder biomarkers that were unmasked after applying the menstrual cycle effect correction method (DEGs only detected with the menstrual cycle correction, type C biomarkers in Fig. 3), the table shows the number of biomarkers that had been previously associated with the uterine disorder either in the original articles or in the databases Disgenet v.6 (42), Phenopedia v.6.2.3 (43), and/or GeneCards v.4.14.0 (41), together with the number of uterine disorder biomarkers not previously reported and thus newly discovered in this work. Keywords used in database searches: ‘endometriosis’, ‘uterine fibroids OR leiomyoma OR myoma’, and ‘recurrent implantation failure’. (C) Differential expression analysis (DEA) for each menstrual cycle phase. This table presents the number of differentially expressed genes (DEGs) in each study and the statistical power (%) obtained when the analysis was performed for samples collected at each menstrual cycle phase separately. The statistical power (%) of the analysis with menstrual cycle phase correction is also indicated for comparison. For both approaches, sample sizes (n) are indicated between parentheses. PF, proliferative. ESE, early secretory. MSE, mid-secretory. S, secretory. LH, luteinizing hormone. In contrast, samples from Lucas and Pathare 2017 were primarily grouped by the disorder rather than by the time point of endometrial biopsy collection before menstrual cycle effect correction; and no more DEGs were identified after correcting the menstrual cycle effect (Table IIA, Fig. 3). While in Burney 2007, Hawkins 2011 and Koot 2016, the DEGs detected before menstrual cycle correction where included in those identified after the correction was applied. For Tamaresis 2014, there were DEGs detected only before applying the menstrual cycle correction (Table IIA).

A new classification of endometrial transcriptomic biomarkers in uterine disorders

Comparison of the expression profiles of significant genes identified before and after applying the menstrual cycle correction allowed us to detect their aetiology and define a new classification of endometrial biomarkers (Table IIA, Fig. 4).
Figure 4.

A new classification of endometrial biomarkers according to their gene expression profiles. For each type of endometrial biomarker (A–C), the average and standard deviation of an example gene is represented before and after menstrual cycle effect correction separated by the uterine disorder (cases and controls) and the menstrual cycle phase. The global measure of cases and controls is also represented together with the correspondent P-values and fold changes (FC) obtained in the differential expression analyses with and without correcting the menstrual cycle. P-values were used instead of FDR as, for each gene, P-value adjustment for multiple testing is dependent on the P-values obtained in the other genes rather than gene-exclusive. (A) Menstrual cycle biomarkers detected only without correction: before the menstrual cycle effect correction, the differences in gene expression among samples from distinct endometrial phases was high and made this gene globally significant (P = 0.036) without being truly affected by the uterine disorder, as the average gene expression of case and control groups was not different within the proliferative and mid-secretory phases. After correcting for the menstrual cycle, the distance between gene expression patterns in different endometrial phases shortened and became non-significant (P = 0.474). Example gene: PPP2R2C from Tamaresis 2014. (B) Uterine disorder biomarkers not masked by the menstrual cycle and detected both before and after the correction: for B1 biomarkers, samples belonging to different phases had similar average expression both before and after applying the menstrual cycle correction, suggesting that expression is not affected by the menstrual cycle and is significantly distinct between cases and controls regardless of whether the correction is applied. In contrast, B2 biomarkers had different average expression between samples collected at different menstrual cycle phases, but the differences between cases and controls were higher, allowing these genes to be also identified as uterine disorder biomarkers before correcting for the effect of the menstrual cycle. Example genes: MYL10 (B1) and SESN1 (B2) from Burney2006. (C) Uterine disorder biomarkers masked by the menstrual cycle and only detected after the correction: Expression differences between menstrual cycle phases are greater than those between cases and controls, making the depicted gene not significant before menstrual cycle correction (P = 0.404). After correcting for the effect of the menstrual cycle, expression differences between menstrual cycle phases are minimised, reducing the variability within case and control groups and making the gene globally significant (P = 0.002). Example gene: CTNNA2 from Burney2006. DEGs, differentially expressed genes. PF, proliferative. ESE, early secretory. MSE, mid-secretory.

A new classification of endometrial biomarkers according to their gene expression profiles. For each type of endometrial biomarker (A–C), the average and standard deviation of an example gene is represented before and after menstrual cycle effect correction separated by the uterine disorder (cases and controls) and the menstrual cycle phase. The global measure of cases and controls is also represented together with the correspondent P-values and fold changes (FC) obtained in the differential expression analyses with and without correcting the menstrual cycle. P-values were used instead of FDR as, for each gene, P-value adjustment for multiple testing is dependent on the P-values obtained in the other genes rather than gene-exclusive. (A) Menstrual cycle biomarkers detected only without correction: before the menstrual cycle effect correction, the differences in gene expression among samples from distinct endometrial phases was high and made this gene globally significant (P = 0.036) without being truly affected by the uterine disorder, as the average gene expression of case and control groups was not different within the proliferative and mid-secretory phases. After correcting for the menstrual cycle, the distance between gene expression patterns in different endometrial phases shortened and became non-significant (P = 0.474). Example gene: PPP2R2C from Tamaresis 2014. (B) Uterine disorder biomarkers not masked by the menstrual cycle and detected both before and after the correction: for B1 biomarkers, samples belonging to different phases had similar average expression both before and after applying the menstrual cycle correction, suggesting that expression is not affected by the menstrual cycle and is significantly distinct between cases and controls regardless of whether the correction is applied. In contrast, B2 biomarkers had different average expression between samples collected at different menstrual cycle phases, but the differences between cases and controls were higher, allowing these genes to be also identified as uterine disorder biomarkers before correcting for the effect of the menstrual cycle. Example genes: MYL10 (B1) and SESN1 (B2) from Burney2006. (C) Uterine disorder biomarkers masked by the menstrual cycle and only detected after the correction: Expression differences between menstrual cycle phases are greater than those between cases and controls, making the depicted gene not significant before menstrual cycle correction (P = 0.404). After correcting for the effect of the menstrual cycle, expression differences between menstrual cycle phases are minimised, reducing the variability within case and control groups and making the gene globally significant (P = 0.002). Example gene: CTNNA2 from Burney2006. DEGs, differentially expressed genes. PF, proliferative. ESE, early secretory. MSE, mid-secretory. Menstrual cycle biomarkers (Fig. 4A) are genes detected only before menstrual cycle correction, suggesting that they respond to endometrial progression but not to the uterine disorder itself (higher expression differences were observed between endometrial stages than between cases and controls). This type of biomarker was identified by Tamaresis 2014, where a gene is differentially expressed without being truly affected by the uterine disorder (Fig. 4A). After correcting for the menstrual cycle, the distance between gene expression patterns in different endometrial phases shortened and became non-significant. Uterine pathology biomarkers not masked by the menstrual cycle (Fig. 4B) are genes detected with and without menstrual cycle correction because there was no effect of endometrial progression (Fig. 4.B.1, no expression differences were observed between endometrial stages) or the effect was lower than that of the uterine disorder (Fig. 4.B.2, higher expression differences between cases and controls than between endometrial stages). Uterine disorder biomarkers masked by the menstrual cycle (Fig. 4C) are genes detected only after menstrual cycle correction because the effect of endometrial progression greatly outweighs that of the uterine disorder (higher expression differences were observed between endometrial phases than between cases and controls). Consequently, these genes remain masked and not significant before menstrual cycle correction and can only be detected as uterine disorder biomarkers after removal of menstrual cycle effect. For the three types of endometrial biomarkers, the menstrual cycle effect correction method only affected the variability of gene expression explained by endometrial progression, thus changing the P-value when comparing case and control groups (Fig. 5). In contrast, fold changes between cases and controls did not substantially change in any gene belonging to any type of endometrial biomarker (Fig. 5), indicating that the correction method successfully maintained the expression differences associated with uterine disorders. Consequently, P-value changes were not caused by alteration in case versus control mean expression differences but by the removal of menstrual cycle induced variation in gene expression (Fig. 5).
Figure 5.

Changes in For each study (A–D) and type of biomarker (colour code) shown in Fig. 4, -values (up) and FCs (down) before and after applying the menstrual cycle effect correction method are plotted against each other. Diagonal dotted lines indicate absence of changes in P-values/FCs before and after applying the correction method. Vertical and horizontal dotted lines for P-values represent P-value = 0.05 before and after correcting for the menstrual cycle effect, respectively.

Changes in For each study (A–D) and type of biomarker (colour code) shown in Fig. 4, -values (up) and FCs (down) before and after applying the menstrual cycle effect correction method are plotted against each other. Diagonal dotted lines indicate absence of changes in P-values/FCs before and after applying the correction method. Vertical and horizontal dotted lines for P-values represent P-value = 0.05 before and after correcting for the menstrual cycle effect, respectively.

Discovery of new potential biomarkers

Among the type C uterine disorder biomarkers that remained undetected before correcting the menstrual cycle effect on gene expression (Fig. 4C, Supplementary Table SII) (Yu ; Stelzer ; Piñero ), we discovered 544 new candidate biomarkers of eutopic endometriosis (Burney 2007), 158 of ectopic ovarian endometriosis (Hawkins 2011) and 27 of recurrent implantation failure (Koot 2016) that had not been previously reported (Table IIB and Supplementary Table SII) (Burney ; Yu ; Hawkins ; Koot ; Stelzer ; Piñero ). These new biomarkers presented an expression difference between cases and controls of 12–121% for eutopic endometriosis, 15–359% for ectopic ovarian endometriosis and 2–11% for RIF (Supplementary Table SII). To better understand their functional role in the context of uterine disorder pathophysiology, these new biomarkers were functionally annotated (Fig. 6, Supplementary Table SIII). New candidate endometriosis biomarkers (both ectopic and eutopic) were mainly related to metabolism (19 ectopic and 73 eutopic), transcription regulation processes (14 ectopic and 47 eutopic) and protein-modification processes (14 ectopic and 47 eutopic) (Fig. 6A). In addition, functions widely reported to be involved in endometriosis such as immune response and inflammation (Tomassetti ; Burney ; Burney and Giudice, 2012; Liu ; Patel ; Anderson, 2019; Marquardt ; Devesa-Peiro ) and cell differentiation and development (Tomassetti ; Burney ; Burney and Giudice, 2012; Crispi ; Liu ; Patel ; Marquardt ; Zhang ; Devesa-Peiro ) were notably annotated to these new potential biomarkers of endometriosis (Fig. 6A). New RIF candidate biomarkers were mostly associated with transcription regulation (3 biomarkers: CHD4, IGHMBP2, ZBTB48), post-transcriptional changes (2 biomarkers: FAM182B and RN7SK), and epigenetics and chromatin remodelling processes (2 biomarkers: CDH4, FAM182B); these functions were previously reported as significantly altered in RIF patients (Cakmak and Taylor, 2011; Koot ; Pathare ; Devesa-Peiro ) (Fig. 6B). Gene names of new candidate biomarkers of ectopic/eutopic endometriosis and RIF belonging to each functional group are listed in Supplementary Table SIII together with literature references supporting the role of each functional group in the context of each uterine disorder.
Figure 6.

Functional roles of the newly discovered potential biomarkers of endometriosis and recurrent implantation failure (A) and recurrent implantation failure (RIF) (B). Functions associated with the new biomarkers of endometriosis (A) (eutopic: n = 544, ectopic: n = 158) and RIF (n = 27) (B) are shown in the endometrial cellular context together with references supporting the role of each function in the pathophysiology of the corresponding disorder. The number of biomarkers of eutopic (A, purple boxes) and ectopic (A, green boxes) endometriosis and RIF (B, pink boxes) annotated to each function are represented. New potential biomarkers belonging to each represented function are fully listed in Supplementary Table SIII. Functional annotations were performed for each new biomarker using Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways (Ogata ) and Gene Ontology (GO) (Ashburner ) databases. For GO annotation, the three ontologies were used (biological processes, molecular functions, cellular components), only experimental-evidenced GO-gene associations were included, a propagated GO version was used for considering the whole GO-tree structure, and annotated GO terms were filtered by those having more than five and less than 500 associated genes. Obtained GO and KEGG annotated terms were then grouped in broader functional categories, which are represented in this figure. Keywords used in PubMed to search for functions altered in endometriosis and RIF patients included ‘endometriosis’, ‘RIF’, ‘recurrent implantation failure’, ‘function’, ‘pathway’, ‘gene ontology’ and ‘KEGG’. References: a—(Devesa-Peiro ); b—(Burney ); c—(Crispi ); d—(Anderson, 2019); e—(Burney and Giudice, 2012); f—(Liu ); g—(Marquardt ); h—(Patel ); I—(Tomassetti ); j—(Zhang ); k—(Cakmak and Taylor, 2011); l—(Koot ); m—(Pathare ); n—(Huang ); o—(Long ); p—(Hapangama ); q—(Attar ); r—(von Adamek et al., 2005); s—(Kiyomizu ).

Functional roles of the newly discovered potential biomarkers of endometriosis and recurrent implantation failure (A) and recurrent implantation failure (RIF) (B). Functions associated with the new biomarkers of endometriosis (A) (eutopic: n = 544, ectopic: n = 158) and RIF (n = 27) (B) are shown in the endometrial cellular context together with references supporting the role of each function in the pathophysiology of the corresponding disorder. The number of biomarkers of eutopic (A, purple boxes) and ectopic (A, green boxes) endometriosis and RIF (B, pink boxes) annotated to each function are represented. New potential biomarkers belonging to each represented function are fully listed in Supplementary Table SIII. Functional annotations were performed for each new biomarker using Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways (Ogata ) and Gene Ontology (GO) (Ashburner ) databases. For GO annotation, the three ontologies were used (biological processes, molecular functions, cellular components), only experimental-evidenced GO-gene associations were included, a propagated GO version was used for considering the whole GO-tree structure, and annotated GO terms were filtered by those having more than five and less than 500 associated genes. Obtained GO and KEGG annotated terms were then grouped in broader functional categories, which are represented in this figure. Keywords used in PubMed to search for functions altered in endometriosis and RIF patients included ‘endometriosis’, ‘RIF’, ‘recurrent implantation failure’, ‘function’, ‘pathway’, ‘gene ontology’ and ‘KEGG’. References: a—(Devesa-Peiro ); b—(Burney ); c—(Crispi ); d—(Anderson, 2019); e—(Burney and Giudice, 2012); f—(Liu ); g—(Marquardt ); h—(Patel ); I—(Tomassetti ); j—(Zhang ); k—(Cakmak and Taylor, 2011); l—(Koot ); m—(Pathare ); n—(Huang ); o—(Long ); p—(Hapangama ); q—(Attar ); r—(von Adamek et al., 2005); s—(Kiyomizu ). Genes obtained from Tamaresis 2014 were not used to report newly discovered biomarkers because their samples were separated into two groups according to an unknown effect that we were not able to associate with any technical, biological or clinical registered variable in the original study. This unknown effect could be responsible for the remarkably large number of DEGs obtained both before and after applying the correction of the menstrual cycle effect (Table IIA). Considering all this, we cannot assess how this unknown effect on gene expression may impact the results.

Comparison of the proposed menstrual cycle effect correction method versus transcriptomic analyses within each menstrual cycle phase for the identification of uterine disorder biomarkers in endometrium

A significantly lower proportion of DEGs (FDR < 0.05) was obtained when analysis was performed independently for each menstrual cycle phase compared to when correcting for the menstrual cycle effect on gene expression (Fisher’s exact test FDR < 2.2 × 10−16; Table IIC). Indeed, significant genes from Burney 2007 and Hawkins 2011 were only detected in the early secretory phase and in the proliferative phase, respectively (Table IIC). As expected, due to the reduction in sample size, the statistical power was lower for the per menstrual cycle phase analyses compared to the menstrual cycle correction method (Table IIC).

Validation of the menstrual cycle effect correction method

To check the robustness and reliability of the method used to remove the menstrual cycle effect on gene expression, we applied the aforementioned approaches of differential expression analysis (with and without menstrual cycle correction) to five independent endometrial transcriptomic studies that compared endometrial gene expression profiles across different menstrual cycle phases (Table IIIA). Three studies used the LH peak for endometrial dating (Bradley 2010, Altmae 2017, Sigurgeirsson 2016), one used histopathological criteria of Noyes (Talbi 2006), and the other did not report the dating methodology (Kelleher2017). For the five evaluated studies, samples were mainly grouped by the menstrual cycle phase according to principal component analysis (Supplementary Fig. S1). Consequently, we identified significantly DEGs between endometrial phases in all of them before applying the menstrual cycle effect correction (Table IIIB). However, samples were no longer grouped by endometrial phase (Supplementary Fig. S1) and no DEGs were obtained between the distinct endometrial phases after the menstrual cycle effect was removed (Table IIIB), demonstrating that the correction worked as expected.
Table III.

Transcriptomic studies evaluating differences between menstrual cycle phases in women with normal endometrium: Validating the menstrual cycle effect correction method.

A) Clinical characterisation of participants.

GEO ID (Study Name)Cycle typeCycle phase dating methodN° Samples per cycle phaseAgeBMIEthnicityPlatformRef.
GSE4888 (Talbi 2006)Normo-ovulatory. Regular (24–35 days). 3 months since last hormonal treatment4 pathologists (Noyes et al., 1975)

PF (n = 6)

ESE (n = 4)

MSE (n = 9)

LSE (n = 8)

23–49N/ACaucasian (n = 17), Black (n = 6), Asian (n = 1), Other (n = 2), Unknown (n = 1)hgu133plus2 Affymetrix(Talbi et al., 2006)
GSE29981 (Bradley 2011)Regular (glandular epithelium alone)Days from LH peak: PF (LH-14—LH-1), ESE (LH + 1—LH + 4), MSE (LH + 6 - LH + 7)

PF (n = 10)

ESE (n = 6)

MSE (n = 4)

20–39N/AN/Ahgu133plus2 AffymetrixN/A
GSE98386 (Altmae 2017)Natural cycleDays from LH peak

LH + 2 (n = 20)

LH + 8 (n = 20)

N/AN/AEstoniaIllumina HiSeq 2500(Altmäe et al., 2017; Rekker et al., 2018; Teder et al., 2018)
GSE86491 (Sigurgeirsson 2016)Regular. 3 months since last hormonal treatmentPF: 6–8 days after the start of the subsequent menstruation. MSE-LSE: LH + 7-LH + 9, Urinary LH ovulation predictor kit. Both confirmed by a gynaecological pathologist through histopathological examination.

PF (n = 7)

MSE-LSE (n = 7)

24–3019.8–33.2N/AIllumina HiSeq 2500(Sigurgeirsson et al., 2016)
GSE119209 (Kelleher 2018)N/AN/A

PF (n = 6)

MSE (n = 5)

N/AN/AN/AIllumina HiSeq 2500N/A

(A) Clinical characterisation of participants. The GEO identifier, study name given in this work, and clinical information about participants including cycle type, endometrial dating method, cycle phase in which the endometrial biopsies were collected along with number of samples for each menstrual cycle phase, age, BMI, and ethnicity are presented for each included study. The transcriptomic platform used to measure gene expression and the publication in which data were initially employed are also presented. Altmae2017 and Sigurgeirsson 2016 have paired samples. N/A, not available. GEO ID, Gene Expression Omnibus GSE identifier. BMI, body mass index. LH, luteinizing hormone. PF, proliferative. ESE, early secretory. MSE, mid-secretory. LSE, late secretory. S, secretory. (B) Differential expression analysis with and without correcting for menstrual cycle. For each study, the number of differentially expressed genes (DEGs) obtained after ANOVA and pairwise comparisons between the distinct menstrual cycle phases is shown, with and without menstrual cycle correction.

Transcriptomic studies evaluating differences between menstrual cycle phases in women with normal endometrium: Validating the menstrual cycle effect correction method. A) Clinical characterisation of participants. PF (n = 6) ESE (n = 4) MSE (n = 9) LSE (n = 8) PF (n = 10) ESE (n = 6) MSE (n = 4) LH + 2 (n = 20) LH + 8 (n = 20) PF (n = 7) MSE-LSE (n = 7) PF (n = 6) MSE (n = 5) (A) Clinical characterisation of participants. The GEO identifier, study name given in this work, and clinical information about participants including cycle type, endometrial dating method, cycle phase in which the endometrial biopsies were collected along with number of samples for each menstrual cycle phase, age, BMI, and ethnicity are presented for each included study. The transcriptomic platform used to measure gene expression and the publication in which data were initially employed are also presented. Altmae2017 and Sigurgeirsson 2016 have paired samples. N/A, not available. GEO ID, Gene Expression Omnibus GSE identifier. BMI, body mass index. LH, luteinizing hormone. PF, proliferative. ESE, early secretory. MSE, mid-secretory. LSE, late secretory. S, secretory. (B) Differential expression analysis with and without correcting for menstrual cycle. For each study, the number of differentially expressed genes (DEGs) obtained after ANOVA and pairwise comparisons between the distinct menstrual cycle phases is shown, with and without menstrual cycle correction.

Discussion

This study is the first to assess the current practices in identifying transcriptomic biomarkers of uterine disorders in endometrium, especially in relation to the menstrual cycle phase of endometrial biopsy collection. We found that one third of the studies did not report the menstrual cycle phase of the samples, including all of the suitable studies evaluating endometrial adenocarcinoma, leiomyosarcoma and adenomyosis. Other studies (37.14%) collected endometrial biopsies within the same menstrual cycle phase but with no further sub-classification in early-, mid- or late-secretory stages was made. Noyes histopathological criteria (Noyes ) is one of the most utilised methods for endometrial dating, even though endometrial transcriptomics is superior in both accuracy and reproducibility (Díaz-Gimeno ). Our description of current practices demonstrated that, although it is widely known that the menstrual cycle progression affects most of the genes expressed in endometrium (Talbi ; Koot ; Diaz-Gimeno ; Sebastian-Leon ; Saare ), new guidelines are needed for avoiding the menstrual cycle bias in transcriptomic analysis, and a more in-depth registration and consideration of endometrial stage is required. Menstrual cycle correction enabled identification of an average of 44.19% more uterine disorder biomarkers that would have otherwise remained undiscovered. This phenomenon held true regardless of the endometrial dating method or whether the endometrial biopsies were collected along the entire menstrual cycle (Burney 2007, Hawkins 2011, Tamaresis 2014) or only within the secretory phase (Koot 2016). The highest evidence was shown when the menstrual cycle effect correction was needed even if the study design included samples balanced across the cycle between case and control groups. Our findings suggest that the current practice for avoiding menstrual cycle bias in transcriptomic studies is unable to prevent endometrial progression from masking potential uterine disorder biomarkers. In addition, our results showed biopsies collected at the secretory phase must be further subdivided into early-, mid- and late-secretory stages to correct for menstrual cycle bias. One of the limitations of this study is that it depends on publicly available datasets. Therefore, the effect of the menstrual cycle on biomarker discovery could only be evaluated in a limited number of uterine disorders. However, the effect of the menstrual cycle was present in all the evaluated conditions; thus, this effect is likely also present in other endometrial pathologies not included here. Recently, Suhorutshenko and colleagues demonstrated that endometrial receptivity biomarkers are biased by the distinct proportions of stromal and epithelial cells within the collected endometrial biopsies (Suhorutshenko ). Here, we demonstrate that menstrual cycle progression biases the biomarker search and if the proportion of cell types described by Suhorutshenko and colleagues is inherent to menstrual cycle progression, this produces a cycle-based bias rather than a technical bias in biopsy collection as has been suggested (Suhorutshenko ). We also demonstrate that menstrual cycle progression affects the discovery of endometrial transcriptomic biomarkers and call for an assessment of best practices of endometrial transcriptomic analysis. We propose a novel classification of endometrial biomarkers for gene expression studies evaluating uterine disorders. This new classification identifies biomarkers depending on the aetiology of gene expression changes, distinguishing between menstrual cycle biomarkers, uterine disorder biomarkers not masked by the menstrual cycle (which are sub-classified as genes whose expression is not affected by the menstrual cycle and genes with a menstrual cycle effect but in which the phase-dependent expression changes is less than those explained by the uterine disorder) and uterine disorder biomarkers masked by the menstrual cycle (identified after the menstrual cycle effect is corrected). This latter type of endometrial biomarker is likely to remain undetected under current practices of transcriptomic studies that do not control for menstrual cycle bias. Using this methodology, we unveiled new potential biomarkers: 544 for eutopic endometriosis, 158 for ectopic ovarian endometriosis and 27 for recurrent implantation failure, all of which had not been previously reported in the included studies (Burney ; Hawkins ; Tamaresis ; Koot ) or other studies consulted through Disgenet (Piñero ), Phenopedia (Yu ) and GeneCards (Stelzer ) databases. These new candidate biomarkers were involved in functions known to be altered by the corresponding uterine disorder, such as immune response and inflammation, cell differentiation and development in endometriosis (Tomassetti ; Burney and Giudice, 2012; Crispi ; Tamaresis ; Liu ; Patel ; Anderson, 2019; Marquardt ; Zhang ; Devesa-Peiro ) or epigenetics and transcription and post-transcription regulation in RIF (Cakmak and Taylor, 2011; Koot ; Pathare ; Devesa-Peiro ), highlighting their relevance in the pathophysiology of the disease. Our method of menstrual cycle correction proved to be robust and reliable, as samples were no longer grouped by the endometrial phase and we did not identify any DEGs between distinct menstrual cycle phases after we applied this correction to studies evaluating menstrual cycle changes in women with normal endometrium. This was consistent regardless of the statistical method employed for differential expression analysis (ANOVA or pairwise comparisons) and/or endometrial dating method. In addition, the correction method maintained the average differences between case and control samples for studies evaluating uterine disorders, demonstrating that the effect of the condition was not removed in the correction process and that the observed changes in the P-values were explained only by the removal of the gene expression variability explained by menstrual cycle progression. When comparing our approach of uterine disorder biomarker detection with those followed in the included studies, we found that only Koot and colleagues corrected for menstrual cycle effect also using linear models (Koot ). From the remaining five studies, Burney and colleagues and Tamaresis and colleagues addressed menstrual cycle bias by dividing samples according to the menstrual cycle phase and performing an independent differential expression analysis at a probeset level (Burney ; Tamaresis ). Unlike our proposed correction method, this strategy allows identification of phase-specific uterine disorder biomarkers (e.g., genes whose expression only differs significantly between cases and controls in the proliferative phase but not in the secretory phase). Although identifying phase-specific uterine disorder biomarkers is useful in understanding the relationship between the disorder and the menstrual cycle, we demonstrated that this strategy retrieves significantly fewer potential uterine disorder biomarkers compared to menstrual cycle effect correction, as it sacrifices statistical power due to lower sample sizes. In contrast, our proposed correction method identifies uterine disorder biomarkers regardless of menstrual cycle phase during biopsy. Notably, removing the menstrual cycle effect does not impede identifying potential biomarker genes that are responding to both the menstrual cycle and the uterine disorder. In fact, those genes more greatly influenced by the menstrual cycle than by the uterine disorder are the ones that the correction method can unmask. Considering these findings, we define new guidelines for the detection of reliable uterine disorder biomarkers according to distinct scenarios. If endometrial biopsies are collected at different stages of the menstrual cycle, the menstrual cycle effect must be always corrected in the transcriptomic analysis as endometrial timing is masking genes whose expression is affected by the uterine disorder. In unbalanced studies in which the menstrual cycle stage was not corrected, we expected an additional risk of identifying genes as uterine disorder biomarkers whose expression is indeed dependent on the menstrual cycle and not on the evaluated condition. Therefore, applying the correction for the menstrual cycle in these studies is even more crucial. We observed this in Tamaresis 2014, the only study in which menstrual cycle biomarkers were identified and where the endometriosis samples were mostly secretory and the control samples proliferative. However, this hypothesis needs further confirmation, as this study presented an unknown effect on gene expression. Single-cell studies are increasingly used to evaluate endometrial gene expression changes throughout the menstrual cycle, increasing our molecular understanding of endometrial function and receptivity acquisition (Wang ). Although further studies are needed, we could expect from our results that the menstrual cycle effect will also need to be corrected in single-cell studies aimed to identify biomarkers of uterine disorders on specific cell types and in which samples are collected at different endometrial stages. Although the correction method proposed in this study was previously applied to single-cell studies (Tran ), other methodologies have recently arisen to specifically correct known effects in this type of gene expression data (Haghverdi ; Tran ). In-depth registration and consideration of endometrial stage is needed in any transcriptomic study to optimise the detection of reliable biomarkers of uterine disorders. Here, we introduce a novel classification of endometrial transcriptomic biomarkers depending on the DEG aetiology and set new guidelines to accurately detect uterine disorder biomarkers through differential expression analysis with high reproducibility and statistical power. Using these methods, we unmasked new endometriosis and RIF potential biomarkers to improve diagnosis, prognosis and treatment. The application of these methods in future research on biomarker discovery of uterine disorders would further contribute to delineating their aetiology and progression, and ultimately leading towards improved treatments and increased pregnancy rates in these patients.

Data availability

The raw gene expression data and patient meta-data associated with the studies that were reprocessed and reanalysed in this work are freely available to download from GEO given their unique identifiers: GSE6364 (Giudice, 2007), GSE23339 (Creighton ), GSE51981 (Tamaresis ), GSE58144 (van Hooff ), GSE92324 (Pathare ), GSE65099 (Brosens ), GSE4888 (Talbi ), GSE29981 (Bradley ), GSE98386 (Suhorutshenko ), GSE86491 (Sigurgeirsson ) and GSE119209 (Kelleher ). Supplementary Tables SII and SIII of this article are available at the Figshare Repository, https://dx.doi.org/10.6084/m9.figshare.13643147

Supplementary data

Supplementary data are available at Molecular Human Reproduction online. Click here for additional data file.
  67 in total

1.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.

Authors:  Ron Edgar; Michael Domrachev; Alex E Lash
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

2.  Dating the endometrial biopsy.

Authors:  R W Noyes; A T Hertig; J Rock
Journal:  Am J Obstet Gynecol       Date:  1975-05       Impact factor: 8.661

3.  Adjusting batch effects in microarray expression data using empirical Bayes methods.

Authors:  W Evan Johnson; Cheng Li; Ariel Rabinovic
Journal:  Biostatistics       Date:  2006-04-21       Impact factor: 5.899

Review 4.  Evidence-based guidelines for the investigation and medical treatment of recurrent miscarriage.

Authors:  Eric Jauniaux; Roy G Farquharson; Ole B Christiansen; Niek Exalto
Journal:  Hum Reprod       Date:  2006-05-17       Impact factor: 6.918

Review 5.  Endometrial receptivity in eutopic endometrium in patients with endometriosis: it is not affected, and let me show you why.

Authors:  Jose Miravet-Valenciano; María Ruiz-Alonso; Eva Gómez; Juan A Garcia-Velasco
Journal:  Fertil Steril       Date:  2017-07       Impact factor: 7.329

6.  The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

Authors:  Gil Stelzer; Naomi Rosen; Inbar Plaschkes; Shahar Zimmerman; Michal Twik; Simon Fishilevich; Tsippi Iny Stein; Ron Nudel; Iris Lieder; Yaron Mazor; Sergey Kaplan; Dvir Dahary; David Warshawsky; Yaron Guan-Golan; Asher Kohn; Noa Rappaport; Marilyn Safran; Doron Lancet
Journal:  Curr Protoc Bioinformatics       Date:  2016-06-20

Review 7.  Endometriosis, recurrent miscarriage and implantation failure: is there an immunological link?

Authors:  C Tomassetti; C Meuleman; A Pexsters; A Mihalyi; C Kyama; P Simsa; T M D'Hooghe
Journal:  Reprod Biomed Online       Date:  2006-07       Impact factor: 3.828

8.  Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt.

Authors:  Steffen Durinck; Paul T Spellman; Ewan Birney; Wolfgang Huber
Journal:  Nat Protoc       Date:  2009-07-23       Impact factor: 13.491

9.  Uterine disorders affecting female fertility: what are the molecular functions altered in endometrium?

Authors:  Almudena Devesa-Peiro; Patricia Sebastian-Leon; Francisco Garcia-Garcia; Vicente Arnau; Alejandro Aleman; Antonio Pellicer; Patricia Diaz-Gimeno
Journal:  Fertil Steril       Date:  2020-06       Impact factor: 7.329

10.  Loss of Endometrial Plasticity in Recurrent Pregnancy Loss.

Authors:  Emma S Lucas; Nigel P Dyer; Keisuke Murakami; Yie Hou Lee; Yi-Wah Chan; Giulia Grimaldi; Joanne Muter; Paul J Brighton; Jonathan D Moore; Gnyaneshwari Patel; Jerry K Y Chan; Satoru Takeda; Eric W-F Lam; Siobhan Quenby; Sascha Ott; Jan J Brosens
Journal:  Stem Cells       Date:  2015-12-17       Impact factor: 6.277

View more
  3 in total

1.  The mid-secretory endometrial transcriptomic landscape in endometriosis: a meta-analysis.

Authors:  E Vargas; E García-Moreno; L Aghajanova; A Salumets; J A Horcajadas; F J Esteban; S Altmäe
Journal:  Hum Reprod Open       Date:  2022-04-04

2.  Transcriptomics of receptive endometrium in women with sonographic features of adenomyosis.

Authors:  Erika Prašnikar; Tanja Kunej; Mario Gorenjak; Uroš Potočnik; Borut Kovačič; Jure Knez
Journal:  Reprod Biol Endocrinol       Date:  2022-01-03       Impact factor: 5.211

3.  EndoTime: non-categorical timing estimates for luteal endometrium.

Authors:  Julia Lipecki; Andrew E Mitchell; Joanne Muter; Emma S Lucas; Komal Makwana; Katherine Fishwick; Joshua Odendaal; Amelia Hawkes; Pavle Vrljicak; Jan J Brosens; Sascha Ott
Journal:  Hum Reprod       Date:  2022-04-01       Impact factor: 6.918

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.