
A systematic review on the genetics of male infertility in the era of next-generation sequencing.

Amal Robay¹, Saleha Abbasi², Ammira Akil², Haitham El-Bardisi³, Mohamed Arafa³,⁴, Ronald G Crystal⁵, Khalid A Fakhro¹,².

Abstract

OBJECTIVES: To identify the role of next-generation sequencing (NGS) in male infertility. Advances in NGS technologies have contributed to the identification of novel genes responsible for a wide variety of human conditions, and NGS has recently been applied to male infertility, allowing new genetic factors to be discovered.
MATERIALS AND METHODS: PubMed was searched for combinations of the following terms: 'exome', 'genome', 'panel', 'sequencing', 'whole-exome sequencing', 'whole-genome sequencing', 'next-generation sequencing', 'azoospermia', 'oligospermia', 'asthenospermia', 'teratospermia', 'spermatogenesis', and 'male infertility', to identify studies in which NGS technologies were used to discover variants causing male infertility.
RESULTS: Altogether, 23 studies were found in which the primary mode of variant discovery was an NGS-based technology. These studies mostly focused on patients with quantitative sperm abnormalities (non-obstructive azoospermia and oligospermia), followed by morphological and motility defects. Combined, these studies uncovered variants in 28 genes causing male infertility discovered by NGS methods.
CONCLUSIONS: Male infertility is a condition that is genetically heterogeneous, and therefore remarkably amenable to study by NGS. Although some headway has been made, given the high incidence of this condition despite its detrimental effect on reproductive fitness, there is significant potential for further discoveries.


Keywords: Genome-wide association study; Male infertility; Next-generation sequencing

Year: 2018 PMID: 29713536 PMCID: PMC5922186 DOI: 10.1016/j.aju.2017.12.003

Source DB: PubMed Journal: Arab J Urol ISSN: 2090-598X

Overview of next-generation sequencing (NGS) technologies in the study of genetic disease

Genetic investigation of human populations has advanced remarkably in recent years, owing to the development and availability of NGS platforms. In contrast to the laborious process of single-gene mutation screening through exon-by-exon amplification and Sanger sequencing, NGS enables the interrogation of large panels of genes in a single experiment and at a reasonable cost [1], [2], [3]. NGS approaches can be broadly classified into two categories: targeted panels and whole-genome sequencing. Targeted methods (sometimes referred to as ‘panel sequencing’) investigate a group of genes (a ‘gene panel’), usually selected on the basis of known disease association or expanded to include genes within known disease pathways. Commercially produced custom-capture panels may be tailored to any number of genomic regions of interest. The most comprehensive panel approach is whole-exome sequencing, in which all coding regions are captured and sequenced. Typical whole-exome capture designs also include flanking regulatory regions, enabling assessment of mutations affecting conserved but non-coding genic elements, e.g. splice junctions and 3′ and 5′ untranslated region (UTR) sequences [4]. Beyond whole-exome sequencing, whole-genome sequencing is used to discover variants across the entire human genome. Besides covering non-coding and intergenic regions, whole-genome sequencing does not require target enrichment before sequencing; it is therefore possible with minimal sample preparation, and the sequenced fragments are distributed approximately evenly across all chromosomes. Because fragments are sampled at random, coverage is similar across most of the genome, so variants can be reliably called at an average genome depth as low as 20×.
This contrasts with panel-based approaches (e.g., whole-exome sequencing), in which target enrichment and PCR amplification may yield highly variable coverage across the exome, so that some exons are missed by chance. Although such poorly covered regions can be identified bioinformatically after sequencing, re-interrogating them manually is labour intensive. Another important advantage of whole-genome sequencing is the ability to detect structural variants, including copy number variants (CNVs), genome-wide [5], [6]. Given the number of human disorders related to structural variants, a single test that can assess both large and small genomic variation is sometimes preferable, and for these diseases the cost of whole-genome sequencing is justified because it is only slightly higher than the cost of running a microarray and whole-exome sequencing separately for the same individual.
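The statement that evenly distributed reads allow reliable calling at ∼20× can be illustrated with the classical Lander–Waterman idealisation, in which per-base depth follows a Poisson distribution whose mean is the average depth. The sketch below assumes that model; the function name and the 10-read ‘callable’ threshold are hypothetical choices. Real genome coverage is more dispersed than Poisson (e.g. in GC-rich or repetitive regions), so the figures it produces are an optimistic bound.

```python
import math


def fraction_covered(mean_depth: float, min_reads: int) -> float:
    """Expected fraction of bases covered by at least min_reads reads,
    assuming per-base depth ~ Poisson(mean_depth) (Lander-Waterman idealisation)."""
    term = math.exp(-mean_depth)  # P(depth == 0)
    below = 0.0                   # accumulates P(depth < min_reads)
    for i in range(min_reads):
        below += term
        term *= mean_depth / (i + 1)  # P(depth == i + 1) from P(depth == i)
    return 1.0 - below


# Hypothetical 10-read threshold for a confident call
for depth in (10, 20, 30):
    print(f"{depth}x: {fraction_covered(depth, min_reads=10):.4f}")
```

Under this model, a 20× mean already places over 99% of bases above a 10-read threshold, whereas the uneven coverage produced by capture and PCR is precisely what breaks the Poisson assumption for panel-based methods.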

Technical considerations for study design

Because a single sequencing lane may produce hundreds of millions of reads, target coverage, and by extension variant-calling quality, depends heavily on the total size of the regions being interrogated. The number of reads that covers a single genome to an average depth of 30× can cover a single exome (∼20 000 genes) to an average depth of >300×, so spending a whole lane on one exome represents a gross inefficiency in the use of sequencing reagents. This can be overcome with multiplexing strategies such as sample barcoding, in which several individuals’ exomes are sequenced in the same lane and each read is then assigned bioinformatically to its sample on the basis of a unique barcode. In this way, >5 exomes can be ‘multiplexed’ in a single lane, each read to an average depth of >60× with the reagents consumed in reading a single genome at 30× [4]. The gain grows several-fold as the size of the interrogated panel shrinks, e.g., >100 individuals can be investigated simultaneously for a panel of ∼200 genes in the same sequencing lane [4], [7], [8]. Thus, coverage requirements and cohort size are critical variables to consider when designing NGS experiments for human disease.
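The arithmetic behind multiplexing is that mean depth per sample is roughly the usable bases in a lane divided by (number of samples × target size). A minimal sketch follows; the genome, exome and panel target sizes, the 20% ‘usable’ fraction (on-target, non-duplicate bases) and the 60× minimum depth applied to the panel are hypothetical round numbers chosen for illustration, not values from the cited studies.

```python
GENOME_BP = 3_100_000_000  # approximate human genome size (assumption)
EXOME_BP = 50_000_000      # hypothetical exome capture target
PANEL_BP = 1_000_000       # hypothetical ~200-gene panel target
USABLE = 0.2               # hypothetical on-target, non-duplicate fraction of capture-library bases


def mean_depth(lane_bases: float, target_bp: float, n_samples: int = 1,
               usable_fraction: float = 1.0) -> float:
    """Average depth per sample when a lane's output is split evenly
    across n_samples barcoded libraries that share the same target."""
    return lane_bases * usable_fraction / (n_samples * target_bp)


def max_multiplex(lane_bases: float, target_bp: float, min_depth: float,
                  usable_fraction: float = 1.0) -> int:
    """Largest number of barcoded samples that each still reach min_depth."""
    return int((lane_bases * usable_fraction) // (min_depth * target_bp))


# Define a lane as the output that covers one genome at 30x (no capture losses).
lane = 30 * GENOME_BP
print("one exome:", mean_depth(lane, EXOME_BP, usable_fraction=USABLE))
print("exomes at >=60x:", max_multiplex(lane, EXOME_BP, 60, USABLE))
print("panels at >=60x:", max_multiplex(lane, PANEL_BP, 60, USABLE))
```

Because depth scales inversely with both target size and sample count, shrinking the target from an exome to a small panel multiplies the number of samples a lane can carry by the same factor.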

NGS bioinformatics and data interpretation

A critical consideration in NGS is that the instruments generate massive amounts of data, requiring sophisticated computational infrastructure and tools (bioinformatics) for processing and analysis. Bioinformatics for genome sequencing is a relatively young field, mostly the product of work over the last decade, with algorithms and strategies adapting to rapid innovations in sequencing technologies. Regardless of the sequencing platform, all bioinformatics pipelines share three major steps: read mapping, variant calling, and variant interpretation. In simple terms, read mapping is the process by which the short reads produced by the instrument are aligned to a reference human genome using standard alignment methods. After mapping, bases that differ from the reference are identified (called) as variants. Once variants are called, their putative effects can be interpreted based on the genomic regions they affect and their likely contribution to disease. Variants are broadly organised into three classes: single nucleotide variants (SNVs, previously referred to as single nucleotide polymorphisms), multi-nucleotide variants, and structural variants. The quality and zygosity of each variant are assigned based on a number of statistical considerations, including the depth of sequencing, per-base quality, the number of times each variant base is observed, and the likelihood that the change is biologically real rather than a sequencing artefact [9]. Because these steps are statistical, alignment and variant calling may themselves introduce errors into the experiment, e.g. for fragments coming from highly repetitive genomic segments [10]. This is well recognised in the field, and software development has become an area of intense exploration, validation, and quality guidelines [11], [12].
Whilst some of these errors may be mitigated by long-read technologies or greater coverage depth, these solutions remain expensive and impractical when studying large cohorts. Perhaps the most challenging aspect of NGS bioinformatics is variant interpretation. It is at this step that the effect of each discovered variant is predicted and its putative contribution to disease extrapolated. Variant interpretation depends not only on a well-annotated genome, in which gene and amino acid positions are well established, but also on sequence data from large numbers of control individuals, which allow candidate disease variants (often rare) to be distinguished from population-specific polymorphisms (which may appear rare if too few population-matched controls have been assessed). As more populations are sequenced around the world, these databases are expected to grow and become more robust for variant interpretation [13], [14].
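The variant classes and the depth-based zygosity assignment described above can be illustrated with a toy sketch. It is not how production callers work (they compute probabilistic genotype likelihoods from base and mapping qualities). The 50-bp structural-variant cut-off is a common convention, the allele-fraction thresholds and minimum depth are hypothetical values chosen for readability, and grouping small insertions/deletions with multi-nucleotide variants is also an assumption.

```python
from dataclasses import dataclass

SV_MIN_BP = 50  # assumed size cut-off separating structural variants from small variants


@dataclass
class Call:
    ref: str        # reference allele
    alt: str        # alternate allele
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternate allele


def variant_class(ref: str, alt: str) -> str:
    """Assign one of the three broad classes named in the text."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if abs(len(ref) - len(alt)) >= SV_MIN_BP:
        return "structural"
    return "multi-nucleotide"  # MNVs and, by assumption, small indels


def zygosity(call: Call, min_depth: int = 10, het_min: float = 0.2,
             hom_min: float = 0.8) -> str:
    """Naive genotype from the alternate-allele fraction (illustrative thresholds)."""
    if call.depth < min_depth:
        return "no-call"  # too few reads to judge
    fraction = call.alt_reads / call.depth
    if fraction < het_min:
        return "reference"  # treated as likely sequencing noise
    if fraction < hom_min:
        return "heterozygous"
    return "homozygous-alternate"


for c in (Call("A", "G", 30, 14), Call("AC", "GT", 40, 39), Call("A", "G", 5, 5)):
    print(variant_class(c.ref, c.alt), zygosity(c))
```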

NGS at the point-of-care

Nevertheless, as NGS technologies enter clinical care settings, there is a growing need to establish clinical-grade ‘gold-standard’ analysis ‘pipelines’, analogous to the College of American Pathologists (CAP) accreditation or Clinical Laboratory Improvement Amendments (CLIA) certification given to diagnostic laboratories [11], [12], [15]. The role of these pipelines is to convert a bio-specimen’s chemical signals reproducibly into interpretable data, and then to extract from these data actionable information that improves patient health or changes the course of disease management [16]. Implementing pipelines that robustly cover these steps is clearly a non-trivial task. Such pipelines must account for several influences on quality, including sample preparation with different protocols, sequencing on different platforms, ensuring that all genes in a panel are adequately covered, errors arising from the sequencing chemistry itself, and statistical errors from the alignment and variant-calling steps. There is therefore a fundamental need for standard operating procedures for clinical NGS that guarantee reproducibility, transparency and standardisation, thereby ensuring precise interpretation in clinical settings. This task grows in complexity with the number of samples studied, and interpretation can also vary dramatically depending on the population analysed and the databases from which annotations are drawn [17]. A key consideration in NGS analysis is the large number of variant sites found per individual (3–4 million per genome). Amongst these are hundreds or thousands of variants of unknown significance, whose relevance to health and disease is unknown and which therefore can be neither ruled in nor ruled out [12].
In many cases, these variants can be further stratified by whether they are shared with close family members, which argues for recruiting parents and siblings at the point of care. The presence of a variant in unaffected family members may help eliminate it from further consideration; however, the converse is not true, leaving many seemingly private variants of unknown function. Robust clinical platforms should handle such variants accordingly, bearing in mind that some may acquire meaning in the future, may therefore be relevant to the subject’s health, and should not be discarded. Because the field is constantly making discoveries, with >200 new genes and thousands of variants linked to disease in humans each year and many more in model organisms [18], [19], keeping annotation databases up to date is a critical challenge. This has led to the strategy of sequencing once and interrogating often, based on the premise that a patient’s genome does not change over time and can be reassessed periodically for causal variants as annotations improve. Given this potential longevity of the data, the strategy favours whole-genome or whole-exome sequencing at the point of care over targeted panels. Such considerations need to be taken into account when designing clinical NGS pipelines, to ensure that genetic testing of patients is accurate, reproducible, and safe.
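The family-based stratification described above can be sketched as a simple filter. This is a hypothetical illustration that assumes full penetrance and a chosen inheritance model; for a male-limited trait such as infertility, the ‘unaffected’ relatives are assumed to be fertile males, since unaffected female relatives carry no information about a male-only phenotype. The function name and genotype encoding are illustrative. Variants that survive are only candidates: as noted above, absence from unaffected relatives does not establish causality.

```python
from typing import Dict, Iterable, List

# Genotype = number of alternate alleles at a site (0, 1 or 2).
Genotypes = Dict[str, int]  # variant ID -> alternate-allele count


def filter_by_unaffected(proband: Genotypes, unaffected: Iterable[Genotypes],
                         model: str = "recessive") -> List[str]:
    """Return proband variants not excluded by any unaffected relative.

    recessive: keep variants homozygous in the proband; drop them if any
               unaffected relative is also homozygous.
    dominant:  keep variants carried by the proband; drop them if any
               unaffected relative carries them at all.
    Assumes full penetrance. Survivors are candidates, not proven causes."""
    if model not in ("recessive", "dominant"):
        raise ValueError("model must be 'recessive' or 'dominant'")
    needed = 2 if model == "recessive" else 1
    relatives = list(unaffected)
    kept = []
    for variant, count in proband.items():
        if count < needed:
            continue  # the proband's own genotype does not fit the model
        if any(rel.get(variant, 0) >= needed for rel in relatives):
            continue  # an unaffected relative has the same qualifying genotype
        kept.append(variant)
    return kept


# Hypothetical genotypes for an affected man, his father and a fertile brother.
proband = {"v1": 2, "v2": 2, "v3": 1}
father = {"v1": 1, "v2": 1, "v3": 1}
brother = {"v1": 2, "v2": 1}
print(filter_by_unaffected(proband, [father, brother]))
```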

Successful application of NGS to male infertility

Infertility is defined as the inability to conceive after 1 year of continuous unprotected sexual relations [20]. It affects ∼15% of couples, and male factors contribute to ∼50% of cases, either alone or combined with female factors [21], [22]. Male infertility has many causes, including genetic disorders (e.g., chromosomal anomalies or gene defects), hormonal causes, genital infection or trauma, varicocele, chemical or physical agents affecting spermatogenesis, and genital duct obstruction. Genetic anomalies have been reported in 2.2–10.8% of cases of male infertility, with higher rates in severe quantitative defects (azoospermia and severe oligozoospermia) [22]. However, in 30–40% of cases of male infertility no cause can be identified, and these cases are labelled ‘idiopathic’ [23]. In these cases genetic abnormalities are still strongly suspected, although the genes involved remain unknown. The management of male infertility includes a complete medical history and clinical examination, followed by laboratory investigations tailored to each case. Semen analysis is the cornerstone of male infertility diagnosis. It may be followed by hormonal assays, radiological investigation and genetic studies, especially in severe cases. Commonly used genetic tests include karyotyping to detect numerical or structural chromosomal abnormalities, and PCR to detect known genetic anomalies such as Y-chromosome microdeletions, anosmin (Kallmann syndrome) gene defects or cystic fibrosis transmembrane conductance regulator (CFTR) variants. Over the last few decades, with advances in in vitro fertilisation (IVF) and the introduction of intracytoplasmic sperm injection (ICSI), men with severe infertility, with few sperm in the semen or even azoospermia with focal intratesticular spermatogenesis, can father their own children.
This has highlighted the need for proper genetic diagnosis to avoid vertical transmission of genetic abnormalities or the production of more unstable genetic defects in offspring, a need that NGS could meet in male infertility cohorts.

Hallmarks of disease suitability for NGS

NGS has had spectacular success in many diseases, most notably Mendelian or rare diseases in which carrying causative variants significantly reduces reproductive fitness. In such cases, causative variants are rare and highly penetrant, allowing interpretation pipelines to discard the vast majority of NGS variants, namely those also present in control individuals or at a frequency exceeding the disease prevalence in the general population. Because these variants are rare, affected members of the same family are very likely to share the same genetic cause, usually a founder mutation that arose in a recent common ancestor and has been maintained at low frequency in that specific family. However, one of the difficulties in NGS analysis is penetrance: diseases in which genetic variants are not completely penetrant make data interpretation a daunting task. Similarly, situations where controls are phenotypic but not genetic controls (e.g., asymptomatic carriers or soon-to-be-symptomatic carriers of late-onset diseases) require substantial complementary analysis (statistical and functional studies) to support the discovery of key causative genetic variants.
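The frequency-based filtering described above can be made explicit under Hardy–Weinberg assumptions. If a single causal allele explained a share h of cases of a disease with prevalence P and penetrance f, its population frequency q could not exceed roughly √(P·h/f) under a recessive model, or P·h/(2f) under a dominant one (carrier frequency ≈ 2q when q is small). The sketch below encodes this bound; the function names and the example prevalence and heterogeneity values are hypothetical, and for male infertility P is assumed to be the prevalence among men.

```python
import math


def max_credible_af(prevalence: float, model: str, heterogeneity: float = 1.0,
                    penetrance: float = 1.0) -> float:
    """Upper bound on the population frequency of one causal allele.

    prevalence:    fraction of individuals (here, men) who are affected
    heterogeneity: largest share of cases any single allele is assumed to explain
    penetrance:    probability that someone with the causal genotype is affected
    Assumes Hardy-Weinberg equilibrium and a single causal allele."""
    attributable = prevalence * heterogeneity / penetrance
    if model == "recessive":
        return math.sqrt(attributable)   # affected ~ q^2 * penetrance
    if model == "dominant":
        return attributable / 2          # affected ~ 2q * penetrance for small q
    raise ValueError("model must be 'recessive' or 'dominant'")


def is_rare_enough(population_af: float, bound: float) -> bool:
    """Keep a candidate only if its observed control frequency is within the bound."""
    return population_af <= bound


# Hypothetical inputs: 1% of men affected, no single allele explaining >5% of cases.
bound = max_credible_af(0.01, "recessive", heterogeneity=0.05)
print(round(bound, 4), is_rare_enough(0.003, bound), is_rare_enough(0.05, bound))
```

Lowering the assumed penetrance raises the bound, which is one way to see why incompletely penetrant variants are harder to separate from background polymorphisms.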

Suitability of male infertility for NGS

Male infertility is by definition a disease that significantly affects reproductive fitness, thereby keeping causative variants at low frequency in the population. However, one important difference between these variants and those that cause other rare, severe disorders is that they may be carried and passed down by females, so their frequency may be higher than usually anticipated for rare diseases. Additionally, advances in IVF may lead to successful transmission of disease-causing variants if they happen to be carried in the sperm used for fertilisation. Another substantial challenge is identifying suitable controls for research studies. Without detailed semen analysis, fertile men (with a history of fathering at least one child) should be used with caution as controls for the different types of male infertility, with the exception of azoospermia, because one cannot be sure that a proven father does not also have defects in sperm motility, morphology or count.

Present systematic review

The present literature review aimed to identify the role of NGS in male infertility and to list the new genes identified as causative factors of male infertility. To cover all recent reports in which NGS was used to identify variants causing male infertility, we searched PubMed (https://www.ncbi.nlm.nih.gov/pubmed) for combinations of the following terms: ‘exome’, ‘genome’, ‘panel’, ‘gene sequencing’, ‘whole-exome sequencing’, ‘whole-genome sequencing’, ‘next-generation sequencing’, ‘azoospermia’, ‘oligospermia’, ‘spermatogenic failure’, ‘asthenospermia’, ‘teratospermia’, ‘spermatogenesis’, and ‘male infertility’ (Fig. 1 [24]). We restricted the search to papers published after 2010 and to studies in humans. Screening was performed by a team of three individuals such that each paper was inspected by at least two separate individuals to determine its suitability for inclusion.

Fig. 1

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flowchart of review methodology. PubMed was searched for all articles containing NGS studies of infertility as described in the methods, limited to the organism [Homo sapiens] and to studies published after 2010. This search strategy found 251 unique papers, each evaluated by at least two scientists to determine those that were germane. Studies were eliminated if they did not use NGS technologies, concerned female infertility, reported results from animal models, or evaluated male patients with known developmental syndromes (e.g., PCD, Kallmann syndrome [24]). Studies evaluating genetics by GWAS or by single-locus interrogation (e.g., Sanger, TaqMan) were also eliminated, as were studies using other omics technologies. Altogether, 23 unique papers met the inclusion criteria; all genes discovered in these studies appear in Table 1.

Table 1

Genetic variants discovered in infertile men by NGS technologies.

Infertility classification¹	Gene identified	Reported alleles²	Study method³	Cohort⁴	Cohort size⁵	Number assessed⁶	Number with variant(s)⁷	Reference
Quantitative	ADGRG2	[c.A2968G (p.K990E)], [c.G1709A (p.C570Y)]	WES	Sporadic	18 cases	18	1	[35]
	CFTR	c.350G > A (p.Arg117His)	Panel	Sporadic	1112	1112	1	[36]
	DNAH6	c.C10413A (p.H3471Q)	WES	Familial	1 family	5	2	[31]
	DNMT3L	dup21q22.3, del21q22.3⁸	WGS	Sporadic	33 cases	33	2	[37]
	HLA-DQA1, HLA-DRB1	dup6p21.32⁸	WGS	Sporadic	33 cases	33	1	[37]
	MAGEB4	c.1041A > T (p.*347Cysext*24)	WES	Familial	1 family	3	2	[32]
	MEIOB	c.A191T (p.N64I)	WES	Familial	1 family	4	4	[31]
	NPAS2	chr2: 101592000C > G (p.P455A)	Panel	Familial	2 families	6	3	[30]
	SIRPA	[c.*273G > T, c.697G > A (p.Val233Ile)]	Panel	Sporadic	1800	1376	29	[25], [26]
	SIRPG	c.*223 T > G (3′UTR)	Panel	Sporadic	1184	1184	2	[25]
	SPINK2	c.56-3C > G (splice)	WES	Familial	1 family	2	2	[34]
	SYCE1	c.197-2 A > G (splice)	WES	Familial	1 family	6	2	[28]
	SYCP3	c.524_527del (p.Ile175Asnfs*8)	Panel	Sporadic	1112	1112	1	[36]
	TAF4B	c.1831C > T (p.R611X)	WES	Familial	2 families	2	1	[27]
	TDRD9	c.720_723 del TAGT (p.Ser241Profs*4)	Panel	Familial	2 families	17	5	[33]
	TEX14	c.2668-2678del (Early stop codon)	WGS	Familial	1 family	2	2	[31]
	TEX15	c.2130 T > G (p.Y710*)	WES	Familial	2 families	10	4	[29]
	ZMYND15	c.1520_1523delAACA (p.Lys507Serfs*3)	WES	Familial	2 families	2	1	[27]

Morphological	BRDT	c.G2783A (p.G928D)	WES	Familial	1 family	1	1	[40]
	CEP135	c.A1364T (p.D455V)	WES	Familial	1 family	1	1	[39]
	DNAH1	[c.6253_6254del, c.11726_11727del (p.R2085fs, p.P3909fs)], [c.7377 + 1G > C], [c.A3836G, c.11726_11727del (p.K1279R, p.P3909fs)], [c.C12397T, c.11726_11727del (p.R4133C, p.P3909fs)], [c.5766-2A > G, c.G10630T (p.E3544X)], [c.C4115T, c.11726_11727del (p.T1372M, p.P3909fs)], [c.C6822G, c.G9850A (p.D2274E, p.E3284K)], [c.C7066T, c.11726_11727del (p.R2356W, p.P3909fs)], [c.C7066T, c.11726_11727del (p.R2356W, p.P3909fs)], [c.G2610A, c.G12287T (p.W870X, p.R4096L)], [c.G3108A, c.G5864A (p.W1036X, p.W1955X)], [c.T6212G, c.12200_12202del (p.L2071R, p.4067_4068del)]	WES	Sporadic	21 cases	21	12	[42]
	NPHP4	c.2044C > T (p.R682*)	WES	Familial	1 family	2	2	[38]
	SUN5	[c.381delA (p.Val128Serfs*7)], [c.824C > T (p.Thr275Met)], [c.381delA (p.Val128Serfs*7)], [c.781G > A (p.Val261Met)], [c.216G > A (p.Trp72*)], [c.1043A > T (p.Asn348Ile)], [c.425+1G > A/c.1043A > T (p.Asn348Ile)], [c.851C > G (p.Ser284*)], [c.340G > A (p.Gly114Arg)], [c.824C > T (p.Thr275Met)], [c.1066C > T (p.Arg356Cys)], [c.485 T > A (p.Met162Lys)]	Panel	Sporadic	15 cases	15	6	[41]

Motility	CFAP43	[c.2802 T > A (p.Cys934)], [c.4132C > T (p.Arg1378)], [c.253C > T (p.Arg85Trp)], [c.3945_4431del (p.Ile1316Leufs*10)], [c.386C > A (p.Ser129Tyr)]	WES	Sporadic	30 cases	30	3	[46]
	CFAP44	c.2005_2006delAT (p.Met669Valfs*13)	WES	Sporadic	30 cases	30	1	[46]
	CFAP65	c.5341G > T (p.Glu1781*)	WES	Sporadic	30 cases	30	1	[46]
	DNAH1	[c.8626-1G > A (splice)], [c.11726_11727delCT (p.Pro3909ArgfsTer33)], [c.8626-1G > A (splice)]	Panel	Sporadic	6 families and 38 cases	59	10	[43], [44]
	SPAG17	c.G4343A (p.R1448Q)	Panel	Familial	2 families	7	2	[45]

¹ Quantitative anomalies include azoospermia and oligospermia; morphological anomalies include teratozoospermia, macrozoospermia, globozoospermia and acephalic spermatozoa syndrome; motility anomalies include asthenospermia and flagellar abnormalities impairing movement.

² For each gene, reported alleles are given in Human Genome Variation Society (HGVS) format [47] (for each allele, the putative effect on the complementary DNA and protein is included). Where more than one allele was observed, each individual’s alleles are grouped in square brackets ‘[]’.

³ Method of variant discovery by NGS: whole-genome sequencing (WGS), whole-exome sequencing (WES), or panel-based sequencing of a pre-selected group of genes (Panel).

⁴ Cohort type studied: family-based sequencing (Familial), or cases or case–control design (Sporadic).

⁵ Number of individuals recruited to the study.

⁶ Number of individuals assessed.

⁷ Number in whom the reported variant(s) was found.

⁸ CNV allele discovered by NGS.

The PubMed search returned 669 articles; after removal of 418 duplicates, 251 unique articles remained. We then screened the titles and abstracts of all articles to eliminate those not applicable to the present study. These included papers focused on: (i) female infertility; (ii) multi-organ syndromes of which infertility is a part (e.g., Kallmann syndrome, primary ciliary dyskinesia [PCD]); (iii) animal models; (iv) non-genetic characterisation of sperm or semen samples; (v) non-genetic studies using next-generation platforms (e.g., epigenetics, transcriptomics); and (vi) genetic investigation by methods other than NGS (e.g., Sanger sequencing, TaqMan, microarray, multiplex ligation-dependent probe amplification [MLPA]). In total, 220 citations were excluded after abstract screening, leaving 31 papers, which were retrieved for full-text review. Eight papers were then excluded because the full text did not include data on relevant indicators. Finally, 23 eligible reports, all original articles, were included in the present systematic review. All genes discovered in these studies are listed in Table 1 [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47].
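The screening counts reported above can be checked with simple bookkeeping. The sketch below only restates the numbers given in the text (the function name is illustrative) and confirms that each stage follows arithmetically from the previous one.

```python
def prisma_flow(identified: int, duplicates: int, excluded_on_abstract: int,
                excluded_on_full_text: int) -> dict:
    """Derive the number of records remaining at each screening stage."""
    unique = identified - duplicates
    full_text = unique - excluded_on_abstract
    included = full_text - excluded_on_full_text
    return {"unique": unique, "full_text": full_text, "included": included}


# Counts as reported in the text
flow = prisma_flow(identified=669, duplicates=418,
                   excluded_on_abstract=220, excluded_on_full_text=8)
print(flow)
```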

Advances in male infertility due to NGS by subtype

Quantitative anomalies (azoospermia and oligospermia)

Perhaps the most studied male infertility subtypes using NGS are the quantitative abnormalities: non-obstructive azoospermia (NOA) and oligospermia. The earliest of these studies, in 2013 [25], used NGS to refine a genome-wide association study (GWAS) signal the authors had previously discovered. In this study, five genes were interrogated around peak association signals on chromosomes 12 [peroxisomal biogenesis factor 10 (PEX10), protein arginine methyltransferase 6 (PRMT6) and SRY-box 5 (SOX5)] and 20 [signal regulatory protein α (SIRPA) and signal regulatory protein γ (SIRPG)]. Using custom capture followed by sequencing on Illumina’s first-generation Solexa platform in 96 NOA subjects and 96 healthy controls, the authors identified six variants in three genes (SIRPA, SIRPG and SOX5) whose frequencies differed between cases and controls [25]. To determine which of these could be causal, the authors then screened only these six single nucleotide polymorphisms (SNPs) in an additional 520 NOA subjects and 477 controls. Only two SNVs replicated: a protective variant in SIRPA (rs199733185) and a variant in SIRPG (rs1048055) that increases the risk of NOA [25]. In a separate study, Xu et al. [26] also found an association with an SNV in SIRPA (rs3197744) by targeted panel sequencing of cases and controls, supporting the putative role of this gene in male infertility. Several other studies have since used NGS to assess individuals with NOA. First, Ayhan et al. [27] investigated two unrelated consanguineous families with spermatogenic failure, the first with three azoospermic brothers and one oligospermic brother, and the second with three azoospermic brothers. The authors used a hybrid approach, SNP genotyping followed by whole-exome sequencing, which allowed them to focus selectively on runs of homozygosity to identify the causative variant [27].
This search led to the identification of a different gene in each family, TATA-box binding protein associated factor 4b (TAF4B) and zinc finger MYND-type containing 15 (ZMYND15), each harbouring a recessive deleterious truncating mutation shared by all affected brothers within the family [27]. Notably, the oligospermic brother carried the same recessive variant, suggesting some variable penetrance and supporting the genetic grouping of quantitative abnormalities into a single category. Maor-Sagie et al. [28] used whole-exome sequencing in a single patient with NOA to find a candidate homozygous splice-site mutation in synaptonemal complex central element protein 1 (SYCE1), which was subsequently found to segregate with the disease in the family: one affected brother carried the same homozygous mutation, whereas it was absent from the fertile siblings and present in the heterozygous state in the consanguineous carrier parents. Okutman et al. [29] discovered a recessive mutation in testis expressed 15, meiosis and synapsis associated (TEX15) segregating with NOA in three affected siblings in a Turkish family, absent from the fertile brother and parents. Ramasamy et al. [30] discovered neuronal PAS domain protein 2 (NPAS2) mutations in three siblings with azoospermia in another consanguineous family from Turkey. Finally, Gershoni et al. [31] used a combination of whole-exome and whole-genome sequencing in different families to discover mutations in the genes meiosis specific with OB domains (MEIOB), testis expressed 14, intercellular bridge forming factor (TEX14) and dynein axonemal heavy chain 6 (DNAH6) [31]. In all cases, the mutations segregated with the affected members within each family and were rare in control databases, making them prime candidates for causing disease [31].
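The homozygosity-mapping strategy used by Ayhan et al. [27], in which genotyping is used to find runs of homozygosity (expected around a recessive founder mutation in consanguineous families) and exome variants are then prioritised within them, can be sketched as follows. This is a simplified illustration, not the authors’ pipeline: the function names, marker spacing and thresholds are hypothetical, and dedicated tools additionally tolerate occasional genotyping errors inside a run.

```python
from typing import List, Tuple


def runs_of_homozygosity(markers: List[Tuple[int, int]], min_markers: int = 50,
                         min_length_bp: int = 1_000_000) -> List[Tuple[int, int]]:
    """Find (start, end) runs of consecutive homozygous markers on one chromosome.

    markers: (position, alternate-allele count) pairs sorted by position;
    counts 0 and 2 are homozygous, 1 is heterozygous and breaks a run."""
    runs: List[Tuple[int, int]] = []
    current: List[int] = []

    def keep_if_long(run: List[int]) -> None:
        if run and len(run) >= min_markers and run[-1] - run[0] >= min_length_bp:
            runs.append((run[0], run[-1]))

    for position, count in markers:
        if count == 1:       # heterozygous call ends the current run
            keep_if_long(current)
            current = []
        else:                # homozygous call (0 or 2) extends it
            current.append(position)
    keep_if_long(current)    # close a run that reaches the end of the chromosome
    return runs


def candidates_in_runs(variant_positions: List[int],
                       runs: List[Tuple[int, int]]) -> List[int]:
    """Keep candidate homozygous variants lying inside any run of homozygosity."""
    return [p for p in variant_positions if any(s <= p <= e for s, e in runs)]


# Hypothetical chromosome: markers every 50 kb, homozygous from marker 10 to 59.
markers = [(i * 50_000, 2 if 10 <= i < 60 else 1) for i in range(100)]
runs = runs_of_homozygosity(markers)
print(runs, candidates_in_runs([100_000, 1_200_000, 4_000_000], runs))
```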
More recently, five studies published in 2017 used NGS in patients with NOA or oligospermia to uncover additional genes causing quantitative sperm defects and male infertility. Four of these focused on multiplex consanguineous families, establishing segregation of recessive mutations in serine peptidase inhibitor, Kazal type 2 (SPINK2), MAGE family member B4 (MAGEB4), Tudor domain containing 9 (TDRD9) and adhesion G protein-coupled receptor G2 (ADGRG2) with NOA in affected siblings, but in none of the healthy males in the family [32], [33], [34], [35]. The fifth study devised a novel experimental approach to assess both SNVs and copy number changes in 107 genes associated with male infertility in the literature [36]. Using single-molecule molecular inversion probes targeting 4525 genomic regions on 21 chromosomes, the investigators rapidly screened these genes for mutations in 1138 azoospermic or oligospermic subjects [36]. Although the authors found six infertile men with chromosomal anomalies and five with azoospermia factor (AZF)-region deletions, point mutations were found in only six additional subjects, five with CFTR mutations and one with a mutation in synaptonemal complex protein 3 (SYCP3), further reinforcing the notion that male infertility is extremely genetically heterogeneous [36]. Nevertheless, the authors comment that the sensitivity of their assay (e.g., detecting chromosomal abnormalities in patients who had already been screened by microarrays) and the cost-effectiveness of such a scalable platform make it ideal for introduction into clinical settings [36]. Extending the utility of NGS to the detection of structural variation, another study assessed 33 patients with spermatogenic failure and unexplained azoospermia for CNVs by whole-genome sequencing [37]; 42 CNVs, ranging in size from 40 kb to 2.38 Mb, were detected in 27 of these patients.
Although these CNVs were distributed across multiple chromosomes, and some overlapped common CNVs listed in the Database of Genomic Variants, three loci were absent from that database and shared by more than one azoospermic subject: 21q22.3, 6p21.32 and 13q11, each shared by two individuals [37]. Only the first two of these were genic, affecting the DNA methyltransferase 3 like (DNMT3L) gene and the major histocompatibility complex, class II, DR β1 (HLA-DRB1) and DQ α1 (HLA-DQA1) genes, respectively [37]. Although HLA class II genes have been generally implicated in infertility [48], these two genes had not previously been linked to it. By contrast, the evidence supporting DNMT3L involvement is stronger, as its role in spermatogenesis and spermatogenic impairment has been shown previously [49]. Altogether, 19 genes have been implicated in causing quantitative defects in spermatogenesis by NGS technologies (Table 1).
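The filtering logic used to highlight the 21q22.3, 6p21.32 and 13q11 loci (CNVs absent from a control catalogue and recurrent in more than one patient) can be sketched with simple interval overlaps. The coordinates below are made up, the control list merely stands in for a catalogue such as the Database of Genomic Variants, and counting any overlap as a match is a simplifying assumption; published analyses often require a minimum reciprocal overlap.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), end exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def recurrent_novel_cnvs(patient_cnvs: Dict[str, List[Interval]],
                         control_cnvs: List[Interval], min_patients: int = 2):
    """Return (patient, cnv, carriers) for CNVs absent from the control catalogue
    that overlap a novel CNV in at least min_patients patients (carrier included)."""
    novel = [(pid, cnv) for pid, cnvs in patient_cnvs.items() for cnv in cnvs
             if not any(overlaps(cnv, ctrl) for ctrl in control_cnvs)]
    hits = []
    for pid, cnv in novel:
        carriers = sorted({other for other, o in novel if overlaps(cnv, o)})
        if len(carriers) >= min_patients:
            hits.append((pid, cnv, carriers))
    return hits


# Made-up example: P1 and P2 share a novel chr21 CNV; P3's CNV is a known control CNV.
patients = {
    "P1": [("chr21", 100, 200)],
    "P2": [("chr21", 150, 260)],
    "P3": [("chr6", 10, 50)],
    "P4": [("chr6", 1000, 1100)],
}
controls = [("chr6", 0, 60)]
for hit in recurrent_novel_cnvs(patients, controls):
    print(hit)
```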

Morphological anomalies (teratozoospermia, macrozoospermia, globozoospermia and acephalic spermatozoa syndrome)

Morphological anomalies impairing fertility take different forms, affecting the head, neck or tail of the sperm. Tail defects usually also impair motility (see next section); the anomalies covered here can be further subdivided into macrozoospermia, globozoospermia, acephalic spermatozoa syndrome and dysplasia of the sperm fibrous sheath (DFS). In the era of NGS, only five studies in which such affected subjects were sequenced have been published to date. In the first of these, Alazami et al. [38] used whole-exome sequencing in a family with asthenozoospermia and identified a nonsense mutation in nephrocystin 4 (NPHP4). In another study, Sha et al. [39] sequenced a patient with flagellar abnormalities and discovered a recessive deleterious mutation in centrosomal protein 135 (CEP135), a protein required for centriole biogenesis; the mutant protein formed aggregates in the centrosome and flagella, which appeared to underlie the infertility. In a separate study, Li et al. [40] discovered a mutation in bromodomain testis associated (BRDT) in a consanguineous patient with acephalic spermatozoa. This homozygous mutation, which alters a highly conserved residue in the BRDT protein, is unusual in that functional studies showed it to be a recessive gain-of-function mutation [40]. Presumably, the gain of function conferred by a single mutant allele, as carried by the patient's fertile father and brother, is not sufficient to impair fertility. In the largest study of acephalic spermatozoa syndrome, Zhu et al. [41] used whole-exome sequencing in two unrelated infertile men and uncovered protein-altering recessive mutations in Sad1 and UNC84 domain containing 5 (SUN5): one man was homozygous and the other compound heterozygous. This prompted Sanger sequencing of a further 15 patients, six of whom also carried recessive mutations in this gene [41]. Finally, in a study of 21 patients with DFS, Sha et al. [42] identified 17 unique DNAH1 mutations in 12 cases, one in the homozygous state and 16 in the compound heterozygous state. The mutations segregated with infertility, being found in the cases but not in unaffected family members or in a cohort of 50 ethnically matched fertile men. Functional investigations in a subset of patients showed diminished DNAH1 levels and disorganised 9+2 microtubule arrangements [42]. Altogether, these five studies demonstrate the power of NGS in detecting causative variants in morphological sperm abnormalities.
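Several of these studies distinguish homozygous from compound heterozygous patients. When both parents are genotyped, two heterozygous variants in the same gene can be placed in trans (one on each parental chromosome) if each is inherited from a different parent. The Python sketch below encodes that simple rule; it is a hypothetical illustration rather than the method of [41] or [42], genotypes are assumed to be coded as alt-allele counts (0, 1 or 2), and variants for which both parents are heterozygous, or neither carries the variant, are left unphased.

```python
def classify_biallelic(proband, mother, father):
    """Classify a proband's variants in one gene.

    Each argument maps a (hypothetical) variant ID to an alt-allele count: 0, 1 or 2.
    Returns a (label, variant_ids) tuple.
    """
    # Two alt alleles at one site: homozygous, biallelic by definition.
    hom = [v for v, gt in proband.items() if gt == 2]
    if hom:
        return "homozygous", hom

    het = [v for v, gt in proband.items() if gt == 1]
    # A heterozygous variant is assigned to a parent only if that parent alone carries it.
    maternal = [v for v in het if mother.get(v, 0) >= 1 and father.get(v, 0) == 0]
    paternal = [v for v in het if father.get(v, 0) >= 1 and mother.get(v, 0) == 0]
    if maternal and paternal:
        # One variant from each parent: the two are in trans.
        return "compound heterozygous", [maternal[0], paternal[0]]

    # A single heterozygous variant, variants in cis, or phase not resolvable from the parents.
    return "not established", het


# Hypothetical trio: v1 inherited from the mother, v2 from the father.
print(classify_biallelic({"v1": 1, "v2": 1}, {"v1": 1, "v2": 0}, {"v1": 0, "v2": 1}))
# Both variants inherited from the mother: in cis, so not biallelic.
print(classify_biallelic({"v1": 1, "v2": 1}, {"v1": 1, "v2": 1}, {"v1": 0, "v2": 0}))
```

The rule is deliberately conservative: without informative parental genotypes it does not guess the phase.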

Motility anomalies (asthenospermia and flagellar abnormalities impairing movement)

Investigation of motility anomalies using NGS has identified five unique genes in four separate studies. In the first, Amiri-Yekta et al. [43] used whole-exome sequencing to investigate 10 men with flagellar abnormalities from six highly consanguineous families. Mutations in DNAH1 were identified in two families and confirmed by Sanger sequencing in one additional sibling from each affected family [43]. The authors then screened a further 38 men for the same founder mutation and identified one more patient carrying it [43]. More recently, Wang et al. [44] used whole-exome sequencing to identify four more consanguineous Chinese men with truncating frameshift mutations in DNAH1, further establishing this gene's role in flagellar development and sperm motility. Xu et al. [45] identified, in two siblings born to consanguineous parents, a homozygous mutation affecting a highly conserved residue of sperm-associated antigen 17 (SPAG17) that causes asthenospermia. Functional studies showed that this mutation significantly decreases SPAG17 expression in the patients' spermatozoa, consistent with a functional role in motility [45]. Tang et al. [46] subsequently investigated 30 unrelated cases with motility defects due to flagellar abnormalities and identified recessive (homozygous and compound heterozygous) mutations in three cilia- and flagella-associated protein (CFAP) genes, CFAP43, CFAP44 and CFAP65, in five men. Knockout mice for two of these genes (Cfap43 and Cfap44), engineered using CRISPR (clustered regularly interspaced short palindromic repeats), showed motility and flagellar abnormalities similar to those seen in the human patients [46].
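In consanguineous families such as those described above, a common first-pass filter keeps rare variants that are homozygous in every affected man and not homozygous in any fertile male relative. A minimal Python sketch of such a filter follows. It is not the analysis pipeline of any cited study: the 1% allele-frequency cut-off, the pedigree labels and the genotypes are hypothetical assumptions, and missing genotypes are treated as homozygous reference for simplicity.

```python
def segregating_recessive(variants, affected, unaffected, max_af=0.01):
    """Return IDs of variants consistent with autosomal recessive segregation.

    variants: dict mapping variant ID -> {"af": population allele frequency,
                                          "gt": {sample ID: alt-allele count}}
    max_af: assumed rarity threshold (hypothetical default of 1%).
    Missing genotypes default to 0 (homozygous reference), a simplification.
    """
    kept = []
    for vid, info in variants.items():
        gts = info["gt"]
        # Too common in the population to plausibly cause a rare recessive disorder.
        if info["af"] > max_af:
            continue
        # Must be homozygous (two alt alleles) in every affected man.
        if not all(gts.get(s, 0) == 2 for s in affected):
            continue
        # Must not also be homozygous in any unaffected relative.
        if any(gts.get(s, 0) == 2 for s in unaffected):
            continue
        kept.append(vid)
    return kept


# Hypothetical pedigree: II-1 and II-2 are affected brothers,
# II-3 is a fertile brother and I-1 is the father.
variants = {
    "varA": {"af": 0.0001, "gt": {"II-1": 2, "II-2": 2, "II-3": 1, "I-1": 1}},
    "varB": {"af": 0.2, "gt": {"II-1": 2, "II-2": 2, "II-3": 0}},    # too common
    "varC": {"af": 0.0, "gt": {"II-1": 2, "II-2": 1, "II-3": 0}},    # not homozygous in II-2
    "varD": {"af": 0.001, "gt": {"II-1": 2, "II-2": 2, "II-3": 2}},  # homozygous in fertile brother
}
candidates = segregating_recessive(variants, ["II-1", "II-2"], ["II-3", "I-1"])
print(candidates)
```

Only varA survives; heterozygous carriers among the fertile relatives (as expected for a recessive founder mutation) do not exclude it.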

Congenital bilateral absence of the vas deferens (CBAVD) and Y-chromosome NGS studies

Whilst CBAVD is usually caused by CFTR mutations, one recent study discovered mutations in the X-linked adhesion G protein-coupled receptor gene ADGRG2 [50]. By sequencing the exomes of 12 CFTR-negative men and then re-sequencing ADGRG2 in 14 additional men with CBAVD, the authors found four hemizygous mutations, all predicted to truncate ADGRG2 [50]. This is consistent with mouse studies in which male Adgrg2 knockouts develop obstruction and hence infertility [50]. Oud et al. [36] found a patient with unilateral absence of the vas deferens carrying a CFTR mutation, further expanding the phenotypic spectrum of cystic fibrosis transmembrane conductance regulator (CFTR)-related obstructive infertility. One of the major advantages of whole-genome sequencing is its ability to detect both small and large variants, including structural variants and CNVs. Such approaches have recently been applied to the Y chromosome to map CNVs at breakpoint resolution [51], [52], although no new causative genes have been identified to date. The Y chromosome is highly repetitive, rich in repeated elements and segmental duplications [53], which makes CNV detection from whole-genome sequencing data challenging, particularly because short reads are difficult to map accurately to it. This mapping uncertainty can create false calls along the Y chromosome, an issue that long-read technologies could mitigate; however, these are currently expensive and therefore not suitable for routine use. Given these challenges of CNV assignment, it is no surprise that most NGS studies ignore the Y chromosome altogether [54]. Whilst recent efforts have begun to piece together Y-chromosome structural rearrangements using NGS, no such studies have targeted infertile men to date.
This represents an interesting opportunity for future investigation, further justifying the use of whole-genome sequencing rather than whole-exome or panel sequencing for patient assessment where possible. Altogether, 23 studies have used NGS to date to discover mutations in 28 genes causing a wide variety of male infertility phenotypes. This number likely represents the proverbial 'tip of the iceberg': >400 genes have been identified to cause spermatogenic impairment in mice [55], [56], [57], and up to 100 genes were identified in humans in the pre-NGS era (reviewed in [29], [58], [59], [60]). As the technology is adopted more widely in clinical and research centres, it has the potential to uncover many more.
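Why short-read mapping ambiguity produces false CNV calls on the Y chromosome can be illustrated with a naive read-depth caller. The Python sketch below assumes that per-bin read depths and mean mapping qualities (MAPQ) have already been computed for one chromosome; it skips bins with low mean MAPQ as a crude stand-in for the mappability masks used by real CNV callers, normalises depth to the median of the remaining bins, and flags large deviations. The thresholds, the 10-kb bin size and all depth values are hypothetical.

```python
from statistics import median


def call_depth_cnvs(bins, min_mapq=30, loss_ratio=0.5, gain_ratio=1.5):
    """Naive read-depth CNV calling over fixed-size bins of one chromosome.

    bins: list of (bin_start, read_depth, mean_mapq) tuples.
    Bins with mean MAPQ below min_mapq are treated as unreliable (reads there are
    likely multi-mapped repeats) and excluded from both the baseline and the calls.
    Returns a list of (bin_start, "loss" | "gain", depth ratio rounded to 2 d.p.).
    """
    reliable = [b for b in bins if b[2] >= min_mapq]
    if not reliable:
        return []
    baseline = median(b[1] for b in reliable)
    if baseline == 0:
        return []  # no usable coverage to normalise against
    calls = []
    for start, depth, _mapq in reliable:
        ratio = depth / baseline
        if ratio <= loss_ratio:
            calls.append((start, "loss", round(ratio, 2)))
        elif ratio >= gain_ratio:
            calls.append((start, "gain", round(ratio, 2)))
    return calls


# Hypothetical 10-kb bins on the Y chromosome: (start, depth, mean MAPQ).
bins = [
    (0, 30, 60),
    (10_000, 31, 60),
    (20_000, 2, 60),   # low depth in a uniquely mappable region: plausible deletion
    (30_000, 90, 5),   # excess depth from multi-mapped repeat reads
    (40_000, 29, 60),
    (50_000, 30, 60),
]
print(call_depth_cnvs(bins))              # masked caller
print(call_depth_cnvs(bins, min_mapq=0))  # no masking: the repeat bin becomes a spurious gain
```

With the default threshold only the low-depth bin is reported; disabling the mask lets the multi-mapped repeat bin through and produces a spurious gain, the kind of false call described above.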

Conclusions

The development and deployment of NGS technologies have the potential to transform clinical testing across a wide range of human conditions, and male infertility is clearly no exception. The work done to date shows that, although the investigation of infertility with modern sequencing technologies has only recently begun, it is a highly promising field for discovery. The success of NGS depends on improvements to the bioinformatics algorithms and tools that turn data into actionable knowledge. Current test offerings are advancing from small gene panels to complete genomes, and with these advances comes an increasing need for improved bioinformatics, including analytics, annotations and robust workflows to deliver this information to a clinical audience. The work we review here focuses entirely on the use of NGS to uncover genetic variants in male infertility; however, NGS has also been adapted to uses beyond genomic investigation, including transcriptomics, epigenetics and microbiome studies (reviewed in [61], [62], [63], [64]). Although such efforts have already begun to address problems pertinent to male infertility (e.g., sperm cell transcriptomics [65], single-sperm genotyping [66], spermatocyte methylation analysis [67] and seminal microbiome profiling [68]), they have not yet reached mainstream analysis of large cohorts of affected patients. In addition to NGS-based approaches, research on spermatogenesis is flourishing with the use of metabolomics and proteomics. Detection of protein modifications, including important histone modifications such as phosphorylation, ubiquitination, sumoylation or acetylation, can shed light on gene expression patterns with functional consequences for normal (and, by extension, abnormal) fertility.
Similarly, studies of non-coding RNAs and microRNAs regulating spermatogenesis have been undertaken in men with and without infertility to discover biomarkers predictive of infertility [69], [70]. Thus, there is substantial room to harness NGS technologies towards conceptual advances in this condition. One of the major open questions is how NGS can benefit patients with infertility, especially given the difficulty of correcting germline mutations in already affected individuals. First, we think that for many individuals, receiving a genetic diagnosis is far more meaningful than living with the 'idiopathic' label. A diagnosis can shift the clinical discussion from what is wrong to where to go next, sparing patients a stressful, drawn-out, trial-and-error approach of trying various remedies in the hope of conception. Second, a diagnostic mutation could illuminate a therapeutic pathway for partially restoring fertility. Whilst genetic studies in this field are still in their early days, the emerging picture of high genetic heterogeneity makes the condition well suited to stratifying patients into potential therapy groups based on the affected genes and pathways. Separately, studies of these pathways may reveal novel intervention possibilities, or opportunities to repurpose medications to improve fertility outcomes. At the very least, knowledge of the causative mutation can be used during IVF and ICSI to select sperm cells that do not carry it, so that the mutation is not passed on to male progeny. The next decade has the potential to be defining for male infertility in particular and human diseases in general, with advances in NGS promising to play a large part.
For infertile patients, there will be a long road from sample collection to clinical utility; in many cases, owing to the substantial genetic heterogeneity, the utility of any given sample will not become evident until years later, when other patients with insults in the same genetic pathways are discovered. Nevertheless, patients should be encouraged to participate in genetic research so that these goals may one day be achieved.

Source of funding

None.

References (69 in total)

1. Detecting structural variations in the human genome using next generation sequencing.

Authors: Ruibin Xi; Tae-Min Kim; Peter J Park
Journal: Brief Funct Genomics Date: 2011-01-06

2. The "omics" of human male infertility: integrating big data in a systems biology approach.

Authors: D T Carrell; K I Aston; R Oliva; B R Emery; C J De Jonge
Journal: Cell Tissue Res Date: 2015-12-10

3. Validation and application of a novel integrated genetic screening method to a cohort of 1,112 men with idiopathic azoospermia or severe oligozoospermia.

Authors: Manon S Oud; Liliana Ramos; Moira K O'Bryan; Robert I McLachlan; Özlem Okutman; Stephane Viville; Petra F de Vries; Dominique F C M Smeets; Dorien Lugtenberg; Jayne Y Hehir-Kwa; Christian Gilissen; Maartje van de Vorst; Lisenka E L M Vissers; Alexander Hoischen; Aukje M Meijerink; Kathrin Fleischer; Joris A Veltman; Michiel J Noordam
Journal: Hum Mutat Date: 2017-09-06

4. Mouse models in male fertility research.

Authors: Duangporn Jamsai; Moira K O'Bryan
Journal: Asian J Androl Date: 2010-11-08

5. Association between single-nucleotide polymorphisms of DNMT3L and infertility with azoospermia in Chinese men.

Authors: Jian-Xi Huang; Matthew B Scott; Xiao-Ying Pu; A Zhou-Cun
Journal: Reprod Biomed Online Date: 2011-09-16

6. The presence, role and clinical use of spermatozoal RNAs.

Authors: Meritxell Jodar; Sellappan Selvaraju; Edward Sendler; Michael P Diamond; Stephen A Krawetz
Journal: Hum Reprod Update Date: 2013-07-14

7. Exome sequencing reveals a nonsense mutation in TEX15 causing spermatogenic failure in a Turkish family.

Authors: Ozlem Okutman; Jean Muller; Yoni Baert; Munevver Serdarogullari; Meral Gultomruk; Amélie Piton; Charlotte Rombaut; Moncef Benkhalifa; Marius Teletin; Valerie Skory; Emre Bakircioglu; Ellen Goossens; Mustafa Bahceci; Stéphane Viville
Journal: Hum Mol Genet Date: 2015-07-21

8. Homozygous DNAH1 frameshift mutation causes multiple morphological anomalies of the sperm flagella in Chinese.

Authors: X Wang; H Jin; F Han; Y Cui; J Chen; C Yang; P Zhu; W Wang; G Jiao; W Wang; C Hao; Z Gao
Journal: Clin Genet Date: 2016-11-24

9. Next-generation sequencing in the clinic: are we ready?

Authors: Leslie G Biesecker; Wylie Burke; Isaac Kohane; Sharon E Plon; Ron Zimmern
Journal: Nat Rev Genet Date: 2012-11

10. World Health Organization reference values for human semen characteristics.

Authors: Trevor G Cooper; Elizabeth Noonan; Sigrid von Eckardstein; Jacques Auger; H W Gordon Baker; Hermann M Behre; Trine B Haugen; Thinus Kruger; Christina Wang; Michael T Mbizvo; Kirsten M Vogelsong
Journal: Hum Reprod Update Date: 2009-11-24
