Next-generation sequencing identifies novel genes with rare variants in total anomalous pulmonary venous connection.

Xin Shi1, Tao Huang2, Jing Wang1, Yulai Liang2, Chang Gu3, Yuejuan Xu1, Jing Sun1, Yanan Lu1, Kun Sun4, Sun Chen5, Yu Yu6.   

Abstract

BACKGROUND: Total anomalous pulmonary venous connection (TAPVC) is recognized as a rare congenital heart defect (CHD). With a high mortality rate of approximately 80%, the survival rate and outcomes of TAPVC patients are not satisfactory. However, the genetic aetiology and mechanism of TAPVC remain elusive. This study aimed to investigate the underlying genomic risks of TAPVC through next-generation sequencing (NGS).
METHODS: Rare variants were identified through whole exome sequencing (WES) of 78 sporadic TAPVC cases and 100 healthy controls using Fisher's exact test and gene-based burden test. We then detected candidate gene expression patterns in cells, pulmonary vein tissues, and embryos. Finally, we validated these genes using target sequencing (TS) in another 100 TAPVC cases.
FINDINGS: We identified 42 rare variants of 7 genes (CLTCL1, CST3, GXYLT1, HMGA2, SNAI1, VAV2, ZDHHC8) in TAPVC cases compared with controls. These genes were highly expressed in human umbilical vein endothelial cells (HUVECs), mouse pulmonary veins and human embryonic hearts. mRNA levels of these genes in human pulmonary vein samples were significantly different between cases and controls. Through network analysis and expression patterns in zebrafish embryos, we revealed that SNAI1, HMGA2 and VAV2 are the most important genes for TAPVC.
INTERPRETATION: Our study identifies novel candidate genes potentially related to TAPVC and elucidates the possible molecular pathogenesis of this rare congenital birth defect. Furthermore, SNAI1, HMGA2 and VAV2 are novel TAPVC candidate genes that have not been reported previously in either humans or animals.
FUND: National Natural Science Foundation of China.
Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

Keywords:  Congenital heart defects; Rare variants; Target sequencing; Total anomalous pulmonary venous connection; Whole exome sequencing

Year:  2018        PMID: 30448225      PMCID: PMC6306349          DOI: 10.1016/j.ebiom.2018.11.008

Source DB:  PubMed          Journal:  EBioMedicine        ISSN: 2352-3964            Impact factor:   8.143


Evidence before this study

We reviewed systematically the literature on the pathogenesis of TAPVC from 1960 to 2017 in PubMed using the search terms TAPVC and aetiology. We obtained over 80 articles, of which 14 were related to the molecular and cellular mechanisms of TAPVC. To date, only a few genes have been identified as candidate genes for TAPVC pathogenesis, and these genes are just a partial explanation for some patients.

Added value of this study

Since the pathogenesis of TAPVC remains elusive, we performed next-generation sequencing in 178 unrelated TAPVC cases and 100 healthy controls. To our knowledge, this study represents the largest series of NGS in TAPVC cases reported to date. Our research opens new avenues of investigation into TAPVC pathology and provides novel insights into pulmonary vein development.

Implications of all the available evidence

Without proper intervention in early life, TAPVC can lead to a high mortality rate of approximately 80%. Our findings demonstrate that, as a rare congenital heart defect, TAPVC in some sporadic cases may be attributed to rare variants. Moreover, we identified novel candidate genes harbouring several rare damaging variants that have never been reported in connection with pulmonary vein development.

Introduction

Congenital heart diseases (CHDs) are the most common birth defects in humans, affecting approximately 1% of the population, and are the leading cause of birth defect-related infant mortality [1,2]. Total anomalous pulmonary venous connection (TAPVC) is a rare CHD in which all 4 pulmonary veins fail to link to the left atrium correctly but make abnormal connections to the right atrium or systemic venous system [3]. TAPVC accounts for approximately 1–3% of all CHDs, with a morbidity of approximately 1 out of 15,000 live births [4]. In addition, the mortality of TAPVC patients without proper intervention is nearly 80% [5]. Since the first case of TAPVC was described in 1960, the embryology of TAPVC has been the subject of increasing exploration. However, the molecular mechanism underlying pulmonary vein morphogenesis remains unknown. Only a few genes have been identified as candidate genes for TAPVC pathogenesis. Bleyl et al. revealed that loci on human chromosome 4q12 are involved in TAPVC using genetic linkage analysis, and the candidate genes in this region include VEGFR2 and PDGFR2 [6]. Cinquetti et al. observed increased ANKRD1 gene expression in lymphoblastoid cells derived from a TAPVC patient, indicating that ANKRD1 is related to TAPVC [7]. Karl et al. demonstrated SEMA3D as a crucial gene in pulmonary venous connection because SEMA3D−/− mice displayed TAPVC or partial APVC (PAPVC) phenotypes [8]. These genes explain only a small fraction of the molecular mechanism underlying TAPVC pathogenesis, and comprehensive genomic data obtained via massive parallel sequencing are still lacking. Li et al. analysed whole exome sequencing (WES) data from 6 sporadic TAPVC cases and 81 non-TAPVC counterparts, providing evidence for ACVRL1 as a known causative gene and for SGCD as a candidate TAPVC gene [9]. Nash et al. used WGS analysis to identify a nonsynonymous variant in RBP5 gene that was predicted to be deleterious and overrepresented in TAPVC [10]. 
Therefore, we applied WES technology to identify the likely rare damaging variants and putative candidate genes in 78 TAPVC cases and 100 non-TAPVC controls, and these variants were then validated using target sequencing (TS) in a replication cohort of 100 TAPVC patients. The expression patterns of candidate genes were analysed in human umbilical vein endothelial cells (HUVECs), mouse pulmonary veins, human pulmonary veins, human embryonic hearts, and zebrafish embryos. Finally, we identified 7 candidate genes, especially SNAI1, HMGA2 and VAV2, that most likely underlie TAPVC pathogenesis. These results have improved our understanding of the diagnostic yield of next generation sequencing (NGS) for TAPVC, and assisted in the identification of candidate genes for TAPVC. To our knowledge, this is the largest series of NGS in TAPVC cases reported to date.

Materials and methods

Study population

Our discovery cohort included 78 unrelated TAPVC cases and 100 healthy controls, and our validation cohort included 100 additional unrelated TAPVC cases. Patients with an identified chromosomal or syndromic disorder or situs anomaly were excluded. Detailed cardiac and extra-cardiac features were assessed by reviewing medical records, imaging, and dysmorphology assessments. TAPVC cases were further divided into 4 subtype groups according to where the anomalous veins drain: supra-cardiac, cardiac, infra-cardiac and mixed [11,12]. All patients were recruited via Xin Hua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine. The study was conducted in accordance with the Declaration of Helsinki, and the protocol used to collect human samples of blood, pulmonary veins, and embryonic hearts was approved by the Ethics Committee of Xinhua Hospital. Written informed consent was obtained from all subjects before the study began.

Whole-exome sequencing

DNA extraction from blood samples was carried out using the QIAamp™ DNA and Blood Mini kit (Qiagen) according to the manufacturer's instructions. Total DNA concentration and quantity were assessed by measuring the absorbance at 260 nm with a NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific). WES was performed using the Agilent Sure Select Target Enrichment kit (V6 58 Mb; Agilent Technologies) for sequence capture and the Illumina HiSeq2500 for sequencing (Illumina) to a target depth of 100×. We ran a bioinformatics analysis pipeline to call SNPs. First, sequence artifacts were excluded from the FASTQ files, and the cleaned FASTQ files were used for downstream analysis. Cleaned sequencing data were mapped to the reference human genome (UCSC hg19) with the Burrows-Wheeler Aligner (BWA) software [13], and the original mapping results were stored in BAM format. Then, SAMtools [14], Picard (http://broadinstitute.github.io/picard/), and GATK [15] were used to sort the BAM files and to perform duplicate marking, local realignment, and base quality recalibration, generating the final BAM file for computation of sequence coverage and depth. SAMtools mpileup and bcftools were used for variant calling to identify SNPs and InDels [16]. ANNOVAR was used to annotate the resulting VCF (Variant Call Format) file [17]. We further filtered the SNPs using the American College of Medical Genetics (ACMG) guidelines as follows: (1) variants in untranslated regions (UTRs) as well as synonymous and intronic variants were removed; (2) variants were required to have an extremely low allele frequency in the controls (minor allele frequency <0.005) and in public databases (1000 Genomes Project, ExAC and ESP; minor allele frequency <0.005); (3) variants were scored with SIFT, PolyPhen2 and Mutation Taster, and any variant predicted to be pathogenic by at least 1 program was deemed damaging.
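
The three filtering steps can be illustrated with a small predicate over annotated variant records. This is a minimal sketch and not the authors' pipeline: the record layout and field names (`variant_class`, `maf_controls`, `maf_public`, `predictions`) and the class labels are hypothetical, and treating a missing frequency as "absent from the database, hence rare" is an assumption made here for illustration.

```python
MAF_CUTOFF = 0.005  # minor allele frequency threshold stated in the text

# Variant classes dropped in the first step (labels are hypothetical)
EXCLUDED_CLASSES = {"UTR3", "UTR5", "intronic", "synonymous"}


def is_rare_damaging(variant):
    """Return True if a variant record passes all three filters."""
    # Step 1: drop UTR, synonymous and intronic variants
    if variant["variant_class"] in EXCLUDED_CLASSES:
        return False
    # Step 2: require MAF < cutoff in controls and in every public database;
    # a missing value (None) is assumed to mean "not observed", i.e. rare
    freqs = [variant.get("maf_controls")]
    freqs += list(variant.get("maf_public", {}).values())
    if any(f is not None and f >= MAF_CUTOFF for f in freqs):
        return False
    # Step 3: damaging if at least one predictor calls it pathogenic
    return any(variant.get("predictions", {}).values())


# Hypothetical example record
example = {
    "variant_class": "nonsynonymous",
    "maf_controls": 0.0,
    "maf_public": {"1000G": None, "ExAC": 0.0002, "ESP": None},
    "predictions": {"SIFT": True, "PolyPhen2": False, "MutationTaster": False},
}
print(is_rare_damaging(example))
```

Keeping the three criteria in one function makes the order of exclusion explicit; a real pipeline would apply the same logic to ANNOVAR output columns rather than to hand-built dictionaries.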

TAPVC-associated SNPs detected by Fisher's exact test

For the 78 TAPVC cases and 100 healthy controls, all 325,353 SNPs called in TAPVC cases were tested with Fisher's exact test. The SNP status was encoded as 0 or 1, where 0 indicated that no SNP alleles were found and 1 indicated that at least 1 SNP allele was detected. The SNP status and the sample class labels (1 for TAPVC patients, 0 for control samples) were used to build the 2 × 2 contingency table for Fisher's exact test, and P < 3e-5 was considered statistically significant.
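
As an illustration of this per-SNP test, the sketch below builds the 2 × 2 table from the 0/1 encodings and computes a two-sided Fisher's exact P value directly from the hypergeometric distribution. It is a minimal standard-library sketch, not the software used in the study; the helper names and the carrier counts in the example (20 carriers among 78 cases, none among 100 controls) are hypothetical.

```python
from math import comb


def contingency(status, labels):
    """Count the 2 x 2 table from 0/1 SNP status and 0/1 class labels."""
    pairs = list(zip(status, labels))
    a = sum(1 for s, y in pairs if s == 1 and y == 1)  # carriers among cases
    b = sum(1 for s, y in pairs if s == 1 and y == 0)  # carriers among controls
    c = sum(1 for s, y in pairs if s == 0 and y == 1)  # non-carriers among cases
    d = sum(1 for s, y in pairs if s == 0 and y == 0)  # non-carriers among controls
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


# Hypothetical SNP: carried by 20 of 78 cases and 0 of 100 controls
status = [1] * 20 + [0] * 58 + [0] * 100
labels = [1] * 78 + [0] * 100
p_value = fisher_exact_two_sided(*contingency(status, labels))
print(p_value < 3e-5)
```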

Gene-based burden test

Similarly, we aggregated the SNP data at the gene level. Genes with at least 1 rare mutant allele (damaging missense or loss-of-function variants) in the combined cohort, including 78 TAPVC patients and 100 healthy controls, were considered in the burden test and recorded as 1, and all other genes were recorded as 0. The SNP status of each gene was compared with the sample classes, and Fisher's exact test P values were calculated at the gene level. The gene level P values were adjusted with the FDR method, where FDR < 0.05 was considered statistically significant.
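
The gene-level aggregation and multiple-testing correction can be sketched as follows. The text only states that "the FDR method" was used; the Benjamini-Hochberg procedure is assumed here as the most common choice, and the function names, gene labels and P values in the example are hypothetical.

```python
def gene_status(sample_genes, gene):
    """1 if the sample carries at least 1 rare damaging allele in `gene`, else 0."""
    return 1 if gene in sample_genes else 0


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-sample sets of genes carrying rare damaging alleles
samples = [{"GENE_A"}, {"GENE_A", "GENE_B"}, set()]
print([gene_status(s, "GENE_A") for s in samples])

# Hypothetical gene-level Fisher P values
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([q < 0.05 for q in fdr])
```

The resulting 0/1 vector per gene plays the same role as the per-SNP status in the previous section, so the same Fisher's exact test applies before the FDR adjustment.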

Gene selection

We classified initial candidate genes into 3 types according to ACMG standards and guidelines [18]. First, significant variant genes were selected from TAPVC-associated SNPs using Fisher's exact test and the gene-based burden test. These genes were filtered as category I candidate genes. Category II genes included TAPVC pathogenic or likely pathogenic genes derived from the literature and publicly available databases. Category III genes included genes associated with human cardiac development, CHDs and vascular development in previous literature. Using the above screening process, we finally selected these 3 types of genes as our initial candidate genes.

Gene expression detection using the RT-qPCR assay

RNA was extracted from HUVECs and pulmonary vein samples from humans and mice. HUVECs were obtained from the Chinese Academy of Sciences (Shanghai, China). C57 mice were purchased from the Central Institute for Experimental Animals (Shanghai, China). Procedures involving animals and their care were conducted in accordance with National Institutes of Health (NIH) guidelines (NIH pub. no. 85–23, revised 1996) and approved by the Animal Care and Use Committee of Shanghai Xinhua Hospital. The human samples consisted of vertical veins from 5 patients and pulmonary veins from 5 controls harvested from Shanghai Xinhua Hospital. Total RNA extraction and the RT-qPCR were performed as described previously[19]. The RT-qPCR primers are listed in Table S1.

Expression patterns of the selected genes during human embryonic heart development

We collected human embryonic heart samples from Carnegie stages 11 through 15. TissueLyserII (Qiagen) and the RNeasy MinElute Cleanup Kit (Qiagen) were utilized for RNA extraction. The integrity and purity of the RNA was detected by the Experion automated gel electrophoresis system (Bio-Rad) and the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific). The time course expression patterns of the selected TAPVC candidate genes were measured using an Affymetrix HTA 2.0 microarray.

SNP validation using targeted sequencing

We used multiplex PCR to amplify our target regions, in which more than one target sequence was amplified by more than one pair of primers in the same reaction. Then, we sequenced the target regions of the 7 candidate genes with Illumina MiSeq using DNA isolated from 100 additional TAPVC patients. We evaluated the assay performance using a 113-amplicon assay for the coding regions of the 7 candidate genes. All PCR primers were designed using OLIGO 6.2-assisted primer design, and all these primers were highly efficient and sensitive for detection [20]. The TS primers are listed in Table S2.

Network analysis

The Search Tool for the Retrieval of Interacting Genes (STRING) database (http://string-db.org) critically assesses and integrates protein–protein interactions (PPI), including both direct (physical) and indirect (functional) associations [21]. To detect potential relationships among our initial candidate genes, we mapped all the genes to the STRING network and visualized the network using Cytoscape. Since many genes did not directly interact, we used Dijkstra's algorithm [22] to discover the shortest paths between candidate genes. Elucidating the genes on the shortest path between candidate genes can reveal the possible genetic mechanism underlying TAPVC. The Dijkstra's algorithm procedure was as follows [22–24]: let G = (V, E, w) be a weighted graph, where V is the set of vertices that includes the mapped TAPVC genes and other genes in the STRING network, E is the set of edges that correspond to the interactions in the STRING database, and w is the weight function, i.e., 1 minus the confidence score from the STRING database. If u0 and v are 2 vertices in G, the shortest path between them can be discovered using the following procedure:
(1) Let S = {u0}, S′ = V − {u0}, l(u0) = 0, and l(v) = ∞ for any vertex v ∈ V − {u0};
(2) For each vertex v ∈ S′ such that u′v ∈ E, where u′ ∈ S: if l(v) ≤ l(u′) + w(u′v), then continue; otherwise, set l(v) = l(u′) + w(u′v) and Parent(v) = u′;
(3) Find a vertex v′ such that l(v′) = min{l(v) ∣ v ∈ S′};
(4) Set S = S ∪ {v′} and S′ = S′ − {v′};
(5) If the target vertex v ∈ S, then continue; otherwise, return to step 2;
(6) The label Parent was used to trace the shortest path from u0 to v.
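
The procedure can be sketched with a priority queue, using 1 minus the interaction confidence as the edge weight. This is an illustrative implementation only, not the authors' code: the function name, the edge-list format and the placeholder protein names are hypothetical, and confidence scores are assumed to be scaled to the range 0 to 1.

```python
import heapq


def shortest_path(edges, source, target):
    """Dijkstra's algorithm on an undirected graph.

    edges: iterable of (protein_a, protein_b, confidence), confidence in [0, 1].
    Returns (path, distance), or (None, inf) if target is unreachable.
    """
    graph = {}
    for a, b, conf in edges:
        w = 1.0 - conf  # high-confidence interactions are "short" edges
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))

    dist = {source: 0.0}  # the labels l(.)
    parent = {}           # the Parent labels
    done = set()          # the set S of settled vertices
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in done:
        return None, float("inf")
    # Follow Parent labels back from the target to the source
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], dist[target]


# Placeholder network: the indirect route via P2 beats the weak direct edge
edges = [("P1", "P2", 0.9), ("P2", "P3", 0.8), ("P1", "P3", 0.4)]
path, distance = shortest_path(edges, "P1", "P3")
print(path, round(distance, 3))
```

With this weighting, a chain of two well-supported interactions can be preferred over a single poorly supported one, which is why intermediate proteins appear on the shortest paths between candidate genes.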

Expression patterns of the candidate genes in zebrafish embryos at different stages

To confirm whether candidate genes were expressed during the development of zebrafish embryos, we designed specific probes for SNAI1, HMGA2, and VAV2 to perform whole-mount in situ hybridization (WISH) [25]. Digoxigenin (DIG)-labelled RNA probes were transcribed from linearized DNA templates with T7 or SP6 RNA polymerases (Ambion). Embryos were hybridized with the DIG-labelled RNA probes overnight, and the concentration of the probes was 1 ng/ml. Then, embryos were incubated with anti-DIG antibody (Roche; 1:8000), stained and observed under a fluorescent stereomicroscope (MZ FLIII, Leica).

Results

Descriptions of the study cohort

The discovery cohort consisted of 78 unrelated patients with TAPVC (n = 78, males: 45, females: 33, average age: 0.95 ± 1.87 years) and 100 healthy controls (n = 100, males: 52, females: 48, average age: 3.08 ± 0.86 years) collected from Shanghai Xinhua Hospital between June 2013 and May 2017. Another 100 TAPVC patients (n = 100, males: 54, females: 46, average age: 1.27 ± 0.82 years) were collected as the validation cohort (Table 1). As supra-cardiac and cardiac TAPVC comprised most of the 4 subtypes, representative graphs of the supra-cardiac and cardiac TAPVC patients are shown in Fig. 1.
Table 1

Summary of the demographic and clinical information for 178 TAPVC patients.

Patient characteristics | Discovery cohort | Validation cohort
Mean age at diagnosis (years) | 0.95 ± 1.87 | 1.88 ± 1.21
BMI (kg/m2) | 14.92 ± 3.05 | 19.87 ± 2.85
Male (n, %) | 45 (57.7) | 54 (54)
Mortality (n, %) | 4 (5.1) | 7 (7)
TAPVC anatomical type (n, %) | |
 Supracardiac | 43 (55.1) | 50 (50)
 Cardiac | 21 (26.9) | 28 (28)
 Infracardiac | 8 (10.3) | 12 (12)
 Mixed | 6 (7.7) | 10 (10)
Associated cardiac lesion (n, %) | |
 Vascular malformation | 4 (5.1) | 9 (9)
 Valvular malformation | 18 (23.1) | 22 (22)
 Compound malformation | 6 (7.7) | 9 (9)
 Conduction block | 7 (9.0) | 17 (17)
Fig. 1

Representative CT angiography graphs of supra-cardiac and cardiac TAPVC. CT angiography in a 2-month-old boy with supra-cardiac TAPVC; longitudinal section image (A) and volume-rendered image (B) showing 4 individual pulmonary veins joining in a retrocardiac venous confluence and draining into the superior vena cava (SVC) via a vertical vein (VV). CT angiography in a 6-month-old boy with cardiac TAPVC; longitudinal section image (C) and volume-rendered image (D) showing 4 individual pulmonary veins flowing into the CS. ASD: atrial septal defect; PV: pulmonary vein; SVC: superior vena cava; RLPV: right low pulmonary vein; RUPV: right upper pulmonary vein; LUPV: left upper pulmonary vein; LLPV: left low pulmonary vein; CS: coronary sinus; VV: vertical vein.

In the discovery cohort, 325,353 SNPs were found in 78 samples using WES and BWA + GATK analyses. Meanwhile, 363,641 SNPs were found in the 100 control samples using the same analysis methods. To identify the candidate TAPVC pathogenic genes, we proposed an analytical strategy workflow (Fig. 2). Genomic information regarding the SNPs of the 78 patient samples was displayed using a Manhattan plot (Fig. S1). Only 15,755 rare nonsynonymous SNPs associated with TAPVC were tested by Fisher's exact-test. The 24 SNPs in 10 genes with P < 3e-5 are shown in Table S3.
Fig. 2

The analytical strategy workflow for variant filtration. A schematic overview of the different steps taken during next-generation sequencing analysis with gene expression detection is shown. After variant calling and annotation, variants were filtered via SNP-associated analysis and the gene-based burden test. Candidate genes were collected by the detection of mRNA expression, validation of potential variants and network analysis. SNP, single nucleotide polymorphism; WES, whole-exome sequencing; MAF, minor allele frequency; FDR, false discovery rate; HUVECs, human umbilical vein endothelial cells; RT-qPCR, real-time quantitative polymerase chain reaction.

TAPVC-associated gene-based burden test

As mentioned above, the significant SNPs appeared to cluster densely within the same genes. Therefore, we tested the associations of genes with significant TAPVC variants using Fisher's exact test by aggregating SNP data at the gene level. At the gene level, the TAPVC patients had significantly more genes with SNPs than the control samples, with a t-test P value of 4.39e-31. The number of genes with SNPs was approximately 16,085 ± 160 in the 100 controls and approximately 16,356 ± 77 in the 78 patients. The boxplot of the numbers of genes with SNPs in the TAPVC patients and controls is shown in Fig. S2. In addition, the 27 genes that had an FDR < 0.05 based on the gene-based burden test are shown in Table 2.
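
For orientation, the size of the group difference can be checked from the summary statistics alone. The sketch below computes a Welch t statistic and its approximate degrees of freedom; it assumes that the ± values are standard deviations and that an unequal-variance test is appropriate, neither of which is stated in the text, and it does not attempt to reproduce the reported P value.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error, group 1
    v2 = sd2 ** 2 / n2  # squared standard error, group 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Patients (n = 78) versus controls (n = 100), values taken from the text
t_stat, dof = welch_t(16356, 77, 78, 16085, 160, 100)
print(round(t_stat, 2), round(dof, 1))
```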
Table 2

TAPVC-associated genes detected using the gene-based burden test.

Gene symbol | Gene name | FDR | P value
CRISP1 | Cysteine rich secretory protein 1 | 1.93E-27 | 8.94E-32
HOXD9 | Homeobox D9 | 3.59E-16 | 3.33E-20
WDR54 | WD repeat domain 54 | 2.46E-14 | 3.41E-18
VAV2 | Vav guanine nucleotide exchange factor 2 | 2.04E-11 | 3.77E-15
MARCH9 | Membrane associated ring-CH-type finger 9 | 1.53E-09 | 4.24E-13
CA14 | Carbonic anhydrase 14 | 1.75E-09 | 5.66E-13
ZDHHC8 | Zinc finger DHHC-type containing 8 | 2.03E-09 | 7.53E-13
SDHAP3 | SDHA C-Terminal Like | 3.73E-08 | 1.90E-11
ASB9 | Ankyrin repeat and SOCS box containing 9 | 9.71E-08 | 5.39E-11
MNT | MAX network transcriptional repressor | 6.78E-06 | 4.52E-09
SNAI1 | Snail family transcriptional repressor 1 | 6.78E-06 | 4.70E-09
CLTCL1 | Clathrin heavy chain like 1 | 1.31E-05 | 1.03E-08
CST3 | Cystatin C | 0.000100455 | 9.76E-08
RHCG | Rh family C glycoprotein | 0.000254069 | 2.82E-07
SERF1A | Small EDRK-rich factor 1A | 0.001780581 | 2.72E-06
SERF1B | Small EDRK-rich factor 1B | 0.001780581 | 2.72E-06
RAC3 | Ras-related C3 botulinum toxin substrate 3 | 0.00197802 | 3.11E-06
ANKRD7 | Ankyrin repeat domain 7 | 0.005872844 | 9.78E-06
WWTR1 | WW domain containing transcription regulator 1 | 0.006274163 | 1.07E-05
ESX1 | ESX homeobox 1 | 0.006567915 | 1.18E-05
RGPD5 | RANBP2-like and GRIP domain containing 5 | 0.013392925 | 2.73E-05
RGPD6 | RANBP2-like and GRIP domain containing 6 | 0.013392925 | 2.73E-05
NUS1 | Nogo-B Receptor | 0.017510764 | 3.64E-05
NME5 | NME/NM23 family member 5 | 0.01776352 | 3.86E-05
ABCG5 | ATP binding cassette subfamily G member 5 | 0.017879142 | 3.97E-05
HMGA2 | High mobility group AT-hook 2 | 0.019242641 | 4.36E-05
ARX | Aristaless related homeobox | 0.049588497 | 0.0001376

TAPVC-associated gene selection

After screening the TAPVC-associated SNPs and genes from the WES data using Fisher's exact test, SNAI1 was found in both gene lists. Therefore, 36 genes were ultimately defined as category I genes. We then further reviewed the literature systematically to select genes associated with TAPVC identified in previous studies and databases. We identified 12 pathogenic/likely pathogenic genes as being category II (KDR, SMAD1, NRP1, PDGFRA, GJA1, ACVRL1, NKX2–5, ZIC3, SGCD, SEMA3D, ANKRD1, RBP5) (Table S4). We also deemed 10 interesting genes related to vascular development in the literature as being category III genes (ANGPT1, ANGPT2, ANKRD6, ATM, EGR1, ENG, FGF2, SMG1, TYMP, VEGFRA); these genes were also related to human cardiac development and CHDs. Taken together, we considered the aforementioned 58 genes as initial TAPVC candidate genes and subjected them to further expression validation.

Detection of TAPVC-associated gene expression

We detected mRNA levels of the aforementioned 58 genes in HUVECs, mouse pulmonary veins and human pulmonary veins harvested from 5 TAPVC patients and 5 controls (Fig. 3A-C). Then, we further filtered these genes according to whether they were highly expressed in different tissues. Finally, 27 genes were selected from the 58 initial candidate genes, and 7 were from the list of TAPVC-associated genes. Of the 7 genes, GXYLT1, HMGA2, SNAI1 and VAV2 were significantly down-regulated in TAPVC patients (P < .05), while CST3, CLTCL1, ZDHHC8 were significantly up-regulated (P < .05) compared with that in the control cohort. Another 20 genes perfectly satisfied our criteria for being pathogenic/likely pathogenic genes or related genes, and their expression patterns are shown in Fig. S3.
Fig. 3

mRNA expression levels of the 7 candidate genes (CST3, CLTCL1, GXYLT1, HMGA2, SNAI1, VAV2, ZDHHC8) in different samples (A) mRNA expression levels of the 7 candidate genes in HUVECs; n = 3. (B) mRNA abundance of the 7 candidate genes in mouse pulmonary veins; n = 3. (C) mRNA differential expression in human pulmonary veins between TAPVC patient and control samples; n = 5. Statistical significance was calculated by Student's t-test, where *P < .05 and **P < .01. (D) Expression patterns of the candidate genes in human embryonic hearts at different time points. The coloured lines represent the expression patterns of the candidate genes, while the grey lines represent known pathogenic genes. CS, Carnegie stage. (E) Heat map showing 42 rare nonsynonymous variants in 7 candidate genes from 78 TAPVC cases. Significantly mutated genes are listed on the left. The percentage of each gene with the variants detected is noted on the right. The upper histogram shows the variant rates in each of the 78 patients. Samples are divided into 4 subtypes (supracardiac, cardiac, infracardiac and mixed); gender information for the patients is also shown on the heat map. The red box shows the patients that had the variants. The middle region of the heat map shows details regarding variants in the sequenced samples.

We then studied the temporal expression patterns of the selected TAPVC genes during embryonic heart development using an Affymetrix HTA 2.0 microarray (Fig. 3D). The expression levels of ANKRD1, KDR and SEMA3D in embryonic hearts were higher than those of other known TAPVC pathogenic genes. Compared with these known TAPVC pathogenic genes, all 7 candidate genes had relatively high expression, especially GXYLT1. Among the pathogenic or likely pathogenic genes, ANKRD1 was significantly up-regulated (P < .05), while GJA1, KDR and SEMA3D were significantly down-regulated (P < .05); these patterns were identical to those reported in the literature [6,8,10,26,27].
Altered expression was also detected in some related genes, such as ANGPT1 and TYMP (P < .05), but well qualified SNPs were not found in our WES data. In conclusion, we considered CLTCL1, CST3, GXYLT1, HMGA2, SNAI1, VAV2, and ZDHHC8 as our candidate genes.

Rare variants of 7 candidate genes in the TAPVC discovery cohort

By assessing gene expression in different samples, we finally obtained 7 candidate genes. We found 42 rare damage-associated nonsynonymous variants in our WES data (Fig. 3E). GXYLT1 variants (22/42, 52.4%) and CLTCL1 variants (13/42, 31.0%) accounted for a substantial proportion of the variants in the 7 candidate genes, while VAV2 (p.R816C and p.D532G) and ZDHHC8 (p.T184A and p.R540C) each had 2 rare nonsynonymous variants, and HMGA2 (p.S105I), SNAI1 (p.R224P) and CST3 (p.R79S) each had only 1 rare nonsynonymous variant (Table 3). To elucidate the relationship between these SNPs, we performed linkage disequilibrium (LD block) analysis of the 42 rare damage-associated nonsynonymous SNPs from the 7 candidate genes (Fig. S4). LD block analyses were performed for the chromosomal regions with multiple significant SNPs clustered around genome-wide significant SNPs. The LD blocks were defined using Haploview (version 4.2) and criteria established by Gabriel et al. [28]. LD block analysis explained why the rare GXYLT1 and CLTCL1 variants accounted for such a large proportion of the variants in the 7 candidate genes: many SNPs fell within the same block, indicating that 1 SNP variant may be accompanied by other SNP variants, and they may have similar effects on TAPVC incidence.
Table 3

Rare variants of 7 candidate genes associated with TAPVC.

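Chromosome | Position | Gene | Base change | Amino acid change | avsnp142 | 1000G | ExAC | ESP6500
chr9 | 136,633,707 | VAV2 | G > A | p.R816C | rs191274326 | 0.0006 | 0.0002 | 0.0002
chr9 | 136,649,478 | VAV2 | T > C | p.D532G | rs191239028 | 0.0008 | 0.0001 | .
chr12 | 42,499,690 | GXYLT1 | T > C | p.Y265C | rs79044728 | . | . | .
chr12 | 42,499,694 | GXYLT1 | A > T | p.Y264N | . | . | . | .
chr12 | 42,499,701 | GXYLT1 | C > A | p.R261S | rs74583427 | . | . | .
chr12 | 42,499,711 | GXYLT1 | C > A | p.R258L | rs76555438 | . | . | .
chr12 | 42,499,714 | GXYLT1 | T > C | p.N257S | rs78536827 | . | . | .
chr12 | 42,499,738 | GXYLT1 | T > C | p.E249G | . | . | . | .
chr12 | 42,499,739 | GXYLT1 | C > T | p.E249K | rs77582546 | . | . | .
chr12 | 42,499,763 | GXYLT1 | T > C | p.I241V | rs80202058 | . | 0.0002 | .
chr12 | 42,499,802 | GXYLT1 | C > A | p.D228Y | rs78540738 | . | . | .
chr12 | 42,499,825 | GXYLT1 | A > C | p.I220S | rs76034661 | . | . | .
chr12 | 42,499,826 | GXYLT1 | T > C | p.I220V | rs75241273 | . | . | .
chr12 | 42,499,857 | GXYLT1 | T > A | p.E209D | . | . | . | .
chr12 | 42,538,334 | GXYLT1 | C > A | p.G39W | rs181558534 | . | 0.0001 | .
chr12 | 42,538,340 | GXYLT1 | C > A | p.G37C | . | . | . | .
chr12 | 42,538,349 | GXYLT1 | T > C | p.T34A | . | . | . | .
chr12 | 42,538,366 | GXYLT1 | A > T | p.V28E | . | . | . | .
chr12 | 42,538,367 | GXYLT1 | C > T | p.V28M | . | . | . | .
chr12 | 42,538,406 | GXYLT1 | C > T | p.G15S | . | . | . | .
chr12 | 42,538,412 | GXYLT1 | C > T | p.A13T | . | . | . | .
chr12 | 42,538,415 | GXYLT1 | C > G | p.V12L | . | . | . | .
chr12 | 42,538,423 | GXYLT1 | A > G | p.V9A | . | . | . | .
chr12 | 42,538,435 | GXYLT1 | A > G | p.L5P | . | . | . | .
chr12 | 66,308,093 | HMGA2 | G > T | p.S105I | . | . | . | .
chr20 | 23,618,265 | CST3 | G > T | p.R79S | rs574152075 | 0.0014 | 0.0003 | .
chr20 | 48,604,469 | SNAI1 | G > C | p.R224P | . | . | . | .
chr22 | 20,127,408 | ZDHHC8 | A > G | p.T184A | rs200408305 | 0.0022 | 0.0005 | .
chr22 | 20,130,771 | ZDHHC8 | C > T | p.R540C | . | . | . | .
chr22 | 19,168,250 | CLTCL1 | C > T | p.V1633M | . | . | 0.0001 | .
chr22 | 19,170,956 | CLTCL1 | C > T | p.V1592M | rs2073738 | 0.032 | 0.017 | 0.0057
chr22 | 19,188,928 | CLTCL1 | C > T | p.R1226H | . | . | . | .
chr22 | 19,196,497 | CLTCL1 | T > C | p.N1126S | . | . | . | .
chr22 | 19,207,480 | CLTCL1 | G > A | p.R945C | . | . | . | .
chr22 | 19,209,603 | CLTCL1 | C > T | p.R811Q | rs112636081 | 0.004 | 0.0037 | .
chr22 | 19,209,604 | CLTCL1 | G > A | p.R811W | rs12628529 | 0.0002 | 0.0001 | 0.0003
chr22 | 19,212,999 | CLTCL1 | A > T | p.F702Y | rs182917063 | 0.002 | 0.0006 | .
chr22 | 19,213,059 | CLTCL1 | C > T | p.C682Y | . | . | . | .
chr22 | 19,230,318 | CLTCL1 | G > A | p.R221C | rs376289636 | 0.0004 | 0.0001 | .
chr22 | 19,241,585 | CLTCL1 | A > C | p.M139R | rs187925949 | 0.0002 | . | .
chr22 | 19,241,684 | CLTCL1 | T > C | p.E106G | . | . | . | .
chr22 | 19,279,131 | CLTCL1 | G > T | p.H12N | . | . | . | .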

TAPVC-associated variants were confirmed by TS

Additionally, we performed TS by multiplex PCR in an additional 100 cases to validate the rare damage-associated variants of the 7 candidate genes. We found 42 rare damage-associated nonsynonymous variants in the WES data, and many significant SNPs were in genes, such as GXYLT1 and CLTCL1, while 56 rare damage-associated nonsynonymous variants were identified from the TS data (Table S5). Moreover, CLTCL1, SNAI1, ZDHHC8, HMGA2, and VAV2 showed the same variants in both the WES data and the TS data, while the CST3 and GXYLT1 variants differed in the 2 datasets. A heat map was used to illustrate the TS rare variant results (Fig. S5).

Regulatory network of the TAPVC candidate genes

We used Dijkstra's algorithm to find the shortest paths between candidate genes and thereby explore their protein interaction networks. Since CLTCL1 showed no direct interaction with other genes in the STRING database, we show the 6 remaining candidate genes without CLTCL1. We plotted the shortest paths between the different gene sets using Cytoscape. The gene regulatory networks revealed that our candidate genes not only interacted with each other but were also closely associated with CHD-related genes and known pathogenic genes (Fig. 4). SNAI1, VAV2 and HMGA2 were at the centre of the regulatory network, suggesting that these three genes may play more important roles in TAPVC pathogenesis.
Fig. 4

Gene regulatory networks and sub-networks showing the importance of highly connected genes. The network was modelled in Cytoscape using annotations from the STRING 9.1 protein-protein interactions database. We used Dijkstra's algorithm to discover the shortest paths between these genes. (A) Network analysis indicated the protein-protein interactions among 6 candidate genes, known pathogenic genes and CHD-related genes. (B) Network analysis of internal correlations among 6 candidate genes. (C) Network analysis of relationships between SNAI1, HMGA2, VAV2 and known TAPVC pathogenic genes. The red nodes represent candidate proteins, the yellow nodes represent known pathogenic genes, the green nodes represent CHD-related genes, and the blue nodes represent the shortest path proteins that connect these genes.
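
The shortest-path search described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the edge list, the intermediate nodes X1 and X2, and the interaction scores are invented for illustration and are not STRING data, and converting a confidence score into a path cost as 1 - score is an assumption, since the text does not state how edges were weighted.

```python
import heapq


def build_graph(scored_edges):
    """Build an undirected graph; edge cost = 1 - score (assumed weighting)."""
    graph = {}
    for a, b, score in scored_edges:
        cost = 1.0 - score
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def dijkstra_path(graph, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable."""
    best = {source: 0.0}   # cheapest known cost to each node
    prev = {}              # predecessor on the cheapest known path
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue       # stale heap entry
        done.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Hypothetical edges and scores, NOT taken from STRING.
TOY_EDGES = [
    ("SNAI1", "X1", 0.9),
    ("X1", "HMGA2", 0.8),
    ("SNAI1", "HMGA2", 0.5),
    ("HMGA2", "X2", 0.7),
    ("X2", "VAV2", 0.9),
]

graph = build_graph(TOY_EDGES)
cost, path = dijkstra_path(graph, "SNAI1", "VAV2")
print(path, round(cost, 2))  # ['SNAI1', 'X1', 'HMGA2', 'X2', 'VAV2'] 0.7
```

In this toy network the route through X1 is preferred over the direct SNAI1-HMGA2 edge because its two high-confidence edges have a lower total cost. A protein with no edges in the input, such as a node that is absent from the edge list, is reported as unreachable with an empty path.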


The expression pattern of SNAI1, HMGA2, and VAV2 in zebrafish embryos

Then, we investigated whether SNAI1, HMGA2, and VAV2 could affect the development of embryonic vessels and hearts. We analysed the expression pattern of these 3 genes by WISH (Fig. 5). SNAI1 expression was observed in the primitive veins and entire head at 20 hpf (hours post fertilization), and sustained expression was found in the posterior cardinal vein (PCV) at 24 hpf and 30 hpf. At 48 hpf and 72 hpf, the expression of SNAI1 was mainly observed in the heart. HMGA2 expression patterns mirrored those of SNAI1. At 48 hpf and 72 hpf, VAV2 was expressed mainly in the heart. These findings indicated that these 3 genes are important for cardiovascular development and might be the most crucial candidate genes for TAPVC.
Fig. 5

Expression patterns of SNAI1, HMGA2, and VAV2 during zebrafish embryonic development. WISH results demonstrating the expression of SNAI1 (A), HMGA2 (B), and VAV2 (C) in wild-type zebrafish at the different stages (20 hpf, 24 hpf, 30 hpf, 48 hpf, and 72 hpf). The numbers shown in the bottom corner indicate the number of embryos with similar staining patterns out of the total number of embryos examined. All images are lateral views, rostral to the left.


Discussion

Because knowledge regarding the aetiology of TAPVC is lacking, genetically characterizing TAPVC has been challenging. To investigate the genetic pathogenesis of TAPVC in the Chinese population, we conducted gene analysis using next-generation sequencing in a cohort of 178 sporadic TAPVC patients (78 in the discovery cohort and 100 in the validation cohort) and 100 healthy controls. The major findings of our study can be summarized as follows. Seven novel candidate genes (CLTCL1, CST3, GXYLT1, HMGA2, SNAI1, VAV2, ZDHHC8) were associated with TAPVC pathogenesis. STRING network analysis demonstrated that SNAI1, HMGA2, and VAV2, which are highly related to vascular development, appear to play an important role in the genetic mechanism of TAPVC. They were also required for cardiovascular development in zebrafish embryos.
Additionally, we systematically reviewed the literature to select TAPVC-related genes identified in previous studies and databases. Twelve pathogenic/likely pathogenic genes were identified (Table S4). We found some rare nonsynonymous variants among these genes, but several known pathogenic variants were not detected in our data. Possible reasons for this discrepancy are that previous studies focused mainly on Caucasians rather than on Chinese individuals, that the sample size of our study was larger than in previous reports, or that the models and analysis methods utilized were different. Among the pathogenic or likely pathogenic genes, RT-qPCR of pulmonary vein samples showed that ANKRD1 was significantly up-regulated, while GJA1, KDR and SEMA3D were significantly down-regulated, in cases compared with controls. These patterns were identical to those reported in the literature.
SNAI1 encodes a protein critical for mesoderm formation in the developing embryo [29]. Approximately 18% (14/78) of the discovery cohort patients had the same variant (p.R224P), a novel variant that is reported here for the first time. The incidence of this variant was 17% (17/100) in the validation cohort. SNAI1 is involved in the epithelial to mesenchymal transition (EMT) as well as in the formation and maintenance of the embryonic mesoderm [30,31]. SNAI1 also modulates the proliferation, apoptosis and angiogenesis of most tumour types by associating with VEGF, FGF18 and CDH1 [32,33]. Endothelial cell-specific SNAI1 loss-of-function conditional knockout mice showed an early embryonic lethal phenotype with noticeable defects in vascular remodelling, morphogenesis and arterial-vein specification [34]. SNAI1 is regulated by HMGA2 during the induction of EMT with Smads [35].
In the WES data, the same rare variant (p.S105I) in HMGA2, which critically functions in cardiogenesis and is essential for normal cardiac development, was detected in 3 patients [36]. p.S105I is a novel variant that has not been reported previously. Furthermore, 8 patients were determined to have the same variant in the TS data. HMGA2 encodes a protein with structural DNA-binding domains that acts as a transcriptional regulating factor, and 12q14 microdeletion cases including HMGA2 are reportedly associated with clinical CHD symptoms (including 2 patients with atrial septal defects and 1 patient each with pulmonary stenosis, a sub-aortic stenosis and a patent ductus) [37]. HMGA2 is highly expressed during embryogenesis and has been linked to vascular tumours, including angiomyxomas and pulmonary hamartomas, but the relationship between HMGA2 and vascular development remains unknown [38].
VAV2, the second member of the VAV guanine nucleotide exchange factor family of oncogenes, is related to epidermal growth factor receptor binding and angiogenesis (OMIM 600428). VAV2 and VAV3 reportedly work together in neurons and endothelial cells to ensure proper axon guidance and angiogenic responses [39]. The VAV2-Rac1 pathway is important for vasodilatation responses in vascular smooth muscle cells (VSMCs) [40]. Two rare variants (p.R816C, rs191274326 and p.D532G, rs191239028) were identified in 8 TAPVC individuals, 1 of which (p.D532G) was confirmed by TS in the additional patients. Although the 2 variants were previously reported in databases, their functions have never been studied.
In our network analysis, SNAI1, which had the highest weight, was located at the centre of the PPI network. SNAI1, HMGA2, and VAV2 had closer relationships with known TAPVC pathogenic or likely pathogenic genes than the other candidate genes. Interestingly, our zebrafish models showed that SNAI1 and HMGA2 had a similar expression pattern during the development of zebrafish embryos. Both were more highly expressed in the primitive veins, PCV, and heart than in other organs. Although VAV2 did not exhibit the same expression pattern, it was highly expressed in the heart and in blood vessels in the head. These findings indicate that SNAI1, HMGA2, and VAV2 might have an important impact on the development of zebrafish embryo hearts and vessels. However, there is a lack of comprehensive research demonstrating the role of these 3 genes in the cardiovascular development of zebrafish embryos.
GXYLT1, which encodes an enzyme that transfers xylose to the O-Glc residue bound to Notch EGF repeats in vitro and in vivo (OMIM 613321), had the most variants in our WES data. Nearly 77% (66/78) of the TAPVC patients exhibited 22 different rare damage-associated variants in GXYLT1. However, none of these variants were found in follow-up sequencing, while 3 different variants were observed in the 100 validation patients at follow-up. LD block analysis showed that most of these SNPs were assigned to the same block and may have the same function, which could also explain this phenomenon.
Only 2 patients had 1 novel variant (p.R79S) in CST3, but the variant was highly predicted to be damaging. CST3 has been implicated as a prognostic marker in cardiovascular disease [41] and may remodel and establish the arterial wall, and CST3 deficiency occurs in vascular disease [42]. Because we did not find the same variants in the deeper sequencing of the larger validation cohort, whether CST3 variants play a causal or modifier role in TAPVC is still unclear.
Seven TAPVC patients (7/78, 9.0%) had 2 rare nonsynonymous variants in ZDHHC8. The p.R816C variant was highly conserved, and p.D532G has not been previously reported in public datasets or predicted to be damaging. Seventeen patients (17/78, 21.8%) were found to have 13 rare damage-associated variants of CLTCL1, which, like ZDHHC8, is located on 22q11.2; both genes are associated with DiGeorge syndrome (DGS). The deletion of 22q11.2 may cause conotruncal heart defects, such as tetralogy of Fallot (TOF), double outlet right ventricle (DORV) and pulmonary atresia/ventricular septal defect (PA/VSD) [43,44]. Thus far, the functions of ZDHHC8 and CLTCL1 in cardiovascular development remain unknown, and they might be newly associated with TAPVC pathogenesis.
Our study did have some limitations. First, the lack of parental samples limited our ability to study the genetic backgrounds of the variants. In addition, the functions of our candidate genes need to be further verified with fundamental research.
In summary, an effective analytical bioinformatics strategy allowed us to identify rare damage-associated variants in novel genes that may play a vital role in TAPVC pathology. Our candidate genes (CLTCL1, CST3, GXYLT1, HMGA2, SNAI1, VAV2, ZDHHC8) open new fields of investigation into TAPVC pathology and provide novel insights into pulmonary vein development.
  41 in total

Review 1.  The snail superfamily of zinc-finger transcription factors.

Authors:  M Angela Nieto
Journal:  Nat Rev Mol Cell Biol       Date:  2002-03       Impact factor: 94.444

2.  Notch promotes epithelial-mesenchymal transition during cardiac development and oncogenic transformation.

Authors:  Luika A Timmerman; Joaquín Grego-Bessa; Angel Raya; Esther Bertrán; José María Pérez-Pomares; Juan Díez; Sergi Aranda; Sergio Palomo; Frank McCormick; Juan Carlos Izpisúa-Belmonte; José Luis de la Pompa
Journal:  Genes Dev       Date:  2003-12-30       Impact factor: 11.361

3.  Total anomalous pulmonary venous connection.

Authors:  J T BURROUGHS; J E EDWARDS
Journal:  Am Heart J       Date:  1960-06       Impact factor: 4.749

4.  Haploview: analysis and visualization of LD and haplotype maps.

Authors:  J C Barrett; B Fry; J Maller; M J Daly
Journal:  Bioinformatics       Date:  2004-08-05       Impact factor: 6.937

5.  Vav family GEFs link activated Ephs to endocytosis and axon guidance.

Authors:  Christopher W Cowan; Yu Raymond Shao; Mustafa Sahin; Steven M Shamah; Michael Z Lin; Paul L Greer; Sizhen Gao; Eric C Griffith; Joan S Brugge; Michael E Greenberg
Journal:  Neuron       Date:  2005-04-21       Impact factor: 17.173

6.  Analysis of a Scottish founder effect narrows the TAPVR-1 gene interval to chromosome 4q12.

Authors:  Steven B Bleyl; Lorenzo D Botto; John C Carey; Luciana T Young; Michael J Bamshad; Mark F Leppert; Kenneth Ward
Journal:  Am J Med Genet A       Date:  2006-11-01       Impact factor: 2.802

7.  Total anomalous pulmonary venous connection: Report of 93 autopsied cases with emphasis on diagnostic and surgical considerations.

Authors:  G Delisle; M Ando; A L Calder; J R Zuberbuhler; S Rochenmacher; L E Alday; O Mangini; S Van Praagh; R Van Praagh
Journal:  Am Heart J       Date:  1976-01       Impact factor: 4.749

8.  Total anomalous pulmonary venous return in the fourth decade.

Authors:  Zekeriya Nurkalem; Sevket Gorgulu; Mehmet Eren; Mehmet Salih Bilal
Journal:  Int J Cardiol       Date:  2005-11-10       Impact factor: 4.164

9.  Prognostic value of cystatin C in acute heart failure in relation to other markers of renal function and NT-proBNP.

Authors:  Johan Lassus; Veli-Pekka Harjola; Reijo Sund; Krista Siirilä-Waris; John Melin; Keijo Peuhkurinen; Kari Pulkki; Markku S Nieminen
Journal:  Eur Heart J       Date:  2007-02-08       Impact factor: 29.983

Review 10.  The incidence of congenital heart disease.

Authors:  Julien I E Hoffman; Samuel Kaplan
Journal:  J Am Coll Cardiol       Date:  2002-06-19       Impact factor: 24.094
