NGS meta data analysis for identification of SNP and INDEL patterns in human airway transcriptome: A preliminary indicator for lung cancer.

Sathya B¹, Akila Parvathy Dharshini², Gopal Ramesh Kumar².

Abstract

High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. It is also an efficient way to discover coding SNPs, and it is particularly effective for SNP identification when multiple individuals with different genetic backgrounds are sequenced. The objective of this study was to discover SNPs and INDELs in the human airway transcriptome of healthy never smokers, healthy current smokers, smokers without lung cancer and smokers with lung cancer. A preliminary comparative analysis of these four data sets is expected to reveal SNP and INDEL patterns that may be linked to lung cancer. A total of 85,028 SNPs and 5738 INDELs in healthy never smokers, 32,671 SNPs and 1561 INDELs in healthy current smokers, 50,205 SNPs and 3008 INDELs in smokers without lung cancer and 51,299 SNPs and 3138 INDELs in smokers with lung cancer were identified. SNPs and INDELs in genes previously reported as differentially expressed were also analyzed. Smokers with lung cancer were found to carry SNPs at positions 62,186,542 and 62,190,293 in the SCGB1A1 gene and 180,017,251, 180,017,252, and 180,017,597 in the SCGB3A1 gene, and INDELs at position 35,871,168 in the NFKBIA gene and 180,017,797 in the SCGB3A1 gene. The SNPs identified in this study provide a resource for genetic studies in smokers and should contribute to the development of personalized medicine. This study is preliminary; more rigorous data analysis and wet lab validation are required.

Keywords: Airway transcriptome; INDEL; Lung cancer; Next generation sequencing (NGS); SNP; Secretoglobin

Year: 2014 PMID: 26937342 PMCID: PMC4745382 DOI: 10.1016/j.atg.2014.12.003

Source DB: PubMed Journal: Appl Transl Genom ISSN: 2212-0661

Introduction

Next-generation sequencing (NGS) technology has produced immense amounts of biological data and has shed light on the path towards personalized medicine (Liu et al., 2013). NGS is used extensively for applications such as de novo sequencing, disease mapping, and quantification of expression levels through RNA sequencing (Nielsen et al., 2011). A general application of NGS is the discovery of SNPs and other variations, which are then used downstream for linkage map construction, genetic diversity analyses, association mapping and marker-assisted selection. SNPs make up over 90% of all human genetic variation and are implicated in phenotypic differences, susceptibility to certain diseases and response to drugs. They also serve as popular biomarkers in pharmacogenomic studies aimed at understanding inter-individual differences in response to various treatments. Even synonymous SNPs can influence mRNA stability, protein conformation and protein regulation (Sauna and Kimchi-Sarfaty, 2011). Therefore, it is essential to obtain accurate SNP and INDEL profiles through advanced methods such as next-generation sequencing (Yu and Sun, 2013). (See Fig. 1.)

Fig. 1

Schematic representation of the workflow of the current study.

The airway epithelium constitutes an essential tissue barrier protecting the lung from inhaled environmental challenges. Tobacco smoking is the dominant cause of lung cancer, and as a result the epithelial cells of the respiratory tract are impaired in lung cancer patients (Beane et al., 2011a, Shields, 1999, Spira et al., 2004, Miyazu et al., 2005). Smoking creates a field of injury in the epithelial cells that line the respiratory tract and is a causative factor for chronic obstructive pulmonary disease and lung cancer, with 10% to 20% of smokers developing these diseases. Smoking-related alterations in gene and miRNA expression in the cytologically normal large and small airway epithelium have been demonstrated in microarray experiments. These expression alterations have been categorized by their degree of reversibility upon smoking cessation, providing insights into genomic changes that may account for persistent lung cancer risk. Similar gene expression alterations have been found in the epithelia of the nose and mouth of smokers (Guo et al., 2004). Early detection remains a serious challenge in lung cancer; anomalous changes in the immune system, persistent inflammation, and alterations in chemokine receptor signaling and cytokine trafficking are among the most essential features of disease pathogenesis (Beane et al., 2011b, Kunkel and Butcher, 2002, Keith and Miller, 2013, Schembri et al., 2009, Boyle et al., 2010, Beane et al., 2007, Zhang et al., 2010, Harvey et al., 2007). The genes S100A8 and S100A9, which are known to be involved in the inflammatory response in the lung, and CYP4F2, a member of the cytochrome P450 family of enzymes that play a role in xenobiotic pathways, were found to be upregulated in smokers by both RNA-Seq and qRT-PCR. Similarly, expression of the immunomodulatory genes CCL20, IL8, NFKBIA, and SCGB3A1 was found to be upregulated in the normal airway of patients with lung cancer (Beane et al., 2011b).

Genes involved in lung cancer were collected from the literature; the most relevant include EGFR, KRAS, MET, LKB1, BRAF, PIK3CA, ALK, RET, and ROS1. Other frequently mutated genes include tyrosine kinases such as the EGFR homolog ERBB4, multiple ephrin receptor genes such as EPHA3, and VEGFR2 (KDR) and NTRK. Recent advances in mutational analysis and molecularly targeted therapy have made it possible to develop new receptor kinase inhibitors such as erlotinib and gefitinib (against EGFR) and, most recently, crizotinib (against ALK), as well as antibodies such as cetuximab (against EGFR) and bevacizumab (against VEGF).

In the current study, SNPs and INDELs were analyzed in four categories of samples (healthy never smokers, current smokers, smokers with lung cancer and smokers without lung cancer) processed by RNA-Seq technology; the data were procured from Frank Schembri et al., 2011 (Beane et al., 2011a). Reads were pre-processed by quality checking and filtering of low-quality reads. The processed paired-end reads were mapped to the reference genome hg19. Reads that mapped to the reference were extracted, and duplicate reads were removed to eliminate false SNPs and INDELs. SNP calling was run with default parameters, and the calls were annotated to predict the functional impact of the variants. Genes that were differentially expressed in these samples, as shown by differential expression analysis and substantiated by microarray, were reported earlier (Beane et al., 2011a). The variants present in the samples, and the SNPs/INDELs that may contribute to lung cancer in smokers, were identified. This is only a preliminary analysis; further in-depth data analysis and wet lab experiments are needed for validation.

Materials and methods

Source of NGS data

The mRNA/transcriptome sequences of human airway epithelial cells were obtained during bronchoscopy in subjects undergoing lung nodule resection surgery. These data were retrieved from a prior study (Beane et al., 2011a) and are listed in Table 1. The reads were paired-end and were sequenced with a pooled-sample approach using Illumina technology.

Table 1

Details of samples used for NGS data analysis.

Data accession	Category	Details	Age	Status
SRR192333	Healthy never smokers	3 female	~ 29	Healthy normal
SRR192334	Current smokers	1 male 2 female	~ 41.7	Smokers
SRR192335	Smokers without lung cancer	1 male 2 female	~ 49	2 former and 1 current smokers
SRR192336	Smokers with lung cancer	2 male 1 female	~ 64.7	2 former and 1 current smokers with cancer

Differential expression analysis was previously performed on these samples. The genes found to be differentially expressed between these samples by RNA differential expression analysis and microarray analysis are listed in Table 2. Pathways and molecular functions such as oxidoreductase activity, metabolism of xenobiotics by cytochrome P450 and retinol metabolism were enriched among genes differentially expressed between current and never smokers. Cytokine–cytokine receptor interaction, the chemokine signaling pathway, and cell adhesion molecules were enriched among genes differentially expressed between smokers with and without cancer. Differential expression of these genes may also be induced by other alterations such as fusion genes, copy number variations and methylation.

Table 2

List of differentially expressed genes across sample categories.

Gene symbol	Gene name	Location	Gene expression/sample
S100A8	Calgranulin A	Chr1: 153,362,50–153,363,664	Upregulation in smokers
S100A9	Calgranulin B	Chr1: 153,330,330–153,333,503	Upregulation in smokers
CYP4F2	Cytochrome P450	Chr19: 15,988,834–16,008,885	Upregulation in smokers
NFKBIA	NFKB inhibitor	Chr14: 35,870,717–35,873,952	Upregulation in smokers with lung cancer
SCGB1A1	Secretoglobin	Chr11: 62,172,575–62,190,667	Differentially expressed in smokers with lung cancer
SCGB3A1	Secretoglobin	Chr5: 180,017,103–180,018,540	Upregulation in smokers with lung cancer
CCL20	Chemokine	Chr2: 228,678,558–228,682,272	Upregulation in smokers with lung cancer
IL8	Interleukin 8	Chr4: 74,606,223–74,609,433	Upregulation in smokers with lung cancer
RP11-295J3.2	ncRNA	Chr10: 127,660,757–127,661,695	Downregulation in smokers with lung cancer and smokers
CTD-2325P2.2	Pseudogene	Chr14: 69,159,807–69,160,300	Upregulation in smokers with lung cancer, downregulation in smokers

Data retrieval

The next-generation paired-end sequencing data were obtained from the DDBJ DRA database. The data are freely accessible under the consecutive accession numbers SRR192333, SRR192334, SRR192335 and SRR192336, which correspond to the four sample categories analyzed in this investigation: healthy never smokers, healthy current smokers, smokers without lung cancer and smokers with lung cancer.

Pre-processing

Next-generation sequencing generates a large amount of sequence data in a single experimental run. These data contain sequence artifacts, including read errors, poor-quality reads and primer/adaptor contamination, all of which affect downstream analysis. Data quality is therefore decisive; poor-quality data may lead to spurious conclusions. The first step after completing a sequencing run is to assess base quality and to trim or correct bases that do not meet the required specification. Pre-processing was carried out with the NGSQC toolkit to assess the quality of the data, examine the nucleotide distribution, and filter out low-quality reads based on sequence composition (Patel and Jain, 2012).
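The filtering step described above can be illustrated with a minimal Python sketch. It is not the NGSQC toolkit's implementation: the function names, the Phred cutoff of 20, the 70% threshold and the Phred+33 encoding are all assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def passes_quality(quality: str, min_phred: int = 20,
                   min_fraction: float = 0.7, offset: int = 33) -> bool:
    """True if at least min_fraction of the bases have Phred >= min_phred.

    The cutoffs and the Phred+33 offset are illustrative assumptions.
    """
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_phred)
    return good / len(quality) >= min_fraction


def filter_reads(lines: List[str]) -> List[Record]:
    """Keep only the records whose quality string passes the filter."""
    return [rec for rec in parse_fastq(lines) if passes_quality(rec[2])]


# Two made-up reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "ACGTACGTAC", "+", "IIII######",
]
kept = filter_reads(example)
print([header for header, _, _ in kept])  # ['@read1']
```

Only the first read survives, because just 4 of the 10 bases in the second read reach the assumed cutoff.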

Alignment/mapping

Various groups have developed algorithms and software packages for alignment and mapping. In this work, we used the Burrows–Wheeler Aligner to align short sequencing reads against the human reference genome hg19 (Pabinger et al., 2013, Li and Durbin, 2009). The properly mapped reads were extracted using the Filter SAM program in SAMtools, and duplicates were discarded using MarkDuplicates in the Picard tools (Li et al., 2009).
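The idea behind duplicate removal can be sketched in a few lines of Python. This is a simplification under stated assumptions and not Picard's algorithm: reads are treated as single-end, duplicates are defined as reads sharing chromosome, start position and strand, and the choice of keeping the read with the highest mapping quality is an illustrative tie-breaking rule. The `Read` type and the example reads are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal read representation for illustration only.
Read = namedtuple("Read", "name chrom pos strand mapq")


def remove_duplicates(reads):
    """Keep one read per (chrom, pos, strand), preferring the highest mapq."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    # Sort for a deterministic output order.
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))


reads = [
    Read("r1", "chr5", 1000, "+", 60),
    Read("r2", "chr5", 1000, "+", 37),   # same start and strand as r1
    Read("r3", "chr5", 1000, "-", 60),   # opposite strand, not a duplicate
    Read("r4", "chr11", 2000, "+", 60),
]
unique = remove_duplicates(reads)
print([r.name for r in unique])  # ['r4', 'r1', 'r3']
```

Removing such reads matters for variant calling because PCR copies of one fragment would otherwise be counted as independent evidence for the same base.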

Variant calling

SNPs, INDELs and structurally divergent regions were identified by variant calling between each of the four sample categories and the reference genome. In this study, variant calling was performed with the Genome Analysis Toolkit Unified Genotyper (GATK-UGT) (DePristo et al., 2011), and alignments were recalibrated around insertion/deletion regions according to Phred-scaled quality scores. GATK-UGT generates output in VCF format. Along with each SNP and its position, it reports additional information about the called SNPs, such as quality by depth (coverage), mapping quality, read depth, and genotype quality, which together represent the quality of the call (Ruffalo et al., 2011, Yu and Sun, 2013).
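Phred-scaled scores relate a quality value Q to an error probability P through Q = -10 log10(P). The short Python sketch below shows the conversion in both directions; the function names are illustrative.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```

A score of 30 therefore corresponds to an estimated 1 in 1000 chance that the call is wrong, and a score of 20 to 1 in 100.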

Variant annotation

Predicting the functional impact of variants in an automated fashion is becoming increasingly critical. The following tools were used for annotation. ANNOVAR is an efficient tool for elucidating the functional consequences of genetic variation; it annotates variants by gene, location and type of variation (Wang et al., 2010). SIFT was used to predict the effect of amino acid substitutions on protein function (Ng and Henikoff, 2001). PolyPhen-2 predicts the impact of structural changes, and in turn how they affect protein function, using sequence divergence and homology-based phylogenetic methods (Adzhubei et al., 2011).

Results and discussion

A simple pipeline for detecting SNPs and INDELs includes pre-processing the sequence data and filtering low-quality bases, mapping reads to the human reference genome, and post-processing the alignment results in order to determine the effect of each variation. Initially, sequencing quality was examined using the FastQC tool. The NGSQC toolkit was used with default parameters to filter out low-quality reads and discard reads contaminated with primer/adapter sequence. After filtering on quality score, 20.2 million reads in healthy never smokers, 10.3 million reads in healthy current smokers, 10.8 million reads in smokers without lung cancer and 10.7 million reads in smokers with lung cancer were retained and used for further analysis.

Alignment

Short sequencing reads were mapped to the annotated human reference genome (hg19) using BWA with default parameters. Properly mapped reads were separated from unmapped reads using Filter SAM by setting the flag values in SAMtools. The main reasons for unmapped reads are sequencing errors, uneven quality of the sample preparations, physical gaps in the reference and the mapping criteria applied. SAMtools flagstat was used to compute basic statistics on the binary alignment (BAM) files. Of 44,503,612 reads in healthy never smokers, about 89.36% were aligned against hg19 and 72.95% were properly mapped. Of 27,548,608 reads in healthy current smokers, about 87.74% were aligned and 71.48% were properly mapped. Of 37,108,950 reads in smokers without lung cancer, about 90.72% were aligned and 77.13% were properly mapped. Of 35,174,558 reads in smokers with lung cancer, about 90.92% were aligned and 77% were properly mapped.

After alignment, SAMtools was used to remove duplicate reads (Li et al., 2009). Short-read alignment tools often misalign reads around INDELs, which in many cases results in mismatches. Local realignment around INDELs improves the accuracy of INDEL calling: it eliminates millions of mostly false-positive variants while preserving nearly all truly variable sites. To obtain the best possible call set, realignment and recalibration were performed using the IndelRealigner and TableRecalibration tools incorporated within GATK. The recalibrated alignment files were then used for SNP discovery.
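Filtering on flag values works because each SAM record carries a bitwise FLAG field. The following Python sketch decodes the three bits relevant here (0x2 properly paired, 0x4 unmapped, 0x400 duplicate, as defined in the SAM specification) and computes flagstat-style percentages. The helper names and the example flag values are illustrative, not output from this study.

```python
PROPER_PAIR = 0x2   # each segment properly aligned according to the aligner
UNMAPPED = 0x4      # segment unmapped
DUPLICATE = 0x400   # PCR or optical duplicate


def is_properly_mapped(flag: int) -> bool:
    """True if the read is mapped and part of a proper pair."""
    return bool(flag & PROPER_PAIR) and not (flag & UNMAPPED)


def is_duplicate(flag: int) -> bool:
    """True if the duplicate bit is set."""
    return bool(flag & DUPLICATE)


def summarize(flags):
    """Return the percentage of mapped and properly mapped reads."""
    total = len(flags)
    if total == 0:
        return {"total": 0, "mapped_pct": 0.0, "proper_pct": 0.0}
    mapped = sum(1 for f in flags if not f & UNMAPPED)
    proper = sum(1 for f in flags if is_properly_mapped(f))
    return {
        "total": total,
        "mapped_pct": round(100 * mapped / total, 2),
        "proper_pct": round(100 * proper / total, 2),
    }


# Made-up flags: 99 and 147 form a proper pair, 77 and 141 are unmapped,
# 1123 is a proper-pair duplicate, 97 is mapped but not properly paired.
flags = [99, 147, 77, 141, 1123, 97]
stats = summarize(flags)
print(stats)  # {'total': 6, 'mapped_pct': 66.67, 'proper_pct': 50.0}
```

The two percentages correspond to the "aligned" and "properly mapped" figures reported per sample.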

SNP/INDEL calling and annotation

The Genome Analysis Toolkit Unified Genotyper (GATK-UGT), a variant discovery tool developed at the Broad Institute and used in the 1000 Genomes Project, was used to identify SNPs and INDELs. GATK shows a relatively higher positive calling rate and sensitivity than other callers, and it tends to call more SNPs with a lower false detection rate. The number of SNPs identified using GATK-UGT is given in Table 3.

Table 3

Number of SNPs present in four categories.

Category	SNP	Transition	Transversion	Ti/Tv ratio
Healthy never smokers	85,028	55,314	29,714	1.86
Healthy current smokers	32,671	21,185	11,486	1.84
Smokers without lung cancer	50,205	32,820	17,385	1.88
Smokers with lung cancer	51,299	33,063	18,236	1.81
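The Ti/Tv column of Table 3 can be reproduced with a short Python sketch. Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes; everything else is a transversion. Of the 12 possible substitutions, 4 are transitions and 8 are transversions, so uniformly random errors would give a ratio of 0.5. The function names are illustrative.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps) -> float:
    """Compute the transition/transversion ratio for (ref, alt) pairs."""
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    return ti / tv


# Expected ratio if every substitution were equally likely.
all_substitutions = list(permutations("ACGT", 2))  # 12 ordered pairs
print(ti_tv_ratio(all_substitutions))  # 0.5

# Ratio for healthy never smokers from the counts in Table 3.
print(round(55314 / 29714, 2))  # 1.86
```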

It has been reported that the Ti/Tv ratio for random variation resulting from systematic errors in the sequencing technology, alignment artifacts and data processing failures should be close to 0.5 (Ni et al., 2012, Ding et al., 2010). In the current study, the transition to transversion ratio ranges from 1.81 to 1.88, which indicates that the SNPs likely result from true nucleotide polymorphism. SNPs were sorted into functional classes as missense, nonsense and silent mutations. Missense mutations were more frequent than silent and nonsense mutations. The number of missense mutations was 8571 in healthy never smokers, 3005 in healthy current smokers, 3725 in smokers without lung cancer and 3930 in smokers with lung cancer. Further analysis of these missense mutations may identify mutated genes responsible for disease. Using ANNOVAR, annotation was performed against dbSNP to identify known SNPs. SNPs not found in dbSNP were considered novel. The numbers of known and novel SNPs are given in Table 4.

INDELs were called using GATK with default parameters. All QUAL scores were greater than 30 and read depth ranged from approximately 20 to 250, criteria commonly used for calling variants in GATK. The number of insertions was about double the number of deletions in all samples. INDEL annotation was performed using ANNOVAR to identify known and novel INDELs, which are also given in Table 4.
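The QUAL and read-depth criteria mentioned above can be expressed as a simple filter over VCF data lines. The Python sketch below is an illustration only: the function names are assumptions, and the three records use a placeholder chromosome with invented positions, QUAL and DP values rather than calls from this study.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'DP=45;MQ=60.0' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_filters(vcf_line: str, min_qual: float = 30.0,
                   min_depth: int = 20, max_depth: int = 250) -> bool:
    """Apply QUAL > min_qual and min_depth <= DP <= max_depth."""
    columns = vcf_line.rstrip("\n").split("\t")
    qual = float(columns[5])                           # QUAL is column 6
    depth = int(parse_info(columns[7]).get("DP", 0))   # INFO is column 8
    return qual > min_qual and min_depth <= depth <= max_depth


# Invented records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
records = [
    "chrN\t100\t.\tC\tA\t87.3\t.\tDP=45;MQ=60.0",
    "chrN\t200\t.\tC\tG\t22.1\t.\tDP=45;MQ=60.0",   # fails QUAL
    "chrN\t300\t.\tT\tC\t95.0\t.\tDP=12;MQ=60.0",   # fails depth
]
kept = [r.split("\t")[1] for r in records if passes_filters(r)]
print(kept)  # ['100']
```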

Table 4

Number of known and novel SNPs/INDELs present in all four categories.

Category	Total SNPs	Known SNPs	Novel SNPs	Total INDELs	Known INDELs	Novel INDELs
Healthy never smokers	85,028	37,635	47,393	5738	2305	3433
Healthy current smokers	32,671	14,396	18,275	1561	654	910
Smokers without lung cancer	50,205	21,914	28,291	3008	1300	1708
Smokers with lung cancer	51,299	21,451	29,848	3138	1363	1775
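Since every variant is classified as either known or novel, the known and novel counts in Table 4 should add up to the totals. A small Python check, with the figures copied from the table and an illustrative function name, is sketched below.

```python
# (total, known, novel) per category, copied from Table 4.
table4 = {
    "Healthy never smokers": {
        "SNP": (85028, 37635, 47393), "INDEL": (5738, 2305, 3433)},
    "Healthy current smokers": {
        "SNP": (32671, 14396, 18275), "INDEL": (1561, 654, 910)},
    "Smokers without lung cancer": {
        "SNP": (50205, 21914, 28291), "INDEL": (3008, 1300, 1708)},
    "Smokers with lung cancer": {
        "SNP": (51299, 21451, 29848), "INDEL": (3138, 1363, 1775)},
}


def inconsistent_rows(table):
    """Return (category, kind, total, known + novel) where the sums differ."""
    issues = []
    for category, kinds in table.items():
        for kind, (total, known, novel) in kinds.items():
            if known + novel != total:
                issues.append((category, kind, total, known + novel))
    return issues


print(inconsistent_rows(table4))
# [('Healthy current smokers', 'INDEL', 1561, 1564)]
```

All SNP rows are consistent. The INDEL counts for healthy current smokers differ by three, and the table alone does not show which of the three figures is off.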

Annotation was performed by genomic location; the SNPs and INDELs were distributed across exonic, intronic, intergenic, downstream, upstream, UTR3, UTR5, and splicing regions. SNPs/INDELs in exonic regions are the most influential, because variation in the coding region can render the protein non-functional; they are listed in Table 5. INDELs found in exonic regions were classified as frameshift deletions and insertions or non-frameshift deletions and insertions, and are also listed in Table 5. Frameshift mutations result in abnormal protein products and change protein function and regulation.

Table 5

SNPs/INDELs present in the exonic/functional region.

Category	Synonymous-SNP	Non-synonymous-SNP	Frameshift deletion	Frameshift insertion	Non-frameshift deletion	Non-frameshift insertion
Healthy never smokers	3116	3234	71	230	10	24
Healthy current smokers	1419	1429	23	71	4	9
Smokers without lung cancer	1830	2061	46	107	4	19
Smokers with lung cancer	1902	2058	48	116	5	13
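The four INDEL classes in Table 5 follow from the length difference between the reference and alternate alleles: a coding INDEL whose length is not a multiple of three shifts the reading frame. A minimal Python sketch, with an illustrative function name, is shown below; it applies only to INDELs inside coding sequence.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Classify a coding INDEL from its REF and ALT alleles."""
    diff = len(alt) - len(ref)
    if diff == 0:
        raise ValueError("REF and ALT have equal length; not an INDEL")
    kind = "insertion" if diff > 0 else "deletion"
    # A length change divisible by 3 adds or removes whole codons.
    frame = "non-frameshift" if abs(diff) % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


print(classify_indel("G", "GA"))      # frameshift insertion
print(classify_indel("T", "TCAG"))    # non-frameshift insertion
print(classify_indel("GACT", "G"))    # non-frameshift deletion
print(classify_indel("GAC", "G"))     # frameshift deletion
```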

Analysis of differentially expressed genes

SNPs and INDELs were found in the genes S100A8, S100A9, IL8, NFKBIA, SCGB1A1 and SCGB3A1. No SNPs or INDELs were found in CYP4F2, CCL20, RP11-295J3.2, or CTD-2325P2.2. S100A8 and S100A9 encode calcium- and zinc-binding proteins that play a prominent role in regulating the inflammatory response as well as in cancer development, and differential expression of these genes is associated with cystic fibrosis (Ni et al., 2012, Ding et al., 2010, Ding and Kaminsky, 2003, Lim et al., 2009, Hsu et al., 2009, Henke et al., 2006). IL8 plays an essential role in the pathogenesis of bronchiolitis, a common respiratory tract disease. NFKBIA is involved in the inflammatory response and in cancer. SCGB1A1 is a member of the secretoglobin family, and defects in this gene are associated with susceptibility to asthma, lung disease and respiratory failure (Sjodin et al., 2003). SCGB3A1 is involved in the regulation of cell proliferation and has cytokine activity. The number of SNPs and INDELs found within these differentially expressed genes is given in Table 6.

Table 6

Number of SNPs and INDELs present in differentially expressed genes.

Category	SNPs	INDELs
Healthy never smokers	39	14
Healthy current smokers	29	3
Smokers without lung cancer	27	7
Smokers with lung cancer	43	10
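Restricting variants to the differentially expressed genes amounts to an interval lookup against the gene coordinates in Table 2. The Python sketch below uses four of those coordinate ranges; the function name and the linear scan are illustrative choices, and a real analysis over many genes would more likely use an interval index.

```python
# Gene -> (chromosome, start, end), hg19 coordinates copied from Table 2.
GENES = {
    "S100A9": ("1", 153330330, 153333503),
    "IL8": ("4", 74606223, 74609433),
    "SCGB1A1": ("11", 62172575, 62190667),
    "SCGB3A1": ("5", 180017103, 180018540),
}


def gene_at(chrom: str, pos: int):
    """Return the gene whose interval contains the position, else None."""
    for gene, (gene_chrom, start, end) in GENES.items():
        if gene_chrom == chrom and start <= pos <= end:
            return gene
    return None


print(gene_at("5", 180017251))   # SCGB3A1
print(gene_at("11", 62186542))   # SCGB1A1
print(gene_at("2", 1000))        # None
```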

The S100A8, S100A9, IL8, NFKBIA and SCGB3A1 genes were upregulated in disease pathogenesis. The variations shared among smokers with lung cancer, smokers without lung cancer and current smokers are listed by sample category in Table 7. These variations were not observed in healthy never smokers, and several of them lie in regulatory regions where they may affect regulation of the gene.

Table 7

SNPs and INDELs present in different categories.

Chr	Position	dbSNP	Ref/Alt	Category	Location	Gene	Type
1	153,333,376	Novel	A/C	CS + SNL + SL	UTR3	S100A9	NA
4	74,606,669	rs2227307	T/G	CS + SNL + SL	Intronic	IL8	NA
1	153,362,719	Novel	T/TC	CS + SNL + SL	Intronic	S100A8	Frameshift insertion
4	74,607,910	rs2227543	C/T	SNL + SL	Intronic	IL8	NA
4	74,608,162	Novel	T/A	SNL + SL	Intronic	IL8	NA
4	74,608,163	Novel	T/G	SNL + SL	Intronic	IL8	NA
4	74,608,408	Novel	C/CA	SNL + SL	UTR3	IL8	NA
11	62,186,542	rs3741240	G/A	SL	UTR5	SCGB1A1	NA
11	62,190,293	rs191704193	G/A	SL	Intronic	SCGB1A1	NA
5	180,017,251	Novel	C/A	SL	Exonic	SCGB3A1	Nonsynonymous SNV
5	180,017,252	Novel	C/G	SL	Exonic	SCGB3A1	Synonymous SNV
5	180,017,597	Novel	T/C	SL	Intronic	SCGB3A1	NA
14	35,871,168	Novel	C/CT	SL	UTR3	NFKBIA	NA
5	180,017,797	Novel	G/GA	SL	Exonic	SCGB3A1	Frameshift insertion

NA: not available, CS: current smokers, SNL: smokers with no lung cancer, SL: smokers with lung cancer.
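The Category column of Table 7 reflects a set comparison across the four call sets. The Python sketch below shows the operation on five rows of the table; the never-smoker set is left empty because the sketch includes only Table 7 variants, none of which were observed in that group. The function name is illustrative.

```python
# Variants as (chromosome, position, ref, alt), taken from Table 7.
shared_all = {("1", 153333376, "A", "C"), ("4", 74606669, "T", "G")}
shared_snl_sl = {("4", 74607910, "C", "T")}
sl_only = {("11", 62186542, "G", "A"), ("5", 180017251, "C", "A")}

samples = {
    "NS": set(),                                  # healthy never smokers
    "CS": set(shared_all),                        # current smokers
    "SNL": shared_all | shared_snl_sl,            # smokers, no lung cancer
    "SL": shared_all | shared_snl_sl | sl_only,   # smokers with lung cancer
}


def private_to(target: str, call_sets: dict) -> set:
    """Variants present in the target sample and in no other sample."""
    others = set().union(
        *(calls for name, calls in call_sets.items() if name != target))
    return call_sets[target] - others


print(sorted(private_to("SL", samples)))
# [('11', 62186542, 'G', 'A'), ('5', 180017251, 'C', 'A')]
```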

Comparing smokers with and without lung cancer revealed some differences in the IL8 regulatory region. Across all samples, some variations were present only in smokers with lung cancer. The effect of non-synonymous variants can be predicted using SIFT and PolyPhen-2. SIFT predicts the potential impact of an amino acid substitution on protein function based on sequence homology. PolyPhen-2 also predicts the effect of non-synonymous variation, using both homologous sequences and structure-based features. The SNP at position 180,017,251 changes alanine to serine and is predicted to be deleterious by PolyPhen-2. The frameshift insertion at position 180,017,797 is predicted to damage protein function. The above SNPs were not reported in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Forbes et al., 2011).

Conclusion

Transcriptome analysis using next-generation sequencing is an efficient and cost-effective approach for identifying SNPs and INDELs. In the current study, our main focus was to analyze the variants present in genes that had previously been reported as differentially expressed. We found 5 SNPs and 2 INDELs in the SCGB1A1, SCGB3A1 and NFKBIA genes that were present only in smokers with lung cancer. This set of SNPs and INDELs in a smoker may therefore serve as a preliminary signature of disease pathogenesis. Understanding an individual's genetic makeup is believed to play a key role in personalized medicine, helping to maximize drug efficacy and minimize adverse side effects. In the future, an individual's genome will be sequenced to predict that individual's future health and to develop personalized medical treatments tailored to the genetic variation detected (Mullaney et al., 2010). Extending this study to a larger population will be useful for developing personalized medicine. We plan to study the stability of the wild-type and mutant secretoglobin proteins using extensive molecular dynamics simulations in order to determine the effect of the frameshift and non-synonymous mutations at the structural level.

References

1. Anti-infective protective properties of S100 calgranulins.

Authors: Kenneth Hsu; Chantrakorn Champaiboon; Brian D Guenther; Brent S Sorenson; Ali Khammanivong; Karen F Ross; Carolyn L Geczy; Mark C Herzberg
Journal: Antiinflamm Antiallergy Agents Med Chem Date: 2009-12-04

2. Understanding the contribution of synonymous mutations to human disease.

Authors: Zuben E Sauna; Chava Kimchi-Sarfaty
Journal: Nat Rev Genet Date: 2011-08-31

3. Chemokines and the tissue-specific migration of lymphocytes.

Authors: Eric J Kunkel; Eugene C Butcher
Journal: Immunity Date: 2002-01

4. Molecular epidemiology of lung cancer.

Authors: P G Shields
Journal: Ann Oncol Date: 1999

5. Analysis of next-generation genomic data in cancer: accomplishments and challenges.

Authors: Li Ding; Michael C Wendl; Daniel C Koboldt; Elaine R Mardis
Journal: Hum Mol Genet Date: 2010-09-15

6. Genotype and SNP calling from next-generation sequencing data.

Authors: Rasmus Nielsen; Joshua S Paul; Anders Albrechtsen; Yun S Song
Journal: Nat Rev Genet Date: 2011-06

7. Telomerase expression in noncancerous bronchial epithelia is a possible marker of early development of lung cancer.

Authors: Yuka Matsuoka Miyazu; Teruomi Miyazawa; Keiko Hiyama; Noriaki Kurimoto; Yasuo Iwamoto; Hiroo Matsuura; Koji Kanoh; Nobuoki Kohno; Masahiko Nishiyama; Eiso Hiyama
Journal: Cancer Res Date: 2005-11-01

8. Human extrahepatic cytochromes P450: function in xenobiotic metabolism and tissue-selective chemical toxicity in the respiratory and gastrointestinal tracts.

Authors: Xinxin Ding; Laurence S Kaminsky
Journal: Annu Rev Pharmacol Toxicol Date: 2002-01-10

9. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08

10. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

Authors: Simon A Forbes; Nidhi Bindal; Sally Bamford; Charlotte Cole; Chai Yin Kok; David Beare; Mingming Jia; Rebecca Shepherd; Kenric Leung; Andrew Menzies; Jon W Teague; Peter J Campbell; Michael R Stratton; P Andrew Futreal
Journal: Nucleic Acids Res Date: 2010-10-15
