Angela Cánovas, Gonzalo Rincon, Alma Islas-Trejo, Saumya Wickramasinghe, Juan F Medrano.
Abstract
High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it is also an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples, corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and to prevent false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.
Year: 2010 PMID: 21057797 PMCID: PMC3002166 DOI: 10.1007/s00335-010-9297-z
Source DB: PubMed Journal: Mamm Genome ISSN: 0938-8990 Impact factor: 2.957
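The headline counts in the abstract can be turned into proportions with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the percentages are derived here rather than reported by the authors, and the total number of genes analyzed is only an estimate because the abstract gives the 70% figure as approximate.

```python
# Counts as stated in the abstract.
genes_expressed = 19_175
expressed_fraction = 0.70      # "approximately 70%", so the estimate below is rough
snps_total = 100_734           # SNPs detected in the Holstein samples
snps_polymorphic = 33_045      # SNPs polymorphic within Holstein cows
sanger_snps = 86               # SNPs in the Sanger comparison set
snps_both = 70                 # SNPs detected by both RNA-Seq and Sanger


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of the Sanger comparison SNPs also detected by RNA-Seq.
concordance_pct = pct(snps_both, sanger_snps)

# Share of all detected SNPs that are polymorphic within Holsteins.
polymorphic_pct = pct(snps_polymorphic, snps_total)

# Rough back-calculation of the number of genes analyzed (assumes exactly 70%).
genes_analyzed_estimate = round(genes_expressed / expressed_fraction)

print(f"Detected by both methods: {concordance_pct}% of {sanger_snps} SNPs")
print(f"Polymorphic within Holstein: {polymorphic_pct}% of {snps_total} SNPs")
print(f"Estimated genes analyzed: about {genes_analyzed_estimate}")
```

Read this way, roughly four of every five SNPs in the comparison set were recovered by both technologies, and about one third of all detected SNPs segregate within the Holstein samples.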
Summary of mapping all the RNA-Seq reads to the reference genome (Btau4.0) obtained from seven pooled milk samples
| Read category | Uniquely mapped reads (no.) | Uniquely mapped reads (%) | Nonspecifically mapped reads (no.) | Nonspecifically mapped reads (%) | Mapped reads (no.) | Mapped reads (%) |
|---|---|---|---|---|---|---|
| Total exon reads | 59888107 | 83 | 12480476 | 17 | 72368583 | 87.5 |
| Exon-exon reads^a | 10961868 | 89 | 1313183 | 11 | 12275051 | |
| Total intron reads | 9100877 | 88 | 1271041 | 12 | 10371918 | 12.5 |
| Exon-intron reads^b | 1475160 | 90 | 166238 | 10 | 1641398 | |
| Total gene reads | 68988984 | 83 | 13751517 | 17 | 82740501 | 100 |
a Exon-exon reads: reads mapping to two contiguous exons. This number is included in total exon reads
b Exon-intron reads: reads mapping to an exon and a contiguous intron. This number is included in total intron reads
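The percentages in the mapping summary follow directly from the read counts, and recomputing them is a quick consistency check. The sketch below copies the counts from the table; the dictionary layout and function name are hypothetical, and rounding to whole percentages (one decimal for the exon share) is an assumption chosen to match how the table is printed.

```python
# (uniquely mapped, nonspecifically mapped) read counts copied from the table.
reads = {
    "Total exon reads": (59_888_107, 12_480_476),
    "Exon-exon reads": (10_961_868, 1_313_183),
    "Total intron reads": (9_100_877, 1_271_041),
    "Exon-intron reads": (1_475_160, 166_238),
}


def summarize(unique, nonspecific):
    """Return (mapped reads, % uniquely mapped, % nonspecifically mapped)."""
    mapped = unique + nonspecific
    return mapped, round(100 * unique / mapped), round(100 * nonspecific / mapped)


summary = {name: summarize(*counts) for name, counts in reads.items()}

# Exon-exon and exon-intron reads are already included in the exon and intron
# totals, so only the two totals are summed to obtain the gene-level count.
total_gene_reads = summary["Total exon reads"][0] + summary["Total intron reads"][0]
exon_share = round(100 * summary["Total exon reads"][0] / total_gene_reads, 1)
intron_share = round(100 * summary["Total intron reads"][0] / total_gene_reads, 1)

for name, (mapped, pct_unique, pct_nonspecific) in summary.items():
    print(f"{name}: {mapped} mapped, {pct_unique}% unique, {pct_nonspecific}% nonspecific")
print(f"Total gene reads: {total_gene_reads} (exon {exon_share}%, intron {intron_share}%)")
```

The recomputed values agree with the table: exon and intron totals add up to the total gene reads, and the exon and intron shares sum to 100%.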
KASPar primers used to validate SNPs detected by RNA-Seq
| Primer name | Sequence 5′ → 3′ | Position^a |
|---|---|---|
| DDIT3_60417924_A | GAAGGTCGGAGTCAACGGATTGGACTTCAGCCTTTAATATTGGAGAAA | I2 |
| DDIT3_60417924_T | GAAGGTGACCAAGTTCATGCTGGACTTCAGCCTTTAATATTGGAGAAT | |
| DDIT3_60417924 | CCATGGGATTTTCCAGGCAAGAGTA | I2 |
| INSIG2_73132468_C | GAAGGTCGGAGTCAACGGATTAAGCACTCTTATAGTCTGCATGACG | I8 |
| INSIG2_73132468_T | GAAGGTGACCAAGTTCATGCTAAAGCACTCTTATAGTCTGCATGACA | |
| INSIG2_73132468 | ATATCGTATCACAGTGTTGATGTGCCAAA | I8 |
| STAT5A_43749704_G | GAAGGTGACCAAGTTCATGCTCGAGCACCGGGTCAGGGC | I20 |
| STAT5A_43749704_A | GAAGGTCGGAGTCAACGGATTCGAGCACCGGGTCAGGGT | |
| STAT5A_43749704 | GCAGGCCAGCTCCCTCTGATA | E20 |
| STAT5A_43746587_C | GAAGGTCGGAGTCAACGGATTCGCCTGGAAGTTTGACTCTCC | E15 |
| STAT5A_43746587_G | GAAGGTGACCAAGTTCATGCTCGCCTGGAAGTTTGACTCTCG | |
| STAT5A_43746587 | GGAGTGTGGGCAATGCAGGGAA | I16 |
| STAT5A_43741732_T | GAAGGTGACCAAGTTCATGCTCTCCGCCAACTTCTCACACCT | I18 |
| STAT5A_43741732_A | GAAGGTCGGAGTCAACGGATTCTCCGCCAACTTCTCACACCA | |
| STAT5A_43741732 | GGCCCTGGGGCTCGGGTT | E18 |
a Position according to exon/intron distribution in the bovine gene (E, exon; I, intron)
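Each allele-specific primer in the table begins with one of two shared 21-base 5′ tails, followed by a target-specific sequence whose 3′ terminal base distinguishes the two alleles. The sketch below separates those parts for the DDIT3 primer pair. The function name is hypothetical, and labelling the tails as FAM-type and HEX-type follows the usual KASP convention rather than anything stated in the source, so treat the dye labels as an assumption.

```python
# Shared 5' tails observed in the allele-specific primers of the table.
# The dye labels are an assumption based on common KASP usage.
TAILS = {
    "GAAGGTGACCAAGTTCATGCT": "FAM-type",
    "GAAGGTCGGAGTCAACGGATT": "HEX-type",
}

# The two DDIT3 allele-specific primers, copied from the table.
primers = {
    "DDIT3_60417924_A": "GAAGGTCGGAGTCAACGGATTGGACTTCAGCCTTTAATATTGGAGAAA",
    "DDIT3_60417924_T": "GAAGGTGACCAAGTTCATGCTGGACTTCAGCCTTTAATATTGGAGAAT",
}


def split_primer(sequence):
    """Return (tail label, target-specific part, 3' terminal base)."""
    for tail, label in TAILS.items():
        if sequence.startswith(tail):
            target = sequence[len(tail):]
            return label, target, target[-1]
    raise ValueError("sequence does not start with a known tail")


for name, sequence in primers.items():
    label, target, last_base = split_primer(sequence)
    print(f"{name}: tail={label}, target={target}, 3' base={last_base}")
```

For this pair the target-specific parts are identical except for the final base. Other pairs in the table are not always this regular (the INSIG2 primers differ by one extra 5′ base), and in some pairs the 3′ base is the complement of the allele in the primer name, which suggests those primers were designed on the opposite strand.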
Fig. 1 Total number of SNPs per gene expressed in milk cells mapped to the Btau4.0 genome assembly. Red dots represent the number of SNPs per gene in coding regions that are different between Hereford and Holstein. Blue dots represent the SNPs per gene that are polymorphic in Holsteins
List of 70 SNPs in 18 genes validated in coding regions using RNA-Seq and Sanger sequencing
| Gene | BTA | SNP location^a | Allele variation | Frequency |
|---|---|---|---|---|
| | 10 | 21091039 | A/T | 50/50 |
| | | 21093914 | A/T | 60/40 |
| | | 21095171 | A/T | 60/40 |
| | | 21096245 | C/G | 60/40 |
| | | 21096645 | T/A | 60/40 |
| | 22 | 50657856 | G/T | 50/50 |
| | | 50659783 | T/C | 60/40 |
| | | 50659810 | C/T | 60/40 |
| | 5 | 60415461 | C/G | 60/40 |
| | | 60416148 | G/T | 60/40 |
| | | 60418030 | G/A | 60/40 |
| | 21 | 21527200 | C/G | 50/50 |
| | | 21528000 | A/T | 60/40 |
| | | 21528001 | T/G | 60/40 |
| | | 21528082 | C/T | 50/50 |
| | | 21532215 | C/A | 60/40 |
| | 21 | 6862322 | T/C | 60/40 |
| | | 6866036 | T/C | 60/40 |
| | | 6868728 | C/A | 60/40 |
| | | 6869964 | A/C | 50/50 |
| | 5 | 29836177 | A/T | 50/50 |
| | 4 | 121362562 | T/A | 60/40 |
| | | 121363743 | C/A | 60/40 |
| | | 121363744 | T/G | 60/40 |
| | | 121364402 | A/C | 50/50 |
| | | 121364403 | T/C | 70/30 |
| | | 121364574 | T/G | 60/40 |
| | | 121364685 | G/A | 50/50 |
| | | 121364911 | A/G | 60/40 |
| | | 121365407 | G/A | 60/40 |
| | | 121365607 | T/C | 60/40 |
| | | 121365773 | G/A | 60/40 |
| | | 121365941 | G/A | 60/40 |
| | | 121366597 | A/G | 50/50 |
| | | 121369136 | G/A | 60/40 |
| | | 121369282 | A/G | 60/40 |
| | | 121369843 | G/A | 60/40 |
| | | 121370020 | T/A | 60/40 |
| | | 121370021 | T/A | 50/50 |
| | | 121371117 | C/G | 50/50 |
| | 2 | 73130467 | C/T | 50/50 |
| | 2 | 47179315 | C/G | 60/40 |
| | 8 | 110792768 | G/A | 60/40 |
| | | 110968532 | C/T | 60/40 |
| | 22 | 53546833 | C/T | 50/50 |
| | | 53549091 | C/G | 50/50 |
| | | 53552713 | A/T | 60/40 |
| | 11 | 30359073 | T/A | 60/40 |
| | | 30360874 | T/C | 60/40 |
| | 19 | 35680267 | G/A | 60/40 |
| | | 35680574 | A/G | 60/40 |
| | | 35682842 | A/G | 60/40 |
| | | 35683588 | G/C | 60/40 |
| | | 35685082 | A/T | 60/40 |
| | 29 | 31200612 | A/C | 50/50 |
| | 2 | 83382392 | A/T | 60/40 |
| | 19 | 43780445 | A/G | 70/30 |
| | | 43780740 | C/A | 60/40 |
| | 19 | 43729581 | C/G | 50/50 |
| | | 43730210 | G/A | 60/40 |
| | | 43730211 | T/A | 60/40 |
| | | 43741509 | G/A | 60/40 |
| | | 43743914 | C/G | 50/50 |
| | | 43745596 | C/G | 60/40 |
| | | 43748702 | C/G | 70/30 |
| | 19 | 43655236 | A/C | 50/50 |
| | 5 | 60837392 | A/G | 60/40 |
| | | 60837393 | C/T | 50/50 |
| | | 60845709 | G/T | 60/40 |
| | | 60845948 | G/T | 60/40 |
a SNP location is based on the bovine genome assembly Btau4.0
Unique SNPs detected by RNA-Seq that were validated using the KASPar Genotyping System
| Gene | Chromosome | SNP | Position | Frequency (%) | Confirmed^a |
|---|---|---|---|---|---|
| DDIT3 | 5 | T/A | 60417924 | 50.0/50.0 | Yes |
| INSIG2 | 2 | T/C | 73132468 | 50.0/50.0 | Yes |
| STAT5A | 19 | G/A | 43749704 | 54.5/45.5 | Yes |
| STAT5A | 19 | G/C | 43746587 | 70.6/29.4 | No |
| STAT5A | 19 | T/A | 43741732 | 66.7/33.3 | No |
DDIT3, DNA damage inducible transcript 3; INSIG2, insulin-induced gene 2; STAT5A, signal transducer and activator of transcription 5A
a SNP confirmed by KASPar SNP Genotyping System and Sanger resequencing