Xuan Zhuang, Chun Yang, Svein-Erik Fevolden, C-H Christina Cheng.
Abstract
BACKGROUND: Highly repetitive sequences are the bane of genome sequence assembly, and the short read lengths produced by current next-generation sequencing technologies further exacerbate this obstacle. A common practice is to exclude repetitive sequences from genome assembly, because the majority of repeats lack protein-coding genes. However, this could result in the exclusion of important genotypes in newly sequenced non-model species. The absence of the antifreeze glycoprotein (AFGP) gene family from the recently sequenced Atlantic cod genome serves as an example.
Year: 2012 PMID: 22747999 PMCID: PMC3441883 DOI: 10.1186/1471-2164-13-293
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Southern blot analysis of Atlantic cod genomic DNA showing the presence of AFGP coding sequences. TaqI-digested genomic DNA (~10–15 μg) from Atlantic cod (lanes 1–14) and polar cod (lanes 15–17) hybridized strongly to an AFGP coding sequence probe from polar cod (B. saida). Atlantic cod individuals include Norwegian coastal cod (NCC) and North East Arctic cod (NEAC) from the Finnmark coast and marginal Barents Sea sites: (lanes 1–3) N69° 26.91′ E19° 37.56′; (lanes 4–5) N69° 58.34′ E30° 2.37′; (lanes 6–8) N70° 7.24′ E30° 48.47′; (lanes 9–13) N71° 11.93′ E27° 59.29′; and one individual (lane 14) from Øresund, Denmark. NEAC and NCC are distinguished by their PanI genotypes (BB and AA, respectively, as indicated); AB individuals can be either NEAC or NCC. For comparison, the related freshwater cod Lota lota (lane 18), which lacks AFGP, shows no hybridization.
Figure 2. Alignment of AFGP-containing scaffolds in Atlantic cod with a partial AFGP genomic locus from polar cod. (A) Schematic alignment map. Grey shaded areas indicate regions of high nucleotide identity between polar cod and Atlantic cod. The partial AFGP locus of Atlantic cod is represented by two sequence scaffolds (ATLCOD1As00125 and ATLCOD1As03479) in the Newbler assembly (ATLCOD1A) and by one sequence scaffold (ATLCOD1Bs1552075) in the Celera assembly (ATLCOD1B). Polar cod AFGP genes and the fragmented coding sequences in the Atlantic cod scaffolds are depicted as red arrows pointing towards the 3′ end. Hypothetical protein-coding genes MAK16-like and RAB14-like are shown as purple and green arrows, respectively. (B, C) Nucleotide alignments determined using VISTA. The two Atlantic cod sequence scaffolds ATLCOD1As00125 (B) and ATLCOD1As03479 (C) are depicted as connected brown and orange rectangular bars (assembled sequence fragments) and double black bars (gaps between the sequence fragments), generated with the UCSC Genome Browser. The framed histograms below the scaffold structures are VISTA-generated plots of each scaffold's alignment with the polar cod partial AFGP locus sequence. Purple areas in the histograms denote conserved AFGP sequences, and grey arrows denote the AFGP genes we annotated in the two Atlantic cod scaffolds; pink areas denote other conserved regions. Panel (B) shows the full length of ATLCOD1As00125 (134,272 bp). Panel (C) shows the first 85 kbp of ATLCOD1As03479 (254,310 bp, reverse complement sequence); the remainder of this scaffold did not align with the polar cod AFGP genomic region and is therefore not shown. The conserved sequence blocks shared by Atlantic cod and polar cod have very high sequence identities, ranging from 80% to 99%, indicative of shared microsynteny.
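The 80% to 99% identities quoted for the conserved blocks are percentages of matching bases over aligned sequence. As a rough illustration only, the sketch below computes percent identity over a pairwise alignment and scans it with a fixed-size window, the general idea behind VISTA-style conservation plots. It is not VISTA's implementation: it assumes the two sequences are already aligned as equal-length strings with '-' marking gaps, and the function names, the 100 bp window and the 70% threshold are illustrative assumptions, not parameters reported for this figure.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns (gap columns excluded) where the two bases match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap ('-').
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def conserved_windows(a: str, b: str, window: int = 100,
                      threshold: float = 70.0, step: int = 1) -> list[tuple[int, float]]:
    """Return (start, identity) for every window whose identity is >= threshold.

    The window and threshold defaults are assumptions chosen for illustration.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(0, len(a) - window + 1, step):
        pid = percent_identity(a[start:start + window], b[start:start + window])
        if pid >= threshold:
            hits.append((start, pid))
    return hits


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACGTTCGT"
    print(percent_identity(a, b))  # 87.5
    print(conserved_windows(a, b, window=4, threshold=80.0))  # [(0, 100.0)]
```

A real analysis would first produce the alignment (for example with a global aligner) and would plot identity along the reference coordinate rather than returning a list.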
Codon usage bias in the 141 9-nt tripeptide repeat coding sequences in AFGP gene Gm1-1
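| Nucleotide | Position 1 | Position 2 | Position 3 | Position 4 | Position 5 | Position 6 | Position 7 | Position 8 | Position 9 |
|---|---|---|---|---|---|---|---|---|---|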
| G | 122 | 0 | 45 | 141 | 0 | 0 | 2 | 4 | 0 |
| A | 0 | 0 | 45 | 0 | 0 | 20 | 139 | 0 | 68 |
| C | 19 | 141 | 49 | 0 | 141 | 121 | 0 | 137 | 25 |
| T | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 48 |
| Predominant nt | G | C | C/G/A | G | C | C | A | C | A/T |
| % | 86.5 | 100 | 34.6/31.9/31.9 | 100 | 100 | 86.4 | 98.6 | 97.2 | 48.2/34.0 |
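Each column of the table above tallies the nucleotide found at one position of the 9-nt repeat across all 141 repeats, so every column sums to 141. A minimal sketch of such a tally follows, assuming the repeats have already been extracted from Gm1-1 as equal-length strings; the helper names and the four toy repeats are hypothetical and are not Gm1-1 data.

```python
from collections import Counter


def position_counts(repeats: list[str]) -> list[Counter]:
    """Tally the nucleotide at each position across equal-length repeat units."""
    if not repeats:
        return []
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("all repeat units must have the same length")
    return [Counter(r[i].upper() for r in repeats) for i in range(length)]


def predominant(counter: Counter, total: int) -> list[tuple[str, float]]:
    """(nucleotide, percent of repeats) pairs, most frequent first."""
    return [(nt, round(100.0 * n / total, 1)) for nt, n in counter.most_common()]


if __name__ == "__main__":
    # Hypothetical toy repeats; NOT the 141 repeats from Gm1-1.
    toy = ["GCAGCCACA", "GCCGCCACT", "GCGGCAACT", "CCCGCCACA"]
    counts = position_counts(toy)
    for pos, c in enumerate(counts, start=1):
        print(pos, predominant(c, len(toy)))
```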
Trinucleotide equivalent of the biased 9-nt tripeptide repeat coding sequences in AFGP gene Gm1-1
| Nucleotide | Codon position 1 | Codon position 2 | Codon position 3 |
|---|---|---|---|
| G | 265 | 4 | 45 |
| A | 139 | 0 | 133 |
| C | 19 | 419 | 195 |
| T | 0 | 0 | 50 |
| Predominant nt | G/A | C | C/A |
| % | 62.6/32.9 | 99.1 | 46.1/31.4 |
Note: The 141 9-nt (3-codon) repeats in Gm1-1 comprise 423 codons in total.
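The trinucleotide table can be reproduced from the 9-nt counts by pooling the positions that occupy the same place within a codon (positions 1, 4, 7; 2, 5, 8; and 3, 6, 9). The sketch below does this with the counts copied from the 9-nt table; the function names are illustrative. It recovers the codon-position counts (each column totaling 423) and the percentages in the trinucleotide table, for example 62.6% G at codon position 1 and 99.1% C at codon position 2.

```python
# Counts copied from the 9-nt table: rows G, A, C, T; columns = 9-nt positions 1-9.
NINE_NT_COUNTS = {
    "G": [122, 0, 45, 141, 0, 0, 2, 4, 0],
    "A": [0, 0, 45, 0, 0, 20, 139, 0, 68],
    "C": [19, 141, 49, 0, 141, 121, 0, 137, 25],
    "T": [0, 0, 2, 0, 0, 0, 0, 0, 48],
}


def collapse_to_codon_positions(counts: dict[str, list[int]],
                                codon_len: int = 3) -> dict[str, list[int]]:
    """Sum 9-nt positions sharing a codon position: (1, 4, 7), (2, 5, 8), (3, 6, 9)."""
    return {
        nt: [sum(row[p] for p in range(offset, len(row), codon_len))
             for offset in range(codon_len)]
        for nt, row in counts.items()
    }


def codon_percentages(collapsed: dict[str, list[int]]) -> dict[str, list[float]]:
    """Percent of all codons carrying each nucleotide at each codon position."""
    n_pos = len(next(iter(collapsed.values())))
    totals = [sum(row[i] for row in collapsed.values()) for i in range(n_pos)]
    return {
        nt: [round(100.0 * row[i] / totals[i], 1) for i in range(n_pos)]
        for nt, row in collapsed.items()
    }


if __name__ == "__main__":
    codon = collapse_to_codon_positions(NINE_NT_COUNTS)
    pct = codon_percentages(codon)
    for nt in "GACT":
        print(nt, codon[nt], pct[nt])
    # First line printed: G [265, 4, 45] [62.6, 0.9, 10.6]
```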