Kathryn C Asalone, Ajuni K Takkar, Colin J Saldanha, John R Bracht.
Abstract
Songbirds have an unusual genomic element, known as the germline-restricted chromosome (GRC), which is found only in their germline cells. Because germ cells contain both GRC and non-GRC (or A-chromosome) sequences, confidently identifying the GRC-derived elements from genome assemblies has proven difficult. Here, we introduce a new application of a transcriptomic method for GRC sequence identification. By adapting the Stringtie/Ballgown pipeline to use somatic and germline DNA reads, we find that the ratio of fragments per kilobase per million mapped reads can be used to confidently assign contigs to the GRC. Using this comparative coverage analysis, we successfully identify 733 contigs as high-confidence GRC sequences (720 newly identified in this study) and 51 contigs which were validated using quantitative polymerase chain reaction. We also identified two new GRC genes, one hypothetical protein and one gene encoding an RNase H-like domain, and placed 16 previously identified but unplaced genes onto their host contigs. With the current focus on sequencing GRCs from different songbirds, our work adds to the genomic toolkit to identify GRC elements, and we provide a detailed protocol and GitHub repository at https://github.com/brachtlab/Comparative_Coverage_Analysis (last accessed May 12, 2021).
Keywords: FPKM; GRC; comparative coverage analysis; germline-restricted chromosome; intronless transcriptomics; sequence discovery; zebra finch
Year: 2021 PMID: 33905492 PMCID: PMC8245190 DOI: 10.1093/gbe/evab088
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. (A) Visualization of expected read coverage of liver raw reads mapped onto a testis assembly. (B) Visualization of expected read coverage of testis raw reads mapped onto a testis assembly. Blue represents A-chromosome sequences/reads, whereas red represents germline-restricted sequences/reads. (C) Schematic representation of methods. Outputs from the previous step feed into the next step as the input. [Tab] represents a tab in the text as entered in TextEdit software.
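The coverage contrast in panels A and B can be turned into a per-contig number. The sketch below is only an illustration of that idea, not the Stringtie/Ballgown implementation used in the study: it applies the standard FPKM definition to fragment counts and then takes the testis/liver ratio. The function names, the toy counts, and the `pseudocount` (added so that a contig with no somatic coverage does not cause a division by zero) are all assumptions made for this example.

```python
def fpkm(fragments, contig_length_bp, total_mapped_fragments):
    """Fragments per kilobase of contig per million mapped fragments."""
    if contig_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (contig_length_bp * total_mapped_fragments)


def fpkm_ratio(germline_fpkm, somatic_fpkm, pseudocount=0.01):
    """Germline/somatic FPKM ratio; pseudocount is a hypothetical choice."""
    return (germline_fpkm + pseudocount) / (somatic_fpkm + pseudocount)


# Toy numbers, not data from the study: one 2 kb contig, 10 million
# mapped fragments in each library.
testis = fpkm(fragments=500, contig_length_bp=2000,
              total_mapped_fragments=10_000_000)
liver = fpkm(fragments=50, contig_length_bp=2000,
             total_mapped_fragments=10_000_000)
print(testis, liver, fpkm_ratio(testis, liver))
```

A contig present only in the germline would be expected to show a ratio far above 1, whereas an A-chromosome contig covered equally in both tissues would sit near 1.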
Fig. 2. (A) Volcano plot of fragments per kilobase per million reads mapped (FPKM) fold change comparing testis (n = 4) and somatic (n = 4) data sets. The vertical line represents a fold change of 2, the horizontal line represents a q-value of 0.05. Unknown contigs are represented by empty, black, opaque (alpha = 0.25) circles; contigs validated in this study by qPCR are represented by red triangles; 36 contigs identified as GRC from Kinsella et al. are represented by blue circles. The contig identified as GRC from Kinsella et al. and validated in this study (vascular endothelial growth factor A, VEGFA) is a pink diamond, whereas the scribble planar cell polarity protein (SCRIB) contig, not predicted to be GRC in our analysis, is a yellow square; negative control contigs are represented by green squares. (B) Comparison of fold change of gene in genomic qPCR (testis/liver DNA) versus fold change of contig FPKM derived by comparative coverage analysis. The dotted line represents the 1:1 line. The solid horizontal line represents a 2-fold change in FPKM. Note that for most qPCR targets there are multiple contigs yielding FPKM values because of the repetitive nature of the GRC. Diphthine-ammonia ligase (DPH6) is represented in yellow, splicing factor 38A in gray, GRC noncoding sequence in navy blue, 1,4-alpha-glucan branching enzyme (GBE1) in red, bone morphogenetic protein 15 (BMP15) in green, vascular endothelial growth factor A (VEGFA) in orange, scribble planar cell polarity protein (SCRIB) in light blue, methyltransferase in pink, ribosomal protein L4 in black and A-chromosome noncoding sequence in light orange.
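The two reference lines in panel A (fold change of 2, q-value of 0.05) suggest a simple filter over per-contig statistics. The following is a minimal sketch under the assumption that a candidate must pass both lines; the caption does not state that this is the exact rule the authors applied. The `ContigStat` record, the `candidate_grc` function, and the contig names and values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ContigStat:
    contig_id: str
    fold_change: float  # testis/somatic FPKM fold change
    q_value: float      # multiple-testing-adjusted p-value


def candidate_grc(stats, min_fold_change=2.0, max_q=0.05):
    """Return IDs of contigs beyond both reference lines (assumed rule)."""
    return [
        s.contig_id
        for s in stats
        if s.fold_change > min_fold_change and s.q_value < max_q
    ]


# Hypothetical values for illustration only.
toy = [
    ContigStat("contig_A", fold_change=10.9, q_value=0.001),  # passes both
    ContigStat("contig_B", fold_change=1.1, q_value=0.001),   # low fold change
    ContigStat("contig_C", fold_change=6.0, q_value=0.40),    # not significant
]
print(candidate_grc(toy))
```

Keeping the thresholds as parameters makes it easy to see how the candidate list changes when a stricter fold change or q-value cutoff is used.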
Table 1. Newly Described GRC Genes
| Gene | Function | Contig ID | FPKM Fold Change | Coordinate Start | Coordinate Stop |
|---|---|---|---|---|---|
| Hypothetical protein | N/A | 739 | 10.93 | 27,383 | 25,784 |
| | | 740 | 5.55 | 27,088 | 25,486 |
| | | 979 | 10.92 | 3,688 | 3,539 |
| | | 980 | 15.89 | 3,691 | 3,542 |
| | | 1600 | 9.00 | 506 | 16,387 |
| | | 1851 | 7.00 | 13,020 | 10,895 |
| | | 2223 | 9.37 | 319 | 8,595 |
| | | 2224 | 5.73 | 319 | 8,610 |
| | | 151423 | 3.47 | 273 | 1 |
| | | 168518 | 11.67 | 2,943 | 441 |
| LOC105760826 | HMMER identifies an RNase H-like domain found in reverse transcriptase (PF17919) (Finn, Clements, and Eddy 2011). | 163675 | 3.90 | 2,057 | 213 |