Andrew S Mason, Janet E Fulton, Paul M Hocking, David W Burt.
Abstract
BACKGROUND: LTR retrotransposons contribute approximately 10% of the mammalian genome, but it has been previously reported that there is a deficit of these elements in the chicken relative to both mammals and other birds. A novel LTR retrotransposon classification pipeline, LocaTR, was developed and subsequently utilised to re-examine the chicken LTR retrotransposon annotation, and determine if the proposed chicken deficit is biologically accurate or simply a technical artefact.
Keywords: Chicken; ERV; Endogenous retrovirus; LTR retrotransposon; LocaTR identification pipeline
Year: 2016 PMID: 27577548 PMCID: PMC5006616 DOI: 10.1186/s12864-016-3043-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 LocaTR pipeline for LTR retrotransposon identification. This flow chart shows how the structural identification, homology and secondary BLAST protocols were combined prior to the further analysis of element density, distribution, expression etc. Input/Output processing is controlled by Python and BASH scripting, and all identification programs used are freely available. The pipeline has been made applicable to any assembled genome and can be accessed via GitHub
Comparison of intact LTR retrotransposons identified by the four structural identification programs
| | LTR_STRUC | LTR Harvest | MGEScan_LTR | RetroTector |
|---|---|---|---|---|
| SIEs identified | 93 | 643 | 427 | 290 |
| Total SIE content (bp) | 767,132 | 4,837,212 | 4,928,810 | 2,664,622 |
| Mean SIE length (bp) | 8,249 | 7,523 | 11,543 | 9,188 |
| Median SIE length (bp) | 6,144 | 6,047 | 7,889 | 7,477 |
| Median SIE LTR identity (%) | 95.8 | 95.2 | 91.5 | 94.0 |
| Mean SIE GC content (%) | 48.1 | 47.2 | 45.7 | 46.3 |
| SIEs unique to program (%) | 28.0 | 65.3 | 44.3 | 50.7 |
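The mean SIE lengths in the table above follow from the total SIE content divided by the number of SIEs. A minimal sketch of that consistency check, with figures copied from the table; the dictionary layout and function name are illustrative and not part of LocaTR:

```python
# (SIEs identified, total SIE content in bp), copied from the table above.
PROGRAM_TOTALS = {
    "LTR_STRUC": (93, 767_132),
    "LTR Harvest": (643, 4_837_212),
    "MGEScan_LTR": (427, 4_928_810),
    "RetroTector": (290, 2_664_622),
}


def mean_sie_length(n_sies, total_bp):
    """Mean element length in bp, rounded to the nearest whole base pair."""
    return round(total_bp / n_sies)


mean_lengths = {
    program: mean_sie_length(n, total)
    for program, (n, total) in PROGRAM_TOTALS.items()
}
```

Each computed value agrees with the reported "Mean SIE length (bp)" row.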
Fig. 2 Performance of the homology and structure based identification methodologies. Euler diagram representing the relative proportion of LTR retrotransposon content identified by the homology (red), structural ID (blue) and secondary BLAST (purple) modules of the LocaTR pipeline (Fig. 1). Numbers represent total length of LTR retrotransposon sequence in megabase pairs (Mbp); 31.52 Mbp in total. Homology methods identified 20.26 Mbp of sequence, and structural ID methods 9.11 Mbp, including a 4.94 Mbp (54.23%) overlap with the homology data. The secondary BLAST annotated an additional 7.09 Mbp of sequence based on elements from the structural ID search
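The totals in the Fig. 2 caption can be cross-checked with inclusion-exclusion arithmetic. The sketch below assumes the secondary BLAST sequence is additional to, and does not overlap, the other two sets, as the caption's wording suggests; the function names are illustrative:

```python
def combined_total_mbp(homology, structural, overlap, secondary_blast):
    """Union of homology and structural sets, plus non-overlapping BLAST hits."""
    return round(homology + structural - overlap + secondary_blast, 2)


def overlap_percent(structural, overlap):
    """Share of the structural ID sequence also found by homology, in percent."""
    return round(100 * overlap / structural, 2)


# Values in Mbp, copied from the Fig. 2 caption.
total = combined_total_mbp(20.26, 9.11, 4.94, 7.09)
shared = overlap_percent(9.11, 4.94)
```

Under that assumption the three modules sum to the 31.52 Mbp total, and the 4.94 Mbp overlap is 54.23% of the structural ID sequence.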
Comparison of LTR retrotransposon annotations between chicken genome assemblies
| | galGal3 | galGal4 (RepeatMasker) | galGal4 (LocaTR) |
|---|---|---|---|
| Assembly length (bp) | 1,098,770,941 | 1,046,932,099 | 1,046,932,099 |
| Scaffold N50 (bp) | 11,063,745 | 12,877,381 | 12,877,381 |
| Contig N50 (bp) | 46,345 | 279,750 | 279,750 |
| LTR content (bp) | 14,870,595 | 17,369,358 | 31,490,117 |
| LTR content (%) | 1.35 | 1.66 | 3.01 |
| Number of SIEs | 492 | - | 1,073 |
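The "LTR content (%)" row is the annotated LTR length as a fraction of assembly length. A minimal sketch reproducing it from the table's own figures; the names are illustrative, and the fold-change between the two galGal4 annotations is a derived value rather than one reported above:

```python
# (assembly length in bp, LTR content in bp), copied from the table above.
ANNOTATIONS = {
    "galGal3": (1_098_770_941, 14_870_595),
    "galGal4 (RepeatMasker)": (1_046_932_099, 17_369_358),
    "galGal4 (LocaTR)": (1_046_932_099, 31_490_117),
}


def ltr_percent(assembly_bp, ltr_bp):
    """LTR retrotransposon content as a percentage of assembly length."""
    return round(100 * ltr_bp / assembly_bp, 2)


percentages = {name: ltr_percent(*values) for name, values in ANNOTATIONS.items()}

# Derived: how much more LTR sequence LocaTR annotates than RepeatMasker on galGal4.
fold_increase = round(
    ANNOTATIONS["galGal4 (LocaTR)"][1] / ANNOTATIONS["galGal4 (RepeatMasker)"][1], 2
)
```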
Fig. 3 LTR retrotransposon distribution relative to Ensembl genome annotations. Distances of LTR retrotransposons from Ensembl annotations for both the full (red) and structurally complete (blue) data sets. The genome-wide distribution shows significant depletion of elements in the TU for both the full (P = 1.94e−5) and structurally complete (P = 3.90e−4) lists. Significance is highlighted using asterisks. Distances are the shortest intergenic distance between element and Ensembl annotation, measured in 10 kilobase bins (where the value is the bin upper limit). TU = Transcriptional Unit (incl. exons, introns, UTRs and flanking 5 kilobases up and downstream). ND = Non-Defined (elements on contigs without any Ensembl annotation). Plot was constructed with MATLAB R2015b
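The binning described in the Fig. 3 caption can be sketched as follows. This is an illustration only, not the authors' code (the plot itself was made in MATLAB): the function names, the input layout, and the choice of right-closed bins (a distance of exactly 10,000 bp falls in the "10" bin) are all assumptions.

```python
import math
from collections import Counter

BIN_WIDTH_BP = 10_000  # 10 kilobase bins, as stated in the caption


def bin_label(distance_bp, in_tu=False):
    """Label an element by the upper limit (kb) of its distance bin.

    distance_bp is None for elements on contigs without any annotation (ND).
    in_tu marks elements inside a transcriptional unit (TU).
    """
    if distance_bp is None:
        return "ND"
    if in_tu:
        return "TU"
    # Assumed convention: right-closed bins; a zero distance goes in the first bin.
    n_bins = max(1, math.ceil(distance_bp / BIN_WIDTH_BP))
    return n_bins * BIN_WIDTH_BP // 1000


def bin_counts(records):
    """Count elements per bin from (distance_bp, in_tu) pairs."""
    return Counter(bin_label(distance, in_tu) for distance, in_tu in records)


# Hypothetical example records, not data from the study.
example = [(0, True), (2_500, False), (10_000, False), (10_001, False), (None, False)]
counts = bin_counts(example)
```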
Fig. 4 LTR retrotransposon genome content across the Avian Lineage. Avian lineage cladogram showing twenty species from the three major lineages of birds and two outgroup species: the Carolina anole and the Western painted turtle. The LTR (%) column shows the relative proportion of the genome annotated as LTR retrotransposon by the combined RepeatMasker protocol. The third column gives the genome size in gigabase pairs (Gbp). The final column gives the scaffold N50 in megabase pairs (Mbp), as a measure indicative of assembly quality. Cladogram based on the avian phylogeny of [57]. Species names are reported as four letter codes in column 1. From top to bottom these are: acar (Anolis carolinensis), cpic (Chrysemys picta bellii), scam (Struthio camelus australis), tgus (Tinamus guttatus), ggal (Gallus gallus), mgal (Meleagris gallopavo), apla (Anas platyrhynchos), cliv (Columba livia), pgut (Pterocles gutturalis), cann (Calypte anna), ccan (Cuculus canorus), pade (Pygoscelis adeliae), afor (Aptenodytes forsteri), pcri (Pelecanus crispus), fper (Falco peregrinus), mund (Melopsittacus undulatus), cbra (Corvus brachyrhynchos), tgua (Taeniopygia guttata), hleu (Haliaeetus leucocephalus), talb (Tyto alba), avit (Apaloderma vittatum), ppub (Picoides pubescens)