Wilson Leung1, Christopher D Shaffer2, Elizabeth J Chen2, Thomas J Quisenberry2, Kevin Ko2, John M Braverman3, Thomas C Giarla4, Nathan T Mortimer5, Laura K Reed6, Sheryl T Smith7, Srebrenka Robic8, Shannon R McCartha8, Danielle R Perry8, Lindsay M Prescod8, Zenyth A Sheppard8, Ken J Saville9, Allison McClish9, Emily A Morlock9, Victoria R Sochor9, Brittney Stanton9, Isaac C Veysey-White9, Dennis Revie10, Luis A Jimenez10, Jennifer J Palomino10, Melissa D Patao10, Shane M Patao10, Edward T Himelblau11, Jaclyn D Campbell11, Alexandra L Hertz11, Maddison F McEvilly11, Allison R Wagner11, James Youngblom12, Baljit Bedi12, Jeffery Bettincourt12, Erin Duso12, Maiye Her12, William Hilton12, Samantha House12, Masud Karimi12, Kevin Kumimoto12, Rebekah Lee12, Darryl Lopez12, George Odisho12, Ricky Prasad12, Holly Lyn Robbins12, Tanveer Sandhu12, Tracy Selfridge12, Kara Tsukashima12, Hani Yosif12, Nighat P Kokan13, Latia Britt13, Alycia Zoellner13, Eric P Spana14, Ben T Chlebina14, Insun Chong14, Harrison Friedman14, Danny A Mammo14, Chun L Ng14, Vinayak S Nikam14, Nicholas U Schwartz14, Thomas Q Xu14, Martin G Burg15, Spencer M Batten15, Lindsay M Corbeill15, Erica Enoch15, Jesse J Ensign15, Mary E Franks15, Breanna Haiker15, Judith A Ingles15, Lyndsay D Kirkland15, Joshua M Lorenz-Guertin15, Jordan Matthews15, Cody M Mittig15, Nicholaus Monsma15, Katherine J Olson15, Guillermo Perez-Aragon15, Alen Ramic15, Jordan R Ramirez15, Christopher Scheiber15, Patrick A Schneider15, Devon E Schultz15, Matthew Simon15, Eric Spencer15, Adam C Wernette15, Maxine E Wykle15, Elizabeth Zavala-Arellano15, Mitchell J McDonald15, Kristine Ostby15, Peter Wendland15, Justin R DiAngelo16, Alexis M Ceasrine16, Amanda H Cox16, James E B Docherty16, Robert M Gingras16, Stephanie M Grieb16, Michael J Pavia16, Casey L Personius16, Grzegorz L Polak16, Dale L Beach17, Heaven L Cerritos17, Edward A Horansky17, Karim A Sharif18, Ryan Moran18, Susan Parrish19, Kirsten Bickford19, Jennifer Bland19, Juliana 
Broussard19, Kerry Campbell19, Katelynn E Deibel19, Richard Forka19, Monika C Lemke19, Marlee B Nelson19, Catherine O'Keeffe19, S Mariel Ramey19, Luke Schmidt19, Paola Villegas19, Christopher J Jones20, Stephanie L Christ20, Sami Mamari20, Adam S Rinaldi20, Ghazal Stity20, Amy T Hark21, Mark Scheuerman21, S Catherine Silver Key22, Briana D McRae22, Adam S Haberman23, Sam Asinof23, Harriette Carrington23, Kelly Drumm23, Terrance Embry23, Richard McGuire23, Drew Miller-Foreman23, Stella Rosen23, Nadia Safa23, Darrin Schultz23, Matt Segal23, Yakov Shevin23, Petros Svoronos23, Tam Vuong23, Gary Skuse24, Don W Paetkau25, Rachael K Bridgman25, Charlotte M Brown25, Alicia R Carroll25, Francesca M Gifford25, Julie Beth Gillespie25, Susan E Herman25, Krystal L Holtcamp25, Misha A Host25, Gabrielle Hussey25, Danielle M Kramer25, Joan Q Lawrence25, Madeline M Martin25, Ellen N Niemiec25, Ashleigh P O'Reilly25, Olivia A Pahl25, Guadalupe Quintana25, Elizabeth A S Rettie25, Torie L Richardson25, Arianne E Rodriguez25, Mona O Rodriguez25, Laura Schiraldi25, Joanna J Smith25, Kelsey F Sugrue25, Lindsey J Suriano25, Kaitlyn E Takach25, Arielle M Vasquez25, Ximena Velez25, Elizabeth J Villafuerte25, Laura T Vives25, Victoria R Zellmer25, Jeanette Hauke26, Charles R Hauser27, Karolyn Barker27, Laurie Cannon27, Perouza Parsamian27, Samantha Parsons27, Zachariah Wichman27, Christopher W Bazinet28, Diana E Johnson29, Abubakarr Bangura29, Jordan A Black29, Victoria Chevee29, Sarah A Einsteen29, Sarah K Hilton29, Max Kollmer29, Rahul Nadendla29, Joyce Stamm30, Antoinette E Fafara-Thompson30, Amber M Gygi30, Emmy E Ogawa30, Matt Van Camp30, Zuzana Kocsisova30, Judith L Leatherman31, Cassie M Modahl31, Michael R Rubin32, Susana S Apiz-Saab32, Suzette M Arias-Mejias32, Carlos F Carrion-Ortiz32, Patricia N Claudio-Vazquez32, Debbie M Espada-Green32, Marium Feliciano-Camacho32, Karina M Gonzalez-Bonilla32, Mariela Taboas-Arroyo32, Dorianmarie Vargas-Franco32, Raquel Montañez-Gonzalez32, 
Joseph Perez-Otero32, Myrielis Rivera-Burgos32, Francisco J Rivera-Rosario32, Heather L Eisler33, Jackie Alexander33, Samatha K Begley33, Deana Gabbard33, Robert J Allen2, Wint Yan Aung2, William D Barshop2, Amanda Boozalis2, Vanessa P Chu2, Jeremy S Davis2, Ryan N Duggal2, Robert Franklin2, Katherine Gavinski2, Heran Gebreyesus2, Henry Z Gong2, Rachel A Greenstein2, Averill D Guo2, Casey Hanson2, Kaitlin E Homa2, Simon C Hsu2, Yi Huang2, Lucy Huo2, Sarah Jacobs2, Sasha Jia2, Kyle L Jung2, Sarah Wai-Chee Kong2, Matthew R Kroll2, Brandon M Lee2, Paul F Lee2, Kevin M Levine2, Amy S Li2, Chengyu Liu2, Max Mian Liu2, Adam P Lousararian2, Peter B Lowery2, Allyson P Mallya2, Joseph E Marcus2, Patrick C Ng2, Hien P Nguyen2, Ruchik Patel2, Hashini Precht2, Suchita Rastogi2, Jonathan M Sarezky2, Adam Schefkind2, Michael B Schultz2, Delia Shen2, Tara Skorupa2, Nicholas C Spies2, Gabriel Stancu2, Hiu Man Vivian Tsang2, Alice L Turski2, Rohit Venkat2, Leah E Waldman2, Kaidi Wang2, Tracy Wang2, Jeffrey W Wei2, Dennis Y Wu2, David D Xiong2, Jack Yu2, Karen Zhou2, Gerard P McNeil34, Robert W Fernandez34, Patrick Gomez Menzies34, Tingting Gu2, Jeremy Buhler35, Elaine R Mardis36, Sarah C R Elgin2.
Abstract
The discordance between genome size and the complexity of eukaryotes can partly be attributed to differences in repeat density. The Muller F element (∼5.2 Mb) is the smallest chromosome in Drosophila melanogaster, but it is substantially larger (>18.7 Mb) in D. ananassae. To identify the major contributors to the expansion of the F element and to assess their impact, we improved the genome sequence and annotated the genes in a 1.4-Mb region of the D. ananassae F element, and a 1.7-Mb region from the D element for comparison. We find that transposons (particularly LTR and LINE retrotransposons) are major contributors to this expansion (78.6%), while Wolbachia sequences integrated into the D. ananassae genome are minor contributors (0.02%). Both D. melanogaster and D. ananassae F-element genes exhibit distinct characteristics compared to D-element genes (e.g., larger coding spans, larger introns, more coding exons, and lower codon bias), but these differences are exaggerated in D. ananassae. Compared to D. melanogaster, the codon bias observed in D. ananassae F-element genes can primarily be attributed to mutational biases instead of selection. The 5' ends of F-element genes in both species are enriched in dimethylation of lysine 4 on histone 3 (H3K4me2), while the coding spans are enriched in H3K9me2. Despite differences in repeat density and gene characteristics, D. ananassae F-element genes show a similar range of expression levels compared to genes in euchromatic domains. This study improves our understanding of how transposons can affect genome size and how genes can function within highly repetitive domains.
Keywords: Drosophila; Wolbachia; genome size; heterochromatin; retrotransposons
Year: 2017 PMID: 28667019 PMCID: PMC5555453 DOI: 10.1534/g3.117.040907
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Sequence improvement statistics for the D. ananassae F- and D-element regions studied
| Region | Total (bp) | Gap (bp) | Unresolved (bp) | Low Quality (bp) |
|---|---|---|---|---|
| improved_13010 | 597,760 | 495 | 7048 | 16 |
| improved_13034_1 | 490,783 | 50 | 70,695 | 944 |
| improved_13034_2 | 395,316 | 25 | 7204 | 392 |
| Total | 1,483,859 | 570 | 84,947 | 1352 |
| improved_13337 | 1,708,805 | 0 | 30,512 | 709 |
Unresolved regions generally correspond to areas with large tandem or inverted repeats. Regions with Phred scores <30 (i.e., estimated error rate >1/1000 bases) are classified as low quality. Most low-quality regions either overlap with unresolved regions or are located adjacent to gaps. (A) F element: one region from scaffold improved_13010 and two regions from scaffold improved_13034 were improved, yielding a total of 1.4 Mb of high-quality bases. (B) D element: the region near the base of the D element (scaffold improved_13337) was improved, yielding 1.7 Mb of high-quality bases.
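The Phred threshold in the table note follows from the standard definition Q = −10 log10(P), where P is the estimated per-base error probability. The sketch below converts a score to an error probability and re-derives the F-element "Total" row from the three improved regions. The function names and the dictionary layout are illustrative assumptions, not part of the study's pipeline.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an estimated per-base error probability."""
    return 10 ** (-q / 10)


def is_low_quality(q: float, threshold: float = 30.0) -> bool:
    """Apply the table's rule: bases with Phred score below 30 are low quality."""
    return q < threshold


# Rows copied from the table: (total, gap, unresolved, low quality), all in bp.
f_element_regions = {
    "improved_13010": (597_760, 495, 7_048, 16),
    "improved_13034_1": (490_783, 50, 70_695, 944),
    "improved_13034_2": (395_316, 25, 7_204, 392),
}

# Column-wise sums reproduce the "Total" row for the F-element regions.
f_element_totals = tuple(sum(column) for column in zip(*f_element_regions.values()))

if __name__ == "__main__":
    print(phred_to_error_prob(30))  # about 1 error per 1000 bases
    print(f_element_totals)
```

A score of 30 therefore corresponds to roughly one error per 1000 bases, which is the cutoff quoted in the note.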
Figure 1. Results of the manual sequence improvement of the D. ananassae D and F elements. (A) Dot plot comparisons of the scaffolds in the original CAF1 assembly (y-axis) vs. the scaffolds in the improved assembly (x-axis). Dots within each dot plot denote regions of similarity between the CAF1 assembly and the improved assembly. The diagonal lines in the dot plots for the D-element scaffold improved_13337 (left) and the F-element scaffold improved_13010 (middle) show that the overall CAF1 assemblies for these regions are consistent with the corresponding assemblies following manual sequence improvement. However, the high density of dots in the middle of the dot plot for improved_13010 corresponds to a collapsed repeat within the CAF1 assembly (red arrow). Manual sequence improvement also identified a major misassembly in the second improved region (improved_13034_2) within the F-element scaffold_13034 (right), where part of the scaffold was inverted compared to the final assembly (red box). The misassembled region is part of the fosmid 1773K10 (bottom inset). (B) The Consed Assembly View for the improved fosmid project 1773K10 shows that the misassembled region contains multiple tandem and inverted repeats. (Top) The gray bar within the Assembly View corresponds to the improved fosmid assembly, and the pink triangles denote the ends of the fosmid. The purple and green boxes underneath the gray bar correspond to tags (e.g., repeats and comments), and the dark green line corresponds to the read depth. The orange and black boxes above the gray bar correspond to tandem and inverted repeats, respectively. These orange and black boxes indicate that the improved assembly contains multiple tandem and inverted repeats that are located adjacent to each other. (Bottom) The improved assembly for this fosmid was supported by consistent forward-reverse mate pairs (blue triangles) and by multiple restriction digests (Figure S2A in File S7).
Figure 2. Expansion of the D. ananassae F element can primarily be attributed to the high density of LTR and LINE retrotransposons. (A) Total repeat density estimates from de novo repeat finders (Red, WindowMasker, and Tallymer) show that the D. ananassae F element has higher repeat density than the D. melanogaster F element (56.4–74.5% vs. 14.4–29.5%). Both F elements show higher repeat density than the euchromatic reference regions from the base of the D elements in D. ananassae (11.6–19.8%) and in D. melanogaster (5.4–15.2%). The repeat densities of the improved D. ananassae F-element scaffolds [D. ana: F (improved)] are similar to the repeat densities for all D. ananassae F-element scaffolds [D. ana: F (all)]. (B) Results from the tantan analysis show that the five analysis regions from D. melanogaster and D. ananassae have similar simple repeat density (6.0–7.5%). (C) TRF analysis shows that D. ananassae has higher tandem repeat density than D. melanogaster on both the F (5.6 vs. 2.8%) and the D elements (2.6 vs. 1.1%). (D) RepeatMasker analysis using the Drosophila RepBase library shows that the F element has higher transposon density than the D element both in D. melanogaster (28.0 vs. 7.7%) and in D. ananassae (78.6 vs. 14.4%). There is a substantial increase in the density of LTR and LINE retrotransposons on the D. ananassae F element compared to the D. melanogaster F element (42.1 vs. 5.5% and 21.8 vs. 3.8%, respectively). The D. ananassae euchromatic reference region also shows higher transposon density than D. melanogaster, but most of the difference can be attributed to the density of DNA transposons (4.3 vs. 0.6%). The region of overlap between two repeat fragments is classified as “Overlapping” if the two repeats belong to different repeat classes.
The 10 most common transposons (by the cumulative size of the transposon fragments) on the D. melanogaster F element (i.e., the region between the most proximal and the most distal genes)
| Repeat | Total Size (bp) | Repeat Count | Repeat Class | Total Region (%) | Total Repeat (%) |
|---|---|---|---|---|---|
| | 49,801 | 258 | RC/Helitron | 4.0 | 14.2 |
| | 46,551 | 241 | RC/Helitron | 3.7 | 13.3 |
| | 21,635 | 33 | DNA transposon | 1.7 | 6.2 |
| | 16,266 | 90 | RC/Helitron | 1.3 | 4.6 |
| | 12,993 | 40 | LINE | 1.0 | 3.7 |
| | 11,769 | 18 | DNA transposon | 0.9 | 3.4 |
| | 10,290 | 36 | DNA transposon | 0.8 | 2.9 |
| | 8942 | 26 | DNA transposon | 0.7 | 2.6 |
| | 8678 | 33 | DNA transposon | 0.7 | 2.5 |
| | 8207 | 28 | LTR | 0.7 | 2.3 |
Eight of the 10 most common transposons on the D. melanogaster F element are RC/Helitrons and DNA transposons. The repeat count for each transposon corresponds to the total number of transposon fragments reported by the RepeatMasker annotation file.
The 10 most common transposons (by the cumulative size of the transposon fragments) on all D. ananassae F-element scaffolds
| Repeat | Total Size (bp) | Repeat Count | Repeat Class | Total Region (%) | Total Repeat (%) |
|---|---|---|---|---|---|
| | 1,326,170 | 2292 | LINE | 7.5 | 9.6 |
| | 882,145 | 2927 | RC/Helitron | 5.0 | 6.4 |
| | 599,914 | 754 | LTR | 3.4 | 4.3 |
| | 424,975 | 546 | LINE | 2.4 | 3.1 |
| | 396,161 | 868 | LTR | 2.2 | 2.9 |
| | 291,535 | 573 | LTR | 1.7 | 2.1 |
| | 264,131 | 551 | LTR | 1.5 | 1.9 |
| | 256,665 | 277 | LTR | 1.5 | 1.9 |
| | 243,223 | 362 | LTR | 1.4 | 1.8 |
| | 240,787 | 443 | LTR | 1.4 | 1.7 |
Nine of the 10 most common transposons on the D. ananassae F element are LINE and LTR retrotransposons. The repeat count for each transposon corresponds to the total number of transposon fragments reported by the RepeatMasker annotation file.
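The columns in the two transposon tables (cumulative size, fragment count, share of the region, share of all repeats) can be illustrated with a small tally over repeat fragments. This is a minimal sketch under stated assumptions: the tuple layout `(repeat_name, start, end)` with 1-based inclusive coordinates, the function name, and the toy data are all hypothetical, and the sketch ignores overlapping fragments, which the actual analysis classifies separately.

```python
from collections import defaultdict


def summarize_repeats(fragments, region_size):
    """Tally repeat fragments into rows of
    (name, total size in bp, fragment count, % of region, % of all repeats),
    sorted by cumulative size in descending order.

    Assumes non-overlapping fragments given as (name, start, end),
    with 1-based inclusive coordinates.
    """
    size = defaultdict(int)
    count = defaultdict(int)
    for name, start, end in fragments:
        size[name] += end - start + 1
        count[name] += 1

    total_repeat_bp = sum(size.values())
    rows = [
        (
            name,
            size[name],
            count[name],
            100 * size[name] / region_size,
            100 * size[name] / total_repeat_bp,
        )
        for name in size
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Toy fragments on a hypothetical 1000-bp region.
    toy = [("TE_A", 1, 100), ("TE_B", 201, 250), ("TE_A", 301, 400)]
    for row in summarize_repeats(toy, region_size=1000):
        print(row)
```

Ranking by cumulative size rather than fragment count is what lets a few long elements outrank many short fragments in the tables.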
Figure 3. The high density of “Wolbachia” sequences in the D. ananassae F element can be attributed to Drosophila transposons in the wAna assembly. (A) RepeatMasker analysis shows that 19.8% of the D. ananassae F element matches the genome assembly for wAna. By contrast, 0.02% of the D. ananassae F element matches the genome assemblies for wRi and wMel. Similarly, the D. ananassae D element and the D. melanogaster F and D elements show a substantially higher density of regions that exhibit sequence similarity to the wAna assembly (0.68–6.51%) than the wRi and wMel assemblies (0.00–0.18%). (B) Distribution of regions with matches to the wAna, wRi, and wMel assemblies in the manually improved region of the D. ananassae F-element scaffold improved_13010. The matches to the wAna assembly are distributed throughout the improved scaffold (blue boxes) but there are no matches to either the wRi or the wMel assemblies. (C) The portions of the 3.6-kb wAna scaffold AAGB01000087 (x-axis) with large numbers of alignments to D. ananassae scaffolds show similarity to Drosophila transposons. The portions of the wAna scaffolds that show similarity to D. ananassae scaffolds were extracted from the RepeatMasker output and collated into an alignment coverage track relative to the wAna assembly (brown graph). Whole-genome Chain and Net alignments show that only the last 216 bp of this wAna scaffold has sequence similarity to the wRi and wMel assemblies. RepeatMasker analysis using the Drosophila RepBase library shows that the first 3.4 kb of this wAna scaffold has sequence similarity to the internal and long terminal repeat portions of the BEL-18 LTR retrotransposon (BEL-18_DAn-I and BEL-18_DAn-LTR), as well as the internal portion of the Gypsy-1 LTR retrotransposon from D. sechellia (Gypsy-1_DSe-I). Most of the alignments can be attributed to BEL-18_DAn-LTR, with a maximum of 1835 alignments between the wAna scaffold AAGB01000087 and the D. ananassae scaffolds.
Figure 4. F-element genes show distinct gene characteristics compared to D-element genes. Each violin plot comprises a box plot and a kernel density plot. The dot in each violin plot denotes the median and the darker region demarcates the interquartile range (IQR), which spans from the first (Q1) to the third (Q3) quartile. The whiskers extending from the darker region span from Q1 − 1.5 × IQR to Q3 + 1.5 × IQR; data points beyond the whiskers are classified as outliers. (A) D. ananassae F-element genes have larger coding spans (start codon to stop codon, including introns) than D. melanogaster F-element genes. (B) The D. ananassae F-element genes have larger coding spans because they have larger total intron sizes than D. melanogaster F-element genes. (C) F-element genes have larger total coding exon (CDS) sizes than D-element genes in both D. ananassae and D. melanogaster. (D) F-element genes have more CDS than D-element genes. (E) F-element genes have smaller median CDS size than D-element genes. (F) The median intron size for D. ananassae and D. melanogaster F-element genes shows a bimodal distribution; this distribution pattern indicates that the expansion of the coding spans of D. ananassae F-element genes compared to D. melanogaster F-element genes can be attributed to the substantial expansion of a subset of introns.
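The whisker rule described in the Figure 4 caption can be sketched with the Python standard library. The function name and the example values are hypothetical, and the quartile method (`inclusive`) is an assumption; plotting packages differ in how they interpolate quartiles, so exact limits may not match the published figure.

```python
import statistics


def whisker_limits(values):
    """Return quartiles, IQR, whisker limits (Q1 - 1.5 x IQR, Q3 + 1.5 x IQR),
    and the data points that fall beyond those limits."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower": lower,
        "upper": upper,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical values with one extreme point.
    print(whisker_limits([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```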
Figure 5. Codon bias in D. ananassae F-element genes can primarily be attributed to mutational biases instead of selection. (A) Violin plots of the effective number of codons (Nc) show that D. ananassae F-element genes exhibit stronger deviations from equal usage of synonymous codons (lower Nc) than D. melanogaster F-element genes. (B) Violin plots of the Codon Adaptation Index (CAI) show that D. ananassae F-element genes exhibit less optimal codon usage (lower CAI) than D. melanogaster F-element genes. F-element genes in both D. ananassae and D. melanogaster show less optimal codon usage (lower CAI) than D-element genes. (C) Scatterplot of Nc vs. CAI suggests that the codon bias in most D. ananassae and D. melanogaster F-element genes can be attributed to mutational bias instead of selection, as indicated by the placement of most of the genes in the portion of the LOESS regression line (red line) with a positive slope. By contrast, the codon bias in most D. ananassae and D. melanogaster D-element genes can be attributed to selection, as denoted by the LOESS regression line with a negative slope. The dotted line in each Nc vs. CAI scatterplot corresponds to the CAI value for a gene with equal codon usage relative to the species-specific reference gene sets constructed by the program scnRCA (0.200 for D. ananassae and 0.213 for D. melanogaster). Hence this species-specific threshold corresponds to the CAI value when the strengths of mutational bias and selection on codon bias are the same. A smaller percentage of F-element genes in D. ananassae (6/64; 9.4%) have CAI values above this species-specific CAI threshold compared to D. melanogaster (18/79; 22.8%).
Codon GC content for D. melanogaster and D. ananassae F- and D-element genes
| Metric | | | | |
|---|---|---|---|---|
| Coding GC | 41.9 | 38.9 | 55.3 | 54.7 |
| First letter GC | 48.8 | 46.9 | 56.8 | 55.9 |
| Second letter GC | 40.2 | 39.0 | 42.7 | 42.9 |
| Third letter GC | 36.5 | 30.7 | 66.6 | 65.1 |
| 4D sites GC | 33.2 | 28.2 | 62.5 | 60.5 |
F-element genes show lower GC content than D-element genes in both D. melanogaster and D. ananassae. D. ananassae F-element genes show the lowest overall GC content among the genes in the four analysis regions, particularly at the third position of the codon.
The “4D sites GC” corresponds to the GC content at fourfold degenerate sites (i.e., the GC content at the third position of the codons that code for alanine, glycine, proline, threonine, and valine).
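The metrics in the codon GC table can be illustrated by a short calculation over a single coding sequence. This is a minimal sketch, not the study's implementation: the function names and the example sequence are hypothetical, and the set of fourfold degenerate codon families follows the five amino acids listed in the note above (alanine GCN, glycine GGN, proline CCN, threonine ACN, valine GTN).

```python
# First two bases of the codon families for Ala, Gly, Pro, Thr, and Val.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT"}


def gc_percent(bases):
    """Percentage of G or C among unambiguous bases; NaN if none are present."""
    bases = [b for b in bases if b in "ACGT"]
    if not bases:
        return float("nan")
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def codon_gc_profile(cds):
    """GC content (%) overall, at each codon position, and at 4D sites.

    Assumes the sequence is in frame; trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return {
        "coding": gc_percent("".join(codons)),
        "first": gc_percent([c[0] for c in codons]),
        "second": gc_percent([c[1] for c in codons]),
        "third": gc_percent([c[2] for c in codons]),
        "4D": gc_percent([c[2] for c in codons if c[:2] in FOURFOLD_PREFIXES]),
    }


if __name__ == "__main__":
    # Hypothetical toy sequence: ATG GCA GGT CCC ACG GTT TAA
    print(codon_gc_profile("ATGGCAGGTCCCACGGTTTAA"))
```

Restricting the last metric to fourfold degenerate sites isolates positions where any base change is synonymous, which is why it is commonly used as a proxy for mutational bias.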
Figure 6. Metagene analysis shows that the coding spans of F-element genes have lower median 9-bp melting temperature (Tm) than the genes at the base of the D element. The Tm profiles were determined using a 9-bp sliding window with a step size of 1 bp. The metagene consists of the 2-kb region upstream and downstream of the coding span, with the length of the coding spans normalized to 3 kb. While the codons in D. ananassae F-element genes have lower GC content, D. ananassae F-element genes exhibit a Tm profile that is similar to the D. melanogaster F-element genes. The green “M” below the coding span denotes the methionine at the translation start site, and the red star denotes the stop codon.
Figure 7. Histone modification profiles for D. ananassae and D. melanogaster F-element genes at the third instar larval stage of development. (A) Metagene analysis shows that the region surrounding the 5′ end of F-element genes is enriched in H3K4me2 while the body of the coding span is enriched in H3K9me2. The values on the y-axis within each metagene plot correspond to the log-likelihood ratio between each ChIP sample and input control (assuming a dynamic Poisson model) as determined by MACS2. (B) Differences in the H3K27me3 enrichment patterns for the D. melanogaster ey gene and its ortholog in D. ananassae. The entire coding span of the ey gene is enriched in H3K27me3 in D. melanogaster (top) (for the D. melanogaster gene models, the thick boxes denote the coding exons and the thin boxes denote the untranslated regions). By contrast, only the region surrounding the 5′ end of the ey ortholog in D. ananassae shows H3K27me3 enrichment. The 5′ ends of the A and D isoforms of ey show enrichment of H3K4me2 and H3K27me3 in both D. melanogaster and D. ananassae. These bivalent domains suggest that these two isoforms of ey are poised for activation at the third instar larval stage of development in both species.
Figure 8. D. ananassae F-element genes show similar expression patterns compared to genes on other Muller elements. RNA-Seq reads from seven samples (adult females, adult males, female carcass, male carcass, female ovaries, male testes, and embryos) were mapped against the improved D. ananassae genome assembly and the read counts for the Gnomon gene predictions were tabulated by htseq-count. The read counts for the seven samples were normalized by library size and then transformed using Tikhonov/ridge regularization in the DESeq2 package to stabilize the variances among the samples. The violin plots compare the distributions of the regularized log2 expression values for the D. ananassae Gnomon gene predictions on all scaffolds (All), on the F-element scaffolds [F (all)], and on the base of the D element [D (base)] for these different developmental stages and tissues.