Gilberto dos Santos, Andrew J Schroeder, Joshua L Goodman, Victor B Strelets, Madeline A Crosby, Jim Thurmond, David B Emmert, William M Gelbart.
Abstract
Release 6, the latest reference genome assembly of the fruit fly Drosophila melanogaster, was released by the Berkeley Drosophila Genome Project in 2014; it replaces their previous Release 5 genome assembly, which had been the reference genome assembly for over 7 years. With the enormous amount of information now attached to the D. melanogaster genome in public repositories and individual laboratories, the replacement of the previous assembly by the new one is a major event requiring careful migration of annotations and genome-anchored data to the new, improved assembly. In this report, we describe the attributes of the new Release 6 reference genome assembly, the migration of FlyBase genome annotations to this new assembly, how genome features on this new assembly can be viewed in FlyBase (http://flybase.org) and how users can convert coordinates for their own data to the corresponding Release 6 coordinates.
Year: 2014 PMID: 25398896 PMCID: PMC4383921 DOI: 10.1093/nar/gku1099
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Overview of the BDGP D. melanogaster Release 6 genome assembly
| Current release | Dmel_Release_6 |
| Data provider | BDGP |
| Collaborators | DHGP, BCM-HGSC, Celera Genomics |
| Sequenced strain | iso-1 |
| Date released | 21-JUL-2014 (FlyBase, Dmel annotation version R6.01) |
| 25-JUL-2014 (GenBank, RefSeq) | |
| NCBI accessions | |
| Assembly: GCA_000001215.4 | |
| RefSeq: GCF_000001215.4 | |
| BioProject: PRJNA13812 | |
| Assembly statistics | • Total sequence length = 143 726 002 bp. |
| • Total gap length = 1 152 978 bp. | |
| • Total number of scaffolds = 1870. | |
| • Seven chromosome arms (plus mitochondrial genome [a]): X, 2L, 2R, 3L, 3R, 4 and Y. | |
| • The vast majority of sequence, 137.6 Mb, resides on the seven chromosome arms. | |
| • 1862 ‘unlocalized’ minor scaffolds, of which 884 have been mapped cytologically or genetically to a chromosome region: X, 2CEN, 3CEN, Y, XY and rDNA. | |
| Major changes relative to Release 5 | • Release 6 is 4.2 Mb larger. |
| • Total gap length decreased by 1.5 Mb. | |
| • The majority of new sequence added to the chromosome arm scaffolds is in the heterochromatic regions, 10.0 Mb of which derives from the BDGP Release 5 scaffolds XHet, 2LHet, 2RHet, 3LHet, 3RHet and U. | |
| • The chromosome Y scaffold is vastly improved and 10 times larger at 3.7 Mb. | |
| • Most remaining gaps are in the heterochromatic regions of the assembly. | |
| • 1862 minor scaffolds replace Release 5 concatenated pseudoscaffolds (e.g. U). | |
| • 48 minor scaffolds have been modified and improved from Release 5; their names indicate their mapping (e.g. 2Cen_mapped_Scaffold_10_D1684). The remaining 1814 ‘unmodified’ minor scaffolds have numeric identifiers like 2110000… | |
| • All fragmented gene annotations from Release 5 have been resolved, largely as a result of improvements to the Y and 3R scaffolds. |
[a] The reference genome assembly update in Dmel R6.01 (FB2014_04) was for the nuclear genome only, maintaining the old mitochondrial genome assembly, a composite of sequences from various D. melanogaster strains (GenBank U37541.1, RefSeq NC_001709.1). With FlyBase update FB2015_01, the mitochondrial reference genome assembly was also updated, replacing the previous assembly with one derived exclusively from the iso-1 reference strain (GenBank KJ947872.2; RefSeq NC_024511.2).
Detailed information on the BDGP D. melanogaster Release 6 genome assembly
| Scaffold | Length (bp) | Sized gaps | Total gap size (bp) | Unsized gaps | Accessions (GenBank, RefSeq) | Comments |
|---|---|---|---|---|---|---|
| X | 23 542 271 | 4 | 65 520 | 6 | AE014298.5 | • Net gain of 0.9 Mb compared to R5 X plus R5 XHet: most at scaffold end. |
| NC_004354.4 | • Central 15.4 Mb unchanged: | |||||
| R5:X:4,684,794..20,073,489 | ||||||
| • About 104 kb of new sequence added at scaffold start (14 kb from R5 U). | ||||||
| • 1 Mb of new sequence added near scaffold end, including 209 kb and 204 kb from R5 scaffolds U and XHet, respectively. | ||||||
| 2L | 23 513 712 | 0 | 0 | 2 | AE014134.6 | • Net gain of 133 kb compared to R5 2L plus R5 2LHet: all at scaffold end. |
| NT_033779.5 | • Initial 21.5 Mb unchanged: | |||||
| R5:2L:1..21,485,538 | ||||||
| • New sequence at the end includes 293 kb and 40 kb from R5 scaffolds 2LHet and U, respectively. | ||||||
| 2R | 25 286 936 | 1 | 6600 | 7 | AE013599.5 | • Net gain of 0.9 Mb compared to R5 2R plus R5 2RHet: most at scaffold start. |
| NT_033778.4 | • Central 16.7 Mb unchanged: | |||||
| R5:2R:3,036..16,668,212 | ||||||
| • New sequence at the start includes 2.3 Mb and 987 kb from R5 scaffolds 2RHet and U, respectively. | ||||||
| 3L | 28 110 227 | 4 | 117 660 | 5 | AE014296.5 | • Net gain of 1.0 Mb compared to R5 3L plus R5 3LHet: all at scaffold end. |
| NT_037436.4 | • Initial 24.5 Mb unchanged, except an unsized gap is now sized at 7 kb: | |||||
| R5:3L:1..24,523,740 | ||||||
| • New sequence at the end includes 2.3 Mb, 328 kb and 150 kb from R5 scaffolds 3LHet, 2RHet and U, respectively. | ||||||
| 3R | 32 079 331 | 9 | 22 772 | 18 | AE014297.3 | • Net gain of 1.7 Mb compared to R5 3R plus R5 3RHet: all at scaffold start. |
| NT_033777.3 | • Last 27.9 Mb unchanged: | |||||
| R5:3R:1..27,905,053 | ||||||
| • New sequence at the start includes 2.2 Mb and 1.0 Mb from R5 scaffolds 3RHet and U, respectively. | ||||||
| 4 | 1 348 131 | 1 | 17 000 | 0 | AE014135.4 | • Net loss of 3.7 kb compared to R5 scaffold 4. |
| NC_004353.4 | • Replacement of R5 scaffold start: | |||||
| 24.1 kb of sequence removed (some moved to R6 3R, X and Y) and replaced with 3.4 kb from the R5 scaffold U. | ||||||
| • Change in start of R6 scaffold 4 completes the JYalpha gene annotation. | ||||||
| • The remaining sequence is unchanged, but the unsized gap in R5 is now sized at 17 kb. | ||||||
| Y | 3 667 352 | 61 | 242 633 | 150 | CP007106.1 | • Net gain of 3.3 Mb compared to R5 YHet: over a 10-fold increase. |
| NC_024512.1 | • 232.3 kb carried over from R5 scaffold YHet. | |||||
| • New sequence includes 702.5 kb and 84.2 kb from R5 scaffolds U and 3LHet, respectively. | ||||||
| M | 19 524 | 0 | 0 | 0 | KJ947872.2 | • Derived from iso-1 reference strain. |
| NC_024511.2 | ||||||
| Comments on scaffold group | ||||||
| X (446) | 1 005 345 | 26 | 72 915 | 0 | • Chromosome X mapped. | |
| 2CEN (28) | 222 873 | 20 | 60 073 | 1 | • Chromosome 2 centromere-proximal region. | |
| 3CEN (144) | 729 966 | 26 | 41 429 | 10 | • Chromosome 3 centromere-proximal region. | |
| Y (199) | 860 223 | 39 | 63 081 | 24 | • Chromosome Y mapped. | |
| XY (66) | 209 541 | 4 | 806 | 1 | • Scaffolds map to both X and Y chromosomes. | |
| rDNA (1) | 76 973 | 2 | 16 500 | 0 | • Ribosomal DNA (RefSeq, NW_007931121.1). | |
| unmapped (978) | 3 053 597 | 139 | 402 989 | 11 | • Unmapped. Now represented by separate scaffolds instead of a concatenated pseudoscaffold. | |
| • Net loss of 7.0 Mb, compared to R5 U, due to movement of sequences to chromosome arms or to mapped minor scaffold groups. | ||||||
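The scaffold lengths and minor-scaffold counts in this table can be cross-checked against the totals given in the overview. The following is a minimal sketch in Python that only re-adds the numbers printed above; the dictionary and function names are illustrative and are not part of any FlyBase or NCBI tool.

```python
# Scaffold lengths (bp) copied from the detailed table above.
MAJOR = {
    "X": 23_542_271, "2L": 23_513_712, "2R": 25_286_936, "3L": 28_110_227,
    "3R": 32_079_331, "4": 1_348_131, "Y": 3_667_352, "M": 19_524,
}

# Minor scaffold groups: name -> (number of scaffolds, total length in bp).
MINOR = {
    "X": (446, 1_005_345), "2CEN": (28, 222_873), "3CEN": (144, 729_966),
    "Y": (199, 860_223), "XY": (66, 209_541), "rDNA": (1, 76_973),
    "unmapped": (978, 3_053_597),
}


def summarize():
    """Re-derive the overview totals from the per-scaffold rows."""
    major_bp = sum(MAJOR.values())
    minor_bp = sum(length for _, length in MINOR.values())
    minor_count = sum(count for count, _ in MINOR.values())
    return {
        "major_bp": major_bp,
        "total_bp": major_bp + minor_bp,
        "minor_scaffolds": minor_count,
        "mapped_minor_scaffolds": minor_count - MINOR["unmapped"][0],
        # Seven chromosome arms plus the mitochondrial genome, plus minor scaffolds.
        "total_scaffolds": len(MAJOR) + minor_count,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:,}")
```

Under these inputs the rows add up to the overview figures: 143 726 002 bp in 1870 scaffolds, with 1862 minor scaffolds of which 884 are mapped.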
NCBI accessions for various BDGP D. melanogaster genome assembly releases
| BDGP release | NCBI accessions (assembly, RefSeq) | NCBI release date | 1st release at FlyBase | Retired at FlyBase | Comments |
|---|---|---|---|---|---|
| 5 | GCA_000001215.2, GCF_000001215.2 | 2007/10/22 | Dmel R5.1 (FB2006_01) | Dmel R5.57 (FB2014_03) | |
| 6 plus MT | GCA_000001215.3, GCF_000001215.3 | 2014/07/25 | Dmel R6.01 (FB2014_04) | Dmel R6.03 (FB2014_06) | • Only the nuclear genome assembly was updated. |
| 6 plus ISO1 MT | GCA_000001215.4, GCF_000001215.4 | 2014/08/01 | Dmel R6.04 (FB2015_01) | | • The mitochondrial genome assembly was updated. |
Figure 1. Coordinates Converter. This tool can be accessed from the ‘Tools’ menu in the blue navigation bar found at the top of FlyBase pages, under both the ‘Retrieve/Convert’ sub-menu (shown) and the ‘Genomic/Map Tools’ sub-menu. A variety of input formats are accepted, and input lists may be uploaded from a text file or entered directly into the input box; output may be to a browser view or a downloadable file (menu at upper right). By default, the input assembly is set to Release 5 and the output to Release 6, but other forward conversions can be specified. For conversion from Release 6 back to Release 5, an analogous Coordinates Converter tool is provided (link at lower left).
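For the two arms where the detailed assembly table reports a single unchanged block anchored at a scaffold end, the Release 5 to Release 6 shift can be illustrated with simple arithmetic. The sketch below is not the Coordinates Converter and does not reproduce its logic. It assumes that the unchanged blocks are strictly colinear, that the 2L block keeps its coordinates because all new sequence is at the scaffold end, and that the 3R block ends at the end of the Release 6 scaffold because all new sequence is at the scaffold start. The function name `r5_to_r6` and the `arm:start..end` input handling are hypothetical. Other arms (for example 3L, where a gap was resized) are deliberately left unhandled.

```python
import re

R5_3R_LENGTH = 27_905_053  # end of the unchanged block R5:3R:1..27,905,053
R6_3R_LENGTH = 32_079_331  # Release 6 length of 3R from the detailed table

# arm -> (first R5 position, last R5 position, offset to add for R6).
# ASSUMPTION: each block is colinear between releases (see lead-in).
UNCHANGED = {
    "2L": (1, 21_485_538, 0),
    "3R": (1, R5_3R_LENGTH, R6_3R_LENGTH - R5_3R_LENGTH),
}

# Accepts e.g. "3R:1..27,905,053" or "R5:2L:100..200".
LOCATION = re.compile(r"^(?:R5:)?(\w+):([\d,]+)\.\.([\d,]+)$")


def r5_to_r6(location):
    """Shift an R5 interval into R6 coordinates, or return None if the
    interval is not fully inside one of the blocks handled here."""
    match = LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    arm = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if arm not in UNCHANGED:
        return None
    first, last, offset = UNCHANGED[arm]
    if start > end or start < first or end > last:
        return None
    return f"{arm}:{start + offset}..{end + offset}"


if __name__ == "__main__":
    print(r5_to_r6("3R:1..27,905,053"))
    print(r5_to_r6("R5:2L:100..200"))
```

A `None` result means the sketch has no information for that interval; real data should go through the Coordinates Converter itself.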
Figure 2. GBrowse 2. Like the original GBrowse, GBrowse 2 allows users to navigate to a region of the genome using coordinates or a landmark, and to zoom or scroll along the genome to browse annotated features and aligned evidence. The data tracks shown are user-selected, and genomic sequence for the region in view can be downloaded by selecting ‘Download Sequence File’ in the download option menu at the top right. GBrowse 2 can also handle more data and has convenient new features. Tracks can be moved simply by dragging the track title bar vertically, and hidden or removed by clicking on the appropriate boxes within the track title bar. In GBrowse 2, one can also download sequence for a smaller region within view by lassoing the region (as shown for a 1.4-kb region within the 6-kb view) and selecting ‘Dump selection as FASTA’ from the resulting pop-up menu (far right).