Raja Ragupathy, Rajkumar Rathinavelu, Sylvie Cloutier.
Abstract
BACKGROUND: Flax (Linum usitatissimum L.) is an important source of oil rich in omega-3 fatty acids, which have proven health benefits and utility as an industrial raw material. Flax seeds also contain lignans, which are associated with a reduced risk of certain types of cancer. Its bast fibres have broad industrial applications. However, the genomic tools needed for molecular breeding were nonexistent. Hence, a project, Total Utilization Flax GENomics (TUFGEN), was initiated. We report here the first genome-wide physical map of flax and the generation and analysis of BAC-end sequences (BES) from 43,776 clones, providing initial insights into the genome.
Year: 2011 PMID: 21554714 PMCID: PMC3113786 DOI: 10.1186/1471-2164-12-217
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of flax (Linum usitatissimum L.) cv CDC Bethune physical map
| Description | Total |
|---|---|
| Number of BAC clones fingerprinted | 43,776 |
| Number of high quality fingerprints used for assembly | 32,025 |
| Average number of valid bands per clone | 64 |
| Number of contigs | 416 |
| Number of singletons | 2,035 |
| Total length of the contigs | 368,192,846 bp |
| N50 contig length | 1,494 kb |
| Longest contig | 5,562 kb |
| Average number of clones per contig | 71 |
| Number of genetic markers used for anchoring contigs | 129 |
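The N50 contig length reported above is the length at which contigs of that size or longer account for at least half of the total assembled length. The per-contig lengths are not listed in this section, so the following is only a minimal sketch of the metric on made-up values; the function name `n50` and the toy lengths are illustrative assumptions, not the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Toy example (hypothetical lengths in kb, not taken from the flax map).
toy_contigs_kb = [2, 3, 4, 5, 6]
print(n50(toy_contigs_kb))
```

For the toy list the total is 20, so the running sum first reaches 10 at the second-longest contig.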
Figure 1. Physical map of the flax genome. A. Distribution of the number of clones per contig; B. Distribution of contig lengths.
Descriptive statistics about BES from flax cv CDC Bethune
| Description | Total |
|---|---|
| Number of BES reads greater than 80 bp in length | 81,582 |
| Number of reads with similarity to chloroplast DNA | 1,245 |
| Number of BAC-end sequences without chloroplast DNA | 80,337 |
| Total length of chloroplast-free BES | 54,600,041 bp |
| Read average length | 679 bp |
| GC content (%) | 43.35 |
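The counts in this table can be cross-checked with simple arithmetic. The sketch below assumes that the 1,245 reads are the ones removed to obtain the chloroplast-free set, which is consistent with the subtraction; the variable names are illustrative.

```python
# Values copied from the BES statistics table.
total_reads = 81_582          # BES reads longer than 80 bp
chloroplast_like = 1_245      # assumed: reads removed as chloroplast-like
chloroplast_free_bp = 54_600_041

# Reads remaining after removing chloroplast-like sequences.
chloroplast_free_reads = total_reads - chloroplast_like

# Mean read length over the chloroplast-free set.
mean_len = chloroplast_free_bp / chloroplast_free_reads

print(chloroplast_free_reads)  # 80337
print(int(mean_len))           # 679 (the unrounded mean is about 679.6)
```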
Composition of known Viridiplantae repeats in BES using RepeatMasker
| Repeat component | Class | Order | Superfamily | Total no. | Total length (bp) | Total length as % of BES length |
|---|---|---|---|---|---|---|
| Mobile genetic elements | I. Retroelements | | | 10,576 | 3,162,436 | 5.8 |
| | | SINE | - | 2 | 89 | 0.0 |
| | | LINE | - | 1,176 | 234,602 | 0.4 |
| | | LTR | | 9,245 | 2,900,613 | 5.3 |
| | | | | 4,867 | 1,850,625 | 3.4 |
| | | | | 3,372 | 985,038 | 1.8 |
| | | | Unclassified | 1,006 | 64,950 | 0.1 |
| | | PLE | Penelope | 153 | 27,132 | 0.0 |
| | II. DNA transposons | | | 1,094 | 201,075 | 0.4 |
| | | - | hobo-Activator | 371 | 87,631 | 0.2 |
| | | TIR | Tc1-IS630-Pogo | 11 | 2,036 | 0.0 |
| | | - | En-Spm | 249 | 47,547 | 0.1 |
| | | TIR | MuDR-IS905 | 250 | 31,936 | 0.1 |
| | | TIR | Tourist/Harbinger | 49 | 11,187 | 0.0 |
| | | - | Other (Mirage, P-element) | 1 | 49 | 0.0 |
| | | Unclassified | - | 163 | 20,689 | 0.0 |
| rDNA | | | | 13,342 | 7,516,095 | 13.8 |
| Satellites | | | | 22 | 1,972 | 0.0 |
| Simple sequence repeats (SSRs) | | | | 2,556 | 95,533 | 0.2 |
| Low complexity regions (Homopolymers) | | | | 8,701 | 340,090 | 0.6 |
| Overall length of sequences masked | | | | 36,291 | 11,317,201 | 20.7 |
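The percentage column can be reproduced from the masked lengths, assuming the denominator is the total chloroplast-free BES length of 54,600,041 bp reported for this data set. The sketch below recomputes the top-level categories and the overall totals; the dictionary and function names are illustrative.

```python
# Total length of chloroplast-free BES (bp); assumed to be the denominator.
BES_TOTAL_BP = 54_600_041

# Top-level repeat categories: (number of elements, masked length in bp).
masked = {
    "Retroelements": (10_576, 3_162_436),
    "DNA transposons": (1_094, 201_075),
    "rDNA": (13_342, 7_516_095),
    "Satellites": (22, 1_972),
    "SSRs": (2_556, 95_533),
    "Low complexity": (8_701, 340_090),
}


def percent_of_bes(length_bp):
    """Masked length as a percentage of total BES length, to one decimal."""
    return round(100 * length_bp / BES_TOTAL_BP, 1)


total_count = sum(count for count, _ in masked.values())
total_bp = sum(length for _, length in masked.values())

for name, (count, length) in masked.items():
    print(f"{name}: {count} elements, {percent_of_bes(length)}%")
print(f"Overall: {total_count} elements, {total_bp} bp, {percent_of_bes(total_bp)}%")
```

The category totals sum to the overall row of the table (36,291 elements and 11,317,201 bp, or 20.7%).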
Families of known mobile genetic elements identified in flax BES
| Type | Super Family | No. of Families | Families |
|---|---|---|---|
| Retrotransposon | | 19 | Alfare2, Angela1, Barbara, BARE-2, BNR1, CPSC4A, Maximus, Opie2, Prem3, Shacop11, SPRT1, Stonor, TLC1, TNT1, TONT2, Topscotch, TORTL1, TOS17, TOTO1 |
| | | 27 | Atlantys, Bagy1, Bnintmo, Calypshan2, Carep, Cereba, Cinful1, CRM-I, Daniela, Dea1, Del, Diaspora, Erika1, Fatima, Ogre, Grande1, Gret1, Gycume1, Gypot1, Gypshan2, Gypsode1, Megy, Ophelia1, Ram12, Sore1, Tekay, Truncator |
| | LINEs | 5 | BALN1, BVL1, CIN4, FMLN1, Shaline10 |
| | SINEs | 4 | BoSB10A, Casine, Ormosia, Sadhu4-2 |
| DNA Transposon | | 7 | THRIA, TLP3, TNAT1A, TNR1, Tourist, TPN1, TWIF |
| Total | | 62 | |
Known flax transposable elements identified in flax BES
| Name of the element | GenBank ID | Length (bp) | Number of hits |
|---|---|---|---|
| Retrotransposons | |||
| FL1a* | | 1329 | 5 |
| FL1b* | | 1327 | 11 |
| FL2* | | 318 | None |
| FL4* | | 693 | 365 |
| FL5* | | 979 | 36 |
| FL6* | | 800 | 86 |
| FL7* | | 598 | 74 |
| FL8* | | 672 | 6 |
| FL9* | | 468 | None |
| FL10* | | 1052 | 4 |
| FL11* | | 1300 | None |
| FL12* | | 854 | 67 |
| Cassandra | | 632 | 14 |
| DNA transposons | |||
| dLUTE | | 314 | None |
*partial element
Summary of homology searches of contigs and singletons representing highly repetitive sequences of flax
| Database | Hits of >80 bp in length: number of hits | Hits of >80 bp in length: actual high scoring | Number of hits | Number of reads not |
|---|---|---|---|---|
| Repbase- | 1 | 314 | 135 | 1193 |
| TIGR repeats | 0 | - | - | 1329 |
| TREP repeats | 0 | - | 5 | 1324 |
| Flax-EST | 498 | 149,059 | 222 | 609 |
| NCBI-EST | 231 | 60,130 | 185 | 913 |
| NCBI-nt | 385 | 115,237 | 73 | 871 |
| NCBI-nr | 261 | - | 110 | 958 |
Types and distribution of SSRs in flax BAC-End sequences (columns 4 to 15+ give the number of repeats)
| Motif | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15+ | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dinucleotide | |||||||||||||
| AC/GT | - | - | 35 | 13 | 13 | 1 | - | 1 | - | - | - | 1 | 64 |
| CA/TG | - | - | 39 | 25 | 12 | 2 | 3 | - | - | - | - | - | 81 |
| GA/TC | - | - | 92 | 47 | 35 | 25 | 13 | 16 | 7 | 4 | 2 | 15 | 256 |
| AG/CT | - | - | 102 | 68 | 51 | 39 | 23 | 16 | 13 | 7 | 6 | 28 | 353 |
| TA | - | - | 99 | 71 | 53 | 48 | 32 | 19 | 15 | 13 | 5 | 33 | 388 |
| AT | - | - | 112 | 78 | 41 | 34 | 35 | 27 | 21 | 15 | 10 | 56 | 429 |
| Trinucleotide | |||||||||||||
| ACG/CGT | - | 7 | 1 | - | - | - | - | - | - | - | - | - | 8 |
| CGA/TCG | - | 5 | 6 | 2 | - | - | - | - | - | - | - | - | 13 |
| CGC/GCG | - | 12 | 2 | - | 1 | - | - | - | - | - | - | - | 15 |
| GAC/GTC | - | 9 | 4 | 3 | - | - | - | - | - | - | - | - | 16 |
| GTA/TAC | - | 6 | 5 | 3 | 2 | 1 | - | - | - | - | - | - | 17 |
| GCC/GGC | - | 15 | 2 | 2 | - | - | - | - | - | - | - | - | 19 |
| CTA/TAG | - | 9 | 9 | 2 | - | - | - | - | - | - | - | - | 20 |
| CAC/GTG | - | 19 | 2 | - | - | - | - | - | - | - | - | - | 21 |
| ACT/AGT | - | 7 | 12 | 6 | 1 | - | 1 | - | - | - | - | - | 27 |
| CCG/CGG | - | 23 | 6 | 3 | 1 | - | - | - | - | - | - | - | 33 |
| ACA/TGT | - | 19 | 8 | 1 | 6 | 1 | - | - | 2 | - | - | - | 37 |
| CCA/TGG | - | 26 | 11 | 3 | - | 2 | 1 | - | - | - | - | - | 43 |
| AAC/GTT | - | 35 | 6 | 3 | 1 | - | 1 | - | - | - | - | - | 46 |
| ACC/GGT | - | 26 | 18 | 5 | - | - | - | - | - | - | - | - | 49 |
| AGG/CCT | - | 35 | 12 | 7 | - | - | - | - | - | - | 1 | - | 55 |
| GCA/TGC | - | 45 | 6 | 5 | 4 | - | - | - | - | 1 | - | - | 61 |
| CTC/GAG | - | 41 | 6 | 10 | 4 | 1 | - | - | - | - | - | - | 62 |
| CAA/TTG | - | 31 | 20 | 3 | 4 | - | 3 | 1 | 1 | - | - | - | 63 |
| CAG/CTG | - | 36 | 13 | 8 | 6 | 1 | 2 | 1 | - | 2 | - | - | 69 |
| AGC/GCT | - | 50 | 17 | 2 | 2 | - | 1 | - | - | - | - | - | 72 |
| ATG/CAT | - | 48 | 22 | 8 | 2 | 3 | 1 | - | - | - | - | - | 84 |
| TAA/TTA | - | 32 | 25 | 12 | 16 | 2 | 2 | - | 2 | 2 | - | 1 | 94 |
| GGA/TCC | - | 56 | 27 | 9 | 5 | - | 2 | - | - | - | - | - | 99 |
| TCA/TGA | - | 50 | 32 | 14 | 5 | - | 1 | 1 | - | - | - | - | 103 |
| ATA/TAT | - | 51 | 14 | 9 | 10 | 6 | 6 | 5 | 4 | 1 | - | 2 | 108 |
| AAT/ATT | - | 56 | 22 | 19 | 11 | 5 | 6 | - | 4 | 2 | 1 | - | 126 |
| ATC/GAT | - | 67 | 37 | 13 | 5 | 5 | - | - | - | - | - | - | 127 |
| AAG/CTT | - | 80 | 46 | 27 | 11 | 4 | 5 | 6 | 2 | - | - | 4 | 185 |
| AGA/TCT | - | 96 | 51 | 20 | 16 | 8 | 4 | 5 | 2 | 1 | - | 3 | 206 |
| GAA/TTC | - | 162 | 63 | 31 | 21 | 11 | 5 | 8 | 2 | 1 | - | 2 | 306 |
| Tetranucleotide | - | 118 | 36 | 12 | 5 | 3 | 7 | - | - | - | - | - | 181 |
| Other higher order motifs | 71 | 45 | 4 | 7 | - | 1 | - | - | - | - | - | - | 128 |
| Total | 71 | 1,317 | 1,024 | 551 | 344 | 203 | 154 | 106 | 75 | 49 | 25 | 145 | 4,064 |
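The table above counts perfect SSRs by motif and repeat number. This section does not name the tool or thresholds used, so the following is only a sketch of one common approach (regular-expression scanning). The minimum repeat counts are an assumption inferred from the first non-empty column for each motif class in the table (6 for dinucleotides, 5 for tri- and tetranucleotides, 4 for longer motifs), and the function name `find_ssrs` is hypothetical. Unlike the table, the sketch does not merge a motif with its reverse complement.

```python
import re

# Assumed minimum repeat counts per motif length (inferred from the table,
# not stated in the text): di >= 6, tri >= 5, tetra >= 5, penta/hexa >= 4.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (homopolymer) or "GAGA" (really a GA repeat).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("TTGAGAGAGAGAGACC"))    # [(2, 'GA', 6)]
print(find_ssrs("C" + "GAA" * 5 + "T"))  # [(1, 'GAA', 5)]
```

Counting the hits per motif and repeat number over all BES would give a table of the same shape as the one above, although the exact counts depend on the thresholds and on how overlapping or compound repeats are handled.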
Figure 2. The SSR motif distribution in sequenced plant genomes in comparison with the BES-based estimates in flax. For comparative analysis, SSRs were also mined from whole genome assemblies of castor bean, poplar, grapevine, soybean, cucumber, Arabidopsis, papaya, rice, sorghum, Brachypodium and maize publicly available at http://www.phytozome.net (v6) and from the apple genome sequence available at www.rosaceae.org. As per the data citation policy of Phytozome, individual references are listed in Additional file 2: Table S2.
Summary of BLAST analyses of BAC-End sequences of flax (Linum usitatissimum cv CDC Bethune)
| S. No | Database | No of BAC-End reads harbouring hits (cutoff e-5) | No of BAC-End reads harbouring hits (cutoff e-25) | No of hits as proportion of total (%) @ | Total HSP score | Proportion as % of total length * |
|---|---|---|---|---|---|---|
| 1 | Flax-EST | 21,532 | - | 26.8 | 5,303,617 | 9.7 |
| 2 | NCBI-EST | 17,038 | - | 21.2 | 3,349,832 | 6.1 |
| 3 | NCBI-Protein-nr | 24,962 | 14,288 | 31.1 (e-5) | - | - |
@Total number of BAC-end reads: 80,337
* Total length of BAC-End sequences: 54,600,041 bp
Figure 3. Estimates of the composition of the flax genome.
Figure 4. Distribution of GO-slim annotations of gene products predicted from BAC-End sequences: A. Molecular functions; B. Biological processes and C. Cellular locations.
Figure 5. Transposable element (TE) composition in sequenced plant genomes in comparison with the BES-based estimates in flax. The data regarding the TE composition of other plant genomes were taken from [74; papaya] [45; castor bean] [80; apple and other genomes]. Please refer to Additional file 9: Table S9 for more details.