
The First Draft Genome of the Plasterer Bee Colletes gigas (Hymenoptera: Colletidae: Colletes).

Qing-Song Zhou¹, Arong Luo¹, Feng Zhang², Ze-Qing Niu¹, Qing-Tao Wu¹, Mei Xiong¹,³, Michael C Orr¹, Chao-Dong Zhu¹,³.

Abstract

Despite intense interest in bees, no genomes are available for the bee family Colletidae. Colletes gigas, one of the largest species of the genus Colletes in the world, is an ideal candidate to fill this gap. Endemic to China, C. gigas has been the focus of studies on its nesting biology and pollination of the economically important oil tree Camellia oleifera, which is chemically defended. To enable deeper study of its biology, we sequenced the whole genome of C. gigas using single-molecule real-time sequencing on the Pacific Bioscience Sequel platform. In total, 40.58 Gb (150×) of long reads were generated and the final assembly of 326 scaffolds was 273.06 Mb with an N50 length of 8.11 Mb, which captured 94.4% complete Benchmarking Universal Single-Copy Orthologs. We predicted 11,016 protein-coding genes, of which 98.50% and 84.75% were supported by protein- and transcriptome-based evidence, respectively. In addition, we identified 26.27% of repeats and 870 noncoding RNAs. The bee phylogeny with this newly sequenced colletid genome is consistent with available results, supporting Colletidae as sister to Halictidae when Stenotritidae is not included. Gene family evolution analyses identified 9,069 gene families, of which 70 experienced significant expansions (33 families) or contractions (37 families), and it appears that olfactory receptors and carboxylesterase may be involved in specializing on and detoxifying Ca. oleifera pollen. Our high-quality draft genome for C. gigas lays the foundation for insights on the biology and behavior of this species, including its evolutionary history, nesting biology, and interactions with the plant Ca. oleifera.

Keywords: Apoidea; PacBio sequencing; gene family evolution; genome annotation; genome assembly

Year: 2020 PMID: 32386319 PMCID: PMC7313665 DOI: 10.1093/gbe/evaa090

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

Bees are arguably the most important group of angiosperm-pollinating insects (Klein et al. 2007; Danforth et al. 2013), pollinating nearly 90% of all flowering plants that require pollination (Ollerton et al. 2011). With more than 20,000 described species (Ascher and Pickering 2018), wild bees substantially contribute to crop yields (Garibaldi et al. 2011, 2013; Leonhardt et al. 2013), making them both ecologically and economically invaluable. Among them, the family Colletidae is a diverse group of >2,700 species, ranging from the small, wasp-like Hylaeus that carry pollen internally to the more robust, hairy Colletes that share their family name (Michener 2007; Ascher and Pickering 2018). This family was traditionally believed to be the most “primitive” taxon within the superfamily Apoidea (according to mouthpart structure, the similarity of their bilobed glossa to closely related apoid wasps), but molecular studies place Melittidae sister to all other bees (Danforth et al. 2006, 2013; Hedtke et al. 2013; Branstetter et al. 2017; Peters et al. 2017; Sann et al. 2018). Instead, it has been suggested that the bilobed colletid glossa actually evolved for adding their characteristic cellophane-like cell lining to nests (Michener 2007; Almeida 2008). This cell lining, unique to Colletidae, has drawn a great deal of prior study, yet the molecular underpinnings of this behavior remain unknown. Colletes specifically nest in the ground and they are the second-largest genus in the family. Though they have been well studied for their systematics and taxonomy (Michener 2007; Kuhlmann and Proshchalykin 2011, 2013; Niu, Kuhlmann, et al. 2013; Niu, Zhu, et al. 2013; Niu et al. 2014a, 2014b), few studies have examined their molecular phylogenetics and evolution (Kuhlmann et al. 2009; Almeida et al. 2012; Ferrari et al. 2020). Despite general interest in bees, no whole genome has been reported from the family Colletidae until now (Branstetter et al. 2018). 
Here, we present the whole-genome sequence of Colletes gigas, de novo assembled using single-molecule real-time Pacific Bioscience (PacBio) long reads. We annotated essential genomic elements, repeats, protein-coding genes, and noncoding RNAs (ncRNAs), and further compared gene family evolution across major bee lineages. Further, we carried out phylogenomic analyses of bee families using single-copy (Benchmarking Universal Single-Copy Ortholog [BUSCO]) markers for the first time. We also discuss our findings in relation to C. gigas specializing on Camellia oleifera (Huang et al. 2015), an economically important tea tree with toxic pollen, documented to deplete honey bee colonies when foraged on (Su et al. 2011). Therefore, this and future studies will prove vital for understanding the evolution of floral specialization, especially for chemically defended resources.

Materials and Methods

Sample Collection, Sequencing, and Quality Control

We collected specimens of C. gigas in Ca. oleifera plantations in East-Central China (Tangchi Village, Shucheng County, Lu’an City, Anhui Province, China). A total of 17 C. gigas specimens were collected, including 2 males and 15 females. The species was identified by author Ze-Qing Niu using traditional morphological approaches based on body size, head breadth, facial fovea, clypeus, mesonotum, and wing venations (Wu 1965; Niu, Kuhlmann, et al. 2013), as well as the biology and phenology (Huang et al. 2015). Species identification was also verified by DNA barcoding analyses using COI gene sequences of the genus Colletes available from the National Center for Biotechnology Information (NCBI; supplementary file S1, Supplementary Material online). Upon collection, specimens were brought back to the laboratory alive, flash-frozen in liquid nitrogen, and stored at −80 °C for long-term preservation. As Hymenoptera use a haplodiploid sex-determination system in which males are haploid, we used a single male adult C. gigas (NCBI taxonomy ID: 935657) (Voucher Code: AHSC1104, supplementary fig. S1, Supplementary Material online) with its gut contents removed for PacBio sequencing, whereas two female specimens with their gut contents removed were used for Illumina whole-genome (Voucher Code: AHSC1105) and Illumina transcriptome (Voucher Code: AHSC1107) sequencing. The remaining specimens (Female: AHSC1101-03 and AHSC1108-17; Male: AHSC1106) were deposited at the Institute of Zoology, Chinese Academy of Sciences. Genomic DNA/RNA extraction, library preparation, and sequencing were conducted by the company Nextomics (Wuhan, China). For long-read sequencing, a library was constructed with an insert size of 10 kb and sequenced using P6-C4 chemistry on the PacBio Sequel platform. For short-read sequencing, paired-end libraries were constructed with an insert size of 400 bp and sequenced (2 × 150 bp) on the Illumina HiSeq X Ten platform.
Raw Illumina short reads were compressed into clumps, and duplicates were removed with clumpify.sh (BBTools suite v37.93, Bushnell). Quality control was performed with bbduk.sh (BBTools): both ends of each read were trimmed to Q20 based on Phred scores, reads shorter than 15 bp or with more than 5 Ns were discarded, poly-A or poly-T tails of at least 10 bp were trimmed, and overlapping paired reads were corrected.
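The filtering thresholds above (Q20, minimum length 15 bp, at most 5 Ns) can be illustrated with a minimal Python sketch. This is not the bbduk.sh algorithm, which uses its own trimming logic; the function name `trim_read` and the toy read are hypothetical and serve only to show how such thresholds interact.

```python
def trim_read(seq, quals, min_q=20, min_len=15, max_n=5):
    """Trim low-quality bases from both ends of a read, then apply filters.

    seq   -- read sequence (str)
    quals -- per-base Phred scores (list of int), same length as seq
    Returns (trimmed_seq, trimmed_quals), or None if the read is discarded.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    # Walk inward from the left while bases are below the quality threshold.
    while start < end and quals[start] < min_q:
        start += 1
    # Walk inward from the right in the same way.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    trimmed = seq[start:end]
    # Discard reads that are too short or contain too many ambiguous bases.
    if len(trimmed) < min_len or trimmed.count("N") > max_n:
        return None
    return trimmed, quals[start:end]


# Toy read (hypothetical): two poor bases at each end, 16 good bases inside.
toy_seq = "AC" + "G" * 16 + "TT"
toy_quals = [10, 12] + [30] * 16 + [15, 8]
kept = trim_read(toy_seq, toy_quals)
too_short = trim_read("ACGTACGT", [30] * 8)
```

Here `kept` retains only the 16 high-quality bases, whereas `too_short` is `None` because an 8 bp read falls below the 15 bp minimum.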

Genome Size Estimation

We employed the strategy of short-read k-mer distributions to estimate the genome size. The histogram of k-mer frequencies was first computed with 17-mers and 21-mers using khist.sh (BBTools). Genome size was then estimated with a maximum k-mer coverage of 1,000 using GenomeScope v1.0.0 (Vurture et al. 2017).
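The idea behind k-mer-based estimation can be sketched in a few lines of Python: the genome size is approximately the total number of k-mer occurrences divided by the depth of the main coverage peak. This is a simplified illustration only; GenomeScope fits a statistical model to the histogram rather than using this ratio, and the histogram below is hypothetical toy data, not data from this study.

```python
def estimate_genome_size(hist, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    hist      -- dict mapping depth -> number of distinct k-mers seen at that depth
    min_depth -- depths below this are treated as sequencing errors and ignored
    Returns (estimated_size_in_kmers, peak_depth).
    """
    kept = {depth: count for depth, count in hist.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers left after error filtering")
    # The main peak is the depth at which most distinct k-mers occur.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth, peak_depth


# Hypothetical toy histogram: an error tail at depths 1-2, a main peak at 30,
# and a small repeat peak at 60.
toy_hist = {1: 5000, 2: 800, 28: 200, 29: 400, 30: 1000, 31: 400, 32: 200, 60: 50}
size, peak = estimate_genome_size(toy_hist)
```

In this toy case the k-mers at depth 60 count twice toward the estimate, which is how repeats contribute to the total even though they appear only once among the distinct k-mers.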

Genome, Mitochondrion, and Transcriptome Assembly

We performed de novo genome assembly with long reads using Flye (v2.4.2) (Kolmogorov et al. 2019) and Falcon (pb-assembly v0.0.4) (Chin et al. 2016) (length_cutoff_pr = 7,000, max-diff = 100, max-cov = 100, and min-cov = 2). Both assemblies were first polished by Flye (--polish-target) on raw PacBio sequences. To improve genome contiguity, the two assemblies generated from Flye and Falcon pipelines were merged into one assembly after two rounds of quickmerge v0.3 (Chakraborty et al. 2016) with USAGE 2 (https://github.com/mahulchak/quickmerge/wiki, last accessed November 12, 2016), which was further polished with Illumina short reads using two rounds of Pilon v1.22 (Walker et al. 2014). Subsequently, we filtered possible contaminants by HS-BlastN (Chen et al. 2015) employing BLAST+ v2.7.1 (Camacho et al. 2009) against the NCBI nucleotide database and checked vector contamination using VecScreen against the UniVec database. We assembled the mitochondrial genome of C. gigas based on Illumina short reads using Mitobim v1.9.1 (Hahn et al. 2013) and with reference to the published mitochondrial genome of C. gigas (KM978210, Huang et al. 2016), which was then annotated with MITOS webserver (Bernt et al. 2013). We performed transcriptome assembly under a genome-guided method via HISAT2 v2.1.0 (Kim et al. 2015), mapping RNA sequencing (RNA-seq) reads to our assembled genome, and then assembled with StringTie v1.3.4 (Pertea et al. 2015). Redundant isoforms were removed with Redundans v0.13c (Pryszcz and Gabaldón 2016) under default parameters. We finally evaluated the completeness of all assemblies using the BUSCO (Waterhouse et al. 2018) analyses against the insect data set (n = 1,658).
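Assembly contiguity is summarized throughout by the N50 length, which is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of that definition follows; the function name and the toy lengths are illustrative assumptions, not part of the pipeline described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Accumulate from the longest sequence downward until half the
    # total assembly length has been reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Hypothetical toy assembly: total length 300, so the halfway point is 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
```

For the toy lengths, the two longest sequences (80 + 70) reach exactly half of the total, so the N50 is 70. Merging contigs raises this value because fewer, longer sequences are needed to reach the halfway point.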

Genome Annotation

We generated a custom library by combining a de novo species-specific repeat library constructed by RepeatModeler version open-1.0.11 (Smit and Hubley 2008–2015 www.repeatmasker.org, last accessed April 8, 2020) with the Dfam 3.0 (Hubley et al. 2016) and RepBase-20181026 databases (Bao et al. 2015). Repeats were identified and masked using RepeatMasker v4.0.9 (Smit AFA, Hubley R, Green P. 2013-2015 www.repeatmasker.org/faq.html, last accessed April 9, 2019) together with the custom library. Gene prediction was conducted with the MAKER v2.31.10 pipeline (Holt and Yandell 2011) by integrating ab initio, transcriptome-based, and protein homology-based evidence. Ab initio gene predictions were performed with Augustus v3.3 (Stanke et al. 2004) and GeneMark-ET v4.33 (Lomsadze et al. 2005). We trained both predictors using BRAKER v2.1.0 (Hoff et al. 2016) with RNA-seq data and used previously assembled, genome-guided transcripts as transcriptome-based evidence. Homology-based gene functions were assigned using Diamond v0.9.18 (Buchfink et al. 2015) against the UniProtKB (SwissProt + TrEMBL) database (--sensitive -e 1e-5). Protein domains, gene ontology, and pathway annotations were searched with InterProScan 5.34-73.0 (Finn et al. 2017) against the Pfam (Finn et al. 2014), PANTHER (Mi et al. 2017), Gene3D (Lewis et al. 2018), Superfamily (Wilson et al. 2009), and CDD (Marchler-Bauer et al. 2017) databases. Protein sequences of Tribolium castaneum, Acyrthosiphon pisum, Apis mellifera, Drosophila melanogaster, Bombus impatiens, and Bombyx mori were downloaded as protein homology-based evidence from Ensembl (Flicek et al. 2014). ncRNAs were identified with Infernal v1.1.2 (Nawrocki and Eddy 2013) against the Rfam v14.0 (Kalvari et al. 2018) database. Transfer RNAs were further refined with tRNAscan-SE v2.0 (Lowe and Eddy 1997).

Phylogenomic Analyses

We generated a phylogeny of Apoidea using two data types. The first is public genomic data from 17 species (see supplementary table S1, Supplementary Material online), with 2 species from Vespidae and Bethylidae selected as outgroups. The second is RNA-seq data from six other species from GenBank (see table 1). We assembled the transcripts using Trinity v2.8.6 (Grabherr et al. 2011), combined highly similar transcripts, and extracted the longest transcripts using CD-HIT-EST (Li and Godzik 2006). Complete, single-copy genes were selected using BUSCO assessments against the Hymenoptera data set (n = 4,415). For phylogenetic analyses, single-copy gene matrices were then generated following Zhang et al. (2019) using MAFFT v7.394 (Katoh and Standley 2013), trimAl v1.4.1 (Capella-Gutiérrez et al. 2009), and FASconCAT-G v1.04 (Kück and Longo 2014).
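The final step of building such matrices, joining per-gene alignments into one supermatrix while recording partition boundaries, can be sketched as follows. This is an illustrative Python sketch of the general concatenation idea, not the FASconCAT-G implementation; the function name, gene names, and toy sequences are hypothetical.

```python
def concatenate(alignments, missing="-"):
    """Concatenate per-gene alignments into a supermatrix.

    alignments -- list of (gene_name, {taxon: aligned_sequence}) pairs
    Taxa absent from a gene are padded with the `missing` character.
    Returns (supermatrix, partitions), where partitions holds
    (gene_name, start, end) with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        n = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * n))
        partitions.append((gene, start, start + n - 1))
        start += n
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical toy alignments for two genes and three taxa.
toy = [
    ("geneA", {"sp1": "ATG-", "sp2": "ATGC"}),
    ("geneB", {"sp1": "GGA", "sp3": "GCA"}),
]
matrix, parts = concatenate(toy)
```

The partition list is what a model-selection step would consume to assign substitution models to blocks of sites.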

Table 1

Summary of Each Assembly at Each Step for Colletes gigas

Assembly	Total Length (Mb)	No. Scaffolds	N50 Length (kb)	Longest Scaffold (Mb)	GC (%)	BUSCO (n = 1,658) C (%)	D (%)	F (%)	M (%)
Flye	317.355	4,252	5,882	12.25	39.38	99.3	0.7	0.0	0.7
Falcon	274.246	377	4,809	10.8	39.72	88.6	0.2	6.6	4.8
Merged	274.984	343	7,254	13.274	39.68	98.9	1.4	0.2	0.9
Pilon	275.056	343	7,253	13.274	39.66	99.1	1.4	0.1	0.8
Final genome assembly	273.056	326	8,109	13.274	39.69	94.4	1.2	1.0	4.6
Transcript assembly	50.080	18,407	5.41	0.05781	40.56	92.1	4.6	3.7	4.2

Note.—Values of final assemblies are bold. C, complete BUSCOs; D, complete and duplicated BUSCOs; F, fragmented BUSCOs; M, missing BUSCOs.

Phylogenomic tree reconstructions were made using maximum likelihood (ML) and coalescent-based species tree (ASTRAL) methods. ML reconstructions were performed using IQ-TREE v1.6.3 (Nguyen et al. 2015) with 1,000 ultrafast bootstraps (UFBoot, Hoang et al. 2018) and 1,000 SH-aLRT replicates (Guindon et al. 2010). Partitioning schemes and substitution models were estimated with ModelFinder (built into IQ-TREE; Kalyaanamoorthy et al. 2017). Species trees were estimated using ASTRAL-III v5.6.1 (Zhang et al. 2018) based on gene trees generated with IQ-TREE on individual gene alignments. Local branch supports were estimated from quartet frequencies (Sayyari and Mirarab 2016).

Gene Family Identification and Evolution

We identified gene families using public genome protein sequences of 14 insect species. These comprised 13 Hymenoptera species: five Apidae species (Apis mellifera, Bombus impatiens, Ceratina calcarata, Habropoda laboriosa, and Melipona quadrifasciata), two Megachilidae species (Megachile rotundata and Osmia bicornis), one Halictidae species (Dufourea novaeangliae), and one species each from Formicidae (Harpegnathos saltator), Vespidae (Polistes dominula), Braconidae (Diachasma alloeum), Pteromalidae (Nasonia vitripennis), and Diprionidae (Neodiprion lecontei). Drosophila melanogaster was selected as the outgroup. OrthoFinder v2.2.7 (Emms and Kelly 2015) was used to infer orthogroups with Diamond (Buchfink et al. 2015) as the sequence aligner. Gene family evolution (gain and loss) was analyzed using CAFE v4.2 (Han et al. 2013) with the lambda parameter used to calculate birth and death rates. The ultrametric tree generated from OrthoFinder was transformed using r8s (Sanderson 2003) and time calibrated by the divergence time (99 Ma) of Habropoda laboriosa and Dufourea novaeangliae from the TimeTree database (Kumar et al. 2017).

Results and Discussion

Genome Sequencing and Assembly

We generated 6,251,585 subreads on the PacBio Sequel platform totaling 40.58 Gb (150×). The mean and N50 length of long subreads were 6.49 and 11.44 kb, respectively. A total of 39.1 Gb (147.5×) and 7.77 Gb clean data were produced on the Illumina HiSeq X Ten platform for whole-genome and transcriptome sequencing, respectively. The estimated genome size varied from 299.45 to 322.07 Mb at a maximum k-mer coverage of 1,000 (supplementary table S2, Supplementary Material online). The overall rate of heterozygosity was 0.176–0.298%, and a distinct first peak occurred at a mean k-mer coverage of 29.33–37.34 in the k-mer plots (supplementary fig. S2, Supplementary Material online). Unique (nonrepetitive) length estimates were generally consistent among analyses, ranging from 194.77 to 245.50 Mb. Ranging from 65.85 to 126.89 Mb, our genome repetitive length estimates varied with the maximum k-mer coverage cutoff, indicating that the assembled genome may include a high number of repeated regions. The size of the Flye assembly was 317.36 Mb including 4,260 contigs, whereas that of the Falcon assembly was 274.25 Mb with an N50 of 4.81 Mb (table 1). The Flye and Falcon assemblies were merged into 343 contigs with N50 = 7.25 Mb after two rounds of quickmerging. Following polishing with Illumina reads and checking for possible contaminants, our final draft assembly of C. gigas had 326 scaffolds, a total length of 273.06 Mb, an N50 length of 8.11 Mb, a maximum scaffold length of 13.274 Mb, and 39.69% GC content. With the genome-guided strategy, a total of 18,405 transcripts were assembled with a mean and N50 length of 2.72 and 5.41 kb, respectively. We generated a circular mitochondrial genome of 15,888 bp (GenBank No. MN841004), which is slightly longer and has a higher A + T content (86.47%) than the previously published one (KM978210, 15,885 bp in length with 86.29% A + T content).
Assembly completeness was assessed by querying the genome for the insect BUSCO marker set (n = 1,658). We identified 88.6–99.3% complete, 0.0–6.6% fragmented, and 0.7–4.8% missing BUSCOs across all versions of the assembly (table 1). Therefore, the BUSCO analysis indicates that our assembly is near-complete. Genome-guided transcriptome assemblies also show similar completeness. In addition, 92.78% of PacBio long reads, 94.63% of Illumina short reads, as well as all (18,405) assembled transcripts could be mapped to the final genome assembly using the bwa-mem command (Li 2013). All statistics suggest that our assembly is highly complete and reliable.
RepeatMasker identified 378,335 repeats, masking 26.27% of the genome assembly. The five most abundant repeat types were unclassified repeats (11.05%), Helitron transposable elements (2.95%), DNA/TcMar-Tc1 transposons (2.20%), Gypsy long terminal repeat retrotransposons (1.21%), and DNA/PiggyBac (PB) transposons (0.94%) (supplementary table S3, Supplementary Material online). A total of 11,016 protein-coding genes were identified by the MAKER pipeline with means of 5.99 exons and 4.99 introns per gene. The mean length of exons and introns was 271.91 and 665.33 bp, respectively, whereas the gene density of the C. gigas genome was 40.19 genes/Mb. Among predicted genes, 10,851 (98.50%) were supported by protein-based evidence and 9,336 (84.75%) were supported by transcriptome-based evidence. BUSCO analysis of the predicted genes identified 1,518 (91.6%) complete, 21 (1.3%) duplicated, 32 (1.9%) fragmented, and 108 (6.5%) missing BUSCOs. InterProScan identified protein domains for 9,495 (86.19%) genes, among which 6,577 were assigned gene ontology terms, and 597, 486, and 2,729 matched the Kyoto Encyclopedia of Genes and Genomes, MetaCyc, and Reactome pathway databases, respectively.
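The annotation percentages reported above follow directly from the stated counts, as the short Python check below shows. The helper name `pct` is a hypothetical convenience; all counts are taken from the text, and duplicated BUSCOs are treated as a subset of complete ones, as in the note to table 1.

```python
def pct(part, whole, digits=2):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


n_genes = 11_016   # predicted protein-coding genes
n_busco = 1_658    # insect BUSCO marker set size

protein_supported = pct(10_851, n_genes)      # protein-based evidence
transcript_supported = pct(9_336, n_genes)    # transcriptome-based evidence
with_domains = pct(9_495, n_genes)            # InterProScan domain hits

busco_complete = pct(1_518, n_busco, digits=1)
busco_duplicated = pct(21, n_busco, digits=1)
busco_fragmented = pct(32, n_busco, digits=1)
busco_missing = pct(108, n_busco, digits=1)

# Complete, fragmented, and missing categories partition the marker set.
busco_total = 1_518 + 32 + 108
```

Running the check reproduces 98.50%, 84.75%, and 86.19% for the gene-level figures and 91.6%, 1.3%, 1.9%, and 6.5% for the BUSCO categories, and the three exclusive BUSCO categories sum to 1,658.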
For ncRNAs, we identified 122 rRNAs, 258 tRNAs, 52 micro-RNAs, 52 small nuclear RNAs, 11 ribozymes, 366 cis-regulatory elements, 1 antisense, 2 lncRNAs, 3 sRNAs, and 3 other ncRNAs. A total of 21 tRNA isotypes were identified, excepting the SelCys-isotype. (See details in supplementary table S4, Supplementary Material online.)
Nucleotide and protein matrices of 147 shared, single-copy genes had 212,277 and 70,268 sites that were divided by ModelFinder into 49 and 50 partitions, respectively. ML trees from proteins generated the same topologies as species trees generated by ASTRAL-III using both nucleotide and protein matrices, which were similar to the ML ones generated using nucleotide matrices, except for the position of Andrena fulva. All interior nodes were supported with high values (fig. 1). The phylogeny of Apoidea derived from protein data recovers (Apidae + Megachilidae) as sister to ((Colletidae + Halictidae) + Andrenidae), supporting the results of numerous recent phylogenies (Hedtke et al. 2013; Branstetter et al. 2017; Peters et al. 2017; Sann et al. 2018).

Fig. 1. Phylogenomic trees of Apoidea constructed based on protein (left) and nucleotide (right) matrices of single-copy genes from 19 published whole genomes and six RNA-seq data sets. Support values are given at nodes. Species in blue belong to family Apidae, orange belong to Megachilidae, yellow belong to Halictidae, red belong to Colletidae, gray belong to Andrenidae, purple belong to Philanthidae, and brown belong to Sphecidae. Goniozus legneri and Polistes dominula were used as outgroups.

Gene Family Evolution

Gene families were identified using OrthoFinder based on 14 hymenopterans and D. melanogaster. Overall, 91.4% (184,939) of genes were assigned into 10,994 gene families with a mean orthogroup size of 16.8. Among 5,254 families shared by all species, 1,473 were single-copy orthogroups. For C. gigas, 10,926 (94.20%) genes were clustered into 10,269 gene families, and only one family and seven genes were specific to C. gigas (supplementary table S5, Supplementary Material online). We analyzed gene family evolution (gain and loss) using CAFE and estimated the gene birth and death rate (lambda) at 0.00120 duplications/gene/Ma. We found that 968 gene families experienced significant expansion or contraction events (family-wide P value < 0.05, supplementary table S6, Supplementary Material online), with details for the 15 species shown in supplementary figure S3, Supplementary Material online. Among them, C. gigas has 70 (33 expansions and 37 contractions) rapidly evolving families (supplementary table S7, Supplementary Material online). The five largest expanded families were reverse transcriptase (RNA-dependent DNA polymerase) (112), transposase (90), zinc finger (32), carboxylesterase family (17), and olfactory receptor (15). Among them, olfactory receptors are a large family of membrane-associated G-protein-coupled receptors that play crucial roles in insect survival and reproductive success, mediating responses to food, mates, and oviposition sites (Hallem et al. 2006). Carboxylesterase is a multifunctional superfamily widely distributed in nature; many of its members are enzymes that catalyze reactions involving compounds such as toxins or drugs, meaning they play important roles in xenobiotic detoxification (Yu et al. 2009; Aranda et al. 2014), which could be directly beneficial for foraging on the toxic nectar and pollen of Ca. oleifera. Similarly, more olfactory receptors should make C. gigas better at selecting the specific floral resources it is best adapted to, or perhaps even enable measurement of toxins between specific flowers or at stages of bloom such that this species could minimize its exposure, but more study is necessary to determine the major genomic elements related to the specialization of this species on the chemically defended Ca. oleifera.

References

1. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

2. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors: T M Lowe; S R Eddy
Journal: Nucleic Acids Res Date: 1997-03-01 Impact factor: 16.971

3. Assembly of long, error-prone reads using repeat graphs.

Authors: Mikhail Kolmogorov; Jeffrey Yuan; Yu Lin; Pavel A Pevzner
Journal: Nat Biotechnol Date: 2019-04-01 Impact factor: 54.908

4. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times.

Authors: Sudhir Kumar; Glen Stecher; Michael Suleski; S Blair Hedges
Journal: Mol Biol Evol Date: 2017-07-01 Impact factor: 16.240

5. Stability of pollination services decreases with isolation from natural areas despite honey bee visits.

Authors: Lucas A Garibaldi; Ingolf Steffan-Dewenter; Claire Kremen; Juan M Morales; Riccardo Bommarco; Saul A Cunningham; Luísa G Carvalheiro; Natacha P Chacoff; Jan H Dudenhöffer; Sarah S Greenleaf; Andrea Holzschuh; Rufus Isaacs; Kristin Krewenka; Yael Mandelik; Margaret M Mayfield; Lora A Morandin; Simon G Potts; Taylor H Ricketts; Hajnalka Szentgyörgyi; Blandina F Viana; Catrin Westphal; Rachael Winfree; Alexandra M Klein
Journal: Ecol Lett Date: 2011-08-02 Impact factor: 9.492

6. Wild pollinators enhance fruit set of crops regardless of honey bee abundance.

Authors: Lucas A Garibaldi; Ingolf Steffan-Dewenter; Rachael Winfree; Marcelo A Aizen; Riccardo Bommarco; Saul A Cunningham; Claire Kremen; Luísa G Carvalheiro; Lawrence D Harder; Ohad Afik; Ignasi Bartomeus; Faye Benjamin; Virginie Boreux; Daniel Cariveau; Natacha P Chacoff; Jan H Dudenhöffer; Breno M Freitas; Jaboury Ghazoul; Sarah Greenleaf; Juliana Hipólito; Andrea Holzschuh; Brad Howlett; Rufus Isaacs; Steven K Javorek; Christina M Kennedy; Kristin M Krewenka; Smitha Krishnan; Yael Mandelik; Margaret M Mayfield; Iris Motzke; Theodore Munyuli; Brian A Nault; Mark Otieno; Jessica Petersen; Gideon Pisanty; Simon G Potts; Romina Rader; Taylor H Ricketts; Maj Rundlöf; Colleen L Seymour; Christof Schüepp; Hajnalka Szentgyörgyi; Hisatomo Taki; Teja Tscharntke; Carlos H Vergara; Blandina F Viana; Thomas C Wanger; Catrin Westphal; Neal Williams; Alexandra M Klein
Journal: Science Date: 2013-02-28 Impact factor: 47.728

7. Gene identification in novel eukaryotic genomes by self-training algorithm.

Authors: Alexandre Lomsadze; Vardges Ter-Hovhannisyan; Yury O Chernoff; Mark Borodovsky
Journal: Nucleic Acids Res Date: 2005-11-28 Impact factor: 16.971

8. Ensembl 2014.

Authors: Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Konstantinos Billis; Simon Brent; Denise Carvalho-Silva; Peter Clapham; Guy Coates; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah Hunt; Nathan Johnson; Thomas Juettemann; Andreas K Kähäri; Stephen Keenan; Eugene Kulesha; Fergal J Martin; Thomas Maurel; William M McLaren; Daniel N Murphy; Rishi Nag; Bert Overduin; Miguel Pignatelli; Bethan Pritchard; Emily Pritchard; Harpreet S Riat; Magali Ruffier; Daniel Sheppard; Kieron Taylor; Anja Thormann; Stephen J Trevanion; Alessandro Vullo; Steven P Wilder; Mark Wilson; Amonida Zadissa; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Jennifer Harrow; Javier Herrero; Tim J P Hubbard; Rhoda Kinsella; Matthieu Muffato; Anne Parker; Giulietta Spudich; Andy Yates; Daniel R Zerbino; Stephen M J Searle
Journal: Nucleic Acids Res Date: 2013-12-06 Impact factor: 16.971