Chromosome-Level Genome Assembly of Nephotettix cincticeps (Uhler, 1896) (Hemiptera: Cicadellidae: Deltocephalinae).

Bin Yan¹, Xiaofei Yu², Renhuai Dai¹, Zizhong Li¹, Maofa Yang^1,2.

Abstract

The green rice leafhopper, Nephotettix cincticeps (Uhler), is an important rice pest and a vector of the rice dwarf virus in Asia. Here, we produced a high-quality chromosome-level genome assembly of 753.23 Mb using PacBio (∼110×) and Hi-C data (∼94×). It contained 163 scaffolds and 950 contigs, with scaffold/contig N50 lengths of 85.36/2.57 Mb. Of the assembly, 731.19 Mb (97.07%) was anchored into eight pseudochromosomes. Genome completeness reached 97.0% according to the insect reference Benchmarking Universal Single-Copy Orthologs (BUSCO) gene set (n = 1,367). We masked 347.10 Mb (46.08%) of the genome as repetitive elements. We identified 962 noncoding RNAs and predicted 14,337 protein-coding genes. We also assigned GO term and KEGG pathway annotations for 10,049 and 9,251 genes, respectively. Significantly expanded gene families were primarily involved in immunity, cuticle, digestion, detoxification, and embryonic development. This study provides a crucial genomic resource for better understanding the biology and evolution of the family Cicadellidae.

Keywords: Chiasmini; comparative genomics; gene family evolution; genome annotation; green rice leafhopper; insect genomics

Year: 2021 PMID: 34677607 PMCID: PMC8598198 DOI: 10.1093/gbe/evab236

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Significance

Most species of leafhoppers (Hemiptera: Cicadellidae) are important pests of food crops. A high-quality genome could play a key role in understanding pest biology and evolution and in devising pest control strategies. Currently, only two cicadellid genomes are available, those of Homalodisca vitripennis and Empoasca onukii. In this study, the genome of the green rice leafhopper, Nephotettix cincticeps, was sequenced and analyzed. The chromosome-level genome of Nephotettix cincticeps provides a valuable resource for studying the phylogeny and biology of leafhoppers.

Introduction

Leafhoppers (Auchenorrhyncha: Cicadellidae) are currently the largest family of sap-sucking herbivores and comprise the most abundant known vectors of plant pathogens of any insect family in the Hemiptera (Dietrich 2013). Many species in Cicadellidae feed on economically significant plants and are thus considered important pests, mostly because of the injury that direct feeding causes to plants. Some species can also transmit plant pathogens (Weintraub and Beanland 2006). The green rice leafhopper (GRLH), Nephotettix cincticeps (Uhler) (Cicadellidae: Deltocephalinae), is a major pest of rice. It is widely distributed among rice-producing areas in Asia and is able to transmit rice viral pathogens, notably the rice dwarf virus, in a persistent-propagative manner during the sap-sucking process. The resulting infection can severely damage host rice plants, manifesting as stunted growth, white chlorotic spots on leaves, and delayed panicle development. Outbreaks of Nephotettix cincticeps have thus caused serious losses in rice yield and quality in China, Japan, Korea, the Philippines, and Nepal (Ruan et al. 1981; Zheng et al. 1997; Zhu et al. 2005; Honda et al. 2007; Wei and Li 2016; Jia et al. 2021). Accordingly, access to a high-quality GRLH genome could play a fundamental role in studies of pest biology, evolution, and control. To date, only two cicadellid genomes, those of Homalodisca vitripennis (Germar) (Cicadellidae: Cicadellinae) and Empoasca onukii Matsuda (Cicadellidae: Typhlocybinae), are available on NCBI GenBank (accessed July 10, 2021), with genome sizes of 1.45 Gb and 599.26 Mb, respectively. Furthermore, although a chromosome-level genome assembly of Empoasca onukii has been uploaded (GCA_018831715.1), its annotations are unavailable to the public. The scaffold N50 length of the Homalodisca vitripennis assembly (GCA_000696855.2), in turn, is below 50 kb, indicating low assembly contiguity.
In this study, we provided a de novo chromosome-level genome assembly of Nephotettix cincticeps using PacBio long reads and Hi-C sequencing. We annotated its protein-coding genes, as well as repetitive elements and noncoding RNAs (ncRNAs). Gene family evolution was analyzed across the main hemipteran groups. Furthermore, chromosomal syntenic correspondence was investigated between Nephotettix cincticeps and the brown planthopper Nilaparvata lugens (Stål) (Delphacidae: Delphacinae) to reveal their chromosomal evolution.

Results and Discussion

Genome Assembly

Sequencing platforms generated 93.62 Gb (143×) of Illumina short reads, 85.76 Gb (110×) of PacBio long reads, and 70.80 Gb (94×) of Hi-C data for the genome assembly, as well as 10.87 Gb of transcriptome data used for annotations. After quality control, 97.82 Gb of Illumina reads were retained for the genome survey and genome polishing. The genome survey indicated that the sequenced strain had an approximate genome size of 720 Mb (718.12‒724.33 Mb) and showed a very high heterozygosity (1.30‒1.38%). The 85.76 Gb (110×) of raw PacBio long reads had an N50 of 25.17 kb and a mean length of 14.39 kb. After self-correction, 56.61 Gb (75×) of corrected PacBio long reads were generated, with an N50 of 26.84 kb and a mean length of 22.00 kb. After initial Raven assembly, polishing, redundancy removal, Hi-C scaffolding, and contaminant detection, we generated a high-quality contiguous chromosome-level genome assembly (table 1). The final assembly had a length of 753.23 Mb, comprising 163 scaffolds and 950 contigs, with a scaffold/contig N50 length of 85.36/2.57 Mb, a GC content of 34.48%, and a gap ratio of 0.01%. Among them, 703 contigs (97.07%, 731.19 Mb) were anchored into eight pseudochromosomes (fig. 1). Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness reached 97.0% (1.8% complete and duplicated, 2.0% fragmented, 1.0% missing). High mapping rates for both Illumina (95.65%) and PacBio (97.87%) reads also confirmed the integrity of our assembly. A clean Hi-C contact heatmap (supplementary fig. S1, Supplementary Material online) and a low ratio of BUSCO duplicates (1.8%) indicated no obvious redundant regions within the assembly. When compared with the two available Cicadellidae genomes, the genome size of Nephotettix cincticeps (Deltocephalinae) was slightly larger than that of Empoasca onukii (Typhlocybinae, 599.26 Mb) but much smaller than that of Homalodisca vitripennis (Cicadellinae, 1.45 Gb). This indicated that cicadellid genome sizes can vary greatly among subfamilies of leafhoppers.
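The contiguity figures above rest on the N50 statistic. As a minimal sketch of how such a value is commonly computed (the function name and the toy lengths are illustrative assumptions, not the authors' script or data):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical toy contig lengths (kb), not taken from the assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: the two longest contigs already cover 150 of 300 kb
```

Applied to scaffold lengths instead of contig lengths, the same function gives the scaffold N50.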

Table 1

Genome Assembly and Annotation Statistics for Nephotettix cincticeps

Content	Nephotettix cincticeps
Genome assembly
Assembly size (Mb)	753.23
Number of scaffolds/contigs	163/950
Longest scaffold/contig (Mb)	154.23/17.07
N50 scaffold/contig length (Mb)	85.36/2.57
GC content (%)	34.48
Gaps (%)	0.01
BUSCO completeness (%)	97.0
Single copy (%)	95.2
Duplicated (%)	1.8
Fragmented (%)	2.0
Missing (%)	1.0
Protein-coding genes
Number	14,337
Mean gene length (bp)	17,386.5
Gene ratio (%)	33.42
Exons/introns/CDS per gene	9.5/8.2/9.2
Exon/intron/CDS ratio (%)	4.49/28.92/3.29
Mean exon/intron/CDS length (bp)	245.3/1,827.5/185.3
Genes with GO/KEGG pathway annotations	10,049/9,251
Repetitive elements	347.10 Mb (46.08%)
Number of ncRNAs	962

Fig. 1. Genome characteristics of Nephotettix cincticeps. (a) Circos tracks showing element distributions in 100-kb sliding windows from outer to inner: chromosome length, GC content, density of protein-coding genes, DNA transposons, SINE/LINE/LTR retrotransposons, rolling-circle, and simple repeats. (b) Gene family evolution and statistics of orthologs. Node values represent the number of expanded, contracted, and rapidly evolving families; “1:1:1” denotes the shared single-copy genes, “N:N:N” indicates multicopy genes shared by all species, “Others” are the unclassified orthologs, and “Unassigned” are orthologs that cannot be assigned to any orthogroups. (c) Significantly expanded families. Families lacking functional annotations are not shown. Functional enrichment of GO (d) and KEGG (e) categories for significantly expanded gene families.

Genome Annotation

Of the genome, 347.10 Mb (46.08%) was masked as repetitive elements. The dominant repeat categories were unclassified (21.10%), retrotransposons (10.70%), DNA transposons (7.63%), rolling-circles (5.26%), and simple repeats (1.21%) (fig. 1 and supplementary table S1, Supplementary Material online). Within the retrotransposons, a large portion were LINE (6.66%) and LTR (3.61%) elements, particularly the families L2 (3.21%) and Gypsy (2.49%). Notably, the retrotransposon L2 elements act as a source of functional micro-RNAs (miRNAs) and target sites (Piriyapongsa et al. 2007; Spengler et al. 2014). The families TcMar-Tc1 (3.44%), TcMar-Mariner (1.27%), and TcMar-Tigger (0.89%) accounted for a major part of the DNA transposons. Both TcMar-Tc1 and TcMar-Mariner, often called “parasitic” mobile elements (Capy et al. 2000), can contribute to the evolution of more complex genomes (Lynch and Conery 2003) such as more mobile elements, larger genome sizes (Liu et al. 2016), or horizontal transmission (Lohe et al. 1995; Lampe et al. 2003). These transposons may play important roles in the adaptations of insect taxa to a wide range of environmental conditions. Overall, we identified 962 ncRNAs: 60 ribosomal RNAs (rRNAs), 60 miRNAs, 130 small nuclear RNAs (snRNAs), three long noncoding RNAs (lncRNAs), 408 tRNAs (21 isotypes), five riboswitches, eight ribozymes, and 288 other ncRNAs (supplementary table S2, Supplementary Material online). The snRNAs were classified into 96 spliceosomal RNAs in six groups (U1, U2, U4, U5, U6, and U11), five minor spliceosomal RNAs in three groups (U4atac, U6atac, and U12), 25 C/D box small nucleolar RNAs (snoRNAs), three H/ACA box snoRNAs, and one other snoRNA. We predicted 14,337 protein-coding gene models from 16,817 sequences (isoforms included) with a mean length of 17,386.5 bp; the gene content accounted for 33.42% of the genome (table 1).
The average number of exons/introns/CDS per gene was 9.5/8.2/9.2 and their mean lengths were 245.3/1,827.5/185.3 bp. The BUSCO completeness assessment (in protein mode) for protein sequences reached 96.0%, indicating the high quality of the predicted protein-coding genes. Of the 14,337 genes, 13,183 (91.95%), 11,823 (82.46%), and 12,337 (86.05%) matched UniProtKB, InterProScan, and eggNOG records, respectively. After integrating the above annotation results, 10,049, 9,251, 10,386, 2,827, and 11,663 genes were assigned to gene ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, Reactome pathways, Enzyme Codes, and COG categories, respectively.
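The category counts and percentages in this section can be cross-checked with simple arithmetic. The snippet below is only a bookkeeping sketch: the numbers are those reported in the text, while the dictionary and variable names are illustrative assumptions.

```python
# ncRNA categories as reported in the text.
ncrna = {
    "rRNA": 60, "miRNA": 60, "snRNA": 130, "lncRNA": 3,
    "tRNA": 408, "riboswitch": 5, "ribozyme": 8, "other": 288,
}
# Breakdown of the 130 snRNAs as reported in the text.
snrna = {
    "spliceosomal": 96, "minor spliceosomal": 5,
    "C/D box snoRNA": 25, "H/ACA box snoRNA": 3, "other snoRNA": 1,
}
# Genes with a match in each resource, out of all predicted genes.
total_genes = 14337
database_hits = {"UniProtKB": 13183, "InterProScan": 11823, "eggNOG": 12337}

ncrna_total = sum(ncrna.values())
snrna_total = sum(snrna.values())
hit_percent = {
    db: round(100 * n / total_genes, 2) for db, n in database_hits.items()
}

print(ncrna_total, snrna_total)  # 962 130
print(hit_percent)
```

The sums reproduce the reported totals (962 ncRNAs, 130 snRNAs), and the rounded percentages match the values given in parentheses for the three resources.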

Gene Family Evolution

We selected 15 Pterygota (Insecta: Dicondylia) species, including eight hemipterans, for gene family inference and obtained 15,591 orthogroups (gene families) comprising 189,907 (90.47%) genes. Among them, 4,172 orthogroups and 20,685 genes were species-specific; another 4,035 orthogroups were present in all species, and 1,328 consisted of single-copy genes (fig. 1). For Nephotettix cincticeps, 13,125 (91.54%) genes were assigned to 8,856 orthogroups, of which 235 orthogroups and 1,204 genes were species-specific. The phylogenetic tree of all 15 species, reconstructed from 747,482 amino acid sites, was similar to that reported by Misof et al. (2014) (fig. 1). Thysanoptera was sister to Hemiptera, whereas Psocodea clustered with the holometabolous insects. The eight hemipteran species were separated into three clades (Sternorrhyncha + (Auchenorrhyncha + Heteroptera)). Hemiptera diverged in the middle Mississippian period (339.5–346.8 Ma). The two Auchenorrhyncha lineages (Fulgoroidea and Membracoidea) separated later, in the early Permian period (296.3–299.6 Ma). Gene family evolution analyses revealed that 1,386 and 3,624 gene families underwent expansions and contractions, respectively. Of them, 28 gene families were significantly expanded, and none were significantly contracted (fig. 1). These significantly expanded families were primarily associated with immunity, cuticle, digestion, detoxification, and embryonic development (fig. 1). Immunity-related families, such as peptidoglycan-recognition protein (14) and SVWC domain-containing protein (12), constitute main components of insect antimicrobial defenses (Royet and Dziarski 2007; Chen et al. 2011; Royet et al. 2011). Detoxification-related families included cytochrome P450 (20), carboxylesterase (16), and ecdysteroid kinase (5).
Interestingly, the fatty acyl-CoA reductase wat genes are essential for gas filling, wax ester synthesis and hydrophobic tracheal coating (Jaspers et al. 2014). Those may be linked to the production of brochosomes in leafhoppers, a process similar to the production of epidermal wax blooms in other sap-sucking hemipterans, which can protect them from entrapment by their exudates (Rakitov and Gorb 2013). Functional enrichment for those significantly expanded gene families also reflected strong representation in immunity and cuticle constituent, particularly in terms of those GO/KEGG items numbering higher than 20 (supplementary figs. S2 and S3, Supplementary Material online).
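The orthogroup categories used in this section (species-specific, present in all species, single-copy) can be illustrated with a small classifier over per-species gene counts. This is a sketch under stated assumptions: the function name, the category rules, and the toy counts are hypothetical, and the exact definitions behind the figure may differ.

```python
def classify_orthogroup(counts):
    """Classify one orthogroup from a dict mapping species -> gene count."""
    present = [species for species, n in counts.items() if n > 0]
    if len(present) == 1:
        return "species-specific"
    if len(present) == len(counts):
        # Present in every species: single-copy everywhere, or multicopy.
        if all(n == 1 for n in counts.values()):
            return "1:1:1"
        return "N:N:N"
    return "others"


# Hypothetical toy counts for three species, not data from the study.
toy = {
    "OG1": {"sp_a": 1, "sp_b": 1, "sp_c": 1},
    "OG2": {"sp_a": 3, "sp_b": 1, "sp_c": 2},
    "OG3": {"sp_a": 4, "sp_b": 0, "sp_c": 0},
    "OG4": {"sp_a": 1, "sp_b": 2, "sp_c": 0},
}
labels = {og: classify_orthogroup(c) for og, c in toy.items()}
print(labels)
```

Only the "1:1:1" groups would feed a single-copy phylogeny, which is why their number is reported separately from the orthogroups shared by all species.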

Materials and Methods

Sample Collection and Sequencing

The Nephotettix cincticeps specimens used for sequencing were collected from rice plants in the Chunhua Modern Agriculture Demonstration Garden, Jiangning district, Nanjing, China, on September 12, 2020. Adult females were washed with ddH2O before DNA and RNA sequencing: seven were used for Illumina and PacBio whole-genome sequencing, two for transcriptome sequencing, and one for Hi-C sequencing. Genomic DNA was extracted using the Qiagen Blood & Cell Culture DNA Mini Kit. Libraries with 350-bp and 40-kb insert sizes were constructed using the TruSeq DNA PCR-Free LT Library Preparation Kit and the SMRTbell DNA Template Prep Kit 2.0, and then sequenced on the Illumina NovaSeq 6000 and PacBio Sequel II platforms, respectively. RNA was extracted using the TRIzol Reagent and a library was constructed using the TruSeq RNA v2 Kit. DNA preparation for Hi-C sequencing (crosslinking, digestion with the restriction enzyme MboI, ligation, etc.), Hi-C library construction and sequencing, and all other library preparation and sequencing tasks were performed at Berry Genomics (Beijing, China). Quality control of the Illumina reads was performed using the BBTools suite v38.82 (Bushnell 2014): the script “clumpify.sh” removed duplicates, and “bbduk.sh” carried out quality trimming (>Q20), length filtering (>15 bp), and polymer trimming (>10 bp) and corrected any overlapping paired reads. To assess the genome characteristics, we surveyed the genome using GenomeScope v2.0 (Ranallo-Benavidez et al. 2020). The k-mer frequency distribution was estimated from the short reads with 21-mers, using the script “khist.sh” (BBTools). The maximum k-mer coverage cutoff was set as 10,000. Raw PacBio long reads longer than 10 kb were self-corrected using NextDenovo v2.3.1 (https://github.com/Nextomics/NextDenovo) and assembled in Raven v1.3.1 under its default parameters (Vaser and Šikić 2021).
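A k-mer frequency distribution of the kind passed to a genome survey can be sketched in a few lines. The code below is an illustrative stand-in, not the BBTools implementation: the function names are hypothetical, k-mers are counted in canonical form (the lexicographically smaller of a k-mer and its reverse complement), and treating the coverage cutoff as a cap on multiplicity is an assumption.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=21, max_cov=10000):
    """Map multiplicity -> number of distinct canonical k-mers."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[canonical(kmer)] += 1
    histogram = Counter()
    for multiplicity in counts.values():
        histogram[min(multiplicity, max_cov)] += 1
    return dict(sorted(histogram.items()))


# Hypothetical toy reads with k = 3, for illustration only.
toy_reads = ["ACGTAC", "ACGTAC", "GTACGT"]
print(kmer_histogram(toy_reads, k=3))  # {6: 2}
```

In a real survey the histogram's main peak approximates the sequencing depth of homozygous k-mers, and a secondary peak at about half that depth reflects heterozygous sites.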
The resulting assembly was then polished in two rounds with Illumina reads, using NextPolish v1.3.1 (Hu et al. 2020), after which redundant haplotypic duplication was removed by Purge_Dups v1.0.1 (Guan et al. 2020), using a minimum alignment score of 60 (-a 60). Minimap2 v2.17 (Li 2018) was used as the sequence aligner in the polishing and purging steps. For the Hi-C data set, alignment to the genome, removal of duplicates, and mining of Hi-C contacts were all conducted in Juicer v1.6.2 (Durand et al. 2016). To generate pseudochromosomes, we used the 3D-DNA v180922 pipeline (Dudchenko et al. 2017) to correct misassemblies and to anchor, order, and orient the contigs. Possible assembly errors produced in the first 3D-DNA round were manually corrected using the Assembly Tools module within Juicebox (Durand et al. 2016), and the pseudochromosomes were refined further in a second 3D-DNA round. We removed potential contaminants through BlastN-like MMseqs2 v11-e1a1c (Steinegger and Söding 2017) searches against the UniVec and NCBI nucleotide (nt) databases. Besides the assembly contiguity indicators, we also evaluated assembly quality in terms of genome completeness and read mapping rate. The BUSCO v5.0.0 pipeline (Manni et al. 2021) was implemented to assess genome completeness against the insect gene set (insecta_odb10, n = 1,367). Finally, all raw PacBio and Illumina reads were mapped to the genome using Minimap2 and the corresponding mapping rates were estimated by Samtools v1.9 (Danecek et al. 2021).
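Scaffolding joins contigs with runs of N, which is where a gap ratio such as the one reported in table 1 comes from. As a rough illustration of how GC content and gap percentage can be derived from scaffold sequences (the function name is hypothetical, and computing GC over non-gap bases only is an assumption about the convention used):

```python
def gc_and_gap_percent(sequences):
    """Return (GC % over A/C/G/T bases, N % over all bases)."""
    gc = acgt = gaps = total = 0
    for seq in sequences:
        seq = seq.upper()
        g_c = seq.count("G") + seq.count("C")
        a_t = seq.count("A") + seq.count("T")
        gc += g_c
        acgt += g_c + a_t
        gaps += seq.count("N")
        total += len(seq)
    gc_percent = 100 * gc / acgt if acgt else 0.0
    gap_percent = 100 * gaps / total if total else 0.0
    return gc_percent, gap_percent


# Hypothetical toy scaffolds, for illustration only.
gc_pct, gap_pct = gc_and_gap_percent(["ACGTNNAC", "GGCC"])
print(round(gc_pct, 2), round(gap_pct, 2))  # 70.0 16.67
```

In a real assembly the sequences would be read from the FASTA file one scaffold at a time rather than held in a list.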

Genome Annotations

Three essential genomic elements, namely protein-coding genes, repetitive elements, and ncRNAs, were annotated for the GRLH genome. We used RepeatMasker v4.1.0 (Smit et al. 2013–2015) to mask repetitive elements based on a custom repeat library, which included a de novo library and the Dfam 3.1 (Hubley et al. 2016) and RepBase-20181026 databases (Bao et al. 2015). The de novo repeat library was built using RepeatModeler v2.0.1 (Flynn et al. 2020), with the additional LTR discovery pipeline activated (-LTRStruct). Next, we identified the ncRNAs using Infernal v1.1.3 (Nawrocki and Eddy 2013) and tRNAscan-SE v2.0.7 (Chan and Lowe 2019). To reduce potential errors (e.g., pseudogenes), we retained only high-confidence tRNAs by using the tRNAscan-SE built-in script “EukHighConfidenceFilter.” Protein-coding gene models were predicted via the MAKER v3.01.03 pipeline (Holt and Yandell 2011), by integrating ab initio, transcriptome, and protein homology-based evidence. Ab initio gene models were predicted using BRAKER v2.1.5 (Brůna et al. 2021), which integrated two ab initio predictor tools, Augustus v3.3.4 (Stanke et al. 2004) and GeneMark-ES/ET/EP 4.59_lic (Brůna et al. 2020), thereby simultaneously incorporating transcriptome and protein homology evidence. As input for BRAKER, we aligned the transcriptome data to the genome with HISAT2 v2.2.0 (Kim et al. 2019) to generate BAM alignments, and retrieved arthropod reference proteins from the OrthoDB10 v1 database (Kriventseva et al. 2019). Transcripts fed into MAKER were assembled using the genome-guided assembler StringTie v2.1.4 (Kovaka et al. 2019). The protein sequences passed on to MAKER as evidence of protein homology were downloaded from NCBI for five species: Apis mellifera, Drosophila melanogaster, Thrips palmi, Nilaparvata lugens, and Rhopalosiphum maidis. Gene functions were assigned by searching the UniProtKB database, using Diamond v0.9.24 (Buchfink et al. 2015) in its more sensitive mode with an e-value cutoff of 1e-5.
Protein domains, GO terms, and pathways (KEGG, Reactome) were assigned by applying eggNOG-mapper v2.0.1 (Huerta-Cepas et al. 2017) against the eggNOG v5.0 database (Huerta-Cepas et al. 2019) and InterProScan 5.47–82.0 (Finn et al. 2017) against five databases: Pfam (El-Gebali et al. 2019), Superfamily (Wilson et al. 2009), Gene3D (Lewis et al. 2018), SMART (Letunic and Bork 2018), and CDD (Marchler-Bauer et al. 2017). For the gene family and evolutionary analyses, in addition to Nephotettix cincticeps, we downloaded high-quality nonredundant protein sequences from NCBI for 14 insect species: one Polyneoptera member (Zootermopsis nevadensis), four belonging to Endopterygota (Apis mellifera, Bombyx mori, Drosophila melanogaster, Tribolium castaneum), one Psocodea member (Pediculus humanus), one Thysanoptera member (Thrips palmi), and seven belonging to Hemiptera (Apolygus lucorum, Cimex lectularius, Halyomorpha halys, Nilaparvata lugens, Phenacoccus solenopsis, Rhopalosiphum maidis, Trialeurodes vaporariorum). Sequence orthology was inferred using OrthoFinder v2.3.8 (Emms and Kelly 2019), for which Diamond served as the sequence aligner. Single-copy orthologs inferred from OrthoFinder were used to reconstruct the insect phylogeny. For each ortholog, the protein sequences were aligned by MAFFT v7.450 (Katoh and Standley 2013) in the L-INS-I mode. Unreliable homologous regions within the alignment were stringently trimmed using BMGE v1.12 (Criscuolo and Gribaldo 2010) (-m BLOSUM90 -h 0.4). The phylogeny of the 15 species was then reconstructed using IQ-TREE v2.0.7 (Minh et al. 2020), with the following set of parameters: “-m MFP --mset LG --msub nuclear --rclusterf 10 -B 1000 --alrt 1000 --symtest-remove-bad --symtest-pval 0.10.” The ensuing tree was fed into MCMCTree within the PAML v4.9j package (Yang 2007) as the guide tree, to estimate the divergence times.
We selected six fossils from the PBDB database (https://www.paleobiodb.org/navigator/) for the stem node calibration: root (Pterygota <443.4 Ma), Holometabola (>382.7 Ma), Coleoptera (311.4‒323.2 Ma), Hemiptera (314.6‒323.2 Ma), Aphidomorpha (279.3‒298.9 Ma), and Cicadomorpha (298.9‒307.0 Ma). Gene family evolution (expansions and contractions) was inferred using CAFE v4.2.1 (Han et al. 2013), setting the significance level to 0.01 for the model with a single birth–death parameter “lambda.” We next performed GO and KEGG functional enrichment analyses for the significantly expanded families, using the R package “clusterProfiler” v3.14.3 (Yu et al. 2012). For the enrichment analyses, the significance level was set to 0.01 (P value) with a false discovery rate cutoff of 0.05 (q value).
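Over-representation analyses of this kind are usually based on a hypergeometric test followed by multiple-testing correction. The sketch below illustrates that general idea in plain Python; it is not the clusterProfiler code, the function names are hypothetical, and the Benjamini-Hochberg adjustment is used here as a stand-in for the package's own q-value calculation, which may differ.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(n, K)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical toy example: 8 of 20 expanded-family genes carry a term
# that annotates 100 of 10,000 background genes.
p = hypergeom_pvalue(k=8, n=20, K=100, N=10000)
print(p < 0.01)  # True
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

A term would then be reported only if its raw P value and its adjusted value both pass the chosen cutoffs.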

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

55 in total

1. Peptidoglycan recognition proteins: pleiotropic sensors and effectors of antimicrobial defences.

Authors: Julien Royet; Roman Dziarski
Journal: Nat Rev Microbiol Date: 2007-04 Impact factor: 60.633

2. PAML 4: phylogenetic analysis by maximum likelihood.

Authors: Ziheng Yang
Journal: Mol Biol Evol Date: 2007-05-04 Impact factor: 16.240

3. Brochosomes protect leafhoppers (Insecta, Hemiptera, Cicadellidae) from sticky exudates.

Authors: Roman Rakitov; Stanislav N Gorb
Journal: J R Soc Interface Date: 2013-07-31 Impact factor: 4.118

4. The fatty acyl-CoA reductase Waterproof mediates airway clearance in Drosophila.

Authors: Martin H J Jaspers; Ralf Pflanz; Dietmar Riedel; Steffen Kawelke; Ivo Feussner; Reinhard Schuh
Journal: Dev Biol Date: 2013-10-29 Impact factor: 3.582

5. Repbase Update, a database of repetitive elements in eukaryotic genomes.

Authors: Weidong Bao; Kenji K Kojima; Oleksiy Kohany
Journal: Mob DNA Date: 2015-06-02

6. tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences.

Authors: Patricia P Chan; Todd M Lowe
Journal: Methods Mol Biol Date: 2019

7. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.

Authors: Daehwan Kim; Joseph M Paggi; Chanhee Park; Christopher Bennett; Steven L Salzberg
Journal: Nat Biotechnol Date: 2019-08-02 Impact factor: 54.908

8. InterPro in 2017-beyond protein family and domain annotations.

Authors: Robert D Finn; Teresa K Attwood; Patricia C Babbitt; Alex Bateman; Peer Bork; Alan J Bridge; Hsin-Yu Chang; Zsuzsanna Dosztányi; Sara El-Gebali; Matthew Fraser; Julian Gough; David Haft; Gemma L Holliday; Hongzhan Huang; Xiaosong Huang; Ivica Letunic; Rodrigo Lopez; Shennan Lu; Aron Marchler-Bauer; Huaiyu Mi; Jaina Mistry; Darren A Natale; Marco Necci; Gift Nuka; Christine A Orengo; Youngmi Park; Sebastien Pesseat; Damiano Piovesan; Simon C Potter; Neil D Rawlings; Nicole Redaschi; Lorna Richardson; Catherine Rivoire; Amaia Sangrador-Vegas; Christian Sigrist; Ian Sillitoe; Ben Smithers; Silvano Squizzato; Granger Sutton; Narmada Thanki; Paul D Thomas; Silvio C E Tosatto; Cathy H Wu; Ioannis Xenarios; Lai-Su Yeh; Siew-Yit Young; Alex L Mitchell
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971

9. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database.

Authors: Tomáš Brůna; Katharina J Hoff; Alexandre Lomsadze; Mario Stanke; Mark Borodovsky
Journal: NAR Genom Bioinform Date: 2021-01-06

10. The Dfam database of repetitive DNA families.

Authors: Robert Hubley; Robert D Finn; Jody Clements; Sean R Eddy; Thomas A Jones; Weidong Bao; Arian F A Smit; Travis J Wheeler
Journal: Nucleic Acids Res Date: 2015-11-26 Impact factor: 16.971