Literature DB >> 32546503

Taro Genome Assembly and Linkage Map Reveal QTLs for Resistance to Taro Leaf Blight.

M Renee Bellinger¹, Roshan Paudel², Steven Starnes³, Lukas Kambic³, Michael B Kantar², Thomas Wolfgruber², Kurt Lamour⁴, Scott Geib⁵, Sheina Sim⁵, Susan C Miyasaka², Martin Helmkampf⁶, Michael Shintaku⁷.

Abstract

Taro (Colocasia esculenta) is a food staple widely cultivated in the humid tropics of Asia, Africa, Pacific and the Caribbean. One of the greatest threats to taro production is Taro Leaf Blight caused by the oomycete pathogen Phytophthora colocasiae Here we describe a de novo taro genome assembly and use it to analyze sequence data from a Taro Leaf Blight resistant mapping population. The genome was assembled from linked-read sequences (10x Genomics; ∼60x coverage) and gap-filled and scaffolded with contigs assembled from Oxford Nanopore Technology long-reads and linkage map results. The haploid assembly was 2.45 Gb total, with a maximum contig length of 38 Mb and scaffold N50 of 317,420 bp. A comparison of family-level (Araceae) genome features reveals the repeat content of taro to be 82%, >3.5x greater than in great duckweed (Spirodela polyrhiza), 23%. Both genomes recovered a similar percent of Benchmarking Universal Single-copy Orthologs, 80% and 84%, based on a 3,236 gene database for monocot plants. A greater number of nucleotide-binding leucine-rich repeat disease resistance genes were present in genomes of taro than the duckweed, ∼391 vs. ∼70 (∼182 and ∼46 complete). The mapping population data revealed 16 major linkage groups with 520 markers, and 10 quantitative trait loci (QTL) significantly associated with Taro Leaf Blight disease resistance. The genome sequence of taro enhances our understanding of resistance to TLB, and provides markers that may accelerate breeding programs. This genome project may provide a template for developing genomic resources in other understudied plant species.

Entities: CellLine Chemical Disease Gene Species

Keywords: Colocasia esculenta; Taro Leaf Blight; disease resistance genes; linkage mapping; linked-read genome assembly

Mesh：

Year: 2020 PMID： 32546503 PMCID： PMC7407455 DOI： 10.1534/g3.120.401367

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Taro (Colocasia esculenta (L.) Schott) is a food staple widely cultivated in the humid tropics of Asia, Africa, Pacific and the Caribbean. The starchy corm (underground stem) is a good source of calories, and its leaves and petioles are used as a vegetable rich in dietary fiber and vitamin C (Greenwell 1947; Rao ; Kaushal ). As the 5th most produced root crop in the world (Food and Agriculture Organization of the United Nations 2017), taro is a vital component of many subsistence farming communities, and production issues are a serious food security concern. Taro Leaf Blight (TLB), caused by the fungus-like oomycete Phytophthora colocasiae, can reduce leaf yield by 95%, which in turn can reduce corm yield by up to ∼50% in susceptible cultivars (Nelson , Miyasaka , Singh ). When TLB occurs as an epidemic, leaves can be destroyed in 10 days or less, leading to a decrease in photosynthesis with a subsequent reduction of corm yield (Nelson ). Developing TLB-resistant varieties will improve yield, can reduce pesticide applications, and will help to ensure food security. This effort is aimed at developing a genome reference and understanding the genetic basis of resistance to TLB in taro. The geographic center of taro domestication is not clear from archeological records, but molecular analyses indicate the presence of two separate gene pools that originate from the Indo-Malayan region in Asia and Melanesia in the Pacific (Coates , Lebot and Aradhya 1991, Kreike , Mace , Miyasaka ). With a basic chromosome number of n = 14, taro occurs in diploid (2n = 28) or triploid (3n = 42) forms (Coates , Wang ). Both of these forms are present in Asia, Africa, and South America, whereas triploids are generally absent from Oceania (Coates , Ochiai , Matsuda and Nawata 2002, Lebot , Chaïr ). The diploid taro genome shows a wide range in estimated size, from ∼2 to ∼4 Giga base pairs [C-value mean (Bennett and Leitch 1995, Wang )], and within Micronesia and Polynesia the genetic diversity of taro cultivars is relatively low (Lebot and Aradhya 1991, Lebot ). Indeed, comparisons between Papua New Guinea and Pacific Island cultivars revealed several identical genotypes between these two regions, consistent with the latter having been introduced from the former (Mace ). Although genetic resistance to TLB has been identified in taro populations from Palau, Indonesia, and Papua New Guinea, all Hawaiian landraces are susceptible to TLB and many have been lost due to this disease (Miyasaka ). Once established, TLB also forces costly changes to cultivation practices. Given the special cultural significance of taro in many Pacific Island societies, particularly in Hawaii (Abbott 1992, Cho , Rao ), preservation of lineages is of substantial cultural and economic interest. Taro is typically vegetatively propagated by removing and planting suckers or vegetative propagules from the tops, but hand-pollination can produce large numbers of highly heterozygous hybrids that can be selected for disease resistance and desirable agronomic traits. A collection of taro plants maintained by the University of Hawaii and used in breeding programs includes all remaining Hawaiian landraces and several cultivars from Nepal, Indonesia, Thailand, Melanesia, and Micronesia (Cho ). Over the past several decades, crosses between Hawaiian and non-Hawaiian cultivars have been selected for desirable characteristics and made available for commercial planting or further breeding (Shintaku ). One such cross (‘230’ x ‘255’, see methods) produced progeny that exhibited TLB resistance, but a previous linkage map developed from 240 high quality SNPs identified using a reference-free genotyping by sequencing [GBS (Elshire )] approach failed to demonstrate associations between QTL and TLB resistance (Shintaku ). Thus, a high quality reference genome for improved single nucleotide polymorphism (SNP) calling and linkage mapping can serve to identify loci associated with TLB resistance and other desired agronomic qualities, and aid marker-assisted breeding programs. Rapid advancements in sequencing technology and computer algorithms bring de novo assembly of high quality genomes within reach for non-model taxa, even for those with large, repetitive genomes (Goodwin , Paajanen ). Genome assembly with short-read shotgun sequences carries the advantage of high sequencing depth at low per-base cost relative to long-read sequencing platforms; however, short reads are unable to traverse genomic intervals that span repetitive elements, which causes assembly fragmentation. Incorporating data from mate-pair libraries can help to reduce this fragmentation (Wetzel ), but that strategy is less effective when applied to complex genomes with long [multi-kilobase (kb)] repeat elements. Alternatively, linked-reads (barcoded short-read libraries) retain long-range information while maintaining the advantages of short reads, and can produce a complete representation of the genome with high-levels of accuracy (Zheng , Weisenfeld ). This genome sequencing strategy can be implemented with low DNA input, e.g., ∼1 ng (Marks ), and is cost-effective. Paajanen performed a direct comparison of de novo genome assemblies for the potato species Solanum verrucosum (722 Mb genome size), and showed that a single linked-read library sequenced in a single Illumina lane (92x coverage) produced an assembly comparable in quality to an assembly reconstructed from 65 Pacific Biosystems SMRT cells (50x coverage), at a fraction of the cost (∼$4k vs. $25k, USA dollars). Other examples of high-quality linked-read genome assemblies for non-model species include pepper (Capsicum annuum) and the African wild dog (Lycaon pictus), and this technology been leveraged to improve genome completeness for bottlenose dolphin (Tursiops truncatus) and Atlantic herring (Clupea harengus) (Armstrong ; Hulse-Kemp , Kongsstovu ; Martinez-Viaud ). Coupled with a need to understand the genetic basis of TLB resistance, a taro genome reference and analysis of genotypic data from a mapping population that shows resistance to TLB infection will accelerate the understanding of the genomic underpinnings of that trait. Accordingly, the objectives of this study are to: 1) generate a de novo taro genome reference assembly; 2) use GBS data from a TLB-resistant mapping population to construct a linkage map and test for associations between QTL and resistance to TLB; and (3) characterize basic genome architecture of Araceae using available high-quality genome resource data.

Materials And Methods

Genome material

The Hawaiian landrace ‘Moi’ was selected for genome sequencing because it is a grandparent to ‘Parent 230’ used in the ‘1025’ TLB-resistant mapping population (see below), and because it is highly regarded for its agronomic qualities. The DNA for genome sequencing was isolated from newly emerging taro leaf tissue using a modified CTAB protocol (Healey ). Briefly, ∼1 to 1.5 g of tissue was ground in 10 ml of 100 mM Tris pH 7.5, 1.5 M NaCl, 2% CTAB, and 0.3% β-mercaptoethanol, incubated for one hour at 65°, then centrifuged for 5 min at 7600 RPM to pellet cellular debris. The DNA was extracted using a standard phenol-choloroform extraction protocol with three cleaning steps, chloroform, phenol:chloroform:isoamyl alcohol (25:24:1), then chloroform, with 0.5 volume added to the supernatant in each step. After the first chloroform cleaning step the aqueous phase was treated with 5 uL RNAse A (10 mg/mL) and incubated at 42° for 15 min. All mixing was performed by gently inverting the tube for several minutes, and the aqueous phase separated from phenol/chloroform by centrifugation at 7600 RPM. The DNA was precipitated from the aqueous phase by adding 1/5 volume 5M NaCl and 1.5 volume ethanol, spooled on a heat sealed glass Pasteur pipet, and transferred to a vial containing 70% ethanol for a 10 min soak. After repeating the ethanol soak step, the spooled DNA was briefly rinsed in 100% ethanol and then allowed to air dry for several minutes. Finally, the DNA was dissolved overnight in 350 to 400 uL of TE buffer (0.2mM EDTA). The DNA was quantified with a UV spectrophotometer (Eppendorf Biophotometer), which indicated A260/A280 ratios of 1.8 to 1.9 and A260/A230 ratios between 1.9 and 2.2. The DNA concentration was determined with a Qubit fluorometric spectrophotometer and subjected to a second purification using Zymo Research Genomic Clean and Concentrator Columns using the manufacturer’s instructions.

Sequencing library preparation

The linked-read library was prepared by the HudsonAlpha Institute using the 10x Genomics (Pleasanton, California) microfluidic gel bead partitioning Chromium system (Zheng , Weisenfeld ), and was sequenced on two lanes of an Illumina HiSeq X short-read platform. The DNA for this library was prepared by performing a high-pass size-selection to collect DNA fragments >40 kb using a 0.75% agarose gel cassette and external U1 marker for the Blue Pippin system (Sage Science, Beverly, MA, USA) following manufacturer’s protocol, followed by concentration and cleaning with Ampure beads. The long-read sequences were generated on an Oxford Nanopore Technology (“nanopore”) MinION sequencing instrument using R9.4.1 chemistry (Goodwin ), with sequencing conducted at the University of Hawaii at Hilo. A total of 14 FLO-MIN106R9 cells were run using standard kits and reagents and following manufacturer’s protocols. Cells routinely produced 3-5 gigabases (Gb) data, but a small portion of cells stopped prematurely [after ∼100 megabases (Mb) of data]. These failed cells were replaced by Oxford Nanopore Technology at no cost. The DNA for the nanopore long-read sequencing was sheared using Covaris G-Tubes, with most of the DNA running above 8 kb on a 0.7% agarose gel, consistent with the target fragment size.

Genome assembly

The taro genome was assembled from linked-reads and then gap-filled and scaffolded with contiguous segments (contigs) assembled from nanopore long-reads and linkage map results. The linked-read genome was assembled with Supernova software v. 2.1.1 (Weisenfeld ) by the HudsonAlpha Institute. The nanopore long-reads were assembled with Canu v. 1.7.1 and Albacore v. 2.0.1 (Koren ), but those contigs provided an incomplete representation of the genome, as detailed in results, so they were used only for gap filling and scaffolding of the linked-read assembly as implemented in quickmerge (Chakraborty ). That program replaces gaps present in the recipient assembly (“query”, linked-read) with sequences from the donor assembly (“reference”, nanopore), discarding donor contigs without match to the query. The resultant gap-filled, linked-read assembly is hereafter referred to as the “merged” genome. For contig alignment, quickmerge implements MUMmer (Delcher , Kurtz ) programs nucmer and delta-filter, with a threshold cut-off of 95% sequence similarity. To quantify similarity between assembled contigs of linked-read and nanopore assemblies those were pairwise aligned and assessed with MUMmer programs nucmer and dnadiff (Delcher , Kurtz ), with settings delta-filters 1-to-1 alignment and minimum alignment length of 10 kb. Excepting the linked-read assembly, all analyses were performed on the University of Hawaii High Performance Computing cluster (server size: 20 core, 120 gigabyte RAM). Programs were run using default parameters unless otherwise noted above. The haploid genome (1C) size of ‘Moi’ was estimated to be 2.39 Gb, based on the read k-mer profile analysis conducted in Supernova (Weisenfeld ). The data inputs for the linked-read genome included 580 million linked-reads of mean length 150 bp (∼174 Gb), of which 89% were in proper read pairs. The nanopore MinION cells produced a total of 12.3 million long-reads, for a total of 63.7 Gb. For descriptive purposes, sequence data from one to three MinION cells were combined into a total of 11 groups, with summary statistics provided in Supplemental Table 1. To reduce computational demand during long-read contig assembly we discarded short long-reads (< 3.5 kb length, n = 6.5 million) and outlier long-reads (> 150,00 kb length, n = 43), thus retaining 5.8 million long-reads totaling ∼54 Gb. Accounting for the total numbers of sequenced nucleotide bases, read coverages for the linked- and long-read assembles were ∼60x and ∼23x, respectively.

Genome assembly characterization, quality metrics, and filtering

The linked-read and merged assemblies were quality assessed using descriptive measures, e.g., numbers of contigs, total number assembled bases, and completeness, implementing analyses tools QUAST (Gurevich ) and the Benchmarking Universal Single-Copy Orthologs pipeline [BUSCO v4.0.5 (Seppey )]. The latter accessed programs Augustus v3.2.3, Blast+ v2.2.31, and HMMER v3.2., and utilized the OrthoDB plant database liliopsida_odb10 for class monocot containing 3,236 single-copy genes from 10 species (Kriventseva ). For comparative purposes BUSCOs were also analyzed for genomes of seven other monocots spanning five orders. These included: Asparagales, asparagus (Asparagus officinalis, GCF_001876935) and orchid (Phalaenopsis equestris, GCF_001263595.1); Arecales, coconut (Cocos nucifera, GCA_008124465.1); Dioscoreales, trifoliate yam (Dioscorea dumetorum, GCA_902712375.1); Poales, sorghum (Sorghum bicolor, GCF_000003195.3); and Alismatales, eelgrass (Zostera marina, GCA_001185155.1) and great duckweed (Spirodela polyrhiza, GCA_001981405.1). The GCF’s and GCA’s refer to National Center for Biotechnology Information (NCBI) genome accession identifiers. Taro and duckweed are members of the Araceae family, but duckweed has a much smaller genome size (158 Mb) and a high-quality, chromosome-level genome assembly (20 chromosomes, N50 7.6 Mb) has been produced (Michael ). A second BUSCO analysis was performed for taro only using the eukaryota_odb10 eukaryote database that contains 255 genes from 70 species. To prepare for scaffolding with the linkage map, the merged assembly was filtered for assembly artifacts by removing: 1) stretches of Ns at the beginning and ends of scaffolds, 2) scaffolds consisting entirely of Ns, 3) duplicate scaffolds identified by the MUMmer (Delcher , Kurtz ) package dnadiff or flagged by NCBI’s genome filtering process, and 4) contigs/scaffolds containing potential contaminants. The latter were defined as producing blastn hits against 15,180 records of complete bacterial genomes available through NCBI’s FTP site (accessed October 30, 2019), meeting the following cut-offs: 90% identity, length >100 nucleotides, and with match over >10% total contig length. Finally, the merged, filtered assembly was scaffolded using the linkage map (see below) to produce a pseudochromosome-level assembly.

Genome architecture

To gain insights into genome architecture of Araceae, the genomes of taro and great duckweed were assessed for repetitive content and identification of gene domains characteristic of nucleotide-binding leucine-rich repeat (NLR) plant disease resistance genes. These genes also occur in animals, and are crucial regulators of inflammatory and innate immune responses. The repetitive content was characterized by generating de novo repeat libraries with RepeatModeler v1.0.11 (Smit and Hubley 2008-2015), which finds interspersed repeats by integrating RepeatScout v1.0.05 (Price ) and RECON v. 1.08 (Bao and Eddy 2002). Based on each assembly’s custom repeat library, interspersed repeats, simple repeats, and low complexity regions were quantified using the quick search option of RepeatMasker version open-4-0-9-p2 (Smit et al. 2013-2015) run with RMBlastN 2.9.0+-p1 and the RepeatMasker combined database Dfam 3 [(Hubley ) download date 8-25-2019]. A second assessment of repeat content was constructed from all plant repeat databases available in RepeatMasker, including species databases monocotyledons (liliopsida), arabidopsis, rice, wheat, and maize, but those analyses were discarded after proving to be less sensitive than our custom repeat libraries. Relevant to TLB-resistance, the complement of NLR sequences was quantified with NLR-Annotator v0.7-beta (Steuernagel ) using motifs defined by Jupe et al. (Jupe ). NLR-Annotator identifies domains characteristic of the NLR gene family, and returns sequences from the start of the first domain, but not the start to end of the entire protein sequence. The advantage to this approach is that it is unbiased by protein prediction models. The great duckweed genome was included in the NLR analysis for comparative purposes.

TLB-resistant crosses and phenotypic analysis

The TLB mapping population ‘1025’ was created by crossing two breeding lines ‘230’ and ‘255’ (see also Shintaku ). Parent ‘230’ is a cross between Hawaiian landrace ‘Moi’ × TLB-resistant Palauan landrace ‘Dirratengadik’, and Parent ‘255’ is a hybrid between ‘Sawahn Kurasae’ (TLB-resistant landrace from Indonesia) and ‘RMP-08’. ‘RMP-08’ is itself a hybrid between the Hawaiian ‘Red Moi’ and the Papua New Guinea ‘PH15’ cultivar. More than 2500 plants representing parents and progeny from 27 crosses were screened using a modified detached leaf disk assay (Brooks 2008) to define this particular TLB-resistant mapping population (Shintaku ), crosses were repeated, and TLB resistance in several cultivars was confirmed by field tests. The ‘1025’ mapping population was challenged with P. colocasiae isolates ‘S1’ and ‘S3′ collected from taro fields at Pepeekeo and Panaewa, Hawaii, respectively. Pure P. colocasiae cultures were established on a base of 10% V8 media plated into a petri dish (50 ml V8, 0.624 g CaCO3, 450 ml water, 10g agar) overlaid with 1.5% water agar plugs cut into a circle ∼0.5 cm in diameter, that was in turn overlaid with agar plugs from colonized plates. After establishment, cultures were grown by incubating on 10% V8 media at 27° for 5 to 10 days to obtain zoospores. The zoospores were collected from each culture plate by flooding with 10 ml of sterile distilled water followed by incubation at 4° for 30 min. The plate was then left for 20 to 25 min to adjust to room temperature, and then swirled gently and the water pipetted off and placed in a tube to count zoospores on a hemocytometer. The spore suspension was diluted to approximately 30 zoospores per microliter. For disease challenges, 24 mm diameter leaf blade discs were cut from the ‘1025’ mapping population using a cork borer. The second leaf blade from each plant (where leaf blade one is the first fully matured leaf blade) was used in the assay. Leaf discs were placed on 0.9% agar with the adaxial surface exposed. Four leaf blade discs from the same genotype were assayed in separate Petri dishes. In each assay, taro variety ‘Bun Long’ was used as a susceptible check cultivar. Leaf discs were inoculated with 10 µl water containing ∼300 P. colocasiae zoospores. The lesion diameter of the four discs was standardized to that of the susceptible cultivar. As an example, a relative lesion size of 0.5 indicates that the individual has half of the lesion diameter of the susceptible check cultivar.

Genotyping materials and SNP calling

The TLB-resistant mapping population consisting of Parents ‘230’ and ‘255’ and 92 of their progeny were sequenced using GBS (Shintaku ) (NCBI SRX2754312), calling SNPs with the merged genome as a reference. To improve confidence in SNP calls, we included an additional taro GBS dataset comprised of 95 taro accessions that originated from Hawaii, South Pacific, Palau, and mainland Asia (Helmkampf ; NCBI SRX2754311). This raised the total number of GBS samples to 189. For GBS, genomic DNA was isolated from freeze-dried or fresh leaf tissue using the Macherey-Nagel Nucleospin Plant II kit or Qiagen Plant DNeasy Mini kit according to each manufacturer’s protocol. The GBS libraries were digested with the restriction enzyme PstI and prepared for sequencing at the Cornell University Genomic Diversity Facility (Ithaca, NY). Each set of samples was sequenced on a single flow cell lane of an Illumina HiSeq 2500 using single-end protocols to produce reads of length 100 bp. The GBS data were demultiplexed and barcodes trimmed using software GBSX v1.1.4 (Herten ). Across all 189 GBS samples approximately 459 million usable reads were obtained. The Illumina sequences were mapped to the reference using Bowtie2 v2.2.4 (Langmead and Salzberg 2012) with setting –very-sensitive-local. Variants were called from the output of SAMtools v1.4.1 mpileup using the bcftools v1.2 multiallelic calling model (Li ; Danecek ), considering only uniquely mapped reads with threshold map and base quality phred quality scores of 20. After removing non-target samples, the parent and progeny variant call format (VCF) files were preliminarily filtered by discarding SNPs within 5 base pairs of an insertion/deletion (INDEL) site, removing INDELS, and removing samples with high levels of missing data (n = 8). Next, VCF files were iteratively filtered using VCFtools v0.1.14 (Danecek ) to retain di-allelic SNPs with a minimum sequencing depth of at least eight reads per genotype, a per-locus data density of minimum 80%, and setting the minor allele frequency (maf) threshold to 0.012. Loci identified as invariant in the parents were removed, with allowance for missing data in one but not both parents. In preparation for linkage mapping and QTL analyses, the parental genotypes were phased and progeny genotypes imputed in beagle v5.0 (download date 16May19.351; Browning and Browning 2009).

Linkage map and QTL analysis

A linkage map was constructed using GBS data from 84 individuals of the ‘1025’ mapping population that passed quality controls. The variant calls (SNPs) were filtered for segregation distortion, redundancy, homozygous condition in parents (genetically plausible because the ‘230’ and ‘255’ parental stocks consisted of multiple lines of clonal propagules), and unlinked loci. We aimed for 14 linkage groups, the number of haploid chromosomes in taro, and define “major linkage group” as having >8 markers per group. The grouping of SNP markers was performed using the OneMap package (Margarido ) in Rstudio v1.1.423 (RStudio Team 2015) with R v3.5.0 (R Core Team 2019). Markers were anchored to their respective groups based on the OneMap output, and linkage maps were constructed using the CDM functionality of software package Genetic Analysis of Clonal F1 and Double cross populations (GACD) (Zhang ) with setting linkage phases originally unknown. The Kosambi mapping function was used to convert the recombination frequency to map distance in centimorgans (cM) (Kosambi 2016). Software suggested logarithm of the odds (LOD) scores were used to construct and assign markers to different linkage groups (LGs), with a LOD score of 5.97 used for the final linkage map. The GACD software was also used to identify regions in the taro genome that correlated with the TLB resistance in the mapping population, with the mapping algorithm Inclusive Composite Interval Mapping of Additive and Dominant QTL (ICIM - ADD). ICIM is suitable for mapping a small population size as it controls for the bias due to the Beavis effect—i.e., the overestimation of explained phenotypic variance in small populations (Xu 2003). The LOD threshold was calculated by 1000 permutation tests at α = 0.05. The QTL mode of action (i.e., selection model) was calculated using the method of Muchero (Muchero ),where ‘a’ and ‘d’ are the additive and dominance effects respectively. The mu(ac) and mu(bd) are the phenotypic means for the heterozygous loci having alleles from the same species, and mu(ad) and mu(bc) are the phenotypic means for the heterozygous loci carrying alleles from both species. The ratio of d/a is used to assess the QTL mode of action: a d/a ratio of <1 indicates underdominance, ratio between 0 and 1 indicates partial dominance, and a ratio of >1 indicates over-dominance (Muchero ). Using the final linkage map as a guide, contigs were anchored and concatenated into pseudochromosomes, orienting each contig to the linkage map position starting from marker position 0 and maintaining directionality of contigs when >1 markers per linkage group corresponded to >1 SNP loci per contig. The correspondence between markers assigned to linkage groups and SNP positions on the linkage map was further applied to assign confidence rankings using a 4 point scale. A high-confidence score of 1 was assigned when markers in a linkage group were consistently ordered with SNPS on single contigs and separated by a reasonable distance, defined as exceeding that of the median sequence length used to generate the assembly. A score of 2 was assigned to markers that corresponded to a single SNP on a single contig, and a score of 3 was assigned to markers that corresponded to clustered SNPS (defined as < median sequence length of the assembly), or were ordered discordant from the linkage group map distance. Last, loci from single contigs assigning to >1 major linkage groups were scored as 4. These contigs were included in each LG pseudochromosome, therefore a small portion of the linkage groups contain redundant sequences. Only markers of score of 1 were used to assign contig directionality during contig concatenation into pseudochromosomes.

Data availability

The pseudochromosome-level reference genome assembly has been deposited at DDBJ/ENA/GenBank under the accession WUBK00000000, BioProject PRJNA567267. The version described in this paper is version WUBK01000000. File S1 contains detailed descriptions of supplemental results (Tables, Figures, and Appendixes), and documents archived in figshare (linked-read and merged draft genomes, nanopore contigs, the vcf file used for linkage mapping and QTL analysis, the R script used to construct linkage groups, phenotypic data for the mapping population, and the de novo repeat library from RepeatModeler). The raw Illumina sequences and corresponding 10x Genomics barcode files are available through the NCBI Sequence Read Archive (SRA) SRP223785 and SRS5458621, and the Oxford Nanopore Technology long-read sequencing files (prior to self-correction) are available through the BioProject ID PRJNA567267. The previously published GBS datasets analyzed in this study were obtained from the NCBI SRA database under BioProject PRJNA381383 (Accessions: SRX2754311-12). The SNP calling and filtering pipeline is available online (Bellinger 2019). Supplemental material available at figshare: https://doi.org/10.25387/g3.12440039

Results

Genome assembly characterization and quality metrics

The pseudochromosome-level taro genome is estimated to be mostly complete, with the total number of assembled bases, ∼2.45 Gb, slightly larger than the k-mer estimated genome size of 2.39 Gb. The assembly’s largest scaffold was 38 Mb, and after excluding a large number of short contigs (< 5 kb), consists of 55,692 scaffolds representing 2.24 Gb (Table 1). The quickmerge step did little to improve genome assembly quality, decreasing fragmentation by only ∼2k contigs and increasing assembly size by only 119.8 Mb. That low amount of gap-filling can be explained by the majority of nanopore contigs failing to align to the linked-read assembly. While nanopore long-read sequencing resulted in 1.44 giga assembled bases (72,696 contigs), only 602 Mb (< 42%) of those assembled bases aligned to the linked-read genome assembly at the 95% identity threshold. In Table 1, the slight difference in number of assembled bases between the merged and pseudochromosome-level assemblies is because numbers are reported for the unfiltered merged genome, to maintain direct comparability with the (unfiltered) linked-read genome.

Table 1

Descriptive characteristics for three draft taro genome assemblies. The pseudochromosome-level taro assembly (“Ps_chr”) was composed from a linked-read assembly (“LR”) that was gap-filled using contigs assembled from nanopore MinION long-reads (merge step, “Merged”), filtered for assembly artifacts, and then concatenated into pseudochromosomes using a linkage map. Kilobase = kb

			# Scaffolds (% of assembled nucleotides)
	Assembled bases	Largest scaffold	> 5 kb	> 10 kb	> 50 kb	Total	N50	GC content (%)
Ps_chr	2,448,853,100	38,380,923	55,692 (92%)	25,535 (83%)	5,180 (68%)	140,400 (100%)	317,420	41.98
Merged	2,451,639,670	6,251,291	56,504 (92%)	25,535 (83%)	5,527 (67%)	142,854 (100%)	270,514	42.04
LR	2,331,885,920	3,965,393	53,768 (91%)	22,039 (81%)	4,782 (68%)	144,852 (100%)	336,981	42.13

Based on the benchmarking gene set for monocots, completeness of the taro genome assembly is close to the high-quality, chromosome-level duckweed genome assembly, with overall recoveries of 80% and 84% BUSCOs (Figure 1). The proportion of recovered complete and single copy BUSCOs was nearly identical for the two species, 73% and 74%. Turning focus to the percent of complete and duplicated BUSCOs, the value of 3.4% for taro genome was only slightly higher than the value of 1.5% for the duckweed genome. As an outlier, the genome assembly of yam, sequenced using nanopore technology, showed a disproportionate number of duplicated benchmarking genes. Compared to the phylogenetically diverse group of monocots included in our analysis, taro and duckweed exhibited a higher fraction of missing or fragmented genes, with 43% of the missing or fragmented genes of taro also fragmented or missing in duckweed. Given the high-quality of the great duckweed genome, this finding suggests that members of Araceae differ from other monocots by types of gene losses. The BUSCO recovery rates for the non-Araceae monocots were similar regardless of whether their genomes were represented in the liliopsida_obd10 reference gene set (asparagus, eelgrass, orchid, and sorghum). Although results based on the eukaryota_odb10 benchmarking gene set shows taro to have a greater BUSCO recovery rate, encompassing 88.2% genes complete (79.2% single copy; 9.0% duplicated), 5.1% fragmented, and 6.7% missing, that assessment is less comprehensive because that database contains substantially fewer genes than the benchmarking gene database for monocots.

Figure 1

Genome assembly completeness assessed by the recovery of Benchmarking Universal Single-Copy Orthologs (BUSCOs). The percent of BUSCO genes found in each genome is listed for categories single (S) or multiple copies (D), as well as fragmented (F) and missing (M). Analyses are based on the BUSCO liliopsida_odb10 dataset representing class monocot (n = 3,236 genes). See text for scientific names and NCBI genome accession identifiers. De novo repeat modeling revealed that the genome of taro is highly repetitive, with RepeatMasker screening estimating that 80% is comprised of mobile genetic elements (Table 2). Comparing the genome of taro to that of great duckweed indicated that the latter had substantially lower mobile genetic element content, 20%. The proportion of long terminal repeat (LTR) elements in taro and duckweed genomes, ∼36% and 9%, was proportionally similar to their unclassified repeat content, ∼38% and 11%. The genomic proportion of long interspersed elements (LINEs) was low across both taxa (<=1.6%), with only LINE1 types detected in the custom repeat libraries. Short interspersed element (SINEs) were detected only in the duckweed genome. The simple repeat and low complexity contents were also low in both taxa, <= 2.3%. Cumulatively, the total repeat content of taro and duckweed genomes was ∼82% and 23%, respectively.

Table 2

Repetitive content of taro (Colocasia esculenta) and great duckweed (Spirodela polyrhiza) genome assembles. Total repeat content was quantified using de novo repeat libraries constructed with RepeatModeler and screened with RepeatMasker. The percent (%) of sequence is relative to each individual assembly’s total length excluding runs of NNN”s between scaffolded contigs. Short and long interspersed elements are denoted as SINEs and LINEs

Element	Length Occupied (bp)		% Genome
	taro	duckweed	taro	duckweed
SINEs	0	2,139	0.0	<0.01
LINEs	36,753,946	366,596	1.6	0.3
LINE1	33,001,257	366,596	1.4	0.3
LTR elements	834,904,293	12,263,206	36.3	9.1
DNA elements	91,787,447	221,956	4.0	0.2
Unclassified	874,609,262	14,284,860	38.1	10.6
Total interspersed repeats	1,838,054,948	27,138,757	80.0	20.2
Simple repeats	35,678,844	3,145,222	1.6	2.3
Low complexity repeats	8,251,536	671,992	0.4	0.5
Total assembly L	2,451,787,670	135,172,123
L excluding Ns runs	2,297,336,160	134,368,765
Bases masked	1,881,985,328	30,955,971
Total % repeat content			81.9	23.0

Analysis of taro with NLR-annotator identified 391 NLRs, of which 182 were categorized as complete, 109 complete but pseudogenes, and a further 80 annotated as partial only, which indicates a lack of some NLR specific domains (Supplemental Appendix 1). In contrast, the great duckweed genome was estimated to contain only 70 NLRs, of which 46 were categorized as complete, 12 partial, and 12 complete pseudogenes. Members of the NLR gene family divide into two groups, Toll and human interleukin-1 receptor (TIR) proteins (known as TIR-NB-LRR or TNLs) and the non-TIR class of Nucleotide-binding site leucine-rich repeat proteins known as CNLs (also known as CC-NBS-LRR) because some (not all) contain a coiled-coil (CC) structure in the N-terminus domain (Jupe ). Among the taro NLRs we detected all were characterized as belonging to the non-TIR class, CNL, with only six exceptions. Those six TNLs were evenly split among categories complete, partial, and pseudogenes (one complete and one partial). The great duckweed genome contained only CNLs, with the exception of one TNL annotated as a pseudogene.

GBS data and genetic linkage mapping

The per-sample average number of GBS reads was ∼2.26 million, of which ∼1.5 million mapped to exactly one genome location (Table 3). From the 1,519 filtered SNPs calls using those mapping data, 1,423 passed the initial test for segregation distortion, but 865 of those markers were discarded because of homozygosity in both parents (n = 107), redundancy (n = 514), deviance from Mendelian segregation ratio (n = 179) or unlinked status (n = 65) (Table 4). Accordingly, 558 (39.21% of 1,423) SNP markers were assigned to 31 linkage groups, which equates to a total length of ∼285 Mb and represents 11.64% of the genome (Table 5). Taro has a haploid chromosome of x = 14: in our analysis only 14 linkage groups had number of markers > 8 and there was a heavy reduction in the number of markers assigned to linkage groups 17 through linkage group 31 (Supplemental Figure 1). Thus, we included the first 16 linkage groups for linkage map development and QTL analysis. The final linkage map has a total distance of 4094.38 cM, 16 linkage groups, n = 558 markers, and covers a total of 285,477,188 nucleotides (10.83% of the genome; Figure 2 and Table 6). Concatenating contigs into pseudochromosomes based on linkage groups and marker scores (Supplemental Appendix 2), along with artifact filtering, reduced the number of contigs to 140,400, as reported in Table 1.

Table 3

Genotyping by sequencing (GBS) mapping data and variant calls for 86 taro samples. Summary data include sample-averaged counts and percentages for total mapped and unmapped reads and reads that mapped uniquely (1x) or were multi-mapped (>1x). Numbers for variant calls include raw variable sites, INDELs, variable sites that passed filters (see text), and the number of variable sites present the mapping population (MP) after applying a minor allele frequency (maf) threshold of 0.012

Numbers of reads (%)			Mapped (%)		Numbers of variant sites
Total	Mapped	Unmapped	1x	>1 x	Raw data	INDELs	Filtered	MP
2,723,124	2,257,353 (82.9%)	465,771 (17.1%)	1,502,585 (66.4%)	754,767 (33.6%)	15,021,591	10,982	7,018	1,519

Table 4

Linkage mapping results for GBS data mapped to the merged taro genome reference. Markers passing the initial test for segregation distortion are listed as number of initial markers. Homozygous, redundant, and distorted SNPs were removed from the subsequent analysis. After binning of unique markers and filtering for segregation distortion, a suggested logarithm of the odds (LOD) score was calculated and used for linkage group (LG) formation

# Initial markers	# Homozygous markers	# Unique bins	% distorted markers	Suggested LOD	# LG	# markers final map	# LG >= 8 markers
1,423	107	802	22%	5.97	31	558	14

Table 5

Descriptors for linkage groups (LG) constructed from ‘1025’ taro mapping population genotypes. Sixteen major linkage groups (LG) were present in the final linkage map constructed from a mapping population of taro. The SNP markers were called using the “merged” taro genome assembly as a reference (see text for details)

Linkage groups	Total markers	LG length (cM)	Average distance (cM)	Genetic length
LG1	37	269.93	7.3	24,124,590
LG2	73	590.87	8.1	42,680,486
LG3	65	491.26	7.6	30,369,817
LG4	25	205.08	8.2	16,485,140
LG5	35	256.4	7.3	18,072,045
LG6	43	305.31	7.1	13,916,264
LG7	43	237.94	5.5	18,481,439
LG8	43	514.12	12	23,744,905
LG9	38	220.57	5.8	21,816,219
LG10	35	374.35	10.7	15,471,071
LG11	8	43.42	5.4	4,286,715
LG12	5	22.09	4.4	2,732,664
LG13	10	127.25	12.7	4,820,723
LG14	5	31.51	6.3	621,528
LG15	23	179.72	7.8	11,651,500
LG16	32	224.56	7	16,237,138
Total	520	4,094.38	123.3	265,512,244
Average	33	255.90	7.7	16,594,515

Figure 2

Genetic map of taro based on 520 high quality SNP markers covering 16 linkage groups.

Table 6

A list of QTL significant for TLB resistance in the ‘1025’ TLB-resistant mapping population. The QTL naming convention includes the isolate name preceded by a q, DPI4, the linkage group identifier, and a number representing the number of significant QTL in the linkage group. PVE = Phenotypic variance explained, a = additive effect, d = dominance effect, d/a = QTL mode of action

QTL name	Isolate	LG	Left Marker	Right Marker	LOD	PVE (%)	a	d	d/a
qS1_DPI4-3-1	S1	3	TIG00056386_694420	1544_160709	22.38	10.25	0.03	0.46	16.17
qS1_DPI4-6-1	S1	6	TIG00044844_224587	TIG01551418_41957	29.56	17.52	0.29	0.02	0.06
qS1_DPI4-8-1	S1	8	3452_96171	TIG01552475_45417	25.74	13.64	0.29	0.00	0.00
qS1_DPI4-8-2	S1	8	TIG00020381_142717	TIG00037610_138242	19.66	8.82	0.00	0.43	∞
qS1_DPI4-9-1	S1	9	TIG01555954_659795	270115_508	22.68	10.74	0.25	−0.02	−0.09
qS3_DPI4-2-1	S3	2	399_398671	399_398779	7.70	7.75	−0.16	−0.04	0.23
qS3_DPI4-3-1	S3	3	293503_21603	285165_4595	9.60	9.97	0.24	−0.06	−0.24
qS3_DPI4-3-2	S3	3	285165_4624	283183_2064	19.66	27.86	−0.34	−0.05	0.15
qS3_DPI4-6-1	S3	6	TIG00044844_224587	TIG01551418_41957	9.81	9.85	0.24	−0.02	−0.08
qS3_DPI4-7-1	S3	7	1068_74730	TIG00030638_157187	6.75	6.55	−0.03	0.32	−9.81

Genetic map of taro based on 520 high quality SNP markers covering 16 linkage groups.

QTL analysis based on the genetic map

Composite interval mapping identified a total of 10 QTL associated with taro’s resistance to two isolates of P. colocasiae (Table 6). Eight QTL (four per isolate) were unique, whereas one QTL mapped to the same interval (LG6, TIG00044844_224587 and TIG01551418_41957) in both isolates, explaining 17.52% and 9.81% of phenotypic variation for S1 and S3 isolates respectively (Table 6, Figure 3). In terms of genetic selection models, the mode of action for this QTL differed between the two isolates: partial dominance was indicated for the S1 isolate, while underdominance was indicated the S3 isolate. Two QTL showed evidence of overdominance (d > 1), and of the remaining six QTL (with d < 1), three showed partial dominance and three showed underdominance, as indicated by positive or negative values of d in Table 6. The locations of QTL significant for resistance to TLB respective to all markers on LGs are shown in Supplemental Figure 2.

Figure 3

Genome wide representation of major linkage groups, including significant QTL and their LOD scores for S1 (A) and S3 (B) isolates of Phytophthora colocasiae exposed to the ‘1025’ mapping population. The Y-axis indicates the logarithm of odds (LOD) values. Peaks above the threshold (dotted line) of LOD = 15.32 (S1 isolate) and LOD = 6.01 (S3 isolate) represent a QTL having significant interaction with the TLB tolerance. Linkage groups >100 centimorgans (cM) are labeled, with total length shown on y-axis. Approximate locations of nucleotide binding leucine rich proteins (NLRs) (complete, partial, and pseudogenes) are indicated by colored symbols. Among the total complement of sequences containing NLR domains, 25 complete NLRs were located within sequences assigned to LGs and tested for QTL association with TLB resistance. Although none of the NLR domains were contained within those QTL, some were within close proximity. A cluster of three NLRs (1405_nlr_1, 1405_nlr_2, and 1405_nlr_3) occurred on LG6, near the QTL associated with resistance to both S1 and S3 P. colocasiae isolates (Figure 3). These NLRs were characterized as CNLs, and contained a pre-NB domain. The highest concentration (n = 9) of complete NLRs occurred on LG18, representing 5% of taro’s genome-wide NLR complement, yet no QTL significant for association with TLB mapped to that particular LG. Among the nine partial NLRs that mapped to LGs, only one (NLR, 406_nlr_1) was proximate to a QTL, in this case mapping to LG8 in the interval between QTL qS1_DPI4-8-1 and qS1_DPI4-8-2.

Discussion

We sequenced and assembled a taro genome using a linked-read sequencing strategy, with genome contiguity improved through gap filling and scaffolding using contigs assembled from Oxford Nanopore MinIon long-reads and linkage map results from a mapping population for TLB-resistance. By constructing a pseudochromosome-level reference genome, we provide a foundational resource for identifying genomic elements important for agronomic traits and resistance to disease. This draft taro genome is mostly complete, based on the total number of assembled base pairs compared to the k-mer estimated genome size and BUSCO results. Efforts for assembly improvement can focus on increasing contiguity through application of longer-range scaffolding data such as in vitro Hi-C (Dovetail) or optical mapping (BioNano Genomics). For large, highly repetitive genomes, assembly with noisy long-reads may be particularly challenging because of the requirement to disambiguate highly-similar, but non-homologous sequences. Assembling mobile repetitive elements requires sequencing through entire repeat elements that can exceed long-read lengths. For example, a survey of 50 plant genomes indicated that retrotransposons averaged 8,611 bp (Ou and Jiang 2018) in length, and can exceed >30 kb (Xu , Ma ). In this study nanopore contigs were assembled from nanopore long-reads using Canu (Koren ), which includes a read error correction step, yet sequencing depth may have been insufficient to achieve the levels of self-correction necessary to assemble homologous sequence components. Systematic error in raw nanopore sequencing data are known to lead to reduced accuracy of Canu-corrected reads, for example, at theoretical 30x coverage reads averaged 92% identity to a human genome reference after correction (Jain ). Our long-read coverage at ∼23x recovered 1,44 Mb of taro genome sequence (60% of the estimated genome size), but only 602 Mb (< 42%) of those assembled bases aligned to the linked-read genome assembly (at the 95% identity threshold). Instead of collecting additional long-read data, we opted to generate a new linked-reads assembly, restricting use of the nanopore contigs for gap-filling and scaffolding only. Our results show that genome contiguity was only slightly improved by that gap-filling (merge) step, based on N50 values. We note here that nanopore genome polishing with Illumina short-read data only–e.g., nancorr and MaSurCa approaches (Zimin , Goodwin )–would have been a far less effective approach for genome improvement, because short reads cannot polish regions that have ambiguous mappings, as occurs for repetitive regions (Jain ). In today’s dollars the nanopore contigs were sequenced using a 12-cell kit that, with reagents, cost ∼$12,500 USA dollars, in contrast to the 10x Genomics genome assembly that cost ∼$5000 USA dollars, with HudsonAlpha preparing and sequencing the linked-read library and providing bioinformatic and assembly support. Which type of platform and sequencing depth is most suited to a given project is of course contingent on project needs and budget, depending on whether the genome is a means to an end or the end goal (Chakraborty ). In our case, our immediate goal was to identify QTL associated with TLB and identify SNPs markers for rapid screening of progeny in taro breeding programs, which required high sequencing accuracy. Transposable elements (TEs) are self-replicating, mobile genomic units that are major contributors to genome diversity and size, and have key roles in chromosome structure, gene expression, and regulation (Bennetzen and Wang 2014, Orozco-Arias ). Plant genomes are known for proportionally high levels of TE repeat content, for example, up to 80% of genome sequence in corn (Zea mays) and Triticum urartu, a progenitor to wheat (Ling ). The TEs fall into two categories, Class I retrotransposons and Class II DNA transposons, with the former typically more ubiquitous in plants. To illustrate this point, Class I retrotransposons contributed to 21–72% of total genome sequence in five monocot cereal grains and grasses (Ling ), and 30–41% of total genome sequence in Saccharum spp. (sugarcane, a monocot) (Garsmeur ), compared to Class II transposons, which contributed < 8% of total genome sequence in each of those taxa. Our findings are consistent with the above monocot examples: the two Araceae genomes considered here exhibit lower and upper bounds of mobile element repeat content–20% and 80%–with retrotransposons identified as the dominant type of mobile element (a similar proportion of mobile elements went uncharacterized). The spread we observed in repeat content of Araceae is expected because of the smaller genome size of great duckweed (Michael ) and the well-described variation in genome size of taro (Wang ), which may be attributable to repeat content in addition to variation in chromosome numbers. The NLR class of plant disease resistance genes show phylogenetic differences in NB-ARC domains, in addition to presence or absence of TIRs (TNL vs. CNL), and differ considerably in their phyletic distribution (Tarr and Alexander 2009, Yue , Steuernagel ). Our findings are consistent with a low frequency of TNLs in genomes of monocotyledonous species (Cannon , Tarr and Alexander 2009): we found only 6 TNLs in taro, of which only two were complete, and only one pseudogene in duckweed. Thus, consistent with previous studies of monocots, CNLs represent a major component of the repertoire of resistance genes in taro. The number of CNLs identified for taro is consistent with numbers observed in other monocots, averaging 300 to 400, and the fewer number of CNLs in duckweed is consistent with contraction of numbers of NLRs in aquatic plants (Baggs ), in conjunction with the reduced genome size of great duckweed.

Linkage mapping and QTL analysis

Although we were not able to resolve exactly 14 linkage groups, this outcome is consistent with other studies conducted using clonally propagated heterozygous species (Nzuki , Soulard ). In a study of taro, Soulard encountered the same issue where multiple linkage groups with small number of markers (< 5) were observed. Moreover, despite Soulard larger sample size of >260 taro samples per mapping population, their post-filtered number of SNP loci available for linkage mapping, 586 and 548, was similar to ours, 558. In our case, failure to resolve 14 groups could be due to many factors including a smaller sample size and use of a heterozygous reference for calling SNPs. Previously, reference-free genotyping of the ‘230’ and ‘255’ parents and the ‘1025’ population (using the same GBS data) identified 30 linkage groups using 240 SNPs (Shintaku ). With a reference genome we were able to double the number of markers, however we were also able to recognize that a portion of our markers on the linkage map were ordered discordant from the order of SNPs on contigs, which in some cases were present in dense clusters (score 3, S Appendix 2). While stretches of high heterozygosity could be real–as expected given the hybridization history of taro–these also could be caused by mis-assembly, either the collapse of non-homologous reads originating from repetitive regions or chimeric joins, both of which compromise calling SNPs from short-read data. Alternatively, a minor contributor to presence of SNP clusters could arise from NLR proteins, concordant with overdominance (heterozygote advantage) in plant disease resistance genes (Michelmore and Meyers 1998). We found that some of the NLRs encompassed hypervariable genomic regions, for example, LG22 contained a complete NLR protein, 261850_nlr_1, which contained 4 SNPs across 2141 nucleotides. Another case involves LG3, with five SNPs contained in the partial NLR protein (NLR_291178), spanning 721 nucleotides. Overall, the linkage map and QTL results provide foundational resources for future work aimed at identifying functional genomic elements that underpin plant ability to resist diseases, including TLB. In total, we found 10 QTL associated with TLB resistance. Of the 10 putative QTL identified in our study, 9 were isolate-specific with only one in common between the S1 and S3 Phytophthora isolates. This result is similar to other studies of Phytophthora host interactions where isolate-specific effects were found (Leonards-Schippers , Johnson , Truong ). This phenomenon underlines the need for stacking resistant genes, because multiple races and mating types of Phytophthora are found in Hawaii (Shrestha ). The spread of TLB poses a major threat to taro growing regions currently free from this devastating disease. Incorporating disease resistance is the most sustainable approach to manage TLB, as research in the Pacific indicates that management measures such as chemical control are largely ineffective (Singh ), although in Cameroon (Africa) the application of fungicide to taro was shown to reduce impacts from disease (Tarla ). Securing taro genetic resources for future use, along with retaining culturally important types allows for the maximum flexibility in deploying new cultivars.

Conclusions

In this study, we generated a high-quality genome assembly for taro, a root crop that is widely cultivated in tropical regions and is important for food security. Our results will inform studies of the origin, evolutionary history and breeding of this South Pacific crop. In addition, this genome may provide a framework for surveying mechanisms that underlie the formation of distinct morphological features associated with tropical tuber crops. This genome may also stimulate new genetic insights into this important tropical species. The QTLs identified from genotypes of a taro mapping population resistant to TLB can be further investigated to elucidate the genetics of that trait, with the complement of NLR sequences available as a starting point for advancing understanding of resistance system functionality. Further, the QTLs can be used to accelerate marker-assisted breeding programs. Finally, this genome project may provide a template for how to develop genomic resources in other understudied plant species.

56 in total

1. Tracing the origin and evolutionary history of plant nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes.

Authors: Jia-Xing Yue; Blake C Meyers; Jian-Qun Chen; Dacheng Tian; Sihai Yang
Journal: New Phytol Date: 2011-12-23 Impact factor: 10.151

2. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals.

Authors: Brian L Browning; Sharon R Browning
Journal: Am J Hum Genet Date: 2009-02-05 Impact factor: 11.025

3. Convergent Loss of an EDS1/PAD4 Signaling Pathway in Several Plant Lineages Reveals Coevolved Components of Plant Immunity and Drought Response.

Authors: Erin L Baggs; J Grey Monroe; Anil S Thanki; Ruby O'Grady; Christian Schudoma; Wilfried Haerty; Ksenia V Krasileva
Journal: Plant Cell Date: 2020-05-14 Impact factor: 11.277

4. Genetic diversity of taro, Colocasia esculenta (L.) Schott, in Southeast Asia and the Pacific.

Authors: C M Kreike; H J Van Eck; V Lebot
Journal: Theor Appl Genet Date: 2004-08 Impact factor: 5.699

5. Fractionation, stability, and isolate-specificity of QTL for resistance to Phytophthora infestans in cultivated tomato (Solanum lycopersicum).

Authors: Emily B Johnson; J Erron Haggard; Dina A St Clair
Journal: G3 (Bethesda) Date: 2012-10-01 Impact factor: 3.154

6. Nanopore sequencing and assembly of a human genome with ultra-long reads.

Authors: Miten Jain; Sergey Koren; Karen H Miga; Josh Quick; Arthur C Rand; Thomas A Sasani; John R Tyson; Andrew D Beggs; Alexander T Dilthey; Ian T Fiddes; Sunir Malla; Hannah Marriott; Tom Nieto; Justin O'Grady; Hugh E Olsen; Brent S Pedersen; Arang Rhie; Hollian Richardson; Aaron R Quinlan; Terrance P Snutch; Louise Tee; Benedict Paten; Adam M Phillippy; Jared T Simpson; Nicholas J Loman; Matthew Loose
Journal: Nat Biotechnol Date: 2018-01-29 Impact factor: 54.908

7. A mosaic monoploid reference sequence for the highly complex genome of sugarcane.

Authors: Olivier Garsmeur; Gaetan Droc; Rudie Antonise; Jane Grimwood; Bernard Potier; Karen Aitken; Jerry Jenkins; Guillaume Martin; Carine Charron; Catherine Hervouet; Laurent Costet; Nabila Yahiaoui; Adam Healey; David Sims; Yesesri Cherukuri; Avinash Sreedasyam; Andrzej Kilian; Agnes Chan; Marie-Anne Van Sluys; Kankshita Swaminathan; Christopher Town; Hélène Bergès; Blake Simmons; Jean Christophe Glaszmann; Edwin van der Vossen; Robert Henry; Jeremy Schmutz; Angélique D'Hont
Journal: Nat Commun Date: 2018-07-06 Impact factor: 14.919

8. New Insights into Long Terminal Repeat Retrotransposons in Mulberry Species.

Authors: Bi Ma; Lulu Kuang; Youchao Xin; Ningjia He
Journal: Genes (Basel) Date: 2019-04-09 Impact factor: 4.096

9. Genetic Diversification and Dispersal of Taro (Colocasia esculenta (L.) Schott).

Authors: H Chaïr; R E Traore; M F Duval; R Rivallan; A Mukherjee; L M Aboagye; W J Van Rensburg; V Andrianavalona; M A A Pinheiro de Carvalho; F Saborio; M Sri Prana; B Komolong; F Lawac; V Lebot
Journal: PLoS One Date: 2016-06-17 Impact factor: 3.240

10. Cost-effective assembly of the African wild dog (Lycaon pictus) genome using linked reads.

Authors: Ellie E Armstrong; Ryan W Taylor; Stefan Prost; Peter Blinston; Esther van der Meer; Hillary Madzikanda; Olivia Mufute; Roseline Mandisodza-Chikerema; John Stuelpnagel; Claudio Sillero-Zubiri; Dmitri Petrov
Journal: Gigascience Date: 2019-02-01 Impact factor: 6.524

4 in total

1. Degradation characteristics of biodegradable film and its effects on soil nutrients in tillage layer, growth and development of taro and yield formation.

Authors: An Wang; Qingtao Chang; Chunsheng Chen; Xiaoquan Zhong; Kexiang Yuan; Meihua Yang; Wei Wu
Journal: AMB Express Date: 2022-06-22 Impact factor: 4.126

2. Assemblies of the genomes of parasitic wasps using meta-assembly and scaffolding with genetic linkage.

Authors: Kameron T Wittmeyer; Sara J Oppenheim; Keith R Hopper
Journal: G3 (Bethesda) Date: 2022-01-04 Impact factor: 3.542

3. Genetic diversity and population structure of eddoe taro in China using genome-wide SNP markers.

Authors: Zhixin Wang; Yalin Sun; Xinfang Huang; Feng Li; Yuping Liu; Honglian Zhu; Zhengwei Liu; Weidong Ke
Journal: PeerJ Date: 2020-12-08 Impact factor: 2.984

Review 4. Unlocking the inherent potential of plant genetic resources: food security and climate adaptation strategy in Fiji and the Pacific.

Authors: Hemalatha Palanivel; Shipra Shah
Journal: Environ Dev Sustain Date: 2021-02-17 Impact factor: 4.080

4 in total