Jianbo Yuan, Xiaojun Zhang, Fuhua Li, Jianhai Xiang.
Abstract
Penaeid shrimp (family Penaeidae) are among the most economically and ecologically important groups of crustaceans. However, sequencing and assembling their genomes has proven extremely difficult over the last 20 years. In this study, building on our previous genomic data, we investigated the genomic characteristics of four penaeid shrimp species and identified factors that may explain their poor genome assemblies, including heterozygosity, polyploidization, and repeats. Sequencing and comparison of somatic cells (diploid) from the four shrimp species and a single sperm cell (haploid) of Litopenaeus vannamei revealed a common bimodal distribution of K-mer depths, suggesting either high heterozygosity or abundant homo-duplicated sequences in their genomes. However, several lines of evidence indicate that penaeids have not undergone whole-genome duplication. In addition, a remarkable expansion of simple sequence repeats is another distinctive feature of penaeid genomes and also leaves the genome assemblies highly fragmented. We therefore assembled the genome of penaeid shrimp using various genome sequencing and assembly strategies and compared the quality of the resulting assemblies. This study thus provides new insights into the genomic characteristics of penaeid shrimp and helps improve their genome assemblies.
Keywords: genome; genome assembly; genomic characteristic; penaeid shrimp; whole genome duplication
Year: 2021 PMID: 34012463 PMCID: PMC8126689 DOI: 10.3389/fgene.2021.658619
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. K-mer distribution of four penaeid shrimp genomes. The K-mer distribution plots cover four shrimp species: (A) L. vannamei, (B) F. chinensis, (C) M. japonicus, and (D) P. monodon. K = 19 was selected for the K-mer frequency statistics.
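To illustrate what a K-mer depth distribution is, the following minimal Python sketch counts 19-mers in a set of reads and tabulates how many distinct K-mers occur at each depth. It is not the pipeline used in the study; the function names `canonical` and `kmer_depth_histogram` are hypothetical, and counting canonical K-mers (the lexicographically smaller of a K-mer and its reverse complement) is an assumption borrowed from common K-mer counting practice.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_depth_histogram(reads, k=19):
    """Map each depth to the number of distinct canonical k-mers seen at that depth.

    `reads` is any iterable of DNA strings. K-mers containing characters
    other than A, C, G, T (for example N) are skipped.
    """
    kmer_counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                kmer_counts[canonical(kmer)] += 1
    # Histogram: depth -> number of distinct k-mers with that depth.
    return Counter(kmer_counts.values())
```

Plotting the returned histogram (depth on the x-axis, number of K-mers on the y-axis) gives a curve of the kind shown in the figure; a bimodal curve has two distinct peaks. For real sequencing data a dedicated K-mer counter is far more memory-efficient than this in-memory dictionary.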
FIGURE 2. Genome size of decapods and various penaeid shrimp species. (A) Genome sizes of various families of Decapoda. Genome size data were obtained from the Animal Genome Size Database (www.genomesize.com). (B) Genome sizes of various penaeid shrimp species.
FIGURE 3. Evaluation of genome duplication of L. vannamei. (A) K-mer distribution of the single sperm cell and whole-genome sequencing (WGS) data. (B) Ks frequency distribution for pairs of paralogous genes in the Litopenaeus vannamei genome. (C) Allele frequency spectra based on read counts of single-nucleotide polymorphisms (SNPs). For each SNP site, the frequencies of the four bases were sorted from most frequent (Type: First) to least frequent (Type: Fourth). The leftmost and rightmost truncated peaks likely correspond to variation between individuals in a population. For a polyploid genome, peaks at 25, 50, and 75% would be expected.
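The ranking step described for panel (C) can be sketched as follows. This is an illustrative example only, assuming per-site read counts are already available as a dictionary keyed by base; the function name `ranked_base_frequencies` is hypothetical and not taken from the study.

```python
def ranked_base_frequencies(base_counts):
    """Return the four base frequencies at one SNP site, sorted from most to least frequent.

    `base_counts` maps a base ("A", "C", "G", "T") to its read count at the
    site; bases that are absent are treated as having zero reads.
    The result lists the First, Second, Third, and Fourth frequencies.
    """
    counts = [base_counts.get(base, 0) for base in "ACGT"]
    total = sum(counts)
    if total == 0:
        raise ValueError("site has no read coverage")
    return sorted((count / total for count in counts), reverse=True)


# A site covered by 30 reads with A and 10 reads with G.
print(ranked_base_frequencies({"A": 30, "G": 10}))
```

Collecting the First and Second values over many SNP sites and plotting their distributions gives an allele frequency spectrum. In a diploid heterozygous site both values cluster near 50%, whereas the caption notes that a polyploid genome would also be expected to show peaks at 25 and 75%.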
Summary of repetitive sequences in four penaeid shrimp genomes.
| Genome length | 1.66 Gb | 1.57 Gb | 2.39 Gb | 1.79 Gb |
| Total repeats | 49.39% | 48.58% | 42.83% | 34.96% |
| DNA | 9.33% | 13.00% | 5.87% | 5.66% |
| LINE | 2.82% | 3.27% | 9.26% | 4.75% |
| SINE | 0.06% | 0.11% | 1.30% | 0.03% |
| LTR | 0.62% | 0.53% | 1.42% | 1.14% |
| Unknown | 3.42% | 3.52% | 4.16% | 7.19% |
| Satellite | 0.10% | 0.16% | 0.00% | 0.35% |
| Simple repeats | 23.93% | 19.50% | 15.01% | 9.79% |
| Low complexity | 9.49% | 8.49% | 5.81% | 6.28% |
FIGURE 4. Simple sequence repeats (SSRs) in penaeid shrimp genomes. (A) Content, average length, and density of SSRs in the genomes of various crustaceans. The phylogenetic tree was taken from a previous study (Yuan et al., 2018). The SSR characteristics of the various crustaceans were taken from Yuan et al. (2021). Because the SSR content may be underestimated in the Marsupenaeus japonicus genome, it is not shown or compared with the other three shrimp genomes. (B) The length distribution of compound SSRs and total SSRs in the Litopenaeus vannamei genome. A compound SSR consists of multiple different types of SSRs connected head to tail. (C) Comparison of the distributions of various SSR types in three penaeid shrimp genomes. Only SSR types with relatively high content in the genome are shown in the plot.
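As a rough illustration of what counts as an SSR, the sketch below scans a sequence for short motifs repeated in tandem using a regular expression. It is not the SSR detection method used in the study; the function name `find_ssrs` and the thresholds (motifs of 1 to 6 bp, at least 5 tandem copies) are assumptions chosen for the example.

```python
import re


def find_ssrs(sequence, max_motif=6, min_repeats=5):
    """Find perfect tandem repeats of short motifs in a DNA sequence.

    Returns a list of (start, motif, repeat_count) tuples, where `start`
    is the 0-based position of the repeat tract. The lazy quantifier makes
    the regex prefer the shortest motif that satisfies the repeat threshold.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Two T bases, then five copies of AC, then two G bases.
print(find_ssrs("TTACACACACACGG"))
```

Under these assumptions, a compound SSR as described in panel (B) would appear as consecutive hits whose tracts directly adjoin one another. Dedicated SSR finders additionally handle imperfect repeats, which this sketch does not.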
Statistics of genome assembly of Litopenaeus vannamei using different methods.
| Contig number | 982,421 | 463,151 | 110,906 | 43,938 | 60,355 | 50,304 |
| Total length (Gb) | 1.35 | 1.59 | 1.69 | 1.30 | 1.78 | 1.62 |
| Longest (Kb) | 1,219 | 1,219 | 214 | 707 | 422 | 739 |
| N50 (bp) | 2,826 | 9,496 | 25,477 | 43,564 | 34,826 | 57,650 |
| N90 (bp) | 712 | 1,271 | 9,552 | 13,276 | 15,383 | 14,641 |
| Unigene coverage | 95.76% | 89.56% | 93.73% | 83.16% | 68.53% | 94.45% |
| Unigene coverage (50%)* | 85.50% | 78.33% | 84.85% | 71.24% | 50.12% | 86.91% |
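For readers unfamiliar with the N50 and N90 rows, the following sketch computes them under the standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches 50% (or 90%) of the total assembly length. The function name `nx` and the toy contig lengths are hypothetical and are not data from the table.

```python
def nx(contig_lengths, x=50):
    """Return the Nx statistic (for example N50 or N90) for a list of contig lengths.

    Contigs are sorted from longest to shortest; Nx is the length of the
    contig at which the running total first reaches x% of the assembly size.
    """
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assembly with five contigs totalling 300 bp.
toy_lengths = [100, 80, 60, 40, 20]
print(nx(toy_lengths, 50))  # N50
print(nx(toy_lengths, 90))  # N90
```

Because N90 requires covering more of the assembly, it is never larger than N50 for the same set of contigs, which is consistent with every column of the table.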
FIGURE 5. Genome assembly using different coverages of sequencing data. (A) Genome assembly comparison using different coverages of Illumina sequencing data. (B) Genome assembly comparison using different coverages of PacBio sequencing data.