Yahui Gao, Li Ma, George E Liu.
Abstract
Structural variations (SVs) are a major source of genetic variation and are widely distributed across the genome. SVs involve longer genomic sequences and potentially have stronger effects than SNPs, but they are not well captured by short-read sequencing owing to their size and their association with repeats. Improved characterization of SVs can provide deeper insight into complex traits. With the availability of long-read sequencing, it has become feasible to uncover the full range of SVs. Here, we sequenced one cattle individual using 10× Genomics (10 × G) linked reads, Pacific Biosciences (PacBio) continuous long reads (CLR) and circular consensus sequencing (CCS), as well as Oxford Nanopore Technologies (ONT) PromethION. We evaluated the ability of various methods to detect SVs. We identified 21,164 SVs, which amount to 186 Mb and cover 7.07% of the whole genome. More SVs were inferred from long reads than from short reads. PacBio CLR identified the most large SVs and covered the largest portion of the genome. SVs called with PacBio CCS and ONT data showed high uniformity. The PacBio CCS call set overlapped the most with the results obtained from short-read data. Together, we found that long reads outperformed short reads in SV detection.
Keywords: cattle; long-read sequencing; structural variation
Year: 2022 PMID: 35627213 PMCID: PMC9142105 DOI: 10.3390/genes13050828
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Figure 1. Sample collection, sequencing, and mapping pipeline.
Yield and alignment coverage statistics for the cattle lung sample across various sequencing platforms.
| Platform | 10 × G | PromethION | PacBio CLR | PacBio CCS |
|---|---|---|---|---|
| Number of reads | 1,577,259,728 | 1,618,623 | 11,178,388 | 2,875,796 |
| Mapped reads | 1,532,221,733 | 1,488,641 | 11,178,388 | 2,875,796 |
| Mapping rate (%) | 97.14 | 91.97 | 100 | 100 |
| Depth | 55× | 11× | 40× | 6× |
| Read min length | 19 | 70 | 53 | 74 |
| Read max length | 150 | 248,333 | 369,285 | 47,915 |
| Read mean length | 133.94 | 28,191.59 | 25,259.03 | 8763.78 |
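The mapping rate row can be recomputed from the two read-count rows above it. The following is a minimal sketch, assuming the rate is simply mapped reads divided by total reads; the dictionary keys and the helper name `mapping_rate` are illustrative and do not come from the study's pipeline.

```python
# Read counts copied from the table above: (number of reads, mapped reads).
# Keys are ASCII shorthand for the platform column headers.
read_counts = {
    "10xG": (1_577_259_728, 1_532_221_733),
    "PromethION": (1_618_623, 1_488_641),
    "PacBio CLR": (11_178_388, 11_178_388),
    "PacBio CCS": (2_875_796, 2_875_796),
}


def mapping_rate(total_reads, mapped_reads):
    """Return mapped reads as a percentage of all reads, rounded to 2 decimals."""
    return round(100 * mapped_reads / total_reads, 2)


# Hypothetical helper output: platform -> mapping rate in percent.
rates = {
    platform: mapping_rate(total, mapped)
    for platform, (total, mapped) in read_counts.items()
}

for platform, rate in rates.items():
    print(f"{platform}: {rate}%")
```

Under that assumption, the recomputed values agree with the reported 97.14% and 91.97% for the 10 × G and PromethION data.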
Statistics over SVs identified by various methods.
| Platform | Method | DEL | DUP | Total |
|---|---|---|---|---|
| 10 × G | LongRanger | 8242 | 73 | 8315 |
| 10 × G | LinkedSV | 6415 | 38 | 6453 |
| 10 × G | Merge | 10,325 | 114 | 10,439 |
| ONT | PBSV | 26,397 | 2888 | 29,285 |
| ONT | Sniffles | 3497 | 168 | 3665 |
| ONT | Merge | 13,472 | 1881 | 15,353 |
| PB_CLR | PBSV | 885 | 169 | 1054 |
| PB_CLR | Sniffles | 1340 | 1238 | 2578 |
| PB_CLR | Merge | 1800 | 1162 | 2962 |
| PB_CCS | PBSV | 23,353 | 6569 | 29,922 |
| PB_CCS | Sniffles | 190 | 99 | 289 |
| PB_CCS | Merge | 15,601 | 3891 | 19,492 |
| Merge | SURVIVOR | 16,289 | 4875 | 21,164 |
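In every row of the table above, the Total column appears to be the sum of the DEL and DUP columns. A small consistency check along those lines might look like the sketch below; the tuple layout and the function name `inconsistent_rows` are assumptions made for illustration, and the counts are copied from the table rather than recomputed from any call set.

```python
# (platform, method, DEL, DUP, Total) copied from the table above.
sv_counts = [
    ("10xG", "LongRanger", 8242, 73, 8315),
    ("10xG", "LinkedSV", 6415, 38, 6453),
    ("10xG", "Merge", 10325, 114, 10439),
    ("ONT", "PBSV", 26397, 2888, 29285),
    ("ONT", "Sniffles", 3497, 168, 3665),
    ("ONT", "Merge", 13472, 1881, 15353),
    ("PB_CLR", "PBSV", 885, 169, 1054),
    ("PB_CLR", "Sniffles", 1340, 1238, 2578),
    ("PB_CLR", "Merge", 1800, 1162, 2962),
    ("PB_CCS", "PBSV", 23353, 6569, 29922),
    ("PB_CCS", "Sniffles", 190, 99, 289),
    ("PB_CCS", "Merge", 15601, 3891, 19492),
    ("Merge", "SURVIVOR", 16289, 4875, 21164),
]


def inconsistent_rows(rows):
    """Return (platform, method) pairs whose DEL + DUP does not equal Total."""
    return [
        (platform, method)
        for platform, method, n_del, n_dup, total in rows
        if n_del + n_dup != total
    ]


mismatches = inconsistent_rows(sv_counts)
print("Rows with inconsistent totals:", mismatches)
```

An empty list means every reported total matches its deletion and duplication counts, including the final merged set of 21,164 SVs.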
Figure 2. Individualized cattle SV map. The tracks under every black bar represent the SVs for 10 × G_LongRanger, 10 × G_LinkedSV, CCS_PBSV, CCS_Sniffles, CLR_PBSV, CLR_Sniffles, ONT_PBSV and ONT_Sniffles (in order from top to bottom). Red indicates deletions, and green indicates duplications.
Figure 3. (a) Size distribution for SVs inferred from either long reads or Illumina/10 × G short reads. (b) Comparison between the four SV datasets.