Nik Cummings, Rob King, Andre Rickers, Antony Kaspi, Sebastian Lunke, Izhak Haviv, Jeremy B M Jowett.
Abstract
BACKGROUND: The primary goal of genetic linkage analysis is to identify genes affecting a phenotypic trait. After localisation of the linkage region, efficient genetic dissection of the disease-linked loci requires that functional variants are identified across the loci. These functional variants are difficult to detect because of the extent of genetic diversity and the still incomplete cataloguing of the large number of variants present both within and between populations. Massively parallel sequencing platforms offer unprecedented capacity for variant discovery; however, the number of samples analysed is still limited by the cost per sample. Some progress has been made in reducing the cost of resequencing, either through multiplexing methodologies or through targeted enrichment technologies, which make it possible to resequence genomic areas of interest rather than the full genome.
Year: 2010 PMID: 21083938 PMCID: PMC3012606 DOI: 10.1186/1471-2164-11-641
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Massively parallel sequencing of samples enriched for exons on chromosome X following addition of index primers to allow multiplexing. (A) Workflow of the procedure showing library construction, adaptor/index ligation, amplification and target enrichment. (B) Plot of the number of reads per base across a region of chromosome X, showing highly focussed sequence reads around exons and low background outside the target regions. (C) Box-and-whisker plot of the distribution of read coverage (log2 of reads per base) for each of the 8 samples in uniplex, duplex and pentaplex sequencing lanes. The x-axis represents the individual lanes, while the y-axis represents the number of superimposed reads. Boxes represent the interquartile range, with the 75th percentile at the top and the 25th percentile at the bottom. The line in the middle of the box represents the 50th percentile, or median. Whiskers represent the rest of the distribution, with their terminations representing the lowest and highest feature intensity values. Box-and-whisker plots were generated for each sample at each level of plexity. (D) Effect of varying the confidence parameter in the GATK analysis software on the total number of SNPs called (blue line) and the number of SNPs annotated in the dbSNP database (red line). The representative example shown is sample D1 from the pentaplex. (E) Confirmation of DNA fragmentation and library construction by agarose gel electrophoresis, showing the size range of each index-amplified library.
Sequencing and target enrichment results for 8 samples in uniplex, duplex and pentaplex reactions.
| Sample | Plexity | Filtered Reads | Mapped to ChrX | Mapped to Target Region | % Mapped to X | % Mapped to Target | Fold Enrichment |
|---|---|---|---|---|---|---|---|
| A1 | Uniplex | 16,403,360 | 4,044,511 | 3,564,834 | 25.6 | 22.5 | 97 |
| A2 | Duplex | 2,035,467 | 730,156 | 666,572 | 36.9 | 33.7 | 170 |
| B1 | Duplex | 11,833,641 | 2,655,666 | 2,313,476 | 23.2 | 20.2 | 85 |
| A3 | Pentaplex | 3,037,781 | 1,131,596 | 1,036,600 | 38.1 | 34.9 | 180 |
| B2 | Pentaplex | 3,204,352 | 1,230,630 | 1,129,857 | 39.3 | 36.0 | 190 |
| C1 | Pentaplex | 3,122,661 | 1,220,017 | 1,123,547 | 40.1 | 36.9 | 196 |
| D1 | Pentaplex | 3,687,146 | 1,944,667 | 1,778,223 | 53.8 | 49.2 | 324 |
| E1 | Pentaplex | 3,078,897 | 1,832,550 | 1,696,950 | 60.6 | 56.1 | 427 |
Variant identification results across target region by sample.
| Sample | Plexity | Total SNPs* | Annotated | Novel | % Annotated |
|---|---|---|---|---|---|
| A1 | Uniplex | 2,399 | 1,839 | 560 | 77% |
| A2 | Duplex | 1,462 | 991 | 471 | 68% |
| B1 | Duplex | 2,385 | 1,627 | 758 | 68% |
| A3 | Pentaplex | 2,102 | 1,242 | 860 | 59% |
| B2 | Pentaplex | 1,910 | 1,289 | 621 | 67% |
| C1 | Pentaplex | 2,033 | 1,362 | 671 | 67% |
| D1 | Pentaplex | 2,600 | 1,904 | 696 | 73% |
| E1 | Pentaplex | 2,390 | 1,760 | 630 | 74% |
* Calculations performed using confidence parameter of 70 in GATK.
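The derived columns of the variant table follow from the two counts: novel SNPs are the total minus those annotated, and the percentage annotated is their ratio. The sketch below is an illustration only, not the authors' analysis code; the function name `summarise` and the dictionary `SNP_COUNTS` are hypothetical, and the counts are copied from the table above.

```python
# (annotated, total) SNP counts per sample, copied from the table above.
# Names in this block are illustrative and do not come from the source.
SNP_COUNTS = {
    "A1": (1839, 2399),
    "A2": (991, 1462),
    "B1": (1627, 2385),
    "A3": (1242, 2102),
    "B2": (1289, 1910),
    "C1": (1362, 2033),
    "D1": (1904, 2600),
    "E1": (1760, 2390),
}


def summarise(annotated: int, total: int) -> tuple[int, float]:
    """Return (novel SNP count, percentage of SNPs annotated)."""
    novel = total - annotated
    pct_annotated = 100.0 * annotated / total
    return novel, pct_annotated


if __name__ == "__main__":
    for sample, (annotated, total) in SNP_COUNTS.items():
        novel, pct = summarise(annotated, total)
        print(f"{sample}: novel={novel}, annotated={pct:.0f}%")
```

Rounded to whole percentages, these values agree with the "Novel" and "% Annotated" columns as tabulated.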
Concentrations of final multiplexed libraries for sequencing.
| Library Number | Concentration (ng/uL) | Calculated nM* |
|---|---|---|
| 1 | 12.7 | 57 |
| 2 | 11.8 | 53 |
| 3 | 5.8 | 26 |
*Calculation based on 340 bp average size and MW of one base pair = 660 g/mol.
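The footnote's conversion can be written out explicitly. Since 1 ng/uL equals 10^-3 g/L, dividing by the molar mass of an average fragment (340 bp × 660 g/mol per bp) and scaling to nanomolar gives nM = (ng/uL × 10^6) / (660 × fragment length in bp). The following is a minimal sketch under those stated assumptions; the function name `ng_per_ul_to_nM` is hypothetical and not taken from the source.

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float,
                    avg_size_bp: float = 340.0,
                    mw_per_bp: float = 660.0) -> float:
    """Convert a double-stranded DNA concentration from ng/uL to nM.

    Defaults follow the table footnote: 340 bp average fragment size
    and 660 g/mol per base pair.
    """
    molar_mass = avg_size_bp * mw_per_bp       # g/mol for an average fragment
    # ng/uL -> g/L is a factor of 1e-3; mol/L -> nmol/L is a factor of 1e9,
    # so the combined scaling factor is 1e6.
    return conc_ng_per_ul * 1e6 / molar_mass


if __name__ == "__main__":
    for library, conc in [(1, 12.7), (2, 11.8), (3, 5.8)]:
        print(f"Library {library}: {ng_per_ul_to_nM(conc):.0f} nM")
```

With the three measured concentrations, the rounded results are 57, 53 and 26 nM, matching the "Calculated nM" column.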