Juan Manuel Rosa-Rosa, Francisco Javier Gracia-Aznárez, Emily Hodges, Guillermo Pita, Michelle Rooks, Zhenyu Xuan, Arindam Bhattacharjee, Leonardo Brizuela, José M Silva, Gregory J Hannon, Javier Benitez.
Abstract
BACKGROUND: The classical candidate-gene approach has failed to identify novel breast cancer susceptibility genes. Massively parallel sequencing now makes possible studies that were unaffordable a few years ago. However, analysis protocols are not yet developed enough to extract all the information contained in the large volume of data obtained.
Year: 2010 PMID: 20368986 PMCID: PMC2848842 DOI: 10.1371/journal.pone.0009976
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of high throughput sequencing data.
| Chromosome | Family | Individual | Total sequences | Aligned to whole genome (%*) | Aligned to candidate regions (%**) | Coverage (%) | Mean depth | Median depth |
| 3 | 27 | | 3,123,937 | 2,956,483 (94.64) | 1,186,611 (40.14) | 98.04 | 26 | 25 |
| | | | 4,922,157 | 4,538,392 (92.20) | 1,518,625 (33.46) | 98.43 | 29 | 29 |
| | | | 4,183,568 | 3,954,837 (94.53) | 1,515,614 (38.32) | 97.89 | 28 | 26 |
| | | | 2,952,969 | 2,839,271 (96.15) | 1,168,679 (41.16) | 97.11 | 24 | 22 |
| | 60 | | 2,652,926 | 2,580,914 (97.29) | 882,837 (34.21) | 97.96 | 22 | 20 |
| | | | 5,934,453 | 4,737,175 (79.82) | 1,670,157 (35.26) | 98.15 | 28 | 24 |
| | 531 | | 12,228,047 | 11,188,204 (91.50) | 4,694,871 (41.96) | 99.07 | 57 | 48 |
| | | | 4,293,087 | 3,585,982 (83.53) | 1,531,322 (42.70) | 97.50 | 30 | 22 |
| | 713 | | 7,568,672 | 7,442,938 (98.34) | 2,793,056 (37.53) | 99.11 | 45 | 44 |
| | | | 7,160,552 | 6,889,152 (96.21) | 2,574,119 (37.36) | 98.94 | 43 | 42 |
| 6 | 11 | | 5,734,052 | 5,599,100 (97.65) | 2,459,740 (43.93) | 98.57 | 43 | 42 |
| | | | 6,240,024 | 6,012,522 (96.35) | 2,642,942 (43.96) | 98.22 | 35 | 32 |
| | 40 | | 2,006,661 | 1,667,648 (83.11) | 779,723 (46.76) | 97.11 | 18 | 17 |
| | | | 4,016,214 | 3,618,178 (90.09) | 1,568,060 (43.34) | 97.66 | 25 | 23 |
| | 929 | | 5,811,276 | 5,665,182 (97.49) | 2,311,149 (40.80) | 98.52 | 33 | 32 |
| | | | 2,602,250 | 2,554,051 (98.15) | 1,059,131 (41.47) | 98.27 | 23 | 23 |
| | 990 | | 8,134,956 | 7,903,785 (97.16) | 3,029,994 (38.34) | 98.84 | 51 | 50 |
| | | | 7,922,500 | 7,590,406 (95.81) | 2,817,358 (37.12) | 99.02 | 49 | 48 |
| | 1125 | | 2,747,911 | 2,666,280 (97.03) | 1,105,059 (41.45) | 97.87 | 24 | 23 |
| | | | 2,517,619 | 2,406,505 (95.59) | 1,088,350 (45.23) | 97.74 | 24 | 24 |
| Total | | | 102,753,831 | 96,397,005 (93.81) | 38,397,397 (39.83) | | | |
| Mean | | | 5,137,692 | 4,819,850 (93.63) | 1,919,870 (40.22) | 98.20 | 33 | 31 |
| Control pool | | | 22,390,251 | 18,221,565 (81.38) | 7,438,610 (40.82) | 99.33 | 111 | 98 |
| Mean (including control pool) | | | 5,214,336 | 4,775,773 (91.58) | 1,909,833 (39.99) | 98.25 | 37 | 34 |
The number of sequences and the depth values are shown for 20 individuals from 9 non-BRCA1/2 families and for 4 individuals from the control population (Control pool).
Chromosome: chromosome on which a linkage signal was found for each family.
*With respect to the total number of sequences. **With respect to the number of sequences aligned to the whole genome.
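The two percentage columns use different denominators, as the footnotes state. The following is a minimal sketch of that arithmetic using the first data row of the table; the function and variable names are illustrative and do not come from the paper.

```python
def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Values copied from the first data row of the summary table.
total_reads = 3_123_937          # total number of sequences
aligned_genome = 2_956_483       # sequences aligned to the whole genome
aligned_candidates = 1_186_611   # sequences aligned to the candidate regions

# (*) aligned to the whole genome, relative to all sequences
genome_pct = pct(aligned_genome, total_reads)
# (**) aligned to candidate regions, relative to sequences aligned to the genome
candidate_pct = pct(aligned_candidates, aligned_genome)

print(genome_pct, candidate_pct)
```

Both results match the bracketed percentages printed in that row.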
Index value parameters for coverage study.
| Chr | Fam | Ind | Mean | St Dev | Upper | Lower |
| 3 | 27 | 07S722 | −0.13 | 0.75 | 1.12 | −1.38 |
| | | 07S723 | −0.04 | 0.31 | 0.77 | −0.86 |
| | | 07S724 | −0.07 | 0.35 | 0.77 | −0.92 |
| | | 07S725 | −0.10 | 0.43 | 0.82 | −1.03 |
| | 60 | 06-240 | −0.01 | 0.40 | 0.89 | −0.91 |
| | | 96-652 | 0.00 | 0.59 | 1.10 | −1.09 |
| | 531 | I-1408 | 0.05 | 0.53 | 1.08 | −0.97 |
| | | I-904 | −0.31 | 1.83 | 2.02 | −2.64 |
| | 713 | 07S635 | −0.04 | 0.62 | 1.08 | −1.16 |
| | | 07S636 | −0.04 | 0.45 | 0.92 | −0.99 |
| 6 | 11 | 04-168 | −0.09 | 0.35 | 0.76 | −0.94 |
| | | 96-265 | −0.02 | 0.36 | 0.83 | −0.88 |
| | 40 | 07S576 | −0.05 | 0.73 | 1.18 | −1.29 |
| | | 07S581 | 0.00 | 0.41 | 0.91 | −0.91 |
| | 929 | I-1627 | −0.02 | 0.36 | 0.83 | −0.88 |
| | | I-3345 | −0.09 | 0.36 | 0.77 | −0.95 |
| | 990 | I-1927 | −0.08 | 0.80 | 1.22 | −1.38 |
| | | I-1928 | −0.03 | 0.63 | 1.10 | −1.16 |
| | 1125 | I-2033 | −0.03 | 0.44 | 0.91 | −0.97 |
| | | I-4347 | −0.09 | 0.39 | 0.80 | −0.98 |
| Global | | | −0.06 | 0.55 | 0.99 | −1.11 |
To evaluate the quality of coverage within the candidate coding regions, we calculated an index value (Is; see Materials and Methods). The mean, standard deviation, and lower and upper thresholds of Is used in the coverage study are shown for each affected individual and for the entire set of individuals (Global).
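The definition of Is and of its thresholds is given in the Methods, which are not reproduced here. As an observation only, the tabulated upper and lower thresholds agree, to within rounding, with mean ± (SD + 0.5). The sketch below checks that relation on three rows; the 0.5 margin and the function name are assumptions inferred from the printed values, not a statement of the authors' formula.

```python
def thresholds(mean: float, sd: float, margin: float = 0.5) -> tuple[float, float]:
    """Return (upper, lower) as mean +/- (sd + margin).

    The 0.5 margin is inferred from the tabulated values (assumption),
    not quoted from the Methods section.
    """
    half_width = sd + margin
    return mean + half_width, mean - half_width


# label: (mean, sd, upper, lower) exactly as printed in the table
rows = {
    "07S722": (-0.13, 0.75, 1.12, -1.38),
    "I-904": (-0.31, 1.83, 2.02, -2.64),
    "Global": (-0.06, 0.55, 0.99, -1.11),
}

for label, (mean, sd, upper, lower) in rows.items():
    calc_upper, calc_lower = thresholds(mean, sd)
    print(label, round(calc_upper, 2), round(calc_lower, 2), "table:", upper, lower)
```

Other rows differ from this relation by at most 0.01, which is compatible with the means and standard deviations having been rounded before printing.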
Depth Score threshold optimization assay.
Chromosome 6, Family 990.
| DS threshold (cases) | DS threshold (control pool) | | | | FPR (%) | FNR (%) |
| 0 | 0 | 9768 | – | – | – | – |
| 0 | 14 | 15791 | – | – | – | – |
| 0 | 50 | 15880 | – | – | – | – |
| 10 | 0 | 33 | 25 | – | – | – |
| 10 | 14 | 60 | 43 | – | – | – |
| 10 | 50 | 101 | 71 | – | – | – |
| 20 | 0 | 12 | 7 | 4 | 43 | 0 |
| 20 | 14 | 14 | 7 | 4 | 43 | 0 |
| 20 | 50 | 27 | 16 | 4 | 75 | 0 |
| 30 | 0 | 10 | 5 | 4 | 20 | 0 |
| 30 | 14 | 10 | 5 | 4 | 20 | 0 |
| 30 | 50 | 15 | 8 | 4 | 50 | 0 |
| 40 | 0 | 9 | 4 | 4 | 0 | 0 |
| 40 | 14 | 9 | 4 | 4 | 0 | 0 |
| 40 | 50 | 12 | 6 | 4 | 33 | 0 |
| 50 | 0 | 8 | 4 | 4 | 0 | 0 |
| 50 | 14 | 8 | 4 | 4 | 0 | 0 |
| 50 | 50 | 8 | 4 | 4 | 0 | 0 |
| 60 | 0 | 7 | 4 | 4 | 0 | 0 |
| 60 | 14 | 7 | 4 | 4 | 0 | 0 |
| 60 | 50 | 7 | 4 | 4 | 0 | 0 |
| 70 | 0 | 5 | 4 | 4 | 0 | 0 |
| 70 | 14 | 5 | 4 | 4 | 0 | 0 |
| 70 | 50 | 5 | 4 | 4 | 0 | 0 |
| 80 | 0 | 3 | 3 | 3 | 0 | 25 |
| 80 | 14 | 3 | 3 | 3 | 0 | 25 |
| 80 | 50 | 3 | 3 | 3 | 0 | 25 |
| 90 | 0 | 2 | 2 | 2 | 0 | 50 |
| 90 | 14 | 2 | 2 | 2 | 0 | 50 |
| 90 | 50 | 2 | 2 | 2 | 0 | 50 |
| 100 | 0 | 1 | 1 | 1 | 0 | 75 |
| 100 | 14 | 1 | 1 | 1 | 0 | 75 |
| 100 | 50 | 1 | 1 | 1 | 0 | 75 |
The filtering results for various Depth Score (DS) thresholds in a sample family (Family 990) are shown. The threshold used for the control pool did not affect the total number of variants identified when a DS threshold of 50 or higher was used for the cases. Taking into account the false positive rates (FPR) and false negative rates (FNR), and in order to be as conservative as possible, we chose a DS threshold of 50 for the cases and a DS threshold of 14 for the control pool data.
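The two percentage series in the table can be reproduced from the count series under a simple reading, sketched here as an assumption: the second count series is taken as the number of variants called at a given threshold pair, the third as the number of true variants retained, and the plateau of the third series (4) as the total number of true variants. The function names and this interpretation are illustrative, not taken from the paper.

```python
def fpr_percent(called: int, true_retained: int) -> int:
    """Percentage of called variants that are not true variants."""
    return round(100 * (called - true_retained) / called)


def fnr_percent(true_retained: int, true_total: int) -> int:
    """Percentage of true variants lost at this threshold."""
    return round(100 * (true_total - true_retained) / true_total)


TRUE_TOTAL = 4  # assumption: plateau value of the third count series

# (case DS, control DS): (second count series, third count series)
examples = {
    (20, 14): (7, 4),
    (20, 50): (16, 4),
    (40, 50): (6, 4),
    (50, 14): (4, 4),
    (80, 14): (3, 3),
}

for (case_ds, control_ds), (called, retained) in examples.items():
    print(case_ds, control_ds,
          fpr_percent(called, retained),
          fnr_percent(retained, TRUE_TOTAL))
```

Under this reading, the chosen pair (DS 50 for cases, DS 14 for the control pool) is the lowest case threshold at which both rates are zero regardless of the control threshold.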
Variant filtering results using MAQ software.
| Chr | Family | Position (hg18) | Gene | Reference | Variant | MS | Consequence | Sanger confirmation |
| 3 | 27 | 170968060 | TERC [MIM:602322] | G | C | 255/39 | NON_SYNONYMOUS_CODING | No |
| | 531 | 162443032 | NMD3 [MIM:611021] | A | G | 16/55 | NON_SYNONYMOUS_CODING | No |
| 6 | 531 | 150094561 | NUP43 [MIM:608141] | C | T | 38/11 | STOP_GAINED | No |
| | 713 | 149763387 | SUMO4 [MIM:608829] | A | G | 50/50 | NON_SYNONYMOUS_CODING | No |
| | 990 | 146761618 | GRM1 [MIM:604473] | C | T | 93/80 | NON_SYNONYMOUS_CODING | Yes |
| | | 150205485 | LRP11 | T | C | 55/77 | NON_SYNONYMOUS_CODING | Yes |
| | | 151202809 | PLEKHG1 | G | A | 63/46 | NON_SYNONYMOUS_CODING | Yes |
The MAQ score (MS) did not correlate with Sanger sequencing confirmation: variants with the maximum MS of 255 were not confirmed, whereas others with an MS of 55 were validated.
MS: MAQ score for individual 1/individual 2.
Variant selected in a non-candidate chromosome for this family because of its truncating effect.
No MIM reference [25].
Recently described in the Ensembl database.
Figure 1. Filtering process.
Analysis pipeline used in the identification of the candidate variants. Left boxes correspond to processing of variants; right box corresponds to coverage analysis. See text for details.
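The variant-processing side of this pipeline (subtract the control pool, keep variants shared by the affected relatives, keep previously undescribed variants, discard non-exonic ones) can be sketched with set operations. This is a simplified illustration on toy data: the function name, the tuple layout and the example variants are hypothetical, and the consequence-prediction and homology checks are not modelled.

```python
Variant = tuple[str, int, str]  # (chromosome, position, variant allele)


def filter_candidates(
    family_calls: list[set[Variant]],
    control_calls: set[Variant],
    described: set[Variant],
    exonic: set[Variant],
) -> set[Variant]:
    """Apply the filtering steps in order and return the surviving variants."""
    # 1. Remove anything also seen in the control pool.
    after_control = [calls - control_calls for calls in family_calls]
    # 2. Keep only variants shared by every sequenced family member.
    shared = set.intersection(*after_control)
    # 3. Keep only previously undescribed variants.
    undescribed = shared - described
    # 4. Discard variants outside exons.
    return undescribed & exonic


# Toy data (hypothetical positions, for illustration only).
ind1 = {("6", 100, "T"), ("6", 200, "A"), ("6", 300, "C"), ("6", 400, "G")}
ind2 = {("6", 100, "T"), ("6", 200, "A"), ("6", 300, "C")}
controls = {("6", 100, "T")}
known = {("6", 200, "A")}
exons = {("6", 300, "C"), ("6", 400, "G")}

candidates = filter_candidates([ind1, ind2], controls, known, exons)
print(candidates)
```

Using sets keeps each step a single expression and makes the order of the filters explicit, which is convenient when counting how many variants survive each stage.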
Summary of the variant filtering process.
| Chr | Family | Individual | SNPs | After control | Shared by family | Undescribed | Consequences | Exonic | Candidate SNPs (%*) |
| 3 | 27 | 07S722 | 49 | 10 | 0 | 0 | 0 | 0 | 0 (0.00) |
| | | 07S723 | 39 | 2 | | | | | |
| | | 07S724 | 45 | 18 | | | | | |
| | | 07S725 | 25 | 4 | | | | | |
| | 60 | 06-240 | 47 | 18 | 15 | 7 | 6 | 3 | 1 (6.67) |
| | | 96-652 | 61 | 26 | | | | | |
| | 531 | I-1408 | 38 | 15 | 5 | 2 | 1 | 1 | 0 (0.00) |
| | | I-904 | 66 | 42 | | | | | |
| | 713 | 07S635 | 50 | 32 | 8 | 4 | 5 | 2 | 1 (12.50) |
| | | 07S636 | 46 | 24 | | | | | |
| 6 | 11 | 96-265 | 96 | 36 | 17 | 5 | 12 | 1 | 0 (0.00) |
| | | 04-168 | 81 | 35 | | | | | |
| | 40 | 07S581 | 93 | 40 | 26 | 8 | 32 | 7 | 3 (11.54) |
| | | 07S576 | 96 | 53 | | | | | |
| | 929 | I-3345 | 81 | 28 | 13 | 3 | 10 | 3 | 0 (0.00) |
| | | I-1627 | 75 | 34 | | | | | |
| | 990 | I-1927 | 131 | 63 | 52 | 14 | 40 | 8 | 4 (7.69) |
| | | I-1928 | 119 | 54 | | | | | |
| | 1125 | I-4347 | 99 | 33 | 11 | 2 | 7 | 0 | 0 (0.00) |
| | | I-2033 | 74 | 26 | | | | | |
| Mean | | | 71 | 30 | 16 | 5 | 13 | 3 | 1 (6.25) |
The number of variants remaining after each filtering step is shown for the 9 non-BRCA1/2 families. The original SNPs were matched against the control pool and against the other member(s) of the family. Previously undescribed variants were then selected and their consequences obtained using Perl API tools. Variants with intronic consequences were discarded, and the remaining variants were finally checked for homology.
*With respect to SNPs shared by the family.
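The percentage in the last column is the number of candidate SNPs divided by the number of SNPs shared by the family, as the footnote states. A minimal sketch of that calculation follows, using values from the table; the function name and the choice to return 0.00 when no SNPs are shared (as printed for family 27) are illustrative.

```python
def candidate_pct(candidates: int, shared: int) -> float:
    """Candidate SNPs as a percentage of SNPs shared by the family."""
    if shared == 0:
        # The table prints 0 (0.00) when a family shares no SNPs.
        return 0.0
    return round(100.0 * candidates / shared, 2)


# family: (candidate SNPs, SNPs shared by family), copied from the table
families = {
    "27": (0, 0),
    "60": (1, 15),
    "713": (1, 8),
    "40": (3, 26),
    "990": (4, 52),
}

for family, (candidates, shared) in families.items():
    print(family, candidates, f"({candidate_pct(candidates, shared):.2f})")
```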
Final candidate SNPs.
| Chr | Family | Position (hg18) | Gene | Reference | Variant | QS (a) | DS (b) | Consequence (c) | Alamut prediction (d) | Gene function |
| 3 | 60 | 161301596 | AC026118.17 (e) | A | T | 91/91 | 56/57 | NCG | | Pseudogene |
| 6 | | | | | | | | | | |
Variants confirmed by direct sequencing are marked in bold.
a) Quality Score Ind1/Ind2, b) Depth Score Ind1/Ind2, c) NCG: Non-coding gene, SYN: Synonymous, 3UTR: 3′ UTR, NSYN: Non-synonymous.
d) NDB: Not in database, AFF: predicted to affect the protein, TOL: predicted to be tolerated.
e) No MIM reference.
GEO accession numbers of the raw data from each of the samples.
| Family | Individual | Accession number |
| 27 | 07S722 | GSM511164 |
| | 07S723 | GSM511165 |
| | 07S724 | GSM511166 |
| | 07S725 | GSM511167 |
| 60 | 06-240 | GSM511168 |
| | 96-652 | GSM511169 |
| 531 | I-1408 | GSM511170 |
| | I-904 | GSM511171 |
| 713 | 07S635 | GSM511172 |
| | 07S636 | GSM511173 |
| 11 | 96-265 | GSM511175 |
| | 04-168 | GSM511174 |
| 40 | 07S581 | GSM511177 |
| | 07S576 | GSM511176 |
| 929 | I-3345 | GSM511179 |
| | I-1627 | GSM511178 |
| 990 | I-1927 | GSM511180 |
| | I-1928 | GSM511181 |
| 1125 | I-4347 | GSM511183 |
| | I-2033 | GSM511182 |
| Control pool | | GSM511184 |