Literature DB >> 26865974

Whole genome sequence analysis of BT-474 using complete Genomics' standard and long fragment read technologies.

Serban Ciotlos1, Qing Mao1, Rebecca Yu Zhang1, Zhenyu Li2, Robert Chin1, Natali Gulbahce1, Sophie Jia Liu1, Radoje Drmanac3, Brock A Peters3.   

Abstract

BACKGROUND: The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474.
FINDINGS: Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology.
CONCLUSIONS: BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.

Entities:  

Keywords:  BT-474; BT474; Breast cancer; Complete Genomics; Long Fragment Read; Whole genome sequencing

Mesh:

Substances:

Year:  2016        PMID: 26865974      PMCID: PMC4748558          DOI: 10.1186/s13742-016-0113-x

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data description

Utility of the dataset

The cell line BT-474 was isolated by Lasfargues et al. [1] in 1978, from a biopsy of invasive ductal carcinoma from a 60 year old Caucasian female. Since that time it has become one of the most heavily utilized cell lines for breast cancer research. At the time of writing, entering the search term “BT-474 OR BT474” into PubMed resulted in 973 unique articles. Surprisingly, the complete genome sequence of this cell line has yet to be published. In this paper, we fill that void in the collective scientific knowledge by providing high coverage whole genome data for BT-474. Previous studies have shown that BT-474 has a modal number of chromosomes approximating tetraploidy, and most of these chromosomes are covered with megabase-sized amplifications, deletions, and other structural rearrangements [2]. In an effort to provide better coverage of these complex rearranged regions, and to provide variant phasing and error correcting information, we generated high coverage libraries from long genomic DNA (~40 kb) using Long Fragment Read (LFR) technology [3, 4], and supplemented those libraries with a standard (STD) short mate pair library (~500 bp) [5] for a combined total coverage of over ~300X. We hope the freely available resource provided in this paper will benefit our understanding of the biology of cancer, and ultimately help to improve therapies for patients.

Library generation

DNA was isolated from low passage number BT-474 cells, procured from the American Type Culture Collection (ATCC, Manassas, VA, USA), using a RecoverEase dialysis kit (Agilent, Santa Clara, CA, USA). This material was further fragmented to 300–800 base pairs using a Covaris E220 (Covaris, Woburn, MA, USA), and processed using Complete Genomics’ proprietary standard library construction [5]. For LFR libraries, approximately 5 cells were collected and deposited into a 1.5 ml microtube with 10 μl of distilled water. Cells were lysed, and DNA was denatured using 1 μl of 20 mM KOH and 0.5 mM EDTA. Denatured genomic DNA was dispersed across a 384-well plate. In each well, long genomic fragments (~40 kb) were amplified, fragmented, and tagged with a unique barcode adapter as previously described [3]. All libraries were sequenced using Complete Genomics’ nanoarray sequencing platform [5].

BT-474 genome analysis

Read data of 343, 298, and 261 Gb from the STD, LFR1, and LFR2 libraries, respectively, were mapped to the NCBI human reference genome (build 37) using Complete Genomics’ pipeline [3, 5, 6] (Table 1), resulting in close to ~100X coverage in each of the libraries. The high coverage allowed more than 90 % of the genome and exome of each library to be called (Table 1.). Plotting reads falling within 100 kb consecutive windows for the BT-474 standard library resulted in the expected complex pattern of amplifications affecting almost all chromosomes [2] (Fig. 1). Known amplifications of ERBB2 and the HOX gene cluster on chromosome 17 [2] are readily identifiable from this plot, as well as many other megabase-sized highly amplified regions. Analysis of both standard and LFR libraries resulted in the discovery of 110, 175, and 145 interchromosomal translocations in the STD, LFR1, and LFR2 libraries, respectively (Table 2). Clustering these translocations, based on windows of 5 kb around the breakpoints, led to the overlap of many translocations within and between libraries, and an overall reduction in the total number of translocations to 291 (Table 2 and Fig. 2). Additionally, comparing our results to a published RNA sequencing analysis of BT-474 [7, 8] demonstrated that three of the five coding interchromosomal translocations were called in our data (Table 3). In the remaining two translocations that were not called by our algorithms, raw reads were found to support their existence in our libraries; for the STARD3-DOK5 translocation, improved algorithms would most likely detect this event. In the case of the TRPC4AP-MRPL45 translocation only one mate pair read in the STD library was found in support, making it unlikely to have been called even with modifications to our algorithms.
Table 1

BT-474 genome statistics

MetricBT-474 STDBT-474 LFR1BT-474 LFR2
Fully called genome fraction0.9720.9150.900
Fully called exome fraction0.9880.9280.920
Gross mapping yield (Gb)343298261
Both mates mapped yield (Gb)306217171
Genome fraction with sequence coverage ≥ 5x0.9970.9810.978
Exome fraction with sequence coverage ≥ 5x0.9990.9800.980
SNV total count3,241,9322,856,6242,890,506
Homozygous SNV count1,531,7231,382,6531,241,444
Heterozygous SNV count1,635,4021,195,2901,239,735
Het/Hom ratio1.070.861.00
ENA sample accession numberERS823996ERS823998ERS823997

STD and LFR libraries were mapped to the NCBI reference genome build 37. An explanation of the genome statistics are as follows: fully called genome fraction, the fraction of the genome for which both alleles at each position are confidently called; fully called exome fraction, the fraction of the coding part of the genome for which both alleles are confidently called; gross mapping yield (Gb), the number of gigabases of read data that can be mapped to NCBI reference genome build 37; both mates mapped yield (Gb), the number of gigabases of read data where both arms of a mate pair can be mapped to NCBI reference genome build 37; genome fraction with sequence coverage ≥ 5x, fraction of the genome with at least 5 reads covering a single position; exome fraction with sequence coverage ≥ 5x, fraction of the coding genome with at least 5 reads covering a single position; SNV total count, the total number of single nucleotide variants called in each library; homozygous SNV count, the total number of homozygous variants called in each library; heterozygous SNV count, the total number of heterozygous variants called in each library; Het/Hom ratio, the ratio of heterozygous variants over homozygous variants (this number is typically around 1.6 for a Caucasian genome); ENA sample accession number, the European Nucleotide Archive accession number to locate the raw read data for each library

Fig. 1

100 kb read coverage. For the standard library of BT-474 reads were averaged across consecutive 100 kb bins, normalized to a tetraploid copy number, and plotted such that each dot represents the coverage of a single 100 kb region of the genome. Y-axis shows haploid copy number; x-axis shows genome position increasing from left to right for chromosome and position

Table 2

Potential translocations identified in BT-474

SV IDchrAchrA startchrA endchrBchrB startchrB endBreakpoint orientationLibrary
1chr157150195715320chr123429511334295399rfLFR1
2chr12.32E + 082.32E + 08chr141.06E + 081.06E + 08frSTD
3chr11.68E + 081.68E + 08chr143552486135524861rfSTD
4chr11.54E + 081.54E + 08chr157143147271431472rfSTD
5chr157351345736117chr163392185133923571ffLFR1
5chr157352575735628chr163392185133922429ffLFR2
5chr157357935735873chr163392306633923356ffLFR2
5chr157358205736219chr163392268633922846ffLFR2
5chr157358335736106chr163392226733922440ffLFR2
5chr157360765736104chr163392321733923483ffLFR2
6chr157352015735384chr163392278033923001frLFR2
6chr157352685735489chr163392291233923205frLFR1
6chr157356235736183chr163392169633922803frLFR1
6chr157358065735854chr163392169733921850frLFR2
6chr157358115736104chr163392272233922916frLFR2
7chr157350445735977chr163392408733924848rfLFR2
7chr157351365735386chr163392439933925038rfLFR1
7chr157351865735521chr163392373233924006rfLFR1
7chr157356515736227chr163392171033922453rfLFR1
7chr157357935736102chr163392313833923604rfLFR1
7chr157357985735892chr163392440033924690rfLFR1
7chr157358255736119chr163392185733922502rfLFR2
8chr11697963616979817chr31895015818950300ffLFR1
9chr13308422533084414chr31.19E + 081.19E + 08ffLFR2
9chr13308428233084474chr31.19E + 081.19E + 08ffLFR1
10chr11696494916965018chr34973298249733419rrLFR1
11chr14531776445317764chr39889908498899084rrSTD
12chr11.09E + 081.09E + 08chr31.1E + 081.1E + 08rrLFR2
13chr11.18E + 081.18E + 08chr41.85E + 081.85E + 08ffLFR1
13chr11.18E + 081.18E + 08chr41.85E + 081.85E + 08ffLFR2
13chr11.18E + 081.18E + 08chr41.85E + 081.85E + 08ffSTD
14chr11.92E + 081.92E + 08chr41.8E + 081.8E + 08ffSTD
15chr12.04E + 082.04E + 08chr47866859678668596ffSTD
16chr12.04E + 082.04E + 08chr47975113979751139ffSTD
17chr12.04E + 082.04E + 08chr41.81E + 081.81E + 08ffSTD
18chr15570545655705813chr49382747193827698frLFR1
19chr11.73E + 081.73E + 08chr41.85E + 081.85E + 08frLFR2
19chr11.73E + 081.73E + 08chr41.85E + 081.85E + 08frLFR1
19chr11.73E + 081.73E + 08chr41.85E + 081.85E + 08frSTD
20chr11.77E + 081.77E + 08chr41199641511996415frSTD
21chr11.91E + 081.91E + 08chr41.8E + 081.8E + 08frSTD
22chr12.32E + 082.32E + 08chr47869298678692986frSTD
23chr12.43E + 082.43E + 08chr47743589777435897frSTD
24chr12.44E + 082.44E + 08chr41.8E + 081.8E + 08frSTD
25chr11.18E + 081.18E + 08chr47624551376245513rfSTD
26chr11.5E + 081.5E + 08chr47606437876064378rfSTD
27chr11.68E + 081.68E + 08chr41.84E + 081.84E + 08rfLFR2
27chr11.68E + 081.68E + 08chr41.84E + 081.84E + 08rfLFR1
28chr11.73E + 081.73E + 08chr41.84E + 081.84E + 08rfLFR1
28chr11.73E + 081.73E + 08chr41.84E + 081.84E + 08rfLFR2
29chr11.9E + 081.9E + 08chr41.81E + 081.81E + 08rfSTD
30chr11.96E + 081.96E + 08chr41.78E + 081.78E + 08rfSTD
31chr12.38E + 082.38E + 08chr41092385410923854rfSTD
32chr12.47E + 082.47E + 08chr41.84E + 081.84E + 08rfSTD
33chr11.6E + 081.6E + 08chr41.84E + 081.84E + 08rrLFR2
33chr11.6E + 081.6E + 08chr41.84E + 081.84E + 08rrLFR1
34chr11.68E + 081.68E + 08chr41.81E + 081.81E + 08rrSTD
34chr11.68E + 081.68E + 08chr41.81E + 081.81E + 08rrLFR2
35chr12.03E + 082.03E + 08chr41.86E + 081.86E + 08rrSTD
36chr12.12E + 082.12E + 08chr41106835311068353rrSTD
37chr12.47E + 082.47E + 08chr47684743876847438rrSTD
38chr12412128924121629chr63333291533333041rrLFR1
39chr12.46E + 082.46E + 08chr61531653215316532rrSTD
40chr13993205039932711chr71.28E + 081.28E + 08frLFR2
41chr12.33E + 082.33E + 08chr81.09E + 081.09E + 08rrSTD
42chr18639879986398985chr91.01E + 081.01E + 08frLFR1
43chr18639879886398975chr91.01E + 081.01E + 08rfLFR1
43chr18639888586398990chr91.01E + 081.01E + 08rfLFR2
44chr19185279791853069chrX1.08E + 081.08E + 08frLFR2
45chr103730804237308294chr118033988980339922ffLFR1
46chr103734663437346708chr118031450280314529rrLFR1
47chr103735212337352151chr118030843380308539rrLFR1
48chr103735827837358481chr118030429380304322rrLFR1
49chr109189628491896284chr124894954148949541rfSTD
50chr104542393545424170chr158073706780737361frLFR1
51chr101461743914617862chr204837583048376358ffLFR2
51chr101461748214617870chr204837582948376359ffLFR1
52chr101461754714617844chr204837584448375915frLFR2
52chr101461756414617697chr204837937048379617frLFR1
52chr101461769214617855chr204837586348376323frLFR1
53chr101461775314617855chr204837584348375872rfLFR1
54chr101461732514617830chr204837945448379795rrLFR1
54chr101461732814617780chr204837954348379794rrLFR2
55chr101.02E + 081.02E + 08chrX9033565090335650frSTD
56chr119007480590074805chr121.34E + 081.34E + 08frSTD
57chr111900287019003273chr149974236499742854frLFR1
57chr111900301519003277chr149974243299742672frLFR2
58chr112016554920165739chr149974293899743430rfLFR2
58chr112016555220165915chr149974290999743449rfLFR1
59chr116048453960484539chr153222414832224148ffSTD
59chr116048640160486401chr153222605632226056ffSTD
60chr11356458357242chr158073671180737504frLFR1
60chr11357798357987chr158073680280737555frLFR1
61chr11356453356667chr165742257557422884rfLFR1
62chr116908973869089738chr171533277615332776ffSTD
63chr118302898183028981chr174148675241486752frSTD
64chr111.14E + 081.14E + 08chr191381349913813499rfSTD
65chr111522440615224520chr194405584544055982rrLFR1
66chr1173938367393836chr202110235921102359ffSTD
67chr1159025635902802chrX1.48E + 081.48E + 08rfLFR1
68chr121.1E + 081.1E + 08chr152451974424519744rfSTD
69chr12452573452724chr173681370536813991ffLFR1
69chr12452611452663chr173681371236813886ffLFR2
70chr12452410452681chr173681388536814117frLFR1
71chr12185129185463chr206294994762950060ffLFR1
71chr12186282186515chr206294871762949064ffLFR1
72chr128637855886378558chr203373693533736935rrSTD
73chr1261275346127658chr221717777517177979frLFR1
74chr132354198323542335chr141.05E + 081.05E + 08ffLFR1
74chr132354217223542381chr141.05E + 081.05E + 08ffLFR2
75chr132354210023542325chr141.05E + 081.05E + 08frLFR1
75chr132354224023542333chr141.05E + 081.05E + 08frLFR1
76chr132354198923542331chr141.05E + 081.05E + 08rfLFR1
77chr132803372028033934chr148823760788237796rrLFR1
78chr132237882822379005chr173668087436680916ffLFR2
78chr132237905422379054chr173668071936680719ffSTD
79chr133654722636547559chr175736169957361862rrLFR1
80chr134982061249820612chr177493125174931251rrSTD
81chr131927596019275960chr181435815614358156frSTD
82chr135098000650980186chr205210530952105592ffLFR1
83chr134299889142999005chr204581043845810647frLFR2
84chr142026850420268687chr152238685622386940frLFR2
85chr142029183020291936chr152240982022410082frLFR1
86chr142026823220268636chr152238666422386879rfLFR1
86chr142026860720269066chr152238686822387058rfLFR2
86chr142026907420269333chr152238685822387158rfLFR1
87chr148871103388711033chr152702622227026222rfSTD
88chr143672685336726853chr174692203846922038rfSTD
88chr143672690936727476chr174692216646922208rfLFR2
89chr149031450190314501chr204546287845462878rfSTD
90chr143115055531150749chr205338030953380666rrLFR2
91chr148917751889177518chrX1.27E + 081.27E + 08frSTD
92chr152048580120486106chr163337625233376612ffLFR1
93chr158073676980736829chr168604741986047441rfLFR1
94chr152048584520486155chr163337608833376404rrLFR1
95chr152056367920564143chr177768024777680760ffLFR1
96chr152056323220563428chr177768059877680764rfLFR1
97chr152056271020563176chr177768135577681423rrLFR1
98chr152576573125765818chr205237441552374605ffLFR2
99chr152472863824729126chr205260284652603206frLFR1
99chr152472876424728985chr205260305052603167frLFR2
100chr152483694924837393chr205260864052609262frLFR1
100chr152483710124837400chr205260884352609205frLFR2
101chr152576504725765694chr205237439452374795rfLFR2
101chr152576505925765749chr205237439552374772rfLFR1
102chr152576508125765289chr205237458552374639rrLFR2
103chr152049771220497759chr222908179129081826frLFR1
103chr152049816120498285chr222908167229081823frLFR2
104chr157319202873192028chrX1172550111725501rfSTD
105chr163368779133688498chr173668692936687323ffLFR1
105chr163368779333688628chr173668678836687070ffLFR2
105chr163368782033688566chr173668728736687558ffLFR2
105chr163368871133688949chr173668631836686384ffLFR1
105chr163369064133691077chr173668405236684801ffLFR1
106chr163370821833708371chr173665589736655935ffLFR2
106chr163370959733709917chr173665461836655128ffLFR1
107chr163371583833715930chr173664881136648948ffLFR1
107chr163371781033718019chr173664687636646932ffLFR1
107chr163372244833722598chr173664217436642455ffLFR2
108chr163372827133728527chr173663618236636239ffLFR2
109chr163368840633688600chr173668665636686774frLFR1
110chr163368777433688478chr173668657036687084rrLFR1
110chr163368822433689111chr173668633136686789rrLFR2
110chr163368892133689026chr173668607436686449rrLFR1
111chr163372245133722930chr173664169936642406rrLFR2
111chr163372245233722565chr173664189636642074rrLFR1
111chr163372357433723794chr173664064336640778rrLFR1
112chr163373125433731590chr173663295636633284rrLFR2
113chr163340433733404694chr193349064633490733ffLFR1
114chr163340668333406798chr193348869333489123rrLFR1
115chr163395010433950328chrX1.37E + 081.37E + 08rfLFR2
116chr175845908858459088chr191753605017536050frSTD
117chr177975926579759265chr191382794813827948rfSTD
118chr173534359435343697chr205562965155630116ffLFR2
118chr173534359535343697chr205562965655630016ffLFR1
118chr173534404035344109chr205563011455630166ffLFR1
119chr173702577337025975chr205098630550986555ffLFR1
119chr173702577437025907chr205098627050986504ffLFR2
119chr173702650137026658chr205098631550986584ffLFR2
119chr173702694837027140chr205098636850986424ffLFR1
120chr173797912937979487chr205139321251393650ffLFR2
121chr174720025247200570chr205267060652670990ffLFR1
121chr174720055147200581chr205267082252670946ffLFR2
122chr173702577337026039chr205098624450986474frLFR2
123chr173730869937308887chr204599801145998274frLFR2
124chr173794430137944887chr205699128756991886frLFR1
124chr173794430537944881chr205699132556991886frLFR2
125chr175012656150127342chr205665987556660528frLFR1
125chr175012691750127317chr205666010956660527frLFR2
126chr173534366535344256chr205562952955630497rfLFR1
126chr173534366935344183chr205562952955630106rfLFR2
127chr173646793636467959chr205616830856168561rfLFR2
128chr173702577137026391chr205098624550986940rfLFR1
128chr173702577137026159chr205098624550987008rfLFR2
129chr173719970737199886chr205386572153865808rfLFR2
130chr173795013737950419chr205144500751445204rfLFR2
131chr172154945921549788chr202607898426079241rrLFR2
132chr173534368535343955chr205562954355629778rrLFR2
132chr173534376435343787chr205562968955629717rrLFR1
132chr173534499035345255chr205563015055630327rrLFR2
133chr173702577737026095chr205098625450986852rrLFR2
133chr173702578237026005chr205098625250986500rrLFR1
134chr173730851037309069chr204599900445999288rrLFR2
134chr173730853337308943chr204599895045999287rrLFR1
135chr173794440537944589chr205699169956991841rrLFR2
136chr175844837858448378chrX77182367718236rrSTD
137chr184336788543367885chr205057035950570359ffSTD
138chr181417451514174667chr211535696015357387ffLFR1
139chr181415083314151055chr211537967415379805rrLFR1
140chr181417429814174539chr211535706915357326rrLFR1
141chr181447390014474154chr211505396415054043rrLFR2
142chr191765212017652329chr205538600355386161ffLFR2
142chr191765225117652438chr205538606555386453ffLFR1
143chr191765183517651994chr205538724455387326frLFR1
143chr191765185817652155chr205538612955386636frLFR1
144chr191721709017217398chr205689009256890284rfLFR1
145chr191721709017217570chr205688990556890286rrLFR1
146chr192453847824538792chr212883452828834780rrLFR1
147chr1999487639948799chr221610377616103874ffLFR1
148chr1999489109949090chr221610368016103777rfLFR1
149chr21.88E + 081.88E + 08chr108497914584979145ffSTD
150chr22.33E + 082.33E + 08chr114335677143356771frSTD
151chr239311013931427chr121.24E + 081.24E + 08ffLFR1
151chr239311293931361chr121.24E + 081.24E + 08ffLFR2
151chr239311723931453chr121.24E + 081.24E + 08ffLFR2
152chr239308913931629chr121.24E + 081.24E + 08frLFR2
152chr239310523931629chr121.24E + 081.24E + 08frLFR1
153chr239316903932147chr121.24E + 081.24E + 08rfLFR2
154chr239311633931384chr121.24E + 081.24E + 08rrLFR2
155chr24945701549457015chr143295340332953403ffSTD
156chr21355095113551227chr152059319920593601ffLFR1
157chr21353181913531839chr152061316320613338rrLFR2
158chr22621453826214538chr158421278884212788rrSTD
159chr25746243257462432chr184032396540323965rrSTD
160chr22.14E + 082.14E + 08chr201875622218756222rrSTD
161chr29554184695541846chr2199079199907919frSTD
162chr29554658295546582chr2199038669903866rfSTD
163chr21192993411929973chr41.62E + 081.62E + 08frLFR2
164chr21.33E + 081.33E + 08chr41.91E + 081.91E + 08rfLFR1
165chr21.96E + 081.96E + 08chr68811719588117195frSTD
166chr22667366326673663chr71951811519518115rrSTD
167chr21.33E + 081.33E + 08chr86921892169218921frSTD
168chr22.13E + 082.13E + 08chr83102203431022034rfSTD
169chr22754938927549460chr91.25E + 081.25E + 08ffLFR2
170chr21.16E + 081.16E + 08chr91.31E + 081.31E + 08frLFR1
170chr21.16E + 081.16E + 08chr91.31E + 081.31E + 08frLFR2
171chr204659744246598072chrX1.29E + 081.29E + 08frLFR2
171chr204659769946598021chrX1.29E + 081.29E + 08frLFR1
172chr205260981152610360chrX1.29E + 081.29E + 08rfLFR2
173chr212792854527928545chrX6518750465187504rrSTD
174chr39642032396420323chr138676028086760280rrSTD
175chr38638587486386256chr148668473386685055ffLFR1
175chr38638588086386147chr148668407286684160ffLFR1
175chr38638589386386230chr148668406986684158ffLFR2
176chr38638587286385872chr148668405486684054rfSTD
176chr38638589386386680chr148668405386684588rfLFR1
176chr38638589986386572chr148668405886684376rfLFR2
177chr39004582690046098chr149158643891586701rfLFR1
177chr39004582890046105chr149158645691586703rfLFR2
177chr39004583290045832chr149158650791586507rfSTD
178chr37569011375690113chr206290945962909459rfSTD
179chr3655747656029chr2194962789496583ffLFR2
179chr3655838656046chr2194962649496451ffLFR1
180chr37571934975720145chr41.91E + 081.91E + 08ffLFR1
180chr37571987375720142chr41.91E + 081.91E + 08ffLFR2
180chr37572088775720942chr41.91E + 081.91E + 08ffLFR1
180chr37572143975721741chr41.91E + 081.91E + 08ffLFR1
181chr37572397075724039chr41.91E + 081.91E + 08ffLFR1
182chr37574783975748224chr41.91E + 081.91E + 08ffLFR1
183chr37571964275719870chr41.91E + 081.91E + 08rrLFR1
183chr37572159175721650chr41.91E + 081.91E + 08rrLFR1
184chr37574728675747286chr41.54E + 081.54E + 08rrSTD
185chr31236988412370300chr51.4E + 081.4E + 08ffLFR1
186chr38638777686387933chr51.07E + 081.07E + 08frLFR2
187chr31.73E + 081.73E + 08chr51.6E + 081.6E + 08rrSTD
188chr31.11E + 081.11E + 08chr81.29E + 081.29E + 08rfLFR2
188chr31.11E + 081.11E + 08chr81.29E + 081.29E + 08rfLFR1
189chr436356403636110chr125447862254478665frLFR2
190chr48599062285990622chr136251387862513878ffSTD
191chr47699068576990685chr183812623538126235frSTD
192chr46075189160751891chr192884548628845486ffSTD
193chr41.58E + 081.58E + 08chr61.56E + 081.56E + 08rfSTD
194chr41963910219639102chr73169678031696780ffSTD
195chr44501501145015011chr96836425068364250rfSTD
196chr565317836531965chr125638736356387445ffLFR2
197chr584158468415916chr121.28E + 081.28E + 08ffLFR1
197chr584162808416280chr121.28E + 081.28E + 08ffSTD
198chr58678847486788474chr128676651286766512rrSTD
199chr572999747300569chr152310504423105293rrLFR2
199chr573001747300563chr152310505823105220rrLFR1
200chr51.76E + 081.76E + 08chr185538158955381589rfSTD
201chr52157299121573383chr65757527257575891frLFR1
201chr52157300721573381chr65757537157575890frLFR2
202chr52157293521572983chr65757492157575108rrLFR1
202chr52157297721573267chr65757568757575894rrLFR1
203chr658412235841372chr132191122321911450ffLFR1
204chr658566585856821chr132189653921896784ffLFR1
205chr658534025853892chr132189903021899151frLFR1
206chr658502865850599chr132190138521901675rrLFR1
207chr6382040382402chr163342825233428505ffLFR2
208chr6381822382432chr163342802233428507frLFR2
208chr6381989382279chr163342827733428507frLFR1
209chr61319132113191418chr176463739064637867ffLFR2
210chr62468330924684093chr223292790332928520ffLFR1
210chr62468339124684010chr223292790632928399ffLFR2
211chr62468376924684707chr223292792332928504frLFR1
211chr62468385724684093chr223292792032928028frLFR2
211chr62468400424684224chr223292826732928449frLFR2
212chr62468400424684488chr223292793332928508rfLFR1
212chr62468410624684364chr223292799632928300rfLFR2
213chr62468398624684639chr223292791932928539rrLFR1
213chr62468398624684642chr223292792332928539rrLFR2
214chr61.38E + 081.38E + 08chr76298242462982424ffSTD
215chr61.35E + 081.35E + 08chr79733790197337901rfSTD
216chr61.08E + 081.08E + 08chr81.3E + 081.3E + 08ffSTD
217chr74491109944911099chr117352743973527439ffSTD
218chr79484277394842987chr121.22E + 081.22E + 08ffLFR1
218chr79484277594842989chr121.22E + 081.22E + 08ffLFR2
219chr75760534157605452chr136364881263648972rrLFR1
220chr75760630157606665chr136363753163637816rrLFR1
221chr76891539268915614chr148738050587380635ffLFR1
222chr76891489968914986chr148738050587380863rfLFR1
222chr76891490068915522chr148738047587380793rfLFR2
222chr76891500268915002chr148738033887380338rfSTD
222chr76891531768915665chr148738050087380550rfLFR1
223chr72625242626252890chr154085384340854161frLFR1
223chr72625258426252823chr154085390040854157frLFR2
224chr72624137426241409chr154085442140854478rfLFR2
225chr72624804726248068chr154085419540854374rfLFR2
226chr74872165048721717chr158073688480737051rrLFR1
227chr71.28E + 081.28E + 08chr167686577776865777rrSTD
228chr76262064862620648chr193188572831885728ffSTD
229chr76254835562548355chr193168740631687406frSTD
230chr77038792070387920chr191886612418866124rfSTD
231chr75172322351723374chr205395318953953275frLFR2
232chr718572161857334chr222380128723801442ffLFR1
233chr718634491863848chr222379425123794417ffLFR1
233chr718666971866826chr222379125523791393ffLFR2
233chr718668321867009chr222379134723791471ffLFR1
234chr718792351879668chr222377829523778753ffLFR2
235chr718795121879690chr222377865023778989frLFR2
236chr718562081856668chr222380166823801951rrLFR1
237chr718792711879662chr222377858223778970rrLFR1
237chr718793281879707chr222377847323778856rrLFR2
238chr71.35E + 081.35E + 08chr91.23E + 081.23E + 08rfSTD
239chr75499314654993146chr91956853619568536rrSTD
240chr74919914549199504chrX5688100356881383rfLFR2
240chr74919916349199163chrX5688100656881006rfSTD
240chr74919917349199467chrX5688101856881256rfLFR1
241chr861980766198076chr102769359327693593ffSTD
242chr85640286356403275chr116504656765046743ffLFR2
242chr85640289556403155chr116504651065046594ffLFR1
243chr85654208156542449chr116539227365392518ffLFR2
244chr85661848956618658chr116930145469301749ffLFR1
245chr85662663656627138chr116907516369075841ffLFR2
245chr85662671556627162chr116907516969075698ffLFR1
246chr85662891356629070chr116363743563637636ffLFR2
247chr85663895656639544chr116920160569202126ffLFR1
247chr85663900156639542chr116920160269202252ffLFR2
247chr85663957556639575chr116920160269201602ffSTD
248chr85354526053545939chr116350136663501663frLFR1
248chr85354541253545943chr116350120863501666frLFR2
249chr85654207956542423chr116930878069309221frLFR1
249chr85654224756542426chr116930890969309226frLFR2
250chr85659202456592480chr116923942769239705frLFR1
250chr85659272656592933chr116923915269239432frLFR1
250chr85659342256593609chr116923960269239726frLFR1
251chr85661786156618657chr116930102069301802frLFR1
251chr85661811156618654chr116930109669301802frLFR2
252chr85662885356628964chr116363740163637659frLFR2
253chr85665172956652151chr116357262963572923frLFR2
253chr85665218056652180chr116357296263572962frSTD
254chr85354010853540108chr116551520365515203rfSTD
254chr85354011153540761chr116551520165515831rfLFR1
254chr85354011453540571chr116551520465515904rfLFR2
255chr85639656756396986chr116539311365393461rfLFR2
256chr85641953656419720chr116901144169011604rfLFR2
257chr85654014856540364chr116931295569313432rfLFR1
257chr85654024056540357chr116931310069313123rfLFR2
258chr85654222556542349chr116539232165392431rfLFR2
259chr85657497556575440chr116932280369322902rfLFR1
259chr85657498956575241chr116932229969322566rfLFR1
260chr85659174156591785chr116926373369263914rfLFR2
261chr85659201356592173chr116923932869239428rfLFR1
262chr85662888556628980chr116363743163637676rfLFR2
263chr85663930956639547chr116920164769201669rfLFR2
264chr85668740856688087chr116357500263575212rfLFR2
264chr85668745856687964chr116357500063575180rfLFR1
264chr85668746956687469chr116357501363575013rfSTD
265chr85671480556715355chr116931248169312733rfLFR1
265chr85671481756715176chr116931248169312725rfLFR2
266chr85356440353564403chr116492968364929683rrSTD
266chr85356450953564736chr116492959464929673rrLFR1
266chr85356452353564616chr116492952864929706rrLFR2
267chr85636624156366635chr116539324765393466rrLFR2
268chr85641953556419894chr116901097469011951rrLFR1
268chr85641953556420039chr116901114469011949rrLFR2
269chr85650422656504323chr116931258369312730rrLFR2
270chr85657491956575636chr116932241169322981rrLFR1
270chr85657492856574928chr116932299269322992rrSTD
270chr85657493756575286chr116932249469322979rrLFR2
271chr85659196856592652chr116923915469239756rrLFR1
271chr85659196856592535chr116923923169239751rrLFR2
272chr85661985356620129chr116358231663582822rrLFR1
272chr85661985356620317chr116358231863582819rrLFR2
272chr85661986256619862chr116358283563582835rrSTD
273chr85662884856629267chr116363700263637728rrLFR2
273chr85662885256629331chr116363718263637728rrLFR1
274chr85664210856642193chr116539231465392377rrLFR2
275chr85664371456643849chr116503939865039590rrLFR1
276chr85668788956688158chr116357502163575048rrLFR2
277chr89772826197728261chr124391802843918028frSTD
278chr81.46E + 081.46E + 08chr123429628334296283rfSTD
279chr81528949415289872chr137431386274313987rfLFR1
280chr88797638087976380chr144132573441325734frSTD
281chr82847803728478037chr183047368530473685rrSTD
282chr81E + 081E + 08chr191571150015711500rrSTD
283chr93446662934467226chr107503057275030835rrLFR1
283chr93446665334466653chr107503085375030853rrSTD
283chr93446668834467009chr107503057175030836rrLFR2
284chr99638441196384411chr131964526919645269frSTD
285chr99638282996382829chr131964643219646432rfSTD
286chr96841838468418769chr147810171678101914ffLFR2
286chr96841845868418714chr147810172078101914ffLFR1
287chr93446756334467886chr149035684690357250frLFR2
287chr93446765334467866chr149035680490357230frLFR1
288chr91.05E + 081.05E + 08chr163395722033957220rfSTD
289chr91.36E + 081.36E + 08chr176779493667794936ffSTD
290chr96879258868792588chr177931207979312079rfSTD
291chr91.24E + 081.24E + 08chr204257845742578457rrSTD

All potential translocations identified in each library are listed. Translocations were clustered by 5 kb windows around the breakpoints to allow for comparison between different libraries. An explanation of the fields are as follows: SV ID, this is the ID for each translocation after clustering; chrA, the chromosome for the A side of the translocation; chrA start, the start of the breakpoint region for the chrA side; chrA end, the end of the breakpoint region for the chrA side; chrB, the chromosome for the B side of the translocation; chrB start, the start of the breakpoint region for the chrB side; chrB end, the end of the breakpoint region for the chrB side; breakpoint orientation, the direction of the translocation on each side of the translocation as compared to NCBI reference genome build 37, “f” represents the forward direction and “r” represents the reverse direction, the first letter is the chrA region and the second letter is the chrB region; library, the library for which each individual translocation was found

Fig. 2

Overlap of inter-chromosomal translocations between libraries. Interchromosomal translocations were called for each library. The overlap between libraries was determined by considering translocations found within 5 kb of each other, and in the same orientation, to be the same event. This also resulted in the aggregation of multiple close translocations within the same sample into a single event. In total, 109 translocations were found in the standard library (black), and 147 and 133 were found in LFR libraries 1 (blue) and 2 (green), respectively. Of these, 85 interchromosomal translocations are shared between at least two libraries

Table 3

Translocations confirmed by published RNA sequencing data

ChrAGeneAChrA startChrA endGeneA strandChrBGeneBChrB startChrB endGeneB strandPaperSupported by SV dataSupported by raw reads
1AHCTF1247,002,400247,094,726-4NAAA76,831,80876,862,166-Kangaspeska et al.yes SV_ID37yes
17STARD337,793,33337,820,454+20DOK553,092,26653,267,710+Edgren et al.noyes
20VAPB56,964,17557,026,157+17IKZF337,913,96838,020,441-Edgren et al.yes SV_ID124yes
20TRPC4AP33,590,20733,680,618-17MRPL4536,452,98936,479,101+Kangaspeska et al.noyes
20RAB22A56,884,77156,942,563+19MYO9B17,186,59117,324,104+Edgren et al.yes SV_ID145yes

Translocations identified in the standard and LFR libraries were compared to RNA sequencing identification of translocations in BT-474 from publications of Edgren et al. [7] and Kangaspeska et al. [8]. An explanation of the fields are as follows: chrA, the chromosome for the A side of the translocation; geneA, the gene affected on the A side of the translocation; chrA start, the start of the breakpoint region for the chrA side; chrA end, the end of the breakpoint region for the chrA side; geneA strand, the coding direction of the gene on the A side of the translocation; chrB, the chromosome for the B side of the translocation; geneB, the gene affected on the B side of the translocation; chrB start, the start of the breakpoint region for the chrB side; chrB end, the end of the breakpoint region for the chrB side; geneB strand, the coding direction of the gene on the B side of the translocation; paper, the publication in which the translocation was identified; supported by SV data (if there is evidence from at least one of our libraries to support this translocation a “yes” will appear in the table followed by the SV ID for our translocation, otherwise a “no” will appear); supported by raw reads (if there is evidence from at least one mate-pair read in one of our libraries to support this translocation a “yes” will appear in the table followed by the SV ID for our translocation, otherwise a “no” will appear)

BT-474 genome statistics STD and LFR libraries were mapped to the NCBI reference genome build 37. An explanation of the genome statistics are as follows: fully called genome fraction, the fraction of the genome for which both alleles at each position are confidently called; fully called exome fraction, the fraction of the coding part of the genome for which both alleles are confidently called; gross mapping yield (Gb), the number of gigabases of read data that can be mapped to NCBI reference genome build 37; both mates mapped yield (Gb), the number of gigabases of read data where both arms of a mate pair can be mapped to NCBI reference genome build 37; genome fraction with sequence coverage ≥ 5x, fraction of the genome with at least 5 reads covering a single position; exome fraction with sequence coverage ≥ 5x, fraction of the coding genome with at least 5 reads covering a single position; SNV total count, the total number of single nucleotide variants called in each library; homozygous SNV count, the total number of homozygous variants called in each library; heterozygous SNV count, the total number of heterozygous variants called in each library; Het/Hom ratio, the ratio of heterozygous variants over homozygous variants (this number is typically around 1.6 for a Caucasian genome); ENA sample accession number, the European Nucleotide Archive accession number to locate the raw read data for each library 100 kb read coverage. For the standard library of BT-474 reads were averaged across consecutive 100 kb bins, normalized to a tetraploid copy number, and plotted such that each dot represents the coverage of a single 100 kb region of the genome. Y-axis shows haploid copy number; x-axis shows genome position increasing from left to right for chromosome and position Potential translocations identified in BT-474 All potential translocations identified in each library are listed. Translocations were clustered by 5 kb windows around the breakpoints to allow for comparison between different libraries. An explanation of the fields are as follows: SV ID, this is the ID for each translocation after clustering; chrA, the chromosome for the A side of the translocation; chrA start, the start of the breakpoint region for the chrA side; chrA end, the end of the breakpoint region for the chrA side; chrB, the chromosome for the B side of the translocation; chrB start, the start of the breakpoint region for the chrB side; chrB end, the end of the breakpoint region for the chrB side; breakpoint orientation, the direction of the translocation on each side of the translocation as compared to NCBI reference genome build 37, “f” represents the forward direction and “r” represents the reverse direction, the first letter is the chrA region and the second letter is the chrB region; library, the library for which each individual translocation was found Overlap of inter-chromosomal translocations between libraries. Interchromosomal translocations were called for each library. The overlap between libraries was determined by considering translocations found within 5 kb of each other, and in the same orientation, to be the same event. This also resulted in the aggregation of multiple close translocations within the same sample into a single event. In total, 109 translocations were found in the standard library (black), and 147 and 133 were found in LFR libraries 1 (blue) and 2 (green), respectively. Of these, 85 interchromosomal translocations are shared between at least two libraries Translocations confirmed by published RNA sequencing data Translocations identified in the standard and LFR libraries were compared to RNA sequencing identification of translocations in BT-474 from publications of Edgren et al. [7] and Kangaspeska et al. [8]. An explanation of the fields are as follows: chrA, the chromosome for the A side of the translocation; geneA, the gene affected on the A side of the translocation; chrA start, the start of the breakpoint region for the chrA side; chrA end, the end of the breakpoint region for the chrA side; geneA strand, the coding direction of the gene on the A side of the translocation; chrB, the chromosome for the B side of the translocation; geneB, the gene affected on the B side of the translocation; chrB start, the start of the breakpoint region for the chrB side; chrB end, the end of the breakpoint region for the chrB side; geneB strand, the coding direction of the gene on the B side of the translocation; paper, the publication in which the translocation was identified; supported by SV data (if there is evidence from at least one of our libraries to support this translocation a “yes” will appear in the table followed by the SV ID for our translocation, otherwise a “no” will appear); supported by raw reads (if there is evidence from at least one mate-pair read in one of our libraries to support this translocation a “yes” will appear in the table followed by the SV ID for our translocation, otherwise a “no” will appear) Single nucleotide variants (SNVs) numbering 3.24 million were called in the STD library, and over 2.85 million in each of the LFR libraries. Of these, 2.84 million were called in all libraries (Fig. 3), demonstrating good reproducibility between different methods of library construction. For all libraries the ratio of heterozygous to homozygous was close to 1; a ratio much lower than the expected ~1.6 for Caucasian genomes. This is most likely the result of loss of heterozygosity (LOH) from the deletion or multi-copy amplifications of large portions, and/or the complete parental copy of almost all chromosomes in the BT-474 genome, as seen in our data (Fig. 1 and Fig. 4), and as previously described [2]. This was confirmed by estimating what would happen to heterozygous variants in the NA12878 genome (the sample used by the ‘Genome in a Bottle’ Consortium [9]) in two scenarios: if the same percentage of the genome was LOH based on 1) the percentage of the genome lost, or 2) the percentage of variants lost (22.2 % and 20.3 %, respectively, Table 4). In both cases the ratio of heterozygous to homozygous variants was reduced to close to 1 (Table 5).
Fig. 3

Overlap of called variations between libraries. Single nucleotide variants (SNVs) numbering 3.24 million were called in the STD library, and over 2.85 million in each of the LFR libraries. The overlap between each library was compared and plotted. The standard library (black), and LFR libraries 1 (blue) and 2 (green) are highly overlapping, demonstrating that the majority of the variant calls are highly reproducible between separately processed sequencing libraries

Fig. 4

Circos plot of the BT-474 genome. Chromosome number (in large bold numbers and letters), chromosome position (in small numbers), and a karyotype ideogram form the outer circle of the plot. The remaining circles are described in order of outermost to innermost: called ploidy (the copy number of region; blue-gray), Lesser Allele Fraction (LAF, the fraction of the lesser allele, 0.5 for a heterozygous SNP, 0 for a homozygous SNP; green), density of heterzogous SNPs (orange), and density of homozygous SNPs (blue). Lines in the center of the plot represent interchromosomal junctions

Table 4

Calculation of the amount of LOH in BT-474

ChromosomeBase pairsVariationsCentromere position (Mbp)% LOHLOH (bp)LOH (variations)
1249,250,6214,401,0911250.0 %00
2243,199,3734,607,70293.30.0 %00
3198,022,4303,894,3459140.4 %800000001573295
4191,154,2763,673,89250.40.0 %00
5180,915,2603,436,66748.40.0 %00
6171,115,0673,360,890610.0 %00
7159,138,6633,045,99259.90.0 %00
8146,364,0222,890,69245.634.2 %50000000987501
9141,213,4312,581,8274924.8 %35000000639910
10135,534,7472,609,80240.241.0 %555347471069355
11135,006,5162,607,25453.740.7 %550065161062289
12133,851,8952,482,19435.80.0 %00
13115,169,8781,814,24217.9100.0 %1151698781814242
14107,349,5401,712,79917.60.0 %00
15102,531,3921,577,3461975.0 %768985441183010
1690,354,7531,747,13636.60.0 %00
1781,195,2101,491,841240.0 %00
1878,077,2481,448,60217.20.0 %00
1959,128,9831,171,35626.50.0 %00
2063,025,5201,206,75327.50.0 %00
2148,129,895787,78413.20.0 %00
2251,304,566745,77814.7100.0 %51304566745778
X155,270,5602,174,95260.6100.0 %1552705602174952
Total3,036,303,84655,470,9371,028674,184,81111,250,331
% of total in LOH22.2 %20.3 %

Loss of heterozygosity (LOH) in the genome of BT-474 was calculated based on the genomic distance covered by large areas (>10 mb), where the lesser allele fraction is zero. In our data the lesser allele was calculated in 100 kb windows, based on read counts at all fully called variant loci. An explanation of the fields are as follows: chromosome, the chromosome for which each LOH region was calculated; base pairs, the total number of base pairs for each chromosome based on the NCBI reference genome build 37; variations, total number of confirmed variations for each chromosome based on Ensembl genome browser release 68; centromere position (Mbp), the location in megabases of the centromere for each chromosome; %LOH, the percentage of each chromosome estimated to have LOH; LOH (bp), the total number of base pairs for each chromosome that are estimated to be found in LOH regions; the total number of variations from Ensembl genome browser release 68 that are found within the estimated LOH region

Table 5

NA12878 simulation of LOH event in BT-474

Variants in NA12878Variants based on LOH simulation by lengthVariants based on LOH simulation by number of variations
Hom SNPs1,306,5441,768,5571,729,016
Het SNPs2,081,1411,619,1281,658,669
Ratio1.590.920.96

The percentage of the BT-474 genome found to be in regions of LOH was used to simulate what would happen to the heterozygous and homozygous variants in the genome of NA12878 if that same amount of LOH was to occur. The simulation demonstrated that the LOH variants, and the increase in homozygous variants as a result, caused a shift in the Het/Hom SNP ratio from 1.59 to 0.92–0.96, similar to the average ratio of the three libraries of 0.98 seen in BT-474. An explanation of the fields are as follows: hom SNPs, total number of homozygous variants; het SNPs, total number of heterozygous variants; variants in NA12878, number of variants of each category in the genome of NA12878; variants based on LOH simulation by length, the change in the number of variants based on applying the percentage of LOH as calculated from the total length of the LOH region in BT-474; variants based on LOH simulation by number of variations, the change in the number of variants based on applying the percent of LOH as calculated from the total number of variants in the LOH region in BT-474

Overlap of called variations between libraries. Single nucleotide variants (SNVs) numbering 3.24 million were called in the STD library, and over 2.85 million in each of the LFR libraries. The overlap between each library was compared and plotted. The standard library (black), and LFR libraries 1 (blue) and 2 (green) are highly overlapping, demonstrating that the majority of the variant calls are highly reproducible between separately processed sequencing libraries Circos plot of the BT-474 genome. Chromosome number (in large bold numbers and letters), chromosome position (in small numbers), and a karyotype ideogram form the outer circle of the plot. The remaining circles are described in order of outermost to innermost: called ploidy (the copy number of region; blue-gray), Lesser Allele Fraction (LAF, the fraction of the lesser allele, 0.5 for a heterozygous SNP, 0 for a homozygous SNP; green), density of heterzogous SNPs (orange), and density of homozygous SNPs (blue). Lines in the center of the plot represent interchromosomal junctions Calculation of the amount of LOH in BT-474 Loss of heterozygosity (LOH) in the genome of BT-474 was calculated based on the genomic distance covered by large areas (>10 mb), where the lesser allele fraction is zero. In our data the lesser allele was calculated in 100 kb windows, based on read counts at all fully called variant loci. An explanation of the fields are as follows: chromosome, the chromosome for which each LOH region was calculated; base pairs, the total number of base pairs for each chromosome based on the NCBI reference genome build 37; variations, total number of confirmed variations for each chromosome based on Ensembl genome browser release 68; centromere position (Mbp), the location in megabases of the centromere for each chromosome; %LOH, the percentage of each chromosome estimated to have LOH; LOH (bp), the total number of base pairs for each chromosome that are estimated to be found in LOH regions; the total number of variations from Ensembl genome browser release 68 that are found within the estimated LOH region NA12878 simulation of LOH event in BT-474 The percentage of the BT-474 genome found to be in regions of LOH was used to simulate what would happen to the heterozygous and homozygous variants in the genome of NA12878 if that same amount of LOH was to occur. The simulation demonstrated that the LOH variants, and the increase in homozygous variants as a result, caused a shift in the Het/Hom SNP ratio from 1.59 to 0.92–0.96, similar to the average ratio of the three libraries of 0.98 seen in BT-474. An explanation of the fields are as follows: hom SNPs, total number of homozygous variants; het SNPs, total number of heterozygous variants; variants in NA12878, number of variants of each category in the genome of NA12878; variants based on LOH simulation by length, the change in the number of variants based on applying the percentage of LOH as calculated from the total length of the LOH region in BT-474; variants based on LOH simulation by number of variations, the change in the number of variants based on applying the percent of LOH as calculated from the total number of variants in the LOH region in BT-474 Analysis of the coding regions of a comprehensive list of known cancer-causing genes [10, 11] identified 67 small variants (<50 base pairs, Additional file 1). Most of these are probably inherited variants with no involvement in tumor formation, however variants in TP53 and PIK3CA, previously found as somatic mutations in many tumors [12], were found in this cell line (Additional file 1). Also identified in our data: a potentially inherited variant in CHEK2, listed as ‘likely to be pathogenic’ in the ClinVar database [13]. To demonstrate the quality of our variant calls we compared them to a list generated by targeted sequencing of BT-474 as part of the Cancer Cell Line Encyclopedia (CCLE) project [14]. When the data from all three libraries were combined, 92 % of the variants found in CCLE were also called in our data, suggesting that our BT-474 genome is of good quality (Fig. 5 and Additional file 2). Further, 130 variants were found in two or more of our libraries that were not found in the CCLE data. This is either because the exons in which these variants were found were not covered as part of the CCLE target set, these variants were missed in the CCLE sequencing analysis, or to a lesser extent they are false positives in our dataset (Additional file 2).
Fig. 5

Data directory tree. The output from the LFR process consists of a series of files and folders. A complete description of everything contained within the Complete Genomics data package can be found in the Additional file 3

Data directory tree. The output from the LFR process consists of a series of files and folders. A complete description of everything contained within the Complete Genomics data package can be found in the Additional file 3

Availability of supporting data

Complete Genomics data formats

The entire data set from Complete Genomics, provided here, consists of a series of files and directories covering various categories of whole genome analysis (Fig. 6). A complete description of all files and the methods used to generate them can be found in the “Standard Sequencing Service Data File Formats v2.5” document provided by Complete Genomics, Inc. (available in Additional file 3 [15]).
Fig. 6

Overlap of called variations between libraries and CCLE data. Variants in the standard (black), and LFR 1 (blue) and 2 (green) libraries from this study found in the genes analyzed in the CCLE study, were compared to those variants called for BT-474 by the CCLE (orange). Eighty-nine percent of variants found by CCLE (orange) in BT-474 were also found within at least one of our libraries

Overlap of called variations between libraries and CCLE data. Variants in the standard (black), and LFR 1 (blue) and 2 (green) libraries from this study found in the genes analyzed in the CCLE study, were compared to those variants called for BT-474 by the CCLE (orange). Eighty-nine percent of variants found by CCLE (orange) in BT-474 were also found within at least one of our libraries

LFR-specific files

Data packages from LFR do not include directories for structural variation (SV) or mobile element insertion (MEI; for more information on the content of these directories see the “Standard Sequencing Service Data File Formats v2.5” file mentioned above. In addition, one of the fields in the variant file (hapLink) is modified and there are six new fields described below: hapLink: LFR phased variants have an ID with the pattern: “Phased_#_#_#”, where # is an integer, the first two #s describe unique contigs, and the last # in the series is either 1 or 0 and represents the two possible haplotypes for each contig. All SNPs sharing the same “Phased_#_#_#” are from the same haplotype. wellCount: total number of LFR wells (out of 384) containing sequence reads calling the variant or reference allele. This metric is used to identify polymerase-induced false positive calls, since it is unlikely that random polymerase errors will occur in multiple different wells. A complete explanation of this concept can be found in Peters et al. [4]. wellIDs: contains the IDs of the specific wells from which reads calling the variant come. exclusiveWellCount: this is the number of wells at each locus that have reads calling only the variant or the reference allele (not both). For true heterozygous variants this number should be close to that obtained for “wellCount”. SharedWellCount: at each locus this is the number of wells that contain reads calling both alleles. For true heterozygous variants this should be low; having a high number here suggests mapping errors. For homozygous variants almost all of the well counts should be in this field. MinExclusiveWellCountInThisLocus: this is the minimum number of exclusive wells (non-shared well counts) at each locus. MaxExculisveWellCountInThisLocus: this is the maximum number of exclusive wells (non-shared well counts) at each locus.

LFR structural variant analysis files

Each LFR genome contains an LFR-specific structural variant file in the ASM directory (see Fig. 2 for directory tree). This file is generated using a novel algorithm that identifies unexpected mate-pairs that are found in more than one compartment of an LFR library (manuscript in preparation). A full description of the headers can be found within each file under the Excel tab labeled “Header Description”. Read and mapping data for all genomes reported here are available at the European Nucleotide Archive (ENA) under study accession number PRJEB10587. Sample accession numbers for each sequence library can be found in Table 1. Supporting data is also available from the GigaScience GigaDB database [16].
  14 in total

1.  Isolation of two human tumor epithelial cell lines from solid breast carcinomas.

Authors:  E Y Lasfargues; W G Coutinho; E S Redfield
Journal:  J Natl Cancer Inst       Date:  1978-10       Impact factor: 13.506

2.  Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays.

Authors:  Radoje Drmanac; Andrew B Sparks; Matthew J Callow; Aaron L Halpern; Norman L Burns; Bahram G Kermani; Paolo Carnevali; Igor Nazarenko; Geoffrey B Nilsen; George Yeung; Fredrik Dahl; Andres Fernandez; Bryan Staker; Krishna P Pant; Jonathan Baccash; Adam P Borcherding; Anushka Brownley; Ryan Cedeno; Linsu Chen; Dan Chernikoff; Alex Cheung; Razvan Chirita; Benjamin Curson; Jessica C Ebert; Coleen R Hacker; Robert Hartlage; Brian Hauser; Steve Huang; Yuan Jiang; Vitali Karpinchyk; Mark Koenig; Calvin Kong; Tom Landers; Catherine Le; Jia Liu; Celeste E McBride; Matt Morenzoni; Robert E Morey; Karl Mutch; Helena Perazich; Kimberly Perry; Brock A Peters; Joe Peterson; Charit L Pethiyagoda; Kaliprasad Pothuraju; Claudia Richter; Abraham M Rosenbaum; Shaunak Roy; Jay Shafto; Uladzislau Sharanhovich; Karen W Shannon; Conrad G Sheppy; Michel Sun; Joseph V Thakuria; Anne Tran; Dylan Vu; Alexander Wait Zaranek; Xiaodi Wu; Snezana Drmanac; Arnold R Oliphant; William C Banyai; Bruce Martin; Dennis G Ballinger; George M Church; Clifford A Reid
Journal:  Science       Date:  2009-11-05       Impact factor: 47.728

3.  Identification of fusion genes in breast cancer by paired-end RNA-sequencing.

Authors:  Henrik Edgren; Astrid Murumagi; Sara Kangaspeska; Daniel Nicorici; Vesa Hongisto; Kristine Kleivi; Inga H Rye; Sandra Nyberg; Maija Wolf; Anne-Lise Borresen-Dale; Olli Kallioniemi
Journal:  Genome Biol       Date:  2011-01-19       Impact factor: 13.583

4.  The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Authors:  Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway
Journal:  Nature       Date:  2012-03-28       Impact factor: 49.962

5.  Differences and homologies of chromosomal alterations within and between breast cancer cell lines: a clustering analysis.

Authors:  Milena Rondón-Lagos; Ludovica Verdun Di Cantogno; Caterina Marchiò; Nelson Rangel; Cesar Payan-Gomez; Patrizia Gugliotta; Cristina Botta; Gianni Bussolati; Sandra R Ramírez-Clavijo; Barbara Pasini; Anna Sapino
Journal:  Mol Cytogenet       Date:  2014-01-23       Impact factor: 2.009

6.  Discovery and saturation analysis of cancer genes across 21 tumour types.

Authors:  Michael S Lawrence; Petar Stojanov; Craig H Mermel; James T Robinson; Levi A Garraway; Todd R Golub; Matthew Meyerson; Stacey B Gabriel; Eric S Lander; Gad Getz
Journal:  Nature       Date:  2014-01-05       Impact factor: 49.962

7.  Detection and phasing of single base de novo mutations in biopsies from human in vitro fertilized embryos by advanced whole-genome sequencing.

Authors:  Brock A Peters; Bahram G Kermani; Oleg Alferov; Misha R Agarwal; Mark A McElwain; Natali Gulbahce; Daniel M Hayden; Y Tom Tang; Rebecca Yu Zhang; Rick Tearle; Birgit Crain; Renata Prates; Alan Berkeley; Santiago Munné; Radoje Drmanac
Journal:  Genome Res       Date:  2015-02-11       Impact factor: 9.043

8.  Whole genome sequence analysis of BT-474 using complete Genomics' standard and long fragment read technologies.

Authors:  Serban Ciotlos; Qing Mao; Rebecca Yu Zhang; Zhenyu Li; Robert Chin; Natali Gulbahce; Sophie Jia Liu; Radoje Drmanac; Brock A Peters
Journal:  Gigascience       Date:  2016-02-09       Impact factor: 6.524

9.  Reanalysis of RNA-sequencing data reveals several additional fusion genes with multiple isoforms.

Authors:  Sara Kangaspeska; Susanne Hultsch; Henrik Edgren; Daniel Nicorici; Astrid Murumägi; Olli Kallioniemi
Journal:  PLoS One       Date:  2012-10-31       Impact factor: 3.240

10.  ClinVar: public archive of relationships among sequence variation and human phenotype.

Authors:  Melissa J Landrum; Jennifer M Lee; George R Riley; Wonhee Jang; Wendy S Rubinstein; Deanna M Church; Donna R Maglott
Journal:  Nucleic Acids Res       Date:  2013-11-14       Impact factor: 16.971

View more
  3 in total

1.  Whole genome sequence analysis of BT-474 using complete Genomics' standard and long fragment read technologies.

Authors:  Serban Ciotlos; Qing Mao; Rebecca Yu Zhang; Zhenyu Li; Robert Chin; Natali Gulbahce; Sophie Jia Liu; Radoje Drmanac; Brock A Peters
Journal:  Gigascience       Date:  2016-02-09       Impact factor: 6.524

2.  Loss of STAT6 leads to anchorage-independent growth and trastuzumab resistance in HER2+ breast cancer cells.

Authors:  Molly DiScala; Matthew S Najor; Timothy Yung; Deri Morgan; Abde M Abukhdeir; Melody A Cobleigh
Journal:  PLoS One       Date:  2020-06-11       Impact factor: 3.240

Review 3.  Recent Advances in Experimental Whole Genome Haplotyping Methods.

Authors:  Mengting Huang; Jing Tu; Zuhong Lu
Journal:  Int J Mol Sci       Date:  2017-09-11       Impact factor: 5.923

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.