Xiaofang Jiang, Huanhuan Jiang, Cun Li, Sheng Wang, Zhiqiang Mi, Xiaoping An, Jiankui Chen, Yigang Tong.
Abstract
BACKGROUND: T4 phage is a model species that has contributed broadly to our understanding of molecular biology. T4 DNA replication and packaging share various mechanisms with human double-stranded DNA viruses such as herpes virus. The literature indicates that T4-like phage genomes have permuted terminal sequences, and are generated by a DNA terminase in a sequence-independent manner;
Year: 2011 PMID: 21524290 PMCID: PMC3105952 DOI: 10.1186/1743-422X-8-194
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Figure 1. Read sequence distribution. The occurrence frequencies of sequences in the raw data files 1.fq (upper graph) and 2.fq (lower graph) show that, although most read sequences occur 6-21 times, high-occurrence reads exist in both files.
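The occurrence counts behind a distribution like this can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes plain 4-line FASTQ records and treats two reads as the same sequence only when they are identical over their full length. The function names and the toy records are hypothetical.

```python
from collections import Counter

def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()

def occurrence_histogram(seqs):
    """Return (per_read, histogram).

    per_read:  sequence -> number of identical reads
    histogram: occurrence count -> number of distinct sequences with that count
    """
    per_read = Counter(seqs)
    histogram = Counter(per_read.values())
    return per_read, histogram

# Toy input standing in for a file such as 1.fq (hypothetical records).
demo = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "IIII",
    "@r3", "GGCC", "+", "IIII",
]
per_read, histogram = occurrence_histogram(fastq_sequences(demo))
```

For a real file, the same functions can be fed an open file handle, for example `occurrence_histogram(fastq_sequences(open("1.fq")))`.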
Top 20 high frequency sequences in raw sequencing data
| Read sequence | Rank | Frequency (total) | Frequency (1.fq) | Frequency (2.fq) | Genome presence: Ori¹ | Genome presence: Pos² | Genome presence: Junction³ |
|---|---|---|---|---|---|---|---|
| GCTCTTCGGAAAGGTCAAAAACAGTTTGAG...... | 1 | 828 | 427 | 401 | F | 30641 | tctatttggagctcttcgga |
| GTTTTACAGAATCGTACTCGGCCTTGTTCG...... | 2 | 705 | 388 | 317 | F | 3272 | aattactggagttttacaga |
| GTATAATGATTCATCAACAAACAAAAGACA...... | 3 | 692 | 383 | 309 | R | 30486 | cccttttggagtataatgat |
| GCGTAATTCCACCTTTTTCTTCCCAATCTT...... | 4 | 673 | 352 | 321 | F | 52555 | tcttgttggagcgtaattcc |
| GGTATACATCATTAAATAACGATGTATATC...... | 5 | 641 | 333 | 308 | F | 163251 | agaaattggaggtatacatc |
| GTATTTCAAGAAACGTGATAAAGCCCAGGC...... | 6 | 577 | 318 | 259 | F | 121764 | aacgtttggagtatttcaag |
| GCGTAATTGCTTCAGGTAAGCCTTTAGGAT...... | 7 | 505 | 256 | 249 | F | 73140 | agaatatggagcgtaattgc |
| GTGCATGATTGGTAACAGTTCGGCAACCCA...... | 8 | 505 | 277 | 228 | F | 40411 | ggtctttggagtgcatgatt |
| GTTTTACAGACAACGCAAATCTTATCTGAC...... | 9 | 496 | 253 | 243 | F | 115803 | atcgattggagttttacaga |
| GCTGAAAAGGCAGCTGAAACTAAAGCCGCT...... | 10 | 494 | 270 | 224 | R | 3702 | taaattagcagctgaaaagg |
| GTATAATGTAAAAACAAACCTGAGGAAATT...... | 11 | 490 | 274 | 216 | R | 32654 | ctcccttggagtataatgta |
| GTATTAACAAGATTCCAGAATTTCTCACCC...... | 12 | 481 | 253 | 228 | R | 75276 | gttttctggagtattaacaa |
| GTTTCTCAGCGATTTTAATCGACCACTCTT...... | 13 | 448 | 238 | 210 | F | 29924 | tcgtcttggagtttctcagc |
| GTTACATAAGCATCAGGAGCAGATGGTCCC...... | 14 | 445 | 254 | 191 | R | 102003 | ttgctttggagttacataag |
| GCTTTAATCTTAACAATAGTGCCGAGATAA...... | 15 | 443 | 245 | 198 | F | 165136 | gtatttacctgctttaatct |
| GCTGAACGTACCGAAGTTGCAGGTATGACT...... | 16 | 440 | 266 | 174 | R | 28799 | gttgttcagagctgaacgta |
| GTATAATCTTTCTATCAACTTGAGGAGAAT...... | 17 | 434 | 217 | 217 | R | 46215 | gatggatggagtataatctt |
| GCTGCATCTTCAGATTGGTCTTCGTCTTCA...... | 18 | 431 | 251 | 180 | F | 5448 | ttcagatggagctgcatctt |
| GTTATTACTAAACAAGTTTTTAACCGCACT...... | 19 | 426 | 222 | 204 | R | 122567 | ctcccttggagttattacta |
| GTTAACAAATGCCATACGACATTTAAGGGA...... | 20 | 425 | 208 | 217 | F | 56968 | aacgtttagagttaacaaat |
¹ Orientation in genome. F, forward; R, reverse.
² Position in genome.
³ The genomic sequence around the start site of the HFSs.
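The junction column can be reproduced from a genome sequence with a short sketch like the one below. In the forward rows of the table, the 20-nt junction consists of the 10 bases upstream of the read start followed by the first 10 bases of the read. For reverse-orientation rows, the code assumes that the listed position is the genomic coordinate of the read's first base and that the window is reported as a reverse complement; the source does not state this, so treat it as an assumption. The function names and the toy genome are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def junction(genome, pos, orientation, flank=10):
    """Return the 2*flank bases around a read start, read start at index `flank`.

    pos is 1-based. Assumption: for "R", pos is the coordinate of the read's
    first base on the reverse strand.
    """
    if orientation == "F":
        start, end = pos - 1 - flank, pos - 1 + flank
    elif orientation == "R":
        start, end = pos - flank, pos + flank
    else:
        raise ValueError("orientation must be 'F' or 'R'")
    if start < 0 or end > len(genome):
        raise ValueError("window runs past the end of the linear sequence")
    window = genome[start:end]
    return window if orientation == "F" else reverse_complement(window)

# Toy genome (hypothetical) embedding the rank-1 junction from the table.
toy = "CC" + "TCTATTTGGA" + "GCTCTTCGGA" + "AAAGG"
forward_junction = junction(toy, 13, "F")
reverse_junction = junction(reverse_complement(toy), 15, "R")
```

The sketch treats the genome as a linear string and rejects windows that cross its ends.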
Figure 2. Frequency distribution of the top 20 high frequency sequences (HFSs) in the original sequencing data files.
Figure 3. Percentage of first bases in read sequences. In read sequences with 5-15 occurrences, the first-base percentages are comparable with the base composition of the genome. As read sequence occurrence increases, the percentage of A and G rises, and together they account for 100% of the first bases in sequences that occur more than 100 times. G is the only first base in sequences that occur more than 160 times.
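A first-base breakdown of this kind can be sketched as below. The sketch assumes a mapping from each distinct read sequence to its occurrence count and counts each distinct sequence once per bin. Whether the figure weights sequences by occurrence is not stated in the source, so this is one possible reading. The bin boundaries and the toy counts are hypothetical.

```python
from collections import Counter

def first_base_percentages(per_read, bins):
    """Return {(low, high): {base: percent}} for inclusive occurrence bins.

    per_read: mapping of read sequence -> occurrence count.
    Each distinct sequence contributes once to the bin its count falls in.
    """
    result = {}
    for low, high in bins:
        firsts = Counter(
            seq[0].upper() for seq, n in per_read.items() if seq and low <= n <= high
        )
        total = sum(firsts.values())
        result[(low, high)] = (
            {base: 100.0 * firsts[base] / total for base in "ACGT"} if total else {}
        )
    return result

# Toy counts (hypothetical), not the published data.
toy_counts = {
    "GCTC": 200, "GTTT": 170, "ACGT": 120,
    "TTGA": 8, "CATG": 7, "AAAA": 6, "GGGG": 5,
}
toy_bins = [(5, 15), (101, 160), (161, 10**9)]
percentages = first_base_percentages(toy_counts, toy_bins)
```

Empty bins map to an empty dictionary rather than raising a division error.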
Figure 4. Sequence logo representation of sequences around the start sites of the top 20 HFSs. Sequences were plotted with WebLogo [23]. The height of the letter indicates the degree of conservation. Nucleotide 11 is the start site of HFSs.
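The degree of conservation shown in a sequence logo is commonly expressed as per-position information content in bits. The sketch below computes the uncorrected value, 2 minus the Shannon entropy of each column, for the first five junctions of the table. WebLogo may apply further corrections (for example for small sample size), so the numbers are illustrative rather than a reproduction of the figure. The function name is hypothetical.

```python
import math
from collections import Counter

def column_information(seqs):
    """Return per-column information content in bits (2 - Shannon entropy)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    bits = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits.append(2.0 - entropy)
    return bits

# Junctions of the five top-ranked HFSs, copied from the table.
junctions = [
    "tctatttggagctcttcgga",
    "aattactggagttttacaga",
    "cccttttggagtataatgat",
    "tcttgttggagcgtaattcc",
    "agaaattggaggtatacatc",
]
bits = column_information(junctions)
```

Index 10 of `bits` corresponds to nucleotide 11, the start site of the HFSs.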
Figure 5. Occurrence of the reverse sequences around the top 50 forward and reverse HFSs. The frequencies of the reverse sequences starting from a particular position around the first base of the HFSs were calculated. The frequency profiles of the forward and reverse HFSs are nearly identical, with position +2 reverse sequences (R+2) having a remarkably higher occurrence than the others.
Figure 6. Sequence logo representation of sequences around the start sites of paired HFSs. Sequences were plotted with WebLogo [23]. The height of the letter indicates the degree of conservation. Nucleotide 11 is the start site of forward HFSs.
Figure 7. Distribution of top 50 forward and reverse HFSs on the IME08 genome.