Qi Dai, Zhaofang Yan, Zhuoxing Shi, Xiaoqing Liu, Yuhua Yao, Pingan He.
Abstract
Lempel–Ziv complexity has been widely used for sequence comparison with promising results, but the distribution of components in the exhaustive history has not previously been studied. This paper investigates the full distribution of LZ-words and presents a novel statistical method for sequence comparison. Taking the components' lengths into account, we revise the Lempel–Ziv complexity and obtain various sets of LZ-words. Instead of calculating the LZ-words' contents, we define a series of set operations on the LZ-word sets to compare biological sequences. To assess the effectiveness of the proposed method, we performed two sets of experiments and compared it with alignment-based methods.
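To make the terms "exhaustive history" and "LZ-word set" concrete, the following is a minimal sketch of the classic Lempel–Ziv (1976) decomposition, the component length distribution, and a set-based comparison. It is not the revised complexity or the set operations proposed in the paper, which the abstract does not specify; the function names and the use of the Jaccard distance are assumptions made purely for illustration.

```python
from collections import Counter


def lz_history(seq):
    """Split seq into the components of its classic LZ exhaustive history."""
    words = []
    i, n = 0, len(seq)
    while i < n:
        k = 1
        # Grow the candidate while it can still be copied from the text
        # that precedes its last symbol.
        while i + k <= n and seq[i:i + k] in seq[:i + k - 1]:
            k += 1
        words.append(seq[i:i + k])  # slicing clips the final component at n
        i += k
    return words


def length_distribution(words):
    """Count how many components have each length."""
    return Counter(len(w) for w in words)


def word_set_distance(a, b):
    """Jaccard distance between LZ-word sets (an illustrative stand-in,
    not the measure defined in the paper)."""
    wa, wb = set(lz_history(a)), set(lz_history(b))
    union = wa | wb
    if not union:
        return 0.0
    return 1.0 - len(wa & wb) / len(union)
```

For example, `lz_history("AACGTACCATTG")` yields the seven components `A`, `AC`, `G`, `T`, `ACC`, `AT`, `TG`, so the classic complexity of that sequence is 7. A distance matrix built from `word_set_distance` could then be passed to any tree-construction routine.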
Keywords: Lempel–Ziv complexity; Phylogenetic analysis; Set operation; Word set
Year: 2013 PMID: 23876763 PMCID: PMC7094135 DOI: 10.1016/j.jtbi.2013.07.008
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691
Fig. 1. All the transition operations and extension operations between the sets and the sets according to the set.
Fig. 2. The comparison of k-word counts of the deduced sequences HCoV-229E_LZ and HCoV-229E_RLZ with k from 1 to 4.
Fig. 3. Comparison of length distribution of the components in the exhaustive history obtained by Lempel–Ziv (LZ) complexity and revised Lempel–Ziv complexity.
Abbreviation for the strains, accession number, nucleotide length, genotype, acronym and country for each of the 48 complete HEV genomes.
| No | Strain | Length (nt) | Genotype | Acronym | Country |
|---|---|---|---|---|---|
| 1 | B1 (Bur-82) | 7207 | I | AA | Burma (Rangoon) |
| 2 | B2 (Bur-86) | 7194 | I | AB | Burma (Rangoon) |
| 3 | I2 [Mad-93] | 7194 | I | AC | India (Madras) |
| 4 | I3 | 7194 | I | AD | India (Hyderabad) |
| 5 | Np1(TK15/92) | 7199 | I | AE | Nepal (Kathmandu) |
| 6 | P2[Abb-2B] | 7143 | I | AF | Pakistan (Abbottabad) |
| 7 | Yam-67 | 7206 | I | AG | India (Yamuna Nagar) |
| 8 | C1(CHT-88) | 7207 | I | AH | China (Xinjiang, Hetian) |
| 9 | C2(KS2–87) | 7221 | I | AI | China (Xinjiang, Kashi) |
| 10 | C3(CHT-87) | 7176 | I | AJ | China (Xinjiang, Hetian) |
| 11 | C4(Uigh179) | 7194 | I | AK | China (Xinjiang, Uighur) |
| 12 | China Hebei | 7200 | I | AR | China (Hebei) |
| 13 | P1(Sar-55) | 7138 | I | AM | Pakistan (Sargodha) |
| 14 | I1(FHF) | 7202 | I | AN | India |
| 15 | Morocco | 7212 | I | AO | Morocco |
| 16 | T3 | 7170 | I | AP | Chad |
| 17 | M1 | 7180 | II | BB | Mexico (Telixtac) |
| 18 | HE-JA10 | 7262 | III | CA | Japan (Tokyo) |
| 19 | JKN-Sap | 7256 | III | CB | Japan (Sapporo) |
| 20 | JMY-HAW | 7240 | III | CC | Japan (Sapporo) |
| 21 | swUS1 | 7207 | III | CD | USA |
| 22 | US1 | 7202 | III | CE | USA (Minnesota) |
| 23 | US2 | 7277 | III | CF | USA (Tennessee) |
| 24 | JBOAR1-Hyo04 | 7247 | III | CG | Japan (Hyogo) |
| 25 | JDEER-Hyo03L | 7230 | III | CH | Japan (Hyogo) |
| 26 | JJT-KAN | 7218 | III | CI | Japan (Kanagawa) |
| 27 | JMO-Hyo03L | 7180 | III | CJ | Japan (Hyogo) |
| 28 | JRA1 | 7230 | III | CK | Japan (Tokyo) |
| 29 | JSO-Hyo03L | 7180 | III | CR | Japan (Tokyo) |
| 30 | JTH-Hyo03L | 7180 | III | CM | Japan (Tokyo) |
| 31 | JYO-Hyo03L | 7180 | III | CN | Japan (Tokyo) |
| 32 | swJ570 | 7257 | III | CO | Japan (Tochigi) |
| 33 | Kyrgyz | 7239 | III | CP | Kyrgyzstan |
| 34 | Arkell | 7255 | III | CQ | Canada (Ontario, Guelph) |
| 35 | HE-JA1 | 7258 | IV | DA | Japan (Hokkaido) |
| 36 | HE-JK4 | 7250 | IV | DB | Japan (Tochigi) |
| 37 | HE-JI4 | 7186 | IV | DC | Japan (Tochigi) |
| 38 | JAK-Sai | 7236 | IV | DD | Japan (Saitama) |
| 39 | JKK-Sap | 7235 | IV | DE | Japan (Sapporo) |
| 40 | JSM-Sap95 | 7202 | IV | DF | Japan (Hokkaido) |
| 41 | JSN-Sap-FH | 7234 | IV | DG | Japan (Hokkaido) |
| 42 | JSN-Sap-FH02C | 7251 | IV | DH | Japan (Hokkaido) |
| 43 | JTS-Sap02 | 7202 | IV | DI | Japan (Hokkaido) |
| 44 | JYW-Sap02 | 7202 | IV | DJ | Japan (Hokkaido) |
| 45 | swJ13–1 | 7258 | IV | DK | Japan (Hokkaido) |
| 46 | swCH25 | 7270 | IV | DR | China (Uighur) |
| 47 | T1 | 7232 | IV | DM | China (Beijing) |
| 48 | CCC220 | 7193 | IV | DN | China (Changchun) |
Fig. 4. Cluster trees of 48 HEV genomes constructed with the neighbor-joining algorithm, based on the proposed operation measure with SSM1, SSM2, SSM3, SSM4, and SSM5.
The accession number, abbreviation, name and length for each of the 24 coronavirus genomes.
| No | Accession | Group | Abbreviation | Genome | Length (nt) |
|---|---|---|---|---|---|
| 1 | | I | HCoV-229E | Human coronavirus 229E | 27,317 |
| 2 | | I | TGEV | Transmissible gastroenteritis virus | 28,586 |
| 3 | | I | PEDV | Porcine epidemic diarrhea virus | 28,033 |
| 4 | | II | BCoVM | Bovine coronavirus strain Mebus | 31,032 |
| 5 | | II | BCoVL | Bovine coronavirus isolate BCoV–LUN | 31,028 |
| 6 | | II | BCoVQ | Bovine coronavirus strain Quebec | 31,100 |
| 7 | | II | BCoV | Bovine coronavirus | 31,028 |
| 8 | | II | MHVM | Murine hepatitis virus strain ML–10 | 31,100 |
| 9 | | II | MHV2 | Murine hepatitis virus strain 2 | 31,028 |
| 10 | | II | MHVP | Murine hepatitis virus strain Penn 97–1 | 31,233 |
| 11 | | II | MHV | Murine hepatitis virus | 31,276 |
| 12 | | III | IBV | Avian infectious bronchitis virus | 27,608 |
| 13 | | IV | BJ01 | SARS coronavirus BJ01 | 29,725 |
| 14 | | IV | Urbani | SARS coronavirus Urbani | 29,727 |
| 15 | | IV | HKU-39849 | SARS coronavirus HKU-39849 | 29,742 |
| 16 | | IV | CUHK-W1 | SARS coronavirus CUHK–W1 | 29,736 |
| 17 | | IV | CUHK-Su10 | SARS coronavirus CUHK–Su10 | 29,736 |
| 18 | | IV | SIN2500 | SARS coronavirus Sin2500 | 29,711 |
| 19 | | IV | SIN2677 | SARS coronavirus Sin2677 | 29,705 |
| 20 | | IV | SIN2679 | SARS coronavirus Sin2679 | 29,711 |
| 21 | | IV | SIN2748 | SARS coronavirus Sin2748 | 29,706 |
| 22 | | IV | SIN2774 | SARS coronavirus Sin2774 | 29,711 |
| 23 | | IV | TW1 | SARS coronavirus TW1 | 29,729 |
| 24 | | IV | TOR2 | SARS coronavirus | 29,751 |
Fig. 5. Phylogenetic tree of 24 coronavirus genomes based on (a) the proposed operation measure and (b) multiple alignment with CLUSTAL X.
| 11/15 | 2/15 | 2/15 |