Pablo Mier, Miguel A Andrade-Navarro.
Abstract
Homorepeat sequences, consecutive runs of identical amino acids, are prevalent in eukaryotic proteins. It has become necessary to annotate and evaluate this feature in entire proteomes. The definition of what constitutes a homorepeat is not fixed, and different research approaches may require different definitions; therefore, flexible approaches to analyze homorepeats in complete proteomes are needed. Here, we present polyX2, a fast, simple, but tunable script to scan protein datasets for all possible homorepeats. The user can modify the length of the window to scan, the minimum number of identical residues that must be found in the window, and the types of homorepeats to be found.
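The three tunable parameters named in the abstract (window length, minimum number of identical residues in the window, and residue types) can be illustrated with a minimal sliding-window sketch. This is not the polyX2 code: the function name `find_homorepeats`, the helper `_trim`, the example values `window=10` and `min_identical=8`, the merging of overlapping windows, and the trimming of each region so it starts and ends with the repeated residue are all assumptions made for illustration.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def _trim(seq: str, aa: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink [start, end) so that it begins and ends with residue `aa`."""
    while seq[start] != aa:
        start += 1
    while seq[end - 1] != aa:
        end -= 1
    return start, end


def find_homorepeats(
    seq: str,
    window: int = 10,          # hypothetical example value
    min_identical: int = 8,    # hypothetical example value
    residues: str = AMINO_ACIDS,
) -> List[Tuple[str, int, int]]:
    """Return (residue, start, end) tuples; 0-based, end exclusive.

    A window qualifies for residue X if it holds at least `min_identical`
    copies of X. Overlapping qualifying windows are merged into one region.
    """
    if not 1 <= min_identical <= window:
        raise ValueError("need 1 <= min_identical <= window")
    seq = seq.upper()
    hits: List[Tuple[str, int, int]] = []
    for aa in residues:
        count = 0                  # copies of `aa` in the current window
        region = None              # (start, end) of the region being built
        for i, ch in enumerate(seq):
            if ch == aa:
                count += 1
            if i >= window and seq[i - window] == aa:
                count -= 1         # residue that just left the window
            if i >= window - 1 and count >= min_identical:
                w_start = i - window + 1
                if region is not None and w_start < region[1]:
                    region = (region[0], i + 1)      # extend current region
                else:
                    if region is not None:
                        hits.append((aa, *_trim(seq, aa, *region)))
                    region = (w_start, i + 1)        # open a new region
        if region is not None:
            hits.append((aa, *_trim(seq, aa, *region)))
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    # A perfect run of 12 glutamines flanked by unrelated residues.
    print(find_homorepeats("MKT" + "Q" * 12 + "LVA"))
```

Lowering `min_identical` relative to `window` tolerates more interruptions inside a repeat, while passing a shorter `residues` string (for example `"Q"`) restricts the scan to particular homorepeat types.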
Keywords: homorepeats; low-complexity regions; web tool
Year: 2022 PMID: 35627143 PMCID: PMC9141109 DOI: 10.3390/genes13050758
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Figure 1. PolyX2 workflow.
Figure 2. PolyX2 web tool. (a) Execution module on home page; (b) overview of the results.
Running time and number of homorepeats detected for several protein datasets and default parameters in the web server and using the standalone script.
| Dataset | Number of Proteins | Time (web server) 1,2 | Time (standalone script) 1,3 | Number of Homorepeats |
|---|---|---|---|---|
| Fruit fly proteome | 13,806 | 30 s | 12 s | 4459 |
| Human proteome | 20,609 | 46 s | 21 s | 2871 |
| Isoform sequences | 40,403 | 1 m 33 s | 47 s | 6303 |
| SwissProt | 563,972 | - | 6 m 59 s | 26,535 |
1 Averaged after 10 executions. 2 Server running Debian GNU/Linux 10. 3 Executed on a 64-bit Lenovo ThinkPad with 15.3 GB of RAM and an Intel Core i7-8665U CPU @ 1.90 GHz × 8, running Ubuntu 20.04.02 LTS.