
Wavelet analysis of DNA walks on the human and chimpanzee MAGE/CSAG-palindromes.

Yanjiao Qi, Nengzhi Jin, Duiyuan Ai.

Abstract

The palindrome is one class of symmetrical duplications with reverse complementary characters and is widely distributed in many organisms. Graphical representation of DNA sequences provides a simple way of viewing and comparing various genomic structures. Through 3-D DNA walk analysis, the similarities and differences in nucleotide composition, as well as the evolutionary relationship between human and chimpanzee MAGE/CSAG-palindromes, can be clearly revealed. Further wavelet analysis indicated that duplicated segments have irregular patterns compared to their surrounding sequences. However, sequence similarity analysis suggests that there is a possible common ancestor between human and chimpanzee MAGE/CSAG-palindromes. Based on the specific distribution and orientation of the repeated sequences, a simple possible evolutionary model of the palindromes is suggested, which may help us to better understand the evolutionary course of the genes and the symmetrical sequences.
Copyright © 2012. Published by Elsevier Ltd.

Year:  2012        PMID: 23084779      PMCID: PMC5054716          DOI: 10.1016/j.gpb.2012.07.004

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

In nature, symmetry can be found everywhere, in both macroscopic and microscopic objects. The palindrome, as one of the symmetrical sequences, consists of two arms of similar DNA, with one inverted and complemented relative to the other around a central point, usually a nonhomologous spacer. Previous studies report that palindromic sequences are frequently observed and important for the structure and/or function of several classes of proteins [1], [2]. However, extracting statistical features within the palindrome still needs to be further explored, and this may improve understanding of the organization of candidates for immunotherapy [3], [4], and the evolution of life on the genomic level [5], [6]. Nowadays, methods of signal processing are becoming increasingly popular for various applications in bioinformatics as they may facilitate the exploration of intrinsic structural features. For example, mutual information functions [7], [8], autocorrelation functions [9], [10], power spectra [11], [12], “DNA walk” representation [13], Zipf analysis [14], Fourier analysis [15] and so on have revealed many interesting physical properties of DNA molecules. However, the mosaic structure of DNA sequences is one of the main obstacles to intricate statistical analysis [16]. These patches appear in the DNA walk landscapes and are likely to introduce some breaking of scale invariance [17], [18]. Wavelets have been widely applied to a variety of biomedical problems with great success [19], [20], [21]. In addition, wavelets are also well suited to visualizing patterns in DNA sequences and extracting regions of biological interest [22]. The wavelet transform has been found useful for verifying the existence of long-range correlations in intronic DNA sequences [23], and for characterizing the scaling properties of sequences [24], [25].
With the completion of the human genome sequence, comparative genomics has become a powerful approach to extracting genetic information from large stretches of nucleotide sequences. The chimpanzee, our closest living relative, may help us to understand humans thoroughly both in function and statistical features. Fourier transformation is often used to convert a signal to the frequency domain. However, there are some disadvantages that restrict its wide application. For example, Fourier transformation provides no time-localization, so it cannot indicate where in a signal a simple discontinuity occurs [26]. Compared with Fourier transformation, wavelets show advantages in analyzing signals that contain discontinuities and sharp spikes. This is mainly due to the fact that the wavelet transform incorporates in its definition two basic features, time and scale, which are important to fractal processes. Therefore, there is a growing interest in using wavelets in sequence analysis. One of the easiest ways to extract information from a DNA sequence is to “view” it. In this study, we applied 1-D binary discrete wavelet analysis with the Daubechies wavelet (db1) to DNA walks on the palindromes that contain some MAGEA and CSAG cancer/testis family genes (the so-called “MAGE/CSAG-palindrome”) [27]. Using digital signal processing tools, we try to address the pattern irregularities in the palindromes, which are often associated with biological function. Our data suggest that segmental duplications and their reverse complemented sequences, which are located in either the two arms or the spacers of the palindromes, display distinct pattern regularities under wavelet analysis. In addition, a simple evolutionary model is proposed for the evolutionary relationship between human and chimpanzee based on the symmetrical structure of the MAGE/CSAG-palindromes.

Results and discussion

The orthologous segmental repeats have higher similarities than the paralogous sequences

We performed dot matrix program alignment for the palindrome sequences in human and chimpanzee. The sequence length is 111.399 kb for the palindrome from human, and 107.974 kb for that from chimpanzee. The left and right arms of the human palindrome H_IR are 51.214 and 51.191 kb, respectively, with a spacer of 8.994 kb. The arms of the homologous X-palindrome in chimpanzee are shorter (44.294 and 44.308 kb, respectively), while the spacer is larger (19.372 kb). The arm-to-arm similarity within the palindrome is 99.7% and 99.8% in the chimpanzee and human, respectively. However, the similarity of orthologous sequences between human and chimpanzee is slightly lower: 97.4%, 97.3% and 94.3% for the left arms, right arms and the spacer regions, respectively. As seen in Figure 1, the dot plot shows that there are five segmental repeats, r1 to r5, on the MAGE/CSAG-palindrome from both human (H_r1∼H_r5) and chimpanzee (P_r1∼P_r5). These include two inverted repeats and two direct repeats on both arms of the palindrome and one segmental repeat in the spacers which is completely reversed. Using the Martinez-NW method, we calculated the similarity between these repeats. The results showed that the similarities between the repeats within the same species were mostly less than 90% (Table S1A, B), while the similarities were more than 90% between orthologous repeats of the human and chimpanzee palindromes (Table S1C). For example, the similarity between r1 and the other repeats (r2 to r5) in human is 80.60%, 82.20%, 80.60% and 99.50%, respectively, while the r1 repeats from human and chimpanzee share 97.8% similarity.
Figure 1

Sketch map of segmental repeats on the palindrome in human and chimpanzee. H_r1∼H_r5 and P_r1∼P_r5 indicate the segmental repeats (1–5) on the MAGE/CSAG-palindrome in the human (Homo sapiens) and chimpanzee (Pan troglodytes), respectively. Gray arrowheads denote the arms of the palindromes. The segmental repeats (H_r3 and P_r3) in the spacer regions are reverse complementary between human and chimpanzee (shown in the red circle). The positions of these repeats were obtained by dot matrix program alignment (H_r1: 14816∼24813, H_r2: 30428∼41411, H_r3: 51211∼61752, H_r4: 66086∼80973, H_r5: 82719∼96568; P_r1: 14765∼24744, P_r2: 30378∼40998, P_r3: 46630∼57177, P_r4: 63065∼77581, P_r5: 79327∼93220).
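The lengths reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the kb values convert exactly to base pairs and that all coordinate ranges are 1-based and inclusive (the source does not state the convention); the names `parts_sum` and `span_bp` are illustrative only.

```python
# Arm, spacer and total lengths in bp, converted from the kb values in the text.
HUMAN_BP = {"left_arm": 51214, "right_arm": 51191, "spacer": 8994, "total": 111399}
CHIMP_BP = {"left_arm": 44294, "right_arm": 44308, "spacer": 19372, "total": 107974}

# Human X chromosome interval of the palindrome, from Materials and methods.
HUMAN_X_INTERVAL = (151847041, 151958439)


def parts_sum(palindrome):
    """Length implied by left arm + spacer + right arm."""
    return palindrome["left_arm"] + palindrome["spacer"] + palindrome["right_arm"]


def span_bp(start, end):
    """Length of a coordinate range, assuming inclusive endpoints."""
    return end - start + 1


if __name__ == "__main__":
    for name, p in (("human", HUMAN_BP), ("chimpanzee", CHIMP_BP)):
        print(name, parts_sum(p), p["total"])
    print("human X interval:", span_bp(*HUMAN_X_INTERVAL))
    print("H_r1 length:", span_bp(14816, 24813))
```

Under these assumptions the arms and spacer add up to the reported totals for both species, and the human chromosomal interval spans the same 111,399 bp, which supports reading the coordinate ranges as inclusive.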

Three-dimensional DNA walks of the human and chimpanzee X palindromes

Graphical representations of DNA sequences are helpful because they allow visual observations of base patterns, sequence composition and evolution. The four-letter genome alphabet is first converted into a numerical pattern. By increasing the dimensionality of the numerical DNA sequence, the DNA walk representation is no longer limited to strictly binary classifications [13], and may be useful to reflect structural information. In this study, we adopt an image representation for nucleotides based on 3D DNA walk analysis. Similar trends of the repetitive regions on both arms of the palindromes, and dissimilar tendencies in the spacers, can be observed easily using the complex walk representation. The resulting walk sequences were plotted in the 3D Cartesian coordinate system with the index k for sequence Y plotted along the Z axis (Figure 2). The DNA walk plots provide a direct graphical representation for these direct repeats and inverted repetitive sequences. In particular, the graphical representation highlights the similarities and differences of the palindromes between human and chimpanzee. The remarkable differences were further investigated by wavelet analysis to visualize the fractal pattern along the human and chimpanzee palindromes.
Figure 2

3D DNA walks of palindromes on the X chromosomes. DNA walks for the palindromes on the human and chimpanzee X chromosomes are shown in black and blue, respectively. Here, we define the value $x(k) = 1$ for A, $-1$ for T, $j$ for C and $-j$ for G, where $j = \sqrt{-1}$. For any position k we have the cumulative sum $Y(k) = \sum_{i=1}^{k} x(i)$. Thus the sum of A − T is the real part of $Y(k)$, and the sum of C − G is the imaginary part. The base numbering is shown on the Z axis.

Wavelet analysis of DNA walks of the palindromes

Figure 3 displays the wavelet analysis of the 3D DNA walks of the MAGE/CSAG-palindromes for human and chimpanzee. The top portion of each figure, produced by the Wavelet toolbox, plots the wavelet coefficients versus the base index number, while the bottom portion is a “temperature” plot of the wavelet transform intensity versus the scale and base index number. High temperatures correspond to high intensities of the wavelet transform [22]. Regions of high intensity in the wavelet transform are correlated with segmental duplications, which contain testis/cancer MAGEA/CSAG genes. As shown in Figure 3A, regions of high intensity in the yellow boxes match these segmental repeats, which include all introns and exons of MAGE-A genes [27], such as the MAGEA 6, MAGEA 2B, MAGEA 12, MAGEA 2 and MAGEA 3 genes. However, some CSAG genes fell outside these regions, such as CSAG2, CSAG4, CSAG1 and CSAG2B. A similar phenomenon was observed in the MAGE/CSAG-palindrome of chimpanzee (Figure 3B). Indeed, these segmental repeats generate irregular patterns in the DNA walk, which show high intensity in the wavelet transform. The region of high intensity appearing on the right side of the fifth rectangular box may be due to its high GC content. This suggests that the segmental duplications with biological functions in the palindromes have a wavelet transform pattern distinct from that of their surrounding sequences without biological functions.
Figure 3

Wavelet transform analysis for 3D DNA walk of X-palindrome. A. Yellow boxes (high temperature) represent the location of segmental repeats on the human palindrome, including two repetitive segments on each arm and one in the spacer region. B. Yellow boxes represent the location of segmental repeats on the chimpanzee palindrome, including two direct repetitive segments on the left arm, one in the spacer, and two inverted repetitive segments on the right arm.

Phylogenetic relationship between the human and chimpanzee palindromes

The similarity between the whole human and chimpanzee palindrome structures is 94.5%. In order to look for evidence that the MAGE/CSAG-palindrome was present in the common ancestor of human and chimpanzee, we further analyzed the two 400 bp inner boundaries (the sequences between the left arm and the spacer, and between the spacer and the right arm) and the two 500 bp outer boundaries (the sequences between the left/right boundary and the left/right arm). The identities of the orthologous outer boundaries are remarkably high between human and chimpanzee, but the orthologous inner boundaries have very low similarities. For example, the similarity between the left outer boundary of H_IR (L_O_H) and the left outer boundary of P_IR (L_O_P) is 99.20%, but the similarity between the left inner boundary of H_IR (L_I_H) and the left inner boundary of P_IR (L_I_P) is 34.40%. By contrast, we observed extremely low similarity between the two paralogous outer/inner boundaries within the same palindrome (Table S2). For example, the similarity between L_O_H and the right outer boundary of H_IR (R_O_H) is 31.60%, and the similarity between L_O_P and the right outer boundary of P_IR (R_O_P) is also 31.60%. These findings suggest that the palindromes were already present in the common ancestor of humans and chimpanzees. Furthermore, the similarities between the orthologous arms are lower than those between the paralogous arms in both human and chimpanzee. In this study, all repetitive segments contain MAGE-A/CSAG genes, and the same gene order is maintained in the chimpanzee as in the human [28], [29]. Therefore, we speculate that the palindromes might pre-date the separation of the human and chimpanzee lineages, and that the paired arms of the palindromes evolved in concert. However, it is still a challenge for researchers to interpret the recent origin of these segments and their role in primate genome evolution [29], [30], [31], [32], [33], [34].
Segmental duplications have been shown to be associated with genome rearrangement events during species evolution [35], [36]. Unequal crossing over may have occurred between these highly similar repetitive segments after humans diverged from chimpanzees, which may have strong effects on the long-range correlation observed in the original palindromes. Therefore, a simple model of the evolutionary relationship between the MAGE/CSAG-palindromes of human and chimpanzee is proposed in Figure 4. The common ancestral palindrome may have had three symmetrical repetitive segments on each arm. When unequal chromatid exchange occurs between tandem arrays of sequence, contraction and expansion of the array can homogenize the sequence repeats. As a result, the palindromes produced have different structural patterns. This may suggest that the evolutionary mechanism and structures of palindromes on the X chromosome differ from those of the palindromes on the Y chromosome [1], [2], [37]. However, further experiments are still needed to study the structure, biological function and the relationship between human and chimpanzee.
Figure 4

Proposed evolutionary model of the MAGE/CSAG-palindromes. The common ancestral palindrome may have three symmetrical repetitive segments on each arm. The chimpanzee palindrome (A) and the human palindrome (B) were produced after unequal crossover between duplicated segments. There are two inverted repetitive segments (black regions with the right arrow) and three direct repetitive segments (black regions with the left arrow) on the chimpanzee MAGE/CSAG-palindrome. In contrast, three inverted repetitive segments and two direct repetitive segments exist on the human MAGE/CSAG-palindrome.

Conclusion

The graphic comparison of sequences by DNA walks combined with wavelet analysis may help us to better investigate and characterize their features and structures, and offer a comprehensive understanding of regions of biological interest. From the 3-D DNA walk plots, we can visualize the similarities and differences between human and chimpanzee during the evolutionary process. Further wavelet analysis indicates that the sequences with biological significance have different patterns compared to surrounding sequences, and the duplicated segments present in the palindromes reflect the evolutionary course. Based on the specific distribution of the repeats and possible events such as crossing over and gene conversion, we proposed a simple evolutionary model of the MAGE/CSAG-palindromes on the human and chimpanzee X chromosomes, although only a small number of palindromes on the human X chromosome were considered here, and our knowledge about their real evolutionary origin and hidden significance is incomplete. The model allows for a direct visualization of irregular genomic structural patterns, and may offer a new way to better understand the special structure and evolution of the X-palindromes. Further experiments are still needed to verify the particular characteristics and biological functions, as well as the phylogenetic relationship to other primates.

Materials and methods

Subjects and data

The sequences of the complete human palindrome and the chimpanzee BAC clones (AC145689, AC144384) were downloaded from the NCBI website (http://www.ncbi.nlm.nih.gov/) (NCBI, Build Number: 37.1). The palindrome is located on the human X chromosome at 151847041∼151958439 and contains some testis/cancer genes [37]. The location of the palindrome on the chimpanzee X chromosome, as well as the arms of the two palindromes, was obtained by performing dot matrix program alignment of the BAC clones [38].
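To illustrate how a dot matrix comparison can expose both direct and inverted repeats, here is a minimal word-match sketch. It is not the dot matrix program used in the study: the function names, the word size and the toy sequence are all assumptions chosen for illustration, and only the unambiguous bases A, C, G and T are handled.

```python
from collections import defaultdict

# Translation table for complementing A, C, G and T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_matrix(seq_a, seq_b, word=4):
    """Return (direct, inverted) sets of (i, j) word matches.

    i and j are 0-based start positions in seq_a and seq_b. A direct dot
    means the two words are identical; an inverted dot means the word in
    seq_b is the reverse complement of the word in seq_a.
    """
    index = defaultdict(list)
    for i in range(len(seq_a) - word + 1):
        index[seq_a[i:i + word]].append(i)

    direct, inverted = set(), set()
    for j in range(len(seq_b) - word + 1):
        w = seq_b[j:j + word]
        for i in index.get(w, ()):
            direct.add((i, j))
        for i in index.get(reverse_complement(w), ()):
            inverted.add((i, j))
    return direct, inverted


if __name__ == "__main__":
    # Toy palindrome: arm + spacer + reverse complement of the arm.
    arm = "GATTACAGG"
    toy = arm + "TTT" + reverse_complement(arm)
    direct, inverted = dot_matrix(toy, toy)
    print(sorted(inverted))
```

In a self-comparison like this, the direct dots fall on the main diagonal, while the two arms of a palindrome show up as inverted dots running along an anti-diagonal.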

DNA walk analysis

The random 2D DNA walk plot provides a tool to exhibit periodic patterns in a sequence [13]. By increasing the dimensionality of the numerical DNA sequence, the graphical representation is no longer limited strictly to binary classifications. Here we adopt a distinct representation for nucleotides based on their mapping into the four cardinal points of the complex plane. Let $x(k)$, $k = 1, \ldots, N$, denote a DNA sequence of length N. For a position k within the DNA sequence, we define the value $x(k) = 1$ for A, $-1$ for T, $j$ for C and $-j$ for G, where $j = \sqrt{-1}$. Furthermore, let us denote the 3D DNA walk sequence as $Y(k)$, where for any position k we have a cumulative sum of the $x(i)$ for 1 ⩽ i ⩽ k described by:

$$Y(k) = \sum_{i=1}^{k} x(i)$$

In this method, four cardinal directions in the (x, y) coordinate system are chosen to represent the content of the four bases in DNA sequences. Among the existing methods of DNA sequence visualization, the 3D walk [39] is the most popular method.
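The walk construction described above can be sketched in a few lines. The step values are inferred from the Figure 2 caption (A − T on the real axis, C − G on the imaginary axis); treating any other symbol, such as N, as a zero step is an assumption of this sketch, and the function names are illustrative.

```python
from itertools import accumulate

# Step per base: A/T move along the real axis, C/G along the imaginary axis.
STEP = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def dna_walk(seq):
    """Return the cumulative complex walk, one value per base of seq."""
    # Assumption: ambiguous symbols (e.g. N) contribute a zero step.
    steps = (STEP.get(base, 0j) for base in seq.upper())
    return list(accumulate(steps))


def walk_points(seq):
    """Return (x, y, z) points for a 3D plot, with z the 1-based base index."""
    return [(y.real, y.imag, k) for k, y in enumerate(dna_walk(seq), start=1)]


if __name__ == "__main__":
    for point in walk_points("AACCGT"):
        print(point)
```

The real part of the final walk value equals the count of A minus the count of T, and the imaginary part equals the count of C minus the count of G, so compositional drift along a sequence appears as a change in direction of the plotted curve.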

Wavelet-based analysis

Wavelet-based tools are well suited to multi-resolution analysis and local feature extraction of non-stationary signals, such as locating different patterns in genome sequences [40]. The continuous wavelet transform (CWT) of a signal $f(t)$ with respect to the wavelet $\psi(t)$ is defined as:

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt$$

where a (the scale parameter) > 0 and b (the translation parameter) is a real number. Generally, by using the CWT with discrete values of a and b, the discrete wavelet transform (DWT) is simply determined. Here we use the 1-D Daubechies wavelet (db1) in a binary discrete wavelet analysis.
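The db1 wavelet is the Haar wavelet, so one level of the discrete transform reduces to scaled pairwise sums and differences. The sketch below is a pure-Python illustration of that idea, not the Wavelet toolbox routine used in the study; the function names, the sign convention for the detail coefficients and the repeat-last-sample padding for odd lengths are assumptions.

```python
import math


def haar_dwt(signal):
    """One level of the db1 (Haar) DWT: return (approximation, detail)."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # assumption: pad odd lengths by repetition
    scale = 1 / math.sqrt(2)
    approx = [(values[i] + values[i + 1]) * scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) * scale for i in range(0, len(values), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Apply haar_dwt repeatedly to the approximation (dyadic scales).

    Returns the final approximation and a list of detail coefficient lists,
    finest scale first.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    # A flat stretch followed by a jump: only the pair spanning the jump
    # produces a non-zero detail coefficient.
    walk = [0, 0, 0, 4, 4, 4, 4, 4]
    approx, details = haar_decompose(walk, levels=3)
    print(details[0])
```

Large detail coefficients mark positions where the walk changes abruptly at a given scale, which is the kind of local irregularity the temperature plots in Figure 3 display. The functions accept real values, so the real and imaginary parts of a complex walk can be analyzed separately.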

Authors’ contributions

YQ conceived the idea, collected the datasets, carried out the analysis, interpreted the data and drafted the manuscript. NJ provided the corresponding computer programs about wavelet transform, and revised the manuscript. DA revised the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors have declared that no competing interests exist.

References (10 of 33 listed)

1.  Finding pathogenicity islands and gene transfer events in genome data.

Authors:  P Liò; M Vannucci
Journal:  Bioinformatics       Date:  2000-10       Impact factor: 6.937

Review 2.  Recent duplication, domain accretion and the dynamic mutation of the human genome.

Authors:  E E Eichler
Journal:  Trends Genet       Date:  2001-11       Impact factor: 11.639

3.  Recent segmental duplications in the human genome.

Authors:  Jeffrey A Bailey; Zhiping Gu; Royden A Clark; Knut Reinert; Rhea V Samonte; Stuart Schwartz; Mark D Adams; Eugene W Myers; Peter W Li; Evan E Eichler
Journal:  Science       Date:  2002-08-09       Impact factor: 47.728

4.  Inverted repeat structure of the human genome: the X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes.

Authors:  Peter E Warburton; Joti Giordano; Fanny Cheung; Yefgeniy Gelfand; Gary Benson
Journal:  Genome Res       Date:  2004-10       Impact factor: 9.043

5.  Evolution of long-range fractal correlations and 1/f noise in DNA base sequences.

Authors: 
Journal:  Phys Rev Lett       Date:  1992-06-22       Impact factor: 9.161

6.  Wavelet analysis of DNA walks.

Authors:  Adrian D Haimovich; Bruce Byrne; Ramakrishna Ramaswamy; William J Welsh
Journal:  J Comput Biol       Date:  2006-09       Impact factor: 1.479

7.  Long-range correlations in nucleotide sequences.

Authors:  C K Peng; S V Buldyrev; A L Goldberger; S Havlin; F Sciortino; M Simons; H E Stanley
Journal:  Nature       Date:  1992-03-12       Impact factor: 49.962

8.  A simple way to look at DNA.

Authors:  M A Gates
Journal:  J Theor Biol       Date:  1986-04-07       Impact factor: 2.691

Review 9.  Segmental duplications and the evolution of the primate genome.

Authors:  Rhea Vallente Samonte; Evan E Eichler
Journal:  Nat Rev Genet       Date:  2002-01       Impact factor: 53.242

10.  Coordinated expression of clustered cancer/testis genes encoded in a large inverted repeat DNA structure.

Authors:  Anne Bredenbeck; Verena M Hollstein; Uwe Trefzer; Wolfram Sterry; Peter Walden; Florian O Losch
Journal:  Gene       Date:  2008-02-29       Impact factor: 3.688
