
Introduction

The discovery of the structure of deoxyribonucleic acid (DNA) is a fundamental topic of 20th-century biology, and it remains an active question for contemporary scientists. DNA is a long linear polymer formed by a large number of nucleotides. Each nucleotide contains a phosphate group, a sugar group and a nitrogenous base. The four types of nitrogenous bases are adenine (A), thymine (T), guanine (G) and cytosine (C); in RNA, uracil (U) takes the place of thymine. DNA adopts a double-helix form, a helical structure made up of two complementary strands of nucleic acid. For this reason, DNA is often called the double helix.

In short, DNA is a macromolecule whose genetic information is stored in the sequence of its bases. The order of the bases therefore determines the information available for the production of RNA and proteins.

Powerful methods for DNA sequencing have been developed. These methods are used to sequence complete genomes, from the small genomes of viruses or fungi to large ones such as the human genome, which is made up of about 3000 million base pairs [1].

The directories of the National Center for Biotechnology Information (NCBI) hold databases that contain complete genomes, complete chromosome sequences, mRNA sequences and proteins. The interest in analysing these large DNA databases in the context of nonlinear dynamics goes back to the work of Jeffrey [2], who proposed a graphical representation of the sequences via an extended chaos game. Other contributions, also based on a statistical description of DNA sequences, take a more structured form, such as that of Zu-Guo Yu et al. [3], where the generalized dimensions $D_q$ and the derived 'analogous' specific heat $C_q$ are calculated for the coding and noncoding length sequences of bacteria.

The multifractal formalism was originally proposed to study various chaotic models derived from phenomena associated with turbulence [4]. The theory describes fractal measures as composed of interwoven fractal sets, each characterized by a Hölder exponent, and it has become a crucial tool in the analysis of statistical data.

Section 2 gives the theoretical support of the DNA chaos game, paraphrasing Barnsley's ideas on the definition of measures on fractals. Section 3 presents a didactic example that applies these ideas to a very small genome. Sections 4 and 5 use them to give a multifractal interpretation of DNA sequences. In Section 5 we discuss the coarse-grained multifractal theory and, using the curdling theorem, obtain the alternative definition of the singularity spectrum used in the Chhabra-Jensen algorithm. In Section 6 we show the multifractal spectra for the six-mers of two bacteria, two archaea, Homo sapiens chromosome 21 and a fungus, and discuss the information contained in the singularity spectra of these six DNA sequences.

DNA Chaos Game

We construct a unit square Q whose corners are labelled with the four bases of the genomic sequence, V = {A,C,G,T}, where V denotes the vertices of Q on the Cartesian plane: A = (0,0), C = (0,1), G = (1,1) and T = (1,0). Applying Jeffrey's "chaos game algorithm" yields the Chaos Game Representation (CGR) [2]. It provides an objective means of studying DNA sequences in the context of fractal geometry and nonlinear science.

The construction of the CGR of an organism's DNA sequence begins with the choice of an arbitrary point inside the square Q. Let $P_0 = (x_0, y_0)$ be the starting point; for simplicity, we take the midpoint of Q. Let $V = (x_V, y_V)$ be the corner of Q labelled by the first base of the sequence. The next point $P_1 = (x_1, y_1)$ is the midpoint of the segment $P_0V$, with coordinates given by

$$\begin{aligned} x_1 &= \frac{1}{2}\left( x_0 + x_V \right) \\ y_1 &= \frac{1}{2}\left( y_0 + y_V \right) \end{aligned} \tag{1}$$

The matrix function of this process can be defined by

$$P_1 = \omega_V\left( P_0 \right) = \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} + \begin{pmatrix} \frac{x_V}{2} \\ \frac{y_V}{2} \end{pmatrix}. \tag{2}$$

The next points are produced by successive applications of the same process.

$$P_n = \omega_{V_n}\left( P_{n-1} \right) \quad \text{with} \quad P_0 = \begin{pmatrix} \frac{1}{2} \\ \frac{1}{2} \end{pmatrix}. \tag{3}$$

It means that the DNA sequence can be expressed as a sequence of points produced by iterative applications of (3). Notice that the graph of the set of points Pn forms the CGR.
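The iteration (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `cgr_points` is hypothetical, and skipping symbols other than A, C, G, T (such as the ambiguity code N) is an assumption made here for convenience.

```python
# Corners of the unit square Q, as defined in the text.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(sequence, start=(0.5, 0.5)):
    """Return the chaos game points P_1..P_N of a DNA string."""
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # assumption: ignore symbols other than A, C, G, T
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the corner of this base.
        x, y = 0.5 * (x + cx), 0.5 * (y + cy)
        points.append((x, y))
    return points
```

Plotting the returned points as a scatter plot gives the CGR. Floating-point numbers are adequate for plotting, but very long runs of a single base can round a coordinate onto the boundary of Q, so exact or integer arithmetic is safer when points are assigned to grid cells.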

We plot the CGR of six different organisms in Fig. 1. The plots show visually that the DNA sequences have fractal patterns and self-affine characteristics, which suggests that a DNA sequence is not a random sequence.

Fig. 1

The CGR of two bacteria (Pirellula staleyi and Escherichia coli), two archaea (Nanoarchaeota archaeon and Halobacterium salinarum), Homo sapiens chromosome 21 and a fungus, Encephalitozoon intestinalis.

An iterated function system (IFS) is associated with a DNA sequence. We construct four affine transformations [5] to explore regions in Q.

$$\begin{aligned} \omega_A &= \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix} \\ \omega_C &= \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 0 \\ \frac{1}{2} \end{pmatrix} \\ \omega_G &= \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \frac{1}{2} \\ \frac{1}{2} \end{pmatrix} \\ \omega_T &= \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \frac{1}{2} \\ 0 \end{pmatrix} \end{aligned} \tag{4}$$

It is important to note that there is a relationship between the chaos game algorithm and the Bernoulli mapping, which follows from the properties of the chaos game. The sequence of points $P = \{P_1, P_2, \ldots, P_n\}$ and the DNA sequence are in one-to-one correspondence, so P is unique for each genome. Knowing the coordinates of $P_n$, the coordinates of $P_{n-1}$ can be determined by using a binary representation of the x and y coordinates. It is possible to show that the inverse relationship of (1) is:

$$\begin{aligned} x_{n-1} &= 2x_n \bmod 1 \\ y_{n-1} &= 2y_n \bmod 1 \end{aligned} \tag{5}$$

Fig. 2

Bernoulli map.

Any point of P encodes all the bases of the DNA sequence up to that point. Knowing $P_n$, all the points prior to it can be calculated, and since each point corresponds to a base of the genome, all the bases up to the n-th can be recovered.
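The following sketch illustrates this one-to-one relation by decoding a sequence from its last chaos game point with the Bernoulli map (5). It assumes the starting point $P_0 = (1/2, 1/2)$ and uses exact rational arithmetic from the standard library; the names `encode` and `decode` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
# Quadrant of Q (right half?, upper half?) -> base whose corner lies there.
QUADRANT = {(0, 0): "A", (0, 1): "C", (1, 1): "G", (1, 0): "T"}

def encode(sequence):
    """Final chaos game point P_N, computed exactly."""
    x, y = HALF, HALF
    for base in sequence:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y

def decode(point):
    """Recover the whole sequence from P_N by iterating the Bernoulli map."""
    x, y = point
    bases = []
    while (x, y) != (HALF, HALF):  # stop when the starting point P_0 is reached
        bases.append(QUADRANT[(int(x > HALF), int(y > HALF))])
        x, y = (2 * x) % 1, (2 * y) % 1  # inverse step, equation (5)
    return "".join(reversed(bases))
```

For example, `decode(encode("GAATC"))` returns the original string. Exact fractions are used because the decoding relies on every binary digit of the coordinates, which floating-point numbers cannot hold for long sequences.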

The Hutchinson operator of (4) is $W\left( B \right) = \omega_A\left( B \right) \cup \omega_C\left( B \right) \cup \omega_G\left( B \right) \cup \omega_T\left( B \right)$, where $B \subset Q$. Applying each function to Q gives a square $Q_V = \omega_V(Q)$ of side 1/2. The action of the Hutchinson operator on Q reproduces the unit square, because the union of the four squares of side 1/2 is again Q. In other words, Q is invariant under W [6]:

$$Q = W\left( Q \right) \tag{6}$$

When we apply W twice to Q, we have

$$W^2\left( Q \right) = \bigcup_{V_2} \bigcup_{V_1} \omega_{V_2} \circ \omega_{V_1}\left( Q \right) = \bigcup_{V_2} \bigcup_{V_1} Q_{V_2 V_1} \tag{7}$$

In this way, a grid of $4^2$ squares is obtained, each with side $(1/2)^2$. After m applications of W, we obtain $4^m$ sub-squares of side $(1/2)^m$, which together make up Q.
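The action of the Hutchinson operator on squares can be checked with a short sketch. Here a square is represented by its lower-left corner and its side, which is a representation chosen for this illustration only, and the function name `hutchinson` is hypothetical.

```python
# Translation part of each affine map in (4).
OFFSETS = {"A": (0.0, 0.0), "C": (0.0, 0.5), "G": (0.5, 0.5), "T": (0.5, 0.0)}

def hutchinson(squares):
    """Apply W once to a list of squares given as (x0, y0, side)."""
    images = []
    for x0, y0, side in squares:
        for dx, dy in OFFSETS.values():
            # Each map halves the square and shifts it into one quadrant of Q.
            images.append((x0 / 2 + dx, y0 / 2 + dy, side / 2))
    return images

unit_square = [(0.0, 0.0, 1.0)]
level_2 = hutchinson(hutchinson(unit_square))  # 16 squares of side 1/4
```

Applying the function m times to the unit square returns $4^m$ squares whose areas add up to 1, in agreement with the invariance of Q under W.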

We consider subregions of Q instead of chaos game points because $P_0$ is the midpoint of Q, $P_1 = \omega_{V_1}(P_0)$ is the midpoint of $Q_{V_1}$, $P_2 = \omega_{V_2} \circ \omega_{V_1}(P_0)$ is the midpoint of $Q_{V_2 V_1}$, and so on. Thus, $P_N = \omega_{V_N} \circ \cdots \circ \omega_{V_1}(P_0)$ is the midpoint of $Q_{V_N \cdots V_1}$.

Then, each DNA sequence generates a large collection of squares of sides $\left[ (1/2), (1/2)^2, \ldots, (1/2)^N \right]$. The squares satisfy the following property:

$$Q_{V_N V_{N-1} \ldots V_2 V_1} \subset Q_{V_N V_{N-1} \ldots V_2} \subset \cdots \subset Q_{V_N V_{N-1}} \subset Q_{V_N} \subset Q. \tag{8}$$

Consequently, the number of occurrences of a monomer V ∈ {A,C,G,T} in a DNA sequence matches the number of points inside $Q_V$, and the number of occurrences of a dimer $V_1 V_2$ matches the number of points inside $Q_{V_2 V_1}$. In general, the number of occurrences of an R-mer $V_1 V_2 \cdots V_R$ is given by the number of points inside $Q_{V_R \cdots V_2 V_1}$.
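This correspondence between R-mer counts and points per sub-square can be verified numerically. The sketch below is an illustration under stated assumptions: the grid cell of a point is obtained by truncating its coordinates to R binary digits, only points $P_n$ with n ≥ R are counted (earlier points still depend on the arbitrary $P_0$), and all function names are hypothetical.

```python
from collections import Counter

CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def cell_of_word(word):
    """Grid cell (column, row) of Q_{V_R...V_1} for the R-mer V_1...V_R."""
    ix = iy = 0
    for base in reversed(word):  # V_R selects the coarsest quadrant
        cx, cy = CORNERS[base]
        ix, iy = 2 * ix + cx, 2 * iy + cy
    return ix, iy

def cells_from_points(sequence, R):
    """Count chaos game points P_n (n >= R) per cell of the 2^R x 2^R grid."""
    x = y = 0.5
    cells = Counter()
    for n, base in enumerate(sequence, start=1):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        if n >= R:
            cells[(int(x * 2**R), int(y * 2**R))] += 1
    return cells

def cells_from_words(sequence, R):
    """Count R-mers with a sliding window and map each word to its cell."""
    cells = Counter()
    for i in range(len(sequence) - R + 1):
        cells[cell_of_word(sequence[i:i + R])] += 1
    return cells
```

For a short test string the two counters coincide, so in practice the R-mer statistics can be computed with the sliding window alone, without generating the points.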

The assertion that a DNA sequence is not a random sequence calls for statistical techniques to analyse the different distributions found in DNA sequences.

A simple example

This section illustrates how the chaos game works. We use a playful model to illustrate the theoretical support of the DNA chaos game: the short DNA sequence GAATC serves as a toy genome.

Let us take our toy genome GAATC. As pointed out in (2), the sequence of points of GAATC is $\{P_1 = \omega_G(P_0), P_2 = \omega_A(P_1), P_3 = \omega_A(P_2), P_4 = \omega_T(P_3), P_5 = \omega_C(P_4)\}$; its graph, or CGR, is shown in figure 3(a).
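As a worked check, and assuming the starting point $P_0 = (1/2, 1/2)$ together with the corner coordinates given for Q, the five midpoint steps can be evaluated by hand:

$$\begin{aligned} P_1 &= \omega_G(P_0) = \left( \tfrac{3}{4}, \tfrac{3}{4} \right), & P_2 &= \omega_A(P_1) = \left( \tfrac{3}{8}, \tfrac{3}{8} \right), & P_3 &= \omega_A(P_2) = \left( \tfrac{3}{16}, \tfrac{3}{16} \right), \\ P_4 &= \omega_T(P_3) = \left( \tfrac{19}{32}, \tfrac{3}{32} \right), & P_5 &= \omega_C(P_4) = \left( \tfrac{19}{64}, \tfrac{35}{64} \right). \end{aligned}$$

The denominators double at every step, so each new point carries one more binary digit per coordinate than the previous one.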

Fig. 3

(a): The CGR of GAATC. (b): Application of the Hutchinson operator to the square.

The application of the Hutchinson operator to Q and the IFS (4) is illustrated in figure 3(b). Figure 4 shows the description of the subregions for this small sequence, for which we obtain sub-squares of side $\left[ (1/2), (1/2)^2, (1/2)^3, (1/2)^4, (1/2)^5 \right]$. A tiny sequence offers a didactic point of view that may motivate readers to set up their own games on the computer.

Fig. 4

Description of the subregions of GAATC: (a) application of the functions, (b) localization.

We must emphasize a subtle distinction between two processes. The first is the application of the functions: the maps (4) are applied in the order in which the characters appear in the sequence, as described graphically in figure 4(a). The second is the localization of sub-squares: a given sub-square can be found using property (8), which means that the sequence must be read in reverse order, as shown in figure 4(b).
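The two readings can be compared directly in code. In this sketch a sub-square is described by its lower-left corner and side; `by_application` composes the maps in reading order, while `by_localization` zooms into nested quadrants in reverse order. Both function names and the square representation are assumptions made for illustration.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def by_application(word):
    """Apply the maps of (4) in reading order to the unit square."""
    x = y = 0.0
    side = 1.0
    for base in word:
        cx, cy = CORNERS[base]
        x, y, side = (x + cx) / 2, (y + cy) / 2, side / 2
    return x, y, side

def by_localization(word):
    """Locate the same sub-square by nested quadrants, reading in reverse."""
    x = y = 0.0
    side = 1.0
    for base in reversed(word):
        cx, cy = CORNERS[base]
        side /= 2
        # The base picks one quadrant of the current square.
        x, y = x + cx * side, y + cy * side
    return x, y, side
```

For GAATC both functions return the square with corner (0.28125, 0.53125) and side 1/32, whose midpoint is the last chaos game point of the toy genome.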

DNA sequence’s singular measure

Figure 1 shows that a fractal F can be generated by a given DNA sequence, and that F is contained in Q. From the graphical point of view, the CGR of human chromosome 21 is a practical demonstration of self-similarity. If we look closely at Fig. 1(d), we notice that the upper right quadrant $Q_G$ is repeated in every subregion of Q, but at reduced scales and in different tonalities. The tonalities vary in a nontrivial manner, which can be related to a complex measure. This is our motivation for using multifractal theory.

The description of subregions provides a systematic way of constructing a cover $C_R$, called the optimal cover of F. The cover $C_R$ is composed of squares $Q_R$ of side $\lambda_R$; each $Q_R$ has a statistical measure $\mu(Q_R) \neq 0$ given by the frequency with which the DNA sequence visits $Q_R$. The point of this description is that each $Q_R$ of the cover $C_R$ is associated with an R-mer of the DNA sequence.

In summary, an R-mer is a DNA subsequence of length R, that is, a subsequence with R bases. If it occurs within the DNA sequence, its probabilistic measure depends on the number of times it appears in the sequence. The cover $C_R$ gives the number of all possible different R-mers: $4^R$.

To obtain the frequency of each R-mer in a DNA sequence with N bases, we denote by $N_{V_1 V_2 \ldots V_R}$ the number of times the R-mer $V_1 V_2 \ldots V_R$ appears in the whole sequence. The total number of R-mers in the entire sequence is N − R + 1. Therefore, the frequency of a given subsequence is

$$F_{V_1 V_2 \ldots V_R} = \frac{N_{V_1 V_2 \ldots V_R}}{N - R + 1} \tag{9}$$

Since the R-mer lies inside the square $Q_{V_1 V_2 \ldots V_R}$, its probabilistic measure μ and the side λ of the square are:

$$\mu_{V_1 V_2 \ldots V_R} = F_{V_1 V_2 \ldots V_R} \quad \text{and} \quad \lambda\left( V_1 V_2 \ldots V_R \right) = \left( \frac{1}{2} \right)^R \tag{10}$$
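The frequency (9) amounts to a sliding-window count. A minimal sketch, with the hypothetical function name `rmer_measure` and a plain Python string as input, could be:

```python
from collections import Counter

def rmer_measure(sequence, R):
    """Frequency F of every R-mer that occurs in the sequence, equation (9)."""
    total = len(sequence) - R + 1  # number of overlapping windows, N - R + 1
    if total <= 0:
        raise ValueError("R must not exceed the sequence length")
    counts = Counter(sequence[i:i + R] for i in range(total))
    return {word: count / total for word, count in counts.items()}
```

R-mers that never occur are simply absent from the returned dictionary, which matches the convention that only squares with nonzero measure belong to the cover.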

To understand the statistical behaviour of R-mers more thoroughly, an intermediate step is needed in which we establish a one-to-one relationship between sub-squares of Q and sub-intervals of the unit interval [6].

To this end, we use a quaternary (base-4) representation that associates each nitrogenous base with a digit: A = 0, C = 1, G = 2 and T = 3, so that the DNA sequence becomes a sequence of digits. This makes it easy to calculate the frequencies in a programming language (Python 3.6.0). The histograms of human chromosome 21 are plotted in figure 5. They show the same behavioural patterns as the measure of a multiplicative process.

Fig. 5

Histograms of human chromosome 21 for R = 1 to R = 6.
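A histogram of this kind can be sketched as follows. The digit assignment A = 0, C = 1, G = 2, T = 3 comes from the text; treating the first base of an R-mer as the most significant base-4 digit is an assumption of this illustration, as are the function names.

```python
DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}

def word_index(word):
    """Read an R-mer as a base-4 integer, first base most significant."""
    index = 0
    for base in word:
        index = 4 * index + DIGIT[base]
    return index

def histogram(sequence, R):
    """Frequencies of all 4^R possible R-mers, ordered by base-4 index."""
    total = len(sequence) - R + 1
    counts = [0] * 4**R
    for i in range(total):
        counts[word_index(sequence[i:i + R])] += 1
    return [count / total for count in counts]
```

The bin with index k corresponds to the sub-interval $[k/4^R, (k+1)/4^R)$ of the unit interval, so plotting the list against k reproduces a histogram of the type shown in figure 5.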

Multifractal analysis

In this section we use the coarse-grained multifractal theory to characterize the different R-mers contained in the DNA sequence. Each R-mer is associated with a sub-square $Q_R$ of side $(1/2)^R$, and these sub-squares make up the cover $C_R = \{Q_R\}$ of the fractal F generated by the DNA sequence. Each $Q_R$ has a coarse-grained Hölder exponent, given by:

$$\alpha\left( Q_{V_1 V_2 \ldots V_R} \right) = \frac{\ln \mu\left( Q_{V_1 V_2 \ldots V_R} \right)}{\ln \lambda\left( Q_{V_1 V_2 \ldots V_R} \right)} = \frac{\ln \mu\left( Q_{V_1 V_2 \ldots V_R} \right)}{\ln \lambda_R} \tag{11}$$

The fractal F is covered by interwoven subsets $J_\alpha$, which are defined as:

$$J_\alpha = \left\{ Q_{V_1 V_2 \ldots V_R} \,\middle|\, \alpha\left( Q_{V_1 V_2 \ldots V_R} \right) \in \left( \alpha, \alpha + \Delta\alpha \right) \right\} \tag{12}$$

The pre-fractal dimension of each $J_\alpha$ is given by

$$f_R\left( \alpha \right) = -\frac{\ln N_R\left( \alpha \right)}{\ln \lambda_R} \tag{13}$$

where $N_R(\alpha)$ is the cardinality of $J_\alpha$. The multifractal spectrum of the cover is the set of pre-fractal dimensions of all the subsets $J_\alpha$. We determine these spectra using the Chhabra-Jensen algorithm [7].
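Before turning to that algorithm, definitions (11) to (13) can be evaluated directly by binning the exponents. The sketch below is only a direct histogram estimate, not the Chhabra-Jensen method; the bin width `d_alpha` and the function names are assumptions, and `measure` is taken to be a dictionary mapping each R-mer to its frequency.

```python
import math
from collections import Counter

def holder_exponents(measure, R):
    """Coarse-grained exponent ln(mu) / ln(lambda_R), with lambda_R = 2**-R."""
    log_lambda = -R * math.log(2)
    return {w: math.log(mu) / log_lambda for w, mu in measure.items() if mu > 0}

def histogram_spectrum(measure, R, d_alpha=0.1):
    """Pairs (alpha, f_R(alpha)), one per occupied bin of width d_alpha."""
    log_lambda = -R * math.log(2)
    alphas = holder_exponents(measure, R).values()
    bins = Counter(math.floor(a / d_alpha) for a in alphas)  # N_R(alpha) per bin
    return [(k * d_alpha, -math.log(n) / log_lambda) for k, n in sorted(bins.items())]
```

The result depends on the choice of bin width, which is one reason to prefer a method that avoids explicit binning.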

More details of the algorithm than we present in this paper can be found in [8]. The definition of the escort probability measure of order q of $\mu(Q_{V_1 V_2 \ldots V_R})$ is fundamental to this algorithm; it is given by [9]:

$$\mu\left( q, Q_{V_1 V_2 \ldots V_R} \right) = \frac{\left[ \mu\left( Q_{V_1 V_2 \ldots V_R} \right) \right]^q}{Z_R\left( q \right)} \tag{14}$$

where

$$Z_R\left( q \right) = \sum_j \left[ \mu\left( Q_{V_1 V_2 \ldots V_R} \right) \right]^q \tag{15}$$

The sum over j is taken over all R-mers with a measure different from zero. Taking into account that:

$$\text{if} \quad Q_{V_1 V_2 \ldots V_R} \subset J_\alpha \quad \text{then} \quad \mu_\alpha\left( Q_{V_1 V_2 \ldots V_R} \right) = \lambda_R^{\alpha} \tag{16}$$

and that $Z_R(q)$ satisfies the power law:

$$Z_R\left( q \right) = \lambda_R^{\tau\left( q \right)} \quad \text{or} \quad \tau\left( q \right) = \frac{\ln Z_R\left( q \right)}{\ln \lambda_R} \tag{17}$$

The q-measure of a set $J_\alpha$ is given by:

$$\mu\left( q, J_\alpha \right) = N_R\left( \alpha \right) \mu_\alpha\left( q, Q_{V_1 V_2 \ldots V_R} \right) = \lambda_R^{-f_R\left( \alpha \right)} \frac{\left[ \mu\left( Q_{V_1 V_2 \ldots V_R} \right) \right]^q}{Z_R\left( q \right)} = \lambda_R^{q\alpha - \tau\left( q \right) - f_R\left( \alpha \right)} \tag{18}$$

As the q-measure only takes values between zero and one and $\lambda_R < 1$, the exponent in (18) must be non-negative, so

$$\tau\left( q \right) \le q\alpha - f\left( \alpha \right) \tag{19}$$

For every value of $\lambda_R$, the q-measure attains its maximum at a special value $\alpha^*(q)$, which satisfies:

$$\tau\left( q \right) = q\alpha^*\left( q \right) - f\left( \alpha^*\left( q \right) \right) \tag{20}$$

We note that (19) and (20) imply that

$$\tau\left( q \right) = \inf_{\alpha} \left( q\alpha - f\left( \alpha \right) \right) \tag{21}$$

where the infimum is taken over all possible values of α. This expression establishes that τ(q) and f(α) form a Legendre transform pair.
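The quantities entering this derivation are straightforward to compute from a table of R-mer frequencies. The sketch below evaluates the partition function (15), the escort measure (14) and τ(q) from (17), with $\lambda_R = 2^{-R}$; the function names are hypothetical, and the direct use of powers is an assumption that is adequate only for moderate values of q.

```python
import math

def partition_function(measure, q):
    """Z_R(q): sum of mu**q over the R-mers with nonzero measure."""
    return sum(mu ** q for mu in measure.values() if mu > 0)

def escort_measure(measure, q):
    """Escort probability of order q for every R-mer with nonzero measure."""
    z = partition_function(measure, q)
    return {w: mu ** q / z for w, mu in measure.items() if mu > 0}

def tau(measure, R, q):
    """Mass exponent tau(q) = ln Z_R(q) / ln lambda_R."""
    return math.log(partition_function(measure, q)) / (-R * math.log(2))
```

For any normalized measure, `tau(measure, R, 1)` is zero because $Z_R(1) = 1$. For large |q| the powers can overflow or underflow, and working with logarithms is the safer choice.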

The behaviour of the q-measure around $\alpha^*(q)$ is found from a logarithmic expansion of (18):

$$\ln \mu\left( q, J_\alpha \right) = \ln \mu\left( q, J_{\alpha^*} \right) + \left( \alpha - \alpha^* \right) \left\{ q - \left. \frac{df\left( \alpha \right)}{d\alpha} \right|_{\alpha^*} \right\} \ln \lambda_R + \frac{\left( \alpha - \alpha^* \right)^2}{2} \left\{ -\left. \frac{d^2 f\left( \alpha \right)}{d\alpha^2} \right|_{\alpha^*} \right\} \ln \lambda_R \tag{22}$$

Since $\alpha^*(q)$ is a maximum, the first-order term vanishes and the second-order term is negative:

$$q = \left. \frac{df\left( \alpha \right)}{d\alpha} \right|_{\alpha^*\left( q \right)} \quad \text{and} \quad -\ln \lambda_R \left. \frac{d^2 f\left( \alpha \right)}{d\alpha^2} \right|_{\alpha^*\left( q \right)} < 0 \tag{23}$$

Therefore, the q-measure for values of α near $\alpha^*(q)$ takes the form:

$$\mu\left( q, J_\alpha \right) = \frac{1}{\sqrt{2\pi \sigma_\alpha^2}} \exp\left( -\frac{\left( \alpha - \alpha^*\left( q \right) \right)^2}{2\sigma_\alpha^2} \right) \quad \text{with} \quad \sigma_\alpha^2 = \frac{1}{\left| f''\left( \alpha^*\left( q \right) \right) \ln \lambda_R \right|} \tag{24}$$

This shows that the q-measure concentrates around $J_{\alpha^*(q)}$, and when $\lambda_R \to 0$, $J_{\alpha^*}$ is the support of the q-measure, i.e.

$$\lim_{\lambda_R \to 0} \mu\left( q, J_\alpha \right) = \delta\left( \alpha - \alpha^*\left( q \right) \right) \tag{25}$$

Hence $J_{\alpha^*}$ is the curdling set of the q-measure, and the result (25) is called the curdling theorem [8].

The value of $\alpha^*(q)$ is found by taking the derivative of (20) with respect to q and using (23):

$$\alpha^*\left( q \right) = \frac{d\tau\left( q \right)}{dq} = \frac{1}{\ln \lambda_R} \frac{d \ln Z_R\left( q \right)}{dq} \tag{26}$$

where (17) was used in the last equality of (26). Finally, using (14) and (15), we obtain:

$$\alpha^*\left( q \right) = \frac{1}{\ln \lambda_R} \sum_{V_1 V_2 \ldots V_R} \mu\left( q, Q_{V_1 V_2 \ldots V_R} \right) \ln \mu\left( Q_{V_1 V_2 \ldots V_R} \right) \tag{27}$$

The fractal dimension of $J_{\alpha^*}$ is determined using (20), i.e.

$$f\left( \alpha^*\left( q \right) \right) = q\alpha^*\left( q \right) - \tau\left( q \right) = q\alpha^*\left( q \right) - \frac{\ln Z_R\left( q \right)}{\ln \lambda_R} \tag{28}$$

Introducing (27) into the last expression, we obtain:

$$f\left( \alpha^*\left( q \right) \right) = \frac{1}{\ln \lambda_R} \sum_{V_1 V_2 \ldots V_R} \mu\left( q, Q_{V_1 V_2 \ldots V_R} \right) \ln \mu\left( q, Q_{V_1 V_2 \ldots V_R} \right) \tag{29}$$

The relations (27) and (29) are the alternative definition of the singularity spectrum used in the Chhabra-Jensen algorithm. The curve $f_R(\alpha)$ versus α is obtained when the parameter q is eliminated, but these expressions contain more information about the singularity spectra, as we discuss in the next section.
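Relations (27) and (29) translate into a short routine. The following is a sketch under stated assumptions rather than the authors' implementation: `measure` is a dictionary of R-mer frequencies, $\lambda_R = 2^{-R}$, the function name `chhabra_jensen` is hypothetical, and the escort weights are computed in the logarithmic domain to avoid overflow for large |q|.

```python
import math

def chhabra_jensen(measure, R, q_values):
    """Return a list of (q, alpha*(q), f(alpha*(q))) from (27) and (29)."""
    log_lambda = -R * math.log(2)
    log_mu = [math.log(mu) for mu in measure.values() if mu > 0]
    spectrum = []
    for q in q_values:
        scaled = [q * lm for lm in log_mu]  # ln(mu**q) for every R-mer
        top = max(scaled)
        # ln Z_R(q) computed stably (log-sum-exp).
        log_z = top + math.log(sum(math.exp(s - top) for s in scaled))
        escort = [math.exp(s - log_z) for s in scaled]
        alpha = sum(e * lm for e, lm in zip(escort, log_mu)) / log_lambda
        f = sum(e * (s - log_z) for e, s in zip(escort, scaled)) / log_lambda
        spectrum.append((q, alpha, f))
    return spectrum
```

A range such as `[k * 0.5 for k in range(-60, 61)]` covers q from −30 to 30 in steps of 0.5. Plotting f against α for all q gives the spectrum curve; for a uniform measure the curve collapses to the single point (2, 2).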

Concluding Remarks

We begin by discussing the information contained in the Chhabra-Jensen algorithm for different values of the parameter q. The simplest case is q = 0, for which the q-measure is given by:

$$\mu\left( q=0, Q_{V_1 V_2 \ldots V_R} \right) = \frac{1}{Z_R\left( q=0 \right)} = \frac{1}{4^R} \tag{30}$$

This corresponds to assuming that all the R-mers have the same probability. The fractal dimension of the curdling set is evaluated using (29):

$$f\left( \alpha^*\left( q=0 \right) \right) = \frac{1}{\ln \lambda_R} \left( 4^R \right) \mu\left( q=0, Q_R \right) \ln \mu\left( q=0, Q_R \right) = \frac{-R \ln 4}{-R \ln 2} = 2 \tag{31}$$

This is the fractal dimension of Q, which is the support of F, and it is the maximum value of the fractal dimension. As follows from (27), the value of $\alpha^*(q=0)$ is given by:

$$\alpha^*\left( q=0 \right) = \frac{1}{\ln \lambda_R} \frac{1}{4^R} \sum_{V_1 V_2 \ldots V_R} \ln \mu\left( Q_{V_1 V_2 \ldots V_R} \right) = \frac{1}{4^R} \sum_{V_1 V_2 \ldots V_R} \alpha\left( Q_{V_1 V_2 \ldots V_R} \right) \tag{32}$$

This is the average of the Hölder exponents of all the members of the cover $C_R$. When q = 1, the q-measure reduces to the measure of $Q_R$, i.e. $\mu(q=1, Q_{V_1 V_2 \ldots V_R}) = \mu(Q_{V_1 V_2 \ldots V_R})$. From (27) and (29), we have

$$\alpha^*\left( q=1, Q_{V_1 V_2 \ldots V_R} \right) = f\left( \alpha^*\left( q=1 \right) \right) = -\frac{1}{R \ln 2} \sum_{V_1 V_2 \ldots V_R} \mu\left( Q_{V_1 V_2 \ldots V_R} \right) \ln \mu\left( Q_{V_1 V_2 \ldots V_R} \right) \tag{33}$$

Thus, for the measure $\mu(Q_R)$, the fractal dimension of the curdling set $J_{\alpha^*}$ is identical to the value of the Hölder exponent that characterizes this set, and this value is given by the average entropy of the cover $C_R$.

We now analyse the behaviour of $\mu(q, Q_R)$ when q goes to infinity. For large values of q, the measure that gives the maximum contribution to $Z_R(q)$ is

$$\mu_{MAX} = \max_{V_1 V_2 \ldots V_R} \mu\left( Q_{V_1 V_2 \ldots V_R} \right) = \max_{V_1 V_2 \ldots V_R} \left( \frac{1}{2^R} \right)^{\alpha\left( V_1 V_2 \ldots V_R \right)} = \frac{1}{2^{R\alpha_{\min}}} \tag{34}$$

Then, for large values of q, we have

$$Z_R\left( q \right) \approx N_R\left( \alpha_{\min} \right) \left[ \mu_{MAX}\left( Q_R \right) \right]^q \tag{35}$$

Substituting this result into (14), we obtain

$$\mu\left( q, Q_{V_1 V_2 \ldots V_R} \right) \approx \frac{1}{N_R\left( \alpha_{\min} \right)} \left[ \frac{\mu\left( Q_{V_1 V_2 \ldots V_R} \right)}{\mu_{MAX}\left( Q_R \right)} \right]^q \tag{36}$$

Therefore,

$$\lim_{q \to \infty} \mu\left( q, Q_{V_1 V_2 \ldots V_R} \right) = \frac{1}{N_R\left( \alpha_{\min} \right)} \delta\left( \mu\left( Q_{V_1 V_2 \ldots V_R} \right) - \mu_{MAX}\left( Q_R \right) \right) \tag{37}$$

Thus, taking the limit of (27) when q → ∞, we obtain

$$\lim_{q \to \infty} \alpha\left( q \right) = \frac{\ln \mu_{MAX}\left( Q_R \right)}{\ln \lambda_R} = \alpha_{\min}\left( Q_R \right) \tag{38}$$

Similarly, using the fact that $\mu_{\min}\left( Q_R \right) = \lambda_R^{\alpha_{\max}}$, the behaviour of the Hölder exponent can be obtained when q → −∞, i.e.

$$\lim_{q \to -\infty} \alpha\left( q \right) = \alpha_{\max}\left( Q_R \right) \tag{39}$$

The important characteristics of the fractal spectra fR(α) are the following:

The curve is always convex upward.

The parameter q is the slope of the straight line tangent to the curve $f_R(\alpha)$ at any value of α. Both properties follow from (23).

The equation of this straight line is $y = q\left[ \alpha - \alpha^*(q) \right] + f(\alpha^*(q))$. Therefore, it intersects the y-axis at −τ(q), as follows from (20).

The maximum value of $f_R(\alpha)$ occurs at $\alpha^*(q=0) = \alpha_0$. At this point f is equal to the fractal dimension of the support.

The $f_R(\alpha)$ curve is tangent to the line y = α; the point of tangency occurs at $\alpha_1$, and this value corresponds to the information fractal dimension of the curdling set.

The range of values of α is given by $\alpha_{\min} = \alpha(q \to \infty) \le \alpha \le \alpha_{\max} = \alpha(q \to -\infty)$.

We conclude that there are at least four important values of the Hölder exponent, $\alpha_0$, $\alpha_1$, $\alpha_{\min}$ and $\alpha_{\max}$, which globally characterize the fractal spectra: the value $\alpha_0$ at which $f_R(\alpha)$ reaches its maximum; the value $\alpha_1$, which determines the fractal dimension of the curdling set; and the other two values, which are used to determine the width of the spectrum, $W = \alpha_{\max} - \alpha_{\min}$, and the skewness of the spectrum, $r = (\alpha_{\max} - \alpha_0)/(\alpha_0 - \alpha_{\min})$. The skew parameter r determines which fractal exponents are dominant. A right-skewed spectrum, with r > 1, is more complex than a left-skewed one, with r < 1 [10, 11, 12].
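These four values and the derived parameters W and r can be computed from the R-mer frequencies without scanning q. The sketch below uses the closed forms for q = 0 and q = 1 given above and the exact limits q → ±∞. It is an illustration with hypothetical names, and it differs from a finite scan such as q between −30 and 30, which only approaches the limiting values of $\alpha_{\min}$ and $\alpha_{\max}$. The average for $\alpha_0$ is taken over the R-mers with nonzero measure, which is an assumption when some of the $4^R$ words are absent.

```python
import math

def spectrum_summary(measure, R):
    """Characteristic exponents, width W and skew parameter r of the spectrum."""
    log_lambda = -R * math.log(2)
    mus = [mu for mu in measure.values() if mu > 0]
    alphas = [math.log(mu) / log_lambda for mu in mus]
    alpha_0 = sum(alphas) / len(alphas)  # q = 0: mean exponent
    alpha_1 = sum(mu * math.log(mu) for mu in mus) / log_lambda  # q = 1: entropy
    alpha_min, alpha_max = min(alphas), max(alphas)
    denom = alpha_0 - alpha_min
    return {
        "alpha_0": alpha_0,
        "alpha_1": alpha_1,
        "alpha_min": alpha_min,
        "alpha_max": alpha_max,
        "W": alpha_max - alpha_min,
        # r is undefined for a uniform measure, where all exponents coincide.
        "r": (alpha_max - alpha_0) / denom if denom > 0 else float("nan"),
    }
```

For the toy measure {A: 1/2, C: 1/4, G: 1/8, T: 1/8} with R = 1, the exponents are 1, 2, 3 and 3, so the function gives W = 2 and r = 0.6, a left-skewed spectrum.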

The geometrical shape of the multifractal spectra illustrates the level of multifractality of the DNA sequences [10]. In figure 6, we show the multifractal spectra of the six-mers of several DNA sequences; all of them were constructed with q ranging from $q_{\min} = -30$ to $q_{\max} = 30$ in steps of Δq = 0.5.

Fig. 6

Multifractal spectra of two bacteria (Pirellula staleyi and Escherichia coli), two archaea (Nanoarchaeota archaeon and Halobacterium salinarum), Homo sapiens chromosome 21 and a fungus, Encephalitozoon intestinalis.

The values of the quantities that we select for evaluating the "randomness" are reported in Table 1. The maximum of all curves is 2, because all of them take the square Q as the support of F. The parameter r shows that the spectrum of Halobacterium salinarum is more symmetric than the others, while the rest are right-skewed. The bacteria have higher values of W and r, while the archaea have lower values. This suggests that the multifractal spectrum of bacteria has a greater complexity than that of archaea, and that the multifractal technique could be used for a quantitative classification of archaea and bacteria.

Table 1. Relevant parameters of the multifractal spectra of Fig. 6: the value α0 of α at which the maximum occurs, the width of the spectrum W and the skewness of the spectrum r.

Organism                       α0              W                        r
Pirellula staleyi              2.0673621898    1.4825717654943124651    2.06044807855
Escherichia coli               2.04444319969   1.394197126734667779     2.34682776089
Halobacterium salinarum        2.13387945312   1.3762064622503003001    0.944824001754
Nanoarchaeota archaeon         2.08756863398   1.346437367927127315     1.22732173982
Homo sapiens chr 21            2.11728218486   1.705128432757234025     1.40041368164
Encephalitozoon intestinalis   2.06212028856   1.222195891979336313     2.53446680707
eISSN: 2444-8656
Language: English
Publication timeframe: Volume Open
Journal Subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics