Interval Modulated Frequency Distribution Problems

by

Huen Y.K.

CAHRC, P.O.Box 1003, Singapore 911101
http://web.singnet.com.sg/~huens/
email: huens@mbox3.singnet.com.sg

(A short communication - 1st released: 19/1/98.)


Abstract

One of the author's favourite pastimes is to make forays into the life sciences in search of novel mathematical operations or algorithms. This is usually fruitful since Nature's mathematics in biology is often different from that of the human kind. Here is a problem borrowed from the findings on genetic linkage pioneered by William Bateson, R.C. Punnett and Thomas Hunt Morgan [1]. In sequence algebra, the outer product of two identical normalised number sequences generates a 2D matrix map in which all cross-product terms occur with equal frequency. This property was previously used by the author to validate the two Mendelian principles algebraically [2]. Morgan and Sturtevant discovered that genes on the same chromosome do not obey Mendel's principle of independent assortment, as the probabilities are interval modulated. In this paper, the mathematics of interval modulated frequency distributions is investigated by sequence algebra and some findings are reported.


1. Introduction

In this paper the outer product of two identical sequences is computed using the general expression shown in equation (1). f(a(i),b(j)) is a function of the interval distance between the ith and jth terms in a sequence. In this problem, the upper bounds ub of the two summations are taken as identical, so that the resultant matrix map is always square. Since the two sequences are identical, a diagonal line of symmetry exists. Putting f(a(i),b(j)) = 1 makes the frequency distribution independent of intervals. For uniformity, we call this a zero-ordered system since f(a(i),b(j))^0 = 1. If f(a(i),b(j)) is a function of the absolute difference between a(i) and b(j), this is defined as a first-ordered problem, and if f(a(i),b(j)) is a quadratic function, this is defined as a second-ordered problem. However, Morgan and Sturtevant also found interference if a third gene is positioned between the other two genes. Collectively, the author calls these "interval modulated frequency distribution" problems. Problems of this type can involve 2 or more genes, but for simplicity the investigations are confined to cases where the influence is limited to 2 or 3 genes. A generalised equation for the 2-gene problem is given by equation (1), where k determines the order and ub the number of terms in each sequence, it being assumed that both sequences are identical. The extension to the 3-gene problem is given by equation (2), where the order of the three terms is important. In realistic applications, the array element c(k) is always bracketed by the other two elements a(i) and b(j).



2-GENE FORMULA

Outerprod:=sum(sum(f(a(i),b(j))^k/(x^a(i)*y^b(j)),i=1..ub),j=1..ub);    (1a)

\[
\mathrm{Outerprod} := \sum_{j=1}^{ub} \left( \sum_{i=1}^{ub} \frac{f(a(i),b(j))^{k}}{x^{a(i)}\, y^{b(j)}} \right) \tag{1b}
\]
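As a minimal sketch of equation (1), written in Python rather than the Maple used in this paper, the double sum can be stored as a table of numerators keyed by the exponent pair (a(i), b(j)). The names `interval_map`, `order` and `positions` are illustrative assumptions and do not appear in the original program lines.

```python
def interval_map(seq, f, order):
    """Return {(a_i, b_j): f(a_i, b_j)**order} for the outer product of seq with itself.

    Each key is the exponent pair of the term 1/(x^a_i * y^b_j);
    each value is the numerator (frequency) of that term in equation (1).
    """
    return {(a, b): f(a, b) ** order for b in seq for a in seq}


# Zero-ordered system: raising f to the power 0 gives unit numerators everywhere.
positions = [1, 3, 7, 13]  # exponents with irregular intervals
zero_order = interval_map(positions, lambda a, b: abs(a - b) + 1, 0)
print(sorted(set(zero_order.values())))
```

Storing only the numerators is a design choice made for this sketch: the exponents already identify each cell of the square matrix map, so no symbolic package is needed to inspect the frequencies.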



3-GENE FORMULA

sum(sum(sum(abs(f(a(i),c(k),b(j)))/(x^a(i)*y^b(j)*z^c(k)),i=1..k-1),j=k+1..ub),k=2..(ub-1));    (2a)

\[
\sum_{k=2}^{ub-1} \left( \sum_{j=k+1}^{ub} \left( \sum_{i=1}^{k-1} \frac{\left| f(a(i),c(k),b(j)) \right|}{x^{a(i)}\, y^{b(j)}\, z^{c(k)}} \right) \right) \tag{2b}
\]

Note that in equation (2) k is a summation index and not the order used in equation (1).


2. Classification Of Problems And Examples

Problems can be classified into two types and are applicable to both the 2-term and 3-term formulae given above:

Type 1: Here the sequences are known and the matrix maps are to be computed using equations (1a) or (1b). This type of problem is relatively straightforward.

Type 2: Here the frequency distribution in the matrix map is known from experimental data, and we are required to predict the original sequences used in the outer-product. If the frequency distribution is nonlinear, the problem might be difficult to solve. In any case, even if we have complete lookup tables for all combinations, there is no guarantee that we can find the original sequences deterministically unless one can prove that the outer-product is a bijective mapping operation. At present a proof of the existence or otherwise of this desirable property has not been attempted.



TYPE 1 PROBLEMS


2-TERM PROBLEMS

Case (1): Zero Ordered Problem, f(a(i),b(j))^k = 1, i.e. k = 0.

Zero ordered problems arise from the outer-product of identical normalised sequences. All terms in the square matrix have equal probability of occurrence, and this is independent of whether the sequences have uniform or irregular intervals. A theorem concerning this property is worded as follows:

Theorem On Outer-Product Of Normalised Sequences: The outer-product of two or more identical normalised sequences with arbitrary intervals between terms will always result in a matrix map with uniform frequency distributions.

Proof: Equations (3a) and (3b) give normalised number sequences with arbitrary intervals between successive terms. Equation (4) gives the outer-product of these two sequences. The result will be the same even if the outer-product of more than two identical sequences is taken.

\[
\mathrm{Seq1} := \frac{1}{x^{a(1)}} + \frac{1}{x^{a(2)}} + \frac{1}{x^{a(3)}} + \frac{1}{x^{a(4)}} \tag{3a}
\]

\[
\mathrm{Seq2} := \frac{1}{y^{b(1)}} + \frac{1}{y^{b(2)}} + \frac{1}{y^{b(3)}} + \frac{1}{y^{b(4)}} \tag{3b}
\]

\begin{align}
\mathrm{Outerprod} := {} & \frac{1}{x^{a(2)} y^{b(1)}} + \frac{1}{x^{a(2)} y^{b(2)}} + \frac{1}{x^{a(2)} y^{b(3)}} + \frac{1}{x^{a(2)} y^{b(4)}} \notag \\
& + \frac{1}{x^{a(3)} y^{b(1)}} + \frac{1}{x^{a(3)} y^{b(2)}} + \frac{1}{x^{a(3)} y^{b(3)}} + \frac{1}{x^{a(3)} y^{b(4)}} \notag \\
& + \frac{1}{x^{a(4)} y^{b(1)}} + \frac{1}{x^{a(4)} y^{b(2)}} + \frac{1}{x^{a(4)} y^{b(3)}} + \frac{1}{x^{a(4)} y^{b(4)}} \notag \\
& + \frac{1}{x^{a(1)} y^{b(1)}} + \frac{1}{x^{a(1)} y^{b(2)}} + \frac{1}{x^{a(1)} y^{b(3)}} + \frac{1}{x^{a(1)} y^{b(4)}} \tag{4}
\end{align}

Note that no assumptions are made on the intervals between the indexed array elements. Since the numerators of all terms in equation (4) are of unit value, the frequency distribution is uniform and is independent of the intervals between sequence terms. This applies also to outer-products of 3 or more identical normalised sequences. Q.E.D.
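The extension to more than two sequences is stated without detail in the proof, so a short derivation may help. It is only a sketch under one assumption that is not in the original: the m identical sequences, each of n terms, are written in distinct variables x_1, ..., x_m.

```latex
\[
\prod_{r=1}^{m} \left( \sum_{i_r=1}^{n} \frac{1}{x_r^{a(i_r)}} \right)
= \sum_{i_1=1}^{n} \cdots \sum_{i_m=1}^{n}
  \frac{1}{x_1^{a(i_1)} \, x_2^{a(i_2)} \cdots x_m^{a(i_m)}}
\]
```

Expanding the product picks exactly one term from each sequence, so each of the n^m index tuples contributes one term with numerator 1, whatever the intervals a(i+1) - a(i) happen to be. Setting m = 2 and n = 4 recovers the 16 terms of the two-sequence case.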

Example 1: Here is a practical example which demonstrates uniform frequencies of distribution of a zero-ordered system.

\[
\mathrm{Seq1} := \frac{1}{x} + \frac{1}{x^{3}} + \frac{1}{x^{7}} + \frac{1}{x^{13}} \tag{5a}
\]

\[
\mathrm{Seq2} := \frac{1}{y} + \frac{1}{y^{3}} + \frac{1}{y^{7}} + \frac{1}{y^{13}} \tag{5b}
\]

Taking the outer-product we get the resultant 2D-sequence as shown in equation (6):

\begin{align}
\mathrm{Outerprod} := {} & \frac{1}{y\,x} + \frac{1}{y\,x^{3}} + \frac{1}{y\,x^{7}} + \frac{1}{y\,x^{13}} \notag \\
& + \frac{1}{y^{3} x} + \frac{1}{y^{3} x^{3}} + \frac{1}{y^{3} x^{7}} + \frac{1}{y^{3} x^{13}} \notag \\
& + \frac{1}{y^{7} x} + \frac{1}{y^{7} x^{3}} + \frac{1}{y^{7} x^{7}} + \frac{1}{y^{7} x^{13}} \notag \\
& + \frac{1}{y^{13} x} + \frac{1}{y^{13} x^{3}} + \frac{1}{y^{13} x^{7}} + \frac{1}{y^{13} x^{13}} \tag{6}
\end{align}

Terms in equation (6) can be arranged into a 4 x 4 cell matrix as shown in Table 1 where all terms have equal probability of occurrence. In genetics, this would have been called a Punnett square [1].

Table 1 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)

  y \ x |    1     3     7    13
 -------+------------------------
    1   |    1     1     1     1
    3   |    1     1     1     1
    7   |    1     1     1     1
   13   |    1     1     1     1

If one uses the model that Mendel developed to show the principle of independent assortment, this Punnett square will return the ratios of 9:3:3:1 amongst the F2 phenotypes [1]. It shows that the same result can be obtained algebraically using the outer-product of two identical normalised sequences.
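The 9:3:3:1 ratio can be checked by counting cells of a 4x4 outer product in which every cell has unit frequency. The sketch below is in Python and assumes a dihybrid cross AaBb x AaBb with complete dominance. The allele labels and the helper `phenotype` are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Gametes of a dihybrid AaBb parent; each is equally likely (unit frequency).
gametes = ["AB", "Ab", "aB", "ab"]


def phenotype(g1, g2):
    """Dominant trait shows if either gamete carries the upper-case allele."""
    trait_a = "A" if "A" in (g1[0], g2[0]) else "a"
    trait_b = "B" if "B" in (g1[1], g2[1]) else "b"
    return trait_a + trait_b


# The 4x4 outer product: every cell of the Punnett square is counted once.
counts = Counter(phenotype(g1, g2) for g1, g2 in product(gametes, repeat=2))
print(counts["AB"], counts["Ab"], counts["aB"], counts["ab"])  # 9 3 3 1
```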

We will get the same result from equation (1) by replacing f(a(i),b(j)) by 1 and assigning the values given below to the array elements. The procedure is straightforward and will not be elaborated here.

    a(1) = b(1) = 1;
    a(2) = b(2) = 3;
    a(3) = b(3) = 7;
    and a(4) = b(4) = 13.

Case (2): First Ordered Problem, f(a(i),b(j))=abs(a(i)-b(j))^k+1, with k = 1. The reason for adding a 1 to this function is to ensure that the diagonal values remain at unity.

\[
\mathrm{Outerprod} := \sum_{j=1}^{ub} \left( \sum_{i=1}^{ub} \frac{\left| a(i)-b(j) \right| + 1}{x^{a(i)}\, y^{b(j)}} \right) \tag{7}
\]

We assign a(1)=b(1)=1; a(2)=b(2)=3; a(3)=b(3)=7; and a(4)=b(4)=13 and compute using the Maple program line given below:

Outerprod:=sum(sum((abs(a(i)-b(j))+1)/(x^a(i)*y^b(j)),i=1..4),j=1..4);

which results in the 2D-sequence given by equation (8):

\begin{align}
\mathrm{Outerprod} := {} & \frac{1}{x\,y} + \frac{3}{x^{3} y} + \frac{7}{x^{7} y} + \frac{13}{x^{13} y} \notag \\
& + \frac{3}{x\,y^{3}} + \frac{1}{x^{3} y^{3}} + \frac{5}{x^{7} y^{3}} + \frac{11}{x^{13} y^{3}} \notag \\
& + \frac{7}{x\,y^{7}} + \frac{5}{x^{3} y^{7}} + \frac{1}{x^{7} y^{7}} + \frac{7}{x^{13} y^{7}} \notag \\
& + \frac{13}{x\,y^{13}} + \frac{11}{x^{3} y^{13}} + \frac{7}{x^{7} y^{13}} + \frac{1}{x^{13} y^{13}} \tag{8}
\end{align}

Terms in equation (8) are arranged in a 4x4 cell matrix shown in Table 2.

Table 2 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)

  y \ x |    1     3     7    13
 -------+------------------------
    1   |    1     3     7    13
    3   |    3     1     5    11
    7   |    7     5     1     7
   13   |   13    11     7     1

Although not relevant, counting frequencies of occurrence assuming this to be a Punnett square will give the ratio of 56:23:15:1. This is interpreted to mean that the surface profile has been modified compared to that in case (1).

Case (3): Second Ordered Problem, f(a(i),b(j))=abs(a(i)-b(j))^2+1.

Outerprod:=sum(sum((abs(a(i)-b(j))^2+1)/(x^a(i)*y^b(j)),i=1..4),j=1..4);

\begin{align}
\mathrm{Outerprod} := {} & \frac{1}{x\,y} + \frac{5}{x^{3} y} + \frac{37}{x^{7} y} + \frac{145}{x^{13} y} \notag \\
& + \frac{5}{x\,y^{3}} + \frac{1}{x^{3} y^{3}} + \frac{17}{x^{7} y^{3}} + \frac{101}{x^{13} y^{3}} \notag \\
& + \frac{37}{x\,y^{7}} + \frac{17}{x^{3} y^{7}} + \frac{1}{x^{7} y^{7}} + \frac{37}{x^{13} y^{7}} \notag \\
& + \frac{145}{x\,y^{13}} + \frac{101}{x^{3} y^{13}} + \frac{37}{x^{7} y^{13}} + \frac{1}{x^{13} y^{13}} \tag{9}
\end{align}

Terms in equation (9) are arranged in a 4x4 cell matrix as shown in Table 3. The surface profile is more nonlinear but still has the same general shape as in cases (1) and (2), and the diagonal values remain at unity.

Table 3 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)

  y \ x |    1     3     7    13
 -------+------------------------
    1   |    1     5    37   145
    3   |    5     1    17   101
    7   |   37    17     1    37
   13   |  145   101    37     1

The second-ordered model is introduced just to show that by varying the power index of the multiplier function, one has some control over the surface profile of the square matrix. Purely for curiosity, counting frequencies of occurrence assuming this to be a Punnett square will return the ratio of 243:139:78:1.
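The effect of the power index on the surface profile can be reproduced with a few lines of Python. This is a sketch and not the Maple program used for the paper. The function name `modulated_table` is an assumption, and only the numerators abs(a(i)-b(j))^k + 1 are computed.

```python
def modulated_table(positions, order):
    """Rows follow the y exponent, columns the x exponent; cell = |a - b|**order + 1."""
    return [[abs(a - b) ** order + 1 for a in positions] for b in positions]


positions = [1, 3, 7, 13]
table_first = modulated_table(positions, 1)   # first-ordered numerators
table_second = modulated_table(positions, 2)  # second-ordered numerators

for row in table_second:
    print(row)
```

Both tables are symmetric about the diagonal, and the added 1 keeps every diagonal cell at unity for any order.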

It is easy to modify the surface profile so that the diagonal line has the maximum values and the off-diagonal terms fall with increasing distance from the diagonal. All that needs to be done is to subtract the value of each cell in cases (1) and (2) from the value of the largest cell, i.e., either [a(4),b(1)] or [a(1),b(4)], which, being symmetrically displaced cells, contain the same value. This needs no further elaboration.
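One possible reading of this subtraction, applied to the first-ordered numerators, is sketched below in Python. It assumes that each cell is replaced by (largest value minus cell value). The paper does not spell out the exact operation, so the variable names and the choice of operation are illustrative only.

```python
# Numerators of the first-ordered problem, f = abs(a - b) + 1, for exponents 1, 3, 7, 13.
first_order = [
    [1, 3, 7, 13],
    [3, 1, 5, 11],
    [7, 5, 1, 7],
    [13, 11, 7, 1],
]

largest = max(max(row) for row in first_order)  # the corner cells hold the largest value

# Assumed operation: reflect every cell about the largest value.
inverted = [[largest - cell for cell in row] for row in first_order]

for row in inverted:
    print(row)
```

Under this reading the diagonal cells become the largest in the matrix and the two corner cells drop to zero, while diagonal symmetry is kept.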

Case (4): Raising the whole surface above the base plane with increasing values along the diagonal line, governed by f(a(i),b(j))=(i*j+j*i)^k/2 where k is an integer power index. The reason for using symmetrical pairs in the 4x4 matrix is to conserve diagonal symmetry. This surface profile is quite different from those in cases (1) to (3), where the terms on the diagonal line are all of unit value. This profile is useful if one wants the values of terms along the diagonal to increase with distance from the origin or the smallest term. Only the case for k = 1 is demonstrated, although nonlinear increments can be obtained by increasing k above unity.

Outerprod:=sum(sum(abs((i*j+j*i)/2)/(x^a(i)*y^b(j)),i=1..4),j=1..4);

\begin{align}
\mathrm{Outerprod} := {} & \frac{1}{x\,y} + \frac{2}{x^{3} y} + \frac{3}{x^{7} y} + \frac{4}{x^{13} y} \notag \\
& + \frac{2}{x\,y^{3}} + \frac{4}{x^{3} y^{3}} + \frac{6}{x^{7} y^{3}} + \frac{8}{x^{13} y^{3}} \notag \\
& + \frac{3}{x\,y^{7}} + \frac{6}{x^{3} y^{7}} + \frac{9}{x^{7} y^{7}} + \frac{12}{x^{13} y^{7}} \notag \\
& + \frac{4}{x\,y^{13}} + \frac{8}{x^{3} y^{13}} + \frac{12}{x^{7} y^{13}} + \frac{16}{x^{13} y^{13}} \tag{10}
\end{align}

Terms in equation (10) are arranged in a 4x4 cell matrix as shown in Table 4. Note that the diagonal terms of 1, 4, 9 and 16 are perfect squares.

Table 4 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)

  y \ x |    1     3     7    13
 -------+------------------------
    1   |    1     2     3     4
    3   |    2     4     6     8
    7   |    3     6     9    12
   13   |    4     8    12    16




DOUBLE-CROSSOVER PROBLEMS

In genetic linkage problems, this is the type with 3 genes on one chromosome. This cannot be simply extended via the previous 2-term problem since the relative orders of the three terms must be preserved. This can be clarified using figure 1:

================X=Z===========Y===================

Fig.1-Three terms with variable intervals between the x-z pair and the z-y pair.

In modulated frequency distributions, the probability associated with two terms in very close proximity, such as x and z in figure 1, should be zero. The range of z is dictated by the positions of x and y, since relative order will be lost if z ventures beyond the bounds of either x or y. The matrix map is now 3-dimensional, involving variables x, y, and z, but with the additional constraint that only values of z bracketed between those of x and y are computed. To simplify the problem, we assume that the contribution of probabilities of occurrence by the x-y pair is nil. This simply means that we are concentrating on the probabilities of double crossovers. Equation (11) shows the general formula used. The numerator expression abs(a(i)-b(j))*abs(b(j)-c(k)) represents the product of two probabilities contributed by the two crossovers.

evalf((sum(sum(sum(abs(a(i)-b(j))*abs(b(j)-c(k))/(x^a(i)*y^b(j)*z^c(k)),i=1..k-1), j=(k+1)..ub),k=2..(ub-1))));

\[
\sum_{k=2}^{ub-1} \left( \sum_{j=k+1}^{ub} \left( \sum_{i=1}^{k-1} \frac{\left| a(i)-b(j) \right| \, \left| b(j)-c(k) \right|}{x^{a(i)}\, y^{b(j)}\, z^{c(k)}} \right) \right) \tag{11}
\]

The computations only give absolute product probabilities instead of rates of combination, since the variables x, y, and z are not defined for dominant and recessive properties. In a realistic situation, the count of the number of recombinants will be used as the divisor for the rate. This can be easily included but we leave the details to practitioners.

Numeric Example: The chromosome is assumed to have six genes (ub = 6 in the program line below), and we are required to compute a table of rates of recombinants for triplets x, y, and z, where z is considered as positioned between x and y. The bounds are arranged such that only the ordering x-z-y is computed. If it is required to compute for other orders such as x-y-z or y-x-z, recomputations are necessary. This simplifies data presentation. To interpret the results in equation (12b), take the first term, where the numerator indicates the product probability of 2 and the denominator indicates that z falls exactly halfway between x and y. It can be seen that as the intervals get larger, the product of probabilities also gets larger, which is within expectations. Note that probabilities are not reported as fractions since no divisors are used.

Outerprod:=sort((sum(sum(sum(abs(i-j)*abs(j-k)/(x^i*y^j*z^k),i=1..(k-1)),j=(k+1)..6),k=2..5)));    (12a)

\begin{align}
\mathrm{Outerprod} := {} & \frac{2}{x\,y^{3} z^{2}} + \frac{6}{x\,y^{4} z^{2}} + \frac{3}{x\,y^{4} z^{3}} + \frac{12}{x\,y^{5} z^{2}} + \frac{8}{x\,y^{5} z^{3}} \notag \\
& + \frac{20}{x\,y^{6} z^{2}} + \frac{2}{x^{2} y^{4} z^{3}} + \frac{4}{x\,y^{5} z^{4}} + \frac{15}{x\,y^{6} z^{3}} + \frac{6}{x^{2} y^{5} z^{3}} \notag \\
& + \frac{10}{x\,y^{6} z^{4}} + \frac{3}{x^{2} y^{5} z^{4}} + \frac{12}{x^{2} y^{6} z^{3}} + \frac{5}{x\,y^{6} z^{5}} + \frac{8}{x^{2} y^{6} z^{4}} \notag \\
& + \frac{2}{x^{3} y^{5} z^{4}} + \frac{4}{x^{2} y^{6} z^{5}} + \frac{6}{x^{3} y^{6} z^{4}} + \frac{3}{x^{3} y^{6} z^{5}} + \frac{2}{x^{4} y^{6} z^{5}} \tag{12b}
\end{align}
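The triple sum of the numeric example can be cross-checked without a symbolic package. The following Python sketch mirrors the summation bounds of the Maple program line and keeps each numerator under its exponent triple. The function name `double_crossover_terms` is an assumption made for illustration.

```python
def double_crossover_terms(ub):
    """Return {(i, j, k): |i - j| * |j - k|} for terms 1/(x^i * y^j * z^k), i < k < j <= ub."""
    terms = {}
    for k in range(2, ub):              # z exponent, k = 2..ub-1
        for j in range(k + 1, ub + 1):  # y exponent, j = k+1..ub
            for i in range(1, k):       # x exponent, i = 1..k-1
                terms[(i, j, k)] = abs(i - j) * abs(j - k)
    return terms


terms = double_crossover_terms(6)
print(len(terms))        # 20
print(terms[(1, 3, 2)])  # 2
print(terms[(1, 6, 2)])  # 20
```

With ub = 6 the bounds admit one term for each ordered triple i < k < j, which gives 20 terms in total, and the largest numerator belongs to the triple with the widest intervals.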




3. Conclusions

Because sequence algebra is the algebra of sequences, it has found a suitable niche in mathematical genetics. Although the present paper is primarily mathematical, sufficient attention has been paid to work in genetic linkage so that practitioners in that field could easily pick up from where the author left off. Sequence algebra has found suitable applications in more than one field. For example, the power indices in the denominators can be used to indicate the positions of genes along the linear gene map whilst the numerator values are used to indicate frequencies of occurrence. Furthermore, it lends itself easily to computations using symbolic packages. The theory of sequence algebra is quite new, but most papers of importance can be found at this URL site because the author has made it a resource centre for this new discipline. This is convenient since there are no published texts on this domain at present. Lastly, although type 2 problems are included in the classification, no examples are shown. The reverse problem of determining the whole sequence either algebraically or from a lookup table sounds challenging but has not yet been attempted. This is because the author is not sure whether the mapping relation is bijective.

4. References

Comments: Not all references in this list are directly referred to in the main paper. Most are provided for readers not familiar with sequence algebra. These papers can be easily reached by hyperlink whilst you are browsing the URL site. Most html files are quite short and can be downloaded quite fast without unzipping operations.

1. Weaver R.F. and Hedrick P.W.:Basic Genetics, WCB Publishers, 2nd Edition, 1995, Printed in Dubuque, pp 154-159.

2. A Sequence Algebraist's Attempts To Learn From Life Sciences - Huen Y.K. (Date Released 14/1/98, 38 Kbytes)

3. Huen Y.K.: A Matrix Map for Prime and Non-prime Numbers, INT. J. Math. Educ. Sci. Technol., 1994, VOL. 25, NO.6, pp 913-920.

4. Huen Y.K.: Some Interesting Properties Of The Natural Number System, Int. J. Math. Educ. Sci. Technol., 1996, VOL.27, NO. 5, 685-691.

5. Huen Y.K.: Visual algebra and its applications, INT. J. Math. Educ. Sci. Technol.,1996, VOL.??, NO.?, ???-??? (In the press as proof paper mes 100421).

6. A Simple Introduction To Sequence Algebra - by Huen Y.K. (date release: 15.3.97) (38 KBytes, 11*A4 pages).

7. The Canonical Generating Function or CGF(z) ... - by Huen Y.K. (date released : 27.5.97) (24 KBytes, 7*A4s).

8. Visual Solutions Of Number Theoretic Problems ..... - by Huen Y.K. (date released : 3.6.97) (38.3 KBytes, 10*A4s).

=====================END OF PAPER ======================