Interval Modulated Frequency Distribution Problems
by
Huen Y.K.
CAHRC, P.O.Box 1003, Singapore 911101
http://web.singnet.com.sg/~huens/
email: huens@mbox3.singnet.com.sg
(A short communication - 1st released: 19/1/98.)
Abstract
One of the author's favourite pastimes is to make forays into the life sciences
in search of novel mathematical operations or algorithms. This is usually
fruitful, since Nature's mathematics in biology is often different from that of
humankind. Here is a problem borrowed from the findings on genetic linkage
pioneered by William Bateson, R.C. Punnett and Thomas Hunt Morgan [1].
In sequence algebra, the outer product of two identical normalised
number sequences generates a 2D matrix map in which all cross-product terms
occur with equal frequency. This property was previously used by the
author to validate the two Mendelian principles algebraically [2]. Morgan and Sturtevant
discovered that genes on the same chromosome do not obey Mendel's principle of
independent assortment, because the probabilities are interval modulated. In this
paper, the mathematics of interval modulated frequency distributions is
investigated by sequence algebra and some findings are reported.
1. Introduction
In this paper the outer product of two identical sequences is computed
using the general expression shown in equation (1). f(a(i),b(j)) is a function of the interval
distance between the ith and jth terms of a sequence. In this problem, the upper bounds
ub of the two summations are taken as identical, so that the resultant matrix map
is always square. Since the two sequences are identical, a diagonal line of symmetry
exists. Putting f(a(i),b(j)) = 1 makes the frequency distribution independent of the
intervals. For uniformity, we call this a zero-ordered system since f(a(i),b(j))^0 = 1.
If f(a(i),b(j)) is a function of the absolute difference between a(i) and b(j), the problem is
defined as first ordered, and if f(a(i),b(j)) is a quadratic function, it is defined as
second ordered. Morgan and Sturtevant also found interference when a third gene is
positioned between the other two genes.
Collectively, the author calls these "interval modulated frequency distribution"
problems. This type of problem can involve 2 or more genes but, for simplicity,
investigations are confined to cases where the influence involves 2 or 3 genes only.
A generalised equation for the 2-gene problem is given by
equation (1), where k determines the order and ub is the number of terms in each
sequence, it being assumed that both sequences are identical. The extension to
the 3-gene problem is given by equation (2), where the order of the three terms
is important. In realistic applications, the element of c is always bracketed
by the corresponding elements of a and b.
2-GENE FORMULA
Outerprod:=sum(sum(f(a(i),b(j))^k/(x^a(i)*y^b(j)),i=1..ub),j=1..ub);............(1a).
$$\mathrm{Outerprod} := \sum_{j=1}^{ub}\left(\sum_{i=1}^{ub}\frac{f(a(i),b(j))^{k}}{x^{a(i)}\,y^{b(j)}}\right) \tag{1b}$$
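The 2-gene formula can be checked numerically. The following is a minimal Python sketch, not the author's Maple code; the function names are hypothetical. It assumes the positions 1, 3, 7 and 13 used in the examples of this paper, and it takes the numerator as 1 for the zero-ordered case and abs(a-b)^k + 1 for k >= 1, following the offset used in Cases (2) and (3).

```python
def two_gene_map(positions, numerator):
    """Coefficient of 1/(x**a * y**b) for every ordered pair (a, b)."""
    return {(a, b): numerator(a, b) for b in positions for a in positions}


def interval_numerator(k):
    """Assumed numerator: 1 for k = 0, |a - b|**k + 1 for k >= 1."""
    if k == 0:
        return lambda a, b: 1
    return lambda a, b: abs(a - b) ** k + 1


def as_rows(coeffs, positions):
    """Square table: rows indexed by the y exponent, columns by the x exponent."""
    return [[coeffs[(a, b)] for a in positions] for b in positions]


positions = [1, 3, 7, 13]  # gene positions used in the paper's examples
for k in (0, 1, 2):
    print(f"order k = {k}")
    for row in as_rows(two_gene_map(positions, interval_numerator(k)), positions):
        print("  ", row)
```

Each printed table is symmetric about its diagonal because the numerator depends only on abs(a-b) and both sequences are identical.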
3-GENE FORMULA
sum(sum(sum(abs(f(a(i),c(k),b(j)))/(x^a(i)*y^b(j)*z^c(k)),i=1..k-1),j=k+1..ub),k=2..(ub-1));.............(2a).
$$\sum_{k=2}^{ub-1}\left(\sum_{j=k+1}^{ub}\left(\sum_{i=1}^{k-1}\frac{\left|f(a(i),c(k),b(j))\right|}{x^{a(i)}\,y^{b(j)}\,z^{c(k)}}\right)\right) \tag{2b}$$
2. Classification Of Problems And Examples
Problems can be classified into two types, both of which apply to the 2-term
and 3-term formulae given above:
Type 1: Here the sequences are known and the matrix maps are
to be computed using equations (1a) or (1b). This type of problem is relatively
straightforward.
Type 2: Here the frequency distribution in the matrix map is known from
experimental data and we are required to predict the original sequences used in the
outer-product. If the frequency distribution is nonlinear, the problem might be difficult
to solve. In any case, even if we have complete lookup tables for all combinations,
there is no guarantee that we can find the original sequences deterministically unless
one can prove that the outer-product is a bijective mapping operation. At present
a proof of the existence or otherwise of this desirable property has not been attempted.
TYPE 1 PROBLEMS
2-TERM PROBLEMS
Case (1): Zero Ordered Problem, f(a(i),b(j))^k = 1, i.e. k = 0.
Zero ordered problems arise from the outer-product of identical normalised
sequences. All terms in the square matrix have equal probability of occurrence,
independent of whether the sequences have uniform or irregular intervals.
A theorem concerning this property is worded as follows:
Theorem On Outer-Product Of Normalised Sequences: The outer-product of two
or more identical normalised sequences with arbitrary intervals between terms
will always result in a matrix map with a uniform frequency distribution.
Proof: Equations (3a) and (3b) give normalised number sequences with
arbitrary intervals between successive terms. Equation (4) gives the outer-product of
these two sequences. The result will be the same even if the outer-product of
more than two identical sequences is taken.
$$\mathrm{Seq1} := \frac{1}{x^{a(1)}}+\frac{1}{x^{a(2)}}+\frac{1}{x^{a(3)}}+\frac{1}{x^{a(4)}} \tag{3a}$$
$$\mathrm{Seq2} := \frac{1}{x^{b(1)}}+\frac{1}{x^{b(2)}}+\frac{1}{x^{b(3)}}+\frac{1}{x^{b(4)}} \tag{3b}$$
$$\begin{aligned}
\mathrm{Outerprod} := {} & \frac{1}{x^{a(2)}x^{b(1)}}+\frac{1}{x^{a(2)}x^{b(2)}}+\frac{1}{x^{a(2)}x^{b(3)}}+\frac{1}{x^{a(2)}x^{b(4)}} \\
& +\frac{1}{x^{a(3)}x^{b(1)}}+\frac{1}{x^{a(3)}x^{b(2)}}+\frac{1}{x^{a(3)}x^{b(3)}}+\frac{1}{x^{a(3)}x^{b(4)}} \\
& +\frac{1}{x^{a(4)}x^{b(1)}}+\frac{1}{x^{a(4)}x^{b(2)}}+\frac{1}{x^{a(4)}x^{b(3)}}+\frac{1}{x^{a(4)}x^{b(4)}} \\
& +\frac{1}{x^{a(1)}x^{b(1)}}+\frac{1}{x^{a(1)}x^{b(2)}}+\frac{1}{x^{a(1)}x^{b(3)}}+\frac{1}{x^{a(1)}x^{b(4)}}
\end{aligned} \tag{4}$$
Note that no assumptions are made about the intervals between the indexed
array elements. Since the numerators
of all terms in equation (4) are unity, the frequency
distribution is uniform and is independent of the intervals between sequence terms.
This applies also to outer-products of 3 or more identical normalised sequences.
Q.E.D.
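The theorem can be illustrated numerically. The sketch below is in Python rather than Maple and uses a hypothetical helper name. It assumes that the two sequences are written in separate variables x and y, as in equation (1a) and Example 1, and that the exponents within a sequence are distinct; the exponents themselves are drawn at random so that the intervals are irregular.

```python
import random
from collections import Counter


def outer_product_counts(exps_x, exps_y):
    """Count how often each term 1/(x**a * y**b) occurs in the outer product."""
    return Counter((a, b) for a in exps_x for b in exps_y)


random.seed(0)  # fixed seed so the run is repeatable
exponents = sorted(random.sample(range(1, 100), 6))  # distinct, irregular spacing
counts = outer_product_counts(exponents, exponents)

print("exponents:", exponents)
print("distinct frequencies:", set(counts.values()))
```

Every cross-product term is counted exactly once, whatever the spacing between the exponents.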
Example 1: Here is a practical example which demonstrates the uniform
frequency distribution of a zero-ordered system.
$$\mathrm{Seq1} := \frac{1}{x}+\frac{1}{x^{3}}+\frac{1}{x^{7}}+\frac{1}{x^{13}} \tag{5a}$$
$$\mathrm{Seq2} := \frac{1}{y}+\frac{1}{y^{3}}+\frac{1}{y^{7}}+\frac{1}{y^{13}} \tag{5b}$$
Taking the outer-product we get the resultant 2D-sequence as shown in equation (6):
$$\begin{aligned}
\mathrm{Outerprod} := {} & \frac{1}{x\,y}+\frac{1}{x^{3}y}+\frac{1}{x^{7}y}+\frac{1}{x^{13}y} \\
& +\frac{1}{x\,y^{3}}+\frac{1}{x^{3}y^{3}}+\frac{1}{x^{7}y^{3}}+\frac{1}{x^{13}y^{3}} \\
& +\frac{1}{x\,y^{7}}+\frac{1}{x^{3}y^{7}}+\frac{1}{x^{7}y^{7}}+\frac{1}{x^{13}y^{7}} \\
& +\frac{1}{x\,y^{13}}+\frac{1}{x^{3}y^{13}}+\frac{1}{x^{7}y^{13}}+\frac{1}{x^{13}y^{13}}
\end{aligned} \tag{6}$$
Terms in equation (6) can be arranged into a 4 x 4 cell matrix as shown in Table 1
where all terms have equal probability of occurrence. In genetics, this would have been
called a Punnett square [1].
Table 1 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)
 y \ x |     1     3     7    13
-------+------------------------
     1 |     1     1     1     1
     3 |     1     1     1     1
     7 |     1     1     1     1
    13 |     1     1     1     1
-------+------------------------
If one uses the model that Mendel developed to show the principle of
independent assortment, this Punnett square will return the ratios of 9:3:3:1
amongst the F2 phenotypes [1]. It shows that the same result can be obtained
algebraically using the outer-product of two identical normalised sequences.
We will get the same result from equation (1) by replacing f(a(i),b(j)) by 1 and assigning
the values given below to the array elements. The procedure is straightforward and will
not be elaborated here.
    a(1) = b(1) = 1;
    a(2) = b(2) = 3;
    a(3) = b(3) = 7;
    and a(4) = b(4) = 13.
Case (2): First Ordered Problem, f(a(i),b(j)) = abs(a(i)-b(j))^k + 1, with k = 1.
The reason for adding 1 to this function is to ensure that the diagonal values remain at unity.
$$\mathrm{Outerprod} := \sum_{j=1}^{ub}\left(\sum_{i=1}^{ub}\frac{\left|a(i)-b(j)\right|+1}{x^{a(i)}\,y^{b(j)}}\right) \tag{7}$$
We assign a(1)=b(1)=1; a(2)=b(2)=3; a(3)=b(3)=7; and a(4)=b(4)=13 and compute using
the Maple program line given below:
Outerprod:=sum(sum((abs(a(i)-b(j))+1)/(x^a(i)*y^b(j)),i=1..4),j=1..4);
which results in the 2D-sequence given by equation (8):
$$\begin{aligned}
\mathrm{Outerprod} := {} & \frac{1}{x\,y}+\frac{3}{x^{3}y}+\frac{7}{x^{7}y}+\frac{13}{x^{13}y} \\
& +\frac{3}{x\,y^{3}}+\frac{1}{x^{3}y^{3}}+\frac{5}{x^{7}y^{3}}+\frac{11}{x^{13}y^{3}} \\
& +\frac{7}{x\,y^{7}}+\frac{5}{x^{3}y^{7}}+\frac{1}{x^{7}y^{7}}+\frac{7}{x^{13}y^{7}} \\
& +\frac{13}{x\,y^{13}}+\frac{11}{x^{3}y^{13}}+\frac{7}{x^{7}y^{13}}+\frac{1}{x^{13}y^{13}}
\end{aligned} \tag{8}$$
Terms in equation (8) are arranged in a 4x4 cell matrix shown in Table 2.
Table 2 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)
 y \ x |     1     3     7    13
-------+------------------------
     1 |     1     3     7    13
     3 |     3     1     5    11
     7 |     7     5     1     7
    13 |    13    11     7     1
-------+------------------------
Although not relevant, counting frequencies of occurrence assuming this to be a
Punnett square will give the ratio of 56:23:15:1. This is interpreted to mean that
the surface profile has been modified compared to that in Case (1).
Case (3): Second Ordered Problem, f(a(i),b(j))=abs(a(i)-b(j))^2+1.
Outerprod:=sum(sum((abs(a(i)-b(j))^2+1)/(x^a(i)*y^b(j)),i=1..4),j=1..4);
$$\begin{aligned}
\mathrm{Outerprod} := {} & \frac{1}{x\,y}+\frac{5}{x^{3}y}+\frac{37}{x^{7}y}+\frac{145}{x^{13}y} \\
& +\frac{5}{x\,y^{3}}+\frac{1}{x^{3}y^{3}}+\frac{17}{x^{7}y^{3}}+\frac{101}{x^{13}y^{3}} \\
& +\frac{37}{x\,y^{7}}+\frac{17}{x^{3}y^{7}}+\frac{1}{x^{7}y^{7}}+\frac{37}{x^{13}y^{7}} \\
& +\frac{145}{x\,y^{13}}+\frac{101}{x^{3}y^{13}}+\frac{37}{x^{7}y^{13}}+\frac{1}{x^{13}y^{13}}
\end{aligned} \tag{9}$$
Terms in equation (9) are arranged in a 4x4 cell matrix as shown in Table 3. The
surface profile is more nonlinear, but it still has the same general shape as that
in Cases (1) and (2), and the diagonal values remain at unity.
Table 3 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)
 y \ x |     1     3     7    13
-------+------------------------
     1 |     1     5    37   145
     3 |     5     1    17   101
     7 |    37    17     1    37
    13 |   145   101    37     1
-------+------------------------
The second-ordered model is introduced just to show that, by varying the power index of
the multiplier function, one has some control over the surface profile of the square
matrix. Purely for curiosity, counting frequencies of occurrence assuming this to be a
Punnett square will return the ratio of 243:139:78:1.
It is easy to modify the surface profile so that the diagonal line carries the maximum
values and the off-diagonal terms fall with increasing distance from the
diagonal. All that needs to be done
is to subtract the value of every cell in Cases (1) and (2) from the largest cell value, i.e.,
that of either [a(4),b(1)] or [a(1),b(4)], which, being symmetrically placed cells, contain the
same value. This needs no further elaboration.
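As an illustration of this modification, the Python sketch below applies one possible reading of it to the first-ordered values of Table 2: every cell v is replaced by (largest cell - v). The function name is hypothetical and the choice of Table 2 as input is an assumption made for the example.

```python
def invert_profile(table):
    """Replace every cell v by (largest cell value - v)."""
    peak = max(max(row) for row in table)
    return [[peak - v for v in row] for row in table]


# First-ordered frequencies from Table 2 (positions 1, 3, 7, 13).
table2 = [
    [1, 3, 7, 13],
    [3, 1, 5, 11],
    [7, 5, 1, 7],
    [13, 11, 7, 1],
]

inverted = invert_profile(table2)
for row in inverted:
    print(row)
```

Under this reading the diagonal cells carry the maximum value and the two corner cells farthest from the diagonal drop to zero, while the diagonal symmetry is preserved.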
Case (4): Raising the whole surface above the base plane, with values increasing
along the diagonal line, governed by f(a(i),b(j))=(i*j+j*i)^k/2 where k is an integer
power index. The reason for using symmetrical pairs in the
4x4 matrix is to conserve diagonal symmetry. This surface profile is quite different
from those in Cases (1) to (3), where the terms on the diagonal are all unity.
This profile is useful if one wants the values of terms along the diagonal to increase with
distance from the origin or the smallest term. Only the case k = 1 is demonstrated,
although nonlinear increments can be obtained by increasing k above unity.
Outerprod:=sum(sum(abs((i*j+j*i)/2)/(x^a(i)*y^b(j)),i=1..4),j=1..4);
$$\begin{aligned}
\mathrm{Outerprod} := {} & \frac{1}{x\,y}+\frac{2}{x^{3}y}+\frac{3}{x^{7}y}+\frac{4}{x^{13}y} \\
& +\frac{2}{x\,y^{3}}+\frac{4}{x^{3}y^{3}}+\frac{6}{x^{7}y^{3}}+\frac{8}{x^{13}y^{3}} \\
& +\frac{3}{x\,y^{7}}+\frac{6}{x^{3}y^{7}}+\frac{9}{x^{7}y^{7}}+\frac{12}{x^{13}y^{7}} \\
& +\frac{4}{x\,y^{13}}+\frac{8}{x^{3}y^{13}}+\frac{12}{x^{7}y^{13}}+\frac{16}{x^{13}y^{13}}
\end{aligned} \tag{10}$$
Terms in equation (10) are arranged in a 4x4 cell matrix as shown in Table 4. Note
that the diagonal terms of 1, 4, 9 and 16 are perfect squares.
Table 4 - Terms in a 4x4 cell matrix (frequencies of occurrence of terms)
 y \ x |     1     3     7    13
-------+------------------------
     1 |     1     2     3     4
     3 |     2     4     6     8
     7 |     3     6     9    12
    13 |     4     8    12    16
-------+------------------------
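The Case (4) numerators depend only on the indices i and j, not on the positions a(i) and b(j). The Python sketch below (hypothetical function name) builds the table for a given k. It reads the power index as applying to the halved sum, ((i*j+j*i)/2)^k, which is an assumption; for k = 1 this reading agrees with the Maple line above.

```python
def case4_table(n, k=1):
    """Cell (i, j), 1-based, holds ((i*j + j*i) / 2)**k, i.e. (i*j)**k."""
    return [
        [((i * j + j * i) // 2) ** k for i in range(1, n + 1)]
        for j in range(1, n + 1)
    ]


for k in (1, 2):
    print(f"k = {k}")
    for row in case4_table(4, k):
        print("  ", row)
```

For k = 1 the diagonal holds the perfect squares 1, 4, 9 and 16, and a larger k makes the rise along the diagonal steeper.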
DOUBLE-CROSSOVER PROBLEMS
In genetic linkage problems, this is the type with 3 genes on one chromosome.
It cannot be obtained by simply extending the previous 2-term problem, since the relative
order of the three terms must be preserved. This can be clarified using figure 1:
================X=Z===========Y===================
Fig.1 - Three terms with variable intervals between the x-z pair and the z-y pair.
In modulated frequency distributions, the probability for two terms in very
close proximity, such as x and z in figure 1, should be zero.
The range of z is dictated by the positions of x and
y, since the relative order will be lost if z ventures beyond the bounds of either x or y.
The matrix map is now 3-dimensional, involving variables x, y, and z, but with the
additional constraint that
only values of z bracketed by those of x and y are computed. To simplify the
problem, we assume that the contribution of the x-y pair to the probabilities of
occurrence is nil. This simply means that we are concentrating on the probabilities of
double crossovers. Equation (11) shows the general formula used. The numerator expression
abs(a(i)-b(j))*abs(b(j)-c(k))
represents the product of two probabilities contributed by the two crossovers.
evalf((sum(sum(sum(abs(a(i)-b(j))*abs(b(j)-c(k))/(x^a(i)*y^b(j)*z^c(k)),i=1..k-1),j=(k+1)..ub),k=2..(ub-1))));
$$\sum_{k=2}^{ub-1}\left(\sum_{j=k+1}^{ub}\left(\sum_{i=1}^{k-1}\frac{\left|a(i)-b(j)\right|\,\left|b(j)-c(k)\right|}{x^{a(i)}\,y^{b(j)}\,z^{c(k)}}\right)\right) \tag{11}$$
The computations give only absolute product probabilities instead of
rates of recombination, since the variables x, y, and z are not defined for dominant
and recessive properties.
In a realistic situation, the count of the number of recombinants would be used as the
divisor to obtain the rate.
This can be easily included but we leave the details to practitioners.
Numeric Example: The chromosome is assumed to have six genes (ub = 6 in equation (12a)), and
we are required to compute a table of rates of recombinants for triplets x, y, and z
where z is positioned between x and y. The bounds are
arranged such that only the ordering x-z-y is computed. If it is required to
compute other orders such as x-y-z or y-x-z, recomputations are necessary.
This simplifies data presentation. To interpret the results in equation (12b), take
the first term, where the numerator indicates a product probability of 2 and
the denominator indicates that z falls exactly halfway between x and y. It can
be seen that as the intervals get larger, the product of probabilities also gets larger,
which is within expectations. Note that probabilities are not reported as fractions
since no divisors are used.
Outerprod:=sort((sum(sum(sum(abs(i-j)*abs(j-k)/(x^i*y^j*z^k),i=1..(k-1)),j=(k+1)..6),k=2..5)));.........(12a).
$$\begin{aligned}
\mathrm{Outerprod} := {} & \frac{2}{x\,y^{3}z^{2}}+\frac{6}{x\,y^{4}z^{2}}+\frac{3}{x\,y^{4}z^{3}}+\frac{12}{x\,y^{5}z^{2}} \\
& +\frac{8}{x\,y^{5}z^{3}}+\frac{20}{x\,y^{6}z^{2}}+\frac{2}{x^{2}y^{4}z^{3}}+\frac{4}{x\,y^{5}z^{4}} \\
& +\frac{15}{x\,y^{6}z^{3}}+\frac{6}{x^{2}y^{5}z^{3}}+\frac{10}{x\,y^{6}z^{4}}+\frac{3}{x^{2}y^{5}z^{4}} \\
& +\frac{12}{x^{2}y^{6}z^{3}}+\frac{5}{x\,y^{6}z^{5}}+\frac{8}{x^{2}y^{6}z^{4}}+\frac{2}{x^{3}y^{5}z^{4}} \\
& +\frac{4}{x^{2}y^{6}z^{5}}+\frac{6}{x^{3}y^{6}z^{4}}+\frac{3}{x^{3}y^{6}z^{5}}+\frac{2}{x^{4}y^{6}z^{5}}
\end{aligned} \tag{12b}$$
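The triple summation of the numeric example can be re-computed outside Maple. The Python sketch below uses a hypothetical function name and follows the summation bounds and the numerator abs(i-j)*abs(j-k) exactly as written in the Maple line, with the positions equal to the indices and ub = 6.

```python
def double_crossover_terms(ub):
    """Map (i, j, k) -> numerator of the term 1/(x**i * y**j * z**k)."""
    terms = {}
    for k in range(2, ub):              # k = 2 .. ub-1 (exponent of z)
        for j in range(k + 1, ub + 1):  # j = k+1 .. ub (exponent of y)
            for i in range(1, k):       # i = 1 .. k-1  (exponent of x)
                terms[(i, j, k)] = abs(i - j) * abs(j - k)
    return terms


terms = double_crossover_terms(6)
for (i, j, k), coeff in sorted(terms.items()):
    print(f"{coeff}/(x^{i} y^{j} z^{k})")
print("number of terms:", len(terms))
```

The bounds i < k < j enforce the ordering x-z-y, so every triplet with z bracketed by x and y appears exactly once.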
3. Conclusions
Because sequence algebra is the algebra of sequences, it has found a suitable
niche in mathematical genetics. Although the present paper is primarily mathematical,
sufficient attention has been paid to work in genetic linkage that practitioners
in that field could easily pick up from where the author left off. Sequence algebra has
found suitable applications in more than one field. For example, the power indices in
the denominators can be used to indicate
the positions of genes along the linear gene map, whilst the numerator values
indicate frequencies of occurrence. Furthermore, it lends itself easily
to computations using symbolic packages. The theory of sequence algebra is quite
new, but most papers of importance can be found at this URL site because the
author has made it a resource center for this new discipline. This is convenient
since there are no published texts on this domain at present. Lastly, although type 2
problems are included in the classification, no examples are shown. The reverse
problem of determining the whole sequence, either algebraically or from a lookup
table, sounds challenging, but it has not yet been attempted. This is because the
author is not sure whether the mapping relation is bijective.
4. References
Comments: Not all references in this list are directly cited in the main paper.
Most are provided for readers not familiar with sequence algebra. These papers
can be reached by hyperlink whilst browsing the URL site. Most html files are
quite short and can be downloaded quickly without unzipping.
1. Weaver R.F. and Hedrick P.W.: Basic Genetics, WCB Publishers,
2nd Edition, 1995, Printed in Dubuque, pp 154-159.
2. Huen Y.K.: A Sequence Algebraist's Attempts To Learn From Life Sciences
(date released: 14/1/98, 38 KBytes).
3. Huen Y.K.: A Matrix Map for Prime and Non-prime Numbers, Int. J. Math. Educ. Sci.
Technol., 1994, VOL. 25, NO. 6, pp 913-920.
4. Huen Y.K.: Some Interesting Properties Of The Natural Number System, Int. J. Math. Educ.
Sci. Technol., 1996, VOL. 27, NO. 5, pp 685-691.
5. Huen Y.K.: Visual algebra and its applications, Int. J. Math. Educ. Sci. Technol., 1996,
VOL.??, NO.?, ???-??? (In the press as proof paper mes 100421).
6. Huen Y.K.: A Simple Introduction To Sequence Algebra
(date released: 15.3.97, 38 KBytes, 11*A4 pages).
7. Huen Y.K.: The Canonical Generating Function or CGF(z) ...
(date released: 27.5.97, 24 KBytes, 7*A4s).
8. Huen Y.K.: Visual Solutions Of Number Theoretic Problems .....
(date released: 3.6.97, 38.3 KBytes, 10*A4s).
=====================END OF PAPER ======================