Translations Of Genetic Codes By Macsyma 2.2.1
by
Huen Y.K.
CAHRC, P.O.Box 1003, Singapore 911101
http://web.singnet.com.sg/~huens/
email: huens@mbox3.singnet.com.sg
(A short communication - 1st released: 10/2/98, revised: 10/10/98)
Abstract
This paper describes an algorithm for translating the genetic code conveyed by
an mRNA sequence that is initially presented as a string. The string must be
framed correctly into a sequence of triplet codons; since there are three
possible framings and only one is correct, tests based on identifying the
initiation and termination codons are used to select it. The coding substring
is then isolated, translated into a protein sequence using the genetic code
table, and returned in sequence algebraic format. Done manually, the
translation would be tedious; here it is carried out with the symbolic algebra
package Macsyma 2.2.1.
1. Introduction
Protein synthesis by translation of an mRNA sequence is a complicated process,
and some of its steps are purely biochemical and resist mathematical modelling.
However, the transcription and translation of the sequence itself are amenable
to mathematical modelling via sequence algebra, and can be carried out with the
symbolic algebra package Macsyma 2.2.1. Transcription and translation are
covered in standard genetics textbooks and are described only briefly here,
where relevant [3].
Proteins, or polypeptides, are polymers of amino acids linked through peptide bonds.
Translation does not take place on the double helix itself; instead, a copy called
mRNA is first transcribed, and this sequence is then translated into protein. The
translation of triplet codons into amino acids is carried out by a protein-synthesizing
complex called the ribosome, aided by adapter molecules called transfer RNA (tRNA).
Algorithms for the transcription of mRNA from the double helix have already been
described in previous papers [1,2]. In the present paper, we confine the description
to the translation process itself, assuming that the mRNA is already available in
string format. Since codons come in triplets, the string cannot be translated
without first framing the triplet codons correctly. There are three possible
framings, of which only one is correct; it is detected by the presence of the
initiation codon ATG and one of the three stop codons, viz. TAG, TAA or TGA.
(In mRNA the Ts are replaced by Us, so these codons read AUG, UAG, UAA and UGA.)
Since any of the three stop codons may end the sequence, up to three tests may be
required to find it.
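To make the framing problem concrete, a minimal Python sketch (not Macsyma; `reading_frames` is a hypothetical name) cuts the coding strand used in Example 1 into codons in each of the three possible frames:

```python
def reading_frames(seq):
    """Return the three forward reading frames of seq as lists of codons.

    Frame f starts at offset f (0, 1 or 2); trailing bases that do not
    fill a whole codon are dropped.
    """
    return [[seq[i:i + 3] for i in range(f, len(seq) - 2, 3)]
            for f in range(3)]


dna = "ATCCGATGAAACCGTGGACACCCAGATAAATCG"  # coding strand, Example 1
for f, codons in enumerate(reading_frames(dna)):
    print(f, " ".join(codons))
```

Only the frame starting at offset 2 contains ATG followed, in step, by the stop codon TAA; the other two frames contain no ATG at all.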
In spite of the complexity of translation, the algorithm itself is fairly
straightforward once the mRNA strand is available. Since this sequence is
conventionally presented as a string, some string-processing functions from
Macsyma are required [4]. In this paper the start and stop codons serve as
delimiters and are not included in the translated string, so one must be able to
isolate the mRNA sequence bracketed by these two codons. The number of characters
in the "trimmed" mRNA string will always be a multiple of three. The program lines
showing how this is done are given in Example 1.
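The isolation step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's method: the hypothetical `isolate_orf` scans codon by codon from the first ATG, so that only stop codons in the same frame are accepted, and, as in this paper, it excludes both delimiting codons from the result.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def isolate_orf(seq, start="ATG"):
    """Return the bases strictly between a start codon and the first
    in-frame stop codon after it, or None if no such pair exists."""
    i = seq.find(start)
    while i != -1:
        # Step through whole codons beginning just after the start codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                return seq[i + 3:j]
        # No in-frame stop for this start codon: try the next one.
        i = seq.find(start, i + 1)
    return None


orf = isolate_orf("ATCCGATGAAACCGTGGACACCCAGATAAATCG")
print(orf, len(orf) % 3)
```

The trimmed string is "AAACCGTGGACACCCAGA", whose length (18) is a multiple of three, as the text requires.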
In Macsyma, the function call subst(a,b,c) substitutes a for b in c; here a
represents the amino acid, b the triplet codon and c the mRNA sequence. Each
subst call substitutes for only one specific triplet codon, and since there are
64 codons, the function has to be nested 64 times (see the program lines in
equation (6b)). In other words, each pass looks for only one specific triplet
codon and translates it into its amino acid. The program handles the degeneracy
of the code (several codons for the same amino acid) because each codon is
substituted independently, and each pass acts only on the codons left
untranslated by the previous passes. The first priority, however, is that the
mRNA sequence must be presented in the correct frame, by recognising the
initiation codon and one of the stop codons; this framing is done before the
translation proper.
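For comparison, the same translation can be sketched in Python with one dictionary lookup per codon instead of 64 nested substitutions. This is not the author's method: the table is built from the compact one-letter layout of the standard genetic code (codons in u, c, a, g order, as in equation (6a)) and mapped to the three-letter names used in the paper; `CODON_TABLE` and `translate` are hypothetical names.

```python
from itertools import product

BASES = "ucag"
# One-letter codes for the 64 codons in u, c, a, g order (first base
# outermost); "*" marks a stop codon.
ONE_LETTER = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
              "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Three-letter names in the style of the paper's code table.
NAMES = {"F": "phe", "L": "leu", "S": "ser", "Y": "tyr", "*": "stop",
         "C": "cys", "W": "trp", "P": "pro", "H": "his", "Q": "gln",
         "R": "arg", "I": "ile", "M": "met", "T": "thr", "N": "asn",
         "K": "lys", "V": "val", "A": "ala", "D": "asp", "E": "glu",
         "G": "gly"}

CODONS = ["".join(p) for p in product(BASES, repeat=3)]
CODON_TABLE = {c: NAMES[a] for c, a in zip(CODONS, ONE_LETTER)}


def translate(codons):
    """Translate each codon by a single table lookup; unknown codons
    are returned unchanged, much as an unmatched subst leaves them."""
    return [CODON_TABLE.get(c, c) for c in codons]


print(translate(["aug", "uuu", "uaa"]))
```

Because every codon is looked up independently, degeneracy needs no special handling: the six leucine codons simply share one value.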
2. Isolating The mRNA Sequence
As previously mentioned, when mRNA is presented as a string, three alternative
triplet-coded sequences can be formed, but only one of these is correct; it is
identified by the presence of the initiation codon ATG and one of the three
alternative stop codons TAG, TAA or TGA.
Example 1 shows how the mRNA is isolated.
Example 1: A double-stranded DNA sequence is given by equation (1) where the
top strand is the coding strand. Detect and write down the sequence of the
open reading frame in the mRNA that would be transcribed from this gene. (This problem
is cited from problem 2 of reference [3]).
5' ATCCGATGAAACCGTGGACACCCAGATAAATCG 3'
3' TAGGCTACTTTGGCACCTGTGGGTCTATTTAGC 5'...........(1).
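Before reading the frame, it can be checked that the two strands in equation (1) really are complementary and that the mRNA carries the coding-strand sequence with U in place of T. A minimal Python sketch (the variable names are hypothetical):

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

top = "ATCCGATGAAACCGTGGACACCCAGATAAATCG"     # 5' -> 3', coding strand
bottom = "TAGGCTACTTTGGCACCTGTGGGTCTATTTAGC"  # 3' -> 5', template strand

# Base pairing: each bottom base is the complement of the top base above it.
paired = all(COMPLEMENT[a] == b for a, b in zip(top, bottom))

# The mRNA has the same sequence as the coding strand, with U for T.
mrna = top.replace("T", "U")
print(paired, mrna)
```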
Correct solution: The isolated mRNA sequence is written in triplet codons
with Ts replaced by Us, as shown in equation (2).
5' AAA CCG UGG ACA CCC AGA 3'...................(2).
To obtain the above answer by Macsyma, we first declare the DNA strand as a
string:
x1:"ATCCGATGAAACCGTGGACACCCAGATAAATCG";
ATCCGATGAAACCGTGGACACCCAGATAAATCG............(3).
We count the length or number of characters in this string:
nchar:string_length(x1);
33 .................(4).
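The next line hunts for the position of the initiation (start) codon ATG. Since this
codon takes three characters, the mRNA proper starts on the 4th character counted from
the first character of the start codon, i.e. at position k+3 if the codon begins at
position k. Equations (5a) and (5b) are applied to x1, which holds the coding strand of
equation (1) as assigned in equation (3). The factor (oddp(...)-false)/(true-false) acts
as an indicator: it is 1 when the three characters starting at position k match the
codon and 0 otherwise, so each match contributes one term 1/z^n whose exponent n
records the position.
start_codon:sum((oddp(substring(x1,k,k+2)/"ATG")-false)/(true-false)/z^(k+3),k,1,nchar-2);
1/z^9 ......................(5a).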
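We do likewise for the stop codon, except that the mRNA ends one character before the
first character of this codon, so the exponent is simply k. Only TAA is tested here
because it is the in-frame stop codon in this example. The test does not check the
reading frame, however: TGA also occurs in x1 (at position 7, overlapping the start
codon), so testing for TAG and TGA in the same way would also require a frame check.
stop_codon:sum((oddp(substring(x1,k,k+2)/"TAA")-false)/(true-false)/z^k,k,1,nchar-2);
1/z^27 ......................(5b).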
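The indicator sums in (5a) and (5b) encode each match position as the exponent of a 1/z term. A minimal Python sketch of the same bookkeeping (not the author's Macsyma code; `indicator_exponents` is a hypothetical name) simply collects those exponents in a list, one per match, using 1-based positions as the Macsyma calls do:

```python
def indicator_exponents(seq, codon, shift):
    """Exponents n of the terms 1/z^n in the sum over k of
    [characters k..k+2 of seq equal codon] / z^(k + shift).

    k is 1-based and runs from 1 to len(seq) - 2, so every window
    holds three characters.
    """
    return [k + shift
            for k in range(1, len(seq) - 1)
            if seq[k - 1:k + 2] == codon]


x1 = "ATCCGATGAAACCGTGGACACCCAGATAAATCG"
start_terms = indicator_exponents(x1, "ATG", 3)  # mRNA begins after ATG
stop_terms = indicator_exponents(x1, "TAA", 0)   # mRNA ends before TAA
print(start_terms, stop_terms)
```

Each list has a single entry here (9 and 27). More than one entry would signal that the codon occurs more than once, in which case the single-term recovery used in (5c) and (5d) would not apply directly.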
The actual positions n1 and n2, where the mRNA proper starts and where the stop codon starts, are found by the next two lines.
n1:numfactor(diff(denom(start_codon),z));
9 ...............................(5c).
n2:numfactor(diff(denom(stop_codon),z));
27 .............................(5d).
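The recovery of n1 and n2 in (5c) and (5d) rests on a one-line derivation, valid when the indicator sum reduces to a single term 1/z^n (here ATG and TAA each occur exactly once in x1):

```latex
\mathrm{denom}\!\left(\frac{1}{z^{n}}\right) = z^{n},\qquad
\frac{d}{dz}\,z^{n} = n\,z^{n-1},\qquad
\mathrm{numfactor}\!\left(n\,z^{n-1}\right) = n
```

so n = 9 for (5a) and n = 27 for (5b).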
Once n1 and n2 are computed, the mRNA string can be isolated:
mRNA:substring(x1,n1,n2-1);
AAACCGTGGACACCCAGA ...................(5e).
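As used in the text, substring appears to count characters from 1 and to include both end points (substring(x1,k,k+2) yields three characters). A Python sketch of step (5e) under that assumption, with the hypothetical helper `substring_inclusive`, converts the call to a half-open slice:

```python
def substring_inclusive(s, start, end):
    """Characters start..end of s, counted from 1 and including both ends
    (assumed to match the substring calls in the text)."""
    return s[start - 1:end]


x1 = "ATCCGATGAAACCGTGGACACCCAGATAAATCG"
n1, n2 = 9, 27  # values of equations (5c) and (5d)
mrna = substring_inclusive(x1, n1, n2 - 1)
print(mrna)  # AAACCGTGGACACCCAGA
```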
The mRNA sequence can then be reformatted into triplet codons by taking successive
groups of three characters from the isolated string mRNA, as in equation (5f).
mRNA_seq:makelist(concat(getchar(mRNA,3*k-2),getchar(mRNA,3*k-1),getchar(mRNA,3*k))/z^k,k,1,string_length(mRNA)/3);
[AAA/z, CCG/z^2, TGG/z^3, ACA/z^4, CCC/z^5, AGA/z^6] ........(5f).
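Step (5f) pairs the k-th codon with z^k. A Python sketch of the same split (the helper `to_codons` is hypothetical) returns (k, codon) pairs, refuses strings whose length is not a multiple of three, and is applied here after the T-to-U replacement of equation (2):

```python
def to_codons(mrna):
    """Split mrna into (k, codon) pairs for k = 1, 2, ..., mirroring the
    terms codon/z^k. The length must be a multiple of three."""
    if len(mrna) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return [(k + 1, mrna[3 * k:3 * k + 3]) for k in range(len(mrna) // 3)]


# Replace T by U first, as in equation (2).
terms = to_codons("AAACCGTGGACACCCAGA".replace("T", "U"))
print(terms)
```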
3. The Translation Program
The most convincing way to demonstrate that the translation program works is to
translate the entire genetic code table, which contains 64 triplet codons. Since there
are only 20 amino acids (plus the stop signal), several triplets must translate into
the same amino acid.
x1:
[uuu, uuc, uua, uug, ucu, ucc, uca, ucg, uau, uac, uaa, uag, ugu, ugc, uga,
ugg, cuu, cuc, cua, cug, ccu, ccc, cca, ccg, cau, cac, caa, cag, cgu, cgc, cga,
cgg, auu, auc, aua, aug, acu, acc, aca, acg, aau, aac, aaa, aag, agu, agc, aga,
agg, guu, guc, gua, gug, gcu, gcc, gca, gcg, gau, gac, gaa, gag, ggu, ggc, gga, ggg]....(6a).
x2:subst(leu,cug,subst(leu,cua,subst(leu,cuc,subst(leu,cuu,
   subst(trp,ugg,subst(stop,uga,subst(cys,ugc,subst(cys,ugu,
   subst(stop,uag,subst(stop,uaa,subst(tyr,uac,subst(tyr,uau,
   subst(ser,ucg,subst(ser,uca,subst(ser,ucc,subst(ser,ucu,
   subst(leu,uug,subst(leu,uua,subst(phe,uuc,subst(phe,uuu,
   x1))))))))))))))))))));
x3:subst(arg,cgu,subst(arg,cgc,subst(arg,cga,subst(arg,cgg,
   subst(his,cau,subst(his,cac,subst(gln,caa,subst(gln,cag,
   subst(pro,ccu,subst(pro,ccc,subst(pro,cca,subst(pro,ccg,
   %))))))))))));
x4:subst(asn,aau,subst(asn,aac,subst(lys,aaa,subst(lys,aag,
   subst(thr,acu,subst(thr,acc,subst(thr,aca,subst(thr,acg,
   subst(ile,auu,subst(ile,auc,subst(ile,aua,subst(met,aug,
   %))))))))))));
x5:subst(val,guu,subst(val,guc,subst(val,gua,subst(val,gug,
   subst(ser,agu,subst(ser,agc,subst(arg,aga,subst(arg,agg,
   %))))))));
x6:subst(gly,ggu,subst(gly,ggc,subst(gly,gga,subst(gly,ggg,
   subst(asp,gau,subst(asp,gac,subst(glu,gaa,subst(glu,gag,
   subst(ala,gcu,subst(ala,gcc,subst(ala,gca,subst(ala,gcg,
   %))))))))))));............................(6b).
The 64 triplet codons given by equation (6a) are translated by the above program lines
into the amino-acid sequence given by equation (6c):
x6;
[phe, phe, leu, leu, ser, ser, ser, ser, tyr, tyr, stop, stop, cys, cys,
stop, trp, leu, leu, leu, leu, pro, pro, pro, pro, his, his, gln, gln, arg,
arg, arg, arg, ile, ile, ile, met, thr, thr, thr, thr, asn, asn, lys, lys, ser,
ser, arg, arg, val, val, val, val, ala, ala, ala, ala, asp, asp, glu, glu, gly,
gly, gly, gly]................................................(6c).
The above program is applied to the actual mRNA sequence given by equation (5e) or (5f),
with the codons written in the lower-case, U-containing form used in (6a), which yields:
lys/z + pro/z^2 + trp/z^3 + thr/z^4 + pro/z^5 + arg/z^6 ........................(6d).
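As a cross-check of (6d), the following Python sketch carries the isolated string through lower-casing, the T-to-U replacement, splitting into codons and table lookup, and prints the result in the same sum-of-terms form. The lookup table is deliberately an excerpt of the standard genetic code containing only the six codons that occur here; `EXCERPT` and `as_series` are hypothetical names.

```python
# Excerpt of the standard genetic code: only the codons occurring here.
EXCERPT = {"aaa": "lys", "ccg": "pro", "ugg": "trp",
           "aca": "thr", "ccc": "pro", "aga": "arg"}


def as_series(names):
    """Render names as names[0]/z + names[1]/z^2 + ... (plain text)."""
    return " + ".join(f"{n}/z" if k == 1 else f"{n}/z^{k}"
                      for k, n in enumerate(names, start=1))


mrna = "AAACCGTGGACACCCAGA".lower().replace("t", "u")
codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
protein = [EXCERPT[c] for c in codons]
print(as_series(protein))
# lys/z + pro/z^2 + trp/z^3 + thr/z^4 + pro/z^5 + arg/z^6
```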
Additional Comments:
Instead of multiply nested subst functions, sublis would have been more convenient
to edit in the above example. Only a partial example using this function is given
below; see equation (6a) for the assignment of the codon list to x1. Note that
sublis takes a list of equations of the form triplet-codon = amino-acid, whereas
subst takes the amino acid as its first argument and the codon as its second.
x2:sublis([cug=leu,cua=leu,cuc=leu,cuu=leu,ugg=trp,uga=stop,ugc=cys,ugu=cys,
uag=stop,uaa=stop,uac=tyr,uau=tyr,ucg=ser,uca=ser,ucc=ser,ucu=ser,uug=leu,
uua=leu,uuc=phe,uuu=phe],x1);.............................................(6e).
[phe, phe, leu, leu, ser, ser, ser, ser, tyr, tyr, stop, stop, cys, cys, stop,
trp, leu, leu, leu, leu, ccu, ccc, cca, ccg, cau, cac, caa, cag, cgu, cgc, cga,
cgg, auu, auc, aua, aug, acu, acc, aca, acg, aau, aac, aaa, aag, agu, agc,
aga, agg, guu, guc, gua, gug, gcu, gcc, gca, gcg, gau, gac, gaa, gag, ggu,
ggc, gga, ggg]...........................................(6f).
Since the list in (6e) covers only the 20 codons handled by x2 in equation (6b),
the remaining codons are left untranslated in (6f).
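The orientation of each sublis equation matters: a rule written the wrong way round simply never matches. A minimal Python sketch of a simultaneous, sublis-like rewrite (the helper `apply_rules` is hypothetical and only mimics the old = new convention described above) shows the effect on the four cu* codons:

```python
def apply_rules(rules, items):
    """Rewrite every item in one simultaneous pass using (old, new) rules,
    in the spirit of the old = new equations given to sublis.
    Items with no rule are left unchanged."""
    table = dict(rules)
    return [table.get(x, x) for x in items]


codons = ["cuu", "cuc", "cua", "cug"]
right = apply_rules([("cuu", "leu"), ("cuc", "leu"),
                     ("cua", "leu"), ("cug", "leu")], codons)
# Two rules written the wrong way round (amino acid = codon):
wrong = apply_rules([("leu", "cuu"), ("leu", "cuc"),
                     ("cua", "leu"), ("cug", "leu")], codons)
print(right)  # ['leu', 'leu', 'leu', 'leu']
print(wrong)  # ['cuu', 'cuc', 'leu', 'leu']
```

Since no amino-acid name coincides with a codon name, applying the rules one at a time, as the nested subst calls do, gives the same result as this simultaneous pass.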
4. Conclusions
This paper shows how Macsyma 2.2.1 can be used to translate an mRNA sequence
into a protein sequence.
5. References
1. Huen Y.K.: The Manipulations Of Bilinear Sequences By Macsyma 2.2
(released 5.2.98, 22 Kbytes).
2. Huen Y.K.: Sequence Algebra - A Tutorial Paper (released 2/2/98, 46 Kbytes).
3. Weaver R.F. and Hedrick P.W.: Basic Genetics, second edition,
Wm. C. Brown Publishers, chapter 10, pp. 238-267.
4. Macsyma: Symbolic/Numeric/Graphical Mathematics Software, Mathematics and
System Reference Manual, 16th edition, chapter 10, pp. 321-338.
Comments: The references from this point onward are not cited in the main paper.
Most are provided for readers not familiar with sequence algebra; the papers
posted on the author's website can be accessed there.
Published Papers:
5. Huen Y.K.: A Matrix Map for Prime and Non-prime Numbers, Int. J. Math. Educ. Sci.
Technol., 1994, Vol. 25, No. 6, pp. 913-920.
6. Huen Y.K.: Some Interesting Properties Of The Natural Number System, Int. J. Math.
Educ. Sci. Technol., 1996, Vol. 27, No. 5, pp. 685-691.
7. Huen Y.K.: Visual algebra and its applications, Int. J. Math. Educ. Sci. Technol.,
1997, Vol. 28, No. 3, pp. 333-344.
8. Huen Y.K.: The twin prime problem revisited, Int. J. Math. Educ. Sci. Technol., 1997,
Vol. 28, No. 6, pp. 825-834.
9. Huen Y.K.: Is Pie Periodic?, Int. J. Math. Educ. Sci. Technol. (in press).
10. Huen Y.K.: Final value theorem in number sequences, Int. J. Math. Educ. Sci.
Technol. (accepted).
Papers posted on this website which may be relevant as background:
11. Huen Y.K.: A Simple Introduction To Sequence Algebra (released 15.3.97, 38 Kbytes, 11 A4 pages).
12. Huen Y.K.: Evaluations Of Normc( ) Function In Macsyma 2.2 (released 17/12/97, 14 Kbytes).
13. Huen Y.K.: List Processing In Sequence Algebra (released 23/12/97, 20 Kbytes).
14. Huen Y.K.: The Canonical Generating Function or CGF(z) ... (released 27.5.97, 24 Kbytes, 7 A4 pages).
15. Huen Y.K.: Visual Solutions Of Number Theoretic Problems ... (released 3.6.97, 38.3 Kbytes, 10 A4 pages).
16. Huen Y.K.: Final Value Theorem Applied To Number Sequences ... (released 5.6.97, 29.4 Kbytes, 9 A4 pages).
17. Huen Y.K.: Methods Of Developing Sequence Algebraic Formulations For Comp(z) and Prime(z) (released 20.6.97, 36.8 Kbytes, 10 A4 pages).
18. Huen Y.K.: Composite Number Sequence Challenge 1/97 (released 28.6.97, 24.8 Kbytes, 7 A4 pages).
19. Huen Y.K.: Lemmata, Corollaries, And Theorems In Sequence Order Analysis (released 6.7.97, 38.3 Kbytes, 12 A4 pages).
20. Huen Y.K.: Improved Formulations For Comp(z) and Prime(z) (released 16.9.97, 17 Kbytes).
21. Huen Y.K.: Detecting False Reports in Primality Tests By The Oddcomp(z) Method (released 18.9.97, revised 20/9, 26 Kbytes).
22. Huen Y.K.: The Throwing Power Of Oddcomp(z) (released 24.9.97, 15 Kbytes).
23. Huen Y.K.: Sequence Algebraic Approach To Prime Number Theorem (released 28.9.97, 21 Kbytes).
24. Huen Y.K.: Generating Functions - Closed Forms vs Open Forms (released 1.10.97, 21 Kbytes).
25. Huen Y.K.: Generating Large Odd Composite With Two Prime Factors (released 3.10.97, 13.5 Kbytes).
26. Huen Y.K.: In Search Of Counter-Examples In Maple's Isprime Function (released 4.10.97, 18 Kbytes).
27. Huen Y.K.: A Sequence Algebraist's View Of Lehmann's Primality Test (released 6.10.97, 26 Kbytes).
28. Huen Y.K.: On Odd(z), Oddcomp(z), Seq1(z) and Seq2(z) (released 10.10.97, 17 Kbytes).
29. Huen Y.K.: How To Generate A Short And Contiguous Oddcomp(z) Sequence? (released 15.10.97, 13 Kbytes).