Translations Of Genetic Codes By Macsyma 2.2.1

by

Huen Y.K.

CAHRC, P.O.Box 1003, Singapore 911101
http://web.singnet.com.sg/~huens/
email: huens@mbox3.singnet.com.sg

(A short communication - 1st released: 10/2/98, revised: 10/10/98)

Abstract

This paper describes an algorithm for translating the genetic code carried by an mRNA sequence that is initially presented as a string. The string must first be framed correctly into a sequence of triplet codons. Of the three possible framings only one is correct, and tests are implemented to select it by identifying the initiation and termination codons. The substring between them is isolated, translated into an amino acid sequence using the genetic code table, and returned in sequence algebraic format. The translation would be rather tedious if done manually, but it is facilitated by the use of a symbolic algebra package called Macsyma 2.2.1.


1. Introduction

Protein synthesis by translation from an mRNA sequence is quite a complicated process, since some steps are purely biochemical and resist mathematical modelling. However, the transcription and translation of the mRNA sequence itself are amenable to mathematical modelling via sequence algebra, and can be manipulated with a symbolic algebra package called Macsyma 2.2.1. Information on transcription and translation is available from standard textbooks in genetics and will only be described briefly where relevant [3].

Proteins, or polypeptides, are polymers of amino acids linked through peptide bonds. Translation does not occur on the double helix itself: a copy called mRNA is first transcribed, and this sequence is then translated into protein. The translation of triplet codes is carried out by a protein-synthesizing complex called the ribosome, aided by adapter molecules called transfer RNA (tRNA). Algorithms for the transcription of mRNA from the double helix have already been described in previous papers [1,2]. In the present paper, we confine the description to the translation process itself, assuming that the mRNA is already available in string format. Since genetic codes come in triplets, the string cannot be translated without first framing the triplet codons correctly. There are three possible framings, of which only one is correct; it is detected by the presence of the initiation codon ATG and one of three stop codons, viz. TAG, TAA, or TGA. (In mRNA the Ts in the sequence are replaced by Us.) Because any of the three stop codons may close the frame, up to three tests may be required to scan for the correct one.
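The three possible framings can be made concrete with a short sketch. It is written in Python rather than Macsyma, purely as an illustration; the function name reading_frames is hypothetical and is not part of the program described in this paper. The input string is the coding strand used in Example 1.

```python
def reading_frames(seq):
    """Return the three possible triplet framings of a nucleotide string.

    Frame number `offset` starts at 0-based index offset (0, 1 or 2).
    Trailing bases that do not fill a complete triplet are dropped.
    """
    frames = []
    for offset in range(3):
        # Last index (exclusive) that still ends on a complete triplet.
        usable = len(seq) - (len(seq) - offset) % 3
        frames.append([seq[i:i + 3] for i in range(offset, usable, 3)])
    return frames


frames = reading_frames("ATCCGATGAAACCGTGGACACCCAGATAAATCG")
for offset, codons in enumerate(frames):
    print(offset, " ".join(codons))
```

Only the framing with offset 2 contains both ATG and a later stop codon (TAA) as whole triplets, which is what singles it out as the correct one.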

In spite of the complexity of translation, the algorithm itself is fairly straightforward once the mRNA strand is available. Since this sequence is conventionally presented as a string, some string processing functions from Macsyma are required [4]. Since the start and stop codons are not translated here, one must be able to isolate the mRNA sequence bracketed by these two codons. The number of characters in the "trimmed" mRNA string proper will always be an integer multiple of three. The program lines showing how this is done are given in example 1.

In Macsyma, a convenient function call subst(a,b,c) is available, where a can represent the amino acid, b the triplet code and c the mRNA sequence. Each subst call substitutes for only one specific triplet codon, and since there are 64 such codes the function has to be nested 64 times (see the program lines in equation (6b)). In other words, each pass looks for one specific triplet codon and replaces it with its amino acid. The program can handle degeneracy, since each pass translates only those triplet codons left untranslated by the previous passes. The first priority, however, is to present the mRNA sequence with the correct framing by recognising the initiation codon and one of the stop codons. The framing process is therefore undertaken before the translation proper.
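The pass-by-pass behaviour of nested substitutions can be sketched as follows. This is a Python illustration under stated assumptions, not Macsyma code: the helper subst below only imitates the argument order subst(new, old, expression) on a flat list, and the four rules are a small excerpt of the 64.

```python
from functools import reduce


def subst(new, old, items):
    """Replace every occurrence of `old` in the list by `new` (one pass)."""
    return [new if item == old else item for item in items]


# Each rule is (amino acid, codon), matching the order of subst(a, b, c).
rules = [("phe", "uuu"), ("phe", "uuc"), ("leu", "uua"), ("leu", "uug")]
codons = ["uuu", "uua", "uuc", "uug", "aug"]

# Nesting subst calls is the same as folding the rules over the list:
# the innermost call runs first, and later passes see its result.
result = reduce(lambda acc, rule: subst(rule[0], rule[1], acc), rules, codons)
print(result)
```

Codons without a matching rule, such as aug here, pass through every step unchanged, which is why the 64 substitutions can be split over several statements.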


2. Isolating The mRNA Sequence

As previously mentioned, when mRNA is presented as a string, three alternative triplet coded sequences can be formed but only one of these is the correct one being identified by the presence of the initiation codon ATG and one of three alternative stop codons TAG, TAA, or TGA. Example 1 shows how the mRNA is isolated.

Example 1: A double-stranded DNA sequence is given by equation (1) where the top strand is the coding strand. Detect and write down the sequence of the open reading frame in the mRNA that would be transcribed from this gene. (This problem is cited from problem 2 of reference [3]).

5' ATCCGATGAAACCGTGGACACCCAGATAAATCG 3'
3' TAGGCTACTTTGGCACCTGTGGGTCTATTTAGC 5'...........(1).
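Correct solution: The isolated mRNA sequence is written in triplet codons as shown in equation (2). It is shown here in the DNA alphabet of the coding strand; in the mRNA itself each T is replaced by U, so that TGG reads UGG.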

5' AAA CCG TGG ACA CCC AGA 3'...................(2).
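The expected answer can be cross-checked independently with a minimal sketch that walks the string in steps of three from the start codon. It is Python, not Macsyma, and the names open_reading_frame and STOP_CODONS are illustrative assumptions rather than part of the original program.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}


def open_reading_frame(dna, start="ATG"):
    """Return the codons between the first start codon and the first
    stop codon in the same frame, or None if either is missing."""
    begin = dna.find(start)
    if begin < 0:
        return None
    codons = []
    # Step through complete triplets that follow the start codon.
    for i in range(begin + 3, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            return codons
        codons.append(codon)
    return None


orf = open_reading_frame("ATCCGATGAAACCGTGGACACCCAGATAAATCG")
print(" ".join(orf))
```

Stepping in threes means a stop triplet that straddles two codons of the frame is ignored, which a plain string search would not guarantee.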
To obtain the above answer by Macsyma, we first declare the DNA strand as a string:

x1:"ATCCGATGAAACCGTGGACACCCAGATAAATCG";

ATCCGATGAAACCGTGGACACCCAGATAAATCG............(3).
We count the length or number of characters in this string:

nchar:string_length(x1);

			33    .................(4).

The next line hunts for the position of the initiation or start codon ATG. Since this codon takes three characters, the mRNA proper starts three characters after its first character, i.e. at position k+3 when the start codon begins at position k. Equations (5a) and (5b) are applied to x1, which is the original DNA strand given by equation (3).

start_codon:sum((oddp(substring(x1,k,k+2)/"ATG")-false)/(true-false)/z^(k+3),k,1,nchar);

		  1
		-----    ......................(5a).
		   9
		  z
We do likewise for the stop codon, except that the mRNA ends one character before the first character of this codon. Only TAA is tested here; if it were absent, the same test would be repeated with TAG and TGA.

stop_codon:sum((oddp(substring(x1,k,k+2)/"TAA")-false)/(true-false)/z^k,k,1,nchar);

		  1
		-----    ......................(5b).
		   27
		  z
The positions n1 (the first character of the mRNA proper) and n2 (the first character of the stop codon) are recovered from the exponents by the next two lines.

n1:numfactor(diff(denom(start_codon),z));

		9  ...............................(5c).
n2:numfactor(diff(denom(stop_codon),z));

		27  .............................(5d).
Once n1 and n2 are computed, the mRNA string can be isolated.

mRNA:substring(x1,n1,n2-1);

		AAACCGTGGACACCCAGA  ...................(5e).
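The position arithmetic of equations (5a) to (5e) can be mirrored in a few lines of Python. This is only a sketch: match_positions is a hypothetical helper, positions are counted from 1 as in the text, and the exponent of z is represented simply by an integer.

```python
def match_positions(seq, codon):
    """1-based positions k at which the three characters starting at k
    equal `codon` (the exponents that would appear on z)."""
    return [k + 1 for k in range(len(seq) - 2) if seq[k:k + 3] == codon]


x1 = "ATCCGATGAAACCGTGGACACCCAGATAAATCG"

n1 = match_positions(x1, "ATG")[0] + 3  # first character after the start codon
n2 = match_positions(x1, "TAA")[0]      # first character of the stop codon

# Characters n1 .. n2-1 (1-based, inclusive) form the mRNA proper.
mrna = x1[n1 - 1:n2 - 1]
print(n1, n2, mrna)
```

The length of the isolated string is a multiple of three, as required for the framing step that follows.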
The mRNA sequence can then be reformatted into triplet codons, as in equation (5f). Note that the characters are taken from the isolated string mRNA, not from x1.

mRNA_seq:makelist(concat(getchar(mRNA,3*k-2),getchar(mRNA,3*k-1),getchar(mRNA,3*k))/z^k,k,1,string_length(mRNA)/3);

[AAA/z, CCG/z^2, TGG/z^3, ACA/z^4, CCC/z^5, AGA/z^6]  ........(5f).

3. The Translation Program

The most convincing way to demonstrate that the translation program works is to translate the entire genetic code table, which contains 64 triplet codes. Since there are only 20 amino acids, the use of 64 codes means that more than one triplet can be translated into the same amino acid. Note that x1 is reassigned here to the list of codons.

x1: [uuu, uuc, uua, uug, ucu, ucc, uca, ucg, uau, uac, uaa, uag, ugu, ugc, uga, ugg, cuu, cuc, cua, cug, ccu, ccc, cca, ccg, cau, cac, caa, cag, cgu, cgc, cga, cgg, auu, auc, aua, aug, acu, acc, aca, acg, aau, aac, aaa, aag, agu, agc, aga, agg, guu, guc, gua, gug, gcu, gcc, gca, gcg, gau, gac, gaa, gag, ggu, ggc, gga, ggg]....(6a).

x2:subst(leu,cug,subst(leu,cua,subst(leu,cuc,subst(leu,cuu,subst(trp,ugg, subst(stop,uga,subst(cys,ugc,subst(cys,ugu,subst(stop,uag,subst(stop,uaa, subst(tyr,uac,subst(tyr,uau,subst(ser,ucg,subst(ser,uca,subst(ser,ucc, subst(ser,ucu,subst(leu,uug,subst(leu,uua,subst(phe,uuc, subst(phe,uuu,x1))))))))))))))))))));

x3:subst(arg,cgu,subst(arg,cgc,subst(arg,cga,subst(arg,cgg,subst(his,cau, subst(his,cac,subst(gln,caa,subst(gln,cag,subst(pro,ccu,subst(pro,ccc, subst(pro,cca,subst(pro,ccg,x2))))))))))));

x4:subst(asn,aau,subst(asn,aac,subst(lys,aaa,subst(lys,aag,subst(thr,acu, subst(thr,acc,subst(thr,aca,subst(thr,acg,subst(ile,auu,subst(ile,auc, subst(ile,aua,subst(met,aug,x3))))))))))));

x5:subst(val,guu,subst(val,guc,subst(val,gua,subst(val,gug, subst(ser,agu,subst(ser,agc,subst(arg,aga,subst(arg,agg,x4))))))));

x6:subst(gly,ggu,subst(gly,ggc,subst(gly,gga,subst(gly,ggg,subst(asp,gau, subst(asp,gac,subst(glu,gaa,subst(glu,gag,subst(ala,gcu,subst(ala,gcc, subst(ala,gca,subst(ala,gcg,x5))))))))))));
............................(6b).

The 64 triplet codes given by equation (6a) are translated by the above program lines into the amino acid sequence given by equation (6c):

x6: [phe, phe, leu, leu, ser, ser, ser, ser, tyr, tyr, stop, stop, cys, cys, stop, trp, leu, leu, leu, leu, pro, pro, pro, pro, his, his, gln, gln, arg, arg, arg, arg, ile, ile, ile, met, thr, thr, thr, thr, asn, asn, lys, lys, ser, ser, arg, arg, val, val, val, val, ala, ala, ala, ala, asp, asp, glu, glu, gly, gly, gly, gly]................................................(6c).
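A table-driven cross-check of the full 64-codon translation can be sketched as below. It is Python rather than Macsyma, and it assumes the standard genetic code written as one letter per codon with the bases ordered u, c, a, g in each position, which is the same order as the codon list in equation (6a). The names CODON_TABLE, ONE_LETTER and THREE_LETTER are illustrative.

```python
from itertools import product

BASES = "ucag"

# Standard genetic code, one letter per codon in the order
# uuu, uuc, uua, uug, ucu, ... ; "*" marks a stop codon.
ONE_LETTER = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
              "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

THREE_LETTER = {
    "F": "phe", "L": "leu", "S": "ser", "Y": "tyr", "*": "stop",
    "C": "cys", "W": "trp", "P": "pro", "H": "his", "Q": "gln",
    "R": "arg", "I": "ile", "M": "met", "T": "thr", "N": "asn",
    "K": "lys", "V": "val", "A": "ala", "D": "asp", "E": "glu",
    "G": "gly",
}

# Pair each of the 64 codons with its amino acid name.
CODON_TABLE = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), ONE_LETTER)
}

all_codons = ["".join(c) for c in product(BASES, repeat=3)]
translated = [CODON_TABLE[c] for c in all_codons]
print(translated)
```

A dictionary lookup replaces the 64 nested passes by a single pass over the list, at the cost of leaving the symbolic algebra setting.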

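The above program is applied to the actual mRNA sequence given by equations (5e) or (5f), with its codons written in the lower-case alphabet of equation (6a) and each T replaced by u (for example, TGG becomes ugg), which yields: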

lys/z + pro/z^2 + trp/z^3 + thr/z^4 + pro/z^5 + arg/z^6  ........(6d).
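The result in equation (6d) can be reproduced with a minimal Python sketch. It assumes the isolated coding-strand string of equation (5e) as input and uses only a six-entry excerpt of the genetic code table; translate and CODON_EXCERPT are hypothetical names.

```python
# Excerpt of the standard genetic code: only the six codons needed here.
CODON_EXCERPT = {
    "aaa": "lys", "ccg": "pro", "ugg": "trp",
    "aca": "thr", "ccc": "pro", "aga": "arg",
}


def translate(coding_dna):
    """Translate a coding-strand string (DNA alphabet, length a multiple
    of three) into a list of three-letter amino acid names."""
    # Lower-case the string and write u for t, as in the codon table.
    rna = coding_dna.lower().replace("t", "u")
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    return [CODON_EXCERPT[c] for c in codons]


peptide = translate("AAACCGTGGACACCCAGA")
# Mirror the sequence algebra layout: amino acid k is weighted by 1/z^k.
print(" + ".join(f"{aa}/z^{k}" for k, aa in enumerate(peptide, start=1)))
```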

Additional Comments:

Instead of multiply nested subst calls, the use of sublis would have been more convenient for editing in the above example. We give below only a partial example using this function. Please see equation (6a) for the list assigned to x1. Note that the two functions order their arguments differently: sublis takes a list of equations of the form triplet-codon = amino-acid, whereas subst takes the amino acid first and the triplet codon second.

x2:sublis([cug=leu,cua=leu,cuc=leu,cuu=leu,ugg=trp,uga=stop,ugc=cys,ugu=cys, uag=stop,uaa=stop,uac=tyr,uau=tyr,ucg=ser,uca=ser,ucc=ser,ucu=ser,uug=leu, uua=leu,uuc=phe,uuu=phe],x1);.............................................(6e).

[phe, phe, leu, leu, ser, ser, ser, ser, tyr, tyr, stop, stop, cys, cys, stop, trp, leu, leu, leu, leu, ccu, ccc, cca, ccg, cau, cac, caa, cag, cgu, cgc, cga, cgg, auu, auc, aua, aug, acu, acc, aca, acg, aau, aac, aaa, aag, agu, agc, aga, agg, guu, guc, gua, gug, gcu, gcc, gca, gcg, gau, gac, gaa, gag, ggu, ggc, gga, ggg]...........................................(6f).
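Why the orientation of each pair matters in a parallel substitution can be seen in a small Python sketch. The function sublis below only imitates the idea of looking every item up once in a list of (old, new) pairs; it is an assumption-laden stand-in, not Macsyma's implementation.

```python
def sublis(pairs, items):
    """Parallel substitution: each item is looked up once among the
    (old, new) pairs; items with no matching `old` are left unchanged."""
    mapping = dict(pairs)
    return [mapping.get(item, item) for item in items]


codons = ["cuu", "cuc", "cua", "cug"]

# All pairs written as codon -> amino acid.
correct = sublis(
    [("cuu", "leu"), ("cuc", "leu"), ("cua", "leu"), ("cug", "leu")],
    codons,
)

# Two pairs written the wrong way round (amino acid -> codon).
mixed = sublis(
    [("cug", "leu"), ("cua", "leu"), ("leu", "cuc"), ("leu", "cuu")],
    codons,
)
print(correct)
print(mixed)
```

Pairs written the wrong way round raise no error; the affected codons are simply left untranslated, so the output list is worth checking for leftover codons.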

4. Conclusions

This paper shows how to use Macsyma 2.2.1 to do translations of an mRNA sequence into a protein sequence.

5. References

1. Huen Y.K.: The Manipulations Of Bilinear Sequences By Macsyma 2.2 (date released: 5.2.98, 22 Kbytes).

2. Huen Y.K.: Sequence Algebra - A Tutorial Paper (date released: 2/2/98, 46 Kbytes).

3. Weaver R.F. and Hedrick P.W.: Basic Genetics, second edition, Wm. C. Brown Publishers, chapter 10, pp. 238-267.

4. Macsyma: Symbolic/numeric/graphical mathematics software, Mathematics and System Reference Manual, 16th edition, chapter 10, pp. 321-338.

Comments: References from this point onward are not cited in the main paper. Most are provided for readers not familiar with sequence algebra. These papers can be reached through hyperlinks on the author's website.

Published Papers:

5. Huen Y.K.: A Matrix Map for Prime and Non-prime Numbers, Int. J. Math. Educ. Sci. Technol., 1994, Vol. 25, No. 6, pp 913-920.

6. Huen Y.K.: Some Interesting Properties Of The Natural Number System, Int. J. Math. Educ. Sci. Technol., 1996, Vol. 27, No. 5, pp 685-691.

7. Huen Y.K.: Visual algebra and its applications, Int. J. Math. Educ. Sci. Technol., 1997, Vol. 28, No. 3, pp 333-344.

8. Huen Y.K.: The twin prime problem revisited, Int. J. Math. Educ. Sci. Technol., 1997, Vol. 28, No. 6, pp 825-834.

9. Huen Y.K.: Is Pie Periodic?, Int. J. Math. Educ. Sci. Technol., 199?, Vol. ??, No. ?, ???-??? (in the press).

10. Huen Y.K.: Final value theorem in number sequences, Int. J. Math. Educ. Sci. Technol., 199?, Vol. ??, No. ?, ???-??? (accepted).

Papers posted in this website which might be relevant for background information:

11. A Simple Introduction To Sequence Algebra - by Huen Y.K. (date released: 15.3.97) (38 Kbytes, 11*A4 pages).

12. Evaluations Of Normc( ) Function In Macsyma 2.2 - by Huen Y.K. (date released: 17/12/97) (14 Kbytes).

13. List Processing In Sequence Algebra - by Huen Y.K. (date released: 23/12/97) (20 Kbytes).

14. The Canonical Generating Function or CGF(z) ... - by Huen Y.K. (date released: 27.5.97) (24 Kbytes, 7*A4s).

15. Visual Solutions Of Number Theoretic Problems ..... - by Huen Y.K. (date released: 3.6.97) (38.3 Kbytes, 10*A4s).

16. Final Value Theorem Applied To Number Sequences... - by Huen Y.K. (date released: 5.6.97) (29.4 Kbytes, 9*A4s).

17. Methods Of Developing Sequence Algebraic Formulations For Comp(z) and Prime(z) - by Huen Y.K. (date released: 20.6.97) (36.8 Kbytes, 10*A4s).

18. Composite Number Sequence Challenge 1/97 - by Huen Y.K. (date released: 28.6.97) (24.8 Kbytes, 7*A4s).

19. Lemmata, Corollaries, And Theorems In Sequence Order Analysis. - by Huen Y.K. (date released: 6.7.97) (38.3 Kbytes, 12*A4s).

20. Improved Formulations For Comp(z) and Prime(z) - by Huen Y.K. (date released: 16.9.97) (17 Kbytes).

21. Detecting False Reports in Primality Tests By The Oddcomp(z) Method. - by Huen Y.K. (date released: 18.9.97, revised 20/9) (26 Kbytes).

22. The Throwing Power Of Oddcomp(z). - by Huen Y.K. (date released: 24.9.97) (15 Kbytes).

23. Sequence Algebraic Approach To Prime Number Theorem - by Huen Y.K. (date released: 28.9.97) (21 Kbytes).

24. Generating Functions - Closed Forms vs Open Forms - by Huen Y.K. (date released: 1.10.97) (21 Kbytes).

25. Generating Large Odd Composite With Two Prime Factors - by Huen Y.K. (date released: 3.10.97) (13.5 Kbytes).

26. In Search Of Counter-Examples In Maple's Isprime Function. - by Huen Y.K. (date released: 4.10.97) (18 Kbytes).

27. A Sequence Algebraist's View Of Lehmann's Primality Test - by Huen Y.K. (date released: 6.10.97) (26 Kbytes).

28. On Odd(z), Oddcomp(z), Seq1(z) and Seq2(z) - by Huen Y.K. (date released: 10.10.97) (17 Kbytes).

29. How To Generate A Short And Contiguous Oddcomp(z) Sequence? - by Huen Y.K. (date released: 15.10.97) (13 Kbytes).