Building Databases Via Sequence Algebra

by

Huen Y.K.

CAHRC, P.O.Box 1003, Singapore 911101
http://web.singnet.com.sg/~activweb/
Related URL-sites: http://web.singnet.com.sg/~huens/
email: huens@mbox3.singnet.com.sg

(A short communication - 1st released: 28/12/97, Revised:28/12)


Abstract

Combining sequence algebra and list processing functions in Macsyma, this paper shows how to build a small database engine in minutes in your own backyard. Databases have come a long way and have become very sophisticated, so much so that learning to use one has become a major undertaking. There are occasions where one needs only a simple database for local use. A very simple database complete with its own search engine can be built from a template in minutes using a symbolic software called Macsyma 2.2. Although it is no match to the plethora of commerical database programs, nevertheless its simplicity might be difficult to ignore if the present trend of migration of data processing from servers to clients' sides is anything to go by.


1. Introduction

The developments of databases have come a long way with today's plethora of choices such as VB Data Access Objects, Open Database Connectivity (ODBC), Remote Data Objects, OLE DB, ActiveX Data Objects, dBASE, Btrieve, Paradox, FoxPro and ODBC data sources. The choices are bewildering and the learning curves steep. For small local database, these seem like overkills. This paper describes a very simple database which you can setup in minutes using Macsyma 2.2. It is crude but if you wish to transmit your database together with the search engine across the web to your readers, you will begin to appreciate why such a database has it niche. Conventional database across the web cannot be used without involving a server which means time delay in signal transmissions and receptions. The project demonstrates once again the increasing number of potential applications arising from sequence algebra.

The outer-product of two discrete number sequences F1(x) and F2(y) will generate a 2D- number sequence involving order-variables x and y. One can extend this paradigm to an nD-sequence by the outer product of n x 1D-sequences. Sequences are simply lists in algebraic format. These can be easily converted over to list format and manipulated by powerful list processing functions available in the LISP language. This paper shows Macsyma 2.2 can be used for this purpose [3].

In Hgram graphical format, we start with 2D cartesian graphs as primitives. Since printed pages and computer screens are 2-dimensional, one needs to overcome visual limitations by using Hgram graphs [1(iii),(iv) and (v)]. To build a 4D-graphical space, we use 2 x 2D-subgraphs as primitives. Given a 4D-point expressed in list format as P = [a,b,c,d], we can partition this into sublists given by P = [[a,b], [c,d]]. The first list [a,b] is plotted in the first 2D-subgraph and the second in the second subgraph. Whereas the first subgraph has a static graphical origin, the second subgraph has a dynamic graphical orgin anchored on the nonstationary point [a,b]. It might appear that there is no connection between Hgram graphs and sequence algebra but there is since both are concerned with multidimensional visualisation. In fact a 1D sequences can be regarded as a discrete cartesian axis. In this paper, the starting point of building a database is via the outer product of 1D sequences.

2. The Concept Of Multidimensional Lists

This will be explained by way of an example:

Example 1: Building and manipulating a 3D-list based on sequence algebraic sequences. All terms in the 3D-sequence space can be either in rational fraction forms or in polynomial power series form. In this example we use the rational fraction form where the denominator is expressed as x^i*y^j*z^k where the coordinates of the 3D-point is ordered by point [i,j,k]. If variables are in algeraic forms, you can do away with the indices of i,j and k. In each term every variable has the remaining part as its coefficient which can be extracted by list processing using a symbolic software such as Macsyma 2.2. Starting with three normalised 1D-sequences as cartesian axes as shown in equations (1) to (3), the outer product will result in a normalised 3D-sequence as shown in equation (4).

..........................1........1.........1
............F1_x = ---- + ---- + ---- + 1 .............................................(1).
..........................x..........2.........3
....................................x.........x

..........................1........1.........1
............F1_y = ---- + ---- + ---- + 1 .............................................(2).
..........................y..........2.........3
....................................y.........y

..........................1.........1........1
............F1_z = ---- + ---- + ---- + 1 .............................................(3).
..........................z...........2........3
.....................................z........z

Equation (4) represents an empty database template awaiting the entries of data in the numerators. Whilst the template can be generated algebraically, the database only has information when these are entered. You can also mask away some terms either as such information is not available or not required.

F1_xyz = F1_x*F1_y*F1_z =

(1/(x * y * z)) + (1/(x^2 * y * z)) + (1/(x^3 * y * z)) + (1/(y * z)) + (1/(x * y^2 * z)) + (1/(x^2 * y^2 * z))+ (1/(x^3 * y^2 * z)) + (1/(y^2 * z)) + (1/(x * y^3 * z)) + (1/(x^2 * y^3 * z)) + (1/(x^3 * y^3 * z)) + (1/(y^3 * z))+ (1/(x * z)) + (1/(x^2 * z)) + (1/(x^3 * z)) + (1/z) + (1/(x * y * z^2)) + (1/(x^2 * y * z^2)) + (1/(x^3 * y * z^2))+ (1/(y * z^2)) + (1/(x * y^2 * z^2)) + (1/(x^2 * y^2 * z^2)) + (1/(x^3 * y^2 * z^2)) + (1/(y^2 * z^2)) + (1/(x * y^3 * z^2))+ (1/(x^2 * y^3 * z^2)) + (1/(x^3 * y^3 * z^2)) + (1/(y^3 * z^2)) + (1/(x * z^2)) + (1/(x^2 * z^2)) + (1/(x^3 * z^2))+ (1/(z^2)) + (1/(x * y * z^3)) + (1/(x^2 * y * z^3)) + (1/(x^3 * y * z^3)) + (1/(y * z^3)) + (1/(x * y^2 * z^3))+ (1/(x^2 * y^2 * z^3)) + (1/(x^3 * y^2 * z^3)) + (1/(y^2 * z^3)) + (1/(x * y^3 * z^3)) + (1/(x^2 * y^3 * z^3))+ (1/(x^3 * y^3 * z^3)) + (1/(y^3 * z^3)) + (1/(x * z^3)) + (1/(x^2 * z^3)) + (1/(x^3 * z^3)) + (1/(z^3)) + (1/(x * y))+ (1/(x^2 * y)) + (1/(x^3 * y)) + (1/y) + (1/(x * y^2)) + (1/(x^2 * y^2)) + (1/(x^3 * y^2)) + (1/(y^2))+ (1/(x * y^3)) + (1/(x^2 * y^3)) + (1/(x^3 * y^3)) + (1/(y^3)) + (1/x) + (1/(x^2)) + (1/(x^3)) + 1......................................................................................(4).

With multidimensional lists, linear orders can only arise if the orders of axes are specified. Without orders, one will have 3! or 6 ways of accessing the lowest levelled lists as x.y.[z_list], x.z.[y_list], y.x.[z_list], y.z.[x_list], z.x.[y_list] or z.y.[x_list]. By fixing the order as x,y, and z, there is only one way of accessing elements in the list as x.y.[z_list]. 1-dimensional arrays are used in the lowest levelled list which in this case is the z_list. The method of accessing the elements are shown in equations (5) and (6) where one is interested to access the sublist associated with the term 1/(x^2*y^2). In Macsyma, this sublist is simply the coefficient expression to 1/(x^2*y^2) which can be either numeric or logic values.

F_xcoeff:coeff(F_xyz,1/(x^2));

This yields a 2D-list called F_xcoeff associated with 1/x^2, i.e., there are sixteen terms given by equation (5).

(1/(y * z)) + (1/(y^2 * z)) + (1/(y^3 * z)) + (1/z) + (1/(y * z^2)) + (1/(y^2 * z^2)) + (1/(y^3 * z^2)) + (1/(z^2)) + (1/(y * z^3)) + (1/(y^2 * z^3)) + (1/(y^3 * z^3)) + (1/(z^3)) + (1/y) + (1/(y^2)) + (1/(y^3)) + 1.................................................................................(5).

F_xycoeff:coeff(F_xcoeff,1/(y^2));

This yields a 1D-list called F_xycoeff associated with 1/(x^2*y^2), i.e. there are four terms given by equation (6). This is the lowest levelled sublist.

(1/z) + (1/(z^2)) + (1/(z^3)) + 1..........................................................(6).

Although the above is called a list, it is still in sequence algebraic format. To make a real list, the args-function will be applied on equation (6) as shown in equation (7):

Fz: args(F_xycoeff);

...............................1......1......1
...........................[ ----, ----, ----, 1] ..................................................(7).
...............................z........2.......3
.......................................z.......z

Suppose we wish to assign a special value of 100 to the numerator of the second term 1/z^2. The procedures are shown from equations (8) to (13).

Fz: args(F_xycoeff);

..............................1.......1......1
...........................[ ----, ----, ----, 1] .................................................(8).
..............................z.........2......3
.......................................z......z

Fzcoeffs:reverse(makelist(coeff(F_xycoeff,z,i),i,lopow(F_xycoeff,z),0));

..............................[1, 1, 1, 1] ................................................(9).

Fzcoeffs[3]:100;

..............................100 .........................................................(10).

Fzcoeffs;

..............................[1, 1, 100, 1].............................................(11).

Sublist_z:makelist(Fzcoeffs[i]/z^(i-1),i,1,4);

..........................................1......100.....1
...................................[1, ----, ------, ----] ........................................(12).
..........................................z............2......3
.....................................................z.......z

apply("+",%);

....................................1........100........1
..................................---- + ------ + ---- + 1 ......................................(13).
....................................z............2..........3
...............................................z...........z

Up to this point we have only shown how to enter data into the multidimensional list. The next section will demonstrate a simple example.


3. A Simple Searchable Database

Here is how a simplified biological database concerning attributes of the three kingdoms of animals, plants, and fungi are setup. The database only has two levels represented by x1 and x2. As information is hierachical, you will need to plan these lists carefully. The outer product of x1 and x2 will generate a template by which you can enter the last level of information which in this example has only three fuzzy levels of true (t), false (f) and probable (p).

x1:animal+plant+fungus;

....................animal+plant+fungus

x2:hunter+parasite+photosynthesizer;

....................photosynthesizer + parasite + hunter

template:expand(x1*x2);

photosynthesize * plant + parasite * plant + hunter * plant + fungus * photosynthesize + animal * photosynthesize + fungus * parasite + animal * parasite + fungus * hunter + animal * hunter

Below we enter the data as either t for true, f for false and p for probable. For a developed database, data can be entered using the procedures described in section 2.

Database:

...................photosynthesize * plant * t + animal * hunter * t + p * parasite * plant + f * hunter * plant + f * fungus * photosynthesize + animal * f * photosynthesize + fungus * p * parasite + animal * p * parasite + f * fungus * hunter

Below, we subject the database to various test searches. We expect answers to be either true, false or probable but if you enter nonsense, the answer will be 0 which means no answer.

coeff(xxx:coeff(Database,hunter),animal);
Interpretation: Query: Are animals hunters? The answer is probable since some animals are not hunters.

......................p

coeff(xxx:coeff(Database,fungus),animal);
Interpretation: Is fungus a member of the animal kingdom? The answer is 0 because this information is not available from the database. If such information is required you will need to set up two levels for animal, plant, and fungus with different postfixes such as in level 1: animal_1, plant_1, and fungus_1 whilst in level 2: animal_2, platn_2, and fungus_2. This is necessary since Macsyma does not treat x and a coefficient of x^2 unless we put the product as x1*x2 where x1 will be treated as the coefficient of x2 and vice versa. In other words, we cannot use the same atom within the same list, nested or not.

.....................0

coeff(xxx:coeff(Database,t),fungus);

Not all fungi are parasitic. The lichen family is not parasitic. The next query gives the correct answer.

......................0

coeff(xxx:coeff(Database,p),fungus);

......................p

4. Likely Applications

Often in multidimensional problems, the numerator coefficient is computed using values of predefined surrounding nodes such as in the finite difference method of solving stationary problems of Laplace's equation [2]. This should not be a problem since numerator functions F(i,j) can be manipulated separately from the denominator expressions. For example in Laplace's equation for 2D steady-state heat conduction with known boundary conditions, the average temperature at a node (i,j) would be given by

..............F(i-1,j) + F(i+1,j) + F(i,j-1) + F(i,j+1)
F(i,j) = ----------------------------------------- ..........................(14).
........................................4

We would start with a normalised 2D-sequence space which is equivalent to an empty graphical paper or template and assign boundary temperature values. Internally all nodes can start with unity values of numerator coefficients but by repeated computations with minimisation of weighted residues, one can force the internal temperature nodes to converge to a steady-state pattern using equation (4). Such examples are well covered in most textbooks on heat conduction and will not be repeated here. Using sequence algebra and list processing it is easier to handle problems of high dimensions which might be difficult since most symbolic and compiling softwares have upper limits on the dimensions of arrays. In Macsyma 2.2 this is limited to 5 dimensions. Using the present method, there is no such constraints since only 1D-arrays are used for the lowest levelled list.

Most library and Internet databases have deeply nested dimensions. Database search can be easily translated into search in high dimensional sequence space. Symbolic softwares might be very suitable for such applications.

4. Summary

This paper outlines how multidimensional data can be manipulated by symbolic softwares. High dimensional arrays take a lot of memory resources especially for those with irregular boundaries. Most software packages have upper limits on the number of dimensions permitted. If this is a constraint one could bypass the use of dimensioned arrays as described in this paper.


5. References

Comments: Not all titles are directly referenced in the main paper. These are included for background readings for readers who are not familiar with sequence algebra. Only papers having some relevance to the paper are included.

(1) The following articles are either published or in the press but for copyright reasons, these cannot be reproduced here for download. Please contact the publishers: Taylor & Francis Ltd., 1 Gunpowder Square, London, EC4A 3DE, England, for further information:

(i) Some interesting properties of the natural number system, INT.J.MATH.EDUC.SCI.TECHNOL, 1996,VOL.27,NO.5,685-691.

(ii) Visual algebra and its applications, INT.J.MATH.EDUC.SCI.TECHNOL, 1997,VOL.28,NO.3, 333-344.

(iii) Huen Y.K. 1988: An introduction to the HGRAM (an n-Dimensional Graph Paper). Copyright application, July 1988, USA Library of Congress Copyright Registration No. Txu 354026.

(iv) Huen Y.K.: 1991: The HGRAM graphical format - its geometrical interpretation and its applications, Int. J. Math. Educ. Sci.Technol., 22,No.3, 403-418.

(v)Liyaw A.P., Kiu C.M., Pok Y.M. and Huen Y.K.: Hgram patterns of Routh stability zones in linear systems, INT.J.MATH.EDUC.SCI.TECHNOL, 1997,VOL.28,NO.2,225-241.

=========================================================
2. Hanna O.T. and Sandall O.C. (1995): Computational Methods In Chemical Engineering , Prentice Hall International Editions, New Jersey, pp373 to 376.

========================================================

3. Macsyma (1996):Macsyma symbolic/numeric/graphical mathematics software - Mathematics and System Reference Manual (16th Edition).

========================================================

4. A Simple Introduction To Sequence Algebra - by Huen Y.K. (date release: 15.3.97) (38 KBytes, 11*A4 pages).

========================================================

5. The Canonical Generating Function or CGF(z) ... - by Huen Y.K. (date released : 27.5..97) (24 KBytes, 7*A4s).

========================================================

6. Unsolved Problems In Sequence Algebra - by Huen Y.K. (date released : 6.6.97) (29.4 KBytes, 9*A4s).

========================================================

7. Methods Of Developing Sequence Algebraic Formulations For Comp(z) and Prime(z) - by Huen Y.K. (date released : 20.6.97) (36.8 KBytes, 10*A4s).

========================================================

8. Lemmata, Corollaries, And Theorems In Sequence Order Analysis. - by Huen Y.K. (date released : 6.7.97) (38.3 KBytes, 12*A4s).

========================================================

9. Generating Functions - Closed Forms vs Open Forms - by Huen Y.K. (date released : 1.10.97 ) (21 Kbytes).

========================================================

10. Evaluations Of Normc( ) Function In Macsyma 2.2 - Huen Y.K. (Date Released 17/12/97, 14 Kbytes)

==================== END OF PAPER =================