Building Databases Via Sequence Algebra
by
Huen Y.K.
CAHRC, P.O.Box 1003, Singapore 911101
http://web.singnet.com.sg/~activweb/
Related URL-sites: http://web.singnet.com.sg/~huens/
email: huens@mbox3.singnet.com.sg
(A short communication - 1st released: 28/12/97, Revised:28/12)
Abstract
Combining sequence algebra and list processing functions in Macsyma,
this paper shows how to build a small database engine in minutes in your own backyard.
Databases have come a long way and have become very sophisticated, so much so that learning
to use one has become a major undertaking. There are occasions where one needs
only a simple database for local use. A very simple database complete with its own
search engine can be built from a template in minutes using a symbolic software called Macsyma 2.2.
Although it is no match to the plethora of commerical database programs, nevertheless
its simplicity might be difficult to ignore if the present trend of migration of data
processing from servers to clients' sides is anything to go by.
1. Introduction
The developments of databases have come a long way with today's plethora of choices
such as VB Data Access Objects, Open Database Connectivity (ODBC), Remote Data Objects,
OLE DB, ActiveX Data Objects, dBASE, Btrieve, Paradox, FoxPro and ODBC data
sources. The choices are bewildering and the learning curves steep. For small local
database, these seem like overkills.
This paper describes a very simple database which you can setup in minutes
using Macsyma 2.2. It is crude but if you wish to transmit your database together
with the search engine across the web to your readers, you will begin to appreciate
why such a database has it niche. Conventional database across the web cannot
be used without involving a server which means time delay in signal transmissions
and receptions. The project demonstrates once again the increasing number of potential
applications arising from sequence algebra.
The outer-product of two discrete number sequences F1(x) and F2(y) will generate a 2D-
number sequence involving order-variables x and y. One can extend this paradigm to an
nD-sequence by the outer product of n x 1D-sequences. Sequences are simply
lists in algebraic format. These can be easily converted over to list format and
manipulated by powerful list processing functions available in the LISP language. This
paper shows Macsyma 2.2 can be used for this purpose [3].
In Hgram graphical format, we start with 2D cartesian graphs as primitives. Since printed
pages and computer screens are 2-dimensional, one needs to overcome visual limitations by
using Hgram graphs [1(iii),(iv) and (v)]. To build a 4D-graphical space, we use 2 x 2D-subgraphs as primitives.
Given a 4D-point expressed in list format as P = [a,b,c,d], we can partition this into sublists
given by P = [[a,b], [c,d]]. The first list [a,b] is plotted in the first 2D-subgraph and the second
in the second subgraph. Whereas the first subgraph has a static graphical origin, the second
subgraph has a dynamic graphical orgin anchored on the nonstationary point [a,b]. It might
appear that there is no connection between Hgram graphs and sequence algebra but there
is since both are concerned with multidimensional visualisation. In fact a 1D sequences
can be regarded as a discrete cartesian axis. In this paper, the starting point of building
a database is via the outer product of 1D sequences.
2. The Concept Of Multidimensional Lists
This will be explained by way of an example:
Example 1: Building and manipulating a 3D-list based on sequence
algebraic sequences.
All terms in the 3D-sequence space can be either in rational fraction forms or in polynomial
power series form. In this example we use the rational fraction form where the denominator is
expressed as x^i*y^j*z^k where the coordinates of the 3D-point is ordered by point [i,j,k].
If variables are in algeraic forms, you can do away with the indices of i,j and k. In each term
every variable has the remaining part as its coefficient which can be extracted by list
processing using a symbolic software such as Macsyma 2.2. Starting with three
normalised 1D-sequences as cartesian axes as shown in equations (1) to (3), the outer
product will result in a normalised 3D-sequence as shown in equation (4).
..........................1........1.........1
............F1_x = ---- + ---- + ---- + 1 .............................................(1).
..........................x..........2.........3
....................................x.........x
..........................1........1.........1
............F1_y = ---- + ---- + ---- + 1 .............................................(2).
..........................y..........2.........3
....................................y.........y
..........................1.........1........1
............F1_z = ---- + ---- + ---- + 1 .............................................(3).
..........................z...........2........3
.....................................z........z
Equation (4) represents an empty database template awaiting the entries of data in
the numerators. Whilst the template can be generated algebraically, the database
only has information when these are entered. You can also mask away some
terms either as such information is not available or not required.
F1_xyz = F1_x*F1_y*F1_z =
(1/(x * y * z)) + (1/(x^2 * y * z)) + (1/(x^3 * y * z)) + (1/(y * z)) + (1/(x * y^2 * z)) + (1/(x^2 * y^2
* z))+ (1/(x^3 * y^2 * z)) + (1/(y^2 * z)) + (1/(x * y^3 * z)) + (1/(x^2 * y^3 * z)) + (1/(x^3 * y^3 *
z)) + (1/(y^3 * z))+ (1/(x * z)) + (1/(x^2 * z)) + (1/(x^3 * z)) + (1/z) + (1/(x * y * z^2)) + (1/(x^2 *
y * z^2)) + (1/(x^3 * y * z^2))+ (1/(y * z^2)) + (1/(x * y^2 * z^2)) + (1/(x^2 * y^2 * z^2)) + (1/(x^3
* y^2 * z^2)) + (1/(y^2 * z^2)) + (1/(x * y^3 * z^2))+ (1/(x^2 * y^3 * z^2)) + (1/(x^3 * y^3 * z^2))
+ (1/(y^3 * z^2)) + (1/(x * z^2)) + (1/(x^2 * z^2)) + (1/(x^3 * z^2))+ (1/(z^2)) + (1/(x * y * z^3)) +
(1/(x^2 * y * z^3)) + (1/(x^3 * y * z^3)) + (1/(y * z^3)) + (1/(x * y^2 * z^3))+ (1/(x^2 * y^2 * z^3))
+ (1/(x^3 * y^2 * z^3)) + (1/(y^2 * z^3)) + (1/(x * y^3 * z^3)) + (1/(x^2 * y^3 * z^3))+ (1/(x^3 *
y^3 * z^3)) + (1/(y^3 * z^3)) + (1/(x * z^3)) + (1/(x^2 * z^3)) + (1/(x^3 * z^3)) + (1/(z^3)) + (1/(x
* y))+ (1/(x^2 * y)) + (1/(x^3 * y)) + (1/y) + (1/(x * y^2)) + (1/(x^2 * y^2)) + (1/(x^3 * y^2)) +
(1/(y^2))+ (1/(x * y^3)) + (1/(x^2 * y^3)) + (1/(x^3 * y^3)) + (1/(y^3)) + (1/x) + (1/(x^2)) +
(1/(x^3)) + 1......................................................................................(4).
With multidimensional lists, linear orders can only arise if the orders of axes are specified.
Without orders, one will have 3! or 6 ways of accessing the lowest levelled lists as
x.y.[z_list], x.z.[y_list], y.x.[z_list], y.z.[x_list], z.x.[y_list] or z.y.[x_list]. By fixing the order as
x,y, and z, there is only one way of accessing elements in the list as x.y.[z_list]. 1-dimensional
arrays are used in the lowest levelled list which in this case is the z_list. The method
of accessing the elements are shown in equations (5) and (6) where one is interested to
access the sublist associated with the term 1/(x^2*y^2). In Macsyma, this sublist is simply the
coefficient expression to 1/(x^2*y^2) which can be either numeric or logic values.
F_xcoeff:coeff(F_xyz,1/(x^2));
This yields a 2D-list called F_xcoeff associated with 1/x^2, i.e., there are sixteen terms given
by equation (5).
(1/(y * z)) + (1/(y^2 * z)) + (1/(y^3 * z)) + (1/z) + (1/(y * z^2)) + (1/(y^2 * z^2)) + (1/(y^3 * z^2))
+ (1/(z^2)) + (1/(y * z^3)) + (1/(y^2 * z^3)) + (1/(y^3 * z^3)) + (1/(z^3)) + (1/y) + (1/(y^2))
+ (1/(y^3)) + 1.................................................................................(5).
F_xycoeff:coeff(F_xcoeff,1/(y^2));
This yields a 1D-list called F_xycoeff associated with 1/(x^2*y^2), i.e. there are four terms
given by equation (6). This is the lowest levelled sublist.
(1/z) + (1/(z^2)) + (1/(z^3)) + 1..........................................................(6).
Although the above is called a list, it is still in sequence algebraic format. To make a real
list, the args-function will be applied on equation (6) as shown in equation (7):
Fz: args(F_xycoeff);
...............................1......1......1
...........................[ ----, ----, ----, 1] ..................................................(7).
...............................z........2.......3
.......................................z.......z
Suppose we wish to assign a special value of 100 to the numerator of the second
term 1/z^2. The procedures are shown from equations (8) to (13).
Fz: args(F_xycoeff);
..............................1.......1......1
...........................[ ----, ----, ----, 1] .................................................(8).
..............................z.........2......3
.......................................z......z
Fzcoeffs:reverse(makelist(coeff(F_xycoeff,z,i),i,lopow(F_xycoeff,z),0));
..............................[1, 1, 1, 1] ................................................(9).
Fzcoeffs[3]:100;
..............................100 .........................................................(10).
Fzcoeffs;
..............................[1, 1, 100, 1].............................................(11).
Sublist_z:makelist(Fzcoeffs[i]/z^(i-1),i,1,4);
..........................................1......100.....1
...................................[1, ----, ------, ----] ........................................(12).
..........................................z............2......3
.....................................................z.......z
apply("+",%);
....................................1........100........1
..................................---- + ------ + ---- + 1 ......................................(13).
....................................z............2..........3
...............................................z...........z
Up to this point we have only shown how to enter data into the multidimensional
list. The next section will demonstrate a simple example.
3. A Simple Searchable Database
Here is how a simplified biological database concerning attributes of the three
kingdoms of animals, plants, and fungi are setup. The database only has two levels
represented by x1 and x2. As information is hierachical, you will need to plan
these lists carefully. The outer product of x1 and x2 will generate a template by
which you can enter the last level of information which in this example has only
three fuzzy levels of true (t), false (f) and probable (p).
x1:animal+plant+fungus;
....................animal+plant+fungus
x2:hunter+parasite+photosynthesizer;
....................photosynthesizer + parasite + hunter
template:expand(x1*x2);
photosynthesize * plant + parasite * plant + hunter * plant
+ fungus * photosynthesize + animal * photosynthesize
+ fungus * parasite + animal * parasite + fungus * hunter
+ animal * hunter
Below we enter the data as either t for true, f for false and p for probable. For
a developed database, data can be entered using the procedures described in
section 2.
Database:
...................photosynthesize * plant * t + animal * hunter * t + p * parasite * plant + f
* hunter * plant + f * fungus * photosynthesize + animal * f
* photosynthesize + fungus * p * parasite + animal * p * parasite + f
* fungus * hunter
Below, we subject the database to various test searches. We expect answers to
be either true, false or probable but if you enter nonsense, the answer will be 0 which
means no answer.
coeff(xxx:coeff(Database,hunter),animal);
Interpretation: Query: Are animals hunters? The answer is probable since some
animals are not hunters.
......................p
coeff(xxx:coeff(Database,fungus),animal);
Interpretation: Is fungus a member of the animal kingdom? The answer is 0 because
this information is not available from the database. If such information is required
you will need to set up two levels for animal, plant, and fungus with different postfixes
such as in level 1: animal_1, plant_1, and fungus_1 whilst in level 2: animal_2,
platn_2, and fungus_2. This is necessary since Macsyma does not treat x and a
coefficient of x^2 unless we put the product as x1*x2 where x1 will be treated as
the coefficient of x2 and vice versa. In other words, we cannot use the same atom within
the same list, nested or not.
.....................0
coeff(xxx:coeff(Database,t),fungus);
Not all fungi are parasitic. The lichen family is not parasitic. The next query
gives the correct answer.
......................0
coeff(xxx:coeff(Database,p),fungus);
......................p
4. Likely Applications
Often in multidimensional problems, the numerator coefficient is computed using values of
predefined surrounding nodes such as in the finite difference method of solving stationary
problems of Laplace's equation [2]. This should not be a problem since numerator functions
F(i,j) can be manipulated separately from the denominator expressions. For example in
Laplace's equation for 2D steady-state heat conduction with known boundary conditions, the
average temperature at a node (i,j) would be given by
..............F(i-1,j) + F(i+1,j) + F(i,j-1) + F(i,j+1)
F(i,j) = ----------------------------------------- ..........................(14).
........................................4
We would start with a normalised 2D-sequence space which is equivalent to an empty
graphical paper or template and assign boundary temperature values. Internally all nodes can start with
unity values of numerator coefficients but by repeated computations with minimisation of
weighted residues, one can force the internal temperature nodes to converge to a steady-state
pattern using equation (4). Such examples are well covered in most textbooks on heat
conduction and will not be repeated here. Using sequence algebra and list processing
it is easier to handle problems of high dimensions which might be difficult since
most symbolic and compiling softwares have upper limits on the dimensions of
arrays. In Macsyma 2.2 this is limited to 5 dimensions. Using the present method,
there is no such constraints since only 1D-arrays are used for the lowest levelled
list.
Most library and Internet databases have deeply nested dimensions. Database search can
be easily translated into search in high dimensional sequence space. Symbolic softwares
might be very suitable for such applications.
4. Summary
This paper outlines how multidimensional data can be manipulated by symbolic
softwares. High
dimensional arrays take a lot of memory resources especially for those with irregular
boundaries. Most software packages have upper limits on the number
of dimensions permitted. If this is a constraint one could bypass the use of dimensioned
arrays as described in this paper.
5. References
Comments: Not all titles are directly referenced in the main paper. These
are included for background readings for readers who are not familiar with sequence
algebra. Only papers having some relevance to the paper are included.
(1) The following articles are either published or in the press
but for copyright reasons, these cannot be reproduced here for download. Please contact
the publishers: Taylor & Francis Ltd., 1 Gunpowder Square, London, EC4A 3DE, England,
for further information:
(i) Some interesting properties of the natural number system, INT.J.MATH.EDUC.SCI.TECHNOL,
1996,VOL.27,NO.5,685-691.
(ii) Visual algebra and its applications, INT.J.MATH.EDUC.SCI.TECHNOL,
1997,VOL.28,NO.3, 333-344.
(iii) Huen Y.K. 1988: An introduction to the HGRAM
(an n-Dimensional Graph Paper). Copyright application, July 1988, USA Library of
Congress Copyright Registration No. Txu 354026.
(iv) Huen Y.K.: 1991: The HGRAM graphical format -
its geometrical interpretation and its applications, Int. J. Math. Educ. Sci.Technol.,
22,No.3, 403-418.
(v)Liyaw A.P., Kiu C.M., Pok Y.M. and Huen Y.K.: Hgram patterns of Routh stability zones in linear
systems, INT.J.MATH.EDUC.SCI.TECHNOL,
1997,VOL.28,NO.2,225-241.
=========================================================
2. Hanna O.T. and Sandall O.C. (1995): Computational Methods In Chemical Engineering
, Prentice Hall International Editions, New Jersey, pp373 to 376.
========================================================
3. Macsyma (1996):Macsyma symbolic/numeric/graphical mathematics
software - Mathematics and System Reference Manual (16th Edition).
========================================================
4. A Simple Introduction To Sequence
Algebra - by Huen Y.K.
(date release: 15.3.97) (38 KBytes, 11*A4 pages).
========================================================
5. The Canonical Generating Function
or CGF(z) ... - by Huen Y.K.
(date released : 27.5..97) (24 KBytes, 7*A4s).
========================================================
6. Unsolved Problems In Sequence
Algebra - by Huen Y.K. (date released : 6.6.97) (29.4 KBytes, 9*A4s).
========================================================
7. Methods Of Developing Sequence
Algebraic Formulations For Comp(z) and Prime(z) - by Huen Y.K. (date released : 20.6.97) (36.8 KBytes, 10*A4s).
========================================================
8. Lemmata, Corollaries, And
Theorems In Sequence Order Analysis. - by Huen Y.K. (date released : 6.7.97) (38.3 KBytes, 12*A4s).
========================================================
9. Generating Functions -
Closed Forms vs Open Forms
- by Huen Y.K. (date released : 1.10.97 ) (21 Kbytes).
========================================================
10. Evaluations Of Normc( ) Function
In Macsyma 2.2
- Huen Y.K. (Date Released 17/12/97, 14 Kbytes)
==================== END OF PAPER =================