Access:

» Developing a custom database - LhimkDB. Part 4: adding tree structure support

Related categories: Other DB | Tutorials | Database programming

Paweł Marciniak
Viewed: 3902 | Article date: 2006-05-11 15:21:57

In this final article of the series, Paweł will extend the basic database to support tree-structured data storage. LhimkDB layers, the tree structure functionality can be used as an independent component, and adding it does not preclude the use of LhimkDB as a key-value database. The author will emphasis that the database development is easy and nice.

Three issues ago I started a series of articles on developing a custom database system. Just to remind you, so far we've built an embedded key-value database. In this final article of the series we will extend the basic database to support tree-structured data storage. As with the remaining LhimkDB layers, the tree structure functionality can be used as an independent component, and adding it does not preclude the use of LhimkDB as a key-value database.

About the author

Paweł Marciniak has been involved with software development for 20 years now, using mainly C/C++. For the past 3 years he has been working on the Lhimk dynamic compilation framework.

Contact with the author: pawel@software.com.pl

LhimkDB is developed in a language of my devising called Lhimk. It is so similar to C/C++ that there is little point in learning it separately, let alone writing tutorials for it. Any language features found in Lhimk and not found in C/C++ are described in the inset Lhimk or within the article itself. Anyone who doesn't fancy using Lhimk (which I can well understand) can very easily rewrite all Lhimk code in C++. Lhimk code can also be linked to from within C/C++ code.

Lhimk

Lhimk is a dynamic compilation framework for a C-style language of the same name. Lhimk is an object-oriented language with very simple syntax, so practically anyone familiar with C/C++ can get programming straight away. The syntax is based on C, with added multi-inheritance, operator overloading, templates, signals, exceptions and a few other features. The Lhimk environment provides dynamic compilation, with the program code being compiled at runtime. The compiler is extremely fast, so the whole process takes less time than loading many libraries into traditionally compiled applications.

Lhimk is released partly on the LGPL licence and partly on the BSD licence. The duality is due to the use of LGPL-licensed TinyCC code - it is my intention to release all my own code with a BSD licence.

A module is the basic Lhimk code unit. Each module has a unique address, such as:

\lhimk.org\Db\UDB

Lhimk does not divide code into header files and implementations. To refer to a type or object from another module, you simply specify its full path. For example, here is what a reference to the UDB class would look like (references to types are preceded by @):

@\lhimk.org\Db\UDB\UDB

UDB is specified twice because the UDB class resides in a module of the same name. Global objects (built-in functions) are preceded by a backslash:

\printf("Hello!\n");

Classes can include virtual and static methods. Lhimk code usually does not refer to objects directly, but only through pointers (the Lhimk runtime has an integrated garbage collector). New objects are typically created like this:

@\lhimk.org\Db\UDB\UDB *udb = \lhimk.org\Db\UDB\UDB::new();

where new() is an ordinary static method that returns a UDB*.

Templates are instantiated within the current compilation unit (module). For example, if we wanted a list of integers, we could write:

@\lhimk.org\LTL\List\List[int T] *udb = @\lhimk.org\LTL\List\List[int T]::new();

This would cause the \lhimk.org\LTL\List\ module to be compiled with the type T defined as int. The approach is much simpler than for C++ templates and can sometimes offer more flexibility (you could for example add methods to classes depending on the type specified for T).

Lhimk signals provide functionality equivalent to an array of function pointers. A signal is declared using the signal keyword:

signal changed(A*);

Once the signal is declared, we can specify the method that should be called:

this->changed << a->handle; // no ()!

From now on, any call such as:

this->changed->call(a);

will trigger the call:

a->handle();

Lhimk supports generic programming though the use of # operators, such as #s, #d, #i etc. For example, in the following code:

string = "Number: " + count#s + " value: " + value#s;

count and value will be converted to strings, regardless of their actual types. The #d operator converts to a Datum, #i converts to an integer and so forth.

And that's about all you need to know to use Lhimk!

The Datum class

Before we set about implementing the actual tree structure, we need to add one simple and unexciting yet indispensable feature: the ability to store arbitrary data types in the database. The smallest piece of data that can be stored in a database is by convention simply called a datum. The Datum type in Lhimk type is similar to the Variant type found in high-level languages and some object-oriented libraries. There's nothing magical about it - a Datum object simply stores some data with a type identifier and provides a routine for serialising the data to a byte stream to be stored in the database. Serialisation is trivial for primitive types and strings, but for classes things get a bit more complicated. Fortunately, Lhimk provides a dynamic compilation environment, so we can store a class name and later use it to recreate an object by calling the special constructor method deserialize(). An Lhimk class name is also the URL where the source code for the class can be found, so even if we're loading someone else's database and don't have the classes stored in it, Lhimk can dynamically load and compile the relevant classes at runtime.

The Datum class has one more function: it allows generic programming. Let's examine an example. Whenever we store a key-value pair in the database, we need to make calls similar to:

key->serialize(stream);

value->serialize(stream);

Note that we don't know the actual types of key and value. If we could assume they are always class types, there would be no problem - we'd simply need to require classes to implement the serialize() method. Unfortunately, this need not be the case. In C++, the problem is solved by defining specialised methods, so typical C++ code for the above would be:

stream->serialize(key);

stream->serialize(value);

In the Stream class we would then define multiple versions of the serialize() method, specialised to cater for various argument types:

void serialize(int val);

void serialize(float val);

...

The theory is that the compiler should pick the right version for the type of argument passed. In practice, however, type conversion rules in C++ are fairly complex, and if you add to that implicit class conversion operators, it is all to easy to write even a short C++ program that is completely illegible. Using these techniques for larger projects is simply out of the question.

Being aware of these caveats, I obviously wanted to avoid implementing method specialisation (also called parametric polymorphism or multi-methods) in Lhimk. I also had another reason to avoid it: ensuring maximum compilation speed, which is especially important in environments with dynamic source code compilation at runtime. The solution implemented in Lhimk is to use the # operator to write generic code. For example, the #s operator converts its operand to a string, so we can write:

string = "Number: " + 100#s + " value: " + 11.22#s;

However, we can also write:

string = "Number: " + count#s + " value: " + value#s;

regardless of the actual types of count and value. For primitive types, the conversion is handled directly by the compiler, while for class types the operator_hash_s() method is called.

By the same token, we can convert any value (regardless of type) to a Datum using the #d operator. The code for serialising key-value pairs is now:

key#d->serialize(stream);

value#d->serialize(stream);

and the serialize() method is implemented just once, in the Datum class. Now we can store arbitrary values in the database:

@\lds\SkipList[Datum* T][Datum* K]* list =

\lds\SkipList[Datum* T][Datum* K]::new("SKIPDB", 1);

list->beginTransaction();

list[[10#d]] = "abcd"#d;

list->commitTransaction();

Tree structure part 1

A database element can store any data type, so if we store a list of elements, we will have a tree. The actual tree structure implementation comes almost for free - a bonus for using generic programming and the Datum type. I honestly spent two weeks thinking about it and finally sat down and wrote it in an hour. Natural though it was, the solution initially just seemed too trivial for a production application (see Figure 1). However, before we set about implementing tree structure, we need to add another vital database construct: user transactions.

Figure 1. Building a tree using the Datum type and lists

Page: 1 2
Buy article Buy subscription
Buy now add to cart
add to cart
Standard price: 2€/$3 Standard price: 25€/$30
Buy article for as little as (2€/$3) each allow access to individual articles. Buy a full access to our Software Developers's Journal archive portal. You will be able to read the articles from all archive issues from year 2005 and 2006. For just 25€/$30 you get unrestricted access to the entire website for the whole year.
SDJhakin9

.SDJ Users:


.:Login
.:Password

[Register]
[Forgotten your password?]

...Shopping Cart

sum: 0 €
Choose currency:

...Topics

...Advertisement

www.acunetix.com www.verifysoft.com

...Conferences




...Print Edition Archive

...Affiliate Program



 

 

Subscribe | Contact Us | Newsletter | Privacy policy | Regulations | See all issues | About SDJ
Copyright C 2006 by Software Developer's Journal. All rights reserved.