Berkeley DB - Managing data on the move

Michael O'Sullivan
Article date: 2005-12-13 15:30:03

In a nutshell, Berkeley DB is an open source storage management library that can be linked into an application to provide robust, easy-to-use data management. It runs inside the application's process, which makes it easier to deploy and manage, and it is typically significantly faster than a client-server database management system. Michael illustrates how an in-process database such as Berkeley DB can be a practical alternative to the relational database system, which is the default, though often inappropriate, data storage solution in many applications.
About the Author

Michael O'Sullivan is a consultant working with Sleepycat Software in Europe. An architect of several commercial network management and distributed systems, he currently assists Berkeley DB users in Europe with architecture and design issues.

He can be contacted at michael@sleepycat.com

Many developers will be familiar with the challenges of integrating an application with a relational database management system (RDBMS). Application data, such as XML documents, multimedia data types or complex object structures, must be translated into the tabular format required by this kind of database system. Apart from increasing the complexity of the application, this can add significantly to the time needed to package, test and debug it. For non-trivial applications, connection management and session management must also be considered if the necessary system performance and reliability are to be achieved.

For many developers, these challenges are so familiar that the question of whether or not an RDBMS is the best persistence mechanism to use is never considered. However, given the time and effort costs and the extra complexity that an RDBMS inevitably introduces, it can be profitable to consider other storage mechanisms.

Despite the huge variety in the data structures and data access patterns that we need to use, it is possible to divide our applications into two broad categories. The first category involves dynamic queries on data which is generally static, that is, data which does not change frequently. Consider the customer record system of a large bank, where the customer information and account details do not change particularly often and where a senior manager may wish to perform an ad-hoc query, such as "Tell me the names of all the customers in southern Finland who have more than 5,000 EUR in their current account". The key point here is that, as database designers, we do not know in advance what queries our end users will wish to run on the data.

By contrast, the other category of application involves mostly static queries over dynamic data. Consider the case of a telephone billing system, where the data associated with each call must be recorded and consolidated so that accurate customer bills may be prepared. Such systems run the same queries repeatedly, while the data in the database is constantly changing as new call records are created and old records are deleted. There is typically no requirement to run ad-hoc queries over such data until it has been refined into meaningful customer-specific information; in this case, the queries are known at application design time. Another interesting feature of static query/dynamic data applications is that the data often exists for short periods of time (seconds, minutes or hours) rather than the longer lifetimes associated with dynamic query/static data systems, such as our banking customer record system above.

Static query/dynamic data applications have specific data management requirements: they either do not need the full features of an SQL-based RDBMS or have performance requirements that such database systems cannot easily satisfy. This article focuses on building persistence into these applications using Berkeley DB.

But what exactly is Berkeley DB? In a nutshell, it is an open source storage management library that can be linked into an application to provide robust, easy-to-use data management capability. Berkeley DB runs inside the process, which makes it easier to deploy and manage and also means that it typically runs significantly faster than a typical client-server database management system.

Building a simple example

The Berkeley DB API is very straightforward. We will build a simple example to introduce the concepts of data storage, search and retrieval using this API.

Let us start with the following application structure describing an item of merchandising stock we wish to store in the database.

Listing 1. The stock_item structure

typedef struct my_stock_item {
    char *name;               /* product name */
    char *category;           /* product category */
    float amount;             /* price per item */
} MY_STOCK_ITEM;

There are several interesting things to note, even at this early stage in the application design. Firstly, Berkeley DB does not impose any schema on the application. It simply stores the data as an array of bytes with one or more associated keys. The advantage of this is that there is no requirement to switch from the native programming language into the table-oriented model associated with SQL. This eliminates a significant amount of coding required to map data to and from the SQL data types.
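Because the structure in Listing 1 holds pointers to its strings, its raw bytes alone would not be meaningful once written to disk. The following is a minimal sketch, not taken from the article, of one way an application might flatten such an item into a single contiguous byte array before handing it to a byte-oriented store. The helper names pack_stock_item and unpack_stock_item and the buffer layout are assumptions made for illustration; neither is part of the Berkeley DB API.

```c
#include <stdlib.h>
#include <string.h>

typedef struct my_stock_item {
    char *name;               /* product name */
    char *category;           /* product category */
    float amount;             /* price per item */
} MY_STOCK_ITEM;

/*
 * Hypothetical helper: copy an item into one malloc'd buffer laid out as
 * [amount][name\0][category\0]. Stores the total size in *len.
 * Returns NULL if allocation fails; the caller frees the buffer.
 */
unsigned char *pack_stock_item(const MY_STOCK_ITEM *item, size_t *len)
{
    size_t name_len = strlen(item->name) + 1;      /* include the NUL */
    size_t cat_len = strlen(item->category) + 1;   /* include the NUL */
    unsigned char *buf, *p;

    *len = sizeof(float) + name_len + cat_len;
    buf = malloc(*len);
    if (buf == NULL)
        return NULL;

    p = buf;
    memcpy(p, &item->amount, sizeof(float));
    p += sizeof(float);
    memcpy(p, item->name, name_len);
    p += name_len;
    memcpy(p, item->category, cat_len);
    return buf;
}

/*
 * Hypothetical helper: rebuild an item from a packed buffer. The name and
 * category pointers refer into buf, so buf must outlive the item.
 */
void unpack_stock_item(unsigned char *buf, MY_STOCK_ITEM *item)
{
    memcpy(&item->amount, buf, sizeof(float));
    item->name = (char *)buf + sizeof(float);
    item->category = item->name + strlen(item->name) + 1;
}
```

The buffer and length produced here are the kind of pair a byte-array store expects as the data item, with a field such as the product name serving as the key. The float is copied in the machine's native representation, so this layout is only suitable when the same kind of machine reads the data back.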

In fact, because Berkeley DB is linked into the application, all commands are made via API calls. There are C, C++ and Java APIs as well as support for .NET, Perl, PHP, Python and other scripting languages. For simplicity, the C API is used in this article but the Berkeley DB API is similar in the other languages.

Copyright © 2006 by Software Developer's Journal. All rights reserved.