Berkeley DB - Managing data on the move
Michael O'Sullivan
Article date: 2005-12-13

Berkeley DB, in a nutshell, is an open source storage management library that can be linked into an application to provide robust, easy-to-use data management capability. Berkeley DB runs inside the process, which makes it easier to deploy and manage and also means that it usually runs significantly faster than a typical client-server database management system.
Michael illustrates how an in-process database such as Berkeley DB can provide a practical alternative to the relational database system, which is the default – though often inappropriate – data storage solution used in many applications.
About the Author
Michael O'Sullivan is a consultant working with Sleepycat Software in Europe. An architect of several commercial network management and distributed systems, he currently assists Berkeley DB users in Europe with architecture and design issues.

Many developers will be familiar with the challenges of integrating an application with a relational database management system (RDBMS). Application data, such as XML documents, multimedia data types or complex object structures, must be translated into the tabular format required by this kind of database system. Apart from increasing the complexity of the application, this can add significantly to the time needed to package, test and debug it. For non-trivial applications, connection management and session management must also be considered if the necessary system performance and reliability are to be achieved.

For many developers, these challenges are so familiar that the question of whether or not an RDBMS is the best persistence mechanism to use is never considered. However, given the time and effort costs and the extra complexity that an RDBMS inevitably introduces, it can be profitable to consider other storage mechanisms.

Despite the huge variety in data structures and data access patterns that we need to use, it is possible to divide our applications into two broad categories. The first category of application involves dynamic queries on data which is generally static, that is, data which does not change frequently. Consider the customer record system of a large bank, where the customer information and account details do not change particularly often and where a senior manager may wish to perform an ad-hoc query, such as "Tell me the names of all the customers in southern Finland who have more than 5,000 EUR in their current account." The key point here is that, as database designers, we do not know in advance what queries our end users wish to run on the data.
By contrast, the other category of application involves mostly static queries over dynamic data. Consider the case of a telephone billing system, where the data associated with each call must be recorded and consolidated so that accurate customer bills may be prepared. Such systems run the same queries repeatedly, while the data in the database is constantly changing as new call records are created and old records are deleted. There is typically no requirement to run ad-hoc queries over such data until it has been refined into meaningful customer-specific information; in this case, the queries are known at application design time.

Another interesting feature of dynamic data/static query applications is that the data often exists for short periods of time - seconds, minutes or hours, rather than the longer lifetimes associated with static data/dynamic query systems, such as our banking customer record system above. Dynamic data/static query applications have specific requirements for data management which either do not require the full features of an SQL-based RDBMS or else involve performance requirements that cannot easily be satisfied by such database systems. This article focuses on building persistence into these applications using Berkeley DB.

But what exactly is Berkeley DB? In a nutshell, it is an open source storage management library that can be linked into an application to provide robust, easy-to-use data management capability. Berkeley DB runs inside the process, which makes it easier to deploy and manage and also means that it usually runs significantly faster than a typical client-server database management system.

Building a simple example

The Berkeley DB API is very straightforward. We will build a simple example to introduce the concepts of data storage, search and retrieval using this API. Let us start with the following application structure describing an item of merchandising stock we wish to store in the database.

Listing 1.
The stock_item structure
    typedef struct my_stock_item {
        char *name;      /* product name */
        char *category;  /* product category */
        float amount;    /* price per item */
    } MY_STOCK_ITEM;

There are several interesting things to note, even at this early stage in the application design. Firstly, Berkeley DB does not impose any schema on the application. It simply stores the data as an array of bytes with one or more associated keys. The advantage of this is that there is no requirement to switch from the native programming language into the table-oriented model associated with SQL. This eliminates a significant amount of the coding otherwise required to map data to and from the SQL data types.

In fact, because Berkeley DB is linked into the application, all operations are performed through API calls. There are C, C++ and Java APIs as well as support for .NET, Perl, PHP, Python and other scripting languages. For simplicity, the C API is used in this article, but the Berkeley DB API is similar in the other languages.
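Because the data is stored as an array of bytes, a structure that holds pointers, as the stock item does, cannot simply be copied as-is: that would store memory addresses rather than the strings themselves. The following is a minimal sketch of one way to flatten a stock item into a single contiguous buffer and read it back. The BYTE_BUF type and the stock_item_pack and stock_item_unpack functions are hypothetical names invented for this illustration and are not part of Berkeley DB; BYTE_BUF merely stands in for the pointer-plus-size pair that the C API's DBT structure carries in its data and size fields. The layout uses the machine's native float format, so it is only an assumption suitable for data written and read on the same platform.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct my_stock_item {
    char *name;      /* product name */
    char *category;  /* product category */
    float amount;    /* price per item */
} MY_STOCK_ITEM;

/* Hypothetical stand-in for a key or data buffer: a pointer plus a length. */
typedef struct byte_buf {
    void  *data;
    size_t size;
} BYTE_BUF;

/*
 * Pack an item into one contiguous heap buffer laid out as:
 *   [float amount][name + '\0'][category + '\0']
 * Returns 0 on success, -1 if allocation fails.
 * The caller owns out->data and must free() it.
 */
int stock_item_pack(const MY_STOCK_ITEM *item, BYTE_BUF *out)
{
    assert(item != NULL && out != NULL);
    assert(item->name != NULL && item->category != NULL);

    size_t name_len = strlen(item->name) + 1;      /* include terminator */
    size_t cat_len  = strlen(item->category) + 1;  /* include terminator */
    size_t total    = sizeof(float) + name_len + cat_len;

    unsigned char *p = malloc(total);
    if (p == NULL)
        return -1;

    memcpy(p, &item->amount, sizeof(float));
    memcpy(p + sizeof(float), item->name, name_len);
    memcpy(p + sizeof(float) + name_len, item->category, cat_len);

    out->data = p;
    out->size = total;
    return 0;
}

/*
 * Unpack a buffer produced by stock_item_pack().
 * The strings are not copied: item->name and item->category point
 * into buf->data, so the buffer must outlive the item.
 * Returns 0 on success, -1 if the buffer is malformed.
 */
int stock_item_unpack(const BYTE_BUF *buf, MY_STOCK_ITEM *item)
{
    assert(buf != NULL && item != NULL);

    unsigned char *p   = buf->data;
    unsigned char *end = p + buf->size;

    /* Need the float plus at least two string terminators. */
    if (buf->size < sizeof(float) + 2)
        return -1;
    /* The final byte must be a terminator so strlen() stays in bounds. */
    if (end[-1] != '\0')
        return -1;

    memcpy(&item->amount, p, sizeof(float));
    p += sizeof(float);

    item->name = (char *)p;
    p += strlen(item->name) + 1;
    if (p >= end)            /* no bytes left for the category string */
        return -1;

    item->category = (char *)p;
    return 0;
}
```

In this sketch the product name would be a natural choice of key, with the packed buffer as the associated data. The price travels as raw bytes, which keeps the code short but ties the stored records to the architecture that wrote them.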
Copyright © 2006 by Software Developer's Journal. All rights reserved.














