
Apache Cocoon 2: Where Research and Production Software Meet


Gregory Weinger
Article date: 2006-05-13 17:29:07


Chances are you know Cocoon is a server-side XML publication and web application framework. Though it has been overshadowed by more popular projects like Struts, and has never been glamorized like Spring or Ruby on Rails, time and again you see it used as a point of comparison in other articles or cited by other projects as an influence. In fact, a surprising number of the hottest ideas in server-side programming in the last five years either originated in Cocoon or found an early testbed there, including SAX pipelines, component containers based on dependency injection, and server-side continuations.

About the author

Gregory Weinger is the lead software engineer of the UCLA Medical Imaging Informatics Group. A graduate of Stanford University, he has over 10 years of professional software development and consulting experience.

Contact him at: gweinger@gmail.com

Cocoon's track record of realizing some of the latest research ideas in production software can be attributed partly to its robust developer community, which boasts not only a large number of experienced professionals but also a handful of true innovators, foremost among them the project's founder and spiritual leader, Stefano Mazzocchi. A forceful personality, Mazzocchi has bootstrapped numerous software projects and communities at Apache, including JMeter, Avalon, James, FOP, Ant and Batik, and has helped on Tomcat and JServ. Discussion on the project's mailing list is lively and sometimes heated, covering technological ideas, industry trends, and even the practice of programming, which makes it an interesting read in its own right.

But just as responsible for the project's innovations is Cocoon's base architecture of assembled component pipelines, which has remained stable over time yet proved flexible enough to support these new ideas. After six years, the framework looks as durable, and as adaptable to future needs, as another, more famous piece of software with which it shares the pipeline concept: the Unix operating system.

What makes it unique?

Cocoon's home page notes that the framework is "built around the concepts of separation of concerns and component-based web development." Separation of Concerns (SoC) is a computer science concept, attributed to Dijkstra, that describes the ways we isolate units of functionality in programs. It is a highly general concept: all programming paradigms, whether procedural, object-oriented (OOP) or aspect-oriented (AOP), and even design patterns such as model-view-controller (MVC) can be considered methods of achieving SoC. Mazzocchi describes SoC as a meta-pattern, which other patterns implement, rather than as a design pattern in its own right, so we can consider MVC one particular way of achieving SoC.

Cocoon aims to realize the benefits of SoC not just in the way its code is organized, but also in the social realm of the employees who use the software in an enterprise. In fact, the harmony users observe in the software is a direct reflection of the organization of the components underneath.

Smoothing Employee Interaction

Cocoon was originally designed to allow the many different types of employees in a web organization (programmers, graphic artists, content writers and system architects) to work in parallel without stepping on each other's toes. The way to achieve this was to avoid mixing concerns, that is, never to put information belonging to one concern "island" into a file, script, program or stylesheet used by another. A classic example of mixed concerns was the xml-stylesheet processing instruction in XML, which names the XSLT stylesheet used to process the document (see Listing 1). By 1999 it was already apparent that XML (the data concern) should be kept distinct from what was done to it (the processing concern). While harmless in a simple program, mixing the two created a maintenance nightmare in large-scale applications: in a publishing application with 10,000 XML documents, for example, every document would have to be modified if you ever wanted to use a different stylesheet.

Listing 1. Example of an XML document with a processing instruction, mixing Data and Processing concerns

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?> <!-- processing instruction -->
<document>
  <test>Hello</test>
</document>

Cocoon's solution was to isolate concerns in components that are assembled in the Sitemap, an XML document that maps the URI space of the web site to component pipelines. Listing 2 provides an example of a basic Sitemap pipeline, consisting of a Generator component, which initiates the pipeline by reading the XML document from the filesystem; a Transformer, which in this case applies an XSLT stylesheet; and a Serializer, which emits the result as HTML.

Listing 2. A minimal Sitemap pipeline

<map:pipeline>
  <map:generate type="file" src="helloworld.xml"/>
  <map:transform type="xslt" src="transformworld.xsl"/>
  <map:serialize type="html"/>
</map:pipeline>
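The Generator, Transformer and Serializer in Listing 2 pass a stream of SAX events from one to the next rather than handing over files or in-memory trees. The following is a minimal Python sketch of that idea using the standard xml.sax package; it is not Cocoon's API. Python's standard library has no XSLT processor, so a hypothetical RenameTransformer stands in for transformworld.xsl, and XMLGenerator, which writes XML rather than HTML, stands in for the HTML serializer. The input is assumed to be Listing 1's content without its processing instruction.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# Assumed stand-in for helloworld.xml: no processing instruction needed,
# because the pipeline definition now decides how the content is processed.
HELLOWORLD_XML = b"<document><test>Hello</test></document>"


class RenameTransformer(XMLFilterBase):
    """Hypothetical transformer: renames elements as SAX events stream past."""

    def __init__(self, renames):
        super().__init__()
        self.renames = renames

    def startElement(self, name, attrs):
        super().startElement(self.renames.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self.renames.get(name, name))


def run_pipeline(source_bytes, transformers):
    """generate -> transform(s) -> serialize, wired as a chain of SAX handlers."""
    out = io.StringIO()
    handler = XMLGenerator(out, encoding="utf-8")  # serializer: end of the chain
    for transformer in reversed(transformers):
        transformer.setContentHandler(handler)     # each stage forwards to the next
        handler = transformer
    xml.sax.parseString(source_bytes, handler)     # generator: parser emits events
    return out.getvalue()
```

For example, run_pipeline(HELLOWORLD_XML, [RenameTransformer({"document": "html", "test": "p"})]) returns a string containing <html><p>Hello</p></html>. Adding a processing stage means appending one more transformer to the list, the same kind of change a Sitemap author makes by adding a map:transform line.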

With the Sitemap it is no longer necessary to mix processing information into the website content (XML documents), because the order and method of processing are isolated in a separate location. By providing a place to manage and assemble pipelines, the Sitemap captures what emerges as a new application concern, referred to as the Management concern. The other major concerns in a web site are Logic, Content, and Style. Cocoon is designed to make it easy to keep these concerns completely distinct. Note that while these concerns may correspond to the Model (Content), View (Style) and Controller (Logic) of the popular MVC pattern, and Cocoon handles the separation of MVC admirably, its SoC can handle any number of other application concerns. See my example from the medical domain below, or visit for more information on SoC in Cocoon.
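The Management concern can be pictured as a lookup table the engine consults on every request. Below is a hypothetical, much-simplified Python imitation of a Sitemap: a URI pattern with a `*` wildcard selects a pipeline, and `{1}` in a step's source is replaced by the text the wildcard matched. The patterns, file paths and stylesheet names are invented for illustration, and the matcher only loosely resembles Cocoon's own wildcard matcher.

```python
import re

# Hypothetical sitemap: (URI pattern, pipeline steps as (role, type, src)).
# Content files never name a stylesheet; only this table does.
SITEMAP = [
    ("docs/*.html", [("generate", "file", "content/{1}.xml"),
                     ("transform", "xslt", "style/page.xsl"),
                     ("serialize", "html", None)]),
    ("print/*.html", [("generate", "file", "content/{1}.xml"),
                      ("transform", "xslt", "style/print.xsl"),
                      ("serialize", "html", None)]),
]


def wildcard_to_regex(pattern):
    """Turn each '*' into a group matching one path segment (no '/')."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + "([^/]*)".join(literal_parts) + "$")


def fill(src, groups):
    """Replace {1}, {2}, ... in a step's source with the matched wildcard text."""
    if src is None:
        return None
    for index, text in enumerate(groups, start=1):
        src = src.replace("{%d}" % index, text)
    return src


def resolve(uri):
    """Return the concrete pipeline for the first pattern that matches uri."""
    for pattern, steps in SITEMAP:
        match = wildcard_to_regex(pattern).match(uri)
        if match:
            return [(role, component_type, fill(src, match.groups()))
                    for role, component_type, src in steps]
    raise LookupError(f"no pipeline matches {uri!r}")
```

Here the same content file feeds both a screen view and a print view; changing the look of either is an edit to one table entry, never to the content.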

By addressing this problem in terms of SoC, Cocoon provides clear benefits to a web organization, where writers, editors, graphic artists, system architects and programmers must all coordinate on a project. No content writer should ever have to see a processing instruction or, worse, be given the opportunity to accidentally mangle it; no graphic artist should have to see any programming code or be in a position to modify content, and so on. By keeping the concerns of different employees separate, and reducing both errors and the amount of communication needed between people addressing different concerns, Cocoon's design allows many employees to work together more efficiently and thus scales up to larger organizations.

Component Organization

While Cocoon excels in the web publishing realm, and it's no accident that a number of commercial and free Content Management Systems (CMS) are built on top of it (Apache Lenya, Daisy, DBPrism, etc.), it turns out that Cocoon's core engine and its way of achieving SoC also provide a very stable and flexible platform for building dynamic web applications.

I'll draw an example from my work in the medical domain, where it is often desirable to collect medical record data from disparate data sources (XML, SOAP, HTML, relational and hierarchical databases, etc.) and integrate it in a uniform XML representation. The DataServer Framework () uses Cocoon to perform numerous operations in a set of nested SAX pipelines, which allows us to keep the code responsible for a number of concerns in separate Sitemap components: authentication, query audit, query execution, results caching, and dynamically specified results formatting. When an XML query such as the one in Listing 3 is sent to the server, DataServer sends the query down a pipeline as shown in Listing 4, performing each operation in succession. The contract between each pipeline segment is the XML itself, which may be modified by any component in the chain. This makes it easy to modify the pipeline or dynamically add new processing stages to it (though the Sitemap constructs for this are not shown here). Listing 4 should, however, give a good feel for how the different concerns are handled.

Listing 3. A DataServer XML query, source for a Sitemap pipeline

<ds:queryRequest>
  <ds:login>user</ds:login>
  <ds:pass>pass</ds:pass>
  <ds:format>HL7</ds:format>
  <ds:query>
    <doc:documents patientID="12345" beginDate="8/1/05" endDate="1/1/06"/>
  </ds:query>
</ds:queryRequest>
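The idea that the contract between pipeline segments is the XML itself can be sketched in a few lines of Python. This is a hypothetical illustration, not DataServer code: it passes a whole ElementTree from stage to stage, whereas DataServer streams SAX events through nested Cocoon pipelines. The namespace URIs are invented (Listing 3 does not declare its ds: and doc: prefixes), as are the credential store and audit log; query execution and HL7 formatting are stubs, and caching is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; Listing 3 uses ds: and doc: without declaring them.
DS = "urn:example:dataserver"
DOC = "urn:example:documents"
NS = {"ds": DS, "doc": DOC}

QUERY = f"""<ds:queryRequest xmlns:ds="{DS}" xmlns:doc="{DOC}">
  <ds:login>user</ds:login>
  <ds:pass>pass</ds:pass>
  <ds:format>HL7</ds:format>
  <ds:query>
    <doc:documents patientID="12345" beginDate="8/1/05" endDate="1/1/06"/>
  </ds:query>
</ds:queryRequest>"""

USERS = {"user": "pass"}  # hypothetical credential store
AUDIT_LOG = []            # hypothetical audit sink


def authenticate(req):
    """Reject bad credentials, then strip them so later stages never see them."""
    login = req.findtext("ds:login", namespaces=NS)
    password = req.findtext("ds:pass", namespaces=NS)
    if login is None or USERS.get(login) != password:
        raise PermissionError(f"authentication failed for {login!r}")
    for path in ("ds:login", "ds:pass"):
        req.remove(req.find(path, NS))
    return req


def audit(req):
    """Record which patient records were requested."""
    for documents in req.iterfind("ds:query/doc:documents", NS):
        AUDIT_LOG.append(documents.get("patientID"))
    return req


def execute(req):
    """Stub: a real stage would query XML, SOAP or database sources here."""
    query = req.find("ds:query", NS)
    results = ET.SubElement(req, f"{{{DS}}}results")
    for documents in query.iterfind("doc:documents", NS):
        ET.SubElement(results, f"{{{DOC}}}document", patientID=documents.get("patientID"))
    req.remove(query)
    return req


def format_results(req):
    """Stub: records the requested format instead of converting to HL7."""
    req.find("ds:results", NS).set("format", req.findtext("ds:format", namespaces=NS))
    return req


def run(query_xml, stages):
    req = ET.fromstring(query_xml)
    for stage in stages:  # the XML document is the only contract between stages
        req = stage(req)
    return req
```

Because every stage takes and returns the request document, inserting a caching stage or reordering stages only changes the list passed to run, for example run(QUERY, [authenticate, audit, execute, format_results]).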

Copyright © 2006 by Software Developer's Journal. All rights reserved.