Wednesday, May 18, 2011

xmlgen: a feature-rich and high-performance XML generation library

I released xmlgen to Hackage a few days ago. Xmlgen is a pure Haskell library with a convenient API for generating XML documents. It supports everything defined by the XML information set and offers good performance and low memory usage. At our company, we developed xmlgen because we wanted the readability of XML literals (as provided, for example, by the hsp library) without the drawbacks of a custom preprocessor (wrong line numbers in error messages, non-compositionality).

In this blog article, I’ll show you how to use the combinators provided by xmlgen to generate the following XML document:

<?xml version="1.0"?>
<people>
  <person age="32">Stefan</person>
  <person age="4">Judith</person>
</people>

First, we import some modules:

> import Data.Monoid
> import qualified Data.ByteString.Lazy as BSL
> import Text.XML.Generator -- the main xmlgen module

Then we continue by generating the person element.

> genPersonElem :: (String, String) -> Xml Elem
> genPersonElem (name, age) =
>     xelem "person" $ xattr "age" age <#> xtext name

The xelem combinator constructs an XML element from an element name and from the children of the element. Xmlgen provides overloaded variants of xelem to support a uniform syntax for the construction of elements with qualified and unqualified names and with different kinds of children. The <#> combinator separates the element’s attributes from the other children (sub-elements and text nodes). The combinators xattr and xtext construct XML attributes and text nodes, respectively.

The result of applying xelem has type Xml Elem, whereas xattr has result type Xml Attr. This distinction is important so that attributes and elements cannot be confused. The result type of the xtext combinator is Xml Elem; we decided against an extra type for text nodes because, for xmlgen’s purposes, text nodes and elements are almost interchangeable.
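
To illustrate why separate result types keep attributes and elements apart, here is a minimal sketch using a phantom type parameter. It is not xmlgen's actual implementation: the names Node, ElemTag, AttrTag, text, attr, element and render are hypothetical, and the sketch performs no escaping.

```haskell
-- Hypothetical sketch, NOT xmlgen's real implementation.
-- The type parameter t is a phantom: it tags a fragment as element
-- content or as an attribute without changing the stored data.
newtype Node t = Node String

data ElemTag   -- tag for element content (elements and text nodes)
data AttrTag   -- tag for attributes

-- A text node is tagged as element content, like xtext in the post.
text :: String -> Node ElemTag
text = Node

-- An attribute; name and value are assumed to need no escaping.
attr :: String -> String -> Node AttrTag
attr name value = Node (" " ++ name ++ "=\"" ++ value ++ "\"")

-- Attributes and children go into separate, differently typed slots,
-- so swapping them is rejected by the type checker.
element :: String -> Node AttrTag -> Node ElemTag -> Node ElemTag
element name (Node attrs) (Node children) =
  Node ("<" ++ name ++ attrs ++ ">" ++ children ++ "</" ++ name ++ ">")

-- Extract the rendered string from any kind of node.
render :: Node t -> String
render (Node s) = s
```

In ghci, render (element "person" (attr "age" "32") (text "Stefan")) yields the person element as a string, whereas element "person" (text "Stefan") (attr "age" "32") does not type check.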

The types Xml Elem and Xml Attr are both instances of the Monoid type class. Constructing a list of elements from a list of persons and their ages is thus quite easy:

> genPersonElems :: [(String, String)] -> Xml Elem
> genPersonElems = foldr mappend mempty . map genPersonElem

The pattern shown above is quite common, so xmlgen allows the following shorthand notation using the xelems combinator.

> genPersonElems' :: [(String, String)] -> Xml Elem
> genPersonElems' = xelems . map genPersonElem
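
The fold in genPersonElems works for any Monoid, not just for Xml Elem. The following sketch shows the same pattern with plain strings standing in for XML fragments, so that it needs only base. The names personFragment, viaFoldr and viaMconcat are hypothetical; mconcat is the Monoid class method from Data.Monoid, which for lists computes the same result as the explicit fold.

```haskell
import Data.Monoid

-- Stand-in for genPersonElem: the real function returns Xml Elem;
-- here we use String, which is also a Monoid. No escaping is done.
personFragment :: (String, String) -> String
personFragment (name, age) =
  "<person age=\"" ++ age ++ "\">" ++ name ++ "</person>"

-- The explicit fold used in the post.
viaFoldr :: [(String, String)] -> String
viaFoldr = foldr mappend mempty . map personFragment

-- The same fold expressed with mconcat from the Monoid class.
viaMconcat :: [(String, String)] -> String
viaMconcat = mconcat . map personFragment
```

Both functions return the empty string for an empty list, because mempty is the starting value of the fold.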

We are now ready to construct the final XML document:

> genXml :: Xml Doc
> genXml = let people = [("Stefan", "32"), ("Judith", "4")]
>          in doc defaultDocInfo $ xelem "people" (genPersonElems people)

For convenience, here is a standalone variant of the genXml function:

> genXml' :: Xml Doc
> genXml' =
>   let people = [("Stefan", "32"), ("Judith", "4")]
>   in doc defaultDocInfo $
>        xelem "people" $
>          xelems $ map (\(name, age) -> xelem "person" (xattr "age" age <#> xtext name)) people

Xmlgen supports various output formats through the overloaded xrender function. Here we render the resulting XML document as a lazy byte string:

> outputXml :: IO ()
> outputXml = BSL.putStrLn (xrender genXml')
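
A function that is overloaded on its result type, as xrender is, can be expressed with a type class whose instances are the supported output formats. The sketch below only illustrates that general technique and makes no claim about xmlgen's internals. The class Output, its method fromChunks and the function renderPerson are hypothetical names, and the two formats (a String and a character count) are chosen so that the example needs only base.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class: one instance per output format.
class Output r where
  fromChunks :: [String] -> r

-- Format 1: a single concatenated String.
instance Output String where
  fromChunks = concat

-- Format 2: only the total number of characters.
instance Output Int where
  fromChunks = sum . map length

-- One render function; the type expected by the caller selects the format.
renderPerson :: Output r => String -> String -> r
renderPerson name age =
  fromChunks ["<person age=\"", age, "\">", name, "</person>"]
```

In ghci, renderPerson "Stefan" "32" :: String produces the markup, and renderPerson "Stefan" "32" :: Int produces its length. In the same way, the use of BSL.putStrLn in outputXml fixes the result type of xrender to a lazy byte string.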

Loading the current file into ghci and evaluating outputXml produces the following result:

*Main> outputXml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<people
><person age="32"
>Stefan</person
><person age="4"
>Judith</person
></people
>

This blog post introduced most but not all features of the xmlgen API. Check out the documentation.

Happy hacking and have fun!

Author: Stefan Wehr

3 comments:

Patai Gergely said...

The foldr mappend mempty pattern is so common that it is even included in the Monoid type class under the name mconcat to allow for an efficient implementation if one exists.

Felipe Lessa said...

What about the high performance bit? We want numbers! =D

Stefan Wehr said...

@Patai: thanks for the mconcat hint, I’ll use this function for the next release.

@Felipe: Here are some performance numbers:

Benchmark    Count  Time ms (mean)  Mem MB
elems         1000            2.00       1
elems        10000           21.59       1
elems       100000          245.16       2
elems      1000000         2752.21       2
attrs         1000            1.37       1
attrs        10000           15.60       1
attrs       100000          193.70       2
attrs      1000000         2231.68       2

The elems benchmark generates an XML file with N elements, each with an attribute and a text node, where N is the number in the Count column. The attrs benchmark generates an XML file with a single root element with N attributes. The benchmarks were executed on an AMD Phenom(tm) II X6 1055T Processor with 3.2 GHz and 512KB cache.

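You can see that the run time grows roughly linearly in N (for elems, the cost per element rises only from about 2.0 µs at N = 1000 to about 2.8 µs at N = 1000000) and that the memory usage stays nearly constant at 1 to 2 MB.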
