Monday, January 24, 2011

Groovy's XmlSlurper :: Superior XML Processing For Java

I am a huge fan of Groovy. Groovy is a dynamic language for the Java Virtual Machine that compiles to Java bytecode. One of the great features of Groovy is the fact that any library available to Java is also available to Groovy, so you can leverage all of your existing knowledge of the Java stack in Groovy. Groovy also has a nice, dynamic syntax that is fun to use. It supports many unique language features, most importantly closures!

As we will see in this post, one of the great things that Groovy offers is a class called XmlSlurper. Because Groovy is a dynamic language, XmlSlurper can offer a unique way of processing XML using dot notation syntax. Let's take a look, shall we?

First, here is the XML we are going to parse:



It's a pretty simple XML document with one complex type, Person, with two nested complex types called Address. Let's see how Groovy's XmlSlurper makes quick work of it!



The beauty of XmlSlurper is in the fact that it takes only one line of code to parse the XML document and return an object representation of it. Using dot notation syntax, you can easily get values out of the document. Notice how we can reference the address using either the array syntax or the find() method with a closure. The closure method is a little more "groovy."
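
As a minimal sketch of the dot notation, array syntax, and find() closure described above, assuming a hypothetical person document (the element names, the type attribute, and the values are illustrative assumptions, not the original listing):

```groovy
// Groovy 3+ location of the class; on Groovy 2.x it lives in groovy.util and needs no import.
import groovy.xml.XmlSlurper

// Hypothetical document: element and attribute names are assumptions for illustration.
def xml = '''
<person>
    <firstName>John</firstName>
    <lastName>Doe</lastName>
    <address type="home">
        <street>123 Main St</street>
        <city>Springfield</city>
    </address>
    <address type="work">
        <street>456 Market St</street>
        <city>Shelbyville</city>
    </address>
</person>
'''

// One line parses the document into a navigable GPathResult.
def person = new XmlSlurper().parseText(xml)

// Dot notation walks child elements; text() returns the element's text.
def firstName = person.firstName.text()

// Array syntax selects an address by position.
def firstCity = person.address[0].city.text()

// find() with a closure selects an address by attribute value.
def workCity = person.address.find { it.@type == 'work' }.city.text()

println "$firstName lives in $firstCity and works in $workCity"
```

The closure form keeps working if the order of the address elements changes, which is one reason to prefer it over a positional index.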

To highlight the elegance of this solution, let's look at the equivalent in Java using the standard JAXB library. The first thing we have to do is define an XML Schema for the document, since the JAXB code-generation workflow starts from a schema. That, in and of itself, is lame. What if we just need a quick XML document format to throw around for a few weeks and then throw away? You still need a schema. Once you have a schema, you have to run the xjc compiler on it to generate the stub code. Finally, after you have defined your schema *and* have generated the JAXB code, only then can you write some Java code. Here is the Java equivalent:



First, note the added ceremony of initializing the JAXBContext and Unmarshaller objects. After that, you still have to call the unmarshal() method, passing in a java.io.File object. Finally, you get a reference to Person, where you can then use Java-style getters to get values out of the document.

I don't care for the JAXB solution simply because I first have to run code generators if I want to use a clean dot notation syntax to get values out of the document. It adds only one more step to the build process, but invariably, over time, that step is forgotten about and becomes prone to breakage. I also think the JAXBContext and Unmarshaller objects don't make sense. I still have to reference the documentation when I use JAXB because I never remember the syntax to parse the document.

The great thing about XmlSlurper, and Groovy in general, is that you don't have to replace all of your code just to get the benefits of it. If you are using Spring, it is dead simple to call Groovy classes from Java code. You can write your XML parsing code quickly in Groovy and then call the code from your existing Java application.
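
One way this might look, as a rough sketch: a Groovy class that hides XmlSlurper behind a typed method, so Java callers only see plain types. The class names PersonSummary and PersonXmlReader, and the XML element names, are hypothetical and chosen only for illustration.

```groovy
// Groovy 3+ location of the class; on Groovy 2.x it lives in groovy.util and needs no import.
import groovy.xml.XmlSlurper

// Hypothetical domain class. Groovy generates getName() and getCities()
// for these properties, so Java code can use it like any other bean.
class PersonSummary {
    String name
    List<String> cities
}

// Hypothetical parser class with a typed signature, so callers never touch GPathResult.
class PersonXmlReader {
    PersonSummary read(String xml) {
        def person = new XmlSlurper().parseText(xml)
        new PersonSummary(
            name: "${person.firstName.text()} ${person.lastName.text()}".toString(),
            cities: person.address.collect { it.city.text() }
        )
    }
}

// Quick check from Groovy; the XML below is made-up sample data.
def summary = new PersonXmlReader().read(
    '<person><firstName>John</firstName><lastName>Doe</lastName>' +
    '<address><city>Springfield</city></address>' +
    '<address><city>Shelbyville</city></address></person>')
println "${summary.name}: ${summary.cities.join(', ')}"
```

Once compiled, a class like this is an ordinary JVM class, so Java code could call new PersonXmlReader().read(xml).getName() directly or receive the reader as an injected bean.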

Groovy also has superior file-handling capabilities by way of its File object. If you mix XmlSlurper with Groovy's File object, you have the ability to write extremely clean code for processing huge amounts of XML. We'll talk about that some other time!
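
As a small taste of that combination, here is a hedged sketch that walks a directory of XML files and pulls one value out of each. The file names and the person/firstName structure are assumptions made up for the example.

```groovy
// Groovy 3+ location of the class; on Groovy 2.x it lives in groovy.util and needs no import.
import groovy.xml.XmlSlurper
import java.nio.file.Files

// Create a throwaway directory with two hypothetical XML files so the sketch runs anywhere.
def dir = Files.createTempDirectory('people').toFile()
new File(dir, 'john.xml').text = '<person><firstName>John</firstName></person>'
new File(dir, 'jane.xml').text = '<person><firstName>Jane</firstName></person>'

// eachFileMatch visits files whose names match the pattern; parse(File) reads each one.
def slurper = new XmlSlurper()
def names = []
dir.eachFileMatch(~/.*\.xml/) { file ->
    names << slurper.parse(file).firstName.text()
}

// Sort because directory listing order is not guaranteed.
println names.sort()

// Clean up the temporary directory.
dir.deleteDir()
```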

Thanks for reading!

4 comments:

  1. I really like how easy it is to use XmlSlurper for parsing XML. (I came across this post while searching for XmlSlurper equivalents in Python and C#.)

    Groovy - where have you been all my life!

  2. This comment has been removed by the author.

  3. If there was a way to bind data types to specific nodes, it would be perfect. Case in point: I am passing a parsed object to a GSP page that formats dates using the g:format tag.

  4. Tony, are you passing the object XmlSlurper returns directly to the GSP? The type of this object is GPathResult, which is probably not ideally suited to hand off directly to a GSP. I suggest you map your GPathResult object to your own domain object class. Do any conversions needed in the method that maps between the objects.
