Markup -> OpenOffice

Last modified on April 30, 2013

In my professional work, I frequently have to write documents. That in itself is not a problem. Usually, I would just fire up my editor, write some LaTeX, compile it to PDF and that’s it. Well, life is not always that easy. Customers want documents they can edit, and not everybody wants to learn LaTeX. So I gave OpenOffice a try. Unfortunately, it feels extremely uncomfortable to use (at least for me). For some time I tried to get used to it, but it didn’t really work out. So I decided to take another approach: what if I had a markup language that I could compile to OpenOffice documents?

There are already quite a number of markup languages that one could use, and there are excellent tools for them. You should definitely have a look at Pandoc. However, I wanted something more custom, more special-purpose. So I decided to build a tiny markup language of my own. Stealing^WBorrowing some ideas from other languages like Markdown, the language has a pretty easy syntax.

This language is first parsed by a small parser built with Parsec. The parser generates a representation of the parsed document and outputs this representation in JSON-encoded form. This representation is then processed by the rendering backend, written in Python. And here things start to get ugly. The rendering code makes use of the UNO bridge in order to interface with OpenOffice. That is, the rendering code starts an OpenOffice instance and makes use of its scripting features in order to insert the text into a document. As ugly as this might sound, it enables you to use existing OpenOffice templates as they are. By changing the renderer code you can easily adapt the whole tool to your needs.
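
To make the pipeline a bit more concrete, here is a minimal sketch of what the rendering side could look like. It is not the actual renderer from the repository: the JSON shape (`type`, `children`, `level`, `text`), the `render` function and the `ListSink` and `UnoSink` classes are all made up for illustration. The paragraph style names ("Heading 1", "Text body") assume a default Writer template, and `UnoSink` assumes an office instance is already listening on a socket rather than being started by the script.

```python
import json


class ListSink:
    """Stand-in for OpenOffice: just records (style, text) pairs."""

    def __init__(self):
        self.items = []

    def insert(self, style, text):
        self.items.append((style, text))


class UnoSink:
    """Hypothetical sink that appends paragraphs to a new Writer document.

    Assumes an office instance was started beforehand with something like:
      soffice --accept="socket,host=localhost,port=2002;urp;"
    """

    def __init__(self, host="localhost", port=2002):
        # Imported lazily so the rest of the sketch runs without OpenOffice.
        import uno
        from com.sun.star.text.ControlCharacter import PARAGRAPH_BREAK

        self._paragraph_break = PARAGRAPH_BREAK
        local_ctx = uno.getComponentContext()
        resolver = local_ctx.ServiceManager.createInstanceWithContext(
            "com.sun.star.bridge.UnoUrlResolver", local_ctx)
        ctx = resolver.resolve(
            "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext"
            % (host, port))
        desktop = ctx.ServiceManager.createInstanceWithContext(
            "com.sun.star.frame.Desktop", ctx)
        # An empty Writer document; a template URL could be loaded instead.
        self.doc = desktop.loadComponentFromURL(
            "private:factory/swriter", "_blank", 0, ())
        self.text = self.doc.Text
        self.cursor = self.text.createTextCursor()

    def insert(self, style, text):
        # Style names are an assumption; they must exist in the document.
        self.cursor.setPropertyValue("ParaStyleName", style)
        self.text.insertString(self.cursor, text, False)
        self.text.insertControlCharacter(
            self.cursor, self._paragraph_break, False)


def render(node, sink):
    """Walk the (hypothetical) document tree and emit styled paragraphs."""
    kind = node["type"]
    if kind == "document":
        for child in node["children"]:
            render(child, sink)
    elif kind == "heading":
        sink.insert("Heading %d" % node["level"], node["text"])
    elif kind == "paragraph":
        sink.insert("Text body", node["text"])
    else:
        raise ValueError("unknown node type: %r" % kind)
```

With this split, `render(json.load(open(path)), ListSink())` exercises the tree walk without any office installation, and swapping in `UnoSink()` sends the same calls over the UNO bridge. Keeping the OpenOffice-specific part behind one small `insert` method is also what makes the renderer easy to adapt.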

Get the Code

The code is hosted on GitHub.

git clone


The tool consists of two parts: the parser and the OpenOffice interface. As you might already have figured out, the parser is written in Haskell. If you haven’t done so already, you should install the Haskell Platform on your system.

The parser is shipped as a cabal package, therefore building should be as easy as:

cd markupParser
cabal build
cabal install

The OpenOffice part is written in Python. There’s no need to “build” it. However, in order to use it, you should make sure that some requirements are met. You can give the whole thing a try:

cat markupParser/test.udoc | parseUdoc > /tmp/test.json
python OpenOfficeIntegration/ /tmp/test.json \
       OpenOfficeIntegration/ /tmp/test.pdf
xpdf /tmp/test.pdf