I changed the Blogger configuration to make later blogging easier. As a result, older entries may not be formatted correctly.

Tuesday 4 May 2010

UIMA - Unstructured Information Management Architecture

Disclaimer: This entry is not complete and will be finished later, since some information needs to be checked.

UIMA (Unstructured Information Management Architecture) is a project that was first created by IBM and is now one of the top-level projects of the Apache Software Foundation. It provides an architecture for annotating unstructured information with the help of a set of annotators and analysis engines, which can be combined and aggregated.

In the following sections, I introduce the main elements needed to understand the UIMA infrastructure.

CAS - Common Analysis Structure

The main structure in the UIMA architecture is the CAS (Common Analysis Structure). I had some difficulty finding out what the acronym means, but I finally found it in the glossary, which should be read first, because even the overview does not explain what a CAS is.

A CAS is the structure manipulated by the annotators and analysis engines.
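
To make this more concrete, here is a minimal sketch of how I picture a CAS: the artifact being analyzed (here, a text) together with the typed annotations recorded over regions of it. This is a toy model in Python, not the real UIMA API (which is Java); the names ToyCas, Annotation, add, covered_text and select are my own and purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A typed label over the region text[begin:end] of the artifact."""
    type_name: str
    begin: int
    end: int


@dataclass
class ToyCas:
    """Toy stand-in for a CAS: the artifact plus the annotations made on it.

    Hypothetical model for illustration only, not the UIMA data structure.
    """
    text: str
    annotations: list = field(default_factory=list)

    def add(self, type_name, begin, end):
        # Reject regions that fall outside the artifact.
        if not 0 <= begin <= end <= len(self.text):
            raise ValueError("annotation outside the artifact")
        annotation = Annotation(type_name, begin, end)
        self.annotations.append(annotation)
        return annotation

    def covered_text(self, annotation):
        # The piece of the artifact that the annotation refers to.
        return self.text[annotation.begin:annotation.end]

    def select(self, type_name):
        # All annotations of one type, in insertion order.
        return [a for a in self.annotations if a.type_name == type_name]


cas = ToyCas("UIMA is an Apache project.")
org = cas.add("Organization", 11, 17)
print(cas.covered_text(org))  # prints "Apache"
```

The point of the sketch is that annotations never copy or modify the text: they only point into it with offsets, so several components can add their results to the same structure.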

Analysis Engines

The UIMA architecture provides the notion of analysis engines, which take a CAS view (i.e., some annotation structure representing a view of the data) and return a .

Annotators

The glossary of the UIMA documentation defines annotators as:

A software component that implements the UIMA annotator interface. Annotators are implemented to produce and record annotations over regions of an artifact (e.g., text document, audio, and video).
They represent the starting point for the analysis engine.
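
As a rough illustration of the idea, the sketch below shows two toy annotators that record annotations over regions of a text, and a small function that runs them one after the other, which is how I understand the combination of components into a larger engine. Again, this is plain Python and not the UIMA annotator interface; ToyCas, token_annotator, capitalized_annotator and run_pipeline are hypothetical names I made up for the example.

```python
import re


class ToyCas:
    """Toy stand-in for a CAS (hypothetical, not the UIMA class)."""

    def __init__(self, text):
        self.text = text
        self.annotations = []  # (type_name, begin, end) tuples


def token_annotator(cas):
    """Record one 'Token' annotation per run of word characters."""
    for match in re.finditer(r"\w+", cas.text):
        cas.annotations.append(("Token", match.start(), match.end()))


def capitalized_annotator(cas):
    """Build on earlier results: flag tokens that start with an upper-case letter."""
    # Iterate over a copy, since we append to the list while scanning it.
    for type_name, begin, end in list(cas.annotations):
        if type_name == "Token" and cas.text[begin].isupper():
            cas.annotations.append(("Capitalized", begin, end))


def run_pipeline(cas, annotators):
    """Apply each annotator in order to the same shared structure."""
    for annotate in annotators:
        annotate(cas)
    return cas


cas = run_pipeline(ToyCas("IBM created UIMA"), [token_annotator, capitalized_annotator])
capitalized = [cas.text[b:e] for t, b, e in cas.annotations if t == "Capitalized"]
print(capitalized)  # prints ['IBM', 'UIMA']
```

The order matters in this sketch: the second annotator only finds something because the first one has already written its tokens into the shared structure.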

Indexing

One of the main points of interest of the UIMA architecture is that it provides a standard interface for defining how CASes and their views are indexed. However, I still need to clear things up here.

PEAR

A PEAR is an archive file packaging the code, descriptor files and other resources required to install and run a UIMA component in another environment. The UIMA SDK provides tools to create such PEAR files. Note that the PEAR acronym is not defined in the documentation either.
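
To give an idea of what such a package looks like, here is a small sketch that builds a stand-in archive in memory and lists its contents. It assumes, as far as I understand the UIMA PEAR reference, that a PEAR is a ZIP-format archive containing an installation descriptor at metadata/install.xml. The other entries (desc/MyAnnotator.xml, lib/my-annotator.jar) and all file contents are placeholders I invented; a real PEAR should be created with the UIMA tools.

```python
import io
import zipfile

# Build a stand-in archive in memory (assumption: PEAR files use the ZIP format).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as pear:
    # Installation descriptor: path taken from my reading of the PEAR reference.
    pear.writestr("metadata/install.xml", "<!-- installation descriptor placeholder -->")
    # Hypothetical component descriptor and code, named for illustration only.
    pear.writestr("desc/MyAnnotator.xml", "<!-- component descriptor placeholder -->")
    pear.writestr("lib/my-annotator.jar", b"")

# Reopen the archive for reading and inspect what it packages.
with zipfile.ZipFile(buffer) as pear:
    names = pear.namelist()
    has_install_descriptor = "metadata/install.xml" in names

print(names)
print(has_install_descriptor)
```

The same few lines, pointed at a real .pear file instead of the in-memory buffer, are a quick way to check what a package contains before installing it.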