Annotating text in LTP2006

This is an introduction to the text annotation tool used in the course Language Technology Project 2006 at the University of Amsterdam.

  1. We use MITRE's Callisto annotation tool, which you can download from http://callisto.mitre.org/
  2. Within Callisto you can use different annotation tasks. For this course an example task has been created according to the CoNLL named entity tagging format. The task is defined in a DTD. To use it, copy the associated file org.conll.ner.jar to Callisto's tasks directory. You can also create your own task.
  3. Start Callisto (on Linux: java -jar Callisto.jar in the Callisto directory).
  4. Select a text file with File and New. Choose file encoding ISO-8859-1 and task CoNLL NER Task, and do not parse the file as SGML/XML. Press Annotate.
  5. Click on words with the left mouse button. Drag the mouse with the left button pressed down to select groups of words. Afterwards press the right button to select the class of the string.
  6. To remove a tag, left-click on the tagged string, then right-click and select Delete Annotation.
  7. Save the file with File and Save (usual extension: .aif.xml).
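
Step 4 tells Callisto to read the text as ISO-8859-1. If your text file was saved in another encoding, it may help to convert it first. The following is only a minimal sketch in Python, assuming the source file is UTF-8 (this page does not say which encoding the course texts use); the function name convert_to_latin1 and the file names are hypothetical.

```python
from pathlib import Path


def convert_to_latin1(src: str, dst: str, src_encoding: str = "utf-8") -> None:
    """Re-encode a text file as ISO-8859-1.

    Assumption: the input is encoded as src_encoding (UTF-8 by default).
    Characters that do not exist in ISO-8859-1 are replaced by '?'.
    """
    text = Path(src).read_text(encoding=src_encoding)
    # Write bytes directly so that no extra newline translation takes place.
    Path(dst).write_bytes(text.encode("iso-8859-1", errors="replace"))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python convert.py input.txt output.txt
    if len(sys.argv) == 3:
        convert_to_latin1(sys.argv[1], sys.argv[2])
```

The converted file can then be opened in Callisto as described in step 4.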

A basic named entity tagger for Dutch (Linux) is available in the file ner.tgz (version: 20060213 17:00). Instructions:

  1. Download the file and unpack it with tar zxf ner.tgz
  2. Change directory to ner and install the software with the command make
  3. You can run the tagger with the command: bin/ner < file, where file contains tokenized Dutch sentences.
    Example output: George/PER bezoekt/O de/O VN/ORG in/O New/LOC York/LOC
  4. Output from the tagger can be converted to Callisto output like this: bin/ner < file > file.anno; bin/sent2aif file.anno > file.aif.xml
  5. Callisto files can be converted to tagger files like this: bin/aif2sent < file.aif.xml > file.txt
  6. Instructions on how to improve the tagger can be found in the file ner/tnt/000README
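
If you want to process the tagger output yourself, the word/TAG format shown in step 3 is easy to read. The following is a minimal sketch in Python, not part of ner.tgz; the function names are hypothetical. It assumes that tokens are separated by whitespace, that the tag follows the last slash of each token, and that neighbouring words with the same tag belong to one entity (the output shown in step 3 has no B-/I- prefixes, so two adjacent entities of the same class cannot be told apart).

```python
from typing import List, Tuple


def parse_tagged_sentence(line: str) -> List[Tuple[str, str]]:
    """Split one line of tagger output into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST slash, so words that contain '/' stay intact.
        word, sep, tag = token.rpartition("/")
        if not sep:
            # Assumption: a token without a slash is treated as untagged.
            word, tag = token, "O"
        pairs.append((word, tag))
    return pairs


def group_entities(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of adjacent words with the same non-O tag into entities."""
    entities = []
    current_words: List[str] = []
    current_tag = "O"
    # The extra ("", "O") pair flushes an entity at the end of the sentence.
    for word, tag in pairs + [("", "O")]:
        if tag != current_tag:
            if current_tag != "O":
                entities.append((" ".join(current_words), current_tag))
            current_words = []
            current_tag = tag
        if tag != "O":
            current_words.append(word)
    return entities


if __name__ == "__main__":
    example = "George/PER bezoekt/O de/O VN/ORG in/O New/LOC York/LOC"
    print(group_entities(parse_tagged_sentence(example)))
    # prints [('George', 'PER'), ('VN', 'ORG'), ('New York', 'LOC')]
```

Applied to a file produced with bin/ner < file > file.anno, each line of file.anno can be passed to parse_tagged_sentence in turn.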

Send questions and bugs to the e-mail address mentioned below.

In order to obtain broad-class named entity tags for English, download the LingPipe named entity tagger. Process the demo files from the command line by executing the following commands:

LingPipe also has APIs available.


Last update: February 13, 2006, erikt@science.uva.nl