Annotating text in LTP2006
This is an introduction to the text annotation tool used in the course
Language Technology Project 2006
at the University of Amsterdam.
- We use MITRE's Callisto annotation tool which you can download from
http://callisto.mitre.org/
- Within Callisto you can use different annotation tasks.
For this course an example task has been created according to the
CoNLL named entity tagging format. The task is defined in a
DTD.
If you want to use it, copy the associated file org.conll.ner.jar
to Callisto's tasks directory. You can also create your own task.
- Start Callisto (on Linux: java -jar Callisto.jar in the
Callisto directory).
- Open a text file with the menu items File and New.
Choose the file encoding ISO-8859-1 and the task CoNLL NER Task,
and do not parse the file as SGML/XML. Press Annotate.
- Click on words with the left mouse button. Drag the mouse with
the left button pressed down to select groups of words.
Afterwards press the right button to select the class of the string.
- To remove a tag, left-click on the tagged string, then
right-click and select Delete Annotation.
- Save the file with File and Save (usual extension: .aif.xml).
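The steps above tell Callisto to read the file as ISO-8859-1. If your text happens to be stored in UTF-8 (an assumption, the course files may already be in the right encoding), a conversion along the following lines may help before opening it. This is only a sketch: the function name to_latin1 and the sample file names are made up for illustration, and it relies on the standard iconv utility being installed.

```shell
# Convert UTF-8 text on stdin to ISO-8859-1 on stdout.
# iconv stops with an error if a character has no ISO-8859-1 equivalent.
to_latin1() {
  iconv -f UTF-8 -t ISO-8859-1
}

# Example: write a small UTF-8 sample (hypothetical file names) and convert it.
printf 'Andr\303\251 bezoekt Belgi\303\253\n' > sample.utf8.txt
to_latin1 < sample.utf8.txt > sample.latin1.txt
```

The converted file can then be selected in Callisto with the encoding ISO-8859-1.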
A basic named entity tagger for Dutch (Linux) is available in the file
ner.tgz (version: 20060213 17:00).
Instructions:
- Download the file and unpack it with tar zxf ner.tgz
- Change directory to ner and install the software with
the command make
- You can run the tagger with the command: bin/ner < file
where file contains tokenized Dutch sentences.
Example output: George/PER bezoekt/O de/O VN/ORG in/O New/LOC York/LOC
- Output from the tagger can be converted to Callisto output like this:
bin/ner < file > file.anno; bin/sent2aif file.anno > file.aif.xml
- Callisto files can be converted to tagger files like this:
bin/aif2sent < file.aif.xml > file.txt
- Instructions on how to improve the tagger can be found in the
file ner/tnt/000README
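The word/TAG output shown above is easy to post-process with standard tools. As an illustration, the following sketch lists the entities found in tagger output, joining neighbouring words that carry the same tag. The function name extract_entities is invented here and is not part of ner.tgz. It assumes that the tag follows the last slash of each token, and because the example output has no begin/inside markers, two different entities of the same class that stand next to each other would be merged.

```shell
# List entities from tagger output (word/TAG tokens, one sentence per line).
# Adjacent words with the same non-O tag are joined into one entity.
extract_entities() {
  awk '{
    cur = "O"; phrase = ""
    for (i = 1; i <= NF; i++) {
      slash = match($i, "/[^/]*$")      # position of the last slash
      if (slash == 0) continue          # skip tokens without a tag
      word = substr($i, 1, slash - 1)
      tag  = substr($i, slash + 1)
      if (tag == cur && tag != "O") {
        phrase = phrase " " word        # extend the current entity
      } else {
        if (cur != "O") print cur "\t" phrase
        cur = tag; phrase = word
      }
    }
    if (cur != "O") print cur "\t" phrase   # entity at the end of the line
  }'
}

echo "George/PER bezoekt/O de/O VN/ORG in/O New/LOC York/LOC" | extract_entities
# prints (tab-separated):
# PER	George
# ORG	VN
# LOC	New York
```

With the tagger installed, the same function can be fed directly: bin/ner < file | extract_entities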
Send questions and bugs to the e-mail address mentioned below.
In order to obtain broad-class named entity tags for English, download the
LingPipe named entity tagger.
Process the demo files from the command line by executing the
following commands:
- tar zxf lingpipe-2.1.1.tar.gz
- cd lingpipe-2.1.1
- ant compile
- cd demos/command
- mkdir bin
- sed 's/lib/../' < demo.sh > bin/demo.sh
- cd bin
- sh demo.sh
LingPipe also has APIs available.
Last update: February 13, 2006,
erikt@science.uva.nl