20120313

This directory contains the data concerning the manual sentiment analysis
associated with the paper

"Predicting the 2011 Dutch Senate Election Results with Twitter" by
Erik Tjong Kim Sang and Johan Bos. In: Proceedings of SASN 2012, the
EACL 2012 Workshop on Semantic Analysis in Social Networks, Avignon,
France, 2012. http://ifarm.nl/erikt/papers/sasn2012.pdf

The PDF of this paper can be found in the directory ../pdf


DATA

The main data file "logfile" contains the annotations by two
annotators (erikt and Johan). The file contains one line per tweet
in the format: annotator SPACE class SPACE Twitter id SPACE tweet

We used two classes: negative (-), for tweets expressing a negative
sentiment towards the party mentioned in the tweet, and nonnegative (.),
for all other tweets.
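
As a quick check of the annotations, the number of lines per annotator
and class can be counted from the first two fields of each line. The
following is a minimal sketch, not part of the distributed scripts; the
function name count_annotations is hypothetical, and it assumes that
fields are separated by single spaces as described above.

```shell
# Count annotations per annotator and class in a file in the "logfile" format:
#   annotator SPACE class SPACE Twitter id SPACE tweet
# count_annotations is a hypothetical helper name, not part of the data set.
count_annotations() {
    awk '
        # Fields 1 and 2 are the annotator and the class. The tweet text
        # (field 4 onwards) may itself contain spaces and is ignored here.
        $2 == "-" || $2 == "." { count[$1 " " $2]++ }
        END { for (key in count) print key, count[key] }
    ' "$@" | LC_ALL=C sort
}
```

Called as "count_annotations logfile", this prints one line per
annotator and class in the form: annotator SPACE class SPACE count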

All tweets have been anonymized: user names have been replaced by
user ids like USER1234567 (except for user names that are also
party names).

List of files:

logfile:          all tweets
logfile.agreed:   all tweets annotated by both annotators, with the same class
logfile.selected: subset of logfile.agreed, 1 party per tweet, 1 tweet per user
...nonnegative:   nonnegative subset of logfile.selected
...negative:      negative subset of logfile.selected

The line format of the logfile.* files is: Twitter id SPACE tweet SPACE class
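
Because the tweet text contains spaces, the Twitter id is the first
field of a line and the class is the last one. The sketch below splits
a logfile.* line on that basis and prints the three parts separated by
tabs. The function name to_tsv is hypothetical, and the sketch assumes
that lines have no leading or trailing whitespace and that tweets
contain no tab characters.

```shell
# Convert lines in the logfile.* format
#   Twitter id SPACE tweet SPACE class
# to: id TAB class TAB tweet
# to_tsv is a hypothetical helper name, not part of the data set.
to_tsv() {
    awk '
        NF >= 3 {
            id = $1
            label = $NF
            # Strip the leading id and the trailing class to recover the tweet.
            tweet = $0
            sub(/^[^ ]+ /, "", tweet)
            sub(/ [^ ]+$/, "", tweet)
            printf "%s\t%s\t%s\n", id, label, tweet
        }
    ' "$@"
}
```

For example: to_tsv logfile.selected | cut -f2 | sort | uniq -c
lists the number of tweets per class.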

Commands for generating the files (see bin/makeData):

bin/select < logfile > logfile.agreed
../bin/onePartyPerTweet < logfile.agreed |\
    ../bin/oneTweetPerUser > logfile.selected
grep '\.$' logfile.selected > logfile.selected.nonnegative
grep '\-$' logfile.selected > logfile.selected.negative
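
The selection scripts themselves are not reproduced in this file. As an
illustration of the first step only, the sketch below keeps the tweets
that both annotators (erikt and Johan) labeled with the same class and
prints them in the logfile.* line format. It is not the actual
bin/select: the function name select_agreed is hypothetical, and the
sketch assumes that tweets are identified by their Twitter id and that
a later annotation by the same annotator replaces an earlier one.

```shell
# Hypothetical sketch of the agreement step; NOT the original bin/select.
# Input:  annotator SPACE class SPACE Twitter id SPACE tweet
# Output: Twitter id SPACE tweet SPACE class
select_agreed() {
    awk '
        NF >= 4 {
            annotator = $1
            label = $2
            id = $3 ""                      # keep the id as a string
            # Recover the tweet text by dropping the first three fields.
            tweet = $0
            sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", tweet)
            if (!(id in text)) order[++n] = id   # remember first-seen order
            text[id] = tweet
            labels[id, annotator] = label   # assumption: last annotation wins
        }
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                a = labels[id, "erikt"]
                b = labels[id, "Johan"]
                # Keep only tweets labeled by both annotators with the same class.
                if (a != "" && a == b) print id, text[id], a
            }
        }
    ' "$@"
}
```

Running "select_agreed < logfile" and comparing the result with
logfile.agreed shows whether these assumptions match the distributed
files; differences are possible.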


CONTACT

This data set can be retrieved from http://ifarm.nl/ps2011/ps2011.zip

The contact person for this data set is Erik Tjong Kim Sang,
erikt(at)xs4all.nl
