20120313

This directory contains the data and software associated with the paper

"Predicting the 2011 Dutch Senate Election Results with Twitter" by
Erik Tjong Kim Sang and Johan Bos. In: Proceedings of SASN 2012, the
EACL 2012 Workshop on Semantic Analysis in Social Networks, Avignon,
France, 2012. http://ifarm.nl/erikt/papers/sasn2012.pdf

The PDF of this paper can be found in the directory pdf.


SOFTWARE

Use the bin/tableN (N=1-7) commands to reproduce the contents of the
tables in the paper. For example, bin/table1 generates the numbers
reported in Table 1 of the paper.

The software does not generate any results for the OSF party because we 
used a baseline prediction for this party: 1 seat.

The sentiment weights (Table 3) can be generated with the command:
bin/sentimentweights 

The population weights (Table 4) can be generated with the command:
bin/populationweights.sh.test

The population weights for the uniformly distributed tweets (Table 7)
can be generated with the command:
bin/populationweights.sh.uniform


DATA

The tweets used in these experiments can be found in the directory data.
The data files contain one tweet per line, preceded by the user id. All
tweets have been anonymized: user names have been replaced by user ids
like USER1234567 (except for user names that are also party names).
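
The line format described above could be read with a few lines of Python.
This is only a sketch and not part of the distributed software. It assumes
that the user id is the first whitespace-delimited token on each line (the
exact separator is not stated here) and that anonymized ids consist of USER
followed by digits, as in the example USER1234567. The names parse_line and
is_anonymized are made up for this illustration.

```python
import re
from typing import Tuple

# Assumption: anonymized ids look like "USER" followed by digits.
ANON_ID = re.compile(r"^USER\d+$")


def parse_line(line: str) -> Tuple[str, str]:
    """Split one data line into (user id, tweet text).

    Assumption: the id is the first whitespace-delimited token and
    the rest of the line is the tweet.
    """
    parts = line.rstrip("\n").split(None, 1)
    user = parts[0] if parts else ""
    text = parts[1] if len(parts) > 1 else ""
    return user, text


def is_anonymized(user: str) -> bool:
    """True for ids such as USER1234567, False for kept names
    (user names that are also party names)."""
    return ANON_ID.match(user) is not None
```

Ids for which is_anonymized returns False would then be the user names that
were kept because they are also party names.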

The data directory contains eight files, one per day. The file 20110216
was used for development and testing. The files 20110223, 20110224,
20110225, 20110226, 20110227, 20110228 and 20110301 were used for
predicting the results of the elections of 2 March 2011.
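
The split between the development file and the seven prediction files can
be written down as follows. This is a sketch for illustration only: the
variable names are hypothetical, and the only facts used are the file names
and the election date given above. The prediction files are the seven days
directly before the election.

```python
from datetime import date, timedelta

# File used for development and testing.
DEV_FILE = "20110216"

# Date of the election.
ELECTION_DAY = date(2011, 3, 2)

# The seven days before the election, oldest first:
# 20110223 up to and including 20110301.
PREDICTION_FILES = [
    (ELECTION_DAY - timedelta(days=n)).strftime("%Y%m%d")
    for n in range(7, 0, -1)
]

# All eight file names in the data directory.
ALL_FILES = [DEV_FILE] + PREDICTION_FILES
```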


CONTACT

This data set can be retrieved from http://ifarm.nl/ps2011/ps2011.zip

The contact person for this data set is Erik Tjong Kim Sang 
erikt(at)xs4all.nl
