This page contains instructions on how to use a crisis lexicon to download tweets.
This tweet collector reads a file containing the crisis lexicon and uses the keywords in it to download tweets via Twitter's Streaming API:

$ python collect.py \
    --lexicon crisis_lexicon
If you want to include your own additional keywords:
$ python collect.py \
    --lexicon crisis_lexicon \
    --optional_terms my_terms
The adaptive collector downloads tweets containing lexicon terms for an initial period, automatically detects new terms that become highly frequent during a crisis, and dynamically updates the keyword list to also capture tweets containing the new terms:
$ python adaptive_collect.py \
    --lexicon crisis_lexicon \
    --optional_terms my_terms \
    --output my_collection.json \
    --set_adaptive \
    --new_terms_no 10 \
    --hashtags 1 \
    --time 3
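The adaptive step (roughly what `--new_terms_no` and `--hashtags` control) might look like the sketch below: count tokens in recently collected tweets and promote the most frequent ones that are not already tracked. The function name, tokenization, and scoring are assumptions for illustration, not the script's actual logic.

```python
from collections import Counter

def detect_new_terms(tweets, current_terms, n_terms=10, hashtags_only=False):
    """Return the n_terms most frequent tokens in recent tweets that are
    not already tracked; optionally restrict candidates to hashtags."""
    counts = Counter()
    tracked = {t.lower() for t in current_terms}
    for text in tweets:
        for token in text.lower().split():
            token = token.strip(".,!?:;\"'")
            if hashtags_only and not token.startswith("#"):
                continue
            # Ignore very short tokens and terms we already track.
            if len(token) > 2 and token not in tracked:
                counts[token] += 1
    return [term for term, _ in counts.most_common(n_terms)]
```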
Browse on GitHub | Download: CrisisLexCollect-v1.0.zip (33 KB)
A. Olteanu, C. Castillo, F. Diaz, S. Vieweg. 2014. CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. In Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM'14). AAAI Press, Ann Arbor, MI, USA.
We would like to host and/or link to other tools for Twitter data collection. Please contact us to have a tool included in this list:
AIDR (Artificial Intelligence for Disaster Response) is a free, open-source platform for collecting and classifying crisis-related tweets. It is entirely web-based, so no installation is required.
Tweepy: Twitter for Python! An easy-to-use open source Python library for accessing the Twitter API. Our collection scripts are built on top of Tweepy.
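One practical detail when streaming with a large lexicon: Twitter's statuses/filter endpoint caps the track parameter at 400 keywords for standard access, so a collector may need to trim or chunk its keyword list. A minimal sketch of chunking (the helper name is ours, not part of the scripts):

```python
def chunk_keywords(keywords, max_per_request=400):
    """Split a keyword list into chunks that each fit one filter request."""
    return [keywords[i:i + max_per_request]
            for i in range(0, len(keywords), max_per_request)]
```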