Real Time Aggregation of Keywords from Censored Posts on Sina Weibo using Hadoop and Cron

Eric Gilbert
Anurag Shivaprasad

With censorship increasingly present on Chinese social media, users of platforms such as Sina Weibo need a way to share information freely without alerting censors and surveillance systems. The aim of this project is to implement a real-time keyword aggregator that collects the keywords most likely to have caused posts to be censored, drawing on various publicly available archives of censored Sina Weibo posts. In this work, we use a distributed computing approach with TF-IDF scoring to identify additional candidate keywords from the posts. The result of this project will be a large, continuously populated and curated homophone dictionary for currently censored keywords on Sina Weibo.
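To make the TF-IDF step concrete, here is a minimal single-machine sketch in Python. It is an illustration under stated assumptions, not the project's Hadoop job: the function name `tfidf_top_keywords`, the uncensored reference corpus, the pooled term-frequency weighting, and the pre-tokenized input (word segmentation of the Chinese text is assumed to happen elsewhere) are all hypothetical choices.

```python
import math
from collections import Counter


def tfidf_top_keywords(censored_docs, reference_docs, top_k=3):
    """Rank terms from censored posts by a TF-IDF score.

    Each document is a list of tokens. Both argument names and the
    idea of a separate reference corpus are assumptions made for
    this sketch.
    """
    all_docs = censored_docs + reference_docs
    n_docs = len(all_docs)

    # Document frequency: how many posts contain each term at least once.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))

    # Term frequency, pooled over all censored posts.
    tf = Counter(term for doc in censored_docs for term in doc)
    total = sum(tf.values())

    # Classic TF-IDF: relative frequency times log inverse document frequency.
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy data with placeholder tokens (not real Weibo content).
censored = [["a", "b", "x"], ["x", "c"]]
reference = [["a", "b"], ["a", "c"], ["b", "c"]]
top = tfidf_top_keywords(censored, reference, top_k=1)
```

In the toy data, `x` appears only in the censored posts, so it outranks the terms that are also common in the reference posts. A distributed version would presumably compute the document-frequency and term-frequency counts in separate map and reduce phases, but that layout is a guess rather than something the description above specifies.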

Eric Gilbert

The lab focuses on the design and analysis of social media. According to its website, lab members "like puppies, mixed methods and new students (particularly MS)."