Real-Time Aggregation of Keywords from Censored Posts on Sina Weibo using Hadoop and Cron

Faculty: 
Eric Gilbert
Students: 
Anurag Shivaprasad

With the increasing presence of censorship on Chinese social media, it is imperative to give users of platforms such as Sina Weibo a way to share information freely without alerting censors and surveillance systems. This project aims to implement a real-time keyword aggregator that collects keywords likely to have caused posts to be censored, drawing on publicly available archives of censored Sina Weibo posts. In this work, we use a distributed-computing approach to identify additional candidate keywords in the posts with a TF-IDF-based technique. The result of this project will be a large, continuously populated and curated homophone dictionary for keywords currently censored on Sina Weibo.
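
As a rough illustration of the TF-IDF step mentioned above, the following minimal Python sketch ranks tokens from censored posts against a background corpus. It is not the project's implementation: the function name `tfidf_keywords`, the pooling of term frequency over all censored posts, and the assumption that posts are already segmented into token lists are all hypothetical choices made for the example, and it runs in a single process rather than on Hadoop.

```python
import math
from collections import Counter


def tfidf_keywords(censored_posts, background_posts, top_k=5):
    """Rank tokens in censored posts by a simple TF-IDF score.

    Both arguments are lists of posts, each post a list of tokens.
    Word segmentation is assumed to have happened upstream (hypothetical).
    """
    all_docs = censored_posts + background_posts
    n_docs = len(all_docs)

    # Document frequency: number of posts that contain each token.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))

    # Term frequency, pooled over the censored posts only.
    tf = Counter()
    for doc in censored_posts:
        tf.update(doc)
    total = sum(tf.values())

    # TF-IDF: frequent in censored posts, rare across the whole corpus.
    scores = {
        tok: (count / total) * math.log(n_docs / df[tok])
        for tok, count in tf.items()
    }

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    censored = [["a", "x"], ["b", "x"]]
    background = [["a", "c"], ["b", "c"], ["a", "b"]]
    print(tfidf_keywords(censored, background, top_k=3))
```

In a distributed setting, the document-frequency and term-frequency counts are the parts that would typically be computed as map-reduce jobs, with a scheduled task rerunning them as new archived posts arrive; how the project actually divides this work is not stated here.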

Lab: 
Director: 
Eric Gilbert

The comp.social lab focuses on the design and analysis of social media. According to its website, lab members "like puppies, mixed methods and new students (particularly MS)."