With the horrible attack on the Charlie Hebdo office in Paris, we are once again experiencing a new way of following the latest news, this time powered by Twitter. It is amazing how fast people share thoughts, photos, links, and absolutely everything else. Twitter thus becomes a real-time data set of what is on the world's mind.
In this post I am going to show how to query tweets and do some simple analysis using IBM SPSS Modeler and the new SPSS Predictive Extensions based on R. All this analysis... without any coding at all! We are going to do 3 things:
- Create a Word Cloud with a new WordCloud node based on the R wordcloud package.
- Integration of rCharts with IBM SPSS Modeler. rCharts (developed by Ramnath Vaidyanathan) was born as an initiative to bring powerful JavaScript visualizations to R users, so that they can create interactive charts using only R, without any JavaScript skills. With this integration in the SPSS workbench, you don't even need to know R in order to use them. Simply drag and drop the node and start getting powerful results that are easy to share. These are the libraries available now in IBM SPSS Modeler:
- Integration with the new R package htmlwidgets. This package enables you to add new types of HTML output to R Markdown documents. There are different types of widgets, such as maps, charts, 3D scatterplots and more.
This is the stream I built with these new IBM SPSS Predictive Extensions. I selected 10,000 tweets with the hashtag #CharlieHebdo, ran a sentiment analysis on their text, and generated 4 different outputs. I want to emphasize that this analysis is very simple; my only aim is to show how fast and easy it is to get quite interesting results. Another important point is that the outputs generated with IBM SPSS Modeler and these extensions are easy to embed in blogs (like this one), social media or any other place where you want to share them and... they are interactive! Try mousing over the charts and you will see nice animations.
- Output 1: The word cloud. This node is useful for highlighting the most commonly cited words in a text with a quick visualization. The node cleans up the data, removes words that carry no meaning, keeps the important ones, and counts them.
- Output 2: Bar Chart of Emotion. The new 'Sentiment Analysis' node classifies each tweet as anger, fear, joy, surprise, disgust or sadness. I then aggregate the results and create a Bar Chart using the rCharts node. The Sentiment Analysis node is based on the sentiment package, which you can find on CRAN.
- Output 3: Bar Chart of Polarity. Another output of the new 'Sentiment Analysis' node is the polarity, which classifies the tweets as positive, neutral or negative. Again we create a bar chart using the new rCharts node.
As you can see, the sentiment about the attack is negative, and people are feeling anger and fear. This post does not focus on explaining the algorithms behind the analysis; if you want more information about them, see the documentation of the R packages.
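For readers who do want to see what the aggregation behind the two bar charts amounts to, here is a minimal sketch in plain Python. It is not the code the nodes run (those rely on R packages); the `tweets` rows, their labels, and the `count_labels` helper are all hypothetical and only illustrate how per-tweet labels turn into bar heights.

```python
from collections import Counter

# Hypothetical per-tweet labels standing in for the output of the
# 'Sentiment Analysis' node. These are NOT real results from the stream.
tweets = [
    {"emotion": "anger", "polarity": "negative"},
    {"emotion": "anger", "polarity": "negative"},
    {"emotion": "anger", "polarity": "neutral"},
    {"emotion": "fear", "polarity": "negative"},
    {"emotion": "fear", "polarity": "neutral"},
    {"emotion": "joy", "polarity": "positive"},
]


def count_labels(rows, field):
    """Count how many rows carry each value of `field`, most frequent first."""
    return Counter(row[field] for row in rows).most_common()


# Print a crude text version of each bar chart.
for field in ("emotion", "polarity"):
    print(field)
    for label, count in count_labels(tweets, field):
        print(f"  {label:<10}{'#' * count} ({count})")
```

Each `(label, count)` pair corresponds to one bar; a charting node only has to draw them.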
- Output 4: HTML datatable. This is an interactive table showing the first tweets together with the classification assigned to each one by the sentiment analysis.
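Going back to Output 1, the clean-up and counting that feed a word cloud can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the stop-word list, the tokenizing rule, and the sample tweets are made up for the example, and the actual node relies on the R wordcloud package rather than on this code.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not the list the node uses).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "for",
              "we", "are", "with", "rt"}


def word_frequencies(texts, stop_words=STOP_WORDS, min_length=2):
    """Lowercase each text, drop links and stop words, and count what is left."""
    counts = Counter()
    for text in texts:
        cleaned = re.sub(r"https?://\S+", " ", text.lower())  # remove URLs
        for word in re.findall(r"[#@]?\w+", cleaned):  # keep hashtags and mentions
            if word not in stop_words and len(word) >= min_length:
                counts[word] += 1
    return counts


# Hypothetical sample tweets, made up for this example.
sample_tweets = [
    "RT We are all Charlie #JeSuisCharlie",
    "Freedom of the press is freedom for all #JeSuisCharlie",
    "Solidarity with Paris http://example.com/x",
]

frequencies = word_frequencies(sample_tweets)
for word, count in frequencies.most_common(5):
    print(word, count)
```

A word cloud then simply draws each word at a size proportional to its count.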
To finish the post, here is a map created with CartoDB of the geotagged tweets mentioning #JeSuisCharlie on January 7, 2015 (Paris time zone). It is not generated with IBM SPSS Modeler, but we are working on the integration and already have some experimental CartoDB nodes.