How do we migrate?

Recently I came across world migration data publicly available on the United Nations website. Here you can find this data- It shows the net number of migrants, that is, the number of immigrants minus the number of emigrants, expressed in thousands and available for different countries. Looking at the data, the first thought that came to my mind was to plot it on a world map, and this can be done very easily in R. Later in this post I describe the R code. But before that, let us look at the world heat map.

As expected, the population drift is from less developed to more developed regions.

Here is the R code for this-

#Required library- the fastest way to draw this chart is through the ‘rworldmap’ package

library(rworldmap)

#Load data- this reads the CSV file into the R environment

migrantsData <- read.csv("Migrants_data.csv")

#Join data to map- this step attaches each country's map polygon (its boundary coordinates) to the data, matching countries on their UN codes

spdf <- joinCountryData2Map(migrantsData,joinCode="UN", nameJoinColumn="Country.code",nameCountryColumn= "Country",verbose = TRUE)

#Draw world map- the function that draws this map is ‘mapCountryData’. For more details on it, please refer to the package documentation (?mapCountryData in R).

mapParams <- mapCountryData(spdf, nameColumnToPlot="Migrants", addLegend=FALSE, numCats=7, colourPalette="terrain",
                            mapTitle="Net Number of Migrants (Immigrants - Emigrants), 2005-2010 (Thousands)", oceanCol="lightblue")

#Add the legend separately so that its width, margin and labels can be controlled
do.call(addMapLegend, c(mapParams, legendWidth=0.5, legendMar=4, legendLabels="all"))

And that’s it! Similar code can be used for any other metric.
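The UN file used above already reports the net figure. If a source instead gives separate immigrant and emigrant counts, the `Migrants` column can be derived first. The sketch below assumes hypothetical columns `Immigrants` and `Emigrants` (in persons) and made-up countries; only the definition (immigrants minus emigrants, in thousands) comes from this post.

```r
# Hypothetical rows with separate counts in persons (not real UN data)
raw <- data.frame(
  Country    = c("Country A", "Country B", "Country C"),
  Immigrants = c(250000, 40000, 120000),
  Emigrants  = c(50000, 90000, 120000)
)

# Net migrants = immigrants - emigrants, expressed in thousands
raw$Migrants <- (raw$Immigrants - raw$Emigrants) / 1000
print(raw$Migrants)  # 200 -50 0
```

A positive value means net inflow and a negative value net outflow, which is what the map colours encode.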

Dear Flipkart: You got it completely wrong!

At first, #BigBillionDay seemed a good strategic move for Flipkart, but sadly it ended with lots of unhappy customers. Flipkart planned it to attract huge traffic to its site, but it failed where it matters most, i.e. meeting customers' expectations. It left no stone unturned in setting high expectations, but it seems nothing was done to meet them. Final result- unhappy customers. Let me support this claim with numbers. I compared customers’ sentiment (derived from customers’ tweets as described in the first post) on the day before #BigBillionDay, 5th Oct’14, with customers’ sentiment on #BigBillionDay itself, 6th Oct’14. It turns out that the proportion of negative sentiment rose sharply from 20% to 55%, a relative increase of 175%. It’s a big drift, considering the change happened in just one day.
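The 175% figure is a relative change, not a difference in percentage points. A quick check in R, using only the two shares quoted above:

```r
# Share of negative tweets on the two days, as reported in this post
neg_before <- 0.20  # 5 Oct 2014
neg_on     <- 0.55  # 6 Oct 2014 (#BigBillionDay)

# Relative change = (new - old) / old
relative_increase <- (neg_on - neg_before) / neg_before
print(round(relative_increase * 100))      # 175 (percent)

# Absolute change, in percentage points
print(round((neg_on - neg_before) * 100))  # 35
```

So the negative share grew by 35 percentage points, which is 175% of its starting value.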

I hope Flipkart overcomes this and does wonders in the future, as it’s just the start of the game.

Sentiment Analysis: Distribution of tweet length by sentiment

In the last two posts, we worked on Twitter data for Flipkart and Amazon India and showed how to leverage it to capture customers' sentiments and draw word clouds.

In this post we explore an interesting pattern which is quite intuitive; supporting it with data is the name of the game.

Have you ever noticed that we try to convey our thoughts more eloquently under the effect of some sentiment, e.g. in a rage? I am sure you have. That in turn leads us to put forward our thoughts in more detail than we otherwise would. And that's the hypothesis we want to test in this study.

All the data used in this analysis is the same as in the earlier posts, i.e. tweets related to Flipkart and Amazon India. If you are interested in how to fetch these tweets in R, please refer to the earlier posts.

OK, that's enough background, I guess. Let us see the results-

Let us see what we did here. First, we classified each tweet into one of 7 buckets based on the sentiment it carries (which we derived in the first post). If a tweet carries a strong positive sentiment, it falls in the last bucket (as in the graph above), 'Excellent'. On the other hand, if a tweet carries a strong negative sentiment, we put it in the first bucket, defined as 'Worst' in the graph above. Any tweet that does not carry any sentiment falls under 'Neutral'. We defined two more levels of sentiment between 'Worst' and 'Neutral', and two between 'Neutral' and 'Excellent'. Further, we define the length of a tweet as the number of words in it. So the plot above shows the average length of tweets by sentiment category.
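The bucketing and averaging described above can be sketched in a few lines of R. This is not the exact code behind the graph: the toy tweets, their integer sentiment scores, the cut points and the four intermediate labels ("Very bad", "Bad", "Good", "Very good") are all hypothetical placeholders; only 'Worst', 'Neutral', 'Excellent' and the "length = number of words" definition come from the post.

```r
# Hypothetical tweets with a sentiment score already attached
tweets <- data.frame(
  text  = c("worst service ever never again totally useless delivery late",
            "bad packaging",
            "ok",
            "good deal",
            "excellent fast delivery love it great app"),
  score = c(-4, -1, 0, 1, 3),
  stringsAsFactors = FALSE
)

# Length of a tweet = number of words (split on whitespace)
tweets$length <- lengths(strsplit(trimws(tweets$text), "\\s+"))

# Seven buckets; breaks and intermediate labels are assumptions
levels7 <- c("Worst", "Very bad", "Bad", "Neutral", "Good", "Very good", "Excellent")
tweets$bucket <- cut(tweets$score,
                     breaks = c(-Inf, -3, -2, -1, 0, 1, 2, Inf),
                     labels = levels7)

# Average tweet length per bucket (NA for buckets with no tweets)
avg_len <- tapply(tweets$length, tweets$bucket, mean)
print(avg_len)
```

With integer scores, `cut` maps -3 or lower to 'Worst', 0 to 'Neutral' and 3 or higher to 'Excellent'; keeping all seven levels in the factor means empty buckets still show up in the summary.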

Here are some interesting findings (findings? we had this as intuition :)) -
  1. The higher the degree of sentiment, the longer the tweet
  2. Tweets carrying negative sentiment are longer than tweets carrying positive sentiment of the same degree, e.g. when comparing 'Worst' with 'Excellent', and so on
I will try to find more hypotheses to verify. Please suggest if you have any.

Flipkart VS. Word cloud using twitter posts

As promised in the last post, here I present word clouds generated from Flipkart and Amazon India tweets. A word cloud is a graphical representation of word frequency that gives greater prominence to words that appear more frequently in the source text. I have generated two sets of word clouds, each containing two clouds, one for Flipkart and another for Amazon. The first set is based on all the words present in the extracted tweets, whereas the second set is based only on the sentiment words present in them. The corpus of tweets and the sentiment words used here are the same as in the earlier analysis, which I shared in the last post.

Before showing the word clouds, here I summarize the key steps involved in the analysis-

  1. Read the tweets into R as described in the last post
  2. Perform some text processing. For this, I used the text mining (tm) package in R
    1. Generate a two-column structure from the tweets, where the first column contains words and the second column contains each word's frequency across all tweets
    2. Convert all words to lower case
    3. Remove stop words
    4. Remove punctuation marks
    5. Get the top 100 most frequent words for the first set of clouds and the top 100 sentiment words for the second set
  3. Generate the word clouds. I used the wordcloud package in R for this
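The text-processing steps above can be sketched in base R. Note that this swaps in plain base-R string functions for the tm package the post used, and the toy tweets, the tiny stop-word list and the sentiment dictionary are hypothetical stand-ins for the real corpus and word lists.

```r
# Toy tweets, stop words and sentiment dictionary (all hypothetical)
tweets <- c("Loving the new Mi3 phone!",
            "Mi3 delivery was late, bad service.",
            "Great deal on the Mi3 today")
stop_words      <- c("the", "on", "was", "a", "an", "and")
sentiment_words <- c("loving", "great", "bad", "late")

# Lower case, strip punctuation, split into words
words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(tweets)), "\\s+"))
# Drop empty strings and stop words
words <- words[nchar(words) > 0 & !(words %in% stop_words)]

# Two-column structure: word and its frequency across all tweets
freq <- as.data.frame(table(word = words), responseName = "freq",
                      stringsAsFactors = FALSE)
freq <- freq[order(-freq$freq, freq$word), ]

# Top 100 words overall, and top 100 sentiment words
top_all       <- head(freq, 100)
top_sentiment <- head(freq[freq$word %in% sentiment_words, ], 100)
print(top_all)
```

Either table can then be handed to the wordcloud package, e.g. `wordcloud::wordcloud(top_all$word, top_all$freq)`, to draw the cloud.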
Coming to the results- first we will see the word clouds generated from all words, which we referred to as the first set of clouds. Below is the one for Flipkart-

Now the same for Amazon India-

Looking at these word clouds, you can get a sense of what people are talking about, e.g. in the case of Flipkart, people are mostly talking about the Xiaomi Mi3 mobile phone, which has been a recent hit on Flipkart.

Now coming to the second set of word clouds, where we focus only on words carrying some sort of sentiment, here is the one for Flipkart-

And here is the same for Amazon India-

As one can see in both of the above word clouds, most of the prominent words carry positive sentiment. However, if you focus on the words carrying negative sentiment, you will see that they are more prominent for Flipkart than for Amazon India, which supports the outcome of the analysis we saw in the last post.

Flipkart VS. Sentiment analysis using twitter posts

Flipkart and Amazon India are emerging as the two biggest players in the rapidly growing online retail industry in India. Although Amazon started its operations in India much later than Flipkart, it is giving Flipkart tough competition. Only the future will tell who will surpass the other in the long run, but it is evident that effectiveness in capturing customers' needs and quickness in responding to them are going to play a major role.

In this exercise I tried to capture customers' sentiments using their Twitter posts. I used R for this exercise. Below are the key steps of the analysis.

1.     Search for the Twitter handles (@Flipkart for Flipkart tweets and @amazonIN for Amazon India tweets) and scrape the tweets accordingly- I used the twitteR package in R to fetch the tweets

2.     Perform pre-processing, such as removing duplicate tweets

3.     Apply a sentiment analysis algorithm to put each tweet into one of two groups, i.e. positive or negative sentiment- I used a pretty simple algorithm that takes into account the occurrence of positive and negative sentiment words in each tweet. For the sentiment words I used publicly available dictionaries
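Steps 2 and 3 can be sketched as a simple word-count scorer. This is a minimal sketch, not the exact algorithm from the post: the dictionaries and tweets are hypothetical, the score is assumed to be (number of positive words) minus (number of negative words), and tweets scoring exactly zero are assumed to be set aside, since the post only describes two groups.

```r
# Hypothetical dictionaries; the post used publicly available word lists
pos_words <- c("good", "great", "love", "fast", "excellent")
neg_words <- c("bad", "late", "worst", "poor", "refund")

tweets <- c("Love the fast delivery",
            "Worst service, late again",
            "Love the fast delivery",
            "Great phone but late delivery")

# Step 2: drop exact duplicate tweets
tweets <- unique(tweets)

# Score = count of positive words minus count of negative words
score_tweet <- function(text) {
  words <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
  sum(words %in% pos_words) - sum(words %in% neg_words)
}
scores <- vapply(tweets, score_tweet, integer(1), USE.NAMES = FALSE)

# Assumption: > 0 positive, < 0 negative, ties (NA) left out of the shares
sentiment <- ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", NA))
shares <- prop.table(table(sentiment))
print(shares)
```

Counting dictionary hits is fast and needs no training data, which fits the "simple and fast" approach described here; the trade-off is that negation ("not good") and sarcasm are ignored.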

So, as you can see, it’s quite simple and fast. The only caveat is that the Twitter API restricts the number of tweets one can access. Nevertheless, one can access thousands of tweets, which is good enough for a not-so-exhaustive analysis.

OK, now here is what we did all this for, i.e. the results. The conclusion is that both Flipkart and Amazon score impressively on customer sentiment, but Amazon performs slightly better. For Flipkart, around 64% of the tweets under analysis carry positive sentiment and 36% carry negative sentiment, whereas for Amazon these figures are 73% and 27% respectively. Below is the graph depicting these numbers.

In the next post, I will try to show word clouds supporting the above trends.

