Tuesday, 26 August 2014

Flipkart vs. Amazon.in: Word cloud using Twitter posts

As promised in the last post, here are word clouds generated from Flipkart and Amazon India tweets. A word cloud (http://en.wikipedia.org/wiki/Tag_cloud) is a graphical representation of word frequency in which words that occur more often in the source text are displayed more prominently. I have generated two sets of word clouds, each containing one cloud for Flipkart and one for Amazon India. The first set is based on all the words present in the extracted tweets, whereas the second set is based only on the sentiment words present in those tweets. The corpus of tweets and the sentiment words are the same as those used in the earlier analysis, which I shared in the last post.

Before showing the word clouds, here is a summary of the key steps involved in the analysis:

  1. Read the tweets into R as described in the last post
  2. Perform some text processing. For this, I used the text mining package in R (http://cran.r-project.org/web/packages/tm/index.html)
    1. Generate a two-column structure from the tweets, where the first column contains the words and the second column contains the frequency of each word across all tweets
    2. Convert all words to lower case
    3. Remove stop words
    4. Remove punctuation marks
    5. Get the 100 most frequent words for the first set of clouds and the top 100 sentiment words for the second set of clouds
  3. Generate the word cloud. I used the wordcloud package in R (http://cran.r-project.org/web/packages/wordcloud/index.html) for this
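
For readers who want to see the text-processing steps concretely, here is a minimal sketch in Python. It is not the R code used for this post: the function name `top_words`, the tiny stop-word list, and the sample tweets and sentiment words are all made up for illustration. It also strips punctuation before removing stop words, so that a word like "the." is still recognised as a stop word.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in", "on", "for", "i", "it", "my"}


def top_words(tweets, n=100, keep_only=None):
    """Return the n most frequent words as (word, count) pairs.

    If keep_only is given, only words in that set are counted
    (used here for the sentiment-only clouds).
    """
    counts = Counter()
    strip_punct = str.maketrans("", "", string.punctuation)
    for tweet in tweets:
        # Lower-case, drop punctuation, then split on whitespace.
        for word in tweet.lower().translate(strip_punct).split():
            if word in STOP_WORDS:
                continue
            if keep_only is not None and word not in keep_only:
                continue
            counts[word] += 1
    return counts.most_common(n)


# Made-up tweets and sentiment words, for illustration only.
sample_tweets = [
    "Great service, great price!",
    "The delivery is late and the box is damaged.",
    "Great phone. Delivery was late.",
]
sentiment_words = {"great", "late", "damaged"}

all_top = top_words(sample_tweets, n=3)
sentiment_top = top_words(sample_tweets, n=3, keep_only=sentiment_words)
```

The two results correspond to the two sets of clouds: `all_top` is the word/frequency table over all words, and `sentiment_top` is the same table restricted to sentiment words. Either table could then be handed to a word cloud drawing routine.
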
Coming to the results: first we will see the word clouds generated from all the words, which we referred to as the first set of clouds. Below is the one for Flipkart:

[Image: word cloud of all words in Flipkart tweets]

Now the same for Amazon India:

[Image: word cloud of all words in Amazon India tweets]

Looking at these word clouds, you can get a sense of what people are talking about. For example, in the case of Flipkart, people are mostly talking about the Xiaomi Mi3 mobile phone, which has been a recent hit on Flipkart.

Now coming to the second set of word clouds, where we focus only on the words carrying some sort of sentiment. Here is the one for Flipkart:

[Image: word cloud of sentiment words in Flipkart tweets]

And here is the same for Amazon India:

[Image: word cloud of sentiment words in Amazon India tweets]

As one can see in both of the above word clouds, most of the prominent words carry positive sentiment. However, if you focus on the words carrying negative sentiment, you will see that they are more prominent in the Flipkart cloud than in the Amazon India cloud, which supports the outcome of the analysis we saw in the last post.

Thanks for now!

Wednesday, 20 August 2014

Flipkart vs. Amazon.in: Sentiment analysis using Twitter posts


Flipkart and Amazon India are emerging as the two biggest players in the rapidly growing online retail industry in India. Although Amazon started its operations in India much later than Flipkart, it is giving Flipkart tough competition. Only time will tell who will surpass the other in the long run, but it is evident that capturing customers' needs effectively and responding to them quickly are going to play a major role.

In this exercise I tried to capture customers' sentiments using their Twitter posts. I used R (http://www.r-project.org/) for this exercise. Below are the key steps of the analysis process.

1. Search for the Twitter handles (@Flipkart for Flipkart tweets and @amazonIN for Amazon India tweets) and fetch the matching tweets. I used the twitteR package in R (http://cran.r-project.org/web/packages/twitteR/index.html) to fetch the tweets

2. Perform pre-processing, such as removing duplicate tweets

3. Apply a sentiment analysis algorithm to place each tweet in one of two groups, i.e. either positive sentiment or negative sentiment. I used a pretty simple algorithm for this, which takes into account the occurrence of positive and negative sentiment words in each tweet. For the sentiment words I used publicly available dictionaries of sentiment words
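
To make steps 2 and 3 concrete, here is a minimal sketch in Python of one common word-counting approach: score each tweet as the number of positive words minus the number of negative words. It is not the R code used for this post, and the exact scoring rule used there is not spelled out above. The word lists, the function names, and the choice to label a score of zero as "neutral" are all assumptions made for illustration.

```python
import re

# Tiny made-up word lists; real sentiment dictionaries hold thousands of words.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "late", "worst", "damaged"}


def dedupe(tweets):
    """Drop exact duplicate tweets while keeping the original order."""
    return list(dict.fromkeys(tweets))


def sentiment_score(tweet):
    """Count of positive words minus count of negative words in one tweet."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(tweet):
    """Map a score to a label; treating a score of zero as neutral is an assumption."""
    score = sentiment_score(tweet)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up tweets, for illustration only.
sample = [
    "Love the fast delivery",
    "Love the fast delivery",
    "Worst packaging, phone arrived damaged",
    "Order placed today",
]
labels = [label(t) for t in dedupe(sample)]
```

With this rule, a tweet that contains no dictionary words, or equal numbers of positive and negative words, falls outside both groups, so some decision on such tweets is needed before reporting a two-way split.
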

So, as you can see, it's quite simple and fast. The only caveat is that the Twitter web API restricts the number of tweets one can access. Nevertheless, one can access thousands of tweets, which is good enough for a not-so-exhaustive analysis.

OK, now here is what we did all this for, i.e. the results. The conclusion is that both Flipkart and Amazon score impressively on customer sentiment, though Amazon performs slightly better. For Flipkart, around 64% of the tweets under analysis carry positive sentiment and 36% carry negative sentiment. For Amazon, these figures are 73% and 27% respectively. Below is the graph depicting these numbers.

[Image: graph of positive vs. negative tweet percentages for Flipkart and Amazon India]
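
Percentages like the ones quoted above can be derived from per-tweet labels with a few lines of arithmetic. The sketch below is in Python, not the R used for this post. The function name `sentiment_shares` is hypothetical, and the example labels are made-up counts chosen only to reproduce a 64/36 split, not the actual tweet counts.

```python
def sentiment_shares(labels):
    """Return (positive %, negative %) among tweets labelled positive or negative."""
    pos = sum(1 for x in labels if x == "positive")
    neg = sum(1 for x in labels if x == "negative")
    total = pos + neg
    if total == 0:
        raise ValueError("no positive or negative tweets to summarise")
    return 100.0 * pos / total, 100.0 * neg / total


# Hypothetical labels: 64 positive and 36 negative tweets.
example = ["positive"] * 64 + ["negative"] * 36
pos_pct, neg_pct = sentiment_shares(example)
```

Any labels other than "positive" and "negative" are ignored here, so the two percentages always add up to 100.
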

In the next post, I will try to show word clouds supporting the above trends.

Thanks for now!