Friday, 24 April 2015

Indian Railways Network using ‘R’


Indian Railways is a lifeline for the people of India. Millions of people travel by rail every day. The network comprises about 71 thousand miles of track over a route of 40 thousand miles, with more than 7 thousand stations.

Here I try to show the rail network using ‘R’. The Indian Railways site provides a list of all mail, express, and super-fast trains:
 http://www.indianrail.gov.in/mail_express_trn_list.html
The list contains the names of the source and destination stations for each train, and I will use this data to draw the network. Before going into the details of the R code, let us look at the final output:



A link connects two stations on the map if one or more direct trains run between them. The more trains there are, the more prominent the link. Because I draw a straight line between stations, a link does not capture the exact route of the train. Also, the color of a link is the same as that of the map if there is only 1 train between the two stations; I did this to keep the map from looking messy. The purpose of the map is to give a general sense of the Indian rail network. It can be seen that the 4 major stations are Delhi, Mumbai, Kolkata, and Chennai.

Here are the details of the ‘R’ code.
First, load the required ‘R’ packages. I use ‘maps’ to draw the map of India, ‘ggmap’ to get the longitude and latitude of every railway station (it uses the Google Maps API to get this information), and ‘dplyr’ to count and join the data.
library(maps) 
library(ggmap) 
library(dplyr)

Next, read the data. I have saved the data from the link mentioned above in CSV format.
rail.data <- read.csv("rail_data_csv.csv")

Now I do some pre-processing on the station names so that they can be passed to the Google API to get location coordinates:
# Convert station names into character strings
rail.data$Train.Source.Stn <- as.character(rail.data$Train.Source.Stn)
rail.data$Train.Destination.Stn <- as.character(rail.data$Train.Destination.Stn)
# Remove 'JN', which represents junction, from station names
rail.data$Train.Source.Stn <- gsub("JN", "", rail.data$Train.Source.Stn)
rail.data$Train.Destination.Stn <- gsub("JN", "", rail.data$Train.Destination.Stn)
# Append ", INDIA" to station names to remove any ambiguity in location identification by the API
rail.data$Train.Source.Stn <- paste(rail.data$Train.Source.Stn,", INDIA",sep='')
rail.data$Train.Destination.Stn <- paste(rail.data$Train.Destination.Stn,", INDIA",sep='')
# Get all unique stations including source and destination stations 
all.stations <- unique(c(rail.data$Train.Source.Stn,rail.data$Train.Destination.Stn))
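
One caveat with the clean-up above: gsub("JN", "", ...) removes the letters JN wherever they occur and leaves a stray space behind. The sketch below is a slightly stricter variant. Note that `clean.station` is a hypothetical helper name, and the sample station names are only illustrative inputs, not rows taken from the data.

```r
# Hypothetical helper: strip a standalone "JN" token, tidy whitespace,
# then append the country so the geocoder has less room for ambiguity.
clean.station <- function(x) {
  x <- as.character(x)
  x <- gsub("\\bJN\\b", "", x, perl = TRUE)  # remove "JN" only as a whole word
  x <- gsub("\\s+", " ", x, perl = TRUE)     # collapse repeated spaces
  x <- trimws(x)                             # drop leading/trailing spaces
  paste0(x, ", INDIA")
}

# Illustrative inputs only
sample.stations <- c("KALYAN JN", "HOWRAH JN ", "PUNE")
cleaned <- clean.station(sample.stations)
print(cleaned)
```

The same helper could be applied to both the source and the destination column before collecting the unique stations.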

Get the longitude and latitude of every station using the ‘geocode’ function in the ‘ggmap’ library, which uses the Google Maps API to get coordinates.
all.longitudes <- rep(NA_real_, length(all.stations))
all.latitudes <- rep(NA_real_, length(all.stations))
 
# Look up the coordinates of each station in turn
for (i in seq_along(all.stations))
{
  coordinates <- geocode(all.stations[i])
  all.longitudes[i] <- coordinates$lon
  all.latitudes[i] <- coordinates$lat
}
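
Geocoding requests can fail or return no match for a station, and newer releases of ggmap also expect a Google API key to be registered (via register_google()) before geocode() is called. A minimal sketch of a more defensive loop follows. Here `geocode.all` and the stub `fake.geocode` are hypothetical names, and the stub with its approximate coordinates exists only so the example runs offline; in real use you would pass ggmap's geocode instead.

```r
# Hypothetical wrapper: look up each station once and record NA on failure.
# `geocoder` is any function that takes one address string and returns a
# data frame with `lon` and `lat` columns (as ggmap::geocode does).
geocode.all <- function(stations, geocoder) {
  lon <- rep(NA_real_, length(stations))
  lat <- rep(NA_real_, length(stations))
  for (i in seq_along(stations)) {
    result <- tryCatch(geocoder(stations[i]), error = function(e) NULL)
    if (!is.null(result) && nrow(result) > 0) {
      lon[i] <- result$lon[1]
      lat[i] <- result$lat[1]
    }
  }
  data.frame(name = stations, lon = lon, lat = lat, stringsAsFactors = FALSE)
}

# Offline stand-in for the real geocoder; coordinates are approximate.
fake.geocode <- function(address) {
  known <- data.frame(
    name = c("PUNE, INDIA", "NAGPUR, INDIA"),
    lon = c(73.9, 79.1),
    lat = c(18.5, 21.1),
    stringsAsFactors = FALSE
  )
  hit <- known[known$name == address, c("lon", "lat")]
  if (nrow(hit) == 0) stop("no match for ", address)
  hit
}

demo.locations <- geocode.all(
  c("PUNE, INDIA", "NOWHERE, INDIA", "NAGPUR, INDIA"),
  fake.geocode
)
print(demo.locations)
```

Stations that could not be located keep NA coordinates, so they can be listed and fixed by hand before drawing.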
# Join coordinates with station names
all.locations <- as.data.frame(cbind(name=all.stations, lon=all.longitudes, lat=all.latitudes),stringsAsFactors=FALSE)
all.locations$lon <- as.numeric(all.locations$lon)
all.locations$lat <- as.numeric(all.locations$lat)
# Get total number of trains between two stations using ‘group_by’ from the ‘dplyr’ package
no.of.trains <- as.data.frame(rail.data[,c("Train.Source.Stn","Train.Destination.Stn")] %>% group_by(Train.Source.Stn,Train.Destination.Stn) %>% summarise(count=n()))

# Sort data based upon number of trains
no.of.trains <- no.of.trains[order(no.of.trains$count),]
# Merge coordinates to rail data
no.of.trains$name <- no.of.trains$Train.Source.Stn
no.of.trains <- left_join(no.of.trains, all.locations, by="name")
no.of.trains <- rename(no.of.trains, source.lon = lon, source.lat = lat)
no.of.trains$name <- no.of.trains$Train.Destination.Stn
no.of.trains <- left_join(no.of.trains, all.locations, by="name")
no.of.trains <- rename(no.of.trains, dest.lon = lon, dest.lat = lat)
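
To make the counting and the two joins concrete, here is a small sketch on invented data. The station names, the coordinates, and the `toy.` object names are all illustrative assumptions. It uses base R's aggregate() and merge() in place of the dplyr calls so that it runs without extra packages.

```r
# Toy train list: one row per train (invented data)
toy.trains <- data.frame(
  src = c("A", "A", "A", "B"),
  dst = c("B", "B", "C", "C"),
  stringsAsFactors = FALSE
)
toy.locations <- data.frame(
  name = c("A", "B", "C"),
  lon = c(72.8, 77.2, 88.3),
  lat = c(19.0, 28.6, 22.5),
  stringsAsFactors = FALSE
)

# Count trains per (source, destination) pair
toy.trains$count <- 1
toy.counts <- aggregate(count ~ src + dst, data = toy.trains, FUN = sum)

# Attach source coordinates, then destination coordinates
toy.counts <- merge(toy.counts, toy.locations, by.x = "src", by.y = "name", all.x = TRUE)
names(toy.counts)[names(toy.counts) %in% c("lon", "lat")] <- c("source.lon", "source.lat")
toy.counts <- merge(toy.counts, toy.locations, by.x = "dst", by.y = "name", all.x = TRUE)
names(toy.counts)[names(toy.counts) %in% c("lon", "lat")] <- c("dest.lon", "dest.lat")

# Sort so that busier links come last and are drawn on top
toy.counts <- toy.counts[order(toy.counts$count), ]
print(toy.counts)
```

Each row of the result describes one link: the two station names, the number of trains, and the coordinates of both ends.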

Draw the map of India using the ‘map’ function. It allows the drawing to be restricted to a user-provided range of longitude and latitude, and I use these limits to get the map of India.

xlim <- c(67, 98)  
ylim <- c(7, 37)
map("world", col="lavender", fill=TRUE, bg="white", lwd=.1, xlim=xlim, ylim=ylim)

Set up a color palette with different shades. This is needed to vary the prominence of the links: a link representing more trains is more prominent, and vice versa.
color.palette <- colorRampPalette(c("lavender", "red"))
all.colors <- color.palette(7)

# Get maximum number of trains between any two stations; this is required to choose the right shade of color
max.count <- max(no.of.trains$count)

Loop over all the links in the data and draw the corresponding stations and lines, in the right shade, on top of the map produced earlier:
for (i in 1:nrow(no.of.trains))
{
  points.lon <- c(no.of.trains$source.lon[i], no.of.trains$dest.lon[i])
  points.lat <- c(no.of.trains$source.lat[i], no.of.trains$dest.lat[i])
  # Scale the train count to a palette position; keep it at least 1,
  # because an index of 0 would select no color at all
  color.index <- max(1, round((no.of.trains$count[i] / max.count) * length(all.colors)))
  lines(x = points.lon, y = points.lat, col = all.colors[color.index], lwd = .8)
  points(x = points.lon, y = points.lat, col = "blue", pch = 20, cex = .8)
}
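
The shade selection can be checked in isolation. The sketch below assumes a maximum of 20 trains purely for illustration and shows how a train count maps onto the 7-step palette. `shade.index` is a hypothetical helper, and the clamp into the range 1 to 7 is an added safeguard so that small counts never produce index 0, which selects nothing from an R vector.

```r
color.palette <- colorRampPalette(c("lavender", "red"))
all.colors <- color.palette(7)

# Hypothetical helper: scale a count to a palette position in 1..n.colors
shade.index <- function(count, max.count, n.colors) {
  index <- round((count / max.count) * n.colors)
  pmin(pmax(index, 1), n.colors)   # clamp into the valid range
}

counts <- c(1, 5, 12, 20)
indices <- shade.index(counts, max.count = 20, n.colors = length(all.colors))
print(data.frame(count = counts, index = indices, color = all.colors[indices]))
```

The lowest position is the "lavender" end of the palette, which is why links with very few trains blend into the map.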

As mentioned earlier, links with just 1 train are not visible on the map because they are assigned the same color as the map. Otherwise the map gets cluttered with an unmanageable number of links.

Thanks for now!
 

3 comments:

  1. This comment has been removed by the author.

  2. working with R and lat long and hence can relate very well

  3. for ( i in 1: length(all.locations))
    {
    coordinates <- geocode(all.stations[i])
    all.longitudes[i] <- coordinates$lon
    all.latitudes[i] <- coordinates$lat
    }

    should be
    for ( i in 1: length(all.stations))
