Indian Railways is a lifeline for the people of India. Millions of people travel by rail every day. The Indian rail network comprises 71 thousand miles of track over a route of 40 thousand miles, with more than 7 thousand stations.
Here I try to show the rail network using ‘R’. The Indian Railways site
lists all the mail, express, and super-fast trains at
http://www.indianrail.gov.in/mail_express_trn_list.html.
The list contains the names of the source and destination stations for each train. I will use this data to draw the network. Before looking into the details of the R code, let us see the final output:
A link connects two stations on the map if one or more direct trains run
between them. The more trains there are, the more prominent the link.
Because I draw a straight line between stations, a link does not capture
the exact route of the train. Also, the color of the link is the same as that
of the map if there is only 1 train between two stations. I have done this to
avoid a messy appearance. The purpose of the map is to give a general sense of
the Indian rail network. It can be seen that the 4 major stations are Delhi,
Mumbai, Kolkata, and Chennai.
Here are the details of the ‘R’ code.
First load the required ‘R’ packages. I use ‘maps’ to draw the
India map and ‘ggmap’ to get longitude and latitude information for all railway
stations; ‘ggmap’ uses the Google Maps API to get the required information.
I also use ‘dplyr’ to count and join the data.
library(maps)
library(ggmap)
library(dplyr)
Next read the data. I have saved the data from the above-mentioned
link in CSV format.
rail.data <- read.csv("rail_data_csv.csv")
Now I do some pre-processing on the station names so that they
can be passed to the Google API to get location coordinates:
# Convert station names into character strings
rail.data$Train.Source.Stn <- as.character(rail.data$Train.Source.Stn)
rail.data$Train.Destination.Stn <- as.character(rail.data$Train.Destination.Stn)
# Remove 'JN', which represents junction, from station names
rail.data$Train.Source.Stn <- gsub("JN", "", rail.data$Train.Source.Stn)
rail.data$Train.Destination.Stn <- gsub("JN", "", rail.data$Train.Destination.Stn)
# Append ", INDIA" to station names to remove any ambiguity in location identification by the API
rail.data$Train.Source.Stn <- paste(rail.data$Train.Source.Stn, ", INDIA", sep='')
rail.data$Train.Destination.Stn <- paste(rail.data$Train.Destination.Stn, ", INDIA", sep='')
# Get all unique stations, including source and destination stations
all.stations <- unique(c(rail.data$Train.Source.Stn,rail.data$Train.Destination.Stn))
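One caveat with this cleanup: gsub("JN", "", ...) removes the letters JN wherever they occur in a name, and it leaves a stray space before the comma. Here is a minimal sketch of a stricter cleanup, assuming station names look like "HOWRAH JN". The helper name clean_station and the sample names are illustrative and not taken from the data.

```r
# Hypothetical helper: strip a standalone "JN" token, tidy whitespace,
# and append the country name
clean_station <- function(x) {
  x <- as.character(x)
  x <- gsub("\\bJN\\b", "", x)        # remove "JN" only when it is a whole word
  x <- gsub("[[:space:]]+", " ", x)   # collapse repeated whitespace
  x <- trimws(x)                      # drop leading/trailing spaces
  paste0(x, ", INDIA")
}

# Illustrative station names
cleaned <- clean_station(c("HOWRAH JN", "NEW DELHI", "KALYAN JN "))
print(cleaned)
```

Whether the extra strictness matters depends on the actual station names in the list; the original one-line gsub is fine if no name contains JN as part of a longer word.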
Get longitude/latitude information for all
station locations using the ‘geocode’ function in the ‘ggmap’ library, which uses
the Google Maps API to get coordinates.
# Pre-allocate coordinate vectors filled with NA
all.longitudes <- rep(NA_real_, length(all.stations))
all.latitudes <- rep(NA_real_, length(all.stations))
# Geocode each station in turn
for (i in seq_along(all.stations))
{
coordinates <- geocode(all.stations[i])
all.longitudes[i] <- coordinates$lon
all.latitudes[i] <- coordinates$lat
}
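The loop makes one API call per station, so a single failed lookup can interrupt the whole run. Recent versions of ggmap also expect a Google API key to be registered with register_google() before geocode() will return results. Here is a minimal sketch that puts the lookup behind a function argument so that a failure becomes an NA row instead of an error. The names geocode_all and fake_geocoder are hypothetical, and the coordinates in the stand-in lookup are rough illustrative values.

```r
# Hypothetical helper: geocode a vector of names with any function that
# returns a data frame with 'lon' and 'lat' columns (as ggmap::geocode does)
geocode_all <- function(stations, geocoder) {
  lon <- rep(NA_real_, length(stations))
  lat <- rep(NA_real_, length(stations))
  for (i in seq_along(stations)) {
    # tryCatch turns an error for one station into NULL instead of stopping the loop
    result <- tryCatch(geocoder(stations[i]), error = function(e) NULL)
    if (!is.null(result)) {
      lon[i] <- result$lon[1]
      lat[i] <- result$lat[1]
    }
  }
  data.frame(name = stations, lon = lon, lat = lat, stringsAsFactors = FALSE)
}

# Stand-in for geocode() so the sketch runs without network access
fake_geocoder <- function(name) {
  known <- data.frame(name = c("NEW DELHI, INDIA", "HOWRAH, INDIA"),
                      lon = c(77.2, 88.3), lat = c(28.6, 22.6),
                      stringsAsFactors = FALSE)
  hit <- known[known$name == name, ]
  if (nrow(hit) == 0) stop("no result for ", name)
  hit[, c("lon", "lat")]
}

located <- geocode_all(c("NEW DELHI, INDIA", "HOWRAH, INDIA", "UNKNOWN, INDIA"),
                       fake_geocoder)
print(located)
```

With ggmap loaded and a key registered, geocode itself would be passed in place of the stand-in.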
# Join coordinates with station names
all.locations <- as.data.frame(cbind(name=all.stations, lon=all.longitudes, lat=all.latitudes),stringsAsFactors=FALSE)
all.locations$lon <- as.numeric(all.locations$lon)
all.locations$lat <- as.numeric(all.locations$lat)
# Get total number of trains between two stations using 'group_by' from the 'dplyr' package
no.of.trains <- as.data.frame(rail.data[,c("Train.Source.Stn","Train.Destination.Stn")] %>%
group_by(Train.Source.Stn,Train.Destination.Stn) %>%
summarise(count=n()))
# Sort data based upon number of trains
no.of.trains <- no.of.trains[order(no.of.trains$count),]
# Merge coordinates to rail data
no.of.trains$name <- no.of.trains$Train.Source.Stn
no.of.trains <- left_join(no.of.trains, all.locations, by = "name")
no.of.trains <- rename(no.of.trains, source.lon = lon, source.lat = lat)
no.of.trains$name <- no.of.trains$Train.Destination.Stn
no.of.trains <- left_join(no.of.trains, all.locations, by = "name")
no.of.trains <- rename(no.of.trains, dest.lon = lon, dest.lat = lat)
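The counting and the two joins are easy to check on a toy table. The sketch below uses base R (aggregate() and merge()) rather than dplyr so that it runs without extra packages. The trains, station names, and coordinates are made up for illustration.

```r
# Toy data: one row per train (illustrative, not from the real list)
toy.rail <- data.frame(
  Train.Source.Stn      = c("A", "A", "A", "B"),
  Train.Destination.Stn = c("B", "B", "C", "C"),
  stringsAsFactors = FALSE
)
toy.locations <- data.frame(name = c("A", "B", "C"),
                            lon = c(77, 88, 80), lat = c(28, 22, 13),
                            stringsAsFactors = FALSE)

# Count trains per (source, destination) pair
toy.rail$count <- 1
toy.counts <- aggregate(count ~ Train.Source.Stn + Train.Destination.Stn,
                        data = toy.rail, FUN = sum)

# Attach source coordinates, then destination coordinates
src <- setNames(toy.locations, c("Train.Source.Stn", "source.lon", "source.lat"))
dst <- setNames(toy.locations, c("Train.Destination.Stn", "dest.lon", "dest.lat"))
toy.counts <- merge(toy.counts, src, by = "Train.Source.Stn", all.x = TRUE)
toy.counts <- merge(toy.counts, dst, by = "Train.Destination.Stn", all.x = TRUE)

# Sort ascending by count, as in the post, so the busiest links come last
toy.counts <- toy.counts[order(toy.counts$count), ]
print(toy.counts)
```

Because the rows are sorted in ascending order of count, a drawing loop that walks the table from top to bottom draws the busiest links last, on top of the fainter ones.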
Draw the map of India using the ‘map’ function. The function
allows restricting the drawing to a user-provided longitude/latitude range. I
will use these parameters to get the India map.
xlim <- c(67, 98)
ylim <- c(7, 37)
map("world", col="lavender", fill=TRUE,
bg="white", lwd=.1, xlim=xlim, ylim=ylim)
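The limits here are fixed by hand. If the map should instead follow the geocoded stations, one option is to derive the limits from the coordinates. This is a minimal sketch with made-up coordinates and an arbitrary 2-degree margin; neither comes from the original code.

```r
# Illustrative coordinates, including one failed lookup (NA)
lon <- c(77.2, 88.3, 72.8, NA)
lat <- c(28.6, 22.6, 19.0, NA)

margin <- 2  # degrees of padding around the outermost stations (arbitrary choice)
derived.xlim <- range(lon, na.rm = TRUE) + c(-margin, margin)
derived.ylim <- range(lat, na.rm = TRUE) + c(-margin, margin)
print(derived.xlim)
print(derived.ylim)
```

A wrongly geocoded station far outside India would stretch limits derived this way, which is one reason the hand-picked values may be the safer choice.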
Set a color palette with different shades of color.
This is required so that we can vary the prominence of the links: links representing
more trains are more prominent, and vice versa.
color.palette <- colorRampPalette(c("lavender",
"red"))
all.colors <- color.palette(7)
# Get the maximum number of trains between any two stations; this is required to choose the right shade of color
max.count <- max(no.of.trains$count)
Loop over all the links in the data and draw the corresponding stations and lines, in the right shade, on top of the map produced earlier:
for (i in 1:nrow(no.of.trains))
{
points.lon <- c(no.of.trains$source.lon[i],no.of.trains$dest.lon[i])
points.lat <- c(no.of.trains$source.lat[i],no.of.trains$dest.lat[i])
color.index <- round( (no.of.trains$count[i] / max.count) * length(all.colors) )
lines(x = points.lon, y = points.lat , col=all.colors[color.index], lwd=.8)
points(x = points.lon, y = points.lat, col = "blue", pch=20, cex=.8)
}
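The index calculation deserves a closer look. round() returns 0 when a link's count is small relative to max.count, and indexing all.colors with 0 yields an empty vector rather than a color. Here is a minimal sketch that keeps every index between 1 and the palette size. The helper name shade_index and the clamping are assumptions, not part of the code in this post, and the counts are illustrative.

```r
color.palette <- colorRampPalette(c("lavender", "red"))
all.colors <- color.palette(7)

# Hypothetical helper: map a train count to a palette index in 1..n.colors
shade_index <- function(count, max.count, n.colors) {
  idx <- round((count / max.count) * n.colors)
  pmin(pmax(idx, 1), n.colors)   # clamp so that 0 never reaches the palette
}

counts <- c(1, 2, 10, 20)        # illustrative counts
idx <- shade_index(counts, max(counts), length(all.colors))
print(idx)
print(all.colors[idx])
```

With the clamp, the least busy links get the first palette entry, which is the lavender end of the ramp, so they still blend into the map.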
As mentioned earlier, links with just 1 train are not visible on the map because they are assigned the same color as the map itself. Otherwise the map gets cluttered with an unmanageable number of links.
Thanks for now!
working with R and lat long and hence can relate very well

for ( i in 1: length(all.locations))
{
coordinates <- geocode(all.stations[i])
all.longitudes[i] <- coordinates$lon
all.latitudes[i] <- coordinates$lat
}
should be
for ( i in 1: length(all.stations[i])