Mumbai and Delhi are the two most important metro cities in India. They have long competed for supremacy in quality of life, jobs, education, entertainment, and the recreational facilities they offer their residents. This post describes a data science project that analyzes the neighborhoods in each of these two cities to understand what is popular in them and what they offer to someone choosing which of the two metros to live in.
The deciding factor for most people would be how lively, supportive, vibrant, and unique each city is compared with the other. The business problem assumes that the audience for this study is people who want to project what life and activities would look like in these neighborhoods if they moved to one of them. The choice of one city over the other would then depend on the popular venues in each city's neighborhoods.
For any data science project, data is of paramount importance. For this study, we needed data about the neighborhoods in each of these metro cities. The data published by the government on postal codes for all of India serves us well here. We specifically download the CSV provided at https://data.gov.in/resources/all-india-pincode-directory-contact-details-along-latitude-and-longitude.
We download the CSV, read it into a pandas DataFrame, and remove the rows for all cities, towns, and places other than Mumbai and Delhi, since we are only interested in comparing these two biggest metro cities in India.
We then drop the columns in the CSV that are not relevant or useful for this study. Post office names (office name) are used as the neighborhood names in each region, Mumbai or Delhi.
Neighborhood names with the same Pincode will be combined as a single row.
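The load, filter, and clean-up steps described above can be sketched as follows. This is a minimal sketch, not the exact notebook code: the column names (officename, pincode, Districtname, statename, latitude, longitude) and the rule of matching the district name against "Mumbai" or "Delhi" are assumptions, and a tiny inline CSV stands in for the real download.

```python
import io
import pandas as pd

# Tiny stand-in for the downloaded CSV. Column names are assumed, not
# confirmed against the real file.
SAMPLE_CSV = """officename,pincode,Districtname,statename,latitude,longitude
Bazargate S.O,400001,Mumbai,MAHARASHTRA,,
Town Hall S.O (Mumbai),400001,Mumbai,MAHARASHTRA,,
Sansad Marg H.O,110001,New Delhi,DELHI,,
Agra H.O,282001,Agra,UTTAR PRADESH,,
"""

def load_metro_neighborhoods(csv_source):
    """Read the pincode CSV and keep only Mumbai and Delhi rows."""
    df = pd.read_csv(csv_source)
    # Assumed filter: the district name mentions the metro.
    is_mumbai = df["Districtname"].str.contains("Mumbai", case=False, na=False)
    is_delhi = df["Districtname"].str.contains("Delhi", case=False, na=False)
    df["City"] = None
    df.loc[is_mumbai, "City"] = "Mumbai"
    df.loc[is_delhi, "City"] = "Delhi"
    df = df.dropna(subset=["City"])
    # Post office names act as neighborhood names.
    df = df.rename(columns={"officename": "Neighborhood", "pincode": "Pincode"})
    keep = ["City", "Pincode", "Neighborhood", "latitude", "longitude"]
    return df[keep].reset_index(drop=True)

metros = load_metro_neighborhoods(io.StringIO(SAMPLE_CSV))
print(metros)
```

With the real file, the same function could be called with the path of the downloaded CSV in place of the in-memory sample.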
The neighborhoods in both Mumbai and Delhi will then be geocoded to obtain their latitude and longitude, and the Foursquare API will be used to find the venues around them. Together, this forms the dataset we use for this study.
Dataset after clean up and curation
We now see that several neighborhoods share the same Pincode. The next step is to combine the rows that have the same Pincode. We do this by replacing the neighborhood value with a comma-separated concatenation of the neighborhood names of all rows with that Pincode.
We also notice that the longitude and latitude values in the CSV data are NaN, which means the file carries no usable coordinates, so we drop these columns from the dataset as well. We now have the neighborhoods for both metro cities.
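A minimal sketch of these two steps is shown below, assuming the DataFrame has City, Pincode, Neighborhood, latitude, and longitude columns (the column names and the sample rows are illustrative, not taken from the study data).

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the real data comes from the pincode CSV.
rows = pd.DataFrame({
    "City": ["Mumbai", "Mumbai", "Mumbai", "Delhi"],
    "Pincode": [400001, 400001, 400002, 110001],
    "Neighborhood": ["Bazargate", "Town Hall (Mumbai)", "Kalbadevi", "Sansad Marg"],
    "latitude": [np.nan] * 4,
    "longitude": [np.nan] * 4,
})

def combine_by_pincode(df):
    """Drop all-NaN coordinate columns and merge rows sharing a Pincode."""
    # Only drop a coordinate column when it carries no data at all.
    empty_cols = [c for c in ("latitude", "longitude")
                  if c in df.columns and df[c].isna().all()]
    df = df.drop(columns=empty_cols)
    # Join neighborhood names of rows with the same Pincode with ", ".
    return (df.groupby(["City", "Pincode"], sort=False)["Neighborhood"]
              .agg(", ".join)
              .reset_index())

combined = combine_by_pincode(rows)
print(combined)
```

Grouping on both City and Pincode keeps the city label in the result, which is convenient for the per-city analysis later on.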
First 5 rows of the dataset
Last 5 rows of the dataset
The next step is to enrich the dataset with the information we still need: the longitude and latitude of each neighborhood. We use the Nominatim geocoder from the geopy.geocoders module to look up the coordinates of each neighborhood, which gives us a dataset with all the columns needed for our analysis.
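The geocoding step could look roughly like the sketch below. The query format ("first neighborhood name, city, India"), the user agent string, and the one-second rate limit are assumptions. So that the example runs without network access, it uses an offline stand-in geocoder with rough city-centre placeholder coordinates; the Nominatim-backed callable is built by make_nominatim_geocoder and needs geopy and an internet connection.

```python
from types import SimpleNamespace
import pandas as pd

def make_nominatim_geocoder(user_agent="metro-neighborhoods-demo"):
    """Return a rate-limited Nominatim geocode callable (needs geopy + network)."""
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter
    geolocator = Nominatim(user_agent=user_agent)
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)

# Offline stand-in: rough city-centre placeholders, not study data.
_PLACEHOLDERS = {"Mumbai": (19.076, 72.8777), "Delhi": (28.6139, 77.209)}

def offline_geocode(query):
    for city, (lat, lon) in _PLACEHOLDERS.items():
        if city in query:
            return SimpleNamespace(latitude=lat, longitude=lon)
    return None  # mimics a failed lookup

def add_coordinates(df, geocode):
    """Add Latitude/Longitude columns using any geocode(query) callable."""
    lats, lons = [], []
    for _, row in df.iterrows():
        # Assumed query: first name of the combined neighborhood string.
        first_name = row["Neighborhood"].split(",")[0].strip()
        location = geocode(f"{first_name}, {row['City']}, India")
        lats.append(location.latitude if location else float("nan"))
        lons.append(location.longitude if location else float("nan"))
    return df.assign(Latitude=lats, Longitude=lons)

hoods = pd.DataFrame({
    "City": ["Mumbai", "Delhi", "Nowhere"],
    "Neighborhood": ["Bazargate, Town Hall (Mumbai)", "Sansad Marg", "Unknown"],
})
located = add_coordinates(hoods, offline_geocode)
print(located)
```

To query the real service, pass make_nominatim_geocoder() instead of offline_geocode. Lookups that fail are left as NaN so they can be inspected or dropped before mapping.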
Longitude and Latitude values added to neighborhoods in the dataset
We now have the necessary information to visualize the neighborhoods for both the cities on a folium map.
Neighborhoods in Mumbai and Delhi plotted on a map
Analyzing the neighborhoods
Finding top venues near Mumbai neighborhoods
We use the Foursquare API to find the top venues in Mumbai's neighborhoods, which helps us understand the kind of life these neighborhoods offer. We make one Foursquare API call for each Mumbai neighborhood in our dataset. As an illustration, consider the venues close to one of these neighborhoods: Bazargate, Elephanta Caves Po, M.P.T., Stock Exchange, Tajmahal, Town Hall (Mumbai). The Foursquare API returns the popular venues within a 500 m radius of this neighborhood.
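A per-neighborhood venue lookup could be sketched as below. This assumes the legacy Foursquare v2 "venues/explore" endpoint and its response layout; the post does not say which API version was used, and Foursquare's current Places API differs. The credentials, the version date, and the sample payload (including the venue names) are placeholders made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed legacy endpoint; newer Foursquare APIs use different URLs and auth.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Build the request URL for venues within `radius` metres of a point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

def parse_venues(payload):
    """Extract (name, lat, lng, category) tuples from an explore response."""
    items = payload["response"]["groups"][0]["items"]
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": None}]
        venues.append((venue["name"], venue["location"]["lat"],
                       venue["location"]["lng"], categories[0]["name"]))
    return venues

def fetch_venues(lat, lng, client_id, client_secret):
    """Call the API over the network (not executed in this example)."""
    with urlopen(build_explore_url(lat, lng, client_id, client_secret)) as resp:
        return parse_venues(json.load(resp))

# Made-up payload in the assumed response shape, for offline illustration.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 18.93, "lng": 72.83},
               "categories": [{"name": "Cafe"}]}},
    {"venue": {"name": "Example Diner",
               "location": {"lat": 18.94, "lng": 72.84},
               "categories": [{"name": "Indian Restaurant"}]}},
]}]}}

url = build_explore_url(18.93, 72.83, "YOUR_ID", "YOUR_SECRET")
venues = parse_venues(sample_payload)
print(url)
print(venues)
```

Calling fetch_venues once per row of the neighborhood dataset and collecting the tuples into a DataFrame would give the venue table used in the rest of the analysis.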
Top venues close to one of the Mumbai neighborhoods
Next, we identify the unique venue categories in the Mumbai neighborhoods and build a DataFrame that records, for each neighborhood, the frequency of occurrence of each venue category.
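One common way to build such a frequency table is to one-hot encode the venue categories and average them per neighborhood. The sketch below assumes a venues DataFrame with Neighborhood and Venue Category columns; the column names and sample rows are illustrative.

```python
import pandas as pd

# Illustrative venue rows; the real table comes from the venue lookups.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Indian Restaurant", "Cafe",
                       "Bar", "Indian Restaurant"],
})

def category_frequencies(venues):
    """Return one row per neighborhood with the share of each category."""
    # One column per category, 1.0 where the venue has that category.
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    onehot.insert(0, "Neighborhood", venues["Neighborhood"].values)
    # The mean of the one-hot columns is the relative frequency.
    return onehot.groupby("Neighborhood").mean().reset_index()

n_categories = venues["Venue Category"].nunique()
freq = category_frequencies(venues)
print(n_categories)
print(freq)
```

Each row of the result sums to 1, so neighborhoods with different numbers of venues remain comparable.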
From our analysis, we see that there are 116 unique venue categories in Mumbai neighborhoods. Yoga studios, Indian, Chinese, Thai, American, Spanish, Mediterranean, and Deli restaurants, Burger joints, Tea shops, Cafes, Concert halls, Theatres, Boutiques, Bowling Alleys, Bars, Flea markets, Harbors, Gourmet shops, Nightclubs, Pubs, Bagel shops, Pharmacies, and Spas are some of them.
We then create a dataset that lists the top 5 common venues against each of the neighborhoods in Mumbai. We get a representation such as below for all the neighborhoods in Mumbai.
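Ranking the categories for each neighborhood can be sketched as follows, assuming a frequency table with a Neighborhood column and one column per venue category. The output column names and the sample frequencies are made up for illustration.

```python
import pandas as pd

# Illustrative frequency table (each row sums to 1).
freq = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Cafe": [0.35, 0.05],
    "Indian Restaurant": [0.25, 0.35],
    "Bar": [0.15, 0.08],
    "Bakery": [0.12, 0.25],
    "Pub": [0.08, 0.15],
    "Gym": [0.05, 0.12],
})

def top_venues(freq, n=5):
    """List the n most frequent venue categories for each neighborhood."""
    cats = freq.set_index("Neighborhood")
    n = min(n, cats.shape[1])  # guard against fewer than n categories
    rows = []
    for hood, row in cats.iterrows():
        ranked = row.sort_values(ascending=False, kind="stable")
        rows.append([hood] + ranked.head(n).index.tolist())
    cols = ["Neighborhood"] + [f"Most Common Venue {i}" for i in range(1, n + 1)]
    return pd.DataFrame(rows, columns=cols)

top5 = top_venues(freq)
print(top5)
```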
Top 5 common venues around each of the Mumbai neighborhoods
Cluster the neighborhoods in Mumbai based on the similarity of top common venues
Now that we have the top venues for each neighborhood in Mumbai, we apply a clustering algorithm to group the neighborhoods by the similarity of the types of venues they have. The clusters also tell users what a common type of neighborhood in Mumbai looks like. We use k-Means clustering and set k to 5, which groups the neighborhoods into 5 clusters. Each neighborhood gets a Cluster Label assigned.
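The clustering step can be sketched with scikit-learn's KMeans as below. The choice of features (the per-neighborhood category frequencies), the random seed, and n_init=10 are assumptions, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic frequency table: 8 neighborhoods, 4 venue categories.
rng = np.random.default_rng(0)
raw = rng.random((8, 4))
shares = raw / raw.sum(axis=1, keepdims=True)  # rows sum to 1
freq = pd.DataFrame(shares, columns=["Cafe", "Indian Restaurant", "Bar", "Gym"])
freq.insert(0, "Neighborhood", [f"N{i}" for i in range(8)])

def cluster_neighborhoods(freq, k=5, seed=0):
    """Assign each neighborhood a k-Means cluster label in 0..k-1."""
    features = freq.drop(columns="Neighborhood")
    model = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(features)
    return freq[["Neighborhood"]].assign(**{"Cluster Label": model.labels_})

labeled = cluster_neighborhoods(freq, k=5)
print(labeled)
```

Fixing random_state makes the labels reproducible between runs. The label numbers themselves carry no meaning beyond identifying the group.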
Neighborhoods with Cluster Labels assigned
We will then use the dataset with cluster labels assigned to visualize the clusters in a folium map.
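Plotting the labeled neighborhoods on a folium map could look like the sketch below. The color palette, marker styling, and column names are assumptions (the colors used in the original map are not specified here), and the coordinates are placeholders. The folium import is kept inside build_cluster_map so the marker preparation can be run and checked without folium installed.

```python
import pandas as pd

# Arbitrary palette: one color per cluster label (assumed, not the study's).
PALETTE = ["red", "blue", "green", "purple", "orange"]

def marker_specs(df):
    """Turn labeled rows into plain dicts describing each map marker."""
    specs = []
    for _, row in df.iterrows():
        label = int(row["Cluster Label"])
        specs.append({
            "location": [float(row["Latitude"]), float(row["Longitude"])],
            "color": PALETTE[label % len(PALETTE)],
            "popup": f"{row['Neighborhood']} (cluster {label})",
        })
    return specs

def build_cluster_map(df, zoom_start=11):
    """Draw one colored circle per neighborhood (requires folium)."""
    import folium
    center = [df["Latitude"].mean(), df["Longitude"].mean()]
    fmap = folium.Map(location=center, zoom_start=zoom_start)
    for spec in marker_specs(df):
        folium.CircleMarker(location=spec["location"], radius=5,
                            popup=spec["popup"], color=spec["color"],
                            fill=True, fill_opacity=0.7).add_to(fmap)
    return fmap

# Placeholder coordinates and labels for illustration only.
labeled = pd.DataFrame({
    "Neighborhood": ["N0", "N1", "N2"],
    "Latitude": [18.94, 19.07, 19.12],
    "Longitude": [72.84, 72.88, 72.85],
    "Cluster Label": [0, 1, 1],
})
specs = marker_specs(labeled)
print(specs)
```

With folium available, build_cluster_map(labeled).save("clusters.html") would write an interactive map that can be opened in a browser.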
Clusters of neighborhoods in Mumbai
One important thing this map shows is that many neighborhoods in Mumbai are similar in the venues they have around them, as indicated by the cluster marked in blue.
Let us now dig a little deeper into how the neighborhoods are clustered and what characterizes the cluster that is common across most neighborhoods in Mumbai.
Cluster Label 0

The neighborhoods belonging to this cluster are popular for their Indian restaurants, cafes, markets, and vegetarian joints. These neighborhoods would appeal to a subsection of Indians who want a scaled-down lifestyle with vegetarian food close to home.
Cluster Label 1

The neighborhoods belonging to this cluster are popular for their Indian restaurants, Irani cafes, cafes, seafood, and fast food joints. These neighborhoods would interest those who like seafood and fast food, and probably also those who come from Iran and would like to visit places serving their kind of food.
Cluster Label 2

The neighborhoods belonging to this cluster are popular for their mix of Indian and Chinese restaurants, train stations, pubs, bus stations, bakeries, etc. These neighborhoods would interest those who depend on public transport, since they are close to train and bus stations. They may also interest people with diverse food preferences, ranging from Indian, Asian, Chinese, and Afghan food to snack, sandwich, and ice-cream shops. These neighborhoods also offer recreational places such as gyms, parks, bowling alleys, theatres, and harbours.
Cluster Label 3

Very few neighborhoods belong to this cluster, which makes it unique. The main attraction of these neighborhoods seems to be their proximity to a theme park, pizza places, and cocktail bars.
Cluster Label 4

Again, very few neighborhoods belong to this cluster, which makes it unique. The main attraction of these neighborhoods seems to be their proximity to a ferry and a college auditorium.
Since the objective of this study is to compare the neighborhoods of the two metro cities, Mumbai and Delhi, and not to compare neighborhoods within Mumbai, we present our conclusions after doing a similar analysis on the neighborhoods in Delhi.
Finding top venues near Delhi neighborhoods
We use the Foursquare API to find the top venues in Delhi's neighborhoods, which helps us understand the kind of life these neighborhoods offer. We make one Foursquare API call for each Delhi neighborhood in our dataset. As an illustration, consider the venues close to one of these neighborhoods: Sansad Marg, Sansadiya South, Secretariat North, Shastri Bhawan, Supreme Court, New Delhi G.P.O. The Foursquare API returns the popular venues within a 500 m radius of this neighborhood.
Top venues closest to one of the neighborhoods in Delhi
Next, we identify the unique venue categories in the Delhi neighborhoods and build a DataFrame that records, for each neighborhood, the frequency of occurrence of each venue category.
From our analysis, we see that there are 14 unique venue categories in Delhi neighborhoods. ATMs, Arts and Crafts stores, Burger joints, Cafes, Gardens, Gyms, Multiplexes, Museums, Pizza places, Indian restaurants, Shopping malls, Water Parks, and Hotels are some of them.
We then create a dataset that lists the top 5 common venues against each of the neighborhoods in Delhi. We get a representation such as below for all the neighborhoods in Delhi.
Top 5 common venues in the neighborhoods of Delhi
Cluster the neighborhoods in Delhi based on the similarity of top common venues
Now that we also have the top venues for each neighborhood in Delhi, we apply a clustering algorithm to group the neighborhoods by the similarity of the types of venues they have. The clusters also tell users what a common type of neighborhood in Delhi looks like. As with Mumbai, we use k-Means clustering and set k to 5, which groups the neighborhoods into 5 clusters. Each neighborhood gets a Cluster Label assigned.
Delhi neighborhoods and venues with Cluster Label assigned
We will then use the dataset with cluster labels assigned to visualize the clusters in the folium map.
Clusters of neighborhoods in Delhi
One important thing this map shows is that the neighborhoods in Delhi are diverse in the venues they have around them, as indicated by the clusters marked in different colors. We also saw earlier that few venue categories were returned for the neighborhoods in Delhi.
Let us now dig a little deeper into how the neighborhoods are clustered and what characterizes each cluster.
Cluster Label 0

Close to 93 neighborhoods belong to this cluster. It is popular for its Arts and Crafts stores, Water Parks, Shopping malls, and Museums. These neighborhoods are not good for foodies. However, they should suit those who have children, since the venues close to these neighborhoods are great for keeping children engaged.
Cluster Label 1

Not many neighborhoods belong to this cluster. Multiplexes, department stores, and Gyms seem to be the popular venues close to the neighborhoods in this cluster.
Cluster Label 2

Not many neighborhoods belong to this cluster. ATMs, Water Parks, and Museums seem to be the popular venues close to the neighborhoods in this cluster.
Cluster Label 3

Not many neighborhoods belong to this cluster. Pizza places, Water Parks, and Museums seem to be the popular venues close to the neighborhoods in this cluster.
Cluster Label 4

Not many neighborhoods belong to this cluster. Museums, Shopping malls, and Gardens seem to be the popular venues close to the neighborhoods in this cluster.
Study findings & conclusion
In this project, we loaded the dataset for two of India’s prime metro cities and analyzed their neighborhoods based on the types of popular venues they have. We clustered the neighborhoods based on the most common top venues in each of them. Our intention was to understand the difference in the type of life these metros offer, which can serve as decision points for anybody who is considering settling in either city and wants a peek into the experience and facilities they would get.
Given our cluster information for both Mumbai and Delhi, we see that Mumbai and its neighborhoods are a great place for a foodie. There are a lot of restaurants, cafes, bars, etc. in Mumbai neighborhoods. Also, because Mumbai is close to the seashore, its neighborhoods offer harbors, seafood, and boat and ferry rides. Life in Delhi neighborhoods, on the other hand, looks quite dissimilar. Delhi is inland, and its neighborhoods are good for those who like Arts and Crafts stores, Museums, Water Parks, and Pizza places. There is very little in terms of foreign cuisine restaurants in Delhi, whereas Mumbai is great for international visitors, expats, etc. because of the variety and types of food outlets it has.
Thus, with this project, we have analyzed the kind of life each of these big metro cities has to offer based on the popular venues in their neighborhoods.
Mumbai would be the choice if you are a foodie!
Another important aspect the study reveals is that Mumbai offers far more categories of venues than Delhi (116 versus 14 in our data). This means that Delhi is more restrictive in terms of variety and convenience. With the data we have studied, Mumbai wins this battle of metros!