1.1 Description and Discussion of the Background
Based on the article published in Business-Standard.com on 15th June 2016
Can Vizag remodel itself as San Francisco? If yes, what is the current growth rate of both the cities? How are they interrelated with respect to geography and demographics?
Hans Rosling was a young guest student in India when he first realized that Asia had all the capacities to reclaim its place as the world’s dominant economic force. At TEDIndia, he graphs global economic growth since 1858 and predicts the exact date that India and China will outstrip the US. Hans Rosling’s TEDIndia video:
According to the report published by SmartCitiesCouncil India, Fluentgrid Ltd (formerly Phoenix IT Solutions Ltd), in association with Greater Visakhapatnam Municipal Corporation (GVMC), has launched a state-of-the-art centralized City Command Center.
IBM, the Council’s lead partner in India, partnering with two other US-based organizations — AECOM and KPMG — has prepared the entire master plan for Vizag Smart City, marking a strong American role in the city’s ambitious plan.
Visakhapatnam (also known as Vizag) is the largest city and the financial capital of the Indian State of Andhra Pradesh. It is also the ninth-most populous metropolitan area in India with a population of 5,018,000. With an output of $43.5 billion, Visakhapatnam is the ninth-largest contributor to India’s overall gross domestic product as of 2016. (Wikipedia)
For the larger project involving the US agencies, as reported in an event write-up on the USTDA Blog, the government authorities have been benchmarking Vizag against San Francisco in terms of targeted outcomes. San Francisco was also chosen because of the geographical similarity. Situated in northern California, the city is on the tip of a peninsula, surrounded by the Pacific Ocean and a bay. San Francisco is also known for its hilly landscape, among other picturesque features. Vizag, too, has a hilly terrain, with several big formations such as Kailasagiri and Rishikonda overlooking the Bay of Bengal.
As a resident of Vizag, I decided to compare the neighborhoods of Vizag, AP, India with the neighborhoods of San Francisco, US to understand the investment opportunities and how the city’s overall growth and development can be brought on par with San Francisco, using Clustering & Segmentation techniques and ML (Machine Learning). Data Visualizations (using seaborn and
Data that shows the current status of the 2 cities and identifies potential areas and different sectors of investment in Visakhapatnam. This is achieved by comparing the neighborhoods of Vizag and San Francisco and visualizing data for identifying patterns in their geographical and demographic similarities.
This project will highlight the investor opportunities with
2.1 Data Requirements
The following datasets have been used in the project:
- Postal Codes of Visakhapatnam. Data has been scraped and cleaned from Yo!Vizag – City’s Exclusive Magazine and Portal using Beautiful Soup and pandas libraries and saved in .csv format.
- Foursquare API to get the most common venues of given boroughs of Visakhapatnam and San Francisco respectively.
- Visakhapatnam and San Francisco Wikipedia pages have been scraped and cleaned for creating word clouds.
- Zip codes of San Francisco. Data has been downloaded in .csv format from https://datasf.org/ and cleaned using pandas.
- Economy of Visakhapatnam
- Per Capita Income of San Francisco
- Population data of Visakhapatnam
- GDP data of San Francisco
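The postal-code scraping step listed above can be sketched roughly as follows. This is an illustration only, not the project's actual scraper: the HTML fragment, its table layout, and the sample rows are hypothetical, and the standard-library `html.parser` module stands in for Beautiful Soup (and `csv` for pandas) so that the sketch runs without extra dependencies.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; the real portal's markup and values may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Area</th><th>Pin Code</th></tr>
  <tr><td>Dwaraka Nagar</td><td>530016</td></tr>
  <tr><td>Gajuwaka</td><td>530026</td></tr>
  <tr><td> Madhurawada </td><td>530048</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            # Header rows hold only <th> cells, so they stay empty and are skipped.
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def scrape_areas(html):
    """Return [area, pin_code] pairs found in the HTML table."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Serialize the cleaned rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Area", "PinCode"])
    writer.writerows(rows)
    return buffer.getvalue()


areas = scrape_areas(SAMPLE_HTML)
csv_text = to_csv(areas)
print(csv_text)
```

The cleaning here is limited to stripping whitespace from each cell; a real page would likely need more handling, for example for merged cells or duplicate areas.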
Two cities are analyzed in this project: Visakhapatnam and San Francisco.
I will be using the below datasets for analyzing Visakhapatnam.
Data 1: Neighborhood has a total of 684 areas. Most notable areas of the city include urban areas like Dwaraka Nagar, Gajuwaka, Gopalapatnam, Jagadamba Centre, Maddilapalem, Madhurawada, Seethammadhara and semi-rural suburbs such as Simhachalam, Pendurthi, and Parwada.
Data has been scraped and cleaned from Yo!Vizag – City’s Exclusive Magazine and Portal using Beautiful Soup and pandas libraries and saved in .csv format. Below are the first five areas:
San Francisco Data:
Data 2: SFO has a total of 36 neighborhoods, but due to the limited data available we could analyze only 26 of them. Data has been downloaded in .csv format from https://datasf.org/ and cleaned using pandas. Below are the first five neighborhoods:
Data 3: For the below analysis we will get data from Wikipedia:
- Visakhapatnam and San Francisco City Demographics.
- Visakhapatnam Tourism and Attractions.
- San Francisco Tourism and conventions.
Fig. Word clouds of Vizag and SFO
Data 4: The geographical coordinates of Visakhapatnam and SFO are used as input to the Foursquare API, which is leveraged to extract venue information for each neighborhood. We will use the Foursquare API to explore neighborhoods in Visakhapatnam and SFO. Below are images of the Foursquare API data for the 2 cities.
Visakhapatnam Foursquare API data:
SFO Foursquare API data:
Data 5: Population, GDP, Per capita Income, Tourism, Educational
In this project, the first part is the clustering of Visakhapatnam using the k-means algorithm. Visakhapatnam has 648 pin codes/areas/postal codes, but geocodes of only 326 locations have been included in the data analysis. We will explore the areas around central Visakhapatnam and compare them with the neighborhoods of San Francisco to understand the geographical similarities.
The second part comprises the clustering of San Francisco. Of San Francisco’s 36 neighborhoods, venues in 27 have been explored in this project using the Foursquare API.
The third part includes data visualizations and a comparison of the available data for both cities, to draw insights for investment decisions in Vizag. Word clouds created from the wiki pages of Vizag and SFO further add value to our discussion.
Exploratory Data Analysis:
Data 1: Visakhapatnam Geographical Coordinates Data.
We use the geopy and folium libraries to create a map of Visakhapatnam city with the neighborhoods superimposed on it. The 326 areas are plotted using their latitude and longitude values to obtain a high-level visualization of the neighborhoods.
Fig: Visakhapatnam Neighborhood Visualization
Now let’s explore venues around Andhra University, one of the oldest and most prestigious universities in Andhra Pradesh, located in central Vizag. We selected this location because Andhra University sits on the uplands of Visakhapatnam and its campus is scenic, with the Bay of Bengal on one side and the green Kailasagiri hill range on the other. The location suits our analysis because San Francisco was also chosen as a benchmark for its geographical similarity.
The latitude and longitude of Andhra University, Sivajipalem Road, Sector 4, Pedda Waltair, Visakhapatnam, Andhra Pradesh, 530001, India are 17.7376312 and 83.3300513027767, respectively.
Now, let’s get the top 10 venues within a 500-meter radius of Andhra University.
The Foursquare API returned only 2 unique venues.
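A venue request like the one described above might be assembled as follows. This sketch assumes the legacy Foursquare v2 `venues/explore` endpoint, which this document does not name explicitly. The credentials are placeholders, the version date is an arbitrary choice, and the response payload is a made-up sample that mimics that endpoint's layout so the parsing can be shown without a network call.

```python
from urllib.parse import urlencode

# Assumed endpoint (legacy Foursquare v2 API).
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=10):
    """Build an explore URL for venues within `radius` meters of a point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


def extract_venues(payload):
    """Flatten an explore response into (name, category, lat, lng) tuples."""
    items = payload["response"]["groups"][0]["items"]
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": "Unknown"}]
        venues.append((
            venue["name"],
            categories[0]["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
        ))
    return venues


# Andhra University coordinates from the text (longitude rounded).
url = build_explore_url(17.7376312, 83.3300513,
                        "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")

# Made-up sample response, not real Foursquare data.
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Cafe",
                           "location": {"lat": 17.7380, "lng": 83.3305},
                           "categories": [{"name": "Cafe"}]}},
                {"venue": {"name": "Example Bookshop",
                           "location": {"lat": 17.7371, "lng": 83.3296},
                           "categories": []}},
            ]
        }]
    }
}

venues = extract_venues(sample_payload)
```

In a live run, the URL would be fetched (for example with `requests.get(url).json()`) and the resulting dictionary passed to `extract_venues`.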
Now we repeat the same steps for all the neighborhoods around Andhra University to get the most common venue categories. Below is a snapshot of the first 5 neighborhoods and their venue categories.
There are 39 unique categories of venues in the neighborhoods of Andhra University.
Now we repeat the same for all the neighborhoods in Visakhapatnam city. Let’s look at the first 2 neighborhoods with their top 5 most common venues to get an idea.
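The ranking of venue categories per neighborhood might be computed as sketched here. The approach shown (one-hot encoding each venue's category, then averaging per neighborhood) is one common way to do this and is an assumption about the method; the area names and venue rows are invented for illustration.

```python
import pandas as pd

# Invented venue rows, one row per venue returned for a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["Area A"] * 3 + ["Area B"] * 6,
    "Venue Category": [
        "Clothing Store", "Clothing Store", "Breakfast Spot",
        "Beach", "Beach", "Beach", "Harbor", "Harbor", "Fish Market",
    ],
})

# One-hot encode the categories, then average per neighborhood to get the
# share of each category among that neighborhood's venues.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
freq = onehot.groupby("Neighborhood").mean()


def top_categories(row, n=5):
    """Return up to n category names, most frequent first, skipping zeros."""
    ranked = row[row > 0].sort_values(ascending=False)
    return ranked.index[:n].tolist()


top = {name: top_categories(row, n=2) for name, row in freq.iterrows()}
print(top)
```

The `freq` table (one row per neighborhood, one column per category) is also a natural input for the clustering step.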
Now we run the k-means algorithm to cluster the neighborhoods into 4 clusters. The number of clusters is chosen using the elbow method; in our case the optimal k is 4.
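As a rough illustration of the elbow method, the sketch below fits k-means for several values of k with scikit-learn and records the inertia (within-cluster sum of squares) for each. The feature matrix is synthetic, with four well-separated groups standing in for the neighborhood venue frequencies, so the numbers say nothing about the actual Visakhapatnam data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the neighborhood feature matrix:
# four groups of 20 points scattered around fixed centers.
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])
features = np.vstack([
    center + rng.normal(scale=0.5, size=(20, 2)) for center in true_centers
])

# Fit k-means for k = 1..8 and record the inertia of each fit.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = float(model.inertia_)

# The "elbow" is the k after which inertia stops dropping sharply.
for k in range(1, 9):
    print(k, round(inertias[k], 1))

# Final clustering with the k suggested by the elbow.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
```

On data like this the inertia falls steeply up to k = 4 and then flattens. On real venue data the elbow is often less pronounced and the choice of k involves some judgment.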
The horizontal bar chart below shows the count of the most common venues in each cluster. Based on the analysis, clothing stores/shopping complexes are present in every cluster, which indicates urbanization and development across the neighborhoods of Visakhapatnam. Breakfast spots and restaurants are other common venues in clusters 1 and 2.
Fig. Clustering and segmentation of Visakhapatnam using the k-means algorithm
Cluster 1 has the largest number of venues and the most development. There has been a significant population increase in the recent past. Below is the bar chart depicting the population of both cities over the last 5 years.
Let’s explore the data further.
We can see the presence of historic sites, a harbor, fish markets and a beach, which gives us some idea of the geographical similarity between Vizag and SFO. Let’s visualize this in word clouds built from tourism data for Vizag and San Francisco scraped from the travel website TripAdvisor:
Fig. Word cloud of San Francisco list of tourist Attractions:
Word Cloud of list of Visakhapatnam tourist attractions:
The word clouds above show the similarity between the two cities, with Museum, Park and Beach/Bay being the most common terms in both. Other existing natural tourist spots that add to the beauty of Visakhapatnam include waterfalls, caves, hills, wildlife and temples.
But when we closely observe the tourism word cloud of San Francisco, it suggests several untapped opportunities, modeled on attractions such as Fisherman’s Wharf, Pier 39, Twin Peaks and the Big Bus Hop-on Hop-off tour, that could be implemented in Visakhapatnam given its similar geographical features and weather conditions.
Box plot of weather conditions of Visakhapatnam and San Francisco in a Year:
The hot and humid conditions of Visakhapatnam, compared with San Francisco, suggest considerable scope for establishing water amusement parks and recreational activities. Cruises, sailing, hiking trails and water tours could create a major spike in tourism and boost the GDP of Visakhapatnam.
Though there is a significant difference in the GDP and Per Capita Income of Vizag and San Francisco, Visakhapatnam has managed to top the charts of urban population amongst all the 13 districts in Andhra Pradesh, India. According to data uploaded onto the CM’s Dashboard, the 2011 Census of India states that Visakhapatnam stood first in the state with 47.45% of urban populace.
The difference in GDP and Per Capita Income of the two cities signifies the importance of the technology and investments required for the city to remodel itself as San Francisco in the next 10 years. Achieving the vision will require a “Smart City” approach to regional development and infrastructure planning and delivery. For further information please refer to the link below: https://www.smartvizag.in/index.php/projects/
To summarize, we created a word cloud using the seaborn libraries, with text scraped from the Wikipedia page using Beautiful Soup.
Fig. Word Cloud of Visakhapatnam Wikipedia Page
In this word cloud we can clearly see that Visakhapatnam has a coast, port, railway, naval base, university, stadium and is a metropolitan city with historic sites and international airport.
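A word cloud is essentially a picture of word frequencies, so the underlying computation can be sketched without any plotting library. The example below counts words in a short made-up passage (not the actual Wikipedia text) using only the Python standard library; the stop-word list is a tiny illustrative one.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; word-cloud tools usually ship a longer one.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "on", "in", "with", "it", "has", "its"}


def word_frequencies(text, top_n=5):
    """Return the top_n (word, count) pairs, ignoring stop words and short words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


# Made-up passage standing in for the scraped page text.
sample_text = (
    "Visakhapatnam is a port city on the coast. The port handles cargo, "
    "and the city has a naval base, a university and a beach on the coast."
)

top_words = word_frequencies(sample_text, top_n=3)
print(top_words)
```

A word-cloud renderer would then draw each word at a size proportional to its count, which is why terms such as "port" and "coast" stand out in the figure.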
With this information we move on to the analysis of San Francisco and identify potential ideas for development.
Data 2: SFO Geographical Coordinates Data is downloaded in .csv format from https://datasf.org/ and cleaned using pandas. We explored 27 neighborhoods of San Francisco in our analysis.
Below are the first 5 neighborhoods.
Fig. SFO Neighborhood Visualization using Folium and geopy libraries.
As we explore each neighborhood further to identify similarities with Visakhapatnam, let’s start with venues around a beach neighborhood in SFO.
Now, let’s get the top 100 venues within a 500-meter radius of North Beach, SF, California.
The Foursquare API returned 100 unique venues. Let’s explore the data.
We will do the same analysis for all the neighborhoods of North Beach, SF and explore the venues returned by Foursquare API to understand the most common venue categories.
We repeat the same for all the neighborhoods of SF. There are 261 unique categories in SF. Now we run the k-means algorithm to cluster the neighborhoods into 4 clusters. The number of clusters is chosen using the elbow method; in our case the optimal k is 4.
Fig. Clustering of neighborhoods of San Francisco using the k-means algorithm
Below is the horizontal bar chart of the most common venues in each cluster.
As we can see from the above analysis, neighborhoods in cluster 0 are highly developed, with a wide range of restaurants, dance studios, juice bars, coffee shops, event spaces etc. The venues in Vizag and San Francisco are largely different in nature because the two cities are at different levels of development and urbanization.
But this analysis gives a high-level idea of the new categories of venues that could attract investment in Visakhapatnam, tailored to the needs of the local population. Categories such as juice bars, dance studios and event spaces, which are currently absent from the most common venue categories in Vizag, leave some scope for new investments.
Finally, we will look at the word cloud of San Francisco created from Wikipedia to explore further.
Fig. Word cloud of San Francisco created from Wikipedia
We can see the words military, Bay area, hill, Pacific Ocean, Ferry, waterfront, historic building etc. which show some similarity in the geographic and demographic data of Visakhapatnam and San Francisco.
Word cloud of Educational institutions and universities in San Francisco:
Despite its limited geographical space, San Francisco, California is home to a multitude of colleges and universities. San Francisco Conservatory of Music, San Francisco School of Digital Filmmaking, San Francisco Art Institute and Art Institute of California – San Francisco, a private campus which focuses on video game and design-based education (interior, fashion etc.) are some of the unique colleges and universities which can be further explored and established in Visakhapatnam.
Although the given data set let us show only limited results on demographic and geographical factors, through the clustering and segmentation of the two cities and the word clouds of the Wikipedia pages of Visakhapatnam and San Francisco, we could bring out some business ideas on new venue categories such as dance studios, juice bars, coffee shops, event spaces and a wide range of restaurants like sushi restaurants and Mediterranean restaurants.
5. Discussion and Conclusion:
- Tourism has huge potential of development as a part of Smart City initiatives in Vizag. Cruises, sailing, hiking trails and water tours can create a major spike in tourism and boost the GDP of Visakhapatnam.
- Educational Institutions data can be explored further.
- Business investors looking for real estate investment can further explore areas/neighborhoods in cluster 1 of Visakhapatnam, as these areas have the highest development, with restaurants, breakfast spots, shopping complexes etc., compared with the places in other clusters.
- For people interested in coming up with startup ideas in the food sector of the smart city, dance studios, juice bars, coffee shops, event spaces and a wide range of restaurants like sushi restaurants and Mediterranean restaurants are some of the new business ideas that can be experimented with, based on further data analysis.
- Individual investors looking for investment in residential plots can further explore areas in cluster 0 and cluster 2 of Visakhapatnam.
- Yo!Vizag – City’s Exclusive Magazine and Portal
- Foursquare API
- Visakhapatnam Wikipedia page
- San Francisco Wikipedia page
- https://india.smartcitiescouncil.com/article/vizag-smart-city-model-itself-san-francisco
- https://www.smartvizag.in/
- http://apedb.gov.in/about-visakhapatnam-district.html
- https://www.opendatanetwork.com