Mapping Coronavirus with Python

Coronavirus

Introduction

COVID-19, the coronavirus disease named by the WHO, continues to spread around the world. As of April 28, 2020, it had infected more than 3.13 million people and killed at least 217,000 worldwide. In this tutorial, we will learn how to map the spread of the virus using Python libraries (Folium, GeoPandas, pandas, NumPy, hvPlot, Matplotlib, kepler.gl) that are helpful for wrangling and mapping COVID-19 data.

Data Source

Many COVID-19 datasets are available, including Worldometer, Our World in Data (cumulative), and datasets from Google, Johns Hopkins University, and the European Union Open Data Portal. Worldometer [1] updates its total number of COVID-19 cases daily. Our World in Data [2] provides a collection of COVID-19 data, together with a complete overview of its data sources, in a GitHub repository. Let's start by scraping COVID-19 data with Python.

Web Scraping

Web scraping is the process of gathering information from the HTML of web pages. The following Python libraries are useful for extracting and processing data from websites:

#import libraries
import os, sys
import json
import pandas as pd
import geopandas as gpd
import numpy as np
from numpy import int64
import requests, io
import urllib.request
import folium
from folium import plugins
import fiona
import branca
from branca.colormap import linear
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

Let’s define the URL and get the data.

url = 'https://www.worldometers.info/coronavirus/#countries'
response = requests.get(url)

Let’s check what we’ve got.

data = response.content.decode('utf-8')

To clean up this messy data, we have to parse the content returned by the request, so we define an HTML table parser object. Several table parser functions are available for extracting HTML table data. The one I prefer is an HTMLTableParser class along the lines of the one described in [4].
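
A minimal sketch of such a parser, assuming a class modeled on the one described in [4]. Only the name `HTMLTableParser` and the `parse_url` call appear in this tutorial; the `parse_html` and `parse_table` methods and the 30-second timeout are illustrative assumptions. `parse_url` returns a list of `(table id, DataFrame)` tuples, which is the shape the `[0][1]` indexing used in this tutorial expects.

```python
import pandas as pd
from bs4 import BeautifulSoup


class HTMLTableParser:
    """Turn every <table> on an HTML page into a pandas DataFrame."""

    def parse_url(self, url):
        import requests  # imported lazily so parse_html also works offline
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return self.parse_html(response.text)

    def parse_html(self, html):
        soup = BeautifulSoup(html, "html.parser")
        # One (table id, DataFrame) tuple per <table> element
        return [(t.get("id"), self.parse_table(t)) for t in soup.find_all("table")]

    def parse_table(self, table):
        header, rows = [], []
        for tr in table.find_all("tr"):
            th_cells = [c.get_text() for c in tr.find_all("th")]
            td_cells = [c.get_text() for c in tr.find_all("td")]
            if th_cells and not header:
                header = th_cells      # first header row names the columns
            elif td_cells:
                rows.append(td_cells)  # cell text is kept as a raw string
        width = max((len(r) for r in rows), default=len(header))
        # Only use the header when it lines up with the data columns
        columns = header if len(header) == width else None
        return pd.DataFrame(rows, columns=columns)
```

Every cell is kept as raw text, including characters such as "\n" and "+", so the cleaning steps that follow still apply.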

Now let's use the parser to create a new data frame from the HTML table data.

hp = HTMLTableParser()
table = hp.parse_url(url)[0][1] # Grabbing the table from the tuple
table.head(10)

If we run the above code, we can get the result as below:

The table contains a few special characters ("\n", "+") that need to be removed. Let's also check the bottom rows of the dataframe.

#check bottom rows
table.tail(10)

The dataframe also contains extra newline characters (\n..\n) that need to be removed. Since we only need country-level data for mapping in this tutorial, we can drop the extra top and bottom rows.

# drop unwanted top rows
df = table.drop(table.index[:8]).reset_index(drop=True)
# drop unwanted bottom rows
df.drop(df.tail(8).index, inplace=True)
# remove newline characters ('\n') and thousands separators (',')
df.replace(['\n'], '', regex=True, inplace=True)
df.replace([','], '', regex=True, inplace=True)

We need to format the table before starting mapping. The special characters in the dataframe can be removed using a loop as below:

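# drop unwanted special characters using a loop
for col in df.columns[1:11]:
    # regex=False treats "+" as a literal character, not a regex quantifier
    df[col] = (df[col].str.replace("+", "", regex=False)
                      .str.replace(",", "", regex=False)
                      .str.replace("N/A", "", regex=False)
                      .str.replace(r"\s+", "", regex=True))  # strip all whitespace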

All the extracted data is in text format, and some column names are unsuitable for data processing, so we need to rename them.

df1 = df.rename(columns={'Country,Other': 'CNTRY_NAME', 'Serious,Critical': 'Serious_Critical', 'Tot Cases/1M pop': 'Tot_Cases_1M_pop', 'Deaths/1M pop': 'Deaths_1M_pop', 'Tests/\n1M pop\n': 'Tests_1M_pop'})

The final result is as below:

Worldometer Coronavirus Table
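
Column names like these can also be normalized programmatically instead of being listed one by one. The sketch below assumes a hypothetical helper, `clean_column_name` (not part of pandas or of the original scripts), that collapses every run of non-alphanumeric characters into a single underscore.

```python
import re


def clean_column_name(name):
    """Replace each run of non-alphanumeric characters with one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")


# Sample names taken from the scraped table header
raw_names = ["Country,Other", "Tot Cases/1M pop", "Tests/\n1M pop\n"]
clean_names = [clean_column_name(n) for n in raw_names]
```

It could be applied with `df.columns = [clean_column_name(c) for c in df.columns]`. A name such as 'Country,Other' would still need an explicit rename to 'CNTRY_NAME' so that it matches the country dataset used for mapping.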

However, it is still not enough for data processing. We need to check the data type of each data frame column.

Data Types

Every column in the dataframe has the object data type, so we need to convert some columns to more appropriate types. Type conversion changes a value from one data type to another.

# convert object columns in the dataframe to numeric
df1.fillna(0, inplace=True)
df1.replace(np.inf, 0, inplace=True)
for col in df1.columns[1:11]:
    # errors='coerce' turns values that cannot be parsed into NaN
    df1[col] = pd.to_numeric(df1[col], errors='coerce')

Let’s check it again:

Most columns are now numeric, but some of them are float because of empty values. Let's convert those to integers. Columns whose values are genuinely fractional can stay as float.

# convert float columns in the dataframe to integer
df1.fillna(0, inplace=True)
df1.replace(np.inf, 0, inplace=True)
for col in df1.columns[1:11]:
    df1[col] = df1[col].astype(int)
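
The two conversion steps above can be traced on a toy column. This is a minimal sketch with made-up values, not the scraped data: empty strings become NaN during the numeric conversion, which forces a float column, and filling them with 0 makes the integer cast possible.

```python
import pandas as pd

# Made-up sample values in the same text format as the scraped table
raw = pd.Series(["1,010,356", "+2,157", "", "N/A"])

cleaned = (raw.str.replace(",", "", regex=False)
              .str.replace("+", "", regex=False)
              .str.replace("N/A", "", regex=False))

# Step 1: text -> numeric; values that cannot be parsed become NaN (float64)
as_float = pd.to_numeric(cleaned, errors="coerce")

# Step 2: fill the gaps, then cast to integer
as_int = as_float.fillna(0).astype("int64")
```

Note that filling with 0 makes "no data" indistinguishable from a true zero, which matters when reading the resulting maps.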

As we can see, web scraping takes work. We need to identify incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replace, modify, or remove the dirty data in the data frame.

df1.sort_values(by=['TotalCases'], inplace=True, ascending=False)
df1.head(10)

Cleaned Worldometer Table

It seems the data is clean and nicely formatted. Now let’s start mapping.

Mapping

Mapping means creating graphic representations of information, using spatial relationships within the graphic to represent relationships within the data. Maps can be static or dynamic (animated, real-time, or interactive web maps). We need geospatial country data for mapping; a country dataset is freely available on ArcGIS Open Data Hub.

# get country data
url = "https://opendata.arcgis.com/datasets/a21fdb46d23e4ef896f31475217cbb08_1.geojson"
world = gpd.read_file(url)
world.plot()

Countries WGS84 (ArcGIS Hub)

There is another option of downloading country shapefile data as below:

Downloading Zipped Shapefile

So far, we have prepared the COVID-19 and country datasets for mapping. GeoPandas is an open-source project that makes working with geospatial data in Python easier. It extends the data types used by pandas to allow spatial operations on geometric types. There are two ways to combine datasets in GeoPandas: attribute joins and spatial joins. In an attribute join, the datasets are merged on a common variable.
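
A minimal sketch of an attribute join, using plain pandas frames with made-up values as stand-ins for the country and COVID-19 tables. The `indicator=True` option of `merge` adds a `_merge` column, which makes it easy to list the countries that found no match, usually because of a naming difference.

```python
import pandas as pd

# Stand-ins for the country and COVID-19 tables (values are made up)
world = pd.DataFrame(
    {"CNTRY_NAME": ["United States", "Canada", "United Kingdom", "Greenland"]})
cases = pd.DataFrame({"CNTRY_NAME": ["USA", "Canada", "UK"],
                      "TotalCases": [100, 20, 30]})

# Harmonize the names with a single mapping before joining
world["CNTRY_NAME"] = world["CNTRY_NAME"].replace(
    {"United States": "USA", "United Kingdom": "UK"})

# Attribute join on the shared column; `_merge` flags rows without a match
joined = world.merge(cases, on="CNTRY_NAME", how="left", indicator=True)
unmatched = joined.loc[joined["_merge"] == "left_only", "CNTRY_NAME"].tolist()
```

Here `unmatched` contains only "Greenland", the one country missing from the case table. The same check on the real data shows which country names still need harmonizing.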

Let’s check the country data.

It has country names (CNTRY_NAME) and geometry for mapping, so both data frames share a common column. However, the world data frame uses "United States" and "United Kingdom", whereas the Worldometer table uses "USA" and "UK". Let's rename them before merging the two data frames.

# both renames go in one mapping; a second dict would be read as the `value` argument
world.CNTRY_NAME = world.CNTRY_NAME.replace({"United States": "USA", "United Kingdom": "UK"})

Let's merge the two datasets on the common column "CNTRY_NAME" and check the data types of the columns.

corona = world.merge(df1, on='CNTRY_NAME', how='left')
corona.dtypes

Some columns do not have the correct data type, because the left join leaves NaN for countries without a match.

corona.fillna(0, inplace=True)
corona.replace(np.inf, 0, inplace=True)
for col in corona.columns[3:10]:
    corona[col] = corona[col].astype(int)
corona.dtypes

Let’s create a new geopandas data frame by merging these two datasets.

df_world = pd.merge(df1, world, on='CNTRY_NAME')
# EPSG:4326 is WGS84 latitude/longitude; the {'init': ...} dict form is deprecated
corona_gpd = gpd.GeoDataFrame(df_world, crs='EPSG:4326', geometry='geometry')
corona_gpd.head(5)

Plotting Data

Geopandas provides a high-level interface to the matplotlib library for making static maps. Mapping shapes is as easy as using the plot() method on a GeoSeries or GeoDataFrame.

f, ax = plt.subplots(1, 1, figsize=(12, 8))
ax = corona_gpd.plot(column='TotalCases', cmap='rainbow', ax=ax, legend=True,
                     legend_kwds={'label': "Total Cases by Country"})
# scheme='fisher_jenks' requires the mapclassify package
ax = corona_gpd.plot(figsize=(15, 15), column='TotalDeaths', cmap=plt.cm.jet,
                     scheme='fisher_jenks', k=9, alpha=1, legend=True, markersize=0.5)
plt.title('Coronavirus Total Deaths by Country')

Now let's turn the static maps above into interactive maps using hvPlot.

import hvplot.pandas
corona_gpd.hvplot(c="TotalDeaths", cmap='rainbow',
                  width=800, height=450,
                  title="TotalDeaths by Country")

Interactive Plotting

Let’s try to plot COVID-19 time-series data from Our World in Data.

# plotting time series data
covid = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv', delimiter=",")
covid1 = covid.rename(columns={'location': 'COUNTRY'})
covid_gdp = pd.merge(world, covid1)
covid_gdp = gpd.GeoDataFrame(covid_gdp, crs='EPSG:4326', geometry='geometry')
covid_gdp.hvplot(c="total_deaths",
                 cmap="YlOrRd",
                 hover_cols=['COUNTRY', 'total_deaths'],
                 hover_fill_color="grey",
                 line_width=2,
                 width=800,
                 height=450,
                 groupby='date',
                 title="Covid-19 Total Deaths by Country/Date")

Covid-19 Timeseries Data

Interactive Mapping

It is straightforward to add geometries (points, polygons, lines) from a GeoDataFrame to a map using an additional Python library, Folium. Folium makes it easy to visualize data that has been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations and the passing of rich vector/raster/HTML visualizations as markers on the map. The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. Even Esri basemaps can be added. Folium supports Image, Video, GeoJSON, and TopoJSON overlays. For example:

import folium
# OSM map centered on Toronto
m = folium.Map(location=[43.6532, -79.3832])
m

OSM Map

Esri basemaps can also be added as below:

folium.Map(location=[43.6532, -79.3832],
           zoom_start=12,
           tiles='https://services.arcgisonline.com/arcgis/rest/services/World_Topo_Map/MapServer/WMTS/tile/1.0.0/World_Topo_Map/default/default028mm/{z}/{y}/{x}.png',
           attr='Ablajan or anything else...')

Esri World_Topo_Map

Let’s add COVID-19 data to the above map.

# load gdf to map
gjson = corona_gpd.to_crs(epsg=4326).to_json()

# embed map in the notebook
def embed_map(m):
    from IPython.display import IFrame
    m.save('index.html')
    return IFrame('index.html', width='100%', height='750px')

# named `m` to avoid shadowing the built-in map()
m = folium.Map([43.783333, -79.866667], zoom_start=2)
country = folium.features.GeoJson(gjson)
m.add_child(country)
embed_map(m)

Let’s try to add a choropleth map with tooltips.

# add iframe in notebook
def embed_map(m):
    from IPython.display import IFrame
    m.save('index.html')
    return IFrame('index.html', width='100%', height='500px')

# add basemap
m = folium.Map([0, 0], zoom_start=2,
               tiles='https://services.arcgisonline.com/arcgis/rest/services/World_Topo_Map/MapServer/WMTS/tile/1.0.0/World_Topo_Map/default/default028mm/{z}/{y}/{x}.png',
               attr='Esri ..., Ablajan or anything else...')
gjson = corona_gpd.to_crs(epsg=4326).to_json()
# total deaths indexed by country name, used to look up each feature's value
df3 = df1.set_index('CNTRY_NAME')['TotalDeaths'].dropna()
colorscale = branca.colormap.linear.YlOrRd_09.scale(df1.TotalDeaths.min(), df1.TotalDeaths.max())

def style_function(feature):
    # look up the country by the name stored in the feature's properties
    total_deaths = df3.get(feature['properties']['CNTRY_NAME'], None)
    return {
        'fillOpacity': 1,
        'weight': 1,
        'fillColor': 'black' if total_deaths is None else colorscale(total_deaths)
    }

colorscale.caption = 'Total Deaths by Country'
colorscale.add_to(m)
country = folium.features.GeoJson(gjson, tooltip=folium.features.GeoJsonTooltip(fields=['TotalDeaths']),
                                  style_function=style_function)
m.add_child(country)
folium.LayerControl().add_to(m)
# save map as html
results = "C:\\Users\\Desktop\\map\\"
m.save(os.path.join(results, 'Total_Deaths_by_Country.html'))
embed_map(m)

When we run the above code, the output will be:

COVID-19 Total Deaths by Country
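
Folium calls the style function once per GeoJSON feature and passes that feature as a dictionary, so the per-country value has to be looked up from the feature's own properties. The sketch below shows that mechanism in plain Python, without Folium. The `make_style_function` helper and its white-to-red ramp are illustrative assumptions that stand in for the branca color scale.

```python
def make_style_function(values, max_value, missing_color="#000000"):
    """Build a Folium-style callback: feature dict in, style dict out."""
    def style_function(feature):
        name = feature["properties"]["CNTRY_NAME"]
        value = values.get(name)
        if value is None:
            fill = missing_color  # country missing from the data
        else:
            # Linear ramp from white (0) to pure red (max_value)
            share = min(max(value / max_value, 0.0), 1.0) if max_value else 0.0
            level = round(255 * (1 - share))
            fill = "#ff{0:02x}{0:02x}".format(level)
        return {"fillOpacity": 1, "weight": 1, "fillColor": fill}
    return style_function


# Made-up values keyed by country name
deaths = {"Canada": 50, "Italy": 100}
style = make_style_function(deaths, max_value=100)

italy = {"type": "Feature", "properties": {"CNTRY_NAME": "Italy"}, "geometry": None}
other = {"type": "Feature", "properties": {"CNTRY_NAME": "Atlantis"}, "geometry": None}
italy_style = style(italy)
other_style = style(other)
```

Giving countries without a value a fixed fallback color makes gaps in the merged data visible on the map instead of hiding them.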

The interactive map can be served as an HTML Leaflet map on any website. It can be accessed here. Note that some countries on the map have no data: their rows in the country dataset and the COVID-19 dataset were not merged because the country names differ between the two sources, so the dataframe needs more cleaning. As COVID-19 datasets [3] become more accessible, we can use free, open datasets in a clean format instead of web scraping and data cleaning. The Python scripts above can be tested with Our World in Data (cumulative data) and other available COVID-19 datasets. For example:

covid = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv')
# filter a certain country
covid_ca = covid[covid.location == 'Canada'].sort_values(['total_cases'], ascending=False)
covid_ca.head(10)

COVID-19 in Canada

The dataframe above (Our World in Data) includes the cumulative number of COVID-19 cases worldwide for mapping. Interactive maps can also be created easily with kepler.gl, a powerful open-source geospatial visualization tool. For example:

import keplergl
# keep wanted columns only
covid_gdp = covid_gdp[['COUNTRY', 'TotalCases', 'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'ActiveCases', 'Serious_Critical', 'geometry']]
corona = keplergl.KeplerGl(height=500)
corona.add_data(data=covid_gdp, name='Covid_19')
# forward slash avoids the invalid '\c' escape sequence
corona.save_to_html(file_name='./covid19_map.html')
corona

kepler.gl Interactive Map

The above map can be accessed here. kepler.gl can process large-scale geospatial data to create beautiful visualizations. It handles a variety of map types, such as point, network, choropleth, cluster, heat, and time-series maps, and is easy to configure. Now that we have some ideas for mapping COVID-19 data, it's time to put them into practice.

In this tutorial, we learned web scraping, data wrangling, plotting, and creating interactive choropleth maps using the Python libraries Matplotlib, GeoPandas, pandas, NumPy, Folium, hvPlot, and kepler.gl. Web scraping and data cleaning rules (scripts) depend on your project: as we have seen, different types of data require different types of cleaning. No matter how much we scrub the data, it will never be perfectly clean.

I hope you enjoyed the tutorial. Please don’t hesitate to write comments and questions.

References:

  1. https://www.worldometers.info/coronavirus/
  2. https://ourworldindata.org/coronavirus-source-data
  3. https://mdl.library.utoronto.ca/covid-19/data
  4. https://srome.github.io/Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas/
  5. https://geopandas.org/
  6. Countries WGS84
  7. https://python-visualization.github.io/folium/quickstart.html
  8. https://github.com/python-visualization/folium
  9. https://hvplot.holoviz.org/user_guide/Customization.html
  10. https://kepler.gl/

Senior Geospatial Specialist in Toronto
