Geospatial Big Data Processing with Python: Detecting Green Roofs in Toronto

Ablajan Sulaiman
12 min read · Jan 3, 2021

Introduction

As geospatial data becomes ubiquitous, processing it at scale has become an essential part of big data analytics. The volume of data keeps growing rapidly, and processing geospatial big data (2D, 3D, point cloud) has long been a challenge, not only in the information technology (IT) sector but also in the geospatial domain. Handling this data efficiently is essential for extracting meaningful information from it. Big data processing techniques analyze datasets at terabyte or even petabyte scale, and in many cases we need to combine several tools and approaches to process geospatial big data. Useful Python libraries and tools for this purpose include GeoRasters, GDAL, Dask, GeoPandas, rasterstats, Databricks, and Apache Spark, among others.

In this project, we will learn how to process geospatial big data (1.3 billion points) with Python to detect green roofs. The project aims to identify existing green roofs and to explore potential green roofs in Toronto. Several approaches are used to identify them: raster zonal statistics, raster point data processing, and machine learning (deep learning).

Detecting green roofs is complicated because tree shade falling on building rooftops affects how they appear in satellite images. A vegetation index is an indicator that describes the greenness (the relative density and health of vegetation) of each picture element, or pixel, in a satellite image. One of the most widely used indices is the Normalized Difference Vegetation Index (NDVI). NDVI values range from -1.0 to +1.0. In most cases, values between 0.2 and 0.4 correspond to areas with sparse vegetation, values between 0.4 and 0.6 indicate moderate vegetation, and values above 0.6 indicate the highest density of green leaves.
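To make the thresholds concrete, here is a minimal sketch of how NDVI could be computed and binned with NumPy. It uses the standard definition NDVI = (NIR - Red) / (NIR + Red). The function names, the toy reflectance values, the class codes, and the treatment of values that fall exactly on a boundary are all assumptions made for illustration. They are not taken from the project's actual pipeline.

```python
import numpy as np

def compute_ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    ndvi = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    return ndvi

def classify_ndvi(ndvi):
    """Bin NDVI into illustrative classes (hypothetical codes):
    -1 = no data, 0 = below 0.2, 1 = sparse (0.2 to 0.4),
     2 = moderate (0.4 to 0.6), 3 = dense (0.6 and above).
    """
    bins = [0.2, 0.4, 0.6]
    classes = np.digitize(ndvi, bins)  # lower bound inclusive, upper exclusive
    return np.where(np.isnan(ndvi), -1, classes)

# Toy 2 x 3 reflectance bands (made-up values, not real imagery).
nir = np.array([[0.5, 0.4, 0.6],
                [0.8, 0.0, 0.3]])
red = np.array([[0.4, 0.2, 0.2],
                [0.1, 0.0, 0.3]])

ndvi = compute_ndvi(nir, red)
classes = classify_ndvi(ndvi)
print(classes.tolist())  # [[0, 1, 2], [3, -1, 0]]
```

With real imagery, the two bands would be read from the satellite raster rather than typed in by hand, and the same functions would apply to the full arrays.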

Datasets

The following datasets were used for the NDVI analysis.

1. Building Data: 3D Massing 2019 Shapefile (City of Toronto’s Open Data Portal)

2. DigitalGlobe Satellite Images (2018-07-10T16:27:14, private)

Image Processing: 1.3 Billion Points
