Image, Video and Real-Time Webcam Object Detection & Instance Segmentation using Mask R-CNN

Ablajan Sulaiman
15 min read · Apr 17, 2020

Introduction

AI is growing fast and transforming numerous industries. Machine learning has become more popular thanks to ever-increasing data volumes, advanced algorithms, and improvements in computing power and storage. It has also improved computer vision, particularly in recognition and tracking. In recent years, there has been an increase in research on object detection, image instance segmentation, video object tracking, video object detection, video semantic segmentation, and video object segmentation. In this tutorial, we will explore Mask R-CNN to understand how instance segmentation works, then implement object detection and instance segmentation on images, videos, and a real-time webcam feed with Mask R-CNN using Keras and TensorFlow.

Object Detection

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Mask R-CNN [1–2] is a deep neural network designed to solve instance segmentation in computer vision. The model generates a bounding box and a segmentation mask for each instance of an object in the image. It’s based on a Feature Pyramid Network (FPN) and a ResNet101 backbone. It can be used to construct pixel-wise masks for every object in an image or video. Let’s start exploring the Mask R-CNN repository to see how it works.
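To make the relationship between the two outputs concrete, here is a minimal NumPy sketch of how a per-instance bounding box can be derived from a per-instance binary mask. It is an illustration only, not the repository's own code: the function name `boxes_from_masks`, the `[height, width, num_instances]` mask layout, and the `(y1, x1, y2, x2)` box order are assumptions chosen for this example.

```python
import numpy as np

def boxes_from_masks(masks):
    """Return one (y1, x1, y2, x2) box per instance.

    masks: boolean array of shape [height, width, num_instances]
           (assumed layout for this sketch).
    """
    num_instances = masks.shape[-1]
    boxes = np.zeros((num_instances, 4), dtype=np.int32)
    for i in range(num_instances):
        # Row and column indices of every pixel that belongs to instance i
        ys, xs = np.where(masks[:, :, i])
        if ys.size == 0:
            continue  # empty mask: leave the box as all zeros
        # The bottom/right edges are exclusive, hence the +1
        boxes[i] = [ys.min(), xs.min(), ys.max() + 1, xs.max() + 1]
    return boxes

# Two toy "instances" on a 6x8 image
masks = np.zeros((6, 8, 2), dtype=bool)
masks[1:4, 2:5, 0] = True  # instance 0 covers rows 1..3, cols 2..4
masks[3:6, 5:8, 1] = True  # instance 1 covers rows 3..5, cols 5..7

boxes = boxes_from_masks(masks)
print(boxes)
```

Each mask marks exactly which pixels belong to one object, while the box is only the tightest rectangle around those pixels. That is why instance segmentation carries strictly more information than detection alone.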
