Stormwater systems serve as critical urban drainage infrastructure, maintaining municipal functionality and playing key hydrological and environmental roles. Yet, stormwater outlets are not designed for stormflow monitoring, making traditional contact-based sensors unsuitable for gauging highly variable and turbulent flows.
To address this, we propose a computer vision approach that quantifies discharge from near-range images and videos of the outlet. Given the challenges of applying traditional computer vision in natural environments with variable lighting and environmental noise, our method combines machine learning (Mask R-CNN and YOLOv8 models) with computer vision techniques (CV-ML), leveraging the recognizable geometric shape of culverts in otherwise very noisy images.
Specifically, water stage is obtained by subtracting the height of the empty area above the water from the height of the extracted round culvert shape. Image-based measurements are then converted into real-world units by applying a homography transformation calibrated with a checkerboard reference object placed parallel to the outlet.
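The two steps above (homography calibration, then stage as culvert height minus empty height) can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the pixel coordinates, and the 10 cm checkerboard square are hypothetical, and the four-point solver is a simplification. In practice the crown, invert, and water-surface points would come from the segmentation masks, and a homography would more likely be fitted to many checkerboard corners with a library routine such as OpenCV's `cv2.findHomography`. The sketch also assumes that all measured points lie in the plane of the checkerboard.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_homography(pixel_pts, world_pts):
    """Fit a 3x3 homography (h33 fixed to 1) from four pixel -> world pairs."""
    a, b = [], []
    for (u, v), (x, y) in zip(pixel_pts, world_pts):
        a.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        b.append(x)
        a.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        b.append(y)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def to_world(h, u, v):
    """Map a pixel (u, v) to plane coordinates in the calibration units."""
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    x = (h[0][0] * u + h[0][1] * v + h[0][2]) / w
    y = (h[1][0] * u + h[1][1] * v + h[1][2]) / w
    return x, y

def water_stage(h, crown_px, invert_px, surface_px):
    """Stage = culvert height minus empty height above the water surface."""
    crown = to_world(h, *crown_px)
    invert = to_world(h, *invert_px)
    surface = to_world(h, *surface_px)
    culvert_height = math.dist(crown, invert)
    empty_height = math.dist(crown, surface)
    return culvert_height - empty_height

# Hypothetical calibration: corners of one 10 cm checkerboard square.
pixel_corners = [(100, 100), (300, 100), (300, 300), (100, 300)]
world_corners = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
H = fit_homography(pixel_corners, world_corners)

# Hypothetical detections: culvert crown, invert, and water surface (pixels).
stage = water_stage(H, (200, 50), (200, 650), (200, 450))
print(f"stage = {stage:.2f} cm")
```

Working in plane coordinates rather than with a single cm/px scale factor keeps the measurement valid when the camera views the outlet at an oblique angle, which is the reason for using a homography in the first place.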

Evaluation on a culvert of known dimensions demonstrated ±1 cm accuracy in water stage for approximately 80% of measurements overall. Under calm flows, the method estimated more than 80% of stages within ±0.5 cm; under turbulent flows, it estimated 63% of stages within ±1 cm (96% within ±2 cm).
These results show great promise for image-based techniques in difficult conditions where traditional techniques are not applicable. Accurate stage measurements are also a prerequisite for estimating discharge, which remains the focus of ongoing development under subsequent NCDOT-supported research.
Metrology of the system, conducted in the lab, showed that under well-lit conditions the practical distance at which the majority of measurements had errors within ±1 cm was 8 meters for the Mask R-CNN model (corresponding to an object pixel resolution of 0.2 cm/px) and 6.5 meters for the YOLOv8 model (corresponding to an object pixel resolution of 0.15 cm/px). Under dark conditions, practical distances were shorter. The YOLOv8 model also showed greater susceptibility to errors caused by the camera's lighting pattern over the outlet edges.