Automated detection and geolocation of roadside objects are critical for effective roadway safety analysis and transportation planning, particularly in rural areas. The goal of this project was to detect and geolocate roadside objects using a videolog comprising over 43 million images of North Carolina's rural roads.
This study describes an approach for detecting and geolocating stationary roadside objects by fusing airborne LiDAR data with videolog images. While multi-modal sensor fusion has been widely studied and applied in autonomous navigation for enhanced spatial perception, to the best of our knowledge, existing methods all assume known sensor parameters and dense spatiotemporal resolution to facilitate spatiotemporal data alignment.

Figure: Filtered and integrated road edge segmentation result (right) for an example image (left). The filtered and integrated road edge segmentation pixels are shown in light blue, and the projected LiDAR road edge pixels are shown in dark blue.

However, in practice, datasets may have incomplete sensor metadata and sparse spatiotemporal resolution. The videolog used in this study lacks camera intrinsic and pose parameters. In addition, because the original video capture was temporally downsampled, consecutive images are spaced 26 feet apart and GPS coordinates must be approximated.
To address these limitations, the project team integrated airborne LiDAR data with videolog images through a novel data registration and alignment approach. The approach estimates the missing camera parameters by minimizing the alignment error between road lane markings in the videolog images and projected LiDAR road edges, which enables more accurate computation of object bearings in the geolocation pipeline.
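The idea of recovering camera parameters by minimizing an alignment error can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a simple pinhole camera whose only unknowns are the focal length and the downward pitch, known point-to-pixel correspondences, synthetic road-edge points, and a coarse-to-fine grid search in place of whatever optimizer the project team used. All function names, the principal point (960, 540), and the numeric values are hypothetical.

```python
import math

def project(points, f, pitch, cx=960.0, cy=540.0):
    """Pinhole projection of camera-relative points given as (right, down, forward).

    The camera is assumed to be pitched downward by `pitch` radians.
    """
    c, s = math.cos(pitch), math.sin(pitch)
    pixels = []
    for x, y, z in points:
        zc = z * c + y * s   # depth along the pitched optical axis
        yc = y * c - z * s   # offset below the optical axis
        pixels.append((cx + f * x / zc, cy + f * yc / zc))
    return pixels

def alignment_error(f, pitch, points, observed):
    """Mean squared pixel distance between projected and observed edge points."""
    projected = project(points, f, pitch)
    return sum((u - uo) ** 2 + (v - vo) ** 2
               for (u, v), (uo, vo) in zip(projected, observed)) / len(points)

def estimate_camera(points, observed, f_range=(500.0, 2000.0),
                    pitch_range=(-0.2, 0.2), rounds=8, steps=20):
    """Coarse-to-fine grid search for the (f, pitch) pair with the lowest error."""
    f_lo, f_hi = f_range
    p_lo, p_hi = pitch_range
    best = None
    for _ in range(rounds):
        f_step = (f_hi - f_lo) / steps
        p_step = (p_hi - p_lo) / steps
        candidates = [(f_lo + i * f_step, p_lo + j * p_step)
                      for i in range(steps + 1) for j in range(steps + 1)]
        best = min(candidates,
                   key=lambda c: alignment_error(c[0], c[1], points, observed))
        # Shrink the search window to two grid steps around the current best.
        f_lo, f_hi = best[0] - 2 * f_step, best[0] + 2 * f_step
        p_lo, p_hi = best[1] - 2 * p_step, best[1] + 2 * p_step
    return best

# Synthetic stand-in for LiDAR road-edge points: two edges 3.5 units either side
# of the camera, 2.0 units below it, sampled at 8 depths ahead of the vehicle.
lidar_edges = [(side * 3.5, 2.0, float(z)) for side in (-1, 1) for z in range(5, 45, 5)]
# Synthetic "image" edge pixels generated with a hidden focal length and pitch.
observed_edges = project(lidar_edges, 1000.0, 0.05)

f_est, pitch_est = estimate_camera(lidar_edges, observed_edges)
print(f_est, pitch_est)
```

In real data the correspondences between image pixels and LiDAR points are not known in advance, so the error term would more plausibly be a nearest-pixel or distance-transform cost, and more parameters (principal point, yaw, roll, camera height) would need to be estimated.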
The project team applied this approach to detect utility poles along the roadside. This work contributes a practical and scalable solution to the often-overlooked challenge of sensor fusion with incomplete camera metadata.
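To make the role of object bearings concrete, the sketch below shows one way a bearing could be derived from a detection's pixel column and how two bearings from consecutive images could be intersected to place an object such as a utility pole. This is an illustration under stated assumptions, not the project's pipeline: it assumes a level camera aligned with the vehicle heading, a local planar (east, north) frame in feet, and triangulation from exactly two views. The function names and all numeric values are hypothetical.

```python
import math

def object_bearing(u, f, cx, heading_deg):
    """Compass bearing (degrees clockwise from north) to a detection at pixel column u.

    Assumes a level pinhole camera pointing along the vehicle heading.
    """
    offset_deg = math.degrees(math.atan2(u - cx, f))
    return (heading_deg + offset_deg) % 360.0

def intersect_bearings(p1, b1_deg, p2, b2_deg):
    """Intersect two bearing rays in a local (east, north) plane.

    Returns the intersection point, or None if the bearings are near-parallel.
    """
    d1 = (math.sin(math.radians(b1_deg)), math.cos(math.radians(b1_deg)))
    d2 = (math.sin(math.radians(b2_deg)), math.cos(math.radians(b2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t1 * d1 = p2 + t2 * d2 for t1 using 2D cross products.
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Hypothetical setup: vehicle heading due north, two images 26 feet apart,
# and a pole 20 feet east and 60 feet north of the first camera position.
f, cx = 1000.0, 960.0
cam1, cam2 = (0.0, 0.0), (0.0, 26.0)
u1 = cx + f * 20.0 / 60.0           # pole's pixel column in image 1
u2 = cx + f * 20.0 / (60.0 - 26.0)  # pole's pixel column in image 2

b1 = object_bearing(u1, f, cx, heading_deg=0.0)
b2 = object_bearing(u2, f, cx, heading_deg=0.0)
pole_estimate = intersect_bearings(cam1, b1, cam2, b2)
print(pole_estimate)
```

Because the bearing depends directly on the focal length and principal point, errors in those parameters shift the intersection point, which is why estimating the missing camera parameters matters for geolocation accuracy.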