We wanted to build a tool that optimises cycling routes around Berlin by avoiding cobblestones and preferring safe bike paths. To do this, we use up-to-date street view images of Berlin from Mapillary.com to classify streets by surface and safety. A presentation of this project is available here; the slide deck and a Kaggle tutorial including the dataset are available here.
Cycling is one of the most popular forms of transport in the city; however, there are over 6,000 reported bicycle accidents every year, not counting the many minor, unreported accidents that happen every day. Berliners also battle daily with cobbled streets, which are uncomfortable and can be extremely dangerous in wet conditions. At the same time, initiatives like pop-up bike lanes have increased bicycle traffic by 25% since their introduction. Pop-up bike lanes are segregated cycle paths that often take the place of a lane of traffic, and many of them have become permanent fixtures. Safety and comfort remain important factors for cyclists, and improving both would see even more people cycling.
We therefore set out to automate bike lane detection and road surface identification with computer vision models. Using street view images from Mapillary.com (in lieu of Google Street View images), we were able to access over 10 million recent pictures of Berlin.
The first step of our pipeline was to identify whether a bike lane is present in a given Mapillary image. We did this with semantic segmentation, using a Mask2Former vision transformer pretrained on the Mapillary Vistas dataset.
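Once the segmentation model has produced a per-pixel class map, deciding "bike lane present or not" is a simple check on that map. The sketch below assumes the map is already available as a 2-D integer array (for example, from the Hugging Face `transformers` Mask2Former image processor's `post_process_semantic_segmentation`), that the bike-lane class ID is looked up from the model's label mapping, and that a hypothetical `min_fraction` threshold defines "present"; the actual decision rule used in the project is not stated here.

```python
import numpy as np

def bike_lane_fraction(seg_map: np.ndarray, bike_lane_id: int) -> float:
    """Fraction of pixels labelled as bike lane in a (H, W) class-ID map."""
    return float(np.mean(seg_map == bike_lane_id))

def has_bike_lane(seg_map: np.ndarray, bike_lane_id: int, min_fraction: float = 0.01) -> bool:
    # Hypothetical threshold: require a minimum share of bike-lane pixels so that
    # a handful of misclassified pixels does not count as a detection.
    return bike_lane_fraction(seg_map, bike_lane_id) >= min_fraction

# Toy 4x5 class map; class 7 stands in for the bike-lane class (hypothetical ID).
seg = np.zeros((4, 5), dtype=np.int64)
seg[3, :2] = 7
print(bike_lane_fraction(seg, 7))  # 0.1
print(has_bike_lane(seg, 7))       # True
```

A pixel-share threshold is only one option; a connected-region size or a check restricted to the lower half of the image (where the road usually is) would be equally plausible refinements.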
Because bike lanes are visually diverse, we used an unsupervised learning approach to categorize them: bike lane images were clustered based on their visual features, and the resulting clusters were visually inspected and labelled according to the type of bike lane they contained. New bike lane images are then soft labelled by assigning them to the cluster whose medoid has the highest cosine similarity to the image's embedding.
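The assignment rule for new images can be sketched in a few lines. This assumes the image embeddings and the cluster medoids are plain NumPy arrays of the same dimensionality (for example, DINOv2 feature vectors); the toy 2-D vectors below are illustrative only.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so that a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def assign_to_medoids(embeddings: np.ndarray, medoids: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (cluster index, cosine similarity) for each row of `embeddings`.

    embeddings: (n, d) embeddings of new bike lane images
    medoids:    (k, d) medoid embeddings from the clustering step
    """
    sims = l2_normalize(embeddings) @ l2_normalize(medoids).T  # (n, k) cosine similarities
    labels = sims.argmax(axis=1)                                # most similar medoid per image
    return labels, sims[np.arange(len(labels)), labels]

medoids = np.array([[1.0, 0.0], [0.0, 1.0]])
new_images = np.array([[0.9, 0.1], [0.2, 0.8], [3.0, 2.0]])
labels, best_sim = assign_to_medoids(new_images, medoids)
print(labels)  # [0 1 0]
```

Returning the winning similarity alongside the label makes it possible to treat low-similarity assignments as uncertain, which fits the "soft label" framing; whether the project uses such a cut-off is not stated.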
The clustering is performed in four steps:
- Extract embeddings with DINOv2 (ViT-S/14) in s01_extract_embeddings.ipynb
- Use the elbow method to determine the number of clusters to explore in s02_elbow_4_kmedoids.ipynb
- Perform k-medoids clustering with 2 clusters in s03_kmedoids_clustering.ipynb
- Visualize the clustering results with t-SNE in s04_tSNE_visualization.ipynb
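The elbow and k-medoids steps above can be sketched without a dedicated clustering library. This is a minimal sketch, assuming the embeddings are stacked in an (n, d) NumPy array and that cosine distance is used (consistent with the cosine-similarity assignment of new images). The alternating medoid update and the farthest-first initialisation are illustrative choices, not necessarily what the notebooks use, and synthetic 3-D vectors stand in for real DINOv2 embeddings. The t-SNE step is not shown; scikit-learn's `sklearn.manifold.TSNE` is one common option for it.

```python
import numpy as np

def cosine_distance_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances (1 - cosine similarity) between the rows of x."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    dist = 1.0 - xn @ xn.T
    np.fill_diagonal(dist, 0.0)  # remove floating-point noise on the diagonal
    return dist

def kmedoids(dist: np.ndarray, k: int, max_iter: int = 100):
    """Alternating k-medoids on a precomputed (n, n) distance matrix.

    Returns (medoid indices, cluster labels, total distance to assigned medoids).
    """
    # Farthest-first initialisation: start from the most central point, then
    # repeatedly add the point farthest from all medoids chosen so far.
    medoids = [int(dist.sum(axis=1).argmin())]
    while len(medoids) < k:
        medoids.append(int(dist[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        labels = dist[:, medoids].argmin(axis=1)  # assign each point to its closest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if its cluster is empty
            # New medoid: the member with the smallest total distance to its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids no longer change
        medoids = new_medoids

    labels = dist[:, medoids].argmin(axis=1)
    cost = dist[np.arange(len(dist)), medoids[labels]].sum()
    return medoids, labels, float(cost)

def elbow_costs(dist: np.ndarray, ks) -> dict:
    """Total clustering cost per candidate k; the 'elbow' is where gains flatten out."""
    return {k: kmedoids(dist, k)[2] for k in ks}

# Synthetic stand-in for embeddings: two groups pointing in different directions.
rng = np.random.default_rng(42)
group_a = rng.normal([5.0, 0.0, 0.0], 0.3, size=(20, 3))
group_b = rng.normal([0.0, 5.0, 0.0], 0.3, size=(20, 3))
dist = cosine_distance_matrix(np.vstack([group_a, group_b]))

costs = elbow_costs(dist, range(1, 5))
medoids, labels, cost = kmedoids(dist, k=2)
print(costs)
print(labels)
```

On this toy data the cost drops sharply from k=1 to k=2 and only marginally afterwards, which is the pattern the elbow method looks for.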
If no bike lane was detected, we instead ran the image through a small EfficientNetV2, which we fine-tuned on around 450 hand-labelled images of asphalt and cobblestone streets. The result is a binary classifier that outputs either asphalt or cobblestones.
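The two branches of the pipeline can be tied together with a small routing function. This is a sketch in which the bike-lane check, the medoid assignment and the surface classifier are passed in as callables (all names here are hypothetical), and the surface classifier is assumed to return a single cobblestone probability thresholded at 0.5. The model internals are not shown; with torchvision, for instance, fine-tuning could start from `torchvision.models.efficientnet_v2_s` with its final linear layer replaced by a two-output layer.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ImageLabel:
    kind: str   # "bike_lane" or "surface"
    value: str  # bike lane cluster name, or "asphalt" / "cobblestone"

def label_image(
    image: Any,
    detect_bike_lane: Callable[[Any], bool],
    bike_lane_cluster: Callable[[Any], str],
    cobblestone_prob: Callable[[Any], float],
    threshold: float = 0.5,
) -> ImageLabel:
    # Stage 1: segmentation says there is a bike lane, so label it by cluster.
    if detect_bike_lane(image):
        return ImageLabel("bike_lane", bike_lane_cluster(image))
    # Stage 2: no bike lane, so fall back to the binary surface classifier.
    p = cobblestone_prob(image)
    return ImageLabel("surface", "cobblestone" if p >= threshold else "asphalt")

# Usage with stub callables standing in for the real models:
result = label_image(
    "street.jpg",
    detect_bike_lane=lambda img: False,
    bike_lane_cluster=lambda img: "cluster_0",
    cobblestone_prob=lambda img: 0.83,
)
print(result)  # ImageLabel(kind='surface', value='cobblestone')
```

Keeping the models behind plain callables makes the routing logic testable without loading any weights.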
This gave us a label for every road in Berlin covered by the images. We then attached these labels to a map, assigning each road a weight according to its surface and level of safety. To do this, we converted an OpenStreetMap (OSM) map of Berlin into a NetworkX graph and matched the coordinates of each labelled image to the coordinates of the graph's edges.
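The matching and weighting step can be illustrated on a toy graph. This sketch assumes planar (projected) node coordinates stored in a hypothetical `pos` node attribute, a `length` attribute on every edge, nearest-segment matching as the way a labelled image is attached to an edge, and made-up penalty factors in `SURFACE_PENALTY`; the project's actual weights are not given here.

```python
import math
import networkx as nx

# Hypothetical cost multipliers: values above 1 make an edge less attractive for routing.
SURFACE_PENALTY = {"bike_lane": 0.8, "asphalt": 1.0, "cobblestone": 2.0}

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Project p onto the segment and clamp to its end points.
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_edge(G, p):
    # Linear scan over all edges: fine for a toy graph, too slow for a whole city.
    pos = nx.get_node_attributes(G, "pos")
    return min(G.edges, key=lambda e: point_segment_distance(p, pos[e[0]], pos[e[1]]))

def apply_labels(G, labelled_points):
    """labelled_points: iterable of ((x, y), label). Sets a 'weight' on every edge."""
    for p, label in labelled_points:
        u, v = nearest_edge(G, p)
        G.edges[u, v]["label"] = label
    for _, _, data in G.edges(data=True):
        # Unlabelled edges keep their plain length.
        data["weight"] = data["length"] * SURFACE_PENALTY.get(data.get("label"), 1.0)

# Toy street grid: a unit square with two possible routes from A to C.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (0.0, 1.0)}
G = nx.Graph()
for node, xy in coords.items():
    G.add_node(node, pos=xy)
for u, v in [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]:
    G.add_edge(u, v, length=math.dist(coords[u], coords[v]))

apply_labels(G, [((0.5, 0.05), "cobblestone"), ((0.05, 0.5), "bike_lane")])
print(nx.shortest_path(G, "A", "C", weight="weight"))  # ['A', 'D', 'C']
```

Both routes have the same physical length, so the weighting alone steers the route away from the cobbled edge and onto the bike lane. For the real OSM graph, which is a multigraph in geographic coordinates, a projected graph and a spatial index (such as the nearest-edge lookup in OSMnx) would replace the linear scan.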
