This is the repository for a keypoint detection machine learning model that accurately measures snowdepth from timelapse imagery of snowpoles. The model in this repository is a finetuned version of the model originally created by Breen et al. (2024). The finetuned version seeks to accurately predict the snowdepth from images taken during the day and night. The images used for this finetuning were taken in the Sleeper's River Research Watershed in Danville, VT. This model was finetuned on only 95 images, so the accuracy is still being improved as of December 2024.
Navigating This Repository
To give credit where credit is due, the instructions on how to finetune this model are available on Catherine Breen's GitHub: https://github.com/catherine-m-breen/snowpoles. All of Breen's original folders and provided code are included in this repository in addition to the finetuned code.
This document will go over how to approach finetuning Breen's model using Google Colab and the ImageJ software. No specific hardware is required; a CPU is totally fine.
Before Coding
Before beginning to code, you will need to set up a folder in your Google Drive that contains all of the photos you would like to finetune the model with. Additionally, you'll want to create a copy of Breen's GitHub repository (link above) to make your edits.
Initial Set Up
In a new Google Colab file, you'll want to mount your Google Drive and clone your copy of the repository, then set up Breen's Python environment and run the 'demo' version of the original model. The demo is stored in the 'src' folder, which contains the code written by Breen.
```python
# google.colab is preinstalled on Colab, so no pip install is needed
from google.colab import drive
drive.mount('/content/drive')

# Clone your copy of the repository and move into it
!git clone [LINK TO CLONED REPOSITORY]
%cd [CLONED REPOSITORY]

# Note: Colab does not ship with conda, and `!conda activate` does not
# persist between cells; either install conda first or pip-install the
# packages listed in environment.yml
!conda env update -f environment.yml
!conda activate snowkeypoint

# Run the demo version of the original model
!python src/demo.py
```
Preprocessing to Retrain
Retraining the model includes four steps:

1. Labeling your photos
2. Finetuning the model with the train.py module
3. Predicting the pole lengths using predict.py
4. Optionally using depth_conversion.py to get the snowdepth

The predict.py module will also give you the snowdepth, so for this initial finetuning I did not use the depth_conversion.py module.
Renaming Photos & Labeling

The rename_photos.py module checks to make sure the image filenames are in the appropriate format for the model and updates them if not. It assumes that the photos are in a folder labeled with the camera name, although this is not important if all photos are from the same camera.
Labeling your photos in Google Colab will be a little different from what is described in Breen's README document. The labeling.py module uses the ginput function, which is interactive and not available on Google Colab. Instead, use ImageJ (https://ij.imjoy.io/) to find the pixel coordinates of the top and bottom of the snowpole in each image. These pixel coordinates should be put in a .csv file along with each image's filename and datetime. You should make another .csv file with the filename, datetime, and actual snowdepth. Upload both .csvs into your Google Drive, and later load them into the labeling.py module.
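As a concrete illustration, the coordinates file might look like the sketch below. The column names, filenames, and values here are all hypothetical — use whatever names your edited labeling.py expects:

```python
import csv
import io

# Hypothetical layout for the ImageJ pixel-coordinate .csv; every value
# below is made up for illustration only.
coords_csv = """filename,datetime,top_x,top_y,bottom_x,bottom_y
SITE1_0001.JPG,2024-01-15 13:00,612,188,618,902
SITE1_0002.JPG,2024-01-16 01:00,610,190,617,899
"""

# Read it back the same way pandas.read_csv would see it
rows = list(csv.DictReader(io.StringIO(coords_csv)))
print(rows[0]["filename"], rows[0]["top_x"])
```

The second .csv (actual snowdepths) would follow the same pattern, with a snowdepth column in place of the coordinate columns.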
In the Google Colab file where you set your working directory to your repository:

```
!python preprocess/rename_photos.py
!python code/editlabeling.py --datapath /content/drive/MyDrive/[IMAGE FOLDER] --pole_length [POLE LENGTH IN CM] --subset_to_label [# BETWEEN LABELED IMAGES]
```
This is the code that allows you to import a .csv file from ImageJ with the pixel coordinates into the editlabeling.py module:

```python
coords_df = pd.read_csv('/content/drive/MyDrive/[IMAGEJ PIXEL COORDINATES].csv')
```

Later in the main argument, after '# loop to label nth photo!':

```python
lengths_df = None
if args.lengths_file:
    lengths_df = pd.read_csv(args.lengths_file)
```

This sets up loading the second .csv file with the actual snowdepths, which is used later in the script:
```python
if lengths_df is not None:
    length_row = lengths_df[lengths_df['filename'] == base_filename]
else:
    length_row = pd.read_csv('/content/drive/MyDrive/[ACTUAL SNOWDEPTHS].csv')
```
After making these changes and running the module, you should get a labels.csv and a metadata.csv.
Training the Model
In the original Google Colab file, you're going to run the command:

```
!python Code/updatetrain.py
```
However, in order for updatetrain.py to run, you're going to need to make edits to the updateconfig.py, updateutils.py, and updatedataset.py modules. The edits will simply consist of putting in the Google Drive paths for your image directory/folder, the folder where you would like the trained model objects to be stored, and the paths to the metadata and labels .csv files created by the editlabeling.py module. This will make sure that you are properly retraining the model with your data, and will split your images into a training and a validation dataset. These will be stored as .csv files in whatever folder you specify.
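The train/validation split behaves roughly like the sketch below. The function name, 80/20 ratio, and fixed seed are assumptions for illustration; the actual split lives inside the update* modules:

```python
import random

def split_dataset(filenames, train_frac=0.8, seed=42):
    """Shuffle the labeled images and split them into train/validation lists."""
    files = list(filenames)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(files)
    n_train = round(len(files) * train_frac)
    return files[:n_train], files[n_train:]

# 95 labeled images, as in this project, gives a 76/19 split
images = [f"SITE1_{i:04d}.JPG" for i in range(95)]
train, val = split_dataset(images)
print(len(train), len(val))  # 76 19
```

Writing these two lists out as .csv files (one row per image, with its label) reproduces the training/validation files the modules store in your specified folder.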
After making your updates, be sure that the new module names (if changed) are reflected in the updatetrain.py module. It's also important to have copies of model.py and downloadmodel.py in your Google Drive so that they can be accessed by the other modules if necessary. updatetrain.py will ultimately create model.pth and modelepoch0.pth along with the training and validation loss figure, all of which will be stored in your model output folder.
Snowdepth Predictions
The last part of finetuning the model is running the predictions! This is done by updatepredict.py.
In your main Google Colab file:
```
!python Code/updatepredict.py \
  --model_path /content/drive/MyDrive/[FOLDER WHERE MODEL IS]/model.pth \
  --img_dir /content/drive/MyDrive/[IMAGE FOLDER] \
  --metadata /content/drive/MyDrive/[FOLDER WHERE METADATA IS]/pole_metadata.csv
```
Because this initial finetuning was run in Google Colab, where RAM and memory are limited, a quantized model had to be used. This is reflected in the updatepredict.py module:
```python
# Inside the model-loading function: dynamically quantize the Linear
# layers to 8-bit integers to reduce memory usage
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
quantized_model.eval()
return quantized_model
```
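For context, dynamic quantization can be tried end-to-end on a small stand-in network. The layer sizes below are made up for illustration; the real model in this project is the finetuned ResNet-50, but the quantization call is the same:

```python
import torch

# Small stand-in network (hypothetical sizes); dynamic quantization
# replaces the Linear layers with 8-bit integer versions at load time
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(16, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 4),  # 4 outputs: (x, y) for the top and bottom keypoints
)

quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
quantized_model.eval()

# The quantized model is called exactly like the original float model
out = quantized_model(torch.randn(1, 1, 4, 4))
print(out.shape)  # torch.Size([1, 4])
```

Only the weights are stored as int8; activations are quantized on the fly, which is why this reduces memory at a modest cost in accuracy.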
Be sure to update the paths to your image directory and other modules as well. This module will produce a results.csv with the predicted snowdepths for your validation set. These can easily be combined in a separate .csv with the actual snowdepths for statistics and figure creation.
As the finetuned model is not complete and the original model is not mine, pull requests will not be considered at this time. There is no specific license for this model, and credit for the creation of the original model should be attributed to Catherine Breen (et al.).
Introduction/Overview
Snowdepth measurement is extremely important, as snowdepth is linked to other snow properties and to decisions regarding water resource management. One of these properties is Snow Water Equivalent (SWE), which is the amount of water that would be available from a snowpack if it melted entirely. SWE is important to measure because snow and snowmelt play a large role in the hydrological cycle for many areas of the world (Hill et al. 2019). For example, if there is a high amount of SWE across a large area, the melting of the snow could cause flooding due to the amount of water that would be released. However, SWE is hard to measure and sometimes sensitive to the method of measurement as well (Hill et al. 2019). Snowdepth is an easy-to-measure and unambiguous property of a snowpack, and many studies have found a relationship between snowdepth and SWE. Therefore, measuring snowdepth can give some insight into what the SWE of a snowpack might be.
There are many ways to measure snowdepth, but one of the most cost- and time-effective methods is using time-lapse photography and snowpoles. A time-lapse camera photographs a snowpole over a long period, and the snowdepth can be read from how much of the pole remains visible relative to its full height. However, thousands of images may be generated over a single season, which adds an enormous amount of time to data analysis. Therefore, an automated method of labeling the snowdepth in these images is needed. Keypoint detection models, which are a type of convolutional neural network, can locate points that stand out in a single image (Breen et al. 2024). Training a keypoint detection model to automate snowdepth extraction from snowpole time-lapse photography can improve the efficiency of analyzing large snowdepth datasets. In addition to showing how snowdepth changes over time, a model like this can provide insights about SWE at a particular site faster than manual snowdepth measurement of the images.
A keypoint detection model is a type of deep learning model (Gupta et al. 2022; Breen et al. 2024). Deep learning is a division of machine learning that uses learned data to make choices about newly presented data (Gupta et al. 2022). One of the most common deep learning models is the Convolutional Neural Network (CNN). CNN models are unique compared to non-deep-learning models in that they don’t require human feature extraction to make choices about newly presented data, meaning that the model itself can extract features without user involvement (Gupta et al. 2022). The “neural network” part of the name comes from the fact that these models are loosely modeled on the human brain (Gupta et al. 2022). This is especially true of CNN models, where multiple layers between the input and output perform matrix–vector multiplication to break down images in order to analyze them correctly (Gupta et al. 2022). These are called the convolutional layers. Additionally, there are “Max Pooling” layers, where the data is reduced in dimension while still maintaining the crucial information before it gets sent to the next layer (Gupta et al. 2022). The most common models used for snowdepth modeling are CNN keypoint detection models, and it’s through the linear algebra of the convolutional and Max Pooling layers that keypoints can be extracted.
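The matrix operations described above can be sketched in plain numpy. This toy example (image, kernel, and sizes all invented for illustration) shows a single convolutional filter producing a feature map, followed by 2x2 max pooling halving each dimension:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (really cross-correlation, as in CNN libraries)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Elementwise multiply the filter with the overlapping patch and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge_kernel = np.array([[1.0, -1.0]])             # simple horizontal-difference filter
features = conv2d(image, edge_kernel)             # 6x5 feature map
pooled = max_pool(features)                       # 3x2 after 2x2 max pooling
```

A real CNN stacks many such filters (with learned weights) and pooling stages, but the dimension-reducing arithmetic is the same.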
Very recently, a CNN keypoint detection model for snowdepth was created and published by Breen et al. (2024). They first trained the model with images from 20 sites in Colorado. They used red poles to stand out against the green vegetation, white snow, and blue sky, and the cameras took 2-3 images daily between 11 am and 1 pm. They then used the pre-trained Colorado model on 12 sites in Washington with a similar red pole setup, where one photo was taken every hour. However, they only used images taken at 12 pm, to stay consistent with the pre-trained model and because previous studies had found inaccurate labeling for nighttime images. They first manually found the length of the pole (in pixels) so as to create a dataset to validate the model’s results. Then, they used two points in each image, the top and bottom of the pole, as the “keypoint” features in the model. These values were then predicted by the model using the same pixel-measuring technique that was done manually. Their Colorado model was extremely accurate, with an R2 of 0.99 when comparing predicted depths with the actual measured depths (Breen et al. 2024). However, the Colorado model was not able to detect the top or bottom of the pole very well for the images taken in Washington. Once the model was trained with some additional images from Washington, it performed much better, though this came at the cost of lower snowdepth accuracy for images from Colorado (Breen et al. 2024). Overall, Breen et al. were able to successfully create a machine learning model that measures snowdepth from timelapse photography of snowpoles.
One of the past studies that found issues with poorly lit images was published in 2021 by Bongio et al. They used time-lapse photography to measure snowdepth via an automated procedure that focuses on differences in brightness to distinguish between the snowpack and snow stakes in a digital image. Time-lapse photography was analyzed from the Italian Alps and the Arctic boreal forest. The point of this study was to understand the proper geometric and parameter configurations for the field setup, the sources of error in the automated procedure, and whether snowdepth estimation can be made more accurate with stakes that have 1 cm spacing markers. The authors commented that running their algorithm several times with different parameter combinations was the best method, as an “ensemble” mean snowdepth would be more accurate (Bongio et al. 2021). However, they also commented on the failure of the algorithm when atmospheric conditions are bad, and the many corrections that need to be made to measure snowdepth from photos taken during poor atmospheric conditions. In the Arctic boreal forest, their procedure was somewhat successful. Over the course of four years, the automated estimates were close to manually measured snowdepths for two of the years. The reference snowdepths were taken by an ultrasonic snowdepth sensor, which validated the data taken by the time-lapse photography. However, the ultrasonic sensor had poor resolution, resulting in only 105 values measured by both the sensor and the time-lapse photography for comparison (Bongio et al. 2021). The two sites in the Italian Alps were not as successful in accurately measuring snowdepth, in some cases due to poor atmospheric conditions and in others due to poor markers on the snow stake itself. The authors concluded that there should be 0.01 m spacing on the stakes in order to account for snowpack alteration due to human or animal footprints (Bongio et al. 2021). They also stress using a high-resolution camera and positioning the snow stake at a maximum of 10 m from the camera in order to obtain more accurate results.
Keypoint detection models in other disciplines have already attempted to account for poor lighting. Petrakis & Partsinevelos (2023) explored how keypoint detection could be used in unstructured environments and planetary images with varying illumination, which would better allow vehicles like rovers to identify keypoints in different environments. Their dataset includes images from Earth, Mars, and the Moon. Instead of a CNN, they used an architecture called HF-net2, which utilizes a multi-teacher-student architecture. Teacher-student architecture uses ‘teacher’ models, in this case SuperPoint for keypoint detection and NetVLAD for global description, to train a ‘student’ model that can do both. Their final goal was to create a ‘student model’ SLAM system that could navigate unstructured environments without good visual cues and under intense lighting conditions. The student model was trained on the images from Earth, Mars, and the Moon with varying light and weather conditions. The model had a mean Average Precision (mAP) of 0.95 for keypoint description, which outperforms similar algorithms in this discipline. Additionally, the model had an RMSE two times lower in areas with poor features and low illumination compared to a prior model. Although this is quite different from snowdepth keypoint detection, the usage of a ‘teacher-student’ architecture could be useful in future finetuning or even in creating a new model from scratch. If a current model is unable to be finetuned to fit images taken at night, perhaps it could act as a teacher model along with another model that focuses specifically on feature extraction in poor lighting conditions.
Some scientists have tried to create models that directly give insights about SWE based on snowdepth. Odry et al. (2020) propose that an artificial neural network model can outperform other regression models for estimating SWE from snowdepth measurements and available meteorological data. The study consisted of three experiments: the first focused on developing the multilayer perceptron (MLP) model, the second on testing how the model responded to new data as compared to linear regression models, and the third on testing the different models in near-operational conditions. The results of their first experiment showed that the Root Mean Squared Error (RMSE) decreased as the number of input variables (i.e., average air temperature, snowdepth, day of year, etc.) to the MLP increased (Odry et al. 2020). When comparing the models on new data, the MLP showed more realistic results for higher SWE values than the linear regression model. However, both models were unable to accurately estimate lower values of the depth-SWE relationship, pointing to a limitation of the MLP model. When testing the models in near-operational conditions, the MLP model once again outperformed the others, having the lowest RMSE values for an entire season of snowdepth data (Odry et al. 2020). This paper poses a really interesting idea of estimating SWE without having snow density data. Although this is not related to the main purpose of this project, if I am able to successfully retrain the model to accommodate images taken during the day and night, directly incorporating SWE into the model would be the next step. The goal of this project is to create a ‘first copy’ of a snowdepth keypoint detection model that is able to accurately predict snowdepth during both the day and night. Trying to gain insight into SWE with snowdepth measurements taken only during the day may not show the entire relationship between the two for a certain area.
Additionally, some cold environments do not get much daylight during the winter season, so in order to establish a relationship at all, snowdepth at night needs to be analyzed. This will be explored through fine-tuning the most recent model created by Breen et al. (2024) on a dataset of timelapse photography taken both at night and during the day. I predict that the initial fine-tuning of this model will not yield results as accurate as those presented by Breen et al. (2024). Because the original model used only images taken during the day, initially finetuning the model with a small mix of day and night photos may not yield accurate results for the images taken at night.
Data and Methods
The data used in this machine learning model are photos taken from time-lapse photography of snowpoles at different sites across the Sleeper’s River Research Watershed in Danville, Vermont. Each site contains one snowpole that is photographed hourly. For this initial model, daily photos from 1:00 PM and 1:00 AM were used at each site. The snowpoles have measurement markers every 0.1 cm, and photos were taken using a WingScapes trail camera. The code for the model originally described in Breen et al. (2024) is open access and is the first and only version available; the uploads in all of the folders are from July 2024. There is code for a trained and an untrained model, and for the purpose of this project I will be using the trained version and attempting transfer learning between the original model and mine. The GitHub repository for the original dataset can be found through this link: https://github.com/catherine-m-breen/snowpoles.
The photos will be stored in a standardized format (.JPG) and labeled based on site and the pixel coordinates of the top and bottom of the snowpole. The dataset will be manually inspected, and photos where the snowpole cannot be seen at all will be removed. The methods for data preprocessing/processing will closely mirror those of Breen et al. (2024); however, I will be using ImageJ as opposed to ginput to manually inspect the images. This is due to using Google Colab as the virtual machine for this model, as opposed to the one described in the paper. Additionally, the photos will be located in a folder in my personal Google Drive so that they are easily compatible with the Google Colab format.
The heights of all the poles are consistent (168 cm), so the snowdepth will be calculated by subtracting the height of the visible pole in the picture from the original 168 cm. In the model, each photo will be resized to 224 x 224 pixels. Since I am using transfer learning to retrain the model on my images, I will be resizing the images to the exact dimensions used in Breen et al. The top and bottom of the pole will be predicted using the ResNet-50 CNN model from the “pytorch” Python package. Manual measurements of the snowdepth were already recorded by a previous student in the Hydroclimatology lab.
In this model, the pixel coordinates of the top and bottom of the snowpole are the main predictors, and the snowdepth is the main predictand. The height of the snowpole could be thought of as a predictor as well, but because it remains consistent throughout the photos, unlike the pixel coordinates, it isn’t the main predictor. The pixel coordinates and actual snowdepth will be stored as metadata used for training and validation, as well as for the creation of a labels.csv. The data was split into either a training (76 images) or validation (19 images) group after the height of each snowpole was manually measured. By predicting the pixel coordinates of the top and bottom of the visible snowpole given the total snowpole height, the model can convert the pixel distance between the keypoints into centimeters. If a snowpack is present, this distance is subtracted from the initially recorded height of the snowpole to determine the snowdepth in the photo.
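The pixel-to-centimeter conversion described above can be sketched as follows. This is an illustration under one stated assumption — that a snow-free reference image provides the full pole's pixel length — and the function name and example coordinates are invented, not taken from the repository:

```python
import math

def snow_depth_cm(top, bottom, ref_top, ref_bottom, pole_height_cm=168.0):
    """Estimate snowdepth from predicted keypoints.

    top/bottom: (x, y) pixel coordinates of the visible pole's keypoints.
    ref_top/ref_bottom: keypoints of the full pole in a snow-free reference image.
    """
    # cm per pixel, from the known 168 cm pole seen snow-free
    full_px = math.hypot(ref_bottom[0] - ref_top[0], ref_bottom[1] - ref_top[1])
    cm_per_px = pole_height_cm / full_px

    # length of the visible (above-snow) pole, converted to cm
    visible_px = math.hypot(bottom[0] - top[0], bottom[1] - top[1])
    visible_cm = visible_px * cm_per_px

    # whatever is buried is the snowdepth
    return pole_height_cm - visible_cm

# Hypothetical example: pole is 840 px tall snow-free (0.2 cm/px);
# with snow, only 740 px are visible -> 148 cm visible -> 20 cm of snow
depth = snow_depth_cm((618, 200), (618, 940), (618, 100), (618, 940))
print(round(depth, 2))  # 20.0
```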
The model that will be used is a keypoint detection Convolutional Neural Network (CNN). This model interprets the photos from the timelapse photography individually, with each represented as a tensor. The model has 48 convolutional layers and 2 MaxPool layers, which are filters that perform matrix calculations (convolutional layers) and take the maximum value of each region of overlap between those filters and the input data (MaxPool layers). These statistical “layers” extract the pixel coordinates of the top and bottom of the snowpole to calculate the height of the visible snowpole, and then the height of the snowpack if one is present.
The validity of the model will be evaluated by comparing the predicted snowdepth to the manually measured snowdepth. This will be done by calculating the residual error, mean absolute error (MAE), and coefficient of determination (R2) between the values. The MAE will be calculated using the “sklearn” Python package, while the coefficient of determination will be calculated using the corrcoef function in the “numpy” package. These parameters were used by Breen et al., as well as in at least three other studies of snowdepth time series models. I will also be looking at the fitting of the model based on its residual errors between images taken during the day and at night. In addition, I will be analyzing how the model predicts the snowdepth at different depth ‘groups’ so that it will be clear if the model is accurate at certain depths but not others.
Results

In order to judge the accuracy of the model, the mean residual error, R2 value, correlation coefficient, and mean absolute error (MAE) were calculated. The results are shown in Table 1 below. The mean residual error was 6.08 cm, which means that on average the predicted snowdepths were 6.08 cm less than the actual snowdepths. The R2 value was extremely low (0.11) and the correlation coefficient was -0.34. The R2 shows a weak relationship between predicted and actual snowdepths. The negative correlation coefficient implies an inverse relationship between predicted and actual snowdepths, further evidence of an inaccurate model. The mean absolute error was around 11.35 cm, which means that the average predicted snowdepth was not close at all to the actual snowdepth.
Table 1. Statistics table for the overall model, including mean residual error, R-squared value, correlation coefficient, and mean absolute error (MAE). All values were calculated based on differences between predicted and measured snowdepth (cm).
| Mean Residual Error (cm) | R-Squared | Correlation Coefficient | Mean Absolute Error (MAE) (cm) |
|---|---|---|---|
| 6.08 | 0.11 | -0.34 | 11.4 |
The line graph in Figure 1 shows how train loss and validation loss changed during model training. Although both showed a general decrease over the twenty epochs, validation loss began to increase after ten epochs, showing fitting issues with the model. The gap between train loss and validation loss stayed somewhat large throughout training. This gap suggests that the model struggled to predict the snowdepths in the validation set and relied heavily on the training set.
Figure 1. Train loss and validation loss during model training. The updatetrain.py module generated a line graph showing how train loss and validation loss changed over the 20 epochs of model training. Validation loss is shown by the pink line, while train loss is shown by the yellow line.
To take a closer look at residual errors, a box and whisker plot of residual error was constructed, grouped by snowdepth and separated by whether the image was taken during the day or at night. This is visualized in Figure 2. Residual errors were similar between daytime and nighttime images across the snowdepth groupings. However, residual errors were lowest for images with a snowdepth between 3-7 cm and 7-9 cm. On the other hand, images with snowdepths of 0-3 cm had the largest range of residual errors. Images with snowdepths of 9-12 cm and 12-25 cm also had larger residual errors, but with less of a range than the 0-3 cm images.
Figure 2. Box and whisker plot of residual error (cm) based on snowdepth group and separated by lighting conditions. Snow depth groupings were assigned through creating bins for each group. Images taken during the day are in orange, while those taken at night are in blue.
Discussion and Conclusions
The results shown above are definitely those of an initial model. Compared to the R2 of 0.99 for the original model, the R2 of 0.11 showed that the finetuned model overall did not fit the data and did not predict the snowdepths well (Table 1) (Breen et al. 2024). Additionally, the finetuned model had an MAE of 11.4 cm, which is incredibly large on its own and approximately ten times larger than the MAE of the original model (1.14 cm) (Breen et al. 2024). This large MAE confirms that the finetuned model was not accurately predicting the snowdepths, regardless of whether the images were taken at night or during the day. However, this could be entirely due to the small sample size that was used to finetune the model. If the sample size was too small, the model could have relied heavily on its training set to predict snowdepth. The training set for the finetuned model was 76 images and the validation set was 19, which is an extremely small sample to finetune the model with. Additionally, compared to the number of daytime photos in the original model and the daytime photos in the training data, there were simply not enough images taken at night, or even overall, to truly finetune the model accurately.
This is further demonstrated by Figure 1, which shows an increase in validation loss after 10 epochs. This increase shows overfitting of the model and fits with the large MAE and low R2 value. The train loss decreased over all 20 epochs; put together with the increase in validation loss, this further confirms that the model overfits the data. Additionally, a validation set of 19 images means that a poor prediction for just a few photos will largely affect the accuracy of the model. For this initial model, this makes sense and was expected. In the future, running the model for only 10 epochs could be one useful strategy for preventing overfitting, in addition to a larger sample size. The gap between train loss and validation loss is consistently large through all 20 epochs, which also implies that early stopping may need to be employed, most likely around the 10th epoch. With the small sample size and clear overfitting, it was not possible to finetune the model to accurately predict the snowdepths of images taken at night or during the daytime.
Although there were several issues with the finetuning process, one aspect that prior literature has not considered is the way that images of snowpoles are captured at night. In Figure 2, there is no extreme difference between the residual error of the snowdepths predicted during the day versus at night in any snowdepth group. These results don’t match those of prior literature, and point to possible solutions for better capturing nighttime images. It is possible that using a trail camera with flash, like the WingScapes trail cameras used here, can capture photos at night that are easier to process. The original model created by Breen et al. (2024) avoided timelapse photography taken at night. Bongio et al. (2021) found that images taken during “poor atmospheric conditions” were not accurately processed by their model. However, neither study noted whether the cameras used were equipped with flash, or whether another external light source could be used to better illuminate the snowpole at night. In the future, studies could focus on only images taken at night with different sources of illumination (i.e., camera flash, streetlights in urban areas) to better understand how to get accurate images for keypoint detection models.
Prior literature has also not looked at the residual error or fitting of models based on snowdepth groupings, which I find to be an incredibly important aspect that needs to be further studied. Looking at the residual error of the model based on snowdepth groupings could show what kind of data the model might be missing, assuming that a large residual error might come from a lack of training data in that snowdepth group. For example, in Figure 2 the residual error for images with an actual snowdepth of 3-7 and 7-9 cm is close to zero. This means that the model predicted snowdepths that were very similar to the actual depths. This makes sense, as most of the actual snowdepths were between 3 and 9 cm. However, there weren’t many images with actual depths between 0-3 cm and 12-25 cm, and the residual error for these groups was much larger. When selecting data to continue finetuning the model, this could be used to see what snowdepths need to be included in the training and validation sets. In this case, if I were to further finetune this model, I would experiment with using more images with actual snowdepths between 0-3 and 12-25 cm to see if that lowers the residual error both for those snowdepth groups and for the overall model.
There were a few caveats that could have affected the accuracy of this model. Breen et al. (2024) took out tilted photos and had a singular red pole in each photo. The photos in this dataset were all taken at a three-quarter angle, and there were multiple white poles (some for collecting temperature not directly related to the experiment) in the shot. This could have easily confused the model, as a keypoint detection model only looks for the top and bottom of one white pole. Even though the entire training set for finetuning had images with these conditions, it’s likely that the model could have had issues with identifying which pole was the one to select the keypoints from. Additionally, the results of this initial finetuning were also heavily influenced by the small sample size of the photos used. The original model by Breen et al. (2024) was trained on over 8000 photos, while this fine-tuned model only used 95 images. This small sample size probably wasn’t enough to fully retrain the model to estimate snowdepth at both day and night. This is evident through the relatively high validation loss in Figure 1, showing poor performance on the new data as the model was originally trained on so many more photos than were provided during the fine-tuning.
While completing this project, one of the largest caveats was the limited capacity of Google Colab for building and running a keypoint detection model. Google Colab provides a limited amount of free RAM, and while running updatepredict.py the session crashed several times after exhausting that RAM. To work around this, I used a quantized model so the module could run to completion and produce predicted values. Quantization is commonly used to reduce RAM and memory usage, but it can also reduce model accuracy. Because this finetuned model was based on such a small sample size, that loss of accuracy may have had a larger effect on the residual errors than it would for a model trained on more images. Keypoint detection models in prior literature are trained on more capable virtual machines built for robust machine learning workloads; those machines likely do not face the same memory limitations and could run the non-quantized model. Such resources were not available to me at this time, so the initial model had to be built on Google Colab. In the future, this model can certainly be improved, and a large part of that will be rerunning the modules on a better virtual machine.
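As a sketch of the quantization workaround, PyTorch's dynamic quantization converts layer weights to 8-bit integers, cutting memory use at some cost in accuracy. The toy model below is a hypothetical stand-in (the actual model in src/ is a deeper convolutional network); note that dynamic quantization only covers Linear/LSTM-style layers, so a conv-heavy model would instead need static quantization or half precision.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the keypoint model, NOT the real architecture.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 4),  # (x, y) for the top and bottom keypoints
)
model.eval()

# Dynamic quantization: Linear weights are stored as int8 and dequantized
# on the fly, reducing RAM use. Conv2d layers are left untouched by this
# method, so a ResNet-style model would need static quantization instead.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    dummy = torch.randn(1, 3, 64, 64)  # one fake RGB image
    keypoints = quantized(dummy)
print(keypoints.shape)  # torch.Size([1, 4])
```

On Colab's free tier, this kind of weight compression is often the difference between a prediction pass completing and the session running out of RAM.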
This initial model does not definitively answer whether the original model can be retrained to accurately predict snowdepth at night, given the number and nature of the study's limitations. Using a more robust virtual machine that could handle several thousand images, rather than 95, would clarify the question considerably. The model was not well fitted for most of the data, as seen in the residual errors in Figure 2 and the overall statistics in Table 1. The poor fit might also be due in part to the quantized model, which would not be needed on a better virtual machine. Based on these statistics and the study's limitations, there is no clear answer as to whether the model can be retrained to accurately predict snowdepths at night. However, the similar residual errors between daytime and nighttime images across all snowdepth groups show that day and night images were equally (if poorly) fitted by the model (Figure 2). If nighttime images could not be accurately predicted at all, they would show larger residual errors across all snowdepth groups. That is not the case here, which gives me some hope that the model could be finetuned to measure nighttime snowdepths.
This project was intended as a 'first copy' of a snowdepth keypoint detection model using 24-hour timelapse photography, as prior studies have either ignored nighttime images or had trouble using them in their models. Given the important relationship between snowdepth and properties such as snow water equivalent (SWE), accurate measurements at all times of day matter. This keypoint detection model was based on the original model created by Breen et al. (2024), a 50-layer convolutional neural network (CNN), finetuned here to predict snowdepths from snowpole imagery taken during the day and night. The model was finetuned using Google Colab, an accessible but less robust virtual machine. As expected, the research question of whether the original model can be retrained to accurately detect snowpole keypoints at different times of day remains open. The model was, overall, not well fit to the data, and the sample size of 95 images is too small to assume these statistics would hold for a model finetuned on a larger sample. However, the similar residual errors between snowdepth groups for images taken at both day and night give promise that this consistency in fit might continue if the model is retrained on a larger set of images. Additionally, running a non-quantized version of the model could yield a more accurate fit. With a larger sample size, a better virtual machine, and a bit more time, this model could be retrained to accurately detect snowpole keypoints both day and night across all snowdepths.
References
Bongio, M., Arslan, A. N., Tanis, C. M., & De Michele, C. (2021). Snow depth time series retrieval by time-lapse photography: Finnish and Italian case studies. The Cryosphere, 15(1), 369–387. https://doi.org/10.5194/tc-15-369-2021
Breen, C. M., Currier, W. R., Vuyovich, C., Miao, Z., & Prugh, L. R. (2024). Snow Depth Extraction From Time‐Lapse Imagery Using a Keypoint Deep Learning Model. Water Resources Research, 60(7), e2023WR036682. https://doi.org/10.1029/2023WR036682
Gupta, J., Pathak, S., & Kumar, G. (2022). Deep Learning (CNN) and Transfer Learning: A Review. Journal of Physics: Conference Series, 2273(1), 012029. https://doi.org/10.1088/1742-6596/2273/1/012029
Hill, D. F., Burakowski, E. A., Crumley, R. L., Keon, J., Hu, J. M., Arendt, A. A., Wikstrom Jones, K., & Wolken, G. J. (2019). Converting snow depth to snow water equivalent using climatological variables. The Cryosphere, 13, 1767–1784. https://doi.org/10.5194/tc-13-1767-2019
Odry, J., Boucher, M. A., Cantet, P., Lachance-Cloutier, S., Turcotte, R., & St-Louis, P. Y. (2020). Using artificial neural networks to estimate snow water equivalent from snow depth. Canadian Water Resources Journal / Revue Canadienne Des Ressources Hydriques, 45(3), 252–268. https://doi.org/10.1080/07011784.2020.1796817
Petrakis, G., & Partsinevelos, P. (2023). Keypoint Detection and Description through Deep Learning in Unstructured Environments. Robotics, 12(5), 137. https://doi.org/10.3390/robotics12050137