- handreak80
- motokimura
- TheRealRoman
- mawanda-jun
- chok68
- (Graduate Student) Cuistiano
- (Undergraduate Student) pjmathematician
Each solution includes code and a description of the winner's method.
The public training and testing data are available for download at the following links:
- Public Training: https://spacenet-dataset.s3.us-east-1.amazonaws.com/spacenet/SN9_cross-modal/train.zip
- Public Testing: https://spacenet-dataset.s3.us-east-1.amazonaws.com/spacenet/SN9_cross-modal/testpublic.zip
Swift and effective disaster response often relies on the integration and analysis of diverse remote sensing data sources such as electro-optical (EO) and Synthetic Aperture Radar (SAR) imagery. However, the co-registration of optical and SAR imagery remains a major challenge due to the inherent differences in their acquisition methods and data characteristics. SpaceNet 9 aims to address this issue by focusing on cross-modal image registration, a critical preprocessing step for disaster analysis and recovery.
Participants in this challenge will develop algorithms to compute pixel-wise spatial transformations between optical imagery and SAR imagery, specifically in earthquake-affected regions. These algorithms will be evaluated for their ability to align tie-points across modalities, enabling better downstream analytics such as damage assessment and change detection.
To support the competition, the challenge provides a dataset consisting of high-resolution optical imagery from the Maxar Open Data Program and SAR imagery from UMBRA. The dataset includes manually labeled tie-points to evaluate registration quality.
The objective of this challenge is to create algorithms that take two input images, an optical image and a SAR image, and output a two-channel transformation map. The two channels represent the x and y shifts, respectively, required to align the optical image with the SAR image.
The accuracy of the output will be evaluated using tie-points, which are manually identified matching features (e.g., road intersections) between the two images. The spatial transformation predicted by the algorithm will be applied to the tie-points in the optical image, and the alignment quality will be assessed based on the distance between the transformed points and their corresponding reference points in the SAR image.
The dataset consists of three parts, intended for:
- development (training)
- public (provisional) testing during the submission phase
- final testing after the submission phase.
Each part contains one or more file triples of the form (optical image, SAR image, manually labeled tie-points).
The files available for download are:
- Development:
- 02_optical_train_01.tif, 02_sar_train_01.tif, 02_tiepoints_train_01.csv
- 02_optical_train_02.tif, 02_sar_train_02.tif, 02_tiepoints_train_02.csv
- 03_optical_train_01.tif, 03_sar_train_01.tif, 03_tiepoints_train_01.csv
- Public testing:
- 02_optical_publictest.tif, 02_sar_publictest.tif
- 03_optical_publictest.tif, 03_sar_publictest.tif
The remaining files, which are hidden and will be used for evaluation, are:
- Public testing:
- 02_tiepoints_publictest.csv
- 03_tiepoints_publictest.csv
- Final testing:
- 01_optical_privatetest.tif
- 01_sar_privatetest.tif
- 01_tiepoints_privatetest.csv
The two-digit prefix in each file name identifies the area of interest (AOI) where the images were captured. The development and public testing parts cover two AOIs, each split into several subimages, while the final testing part contains a single file triple from a different AOI. All images are less than 13000 pixels in both width and height.
The optical and SAR images are TIFF images captured by satellites. Both contain high-resolution imagery with a 0.3–0.5 meter cell size. Read the paper mentioned in the challenge overview for more details on the differences in how these images were obtained and what information they contain. All images include embedded GeoTIFF metadata. The optical images contain three channels in (red, green, blue) order; the SAR images contain a single channel.
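The snippet below is a minimal sketch of how one image pair could be inspected, assuming the rasterio package (not part of the official tooling) and the development file names listed above.

```python
# Minimal sketch: inspecting one optical/SAR pair with rasterio (an assumed
# dependency, not part of the official tooling). File names are from the
# development set listed above.
import rasterio

with rasterio.open("02_optical_train_01.tif") as opt:
    optical = opt.read()               # shape (3, height, width), (R, G, B) band order
    optical_transform = opt.transform  # affine pixel -> geographical mapping
    optical_crs = opt.crs

with rasterio.open("02_sar_train_01.tif") as sar:
    sar_band = sar.read(1)             # single band, shape (height, width)
    sar_transform = sar.transform
    sar_crs = sar.crs

print(optical.shape, sar_band.shape)
print(optical_transform, sar_transform)
```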
The CSV files with tie-points contain a header and four columns. Each row corresponds to one manually labeled tie-point. The first two columns give the pixel coordinates of the tie-point in the SAR image, and the last two give its pixel coordinates in the optical image:
sar_row,sar_col,optical_row,optical_col
189,4969,430,6723
209,408,499,794
238,296,537,635
...
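As an illustration, the tie-points could be loaded as follows; this is a minimal sketch assuming pandas (not part of the official tooling), with column names taken from the CSV header above.

```python
# Minimal sketch: loading a tie-point file with pandas (an assumed dependency).
# Column names match the CSV header shown above.
import pandas as pd

tiepoints = pd.read_csv("02_tiepoints_train_01.csv")

# P  : tie-point pixel coordinates in the optical image (3rd and 4th columns)
# P' : corresponding reference coordinates in the SAR image (1st and 2nd columns)
optical_points = tiepoints[["optical_row", "optical_col"]].to_numpy()
sar_points = tiepoints[["sar_row", "sar_col"]].to_numpy()
```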
The algorithm must produce a TIFF file with two channels and the same size as the input optical image. The first channel represents the x-shift and the second channel the y-shift needed to project the corresponding optical pixel to its correct location within the SAR image. The shifts must be expressed in optical image pixel coordinates, but they are derived from transformations in geographical coordinates: the reference shifts are first determined in the geographical coordinate system and then rescaled to the optical image’s pixel grid for evaluation. See the Scoring section for details on this transformation process.
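Below is a minimal sketch of writing a conforming output file with rasterio (an assumed dependency), reusing the optical image's size and georeferencing; the zero-filled arrays and the output name "shift_map.tif" are placeholders, not official conventions.

```python
# Minimal sketch: writing a two-channel shift map with rasterio (an assumed
# dependency). The output name "shift_map.tif" is illustrative; follow the
# submission instructions for actual file naming.
import numpy as np
import rasterio

with rasterio.open("02_optical_publictest.tif") as opt:
    height, width = opt.height, opt.width
    crs, transform = opt.crs, opt.transform

# Placeholder predictions; a real solution fills these with model output.
dx = np.zeros((height, width), dtype=np.float32)  # x-shift per optical pixel
dy = np.zeros((height, width), dtype=np.float32)  # y-shift per optical pixel

with rasterio.open(
    "shift_map.tif", "w", driver="GTiff",
    height=height, width=width, count=2, dtype="float32",
    crs=crs, transform=transform,
) as dst:
    dst.write(dx, 1)  # channel 1: x-shift
    dst.write(dy, 2)  # channel 2: y-shift
```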
The performance of each submission will be evaluated based on the accuracy of the predicted transformation map. Let (Δx, Δy) denote the predicted pixel-wise transformation map, where Δx and Δy represent the x and y shifts, respectively, as returned by the participant’s solution.
The evaluation uses a set of tie-points P = {(r_i, c_i)}, i = 1, 2, ..., N, manually identified in the optical image, with corresponding reference tie-points P′ = {(r′_i, c′_i)} in the SAR image. That is, P is formed from the 3rd and 4th columns and P′ from the 1st and 2nd columns of the CSV file with manually labeled tie-points.
Sets P and P′ are first translated into geographical coordinates using the metadata contained in the input optical/SAR image (see method translate_tiepoints_to_geo_coords in the scoring script): (r_i, c_i) → (x_i, y_i), (r′_i, c′_i) → (x′_i, y′_i).
Then the reference shifts in geographical coordinates are calculated: Δx_i^georef = x′_i − x_i, Δy_i^georef = y′_i − y_i.
These shifts are rescaled from geographical coordinates back into optical image pixel units: Δx_i^georef → Δx_i^ref, Δy_i^georef → Δy_i^ref.
The error for each tie-point is calculated as the Euclidean distance between the predicted and reference shifts: Err_i = sqrt((Δx(r_i, c_i) − Δx_i^ref)² + (Δy(r_i, c_i) − Δy_i^ref)²),
where Δx(r_i, c_i) and Δy(r_i, c_i) are the predicted shifts at the tie-point coordinates (r_i, c_i).
The overall raw score for a submission is the mean error across all tie-points: RS = (Err_1 + Err_2 + ... + Err_N) / N.
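The following sketch approximates these scoring steps for local sanity checks. It assumes rasterio affine transforms, a metric CRS shared by both images, and a particular rescaling convention (dividing by the optical pixel size); the function name raw_score is illustrative and the official scoring script remains authoritative.

```python
# Simplified sketch of the scoring steps above, for local sanity checks only.
# Assumes rasterio affine transforms and that geographic shifts can be rescaled
# by the optical pixel size (opt_tf.a, opt_tf.e); the official script is authoritative.
import numpy as np
import rasterio

def raw_score(optical_path, sar_path, tiepoints, dx, dy):
    """tiepoints: rows of (sar_row, sar_col, optical_row, optical_col);
    dx, dy: predicted shift maps with the optical image's shape."""
    with rasterio.open(optical_path) as opt, rasterio.open(sar_path) as sar:
        opt_tf, sar_tf = opt.transform, sar.transform

    errors = []
    for sar_r, sar_c, opt_r, opt_c in tiepoints:
        # Pixel -> geographical coordinates (column maps to x, row maps to y).
        x, y = opt_tf * (opt_c, opt_r)
        x_ref, y_ref = sar_tf * (sar_c, sar_r)

        # Reference shifts in geographical coordinates ...
        dx_geo, dy_geo = x_ref - x, y_ref - y
        # ... rescaled to optical pixel units (assumed convention).
        dx_ref = dx_geo / opt_tf.a
        dy_ref = dy_geo / opt_tf.e

        # Euclidean error between predicted and reference shift at this tie-point.
        pred_dx = dx[int(opt_r), int(opt_c)]
        pred_dy = dy[int(opt_r), int(opt_c)]
        errors.append(np.hypot(pred_dx - dx_ref, pred_dy - dy_ref))

    return float(np.mean(errors))  # RS: mean error across all tie-points
```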
For leaderboard purposes, the raw score is mapped into the interval [0, 100] using the function
Score = 100 / (1 + 0.01 ⋅ average(RS)),
where average(RS) is the average raw score across all test cases (there are two test cases in provisional testing and one test case in final testing). A score of 100 indicates perfect performance, corresponding to exact alignment of the tie-points.
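For example, an average raw error of 10 optical pixels yields Score = 100 / (1 + 0.01 ⋅ 10) ≈ 90.9, while a perfect alignment (average error 0) yields Score = 100.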
Note that only the (r_i, c_i) coordinates present among the tie-points are used for scoring. Predicted shifts at coordinates that are not tie-points are ignored.
We provide a scoring script that you can use to test your solution locally. The same scoring script will be used for provisional and final testing.