This blog post details our approach to the AI for Earthquake Response Challenge, organized by the ESA Φ-lab and the International Charter "Space and Major Disasters".
We secured 2nd place among over 250 teams by developing AI models capable of accurately detecting building damage from satellite imagery.
Challenge Overview
When an earthquake strikes, every minute counts. Rapid damage assessments can guide emergency responders to
the hardest-hit areas, prioritize resources, and ultimately save lives. But even with access to
satellite imagery, the task of mapping damaged buildings is still largely manual, and that takes time.
This is the problem that the AI for Earthquake Response Challenge, organized by the ESA Φ-lab and
the International Charter “Space and Major Disasters” set out to address.
The mission: build AI models that automatically detect building damage from
satellite imagery, accurately and at scale.

Post-event Pléiades image over Latakia, Syria, with building polygons overlaid
(colored by the damage field: green = undamaged, grey = damaged) © CNES 2023, distribution Airbus DS.
The competition was structured in two phases to reflect both research
and real-world emergency response conditions:
- Phase 1 (8 weeks) – We received pre- and post-event satellite images for 12 earthquake-affected sites.
Each site came with building footprint polygons, and most also included building-level damage labels.
The task was framed as a binary classification problem, where each building was labeled as either
0 (undamaged) or 1 (damaged). However, the setup varied:
- 7 sites were fully annotated and used for training.
- 4 sites - Adiyaman (Turkey), Antakya East (Turkey), Marash (Turkey), and Cogo (China) - were partially annotated, with some buildings labeled and others left unlabeled. The labeled portion could be used for training, while the unlabeled buildings were evaluated through the live leaderboard, where our predictions were scored against hidden ground truth. The scoring on the leaderboard was based on the F-score metric, which balances precision and recall. For this phase, we could submit predictions every 12 hours to the live leaderboard to see how our models performed.
- 1 site only included post-event imagery.
- Phase 2 (10 days) – This was the real stress test. We were given pre- and post-event imagery for
2 completely new earthquake sites. Unlike Phase 1, no labels were provided—only the building polygons.
We had to generate predictions directly, without retraining on the new data. Again, the scoring was done via a live leaderboard using the F-score metric.
This phase tested whether our models could generalize and remain accurate. For this phase, we could submit predictions every 24 hours to the live leaderboard to avoid overfitting.
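Leaderboard scoring in both phases used the F-score, which balances precision and recall. For the binary damaged/undamaged setup it can be computed in a few lines of plain Python (a minimal sketch, not the organizers' scoring code; beta = 1 gives the usual F1):

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-score for binary labels (1 = damaged, 0 = undamaged)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Weighted harmonic mean of precision and recall
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

Because the harmonic mean punishes whichever of precision or recall is lower, a model cannot score well by simply predicting "undamaged" everywhere.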
More than 250 teams participated, ranging from academic researchers to industry professionals.
Our Approach
To tackle the challenge, we iterated step by step, starting simple and gradually incorporating more complexity as we understood the dataset better.
Data Exploration
The first step was to explore the dataset in detail. We:
- Plotted the building annotations on the satellite imagery to visually confirm alignment.
- Worked with GeoPandas and Rasterio to handle building polygons and raster data.
- Discovered that the dataset was imbalanced, with far more undamaged buildings than damaged ones.

Data exploration showing class imbalance (more undamaged buildings than damaged ones).
- Realized that if we wanted to use both pre- and post-event images effectively, we needed to perform image registration. Building annotations were provided only for the post-event imagery, and the pre-event images sometimes came from different satellites with slightly different angles, resolutions, or positions. Image registration adjusts one image so that it spatially aligns with another, ensuring that each building polygon matches the correct location in both images. More details on our registration approach are provided in the next section.
This early exploration shaped our modeling strategy.
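A standard remedy for the class imbalance we observed is to weight classes by inverse frequency in the loss or sampler. A minimal NumPy sketch with hypothetical counts (the 900/100 split is illustrative, not the dataset's actual ratio):

```python
import numpy as np

# Hypothetical label array: 1 = damaged, 0 = undamaged
labels = np.array([0] * 900 + [1] * 100)

counts = np.bincount(labels)                     # per-class frequency
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
# The minority (damaged) class receives a proportionally larger weight,
# so misclassifying it costs more during training.
```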
First Model: Post-Event Only
As a baseline, we trained a model using only the post-event imagery. To do so, we extracted patches around building footprints.

Example of building patches extracted from post-event imagery for training the first model.
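Extracting a patch boils down to mapping each polygon's geographic bounds to a pixel window in the raster; in practice `rasterio.windows.from_bounds` handles this. A simplified pure-Python sketch of the arithmetic for a north-up raster (the function name, transform tuple, and margin are illustrative):

```python
def polygon_pixel_window(bounds, transform, margin=8):
    """Map a polygon's geographic bounds to a (rows, cols) pixel window.

    bounds:    (minx, miny, maxx, maxy) in the raster's CRS
    transform: (x_origin, pixel_width, y_origin, pixel_height) for a
               north-up raster (pixel_height given as a positive size)
    margin:    extra pixels of context around the footprint
    """
    minx, miny, maxx, maxy = bounds
    x0, px_w, y0, px_h = transform
    col_start = int((minx - x0) / px_w) - margin
    col_stop = int((maxx - x0) / px_w) + margin
    # Row 0 sits at the raster's *top* (maximum y), so y decreases
    # as the row index grows.
    row_start = int((y0 - maxy) / px_h) - margin
    row_stop = int((y0 - miny) / px_h) + margin
    return (max(row_start, 0), row_stop), (max(col_start, 0), col_stop)
```

The margin gives the network some surrounding context (rubble, neighboring structures) rather than the bare footprint.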
We then used fastai with a ResNet34 backbone and the fit_one_cycle training strategy. The task was framed as binary classification: 0 = undamaged, 1 = damaged.
We iterated a few times; the best results were achieved with data augmentation, which yielded the following training results:

Training results of our first model using only post-event imagery (F-score Valid = 0.89 at epoch 13).
This simple setup already provided competitive scores on the leaderboard.
| MODEL | SCORE | ADIYAMAN_F1 | ANTAKYA_EAST_F1 | CHINA_F1 | MARASH_F1 |
|---|---|---|---|---|---|
| Training Post Images Only | 0.80392 | 0.933 | 0.734 | 0.819 | 0.787 |
Second Model: Pre- and Post-Event with Siamese Network
To leverage both pre- and post-event imagery, we trained a Siamese network.
A Siamese network is a type of neural network architecture that takes two inputs and learns to compare them. In our case, the inputs were pre-event and post-event building image patches.
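Conceptually, the shared-weights comparison can be sketched in a few lines of NumPy. A toy linear encoder stands in for the real convolutional backbone; the essential point is that the *same* weights embed both patches before they are compared:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a CNN backbone: one shared linear layer + ReLU.
W = rng.normal(0, 0.1, size=(16, 64))

def encode(patch):
    """Shared embedding: identical weights for pre- and post-event inputs."""
    return np.maximum(patch @ W.T, 0.0)

def siamese_features(pre_patch, post_patch):
    # Compare the two embeddings, e.g. by absolute difference;
    # a classification head would map this to damaged/undamaged.
    return np.abs(encode(pre_patch) - encode(post_patch))

pre = rng.normal(size=64)
post = pre + rng.normal(0, 0.5, size=64)  # simulated "changed" building
feat = siamese_features(pre, post)
```

Because the encoder is shared, an unchanged building produces near-identical embeddings and a near-zero difference vector, which is exactly the signal the classifier head needs.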
Image Registration
Before feeding the images into the network, we performed feature-based image registration.
As explained in the challenge overview, building annotations were only provided for the post-event imagery,
and pre-event images sometimes came from different satellites with slightly different angles,
resolutions, or positions. Feature-based registration detects and describes distinctive key points in both images,
matches them, and computes a transformation to align the pre-event image to the post-event one.
For keypoint detection and description, we relied on modern deep-learning-based local feature extractors,
specifically SuperPoint and DISK. We evaluated both methods qualitatively on the dataset and selected
whichever produced more reliable and consistent correspondences for a given image pair. The resulting matches
were then refined using RANSAC (Random Sample Consensus) to remove outliers and estimate a geometric
transformation. This transformation was applied to warp the post-event building polygons onto the pre-event image.
This process ensured that each building polygon matched the same location in both images,
allowing the Siamese network to accurately detect changes.
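The RANSAC step can be illustrated with a small NumPy sketch. Given matched keypoint coordinates (as SuperPoint or DISK would produce), it repeatedly fits an affine transform to random minimal samples and keeps the fit with the most inliers; the demo data at the bottom is synthetic, and the function names are ours, not the challenge code:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: dst ≈ [src | 1] @ params, params is (3, 2)."""
    A = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return params

def ransac_affine(src, dst, n_iters=200, tol=2.0, rng=None):
    """Robustly estimate an affine transform from noisy correspondences."""
    rng = np.random.default_rng(rng)
    best_inliers = None
    for _ in range(n_iters):
        idx = rng.choice(len(src), 3, replace=False)   # minimal sample
        params = fit_affine(src[idx], dst[idx])
        pred = np.hstack([src, np.ones((len(src), 1))]) @ params
        err = np.linalg.norm(pred - dst, axis=1)       # reprojection error
        inliers = err < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the full consensus set for a more stable estimate
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# Demo: 25 exact correspondences under a known affine, 5 corrupted matches
rng = np.random.default_rng(42)
src = rng.uniform(0, 100, (30, 2))
dst = src @ np.array([[1.02, 0.05], [-0.05, 1.02]]).T + np.array([3.0, -7.0])
dst[:5] += 50.0  # simulate outlier matches
params, inliers = ransac_affine(src, dst, rng=0)
```

The same estimate-and-vote loop works for homographies; the affine case just needs a smaller minimal sample (3 points instead of 4).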

Example of registered pre- and post-event images.
Siamese Network Training