GAP is a reinforcement-learning attack framework for demonstrating an architectural blind spot in UAV control pipelines. This repository contains the patched PX4 and ArduPilot firmware, GAP evaluation code, analysis scripts, and the pre-baked outputs used by the paper. The artifact bundle is distributed through Zenodo.
Zenodo DOI: 10.5281/zenodo.19652756
Given externally observable drone state, GAP generates an action intended to move the drone toward a target region. The action is a gyroscope bias vector that is added to the raw gyroscope readings before they reach the flight controller's state estimator. The biased readings perturb the controller's attitude estimate in a controlled way, which in turn causes the drone to drift in the direction GAP wants. The bias is held for one second; GAP then reads the next drone state, computes the next bias, and repeats this loop until the drone reaches the target region or the episode times out.
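The observe, bias, hold loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the repository's implementation: `read_state`, `policy`, `inject_bias`, and `reached_target` are hypothetical callables standing in for telemetry, the RL checkpoint, and the firmware hook, and the episode timeout is modeled as a step budget.

```python
import time
from typing import Callable, Sequence

Vector3 = tuple[float, float, float]


def run_episode(
    read_state: Callable[[], Sequence[float]],
    policy: Callable[[Sequence[float]], Vector3],
    inject_bias: Callable[[Vector3], None],
    reached_target: Callable[[Sequence[float]], bool],
    max_steps: int,
    hold_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the target region is reached within the step budget."""
    for _ in range(max_steps):
        state = read_state()              # externally observable drone state
        if reached_target(state):
            return True
        bias = policy(state)              # gyroscope bias vector (hypothetical policy call)
        inject_bias(bias)                 # added to raw gyro readings before the estimator
        sleep(hold_seconds)               # hold the bias for one second, then re-observe
    # Budget exhausted: check the final state once before declaring a timeout.
    return reached_target(read_state())
```

Passing `sleep=lambda s: None` makes the loop usable in a dry run without waiting in real time.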
The artifact uses two evaluation geometries.
- Inward (outer-ring spawn, central target). RQ1 (GAP), the RQ1 baselines, and RQ2 place drones on a 220 m circle around a fixed central target and attack inward toward the center. RQ1 (GAP) and RQ2 run twelve PX4-jMAVSim workers in parallel at 4× simulation speedup; the RQ1 baselines run the same geometry with a single drone per trial (sequential, not parallel).
- Outward (home spawn, outer-ring target). RQ3 and RQ4 place a single drone at a fixed home location and put the target at one of 12 clock-direction positions on a surrounding circle; the drone attacks outward toward that target. Simulation speedup depends on the simulator: ArduPilot SITL runs at 4×; PX4 Gazebo and the ci-detector run at 1×.
RQ6-sim uses the same inward parallel PX4-jMAVSim geometry as RQ1
(GAP) and RQ2. RQ6-real does not follow either geometry: the physical
drone takes off, GAP issues bias commands, and the operator directs the
trial manually in real time, one flight per trial.
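As an illustration of the outward geometry, the 12 clock-direction target positions can be computed as below. This is a sketch under assumptions: the frame is a local (east, north) offset in meters from the home location, 12 o'clock points north, and hours advance clockwise. The radius is left as a parameter because the text above does not state it for the outward case.

```python
import math


def clock_positions(radius_m: float, hours: int = 12) -> dict[int, tuple[float, float]]:
    """Map each clock hour (1..hours) to an (east, north) offset in meters."""
    positions = {}
    for hour in range(1, hours + 1):
        # Bearing measured clockwise from north; the last hour wraps back to north.
        bearing = 2.0 * math.pi * hour / hours
        east = radius_m * math.sin(bearing)
        north = radius_m * math.cos(bearing)
        positions[hour] = (east, north)
    return positions
```

Under these assumptions, hour 3 lies due east of home and hour 6 due south.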
In either geometry, each episode is scored against four success criteria at
once, where Cyl. denotes a vertical cylinder around the target and Sph.
denotes a sphere centered on the target:
- Cyl. 20 m: drone reaches within 20 m horizontal radius of the target
- Sph. 20 m: drone reaches within 20 m straight-line distance of the target
- Cyl. 10 m: drone reaches within 10 m horizontal radius (headline metric)
- Sph. 10 m: drone reaches within 10 m straight-line distance
The paper's headline numbers use Cyl. 10 m, which is what
verify_claims.sh also reports.
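The four criteria can be evaluated together from one trajectory, as in the sketch below. It is not the repository's scoring code and rests on assumptions: positions are (x, y, z) in meters in a common local frame, "reaches" means the closest approach over the whole episode, the thresholds are inclusive, and the cylinder is unbounded vertically.

```python
import math


def score_episode(trajectory, target):
    """Score one episode against the four success criteria.

    trajectory: iterable of (x, y, z) positions in meters.
    target: (x, y, z) position of the target in the same frame.
    """
    tx, ty, tz = target
    min_horizontal = math.inf   # closest approach ignoring altitude (cylinder)
    min_straight = math.inf     # closest approach in 3D (sphere)
    for x, y, z in trajectory:
        horizontal = math.hypot(x - tx, y - ty)
        straight = math.sqrt(horizontal ** 2 + (z - tz) ** 2)
        min_horizontal = min(min_horizontal, horizontal)
        min_straight = min(min_straight, straight)
    return {
        "Cyl. 20 m": min_horizontal <= 20.0,
        "Sph. 20 m": min_straight <= 20.0,
        "Cyl. 10 m": min_horizontal <= 10.0,
        "Sph. 10 m": min_straight <= 10.0,
    }
```

Because the straight-line distance is never smaller than the horizontal distance, an episode that satisfies a spherical criterion also satisfies the cylindrical one at the same radius.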
- RQ1: GAP vs. three baselines in PX4 jMAVSim. Baseline 1 injects random bias every second. Baseline 2 injects a fixed directional bias toward the target. Baseline 3 injects an adaptive directional bias toward the target every second.
- RQ2: GAP under realistic noise conditions. The three conditions are tracking-noise corruption, attack delay/loss noise, and both together.
- RQ3: transfer to PX4 Gazebo and 9 ArduPilot multicopter frames. The ArduPilot set covers quad, hexa, octa, octaquad, y6, dodeca-hexa, tri, singlecopter, and coaxcopter.
- RQ4: CI-detector evasion inside a legacy VMware environment. This is the manual detector workflow that reproduces the published CI-detector implementation (Choi et al., CCS 2018).
- RQ5: failsafe analysis over the shared PX4 attack-flight corpus. This is a post-hoc analysis of built-in failsafe activation, not a fresh attack runner.
- RQ6: sim-to-real model in SITL and on a physical drone. The repository provides a supplementary SITL evaluation and a manual real-flight workflow.
RQ4 and real-drone RQ6 are manual workflows. See:
- src/evaluation/ci_detector/README.md for the RQ4 VMware workflow
- src/evaluation/sim_to_real/README.md for the RQ6 real-flight workflow
- Host: recent x86_64 CPU with 12 logical cores; 16 GB RAM (32 GB recommended if you plan to run the fresh path end-to-end); no GPU required. The 12-core target matches the main PX4-jMAVSim path that runs 12 parallel workers.
- Disk: provisioning a 100 GB virtual disk is recommended. On our clean Ubuntu 22.04 guest, used disk grew from 10.9 GiB to 40.7 GiB after ./setup.sh; fresh reruns add more for raw flight logs. Host-side Path A VM folder is ~50 GiB after setup.
- Optional: Path A uses VirtualBox. The optional RQ4 legacy image is VMware-specific and is not required by the main path. RQ6-real additionally requires a physical multicopter, Pixhawk, telemetry radios, and a safe test site.
The Zenodo bundle offers two equivalent entry points. Pick whichever fits your environment.
Path A is a ready-to-run Ubuntu 22.04 VirtualBox image with the repository, Python environment, built firmwares, model checkpoints, and QGroundControl already in place. Import the image into VirtualBox, log in, and skip directly to Verify the Installation. Import time on a desktop-class host: about 5 minutes.
This VM image is equivalent to running Path B's ./setup.sh on a clean
Ubuntu 22.04 install, so reviewers who prefer to rebuild from scratch can
reproduce it end-to-end via Path B.
Path B builds from scratch on a clean Ubuntu 22.04 host.
Side effects. ./setup.sh requires Internet access, invokes sudo apt-get install (needs sudo), installs Python packages, and compiles both PX4 and ArduPilot. Expect about 30 minutes and an extra ~30 GiB of disk. The setup script creates a project-local .venv/ and builds the two firmware submodules under GAP-PX4-Autopilot/build/ and GAP-ardupilot/build/.
git clone --recurse-submodules https://github.com/postech-compsec/GAP.git ~/GAP
cd ~/GAP
./setup.sh
source .venv/bin/activate

Download and set up GAP models

The two RL checkpoints are distributed through Zenodo, not Git. Unpack them into:
- src/gap/models/gap_model/
- src/gap/models/sim-to-real_model/
gap_model/ is used by RQ1 to RQ5.
sim-to-real_model/ is used by RQ6.
Path A reviewers already have both checkpoints in place and can skip this step.
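Before running the smoke test, Path B reviewers may want a quick check that both checkpoint directories were unpacked. The helper below is a hypothetical convenience, not part of the artifact; it assumes only the two directory paths listed above and treats an empty directory as missing.

```python
from pathlib import Path

# The two checkpoint locations named in this README, relative to the repository root.
CHECKPOINT_DIRS = (
    "src/gap/models/gap_model",
    "src/gap/models/sim-to-real_model",
)


def missing_checkpoints(repo_root):
    """Return the checkpoint directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in CHECKPOINT_DIRS:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

An empty return value from `missing_checkpoints(Path.home() / "GAP")` would indicate that both directories exist and contain files; it does not validate the checkpoint contents.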
Run this first to check that the artifact is installed correctly and that the shipped pre-baked results are internally consistent:
./experiments/run_smoke_test.sh
bash analysis/verify_claims.sh pre-baked
bash analysis/generate_all.sh pre-baked

What this does:
- verifies imports, checkpoints, and representative analysis paths
- checks the paper metrics against the pre-baked data
- regenerates the shipped CSV and PNG outputs under
analysis/
The main fresh automated reruns are RQ1, RQ2, and RQ3:
./experiments/run_rq1.sh --mode approx
./experiments/run_rq2.sh --mode approx
./experiments/run_rq3.sh
bash analysis/generate_all.sh fresh
bash analysis/verify_claims.sh fresh

Or run the same automated path through one wrapper:
./experiments/run_all.sh --mode approx
bash analysis/verify_claims.sh fresh

run_all.sh executes the fresh automated path for RQ1, RQ2, and RQ3,
then runs bash analysis/generate_all.sh fresh.
Fresh outputs are written under results/fresh/.
The --mode flag controls how many fresh trials are run per worker.
approx runs 2 trials per worker, so reviewers can exercise the fresh
execution path without paying the paper's full wall-clock cost. full
runs 100 trials per worker and matches the paper sample count. Both
modes use the same code path; only the trial count differs. The shipped
pre-baked data remains the exact paper result set in either case. All
fresh reruns additionally need at least 10 GB free under
results/fresh/ for raw flight logs.
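For the 12-worker PX4-jMAVSim path, the per-run episode count implied by each mode is simple arithmetic, sketched below. The worker count and per-worker trial counts come from this README; whether every RQ2 noise condition or the sequential RQ1 baselines use the same counts is not stated here, so treat the totals as applying to one 12-worker run only.

```python
# Trials per worker for each --mode value, as stated above.
TRIALS_PER_WORKER = {"approx": 2, "full": 100}


def total_trials(mode: str, workers: int = 12) -> int:
    """Episodes in one parallel run: workers x trials per worker."""
    return workers * TRIALS_PER_WORKER[mode]


print(total_trials("approx"))  # 24
print(total_trials("full"))    # 1200
```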
Practical wall-clock estimates from our 12-core Ubuntu 22.04 reference host:
| Path | Human | Compute | Supports |
|---|---|---|---|
| Smoke test + pre-baked verify + regenerate | 5 min | ~10 min | C1–C6 |
| Fresh RQ1 (approx) | 5 min | ~2 h | C1 |
| Fresh RQ1 (full) | 5 min | ~4 h | C1 |
| Fresh RQ2 (approx) | 5 min | ~15 min | C2 |
| Fresh RQ2 (full) | 5 min | ~9 h | C2 |
| Fresh RQ3 | 5 min | ~7 h | C3 |
| Shipped RQ4 analysis | 3 min | <1 min | C4 |
| RQ5 failsafe analysis | 2 min | <1 min | C5 |
| Shipped RQ6 analysis | 2 min | <1 min | C6 |

Full automated fresh path (RQ1 + RQ2 + RQ3, approx):
~10 h wall-clock on 12 cores. In full mode: ~20 h.
analysis/verify_claims.sh has two modes that answer two different questions:
- pre-baked: canonical paper-reference check against the shipped pre-baked data. Reports PASS or SKIP. FAIL appears only on a shipped-data mismatch. Use this to verify the paper metrics.
- fresh: observational summary of your own reruns against the same paper reference. Reports OBSERVED or SKIP only. RL reruns of RQ1–RQ3 are inherently stochastic, so the shipped pre-baked tree is treated as the exact paper result set and fresh reruns as directional verification.
Use pre-baked for formal PASS/SKIP. Use fresh to interpret OBSERVED values as supplementary evidence for the same claims.
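The difference between the two modes can be summarized as a small decision function. This is an illustrative sketch, not the logic in verify_claims.py: the function name, the tolerance parameter, and the assumption that SKIP means "no data available for this claim" are all hypothetical.

```python
from typing import Optional


def claim_status(mode: str, observed: Optional[float], reference: float,
                 tol: float = 0.0) -> str:
    """Return the status label one claim would get under the given mode."""
    if observed is None:
        # Assumption: SKIP is reported when the needed result data is absent.
        return "SKIP"
    if mode == "fresh":
        # Fresh reruns are stochastic, so they are reported, never judged.
        return "OBSERVED"
    if mode == "pre-baked":
        # Shipped data must match the paper reference (hypothetical tolerance).
        return "PASS" if abs(observed - reference) <= tol else "FAIL"
    raise ValueError(f"unknown mode: {mode!r}")
```

The key design point is that only the pre-baked mode can produce a FAIL, which keeps stochastic reruns from being read as refutations of the paper's numbers.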
Common entrypoints:
./experiments/run_smoke_test.sh
bash analysis/verify_claims.sh pre-baked
bash analysis/generate_all.sh pre-baked
bash analysis/generate_all.sh fresh
bash analysis/verify_claims.sh fresh

See Claim Verification Modes above for the semantics of pre-baked vs fresh.
Per-RQ experiment wrappers:
./experiments/run_rq1.sh --mode approx
./experiments/run_rq1.sh --mode full
./experiments/run_rq2.sh --mode approx
./experiments/run_rq2.sh --mode full
./experiments/run_rq3.sh
./experiments/run_rq3.sh --frames "quad hexa octa octaquad y6 dodeca-hexa tri singlecopter coaxcopter"
./experiments/run_rq3.sh --skip-gazebo
./experiments/run_rq6.sh --mode approx

By default, the fresh experiment wrappers move raw flight logs into
results/fresh/flight-logs/. To save disk, add --raw-logs off.
Use bash analysis/generate_all.sh <pre-baked|fresh> to regenerate all shipped
tables and figures for one result tree.
All analysis scripts aggregate all matching files under results/<source>/.
To run one analysis script by hand:
RQ1 / Table 3
python3 -m analysis.generate_rq1_table3 --source pre-baked

Reads results/<source>/rq1/ and writes:
- analysis/csv/rq1_table3_<source>.csv
- analysis/figures/rq1_table3_<source>.png
RQ1 / Figure 7
python3 -m analysis.generate_rq1_figure7 --source pre-baked
python3 -m analysis.generate_rq1_figure7 --source pre-baked --use-ulogs

Default input is the shared PX4 attack-flight CSV corpus under
results/<source>/flight-logs/px4/csv/. Use --use-ulogs to read the raw
shared PX4 .ulg files instead if you separately have them locally. The
default shipped artifact uses the extracted CSV corpus only. Writes:
- analysis/figures/rq1_figure7a_trajectories_<source>.png
- analysis/figures/rq1_figure7b_bias_<source>.png
RQ2 / Table 4
python3 -m analysis.generate_rq2_table4 --source pre-baked

Reads results/<source>/rq2/ and writes:
- analysis/csv/rq2_table4_<source>.csv
- analysis/figures/rq2_table4_<source>.png
RQ3 / Table 5
python3 -m analysis.generate_rq3_table5 --source pre-baked

Reads results/<source>/rq3/ and writes:
- analysis/csv/rq3_table5_<source>.csv
- analysis/figures/rq3_table5_<source>.png
RQ4
python3 -m analysis.generate_rq4_analysis --source pre-baked
python3 -m analysis.generate_rq4_analysis --source pre-baked --trials 1,2,3

Default is the shipped trial set 1,2. Use --trials 1,2,3 only if
you explicitly want the extra shipped trial. Reads results/<source>/rq4/ and
writes rq4_analysis_<source>*.csv/.png under analysis/csv/ and
analysis/figures/.
RQ5 (analysis-only; there is no run_rq5.sh)
python3 -m analysis.generate_rq5_analysis --source pre-baked
python3 -m analysis.generate_rq5_analysis --source pre-baked --use-ulogs

RQ5 does not have its own experiment runner; it post-processes the same
PX4 attack-flight corpus that RQ1 produces. Default input is the shared PX4
attack-flight CSV corpus under results/<source>/flight-logs/px4/csv/.
--use-ulogs refreshes that worker CSV corpus from
results/<source>/flight-logs/px4/raw/ first, then reruns the analysis. The
default shipped artifact provides the extracted CSV corpus, not the full raw
PX4 log set. Writes:
analysis/csv/rq5_analysis_<source>.csv
RQ6 / Table 6
python3 -m analysis.generate_rq6_table6 --source pre-baked

Reads results/<source>/rq6/real_evaluation/ and writes:
- analysis/csv/rq6_table6_<source>.csv
- analysis/figures/rq6_table6_<source>.png
RQ6 / Figure 8
python3 -m analysis.generate_rq6_figure8 --source pre-baked
python3 -m analysis.generate_rq6_figure8 --source pre-baked --bias-trial 3

Reads results/<source>/rq6/real_evaluation/. The trajectory panel uses all
available real-flight trials; --bias-trial N chooses which trial to show in
the bias panel. Writes:
- analysis/figures/rq6_figure8b_trajectories_<source>.png
- analysis/figures/rq6_figure8c_bias_<source>.png
Most of these scripts also accept --results-dir to override the default
input tree. RQ4 uses --input-dir instead.
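To run the individual analysis scripts in sequence from Python rather than by hand, the per-script commands listed above can be assembled as in this sketch. The module names and the --source flag are taken from this README; the helper itself is hypothetical, and generate_all.sh may order or invoke the scripts differently.

```python
import sys

# Analysis modules named in this README, in the order they are listed.
ANALYSIS_MODULES = [
    "analysis.generate_rq1_table3",
    "analysis.generate_rq1_figure7",
    "analysis.generate_rq2_table4",
    "analysis.generate_rq3_table5",
    "analysis.generate_rq4_analysis",
    "analysis.generate_rq5_analysis",
    "analysis.generate_rq6_table6",
    "analysis.generate_rq6_figure8",
]


def analysis_commands(source: str) -> list[list[str]]:
    """Build one argv list per analysis module for the given result tree."""
    if source not in ("pre-baked", "fresh"):
        raise ValueError(f"source must be 'pre-baked' or 'fresh', got {source!r}")
    return [[sys.executable, "-m", module, "--source", source]
            for module in ANALYSIS_MODULES]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root with the project virtual environment active.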
Top-level directories reviewers most often touch:
- experiments/: shell wrappers for each RQ (run_rq1.sh, run_rq2.sh, run_rq3.sh, run_rq6.sh), the smoke test, and run_all.sh for the one-command fresh path. These are thin drivers over src/.
- analysis/: one generate_*.py per paper table or figure, plus verify_claims.sh / verify_claims.py and generate_all.sh. CSV outputs go to analysis/csv/, figures to analysis/figures/.
- src/gap/: the GAP model and RLlib module definitions consumed by the trained checkpoints (gap_model, sim-to-real_model).
- src/evaluation/: reusable evaluation code shared by the experiment wrappers.
  - primary/: the main PX4-jMAVSim 12-worker evaluator used by RQ1 (GAP) and RQ2.
  - baseline/: RQ1 baselines (random / fixed-direction / adaptive-direction).
  - cross_platform/: RQ3 drivers for PX4 Gazebo and the nine ArduPilot multicopter frames.
  - ci_detector/: RQ4 manual VMware workflow, including README.md and VM_SETUP.md.
  - sim_to_real/: RQ6 supplementary SITL runner and the real-flight workflow (see its README.md).
  - common/: shared utilities (config, logging, result writers).
- src/tools/: one-off helpers such as extract_px4_attack_csvs.py, which converts raw PX4 .ulg logs into the shared worker CSV corpus used by RQ1 Figure 7 and RQ5.
- GAP-PX4-Autopilot/, GAP-ardupilot/: git submodules pinned to the two patched firmware forks. GAP's changes are preserved as separate commits on top of the upstream releases so reviewers can inspect them with git log.
The shipped artifact uses these pre-baked inputs:
- results/pre-baked/rq1/
- results/pre-baked/rq2/
- results/pre-baked/rq3/
- results/pre-baked/rq4/
- results/pre-baked/rq6/
- results/pre-baked/flight-logs/px4/csv/worker1/...worker12/
The shared PX4 CSV corpus is used by:
- RQ1 Figure 7
- RQ5

- results/pre-baked/: shipped paper-reference outputs consulted by verify_claims.sh pre-baked
- results/fresh/: outputs from your own reruns
- results/*/rq1/, rq2/, rq3/, rq4/, rq6/: per-RQ result trees
- results/*/flight-logs/px4/csv/worker1/...worker12/: shared extracted PX4 attack-flight CSV
- results/*/flight-logs/px4/raw/: raw PX4 .ulg logs from fresh reruns or optional local audit data, if present
- results/*/flight-logs/ardupilot/raw/: raw ArduPilot .BIN logs from fresh reruns, if present
- results/fresh/ray_results/: fresh Ray / RLlib logs
Generated summaries and figures are written to:
- analysis/csv/
- analysis/figures/
All generated filenames are source-tagged, for example:
- rq3_table5_pre-baked.csv
- rq5_analysis_fresh.csv
- rq1_figure7a_trajectories_fresh.png

- The wrappers prefer the project-local .venv created by ./setup.sh.
- --mode approx is the recommended reviewer path for fresh reruns.
- QGroundControl buggy state during RQ3. RQ3 sequentially exercises PX4 and ArduPilot SITL, so the active autopilot firmware behind QGC switches during the run. QGC can get into a slightly buggy state after a switch (e.g., altitude reading stops updating). If you see this, just quit and relaunch QGC.
