Mingche Li
To address the UAS4STEM 2024 competition challenge, in which a drone must identify a QR code on a 4'x4' target from the air and pick up and drop off an item, a system was designed around monocular-camera QR recognition and vision-based localization. A pre-trained QR recognition model detects the QR code's pixel position on the image plane; a series of coordinate transformations, using the camera's intrinsic parameters and the drone's flight data (altitude, attitude angles, GPS), then yields the QR code's GPS coordinates.
The system mainly consists of a flight control system, a Raspberry Pi, and an OAK AI smart camera. The flight control system provides real-time altitude, attitude angles, and GPS information for the UAV. The OAK smart camera handles QR code recognition and provides the QR code's pixel coordinates. The Raspberry Pi combines these data to calculate the QR code's location. See the schematic below:
The following coordinate systems are used:
- Pixel coordinate system: a 2D coordinate system with the image's top-left corner as the origin.
- Camera coordinate system: a 3D coordinate system with the camera's focal point as the origin and the optical axis as the Z-axis, conforming to a right-handed coordinate system.
- World coordinate system: an auxiliary system located on the ground plane. Its Y-axis is vertical and passes through the origin of the camera coordinate system; the distance between the two origins is the camera's altitude h.
- UAV body coordinate system: a 3D coordinate system with its origin at the center of the UAV (assuming the flight controller's sensors are installed at the center of the UAV), conforming to a right-handed coordinate system.
- NED coordinate system: origin at the center of the UAV, with the axes defined as follows:
  - The axis parallel to the north-south direction is the N-axis (with north as the positive direction).
  - The axis parallel to the east-west direction is the E-axis (with east as the positive direction).
  - The axis perpendicular to the ground is the D-axis (with down as the positive direction).
- ECEF coordinate system: Earth-centered, Earth-fixed. Refer to Reference 10 for details.
- GPS coordinates: based on the WGS84 reference system. Refer to Reference 9 for details.
First, we define the coordinate-axis rotation functions (Reference 1):

import numpy as np

# Rotation of the coordinate axes about one axis:
# counterclockwise angles are positive, clockwise angles are negative.
def rotate_axis_x(alpha):
    rot = np.array([[1, 0, 0],
                    [0, np.cos(alpha), np.sin(alpha)],
                    [0, -np.sin(alpha), np.cos(alpha)]])
    return rot

def rotate_axis_y(beta):
    rot = np.array([[np.cos(beta), 0, -np.sin(beta)],
                    [0, 1, 0],
                    [np.sin(beta), 0, np.cos(beta)]])
    return rot

def rotate_axis_z(gamma):
    rot = np.array([[np.cos(gamma), np.sin(gamma), 0],
                    [-np.sin(gamma), np.cos(gamma), 0],
                    [0, 0, 1]])
    return rot
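As a quick sanity check (this example is our own, not part of the original pipeline): because these are axis rotations, a fixed vector along the old X-axis should have coordinates (0, -1, 0) in a frame rotated 90° counterclockwise about Z:

v = np.array([1.0, 0.0, 0.0])
# Coordinates of v in the rotated frame.
print(np.round(rotate_axis_z(np.deg2rad(90)) @ v, 6))  # [ 0. -1.  0.]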
- On the 2D pixel plane, the camera origin Oc projects to the image center (U/2, V/2) under the pinhole camera model (References 2, 3, 4), so (u0, v0) = (U/2, V/2). A pixel (u, v) then corresponds to the camera-frame coordinates:

x_c = -(u - u0) * dx / focalL
y_c = -(v - v0) * dy / focalL
z_c = 1

Where:
- (u, v): pixel coordinates of point P
- (u0, v0): camera origin in the pixel plane (the principal point)
- dx, dy: physical pixel dimensions
- focalL: focal length
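A minimal sketch of this back-projection as a function; the intrinsic values in the example call are placeholders of our own, not the OAK camera's calibration:

import numpy as np

def pixel_to_camera(u, v, u0, v0, dx, dy, focalL):
    # Back-project a pixel onto the normalized camera-frame ray (z_c = 1),
    # following the sign convention used above.
    x_c = -(u - u0) * dx / focalL
    y_c = -(v - v0) * dy / focalL
    return np.array([x_c, y_c, 1.0])

# Example with placeholder intrinsics for a 640x480 image
# (3 um pixels, 1.3 mm focal length):
ray = pixel_to_camera(400, 300, u0=320, v0=240, dx=3e-6, dy=3e-6, focalL=1.3e-3)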
When the camera is installed, its optical axis (Zc) faces downwards, as shown in the picture below. The camera's height above the ground is h. To convert camera coordinates into world coordinates with the UAV level, we rotate by -90° (i.e., 90° clockwise) about Xc and translate by h. If the UAV has nonzero pitch and roll angles, we instead rotate by -roll about Yc and by -(90° - pitch) about Xc.
# With alpha = -(90 - pitch) and beta = -roll:
R_x = rotate_axis_x(np.deg2rad(-(90 - pitch)))
R_y = rotate_axis_y(np.deg2rad(-roll))
R = R_x @ R_y
camera_coords = np.array([x_c, y_c, z_c])
rotated_coords = R @ camera_coords
x_prime_c, y_prime_c, z_prime_c = rotated_coords
Since OcOw = h, the scaled point must satisfy y = -h, so the scaling factor is defined as scale = -h / y_prime_c. Thus the world coordinates of point P are:

scale = -h / y_prime_c
xw = x_prime_c * scale
yw = 0
zw = z_prime_c * scale
To recover the camera coordinates of the ground point:

x_prime_c *= scale
z_prime_c *= scale
y_prime_c = -h
original_camera_coords = np.linalg.inv(R) @ np.array([x_prime_c, y_prime_c, z_prime_c])
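The steps above can be collected into one helper; this is a sketch of our own (the name camera_ray_to_world and its signature are not from the original text), reusing the rotation functions defined earlier, with pitch and roll in degrees and h in the output units:

import numpy as np

def camera_ray_to_world(ray, pitch, roll, h):
    # Rotate the camera-frame ray into the ground-aligned world frame.
    R = rotate_axis_x(np.deg2rad(-(90 - pitch))) @ rotate_axis_y(np.deg2rad(-roll))
    x_p, y_p, z_p = R @ ray
    # Scale the ray until it intersects the ground plane at y = -h below the camera.
    scale = -h / y_p
    return np.array([x_p * scale, 0.0, z_p * scale])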
As shown in the diagram, the relationship between the camera and the UAV involves:
- a counterclockwise rotation of 90° around the Zc axis, and
- a translation by the offset T.

Thus:

uav_pos = rotate_axis_z(np.deg2rad(90)) @ camera_pos + T
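As a concrete illustration (the offset values below are placeholders, not the measured T of our airframe):

# Placeholder offset of the camera from the UAV's center, in meters.
T = np.array([0.10, 0.0, 0.05])
camera_pos = np.array([xw, yw, zw])  # world-frame point from the previous step
uav_pos = rotate_axis_z(np.deg2rad(90)) @ camera_pos + T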
- The transformation between the NED frame and the UAV body frame depends on the attitude angles (Reference 5). Define the rotation matrices:

r_z = rotate_axis_z(np.deg2rad(yaw))
r_y = rotate_axis_y(np.deg2rad(pitch))
r_x = rotate_axis_x(np.deg2rad(roll))

From NED to UAV, rotate by yaw (Rz), then pitch (Ry), and finally roll (Rx), so that R_NED2UAV = Rx(roll) Ry(pitch) Rz(yaw):

R_NED2UAV = r_x @ r_y @ r_z
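The localization pipeline ultimately needs the QR code's position in the NED frame, and since a rotation matrix's inverse is its transpose, the reverse mapping is straightforward (this step is our own addition for completeness, using uav_pos from above):

# Map the point from the UAV body frame back to the NED frame.
R_UAV2NED = R_NED2UAV.T
p_ned = R_UAV2NED @ uav_pos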
- Using the known UAV GPS coordinates (latitude $\varphi$, longitude $\lambda$, altitude $h_{alt}$), we can convert them into ECEF coordinates:

$$X = (N + h_{alt})\cos\varphi\cos\lambda, \qquad Y = (N + h_{alt})\cos\varphi\sin\lambda, \qquad Z = \big(N(1 - e^2) + h_{alt}\big)\sin\varphi$$

where $N = a / \sqrt{1 - e^2 \sin^2\varphi}$, $a$ is the WGS84 semi-major axis, and $e$ is the first eccentricity. The QR code's ECEF position then follows from:

$$P_{ECEF} = P_{Ref} + R\,P_{NED}$$

where $P_{NED}$ and $P_{Ref}$ are calculated by the previous steps, and the rotation matrix $R$ (from NED to ECEF) is defined as:

$$R = \begin{bmatrix} -\sin\varphi\cos\lambda & -\sin\lambda & -\cos\varphi\cos\lambda \\ -\sin\varphi\sin\lambda & \cos\lambda & -\cos\varphi\sin\lambda \\ \cos\varphi & 0 & -\sin\varphi \end{bmatrix}$$

To convert the ECEF coordinates $P_{ECEF}$ back to GPS coordinates, we can use the following formulas:

$$\lambda = \operatorname{atan2}(Y, X), \qquad \varphi = \operatorname{atan2}\big(Z + e^2 N \sin\varphi,\ p\big), \qquad h_{alt} = \frac{p}{\cos\varphi} - N$$

where $p = \sqrt{X^2 + Y^2}$ and the latitude equation is solved by iterating from the initial guess $\varphi_0 = \operatorname{atan2}(Z, p)$.
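A sketch of these conversions in code, using the standard WGS84 constants; the function names are our own, and a production system could instead use a geodesy library such as pymap3d:

import numpy as np

A_WGS84 = 6378137.0          # WGS84 semi-major axis a, in meters
E2_WGS84 = 6.69437999014e-3  # WGS84 first eccentricity squared e^2

def geodetic_to_ecef(lat_deg, lon_deg, alt):
    # GPS (latitude, longitude, altitude) -> ECEF (X, Y, Z), in meters.
    lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    n = A_WGS84 / np.sqrt(1 - E2_WGS84 * np.sin(lat) ** 2)
    return np.array([(n + alt) * np.cos(lat) * np.cos(lon),
                     (n + alt) * np.cos(lat) * np.sin(lon),
                     (n * (1 - E2_WGS84) + alt) * np.sin(lat)])

def ned_to_ecef_rotation(lat_deg, lon_deg):
    # Rotation matrix R mapping NED offsets into the ECEF frame.
    lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    return np.array([
        [-np.sin(lat) * np.cos(lon), -np.sin(lon), -np.cos(lat) * np.cos(lon)],
        [-np.sin(lat) * np.sin(lon),  np.cos(lon), -np.cos(lat) * np.sin(lon)],
        [np.cos(lat), 0.0, -np.sin(lat)],
    ])

def ecef_to_geodetic(x, y, z, iterations=5):
    # ECEF -> GPS by fixed-point iteration on the latitude equation.
    lon = np.arctan2(y, x)
    p = np.hypot(x, y)
    lat = np.arctan2(z, p)  # initial guess
    for _ in range(iterations):
        n = A_WGS84 / np.sqrt(1 - E2_WGS84 * np.sin(lat) ** 2)
        lat = np.arctan2(z + E2_WGS84 * n * np.sin(lat), p)
    alt = p / np.cos(lat) - n
    return np.rad2deg(lat), np.rad2deg(lon), alt

# Example: QR code ECEF position from the UAV's reference GPS fix and p_ned.
# p_ecef = geodetic_to_ecef(lat0, lon0, alt0) + ned_to_ecef_rotation(lat0, lon0) @ p_ned
# qr_lat, qr_lon, qr_alt = ecef_to_geodetic(*p_ecef)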
With this, the process of converting pixel coordinates into GPS coordinates is complete.
The experiment is divided into two phases:

Phase 1: attach the camera to a stand with a connector of length T. The connector must be able to rotate freely at the top of the stand, simulating the UAV's attitude angles (excluding yaw), and the stand must be perpendicular to the test surface. Then adjust the height of the stand to simulate various UAV altitudes. Using the IMU in the OAK-D Lite camera to obtain the attitude angles (excluding yaw), calculate the NED coordinates of the QR code's center, and record the calculated NED coordinates at each position.

Phase 2: install the camera on the UAV and measure the camera's displacement T from the UAV's center of gravity. Place the QR code on the ground and fly the UAV at various altitudes and yaw angles. Record the calculated GPS coordinates at each position.
Pixel coordinates to NED coordinates testing:
The QR code is placed at x = 5 cm and y = 6 cm from the camera; the results are shown in the table below:
| x (cm) | y (cm) | h (cm) | pitch (°) |
|---|---|---|---|
| 5.306 | 6.459 | 26.7 | 0 |
| 5.283 | 6.164 | 28.8 | 0 |
| 5.212 | 6.238 | 29.7 | 0 |
| 5.325 | 6.244 | 30.7 | 0 |
| 5.368 | 6.248 | 33.1 | 0 |
| 5.116 | 6.625 | 35.3 | 6.07 |
| 4.852 | 6.303 | 34.7 | 2.35 |
| 5.154 | 6.742 | 30.8 | -10.05 |
| 5.104 | 6.174 | 31.7 | -3.63 |
System error sources include camera installation misalignment, lens distortion, and sensor noise. Mitigations include experimental adjustments and algorithmic corrections.