Hi everyone,
I first posted my question under a closed issue, but I thought it might be useful to open a new one. My problem is that I cannot get my depth images and HD images aligned. The picture below visualises the problem: I try to align a calibration pattern by first filtering the 3D depth points that belong to the pattern and then projecting these points into the HD image.
To narrow down the problem, I investigated the meaning/content of the transformation matrices in more detail. Below I first outline my understanding of these matrices and then describe how I try to align my images. I would appreciate your help very much!
1. CameraCoordinateSystem (MFSampleExtension_Spatial_CameraCoordinateSystem)
In the HoloLensForCV sample this coordinate system is used to obtain the "FrameToOrigin" transformation, which is computed by transforming the CameraCoordinateSystem into the OriginFrameOfReference (lines 140-142 in MediaFrameReaderContext.cpp).
I still do not know exactly what this transformation describes. What is meant by "frame"?
Through experimenting I found that the translation vector changes when I move, and the changes make sense: if I move forward, the z-component becomes smaller. This agrees with the coordinate system in the image below, where the z-axis points away from the image plane.
The same applies to moving left or right: moving right makes the x-component increase. The y-component stays roughly constant, which makes sense since I am not moving up or down.
What I am really uncertain about is the rotational part of the transformation matrix: it is almost an identity matrix. The rotation of my head seems to be contained in the CameraViewTransform, which I describe in the second point.
As far as I understand, the FrameToOrigin matrix looks as follows:
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
x, y, z, 1]
To me, "FrameToOrigin" seems to describe the relation between a fixed point on the HoloLens and the origin (the origin is defined each time the app is started; this helps to map each frame to a common frame of reference). In the image above, the origin is probably the "App-specific Coordinate System".
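A minimal numpy sketch of how I read this matrix (the values are made up; I assume the row-vector convention used above, i.e. rotation in the upper-left 3x3 and translation in the last row):

```python
import numpy as np

# Made-up FrameToOrigin in the row-vector convention used above:
# rotation in the upper-left 3x3 (here the identity), translation in the last row.
frame_to_origin = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.2, 0.1, -1.5, 1.0],   # x, y, z of the frame in the origin coordinate system
])

# A point given in the frame coordinate system (homogeneous row vector) ...
p_frame = np.array([0.0, 0.0, 0.0, 1.0])

# ... expressed in the origin ("App-specific") coordinate system:
p_origin = p_frame @ frame_to_origin
print(p_origin[:3])   # [ 0.2  0.1 -1.5] -> exactly the translation row
```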
2. CameraViewTransform (MFSampleExtension_Spatial_CameraViewTransform )
The CameraViewTransform is saved directly with each frame (in contrast to FrameToOrigin, no additional transformation is necessary).
The rotation of the head seems to be saved within the rotational part of this matrix. I tested this by moving my head around the y-axis. If I turn about 180 ° to the right around my y-axis, the rotational part looks as follows:
[0, 0, 1,
0, 1, 0,
-1, 0, 0].
This corresponds to a rotation around the y-axis, which is what we would expect.
The translational part seems to stay roughly constant. This would make sense if it described the translation between the fixed point on the HoloLens and the respective camera (HD or depth). However, I would then expect the translational part to stay exactly the same, which is not the case: it is only approximately equal, not exactly.
If I do not turn my head (the rotational part is an identity matrix), the CameraViewTransform looks as follows:
CameraViewTransform for HD Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00631712, -0.184793, 0.145006, 1]
CameraViewTransform for Depth Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00798517, -0.184793, 0.0537722, 1]
So the CameraViewTransform seems to capture the rotation of the user's head. What is captured by the translational part? If it is the offset between a fixed point on the HoloLens and the respective camera, why is it not always exactly the same?
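If my interpretation is right, the difference of the two translation rows should approximate the fixed offset between the HD and depth cameras on the device. A quick check with the numbers above (assuming the units are metres):

```python
import numpy as np

# Translation rows of the two CameraViewTransforms quoted above (head not rotated).
t_hd    = np.array([0.00631712, -0.184793, 0.145006])
t_depth = np.array([0.00798517, -0.184793, 0.0537722])

# If both translations are offsets from the same fixed rig point to the respective
# camera, their difference should be the (constant) HD-to-depth baseline.
baseline = t_hd - t_depth
print(baseline)                    # ~[-0.0017  0.      0.0912]
print(np.linalg.norm(baseline))    # ~0.091, i.e. roughly 9 cm, mostly along z
```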
3. CameraProjectionTransform (MFSampleExtension_Spatial_CameraProjectionTransform)
This transformation is described on the following GitHub page:
https://github.com/MicrosoftDocs/mixed-reality/blob/5b32451f0fff3dc20048db49277752643118b347/mixed-reality-docs/locatable-camera.md
However, one thing is still unclear: what is the meaning of the terms A and B?
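My current assumption is that, as in a standard perspective projection matrix, A and B come from the near/far clip planes and only affect the projected depth (z); after the divide by the homogeneous w they do not influence the x/y pixel coordinates, so I ignore them for the pixel mapping. A small sketch of that assumption (the fx, fy, cx, cy and A/B test values are made up, row-vector convention as above):

```python
import numpy as np

def project(p_cam, fx, fy, cx, cy, A, B):
    """Project a camera-space point (row-vector convention, camera looks down -z)."""
    # Assumed D3D-style projection matrix; A and B only appear in the z column.
    P = np.array([
        [fx,  0.0, 0.0,  0.0],
        [0.0, fy,  0.0,  0.0],
        [cx,  cy,  A,   -1.0],
        [0.0, 0.0, B,    0.0],
    ])
    x, y, z, w = np.append(np.asarray(p_cam, dtype=float), 1.0) @ P
    return np.array([x / w, y / w])   # normalized image coordinates

p = [0.1, -0.05, -2.0]                          # a point 2 m in front of the camera
uv1 = project(p, 2.3, 4.1, 0.0, 0.0, A=-1.001, B=-0.2)
uv2 = project(p, 2.3, 4.1, 0.0, 0.0, A=-5.0,   B=3.0)
print(np.allclose(uv1, uv2))                    # True: A and B never reach x/y
```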
My aim is to map between the depth camera and the HD camera of the HoloLens. To do this I do the following:
- I record images with the Recorder Tool of the HoloLensForCV sample.
- I take a depth image and look for the corresponding HD image by checking the timestamps.
- I use the unprojection mapping to find the 3D points in the CameraViewSpace of the depth camera.
- I transform the 3D points from the depth camera view to the HD camera view and project them onto the image plane. I use the following transformations:
[PixelCoordinates.x, PixelCoordinates.y, 1, 1] = [3D depth point, 1] * inv(CameraViewTransform_Depth) * FrameToOrigin_Depth * inv(FrameToOrigin_HD) * CameraViewTransform_HD * CameraProjectionTransform_HD
These pixel coordinates are in the range from -1 to 1 and need to be scaled to the 1280x720 image size. This is done as follows:
x_rgb = 1280 * (PixelCoordinates.x + 1) / 2;
y_rgb = 720 * (1 - ((PixelCoordinates.y +1)/2));
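To make steps 3 and 4 concrete, here is a minimal numpy sketch of the chain I use (the function and variable names are my own; `unprojected` stands for a 3D point obtained from the unprojection mapping, and I read the "= [..., 1, 1]" on the left-hand side of the formula above as implying a divide by the homogeneous w):

```python
import numpy as np

def depth_to_hd_pixel(p_depth_cam,
                      view_depth, frame_to_origin_depth,
                      view_hd, frame_to_origin_hd,
                      projection_hd,
                      hd_width=1280, hd_height=720):
    """Map a 3D point in the depth CameraViewSpace to a pixel in the HD image.

    All matrices are 4x4 in the row-vector convention used above: points are
    multiplied from the left and the translation sits in the last row.
    """
    p = np.append(np.asarray(p_depth_cam, dtype=float), 1.0)

    # depth camera view -> depth frame -> origin -> HD frame -> HD camera view
    p = p @ np.linalg.inv(view_depth) @ frame_to_origin_depth
    p = p @ np.linalg.inv(frame_to_origin_hd) @ view_hd

    # perspective projection, then divide by the homogeneous w to get
    # normalized coordinates in [-1, 1]
    p = p @ projection_hd
    ndc_x, ndc_y = p[0] / p[3], p[1] / p[3]

    # normalized coordinates -> pixel coordinates in the 1280x720 HD image
    x_rgb = hd_width * (ndc_x + 1.0) / 2.0
    y_rgb = hd_height * (1.0 - (ndc_y + 1.0) / 2.0)
    return x_rgb, y_rgb

# usage: one unprojected depth point plus the matrices of the matched frame pair
# x, y = depth_to_hd_pixel(unprojected,
#                          view_depth, frame_to_origin_depth,
#                          view_hd, frame_to_origin_hd,
#                          projection_hd)
```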
Result: when transforming my detections from the depth camera to the HD camera, the objects (in this case the calibration pattern) are not 100% aligned. So I am trying to figure out where the misalignment comes from. Am I misunderstanding the transformation matrices, or has anyone experienced similar problems?
The problem might occur if the spatial mapping of the HoloLens is not working correctly, for example if the HoloLens cannot find enough features to map the room. I therefore tested my setup in different rooms, especially in smaller rooms with more clutter in the background (so that the HoloLens can find more features to map the room). However, the problem still occurs. As outlined above, the rough appearance of the transformations seems to be correct, but I have no idea how to test the matrices further to pin down the problem.
I would appreciate your help very much! Thanks a lot in advance!
Lisa

