To start tracking, we need to get the user's hands as close as possible to an initial calibration pose. As the single-hand FORTH OpenNI implementation does, we should provide an overlay or 3D representation of that pose, color-coded by how closely the user's hands currently match it, and possibly start tracking automatically as soon as the match quality reaches a threshold.
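The feedback loop described above could be sketched roughly as follows. This is a minimal illustration, not the FORTH OpenNI code. It assumes that both the calibration pose and the observed hands are available as equal-length lists of 3D keypoints in metres. The names `pose_match_score`, `score_to_rgb` and `CalibrationGate` are hypothetical, and so are the linear distance-to-score mapping, the red-to-green color ramp and the default threshold values.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def pose_match_score(observed: Sequence[Point3],
                     template: Sequence[Point3],
                     tolerance: float = 0.10) -> float:
    """Hypothetical match score in [0, 1].

    1.0 when every observed keypoint sits exactly on its template keypoint,
    falling linearly to 0.0 once the mean keypoint distance reaches
    `tolerance` (same units as the points; metres assumed here).
    """
    if not template or len(observed) != len(template):
        raise ValueError("observed and template must be non-empty and equal length")
    mean_dist = sum(math.dist(o, t) for o, t in zip(observed, template)) / len(template)
    return max(0.0, 1.0 - mean_dist / tolerance)


def score_to_rgb(score: float) -> Tuple[int, int, int]:
    """Map a score to an overlay color: red for a poor match, green for a good one."""
    s = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
    return (int(round(255 * (1.0 - s))), int(round(255 * s)), 0)


@dataclass
class CalibrationGate:
    """Hypothetical trigger: start tracking once the score has stayed at or
    above `threshold` for `frames_required` consecutive frames."""
    threshold: float = 0.8
    frames_required: int = 5
    _streak: int = 0

    def update(self, score: float) -> bool:
        # Count consecutive good frames; any frame below threshold resets the streak.
        self._streak = self._streak + 1 if score >= self.threshold else 0
        return self._streak >= self.frames_required


if __name__ == "__main__":
    template = [(0.0, 0.0, 0.5), (0.1, 0.0, 0.5)]
    observed = [(0.0, 0.0, 0.5), (0.1, 0.02, 0.5)]
    score = pose_match_score(observed, template)
    print(f"score={score:.2f} color={score_to_rgb(score)}")
```

Requiring a few consecutive frames above the threshold, rather than a single one, is an assumed safeguard against a noisy frame triggering tracking too early. Setting `frames_required=1` gives the plain "start as soon as the threshold is reached" behavior.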