What does the dataloader provide besides the 6 images for FlashOcc?

I was testing vis_occ.py and found that the data dict is passed to the model for inference. The data includes not only the concatenated images but also points. My question is: can we skip the points and pass only the 6 camera inputs to the model? What is the role of the points during inference if the model is camera-only?
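
To make the question concrete, here is a rough sketch of what I have in mind: removing the points entry from the data dict before it reaches the model. The dict below is a placeholder; the key names `img_inputs` and `img_metas` and the helper `drop_keys` are my assumptions for illustration, not the actual FlashOcc keys or API, and I have not verified that the model accepts a dict without points.

```python
def drop_keys(data, keys_to_drop=("points",)):
    """Return a shallow copy of the batch dict without the given keys."""
    return {k: v for k, v in data.items() if k not in keys_to_drop}


# Placeholder batch. The key names are assumptions, not the real FlashOcc keys.
data = {
    "img_inputs": ["cam0", "cam1", "cam2", "cam3", "cam4", "cam5"],  # 6 camera views
    "img_metas": [{}],
    "points": [[0.0, 0.0, 0.0]],  # the point data in question
}

camera_only = drop_keys(data)
print(sorted(camera_only))  # keys that would still be passed to the model
```

If the forward pass reads the points anywhere, I would expect this to fail with a missing-key or missing-argument error, which would itself answer the question.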