Could you please explain how the alpha parameter affects FVIT training? Specifically, why does VITB use alpha = 0.7 while VITL uses alpha = 0.95?