Consultation on a similar problem? #16
Hello @DaniloSorano ,
I couldn't find your email, so I'm posting here, sorry :) I'm working on a similar problem and wanted to consult with you. I'm trying to predict events from broadcast video, focusing specifically on "Ground Pass" for now. The difference is that I only have the moment of each event, so I extend it to a window of ±n frames around that moment.
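To make the ±n labelling concrete, here is a minimal sketch of how such per-frame labels could be built. The function name `frame_labels` and the example numbers are illustrative assumptions, not code from my actual pipeline.

```python
def frame_labels(num_frames, event_frames, n):
    """Return one 0/1 label per frame; frames within +-n of an event are 1.

    Hypothetical helper: `event_frames` holds the frame index of each
    annotated event moment, and `n` is the half-width of the window.
    """
    labels = [0] * num_frames
    for center in event_frames:
        start = max(0, center - n)              # clip at the first frame
        stop = min(num_frames, center + n + 1)  # exclusive, clipped at the end
        for i in range(start, stop):
            labels[i] = 1
    return labels


# One pass annotated at frame 40 of a 100-frame sequence, widened by n = 12.
labels = frame_labels(num_frames=100, event_frames=[40], n=12)
print(sum(labels))  # 25 positive frames (2 * 12 + 1)
```

With n = 12 at 25 FPS, a single pass moment becomes a one-second positive window, and the other 75 frames of the sequence stay negative.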
I'm writing it in TF/Keras and have so far implemented only the visual branch of your model. I extract video features with ResNet50 (2048-dimensional) at 25 FPS and try to fit a combination of LSTM and Dense layers. I don't expect this to perform well, but I did expect it to beat a model that predicts every frame as negative.
So I was wondering if you have any insights, tips, or suggestions on how to proceed. Maybe you encountered a similar problem when you were working on it. I feel like the visual branch alone should predict something (albeit poorly). Currently, it learns to predict mostly 0s for all frames...
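For context on the "all frames negative" comparison, here is a rough sketch of how that baseline can be scored. The helper `positive_class_scores` and the 25-in-100 label split are assumptions chosen for illustration, not measurements from my data.

```python
def positive_class_scores(y_true, y_pred):
    """Accuracy plus precision/recall/F1 for the positive (pass) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative sequence: 25 positive frames out of 100.
y_true = [0] * 40 + [1] * 25 + [0] * 35
all_negative = [0] * len(y_true)  # the "predict 0 everywhere" baseline

scores = positive_class_scores(y_true, all_negative)
print(scores)
```

In this example the all-negative baseline already reaches 0.75 accuracy while its recall and F1 on the pass class are 0, so accuracy alone would not show whether the network has learned anything beyond that baseline.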
In the simplest case, it's just an LSTM with a Dense layer. I take 100 frames per input sequence. Even when I expand the pass moment to 25 frames, I don't get anything. I tried a few combinations of LSTM, Dense, and Conv1D with the same result.
Layer (type)                 Output Shape              Param #
=================================================================
input_4 (InputLayer)         [(None, 100, 2048)]       0
_________________________________________________________________
lstm_5 (LSTM)                (None, 100, 128)          1114624
_________________________________________________________________
dropout_5 (Dropout)          (None, 100, 128)          0
_________________________________________________________________
batch_normalization_5 (Batch (None, 100, 128)          512
_________________________________________________________________
time_distributed_3 (TimeDist (None, 100, 2)            258
=================================================================
Total params: 1,115,394
Trainable params: 1,115,138
Non-trainable params: 256
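
As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is only a sketch using the standard formulas for LSTM, batch normalization, and dense layers; it assumes the TimeDistributed layer wraps a Dense layer with 2 units, which the truncated summary does not state explicitly (the 258 parameters are consistent with it).

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * (units * (input_dim + units) + units)


def batch_norm_params(features):
    # gamma and beta are trainable; moving mean and variance are not.
    trainable = 2 * features
    non_trainable = 2 * features
    return trainable, non_trainable


def dense_params(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units


lstm = lstm_params(input_dim=2048, units=128)
bn_trainable, bn_frozen = batch_norm_params(128)
dense = dense_params(input_dim=128, units=2)  # assumed Dense(2) inside TimeDistributed

total = lstm + bn_trainable + bn_frozen + dense
print(lstm, bn_trainable + bn_frozen, dense)  # 1114624 512 258
print(total, total - bn_frozen, bn_frozen)    # 1115394 1115138 256
```

The numbers match the summary, so the layer shapes are as intended and almost all of the capacity sits in the single LSTM layer.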