This directory contains test results for the real-time surveillance system, which is designed to detect violent activities using deep learning models. The system has been tested with three different versions of the model on various video samples.
- 0.81.h5 Model
  - An initial version of the violence detection model.
  - Provides basic detection capabilities with moderate accuracy.
- 0.86.h5 Model
  - An improved version with better feature extraction and optimized training.
  - Higher accuracy and reduced false positives compared to the previous model.
- ViolenceModelFinal.keras
  - The most refined version with advanced deep learning techniques.
  - Utilizes enhanced training data for more reliable threat detection.
  - Designed for real-world deployment with the highest accuracy among the three.
The models were tested on multiple videos containing different scenarios of violent activities. The processed results for each model can be accessed using the links below:
- Indian Street Fight
- Street Fight TKO 30 Seconds
- div-dip-test
- hem-dip-test
- local
- WWE
The violence detection model follows the LRCN (Long-term Recurrent Convolutional Network) architecture, which combines CNNs for spatial feature extraction with LSTMs for temporal sequence modeling. This lets the model classify a video using both frame-level features and sequential patterns over time.
- CNN Feature Extractor:
  - Uses three convolutional layers with Batch Normalization and MaxPooling to extract spatial features from individual frames.
  - The output is flattened and transformed into a sequence representation.
- LSTM-based Temporal Modeling:
  - The extracted frame features are repeated across time steps to form a sequence.
  - Two stacked LSTM layers process this sequence to learn temporal dependencies between frames.
- Classification Layer:
  - Fully connected (Dense) layers refine the extracted features.
  - A Dropout layer helps prevent overfitting.
  - A softmax layer classifies the video as either Violence (1) or Non-Violence (0).
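To make the data flow through these stages concrete, here is a minimal, dependency-free sketch that traces tensor shapes through the described stack and applies the final softmax decision. It is not the project's training code. The frame size, filter counts, pooling factor, number of time steps, and LSTM and Dense widths are all assumptions chosen for illustration, since this README does not state them.

```python
from math import exp

# Hypothetical sizes: none of these values are stated in the README.
FRAME_H, FRAME_W, CHANNELS = 64, 64, 3
CONV_FILTERS = (16, 32, 64)   # three convolutional layers
TIME_STEPS = 16               # how often the frame vector is repeated
LSTM_UNITS = (64, 32)         # two stacked LSTM layers
DENSE_UNITS = 32              # fully connected layer before the output
LABELS = {0: "Non-Violence", 1: "Violence"}


def trace_shapes():
    """Return (stage, shape) pairs for one frame passing through the stack."""
    h, w, c = FRAME_H, FRAME_W, CHANNELS
    stages = [("input frame", (h, w, c))]
    for i, filters in enumerate(CONV_FILTERS, start=1):
        # Assumed: a 'same'-padded convolution keeps h and w,
        # and 2x2 max pooling halves them.
        h, w, c = h // 2, w // 2, filters
        stages.append((f"conv block {i}", (h, w, c)))
    flat = h * w * c
    stages.append(("flatten", (flat,)))
    stages.append(("repeat across time", (TIME_STEPS, flat)))
    # The first LSTM emits a full sequence so the second one can consume it.
    stages.append(("lstm 1", (TIME_STEPS, LSTM_UNITS[0])))
    stages.append(("lstm 2", (LSTM_UNITS[1],)))
    stages.append(("dense + dropout", (DENSE_UNITS,)))
    stages.append(("softmax", (len(LABELS),)))
    return stages


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Map the two output logits to a label and its probability."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[index], probs[index]
```

With these assumed sizes, a 64x64 frame shrinks to 8x8x64 after the three convolutional blocks, which flattens to a 4096-element vector before being repeated across the time steps. `classify` picks whichever of the two classes has the larger softmax probability.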
Three different trained models have been tested on various video samples, demonstrating their performance differences:
- 0.81.h5 – A model that reached 81% accuracy, showing moderate reliability.
- 0.86.h5 – A refined model that reached 86% accuracy, improving detection capability.
- ViolenceModelFinal.keras – The most optimized version, trained for better generalization and robustness.
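As a rough way to read the gap between the two stated accuracies, the sketch below converts them into expected misclassifications per 100 clips and a relative error reduction. It assumes both figures were measured on the same evaluation set, which this README does not state. The helper names are hypothetical and not part of the project.

```python
# Accuracies as stated for the two numbered models in the list above.
ACCURACY = {"0.81.h5": 0.81, "0.86.h5": 0.86}


def misclassified_per_100(accuracy):
    """Expected number of wrong predictions per 100 clips at a given accuracy."""
    return round((1.0 - accuracy) * 100)


def relative_error_reduction(old_accuracy, new_accuracy):
    """Fraction of the old model's errors that the new model removes."""
    old_error = 1.0 - old_accuracy
    new_error = 1.0 - new_accuracy
    return (old_error - new_error) / old_error


if __name__ == "__main__":
    for name, acc in ACCURACY.items():
        print(name, "errors per 100 clips:", misclassified_per_100(acc))
    reduction = relative_error_reduction(ACCURACY["0.81.h5"], ACCURACY["0.86.h5"])
    print(f"relative error reduction: {reduction:.1%}")
```

Under that assumption, the five-point accuracy gain means about 14 instead of 19 errors per 100 clips, so 0.86.h5 removes roughly a quarter of the errors made by 0.81.h5.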