A natural language processing pipeline for analyzing free recall of avalanche-related text.
- PyTorch-accelerated text analysis
- Sentence embedding with Sentence-BERT (SBERT)
- Text generation using GPT-4o
- Machine learning classification
- Embedding visualization and analysis
-
Sentence Generation
- Uses GPT-4o to generate related sentences about avalanches
- Supports both simplified and detailed processing modes
- Segments input text and generates multiple responses
-
Embedding Generation
- Converts generated sentences into numerical vector representations
- Utilizes MPNet model from Sentence-BERT
- Leverages PyTorch for efficient tensor computations
-
Classification and Visualization
- Applies random forest classifiers
- Generates probability heatmaps
- Visualizes sentence recall and topic probabilities
- PyTorch
- Sentence-BERT (SBERT)
- GPT-4o
- Pandas
- NumPy
- Matplotlib
- Seaborn
sentences_gpt-4o/: Scripts for sentence generation and embeddingrecall-analysis/: Classification and analysis scriptsutilities_py/: Utility functions and helper modules
- Python 3.8+
- PyTorch
- Sentence-BERT
- OpenAI API (for GPT-4o)
- Pandas
- NumPy
- Matplotlib
- Seaborn
Refer to the Makefiles in each directory for specific execution commands. The project uses a Docker-based workflow with PyTorch containers.
The pipeline is optimized for computational efficiency, utilizing PyTorch's parallel processing and tensor computation capabilities.
MIT license