This project provides an automated grading tool for student submissions. It converts submissions, extracts questions and answers, generates an answer key and rubric, and (optionally) grades the submissions using document-processing and language-model (LLM) utilities.
Note: Some parts of the grading process (such as the final grading step) are currently commented out in the main code. Adjust and uncomment as needed.
- Submission Conversion: Converts PDF/DOCX submissions to Markdown.
- Question Extraction: Extracts questions and context from a provided assignment or from student submissions.
- Answer Key Generation:
- Converts a provided answer key document to a standardized question-level format.
- Alternatively, auto-generates an answer key using multiple attempts with an LLM.
- Rubric Generation:
- Processes a provided rubric document, or
- Auto-generates a synthetic rubric based on submissions and answer key.
- Grading Engine:
- Grades student responses against the answer key and rubric using Azure OpenAI.
- Generates detailed feedback for each question.
- Temporary File Management: Uses a backup folder to store intermediate files to avoid redundant processing.
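The temporary-file management above follows a simple cache-on-disk pattern: skip a step if its output already exists in the backup folder. A minimal sketch, assuming a hypothetical `cached_step` helper (`produce` stands in for any expensive step such as document conversion or an LLM call; neither name is the project's actual API):

```python
import os

def cached_step(backup_folder, filename, produce):
    """Run `produce()` only if `filename` is not already in the backup folder.

    `produce` is a hypothetical callable returning the file's text content;
    on a cache hit the stored intermediate file is reused instead.
    """
    os.makedirs(backup_folder, exist_ok=True)
    path = os.path.join(backup_folder, filename)
    if os.path.exists(path):  # intermediate file already produced earlier
        with open(path, encoding="utf-8") as f:
            return f.read()
    content = produce()       # expensive step (conversion, LLM call, ...)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return content
```

Re-running the pipeline with the same backup folder then skips any step whose output file survived the previous run.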
- Python 3.7+
- Libraries:
  - `os`, `sys`, `argparse`, `asyncio`, `pandas`
  - `python-dotenv` (for environment variable management)
  - `openai` (for the `AsyncAzureOpenAI` client)
- Custom Modules:
  - `llm_grader` (grading questions)
  - `process_documents` (processing documents)
  - `extract_problems` (extracting questions from submissions)
  - `generate_rubric` (creating grading rubrics)
  - `create_answer_key` (processing answer keys)
  - `generate_answer_key` (generating answer keys)
- Clone the repository and navigate to the project directory:

      git clone https://github.com/your-repo/automated-grader.git
      cd automated-grader

- Install the required packages (preferably in a virtual environment):

      pip install -r requirements.txt
Create a .env file in the root of your project with the following variables (replace with your actual values):
AZURE_ENDPOINT_GPT=your_azure_openai_endpoint
AZURE_API_KEY_GPT=your_azure_api_key
The custom modules (llm_grader, process_documents, extract_problems, generate_rubric, create_answer_key, generate_answer_key) should be present in the project directory.
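The two environment variables above are all the script needs to construct its Azure OpenAI client. A minimal sketch of reading them (the `load_azure_config` helper is hypothetical, and it assumes `python-dotenv` or your shell has already populated the environment):

```python
import os

def load_azure_config():
    """Read the two Azure credentials that the grader expects from .env."""
    endpoint = os.getenv("AZURE_ENDPOINT_GPT")
    api_key = os.getenv("AZURE_API_KEY_GPT")
    if not endpoint or not api_key:
        raise RuntimeError("Set AZURE_ENDPOINT_GPT and AZURE_API_KEY_GPT in .env")
    return {"azure_endpoint": endpoint, "api_key": api_key}

# The returned kwargs map onto the AsyncAzureOpenAI constructor, e.g.:
#   from openai import AsyncAzureOpenAI
#   client = AsyncAzureOpenAI(api_version="2024-02-01", **load_azure_config())
# (the api_version value here is illustrative; use the one your deployment supports)
```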
Run the script from the command line using:
    ./grade.py submissions_folder [--answer_key ANSWER_KEY] [--blank_assignment BLANK_ASSIGNMENT]
                [--output_csv OUTPUT_CSV] [--rubric RUBRIC] [--truncate PAGES [PAGES ...]]
                [--backup_folder BACKUP_FOLDER] [--threads_file THREADS_FILE] [--model MODEL]

- `submissions_folder`
  The folder containing student submissions in PDF/DOCX format. All files will be validated and converted to PDF if needed.
- `--answer_key`
  Path to the answer key document (PDF, DOCX, etc.). If not provided, an answer key will be generated automatically.
- `--blank_assignment`
  Path to an unaltered copy of the assignment. If not provided, a blank assignment will be generated from a submission.
- `--output_csv` (Default: `./grader_output.csv`)
  Path/filename for the CSV file that will contain the final grading results. Will be saved in the parent directory of the submissions folder.
- `--rubric` (Optional)
  Path to a rubric document. If not provided, a synthetic rubric is generated.
- `--truncate` (Optional)
  List of page numbers to exclude from PDF submissions (e.g., metadata pages).
- `--backup_folder` (Default: `temp`)
  Directory to store temporary files generated during processing. Will be created in the parent directory of the submissions folder.
- `--threads_file` (Optional)
  Path to a CSV file containing PingPong threads data. If provided, links will be replaced with conversation text.
- `--model` (Default: `gpt-5-mini`)
  Azure OpenAI model to use for grading.
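The command-line interface above maps naturally onto `argparse`. A sketch of the parser the documented flags imply (flag names and defaults come from this README; the `build_parser` function name and help strings are illustrative):

```python
import argparse

def build_parser():
    """Build an argument parser mirroring the documented grade.py CLI."""
    p = argparse.ArgumentParser(description="Automated grader for student submissions")
    p.add_argument("submissions_folder", help="Folder of PDF/DOCX submissions")
    p.add_argument("--answer_key", help="Answer key document (PDF, DOCX, ...)")
    p.add_argument("--blank_assignment", help="Unaltered copy of the assignment")
    p.add_argument("--output_csv", default="./grader_output.csv",
                   help="CSV file for final grading results")
    p.add_argument("--rubric", help="Rubric document")
    p.add_argument("--truncate", nargs="+", type=int, metavar="PAGES",
                   help="Page numbers to exclude from PDF submissions")
    p.add_argument("--backup_folder", default="temp",
                   help="Directory for temporary files")
    p.add_argument("--threads_file", help="CSV of PingPong threads data")
    p.add_argument("--model", default="gpt-5-mini",
                   help="Azure OpenAI model to use for grading")
    return p
```

For example, `build_parser().parse_args(["subs", "--truncate", "1", "2"])` yields `truncate == [1, 2]` with the documented defaults for the remaining options.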
- Converts all student submissions to Markdown and stores them in `submissions_markdown.csv`.
- Creates a backup of the original submissions.
- Generates or retrieves a blank assignment.
- Extracts questions with context.
- Converts submissions into a question-level format.
- If an answer key is provided, it is processed and standardized.
- Otherwise, an answer key is generated using multiple LLM attempts.
- A provided rubric document is processed.
- If no rubric is provided, a synthetic rubric is generated.
- Uses the rubric and answer key to grade submissions.
- Saves the results to a CSV file in the parent directory of the submissions folder.
- Generates detailed feedback and saves it as `feedback.csv`.
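The "multiple LLM attempts" step in the answer-key workflow can be sketched as concurrent queries with a selection rule. This is an assumption-laden sketch: `ask` stands in for an Azure OpenAI call, and keeping the longest draft is an illustrative heuristic, not the project's actual selection logic:

```python
import asyncio

async def generate_with_attempts(ask, question, attempts=3):
    """Query the LLM `attempts` times for one question and pick a draft.

    `ask` is a hypothetical async callable (question -> answer text).
    The longest draft is kept here as a stand-in "most complete" heuristic.
    """
    drafts = await asyncio.gather(*(ask(question) for _ in range(attempts)))
    return max(drafts, key=len)
```

Running the attempts with `asyncio.gather` keeps the per-question latency close to a single LLM call rather than `attempts` sequential calls.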
- Allow specifying different model versions for various functions.
- Add more robust error handling and user-friendly error messages.
- Ensure all submission files are either PDF or DOCX format.
- The script will attempt to convert DOCX files to PDF automatically.
- Ensure required files (e.g., Markdown conversions, extracted questions) exist in the backup folder.
- Verify the format and paths of provided documents.
- All output files (including `grader_output.csv` and `feedback.csv`) are saved in the parent directory of the submissions folder.
- Temporary files are stored in the `temp` folder within the parent directory.
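Downstream scripts can pick up the results from the documented output location. A small stdlib sketch (the `load_results` helper is hypothetical; only the `grader_output.csv` filename and its location come from this README, and the column names are whatever the grader wrote):

```python
import csv
import os

def load_results(parent_dir):
    """Read grader_output.csv from the submissions folder's parent directory."""
    path = os.path.join(parent_dir, "grader_output.csv")
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))  # one dict per graded row
```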