Meeting Data Analysis

A Python tool for analyzing meeting recordings from the Grain API. It fetches meeting data, analyzes speaking patterns and participant engagement, and exports the results as a report.

Features

Fetches meeting recordings from Grain API within a specified date range
Analyzes speaking time for internal vs external participants
Identifies who spoke first in meetings
Calculates speaking wait times and participant join spread
Classifies meetings by speaking patterns (internal only, external only, both, or no speech)
Classifies participants using the scope field (internal/external/unknown), with fallback methods when the scope is unknown
Lists all participant emails for each meeting
Identifies the meeting owner from the internal participants
Tracks no-shows (participants who were scheduled but did not attend)
Calculates average handle time (using meeting duration as proxy)
Tracks lateness (with limitations; see the API Limitations & Data Availability section)
Exports results to CSV for further analysis

Prerequisites

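Python 3.11 or higher (the numpy>=2.3.5 requirement listed under Dependencies does not install on older versions)
A Grain API key (Personal Access Token or Workspace Access Token)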
Access to a Grain workspace

Installation

Clone or download this repository
Create a virtual environment (recommended):
```
python -m venv venv
```
Activate the virtual environment:
- On macOS/Linux:
```
source venv/bin/activate
```
- On Windows:
```
venv\Scripts\activate
```
Install dependencies:
```
pip install -r requirements.txt
```

Getting Your API Key

Log in to your Grain account
Navigate to the API settings (under Settings > Integration > API)
Generate a Personal Access Token (PAT) or Workspace Access Token (WAT)
Copy the token - you'll need it to run the analysis

Configuration

All parameters can be configured via command-line arguments or environment variables. Command-line arguments take precedence over environment variables.
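
The precedence rule can be sketched with argparse by using each environment variable as the default for its flag. This is a minimal illustration under assumptions, not the actual code in run_analysis.py; the function name resolve_settings is hypothetical.

```python
import argparse
import os


def resolve_settings(argv=None, env=None):
    """Return settings where CLI flags override environment variables.

    Hypothetical helper for illustration; run_analysis.py may differ.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Precedence sketch")
    # Each flag falls back to its environment variable when not passed.
    parser.add_argument("--api-key", default=env.get("GRAIN_API_KEY"))
    parser.add_argument("--start-date", default=env.get("GRAIN_START_DATE"))
    parser.add_argument("--end-date", default=env.get("GRAIN_END_DATE"))
    args = parser.parse_args(argv)

    settings = vars(args)
    # A value missing from both sources is reported as an error.
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return settings
```

Because the environment value is only a default, any flag given on the command line replaces it.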

Required Parameters

API Key: Your Grain Personal Access Token (workspace is determined automatically from the API key)
Start Date: Start date in YYYY-MM-DD format (inclusive)
End Date: End date in YYYY-MM-DD format (exclusive)
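
The inclusive start and exclusive end form a half-open interval. The sketch below shows that comparison for a single timestamp. The helper name in_date_range is hypothetical, and how the tool handles time zones is not stated here, so the example compares calendar dates only.

```python
from datetime import date, datetime


def in_date_range(recorded_at: datetime, start_date: str, end_date: str) -> bool:
    """True when recorded_at falls in [start_date, end_date).

    Hypothetical helper; compares calendar dates and ignores time zones.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return start <= recorded_at.date() < end


# A recording at midnight on the start date is included;
# one at midnight on the end date is not.
print(in_date_range(datetime(2025, 10, 19, 0, 0), "2025-10-19", "2025-11-20"))
print(in_date_range(datetime(2025, 11, 20, 0, 0), "2025-10-19", "2025-11-20"))
```

To include recordings from 2025-11-20 itself, pass 2025-11-21 as the end date.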

Option 1: Command-Line Arguments (Recommended)

```
python run_analysis.py \
  --api-key "your-api-key-here" \
  --start-date "2025-10-19" \
  --end-date "2025-11-20"
```

Option 2: Environment Variables

Set environment variables:

```
export GRAIN_API_KEY="your-api-key-here"
export GRAIN_START_DATE="2025-10-19"
export GRAIN_END_DATE="2025-11-20"
```

On Windows (Command Prompt):

```
set GRAIN_API_KEY=your-api-key-here
set GRAIN_START_DATE=2025-10-19
set GRAIN_END_DATE=2025-11-20
```

Then run:

```
python run_analysis.py
```

Option 3: Mix of Both

You can use environment variables for some parameters and command-line arguments for others. CLI arguments override environment variables:

```
export GRAIN_API_KEY="your-api-key-here"

python run_analysis.py --start-date "2025-10-19" --end-date "2025-11-20"
```

Usage

Basic Usage

Run with all required parameters:

```
python run_analysis.py \
  --api-key "your-api-key" \
  --start-date "2025-10-19" \
  --end-date "2025-11-20"
```

Advanced Options

```
python run_analysis.py \
  --api-key "your-api-key" \
  --start-date "2025-10-19" \
  --end-date "2025-11-20" \
  --simultaneous-threshold 10 \
  --output "custom_output.csv"
```

View All Options

```
python run_analysis.py --help
```

Complete Example

```
# Set all parameters via environment variables
export GRAIN_API_KEY="grain_pat_your_key_here"
export GRAIN_START_DATE="2025-10-19"
export GRAIN_END_DATE="2025-11-20"

# Run the analysis
python run_analysis.py

# Or use command-line arguments
python run_analysis.py \
  --api-key "grain_pat_your_key_here" \
  --start-date "2025-10-19" \
  --end-date "2025-11-20" \
  --output "my_analysis_results.csv"
```

Using as a Module

You can also import and use the function in your own scripts:

```
from meeting_data_analysis import meeting_data_analysis

df = meeting_data_analysis(
    api_key="your-api-key",
    start_date="2025-10-19",
    end_date="2025-11-20",
    simultaneous_threshold_seconds=5
)

# Process the DataFrame
print(df.head())
df.to_csv("my_analysis.csv", index=False)
```

Command-Line Parameters

Required Parameters

--api-key or GRAIN_API_KEY: Your Grain API Personal Access Token (workspace is determined automatically)
--start-date or GRAIN_START_DATE: Start date in YYYY-MM-DD format (inclusive)
--end-date or GRAIN_END_DATE: End date in YYYY-MM-DD format (exclusive)

Optional Parameters

--simultaneous-threshold: Threshold in seconds to consider two speakers as speaking simultaneously (default: 5)
--output: Output CSV file path (default: meeting_data_analysis.csv)
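
To illustrate what the simultaneous threshold controls, here is a minimal sketch of how the first-speaker classification might work. The function name classify_who_spoke_first is hypothetical, and treating a gap of exactly the threshold as simultaneous is an assumption; the tool's actual comparison may differ.

```python
from datetime import datetime


def classify_who_spoke_first(first_internal, first_external, threshold_seconds=5):
    """Classify who spoke first from two optional datetimes.

    Hypothetical sketch; None means that side never spoke.
    """
    if first_internal is None and first_external is None:
        return "no_speech"
    if first_external is None:
        return "only_internal"
    if first_internal is None:
        return "only_external"
    # Positive gap: the internal side spoke before the external side.
    gap = (first_external - first_internal).total_seconds()
    if abs(gap) <= threshold_seconds:  # assumption: boundary counts as simultaneous
        return "simultaneous"
    return "internal" if gap > 0 else "external"


internal = datetime(2025, 10, 20, 10, 0, 0)
external = datetime(2025, 10, 20, 10, 0, 8)
print(classify_who_spoke_first(internal, external))                        # default threshold of 5
print(classify_who_spoke_first(internal, external, threshold_seconds=10))  # wider threshold
```

With an 8-second gap, the default threshold yields "internal", while a threshold of 10 yields "simultaneous".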

Function Parameters (for direct use)

When using meeting_data_analysis() as a Python function:

api_key (str, required): Your Grain API Personal Access Token (workspace is determined automatically)
start_date (str, required): Start date in YYYY-MM-DD format (inclusive)
end_date (str, required): End date in YYYY-MM-DD format (exclusive)
simultaneous_threshold_seconds (int, optional): Threshold in seconds to consider two speakers as speaking simultaneously. Default: 5

Output

The analysis generates a CSV file (meeting_data_analysis.csv by default; change it with --output) with the following columns:

Core Metrics

internal_user_ids: List of internal user IDs who participated
recording_id: Unique identifier for the recording
start_datetime: When the meeting started
duration_minutes: Meeting duration in minutes
meeting_owner_email: Email of the meeting owner (first internal participant)
meeting_owner_name: Name of the meeting owner
participant_emails: List of all participant emails for the meeting (sorted)
num_speakers: Total number of unique speakers
num_internal_speakers_who_spoke: Number of internal participants who spoke
num_external_speakers_who_spoke: Number of external participants who spoke
internal_speaking_minutes: Total internal speaking time in minutes
external_speaking_minutes: Total external speaking time in minutes
speaking_category: Classification (e.g., "Both spoke", "Only internal spoke", "Only external spoke", "No one spoke")
internal_speaking_pct: Percentage of total speaking time by internal participants
num_internal_participants: Total number of internal participants
num_external_participants: Total number of external participants
first_internal_spoke_time: Timestamp when first internal participant spoke
first_external_spoke_time: Timestamp when first external participant spoke
who_spoke_first: Classification of who spoke first ("internal", "external", "simultaneous", "only_internal", "only_external", "no_speech")
speaking_wait_time_minutes: Time difference between first internal and external speech (in minutes)
first_join_time: When the first participant joined
last_join_time: When the last participant joined
total_meeting_participants: Total number of participants
join_spread_minutes: Time difference between first and last join (in minutes)
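
Several of these columns are simple derivations from the others. The sketch below shows one plausible way to compute speaking_category, internal_speaking_pct, and join_spread_minutes. The function names are hypothetical, and the 0-100 percentage scale and the use of None for meetings with no speech are assumptions, not confirmed behavior of the tool.

```python
from datetime import datetime


def speaking_summary(internal_minutes, external_minutes):
    """Derive speaking_category and internal_speaking_pct (hypothetical sketch)."""
    if internal_minutes > 0 and external_minutes > 0:
        category = "Both spoke"
    elif internal_minutes > 0:
        category = "Only internal spoke"
    elif external_minutes > 0:
        category = "Only external spoke"
    else:
        category = "No one spoke"
    total = internal_minutes + external_minutes
    # Assumption: percentage on a 0-100 scale, None when nobody spoke.
    pct = round(100.0 * internal_minutes / total, 1) if total > 0 else None
    return {"speaking_category": category, "internal_speaking_pct": pct}


def join_spread_minutes(join_times):
    """Minutes between the first and last join, or None without joins."""
    if not join_times:
        return None
    return (max(join_times) - min(join_times)).total_seconds() / 60.0


print(speaking_summary(3.0, 1.0))
print(join_spread_minutes([datetime(2025, 10, 20, 10, 0), datetime(2025, 10, 20, 10, 2, 30)]))
```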

Operational Metrics (Requested Features)

The following metrics were requested for appointment/operational analysis:

num_no_shows: Number of participants who were scheduled but did not attend (confirmed_attendee=False)
num_internal_no_shows: Number of internal participants who were no-shows
num_external_no_shows: Number of external participants who were no-shows
average_handle_time_minutes: Average handle time (using total meeting duration as proxy)
customer_late_minutes: LIMITED - Customer lateness in minutes (requires scheduled start time - see limitations)
agent_late_minutes: LIMITED - Agent lateness in minutes (requires scheduled start time - see limitations)
meeting_start_late_minutes: LIMITED - Meeting start lateness in minutes (requires scheduled start time - see limitations below)
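
As an illustration of the no-show columns, the following sketch counts participants whose confirmed_attendee value is False. The participant dictionaries and the function name count_no_shows are assumptions made for this example; the real API payload and the tool's internals may be shaped differently.

```python
def count_no_shows(participants):
    """Count no-shows overall and by scope (hypothetical sketch).

    Each participant is assumed to be a dict with "confirmed_attendee"
    and an already-resolved "scope" of "internal" or "external".
    """
    counts = {
        "num_no_shows": 0,
        "num_internal_no_shows": 0,
        "num_external_no_shows": 0,
    }
    for participant in participants:
        # Only an explicit False counts; a missing value is left uncounted.
        if participant.get("confirmed_attendee") is not False:
            continue
        counts["num_no_shows"] += 1
        if participant.get("scope") == "internal":
            counts["num_internal_no_shows"] += 1
        elif participant.get("scope") == "external":
            counts["num_external_no_shows"] += 1
    return counts


print(count_no_shows([
    {"scope": "internal", "confirmed_attendee": True},
    {"scope": "external", "confirmed_attendee": False},
    {"scope": "external"},
]))
```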

Participant Classification

The tool classifies participants as internal or external using the following priority:

Scope field (primary method):
- scope="internal" → Classified as internal
- scope="external" → Classified as external
- scope="unknown" or null → Uses fallback methods below
Email domain (for unknown scope):
- Participants with @grain.co or @grain.com emails are classified as internal
Participant cache (for unknown scope):
- Uses a cache of known internal participants (by ID and name) built from all recordings
- Helps identify internal participants even when email is missing in some recordings
User ID (for unknown scope):
- If user_id is present, participant is classified as internal
Default: External (if none of the above apply)

This multi-method approach improves classification accuracy when the API does not return complete participant data for every recording.
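
The priority order above can be sketched as a single function. This is an illustrative reading of the rules, not the tool's actual code: the name classify_participant, the dictionary keys "id", "name", "email", and "user_id", and the shape of the cache arguments are assumptions.

```python
INTERNAL_DOMAINS = ("@grain.co", "@grain.com")


def classify_participant(participant, known_internal_ids=frozenset(),
                         known_internal_names=frozenset()):
    """Return "internal" or "external" using the documented priority order.

    Hypothetical sketch; field names and cache shape are assumed.
    """
    # 1. An explicit scope wins.
    scope = participant.get("scope")
    if scope in ("internal", "external"):
        return scope
    # 2. Email domain, checked case-insensitively.
    email = (participant.get("email") or "").lower()
    if email.endswith(INTERNAL_DOMAINS):
        return "internal"
    # 3. Cache of internal participants seen in other recordings.
    if (participant.get("id") in known_internal_ids
            or participant.get("name") in known_internal_names):
        return "internal"
    # 4. Presence of a user_id.
    if participant.get("user_id"):
        return "internal"
    # 5. Default.
    return "external"


print(classify_participant({"scope": "unknown", "email": "ana@grain.com"}))
print(classify_participant({"scope": None, "email": "sam@example.com"}))
```

Matching on the full "@domain" suffix avoids treating an address such as user@notgrain.co as internal.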

API Limitations & Data Availability

Important: The Grain API has limitations that affect some requested metrics:

✅ Available Metrics

No-Shows: Partially available
- The API provides confirmed_attendee field which indicates if a participant was present
- confirmed_attendee=False indicates a no-show
- Limitation: This only works if participants are marked as scheduled in Grain. If the API doesn't return scheduled participants who didn't attend, they won't be counted.
Average Handle Time: Available as proxy
- Using duration_ms (total meeting duration) as a proxy for handle time
- Limitation: This is the total meeting duration, not necessarily the exact "handle time" which may have a different definition (e.g., active engagement time, time from first contact to resolution, etc.)
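
As a small worked example of the proxy, the sketch below converts duration_ms to minutes and averages across recordings. The function names are hypothetical, and skipping recordings with a missing duration is an assumption.

```python
def handle_time_minutes(duration_ms):
    """Convert a meeting duration in milliseconds to minutes."""
    return duration_ms / 60000.0


def average_handle_time_minutes(durations_ms):
    """Mean duration in minutes, ignoring missing values (assumption)."""
    values = [handle_time_minutes(d) for d in durations_ms if d is not None]
    return sum(values) / len(values) if values else None


# A 30-minute and a 15-minute meeting; the missing duration is skipped.
print(average_handle_time_minutes([1_800_000, 900_000, None]))
```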

❌ Not Available via API

Lateness Metrics (Customer/Agent/Meeting Start):
- Missing Data: The Grain API does not provide scheduled start times
- What's Available: Only actual start time (start_datetime) and actual join times
- Impact: Cannot calculate lateness without comparing actual vs scheduled times
- Workaround: Would require integration with calendar system (Google Calendar, Outlook, etc.) or CRM to get scheduled times
Complete No-Show Tracking:
- Missing Data: The API doesn't provide a list of all scheduled participants
- What's Available: Only participants who were in the recording or marked as no-shows
- Impact: If someone was scheduled but never appeared in the recording data, they may not be counted
- Workaround: Would require integration with calendar/CRM system to get full scheduled attendee list

Recommendations

To fully support the requested metrics, consider:

For Lateness Calculation:
- Integrate with calendar API (Google Calendar, Outlook, etc.) to get scheduled start times
- Compare start_datetime from Grain API with scheduled start time from calendar
- Calculate: lateness = actual_start - scheduled_start
For Complete No-Show Tracking:
- Integrate with calendar API to get all scheduled attendees
- Compare scheduled attendees with confirmed_attendee=True participants from Grain
- Missing attendees = no-shows
For Handle Time:
- Clarify the exact definition of "handle time" (total duration, active time, time to resolution, etc.)
- If different from total duration, may need additional processing or different data source
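
If scheduled data were available from a calendar system, the lateness and no-show recommendations could be sketched as follows. Everything about the calendar side is hypothetical here: the scheduled start time and attendee list are assumed inputs, and the function names are illustrative only.

```python
from datetime import datetime


def lateness_minutes(actual_start, scheduled_start):
    """lateness = actual_start - scheduled_start, in minutes.

    Negative values mean the meeting started early.
    """
    return (actual_start - scheduled_start).total_seconds() / 60.0


def find_no_shows(scheduled_emails, participants):
    """Scheduled attendees with no confirmed_attendee=True match in Grain.

    Hypothetical sketch; emails are compared case-insensitively.
    """
    attended = {
        p["email"].lower()
        for p in participants
        if p.get("confirmed_attendee") is True and p.get("email")
    }
    scheduled = {email.lower() for email in scheduled_emails}
    return sorted(scheduled - attended)


print(lateness_minutes(datetime(2025, 10, 20, 10, 7), datetime(2025, 10, 20, 10, 0)))
print(find_no_shows(
    ["ana@example.com", "sam@example.com"],
    [{"email": "Ana@example.com", "confirmed_attendee": True}],
))
```

Matching by email is only one option; a calendar integration might instead key on an attendee ID if both systems share one.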

Summary

The analysis now includes the requested operational metrics where possible:

✅ Implemented:

No-show tracking (using confirmed_attendee field)
Average handle time (using meeting duration as proxy)

⚠️ Partially Implemented:

Lateness metrics (columns included but will be null/empty - requires scheduled start times not available in API)

The CSV output includes all requested columns. Lateness metrics are included as placeholders but will be empty/null until scheduled meeting data is available from an external source (calendar system, CRM, etc.).

Example Output

```
Analysis complete! Found 76 recordings.

First 10 rows:
  internal_user_ids  ... join_spread_minutes
0               NaN  ...                 0.0
1               NaN  ...                 0.0
...
```

Troubleshooting

API Authentication Errors

If you get authentication errors:

Verify your API key is correct
Ensure the API key has not expired
Check that your API key has the necessary permissions

No Recordings Found

If no recordings are found:

Verify the date range is correct
Check that recordings exist for that date range in the workspace associated with your API key
Ensure your API key has access to view recordings

Module Not Found Errors

If you get import errors:

Ensure you've activated the virtual environment
Run pip install -r requirements.txt to install dependencies

Dependencies

pandas>=2.3.3 - Data manipulation and analysis
numpy>=2.3.5 - Numerical computing
requests>=2.32.5 - HTTP library for API calls

License

This project is provided as-is for analysis purposes.

Support

For issues related to:

Grain API: Contact Grain support or check Grain API Documentation
This tool: Check the code comments or modify as needed for your use case
