Global Dialogues data and analysis tools
This repository contains data and analysis tools from the "Global Dialogues" survey project, conducted in collaboration with Remesh.ai, Prolific, and The Collective Intelligence Project.
Global Dialogues are recurring surveys where participants answer demographic questions, attitude polls, and open-ended opinion questions about the future of AI. Participants also vote (Agree/Disagree) on a sample of other participants' responses and complete pairwise comparisons of responses.
To view the results of the Global Dialogues rounds in a simpler spreadsheet format, the GD round results are available in Google Sheets here.
To interact with the survey itself, you can participate in an ongoing public demonstration of the GD4 Survey here.
Install required packages:

```bash
pip install -r requirements.txt
```

This repository includes a comprehensive Makefile with commands for all common tasks. To see all available commands:

```bash
make help
```

For new users, the typical workflow is:
- Preprocess data: `make preprocess-gd3` (or other GD number)
- Run analysis: `make analyze-gd3`
- Download embeddings (for advanced analysis): `make download-embeddings-gd3`
To update the data files and run analysis for a given round, the process is:

1. Download Raw Data: The raw CSV export files from Remesh.ai are downloaded for a given Global Dialogue round (e.g., GD3) as soon as the round is completed and added to the corresponding `Data/GD<N>/` directory.

2. Preprocess Data: Use the Makefile command (recommended) or run scripts directly:

   ```bash
   # Recommended: Use Makefile
   make preprocess-gd3

   # Alternative: Run scripts directly
   python tools/scripts/preprocess_cleanup_metadata.py 3
   python tools/scripts/preprocess_aggregate.py --gd_number 3
   ```

3. Run Analysis: Use the Makefile command (recommended) or run scripts directly:

   ```bash
   # Recommended: Use Makefile for full analysis pipeline
   make analyze-gd3

   # Alternative: Run individual analyses
   python tools/scripts/calculate_consensus.py --gd_number 3
   python tools/scripts/calculate_divergence.py --gd_number 3
   # etc.
   ```

4. Advanced Analysis (Optional): Download embeddings and run thematic analysis:

   ```bash
   make download-embeddings-gd3
   make run-thematic-ranking-gd3
   ```
Note: All Make commands are documented in the Makefile. Run `make help` to see the full list of available commands for preprocessing, analysis, and utilities.
The project involves recurring survey rounds:
| Dialogue | Date | Status |
|---|---|---|
| GD1 | September 2024 | Complete |
| GD2 | January 2025 | Complete |
| GD3 | March 2025 | Complete |
| GD4 | June 2025 | Complete |
| GD5 | July 2025 | Planned |
Data for each completed round is stored in its respective folder within /Data (e.g., /Data/GD1). Detailed descriptions of the data files and columns can be found in Data/Documentation/DATA_GUIDE.md.
Some advanced analyses (such as thematic ranking) require pre-computed text embeddings, which result in large data files (~800MB each) unsuitable for direct storage in this Git repository. These files are hosted separately and can be downloaded as needed.
The repository includes a script to easily download embedding files:
```bash
# Show available embedding files and their status
python tools/scripts/download_embeddings.py --list

# Download embeddings for a specific Global Dialogue
python tools/scripts/download_embeddings.py 3  # For GD3

# Download all available embedding files
python tools/scripts/download_embeddings.py --all

# Force re-download even if file already exists
python tools/scripts/download_embeddings.py 3 --force
```

You can also use the Makefile commands (recommended):
```bash
make download-embeddings        # Show available files
make download-embeddings-gd3    # Download for GD3
make download-all-embeddings    # Download all
```

The files will be automatically placed in the correct locations:
```
global-dialogues/
└── Data/
    ├── GD1/
    │   └── GD1_embeddings.json
    ├── GD2/
    │   └── GD2_embeddings.json
    └── GD3/
        └── GD3_embeddings.json
```
Note: These embedding files are included in .gitignore to prevent accidental commits.
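If you want to work with the embeddings directly, a minimal sketch of loading one of the downloaded files is shown below. The exact JSON layout is an assumption for illustration; inspect the file or the analysis scripts (e.g., the thematic ranking tooling) for the authoritative structure.

```python
import json
from pathlib import Path

# Minimal sketch: load the pre-computed GD3 embeddings after downloading them.
# The internal JSON structure is an assumption here; inspect the file or the
# analysis scripts before relying on a particular layout.
embeddings_path = Path("Data/GD3/GD3_embeddings.json")
with embeddings_path.open() as f:
    embeddings = json.load(f)

print(f"Loaded {len(embeddings)} top-level entries from {embeddings_path}")
```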
Each round folder contains the following data files:
- `GD<N>_aggregate_standardized.csv`: The primary file for all analyses - a cleaned and standardized version of the raw aggregate data with consistent columns and formatting. This file is created by running the preprocessing script and should be used for all analysis work.
- `GD<N>_segment_counts_by_question.csv`: Contains participant counts for each segment per question, needed for certain analyses.
- `GD<N>_aggregate.csv`: Raw survey data aggregated by Question, including segment agreement rates for Ask Opinion questions. Has inconsistent formatting and metadata rows.
- `GD<N>_binary.csv`: Individual participant votes (agree/disagree/neutral) on Ask Opinion responses.
- `GD<N>_participants.csv`: Survey data organized by Participant, showing individual responses to each question. Includes overall agreement rates for Ask Opinion responses submitted by the participant.
- `GD<N>_preference.csv`: Pairwise preference judgments between Ask Opinion responses.
- `GD<N>_verbatim_map.csv`: Mapping of Ask Opinion response text (Thought Text) to the participant who authored it and the question it belongs to.
- `GD<N>_summary.csv`: LLM-generated summaries for the overall dialogue and individual questions.
Detailed descriptions of file formats and columns can be found in Data/Documentation/DATA_GUIDE.md.
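As a quick-start sketch, the standardized files can be loaded with pandas. The paths below assume GD3 has already been preprocessed; column inspection is kept generic because the actual schema is documented in DATA_GUIDE.md.

```python
import pandas as pd

# Minimal sketch (assumes GD3 data has already been preprocessed).
# File names follow the conventions above; consult DATA_GUIDE.md for the
# actual column schema before building analyses on specific columns.
aggregate = pd.read_csv("Data/GD3/GD3_aggregate_standardized.csv")
binary = pd.read_csv("Data/GD3/GD3_binary.csv")

print(f"Aggregate rows: {len(aggregate)}, columns: {len(aggregate.columns)}")
print(f"Binary vote rows: {len(binary)}")
print(aggregate.columns.tolist()[:10])  # inspect the first few column names
```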
Analysis scripts and notebooks can be found in the /tools directory.
The PRI is a composite score that assesses participant response quality and reliability. It combines multiple signals including duration, response quality tags, universal disagreement rates, and consensus voting patterns. For detailed documentation, see PRI_GUIDE.md.
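As a loose illustration of how a composite score of this kind can be assembled, the sketch below combines normalized signals with weights. This is not the actual PRI formula; the signal names and equal weights are assumptions, and PRI_GUIDE.md is the authoritative reference.

```python
# Illustrative sketch only -- NOT the actual PRI formula (see PRI_GUIDE.md).
# Assumes each signal has already been normalized to the range [0, 1].
def composite_reliability_score(signals: dict[str, float],
                                weights: dict[str, float]) -> float:
    """Weighted average of normalized quality signals for one participant."""
    total_weight = sum(weights.values())
    return sum(signals[name] * weights[name] for name in weights) / total_weight

# Hypothetical signals mirroring those listed above: duration, response
# quality tags, universal disagreement, and consensus voting patterns.
example = composite_reliability_score(
    signals={"duration": 0.8, "quality_tags": 0.9,
             "universal_disagreement": 0.7, "consensus_alignment": 0.6},
    weights={"duration": 0.25, "quality_tags": 0.25,
             "universal_disagreement": 0.25, "consensus_alignment": 0.25},
)
print(round(example, 3))  # 0.75
```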