An easy-to-use toolkit for visualizing patterns in qualitative data, helping researchers see and share connections between words, concepts and themes alongside in-depth accounts.
- Overview
- What This Toolkit Does
- Installation
- Using the Toolkit
- Using Your Own Data
- Troubleshooting
- Uninstallation
- Training Resources
- References
- Policies
The CMAP Visualization Toolkit offers a free suite of open-source tools to analyze and visualize text data, including fieldnotes, in-depth interview transcripts, historical documents, web pages, and other forms of non-numeric information. It is designed for scholars working with qualitative methods who are interested in the possibilities of computational social science (CSS) for pattern analysis, data visualization, and identifying alternative explanations.
The CMAP (Cultural Mapping and Pattern Analysis) tool is free, open-source and produced by the Computational Ethnography Lab at Rice University.
For an introduction, try the easy-to-use online version (not for sensitive data): check out the Colab version here.
You can find a short tutorial on using the toolkit in Colab here. You can read a general description in the working paper here.
This notebook introduces elements of visualizing text data from qualitative sources and provides tools for:
- Validating text data
- Generating basic text statistics
- Charting concepts
- Visualizing themes
- Drawing comparisons at the level of words, codes, concepts, and documents
- Allowing analyses across subsets of data (e.g. examining variation by neighborhood, occupation, time)
The CMAP Visualization Toolkit supports advanced analytic methods that are appropriate for computational text analysis and can be used alongside in-depth readings -- including co-occurrence, clustering, and embedding approaches -- with visuals such as heatmaps, t-SNE dimensionality reduction plots (like a scatter plot, with words), semantic networks, word clouds, and more. The examples are designed to work with common qualitative data sources and allow granular analyses that mirror qualitative practices (at the level of words, sentences, paragraphs), yet are scalable for large datasets produced by teams.
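Subsetting of this kind can be sketched with pandas. This is a hedged illustration, not toolkit code: the row contents, file names, and group labels are invented, though the column names follow the data schema documented later in this README.

```python
# A minimal sketch of subsetting text segments by group before visualizing.
# The example rows are simulated, not actual data.
import pandas as pd

rows = [
    {"text": "I moved here for work.", "document": "INTV_001.txt",
     "data_group": ["interview"]},
    {"text": "Field note: the clinic was crowded.", "document": "FN_002.txt",
     "data_group": ["fieldnotes"]},
]
df = pd.DataFrame(rows)

# Keep only interview segments, e.g. to compare against fieldnotes later
interviews = df[df["data_group"].apply(lambda g: "interview" in g)]
print(len(interviews))  # 1
```

The same filter can be repeated for any grouping variable (neighborhood, occupation, time period) to run an analysis on each subset.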
Examples from this toolkit using public data on scientists' careers.
**Read more about the approach here**
For the simplest installation, follow these steps:
If you are using Anaconda:
- Open the Anaconda Navigator program
- Click 'Environments' (left side of the interface)
- Click the arrow next to base(root), then click 'Open Terminal'
- In the terminal window that opens, copy and paste the commands in steps 1-3 below
- Make sure to use the version for your system (Windows, Mac, or Linux)
- Note: some Windows versions require pasting by right-clicking instead of Ctrl+V
Clone the repository:
git clone https://github.com/Computational-Ethnography-Lab/cmap_visualization_toolkit.git
cd cmap_visualization_toolkit
Run the installation script:
This sets up the Python packages needed to run the toolkit.
For macOS/Linux:
chmod +x install.sh
./install.sh
For Windows (in Anaconda Prompt):
conda create -y --name cmap_visualization_toolkit python=3.11
conda activate cmap_visualization_toolkit
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger'); print('NLTK resources downloaded successfully!')"
- Launch Jupyter Notebook:
jupyter notebook visualization_toolkit_final.ipynb
Now you should be in the toolkit! You can run each block by clicking the run button (triangular arrow) for each cell. The program explains what each cell does and what can be edited for analysis.
If you get an 'unable to compare versions' error, check the kernel in the top right; it should say something like "visualization toolkit." You can also open the .ipynb file in VS Code or another development environment; just check the kernel.
If the one-command method doesn't work, or you want more granular control at the command line, try these step-by-step commands:
First, open your terminal. Then paste the following into the command line interface in order.
# 1. Create and activate conda environment (assuming a conda version)
conda create -y --name cmap_visualization_toolkit python=3.11
conda activate cmap_visualization_toolkit
conda install git
# 2. Clone the repository
git clone https://github.com/Computational-Ethnography-Lab/cmap_visualization_toolkit.git
cd cmap_visualization_toolkit
# 3. Install Jupyter (to ensure we have it before other packages)
conda install -y jupyter
# 4. Install packages from requirements.txt
pip install -r requirements.txt
# 5. Download NLTK resources
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger')"
# 6. Launch Jupyter Notebook
jupyter notebook visualization_toolkit_final.ipynb
# 6b. If you would rather use VS Code or another integrated development environment, simply open the .ipynb file in that application.
This installation method ensures all packages are installed with the correct versions specified in the requirements.txt file.
If you prefer to install step by step or need more control over the process:
Before starting, you need to install Anaconda, which is free software that helps manage Python packages.
Download Anaconda:
- Go to the Anaconda website
- Click the "Download" button
- Choose the version for your computer (Windows, Mac, or Linux)
Install Anaconda:
- Double-click the downloaded file
- Follow the on-screen instructions
- Accept the default options if you're unsure
For users comfortable with command line:
Open Terminal or Command Prompt:
- Windows: Open "Anaconda Prompt" from Start menu
- Mac/Linux: Open Terminal app
Create and Set Up Environment:
# Create a new environment
conda create --name cmap_visualization_toolkit python=3.11
# Activate the environment
conda activate cmap_visualization_toolkit
# Get the code
git clone https://github.com/Computational-Ethnography-Lab/cmap_visualization_toolkit.git
cd cmap_visualization_toolkit
# Install Jupyter
conda install -y jupyter
# Install other packages with version constraints
pip install -r requirements.txt
# Download NLTK resources (standard language processing datasets)
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger'); print('NLTK resources downloaded successfully!')"
For users who prefer a visual interface:
Open Anaconda Navigator:
- Windows: Click Start menu → Anaconda Navigator
- Mac: Open Applications folder → Anaconda Navigator
- Linux: Open a terminal and type `anaconda-navigator`
Create a New Environment:
- Click on "Environments" tab on the left side
- Click "Create" button at the bottom
- Type `cmap_visualization_toolkit` as the name
- Select Python 3.11 from the dropdown
- Click the "Create" button
Install Jupyter:
- With your new environment selected, go to the "Home" tab
- Select your new environment from the dropdown menu
- Install Jupyter Notebook by clicking "Install"
Open Terminal in Your Environment:
- Go back to "Environments" tab
- Click on your `cmap_visualization_toolkit` environment
- Click the play button (▶) and select "Open Terminal"
- In the terminal, run:
# Get the code
git clone https://github.com/Computational-Ethnography-Lab/cmap_visualization_toolkit.git
cd cmap_visualization_toolkit
# Install packages
pip install -r requirements.txt
# Download NLTK resources (standard language processing datasets)
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger'); print('NLTK resources downloaded successfully!')"
Start Your Environment (if not already activated):
conda activate cmap_visualization_toolkit
Open the Notebook:
jupyter notebook visualization_toolkit_final.ipynb
If using VS Code:
- Open VS Code
- Click "File" → "Open Folder" and select the cmap_visualization_toolkit folder
- Find and double-click on `visualization_toolkit_final.ipynb`
- When prompted, select the `cmap_visualization_toolkit` kernel
Run the Code:
- Click on the first gray box (called a "cell")
- Click the "Run" button (triangle symbol ▶) or press Shift+Enter
- Wait for it to finish (when the * symbol disappears)
- Move to the next cell and repeat
Start Your Environment:
- IF using Navigator: Open Anaconda Navigator, click your environment, then click "▶" and "Open Terminal"
- IF using command line: Open a terminal, type `conda activate cmap_visualization_toolkit`, and switch to the toolkit directory with `cd cmap_visualization_toolkit`
Open the Notebook:
jupyter notebook visualization_toolkit_final.ipynb
Run Each Section:
- Click on a cell
- Press the Run button (▶) or Shift+Enter
- Continue through all cells in order
Save Results:
- To save an image: Right-click on it and select "Save Image As..."
- To copy text: Highlight it and press Ctrl+C (Windows) or Cmd+C (Mac)
To analyze your own text data:
Prepare Your Data:
- Create a CSV file with at least a column called `text`
- Optionally add a column called `project` to group texts
- Save it in the `data` folder
Change the File Path:
- In the notebook, find the cell that loads data
- Change the file name to your CSV file name
- Run the cells in order
Adjust Settings:
- Word clouds: Change keywords to find specific topics
- Networks: Adjust threshold values to show more/fewer connections
- Heatmaps: Change clustering method (1=RoBERTa, 2=Jaccard, 3=PMI, 4=TF-IDF)
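The data-preparation step above can be sketched in a few lines of pandas. This is a hedged illustration: only the `text` column is required by the toolkit, while the `project` values and the file name `my_texts.csv` are invented for the example.

```python
# Hypothetical sketch of preparing a minimal CSV for the toolkit.
import os
import pandas as pd

my_data = pd.DataFrame({
    "text": ["First interview excerpt...", "Second interview excerpt..."],
    "project": ["pilot_study", "pilot_study"],
})

# Save into the data folder, where the notebook's loading cell looks for files
os.makedirs("data", exist_ok=True)
my_data.to_csv(os.path.join("data", "my_texts.csv"), index=False)
```

After saving, point the notebook's data-loading cell at `data/my_texts.csv` and run the cells in order.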
For technically minded users, here's the complete schema for data files (See Abramson et al. 2025):
# Updated schema with Python typing
schema = {
"project": str, # List project
"number": str, # Position information
"reference": int, # Position information
"text": str, # Content, critical field: must not be empty
"document": str, # Data source, Critical field: must not be empty
"old_codes": list[str], # Optional: codings, must be a list of strings
"start_position": int, # Position information
"end_position": int, # Position information
"data_group": list[str],# Optional, to differentiate document sets: Must be a list of strings
"text_length": int, # Optional: NLP info
"word_count": int, # Optional: NLP info
"doc_id": str, # Optional: NLP info, unique paragrah level identifier
"codes": list[str] # Critical for analyses with codes, Must be a list of strings
} example (simulated, not actual data):
{
"project": "engineer_interviews",
"number": "675:113", # For reconstructing with QDA software
"reference": 244, # For reconstructing with QDA software
"text": "EG002: So the thing is...", # Actual paragraph-level text
"document": "INTV_EG002_20250801.txt", # Name of the document in which the text is found
"start_position": 244, # For reconstructing with QDA software
"end_position": 248, # For reconstructing with QDA software
"data_group": ["data_type_cg_interviews", "interview"], # For subsetting by data type or characteristic
"text_length": 441, # NLP info
"word_count": 82, # NLP info
"doc_id": "EG002_76426", # Unique ID for the text segment, to reconstruct or link to raw data; sequential
"codes": ["narrative_life", "education_perceptions"], # Qualitative codes, for concepts, themes, variables
"old_codes": ["science_belief", "subject_speech_all"] # Archived codes, siloed to reduce clutter
}
Critical Fields:
- `text`: Main content field; cannot be empty
- `document`: Source information; cannot be empty
- `codes`: Required for code-based analyses; must be a list of strings if used
Important Notes:
- Lists (like `codes` and `data_group`) must be proper Python lists, not strings that look like lists
- If you're exporting from qualitative data analysis software, ensure you convert any code fields to proper lists
- The toolkit will validate your data structure and provide error messages for common issues
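The list-field requirement above can be checked and repaired before loading data. This sketch is illustrative (the DataFrame contents are invented); it uses `ast.literal_eval`, the same approach suggested in the Troubleshooting section, to coerce stringified lists exported by QDA software back into real Python lists.

```python
# Hedged sketch: coercing list-looking strings into proper Python lists.
import ast
import pandas as pd

df = pd.DataFrame({
    "text": ["Excerpt one.", "Excerpt two."],
    "document": ["doc_a.txt", "doc_b.txt"],
    # One row was exported as a string that merely looks like a list:
    "codes": ['["narrative_life"]', ["education_perceptions"]],
})

def to_list(value):
    """Parse list-looking strings; pass real lists through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value

df["codes"] = df["codes"].apply(to_list)

# Every entry should now be a proper Python list of strings
assert df["codes"].map(lambda c: isinstance(c, list)).all()
```

Running a check like this before the notebook's validation cell makes the toolkit's error messages easier to interpret.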
Here are solutions to common issues you might encounter:
Installation Script Issues:
- If the installation script doesn't work, try the step-by-step commands in the Step-by-Step Installation section
- For permission issues with `install.sh`, run `chmod +x install.sh` before executing
Package Version Conflicts:
- If you see version compatibility errors, try installing without version specifications: `pip install -r requirements.txt --no-deps`
- For Mac with Apple Silicon (M1/M2/M3), you may need: `pip install torch --extra-index-url https://download.pytorch.org/whl/cpu`
CUDA/GPU Issues with PyTorch:
- If you encounter CUDA errors, you might need a specific torch version: `pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118`
- For CPU-only: `pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cpu`
Memory Errors:
- If you get "out of memory" errors, try processing smaller batches of data
- Close other applications to free up system memory
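Batch processing can be sketched with pandas' chunked CSV reading. This is an illustration, not toolkit code: the in-memory demo data stands in for a large file on disk, and the batch size of 4 is arbitrary.

```python
# Illustrative sketch of processing text in smaller batches to limit memory use.
import io
import pandas as pd

# A tiny in-memory CSV standing in for a large data file (10 four-word rows)
demo_csv = io.StringIO(
    "text\n" + "\n".join(f"segment {i} words here" for i in range(10))
)

counts = []
for chunk in pd.read_csv(demo_csv, chunksize=4):  # read 4 rows at a time
    # Replace this with the step that ran out of memory,
    # e.g. computing per-segment word counts:
    counts.append(chunk["text"].str.split().str.len())

word_counts = pd.concat(counts, ignore_index=True)
print(len(word_counts))  # 10
```

For a real file, pass the CSV path instead of `demo_csv` and raise `chunksize` to a few thousand rows, depending on available memory.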
Import Errors:
- Make sure your directory structure is correct, with the `function` folder at the same level as the notebook
- Check that all packages are installed correctly
Visualization Issues:
- If plots are not displaying correctly, try running `%matplotlib inline` in a notebook cell
- For interactive plots, run `pip install ipywidgets` and then `jupyter nbextension enable --py widgetsnbextension`
Data Format Issues:
- If you see errors related to data types, ensure your CSV has the correct format per the schema
- Common issue: Make sure `codes` and `data_group` are proper lists, not strings
- Fix: Use `df['codes'] = df['codes'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)` to convert string representations to lists
To remove the CMAP Visualization Toolkit from your system:
Remove the Environment:
# Deactivate the environment if it's currently active
conda deactivate
# Remove the environment and all its packages
conda env remove --name cmap_visualization_toolkit
Delete the Code:
# Navigate up one directory (if you're in the project directory)
cd ..
# Remove the project directory
rm -rf cmap_visualization_toolkit
Clean Conda Cache (Optional):
# Remove unused packages and caches
conda clean --all
This will completely remove all toolkit components from your system.
- Anaconda Navigator (GUI) Setup: Getting Started with Anaconda Navigator
- Conda Command Line (CLI) Setup: Getting Started with Conda
For more detailed information, refer to the Anaconda Documentation.
This toolkit builds on academic work combining computational text analysis with qualitative research methods (Abramson et al. 2018, 2025). Please see the lab repo for additional resources and related research papers.
See LICENSE.md. BSD 3-Clause License Copyright (c) 2025 Computational Ethnography Lab (Abramson et al.)
Important: If you use this software, please cite as: "Abramson, Corey and Yuhan (Victoria) Nian. 2025. CMAP Visualization Toolkit.
https://github.com/Computational-Ethnography-Lab/cmap_visualization_toolkit."
No warranty is provided. If you want to contribute, please email corey.abramson@rice.edu.
- Corey M. Abramson, Ph.D. - Associate Professor of Sociology, Rice University
- Zhuofan Li, Ph.D. — Assistant Professor of Sociology, Virginia Tech
- Tara Prendergast, Ph.D. Candidate — School of Sociology, University of Arizona
- Victoria (Yuhan) Nian — Undergraduate Student, Statistics/Data Science, Rice University
- Jakira Silas — Graduate Student, Sociology, Rice University
- Kieran Turner — Graduate Student, Sociology, Rice University
We thank all contributors to various iterations of this code for their valuable feedback, particularly the contributors above, and UC San Francisco's Medical Cultures Lab.
See data_ethics.md for details on ethical considerations, including data anonymization, consent, and use restrictions.
See DISCLAIMER.md for important legal and usage disclaimers.
- LLMs (primarily Claude-Sonnet) were used to check for errors and help annotate code.
- If you find errors or are interested in collaboration, contact corey.abramson@rice.edu.
- This free software carries no warranty or guarantee.



