Platform policies, such as Terms of Service (ToS), community guidelines, and privacy policies, are instrumental in setting standardized rules for digital platforms. Scholars emphasize the importance of studying platform governance frameworks, which offer idealized accounts of platform identity and governance principles. However, longitudinal studies of platform policies are rare in communication research due to challenges like corporate secrecy and the lack of historical records. To address this gap, we present the Platform Policy Watch tool. This tool aims to systematically collect, analyze, and document platform policies, providing a methodological framework for studying platform histories.
Please acknowledge this repository by citing it if you decide to use our tool.
Chris Chao Su; Ngai Keung Chan; Wanjiang Jacob Zhang; Benita Dederichs; Yangshuo Teng; Scarlett Sun
Su, C. C., & Chan, N. K. (2025). Assembling platform governance as private ordering in the age of generative AI: platform interdependence in policy evolution. Information, Communication & Society, 1-25. https://doi.org/10.1080/1369118X.2025.2513672
Chan, N. K., Su, C. C., & Shore, A. (2023). Shifting platform values in community guidelines: Examining the evolution of TikTok’s governance frameworks. New Media & Society, 27(2), 1127-1151. https://doi.org/10.1177/14614448231189476
Our tool comprises three components:
-
PolicyTimeMachine: This component is responsible for scraping and tracking platform policies from various sources, including historical policy scraping from the Wayback Machine and recording the latest policies from official platforms.
-
PolicyParser: Designed to clean and segment the collected policies into distinct sections for further data analysis.
-
PolicyLabeler: Utilizes ChatGPT to code the segmented data, streamlining the coding process.
With this tool, researchers can explore the evolution of platform policies and the methodological challenges involved in longitudinal research.
We've developed a custom Python program named PolicyTimeMachine for tracking policies on approximately 50 social media and labor platforms. It consists of two main functions:
This function retrieves data from the Wayback Machine, an archive containing billions of web pages stored chronologically. Our program modifies the URL's date and target address to generate a list of all archived target pages. It accesses these URLs, storing relevant data in structured Comma-Separated Values (CSV) files for future analysis.
This function directly extracts the latest policies from platforms' official sites, saving them in the same CSV format.
PolicyParser is designed to segment collected policies into distinct sections for data analysis. These sections help in addressing specific subjects, situations, or aspects related to platform usage, application, and content production.
PolicyLabeler automates the labeling of extracted policy sections based on ChatGPT's large language models by sending queries to the OpenAI API. This Python-based program enables researchers to extract codes, such as values, actors, and responsibilities, from platform policies. It involves setting up the environment, obtaining OpenAI's API Key, preparing prompts, running models, and potential fine-tuning based on accuracy and performance.
The Platform Policy Watch tool empowers researchers to systematically document and analyze platform governance changes. It can help in triangulating narratives of platform histories with platforms' and users' perspectives. However, it's essential to recognize that research tools involve invisible work, especially during moments of breakdown. We advocate for a more reflexive approach to computational research tools, allowing communication researchers to adapt tools to their research and critically assess their methodological strengths and weaknesses.
Feel free to adjust, modify, and add any additional information based on the study context to effectively use this tool.