The X.com Tweet Extraction Project is designed to extract and store tweets related to specific topics (e.g., technology, AI, innovation, and science) using asynchronous requests. The project scrapes tweets based on a search query while handling rate-limiting scenarios. Extracted tweet data is saved in a CSV file for further analysis.
- Search Query: Extracts tweets related to "technology", "innovation", "AI", and "science", excluding links and replies.
- Asynchronous Processing: Utilizes Python's
asynciofor non-blocking operations, enabling efficient tweet extraction. - Rate-Limiting Handling: Gracefully handles rate-limiting by waiting for the reset time when the API limit is reached.
- Text Normalization: Ensures tweet text is sanitized by removing problematic non-ASCII characters.
- CSV Storage: Saves tweet data such as tweet count, username, text, creation time, retweet count, and like count into a CSV file.
- Python: Programming language used for the project.
- Twikit: A Python client for interacting with X.com.
- asyncio: Used for handling asynchronous tasks and non-blocking I/O operations.
- ConfigParser: For reading configuration files containing login credentials.
- CSV: Used to save the extracted tweet data in a structured format.
- unicodedata: For normalizing and handling non-ASCII characters in tweet text.
- Credentials Loading: The project loads login credentials (e.g., username, email, password) from a configuration file (
config.ini) to authenticate with X.com. - Client Initialization: Initializes the
twikit.Clientand loads necessary cookies from a file (cookies.json) for authentication. - Tweet Extraction: Tweets are fetched asynchronously based on the specified search query. The program waits for the rate limit to reset when the limit is reached.
- Text Normalization: Non-ASCII characters in tweet text are removed or replaced to ensure compatibility and clean data.
- CSV Output: Extracted tweet details, including tweet count, username, tweet text, creation time, retweets, and likes, are written to a CSV file (
tweets.csv).