feat: add TikTok scraper and updated error handling #2
base: main
- Replace old modules (`handler.rs`, `insta.rs`, `main.rs`, `tiktok.rs`, `x.rs`)
- Add new Discord integration module
- Add scraper functionality with Twitter support
- Update dependencies in `Cargo.toml` and `Cargo.lock`
- Add utility macros for error handling

Add tracing subscriber and implement parallel media download/sending for Discord bot

Refactor architecture with dependency injection pattern
- Implement service container pattern for better dependency management
- Create dedicated service modules for scraping and messaging functionality
- Use enum-based polymorphism for service implementations
- Improve separation of concerns and testability
- Maintain backward compatibility with existing functionality

Enhance logging with structured tracing
- Add more detailed logging with different log levels (info, debug, warn, error)
- Include contextual information like `user_id`, `chat_id`, and command in spans
- Add progress tracking for media download and send operations
- Improve error reporting with detailed error messages
- Add success/failure counts for better monitoring
- Use structured logging fields for better log analysis

Refactor error handling with custom `BotError` enum
- Create custom `BotError` enum with specific error variants
- Replace generic `anyhow::Error` with `BotError` in core services
- Add specific error variants for different error conditions
- Implement user-friendly error messages for each error type
- Remove old macros and error modules, consolidate error handling
- Use structured error handling throughout the application

fixed twitter photo json parsing

Refactor scraper service with improved error handling and type safety

save

moved error builder macros declaration to `error.rs`

tweaked error handling for scraper response deserializing (`impl TryFrom<&serde_json::Value> for TwitterMediaMetadata`)

sorted dependencies with cargo-sort

Refactor error handling and cleanup logging
- Added `BotError::Custom` and `custom!` macro for flexible error messages.
- Simplified `error!` macros to use `::tracing` directly.
- Refactored Twitter scraper to handle API error responses and internalize URL parsing.
- Cleaned up Telegram bot logging and command handling.
- Simplified VSCode launch configurations.
- Improved error reporting in message service.

Refactor project structure: reorganize modules and remove services
- Migrate `bots/telegram` and `senders/telegram` to `src/telegram/`
- Move `scrapers/twitter` to `src/twitter/`
- Consolidate global error handling into `src/telegram/error.rs`
- Update `lib.rs` exports and internal imports to match new paths
- Introduce `main.rs` and `prelude.rs` for cleaner entry points

- Remove `src/telegram/prelude.rs` and replace with explicit exports in `src/telegram/mod.rs`.
- Decouple `TelegramBot` from `BotTrait` and move `run` logic directly into the struct.
- Optimize `TelegramSender` to reuse `Bot` instance via `Arc` for concurrent dispatch.
- Feature-gate `telegram` and `twitter` modules in `src/lib.rs`.
- Update `TwitterScraper` imports to align with new module paths.

- Transition main entry point to `tokio::task::JoinSet` to enable multiple bots to run at once (for future bots).
- Introduce feature-gated initialization for the Telegram bot in `main.rs`.
- Reorganize `twitter` module visibility and move re-exports to `twitter/mod.rs`.
- Use fully qualified paths for tracing components and simplify internal imports.

- Update `TelegramBot::run` signature to return `()`.
- Simplify `JoinSet` task management in `main.rs`.
- Add `#[allow(dead_code)]` to currently unused core types.

These changes reduce boilerplate in the main startup sequence and clean up compiler warnings for future-use types.

- Moved `Error` enum and `BotResult` to `src/core/error.rs`
- Moved and genericized error macros to `src/core/error.rs`
- Deleted `src/telegram/error.rs`
- Updated `TelegramSender` and `TwitterScraper` to use core errors
- Renamed `Error::Other` to `Error::Unknown`

- Restrict visibility of internal types, traits, and errors to `pub(crate)` in `core`, `telegram`, and `twitter` modules.
- Encapsulate `TelegramBot` fields and `Command` enum.
- Remove unnecessary public re-exports from `lib.rs`.
- Reorder file contents (macros, types, impls) and sort imports to adhere to project style standards.

Updates visibility modifiers from `pub(crate)` to `pub` for core traits, errors, and types so they are accessible to the binary target. Simplifies internal module imports by routing them through the `core` module and updates `main.rs` to use the library prelude.

Changes:
- Make `Error`, `BotResult`, `MediaMetadata`, and traits public.
- Make `TwitterScraper` public.
- Replace verbose imports in submodules with `crate::core::*`.
- Update `main.rs` to use `media_bot::prelude::*`.

Refactor `TelegramSender` and `TwitterScraper` to move their core logic into inherent `impl` blocks. The `MediaSender` and `MediaScraper` trait implementations now delegate to these inherent methods.
- Improves ergonomics by removing the need to import traits for usage.
- Hides trait implementations from documentation using `#[doc(hidden)]`.
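To illustrate the delegation pattern described above, here is a minimal sketch assuming a simplified `MediaScraper` trait whose `get_medias` takes a link and returns `Result<Vec<MediaMetadata>, String>`. The real signatures, error type, and `MediaMetadata` fields in the PR may differ, and the scraping body is a placeholder.

```rust
// Sketch only: `MediaMetadata`, the `get_medias` signature, and the
// `String` error type are simplified assumptions, not the PR's definitions.

#[derive(Debug, Clone, PartialEq)]
pub struct MediaMetadata {
    pub url: String,
}

pub trait MediaScraper {
    fn get_medias(&self, link: &str) -> Result<Vec<MediaMetadata>, String>;
}

pub struct TwitterScraper;

impl TwitterScraper {
    /// Inherent method holding the real logic; callers in other modules can
    /// use it without importing `MediaScraper`.
    pub fn get_medias(&self, link: &str) -> Result<Vec<MediaMetadata>, String> {
        if link.trim().is_empty() {
            return Err("empty link".to_owned());
        }
        // Placeholder for the actual scraping work.
        Ok(vec![MediaMetadata { url: link.to_owned() }])
    }
}

// The trait impl only forwards; `#[doc(hidden)]` keeps it out of rustdoc.
#[doc(hidden)]
impl MediaScraper for TwitterScraper {
    fn get_medias(&self, link: &str) -> Result<Vec<MediaMetadata>, String> {
        // `Type::method` resolves to the inherent method first, so this
        // does not recurse into the trait method.
        TwitterScraper::get_medias(self, link)
    }
}

fn main() {
    let scraper = TwitterScraper;
    println!("{:?}", scraper.get_medias("https://x.com/user/status/1"));
}
```

Generic code bounded on `MediaScraper` still works through the trait impl, while direct callers get the inherent method without a trait import.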

Renames the central `Error` enum to `BotError` to reduce ambiguity with the standard library's `std::error::Error` and improve type clarity.

Changes include:
- Renaming the enum definition and `BotResult` alias in `src/core/error.rs`.
- Updating error generation macros (`custom!`, `helper_error_macro!`).
- Updating `Display` and `Error` trait implementations.
- Propagating the name change to `MediaSender` and `MediaScraper` trait implementations in `src/telegram/sender.rs` and `src/twitter/scraper.rs`.
- Updating public re-exports in `src/lib.rs`.

Includes minor code formatting adjustments in `src/telegram/sender.rs`.
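A minimal sketch of what a `BotError` enum with `Display` and `std::error::Error` impls, a `BotResult` alias, and a `custom!`-style macro could look like. The variant set, message texts, the `parse_command` helper, and the macro body are illustrative assumptions, not the PR's code.

```rust
use std::fmt;

// Hypothetical variant set; names echo the PR (`CommandNotFound`,
// `NoMediaFound`, `Custom`, `Unknown`), but payloads and messages are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum BotError {
    InvalidUrl(String),
    NoMediaFound,
    CommandNotFound(String),
    Custom(String),
    Unknown,
}

pub type BotResult<T> = Result<T, BotError>;

// User-facing messages, one per variant.
impl fmt::Display for BotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BotError::InvalidUrl(reason) => write!(f, "Invalid URL: {reason}"),
            BotError::NoMediaFound => write!(f, "No media found for this link"),
            BotError::CommandNotFound(cmd) => write!(f, "Unknown command: {cmd}"),
            BotError::Custom(msg) => write!(f, "{msg}"),
            BotError::Unknown => write!(f, "An unknown error occurred"),
        }
    }
}

// `Debug` + `Display` are all `std::error::Error` needs here.
impl std::error::Error for BotError {}

/// Hypothetical stand-in for the `custom!` macro: builds `BotError::Custom`
/// from `format!`-style arguments.
macro_rules! custom {
    ($($arg:tt)*) => {
        BotError::Custom(format!($($arg)*))
    };
}

/// Hypothetical helper showing `BotResult` in a signature.
fn parse_command(text: &str) -> BotResult<&str> {
    match text {
        "/twitter" | "/tiktok" | "/tk" => Ok(text),
        other => Err(BotError::CommandNotFound(other.to_owned())),
    }
}

fn main() {
    println!("{}", parse_command("/foo").unwrap_err());
    println!("{}", custom!("retry in {}s", 5));
}
```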

…tures
- introduce `send_msg!` macro to simplify message sending and error logging
- add `FeatureNotEnabled` error to handle disabled feature flags gracefully
- add `CommandNotFound` error for unknown commands
- refactor `answer` function to use new macro and error types

…nitions

Refactor the `MediaSender` trait to replace `Result` return types with a generic `Output`, allowing flexible result handling.

Key changes:
- Update `TelegramSender` to process media downloads and sends concurrently using `JoinSet`.
- Change `send_medias` to return a collection of individual message results instead of a single result.
- Simplify `TelegramBot` command handling and adapt to new sender interface.
- Remove unused `BotError::FeatureNotEnabled` and duplicate error macros.

Remove telegram feature on import from `main.rs`

Add conditional compilation attributes for feature-gated modules

replaced `X_LINK` env variable with const value

updated `Readme.md`

Changed twitter module visibility to public (`pub mod`)

fixed features dependencies

Modified `MediaScraper` trait

moved `X_LINK` variable to `twitter::config`

Changed `TwitterScraper` struct to an empty enum

tweak scraper link variable management
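The `MediaSender` refactor described above (an associated `Output` type, with `send_medias` returning one result per item) could be sketched like this. `ConsoleSender`, the `chat_id`/`urls` parameters, and the `String` error type are hypothetical; the PR's `TelegramSender` also runs downloads and sends concurrently with `JoinSet`, which this sequential sketch leaves out.

```rust
// Sketch: the trait fixes the method, each implementation picks its Output.
pub trait MediaSender {
    type Output;
    fn send_medias(&self, chat_id: i64, urls: &[String]) -> Self::Output;
}

/// Hypothetical sender that "sends" by checking each URL's scheme.
pub struct ConsoleSender;

impl MediaSender for ConsoleSender {
    // One result per media item, so a single failure doesn't hide the others.
    type Output = Vec<Result<String, String>>;

    fn send_medias(&self, chat_id: i64, urls: &[String]) -> Self::Output {
        urls.iter()
            .map(|url| {
                if url.starts_with("https://") {
                    Ok(format!("sent {url} to {chat_id}"))
                } else {
                    Err(format!("refusing non-HTTPS url: {url}"))
                }
            })
            .collect()
    }
}

fn main() {
    let urls = vec!["https://a/1.mp4".to_owned(), "ftp://b/2.mp4".to_owned()];
    let results = ConsoleSender.send_medias(42, &urls);
    let ok = results.iter().filter(|r| r.is_ok()).count();
    println!("{} total, {} successful, {} failed", results.len(), ok, results.len() - ok);
}
```

Returning per-item results lets the caller produce the "total / successful / failed" counts mentioned in the logging commits instead of collapsing everything into one error.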

- Implement `TikTokScraper` in `src/tiktok` with URL redirection support.
- Add `/tiktok` (alias `/tk`) command to the Telegram bot, guarded by a feature flag.
- Refactor `TwitterScraper` to explicitly implement `MediaScraper` trait.
- Remove unused `id` field from `MediaMetadata` and add constructor.
- Extract tracing initialization into helper and fix `.env` loading order.
- Reorganize `src/core/error.rs` types and macros.
- Instrument `TelegramSender` tasks with tracing spans for better observability.

- Rename to for better semantic clarity.
- Update , , and implementations to match new trait definitions.
- Extract method in to decouple redirection logic.
- Integrate comprehensive instrumentation across scrapers and senders to improve runtime observability.

- Implement strict validation for TikTok URLs to reject malformed inputs (domains, IDs) early.
- Add error notification logic in Telegram sender to inform users when media delivery fails.
- Improve observability by promoting logs to 'warn' level for partial scraping or sending failures.
- Fix typo in bot response logging.

- Convert `TelegramBot` to a stateless execution model using static run methods.
- Add `default_handler` to gracefully catch and report unknown commands.
- Downgrade operational log levels from `error!` to `warn!` and remove verbose debug logs.
- Offload error message sending to async tasks in `TelegramSender` to avoid blocking.
- Optimize `get_tiktok_url` signature and fix reference passing in scraper.
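A std-only sketch of the stateless dispatcher idea: an uninhabited enum used purely as a namespace, with a `command_handler` for known commands and a `default_handler` for everything else. The command names come from the PR (`/twitter`, `/tiktok`, alias `/tk`); `parse`, `handle`, and the reply strings are assumptions, and the real bot receives updates from the Telegram API rather than plain strings.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    Twitter(String),
    TikTok(String),
}

/// No variants: `TelegramBot` can't be instantiated, only used as a namespace.
enum TelegramBot {}

impl TelegramBot {
    /// Split "/cmd arg" and map the command name (with aliases) to a variant.
    fn parse(text: &str) -> Option<Command> {
        let (cmd, arg) = text.split_once(' ').unwrap_or((text, ""));
        let arg = arg.trim().to_owned();
        match cmd {
            "/twitter" => Some(Command::Twitter(arg)),
            "/tiktok" | "/tk" => Some(Command::TikTok(arg)),
            _ => None,
        }
    }

    fn command_handler(cmd: Command) -> String {
        match cmd {
            Command::Twitter(link) => format!("fetching tweet media: {link}"),
            Command::TikTok(link) => format!("fetching TikTok video: {link}"),
        }
    }

    /// Catch-all for anything that isn't a known command.
    fn default_handler(text: &str) -> String {
        format!("Unknown command: {text}")
    }

    /// Static entry point: route a message to the matching handler.
    fn handle(text: &str) -> String {
        match Self::parse(text) {
            Some(cmd) => Self::command_handler(cmd),
            None => Self::default_handler(text),
        }
    }
}

fn main() {
    for msg in ["/tk https://vm.tiktok.com/ABC123", "/foo"] {
        println!("{}", TelegramBot::handle(msg));
    }
}
```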
Pull request overview
This PR adds TikTok scraping support and refactors the error handling system to provide better user feedback and cleaner logging. The changes include a new TikTok scraper module, improved error messages, architectural improvements to the Telegram bot, and renaming of core traits for clarity.
Changes:
- Added TikTok scraper with support for standard and shortened URLs (vm.tiktok.com, vt.tiktok.com)
- Refactored error handling to use distinct error variants with user-friendly messages and downgraded log levels from error to warn for expected failures
- Renamed `MediaScraper::scrape` to `get_medias` and refactored the Telegram bot to use a static dispatcher pattern
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| src/tiktok/scraper.rs | New TikTok scraper implementation with URL validation and redirection handling |
| src/tiktok/mod.rs | Module exports and configuration constants for TikTok scraper |
| src/twitter/scraper.rs | Refactored to align with new trait signature and error handling patterns |
| src/twitter/mod.rs | Added config submodule for scraper link constant |
| src/telegram/bot.rs | Converted to static dispatcher with separate command and default handlers |
| src/telegram/sender.rs | Updated error handling and added tracing spans |
| src/core/error.rs | Expanded error variants and changed logging from error to warn |
| src/core/traits.rs | Renamed scrape method to get_medias |
| src/core/types.rs | Removed unused id field and added constructor method |
| src/core/mod.rs | Added module-level unused attribute |
| src/main.rs | Refactored to use JoinSet and extracted tracing initialization |
| src/lib.rs | Added tiktok module export |
| Cargo.toml | Added tiktok feature |
| Cargo.lock | Updated url dependency |
| README.md | Updated documentation with TikTok command |
src/tiktok/mod.rs
```rust
pub const TIKTOK_SCRAPER_LINK_END: &str = ".mp4";
#[expect(unused)]
pub const BROWSER_UA: &str = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";
pub const MINIMAL_USER_AGENT: &str = "curl/8.7.1"; // Use the exact version from your output
```
Copilot (AI), Jan 13, 2026
The comment mentions this is a minimal user agent from 'your output', but the version 'curl/8.7.1' is hardcoded. This comment should either be removed or clarified, as it suggests this was copied from specific test output rather than being a deliberate choice. If the exact curl version matters for the TikTok API, this should be documented.
Suggested change:
```diff
-pub const MINIMAL_USER_AGENT: &str = "curl/8.7.1"; // Use the exact version from your output
+pub const MINIMAL_USER_AGENT: &str = "curl/8.7.1"; // Minimal user agent string used for TikTok requests; update if TikTok starts requiring a different value
```
src/main.rs
```rust
    .with_target(true) // Include module target in logs
    .init();
// enable tracing logs
tracing_subcriber().init();
```
Copilot (AI), Jan 13, 2026
Typo in function name: 'tracing_subcriber' should be 'tracing_subscriber'.
src/tiktok/scraper.rs
```rust
    info!("TikTok media Metadata retrieving results: 1 total, 1 successfull, 0 failed");
} else {
    warn!("TikTok media Metadata retrieving results: 1 total, 0 successfull, 1 failed");
```
Copilot (AI), Jan 13, 2026
Typo in log message: 'successfull' should be 'successful'.
Suggested change:
```diff
-    info!("TikTok media Metadata retrieving results: 1 total, 1 successfull, 0 failed");
-} else {
-    warn!("TikTok media Metadata retrieving results: 1 total, 0 successfull, 1 failed");
+    info!("TikTok media Metadata retrieving results: 1 total, 1 successful, 0 failed");
+} else {
+    warn!("TikTok media Metadata retrieving results: 1 total, 0 successful, 1 failed");
```
src/tiktok/scraper.rs
```rust
let path_segments = url.path().split('/').collect::<Vec<_>>();
if path_segments.len() != 1 {
    return Err(invalid_url!(
        "url path should look like 'https://vm.tiktok.com/ABC123'"
    ));
}

if path_segments[0].len() < 6 || !path_segments[0].chars().all(|c| c.is_alphanumeric()) {
```
Copilot (AI), Jan 13, 2026
The validation logic for the shortened URL path is flawed. When the path is split by '/', an empty string will be the first element due to the leading '/'. The check path_segments[0].len() < 6 will always be checking an empty string. The actual path segment should be at index 1 after filtering or using a proper path parsing method.
Suggested change:
```diff
-let path_segments = url.path().split('/').collect::<Vec<_>>();
-if path_segments.len() != 1 {
-    return Err(invalid_url!(
-        "url path should look like 'https://vm.tiktok.com/ABC123'"
-    ));
-}
-if path_segments[0].len() < 6 || !path_segments[0].chars().all(|c| c.is_alphanumeric()) {
+let mut segments = url
+    .path_segments()
+    .ok_or_else(|| invalid_url!(
+        "url path should look like 'https://vm.tiktok.com/ABC123'"
+    ))?;
+let segment = segments
+    .next()
+    .ok_or_else(|| invalid_url!(
+        "url path should look like 'https://vm.tiktok.com/ABC123'"
+    ))?;
+// Ensure there is exactly one non-empty path segment
+if segments.next().is_some()
+    || segment.len() < 6
+    || !segment.chars().all(|c| c.is_alphanumeric())
+{
```
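For comparison, a std-only sketch of the same check, written against the host and path strings a URL parser would hand back (for example `url::Url::host_str()` and `Url::path()`). The 6-character minimum comes from the snippet above; accepting only `vm.tiktok.com`/`vt.tiktok.com` hosts and ASCII alphanumeric IDs are assumptions. Unlike the suggestion, it ignores empty segments, so a trailing slash is tolerated.

```rust
/// Hypothetical helper: returns the short-link ID or a reason for rejection.
fn validate_short_link(host: &str, path: &str) -> Result<String, String> {
    if host != "vm.tiktok.com" && host != "vt.tiktok.com" {
        return Err(format!("unexpected host: {host}"));
    }
    // Dropping empty pieces means "/ABC123" and "/ABC123/" both yield ["ABC123"].
    let mut segments = path.split('/').filter(|s| !s.is_empty());
    let id = segments
        .next()
        .ok_or("url path should look like 'https://vm.tiktok.com/ABC123'")?;
    // Exactly one segment, at least 6 chars, ASCII letters/digits only.
    if segments.next().is_some()
        || id.len() < 6
        || !id.chars().all(|c| c.is_ascii_alphanumeric())
    {
        return Err(format!("invalid short link id: {id}"));
    }
    Ok(id.to_owned())
}

fn main() {
    for (host, path) in [("vm.tiktok.com", "/ZSabc12/"), ("vm.tiktok.com", "/video/123456")] {
        println!("{host}{path} -> {:?}", validate_short_link(host, path));
    }
}
```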
src/core/mod.rs
```rust
#![allow(unused)]
```
Copilot (AI), Jan 13, 2026
This module-level #![allow(unused)] attribute is too broad and suppresses all unused code warnings in the entire core module. This can hide legitimate issues such as unused functions, variables, or imports that should be removed. Consider using more specific attributes like #[allow(dead_code)] on individual items that are intentionally unused, or removing genuinely unused code.
Suggested change:
```diff
-#![allow(unused)]
```
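One way to follow this suggestion: attach the lint exemption to the items that are intentionally unused, so anything else in the module still warns. `bot_core`, `MediaKind`, `legacy_id`, and `normalize` are hypothetical names used only for illustration.

```rust
// Sketch: item-level `#[allow(dead_code)]` instead of a module-wide
// `#![allow(unused)]`. All names here are hypothetical.
mod bot_core {
    /// Reserved for a planned feature; only this item is exempt.
    #[allow(dead_code)]
    pub(crate) enum MediaKind {
        Photo,
        Video,
    }

    /// Kept on purpose; exempt on purpose.
    #[allow(dead_code)]
    pub(crate) fn legacy_id() -> u64 {
        0
    }

    /// Used item with no attribute: if it ever becomes unused, the
    /// compiler will warn again.
    pub(crate) fn normalize(url: &str) -> String {
        url.trim().to_owned()
    }
}

fn main() {
    println!("{}", bot_core::normalize("  https://x.com/a  "));
}
```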
src/twitter/scraper.rs
```rust
    .map(ToString::to_string)
    .unwrap_or(BotError::Unknown.to_string());
```
Copilot (AI), Jan 13, 2026
Using ToString::to_string() on a JSON Value will produce a JSON representation (e.g., "\"error message\"" with quotes), not a plain string. Use .as_str().unwrap_or("unknown error") instead to extract the actual string value from the JSON.
Suggested change:
```diff
-    .map(ToString::to_string)
-    .unwrap_or(BotError::Unknown.to_string());
+    .and_then(|v| v.as_str())
+    .map(str::to_owned)
+    .unwrap_or_else(|| BotError::Unknown.to_string());
```
src/telegram/sender.rs
```rust
if let Err(err) = result {
    warn!("Failed to send error message to chat: {err}");
    return Err(err);
```
Copilot (AI), Jan 13, 2026
The error handling logic is redundant. Lines 44-47 check whether the result is an error, log it, and return it early; line 49 then returns the result again. Either way the function returns the same value, so the early `return Err(err)` adds nothing. Consider borrowing the error for logging and returning the result once.
Suggested change:
```diff
-if let Err(err) = result {
-    warn!("Failed to send error message to chat: {err}");
-    return Err(err);
+if let Err(err) = &result {
+    warn!("Failed to send error message to chat: {err}");
```
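A minimal sketch of the simplified shape: borrow the error for logging, then return `result` once. `eprintln!` stands in for `tracing::warn!`, and `send_error_message` is a hypothetical stand-in for the actual Telegram call.

```rust
/// Hypothetical stand-in for the Telegram send; negative IDs simulate failure.
fn send_error_message(chat_id: i64, text: &str) -> Result<(), String> {
    if chat_id < 0 {
        Err(format!("chat {chat_id} not found"))
    } else {
        println!("to {chat_id}: {text}");
        Ok(())
    }
}

fn notify_failure(chat_id: i64, text: &str) -> Result<(), String> {
    let result = send_error_message(chat_id, text);
    // Borrow instead of moving, so `result` is still available below.
    if let Err(err) = &result {
        eprintln!("Failed to send error message to chat: {err}");
    }
    // Single return path for both Ok and Err.
    result
}

fn main() {
    let _ = notify_failure(1, "No media found");
    let _ = notify_failure(-1, "No media found");
}
```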
src/tiktok/scraper.rs
```rust
    ));
}

let path_segments = url.path().split('/').collect::<Vec<_>>();
```
Copilot (AI), Jan 13, 2026
The path validation logic is incorrect. Because the path starts with '/', splitting it by '/' always yields an empty first element: '/video/123' produces ['', 'video', '123'], and even a valid shortened URL like 'https://vm.tiktok.com/ABC123' has the path '/ABC123', which splits into ['', 'ABC123']. The check path_segments.len() != 1 is therefore always true, so valid shortened URLs are always rejected.
Suggested change:
```diff
-let path_segments = url.path().split('/').collect::<Vec<_>>();
+let path = url.path().trim_matches('/');
+if path.is_empty() {
+    return Err(invalid_url!(
+        "url path should look like 'https://vm.tiktok.com/ABC123'"
+    ));
+}
+let path_segments = path.split('/').collect::<Vec<_>>();
```
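The `split('/')` behavior both review comments rely on can be checked directly. This small sketch compares raw splitting with the `trim_matches('/')` approach from the suggestion, and shows why the suggestion still needs its `path.is_empty()` guard: trimming `"/"` leaves an empty string, which splits into a single empty segment.

```rust
/// Raw split: a leading '/' produces an empty first element.
fn segments(path: &str) -> Vec<&str> {
    path.split('/').collect()
}

/// Trim slashes first, then split.
fn trimmed_segments(path: &str) -> Vec<&str> {
    path.trim_matches('/').split('/').collect()
}

fn main() {
    println!("{:?}", segments("/ABC123")); // ["", "ABC123"]
    println!("{:?}", segments("/video/123")); // ["", "video", "123"]
    println!("{:?}", trimmed_segments("/ABC123/")); // ["ABC123"]
    println!("{:?}", trimmed_segments("/")); // [""]
}
```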

- fix(typo): correct typos in `tracing_subscriber`, `successful`, `occurred`
- refactor(tiktok): improve URL validation and update User Agent comment
- refactor(core): narrow `allow(unused)` to `allow(unused_imports)`
- refactor(telegram): improve error classification and remove redundant error handling
- refactor(telegram): simplify username extraction in tracing
- fix(twitter): correct JSON error message extraction
- style: normalize capitalization in logs
1. Summary of Changes
This PR introduces support for TikTok scraping (including handling of shortened URLs). It also includes a significant refactor of the core error handling system to provide better user feedback and cleaner logs, alongside architectural improvements to the Telegram bot structure and task management using `tokio::task::JoinSet`.

2. Details of Changes
New Features
- Added a `src/tiktok/` module with `TikTokScraper` to handle `tiktok.com`, `vm.tiktok.com`, and `vt.tiktok.com` links. Implemented logic to resolve redirections and extract video IDs.
- Added the `/tiktok` (alias `/tk`) command to the Telegram bot.

Architecture & Refactor
- Renamed `Error` to `BotError` in `src/core/error.rs`.
- Expanded `BotError` variants (`CommandNotFound`, `NoMediaFound`, etc.) to provide distinct, user-friendly error messages.
- Downgraded log levels from `error!` to `warn!` for expected failures (e.g., invalid user input).
- Renamed `MediaScraper::scrape` to `get_medias` to better reflect its purpose of retrieving metadata rather than downloading.
- Converted `TelegramBot` to a static dispatcher (enum) instead of an instance struct.
- Split message handling into `command_handler` and `default_handler`.
- Refactored `TwitterScraper` to align with the new `MediaScraper` trait and error handling.
- Simplified `MediaMetadata` by removing the unused `id` field.

Chores & Cleanup
- Refactored `src/main.rs` to use `tokio::task::JoinSet` for managing bot tasks.
- Added `config` submodules in `tiktok` and `twitter`.

3. Type of Change
4. Verification & Testing
Manual Verification:
- `vm.tiktok.com` and `www.tiktok.com` links: verified video ID extraction and metadata retrieval.
- `/twitter` command still works with the refactored scraper.
- Expected failures now log at `WARN` instead of `ERROR`.
- `main` starts the bot successfully with the new `JoinSet` logic.