diff --git a/seatunnel-skill/SKILL.md b/seatunnel-skill/SKILL.md
index b0ceb2d..1fce1fd 100644
--- a/seatunnel-skill/SKILL.md
+++ b/seatunnel-skill/SKILL.md
@@ -4,11 +4,13 @@ description: Apache SeaTunnel - A multimodal, high-performance, distributed data
 author: auto-generated by repo2skill
 platform: github
 source: https://github.com/apache/seatunnel
-tags: [data-integration, data-pipeline, etl, elt, real-time-streaming, batch-processing, cdc, distributed-computing, apache, java]
+website_source: https://github.com/apache/seatunnel-website
+tags: [data-integration, data-pipeline, etl, elt, real-time-streaming, batch-processing, cdc, distributed-computing, apache, java, docusaurus, documentation]
 version: 2.3.13
-generated: 2026-01-28
+generated: 2026-02-26
 license: Apache 2.0
 repository: apache/seatunnel
+website_repository: apache/seatunnel-website
 ---
 
 # Apache SeaTunnel OpenCode Skill
@@ -45,10 +47,10 @@ mvn clean install -pl seatunnel-core -DskipTests
 Visit the [official download page](https://seatunnel.apache.org/download) and select your version:
 
 ```bash
-# Example: Download SeaTunnel 2.3.12
-wget https://archive.apache.org/dist/seatunnel/2.3.12/apache-seatunnel-2.3.12-bin.tar.gz
-tar -xzf apache-seatunnel-2.3.12-bin.tar.gz
-cd apache-seatunnel-2.3.12
+VERSION=2.3.12
+wget https://archive.apache.org/dist/seatunnel/${VERSION}/apache-seatunnel-${VERSION}-bin.tar.gz
+tar -xzf apache-seatunnel-${VERSION}-bin.tar.gz
+cd apache-seatunnel-${VERSION}
 ```
 
 #### 4. Basic Configuration
@@ -130,12 +132,17 @@ seatunnel.sh -c config/hello_world.conf -e flink
    - **Full + Incremental**: Combined approach
 
 3. **100+ Pre-built Connectors**
-   - Databases: MySQL, PostgreSQL, Oracle, SQL Server, MongoDB
+   - Databases: MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, DB2, OceanBase
    - Data Warehouses: Snowflake, BigQuery, Redshift, Iceberg
+   - Data Lakes: Hive, Iceberg, Hudi, Paimon
    - Cloud SaaS: Salesforce, Shopify, Google Sheets
-   - Message Queues: Kafka, RabbitMQ, Pulsar
-   - Search Engines: Elasticsearch, OpenSearch
-   - Object Storage: S3, GCS, HDFS
+   - Message Queues: Kafka, RabbitMQ, Pulsar, RocketMQ, ActiveMQ
+   - Search Engines: Elasticsearch, OpenSearch, Easysearch
+   - OLAP Engines: ClickHouse, StarRocks, Doris, Druid
+   - Time-series Databases: IoTDB, TDengine, InfluxDB
+   - Vector Databases: Milvus, Qdrant
+   - Graph Databases: Neo4j
+   - Object Storage: S3, GCS, HDFS, OssFile, CosFile
 
 4. **Multi-Engine Support**
    - **Zeta Engine**: Lightweight, standalone deployment (no Spark/Flink required)
@@ -162,9 +169,9 @@ seatunnel.sh -c config/hello_world.conf -e flink
 
 ### Developer-Friendly
 
-- **SQL-like Configuration**: Intuitive job definition syntax
+- **SQL-like Configuration**: Intuitive HOCON job definition syntax
 - **Visual Web UI**: Drag-and-drop job builder (SeaTunnel Web Project)
-- **Extensive Documentation**: Comprehensive guides and examples
+- **Extensive Documentation**: Comprehensive guides with i18n (English + Chinese)
 - **Community Support**: Active community via Slack and mailing lists
 
 ### Production Ready
@@ -521,13 +528,21 @@ sink {
 ### Core Connector Types
 
 #### Source Connectors
-- **Jdbc**: Generic JDBC databases (MySQL, PostgreSQL, Oracle, SQL Server)
+- **Jdbc**: Generic JDBC databases (MySQL, PostgreSQL, Oracle, SQL Server, DB2, OceanBase)
 - **Kafka**: Apache Kafka topics
 - **Mysql**: MySQL with CDC support
 - **MongoDB**: MongoDB collections
 - **PostgreSQL**: PostgreSQL with CDC
-- **S3**: Amazon S3 and compatible storage
+- **S3/OssFile/CosFile/HdfsFile**: Object storage / file systems
 - **Http**: HTTP/HTTPS endpoints
+- **Hive/Iceberg/Hudi/Paimon**: Data lake formats
+- **Elasticsearch/Easysearch**: Search engines
+- **Redis/HBase/Cassandra**: NoSQL databases
+- **Pulsar/RocketMQ/RabbitMQ/ActiveMQ**: Message queues
+- **ClickHouse/StarRocks/Doris/Druid**: OLAP engines
+- **IoTDB/TDengine/InfluxDB**: Time-series databases
+- **Milvus/Qdrant**: Vector databases
+- **Neo4j**: Graph databases
 - **FakeSource**: For testing and development
 
 #### Transform Connectors
@@ -544,8 +559,12 @@ sink {
 - **Redis**: Write to Redis
 - **HBase**: Write to HBase tables
 - **StarRocks**: Write to StarRocks tables
+- **ClickHouse/Doris/Druid**: Write to OLAP engines
+- **Hive/Iceberg/Hudi/Paimon**: Write to data lakes
 - **Console**: Output to console (testing)
 
+All source connectors above also have corresponding sink implementations where applicable.
+
 ### Configuration Options
 
 #### Common Source Options
@@ -771,6 +790,160 @@ source {
 
 ---
 
+## Website & Documentation System
+
+The official documentation site (https://seatunnel.apache.org) is managed through the `apache/seatunnel-website` repository, built with **Docusaurus 2.4.3**.
+
+### How Documentation Works
+
+Documentation content lives in the **main codebase** (`apache/seatunnel` repo, `docs/` directory). The website repo fetches docs via `npm run sync` (or `tools/build-docs.js`) before building.
+
+The sync process:
+1. Creates `swap/` directory, clones `apache/seatunnel` codebase
+2. Copies `docs/en` to website's `docs/` directory
+3. Copies `docs/zh` to `i18n/zh-CN/` for Chinese translations
+4. Copies `docs/images` to `static/image_en` and `static/image_zh`
+5. Copies `docs/sidebars.js` to root level
+
+### Website Repository Structure
+```
+seatunnel-website/
+├── blog/                         # Blog posts (EN)
+├── community/                    # Community docs
+│   ├── contribution_guide/       # Contribution guidelines
+│   │   ├── contribute.md
+│   │   ├── committer.md
+│   │   ├── code-review.md
+│   │   ├── release.md
+│   │   └── subscribe.md
+│   └── submit_guide/             # Submission guidelines
+│       ├── document.md
+│       ├── license.md
+│       └── submit-code.md
+├── docs/                         # Current version docs (synced from main repo)
+├── docusaurus.config.js          # Site configuration
+├── i18n/zh-CN/                   # Chinese translations
+│   ├── docusaurus-plugin-content-blog/
+│   ├── docusaurus-plugin-content-docs/
+│   └── docusaurus-plugin-content-docs-community/
+├── package.json
+├── sidebars.js                   # Doc sidebar config (synced)
+├── src/
+│   ├── components/
+│   ├── css/
+│   ├── pages/
+│   │   ├── home/                 # Homepage
+│   │   ├── team/                 # Team page (with Base64 avatars)
+│   │   ├── user/                 # User showcase
+│   │   └── versions/             # Version selector
+│   └── styles/
+├── static/                       # Static assets
+│   ├── doc/image/, image_en/, image_zh/
+│   ├── home/, image/, user/
+│   └── js/google_translate_init.js
+├── tools/
+│   ├── build-docs.js             # Doc sync script
+│   ├── common.js                 # Shared constants
+│   ├── version.js                # Version management
+│   ├── image-copy.js             # Image processing
+│   └── fetch-team-avatars.js     # Team avatar updater
+├── versioned_docs/               # Historical version docs (20 versions: 1.x ~ 2.3.12)
+├── versioned_sidebars/           # Historical sidebars
+├── versions.json                 # Version registry
+└── seatunnel_web_versions.json   # Web UI version registry
+```
+
+### Documentation Categories (per version)
+
+Each version's docs contain:
+- `about.md` - Project overview
+- `command/` - CLI commands (e.g., connector-check)
+- `concept/` - Core concepts (config, schema-evolution, speed-limit, SQL config, event-listener)
+- `connector-v2/` - Connector documentation
+  - `source/` - Source connector docs (100+ connectors)
+  - `sink/` - Sink connector docs
+  - `changelog/` - Per-connector changelogs
+  - `formats/` - Data formats (Avro, Canal JSON, Debezium JSON, OGG JSON, Protobuf)
+- `contribution/` - Contributor guides
+- `faq.md` - Frequently asked questions
+- `other-engine/` - Flink/Spark engine specifics
+- `seatunnel-engine/` - Zeta engine docs + telemetry
+- `start-v2/` - Getting started guides
+  - `docker/` - Docker deployment
+  - `kubernetes/` - K8s deployment
+  - `locally/` - Local setup
+- `transform-v2/` - Transform documentation (SQL, etc.)
+
+### Versioned Documentation
+
+20 historical versions maintained: `1.x`, `2.1.0`~`2.1.3`, `2.2.0-beta`, `2.3.0-beta`~`2.3.12`
+
+Total: ~3691 versioned doc files + ~1652 i18n files
+
+### Website Development
+
+```bash
+# Clone
+git clone git@github.com:apache/seatunnel-website.git
+cd seatunnel-website
+
+# Sync docs from main repo
+npm run sync
+# Or with SSH: export PROTOCOL_MODE=ssh && npm run sync
+
+# Install dependencies
+npm install
+
+# Dev server (English)
+npm run start
+
+# Dev server (Chinese)
+npm run start-zh
+
+# Production build (needs ~10GB heap)
+npm run build
+# Or faster parallel build
+npm run build:fast
+
+# Serve built site
+npm run serve
+```
+
+### Adding a New Version
+
+```bash
+# 1. Create versioned snapshot
+npm run version
+
+# 2. Update download links
+# Edit src/pages/download/st_data.json
+```
+
+### Branching Strategy
+
+| Branch | Purpose |
+|---|---|
+| `main` | Default development branch |
+| `asf-site` | Production (https://seatunnel.apache.org) |
+| `asf-staging` | Staging (https://seatunnel.staged.apache.org) |
+
+### CI/CD
+
+GitHub Actions workflow (`.github/workflows/deploy.yml`):
+- Triggers: push to `main`, PRs to `main`, daily cron (5:00 AM UTC)
+- Node.js 18.20.7
+- Steps: `npm install` -> `npm run sync` -> `npm run build` -> deploy to `asf-site` branch
+
+### Key Conventions
+
+- Directory names: lowercase, underscore-separated, plural (e.g., `scripts`, `components`)
+- JS/static files: lowercase, dash-separated (e.g., `render-dom.js`)
+- Images: stored under `static/{module_name}`
+- Styles: placed in `src/css/`
+- Most pages have "Edit this page" link pointing to GitHub source
+
+---
+
 ## Development
 
 ### Project Architecture
@@ -785,7 +958,7 @@ seatunnel/
 │   ├── seatunnel-engine-flink/
 │   ├── seatunnel-engine-spark/
 │   └── seatunnel-engine-zeta/
-├── seatunnel-connectors/ # Connector implementations
+├── seatunnel-connectors/ # 100+ connector implementations
 │   ├── seatunnel-connectors-*/ # One per connector type
 └── seatunnel-dist/ # Distribution package
 ```
@@ -1159,11 +1332,14 @@ java -jar target/seatunnel-web-*.jar
 
 ## Resources
 
-### Official Documentation
+### Official Links
 - [SeaTunnel Official Website](https://seatunnel.apache.org/)
-- [GitHub Repository](https://github.com/apache/seatunnel)
+- [GitHub (Engine)](https://github.com/apache/seatunnel)
+- [GitHub (Website)](https://github.com/apache/seatunnel-website)
+- [GitHub (Web UI)](https://github.com/apache/seatunnel-web)
 - [Documentation Hub](https://seatunnel.apache.org/docs/)
 - [Connector List](https://seatunnel.apache.org/docs/2.3.12/connector-v2/overview)
+- [Downloads](https://seatunnel.apache.org/download)
 
 ### Community
 - [Slack Channel](https://the-asf.slack.com/archives/C01CB5186TL)
@@ -1184,8 +1360,9 @@ java -jar target/seatunnel-web-*.jar
 - [Distributed Systems Concepts](https://en.wikipedia.org/wiki/Distributed_computing)
 
 ### Version History
-- **2.3.12** (Stable) - Current recommended version
+- **2.3.12** (Latest Stable) - Current recommended version
 - **2.3.13-SNAPSHOT** (Development)
+- 20 historical versions maintained in documentation (1.x ~ 2.3.12)
 - [All Releases](https://archive.apache.org/dist/seatunnel/)
 
 ---
@@ -1207,6 +1384,6 @@ Apache License 2.0 - See [LICENSE](https://github.com/apache/seatunnel/blob/mast
 
 ---
 
-**Last Updated**: 2026-01-28
+**Last Updated**: 2026-02-26
 **Skill Version**: 2.3.13
-**Status**: Production Ready ✓
+**Sources**: apache/seatunnel + apache/seatunnel-website
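
Note on the sync process that this diff adds under "How Documentation Works": steps 2 to 5 can be sketched as a short shell function. This is only an illustration under stated assumptions, not the contents of `tools/build-docs.js`. The function name `sync_docs`, its two arguments, and the exact destination below `i18n/zh-CN/` (the usual Docusaurus `docusaurus-plugin-content-docs/current` path) are hypothetical. Step 1 (cloning `apache/seatunnel` into `swap/`) is left to the caller, so the sketch takes an existing checkout and runs offline.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the doc sync steps; NOT the real tools/build-docs.js.

# sync_docs <seatunnel_checkout> <website_root>
# The body runs in a subshell so that `set -eu` does not leak into the caller.
sync_docs() (
  set -eu
  src="$1"
  site="$2"
  # Assumed location of the zh-CN docs translation (standard Docusaurus layout).
  zh_dest="$site/i18n/zh-CN/docusaurus-plugin-content-docs/current"

  # Step 2: English docs become the website's docs/ directory.
  rm -rf "$site/docs"
  mkdir -p "$site/docs"
  cp -R "$src/docs/en/." "$site/docs/"

  # Step 3: Chinese docs go under i18n/zh-CN/.
  rm -rf "$zh_dest"
  mkdir -p "$zh_dest"
  cp -R "$src/docs/zh/." "$zh_dest/"

  # Step 4: the same images are copied for both locales.
  for target in image_en image_zh; do
    rm -rf "$site/static/$target"
    mkdir -p "$site/static/$target"
    cp -R "$src/docs/images/." "$site/static/$target/"
  done

  # Step 5: the sidebar definition is copied to the website root.
  cp "$src/docs/sidebars.js" "$site/sidebars.js"
)
```

Assuming a checkout already exists at `swap/seatunnel`, it would be called as `sync_docs swap/seatunnel .` from the website root. Each target directory is removed before copying so that files deleted upstream do not linger; whether the real script behaves the same way is not stated in the diff.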