From c05224bc1e0b66b067e39d3a617e9557809e0099 Mon Sep 17 00:00:00 2001
From: "Marcos G. Zimmermann" <mgzmaster@gmail.com>
Date: Sat, 18 Apr 2026 15:39:54 -0300
Subject: [PATCH 1/2] docs: add comprehensive documentation

Adds a 10-file documentation set under docs/ covering:
- Getting started, processes, adapters, CLI
- Rack middleware, SEO extensions, events, Rails integration
- Full API reference
---
 docs/README.md          |  67 +++++++++++++++++
 docs/adapters.md        | 143 ++++++++++++++++++++++++++++++++++++
 docs/api.md             | 154 +++++++++++++++++++++++++++++++++++++++
 docs/cli.md             |  93 ++++++++++++++++++++++++
 docs/events.md          |  79 ++++++++++++++++++++
 docs/extensions.md      | 141 ++++++++++++++++++++++++++++++++++++
 docs/getting-started.md | 138 +++++++++++++++++++++++++++++++++++
 docs/middleware.md      |  85 ++++++++++++++++++++++
 docs/processes.md       | 156 ++++++++++++++++++++++++++++++++++++++++
 docs/rails.md           | 128 +++++++++++++++++++++++++++++++++
 10 files changed, 1184 insertions(+)
 create mode 100644 docs/README.md
 create mode 100644 docs/adapters.md
 create mode 100644 docs/api.md
 create mode 100644 docs/cli.md
 create mode 100644 docs/events.md
 create mode 100644 docs/extensions.md
 create mode 100644 docs/getting-started.md
 create mode 100644 docs/middleware.md
 create mode 100644 docs/processes.md
 create mode 100644 docs/rails.md

diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..17c1cad
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,67 @@
+# site_maps
+
+Concurrent, adapter-based sitemap.xml generation for Ruby applications.
+
+`site_maps` is a framework-agnostic sitemap builder with built-in Rails support. It produces valid sitemap XML (with full SEO extensions — image, video, news, hreflang, mobile, PageMap), splits large sitemaps into indexed chunks automatically, generates them concurrently across a thread pool, and ships them to the filesystem, S3, or a custom backend through a pluggable adapter layer.
+
+## Contents
+
+- [Getting started](getting-started.md) — install, first sitemap, Rails
+- [Processes](processes.md) — static and dynamic process DSL
+- [Adapters](adapters.md) — filesystem, S3, no-op, custom
+- [CLI](cli.md) — `site_maps generate`
+- [Rack middleware](middleware.md) — serve generated sitemaps from the app
+- [SEO extensions](extensions.md) — image, video, news, hreflang, mobile, PageMap
+- [Events](events.md) — instrumentation hooks
+- [Rails integration](rails.md) — URL helpers, Railtie, precompile
+- [API reference](api.md) — full public API
+
+## Install
+
+```ruby
+# Gemfile
+gem 'site_maps'
+```
+
+## One-minute tour
+
+```ruby
+# config/sitemap.rb
+SiteMaps.use(:file_system) do
+  configure do |config|
+    config.url       = 'https://example.com/sitemap.xml'
+    config.directory = Rails.public_path.to_s
+  end
+
+  process do |s|
+    s.add('/', priority: 1.0, changefreq: 'daily')
+    s.add('/about', lastmod: Time.now)
+
+    Post.find_each do |post|
+      s.add("/posts/#{post.slug}", lastmod: post.updated_at)
+    end
+  end
+end
+```
+
+```bash
+bundle exec site_maps generate --config-file config/sitemap.rb
+```
+
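+Generated: `public/sitemap.xml`. If the URL set exceeds 50,000 links, it is split into numbered files (`sitemap1.xml`, `sitemap2.xml`, …) and `public/sitemap.xml` becomes the sitemap index.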
+
+## Why site_maps
+
+- **Concurrency.** Processes run in a `Concurrent::FixedThreadPool`; threads share a thread-safe repo that handles file splitting.
+- **Pluggable storage.** Write the same sitemap to disk in development and S3 in production by swapping one line.
+- **SEO extensions.** Full support for image, video, news, hreflang alternate, mobile, and PageMap entries.
+- **Dynamic processes.** Parameterized templates like `posts/%{year}-%{month}/sitemap.xml` let you rebuild a single shard without regenerating the whole site.
+
+## Version
+
+- Ruby: `>= 3.2.0`
+- Depends on: `builder ~> 3.0`, `concurrent-ruby >= 1.1`, `rack >= 2.0`, `zeitwerk`, `thor`
+
+## License
+
+MIT.
diff --git a/docs/adapters.md b/docs/adapters.md
new file mode 100644
index 0000000..5ae93c5
--- /dev/null
+++ b/docs/adapters.md
@@ -0,0 +1,143 @@
+# Adapters
+
+An **adapter** is the storage backend for generated sitemap files. Three adapters ship with the gem; a clean interface makes it easy to write your own.
+
+## Built-in adapters
+
+| Adapter | When to use |
+|---------|-------------|
+| `:file_system` | Write to disk. Ideal for local dev, or for serving via the bundled Rack middleware. |
+| `:aws_sdk` | Upload to S3. Production deployments behind CloudFront or similar. |
+| `:noop` | Discard writes. Ideal for tests that care about "what URLs got added" but not "what ended up on disk". |
+
+Select with `SiteMaps.use(<symbol>)`.
+
+## `:file_system`
+
+```ruby
+SiteMaps.use(:file_system) do
+  configure do |config|
+    config.url       = 'https://example.com/sitemap.xml'
+    config.directory = Rails.public_path.to_s     # default: "public/sitemaps"
+  end
+  process { |s| ... }
+end
+```
+
+**Config attributes:**
+
+| Key | Purpose |
+|-----|---------|
+| `url` | Public URL — drives filename layout and is written into sitemap `<loc>` entries. |
+| `directory` | Filesystem root under which files land. |
+
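+If `config.url` ends in `.gz`, the adapter writes gzipped files. The Rack middleware decompresses them on serve; clients that request the `.xml.gz` path directly still receive the compressed bytes (see [middleware.md](middleware.md)).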
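+
+A minimal sketch of the gzip round trip using only Ruby's standard `zlib`. It is not the adapter's own code; it only shows what a `.gz` sitemap contains and what a server must undo before sending plain `application/xml`:
+
+```ruby
+require 'zlib'
+
+xml = %(<?xml version="1.0" encoding="UTF-8"?>\n<urlset/>)
+
+# Write side: a .gz sitemap is the same XML wrapped in gzip framing.
+gzipped = Zlib.gzip(xml)
+
+# Serve side: gunzip returns binary bytes, so re-tag them as UTF-8 XML.
+plain = Zlib.gunzip(gzipped).force_encoding(Encoding::UTF_8)
+```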
+
+## `:aws_sdk`
+
+```ruby
+SiteMaps.use(:aws_sdk) do
+  configure do |config|
+    config.url           = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
+    config.directory     = '/tmp/sitemaps'          # local scratch space
+    config.bucket        = 'my-bucket'
+    config.region        = ENV.fetch('AWS_REGION', 'us-east-1')
+    config.access_key_id = ENV['AWS_ACCESS_KEY_ID']
+    config.secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
+    config.acl           = 'public-read'            # default
+    config.cache_control = 'private, max-age=0, no-cache'
+  end
+  process { |s| ... }
+end
+```
+
+**Config attributes:**
+
+| Key | Default |
+|-----|---------|
+| `bucket` | `ENV['AWS_BUCKET']` |
+| `region` | `ENV.fetch('AWS_REGION', 'us-east-1')` |
+| `access_key_id` | `ENV['AWS_ACCESS_KEY_ID']` |
+| `secret_access_key` | `ENV['AWS_SECRET_ACCESS_KEY']` |
+| `acl` | `"public-read"` |
+| `cache_control` | `"private, max-age=0, no-cache"` |
+| `directory` | Local scratch dir for staging before upload |
+
+The adapter writes locally first (to `directory`), then uploads to S3 with the configured ACL and Cache-Control headers. You'll need `aws-sdk-s3` in your Gemfile:
+
+```ruby
+gem 'aws-sdk-s3'
+```
+
+## `:noop`
+
+```ruby
+SiteMaps.use(:noop) do
+  configure { |c| c.url = 'https://example.com/sitemap.xml' }
+  process { |s| ... }
+end
+```
+
+Writes are discarded. Use it in tests when you want to assert on the URLs being added (via events, for example) without hitting disk.
+
+## Writing a custom adapter
+
+Subclass `SiteMaps::Adapters::Adapter` and implement `write`, `read`, `delete`:
+
+```ruby
+require 'google/cloud/storage'
+require 'stringio'
+
+class GoogleCloudStorageAdapter < SiteMaps::Adapters::Adapter
+  class Config < SiteMaps::Configuration
+    attribute :bucket
+    attribute :project_id
+  end
+
+  def write(url, raw_data, **_kwargs)
+    bucket.create_file(StringIO.new(raw_data), path_from(url))
+  end
+
+  def read(url)
+    file = bucket.file(path_from(url))
+    raise SiteMaps::FileNotFoundError, "sitemap not found: #{url}" if file.nil?
+
+    # Without a path argument, File#download returns a StringIO.
+    [file.download.string, { content_type: 'application/xml' }]
+  end
+
+  def delete(url)
+    bucket.file(path_from(url))&.delete
+  end
+
+  private
+
+  # "https://cdn.example.com/posts/sitemap.xml" => "posts/sitemap.xml"
+  def path_from(url)
+    URI(url).path[1..]
+  end
+
+  # Build the client and bucket handle once instead of on every write.
+  def bucket
+    @bucket ||= Google::Cloud::Storage.new(project_id: config.project_id).bucket(config.bucket)
+  end
+end
+```
+
+Register and use it:
+
+```ruby
+SiteMaps.use(GoogleCloudStorageAdapter) do
+  configure do |config|
+    config.url        = 'https://cdn.example.com/sitemap.xml'
+    config.bucket     = 'my-bucket'
+    config.project_id = 'my-project'
+  end
+  process { |s| ... }
+end
+```
+
+## Adapter interface
+
+| Method | Purpose |
+|--------|---------|
+| `#write(url, raw_data, **kwargs)` | Persist `raw_data` at the location implied by `url`. |
+| `#read(url)` | Return `[raw_data, { content_type: '…' }]` for the given URL. |
+| `#delete(url)` | Remove the file at the URL. |
+| `.config_class` | (optional) Return a `Configuration` subclass to expose adapter-specific settings. |
+
+The adapter base class handles everything else: URL filters, the process registry, and thread-safe URL tracking.
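+
+To see the three-method contract in isolation (or to back tests), the storage can be a Hash. This is a hypothetical, standalone class: it does not subclass `SiteMaps::Adapters::Adapter`, so it runs without the gem, and raising `KeyError` for a missing file stands in for `SiteMaps::FileNotFoundError`.
+
+```ruby
+require 'uri'
+
+# Hypothetical standalone store implementing write/read/delete.
+class InMemoryStore
+  def initialize
+    @files = {}
+    @lock  = Mutex.new # processes may write from several threads at once
+  end
+
+  def write(url, raw_data, **_kwargs)
+    @lock.synchronize { @files[key(url)] = raw_data.dup }
+  end
+
+  def read(url)
+    data = @lock.synchronize { @files[key(url)] }
+    raise KeyError, "no file stored for #{url}" if data.nil?
+
+    [data, { content_type: 'application/xml' }]
+  end
+
+  def delete(url)
+    @lock.synchronize { @files.delete(key(url)) }
+  end
+
+  private
+
+  # Key files by URL path, ignoring scheme and host.
+  def key(url)
+    URI(url).path
+  end
+end
+```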
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..c930e9d
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,154 @@
+# API Reference
+
+## `SiteMaps` (top-level module)
+
+| Method | Description |
+|--------|-------------|
+| `SiteMaps.use(adapter, **opts, &block)` | Register an adapter (`:file_system`, `:aws_sdk`, `:noop`, or a class) and yield its configuration block. |
+| `SiteMaps.define(&block)` | Register a context-aware definition. Called by `.generate` with the `context:` hash splatted as kwargs. |
+| `SiteMaps.configure { \|config\| ... }` | Mutate global defaults. |
+| `SiteMaps.config` | Return global `Configuration`. |
+| `SiteMaps.generate(config_file:, context: {}, **runner_opts) → Runner` | Load `config_file` and return a `Runner` ready to `.enqueue` and `.run`. |
+| `SiteMaps.current_adapter` | Last-registered adapter (thread-local during `.generate`). |
+| `SiteMaps.logger` | Configurable logger (default `Logger.new($stdout)`). |
+
+### Constants
+
+```ruby
+SiteMaps::MAX_LENGTH   # { links: 50_000, images: 1_000, news: 1_000 }
+SiteMaps::MAX_FILESIZE # 50_000_000 bytes
+```
+
+### Errors
+
+- `SiteMaps::Error` — base error
+- `SiteMaps::AdapterNotFound` — unknown adapter symbol
+- `SiteMaps::AdapterNotSetError` — generate called without an adapter
+- `SiteMaps::FileNotFoundError` — missing file at adapter read
+- `SiteMaps::FullSitemapError` — internal signal that a URL set is full (triggers split)
+- `SiteMaps::ConfigurationError` — invalid config
+
+---
+
+## `SiteMaps::Configuration`
+
+Base configuration. Adapter configs subclass this.
+
+| Attribute | Default | Purpose |
+|-----------|---------|---------|
+| `url` | — (required) | Public URL of the main sitemap index. |
+| `directory` | `"/tmp/sitemaps"` | Local storage directory. |
+| `max_links` | `50_000` | URLs per file before split. |
+| `emit_priority` | `true` | Emit `<priority>`. |
+| `emit_changefreq` | `true` | Emit `<changefreq>`. |
+| `xsl_stylesheet_url` | `nil` | Stylesheet for URL sets. |
+| `xsl_index_stylesheet_url` | `nil` | Stylesheet for the sitemap index. |
+| `ping_search_engines` | `false` | Auto-ping after generation. |
+| `ping_engines` | `{ bing: '...' }` | URL templates per engine; `%{url}` is URL-encoded at ping time. |
+
+---
+
+## `SiteMaps::Adapters::Adapter` (base class)
+
+Abstract base. Subclass to build custom adapters.
+
+| Method | Description |
+|--------|-------------|
+| `.config_class` | Override to return a `Configuration` subclass with adapter-specific attributes. |
+| `#write(url, raw_data, **kwargs)` | Abstract. Persist `raw_data` at the storage location implied by `url`. |
+| `#read(url) → [raw_data, { content_type: '…' }]` | Abstract. |
+| `#delete(url)` | Abstract. |
+| `#configure { \|c\| ... }` | Yield the adapter's configuration. |
+| `#process(name = :default, location = nil, **kwargs, &block)` | Register a process. |
+| `#external_sitemap(url, lastmod:)` | Add an external sitemap to the index. |
+| `#extend_processes_with(mod)` | Mix `mod` into all process blocks. |
+| `#url_filter { \|url, options\| ... }` | Register a URL filter. |
+| `#apply_url_filters(url, options)` | Run all filters; returns modified options or `nil` if excluded. |
+| `#reset!` | Clear index and repo. Called before `Runner#run`. |
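+
+The filter contract above (each filter returns the options, possibly modified, or `nil` to drop the URL) composes as a short-circuiting reduce. The sketch is standalone Ruby: `apply_url_filters` here is a hypothetical free function, not the gem's method, and the two filters are illustrative.
+
+```ruby
+require 'uri'
+
+# Hypothetical stand-in: run filters in order, threading options through.
+# A filter returning nil excludes the URL and skips the remaining filters.
+def apply_url_filters(filters, url, options)
+  filters.reduce(options) do |opts, filter|
+    result = filter.call(url, opts)
+    return nil if result.nil?
+
+    result
+  end
+end
+
+filters = [
+  # Exclude anything under /admin.
+  ->(url, opts) { URI(url).path.start_with?('/admin') ? nil : opts },
+  # Fill in a default changefreq without overriding an explicit one.
+  ->(_url, opts) { { changefreq: 'weekly' }.merge(opts) }
+]
+
+apply_url_filters(filters, 'https://example.com/admin/users', {}) # => nil
+apply_url_filters(filters, 'https://example.com/about', { priority: 0.5 })
+```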
+
+---
+
+## `SiteMaps::Runner`
+
+Executes enqueued processes concurrently.
+
+```ruby
+Runner.new(adapter = SiteMaps.current_adapter, max_threads: 4, ping: nil)
+```
+
+| Method | Description |
+|--------|-------------|
+| `#enqueue(process_name, **kwargs)` | Queue one process with kwargs. |
+| `#enqueue_remaining` / `#enqueue_all` | Queue every process not yet enqueued. |
+| `#run` | Execute queued processes, finalize index, optionally ping. |
+
+---
+
+## `SiteMaps::SitemapBuilder`
+
+Yielded as `s` inside every `process` block.
+
+| Method | Description |
+|--------|-------------|
+| `#add(path, **options)` | Add one URL to the current URL set. Automatically splits when full. |
+| `#finalize!` | Finalize the current URL set. Called automatically when the process block returns. |
+
+`options` supports every extension documented in [extensions.md](extensions.md): `lastmod`, `priority`, `changefreq`, `images`, `videos`, `news`, `alternates`, `mobile`, `pagemap`.
+
+In Rails apps, `s.route` is an object exposing all URL helpers.
+
+---
+
+## `SiteMaps::Middleware`
+
+Rack middleware for serving generated sitemaps. See [middleware.md](middleware.md).
+
+```ruby
+use SiteMaps::Middleware,
+  adapter: ...,
+  public_prefix: nil,
+  storage_prefix: nil,
+  x_robots_tag: 'noindex, follow',
+  cache_control: 'public, max-age=3600'
+```
+
+---
+
+## `SiteMaps::Notification`
+
+| Method | Description |
+|--------|-------------|
+| `.subscribe(event_or_class, &block)` | Subscribe to one event (string) or every event named on a class. |
+| `.unsubscribe(subscriber)` | Remove a subscription. |
+| `.instrument(event, payload) { ... }` | Emit an event, wrapping the block in a timer. |
+
+See [events.md](events.md) for the event catalog.
+
+---
+
+## `SiteMaps::RobotsTxt`
+
+| Method | Description |
+|--------|-------------|
+| `.sitemap_directive(url) → String` | Return `"Sitemap: <url>"`. |
+| `.render(sitemap_url:, extra_directives: []) → String` | Build a full robots.txt body. |
+
+---
+
+## `SiteMaps::Ping`
+
+| Method | Description |
+|--------|-------------|
+| `.ping(url, engines: { bing: '...' }) → Hash` | Fire a GET to each engine's template (substituting `%{url}`). Returns a hash of `{engine => { status:, url: }}`. |
+
+---
+
+## CLI entry point
+
+`exec/site_maps` — the executable shipped with the gem.
+
+```bash
+bundle exec site_maps generate [processes] [options]
+```
+
+See [cli.md](cli.md).
diff --git a/docs/cli.md b/docs/cli.md
new file mode 100644
index 0000000..471af6c
--- /dev/null
+++ b/docs/cli.md
@@ -0,0 +1,93 @@
+# CLI
+
+The gem installs a `site_maps` executable backed by Thor.
+
+```bash
+bundle exec site_maps generate [PROCESS_NAMES...] [options]
+```
+
+If no process names are given, every process in the config file is enqueued.
+
+## Options
+
+| Flag | Default | Purpose |
+|------|---------|---------|
+| `--config-file`, `-r` | — | Path to the config file defining processes. **Required.** |
+| `--max-threads`, `-c` | `4` | Thread pool size for concurrent process execution. |
+| `--context` | `{}` | Hash-style kwargs passed to `SiteMaps.define` blocks: `--context=tenant:acme locale:en`. |
+| `--enqueue-remaining` | `false` | In addition to specified processes, enqueue any others. |
+| `--ping` | `false` | Override config to ping search engines after generation. |
+| `--debug` | `false` | Set logger to DEBUG level. |
+| `--logfile` | — | Write logs to a file instead of stdout. |
+
+## Examples
+
+Generate everything:
+
+```bash
+bundle exec site_maps generate --config-file config/sitemap.rb
+```
+
+Regenerate a single shard of a dynamic process:
+
+```bash
+bundle exec site_maps generate monthly_posts \
+  --config-file config/sitemap.rb \
+  --context=year:2024 month:3
+```
+
+Enqueue `posts` and `products` first, then every other process defined in the config file:
+
+```bash
+bundle exec site_maps generate posts products \
+  --config-file config/sitemap.rb \
+  --enqueue-remaining
+```
+
+Tune concurrency:
+
+```bash
+bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 10
+```
+
+Ping Bing and any custom engines (config-driven — see below):
+
+```bash
+bundle exec site_maps generate --config-file config/sitemap.rb --ping
+```
+
+## Search-engine pinging
+
+Pinging is off by default. Enable globally in config or flip it on per run via `--ping`.
+
+```ruby
+SiteMaps.use(:file_system) do
+  configure do |config|
+    config.url                 = 'https://example.com/sitemap.xml'
+    config.ping_search_engines = true
+    config.ping_engines = {
+      bing:   'https://www.bing.com/ping?sitemap=%{url}',
+      custom: 'https://search.example.com/ping?url=%{url}'
+    }
+  end
+end
+```
+
+`%{url}` in the template is replaced with a URL-encoded `config.url` at ping time.
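+
+Substituting the encoded sitemap URL into a template is a one-liner with `Kernel#format`. This sketch assumes `URI.encode_www_form_component` as the encoder (the gem may use a different but equivalent escaping function), and `ping_url` is a hypothetical name.
+
+```ruby
+require 'uri'
+
+# Hypothetical helper: fill %{url} with the percent-encoded sitemap URL.
+def ping_url(template, sitemap_url)
+  format(template, url: URI.encode_www_form_component(sitemap_url))
+end
+
+ping_url('https://www.bing.com/ping?sitemap=%{url}', 'https://example.com/sitemap.xml')
+# => "https://www.bing.com/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml"
+```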
+
+## Rails / bundler
+
+The CLI auto-requires `config/environment` if it detects a `config/application.rb`, so Rails URL helpers (via the Railtie) are available inside your config file.
+
+If you don't want that — say, a Ruby-only script in a Rails repo — pass a config file outside the Rails root or invoke the library directly via `SiteMaps.generate(...)`.
+
+## Logging
+
+- `--debug` sets the logger to `Logger::DEBUG`.
+- `--logfile PATH` writes to a file; otherwise stdout.
+- A built-in event listener prints one line per finalized URL set with link counts and runtime.
+
+## Exit codes
+
+- `0` — success.
+- Non-zero — any process raised. Errors are captured per-future and re-raised after all futures complete, so you see the real backtrace rather than a generic runner failure.
diff --git a/docs/events.md b/docs/events.md
new file mode 100644
index 0000000..22aff0e
--- /dev/null
+++ b/docs/events.md
@@ -0,0 +1,79 @@
+# Events
+
+`site_maps` ships a lightweight pub/sub system under `SiteMaps::Notification`. Use it for logging, metrics, or reacting to particular generation phases.
+
+## Subscribing
+
+### Block subscribers
+
+```ruby
+SiteMaps::Notification.subscribe('sitemaps.finalize_urlset') do |event|
+  Rails.logger.info(
+    "[sitemap] wrote #{event[:links_count]} urls to #{event[:url]} in #{event[:runtime]}s"
+  )
+end
+```
+
+### Class subscribers
+
+Pass a class that defines one class method per event; each method is named after the event, with dots replaced by underscores:
+
+```ruby
+class SitemapMetrics
+  def self.sitemaps_process_execution(event)
+    StatsD.timing('sitemaps.process', event[:runtime], tags: ["process:#{event[:process].name}"])
+  end
+
+  def self.sitemaps_finalize_urlset(event)
+    StatsD.increment('sitemaps.urlset.written', tags: ["url:#{event[:url]}"])
+  end
+
+  def self.sitemaps_ping(event)
+    event[:results].each do |engine, result|
+      StatsD.increment('sitemaps.ping', tags: ["engine:#{engine}", "status:#{result[:status]}"])
+    end
+  end
+end
+
+SiteMaps::Notification.subscribe(SitemapMetrics)
+```
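+
+The class-subscriber convention (dots become underscores, and a class only receives events it defines a method for), together with the timer that `.instrument` wraps around its block, fits in a few lines of plain Ruby. `TinyNotifier` is a hypothetical stand-in, not `SiteMaps::Notification`; it reports runtime in seconds, as the block-subscriber log line above assumes.
+
+```ruby
+# Hypothetical, minimal pub/sub illustrating the class-subscriber convention.
+class TinyNotifier
+  def initialize
+    @subscribers = []
+  end
+
+  def subscribe(klass)
+    @subscribers << klass
+  end
+
+  # Times the block, then publishes the payload plus :runtime (seconds).
+  def instrument(event_name, payload = {})
+    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+    result  = yield if block_given?
+    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
+    publish(event_name, payload.merge(runtime: elapsed))
+    result
+  end
+
+  private
+
+  # "sitemaps.finalize_urlset" => :sitemaps_finalize_urlset
+  def publish(event_name, payload)
+    method_name = event_name.tr('.', '_').to_sym
+    @subscribers.each do |sub|
+      sub.public_send(method_name, payload) if sub.respond_to?(method_name)
+    end
+  end
+end
+```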
+
+### The built-in listener
+
+For colored terminal output during CLI runs:
+
+```ruby
+SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
+```
+
+This is subscribed automatically by the CLI.
+
+## Events
+
+| Event | Payload keys |
+|-------|-------------|
+| `sitemaps.enqueue_process` | `process`, `kwargs` |
+| `sitemaps.before_process_execution` | `process`, `kwargs` |
+| `sitemaps.process_execution` | `process`, `kwargs`, `runtime` |
+| `sitemaps.finalize_urlset` | `url`, `links_count`, `news_count`, `last_modified`, `runtime`, `process` |
+| `sitemaps.ping` | `results` |
+
+`process` is a `SiteMaps::Process` struct (`name`, `location_template`, `kwargs_template`, `block`).
+
+## Event ordering
+
+For each process the sequence is:
+
+1. `sitemaps.enqueue_process`
+2. `sitemaps.before_process_execution`
+3. One or more `sitemaps.finalize_urlset` (one per split file)
+4. `sitemaps.process_execution`
+
+After all processes complete, one final `sitemaps.finalize_urlset` fires for the sitemap index itself. If pinging is enabled, `sitemaps.ping` fires last.
+
+## Use cases
+
+- **Logging.** Tail-friendly output of what just ran, how many URLs, runtime.
+- **Metrics.** StatsD / OpenTelemetry counters for throughput and ping outcomes.
+- **Alerting.** Subscribe to `sitemaps.ping`, alert on non-200 results.
+- **Cache busting.** After `sitemaps.finalize_urlset`, purge the CDN entry for the written URL.
diff --git a/docs/extensions.md b/docs/extensions.md
new file mode 100644
index 0000000..0aa1177
--- /dev/null
+++ b/docs/extensions.md
@@ -0,0 +1,141 @@
+# SEO Extensions
+
+`s.add` accepts options for every sitemap extension recognized by Google and Bing. Pass any of the following alongside `lastmod`, `priority`, and `changefreq`.
+
+## Image
+
+Up to 1,000 images per URL.
+
+```ruby
+s.add('/gallery/summer', images: [
+  {
+    loc:          'https://cdn.example.com/summer/beach.jpg',
+    title:        'Beach sunset',
+    caption:      'A photo from the summer trip',
+    geo_location: 'Cape Cod, MA',
+    license:      'https://creativecommons.org/licenses/by/4.0/'
+  }
+])
+```
+
+## Video
+
+Up to 1,000 video entries per sitemap file.
+
+```ruby
+s.add('/videos/how-to', videos: [
+  {
+    thumbnail_loc:         'https://cdn.example.com/thumbs/how-to.jpg',
+    title:                 'How to use site_maps',
+    description:           'A quick walkthrough',
+    content_loc:           'https://cdn.example.com/videos/how-to.mp4',
+    player_loc:            'https://example.com/embed/how-to',
+    duration:              600,
+    publication_date:      Time.now,
+    rating:                4.8,
+    view_count:            12_345,
+    family_friendly:       true,
+    requires_subscription: false,
+    live:                  false,
+    tags:                  %w[tutorial guide],
+    category:              'Technology',
+    uploader:              'example-team',
+    uploader_info:         'https://example.com/about',
+    gallery_loc:           'https://example.com/videos',
+    gallery_title:         'Example video gallery',
+    price:                 nil,
+    allow_embed:           true,
+    autoplay:              'ap=1'
+  }
+])
+```
+
+## News
+
+Up to 1,000 news entries per sitemap file (use a dedicated process for news URLs).
+
+```ruby
+s.add('/news/breaking', news: {
+  publication_name:     'Example Times',
+  publication_language: 'en',
+  publication_date:     Time.now,
+  title:                'Breaking news headline',
+  keywords:             'breaking, politics',
+  genres:               'PressRelease',
+  access:               'Subscription',
+  stock_tickers:        'NASDAQ:EXMP'
+})
+```
+
+## Alternate language / hreflang
+
+```ruby
+s.add('/', alternates: [
+  { href: 'https://example.com/en', lang: 'en' },
+  { href: 'https://example.com/es', lang: 'es' },
+  { href: 'https://example.com/fr', lang: 'fr', nofollow: true }
+])
+```
+
+The `nofollow: true` variant emits `rel="nofollow alternate"` on the link. Use it to declare locale variants without signalling Google to crawl them as equivalents.
+
+## Mobile
+
+Declare a URL as mobile-friendly:
+
+```ruby
+s.add('/mobile-page', mobile: true)
+```
+
+## PageMap
+
+Structured data for Google Custom Search.
+
+```ruby
+s.add('/products/widget', pagemap: {
+  dataobjects: [
+    {
+      type: 'product',
+      id:   'sku-123',
+      attributes: [
+        { name: 'name',  value: 'Widget' },
+        { name: 'price', value: '19.99' },
+        { name: 'color', value: 'blue' }
+      ]
+    }
+  ]
+})
+```
+
+## Combined example
+
+Everything can coexist on a single URL:
+
+```ruby
+s.add('/products/widget',
+  lastmod:    Time.now,
+  priority:   0.9,
+  changefreq: 'weekly',
+  images:     [{ loc: 'https://cdn.example.com/widget.jpg', title: 'Widget' }],
+  alternates: [{ href: 'https://example.com/es/products/widget', lang: 'es' }],
+  mobile:     true,
+  pagemap:    { dataobjects: [{ type: 'product', id: 'sku-123', attributes: [] }] }
+)
+```
+
+## Disabling `priority` / `changefreq`
+
+Both fields are optional per the sitemap spec, and many search engines ignore them. Disable globally if you want smaller files:
+
+```ruby
+configure do |config|
+  config.emit_priority   = false
+  config.emit_changefreq = false
+end
+```
+
+## Output size
+
+- Per URL set: 50,000 links **or** 1,000 news items **or** 50 MB uncompressed — whichever comes first. When one of these is hit, the current file is finalized and a new one starts.
+- File naming is automatic (`posts/sitemap.xml` → `posts/sitemap1.xml`, `posts/sitemap2.xml`, …).
+- Use the `.gz` extension in `config.url` to emit gzipped files — most search engines fetch either form.
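+
+The rollover rule is an OR over three counters. A hypothetical predicate using the limits above, with 50 MB taken as the 50,000,000-byte `SiteMaps::MAX_FILESIZE` from the API reference; a lower `config.max_links` would be passed through `limits:`.
+
+```ruby
+# Limits as documented: links, news entries, uncompressed bytes.
+LIMITS = { links: 50_000, news: 1_000, bytes: 50_000_000 }.freeze
+
+# Hypothetical predicate: true once any counter reaches its limit.
+def urlset_full?(links:, news:, bytes:, limits: LIMITS)
+  links >= limits[:links] || news >= limits[:news] || bytes >= limits[:bytes]
+end
+```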
diff --git a/docs/getting-started.md b/docs/getting-started.md
new file mode 100644
index 0000000..d91cb7c
--- /dev/null
+++ b/docs/getting-started.md
@@ -0,0 +1,138 @@
+# Getting Started
+
+## Install
+
+```ruby
+# Gemfile
+gem 'site_maps'
+```
+
+```bash
+bundle install
+```
+
+## Your first sitemap
+
+Create `config/sitemap.rb`:
+
+```ruby
+SiteMaps.use(:file_system) do
+  configure do |config|
+    config.url       = 'https://example.com/sitemap.xml'
+    config.directory = File.expand_path('../public', __dir__) # <app root>/public
+  end
+
+  process do |s|
+    s.add('/',       priority: 1.0, changefreq: 'daily')
+    s.add('/about',  priority: 0.8, lastmod: Time.now)
+    s.add('/contact', priority: 0.5)
+  end
+end
+```
+
+Generate:
+
+```bash
+bundle exec site_maps generate --config-file config/sitemap.rb
+```
+
+Output: `public/sitemap.xml`.
+
+## Dynamic URLs
+
+Call `s.add` once for every URL you want indexed. Database records work naturally:
+
+```ruby
+process :posts do |s|
+  Post.published.find_each do |post|
+    s.add("/posts/#{post.slug}", lastmod: post.updated_at, priority: 0.7)
+  end
+end
+```
+
+When the URL count of a single process exceeds `max_links` (default 50,000), the file is split into `sitemap1.xml`, `sitemap2.xml`, … and a sitemap index is written at `config.url`.
+
+## Named processes
+
+Named processes get their own file and run in parallel:
+
+```ruby
+SiteMaps.use(:file_system) do
+  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
+
+  process :static do |s|
+    s.add('/')
+    s.add('/about')
+  end
+
+  process :posts, 'posts/sitemap.xml' do |s|
+    Post.find_each { |p| s.add("/posts/#{p.slug}") }
+  end
+
+  process :products, 'products/sitemap.xml' do |s|
+    Product.find_each { |p| s.add("/products/#{p.id}") }
+  end
+end
+```
+
+Run all:
+
+```bash
+bundle exec site_maps generate --config-file config/sitemap.rb --max-threads 4
+```
+
+Run one:
+
+```bash
+bundle exec site_maps generate posts --config-file config/sitemap.rb
+```
+
+See [processes.md](processes.md) for the full process DSL including parameterized templates.
+
+## Using it in Rails
+
+Add `site_maps` to your Gemfile and generate from a Rake task, a scheduled job, or your deploy pipeline. The Railtie exposes your app's URL helpers through `s.route`:
+
+```ruby
+# config/sitemap.rb
+SiteMaps.use(:file_system) do
+  configure do |config|
+    config.url       = 'https://example.com/sitemap.xml'
+    config.directory = Rails.public_path.to_s
+  end
+
+  process do |s|
+    s.add(s.route.root_path, priority: 1.0)
+    s.add(s.route.about_path)
+    Post.find_each { |post| s.add(s.route.post_path(post), lastmod: post.updated_at) }
+  end
+end
+```
+
+See [rails.md](rails.md) for the full Rails integration, including asset precompile hooks and the Rack middleware for serving generated sitemaps.
+
+## Uploading to S3
+
+Swap the adapter line:
+
+```ruby
+SiteMaps.use(:aws_sdk) do
+  configure do |config|
+    config.url    = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
+    config.bucket = 'my-bucket'
+    config.region = ENV.fetch('AWS_REGION', 'us-east-1')
+    # access_key_id / secret_access_key default to ENV vars
+  end
+
+  process { |s| ... }
+end
+```
+
+See [adapters.md](adapters.md) for adapter specifics and how to build your own.
+
+## Next steps
+
+- [Processes](processes.md) — split your sitemap into static and dynamic shards
+- [SEO extensions](extensions.md) — image, video, news, hreflang
+- [CLI](cli.md) — automation-friendly generate command
+- [Rack middleware](middleware.md) — serve the generated files with correct headers
diff --git a/docs/middleware.md b/docs/middleware.md
new file mode 100644
index 0000000..df1e1c4
--- /dev/null
+++ b/docs/middleware.md
@@ -0,0 +1,85 @@
+# Rack Middleware
+
+`SiteMaps::Middleware` serves generated sitemap files directly from the app. Useful when you've generated to `public/sitemaps/` (filesystem adapter) and want proper `Content-Type`, gzip handling, and XSL stylesheet routing without editing your web-server config.
+
+## Basic usage
+
+```ruby
+# config/application.rb (Rails)
+config.middleware.use SiteMaps::Middleware, adapter: -> { SiteMaps.current_adapter }
+```
+
+Or inline in `config.ru`:
+
+```ruby
+require 'site_maps'
+
+use SiteMaps::Middleware, adapter: SiteMaps.current_adapter
+run MyApp
+```
+
+## Options
+
+```ruby
+use SiteMaps::Middleware,
+  adapter:        SiteMaps.current_adapter,
+  public_prefix:  nil,
+  storage_prefix: nil,
+  x_robots_tag:   'noindex, follow',
+  cache_control:  'public, max-age=3600'
+```
+
+| Option | Purpose |
+|--------|---------|
+| `adapter` | Adapter instance (or a callable returning one — useful if the adapter is reconfigured at boot). |
+| `public_prefix` | Strip from request path before lookup — e.g. `/sitemap` if your app mounts them under a sub-path. |
+| `storage_prefix` | Prepend to the lookup key — e.g. `tenants/acme` for multi-tenant layouts. |
+| `x_robots_tag` | `X-Robots-Tag` header added to served files. |
+| `cache_control` | `Cache-Control` header. |
+
+## Behavior
+
+The middleware intercepts requests for `*.xml` and `*.xml.gz` files:
+
+- Matches → serve from the adapter with `Content-Type: application/xml`, plus `X-Robots-Tag` and `Cache-Control`.
+- Gzipped sources → auto-decompress on serve so XSL stylesheets render in the browser. Clients asking for `.xml.gz` still get the compressed bytes.
+- Doesn't match → `env` passes through to `@app.call`.
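+
+How a request path might map to an adapter lookup key, with `public_prefix` stripped and `storage_prefix` prepended as the options table describes. `storage_key` is hypothetical: the `.xml` / `.xml.gz` pattern follows the behavior list above, it handles only string prefixes, and returning `nil` for paths outside `public_prefix` is an assumption of this sketch.
+
+```ruby
+# Hypothetical helper: storage key for a request path, or nil to pass through.
+SITEMAP_PATH = /\.xml(\.gz)?\z/
+
+def storage_key(path, public_prefix: nil, storage_prefix: nil)
+  return nil unless path.match?(SITEMAP_PATH)
+
+  if public_prefix
+    prefix = public_prefix.chomp('/')
+    return nil unless path.start_with?("#{prefix}/") # not under the mount
+
+    path = path.delete_prefix(prefix)
+  end
+
+  key = path.delete_prefix('/')
+  storage_prefix ? File.join(storage_prefix, key) : key
+end
+
+storage_key('/sitemap/posts/sitemap1.xml', public_prefix: '/sitemap') # => "posts/sitemap1.xml"
+storage_key('/sitemap.xml.gz', storage_prefix: 'tenants/acme')        # => "tenants/acme/sitemap.xml.gz"
+storage_key('/about')                                                  # => nil
+```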
+
+## XSL stylesheets
+
+The middleware also serves the built-in XSL stylesheets — pretty sitemap rendering for human visitors — at their referenced paths. Configure their URLs via:
+
+```ruby
+configure do |config|
+  config.xsl_stylesheet_url       = '/_sitemap-stylesheet.xsl'
+  config.xsl_index_stylesheet_url = '/_sitemap-index-stylesheet.xsl'
+end
+```
+
+## Multi-tenant routing
+
+For per-tenant sitemaps stored under subpaths:
+
+```ruby
+use SiteMaps::Middleware,
+  adapter:        per_request_adapter,
+  storage_prefix: ->(request) { "tenants/#{request.host.split('.').first}" }
+```
+
+If the adapter itself already scopes paths by tenant, no prefix is needed — just point it at the right one for each request.
+
+## robots.txt integration
+
+Emit a `Sitemap:` directive for the generated file:
+
+```ruby
+# config.ru or a controller
+SiteMaps::RobotsTxt.sitemap_directive('https://example.com/sitemap.xml')
+# => "Sitemap: https://example.com/sitemap.xml"
+
+SiteMaps::RobotsTxt.render(
+  sitemap_url:      'https://example.com/sitemap.xml',
+  extra_directives: ['Disallow: /admin']
+)
+# => "Sitemap: https://example.com/sitemap.xml\nDisallow: /admin"
+```
diff --git a/docs/processes.md b/docs/processes.md
new file mode 100644
index 0000000..3a8d5a1
--- /dev/null
+++ b/docs/processes.md
@@ -0,0 +1,156 @@
+# Processes
+
+A **process** is a unit of work that produces part of a sitemap. Each process runs on its own thread, writes its own URL set, and becomes an entry in the sitemap index.
+
+## Static processes
+
+A static process has no parameters. It runs once and writes one (possibly split) sitemap file.
+
+```ruby
+SiteMaps.use(:file_system) do
+  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
+
+  process do |s|
+    s.add('/', priority: 1.0)
+    s.add('/about')
+  end
+
+  process :posts, 'posts/sitemap.xml' do |s|
+    Post.find_each { |post| s.add("/posts/#{post.slug}", lastmod: post.updated_at) }
+  end
+end
+```
+
+- Without an explicit name, the process is named `:default`.
+- Without an explicit location, a default filename is assigned.
+- The block receives a `SitemapBuilder` (`s`), on which `add` is called per URL.
+
+## Dynamic processes
+
+A dynamic process has `%{name}` placeholders in its location template and a matching keyword argument for each one. Every unique combination of values produces a separate sitemap file.
+
+```ruby
+process :monthly_posts, 'posts/%{year}-%{month}/sitemap.xml', year: 2024, month: 1 do |s, year:, month:, **|
+  Post.where('extract(year from published_at) = ? AND extract(month from published_at) = ?', year, month)
+      .find_each { |p| s.add("/posts/#{p.slug}", lastmod: p.updated_at) }
+end
+```
+
+The kwargs passed to `process` are **defaults**; the real values come from `Runner#enqueue`:
+
+```ruby
+runner = SiteMaps.generate(config_file: 'config/sitemap.rb')
+runner.enqueue(:monthly_posts, year: 2024, month: 1)
+runner.enqueue(:monthly_posts, year: 2024, month: 2)
+runner.enqueue(:monthly_posts, year: 2024, month: 3)
+runner.run
+```
+
+Or from the CLI:
+
+```bash
+bundle exec site_maps generate monthly_posts \
+  --config-file config/sitemap.rb \
+  --context=year:2024 month:1
+```
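+
+Each `enqueue` call (or CLI invocation) produces its own file. The `%{name}` placeholders use the syntax of Ruby's `Kernel#format`. Assuming the location is expanded that way (an assumption, not something documented above), the three Ruby calls resolve as shown below. Note that `%{month}` is not zero-padded. Whether the gem also accepts `%<name>` references, which support padding, is likewise unverified here:
+
+```ruby
+template = 'posts/%{year}-%{month}/sitemap.xml'
+
+# Each enqueue supplies one set of kwargs; format fills in the named references.
+locations = [1, 2, 3].map { |month| format(template, year: 2024, month: month) }
+# => ["posts/2024-1/sitemap.xml", "posts/2024-2/sitemap.xml", "posts/2024-3/sitemap.xml"]
+
+# %<month>02d formats the value as a two-digit integer instead.
+format('posts/%<year>d-%<month>02d/sitemap.xml', year: 2024, month: 1)
+# => "posts/2024-01/sitemap.xml"
+```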
+
+## Execution model
+
+When you call `runner.run`:
+
+1. Each enqueued process is wrapped in a `Concurrent::Future`.
+2. The pool (default 4 threads, configurable via `--max-threads`) runs them in parallel.
+3. Each process builds a `URLSet`. When the set fills up (`config.max_links` links, 1,000 news items, or 50 MB uncompressed), it is finalized and written, and a new `URLSet` is started automatically.
+4. After every process finishes, the sitemap index is aggregated and written to `config.url`.
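+
+The steps above follow a fan-out/fan-in shape. The sketch below reproduces that shape with only the standard library: a `Queue` drained by a fixed number of worker threads, followed by one aggregation step. The gem itself uses concurrent-ruby futures. `run_processes`, the job names, and their return values are hypothetical:
+
+```ruby
+# Stdlib-only sketch of the execution model; not the gem's implementation.
+def run_processes(jobs, max_threads: 4)
+  queue = Queue.new
+  jobs.each { |name, work| queue << [name, work] }
+  queue.close # pop returns nil once the queue is closed and empty
+
+  results = {}
+  lock = Mutex.new
+
+  workers = Array.new(max_threads) do
+    Thread.new do
+      while (job = queue.pop)
+        name, work = job
+        files = work.call # each "process" returns the files it wrote
+        lock.synchronize { results[name] = files }
+      end
+    end
+  end
+  workers.each(&:join) # wait for every process before aggregating
+
+  # Fan-in: the index lists every written file, in a stable order.
+  results.sort.flat_map { |_name, files| files }
+end
+
+index = run_processes(
+  {
+    default: -> { ['sitemap1.xml'] },
+    posts:   -> { ['posts/sitemap1.xml', 'posts/sitemap2.xml'] }
+  },
+  max_threads: 2
+)
+p index
+# => ["sitemap1.xml", "posts/sitemap1.xml", "posts/sitemap2.xml"]
+```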
+
+## Splitting rules
+
+A URL set is finalized and rolled over when **any** of these apply:
+
+- Links reach `config.max_links` (default 50,000 — the sitemap spec limit).
+- News entries reach 1,000.
+- Uncompressed XML reaches 50 MB.
+
+Split files are named by `IncrementalLocation`: `posts/sitemap.xml` becomes `posts/sitemap1.xml`, `posts/sitemap2.xml`, etc.
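+
+Rollover by link count, combined with incremental naming, can be sketched as follows. `incremental_location` and `split_urls` are hypothetical helpers that mirror the naming scheme described above. They are not the gem's `IncrementalLocation` class. The `.xml.gz` branch is an assumption, and the sketch enforces only the link limit, not the news-count or 50 MB limits:
+
+```ruby
+# Hypothetical helper: insert the file number before .xml (or, assumed, .xml.gz).
+def incremental_location(path, number)
+  path.sub(/(\.xml(?:\.gz)?)\z/) { "#{number}#{Regexp.last_match(1)}" }
+end
+
+# Hypothetical helper: start a new file every max_links URLs.
+def split_urls(path, urls, max_links: 50_000)
+  urls.each_slice(max_links).with_index(1).map do |chunk, i|
+    [incremental_location(path, i), chunk]
+  end.to_h
+end
+
+files = split_urls('posts/sitemap.xml', %w[/a /b /c /d /e], max_links: 2)
+files.each { |location, chunk| puts "#{location}: #{chunk.join(' ')}" }
+# posts/sitemap1.xml: /a /b
+# posts/sitemap2.xml: /c /d
+# posts/sitemap3.xml: /e
+```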
+
+## Index generation
+
+A sitemap index is produced when:
+
+- More than one process exists,
+- A single process was split across multiple files, or
+- External sitemaps were added.
+
+Otherwise a single `urlset` is written directly at `config.url` (the "inline" optimization).
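+
+The three conditions above reduce to a single predicate. Here is a sketch with hypothetical inputs: a hash mapping each process name to the number of files it wrote, plus a count of external sitemaps. `needs_index?` is illustrative only and is not part of the gem's API:
+
+```ruby
+# Hypothetical predicate mirroring the index rules above.
+def needs_index?(files_per_process, external_count: 0)
+  files_per_process.size > 1 ||                    # more than one process
+    files_per_process.values.any? { |n| n > 1 } || # a process was split
+    external_count.positive?                       # external sitemaps were added
+end
+
+needs_index?({ default: 1 })                    # => false (inline urlset at config.url)
+needs_index?({ default: 1, posts: 1 })          # => true
+needs_index?({ posts: 3 })                      # => true
+needs_index?({ default: 1 }, external_count: 1) # => true
+```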
+
+## Adding external sitemaps
+
+Reference third-party or pre-existing sitemaps in the index:
+
+```ruby
+SiteMaps.use(:file_system) do
+  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
+
+  external_sitemap('https://cdn.example.com/legacy-sitemap.xml', lastmod: Time.parse('2024-01-15'))
+
+  process { |s| s.add('/') }
+end
+```
+
+## Shared helpers across processes
+
+Use `extend_processes_with` to add methods that every process block can call:
+
+```ruby
+module Helpers
+  def post_path(post) = "/posts/#{post.slug}"
+  def published_posts = Post.where.not(published_at: nil)
+end
+
+SiteMaps.use(:file_system) do
+  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
+  extend_processes_with(Helpers)
+
+  process :posts do |s|
+    published_posts.find_each { |p| s.add(post_path(p), lastmod: p.updated_at) }
+  end
+end
+```
+
+## URL filters
+
+Filters run per URL inside every process — use them for global exclusions or default attributes:
+
+```ruby
+SiteMaps.use(:file_system) do
+  configure { |c| c.url = 'https://example.com/sitemap.xml'; c.directory = 'public' }
+
+  # Exclude any /admin path
+  url_filter { |url, options| url.include?('/admin') ? false : options }
+
+  # Boost blog priority
+  url_filter do |url, options|
+    if url.include?('/blog/')
+      options.merge(priority: 0.9, changefreq: 'daily')
+    else
+      options
+    end
+  end
+
+  process do |s|
+    # s.add(...) calls go here; every URL passes through the filters above
+  end
+end
+```
+
+A filter returning `false` (or `nil`) excludes the URL entirely. Returning a hash replaces the options.
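+
+How several filters combine is not spelled out above. The sketch below makes three assumptions:
+
+- Filters run in declaration order.
+- Each filter receives the options returned by the previous one.
+- The first `false` or `nil` result drops the URL.
+
+`apply_url_filters` is a hypothetical stand-in, not the gem's code:
+
+```ruby
+# Hypothetical filter chain under the assumptions stated above.
+def apply_url_filters(filters, url, options)
+  filters.reduce(options) do |opts, filter|
+    result = filter.call(url, opts)
+    return nil unless result # false or nil: drop the URL and skip remaining filters
+    result
+  end
+end
+
+filters = [
+  ->(url, options) { url.include?('/admin') ? false : options },
+  ->(url, options) { url.include?('/blog/') ? options.merge(priority: 0.9, changefreq: 'daily') : options }
+]
+
+apply_url_filters(filters, '/admin/users', {})          # nil, so the URL is excluded
+apply_url_filters(filters, '/blog/hello', {})           # priority 0.9, changefreq 'daily'
+apply_url_filters(filters, '/about', { priority: 0.5 }) # options returned unchanged
+```
+
+Under these assumptions, a filter written as `false if url.include?('/admin')` returns `nil` for every other URL and drops those URLs too. That is why the exclusion filter here returns `options` explicitly.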
+
+## Re-running a single shard
+
+Only regenerate what changed — the rest is preserved from the existing sitemap index:
+
+```ruby
+runner = SiteMaps.generate(config_file: 'config/sitemap.rb')
+runner.enqueue(:monthly_posts, year: 2024, month: 3)  # only March
+runner.run                                             # Jan and Feb kept as-is
+```
+
+This is the main advantage of parameterized dynamic processes: you can rebuild one month's shard on a cron and leave the rest untouched.
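+
+Conceptually, a partial run replaces only the index entries for the shards it regenerated and keeps the rest. The sketch below shows that merge, modeling index entries as a hash of location to `lastmod`. This is a simplification; the gem's actual index handling is not shown in this guide:
+
+```ruby
+require 'time'
+
+# Entries already listed in the existing sitemap index.
+existing = {
+  'posts/2024-1/sitemap.xml' => Time.utc(2024, 2, 1),
+  'posts/2024-2/sitemap.xml' => Time.utc(2024, 3, 1),
+  'posts/2024-3/sitemap.xml' => Time.utc(2024, 3, 15)
+}
+
+# Entries produced by this run (only March was enqueued).
+regenerated = { 'posts/2024-3/sitemap.xml' => Time.utc(2024, 4, 1) }
+
+# Later keys win: March is refreshed, January and February are kept as-is.
+index = existing.merge(regenerated)
+index.each { |location, lastmod| puts "#{location} #{lastmod.iso8601}" }
+```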
diff --git a/docs/rails.md b/docs/rails.md
new file mode 100644
index 0000000..ed93bf3
--- /dev/null
+++ b/docs/rails.md
@@ -0,0 +1,128 @@
+# Rails Integration
+
+The Railtie loads automatically when Rails is present. It wires up exactly one thing: **URL helpers** (`s.route.<helper>` inside process blocks). It does nothing else. There is no initializer, no autoloaded directory, and no patched generator.
+
+## URL helpers in processes
+
+```ruby
+# config/sitemap.rb
+SiteMaps.use(:file_system) do
+  configure do |config|
+    config.url       = 'https://example.com/sitemap.xml'
+    config.directory = Rails.public_path.to_s
+  end
+
+  process do |s|
+    s.add(s.route.root_path,  priority: 1.0)
+    s.add(s.route.about_path)
+    Post.find_each { |p| s.add(s.route.post_path(p), lastmod: p.updated_at) }
+  end
+end
+```
+
+`s.route` is a singleton wrapping `Rails.application.routes.url_helpers`.
+
+## Generating from Rails
+
+### One-off
+
+```bash
+bundle exec site_maps generate --config-file config/sitemap.rb
+```
+
+The CLI auto-requires `config/environment.rb` if it finds a `config/application.rb`, so ActiveRecord, URL helpers, and everything else loads as normal.
+
+### From a Rake task
+
+```ruby
+# lib/tasks/sitemap.rake
+namespace :sitemap do
+  desc 'Generate sitemaps'
+  task generate: :environment do
+    runner = SiteMaps.generate(config_file: Rails.root.join('config/sitemap.rb').to_s)
+    runner.enqueue_all.run
+  end
+end
+```
+
+Run on deploy or via cron:
+
+```bash
+bundle exec rake sitemap:generate
+```
+
+### From a scheduled job
+
+```ruby
+class SitemapJob < ApplicationJob
+  def perform
+    runner = SiteMaps.generate(config_file: Rails.root.join('config/sitemap.rb').to_s)
+    runner.enqueue_all.run
+  end
+end
+
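+# ActiveJob's #set has no cron option. To run SitemapJob daily at 03:00, use
+# your queue backend's recurring-job feature (e.g. Solid Queue's
+# config/recurring.yml, sidekiq-cron, or GoodJob's cron).
+SitemapJob.perform_later # enqueues a single run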
+```
+
+## Serving generated sitemaps
+
+Add the Rack middleware to serve files generated by the `:file_system` adapter:
+
+```ruby
+# config/application.rb
+config.middleware.use SiteMaps::Middleware, adapter: -> { SiteMaps.current_adapter }
+```
+
+See [middleware.md](middleware.md) for options.
+
+## Asset precompile integration
+
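+To regenerate sitemaps on every deploy, make `sitemap:generate` a prerequisite of `assets:precompile`. The task loads the full environment and queries your models, so this only works where precompilation runs with database access. That is often not the case in Docker image builds.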
+
+```ruby
+# lib/tasks/sitemap.rake
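+# Guard: apps without an asset pipeline don't define assets:precompile.
+if Rake::Task.task_defined?('assets:precompile')
+  Rake::Task['assets:precompile'].enhance(['sitemap:generate'])
+end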
+```
+
+## robots.txt
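+
+Render `robots.txt` through a controller so the ERB is evaluated. Files under `public/` are served as-is, without ERB. Delete the default `public/robots.txt`, or it may be served instead of your route.
+
+```erb
+<%# e.g. app/views/robots.text.erb, rendered by a controller action %>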
+User-agent: *
+Disallow: /admin
+
+<%= SiteMaps::RobotsTxt.sitemap_directive('https://example.com/sitemap.xml') %>
+```
+
+## Multi-tenant
+
+`SiteMaps.define` gives you a generation function parameterized by runtime context:
+
+```ruby
+# config/sitemap.rb
+SiteMaps.define do |tenant:|
+  use(:file_system) do
+    configure do |config|
+      config.url       = "https://#{tenant.domain}/sitemap.xml"
+      config.directory = tenant.public_path
+    end
+
+    process { |s| tenant.pages.each { |page| s.add(page.path, lastmod: page.updated_at) } }
+  end
+end
+```
+
+```ruby
+Tenant.find_each do |tenant|
+  SiteMaps.generate(config_file: 'config/sitemap.rb', context: { tenant: tenant }).enqueue_all.run
+end
+```
+
+The context hash is splatted into the `define` block as keyword args.
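+
+In plain Ruby terms, this is a `**` splat of the context hash into a block that declares keyword parameters. Here is a minimal illustration, with a `Struct` standing in for the `Tenant` model. Both `Tenant` and `definition` are hypothetical stand-ins:
+
+```ruby
+# Hypothetical stand-in for the Tenant model used above.
+Tenant = Struct.new(:domain, keyword_init: true)
+
+# A block declaring a keyword parameter, like the one passed to SiteMaps.define.
+definition = proc { |tenant:| "https://#{tenant.domain}/sitemap.xml" }
+
+context = { tenant: Tenant.new(domain: 'acme.example.com') }
+definition.call(**context) # => "https://acme.example.com/sitemap.xml"
+```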
+
+## Dependencies
+
+- Rails is **not** listed in the gemspec. The Railtie is loaded only if Rails is already present. If you're using `site_maps` in a non-Rails Ruby project, the Rails-specific pieces are inert.

From 26e2619f51c34c46ef792bd26b3949410bf41824 Mon Sep 17 00:00:00 2001
From: "Marcos G. Zimmermann" <mgzmaster@gmail.com>
Date: Sun, 19 Apr 2026 09:06:43 -0300
Subject: [PATCH 2/2] docs: link to gems.marcosz.com.br/site_maps documentation

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index ae95e49..fb2652c 100644
--- a/README.md
+++ b/README.md
@@ -4,8 +4,13 @@ A concurrent, incremental sitemap generator for Ruby. Framework-agnostic with bu
 
 Generates SEO-optimized XML sitemaps with support for sitemap indexes, XSL stylesheets, gzip compression, image/video/news extensions, search engine pinging, and Rack middleware for serving sitemaps with proper HTTP headers.
 
+## Documentation
+
+Full guides, adapter reference, CLI docs, and recipes are published at **[gems.marcosz.com.br/site_maps](https://gems.marcosz.com.br/site_maps/)** — part of the [marcosgz Ruby gem catalogue](https://gems.marcosz.com.br).
+
 ## Table of Contents
 
+- [Documentation](#documentation)
 - [Installation](#installation)
 - [Quick Start](#quick-start)
 - [Configuration](#configuration)