5 changes: 5 additions & 0 deletions README.md
@@ -4,8 +4,13 @@ A concurrent, incremental sitemap generator for Ruby. Framework-agnostic with bu

Generates SEO-optimized XML sitemaps with support for sitemap indexes, XSL stylesheets, gzip compression, image/video/news extensions, search engine pinging, and Rack middleware for serving sitemaps with proper HTTP headers.

## Documentation

Full guides, adapter reference, CLI docs, and recipes are published at **[gems.marcosz.com.br/site_maps](https://gems.marcosz.com.br/site_maps/)** — part of the [marcosgz Ruby gem catalogue](https://gems.marcosz.com.br).

## Table of Contents

- [Documentation](#documentation)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
67 changes: 67 additions & 0 deletions docs/README.md
@@ -0,0 +1,67 @@
# site_maps

Concurrent, adapter-based sitemap.xml generation for Ruby applications.

`site_maps` is a framework-agnostic sitemap builder with built-in Rails support. It produces valid sitemap XML (with full SEO extensions — image, video, news, hreflang, mobile, PageMap), splits large sitemaps into indexed chunks automatically, generates them concurrently across a thread pool, and ships them to the filesystem, S3, or a custom backend through a pluggable adapter layer.

## Contents

- [Getting started](getting-started.md) — install, first sitemap, Rails
- [Processes](processes.md) — static and dynamic process DSL
- [Adapters](adapters.md) — filesystem, S3, no-op, custom
- [CLI](cli.md) — `site_maps generate`
- [Rack middleware](middleware.md) — serve generated sitemaps from the app
- [SEO extensions](extensions.md) — image, video, news, hreflang, mobile, PageMap
- [Events](events.md) — instrumentation hooks
- [Rails integration](rails.md) — URL helpers, Railtie, precompile
- [API reference](api.md) — full public API

## Install

```ruby
# Gemfile
gem 'site_maps'
```

## One-minute tour

```ruby
# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    config.directory = Rails.public_path.to_s
  end

  process do |s|
    s.add('/', priority: 1.0, changefreq: 'daily')
    s.add('/about', lastmod: Time.now)

    Post.find_each do |post|
      s.add("/posts/#{post.slug}", lastmod: post.updated_at)
    end
  end
end
```

```bash
bundle exec site_maps generate --config-file config/sitemap.rb
```

Generated: `public/sitemap.xml` (plus an indexed chain if the URL set exceeds 50k links).
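
The split at 50k links can be pictured with plain Ruby. This is a minimal sketch, not the gem's implementation: `MAX_LINKS` mirrors the limit quoted above, while `chunk_urls` and the `sitemap1.xml`-style file names are hypothetical and chosen only for illustration.

```ruby
# Illustrative only: standard-library Ruby, not site_maps internals.
MAX_LINKS = 50_000 # per-file link limit

# Hypothetical helper: groups URLs into per-file chunks of at most max_links.
def chunk_urls(urls, max_links: MAX_LINKS)
  urls.each_slice(max_links).each_with_index.map do |slice, index|
    { file: "sitemap#{index + 1}.xml", urls: slice }
  end
end

urls = (1..120_000).map { |n| "https://example.com/posts/#{n}" }
chunks = chunk_urls(urls)
chunks.each { |chunk| puts "#{chunk[:file]}: #{chunk[:urls].size} links" }
```

With 120,000 URLs the sketch yields three chunks; an index file would then list one entry per chunk.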

## Why site_maps

- **Concurrency.** Processes run in a `Concurrent::FixedThreadPool`; threads share a thread-safe repo that handles file splitting.
- **Pluggable storage.** Write the same sitemap to disk in development and S3 in production by swapping one line.
- **SEO extensions.** Full support for sitemap URL extensions: images, videos, news, hreflang alternates, mobile, PageMap.
- **Dynamic processes.** Parameterized templates like `posts/%{year}-%{month}/sitemap.xml` let you rebuild a single shard without regenerating the whole site.
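
The parameterized template in the last bullet uses Ruby's named `%{...}` format references. A minimal sketch of how such a template expands, assuming plain `Kernel#format`; the `shard_location` helper and `LOCATION_TEMPLATE` constant are hypothetical names, not part of the gem:

```ruby
# Illustrative only: shows how a %{...} location template expands in plain Ruby.
LOCATION_TEMPLATE = 'posts/%{year}-%{month}/sitemap.xml'

# Hypothetical helper; Kernel#format performs the named substitution.
def shard_location(year:, month:)
  format(LOCATION_TEMPLATE, year: year, month: month)
end

puts shard_location(year: 2024, month: '05')
puts shard_location(year: 2024, month: '06')
```

Each distinct set of parameters maps to its own file path, which is what makes it possible to rebuild one shard in isolation.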

## Version

- Ruby: `>= 3.2.0`
- Depends on: `builder ~> 3.0`, `concurrent-ruby >= 1.1`, `rack >= 2.0`, `zeitwerk`, `thor`

## License

MIT.
143 changes: 143 additions & 0 deletions docs/adapters.md
@@ -0,0 +1,143 @@
# Adapters

An **adapter** is the storage backend for generated sitemap files. Three adapters ship with the gem; a clean interface makes it easy to write your own.

## Built-in adapters

| Adapter | When to use |
|---------|-------------|
| `:file_system` | Write to disk. Ideal for local dev, or for serving via the bundled Rack middleware. |
| `:aws_sdk` | Upload to S3. Production deployments behind CloudFront or similar. |
| `:noop` | Discard writes. Ideal for tests that care about "what URLs got added" but not "what ended up on disk". |

Select with `SiteMaps.use(<symbol>)`.

## `:file_system`

```ruby
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = 'https://example.com/sitemap.xml'
    config.directory = Rails.public_path.to_s # default: "public/sitemaps"
  end
  process { |s| ... }
end
```

**Config attributes:**

| Key | Purpose |
|-----|---------|
| `url` | Public URL — drives filename layout and is written into sitemap `<loc>` entries. |
| `directory` | Filesystem root under which files land. |

If `config.url` ends in `.gz`, the adapter writes gzipped files. The middleware transparently decompresses on serve.
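
The `.gz` convention can be sketched with Ruby's standard `zlib` library. This is an illustration under stated assumptions, not the adapter's or middleware's code: `encode_for` and `decode_for` are hypothetical helpers.

```ruby
require 'zlib'

# Hypothetical helper: compress only when the target URL ends in .gz.
def encode_for(url, xml)
  url.end_with?('.gz') ? Zlib.gzip(xml) : xml
end

# Hypothetical helper: what a server might do before sending plain XML.
def decode_for(url, body)
  url.end_with?('.gz') ? Zlib.gunzip(body) : body
end

xml = '<urlset><url><loc>https://example.com/</loc></url></urlset>'
stored = encode_for('https://example.com/sitemap.xml.gz', xml)

# Gzip streams start with the two magic bytes 0x1f 0x8b.
puts stored.bytes.first(2).map { |byte| format('%02x', byte) }.join
puts decode_for('https://example.com/sitemap.xml.gz', stored) == xml
```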

## `:aws_sdk`

```ruby
SiteMaps.use(:aws_sdk) do
  configure do |config|
    config.url = 'https://my-bucket.s3.amazonaws.com/sitemap.xml'
    config.directory = '/tmp/sitemaps' # local scratch space
    config.bucket = 'my-bucket'
    config.region = ENV.fetch('AWS_REGION', 'us-east-1')
    config.access_key_id = ENV['AWS_ACCESS_KEY_ID']
    config.secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
    config.acl = 'public-read' # default
    config.cache_control = 'private, max-age=0, no-cache'
  end
  process { |s| ... }
end
```

**Config attributes:**

| Key | Default |
|-----|---------|
| `bucket` | `ENV['AWS_BUCKET']` |
| `region` | `ENV.fetch('AWS_REGION', 'us-east-1')` |
| `access_key_id` | `ENV['AWS_ACCESS_KEY_ID']` |
| `secret_access_key` | `ENV['AWS_SECRET_ACCESS_KEY']` |
| `acl` | `"public-read"` |
| `cache_control` | `"private, max-age=0, no-cache"` |
| `directory` | Local scratch dir for staging before upload |

The adapter writes locally first (to `directory`), then uploads to S3 with the configured ACL and Cache-Control headers. You'll need `aws-sdk-s3` in your Gemfile:

```ruby
gem 'aws-sdk-s3'
```

## `:noop`

```ruby
SiteMaps.use(:noop) do
  configure { |c| c.url = 'https://example.com/sitemap.xml' }
  process { |s| ... }
end
```

Writes are discarded. Use it in tests when you want to assert on the URLs being added (via events, for example) without hitting disk.

## Writing a custom adapter

Subclass `SiteMaps::Adapters::Adapter` and implement `write`, `read`, `delete`:

```ruby
require 'google/cloud/storage'
require 'stringio'
require 'uri'

class GoogleCloudStorageAdapter < SiteMaps::Adapters::Adapter
  class Config < SiteMaps::Configuration
    attribute :bucket
    attribute :project_id
  end

  # Upload raw_data to the object path derived from url.
  def write(url, raw_data, **_kwargs)
    # Reuse the memoized client instead of building a new one on every write.
    bucket = storage.bucket(config.bucket)
    bucket.create_file(StringIO.new(raw_data), path_from(url))
  end

  # Download the object and return it with its content type.
  def read(url)
    file = storage.bucket(config.bucket).file(path_from(url))
    [file.download.string, { content_type: 'application/xml' }]
  end

  # Delete the object if it exists.
  def delete(url)
    storage.bucket(config.bucket).file(path_from(url))&.delete
  end

  private

  # Object key: the URL path without its leading slash.
  def path_from(url)
    URI(url).path[1..]
  end

  def storage
    @storage ||= Google::Cloud::Storage.new(project_id: config.project_id)
  end
end
```

Register and use it:

```ruby
SiteMaps.use(GoogleCloudStorageAdapter) do
  configure do |config|
    config.url = 'https://cdn.example.com/sitemap.xml'
    config.bucket = 'my-bucket'
    config.project_id = 'my-project'
  end
  process { |s| ... }
end
```

## Adapter interface

| Method | Purpose |
|--------|---------|
| `#write(url, raw_data, **kwargs)` | Persist `raw_data` at the location implied by `url`. |
| `#read(url)` | Return `[raw_data, { content_type: '…' }]` for the given URL. |
| `#delete(url)` | Remove the file at the URL. |
| `.config_class` | (optional) Return a `Configuration` subclass to expose adapter-specific settings. |

The adapter base class handles everything else: URL filters, the process registry, and thread-safe URL tracking.
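
To make the three-method contract concrete without any cloud dependency, here is a standalone sketch. It deliberately does not subclass `SiteMaps::Adapters::Adapter`; `InMemoryStore` and `key_for` are hypothetical names, and raising `KeyError` on a missing file is an assumption of this sketch rather than the gem's behavior.

```ruby
require 'uri'

# Hypothetical stand-in that mirrors the write/read/delete shapes from the
# table above, keeping data in a Hash guarded by a Mutex.
class InMemoryStore
  def initialize
    @files = {}
    @lock = Mutex.new
  end

  def write(url, raw_data, **_kwargs)
    @lock.synchronize { @files[key_for(url)] = raw_data.dup }
  end

  # Returns [raw_data, metadata]; raises KeyError when nothing is stored.
  def read(url)
    data = @lock.synchronize { @files.fetch(key_for(url)) }
    [data, { content_type: 'application/xml' }]
  end

  def delete(url)
    @lock.synchronize { @files.delete(key_for(url)) }
  end

  private

  # Storage key: the URL path without its leading slash.
  def key_for(url)
    URI(url).path[1..]
  end
end

store = InMemoryStore.new
store.write('https://example.com/sitemaps/sitemap.xml', '<urlset/>')
body, meta = store.read('https://example.com/sitemaps/sitemap.xml')
puts body
puts meta[:content_type]
store.delete('https://example.com/sitemaps/sitemap.xml')
```

A store like this is mainly useful as a mental model or in unit tests of code that only needs the three calls.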
154 changes: 154 additions & 0 deletions docs/api.md
@@ -0,0 +1,154 @@
# API Reference

## `SiteMaps` (top-level module)

| Method | Description |
|--------|-------------|
| `SiteMaps.use(adapter, **opts, &block)` | Register an adapter (`:file_system`, `:aws_sdk`, `:noop`, or a class) and yield its configuration block. |
| `SiteMaps.define(&block)` | Register a context-aware definition. Called by `.generate` with the `context:` hash splatted as kwargs. |
| `SiteMaps.configure { |config| ... }` | Mutate global defaults. |
| `SiteMaps.config` | Return global `Configuration`. |
| `SiteMaps.generate(config_file:, context: {}, **runner_opts) → Runner` | Load `config_file` and return a `Runner` ready to `.enqueue` and `.run`. |
| `SiteMaps.current_adapter` | Last-registered adapter (thread-local during `.generate`). |
| `SiteMaps.logger` | Configurable logger (default `Logger.new($stdout)`). |

### Constants

```ruby
SiteMaps::MAX_LENGTH # { links: 50_000, images: 1_000, news: 1_000 }
SiteMaps::MAX_FILESIZE # 50_000_000 bytes
```

### Errors

- `SiteMaps::Error` — base error
- `SiteMaps::AdapterNotFound` — unknown adapter symbol
- `SiteMaps::AdapterNotSetError` — generate called without an adapter
- `SiteMaps::FileNotFoundError` — missing file at adapter read
- `SiteMaps::FullSitemapError` — internal signal that a URL set is full (triggers split)
- `SiteMaps::ConfigurationError` — invalid config

---

## `SiteMaps::Configuration`

Base configuration. Adapter configs subclass this.

| Attribute | Default | Purpose |
|-----------|---------|---------|
| `url` | — (required) | Public URL of the main sitemap index. |
| `directory` | `"/tmp/sitemaps"` | Local storage directory. |
| `max_links` | `50_000` | URLs per file before split. |
| `emit_priority` | `true` | Emit `<priority>`. |
| `emit_changefreq` | `true` | Emit `<changefreq>`. |
| `xsl_stylesheet_url` | `nil` | Stylesheet for URL sets. |
| `xsl_index_stylesheet_url` | `nil` | Stylesheet for the sitemap index. |
| `ping_search_engines` | `false` | Auto-ping after generation. |
| `ping_engines` | `{ bing: '...' }` | URL templates per engine; `%{url}` is URL-encoded at ping time. |
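
The `%{url}` substitution described for `ping_engines` can be sketched as follows. The engine name and template URL are hypothetical (the real defaults are elided in the table), and `ping_urls` is an illustrative helper rather than the gem's API.

```ruby
require 'uri'

# Hypothetical engine template; not one of the gem's defaults.
PING_TEMPLATES = {
  example_engine: 'https://search.example.test/ping?sitemap=%{url}'
}.freeze

# Hypothetical helper: URL-encode the sitemap URL, then substitute %{url}.
def ping_urls(sitemap_url, templates: PING_TEMPLATES)
  encoded = URI.encode_www_form_component(sitemap_url)
  templates.transform_values { |template| format(template, url: encoded) }
end

puts ping_urls('https://example.com/sitemap.xml')[:example_engine]
```

Encoding before substitution keeps characters such as `:` and `/` from being misread as part of the engine's own URL.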

---

## `SiteMaps::Adapters::Adapter` (base class)

Abstract base. Subclass to build custom adapters.

| Method | Description |
|--------|-------------|
| `.config_class` | Override to return a `Configuration` subclass with adapter-specific attributes. |
| `#write(url, raw_data, **kwargs)` | Abstract. Persist `raw_data` at the storage location implied by `url`. |
| `#read(url) → [raw_data, { content_type: '…' }]` | Abstract. |
| `#delete(url)` | Abstract. |
| `#configure { |c| ... }` | Yield the adapter's configuration. |
| `#process(name = :default, location = nil, **kwargs, &block)` | Register a process. |
| `#external_sitemap(url, lastmod:)` | Add an external sitemap to the index. |
| `#extend_processes_with(mod)` | Mix `mod` into all process blocks. |
| `#url_filter { |url, options| ... }` | Register a URL filter. |
| `#apply_url_filters(url, options)` | Run all filters; returns modified options or `nil` if excluded. |
| `#reset!` | Clear index and repo. Called before `Runner#run`. |
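
The filter semantics ("modified options or `nil` if excluded") can be illustrated with a small chain in plain Ruby. This is a sketch under an assumed convention, namely that a filter returning `nil` or `false` excludes the URL; `FilterChain` is a hypothetical class and the gem's internals may differ.

```ruby
# Hypothetical filter chain; not the gem's implementation.
class FilterChain
  def initialize
    @filters = []
  end

  # Register a filter block that receives (url, options).
  def url_filter(&block)
    @filters << block
    self
  end

  # Run every filter in order. A falsy result excludes the URL (assumed
  # convention); otherwise the result becomes the options for the next filter.
  def apply(url, options)
    @filters.reduce(options) do |opts, filter|
      result = filter.call(url, opts)
      return nil unless result

      result
    end
  end
end

chain = FilterChain.new
chain.url_filter { |url, opts| url.include?('/admin') ? nil : opts }
chain.url_filter { |_url, opts| opts.merge(changefreq: 'weekly') }

p chain.apply('https://example.com/posts/1', { priority: 0.5 })
p chain.apply('https://example.com/admin/users', {})
```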

---

## `SiteMaps::Runner`

Executes enqueued processes concurrently.

```ruby
Runner.new(adapter = SiteMaps.current_adapter, max_threads: 4, ping: nil)
```

| Method | Description |
|--------|-------------|
| `#enqueue(process_name, **kwargs)` | Queue one process with kwargs. |
| `#enqueue_remaining` / `#enqueue_all` | Queue every process not yet enqueued. |
| `#run` | Execute queued processes, finalize index, optionally ping. |
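
For readers unfamiliar with fixed-size pools, the enqueue-then-run flow can be approximated with the standard library alone. The gem relies on concurrent-ruby; this sketch does not reproduce that code, and `run_concurrently` is a hypothetical helper.

```ruby
# Illustrative fixed-size worker pool using only Queue, Thread, and Mutex.
def run_concurrently(jobs, max_threads: 4)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # pop returns nil once the closed queue is drained

  results = []
  lock = Mutex.new

  workers = Array.new(max_threads) do
    Thread.new do
      while (job = queue.pop)
        value = job.call
        lock.synchronize { results << value }
      end
    end
  end

  workers.each(&:join)
  results
end

jobs = %w[posts pages categories].map do |name|
  -> { "#{name}: done" }
end

puts run_concurrently(jobs, max_threads: 2).sort
```

Results arrive in completion order, so the sketch sorts them before printing.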

---

## `SiteMaps::SitemapBuilder`

Yielded as `s` inside every `process` block.

| Method | Description |
|--------|-------------|
| `#add(path, **options)` | Add one URL to the current URL set. Automatically splits when full. |
| `#finalize!` | Finalize the current URL set. Called automatically when the process block returns. |

`options` supports every extension documented in [extensions.md](extensions.md): `lastmod`, `priority`, `changefreq`, `images`, `videos`, `news`, `alternates`, `mobile`, `pagemap`.

In Rails apps, `s.route` is an object exposing all URL helpers.

---

## `SiteMaps::Middleware`

Rack middleware for serving generated sitemaps. See [middleware.md](middleware.md).

```ruby
use SiteMaps::Middleware,
  adapter: ...,
  public_prefix: nil,
  storage_prefix: nil,
  x_robots_tag: 'noindex, follow',
  cache_control: 'public, max-age=3600'
```

---

## `SiteMaps::Notification`

| Method | Description |
|--------|-------------|
| `.subscribe(event_or_class, &block)` | Subscribe to one event (string) or every event named on a class. |
| `.unsubscribe(subscriber)` | Remove a subscription. |
| `.instrument(event, payload) { ... }` | Emit an event, wrapping the block in a timer. |

See [events.md](events.md) for the event catalog.
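
The subscribe/instrument pattern in the table can be sketched as a tiny event bus. Everything here is an assumption made for illustration: `MiniNotification`, the `demo.add_link` event name, and the `runtime` payload key are hypothetical and are not taken from the gem.

```ruby
# Hypothetical mini event bus; not SiteMaps::Notification.
module MiniNotification
  @subscribers = Hash.new { |hash, key| hash[key] = [] }

  class << self
    # Register a block for one event name and return it as the handle.
    def subscribe(event, &block)
      @subscribers[event] << block
      block
    end

    def unsubscribe(subscriber)
      @subscribers.each_value { |list| list.delete(subscriber) }
    end

    # Run the block, time it with a monotonic clock, then notify subscribers.
    def instrument(event, payload = {})
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      result = yield if block_given?
      runtime = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      @subscribers[event].each { |sub| sub.call(payload.merge(runtime: runtime)) }
      result
    end
  end
end

seen = []
handle = MiniNotification.subscribe('demo.add_link') { |payload| seen << payload }
MiniNotification.instrument('demo.add_link', { url: '/about' }) { :ok }
MiniNotification.unsubscribe(handle)
MiniNotification.instrument('demo.add_link', { url: '/ignored' }) { :ok }
puts seen.size
```

Only the first event reaches the subscriber, because the handle is removed before the second `instrument` call.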

---

## `SiteMaps::RobotsTxt`

| Method | Description |
|--------|-------------|
| `.sitemap_directive(url) → String` | Return `"Sitemap: <url>"`. |
| `.render(sitemap_url:, extra_directives: []) → String` | Build a full robots.txt body. |

---

## `SiteMaps::Ping`

| Method | Description |
|--------|-------------|
| `.ping(url, engines: { bing: '...' }) → Hash` | Fire a GET to each engine's template (substituting `%{url}`). Returns a hash of `{engine => { status:, url: }}`. |

---

## CLI entry point

`exec/site_maps` — the executable shipped with the gem.

```bash
bundle exec site_maps generate [processes] [options]
```

See [cli.md](cli.md).