A modern, fluent PHP package for managing robots.txt rules with type safety and great developer experience.
Requirements:

- PHP 8.4 or higher
- A code coverage driver (for running the test suite with coverage)
You can install the package via Composer:

```bash
composer require fkrzski/robots-txt
```

The `RobotsTxt` class provides a fluent interface for creating and managing robots.txt rules with type safety and immutability.
Creates a new instance of the `RobotsTxt` class.

```php
$robots = new RobotsTxt();
```

Output:

```
// Empty output - no rules defined yet
```
Adds an Allow rule for the specified path. The path must:
- Start with a forward slash (/)
- Not contain query parameters
- Not contain fragments
- Not be empty
```php
$robots = new RobotsTxt();
$robots->allow('/public');
```

Output:

```
User-agent: *
Allow: /public
```
Adds a Disallow rule for the specified path. It has the same path requirements as `allow()`.
```php
$robots = new RobotsTxt();
$robots->disallow('/private');
```

Output:

```
User-agent: *
Disallow: /private
```
Sets the crawl delay in seconds. The delay value must be non-negative.
```php
$robots = new RobotsTxt();
$robots->crawlDelay(10);
```

Output:

```
User-agent: *
Crawl-delay: 10
```
Adds a Sitemap URL. The URL must:
- Be a valid URL
- Use HTTP or HTTPS protocol
- Have an .xml extension
```php
$robots = new RobotsTxt();
$robots->sitemap('https://example.com/sitemap.xml');
```

Output:

```
Sitemap: https://example.com/sitemap.xml
```
A convenience method for quickly blocking access to the entire site. When `$disallow` is `true` (the default), it:
- Clears all existing rules in the current context (global or user-agent specific)
- Adds a single "Disallow: /*" rule
- Preserves sitemap entries and rules for other user agents
```php
// Block everything globally
$robots = new RobotsTxt();
$robots
    ->allow('/public')    // This will be cleared
    ->disallow('/admin')  // This will be cleared
    ->disallowAll();      // Only Disallow: /* remains
```

Output:

```
User-agent: *
Disallow: /*
```
Block access for a specific crawler only:

```php
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')               // Global rule - kept
    ->userAgent(CrawlerEnum::GOOGLE)
    ->allow('/public')                 // Google rule - cleared
    ->disallow('/private')             // Google rule - cleared
    ->disallowAll()                    // Only Disallow: /* for Google
    ->userAgent(CrawlerEnum::BING)
    ->disallow('/secret');             // Bing rule - kept
```

Output:

```
User-agent: *
Disallow: /admin
User-agent: Googlebot
Disallow: /*
User-agent: Bingbot
Disallow: /secret
```
When `$disallow` is `false`, the method does nothing.
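To make the no-op case concrete, here is a short sketch using the same API as the examples above; assuming the behavior just described, the previously defined rules are left untouched:

```php
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')
    ->disallowAll(false); // No-op: the existing Disallow rule survives
```

Output:

```
User-agent: *
Disallow: /admin
```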
Sets the context for subsequent rules to apply to a specific crawler.
```php
$robots = new RobotsTxt();
$robots->userAgent(CrawlerEnum::GOOGLE);
```

Output:

```
User-agent: Googlebot
```
You can chain multiple rules together:
```php
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')
    ->allow('/public')
    ->crawlDelay(5);
```

Output:

```
User-agent: *
Disallow: /admin
Allow: /public
Crawl-delay: 5
```
You can set rules for specific crawlers:
```php
$robots = new RobotsTxt();
$robots
    ->userAgent(CrawlerEnum::GOOGLE)
    ->disallow('/private')
    ->allow('/public')
    ->crawlDelay(10);
```

Output:

```
User-agent: Googlebot
Disallow: /private
Allow: /public
Crawl-delay: 10
```
You can define rules for multiple crawlers:
```php
$robots = new RobotsTxt();
$robots
    ->userAgent(CrawlerEnum::GOOGLE)
    ->disallow('/google-private')
    ->userAgent(CrawlerEnum::BING)
    ->disallow('/bing-private');
```

Output:

```
User-agent: Googlebot
Disallow: /google-private
User-agent: Bingbot
Disallow: /bing-private
```
The `forUserAgent()` method provides a closure-based syntax for grouping crawler-specific rules:

```php
$robots = new RobotsTxt();
$robots->forUserAgent(CrawlerEnum::GOOGLE, function (RobotsTxt $robots): void {
    $robots
        ->disallow('/private')
        ->allow('/public')
        ->crawlDelay(10);
});
```

Output:

```
User-agent: Googlebot
Disallow: /private
Allow: /public
Crawl-delay: 10
```
Combining global rules, multiple crawlers, and sitemaps:
```php
$robots = new RobotsTxt();
$robots
    ->disallow('/admin') // Global rule
    ->sitemap('https://example.com/sitemap1.xml')
    ->forUserAgent(CrawlerEnum::GOOGLE, function (RobotsTxt $robots): void {
        $robots
            ->disallow('/google-private')
            ->allow('/public/*');
    })
    ->forUserAgent(CrawlerEnum::BING, function (RobotsTxt $robots): void {
        $robots
            ->disallow('/bing-private')
            ->crawlDelay(5);
    })
    ->sitemap('https://example.com/sitemap2.xml');
```

Output:

```
User-agent: *
Disallow: /admin
User-agent: Googlebot
Disallow: /google-private
Allow: /public/*
User-agent: Bingbot
Disallow: /bing-private
Crawl-delay: 5
Sitemap: https://example.com/sitemap1.xml
Sitemap: https://example.com/sitemap2.xml
```
The library supports wildcards in paths:
```php
$robots = new RobotsTxt();
$robots
    ->disallow('/*.php')      // Block all PHP files
    ->allow('/public/*')      // Allow everything under /public
    ->disallow('/private/$'); // Exact match for /private/
```

Output:

```
User-agent: *
Disallow: /*.php
Allow: /public/*
Disallow: /private/$
```
Saves the robots.txt content to a file. If no path is provided, saves to robots.txt in the project root directory.
```php
$robots = new RobotsTxt();
$robots
    ->disallow('/admin')
    ->allow('/public');

// Save to default location (project root)
$robots->toFile();

// Save to custom location
$robots->toFile('/var/www/html/robots.txt');
```

The method throws a `RuntimeException` if:

- The target directory doesn't exist or isn't writable
- The existing robots.txt file isn't writable

It returns `true` if the file was successfully written.
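Because a failed write raises an exception rather than returning `false`, callers may want to wrap the call. A minimal sketch, assuming only the `toFile()` behavior documented above:

```php
$robots = new RobotsTxt();
$robots->disallow('/admin');

try {
    $written = $robots->toFile('/var/www/html/robots.txt');
    // $written is true on a successful write
} catch (RuntimeException $e) {
    // Target directory missing, or directory/file not writable
    error_log('Could not write robots.txt: ' . $e->getMessage());
}
```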
- **Start with Global Rules:** Define global rules before crawler-specific rules for better organization.
- **Group Related Rules:** Use the `forUserAgent()` method to group rules for the same crawler.
- **Use Wildcards Carefully:** Be precise with wildcard patterns to avoid unintended matches.
- **Order Matters:** More specific rules should come before more general ones.
- **Validate Paths:** Always ensure paths start with a forward slash and don't contain query parameters or fragments.
The class will throw an `InvalidArgumentException` in the following cases:
- Path doesn't start with forward slash
- Path contains query parameters or fragments
- Path is empty
- Sitemap URL is invalid or not HTTP/HTTPS
- Sitemap URL doesn't end with .xml
- Crawl delay is negative
These validations ensure that the generated robots.txt file is always valid and follows the standard format.
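Because validation happens at call time, invalid input can be caught where the rule is defined. An illustrative sketch based on the validation rules listed above (the specific exception messages are not documented here):

```php
$robots = new RobotsTxt();

try {
    $robots->disallow('admin'); // Invalid: missing leading forward slash
} catch (InvalidArgumentException $e) {
    // Handle or report the invalid path
}

try {
    $robots->sitemap('https://example.com/sitemap.txt'); // Invalid: not an .xml URL
} catch (InvalidArgumentException $e) {
    // Handle or report the invalid sitemap URL
}
```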
The project includes several command groups for testing and code quality:
```bash
# Run profanity checks
composer test:profanity

# Run static analysis with PHPStan
composer analyse

# Format code with Laravel Pint
composer lint

# Check code formatting (without fixing)
composer test:lint

# Run automated refactoring with Rector
composer refactor

# Check refactor suggestions (dry-run)
composer test:refactor

# Check type coverage (100% required)
composer test:type-coverage

# Check PHP syntax
composer test:syntax

# Run unit tests with coverage
composer test:unit

# Run mutation testing
composer test:unit:mutation

# Run all tests and quality checks
composer test
```

We welcome contributions! Please see our Contributing Guide for details on:
- Setting up the development environment
- Running tests
- Submitting pull requests
- Code style guidelines
- Reporting issues
1. Fork this repository
2. Clone your fork:
   ```bash
   git clone https://github.com/yourusername/php-package-skeleton.git
   ```
3. Install dependencies:
   ```bash
   composer install
   ```
4. Create a feature branch:
   ```bash
   git checkout -b feature/amazing-feature
   ```
5. Make your changes and run tests:
   ```bash
   composer test
   ```
6. Submit a pull request
This project is open-sourced software licensed under the MIT License.
PHP Package Skeleton was created by Filip Krzyżanowski.
Special thanks to the amazing PHP community and the maintainers of the open-source tools this package builds on.

Happy coding!
