Closed
5 changes: 5 additions & 0 deletions README.md
@@ -1,2 +1,7 @@
# CrawlerProtection
Protect wikis against crawler bots

# Configuration

* `$wgCrawlerProtectedSpecialPages` - array of special pages to protect (default: `[ 'recentchangeslinked', 'whatlinkshere' ]`). Supported values are lowercase special page names, titled special page names and prefixed special page names.
* `$wgCrawlerProtectionDenyFast` - drop denied requests quickly via `die();` with a [418 I'm a teapot](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/418) status code (default: `false`)
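
As a rough illustration of the two settings above, a `LocalSettings.php` fragment might look like the following. This is a sketch only: the `wfLoadExtension` name is assumed to match the `CrawlerProtection` heading, and `mobilediff` is just an example of an extra page taken from the review discussion, not a shipped default.

```php
// LocalSettings.php (illustrative fragment, values are assumptions)
wfLoadExtension( 'CrawlerProtection' );

// Protect the two default pages plus one additional page, using lowercase names.
$wgCrawlerProtectedSpecialPages = [
	'recentchangeslinked',
	'whatlinkshere',
	'mobilediff',
];

// Answer denied requests with a bare 418 response instead of the rendered 403 page.
$wgCrawlerProtectionDenyFast = true;
```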
11 changes: 11 additions & 0 deletions extension.json
@@ -20,6 +20,17 @@
"MediaWikiPerformAction": "main",
"SpecialPageBeforeExecute": "main"
},
"config": {
"CrawlerProtectedSpecialPages": {
"value": [
"recentchangeslinked",
"whatlinkshere"
Review comment (Member):
Please add "mobilediff" too

]
},
"CrawlerProtectionDenyFast": {
"value": false
}
},
"license-name": "MIT",
"Tests": {
"phpunit": "tests/phpunit"
26 changes: 25 additions & 1 deletion includes/Hooks.php
@@ -26,6 +26,7 @@ class_alias( '\Article', '\MediaWiki\Page\Article' );

use MediaWiki\Actions\ActionEntryPoint;
use MediaWiki\Hook\MediaWikiPerformActionHook;
use MediaWiki\MediaWikiServices;
use MediaWiki\Output\OutputPage;
use MediaWiki\Page\Article;
use MediaWiki\Request\WebRequest;
@@ -96,16 +97,39 @@ public function onSpecialPageBeforeExecute( $special, $subPage ) {
return true;
}

$config = MediaWikiServices::getInstance()->getMainConfig();
$protectedSpecialPages = $config->get( 'CrawlerProtectedSpecialPages' );
$denyFast = $config->get( 'CrawlerProtectionDenyFast' );

Review comment (Member):

Rather than having multiple checks, please add a line that builds a normalized copy of $protectedSpecialPages: lowercased, with the Special: prefix stripped.

For example:

$result = array_map(
    fn($p) => ($p = strtolower($p)) && strpos($p, NS_SPECIAL_NAME) === 0
        ? substr($p, 8)
        : $p,
    $protectedSpecialPages
);
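
The suggestion above can be sketched as a standalone function. This is a hedged illustration, not the PR's code: `SPECIAL_PREFIX` and `normalizeProtectedPages` are hypothetical names, and the sketch assumes only the English `Special:` prefix needs handling (localized namespace names are not covered).

```php
<?php
// Hypothetical constant: the lowercase form of the "Special:" prefix.
const SPECIAL_PREFIX = 'special:';

/**
 * Hypothetical helper: lowercase each configured entry and strip a
 * leading "special:" prefix so a single in_array() check suffices.
 *
 * @param string[] $pages
 * @return string[]
 */
function normalizeProtectedPages( array $pages ): array {
	return array_map(
		static function ( string $page ): string {
			$page = strtolower( $page );
			// Strip the prefix only when the entry actually starts with it.
			return strpos( $page, SPECIAL_PREFIX ) === 0
				? substr( $page, strlen( SPECIAL_PREFIX ) )
				: $page;
		},
		$pages
	);
}

$normalized = normalizeProtectedPages(
	[ 'WhatLinksHere', 'Special:RecentChangesLinked', 'mobilediff' ]
);
// A lowercased page name can then be matched with one strict lookup.
$isProtected = in_array( strtolower( 'Whatlinkshere' ), $normalized, true );
```

Lowercasing in a separate statement before the prefix test avoids relying on operator precedence between `&&` and `?:`.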

$name = strtolower( $special->getName() );
if ( in_array( $name, [ 'recentchangeslinked', 'whatlinkshere' ], true ) ) {
if (
// allow forgiving entries in the setting array for Special pages names
in_array( $special->getName(), $protectedSpecialPages, true )
Review comment (Member):
These 3 lines will be redundant once the transformation is applied. Please remove the two extra lines

|| in_array( $name, $protectedSpecialPages, true )
|| in_array( 'Special:' . $special->getName(), $protectedSpecialPages, true )
Review comment (Member):

Please refactor the magic string "Special:" into a constant at the top of the file.

) {
$out = $special->getContext()->getOutput();
if ( $denyFast ) {
Review comment (Member):
Please add unit tests to test this branch

$this->denyAccessFast();
}
$this->denyAccess( $out );
return false;
}

return true;
}
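
Because the deny-fast helper calls `die()`, invoking it for real would end a test run. One possible way to cover the branch is to override the helpers in a test double and record which ones were reached. The sketch below is self-contained and deliberately simplified: `HooksSketch` and `RecordingHooks` are hypothetical stand-ins, not the extension's classes, and `denyAccess()` takes no argument here.

```php
<?php
// Hypothetical stand-in that models only the branch under discussion.
class HooksSketch {
	/** @var bool */
	private $denyFast;

	public function __construct( bool $denyFast ) {
		$this->denyFast = $denyFast;
	}

	public function handleProtectedPage(): bool {
		if ( $this->denyFast ) {
			$this->denyAccessFast();
		}
		$this->denyAccess();
		return false;
	}

	protected function denyAccessFast(): void {
		// In production this halts the request, so denyAccess() is never reached.
		header( "HTTP/1.0 418 I'm a teapot" );
		die( 'I am a teapot' );
	}

	protected function denyAccess(): void {
		// Placeholder for rendering the 403 page.
	}
}

// Test double: records calls instead of sending headers or halting.
class RecordingHooks extends HooksSketch {
	/** @var string[] */
	public $calls = [];

	protected function denyAccessFast(): void {
		$this->calls[] = 'fast';
	}

	protected function denyAccess(): void {
		$this->calls[] = 'normal';
	}
}

$fast = new RecordingHooks( true );
$fastResult = $fast->handleProtectedPage();

$slow = new RecordingHooks( false );
$slowResult = $slow->handleProtectedPage();
```

In the double, both helpers are recorded when the flag is on because nothing halts execution. A PHPUnit test would assert on the same recorded calls.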

/**
* Helper: output 418 Teapot and halt the processing immediately
*
* @return void
* @suppress PhanPluginNeverReturnMethod
*/
protected function denyAccessFast() {
Review comment (Member):
"Deny access fast" is a subjective name which I don't think properly addresses why one might choose to use this. Naming is a hard problem to solve, so I do empathize. How about we change the 403 vs. 418 preference variable to $wgCrawlerProtectionUse418, and this function's name to denyAccessWith418()?

header( "HTTP/1.0 418 I'm a teapot" );
die( 'I am a teapot' );
}

/**
* Helper: output 403 Access Denied page using i18n messages.
*