switch to google custom search engine by raylu · Pull Request #31 · mat-1/metasearch2

raylu · 2025-10-10T06:00:30Z

copying 4get's homework: https://4get.ca/help-me.html
https://git.lolcat.ca/lolcat/4get/src/branch/master/scraper/google_api.php

this disables the google search engine by default since it no longer works
to use the new metasearch2 google engine, you must create and configure a custom_search_api_key
without setting up billing, this gives you 100 free searches a day
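
For context, a request to Google's Custom Search JSON API can be sketched roughly as below. The endpoint and the `key`, `cx`, and `q` query parameters belong to Google's public API; the function name `buildCustomSearchUrl` and the placeholder values are hypothetical, and this is not the implementation in this PR.

```javascript
// Sketch: build a Custom Search JSON API request URL.
// `apiKey` is the API key, `engineId` is the Programmable Search Engine ID (cx).
const CUSTOM_SEARCH_ENDPOINT = 'https://www.googleapis.com/customsearch/v1';

function buildCustomSearchUrl(apiKey, engineId, query) {
  const url = new URL(CUSTOM_SEARCH_ENDPOINT);
  url.searchParams.set('key', apiKey);
  url.searchParams.set('cx', engineId);
  url.searchParams.set('q', query);
  return url.toString();
}

// Placeholder credentials; replace with real values before sending a request.
console.log(buildCustomSearchUrl('YOUR_API_KEY', 'YOUR_ENGINE_ID', 'meow'));
```

Each call to this endpoint counts against the daily quota, so a metasearch instance would probably want to cache results.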

mat-1 · 2025-10-10T14:48:43Z

This looks useful but it doesn't really fit my vision for metasearch (which is that it should Just Work without needing API keys). The scraper for Google (and Bing 🥴) should be fixed instead.

raylu · 2025-10-11T04:45:14Z

do you want me to split out the disable-by-default then?
google search doesn't work as is but it's on by default now
(it also won't work until you start parsing some JS)

mat-1 · 2025-10-11T15:13:28Z

I'd merge if Google Custom Search was implemented as its own separate engine from Google and the google engine was kept as the default, so in theory everything works out of the box but the user can configure Custom Search if they want to.

raylu · 2025-10-14T06:03:50Z

er... but the current google search scraper does not work right now and it likely never will (unless we start parsing some JS)

mat-1 · 2025-10-14T13:22:41Z

The plan is to do whatever it takes to make the scrapers work, I just haven't found the time to update them yet :(

If SearxNG can get something working then it's relatively easy for me to copy their homework, too.

mat-1 · 2025-10-20T19:24:16Z

Just to give an update on this, I have been working on fixing the Google engine. The main change is that requesting the search page (on your first search when logged out) now gives you a JavaScript challenge that you have to execute, and you set the SG_SS cookie to its result. After making the first search with the NID cookie that Google gave you and the SG_SS value that you generated, you only need to persist the NID cookie to keep getting actual results.
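
The cookie flow described above could be tracked with a small piece of state like the sketch below. The class name `GoogleSession`, its fields, and the `solveChallenge` callback are all hypothetical; the sketch only models which cookies get sent on which request and does not solve the challenge itself.

```javascript
// Hypothetical session state for the NID / SG_SS flow.
class GoogleSession {
  constructor() {
    this.nid = null;            // NID cookie value handed out by Google
    this.challengeSent = false; // whether SG_SS has already been sent once
  }

  // Build the Cookie header for the next search request.
  // `solveChallenge` stands in for whatever produces the SG_SS value.
  cookieHeader(solveChallenge) {
    if (this.nid === null) return ''; // very first request: no cookies yet
    if (!this.challengeSent) {
      this.challengeSent = true;
      return `NID=${this.nid}; SG_SS=${solveChallenge()}`;
    }
    return `NID=${this.nid}`; // later requests only need NID
  }
}

const session = new GoogleSession();
session.nid = 'example-nid';
console.log(session.cookieHeader(() => 'example-token'));
console.log(session.cookieHeader(() => 'example-token'));
```

The sketch marks the challenge as sent when the header is built; a real client would more likely wait until the response actually contains results.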

The challenge is some insane cipher (that involves a big base64 string that Google gives you and the current system time) obfuscated with insane Closure Compiler passes (you can tell someone at Google had fun writing them). It's very easily solvable by just including a JavaScript runtime (I was able to get a valid challenge solution in NodeJS/Deno by just setting a few global variables) but I'd prefer deobfuscating the challenge and reimplementing it instead, since fully executing Google's JS would make it way easier for them to detect us in the future by adding more random integrity checks later. This deobfuscation just happens to be very time-consuming, as you may expect (and as Google intended). :)

If you're curious, some of the obfuscation passes that stood out to me were:

Every function/variable/field has a randomly generated name (this is just a normal Closure Compiler thing), but one function is named after the first three characters of the big string (plus an underscore).
Statements are often merged into expressions, like a = 1; b = 2 becomes b = (a = 1, 2).
Most class fields are stored in an array with random indexes.
Most functions are now state machines to make their control flow harder to follow.
Sometimes multiple functions are merged into one, with a number field that's checked with bitwise operators to determine which function it is.
Sometimes constant values are included as parameters.
Most booleans are represented in funny randomly-generated ways like NaN != NaN and ![] == true.
Rarely, statements are wrapped like while (true) { statement; if (true) break }.
Some if statements are converted into switch statements.
Sometimes if/else statements are reversed.
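
A few of these passes are easier to picture with small hand-written examples. The snippet below is an illustration only, not Google's code; every name in it (`sumTo`, `sumToFlattened`, `merged`, and so on) is made up.

```javascript
// 1. Statements merged into a comma expression.
let a, b;
b = (a = 1, 2);               // same effect as: a = 1; b = 2

// 2. Booleans written as odd-looking expressions.
const obfTrue = NaN != NaN;   // true, because NaN never equals itself
const obfFalse = ![] == true; // false, because ![] evaluates to false

// 3. A statement wrapped in a loop that runs exactly once.
let ran = 0;
while (true) { ran += 1; if (true) break }

// 4. A function rewritten as a state machine.
function sumTo(n) {           // readable original
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
}
function sumToFlattened(n) {  // flattened equivalent
  let state = 0, total, i;
  while (true) {
    switch (state) {
      case 0: total = 0; i = 1; state = 1; break;
      case 1: state = i <= n ? 2 : 3; break;
      case 2: total += i; i++; state = 1; break;
      case 3: return total;
    }
  }
}

// 5. Two functions merged into one, selected by a bit in a number argument.
function merged(mode, x) {
  return (mode & 1) ? x + 1 : x * 2;
}

console.log(a, b, obfTrue, obfFalse, ran);
console.log(sumTo(4), sumToFlattened(4), merged(1, 5), merged(2, 5));
```

Undoing passes like these mostly means recognising each pattern and rewriting it back to the plain form, which is why a deobfuscator tends to be built one pass at a time.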

rinj-shine · 2025-10-20T20:16:23Z

Can you share the part of Google code that generates the SG_SS please?

mat-1 · 2025-10-20T20:28:06Z

> Can you share the part of Google code that generates the SG_SS please?

Make a request to https://google.com/search?query=meow without any cookies and find the script with challenge_version = 0 and the script immediately before that. The earlier script has a string with JavaScript code that contains the bulk of the challenge.

Here's the NodeJS code I used (uncleaned and barely tested; the code in the string and the big base64 string were manually extracted, though that could easily be automated):

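// Minimal browser-like globals for the challenge script.
globalThis.window = globalThis
globalThis.performance = { now: () => Date.now() }
// When run as a CommonJS file, top-level `this` is module.exports (not globalThis);
// the direct eval below sees the same `this`.
this.document = {
    readyState: 'complete'
}
// inner.js holds the JavaScript code extracted from the string in the earlier script.
const innerJs = require('fs').readFileSync('./inner.js', 'utf8')
eval(innerJs);
const bigString = "insert the value of the big base64 string here";
const res = this.knitsail.a(bigString, function () { }, false)
console.log('SG_SS:', res[0]([]))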
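
The manual extraction step could be automated along the lines of the sketch below, which finds the inline script containing `challenge_version = 0` and the script immediately before it. The function name `findChallengeScripts` and the sample HTML are hypothetical, the real page layout may differ, and regex-based HTML parsing is fragile, so treat this as a starting point only.

```javascript
// Sketch: locate the challenge script and the script just before it.
function findChallengeScripts(html) {
  // Collect the bodies of all <script> elements in document order.
  const scriptPattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  const scripts = [...html.matchAll(scriptPattern)].map((m) => m[1]);

  // Allow flexible whitespace around "=" in case the page is minified.
  const index = scripts.findIndex((s) => /challenge_version\s*=\s*0/.test(s));
  if (index < 1) return null; // not found, or there is no earlier script

  return { previousScript: scripts[index - 1], challengeScript: scripts[index] };
}

// Made-up HTML standing in for the real search response.
const sampleHtml = [
  '<html><head><script>var unrelated = 1;</script>',
  '<script nonce="x">var code = "inner js here";</script>',
  '<script>var challenge_version = 0;</script></head></html>',
].join('\n');

const found = findChallengeScripts(sampleHtml);
console.log(found.previousScript);
console.log(found.challengeScript);
```

Pulling the string with the inner JavaScript and the big base64 string out of `previousScript` would still need a second step that depends on how Google formats that script.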

lukasmaz · 2025-10-20T21:34:47Z

@mat-1 I was trying to run the script after small fixes but I failed. I'm not sure what part of Google Search response must be extracted to inner.js & bigString. I want to run it as a proof of concept for now and improve it later. When you have a moment to add some more details please do.

mat-1 · 2025-10-20T21:49:27Z

Sure, I'll write a better proof of concept in a bit when I have time

mat-1 · 2025-10-20T23:03:52Z

> I'm confused, how are you able to run their JavaScript code without an actual browser runtime? Doesn't it check for things like Canvas, GPU configuration and such?

Nope, at least for now their "integrity" checks are quite weak. :)

That's part of why I don't want to rely on a server-side JS runtime though; it wouldn't be particularly hard for Google to improve this in the future and I'd rather not be forced to run a full browser since I know my software is often deployed on very weak servers. In theory (assuming I can consistently reverse-engineer whatever Google does), reimplementing Google's challenge is the most reliable solution for my use-case.

yeyuchen198 · 2025-10-21T02:28:11Z

> @mat-1 I was trying to run the script after small fixes but I failed. I'm not sure what part of Google Search response must be extracted to inner.js & bigString. I want to run it as a proof of concept for now and improve it later. When you have a moment to add some more details please do.

inner.js:
(function(){var *********.call(this);'].join('\n')));}).call(this);

bigString:
var p='*******'

lukasmaz · 2025-10-21T08:54:07Z

@yeyuchen198 thanks, yeah, I managed to make it work now but even with the SG_SS cookie generated & set, I'm getting flagged (HTTP 429), like you wrote here.

yeyuchen198 · 2025-10-21T10:01:29Z

> @yeyuchen198 thanks, yeah, I managed to make it work now but even with the SG_SS cookie generated & set, I'm getting flagged (HTTP 429), like you wrote here.

I got the same result: the generated SG_SS seems to be invalid, which is why the request returns a 429 error. I suspect the JavaScript environment check didn't pass.

mat-1 · 2025-10-21T13:28:41Z

Oh yeah, you're right, the tokens generated from my NodeJS snippet do always result in a captcha page. I still don't believe the integrity check is particularly sophisticated but it seems there's more to it than I realized.

unixfox · 2025-10-22T20:06:05Z

Maybe you should try it in jsdom. It supports canvas, and it's also what Youtube.js uses to load a "challenge" JavaScript code (called Potoken) from YouTube.

Example: https://github.com/LuanRT/BgUtils/blob/main/examples/node/innertube-challenge-fetcher-example.ts

yeyuchen198 · 2025-10-23T11:03:05Z

The following is part of the environment-detection information I captured in the browser. The result differs in Node.js because the exposed environment sends the JS detection code down a different control-flow path. For example, one check is performance.nodeTiming, which exists in Node.js but not in the browser.

[Hook] performance.getEntriesByType("navigation") -> [object PerformanceNavigationTiming]
[Hook][get] performance.timeOrigin -> 1761214372048.5
[Hook][get] performance.timing -> [object PerformanceTiming]
[Hook] performance.now() -> 870.9000000953674
[Hook][get] performance.memory -> [object MemoryInfo]

[Hook][get] navigator.webdriver -> false
[Hook][get] navigator.hardwareConcurrency -> 12
[Hook][get] navigator.maxTouchPoints -> 0
[Hook][get] navigator.languages -> zh-CN
[Hook][get] navigator.deviceMemory -> 8
[Hook][get] navigator.connection -> [object NetworkInformation]

[Hook] document.createElement("iframe") -> [object HTMLIFrameElement]
[Hook] document.appendChild({}) -> [object HTMLIFrameElement]
[Hook] document.createElement("div") -> [object HTMLDivElement]
[Hook] document.createElement("img") -> [object HTMLImageElement]
[Hook] document.createEvent("MouseEvents") -> [object MouseEvent]
[Hook] document.removeChild({}) -> [object HTMLIFrameElement]
[Hook] document.createElement("a") -> 
[Hook] document.createElement("iframe") -> [object HTMLIFrameElement]

[Hook][get] window.isSecureContext -> true
[Hook][get] window.trustedTypes -> [object TrustedTypePolicyFactory]
[Hook][get] window.parent -> [object Window]
[Hook][get] window.self -> [object Window]
[Hook][get] window.outerWidth -> 1280
[Hook][get] window.outerHeight -> 672
[Hook][get] window.innerWidth -> 1280
[Hook][get] window.innerHeight -> 551
[Hook][get] window.devicePixelRatio -> 1.5
[Hook][get] window.opener -> null
[Hook][get] window.screen -> [object Screen]
[Hook][get] window.performance -> [object Performance]
[Hook][get] window.navigator -> [object Navigator]
[Hook][get] window.history -> [object History]
[Hook][get] window.localStorage -> [object Storage]
[Hook][get] window.sessionStorage -> [object Storage]

[Hook][get] screen.width -> 1280
[Hook][get] screen.height -> 720
[Hook][get] screen.availWidth -> 1280
[Hook][get] screen.availHeight -> 672
[Hook][get] screen.availLeft -> 0
[Hook][get] screen.availTop -> 0

[Hook][get] history.length -> 50

[Hook] sessionStorage.setItem("o__5aIyZNY2Ixc8P3ZmM8Ag", "1761214372048") -> undefined
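
One way a property-read log like this could be produced is with a `Proxy` that records every `get`. The sketch below is an assumption about the general technique, not the hooking code actually used for the log above; `hookGets` and the fake objects are hypothetical, and it only covers property reads, not method calls.

```javascript
// Wrap `target` so every string-keyed property read is appended to `log`.
function hookGets(name, target, log) {
  return new Proxy(target, {
    get(obj, prop) {
      // Read from the real object so native getters see the right `this`.
      const value = Reflect.get(obj, prop);
      if (typeof prop === 'string') {
        log.push(`[Hook][get] ${name}.${prop} -> ${String(value)}`);
      }
      return value;
    },
  });
}

const log = [];

// Plain objects standing in for browser objects.
const fakeScreen = hookGets('screen', { width: 1280, height: 720 }, log);
const fakePerformance = hookGets('performance', { timeOrigin: 1761214372048.5 }, log);

fakeScreen.width;
fakeScreen.height;
fakePerformance.nodeTiming; // missing here, so the read is logged as undefined

console.log(log.join('\n'));
```

Reads of properties that do not exist are logged too, which is how a probe such as `performance.nodeTiming` would show up even though the browser has no such property.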

mat-1 · 2025-10-23T14:17:06Z

Nice! I'm still working on writing my deobfuscator (going well but taking a while as I don't have much free time) but this is good to know.

lukasmaz · 2025-10-24T06:43:12Z

I've seen some (successful) efforts on csdn.net:
https://blog.csdn.net/m0_66390393/article/details/151690312

but I don't know if there is an article with more details available somewhere, I don't have an account there

raylu · 2026-01-11T21:21:18Z

searxng/searxng#5644

mat-1 mentioned this pull request Oct 20, 2025

Bug: google engine searxng/searxng#5286

Closed

switch to google custom search engine

ab39dd6

raylu force-pushed the google branch from bf042ff to ab39dd6 on November 26, 2025 06:29

6 participants
