[CoomerFansly] Fix weird index 0 out of bounds error #22
you-cant-see-me wants to merge 2 commits into FansDB:main from
Actually, I just noticed the studio name doesn't get filled in from the JSON; it's just missing from the JSON...
It seems the issue with the XPath scraper is that Coomer is now a Vite web app: it sends a bare-bones HTML shell and then renders the page with JavaScript, so XPath scraping is no longer viable for Coomer. Sample HTML reply from Coomer:

<!doctype html><html prefix="og: https://ogp.me/ns#"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width,initial-scale=1"><title>Coomer</title><meta http-equiv="delegate-ch" content="sec-ch-ua https://soonpubplatform.online; sec-ch-ua-bitness https://soonpubplatform.online; sec-ch-ua-arch https://soonpubplatform.online; sec-ch-ua-model https://soonpubplatform.online; sec-ch-ua-platform https://soonpubplatform.online; sec-ch-ua-platform-version https://soonpubplatform.online; sec-ch-ua-full-version https://soonpubplatform.online; sec-ch-ua-full-version-list https://soonpubplatform.online; sec-ch-ua-mobile https://soonpubplatform.online"><script defer="" data-api="/api/v1/probable" data-domain="coomer.st" src="/assets/probable-Iq9DWEG2.js"></script><script src="/static/js/lazy-styles.js"></script><link rel="icon" href="/assets/favicon-CPB6l7kH.ico"><meta name="og:type" content="website"><meta name="og:site_name" content="Coomer"><meta name="og:title" content="Coomer"><meta name="og:image" content="https://coomer.st/static/kemono-logo.svg"><meta name="og:image:width" content="150"><meta name="og:image:height" content="150"><script type="module" crossorigin src="/assets/index-Cc6RzG_Y.js"></script><link rel="stylesheet" crossorigin href="/assets/style-AFYTO-AU.css"><script type="module">import.meta.url;import("_").catch(()=>1);(async function*(){})().next();if(location.protocol!="file:"){window.__vite_is_modern_browser=true}</script><script type="module">!function(){if(window.__vite_is_modern_browser)return;console.warn("vite: loading legacy chunks, syntax error above and the same error below should be ignored");var e=document.getElementById("vite-legacy-polyfill"),n=document.createElement("script");n.src=e.src,n.onload=function(){System.import(document.getElementById('vite-legacy-entry').getAttribute('data-src'))},document.body.appendChild(n)}();</script></head><body><div id="root"></div><script nomodule>!function(){var e=document,t=e.createElement("script");if(!("noModule"in t)&&"onbeforeload"in t){var n=!1;e.addEventListener("beforeload",(function(e){if(e.target===t)n=!0;else if(!e.target.hasAttribute("nomodule")||!n)return;e.preventDefault()}),!0),t.type="module",t.src=".",e.head.appendChild(t),t.remove()}}();</script><script nomodule crossorigin id="vite-legacy-polyfill" src="/assets/polyfills-legacy-CTFsgIEY.js"></script><script nomodule crossorigin id="vite-legacy-entry" data-src="/assets/index-legacy-DUvmxCLf.js">System.import(document.getElementById('vite-legacy-entry').getAttribute('data-src'))</script></body></html>
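To illustrate why XPath has nothing to match, here is a minimal Python sketch that checks whether the `<div id="root">` mount point of a page contains any markup before JavaScript runs. `SHELL_HTML` is a trimmed stand-in for the reply quoted above, and `RootContentParser` and `root_is_empty` are hypothetical helper names, not part of Stash or the scraper.

```python
from html.parser import HTMLParser

# Trimmed stand-in for the shell document quoted above (assumption: only the
# parts relevant to scraping are kept).
SHELL_HTML = (
    '<!doctype html><html><head><title>Coomer</title>'
    '<script type="module" crossorigin src="/assets/index-Cc6RzG_Y.js"></script>'
    '</head><body><div id="root"></div></body></html>'
)


class RootContentParser(HTMLParser):
    """Counts elements and collects text nested inside <div id="root">."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # nesting depth inside the root div (0 = outside)
        self.child_tags = 0   # elements found inside the root div
        self.text = []        # non-blank text nodes found inside the root div

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.depth += 1
            self.child_tags += 1
        elif tag == "div" and dict(attrs).get("id") == "root":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text.append(data.strip())


def root_is_empty(html: str) -> bool:
    """Return True when the root div has no child elements and no text."""
    parser = RootContentParser()
    parser.feed(html)
    parser.close()
    return parser.child_tags == 0 and not parser.text
```

With the shell document, `root_is_empty(SHELL_HTML)` is true: the post content only appears after the JavaScript bundle runs, which a plain HTTP fetch followed by XPath never does.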
If it is somehow possible to make two JSON requests for one "scrapeSceneURL" query, then there is
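As a rough sketch of the two-request idea, outside of Stash, the Python below derives both API URLs from one post page URL and merges the two JSON replies. The URL patterns are the ones used in the scraper config in this thread (with the dots escaped); the field names `post.content`, `post.published` and `name` are taken from that config's selectors and are assumptions about the API's response shape. `api_urls` and `merge_replies` are hypothetical names, and fetching is left to the caller.

```python
import re

# Same pattern as the queryURLReplace rules in the config, with dots escaped.
POST_PAGE = re.compile(r"https://coomer\.st/fansly/user/(\d+)/post/(\d+)")


def api_urls(page_url: str) -> tuple[str, str]:
    """Map a post page URL to (post API URL, profile API URL)."""
    match = POST_PAGE.match(page_url)
    if match is None:
        raise ValueError(f"not a Coomer Fansly post URL: {page_url}")
    user_id, post_id = match.groups()
    base = f"https://coomer.st/api/v1/fansly/user/{user_id}"
    return f"{base}/post/{post_id}", f"{base}/profile"


def merge_replies(post_json: dict, profile_json: dict) -> dict:
    """Combine the fields each reply is assumed to provide into one record."""
    post = post_json.get("post", {})
    return {
        "content": post.get("content"),      # from the post endpoint
        "published": post.get("published"),  # from the post endpoint
        "creator_name": profile_json.get("name"),  # from the profile endpoint
    }
```

The second request is what would supply the creator name that the post reply lacks.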
I see I can add a second

name: Coomer (Fansly)
sceneByURL:
  - action: scrapeJson
    url:
      - https://coomer.st/fansly
    scraper: sceneScraper
    queryURL: "{url}"
    queryURLReplace:
      url:
        - regex: 'https://coomer.st/fansly/user/(\d+)/post/(\d+)'
          with: 'https://coomer.st/api/v1/fansly/user/$1/post/$2'
  - action: scrapeJson
    url:
      - https://coomer.st/fansly
    scraper: profileScraper
    queryURL: "{url}"
    queryURLReplace:
      url:
        - regex: 'https://coomer.st/fansly/user/(\d+)/post/(\d+)'
          with: 'https://coomer.st/api/v1/fansly/user/$1/profile'
sceneByFragment:
  - action: scrapeJson
    scraper: sceneScraper
    queryURL: "{url}"
    queryURLReplace:
      url:
        - regex: 'https://coomer.st/fansly/user/(\d+)/post/(\d+)'
          with: 'https://coomer.st/api/v1/fansly/user/$1/post/$2'
  - action: scrapeJson
    scraper: profileScraper
    queryURL: "{url}"
    queryURLReplace:
      url:
        - regex: 'https://coomer.st/fansly/user/(\d+)/post/(\d+)'
          with: 'https://coomer.st/api/v1/fansly/user/$1/profile'
jsonScrapers:
  sceneScraper:
    scene:
      Title:
        selector: post.content
        postProcess:
          - replace:
              - regex: '\n.*'
                with: ""
      Details:
        selector: post.content
      Date:
        selector: post.published
        postProcess:
          - parseDate: 2006-01-02T15:04:05
  profileScraper:
    scene:
      Performers:
        Name:
          selector: name
      Studio:
        Name:
          selector: name
          postProcess:
            - replace:
                - regex: '$'
                  with: " (Fansly)"
driver:
  headers:
    - Key: User-Agent
      Value: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:133.0) Gecko/20100101 Firefox/133.0
    - Key: Referer
      Value: https://coomer.st
    - Key: Accept
      Value: text/css
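For readers unfamiliar with the `postProcess` rules in the config above, this Python sketch mirrors what the three of them appear intended to do. It is an approximation, not Stash's implementation: Stash evaluates these with Go's regexp and time packages, and the function names here are hypothetical. It assumes `published` has no fractional seconds or timezone suffix.

```python
import re
from datetime import datetime


def title_from_content(content: str) -> str:
    # Mirrors regex '\n.*' -> "": '.' does not cross newlines, so every
    # newline plus the rest of its line is removed, leaving the first line.
    return re.sub(r"\n.*", "", content)


def studio_from_name(name: str) -> str:
    # Mirrors regex '$' -> " (Fansly)": appends the suffix at the end of the
    # text. \Z is used because Go's '$' (without the m flag) only matches at
    # the very end, whereas Python's '$' also matches before a final newline.
    return re.sub(r"\Z", " (Fansly)", name)


def date_from_published(published: str) -> str:
    # Mirrors parseDate with the Go layout 2006-01-02T15:04:05, keeping only
    # the calendar date.
    return datetime.strptime(published, "%Y-%m-%dT%H:%M:%S").date().isoformat()
```

So a multi-line post body yields its first line as the title, and a profile name `example` becomes the studio `example (Fansly)`.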
- replace:
    - regex: '$'
      with: " (Fansly)"
selector: post.user
It returns internal Fansly ID, not username.
Yes, that's why I made the other PR comments with possible ideas and asked for help.
The code seemed fine but just didn't work for me, on neither Stash 0.28 nor the latest 0.30. Nothing I tried worked; even stripping the parser so that it parsed nothing still failed. This works fine. It's based on the Coomer OnlyFans scraper; using JSON just makes more sense, imo.