Description
DefaultNormalizer should resolve a relative URL that begins with '?' against the URL of the page that contains it, not against the base URL.
For example, suppose the base URL is http://www.some.com and the crawler follows the relative URL "?pageno=3" found on the page http://www.some.com/sample. DefaultNormalizer returns http://www.some.com?pageno=3, but the correct result is http://www.some.com/sample?pageno=3.
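For reference, RFC 3986 (section 5.2.2) describes this behaviour: a reference with an empty path and a query keeps the base's scheme, authority and path and replaces only the query. A minimal sketch of that rule for '?'-prefixed links is below. The helper `resolveQueryOnly` is hypothetical and not part of the project. It drops any fragment and old query from the page URL and appends the new query, which reproduces the expected result from the example:

```java
// Sketch of resolving a query-only reference ("?...") per RFC 3986 5.2.2:
// keep the containing page's scheme, authority and path; replace the query.
public class QueryOnlyResolve {

    // Hypothetical helper for illustration; not part of the project's API.
    static String resolveQueryOnly(String pageUrl, String relativeUrl) {
        if (!relativeUrl.startsWith("?")) {
            throw new IllegalArgumentException("expected a reference starting with '?': " + relativeUrl);
        }
        String base = pageUrl;
        int hash = base.indexOf('#');   // drop the page's fragment, if any
        if (hash >= 0) {
            base = base.substring(0, hash);
        }
        int query = base.indexOf('?');  // drop the page's old query, if any
        if (query >= 0) {
            base = base.substring(0, query);
        }
        return base + relativeUrl;      // page path is kept, new query appended
    }

    public static void main(String[] args) {
        String page = "http://www.some.com/sample";
        // prints http://www.some.com/sample?pageno=3
        System.out.println(resolveQueryOnly(page, "?pageno=3"));
        // an existing query on the page is replaced, not extended
        System.out.println(resolveQueryOnly(page + "?pageno=2", "?pageno=3"));
    }
}
```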
I solved this problem by changing the method signature in the LinkNormalizer interface from `String normalize(final String relativeUrl)` to `String normalize(final String urlToCrawl, final String relativeUrl)`, and by calling `normalizer.normalize(urlToCrawl.link(), l)` in `PageCrawlerExecutor#run()`.
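A sketch of how the two-argument interface might look. Only `LinkNormalizer`, `DefaultNormalizer` and `normalize(urlToCrawl, relativeUrl)` come from this issue. The `baseUrl` constructor argument, the absolute-URL check and the base-URL fallback are assumptions standing in for whatever DefaultNormalizer currently does for other kinds of links:

```java
import java.net.URI;

// Proposed interface: the normalizer also receives the URL of the page
// on which the link was found.
interface LinkNormalizer {
    String normalize(String urlToCrawl, String relativeUrl);
}

// Hypothetical DefaultNormalizer; the real class's fields and rules may differ.
class DefaultNormalizer implements LinkNormalizer {
    private final String baseUrl; // assumption: base URL supplied at construction

    DefaultNormalizer(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    @Override
    public String normalize(String urlToCrawl, String relativeUrl) {
        if (relativeUrl.startsWith("http://") || relativeUrl.startsWith("https://")) {
            return relativeUrl; // already absolute, nothing to resolve
        }
        if (relativeUrl.startsWith("?")) {
            // Query-only reference: resolve against the containing page, not the base URL
            String page = stripFrom(stripFrom(urlToCrawl, '#'), '?');
            return page + relativeUrl;
        }
        // Placeholder for the existing behaviour: resolve against the base URL
        return URI.create(baseUrl).resolve(relativeUrl).toString();
    }

    // Returns url truncated at the first occurrence of c (or url unchanged)
    private static String stripFrom(String url, char c) {
        int i = url.indexOf(c);
        return i >= 0 ? url.substring(0, i) : url;
    }
}

public class NormalizerDemo {
    public static void main(String[] args) {
        LinkNormalizer normalizer = new DefaultNormalizer("http://www.some.com");
        // prints http://www.some.com/sample?pageno=3
        System.out.println(normalizer.normalize("http://www.some.com/sample", "?pageno=3"));
    }
}
```

With this signature, the crawler only has to pass the link of the page currently being crawled as the first argument, which is what the `normalizer.normalize(urlToCrawl.link(), l)` call in `PageCrawlerExecutor#run()` does.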