Description
crawl4ai version
0.8.0
Expected Behavior
URL seeding should not fail when the source is set to "sitemap" only, even if the Common Crawl indexes are unreachable or cannot be updated.
Current Behavior
On 01/29/2026, the Common Crawl servers were down and the indexing URL (https://index.commoncrawl.org/collinfo.json) was not accessible. This caused URL seeding to raise an httpx.ConnectTimeout error. This should not happen when the source is set to "sitemap" only, because URL seeding via sitemaps does not require the latest Common Crawl index.
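For illustration, a plain sitemap-only call like the one below (same classes and arguments as the reproduction snippet further down; assumes these are the top-level crawl4ai exports) was enough to hit the timeout while the index endpoint was down:
import asyncio
from crawl4ai import AsyncLogger, AsyncUrlSeeder, SeedingConfig

async def main():
    config = SeedingConfig(source="sitemap")  # sitemap only; Common Crawl not needed
    async with AsyncUrlSeeder(logger=AsyncLogger(verbose=True)) as seeder:
        # While index.commoncrawl.org is unreachable, this raises
        # httpx.ConnectTimeout even though the sitemap source never
        # touches the Common Crawl index.
        await seeder.urls("https://docs.crawl4ai.com/", config)

asyncio.run(main())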
Is this reproducible?
Yes
Inputs Causing the Bug
The bug only triggers while the Common Crawl indexing URL is unreachable. However, it can be reproduced in code using the steps below.
Steps to Reproduce
Create an async function that performs the URL seeding task. Before calling the urls() method, patch the seeder's _latest_index method to raise an HTTP error. This simulates what happens when the Common Crawl indexes are unreachable. Below is a snippet that does exactly this.
Code snippets
import asyncio
import httpx
from crawl4ai import AsyncLogger, AsyncUrlSeeder, SeedingConfig

async def recreate_cc_error():
    config = SeedingConfig(source="sitemap")
    async with AsyncUrlSeeder(logger=AsyncLogger(verbose=True)) as seeder:
        # Simulate the Common Crawl outage by patching _latest_index.
        async def boom(*args, **kwargs):
            print("DEBUG: _latest_index called")
            raise httpx.ConnectTimeout("Simulated CommonCrawl outage")
        seeder._latest_index = boom
        try:
            await seeder.urls("https://docs.crawl4ai.com/", config)
            print("PASS: _latest_index was NOT called (expected after fix).")
        except httpx.ConnectTimeout:
            print("FAIL: _latest_index WAS called even though source='sitemap'.")

asyncio.run(recreate_cc_error())
OS
macOS
Python version
3.12.12
Browser
Chrome, Safari
Browser version
No response
Error logs & Screenshots (if applicable)
Traceback (most recent call last):
File ".../site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
yield
File ".../site-packages/httpx/_transports/default.py", line 394, in handle_async_request
resp = await self._pool.handle_async_request(req)
File ".../site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
raise exc from None
File ".../site-packages/httpcore/_async/connection.py", line 124, in _connect
stream = await self._network_backend.connect_tcp(**kwargs)
File ".../site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ConnectTimeout
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../crawl4ai/async_url_seeder.py", line 405, in urls
self.index_id = await self._latest_index()
File ".../crawl4ai/async_url_seeder.py", line 1754, in _latest_index
j = await c.get(COLLINFO_URL, timeout=10)
File ".../site-packages/httpx/_client.py", line 1768, in get
return await self.request(...)
File ".../site-packages/httpx/_transports/default.py", line 393, in handle_async_request
raise mapped_exc(message) from exc
httpx.ConnectTimeout
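For what it's worth, the traceback shows that urls() calls self.index_id = await self._latest_index() unconditionally (async_url_seeder.py, line 405). A minimal sketch of a guard, assuming config.source is a plain string such as "sitemap", "cc", or "sitemap+cc" (the exact internal handling may differ):
# Sketch only, inside AsyncUrlSeeder.urls(): resolve the latest
# Common Crawl index only when a Common Crawl source is requested.
if "cc" in config.source:
    self.index_id = await self._latest_index()
With a guard like this, sitemap-only seeding would never contact index.commoncrawl.org, so a Common Crawl outage could not break it.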