A Next.js web application and crawler for discovering, indexing, and searching Solid WebIDs from the decentralized web.
Live Demo: webid-search.vercel.app
A WebID is a unique URI that identifies a person on the web. In the Solid ecosystem, WebIDs point to profile documents that contain information about the user, including links to their data pods and connections to other users via foaf:knows relationships.
- Search Interface - Search for WebIDs by name or WebID URL
- Multiple Response Formats - API returns JSON, JSON-LD, or Turtle based on the `Accept` header
- Shareable URLs - Search queries are reflected in the URL for easy sharing
- Social Graph Traversal - Follow `foaf:knows` links to discover connected WebIDs
- Solid Catalog Integration - Automatically fetches WebIDs from the Solid Catalog
- Resumable - Resumes from previously crawled profiles
- OIDC Validation - Only indexes WebIDs with valid `solid:oidcIssuer` declarations
If you want your WebID to appear in search results, submit your information to the Solid Catalog. WebIDs registered there are automatically discovered and indexed by this crawler.

    npm install

    # Development mode
    npm run dev

    # Production build
    npm run build
    npm run start

    # Run crawler with default settings (fetches from Solid Catalog + existing profiles)
    npm run crawl

    # Add additional seed WebIDs
    npm run crawl -- https://example.com/profile/card#me

After crawling, generate the search index:

    npm run build:data

The search API is available at `/api/search` and supports content negotiation.
| Parameter | Required | Description |
|---|---|---|
| `q` | Yes | Search query (matches against name and WebID URL) |
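As a rough illustration of what matching "against name and WebID URL" could look like, here is a minimal sketch. It assumes case-insensitive substring matching, which the description above does not confirm. `ProfileEntry` and `searchProfiles` are hypothetical names, and the fields are borrowed from the JSON response format.

```typescript
// Hypothetical entry shape, mirroring the fields of the JSON response.
interface ProfileEntry {
  webid: string;
  name?: string;
  img?: string;
}

// Assumption: case-insensitive substring match on the name or the WebID URL.
function searchProfiles(entries: readonly ProfileEntry[], q: string): ProfileEntry[] {
  const needle = q.trim().toLowerCase();
  if (needle === "") return []; // `q` is required, so an empty query matches nothing
  return entries.filter(
    (e) =>
      e.webid.toLowerCase().includes(needle) ||
      (e.name ?? "").toLowerCase().includes(needle),
  );
}

// Illustrative data only.
const sample: ProfileEntry[] = [
  { webid: "https://example.com/tim/profile/card#me", name: "Tim Example" },
  { webid: "https://example.org/alice/profile/card#me", name: "Alice" },
];
console.log(searchProfiles(sample, "TIM").map((e) => e.webid));
```

A real index might rank or tokenize matches instead, so this only shows the two fields being searched.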
The API returns different formats based on the Accept header:
| Accept Header | Content-Type | Description |
|---|---|---|
| `application/json` | `application/json` | Simple JSON with `webid`, `name`, and `img` |
| `application/ld+json` | `application/ld+json` | Full JSON-LD with semantic context |
| `text/turtle` | `text/turtle` | RDF Turtle format |
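The mapping in the table can be sketched as a small function that picks a response format from the `Accept` header. This is a simplified illustration, not the code in `route.ts`: `pickFormat` and `CONTENT_TYPES` are hypothetical names, media types are checked in the order they appear, and q-values are ignored.

```typescript
type ResponseFormat = "json" | "jsonld" | "turtle";

// Content-Type sent back for each format, as listed in the table.
const CONTENT_TYPES: Record<ResponseFormat, string> = {
  json: "application/json",
  jsonld: "application/ld+json",
  turtle: "text/turtle",
};

// Picks a format from an Accept header value.
// Simplification: first recognized media type wins; q-values are ignored.
function pickFormat(accept: string | null): ResponseFormat {
  if (!accept) return "json"; // JSON is the documented default
  for (const part of accept.split(",")) {
    const mediaType = part.split(";")[0].trim().toLowerCase();
    if (mediaType === "application/ld+json") return "jsonld";
    if (mediaType === "text/turtle") return "turtle";
    if (mediaType === "application/json") return "json";
  }
  return "json"; // unknown or wildcard types fall back to the default
}

console.log(CONTENT_TYPES[pickFormat("text/turtle")]);
```

A stricter implementation would honor q-values as described in the HTTP specification.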
JSON (default):

    curl "https://webid-search.vercel.app/api/search?q=tim"

Response:

    {
      "query": "tim",
      "count": 1,
      "results": [
        {
          "webid": "https://example.com/tim/profile/card#me",
          "name": "Tim Example",
          "img": "https://example.com/tim/photo.jpg"
        }
      ]
    }

JSON-LD:

    curl "https://webid-search.vercel.app/api/search?q=tim" \
      -H "Accept: application/ld+json"

Turtle:

    curl "https://webid-search.vercel.app/api/search?q=tim" \
      -H "Accept: text/turtle"

Response:

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix solid: <http://www.w3.org/ns/solid/terms#> .
    @prefix pim: <http://www.w3.org/ns/pim/space#> .
    @prefix schema: <https://schema.org/> .

    <https://example.com/tim/profile/card#me> foaf:name "Tim Example" .
    <https://example.com/tim/profile/card#me> foaf:img <https://example.com/tim/photo.jpg> .

Project structure:

    webid-search/
    ├── app/                        # Next.js App Router
    │   ├── page.tsx                # Main search page
    │   ├── layout.tsx              # Root layout
    │   ├── globals.css             # Global styles (Tailwind)
    │   ├── api/
    │   │   └── search/
    │   │       └── route.ts        # Search API endpoint
    │   └── components/
    │       └── SearchComponent.tsx # React search UI
    ├── src/
    │   ├── crawler.ts              # WebID crawler script
    │   ├── prepareData.ts          # Builds search index from crawled data
    │   └── ldo/                    # LDO (Linked Data Objects) type definitions
    ├── shapes/
    │   ├── solidProfile.shex       # ShEx shape for Solid profiles
    │   └── catalogPerson.shex      # ShEx shape for Solid Catalog entries
    ├── webids/                     # Crawled WebID Turtle files
    ├── public/
    │   ├── profiles.json           # Generated search index (JSON-LD)
    │   └── profiles.ttl            # Generated search index (Turtle)
    └── package.json

| Script | Description |
|---|---|
| `npm run dev` | Start development server |
| `npm run build` | Full build (LDO shapes → TypeScript → search data → Next.js) |
| `npm run build:ldo` | Generate LDO type definitions from ShEx shapes |
| `npm run build:tsc` | Compile TypeScript |
| `npm run build:data` | Generate `profiles.json` from crawled WebIDs |
| `npm run build:next` | Build Next.js application |
| `npm run crawl` | Run the WebID crawler |
| `npm run start` | Start production server |
- Seed Collection: Gathers initial WebIDs from:
  - Previously crawled profiles in the `webids/` directory
  - The Solid Catalog (fetched automatically)
  - Command-line arguments (additional seeds)
- Profile Fetching: Requests each WebID URL with `Accept: text/turtle`
- Validation: Only stores profiles that have a `solid:oidcIssuer` declaration (indicating a valid Solid WebID)
- Social Graph Traversal: Extracts `foaf:knows` links and adds them to the queue (up to depth 3)
- Storage: Saves valid profiles as Turtle files in `webids/` (URL-encoded filenames)
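The crawl steps above can be sketched as a breadth-first traversal with a depth limit. This is an illustration under stated assumptions, not the code in `crawler.ts`: `ProfileSummary`, `crawl`, and `fileNameFor` are hypothetical names, the network fetch is replaced by a synchronous `lookup` callback to keep the example short, links from profiles that fail validation are not followed, and "URL-encoded filenames" is assumed to mean `encodeURIComponent` plus a `.ttl` suffix.

```typescript
// Hypothetical summary of a fetched profile; a real crawler would derive this
// from the parsed Turtle document.
interface ProfileSummary {
  hasOidcIssuer: boolean; // true when solid:oidcIssuer is declared
  knows: string[]; // WebIDs linked via foaf:knows
}

const MAX_DEPTH = 3; // traversal depth stated in the steps above

// Breadth-first traversal from the seeds. `lookup` stands in for the network
// fetch and returns undefined when a profile cannot be retrieved.
function crawl(
  seeds: readonly string[],
  lookup: (webid: string) => ProfileSummary | undefined,
): string[] {
  const seen = new Set<string>(seeds);
  const queue = Array.from(seen).map((webid) => ({ webid, depth: 0 }));
  const stored: string[] = [];

  for (let i = 0; i < queue.length; i++) {
    const { webid, depth } = queue[i];
    const profile = lookup(webid);
    if (!profile || !profile.hasOidcIssuer) continue; // validation step
    stored.push(webid);
    if (depth >= MAX_DEPTH) continue; // do not follow links past the limit
    for (const friend of profile.knows) {
      if (!seen.has(friend)) {
        seen.add(friend);
        queue.push({ webid: friend, depth: depth + 1 });
      }
    }
  }
  return stored;
}

// Assumption: filenames are the encodeURIComponent form of the WebID plus ".ttl".
function fileNameFor(webid: string): string {
  return `${encodeURIComponent(webid)}.ttl`;
}
```

The `seen` set keeps a WebID from being queued twice, which matters because `foaf:knows` links are frequently mutual.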
The `prepareData.ts` script:
- Reads all `.ttl` files from `webids/`
- Parses each profile using LDO with ShEx validation
- Extracts key fields: `foaf:name`, `schema:name`, `solid:oidcIssuer`, `pim:storage`, `foaf:img`
- Generates `public/profiles.json` (JSON-LD) and `public/profiles.ttl` (Turtle)
- Next.js - React framework with App Router
- LDO (Linked Data Objects) - Type-safe RDF manipulation
- ShEx - Shape Expressions for RDF validation
- Tailwind CSS - Utility-first CSS framework
- JSON-LD - JSON for Linked Data
This crawler is designed to be respectful of server resources:
- Concurrent request limiting: Maximum 100 simultaneous requests
- Public data only: Only indexes publicly accessible profile information
- OIDC validation: Only stores confirmed Solid WebIDs
Please use responsibly and respect the privacy of WebID owners.
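One common way to cap simultaneous requests is a fixed pool of workers pulling from a shared list, sketched below. How the crawler actually enforces its limit of 100 is not shown here, so `mapWithLimit` is a hypothetical helper and the pool approach is an assumption.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function runWorker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item before awaiting
      results[index] = await worker(items[index]);
    }
  }

  const poolSize = Math.max(0, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => runWorker()));
  return results;
}
```

With a limit of 100, a call such as `mapWithLimit(webids, 100, fetchProfile)` would keep at most 100 fetches open, where `fetchProfile` is a placeholder for whatever function requests a profile.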
MIT