WebID Search

A Next.js web application and crawler for discovering, indexing, and searching Solid WebIDs from the decentralized web.

What is a WebID?

A WebID is a unique URI that identifies a person on the web. In the Solid ecosystem, WebIDs point to profile documents that contain information about the user, including links to their data pods and connections to other users via foaf:knows relationships.
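As a small illustration of how a WebID relates to its profile document, the sketch below strips the fragment from a hash-based WebID to get the URL that would actually be requested. The helper name `profileDocumentUrl` is hypothetical and not part of this project.

```typescript
// Hypothetical helper (not from this repository): for a hash-based WebID,
// the profile document is the same URL without its fragment.
function profileDocumentUrl(webid: string): string {
  const url = new URL(webid); // throws a TypeError if webid is not an absolute URL
  url.hash = "";              // drop "#me" (or any other fragment)
  return url.toString();
}

console.log(profileDocumentUrl("https://example.com/profile/card#me"));
```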

Features

Web Application

- 🔍 Search Interface - Search for WebIDs by name or WebID URL
- 📊 Multiple Response Formats - API returns JSON, JSON-LD, or Turtle based on Accept header
- 🔗 Shareable URLs - Search queries are reflected in the URL for easy sharing

Crawler

- 🕸️ Social Graph Traversal - Follow foaf:knows links to discover connected WebIDs
- 📥 Solid Catalog Integration - Automatically fetches WebIDs from the Solid Catalog
- ⏸️ Resumable - Resumes from previously crawled profiles
- 🛡️ OIDC Validation - Only indexes WebIDs with valid solid:oidcIssuer declarations

Want Your WebID Listed?

If you want your WebID to appear in search results, submit your information to the Solid Catalog. WebIDs registered there are automatically discovered and indexed by this crawler.

Installation

npm install

Quick Start

Run the Web Application

# Development mode
npm run dev

# Production build
npm run build
npm run start

Crawl WebIDs

# Run crawler with default settings (fetches from Solid Catalog + existing profiles)
npm run crawl

# Add additional seed WebIDs
npm run crawl -- https://example.com/profile/card#me

Prepare Search Data

After crawling, generate the search index:

npm run build:data

API Usage

The search API is available at /api/search and supports content negotiation.

Query Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| `q` | Yes | Search query (matches against name and WebID URL) |
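Because `q` can be a full WebID URL, the value needs percent-encoding: an unencoded `#me` would be treated as the fragment of the request URL and never reach the server. The sketch below shows one way to build the request URL. The helper name `buildSearchUrl` is hypothetical, and the default host is the one used in this README's curl examples.

```typescript
// Host taken from the curl examples in this README.
const DEFAULT_BASE = "https://webid-search.vercel.app";

// Hypothetical helper: build a /api/search URL with a safely encoded `q`.
function buildSearchUrl(query: string, base: string = DEFAULT_BASE): string {
  const url = new URL("/api/search", base);
  url.searchParams.set("q", query); // percent-encodes ":", "/", "#", and so on
  return url.toString();
}

console.log(buildSearchUrl("tim"));
console.log(buildSearchUrl("https://example.com/profile/card#me"));
```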

Response Formats

The API returns different formats based on the Accept header:

| Accept Header | Content-Type | Description |
| --- | --- | --- |
| `application/json` | `application/json` | Simple JSON with webid, name, and img |
| `application/ld+json` | `application/ld+json` | Full JSON-LD with semantic context |
| `text/turtle` | `text/turtle` | RDF Turtle format |
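To make the content negotiation concrete, here is a minimal sketch of choosing one of the three formats from an Accept header. It is not the code in `route.ts`. The function name `negotiate`, the handling of `q` weights, and the fallback to JSON for unknown or wildcard types are all assumptions for illustration.

```typescript
type Format = "application/json" | "application/ld+json" | "text/turtle";

const SUPPORTED: Format[] = ["application/json", "application/ld+json", "text/turtle"];

// Hypothetical negotiation: honour q-weights, fall back to JSON (assumed default).
function negotiate(accept: string | null): Format {
  if (!accept) return "application/json";
  const ranked = accept
    .split(",")
    .map((part, index) => {
      const [type, ...params] = part.trim().split(";").map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith("q="));
      const q = qParam ? Number(qParam.slice(2)) : 1;
      return { type: type.toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    // Highest weight first; ties keep the order given in the header.
    .sort((a, b) => b.q - a.q || a.index - b.index);
  for (const { type, q } of ranked) {
    if (q <= 0) continue; // q=0 means "not acceptable"
    const match = SUPPORTED.find((f) => f === type);
    if (match) return match;
  }
  return "application/json"; // unknown types and wildcards fall back to JSON
}
```

With this sketch, `negotiate("text/turtle")` selects Turtle, while a missing header or `*/*` selects plain JSON, matching the "JSON (default)" behaviour described in this README.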

Examples

JSON (default):

curl "https://webid-search.vercel.app/api/search?q=tim"

Response:

{
  "query": "tim",
  "count": 1,
  "results": [
    {
      "webid": "https://example.com/tim/profile/card#me",
      "name": "Tim Example",
      "img": "https://example.com/tim/photo.jpg"
    }
  ]
}
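For TypeScript callers, the JSON shape above could be typed and checked roughly as follows. The interface and function names are hypothetical, and treating `name` and `img` as optional is an assumption, since this README does not say whether every result carries them.

```typescript
// Types inferred from the example response above (not published by the project).
interface SearchResult {
  webid: string;
  name?: string; // assumed optional
  img?: string;  // assumed optional
}

interface SearchResponse {
  query: string;
  count: number;
  results: SearchResult[];
}

// Runtime check so that an unexpected payload fails loudly instead of silently.
function isSearchResponse(value: unknown): value is SearchResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.query === "string" &&
    typeof v.count === "number" &&
    Array.isArray(v.results) &&
    v.results.every(
      (r) =>
        typeof r === "object" &&
        r !== null &&
        typeof (r as Record<string, unknown>).webid === "string",
    )
  );
}

// Fetch a complete search URL and return the validated body.
async function fetchSearch(searchUrl: string): Promise<SearchResponse> {
  const res = await fetch(searchUrl, { headers: { Accept: "application/json" } });
  if (!res.ok) throw new Error(`Search failed: HTTP ${res.status}`);
  const body: unknown = await res.json();
  if (!isSearchResponse(body)) throw new Error("Unexpected response shape");
  return body;
}
```

A call such as `fetchSearch("https://webid-search.vercel.app/api/search?q=tim")` would then resolve to a typed `SearchResponse`.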

JSON-LD:

curl "https://webid-search.vercel.app/api/search?q=tim" \
  -H "Accept: application/ld+json"

Turtle:

curl "https://webid-search.vercel.app/api/search?q=tim" \
  -H "Accept: text/turtle"

Response:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix solid: <http://www.w3.org/ns/solid/terms#> .
@prefix pim: <http://www.w3.org/ns/pim/space#> .
@prefix schema: <https://schema.org/> .

<https://example.com/tim/profile/card#me> foaf:name "Tim Example" .
<https://example.com/tim/profile/card#me> foaf:img <https://example.com/tim/photo.jpg> .

Project Structure

webid-search/
├── app/                      # Next.js App Router
│   ├── page.tsx              # Main search page
│   ├── layout.tsx            # Root layout
│   ├── globals.css           # Global styles (Tailwind)
│   ├── api/
│   │   └── search/
│   │       └── route.ts      # Search API endpoint
│   └── components/
│       └── SearchComponent.tsx  # React search UI
├── src/
│   ├── crawler.ts            # WebID crawler script
│   ├── prepareData.ts        # Builds search index from crawled data
│   └── ldo/                   # LDO (Linked Data Objects) type definitions
├── shapes/
│   ├── solidProfile.shex     # ShEx shape for Solid profiles
│   └── catalogPerson.shex    # ShEx shape for Solid Catalog entries
├── webids/                   # Crawled WebID Turtle files
├── public/
│   ├── profiles.json         # Generated search index (JSON-LD)
│   └── profiles.ttl          # Generated search index (Turtle)
└── package.json

Scripts

| Script | Description |
| --- | --- |
| `npm run dev` | Start development server |
| `npm run build` | Full build (LDO shapes → TypeScript → search data → Next.js) |
| `npm run build:ldo` | Generate LDO type definitions from ShEx shapes |
| `npm run build:tsc` | Compile TypeScript |
| `npm run build:data` | Generate `profiles.json` from crawled WebIDs |
| `npm run build:next` | Build Next.js application |
| `npm run crawl` | Run the WebID crawler |
| `npm run start` | Start production server |

How the Crawler Works

1. Seed Collection: Gathers initial WebIDs from:
   - Previously crawled profiles in the webids/ directory
   - The Solid Catalog (fetched automatically)
   - Command-line arguments (additional seeds)
2. Profile Fetching: Requests each WebID URL with Accept: text/turtle
3. Validation: Only stores profiles that have a solid:oidcIssuer declaration (indicating a valid Solid WebID)
4. Social Graph Traversal: Extracts foaf:knows links and adds them to the queue (up to depth 3)
5. Storage: Saves valid profiles as Turtle files in webids/ (URL-encoded filenames)
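The steps above can be sketched as a breadth-first traversal with a depth limit. This is an illustration, not the code in `src/crawler.ts`: the `Profile` shape, the injected `fetchProfile` function, and the helper names are hypothetical. The sketch also assumes that seeds count as depth 0, that links from invalid profiles are not followed, and that filenames are produced with `encodeURIComponent`. Requests run one at a time here for brevity.

```typescript
// Hypothetical, already-parsed view of a profile; the real crawler parses Turtle.
interface Profile {
  hasOidcIssuer: boolean; // true if a solid:oidcIssuer declaration is present
  knows: string[];        // WebIDs linked via foaf:knows
}

type FetchProfile = (webid: string) => Promise<Profile | null>;

const MAX_DEPTH = 3; // depth limit stated in the steps above

// Assumption: "URL-encoded filenames" means encodeURIComponent plus a .ttl suffix.
function fileNameFor(webid: string): string {
  return `${encodeURIComponent(webid)}.ttl`;
}

// Returns the WebIDs that passed validation, in discovery order.
async function crawl(seeds: string[], fetchProfile: FetchProfile): Promise<string[]> {
  const seen = new Set<string>(seeds);
  const valid: string[] = [];
  let frontier = [...seen]; // de-duplicated seeds form depth 0
  for (let depth = 0; depth <= MAX_DEPTH && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const webid of frontier) {
      const profile = await fetchProfile(webid).catch(() => null);
      if (!profile || !profile.hasOidcIssuer) continue; // validation step
      valid.push(webid); // the real crawler would write fileNameFor(webid) here
      for (const friend of profile.knows) {
        if (!seen.has(friend)) {
          seen.add(friend);
          next.push(friend);
        }
      }
    }
    frontier = next;
  }
  return valid;
}
```

Passing `fetchProfile` in as a parameter keeps the traversal logic testable against an in-memory graph without any network access.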

Data Preparation

The prepareData.ts script:

1. Reads all .ttl files from webids/
2. Parses each profile using LDO with ShEx validation
3. Extracts key fields: foaf:name, schema:name, solid:oidcIssuer, pim:storage, foaf:img
4. Generates public/profiles.json (JSON-LD) and public/profiles.ttl (Turtle)
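As a rough illustration of the extraction step, the sketch below reduces the listed fields to a flat entry similar to the API's simple JSON results. It does not reproduce `prepareData.ts` or the JSON-LD layout of `public/profiles.json`. Both interfaces are hypothetical, and the name fallback order (foaf:name, then schema:name, then the WebID itself) is an assumption.

```typescript
// Hypothetical shape of the fields extracted from one parsed profile.
interface ExtractedProfile {
  webid: string;
  foafName?: string;    // foaf:name
  schemaName?: string;  // schema:name
  oidcIssuer: string[]; // solid:oidcIssuer
  storage: string[];    // pim:storage
  img?: string;         // foaf:img
}

// Flat entry mirroring the simple JSON results (webid, name, img).
interface IndexEntry {
  webid: string;
  name: string;
  img?: string;
}

// Assumed fallback: foaf:name, then schema:name, then the WebID itself.
function toIndexEntry(p: ExtractedProfile): IndexEntry {
  const name = p.foafName?.trim() || p.schemaName?.trim() || p.webid;
  const entry: IndexEntry = { webid: p.webid, name };
  if (p.img) entry.img = p.img; // omit img entirely when the profile has none
  return entry;
}
```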

Technologies

- Next.js - React framework with App Router
- LDO (Linked Data Objects) - Type-safe RDF manipulation
- ShEx - Shape Expressions for RDF validation
- Tailwind CSS - Utility-first CSS framework
- JSON-LD - JSON for Linked Data

Ethical Considerations

This crawler is designed to be respectful of server resources:

- Concurrent request limiting: Maximum 100 simultaneous requests
- Public data only: Only indexes publicly accessible profile information
- OIDC validation: Only stores confirmed Solid WebIDs

Please use responsibly and respect the privacy of WebID owners.
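One common way to cap simultaneous requests is a small worker pool, sketched below. This is not necessarily how the crawler enforces its limit: the function name `mapWithLimit` is hypothetical, and only the value 100 comes from the list above.

```typescript
const MAX_CONCURRENT = 100; // limit stated in this README

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the inputs.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await worker(items[i]);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

A crawler could call `mapWithLimit(webids, MAX_CONCURRENT, fetchOne)`, where `fetchOne` is whatever function requests a single profile. Lowering the limit is the simplest way to reduce load on the servers being crawled.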

License

MIT
