WebID Search

A Next.js web application and crawler for discovering, indexing, and searching Solid WebIDs from the decentralized web.

Live Demo: webid-search.vercel.app

What is a WebID?

A WebID is an HTTP(S) URI that identifies an agent, typically a person, on the web. In the Solid ecosystem, a WebID resolves to a profile document that describes the user, including links to their data pods and connections to other users via foaf:knows relationships.

Features

Web Application

  • 🔍 Search Interface - Search for WebIDs by name or WebID URL
  • 📊 Multiple Response Formats - API returns JSON, JSON-LD, or Turtle based on Accept header
  • 🔗 Shareable URLs - Search queries are reflected in the URL for easy sharing

Crawler

  • 🕸️ Social Graph Traversal - Follow foaf:knows links to discover connected WebIDs
  • 📥 Solid Catalog Integration - Automatically fetches WebIDs from the Solid Catalog
  • ⏸️ Resumable - Resumes from previously crawled profiles
  • 🛡️ OIDC Validation - Only indexes WebIDs with valid solid:oidcIssuer declarations

Want Your WebID Listed?

If you want your WebID to appear in search results, submit your information to the Solid Catalog. WebIDs registered there are automatically discovered and indexed by this crawler.

Installation

npm install

Quick Start

Run the Web Application

# Development mode
npm run dev

# Production build
npm run build
npm run start

Crawl WebIDs

# Run crawler with default settings (fetches from Solid Catalog + existing profiles)
npm run crawl

# Add additional seed WebIDs
npm run crawl -- https://example.com/profile/card#me

Prepare Search Data

After crawling, generate the search index:

npm run build:data

API Usage

The search API is available at /api/search and supports content negotiation.

Query Parameters

Parameter  Required  Description
q          Yes       Search query (matches against name and WebID URL)
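
The matching rule for q can be pictured as a case-insensitive substring test over both fields. The sketch below is illustrative only: the ProfileEntry type and the matchesQuery and search functions are hypothetical names, and the real endpoint may tokenize or rank results differently.

```typescript
// Hypothetical shape of one indexed profile (field names follow the JSON format).
interface ProfileEntry {
  webid: string;
  name?: string;
  img?: string;
}

// Assumed rule: case-insensitive substring match on the name or the WebID URL.
function matchesQuery(entry: ProfileEntry, q: string): boolean {
  const needle = q.trim().toLowerCase();
  if (needle === "") return false; // q is required, so an empty query matches nothing
  const name = (entry.name ?? "").toLowerCase();
  return name.includes(needle) || entry.webid.toLowerCase().includes(needle);
}

// Filter the whole index down to the entries that match.
function search(entries: ProfileEntry[], q: string): ProfileEntry[] {
  return entries.filter((entry) => matchesQuery(entry, q));
}
```

A linear scan like this is usually adequate for an index of a few thousand profiles; a larger index would likely need a prebuilt lookup structure.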

Response Formats

The API returns different formats based on the Accept header:

Accept Header        Content-Type         Description
application/json     application/json     Simple JSON with webid, name, and img
application/ld+json  application/ld+json  Full JSON-LD with semantic context
text/turtle          text/turtle          RDF Turtle format
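
As a rough illustration of the content negotiation described above, the following sketch picks a format from an Accept header value. The negotiateFormat function, the ResponseFormat type, and the q-value handling are assumptions made for this example, not the project's actual route code.

```typescript
type ResponseFormat = "json" | "jsonld" | "turtle";

// Media types from the table above, mapped to an internal format label.
const FORMATS: Record<string, ResponseFormat> = {
  "application/json": "json",
  "application/ld+json": "jsonld",
  "text/turtle": "turtle",
};

// Pick the supported media type with the highest q-value; fall back to JSON.
function negotiateFormat(accept: string | null): ResponseFormat {
  if (!accept) return "json";
  let best: ResponseFormat = "json";
  let bestQ = 0;
  for (const part of accept.split(",")) {
    const [rawType, ...params] = part.split(";").map((s) => s.trim());
    const format = FORMATS[rawType.toLowerCase()];
    if (!format) continue; // unsupported types and wildcards fall through to the default
    const qParam = params.find((p) => p.startsWith("q="));
    const q = qParam ? Number(qParam.slice(2)) : 1;
    if (!Number.isNaN(q) && q > bestQ) {
      best = format;
      bestQ = q;
    }
  }
  return best;
}
```

In a Next.js route handler, a function like this could be called with request.headers.get("accept") before choosing a serializer.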

Examples

JSON (default):

curl "https://webid-search.vercel.app/api/search?q=tim"

Response:

{
  "query": "tim",
  "count": 1,
  "results": [
    {
      "webid": "https://example.com/tim/profile/card#me",
      "name": "Tim Example",
      "img": "https://example.com/tim/photo.jpg"
    }
  ]
}
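
For callers who prefer a typed client over curl, a minimal sketch might look like the following. The type names and the buildSearchUrl, isSearchResponse, and searchWebIds functions are hypothetical; treating img as optional is an assumption, since the example only shows a profile that has one.

```typescript
// Types mirror the JSON response shown above.
interface SearchResult {
  webid: string;
  name: string;
  img?: string; // assumed optional: not every profile has an image
}

interface SearchResponse {
  query: string;
  count: number;
  results: SearchResult[];
}

// Build the request URL; the base defaults to the live demo host.
function buildSearchUrl(q: string, base = "https://webid-search.vercel.app"): string {
  return `${base}/api/search?q=${encodeURIComponent(q)}`;
}

// Minimal runtime check before trusting a parsed body.
function isSearchResponse(value: unknown): value is SearchResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.query === "string" &&
    typeof v.count === "number" &&
    Array.isArray(v.results) &&
    v.results.every(
      (r) =>
        typeof r === "object" &&
        r !== null &&
        typeof (r as Record<string, unknown>).webid === "string",
    )
  );
}

// Fetch and validate search results as plain JSON.
async function searchWebIds(q: string): Promise<SearchResponse> {
  const res = await fetch(buildSearchUrl(q), { headers: { Accept: "application/json" } });
  if (!res.ok) throw new Error(`Search failed with HTTP ${res.status}`);
  const body: unknown = await res.json();
  if (!isSearchResponse(body)) throw new Error("Unexpected response shape");
  return body;
}
```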

JSON-LD:

curl "https://webid-search.vercel.app/api/search?q=tim" \
  -H "Accept: application/ld+json"

Turtle:

curl "https://webid-search.vercel.app/api/search?q=tim" \
  -H "Accept: text/turtle"

Response:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix solid: <http://www.w3.org/ns/solid/terms#> .
@prefix pim: <http://www.w3.org/ns/pim/space#> .
@prefix schema: <https://schema.org/> .

<https://example.com/tim/profile/card#me> foaf:name "Tim Example" .
<https://example.com/tim/profile/card#me> foaf:img <https://example.com/tim/photo.jpg> .

Project Structure

webid-search/
├── app/                      # Next.js App Router
│   ├── page.tsx              # Main search page
│   ├── layout.tsx            # Root layout
│   ├── globals.css           # Global styles (Tailwind)
│   ├── api/
│   │   └── search/
│   │       └── route.ts      # Search API endpoint
│   └── components/
│       └── SearchComponent.tsx  # React search UI
├── src/
│   ├── crawler.ts            # WebID crawler script
│   ├── prepareData.ts        # Builds search index from crawled data
│   └── ldo/                  # LDO (Linked Data Objects) type definitions
├── shapes/
│   ├── solidProfile.shex     # ShEx shape for Solid profiles
│   └── catalogPerson.shex    # ShEx shape for Solid Catalog entries
├── webids/                   # Crawled WebID Turtle files
├── public/
│   ├── profiles.json         # Generated search index (JSON-LD)
│   └── profiles.ttl          # Generated search index (Turtle)
└── package.json

Scripts

Script              Description
npm run dev         Start development server
npm run build       Full build (LDO shapes → TypeScript → search data → Next.js)
npm run build:ldo   Generate LDO type definitions from ShEx shapes
npm run build:tsc   Compile TypeScript
npm run build:data  Generate profiles.json from crawled WebIDs
npm run build:next  Build Next.js application
npm run crawl       Run the WebID crawler
npm run start       Start production server

How the Crawler Works

  1. Seed Collection: Gathers initial WebIDs from:
    • Previously crawled profiles in the webids/ directory
    • The Solid Catalog (fetched automatically)
    • Command-line arguments (additional seeds)
  2. Profile Fetching: Requests each WebID URL with Accept: text/turtle
  3. Validation: Only stores profiles that have a solid:oidcIssuer declaration (indicating a valid Solid WebID)
  4. Social Graph Traversal: Extracts foaf:knows links and adds them to the queue (up to depth 3)
  5. Storage: Saves valid profiles as Turtle files in webids/ (URL-encoded filenames)
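
The steps above can be sketched as a breadth-first traversal with a depth limit. This is a simplified illustration, not the code in src/crawler.ts: ParsedProfile, ProfileFetcher, crawl, and fileNameFor are hypothetical names, fetching and parsing are injected as a function, seeds are assumed to sit at depth 0, and links from profiles that fail validation are assumed not to be followed.

```typescript
// Hypothetical result of fetching and parsing one profile.
interface ParsedProfile {
  hasOidcIssuer: boolean; // true when the profile declares solid:oidcIssuer
  knows: string[]; // WebIDs linked via foaf:knows
}

type ProfileFetcher = (webid: string) => Promise<ParsedProfile | null>;

const MAX_DEPTH = 3; // traversal depth from step 4

// Breadth-first crawl: returns the WebIDs that passed validation.
async function crawl(seeds: string[], fetchProfile: ProfileFetcher): Promise<string[]> {
  const seen = new Set<string>(seeds);
  const valid: string[] = [];
  let frontier = [...seen];

  for (let depth = 0; depth <= MAX_DEPTH && frontier.length > 0; depth++) {
    const next: string[] = [];
    // Requests in one level run in parallel; a real crawler would cap concurrency.
    const profiles = await Promise.all(
      frontier.map((webid) => fetchProfile(webid).catch(() => null)),
    );
    profiles.forEach((profile, i) => {
      if (!profile || !profile.hasOidcIssuer) return; // step 3: skip invalid profiles
      valid.push(frontier[i]);
      for (const friend of profile.knows) {
        if (!seen.has(friend)) {
          seen.add(friend);
          next.push(friend);
        }
      }
    });
    frontier = next;
  }
  return valid;
}

// Step 5: URL-encode the WebID to get a filesystem-safe name (the .ttl suffix is assumed).
function fileNameFor(webid: string): string {
  return `${encodeURIComponent(webid)}.ttl`;
}
```

Injecting the fetcher keeps the traversal logic testable without network access, which is the main reason for structuring the sketch this way.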

Data Preparation

The prepareData.ts script:

  1. Reads all .ttl files from webids/
  2. Parses each profile using LDO with ShEx validation
  3. Extracts key fields: foaf:name, schema:name, solid:oidcIssuer, pim:storage, foaf:img
  4. Generates public/profiles.json (JSON-LD) and public/profiles.ttl (Turtle)
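
To make the final step more concrete, here is a small sketch that serializes extracted fields to Turtle in the same shape as the API's Turtle response. It is an assumption-laden illustration rather than the actual prepareData.ts logic: IndexedProfile, turtleLiteral, and toTurtle are hypothetical names, only foaf:name and foaf:img are emitted, and IRIs are written without validation.

```typescript
// Hypothetical flattened record for one profile (a subset of the extracted fields).
interface IndexedProfile {
  webid: string;
  name?: string;
  img?: string;
}

// Escape a string for use inside a double-quoted Turtle literal.
function turtleLiteral(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

// Emit one triple per line, preceded by the foaf prefix declaration.
function toTurtle(profiles: IndexedProfile[]): string {
  const lines = ["@prefix foaf: <http://xmlns.com/foaf/0.1/> .", ""];
  for (const p of profiles) {
    if (p.name) lines.push(`<${p.webid}> foaf:name ${turtleLiteral(p.name)} .`);
    if (p.img) lines.push(`<${p.webid}> foaf:img <${p.img}> .`);
  }
  return lines.join("\n") + "\n";
}
```

A production serializer would more likely rely on an RDF library to handle IRI escaping and prefixes; the string-building approach here is only meant to show the mapping from fields to triples.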

Technologies

Ethical Considerations

This crawler is designed to respect both server resources and the privacy of profile owners:

  • Concurrent request limiting: Maximum 100 simultaneous requests
  • Public data only: Only indexes publicly accessible profile information
  • OIDC validation: Only stores confirmed Solid WebIDs
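
One common way to enforce a cap on simultaneous requests is a small worker pool. The sketch below illustrates that general technique under stated assumptions: mapWithLimit is a hypothetical helper, not the crawler's actual limiter, and the caller supplies the limit (for example, 100).

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await worker(items[i]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Results keep the order of the input list. If any worker call rejects, the returned promise rejects as well, so a crawler would typically wrap each request in its own error handling.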

Please use responsibly and respect the privacy of WebID owners.

License

MIT
