Overview
Add a scraper to packages/sync that extracts liturgical calendar data from lagumisa.web.id/saranps.php and seeds the ordo table via a staging pipeline.
Data extracted:
- Liturgical feast/celebration — name, date, rank, liturgical color
- Scripture readings per celebration
- Puji Syukur song suggestions per celebration
Background
lagumisa.web.id is a public, server-side rendered site, so cheerio + undici are sufficient and Playwright is not needed. No authentication is required, so a separate lightweight PublicHttpClient is used (no session management).
Source URL: https://www.lagumisa.web.id/saranps.php
Reference: packages/sync/docs/sync-tdd.md Section 6.
HTML Structure
Each celebration is a <tr> row in the main table.
Date column:
    <time class="iconminggu"> <!-- iconminggu = Sunday, icon = weekday -->
      <em>Desember</em>
      <strong>Minggu</strong>
      <span>20</span>
    </time>
Content column:
    <div id="ccungu" class="circle"></div> <!-- color encoded in id -->
    <strong>HARI MINGGU ADVEN IV</strong>
    <strong>Bacaan: </strong><a class=aayat>2Sam. 7:1-5...</a>; ...
    <strong>Saran Nyanyian: </strong><font class=psnum>PS 440, 441, ...</font>
Color map:
| div id | LiturgicalColor |
| --- | --- |
| ccungu | purple |
| ccputih | white |
| ccmerah | red |
| ccijau | green |
| ccmerahmuda | rose |
| cchitam | black |
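The color map above could be expressed as a lookup along these lines. This is a minimal sketch: the LiturgicalColor union and the mapLiturgicalColor helper are assumptions made for illustration, and the real enum is defined elsewhere in the codebase.

```typescript
// Hypothetical stand-in for the real LiturgicalColor enum.
type LiturgicalColor = 'purple' | 'white' | 'red' | 'green' | 'rose' | 'black';

// Raw circle div id -> LiturgicalColor, mirroring the color map table.
const LITURGICAL_COLOR_MAP: ReadonlyMap<string, LiturgicalColor> = new Map<string, LiturgicalColor>([
  ['ccungu', 'purple'],
  ['ccputih', 'white'],
  ['ccmerah', 'red'],
  ['ccijau', 'green'],
  ['ccmerahmuda', 'rose'],
  ['cchitam', 'black'],
]);

// Hypothetical helper: returns undefined for unknown ids so the caller can
// decide whether to skip the row or fail the sync.
function mapLiturgicalColor(divId: string): LiturgicalColor | undefined {
  return LITURGICAL_COLOR_MAP.get(divId.trim().toLowerCase());
}
```

Returning undefined instead of a default color keeps an unexpected id on the source page visible rather than silently mislabeled.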
Rank inference from celebration name prefix:
| Prefix | CelebrationRank |
| --- | --- |
| HARI RAYA | solemnity |
| PESTA | feast |
| PERINGATAN | memorial |
| Pw. | commemoration |
| (default) | feria |
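One way to apply the prefix rules is an ordered list checked top to bottom, with feria as the fallback. The sketch below assumes case-sensitive matching on the trimmed name; the CelebrationRank union and the inferRank helper are illustrative names, not confirmed by the codebase.

```typescript
// Hypothetical stand-in for the real CelebrationRank enum.
type CelebrationRank = 'solemnity' | 'feast' | 'memorial' | 'commemoration' | 'feria';

// Ordered prefix rules, mirroring the rank table.
const RANK_INFERENCE_RULES: ReadonlyArray<readonly [string, CelebrationRank]> = [
  ['HARI RAYA', 'solemnity'],
  ['PESTA', 'feast'],
  ['PERINGATAN', 'memorial'],
  ['Pw.', 'commemoration'],
];

// Hypothetical helper: first matching prefix wins, otherwise feria.
function inferRank(celebrationName: string): CelebrationRank {
  const name = celebrationName.trim();
  for (const [prefix, rank] of RANK_INFERENCE_RULES) {
    if (name.startsWith(prefix)) return rank;
  }
  return 'feria';
}
```

Under these rules a name such as HARI MINGGU ADVEN IV falls through to feria, because it starts with HARI but not with HARI RAYA.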
Staging Table
Add to packages/db/src/schema/sync-staging.ts (schema: sync_staging):
    export const syncStagingLiturgi = pgTable('liturgi', {
      id: serial('id').primaryKey(),
      celebrationName: text('celebration_name').notNull(),
      month: text('month').notNull(),
      dayName: text('day_name').notNull(),
      dateNumber: integer('date_number').notNull(),
      isSunday: boolean('is_sunday').notNull().default(false),
      liturgicalColor: text('liturgical_color').notNull(), // raw id e.g. "ccungu"
      massLabel: text('mass_label'),
      readings: text('readings').array().notNull().default([]),
      songs: text('songs'),
      scrapedAt: timestamp('scraped_at', { withTimezone: true }).defaultNow(),
    }, (t) => ({
      uniq: unique().on(t.celebrationName, t.month, t.dateNumber, t.massLabel).nullsNotDistinct(),
    }))
Implementation Plan
1. PublicHttpClient — packages/sync/src/public-http-client.ts
Simple fetch wrapper, no session/auth. Throws SyncScrapeError on non-200.
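A minimal sketch of such a client follows. The getHtml method name, the injected FetchLike parameter, and the shape of SyncScrapeError are assumptions for illustration; in the real package the fetch implementation would presumably come from undici.

```typescript
// Hypothetical error shape; the real SyncScrapeError lives in packages/sync.
class SyncScrapeError extends Error {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = 'SyncScrapeError';
    this.status = status;
  }
}

// The small slice of the fetch API the client needs. Injecting it keeps the
// client testable without network access.
type FetchLike = (url: string) => Promise<{ status: number; text(): Promise<string> }>;

class PublicHttpClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
    this.fetchImpl = fetchImpl;
  }

  // Fetches a path and returns the body as text; throws on any non-200 status.
  async getHtml(path: string): Promise<string> {
    const url = `${this.baseUrl}/${path.replace(/^\/+/, '')}`;
    const res = await this.fetchImpl(url);
    if (res.status !== 200) {
      throw new SyncScrapeError(`GET ${url} returned ${res.status}`, res.status);
    }
    return res.text();
  }
}
```

There is deliberately no cookie jar, retry logic, or login step here, which is what separates this client from a session-aware one.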
2. Scraper — packages/sync/src/scraper/liturgi.ts
    export async function scrapeLiturgi(client: PublicHttpClient): Promise<RawLiturgiEntry[]>
Fetches /saranps.php, passes HTML to parser.
3. Parser — packages/sync/src/parser/liturgi.ts
Cheerio-based. Per row:
- Extract month, dayName, dateNumber, isSunday from <time>
- Map circle div id → liturgicalColor via LITURGICAL_COLOR_MAP
- Extract celebration name (first <strong> after circle, skip Bacaan: / Saran Nyanyian: labels)
- Split multi-mass entries by sub-mass label (Misa Malam, Misa Fajar, Misa Siang)
- Extract <a class=aayat> text → readings[]
- Extract <font class=psnum> text → songs
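The name-extraction and multi-mass steps both depend on telling labels apart from celebration names. The sketch below shows only that classification and assumes the text of each <strong> element has already been pulled out with cheerio. It also assumes sub-mass labels appear as <strong> text, which the sample HTML does not confirm. All names here are hypothetical.

```typescript
// Labels that introduce readings and songs; never a celebration name.
const FIELD_LABELS = ['Bacaan:', 'Saran Nyanyian:'];
// Sub-mass labels that split one celebration into several entries.
const MASS_LABELS = ['Misa Malam', 'Misa Fajar', 'Misa Siang'];

type StrongKind = 'fieldLabel' | 'massLabel' | 'celebrationName';

// Collapse whitespace so "Bacaan: " and "Bacaan:" compare equal.
function normalizeText(raw: string): string {
  return raw.replace(/\s+/g, ' ').trim();
}

function classifyStrong(raw: string): StrongKind {
  const text = normalizeText(raw);
  if (FIELD_LABELS.some((label) => label === text)) return 'fieldLabel';
  if (MASS_LABELS.some((label) => text.startsWith(label))) return 'massLabel';
  return 'celebrationName';
}

interface RowHeading {
  celebrationName: string | null;
  massLabels: string[];
}

// Walks the <strong> texts of one row in document order: the first
// non-label text is the celebration name, and mass labels are collected.
function summarizeStrongTexts(texts: string[]): RowHeading {
  let celebrationName: string | null = null;
  const massLabels: string[] = [];
  for (const raw of texts) {
    const text = normalizeText(raw);
    if (text === '') continue;
    const kind = classifyStrong(text);
    if (kind === 'massLabel') massLabels.push(text);
    else if (kind === 'celebrationName' && celebrationName === null) celebrationName = text;
  }
  return { celebrationName, massLabels };
}
```

A row with an empty massLabels list would become a single entry with massLabel set to null; otherwise one entry per label.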
4. Constants — packages/sync/src/constants/liturgi.ts
LITURGICAL_COLOR_MAP and RANK_INFERENCE_RULES (prefix → CelebrationRank).
5. Staging writer — extend packages/sync/src/staging/index.ts
    export async function writeLiturgiToStaging(
      db: DrizzleClient,
      entries: RawLiturgiEntry[],
      logger: ILogger,
    ): Promise<void>
Upsert with ON CONFLICT DO NOTHING.
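To make the conflict behavior concrete, the sketch below reproduces in memory which rows the staging table's unique constraint treats as identical: same celebrationName, month, dateNumber, and massLabel, with a null massLabel counting as equal to another null because of nullsNotDistinct. It only illustrates the semantics and is not the Drizzle call itself; the helper names are hypothetical.

```typescript
// The four columns covered by the unique constraint.
interface StagingKeyFields {
  celebrationName: string;
  month: string;
  dateNumber: number;
  massLabel: string | null;
}

// JSON keeps null distinct from the string "null" and avoids separator clashes.
function stagingKey(entry: StagingKeyFields): string {
  return JSON.stringify([entry.celebrationName, entry.month, entry.dateNumber, entry.massLabel]);
}

// Keeps the first row per key and drops later ones, which mirrors the effect
// of inserting the rows one by one with ON CONFLICT DO NOTHING.
function dedupeEntries<T extends StagingKeyFields>(entries: T[]): T[] {
  const seen = new Set<string>();
  const kept: T[] = [];
  for (const entry of entries) {
    const key = stagingKey(entry);
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(entry);
  }
  return kept;
}
```

Because conflicting rows are skipped rather than updated, a re-scrape never changes a staging row that already exists for the same key.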
6. Transform — packages/sync/src/transform/liturgi.ts
Maps sync_staging.liturgi → ordo:
    export async function transformLiturgi(
      db: DrizzleClient,
      year: number,
      logger: ILogger,
    ): Promise<void>
- Resolve full date from month (Indonesian name) + dateNumber + year
- Map liturgicalColor → LiturgicalColor enum value
- Infer rank from celebrationName prefix
- Upsert into ordo with ON CONFLICT (date, massLabel) DO UPDATE; skip if source = 'manual'
- Set source = 'lagumisa', createdBy = null
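The date-resolution step can be sketched as a small pure function. It assumes the staged month is a full Indonesian month name and takes the year argument at face value; resolveDate and INDONESIAN_MONTHS are hypothetical names.

```typescript
// Full Indonesian month names, in calendar order (index 0 = January).
const INDONESIAN_MONTHS = [
  'Januari', 'Februari', 'Maret', 'April', 'Mei', 'Juni',
  'Juli', 'Agustus', 'September', 'Oktober', 'November', 'Desember',
];

// Returns an ISO date string (YYYY-MM-DD) or throws on bad input.
function resolveDate(month: string, dateNumber: number, year: number): string {
  const wanted = month.trim().toLowerCase();
  const monthIndex = INDONESIAN_MONTHS.findIndex((m) => m.toLowerCase() === wanted);
  if (monthIndex === -1) {
    throw new Error(`Unknown Indonesian month: ${month}`);
  }
  const date = new Date(Date.UTC(year, monthIndex, dateNumber));
  // Date.UTC rolls invalid days over (30 Februari becomes 2 Maret),
  // so verify that the components survived the round trip.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== monthIndex ||
    date.getUTCDate() !== dateNumber
  ) {
    throw new Error(`Invalid date: ${dateNumber} ${month} ${year}`);
  }
  return date.toISOString().slice(0, 10);
}
```

For example, resolveDate('Desember', 20, 2026) yields '2026-12-20'. Working in UTC avoids the off-by-one day that local-time Date construction can introduce on servers in other time zones.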
7. Export from packages/sync/src/index.ts
Export scrapeLiturgi and transformLiturgi as part of the public API.
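Acceptance Criteria
- PublicHttpClient fetches /saranps.php without auth
- Parser flags Sunday rows (isSunday: true)
- Multi-mass entries are split into separate entries by massLabel
- liturgicalColor correctly maps from div id to LiturgicalColor value
- rank correctly inferred from celebration name prefix
- writeLiturgiToStaging() upserts without duplicates
- transformLiturgi() resolves full date correctly
- transformLiturgi() never overwrites ordo entries with source = 'manual'
- Migration for the staging table added in packages/db
- ILogger injected via constructor throughout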
Files to Create / Modify
    packages/sync/src/
      public-http-client.ts    # new
      scraper/liturgi.ts       # new
      parser/liturgi.ts        # new
      transform/liturgi.ts     # new
      constants/liturgi.ts     # new
      staging/index.ts         # extend
      index.ts                 # export new public API
    packages/db/src/schema/
      sync-staging.ts          # add syncStagingLiturgi
    packages/db/drizzle/
      <timestamp>_add_sync_staging_liturgi.sql    # new migration
References
- HTML structure analysis: packages/sync/docs/sync-tdd.md Section 6
- Staging table definition: docs/erd.md Section 6.1
- ordo table definition: docs/erd.md Section 2.5
- ILogger injection pattern: docs/tdd.md Section 11.2
- Existing scraper pattern: packages/sync/src/scraper/umat.ts