Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -24,4 +24,4 @@ jobs:
node-version: 20

- name: Check formatting
run: npx --yes prettier@latest --log-level=debug --check .
run: npx --yes prettier@3.8.1 --log-level=debug --check .
2 changes: 1 addition & 1 deletion mise.toml
@@ -1,7 +1,7 @@
[tools]

[tasks]
fmt = { description = "Format files with prettier", run = "npx --yes prettier@latest --write ." }
fmt = { description = "Format files with prettier", run = "npx --yes prettier@3.8.1 --write ." }
pre-commit = { description = "Pre-commit hook to format files", depends = [
"fmt",
] }
48 changes: 34 additions & 14 deletions plugin/skills/evaluating/SKILL.md
@@ -20,7 +20,7 @@ Analyze screened patents by decomposing claims into elements and storing analysi

- `patents.db` must exist with `screened_patents` table populated (from screening skill)
- Load `investigation-fetching` skill for data retrieval operations
- Load `investigation-recording` skill for data recording operations
- Load `investigation-recording` skill for elements recording

## Constitution

@@ -35,9 +35,8 @@ Analyze screened patents by decomposing claims into elements and storing analysi

**Skill-Only Database Access**:

- ALWAYS use the Skill tool to load `investigation-recording` for ALL database operations
- NEVER write raw SQL commands or read instruction files from investigation-recording
- The investigation-recording skill handles SQL operations internally when invoked via Skill tool
- Use `investigation-recording` skill for elements recording (LLM interpretation task)
- For claims recording, use sqlite3 JSON functions directly with `output_file` — do NOT pass claim text through LLM generation (see Step 3)

## Skill Orchestration

@@ -53,21 +52,42 @@ Analyze screened patents by decomposing claims into elements and storing analysi
2. **Batch Fetch Patent Data** (up to 10 patents in parallel):
- Split patents into batches of 10
- For each batch, invoke `Skill: google-patent-cli:patent-fetch` for all patents **in parallel**
- After `fetch_patent` returns each dataset, use `execute_cypher` to get claims.
**You MUST use this EXACT query — do NOT modify the node label or property names:**
```cypher
MATCH (c:claims) RETURN c.number, c.text
```

3. **Record Claims** (for each patent — mechanical, no LLM text generation):
- After `fetch_patent` returns the `output_file`, use sqlite3 JSON functions to INSERT directly.
**Do NOT read claim text and regenerate it — LLM will summarize/compress long repetitive structures.**
```bash
sqlite3 patents.db "
INSERT OR REPLACE INTO claims (patent_id, claim_number, claim_type, claim_text, created_at, updated_at)
SELECT
'<patent_id>',
CAST(json_extract(value, '$.number') AS INTEGER),
CASE
WHEN CAST(json_extract(value, '$.number') AS INTEGER) = 1 THEN 'independent'
ELSE 'dependent'
END,
json_extract(value, '$.text'),
datetime('now'),
datetime('now')
FROM json_each(json_extract(CAST(readfile('<output_file>') AS TEXT), '$.claims'));
"
```
- After INSERT, verify with: `sqlite3 patents.db "SELECT COUNT(*) FROM claims WHERE patent_id = '<patent_id>'"`
- Then UPDATE `claim_type` for each independent claim identified by reading claims from the DB:
```bash
sqlite3 patents.db "SELECT claim_number, substr(claim_text, 1, 80) FROM claims WHERE patent_id = '<patent_id>'"
```
Identify independent claims (those NOT starting with "前記", "The ... of claim", "請求項", etc.) and UPDATE:
```bash
sqlite3 patents.db "UPDATE claims SET claim_type = 'independent', updated_at = datetime('now') WHERE patent_id = '<patent_id>' AND claim_number IN (<independent_numbers>)"
```
- **CRITICAL**: Do NOT add `ORDER BY toInteger(c.number)` — it causes `c.text` to return `expression: null` due to a Cypher parser bug.
Also do NOT use `MATCH (p:Patent)-[:claims]->(c:claims)` (relationship pattern), `[:HAS_CHILD]->(c:claim)`, `[:claim]->(c:claim)`, `p.claims`, or `[:claims]->(c:claim)`.
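The mechanics of the INSERT above can be sketched against a hand-made sample file. The JSON shape (a `claims` array of objects with `number` and `text` fields) is assumed from the query itself, not taken from the fetch tool's documentation:

```shell
# Hypothetical stand-in for a fetch_patent output_file
cat > /tmp/demo_claims.json <<'EOF'
{"claims": [
  {"number": "1", "text": "A device comprising a sensor and a controller."},
  {"number": "2", "text": "The device of claim 1, wherein the sensor is optical."}
]}
EOF

# readfile() is a sqlite3 shell built-in; json_each() iterates the array,
# json_extract() pulls one field from each element
sqlite3 :memory: "
SELECT CAST(json_extract(value, '$.number') AS INTEGER) || ': ' || json_extract(value, '$.text')
FROM json_each(json_extract(CAST(readfile('/tmp/demo_claims.json') AS TEXT), '$.claims'));
"
```

Each row comes out exactly as stored in the file, which is the point: no claim text passes through LLM generation.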

3. **Analyze and Record** (for each patent):
- Extract ALL claims (both independent and dependent)
4. **Analyze and Record Elements** (for each patent — LLM interpretation task):
- Read claims from the DB: `sqlite3 patents.db "SELECT claim_number, claim_text FROM claims WHERE patent_id = '<patent_id>'"`
- For EACH claim, decompose into constituent elements (A, B, C...)
- Invoke `Skill: investigation-recording` with request "Record claims for patent <patent-id>: <claims_data>"
- Invoke `Skill: investigation-recording` with request "Record elements for patent <patent-id>: <elements_data>"

4. **Verify Results**: Confirm all claims and elements are recorded in the database
5. **Verify Results**: Confirm all claims and elements are recorded in the database

## State Management

35 changes: 21 additions & 14 deletions plugin/skills/screening/SKILL.md
@@ -21,7 +21,6 @@ Filter collected patents by legal status and relevance to prepare for evaluation
- `patents.db` will be initialized by this skill via `investigation-preparing` if it does not exist
- `specification.md` must exist (Product/Theme definition)
- Load `investigation-fetching` skill for data retrieval operations
- Load `investigation-recording` skill for data recording operations

## Constitution

@@ -35,8 +34,8 @@ Filter collected patents by legal status and relevance to prepare for evaluation

**Skill-Only Database Access**:

- ALWAYS use the Skill tool to load `investigation-recording` for ALL database operations
- NEVER write raw SQL commands or read instruction files from investigation-recording
- Use `investigation-recording` skill for elements recording (LLM interpretation task)
- For claims and screening recording, use sqlite3 JSON functions directly with `output_file` — do NOT pass text through LLM generation

## Skill Orchestration

@@ -65,11 +64,9 @@ Filter collected patents by legal status and relevance to prepare for evaluation
3. **Batch Fetch Patent Data** (up to 10 patents in parallel):
- Split unscreened patents into batches of 10
- For each batch, invoke `Skill: google-patent-cli:patent-fetch` for all patents **in parallel**
- From each result, extract:
- `abstract_text` property — the official patent abstract (with 【課題】【解決手段】 format for JP patents)
- `legal_status` property — the patent's current legal status (e.g., `Pending`, `Expired`, `Withdrawn`)
- `title` property
- **CRITICAL**: Do NOT use `snippet` — `snippet` is a search result summary, NOT the official abstract. Always use `abstract_text`.
- From each result, note the `output_file` path — this contains `abstract_text`, `legal_status`, and `title` as JSON fields
- **Do NOT use `execute_cypher`** — all needed data is in the `output_file`, extract with `json_extract()`
- **CRITICAL**: Do NOT use `snippet` — `snippet` is a search result summary, NOT the official abstract.
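As a sketch of that mechanical extraction (the sample file below is invented; only the three field names come from the bullets above):

```shell
# Hypothetical output_file contents for illustration
cat > /tmp/demo_fetch.json <<'EOF'
{"title": "Optical sensor assembly",
 "legal_status": "Pending",
 "abstract_text": "A sensor assembly with improved drift correction.",
 "snippet": "search-result summary, not the official abstract"}
EOF

# Pull one field at a time; nothing is paraphrased by the model
sqlite3 :memory: "SELECT json_extract(CAST(readfile('/tmp/demo_fetch.json') AS TEXT), '$.legal_status');"
```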

4. **Evaluate and Record** (for each patent):

@@ -78,14 +75,24 @@ Filter collected patents by legal status and relevance to prepare for evaluation
- **Relevant**: Matches Theme/Domain, Direct Competitors, Core Tech
- **Exception**: Even if domain differs, KEEP if technology could serve as infrastructure or common platform

Legal status handling:
- Record `legal_status` from `fetch_patent` as-is in the database
- Note expired/withdrawn patents in the reason field, but judgment remains based on relevance

Judgment values: `relevant`, `irrelevant` (lowercase)

For each patent, invoke `Skill: investigation-recording` with request "Record screening result for patent <patent-id>: judgment=<judgment>, legal_status=<legal_status>, reason=<reason>, abstract_text=<abstract_text from fetch_patent>"
- **CRITICAL**: The `abstract_text` passed to recording MUST be the `abstract_text` from `fetch_patent`, NOT the `snippet` from `search_patents`.
After determining judgment and reason, record using sqlite3 JSON functions directly.
**Do NOT pass `abstract_text` through LLM generation — use `readfile()` to extract from `output_file` mechanically:**

```bash
sqlite3 patents.db "INSERT OR REPLACE INTO screened_patents (patent_id, judgment, legal_status, reason, abstract_text, updated_at)
VALUES (
'<patent_id>',
'<judgment>',
json_extract(CAST(readfile('<output_file>') AS TEXT), '$.legal_status'),
'<reason>',
json_extract(CAST(readfile('<output_file>') AS TEXT), '$.abstract_text'),
datetime('now')
);"
```

Note: Only `judgment` and `reason` come from LLM analysis. `abstract_text` and `legal_status` are extracted mechanically from the `output_file`.
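One caveat worth keeping in mind: `<judgment>` and `<reason>` are interpolated into single-quoted SQL literals, so any single quote inside them breaks the statement. A minimal escaping sketch (the `sed` step is an assumption about how the template gets filled in, not something this skill specifies):

```shell
# Double every single quote before splicing the text into the SQL literal
reason="Overlaps with core tech ('edge inference')"
escaped=$(printf '%s' "$reason" | sed "s/'/''/g")
sqlite3 :memory: "SELECT '$escaped';"
```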

5. **Verify Results**: Confirm all patents have corresponding `screened_patents` entries
