diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index fc62bab..87ed2a1 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -24,4 +24,4 @@ jobs:
           node-version: 20
       - name: Check formatting
-        run: npx --yes prettier@latest --log-level=debug --check .
+        run: npx --yes prettier@3.8.1 --log-level=debug --check .
diff --git a/mise.toml b/mise.toml
index 4c5e9d3..1c69e58 100644
--- a/mise.toml
+++ b/mise.toml
@@ -1,7 +1,7 @@
 [tools]
 
 [tasks]
-fmt = { description = "Format files with prettier", run = "npx --yes prettier@latest --write ." }
+fmt = { description = "Format files with prettier", run = "npx --yes prettier@3.8.1 --write ." }
 pre-commit = { description = "Pre-commit hook to format files", depends = [
   "fmt",
 ] }
diff --git a/plugin/skills/evaluating/SKILL.md b/plugin/skills/evaluating/SKILL.md
index f8df809..c368c0f 100644
--- a/plugin/skills/evaluating/SKILL.md
+++ b/plugin/skills/evaluating/SKILL.md
@@ -20,7 +20,7 @@ Analyze screened patents by decomposing claims into elements and storing analysi
 
 - `patents.db` must exist with `screened_patents` table populated (from the screening skill)
 - Load `investigation-fetching` skill for data retrieval operations
-- Load `investigation-recording` skill for data recording operations
+- Load `investigation-recording` skill for recording elements
 
 ## Constitution
 
@@ -35,9 +35,8 @@ Analyze screened patents by decomposing claims into elements and storing analysi
 
 **Skill-Only Database Access**:
 
-- ALWAYS use the Skill tool to load `investigation-recording` for ALL database operations
-- NEVER write raw SQL commands or read instruction files from investigation-recording
-- The investigation-recording skill handles SQL operations internally when invoked via the Skill tool
+- Use the `investigation-recording` skill for elements recording (an LLM interpretation task)
+- For claims recording, use sqlite3 JSON functions directly on the `output_file` — do NOT pass claim text through LLM generation (see Step 3)
 
 ## Skill Orchestration
 
@@ -53,21 +52,42 @@ Analyze screened patents by decomposing claims into elements and storing analysi
 2. **Batch Fetch Patent Data** (up to 10 patents in parallel):
    - Split patents into batches of 10
    - For each batch, invoke `Skill: google-patent-cli:patent-fetch` for all patents **in parallel**
-   - After `fetch_patent` returns each dataset, use `execute_cypher` to get claims.
-     **You MUST use this EXACT query — do NOT modify the node label or property names:**
-     ```cypher
-     MATCH (c:claims) RETURN c.number, c.text
+
+3. **Record Claims** (for each patent — mechanical, no LLM text generation):
+   - After `fetch_patent` returns the `output_file`, use sqlite3 JSON functions to INSERT the claims directly.
+     **Do NOT read claim text and regenerate it — the LLM will summarize or compress long, repetitive claim structures.**
+     ```bash
+     sqlite3 patents.db "
+       INSERT OR REPLACE INTO claims (patent_id, claim_number, claim_type, claim_text, created_at, updated_at)
+       SELECT
+         '<patent_id>',
+         CAST(json_extract(value, '$.number') AS INTEGER),
+         CASE
+           WHEN CAST(json_extract(value, '$.number') AS INTEGER) = 1 THEN 'independent'
+           ELSE 'dependent'
+         END,
+         json_extract(value, '$.text'),
+         datetime('now'),
+         datetime('now')
+       FROM json_each(json_extract(CAST(readfile('<output_file>') AS TEXT), '$.claims'));
+     "
+     ```
+   - After the INSERT, verify with: `sqlite3 patents.db "SELECT COUNT(*) FROM claims WHERE patent_id = '<patent_id>'"`
+   - Then correct `claim_type` by reading the claims back from the DB:
+     ```bash
+     sqlite3 patents.db "SELECT claim_number, substr(claim_text, 1, 80) FROM claims WHERE patent_id = '<patent_id>'"
+     ```
+     Identify the independent claims (those NOT starting with "前記" ("the aforementioned"), "The ... of claim", "請求項" ("claim"), etc.) and UPDATE:
+     ```bash
+     sqlite3 patents.db "UPDATE claims SET claim_type = 'independent', updated_at = datetime('now') WHERE patent_id = '<patent_id>' AND claim_number IN (<claim_numbers>)"
     ```
-   - **CRITICAL**: Do NOT add `ORDER BY toInteger(c.number)` — it causes `c.text` to return `expression: null` due to a Cypher parser bug.
-     Also do NOT use `MATCH (p:Patent)-[:claims]->(c:claims)` (relationship pattern), `[:HAS_CHILD]->(c:claim)`, `[:claim]->(c:claim)`, `p.claims`, or `[:claims]->(c:claim)`.
-
-3. **Analyze and Record** (for each patent):
-   - Extract ALL claims (both independent and dependent)
+
+4. **Analyze and Record Elements** (for each patent — an LLM interpretation task):
+   - Read the claims from the DB: `sqlite3 patents.db "SELECT claim_number, claim_text FROM claims WHERE patent_id = '<patent_id>'"`
    - For EACH claim, decompose into constituent elements (A, B, C...)
-   - Invoke `Skill: investigation-recording` with request "Record claims for patent <patent_id>: <claims>"
    - Invoke `Skill: investigation-recording` with request "Record elements for patent <patent_id>: <elements>"
-
-4. **Verify Results**: Confirm all claims and elements are recorded in the database
+
+5. **Verify Results**: Confirm all claims and elements are recorded in the database
 
 ## State Management
 
diff --git a/plugin/skills/screening/SKILL.md b/plugin/skills/screening/SKILL.md
index c920833..7d94781 100644
--- a/plugin/skills/screening/SKILL.md
+++ b/plugin/skills/screening/SKILL.md
@@ -21,7 +21,6 @@ Filter collected patents by legal status and relevance to prepare for evaluation
 - `patents.db` will be initialized by this skill via `investigation-preparing` if it does not exist
 - `specification.md` must exist (Product/Theme definition)
 - Load `investigation-fetching` skill for data retrieval operations
-- Load `investigation-recording` skill for data recording operations
 
 ## Constitution
 
@@ -35,8 +34,8 @@ Filter collected patents by legal status and relevance to prepare for evaluation
 
 **Skill-Only Database Access**:
 
-- ALWAYS use the Skill tool to load `investigation-recording` for ALL database operations
-- NEVER write raw SQL commands or read instruction files from investigation-recording
+- Use the `investigation-recording` skill for elements recording (an LLM interpretation task)
+- For claims and screening recording, use sqlite3 JSON functions directly on the `output_file` — do NOT pass text through LLM generation
 
 ## Skill Orchestration
 
@@ -65,11 +64,9 @@ Filter collected patents by legal status and relevance to prepare for evaluation
 
 3. **Batch Fetch Patent Data** (up to 10 patents in parallel):
    - Split unscreened patents into batches of 10
    - For each batch, invoke `Skill: google-patent-cli:patent-fetch` for all patents **in parallel**
-   - From each result, extract:
-     - `abstract_text` property — the official patent abstract (with 【課題】【解決手段】 format for JP patents)
-     - `legal_status` property — the patent's current legal status (e.g., `Pending`, `Expired`, `Withdrawn`)
-     - `title` property
-   - **CRITICAL**: Do NOT use `snippet` — `snippet` is a search result summary, NOT the official abstract. Always use `abstract_text`.
+   - From each result, note the `output_file` path — it contains `abstract_text`, `legal_status`, and `title` as JSON fields
+   - **Do NOT use `execute_cypher`** — all needed data is in the `output_file`; extract it with `json_extract()`
+   - **CRITICAL**: Do NOT use `snippet` — `snippet` is a search result summary, NOT the official abstract.
 
 4. **Evaluate and Record** (for each patent):
 
@@ -78,14 +75,24 @@ Filter collected patents by legal status and relevance to prepare for evaluation
      - **Relevant**: Matches Theme/Domain, Direct Competitors, Core Tech
      - **Exception**: Even if domain differs, KEEP if technology could serve as infrastructure or common platform
    - Legal status handling:
-     - Record `legal_status` from `fetch_patent` as-is in the database
      - Note expired/withdrawn patents in the reason field, but judgment remains based on relevance
    - Judgment values: `relevant`, `irrelevant` (lowercase)
-   - For each patent, invoke `Skill: investigation-recording` with request "Record screening result for patent <patent_id>: judgment=<judgment>, legal_status=<legal_status>, reason=<reason>, abstract_text=<abstract_text>"
-   - **CRITICAL**: The `abstract_text` passed to recording MUST be the `abstract_text` from `fetch_patent`, NOT the `snippet` from `search_patents`.
+   - After determining judgment and reason, record using sqlite3 JSON functions directly.
+     **Do NOT pass `abstract_text` through LLM generation — use `readfile()` to extract it from the `output_file` mechanically:**
+
+     ```bash
+     sqlite3 patents.db "INSERT OR REPLACE INTO screened_patents (patent_id, judgment, legal_status, reason, abstract_text, updated_at)
+       VALUES (
+         '<patent_id>',
+         '<judgment>',
+         json_extract(CAST(readfile('<output_file>') AS TEXT), '$.legal_status'),
+         '<reason>',
+         json_extract(CAST(readfile('<output_file>') AS TEXT), '$.abstract_text'),
+         datetime('now')
+       );"
+     ```
+
+     Note: Only `judgment` and `reason` come from LLM analysis; `abstract_text` and `legal_status` are extracted mechanically from the `output_file`.
 
 5. **Verify Results**: Confirm all patents have corresponding `screened_patents` entries
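The mechanical `readfile()` + `json_each()` recording pattern this patch introduces can be sanity-checked outside the skills. The sketch below uses a hypothetical fixture file, a throwaway database path, and a simplified `claims` schema (all illustrative — the real schema comes from `investigation-preparing`, and real `output_file` contents come from `patent-fetch`):

```shell
# Hypothetical fixture standing in for a patent-fetch output_file (illustrative only).
rm -f /tmp/patents_demo.db
cat > /tmp/claims_fixture.json <<'EOF'
{"claims": [
  {"number": "1", "text": "A widget comprising a sensor and a controller."},
  {"number": "2", "text": "The widget of claim 1, wherein the sensor is optical."}
]}
EOF

sqlite3 /tmp/patents_demo.db <<'EOF'
-- Simplified stand-in for the claims table (assumed schema, for demonstration).
CREATE TABLE claims (
  patent_id TEXT, claim_number INTEGER, claim_type TEXT, claim_text TEXT,
  created_at TEXT, updated_at TEXT,
  PRIMARY KEY (patent_id, claim_number)
);
-- Same mechanical pattern as the skill: readfile() + json_each(), no LLM in the loop.
INSERT OR REPLACE INTO claims
SELECT
  'JP0000000A',
  CAST(json_extract(value, '$.number') AS INTEGER),
  CASE WHEN CAST(json_extract(value, '$.number') AS INTEGER) = 1
       THEN 'independent' ELSE 'dependent' END,
  json_extract(value, '$.text'),
  datetime('now'),
  datetime('now')
FROM json_each(json_extract(CAST(readfile('/tmp/claims_fixture.json') AS TEXT), '$.claims'));
SELECT claim_number, claim_type FROM claims ORDER BY claim_number;
EOF
```

Running it prints `1|independent` and `2|dependent`, confirming the claim text reaches the database verbatim. One design point worth noting: `readfile()` is provided by the sqlite3 command-line shell, not by the SQLite library itself, which is why these recording steps shell out to `sqlite3` rather than using a driver.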