Pr 88 #89

Open
dreamGirl1996 wants to merge 5 commits into ccprocessor:main from dreamGirl1996:pr-88

Conversation

@dreamGirl1996

No description provided.

ql101 and others added 5 commits April 14, 2026 13:36
- Add LLM invoke retry with exponential backoff and llm_retry_stats in pipeline summary
- Extend settings for LLM timeout and retry env vars
- Schema merge failures now record error in phase result; schema_extraction uses shared retry
- run_jsonl_web2json_pipeline: merge-summary from disk when no prior summary, --only-failed, per-cluster and pipeline elapsed time
- Add aggregate_site_pipeline_stats.py to sum token usage and time across jsonl pipeline outputs
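
The retry behaviour described in the first bullet can be illustrated with a minimal sketch. This is not the code from this PR: `invoke_with_retry`, its parameters, and the keys written into `stats` are hypothetical names chosen for illustration, and the real `llm_retry_stats` structure may look different. The sketch assumes any exception is retryable and adds a small random jitter to each delay.

```python
import random
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def invoke_with_retry(
    call: Callable[[], T],
    stats: Dict[str, int],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call() and retry failures with exponential backoff.

    All names here are illustrative assumptions, not the PR's API.
    Counters are accumulated in `stats` so a caller could report them
    in a pipeline summary.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        stats["attempts"] = stats.get("attempts", 0) + 1
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: record the failure and re-raise.
                stats["failures"] = stats.get("failures", 0) + 1
                raise
            stats["retries"] = stats.get("retries", 0) + 1
            # Delay doubles each attempt (1x, 2x, 4x, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production version would more likely retry only on timeout or rate-limit errors rather than on every exception.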

Made-with: Cursor
…cs for ms-web-mma

- Add classify_crawl_jsonl_dir, crawl_jsonl helpers (split, manifest-friendly rows)
- Slice rows use layout_cluster_id / crawl_source_name / crawl_line_no (no _w2j)
- Export APIs from web2json.__init__
- Add ms-web-mma flow doc and Jupyter Spark checklist
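
To make the slice-row fields concrete, here is a rough sketch of how rows tagged with `layout_cluster_id`, `crawl_source_name`, and `crawl_line_no` might be produced from a crawl JSONL file. Only those three field names come from the commit message. The function `iter_slice_rows`, the `cluster_of` callback, the `record` key, and the 1-based line numbering are assumptions made for illustration, not the helpers this PR adds.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator


def iter_slice_rows(
    lines: Iterable[str],
    source_name: str,
    cluster_of: Callable[[Dict[str, Any]], str],
) -> Iterator[Dict[str, Any]]:
    """Yield one row per non-empty JSONL line, tagged with provenance.

    Hypothetical helper: `cluster_of` stands in for whatever layout
    classification the real pipeline performs.
    """
    for line_no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            # Skip blank lines but keep counting, so crawl_line_no still
            # points at the physical line in the source file.
            continue
        record = json.loads(raw)
        yield {
            "layout_cluster_id": cluster_of(record),
            "crawl_source_name": source_name,
            "crawl_line_no": line_no,
            "record": record,
        }
```

Keeping the source name and line number on every row means a failed row can be traced back to the exact line of the original crawl file.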

Made-with: Cursor