Draft: moving over citation_exists.py requirement #7

anpendyal · 2025-11-04T00:30:01Z

Draft: moving over the citation_exists.py requirement and its corresponding test; also added eyecite usage for LLM output parsing

nrfulton · 2025-11-04T15:42:40Z

mellea_contribs/reqlib/citation_exists.py

+# might not be needed
+# def ensure_list_of_dicts(obj: Any) -> list[dict]:
+#     """
+#     Normalize any JSON-like object into a list of dictionaries.
+
+#     Accepts:
+#       - A JSON string (object or array)
+#       - A single dict
+#       - A list of dicts
+
+#     Args:
+#         obj: Any data type, ideally something that can unpacked into a dictionary
+
+#     Returns:
+#         The unpacked object in list of dictionary form or raises an error.
+#     """
+#     # JSON string
+#     if isinstance(obj, str):
+#         try:
+#             obj = json.loads(obj)
+#         except json.JSONDecodeError as e:
+#             raise ValueError(f"Invalid JSON string: {e!s}")
+
+#     # Single dict
+#     if isinstance(obj, dict):
+#         return [obj]
+
+#     # List of dicts
+#     if isinstance(obj, list):
+#         if all(isinstance(item, dict) for item in obj):
+#             return obj
+#         else:
+#             raise ValueError("List contains non-dictionary elements")
+
+#     raise TypeError(f"Unsupported metadata format: {type(obj)}")
+
+# alternatively:
+# should this take in last_output instead of the whole context?
+# get case name: take LLM output and extract case name --> a string which you get from ctx.last_output() is the input
+# so the argument should be ctx.last_output.value: str


What was the purpose of this?

Remove commented out code.

nrfulton · 2025-11-04T15:43:48Z

mellea_contribs/reqlib/citation_exists.py

+    # install hyperscan if not already installed
+    # !pip install hyperscan
+    # tokenizer = HyperscanTokenizer(cache_dir=".test_cache")
+    # citations = get_citations(cleaned_text, tokenizer=tokenizer)
+
+    # or this?
+    # cleaned_text = clean_text(text, ["html", "all_whitespace"])
+    # citations = get_citations(cleaned_text)


What's going on here?

nrfulton · 2025-11-04T15:45:34Z

mellea_contribs/reqlib/citation_exists.py

+        plaintiff = citation.metadata.get("plaintiff")
+        defendant = citation.metadata.get("defendant")
+        if plaintiff and defendant:
+            case_names.add(f"{plaintiff} v. {defendant}")
+            # name = citation.metadata['plaintiff'] + " v. " + citation.metadata['defendant']
+            # case_names.add(name)


What's the purpose of this?

Are these actually canonical names/references?

What happens if you don't have both a plaintiff and a defendant? Can that ever happen? If not -> assert. If yes -> handle the exceptional cases.
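One way to handle the "yes" branch is sketched below. It assumes that single-party captions (for example "In re ..." matters) can reach this code with only one of the two fields populated; that assumption should be checked against what eyecite actually returns. `build_case_name` is a hypothetical helper, not something in the PR.

```python
from typing import Optional


def build_case_name(plaintiff: Optional[str], defendant: Optional[str]) -> Optional[str]:
    """Return a display name for a citation, or None if no party is known.

    Hypothetical helper: the name and signature are illustrative only.
    """
    plaintiff = (plaintiff or "").strip()
    defendant = (defendant or "").strip()

    # Normal adversarial caption: both parties are present.
    if plaintiff and defendant:
        return f"{plaintiff} v. {defendant}"

    # Assumed exceptional case: only one party field is populated.
    if plaintiff or defendant:
        return plaintiff or defendant

    # Nothing usable; let the caller decide whether to skip or fail.
    return None
```

If it turns out that a missing party can never happen, the two fallback branches would collapse into a single `assert plaintiff and defendant`.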

nrfulton · 2025-11-04T15:46:21Z

mellea_contribs/reqlib/citation_exists.py

+    # check if this code chunk is right later
+    # db_names = {normalize_case_name(c["name"]) for c in case_metadata if "name" in c}
+    # db_abbrevs = {
+    #     normalize_case_name(c["name_abbreviation"]) for c in case_metadata if "name_abbreviation" in c
+    # }
+
+    # for name in normalized_output_names:
+    #     if name not in db_names and name not in db_abbrevs:
+    #         return ValidationResult(False, reason=f"Case '{name}' not found in database")
+
+    # return ValidationResult(True, reason="All case names found in database")


remove commented out code.

nrfulton · 2025-11-04T15:50:15Z

mellea_contribs/reqlib/citation_exists.py

+    case_names = extract_case_names(ctx)
+
+    if not case_names or not isinstance(case_names, list[str]):
+        return ValidationResult(False, reason="No case names provided in output")


I think this should probably return True. The reason is good.
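A minimal sketch of the suggested behavior: with no case names in the output there is nothing to contradict, so the check passes vacuously. `ValidationResult` here is a local stand-in dataclass so the example is self-contained, and `check_case_names` and `known_names` are hypothetical names. The sketch also drops the `isinstance(case_names, list[str])` test, because `isinstance` raises `TypeError` when given a parameterized generic such as `list[str]`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    """Local stand-in for the real result type, for illustration only."""
    result: bool
    reason: Optional[str] = None


def check_case_names(case_names: list[str], known_names: set[str]) -> ValidationResult:
    """Pass when every cited case is known; pass vacuously when none are cited."""
    if not case_names:
        # No citations means no citation can be wrong.
        return ValidationResult(True, reason="No case names provided in output")

    missing = [name for name in case_names if name not in known_names]
    if missing:
        return ValidationResult(False, reason=f"Cases not found in database: {missing}")

    return ValidationResult(True, reason="All case names found in database")
```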

nrfulton · 2025-11-04T15:55:53Z

mellea_contribs/reqlib/citation_exists.py

+    for case in case_metadata:
+        if 'name' in case:
+            case_names.add(normalize_case_name(case['name']))
+        if 'name_abbreviation' in case:
+            case_name_abb.add(normalize_case_name(case['name_abbreviation']))


This approach seems like it will make a lot of errors.

What about cases where the case name isn't verbatim? E.g., sometimes if there is a large set of parties on one side or the other there will be an abbreviation of that in the cite. State names are often also different in the formal cite vs how they're cited inline.

Additionally, there is a lot of string manipulation here that I think can be streamlined or done in a more principled way. Should we implement this as something like likely_equivalent(c1, c2), where c1 and c2 are eyecite citation objects?

Draft: moving over citation_exists.py requirement and corresponding t…

0d37b51

…est; also added eyecite useage for LLM output parsing

nrfulton self-requested a review November 4, 2025 15:41

nrfulton requested changes Nov 4, 2025
