A lightweight NLP library for concept/sentence/paragraph extraction with negation detection geared towards clinical text processing.
Fast case-insensitive string matching: term matching with longest-match capture.
Negation & uncertainty: term hits carry an assertion flag, :Y (YES, affirmed), :N (NO, negated), or :U (UNCERTAIN).
Segment text: return the sentence or paragraph containing terms/phrases of interest, or ±n chars around the hit.
Code mapping: map terms found in the text to codes (ICD, SNOMED, CUIs, etc.).
CLI: single command search, extract, convert codes, or get assertion status.
pip install nlplite

from nlplite import search_terms
text = "Patient has heart failure. He denies chest pain but reports headache."
hits = search_terms(text, ["heart failure", "headache"], window_size="sentence")
print(hits)
# [
# ('heart failure', 12, 24, 'Patient has heart failure.'),
# ('headache', 53, 60, 'He denies chest pain but reports headache.')
# ]

Return shape: (term, start_position, end_position, [context])
window_size may be an int (±N chars), "sentence", "paragraph", or None.
Offsets: set include_offsets=False to omit start/end locations from results.
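To illustrate what the window_size options mean, the sketch below slices context around a hit in plain Python. It is not nlplite's implementation: the helper name context_window, the naive sentence and paragraph boundary patterns, and the end-exclusive offsets are all assumptions made for this example.

```python
import re

def _segments(text, boundary):
    """Yield (seg_start, seg_end) spans separated by matches of the boundary regex."""
    pos = 0
    for m in re.finditer(boundary, text):
        yield pos, m.start()
        pos = m.end()
    yield pos, len(text)

def context_window(text, start, end, window_size=None):
    """Return context around text[start:end] (hypothetical helper, end exclusive)."""
    if window_size is None:
        return None
    if isinstance(window_size, int):
        # ±N characters around the hit, clipped to the text bounds
        return text[max(0, start - window_size):min(len(text), end + window_size)]
    if window_size == "sentence":
        # Assumption: a sentence ends at . ! or ? followed by whitespace
        boundary = r"(?<=[.!?])\s+"
    elif window_size == "paragraph":
        # Assumption: paragraphs are separated by blank lines
        boundary = r"\n\s*\n"
    else:
        raise ValueError(f"unsupported window_size: {window_size!r}")
    for seg_start, seg_end in _segments(text, boundary):
        if seg_start <= start < seg_end:
            return text[seg_start:seg_end].strip()
    return None

text = "Patient has heart failure. He denies chest pain but reports headache."
start = text.find("headache")
end = start + len("headache")
print(context_window(text, start, end, "sentence"))
# He denies chest pain but reports headache.
```

Real clinical notes contain abbreviations and list-style lines that defeat a boundary rule this simple, so treat it only as a mental model of the option values.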
from nlplite import convert_text_to_codes
dictionary = [("diabetes", "E11"), ("hypertension", "I10"), ("stroke", "I63")]
text = "No stroke. Has hypertension and diabetes."
# All occurrences with locations
rows = convert_text_to_codes(text, dictionary, negation_check=True, unique=False)
print(rows)
# [('I63:N', 3, 8), ('I10:Y', 13, 24), ('E11:Y', 29, 37)]

Notes:
- When negation_check=True, the code fields carry a :Y / :N / :U flag.
- If your file is two columns with a header (term,code), pass sep="," (or "tab") and leave header=True (default).
- Turn off start/end locations in the results by passing include_offsets=False.
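Downstream code usually needs the code and its flag separately. A minimal sketch, assuming the flag is always the suffix after the last colon, as in the sample output shown earlier; split_flag and the STATUS labels are hypothetical and not part of nlplite.

```python
# Human-readable labels for the flags (labels are this example's own wording)
STATUS = {"Y": "affirmed", "N": "negated", "U": "uncertain"}

def split_flag(value):
    """Split 'I63:N' into ('I63', 'N'); return (value, None) when no known flag is present."""
    head, sep, flag = value.rpartition(":")
    if sep and flag in STATUS:
        return head, flag
    return value, None

# Sample rows copied from the convert_text_to_codes example output
rows = [("I63:N", 3, 8), ("I10:Y", 13, 24), ("E11:Y", 29, 37)]

affirmed = []
for flagged_code, _start, _end in rows:
    code, flag = split_flag(flagged_code)
    if flag == "Y":
        affirmed.append(code)

print(affirmed)  # ['I10', 'E11']
```

Splitting on the last colon rather than the first keeps the helper safe for codes or terms that themselves contain a colon.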
from nlplite import extract_terms_with_window
# Dictionary can be a path to CSV/TSV or an in‑memory dict/list.
dictionary = [("heart failure", "I50.9"), ("chest pain", "R07.9"), ("headache", "R51")]
text = "Patient has heart failure. He denies chest pain but reports headache."
rows = extract_terms_with_window(
    text=text,
    dictionary=dictionary,   # or "terms.csv"
    window_size="sentence",  # 'sentence' | 'paragraph' | int | None
    include_code=None,       # auto-include codes if present
    include_offsets=True,
    negation_check=True,     # adds :Y / :N / :U flags
)
print(rows)
# [
# ('heart failure:Y', 'I50.9:Y', 12, 24, 'Patient has heart failure.'),
# ('chest pain:N', 'R07.9:N', 33, 42, 'He denies chest pain but reports headache.'),
# ('headache:Y', 'R51:Y', 53, 60, 'He denies chest pain but reports headache.')
# ]

After installing, use the nlplite command.
nlplite --search \
--terms "heart","heart failure" \
--text "Patient has heart failure. He denies chest pain." \
--window sentence \
--format json
# [["heart failure",12,24,"Patient has heart failure."]]

# terms.csv (with header):
# term,code
# heart failure,I50.9
# chest pain,R07.9
# headache,R51
nlplite --extract --dict terms.csv --sep "," \
--text "note.txt" \
--window paragraph \
--negation \
--format text
# Example line:
# Term: chest pain (negated), Code: R07.9, Location: 123-132, Context: "..."

nlplite --convert --dict terms.csv --sep "," \
--text "note.txt" \
--unique --format json
# → ["I50.9:Y","R07.9:N","R51:Y"]

Tips:
- Use --neg-window N to restrict how far a negation/uncertainty cue can reach.
- --format json|csv|text controls output shape.
- --no-header if your dictionary file has no header row.
- --convert does not support --window (by design).
- --no-offsets to skip start/end locations from results.
- Matching is case‑insensitive and respects word boundaries; overlapping hits resolve to the longest match first.
- Performance: a C‑accelerated automaton is used when pyahocorasick is present; a pure‑Python fallback maintains portability.
- Segmentation (window_size) can be an integer (±N characters), "sentence", or "paragraph".
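To make the matching rule concrete, here is a small regex-based sketch of case-insensitive, word-boundary, longest-match-first search. It only illustrates the behaviour described in the notes and is not how nlplite works internally (the library is described as using an automaton); find_longest_matches is a hypothetical name, and the end-exclusive offsets are a choice made for this example.

```python
import re

def find_longest_matches(text, terms):
    """Return (term, start, end) hits, preferring the longest term at each position."""
    # Longest terms first, so the alternation tries "heart failure" before "heart"
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    if not ordered:
        return []
    pattern = re.compile(
        # (?<!\w) and (?!\w) keep matches on word boundaries
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.group(0).lower(), m.start(), m.end()) for m in pattern.finditer(text)]

text = "Patient has Heart Failure. Heart rate is normal."
print(find_longest_matches(text, ["heart", "heart failure"]))
# [('heart failure', 12, 25), ('heart', 27, 32)]
```

A regex alternation is fine for a handful of terms, but it slows down as the dictionary grows, which is the usual reason for preferring an automaton on large vocabularies.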