vidulpanickan/nlplite

NLPLite

A lightweight NLP library for concept, sentence, and paragraph extraction with negation detection, geared towards clinical text processing.

Highlights

  • Fast case-insensitive string matches: term matching with longest match capture.
  • Negation & uncertainty: term hits accompanied by negation status :Y (YES), :N (NO), :U (UNCERTAIN).
  • Segment text: return the sentence or paragraph containing terms/phrases of interest, or ±n chars around the hit.
  • Code mapping: map text to codes (ICD, SNOMED, CUIs, etc.) within the text.
  • CLI: single command to search, extract, convert codes, or get assertion status.

Install

pip install nlplite

Quick Start

1) Search, locate and extract terms or phrases within a large text file 🕵️

from nlplite import search_terms

text = "Patient has heart failure. He denies chest pain but reports headache."
hits = search_terms(text, ["heart failure", "headache"], window_size="sentence")

print(hits)
# [
#   ('heart failure', 12, 24, 'Patient has heart failure.'),
#   ('headache', 60, 67, 'He denies chest pain but reports headache.')
# ]

Return shape: (term, start_position, end_position, [context]).
window_size may be an int (±N chars), "sentence", "paragraph", or None.

Offsets: set include_offsets=False to omit the start/end positions from the results.
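To make the window_size options concrete, here is a minimal sketch of how such a context window could be computed. It is not nlplite's implementation: context_window is a hypothetical helper, the sentence and paragraph splitting rules are simple regex assumptions, returning None when window_size is None is a guess, and the end offset is treated as inclusive because the first hit in the sample output (12, 24) suggests so.

```python
import re

# Assumed separators: whitespace after sentence punctuation, or a blank line.
_SEPARATORS = {"sentence": r"(?<=[.!?])\s+", "paragraph": r"\n\s*\n"}

def context_window(text, start, end, window_size):
    """Return the context around text[start:end + 1] (end is inclusive)."""
    if window_size is None:
        return None  # assumption: no context requested
    if isinstance(window_size, int):
        # ±N characters around the hit, clipped to the text boundaries
        return text[max(0, start - window_size):end + 1 + window_size]
    seg_start = 0
    for sep in re.finditer(_SEPARATORS[window_size], text):
        if start < sep.start():
            return text[seg_start:sep.start()]
        seg_start = sep.end()
    return text[seg_start:]  # the hit is in the last segment

text = "Patient has heart failure. He denies chest pain but reports headache."
print(context_window(text, 12, 24, "sentence"))
# Patient has heart failure.
print(repr(context_window(text, 12, 24, 5)))
# ' has heart failure. He '
```

An integer window is the cheapest option because it needs no segmentation, but it can cut words in half; the sentence and paragraph modes return whole units at the cost of a scan for separators.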

2) Translate your text to codes (clinical use case: term-to-CUI or term-to-ICD mapping)

from nlplite import convert_text_to_codes

dictionary = [("diabetes", "E11"), ("hypertension", "I10"), ("stroke", "I63")]
text = "No stroke. Has hypertension and diabetes."

# All occurrences with locations
rows = convert_text_to_codes(text, dictionary, negation_check=True, unique=False)
print(rows)
# [('I63:N', 3, 8), ('I10:Y', 15, 26), ('E11:Y', 32, 39)]

Notes:

  • When negation_check=True, the code fields carry a flag :Y/:N/:U.

  • If your dictionary is a two-column file with a header row (term,code), pass sep="," (or "tab") and leave header=True (the default).

  • Omit start/end locations from the results by passing include_offsets=False.
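The two-column dictionary file mentioned in the notes above maps directly onto the in-memory list of (term, code) pairs. The sketch below shows one way to read such a file with the standard csv module. load_dictionary is a hypothetical helper, not part of nlplite; its sep and header parameters only mirror the names used in the notes, and an io.StringIO stands in for a real file.

```python
import csv
import io

def load_dictionary(source, sep=",", header=True):
    """Read (term, code) pairs from an open two-column text file."""
    delimiter = "\t" if sep == "tab" else sep
    rows = csv.reader(source, delimiter=delimiter)
    if header:
        next(rows, None)  # skip the "term,code" header row
    # Ignore blank or malformed rows that lack two columns
    return [(row[0].strip(), row[1].strip()) for row in rows if len(row) >= 2]

# Stand-in for open("terms.csv", newline="")
sample = io.StringIO("term,code\nheart failure,I50.9\nchest pain,R07.9\nheadache,R51\n")
pairs = load_dictionary(sample)
print(pairs)
# [('heart failure', 'I50.9'), ('chest pain', 'R07.9'), ('headache', 'R51')]
```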

3) Extract the sentences, paragraphs, or characters surrounding terms of interest 📚

from nlplite import extract_terms_with_window

# Dictionary can be a path to CSV/TSV or an in‑memory dict/list.
dictionary = [("heart failure", "I50.9"), ("chest pain", "R07.9"), ("headache", "R51")]

text = "Patient has heart failure. He denies chest pain but reports headache."
rows = extract_terms_with_window(
    text=text,
    dictionary=dictionary,      # or "terms.csv"
    window_size="sentence",     # 'sentence' | 'paragraph' | int | None
    include_code=None,          # auto-include codes if present
    include_offsets=True,
    negation_check=True         # adds :Y / :N / :U flags
)

print(rows)
# [
#   ('heart failure:Y', 'I50.9:Y', 12, 24, 'Patient has heart failure.'),
#   ('chest pain:N',    'R07.9:N', 37, 46, 'He denies chest pain but reports headache.'),
#   ('headache:Y',      'R51:Y',   60, 67, 'He denies chest pain but reports headache.')
# ]

CLI Quickstart

After installing, use the nlplite command.

Search (inline text) 🔎

nlplite --search \
  --terms "heart","heart failure" \
  --text "Patient has heart failure. He denies chest pain." \
  --window sentence \
  --format json
#  [["heart failure",12,24,"Patient has heart failure."]]

Extract with dictionary file + negation 🧠

# terms.csv (with header):
# term,code
# heart failure,I50.9
# chest pain,R07.9
# headache,R51

nlplite --extract --dict terms.csv --sep "," \
  --text "note.txt" \
  --window paragraph \
  --negation \
  --format text
# Example line:
# Term: chest pain (negated), Code: R07.9, Location: 123-132, Context: "..."

Convert to unique codes only 🔄

nlplite --convert --dict terms.csv --sep "," \
  --text "note.txt" \
  --unique --format json
# → ["I50.9:Y","R07.9:N","R51:Y"]
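Downstream code usually needs the code and its assertion flag separately. The following sketch splits the flagged strings from the JSON output above and groups the codes by status. split_flag and group_by_status are hypothetical helpers, and the labels "affirmed", "negated", and "uncertain" are my reading of :Y, :N, and :U from the examples in this README.

```python
from collections import defaultdict

# Assumed meaning of the flags, inferred from the README examples
STATUS_LABELS = {"Y": "affirmed", "N": "negated", "U": "uncertain"}

def split_flag(flagged):
    """Split 'R07.9:N' into ('R07.9', 'N'); the flag follows the last colon."""
    code, _, flag = flagged.rpartition(":")
    return code, flag

def group_by_status(flagged_codes):
    """Group bare codes under their assertion status label."""
    groups = defaultdict(list)
    for item in flagged_codes:
        code, flag = split_flag(item)
        groups[STATUS_LABELS[flag]].append(code)
    return dict(groups)

codes = ["I50.9:Y", "R07.9:N", "R51:Y"]
print(group_by_status(codes))
# {'affirmed': ['I50.9', 'R51'], 'negated': ['R07.9']}
```

Splitting on the last colon rather than the first keeps the helper safe for codes that themselves contain a colon.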

Tips:

  • Use --neg-window N to restrict how far a negation/uncertainty cue can reach.
  • --format json|csv|text controls output shape.
  • --no-header if your dictionary file has no header row.
  • --convert does not support --window (by design).
  • --no-offsets to skip start/end locations from results.
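The idea behind --neg-window, limiting how far back a cue may sit from the hit, can be illustrated with a small sketch. This is not nlplite's algorithm: the cue lists, the scope breaks ("but", "however"), the default of 40 characters, and the assertion_status helper are all assumptions chosen for illustration.

```python
import re

# Illustrative cue lists (assumptions, not nlplite's actual lexicon)
NEGATION_CUES = ("no", "not", "denies", "without")
UNCERTAIN_CUES = ("possible", "probable", "suspected")
SCOPE_BREAKS = ("but", "however")

def _cue_regex(words):
    return re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)

def assertion_status(text, start, neg_window=40):
    """Return 'N', 'U', or 'Y' for the hit that begins at text[start]."""
    # Only look at the neg_window characters that precede the hit
    scope = text[max(0, start - neg_window):start]
    # Stay inside the current sentence
    scope = re.split(r"[.!?]", scope)[-1]
    # A scope break such as "but" ends the reach of earlier cues
    scope = _cue_regex(SCOPE_BREAKS).split(scope)[-1]
    if _cue_regex(NEGATION_CUES).search(scope):
        return "N"
    if _cue_regex(UNCERTAIN_CUES).search(scope):
        return "U"
    return "Y"

text = "Patient has heart failure. He denies chest pain but reports headache."
for term in ("heart failure", "chest pain", "headache"):
    print(term, assertion_status(text, text.index(term)))
# heart failure Y
# chest pain N
# headache Y
```

A smaller window trades recall for precision: in "No signs at all of stroke", a window of 40 characters reaches the cue "No", while a window of 5 does not and the hit is reported as affirmed.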

Notes

  • Matching is case‑insensitive and respects word boundaries; overlapping hits resolve to the longest match first.
  • For speed, matching uses a C‑accelerated automaton when pyahocorasick is installed; otherwise a pure‑Python fallback keeps the library portable.
  • Segmentation (window_size) can be an integer (±N characters), "sentence", or "paragraph".
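The matching rules in the first note (case-insensitive, word boundaries, longest match first) can be sketched with a plain regular expression. This swaps in regex alternation for the automaton approach named above, so it shows the behavior rather than nlplite's implementation; find_longest_matches is a hypothetical name, and the inclusive end offset follows the convention the sample outputs appear to use.

```python
import re

def find_longest_matches(text, terms):
    """Return (term, start, end) hits, preferring the longest term at each position."""
    # Longest terms first: regex alternation tries alternatives left to right
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    # end is inclusive, so subtract 1 from the regex end position
    return [(m.group().lower(), m.start(), m.end() - 1) for m in pattern.finditer(text)]

hits = find_longest_matches(
    "Patient has Heart Failure. No heartburn.", ["heart", "heart failure"]
)
print(hits)
# [('heart failure', 12, 24)]
```

"heart" alone never fires here: inside "Heart Failure" the longer term wins, and inside "heartburn" the trailing word boundary fails.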
