NLPLite

A lightweight NLP library for concept/sentence/paragraph extraction with negation detection geared towards clinical text processing.

Highlights

Fast, case-insensitive string matching: term matching with longest-match capture.
Negation & uncertainty: each term hit carries an assertion flag, :Y (YES), :N (NO), or :U (UNCERTAIN).
Text segmentation: return the sentence or paragraph containing terms/phrases of interest, or ±N characters around the hit.
Code mapping: map terms in the text to codes (ICD, SNOMED, CUIs, etc.).
CLI: a single command to search, extract, convert to codes, or get assertion status.

Install

pip install nlplite

Quick Start

1) Search, locate and extract terms or phrases within a large text file 🕵️

from nlplite import search_terms

text = "Patient has heart failure. He denies chest pain but reports headache."
hits = search_terms(text, ["heart failure", "headache"], window_size="sentence")

print(hits)
# [
#   ('heart failure', 12, 24, 'Patient has heart failure.'),
#   ('headache', 60, 67, 'He denies chest pain but reports headache.')
# ]

Return shape: (term, start_position, end_position, [context])
window_size may be an int (±N chars), "sentence", "paragraph", or None.

Offsets: set include_offsets=False to omit start/end locations from the results.
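To make the window_size options concrete, here is a minimal, stdlib-only sketch of how a context window could be cut around a hit. It is not NLPLite's implementation: the helper name context_window, the inclusive end offset (chosen to match the examples above), and the naive sentence rule (". ! ?" followed by whitespace) and paragraph rule (a blank line) are all assumptions.

```python
import re
from typing import Optional, Union

def context_window(text: str, start: int, end: int,
                   window_size: Union[int, str, None]) -> Optional[str]:
    """Return the context around text[start:end + 1] (end offset is inclusive).

    Hypothetical helper for illustration; not part of NLPLite.
    """
    if window_size is None:
        return None  # caller asked for no context
    if isinstance(window_size, int):
        # ±N characters around the hit, clipped to the bounds of the text
        return text[max(0, start - window_size):min(len(text), end + 1 + window_size)]
    if window_size == "sentence":
        # naive rule: a sentence ends at . ! or ? followed by whitespace
        boundary, keep = re.compile(r"[.!?]\s+"), 1  # keep the terminator itself
    elif window_size == "paragraph":
        boundary, keep = re.compile(r"\n\s*\n"), 0  # a blank line ends a paragraph
    else:
        raise ValueError(f"unsupported window_size: {window_size!r}")
    left = 0
    for m in boundary.finditer(text, 0, start):  # last boundary before the hit
        left = m.end()
    m = boundary.search(text, end + 1)  # first boundary after the hit
    right = m.start() + keep if m else len(text)
    return text[left:right]

text = "Patient has heart failure. He denies chest pain but reports headache."
print(context_window(text, 12, 24, "sentence"))  # Patient has heart failure.
```

Real clinical notes contain abbreviations such as "Dr." and "b.i.d.", which this naive sentence rule would split on; a production segmenter needs to handle those.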

2) Translate your text to codes (Clinical use case: Term-CUI, Term-ICD code)

from nlplite import convert_text_to_codes

dictionary = [("diabetes", "E11"), ("hypertension", "I10"), ("stroke", "I63")]
text = "No stroke. Has hypertension and diabetes."

# All occurrences with locations
rows = convert_text_to_codes(text, dictionary, negation_check=True, unique=False)
print(rows)
# [('I63:N', 3, 8), ('I10:Y', 15, 26), ('E11:Y', 32, 39)]

Notes:

When negation_check=True, each code carries a :Y/:N/:U flag.
If your dictionary is a two-column file with a header row (term,code), pass sep="," (or "tab") and leave header=True (the default).
Turn off start/end locations in the results by passing include_offsets=False.
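A two-column dictionary file like the one described above can be read with Python's standard csv module. This is only a sketch: load_dictionary is a hypothetical helper, and treating sep="tab" as an alias for a tab character is an assumption based on the note above, not a description of NLPLite's own loader.

```python
import csv
import io
from typing import List, TextIO, Tuple

def load_dictionary(handle: TextIO, sep: str = ",", header: bool = True) -> List[Tuple[str, str]]:
    """Read (term, code) pairs from a two-column delimited file (hypothetical helper)."""
    delimiter = "\t" if sep == "tab" else sep  # assumed alias for tab-separated files
    reader = csv.reader(handle, delimiter=delimiter)
    if header:
        next(reader, None)  # skip the "term,code" header row
    pairs = []
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # ignore blank or malformed lines
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs

sample = "term,code\ndiabetes,E11\nhypertension,I10\n"
print(load_dictionary(io.StringIO(sample)))  # [('diabetes', 'E11'), ('hypertension', 'I10')]
```

For a file on disk, open it with open("terms.csv", newline="", encoding="utf-8") and pass the handle in the same way.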

3) Extract sentences, paragraphs, or text surrounding terms of interest 📚

from nlplite import extract_terms_with_window

# Dictionary can be a path to CSV/TSV or an in‑memory dict/list.
dictionary = [("heart failure", "I50.9"), ("chest pain", "R07.9"), ("headache", "R51")]

text = "Patient has heart failure. He denies chest pain but reports headache."
rows = extract_terms_with_window(
    text=text,
    dictionary=dictionary,      # or "terms.csv"
    window_size="sentence",     # 'sentence' | 'paragraph' | int | None
    include_code=None,          # auto-include codes if present
    include_offsets=True,
    negation_check=True         # adds :Y / :N / :U flags
)

print(rows)
# [
#   ('heart failure:Y', 'I50.9:Y', 12, 24, 'Patient has heart failure.'),
#   ('chest pain:N',    'R07.9:N', 37, 46, 'He denies chest pain but reports headache.'),
#   ('headache:Y',      'R51:Y',   60, 67, 'He denies chest pain but reports headache.')
# ]
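Because the assertion status is appended to each field as a :Y, :N or :U suffix, downstream code usually needs to split it off. The sketch below works on rows shaped like the example output above (term, code, start, end, context) and keeps only affirmed findings. The helper split_flag is hypothetical, not part of NLPLite, and the rows are hard-coded so the snippet runs on its own.

```python
from typing import Tuple

FLAGS = {"Y", "N", "U"}

def split_flag(value: str) -> Tuple[str, str]:
    """Split 'chest pain:N' into ('chest pain', 'N'); the flag is '' if absent."""
    base, sep, flag = value.rpartition(":")
    if sep and flag in FLAGS:
        return base, flag
    return value, ""

rows = [
    ("heart failure:Y", "I50.9:Y", 12, 24, "Patient has heart failure."),
    ("chest pain:N", "R07.9:N", 37, 46, "He denies chest pain but reports headache."),
    ("headache:Y", "R51:Y", 60, 67, "He denies chest pain but reports headache."),
]

# Keep only affirmed (:Y) findings as plain (term, code) pairs
affirmed = [
    (split_flag(term)[0], split_flag(code)[0])
    for term, code, *_ in rows
    if split_flag(term)[1] == "Y"
]
print(affirmed)  # [('heart failure', 'I50.9'), ('headache', 'R51')]
```

Splitting on the last colon (rpartition) leaves any colon inside a term or code untouched.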

CLI Quickstart

After installing, use the nlplite command.

Search (inline text) 🔎

nlplite --search \
  --terms "heart","heart failure" \
  --text "Patient has heart failure. He denies chest pain." \
  --window sentence \
  --no-offsets \
  --format json
# [["heart failure","Patient has heart failure."]]

Extract with dictionary file + negation 🧠

# terms.csv (with header):
# term,code
# heart failure,I50.9
# chest pain,R07.9
# headache,R51

nlplite --extract --dict terms.csv --sep "," \
  --text "note.txt" \
  --window paragraph \
  --negation \
  --format text
# Example line:
# Term: chest pain (negated), Code: R07.9, Location: 123-132, Context: "..."

Convert to unique codes only 🔄

nlplite --convert --dict terms.csv --sep "," \
  --text "note.txt" \
  --unique --format json
# → ["I50.9:Y","R07.9:N","R51:Y"]
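In the example above, --unique returns each flagged code once. Assuming duplicates are dropped in first-seen order (the ordering rule is an assumption; it is not documented here), the deduplication step can be sketched in Python with a hypothetical unique_codes helper:

```python
from typing import Iterable, List

def unique_codes(codes: Iterable[str]) -> List[str]:
    """Deduplicate codes, keeping first-occurrence order (hypothetical helper)."""
    # dict keys preserve insertion order, so this is an ordered set
    return list(dict.fromkeys(codes))

print(unique_codes(["I50.9:Y", "R07.9:N", "I50.9:Y", "R51:Y"]))
# ['I50.9:Y', 'R07.9:N', 'R51:Y']
```

In this sketch the flag is part of the string, so "I50.9:Y" and "I50.9:N" count as distinct entries.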

Tips:

Use --neg-window N to restrict how far a negation/uncertainty cue can reach.
Use --format json|csv|text to control the output shape.
Pass --no-header if your dictionary file has no header row.
--convert does not support --window (by design).
Pass --no-offsets to omit start/end locations from the results.
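The following deliberately simplified, NegEx-style sketch shows what a cue window like --neg-window N restricts. It scans only the N characters before a hit for a negation or uncertainty cue and cuts that scope at sentence punctuation and contrastive words such as "but". The cue lists, the preceding-only scope, the terminator list, and the helper name assertion_status are all assumptions made for illustration. NLPLite's actual cue lexicon and scoping rules are not documented here.

```python
import re
from typing import Iterable

# Illustrative cue lists (an assumption; not NLPLite's actual lexicon)
NEGATION_CUES = ["no", "denies", "denied", "without", "negative for"]
UNCERTAINTY_CUES = ["possible", "probable", "suspected", "rule out"]

def cue_pattern(cues: Iterable[str]) -> re.Pattern:
    """Compile cues into one whole-word, case-insensitive pattern."""
    alternation = "|".join(re.escape(c) for c in sorted(cues, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

NEG_RE = cue_pattern(NEGATION_CUES)
UNC_RE = cue_pattern(UNCERTAINTY_CUES)
# Sentence punctuation and contrastive words end a cue's scope
TERMINATORS = re.compile(r"[.!?;]|\b(?:but|however)\b", re.IGNORECASE)

def assertion_status(text: str, start: int, neg_window: int = 30) -> str:
    """Return 'N', 'U' or 'Y' for a hit starting at offset `start` (hypothetical helper)."""
    scope = text[max(0, start - neg_window):start]  # only look back neg_window chars
    scope = TERMINATORS.split(scope)[-1]  # keep only text after the last terminator
    if NEG_RE.search(scope):
        return "N"
    if UNC_RE.search(scope):
        return "U"
    return "Y"

text = "Patient has heart failure. He denies chest pain but reports headache."
print(assertion_status(text, 37))  # N  ("denies" precedes "chest pain")
print(assertion_status(text, 60))  # Y  ("but" ends the scope of "denies")
```

A narrower window cuts off distant cues: assertion_status(text, 37, neg_window=5) sees only "nies " and returns Y.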

Notes

Matching is case‑insensitive and respects word boundaries; overlapping hits resolve to the longest match first.
Matching uses a C-accelerated Aho-Corasick automaton when pyahocorasick is installed; otherwise a pure-Python fallback keeps the library portable.
Segmentation (window_size) can be an integer (±N characters), "sentence", "paragraph", or None.
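The matching rules above (case-insensitive, whole-word, longest match wins on overlap) can be approximated with Python's standard re module. This is an illustrative sketch only, not NLPLite's pure-Python fallback, whose internals are not described here. For large dictionaries, an Aho-Corasick automaton such as pyahocorasick scales far better than one big regex alternation. find_terms is a hypothetical name, and end offsets are inclusive to match the examples above.

```python
import re
from typing import Iterable, List, Tuple

def find_terms(text: str, terms: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (lower-cased term, start, end_inclusive) for each non-overlapping hit.

    Hypothetical helper; assumes every term starts and ends with a word
    character, because word-boundary anchors are used around the alternation.
    """
    unique = sorted({t.lower() for t in terms if t.strip()}, key=len, reverse=True)
    if not unique:
        return []
    # Longest terms first: the regex engine tries alternatives left to right,
    # so "heart failure" must be tried before "heart" at the same position.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in unique) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0).lower(), m.start(), m.end() - 1) for m in pattern.finditer(text)]

text = "Patient has heart failure. He denies chest pain but reports headache."
print(find_terms(text, ["heart", "heart failure", "headache"]))
# [('heart failure', 12, 24), ('headache', 60, 67)]
```

Because finditer resumes after each match, "heart" is never reported inside "heart failure". This matches the longest-match behaviour of the CLI search example, where both terms are supplied but only "heart failure" is returned.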
