
Releases: milljm/dynamic-rag-chat

Assistant Mode TLC

19 Jun 00:00
9bac04f


Give Assistant Mode (--assistant-mode) some TLC.

  • While in Assistant Mode:
    • Non-story-related prompts are used.
    • RAGs are neither created nor used.
    • Chat history does not persist between sessions (i.e., closing and restarting the chat application).

Assistant Mode is therefore aimed at using this chat program as a tool, not a story-teller.
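
For illustration only, a minimal sketch of how an assistant-mode flag might gate RAG retrieval and history persistence; the class and method names below are hypothetical, not the project's actual code:

  class ChatSession:
      """Sketch: assistant mode disables RAG use and history persistence."""

      def __init__(self, assistant_mode: bool = False):
          self.assistant_mode = assistant_mode
          self.history: list[str] = []

      def retrieve_context(self, query: str) -> list[str]:
          # Assistant Mode: no RAG is created or consulted
          if self.assistant_mode:
              return []
          return self._rag_search(query)

      def end_session(self) -> None:
          # Assistant Mode: history lives in memory only and is dropped here
          if not self.assistant_mode:
              self._persist_history()

      def _rag_search(self, query: str) -> list[str]:
          return []  # placeholder for the real vector-store lookup

      def _persist_history(self) -> None:
          pass  # placeholder for writing history to disk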

Other general fixes were made since the previous release, namely with entity.txt file loading, along with additional fixes for the way Mistral produces tokens (e.g., fixed meta_tag creation/parsing when using Mistral-branded LLMs).

RenderWindow, End-User color syntax TLC

15 Jun 21:27
fe4e6ef


Correctly rendering output from different models is the theme of this release. Some models treat chunks differently, and this change reflects a more robust capture of "thinking" and "meta_tags".

  • Correctly capture chunks from the streaming LLM to decide when it is printing meta_tags or "thinking" content
  • Fix code blocks dropping a character (use SimpleCodeBlock instead of CodeBlock)
    • There is a bug in Rich's CodeBlock: when a word fits exactly within the padding limits, the last character of the word is dropped.
  • Allow users to select a Pygments theme for SimpleCodeBlocks (light-mode vs dark-mode TLC)
  • Code Reduction, pylint'ing, robustness
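
As an illustration of the chunk-capture idea, a minimal sketch that buffers a stream and separates "thinking" content and meta_tags from user-visible prose; the tag delimiters and function name here are assumptions, not the project's actual markers:

  import re

  # Hypothetical tag markers; the real project may use different delimiters.
  THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
  META_RE = re.compile(r"<meta_tags>(.*?)</meta_tags>", re.DOTALL)

  def classify_stream(chunks):
      """Accumulate streamed chunks, then split them into 'thinking',
      meta_tags, and user-visible prose. Buffering before classifying
      avoids a tag being split across two chunks."""
      buffer = "".join(chunks)
      thinking = ""
      if THINK_OPEN in buffer and THINK_CLOSE in buffer:
          start = buffer.index(THINK_OPEN) + len(THINK_OPEN)
          end = buffer.index(THINK_CLOSE)
          thinking = buffer[start:end].strip()
          buffer = buffer[:buffer.index(THINK_OPEN)] + buffer[end + len(THINK_CLOSE):]
      meta_tags = META_RE.findall(buffer)
      prose = META_RE.sub("", buffer).strip()
      return thinking, meta_tags, prose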

Default Prompts for Story Telling Included

14 Jun 18:16
f3003d7


This release marks a much more user-friendly, 'Ready-to-Go' experience, requiring only small adjustments to the prompts to tailor your story how you see fit. The default (if you don't modify anything): John (the protagonist) and Jane (his daughter), living in a post-apocalyptic world. Have fun!

  • Fixed PDF importing when --import-dir is used.
  • Fixed meta_data scene tracking polluting each chunk when using any of the --import-* arguments.
  • Added a dynamic scene tracking ability that some of the larger LLMs seem to understand. Coupled with meta_tags, we can now instruct the LLM not to spontaneously pop in characters that should not be present. Basically, we're trying to provide situational awareness between turns.
  • Updated arguments to align with what the README instructs (normalized the models being used).

The prompts provided (a branch of my personal ones, now made default) are pretty strict and seem to make the LLM behave well enough. As with all LLMs, the real tricks are in these prompts. They can always be made better, and I encourage folks to tinker.

Add in-line file handling capabilities

11 Jun 23:38
0415f3d


Add ability to provide in-line file handling. Supports TXT, PDFs, MD, HTML (BeautifulSoup), PNG/JPEG.

Summarize the following website: {{http://example.com}}
Correct any spelling in the following: {{/path/to/file.txt}}
What do you make of this image? {{/path/to/picture.png}}

Supports multiple instances and use cases as well:

What differences do you see in {{/file1}} and {{/file2}}.
Also, who won last night's game? {{https://sportsnews.com/myteam}}
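
A rough sketch of how the {{...}} references might be pulled out of a prompt and routed by type; the handler table and function name are illustrative assumptions:

  import re
  from pathlib import Path

  INLINE_RE = re.compile(r"\{\{(.+?)\}\}")

  def expand_inline_refs(prompt: str) -> dict[str, str]:
      """Find {{...}} references and note how each might be handled."""
      handlers = {
          ".txt": "read as plain text",
          ".md": "read as plain text",
          ".pdf": "extract text from PDF",
          ".html": "extract text with BeautifulSoup",
          ".png": "pass to a vision-capable model",
          ".jpg": "pass to a vision-capable model",
          ".jpeg": "pass to a vision-capable model",
      }
      results = {}
      for ref in INLINE_RE.findall(prompt):
          if ref.startswith(("http://", "https://")):
              results[ref] = "fetch URL and extract text"
          else:
              results[ref] = handlers.get(Path(ref).suffix.lower(), "unknown type")
      return results

  print(expand_inline_refs("What do you make of this image? {{/path/to/picture.png}}"))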

Full Changelog: v1.0.7...v1.0.8

Weighted Field-Filtering

06 Jun 15:35
5787a02


With this release comes weighted field-filtering matching and a bit of context token-budget management. For those using this tool for storytelling, it will shine.

By default the Context Manager will heavily bias a search (75%) along the entity field (NPCs, names, protagonist, etc.) of your allotted match count. Furthermore, if there are multiple entities, it will balance the returned matches across them (15 matches allowed at 75% = 11 matches reserved for entity field-filtering; 11 / 2 = 5 for Jane, 5 for John, etc.).
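
That arithmetic could look roughly like the following sketch (the function name and floor-rounding are assumptions):

  def entity_match_budget(max_matches: int, entities: list[str],
                          entity_bias: float = 0.75) -> dict[str, int]:
      """Reserve ~75% of the allotted RAG matches for entity-field matches,
      balanced across the entities present."""
      reserved = int(max_matches * entity_bias)        # 15 * 0.75 -> 11
      per_entity = reserved // max(len(entities), 1)   # 11 // 2   -> 5
      return {entity: per_entity for entity in entities}

  print(entity_match_budget(15, ["Jane", "John"]))  # {'Jane': 5, 'John': 5}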

Dropped nltk in favor of the built-in from difflib import SequenceMatcher for similar-sentence de-duplication. At a maximum of 15 RAG result matches, it is impressive to see a total of 16K tokens found while eliminating 80% of them afterwards (~3,200 remaining unique tokens), meaning the LLM was certainly given pertinent information without the bloat. This helps the LLM stay focused on its system prompt and not be overwhelmed by history context.
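
A minimal sketch of similar-sentence de-duplication with difflib's SequenceMatcher; the 0.9 similarity threshold is an assumption, not the project's actual setting:

  from difflib import SequenceMatcher

  def dedupe_sentences(sentences: list[str], threshold: float = 0.9) -> list[str]:
      """Drop sentences that are near-duplicates of ones already kept."""
      kept: list[str] = []
      for sentence in sentences:
          if all(SequenceMatcher(None, sentence, k).ratio() < threshold for k in kept):
              kept.append(sentence)
      return kept

  print(dedupe_sentences([
      "John walked into the ruined city.",
      "John walked into the ruined city!",   # near-duplicate, dropped
      "Jane followed close behind.",
  ]))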

Added several methods folks can use to pre-populate their RAGs. The process now filters the chunks through the pre-processor, properly tagging them for field-filtering matching. The process is of course slower, but the end results will be much more accurate.

  --import-pdf         Path to pdf to pre-populate main RAG
  --import-txt         Path to txt to pre-populate main RAG
  --import-web         URL to pre-populate main RAG
  --import-dir         Path to recursively find and import assorted files (*.md, *.html, *.txt)

When encountering .html files, importing uses beautifulsoup4 to extract plain text.
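
The .html handling is roughly equivalent to a standard BeautifulSoup text extraction, sketched below (not the project's exact code):

  from bs4 import BeautifulSoup

  def html_to_text(path: str) -> str:
      """Strip markup from an .html file and return plain text for the RAG."""
      with open(path, encoding="utf-8") as handle:
          soup = BeautifulSoup(handle.read(), "html.parser")
      return soup.get_text(separator="\n", strip=True)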

OpenAI API support

03 Jun 17:40
0a17713


Adding OpenAI API support

  • Allow users to connect to any OpenAI-API-compatible server
    • Since this means we can use ChatGPT API calls, the additional ability to run different LLMs from different servers was also needed (ChatGPT as your heavy-weight LLM while still serving the embeddings/pre-processor LLM locally with Ollama); a sketch of that split follows below.
  • Metadata-tag documents that you import directly into the RAG (--import-pdf or --import-txt).
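
A minimal sketch of that split, using the openai Python client pointed at two different base URLs; the URLs, keys, and model names are placeholders:

  from openai import OpenAI

  # Heavy-weight chat model served by any OpenAI-compatible endpoint
  chat_client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

  # Embeddings/pre-processor model served locally via Ollama's OpenAI-compatible API
  local_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

  response = chat_client.chat.completions.create(
      model="gpt-4o",  # placeholder model name
      messages=[{"role": "user", "content": "Hello"}],
  )
  embedding = local_client.embeddings.create(
      model="nomic-embed-text",  # placeholder local embedding model
      input="Hello",
  )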

Light-Mode, Keep-Alive

18 May 03:39
52d0b9c


For those using a bright, high-luminance terminal background, this update should help with that.

  • --light-mode will load an appropriate color wheel for all the pretties being printed to the screen. The System Prompt is also told about your preference, and some models actually seem to obey; your mileage may vary (they will/should attempt to use higher-contrast emoji that work well with light backgrounds).

  • Keep-Alive Hack: I found Ollama would unload my models even after explicitly telling it to do otherwise. While using another chat application, Enchanted (do check them out!), I noticed their app routinely "pings" the Ollama server, and their app garnered responses from LLMs noticeably faster. I wanted the same, so I added a threaded operation to ping Ollama. Poof: we can now enjoy up to a 10-second reduction in LLM first-token response times, depending on model size (Qwen 235B, reduced by ~10 s). A sketch of the idea follows below.
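
The keep-alive idea can be as simple as a background thread hitting a lightweight Ollama endpoint on an interval; this is a sketch, and the 30-second interval and choice of endpoint are assumptions:

  import threading
  import time

  import requests

  def keep_ollama_warm(base_url: str = "http://localhost:11434",
                       interval_s: int = 30) -> threading.Thread:
      """Ping the Ollama server periodically so models stay loaded."""
      def _ping_loop():
          while True:
              try:
                  requests.get(f"{base_url}/api/tags", timeout=5)
              except requests.RequestException:
                  pass  # server not reachable; try again next interval
              time.sleep(interval_s)

      thread = threading.Thread(target=_ping_loop, daemon=True)
      thread.start()
      return thread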

Context Management

15 May 22:25
2fa2d23


This release marks an improvement to contextual token management.

  • Staggered chat-history retrieval based on the user's --chat-history-session setting, e.g.

    --chat-history-session 5

    would retrieve the following history slices from all recorded history, given 20 responses in total to draw from:

    [9, 13, 16, 18, 19]

    On top of that, there is also a token cut-off at a rate of 550 * 5 = 2750 tokens (550 per --chat-history-session slot); see the sketch after this list for how the slice indices are chosen.

  • Specialized LLM prompt files. I am allowing separate prompt files to coexist based on the model in use. So far, there are special prompts for Gemma, Llama, and Qwen.

  • Improvements to field-filtering matching with the RAGs. Could still be better (#4).
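
A minimal sketch of the staggered index selection, reconstructed from the example above; the actual spacing rule may differ:

  def staggered_history_indices(total_responses: int, session_size: int) -> list[int]:
      """Pick `session_size` history indices, densest near the most recent turn.
      The gap grows by one for each step back in time:
      total=20, session=5 -> [9, 13, 16, 18, 19]."""
      indices: list[int] = []
      idx, gap = total_responses - 1, 1
      while len(indices) < session_size and idx >= 0:
          indices.append(idx)
          idx -= gap
          gap += 1
      return sorted(indices)

  print(staggered_history_indices(20, 5))  # [9, 13, 16, 18, 19]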

Pre-Processor

11 May 20:29
d65d98f


The pre-processor is finally working, churning through all your relevant RAG context, contextualizing it into RAG/Tags and doing all the things it was meant to do.

In a busy history session, it seems to take an additional 6 to 7 seconds to complete. That is pretty considerable, so I may take another look at just how many RAG matches we are retrieving. In any case, this release marks a history-aware chat experience drawing on long-past conversations. Tests continually surface long-past, nuanced content now made relevant by the user's query.

  • Pre-Processor finally works
  • Granting LLMs the ability to modify their own System Prompt
    • Adds interesting behavior! Most LLMs I've tested seem to shy away from modifying their own prompt (what's their worry?? 😆), but I do see it happen. The larger the model, the more I've seen this taken advantage of (Qwen 3 235B enjoys doing it when nudged to do so).
  • Tidying up the UI
    • provide an up-front time cost for pre-processing work, now that it is significant
  • Allow the user to specify chat history maximums as an argument
  • Allow for in-line commands while chatting:
    • So far the only one being \no-context:
    >>> \no-context hello!
    
    • which signals the system to send your query directly to the LLM, without any context gathering (see the sketch after this list)
  • Lots of Pylinting and consolidation work
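
In-line command parsing can be as simple as peeling a leading backslash token off the query; a sketch, with only \no-context existing so far and the function name being hypothetical:

  def parse_inline_command(user_input: str) -> tuple[str | None, str]:
      """Split a leading in-line command (e.g. '\\no-context') from the query."""
      if user_input.startswith("\\"):
          command, _, rest = user_input.partition(" ")
          return command, rest.strip()
      return None, user_input

  print(parse_inline_command(r"\no-context hello!"))  # ('\\no-context', 'hello!')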

Consolidation/Pylinting

09 May 15:33
685ddc7


Mostly a maintenance release.

  • Added date/time capabilities
  • Got a handle on the pre-processing step; it was calling an LLM multiple times when it didn't need to
  • Still learning how to produce a better system prompt...
  • Consolidated more methods into separate classes, as well as properly naming the Python modules (e.g. Lots of Pylinting)