
Releases: milljm/dynamic-rag-chat

Assistant Mode TLC

19 Jun 00:00
9bac04f


Give Assistant Mode (--assistant-mode) some TLC.

  • While in Assistant Mode:
    • Non-story-related prompts are used.
    • RAGs are neither created nor used.
    • Chat history does not persist between sessions (i.e., closing and restarting the chat application).

Assistant Mode is therefore aimed at using this chat program as a tool, not a story-teller.
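
For illustration only, a minimal sketch of how an assistant-mode flag might gate RAG retrieval and history persistence; the class and method names below are hypothetical, not the project's actual code:

  class ChatSession:
      """Sketch: assistant mode disables RAG use and history persistence."""

      def __init__(self, assistant_mode: bool = False):
          self.assistant_mode = assistant_mode
          self.history: list[str] = []

      def retrieve_context(self, query: str) -> list[str]:
          # Assistant Mode: no RAG is created or consulted
          if self.assistant_mode:
              return []
          return self._rag_search(query)

      def end_session(self) -> None:
          # Assistant Mode: history lives in memory only and is dropped here
          if not self.assistant_mode:
              self._persist_history()

      def _rag_search(self, query: str) -> list[str]:
          return []  # placeholder for the real vector-store lookup

      def _persist_history(self) -> None:
          pass  # placeholder for writing history to disk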

Other general fixes were made since the previous release, namely with entity.txt file loading, along with additional fixes for the way Mistral produces tokens (e.g., fixed meta_tag creation/parsing when using Mistral-branded LLMs).

RenderWindow, End-User color syntax TLC

15 Jun 21:27
fe4e6ef


Correctly rendering output from different models is the theme of this release. Some models treat chunks differently, and this change reflects a more robust capture of "thinking" and "meta_tags".

  • Correctly capture chunks from the streaming LLM to decide when it is printing meta_tags or "thinking" content
  • Fix code blocks dropping a character (use SimpleCodeBlock instead of CodeBlock)
    • There is a bug in Rich's CodeBlock: when a word fits exactly within the padding limits, the last character of the word is dropped.
  • Allow users to select a Pygments theme for SimpleCodeBlocks (light-mode vs dark-mode TLC)
  • Code Reduction, pylint'ing, robustness
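
As an illustration of the chunk-capture idea, a minimal sketch that buffers a stream and separates "thinking" content and meta_tags from user-visible prose; the tag delimiters and function name here are assumptions, not the project's actual markers:

  import re

  # Hypothetical tag markers; the real project may use different delimiters.
  THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
  META_RE = re.compile(r"<meta_tags>(.*?)</meta_tags>", re.DOTALL)

  def classify_stream(chunks):
      """Accumulate streamed chunks, then split them into 'thinking',
      meta_tags, and user-visible prose. Buffering before classifying
      avoids a tag being split across two chunks."""
      buffer = "".join(chunks)
      thinking = ""
      if THINK_OPEN in buffer and THINK_CLOSE in buffer:
          start = buffer.index(THINK_OPEN) + len(THINK_OPEN)
          end = buffer.index(THINK_CLOSE)
          thinking = buffer[start:end].strip()
          buffer = buffer[:buffer.index(THINK_OPEN)] + buffer[end + len(THINK_CLOSE):]
      meta_tags = META_RE.findall(buffer)
      prose = META_RE.sub("", buffer).strip()
      return thinking, meta_tags, prose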

Default Prompts for Story Telling Included

14 Jun 18:16
f3003d7


This release marks a much more user-friendly, 'Ready-to-Go' experience, requiring only small adjustments to the prompts to tailor your story how you see fit. The default (if you don't modify anything): John (the protagonist) and Jane (his daughter), living in a post-apocalyptic world. Have fun!

  • Fixed PDF importing when --import-dir is used.
  • Fixed meta_data scene tracking polluting each chunk when using any of the --import-* arguments.
  • Added a dynamic scene tracking ability that some of the larger LLMs seem to understand. Coupled with meta_tags, we can now instruct the LLM not to spontaneously pop in characters that should not be present. Basically, we're trying to provide situational awareness between turns.
  • Updated arguments to align with what the README instructs (normalized the models being used).

The prompts provided (a branch of my personal ones, now made default) are pretty strict and seem to make the LLM behave well enough. As with all LLMs, the real tricks are in these prompts. They can always be made better, and I encourage folks to tinker.

Add in-line file handling capabilities

11 Jun 23:38
0415f3d


Add ability to provide in-line file handling. Supports TXT, PDFs, MD, HTML (BeautifulSoup), PNG/JPEG.

Summarize the following website: {{http://example.com}}
Correct any spelling in the following: {{/path/to/file.txt}}
What do you make of this image? {{/path/to/picture.png}}

Supports multiple instances and use cases as well:

What differences do you see in {{/file1}} and {{/file2}}.
Also, who won last night's game? {{https://sportsnews.com/myteam}}
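
A rough sketch of how the {{...}} references might be pulled out of a prompt and routed by type; the handler table and function name are illustrative assumptions:

  import re
  from pathlib import Path

  INLINE_RE = re.compile(r"\{\{(.+?)\}\}")

  def expand_inline_refs(prompt: str) -> dict[str, str]:
      """Find {{...}} references and note how each might be handled."""
      handlers = {
          ".txt": "read as plain text",
          ".md": "read as plain text",
          ".pdf": "extract text from PDF",
          ".html": "extract text with BeautifulSoup",
          ".png": "pass to a vision-capable model",
          ".jpg": "pass to a vision-capable model",
          ".jpeg": "pass to a vision-capable model",
      }
      results = {}
      for ref in INLINE_RE.findall(prompt):
          if ref.startswith(("http://", "https://")):
              results[ref] = "fetch URL and extract text"
          else:
              results[ref] = handlers.get(Path(ref).suffix.lower(), "unknown type")
      return results

  print(expand_inline_refs("What do you make of this image? {{/path/to/picture.png}}"))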

Full Changelog: v1.0.7...v1.0.8

Weighted Field-Filtering

06 Jun 15:35
5787a02


With this release comes weighted field-filtering matching and a bit of context token-budget management. For those using this tool for storytelling, it will shine.

By default the Context Manager will heavily bias a search (75%) along the entity field (NPCs, names, protagonist, etc.) of your allotted match count. Furthermore, if there are multiple entities, it will balance the returned matches across them (15 matches allowed at 75% = 11 matches reserved for entity field-filtering; 11 / 2 = 5 for Jane, 5 for John, etc.).
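
That arithmetic could look roughly like the following sketch (the function name and floor-rounding are assumptions):

  def entity_match_budget(max_matches: int, entities: list[str],
                          entity_bias: float = 0.75) -> dict[str, int]:
      """Reserve ~75% of the allotted RAG matches for entity-field matches,
      balanced across the entities present."""
      reserved = int(max_matches * entity_bias)        # 15 * 0.75 -> 11
      per_entity = reserved // max(len(entities), 1)   # 11 // 2   -> 5
      return {entity: per_entity for entity in entities}

  print(entity_match_budget(15, ["Jane", "John"]))  # {'Jane': 5, 'John': 5}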

Dropped nltk in favor of the built-in from difflib import SequenceMatcher for similar-sentence de-duplication. At a maximum of 15 RAG result matches, it is impressive to see a total of 16K tokens found while eliminating 80% of them afterwards (~3,200 remaining unique tokens), meaning the LLM was certainly given pertinent information without the bloat. This helps the LLM stay focused on its system prompt and not be overwhelmed by history context.
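
A minimal sketch of similar-sentence de-duplication with difflib's SequenceMatcher; the 0.9 similarity threshold is an assumption, not the project's actual setting:

  from difflib import SequenceMatcher

  def dedupe_sentences(sentences: list[str], threshold: float = 0.9) -> list[str]:
      """Drop sentences that are near-duplicates of ones already kept."""
      kept: list[str] = []
      for sentence in sentences:
          if all(SequenceMatcher(None, sentence, k).ratio() < threshold for k in kept):
              kept.append(sentence)
      return kept

  print(dedupe_sentences([
      "John walked into the ruined city.",
      "John walked into the ruined city!",   # near-duplicate, dropped
      "Jane followed close behind.",
  ]))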

Added several methods folks can use to pre-populate their RAGs. The process now filters the chunks through the pre-processor, properly tagging them for field-filtering matching. The process is of course slower, but the end results will be much more accurate.

  --import-pdf         Path to pdf to pre-populate main RAG
  --import-txt         Path to txt to pre-populate main RAG
  --import-web         URL to pre-populate main RAG
  --import-dir         Path to recursively find and import assorted files (*.md, *.html, *.txt)

When encountering .html files, importing uses beautifulsoup4 to extract plain text.
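
The .html handling is roughly equivalent to a standard BeautifulSoup text extraction, sketched below (not the project's exact code):

  from bs4 import BeautifulSoup

  def html_to_text(path: str) -> str:
      """Strip markup from an .html file and return plain text for the RAG."""
      with open(path, encoding="utf-8") as handle:
          soup = BeautifulSoup(handle.read(), "html.parser")
      return soup.get_text(separator="\n", strip=True)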

OpenAI API support

03 Jun 17:40
0a17713


Adding OpenAI API support

  • Allow users to connect to any OpenAI-API-compatible server
    • Since this means we can use ChatGPT API calls, the additional ability to run different LLMs from different servers was also needed (ChatGPT as your heavy-weight LLM while still serving the embeddings/pre-processor LLM locally with Ollama); a sketch of that split follows below.
  • Metadata-tag documents that you import directly into the RAG (--import-pdf or --import-txt).
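
A minimal sketch of that split, using the openai Python client pointed at two different base URLs; the URLs, keys, and model names are placeholders:

  from openai import OpenAI

  # Heavy-weight chat model served by any OpenAI-compatible endpoint
  chat_client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

  # Embeddings/pre-processor model served locally via Ollama's OpenAI-compatible API
  local_client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

  response = chat_client.chat.completions.create(
      model="gpt-4o",  # placeholder model name
      messages=[{"role": "user", "content": "Hello"}],
  )
  embedding = local_client.embeddings.create(
      model="nomic-embed-text",  # placeholder local embedding model
      input="Hello",
  )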

Light-Mode, Keep-Alive

18 May 03:39
52d0b9c


For those using a bright, high-luminance terminal background, this update should help with that.

  • --light-mode will load an appropriate color wheel for all the pretties being printed to the screen. The System Prompt is also told about your preference, and some models actually seem to obey; your mileage may vary (they will/should attempt to use higher-contrast emoji that work well with light backgrounds).

  • Keep-Alive Hack: I found Ollama would unload my models even after explicitly telling it to do otherwise. While using another chat application, Enchanted (do check them out!), I noticed their app routinely "pings" the Ollama server, and their app garnered responses from LLMs noticeably faster. I wanted the same, so I added a threaded operation to ping Ollama. Poof: we can now enjoy up to a 10-second reduction in LLM first-token response times, depending on model size (Qwen 235B, reduced by ~10 s). A sketch of the idea follows below.
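
The keep-alive idea can be as simple as a background thread hitting a lightweight Ollama endpoint on an interval; this is a sketch, and the 30-second interval and choice of endpoint are assumptions:

  import threading
  import time

  import requests

  def keep_ollama_warm(base_url: str = "http://localhost:11434",
                       interval_s: int = 30) -> threading.Thread:
      """Ping the Ollama server periodically so models stay loaded."""
      def _ping_loop():
          while True:
              try:
                  requests.get(f"{base_url}/api/tags", timeout=5)
              except requests.RequestException:
                  pass  # server not reachable; try again next interval
              time.sleep(interval_s)

      thread = threading.Thread(target=_ping_loop, daemon=True)
      thread.start()
      return thread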

Context Management

15 May 22:25
2fa2d23


This release marks an improvement to contextual token management.

  • Staggered chat-history retrieval based on the user's --chat-history-session setting, e.g.

    --chat-history-session 5

    would retrieve the following history slices from all recorded history, given 20 responses in total to draw from:

    [9, 13, 16, 18, 19]

    On top of that, there is also a token cut-off at a rate of 550 * 5 = 2750 tokens (550 per --chat-history-session slot); see the sketch after this list for how the slice indices are chosen.

  • Specialized LLM prompt files. I am allowing separate prompt files to coexist based on the model in use. So far, there are special prompts for Gemma, Llama, and Qwen.

  • Improvements to field-filtering matching with the RAGs. Could still be better (#4).
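
A minimal sketch of the staggered index selection, reconstructed from the example above; the actual spacing rule may differ:

  def staggered_history_indices(total_responses: int, session_size: int) -> list[int]:
      """Pick `session_size` history indices, densest near the most recent turn.
      The gap grows by one for each step back in time:
      total=20, session=5 -> [9, 13, 16, 18, 19]."""
      indices: list[int] = []
      idx, gap = total_responses - 1, 1
      while len(indices) < session_size and idx >= 0:
          indices.append(idx)
          idx -= gap
          gap += 1
      return sorted(indices)

  print(staggered_history_indices(20, 5))  # [9, 13, 16, 18, 19]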

Pre-Processor

11 May 20:29
d65d98f


The pre-processor is finally working, churning through all your relevant RAG context, contextualizing it into RAG/Tags and doing all the things it was meant to do.

In a busy history session, it seems to take an additional 6 to 7 seconds to complete. That is pretty considerable, so I may take another look at just how many RAG matches we are retrieving. In any case, this release marks a history-aware chat experience drawing on long-past conversations. Tests continually surface long-past, nuanced content now made relevant by the user's query.

  • Pre-Processor finally works
  • Granting LLMs the ability to modify their own System Prompt
    • Adds interesting behavior! Most LLMs I've tested seem to shy away from modifying their own prompt (what's their worry?? 😆), but I do see it happen. The larger the model, the more I've seen this taken advantage of (Qwen 3 235B enjoys doing it when nudged to do so).
  • Tidying up the UI
    • provide an up-front time cost for pre-processing work, now that it is significant
  • Allow the user to specify chat history maximums as an argument
  • Allow for in-line commands while chatting:
    • So far the only one being \no-context:
    >>> \no-context hello!
    
    • which signals the system to send your query directly to the LLM, without any context gathering (see the sketch after this list)
  • Lots of Pylinting and consolidation work
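
In-line command parsing can be as simple as peeling a leading backslash token off the query; a sketch, with only \no-context existing so far and the function name being hypothetical:

  def parse_inline_command(user_input: str) -> tuple[str | None, str]:
      """Split a leading in-line command (e.g. '\\no-context') from the query."""
      if user_input.startswith("\\"):
          command, _, rest = user_input.partition(" ")
          return command, rest.strip()
      return None, user_input

  print(parse_inline_command(r"\no-context hello!"))  # ('\\no-context', 'hello!')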

Consolidation/Pylinting

09 May 15:33
685ddc7


Mostly a maintenance release.

  • Added date/time capabilities
  • Got a handle on the pre-processing step; it was calling an LLM multiple times when it didn't need to
  • Still learning how to produce a better system prompt...
  • Consolidated more methods into separate classes, as well as properly naming the Python modules (e.g. Lots of Pylinting)