Implementation of example storage and example selector classes [draft] #428
3LayerPerceptron wants to merge 49 commits into deeppavlov:dev from
Conversation
It appears this PR is a release PR (change its base from master if that is not the case).
Here's a release checklist:
- Update package version
- Update poetry.lock
- Change PR merge option
- Update template repo
- Search for objects to be deprecated
- Test parts not covered with pytest:
  - web_api tutorials
- Test integrations with external services (telegram; stats)
Also, while writing this I had some questions about the architecture of the solution. The main problem with the architecture is that in langchain

The following problems result from this:

The problem of example ownership

Possible fix

**Questions arise.** Suppose we decide that only one of the classes can be the example holder. If the holder is the selector from langchain, why do we need the wrapper at all? Why not simply use the overridden selector? If the holder is the wrapper, why do we inherit from the langchain base class at all?
RLKRo left a comment
And why do we need the additional layer of abstraction in the form of a wrapper?
It's desirable to provide both simple and advanced ways of adding examples.

```python
ExamplePrompt(
    examples=[
        {"input": "3+4", "output": "7"},
    ]
)
```

-- this is simple and easy to use in scripts.

```python
example_selector = SemanticSimilarityExampleSelector(vector_store=Chroma(...))
example_selector.add_examples(...)
ExamplePrompt(
    examples=example_selector
)
```

-- difficult to set up but more customizable.

Another layer of abstraction would allow the same API for both options.
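The unification the comment describes can be sketched as follows. This is a hypothetical illustration, not the actual chatsky code: `ExamplePrompt` and `StaticExampleSelector` here are minimal stand-ins, and the normalization logic is one possible way such a wrapper could present the same API for both options.

```python
class StaticExampleSelector:
    """Minimal stand-in for a langchain-style example selector."""

    def __init__(self, examples):
        self.examples = list(examples)

    def select_examples(self, input_variables):
        # A real selector would filter/rank by the input; this returns all.
        return self.examples


class ExamplePrompt:
    """Hypothetical wrapper accepting either a list or a selector."""

    def __init__(self, examples):
        # Normalize: wrap a plain list in the static selector so the rest
        # of the class only ever talks to one selector interface.
        if isinstance(examples, list):
            examples = StaticExampleSelector(examples)
        self.selector = examples

    def format(self, variables):
        return self.selector.select_examples(variables)


# Simple path: pass examples directly.
simple = ExamplePrompt(examples=[{"input": "3+4", "output": "7"}])
# Advanced path: pass a pre-configured selector.
advanced = ExamplePrompt(examples=StaticExampleSelector([{"input": "1+1", "output": "2"}]))
```

Both construction styles end up behind the same `format` call, which is the point of the extra abstraction layer.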
All coding is done and tests have been added.
I ran poe lint and refactored the code accordingly.
chatsky/llm/langchain_context.py (Outdated)

```python
prompt = Prompt.model_validate(element)
prompt_langchain_message = await message_to_langchain(await prompt.message(ctx), ctx, source="human")

prompt_messages = await element.to_langchain_messages(ctx)
```
Element is of type Any. It should be converted to Prompt first.
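The suggested fix can be sketched like this. The classes below are simplified stand-ins (the real `Prompt` is a pydantic model and `to_langchain_messages` is async and takes a context); only the validate-then-convert order is the point.

```python
class Prompt:
    """Stand-in for the real pydantic Prompt model."""

    def __init__(self, text):
        self.text = text

    @classmethod
    def model_validate(cls, element):
        # Accept an existing Prompt, or coerce raw data into one.
        if isinstance(element, cls):
            return element
        return cls(str(element))

    def to_langchain_messages(self):
        return [("human", self.text)]


def build_messages(element):
    # `element` arrives typed as Any; validate it into a Prompt first,
    # as the review suggests, then convert to langchain messages.
    prompt = Prompt.model_validate(element)
    return prompt.to_langchain_messages()
```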
chatsky/llm/prompt.py (Outdated)

```python
from chatsky.core import BaseResponse, AnyResponse, MessageInitTypes, Message, Context
from chatsky.llm._langchain_imports import HumanMessage, AIMessage, SystemMessage
from langchain_core.example_selectors.base import BaseExampleSelector
from chatsky.llm.example_selector import to_langchain_context
```
Imports should be grouped:
https://peps.python.org/pep-0008/#imports
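For reference, the imports from the diff regrouped per PEP 8: third-party packages first, then first-party chatsky modules, separated by a blank line (there are no standard-library imports in this hunk).

```python
from langchain_core.example_selectors.base import BaseExampleSelector

from chatsky.core import BaseResponse, AnyResponse, MessageInitTypes, Message, Context
from chatsky.llm._langchain_imports import HumanMessage, AIMessage, SystemMessage
from chatsky.llm.example_selector import to_langchain_context
```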
chatsky/llm/prompt.py (Outdated)

```python
    Uses Langchain's example selectors and prompt templates for few-shot learning.
    """
    template: Optional[str] = Field(None)
    examples: Optional[BaseExampleSelector] = None
```
This should be Union[StaticExampleSelector, BaseExampleSelector] to allow initializing the class as

```python
FewShotExamplePrompt(examples=[
    {"input": "", "output": ""},
])
```
```python
@pytest.fixture(name="ctx")
def ctx() -> Context:
```
What is the difference between this fixture and book_context?
They return almost the same thing.
tests/llm/test_prompt.py (Outdated)

```python
# --------------------
# Tests for BasePrompt
# --------------------
class TestBasePrompt:
```
tests/llm/test_prompt.py (Outdated)

```python
assert isinstance(result[0], SystemMessage)
assert "Query: 2 + 2" in result[0].content[0]["text"]
assert "Answer: 4" in result[0].content[0]["text"]
assert "Answer the following:" in result[0].content[0]["text"]
```
I think it would be better to compare result to the actual expected value, e.g.

```python
assert result == [SystemMessage("...")]
```
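Concretely, the substring checks above could collapse into one exact comparison. The `SystemMessage` below is a stand-in with value equality (langchain's message classes also compare by content), and the expected text is assembled from the fragments asserted in the test.

```python
class SystemMessage:
    """Stand-in for langchain's SystemMessage, with value equality."""

    def __init__(self, content):
        self.content = content

    def __eq__(self, other):
        return isinstance(other, SystemMessage) and self.content == other.content


expected_text = "Answer the following:\nQuery: 2 + 2\nAnswer: 4"
result = [SystemMessage(expected_text)]

# One exact comparison replaces the three substring checks.
assert result == [SystemMessage(expected_text)]
```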
chatsky/llm/prompt.py (Outdated)

```python
    prefix=self.prefix,
    suffix=self.suffix,
)
text = prompt_template.format(input=user_input, output="")
```
I don't think we should support including user_input in this prompt.
User request is already added to context history in get_langchain_context.
```python
assert called_args["vars"]["input"] == ""

@pytest.mark.asyncio
async def test_message_to_langchain_error(ctx, monkeypatch):
```
What is the purpose of this test?
```python
await prompt.to_langchain_messages(ctx)

@pytest.mark.asyncio
async def test_template_without_prefix_suffix(self, ctx, monkeypatch):
```
I don't see the point of this test.
Empty strings are allowed in langchain's FewShotPromptTemplate and our FewShotExamplePrompt does not do anything special with suffix or prefix.
It would make more sense to pass None as prefix and suffix instead of empty strings, since our model allows None but FewShotPromptTemplate accepts only strings.
There are a lot of tests, but many of them seem unnecessary, while some more important tests concerning UX are missing.
E.g. test what happens when prefix is None and template is not, or vice versa (there should be a warning or exception if prefix is used without template).
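The missing UX check the review asks for could look roughly like this. This is a hedged sketch using a plain class rather than the real pydantic model, and emitting a warning rather than an exception; either behavior would satisfy the comment.

```python
import warnings


class FewShotExamplePrompt:
    """Hypothetical stand-in for the real model, with the suggested check."""

    def __init__(self, template=None, prefix=None, suffix=None):
        if template is None and (prefix is not None or suffix is not None):
            # The review suggests a warning or exception for this case:
            # prefix/suffix are only used when building the template.
            warnings.warn("prefix/suffix have no effect without a template")
        self.template = template
        self.prefix = prefix
        self.suffix = suffix


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    FewShotExamplePrompt(prefix="Answer the following:")
```

A test for this behavior would then assert that exactly one warning is raised when `prefix` is passed without `template`.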
Description
Please describe here what changes are made and why.
Checklist
List here tasks to complete in order to mark this PR as ready for review.
To Consider
.ignore files, scripts (such as lint), distribution manifest (if files are added/deleted)