This is an app designed to take a company's strategy documents and identify the core strategy of the business across multiple perspectives, including:
- Financial
- Customer
- Internal
- Enabling
This is currently software that can be used internally and shown as a demo to clients. There are no plans to distribute it.
- Ensure that all dependencies are installed
- node_modules (npm install)
- python packages (see backend/requirements.txt)
- Create a file in the backend directory titled ".env" containing the following text:
OPENAI_API_KEY="your api key here"
- Open up two command prompt windows in the frontend sub-folder
- In one terminal, start the Python server by running: "npm run dev-backend"
- In the other terminal open the vite react app by running: "npm run dev"
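Before starting the servers, it can help to confirm that the ".env" file is in place. The following is a minimal, standard-library-only sketch of such a check. The function names `read_env_keys` and `has_api_key` are hypothetical and not part of this repository; the backend itself loads the file with python-dotenv.

```python
from pathlib import Path


def read_env_keys(env_path: Path) -> dict[str, str]:
    """Parse simple KEY="value" lines from a .env file (no variable expansion)."""
    values: dict[str, str] = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(env_path: Path) -> bool:
    """Return True if the file exists and defines a non-empty OPENAI_API_KEY."""
    if not env_path.is_file():
        return False
    return bool(read_env_keys(env_path).get("OPENAI_API_KEY"))
```

For example, `has_api_key(Path("backend/.env"))` run from the repository root should return True once the file has been created as described above.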
The more documents are uploaded, the better the system will be able to provide insights into your strategy.
- Docs
- Sheets
- Word
- Excel
- PowerPoint
- Audio and video files
- file types to be determined
- Transcripts
- text files
- additional file types to be determined
- https://dev.to/nagatodev/how-to-connect-flask-to-reactjs-1k8i
- https://dev.to/nagatodev/getting-started-with-flask-1kn1
- create the app: https://www.youtube.com/watch?v=vr-I2HIVmTw
- https://developer.okta.com/blog/2022/03/14/react-vite-number-converter
- need to figure out how to use TypeScript with all of this (maybe not, because it will be slower to learn, but I'm pretty sure it is the industry standard)
- install react with vite
- followed steps in this article
- played around a bit by editing App.tsx
- install react-markdown to show the results from the Tasks being run
- run "npm install react-markdown"
- I did not log all the work I did here up to 2023-07-27; after that day, I began logging the work done on the UI here
Set up the backend using Python. All of the same functionality could be implemented with Node.js, since LangChain is supported there too, which would allow the whole system to be JavaScript based. I'm not sure this makes much of a difference, though, and I have worked a lot more with Python than with Node.js.
- followed the steps in the article up to here
- python modules explicitly installed with pip install (versions and implicit dependencies found in backend/requirements.txt):
- Flask
- Flask-Cors
- langchain
- "unstructured[local-inference]"
- faiss-cpu
- python-dotenv
- tiktoken
- openai
- pydantic
- run "pip3 freeze" to see all the dependencies currently installed
- run "pip install -r requirements.txt" to install python modules
- had to modify some stuff in the vite.config.ts file
- base.py: this is where the flask app is run from
- load_dotenv environment variables so that ChatOpenAI can work properly
- origins for CORS (can be limited in future, for development all origins are allowed)
- tasks: dict for all the tasks of the current session
- File system, documents loaded, vector store, llm
- implement document loaders
- loading different document types
- loading websites
- A general purpose loader that uses the specific loader depending on the file type
- Document source: to organize loaded documents in the running python environment
- Document store: to take various document sources and split them into smaller chunks that can be inputted to a vector store
- NEED TO DO: in the future, implement the ability to add new documents or update existing ones (may already be supported by the vector store)
- vector store: uses OpenAI embeddings and the FAISS Vector Store to allow for retrieval of context based on natural language similarity search
- NEED TO DO: experiment with what text to use for the similarity search here. Should it be just the topic? the whole question?
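The general-purpose loader and the chunking step described in the bullets above can be sketched as follows. This is only an illustration using the standard library: the real backend relies on LangChain and unstructured loaders, and the names `LOADERS`, `load_document`, and `split_into_chunks`, as well as the default chunk sizes, are assumptions made for this example.

```python
from pathlib import Path
from typing import Callable


def load_text_file(path: Path) -> str:
    """Read a plain-text file (stands in for a real transcript loader)."""
    return path.read_text(encoding="utf-8")


# Hypothetical registry: file suffix -> loader function.
LOADERS: dict[str, Callable[[Path], str]] = {
    ".txt": load_text_file,
    ".md": load_text_file,
}


def load_document(path: Path) -> str:
    """Pick the loader from the file suffix; fail loudly on unknown types."""
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"No loader registered for {path.suffix!r}")
    return loader(path)


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Keeping the suffix-to-loader mapping in one dictionary means a new file type only needs one new entry, and the overlap between chunks reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.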
- implement document loaders
- api route to send the file structure to the UI
- implement path_to_dict to retrieve file structure of available documents
- api route to initialize a task
- requires: BaseTask, Task1Surfacing
- api route to stream the task results
- requires: BaseTask, Task1Surfacing
- api route to save the results (tasks are saved automatically after they finish running; this route only saves a copy of the readable text results from hidden_files/api_output to visible_files/ai_files so that the output can be used as input for future tasks)
- requires: BaseTask, Task1Surfacing
- NEED TO DO: consider updating the vector store and document store after running this so that the next tasks can use these results as input
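As an illustration of the file-structure route above, a function like path_to_dict could walk a directory and return a nested dictionary that is easy to serialize as JSON for the UI. This is a sketch under assumptions: the actual implementation in base.py may differ, and the keys "name", "type", and "children" are chosen here only for the example.

```python
import os


def path_to_dict(path: str) -> dict:
    """Describe a file or directory as a nested, JSON-serializable dict."""
    node = {"name": os.path.basename(os.path.normpath(path))}
    if os.path.isdir(path):
        node["type"] = "directory"
        # Sort entries so the UI receives a stable ordering.
        node["children"] = [
            path_to_dict(os.path.join(path, entry))
            for entry in sorted(os.listdir(path))
        ]
    else:
        node["type"] = "file"
    return node
```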
- task.py: BaseTask: abstract class that captures the essential functions and data that a task executed from this backend would have
- specify the basic data that will be stored about the task
- specify how results can be streamed as output (generate_results, generate_results_json_bytes)
- NEED TO DO: specify the format for BaseTask.generate_results() stream output with a dataclass or some reusable class
- specify how the task information is saved (save function)
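The shape of BaseTask described above might look roughly like the sketch below. The method names generate_results, generate_results_json_bytes, and save come from these notes; everything else (the constructor fields, the newline-delimited JSON format, the file layout, and the `EchoTask` subclass) is an assumption made for illustration and may not match task.py.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class BaseTask(ABC):
    """Minimal stand-in for the abstract task described above."""

    def __init__(self, task_id: str, name: str) -> None:
        self.task_id = task_id
        self.name = name
        self.results: list[dict] = []

    @abstractmethod
    def generate_results(self) -> Iterator[dict]:
        """Yield result updates one at a time as they become available."""

    def generate_results_json_bytes(self) -> Iterator[bytes]:
        """Encode each update as one line of UTF-8 JSON for streaming."""
        for update in self.generate_results():
            self.results.append(update)
            yield (json.dumps(update) + "\n").encode("utf-8")

    def save(self, directory: Path) -> Path:
        """Write the task metadata and collected results to a JSON file."""
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / f"{self.task_id}.json"
        payload = {"task_id": self.task_id, "name": self.name, "results": self.results}
        target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        return target


class EchoTask(BaseTask):
    """Hypothetical concrete task used only to exercise the base class."""

    def generate_results(self) -> Iterator[dict]:
        yield {"status": "running", "text": "first update"}
        yield {"status": "done", "text": "second update"}
```

Emitting one JSON object per line keeps the stream easy to parse incrementally on the frontend, which is one possible answer to the open question about the stream format.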
- Task1Surfacing: class that allows for executing Strategy Surfacing
- specify the inputs: vector store (for context), available data (for metadata), llm (for the natural language AI, using ChatOpenAI)
- performance management framework
- defined objective categories and sub-objectives to look for
- defined the prompt templates that will allow for the surfacing of the objectives - To be iterated and improved upon
- defined the OpenAI GPT function call schema for formatted objectives - To be iterated and improved upon
- there is a new langchain update that gives a better way to specify this format with a python class
- I also read that higher temperatures may lead to better formatting, however this may not be a good thing since the reason for higher temperature giving better formatting is likely due to hallucination to fill in non-existing data to fit the format. Need to find a way around this
- defined the prompt template sequence for each objective with asynchronous calls to the llm
- defined the structure to store the surfaced objectives from each category
- implemented abstractmethod generate_results
- risk management framework - NOT STARTED
- implement Assessment Task (Task 2)
- next step is to tweak what has been hard coded
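The asynchronous prompt sequence noted for Task1Surfacing can be illustrated with plain asyncio. This is a sketch only: `fake_llm` stands in for the real ChatOpenAI call, and `PROMPT_TEMPLATE` and `surface_objectives` are hypothetical names, since the actual prompt templates and function-call schema are not reproduced in these notes.

```python
import asyncio
from typing import Awaitable, Callable

# The four perspectives named at the top of this document.
CATEGORIES = ["Financial", "Customer", "Internal", "Enabling"]

# Hypothetical template; the real prompt templates are not shown here.
PROMPT_TEMPLATE = "List the {category} objectives found in this context:\n{context}"


async def fake_llm(prompt: str) -> str:
    """Stand-in for an async chat-model call; returns a canned reply."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"(reply to {len(prompt)} chars)"


async def surface_objectives(
    context: str,
    llm: Callable[[str], Awaitable[str]] = fake_llm,
) -> dict[str, str]:
    """Run one prompt per category concurrently and key results by category."""
    prompts = [PROMPT_TEMPLATE.format(category=c, context=context) for c in CATEGORIES]
    # gather() returns replies in the same order as the prompts were passed in.
    replies = await asyncio.gather(*(llm(p) for p in prompts))
    return dict(zip(CATEGORIES, replies))
```

Calling `asyncio.run(surface_objectives("some retrieved context"))` returns one reply per category. Running the categories concurrently means the total wait is closer to the slowest single call than to the sum of all calls.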
session saving:
- be able to create sessions with their associated files and history for all tasks that have been run
- be able to load a previous session on startup
- be able to open results from previously run tasks in the UI
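One possible way to approach the session-saving goals above is to keep each session as a small JSON file that records its files and task history. The `Session` dataclass, its fields, and the `save_session` / `load_session` helpers below are all hypothetical; nothing like this exists in the backend yet.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Session:
    """Hypothetical record of one working session."""
    session_id: str
    files: list[str] = field(default_factory=list)
    # One entry per task run, e.g. {"task": "Task1Surfacing", "result_file": "surfacing.md"}
    history: list[dict] = field(default_factory=list)


def save_session(session: Session, directory: Path) -> Path:
    """Write the session to <directory>/<session_id>.json and return the path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{session.session_id}.json"
    target.write_text(json.dumps(asdict(session), indent=2), encoding="utf-8")
    return target


def load_session(path: Path) -> Session:
    """Rebuild a Session from a file written by save_session."""
    return Session(**json.loads(path.read_text(encoding="utf-8")))
```

On startup, the backend could list the JSON files in the sessions directory and offer them to the UI, which would then open the result files referenced in the history.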
create an agent that does everything with: https://python.langchain.com/docs/modules/agents/how_to/custom_multi_action_agent