Basic Agentic Code Review Application

Summary

This is a basic code review application that uses LLM agents to review code. It has a command line interface that takes a folder to scan and a report file name. It uses an agentic collaboration approach, where the agents use a combination of static code analysis and AI to generate a report. The report is generated as a markdown file.

Although the codebase is small, it still sketches out some enterprise requirements, such as tenancy (the "owner" concept in the code), concurrency and rate limiting using message queues, data isolation (each run gets its own dedicated queues and db tables), and a basic data retention policy (all ephemeral data is flushed on report completion and at process exit).

Example report

Token Consumption Warning!

This application was developed using a local Ollama instance. With paid LLM services, it might consume a non-trivial number of tokens and incur non-trivial costs.

Requirements

Developed on macOS (Intel Mac). It should work on other *nix systems without modifications.

Installation and services start

mv .env_example .env # Edit and update to match your local setup

./scripts/setup.sh # Check for Docker and Ollama, and pre-pull Docker images and Ollama models

npm install # install dependencies

./scripts/startServices.sh # start docker service(s) as background processes

Usage

The frontend is a command line interface. Pass in the folder to scan and, optionally, a report file name. Note: running ./scripts/startServices.sh first is a prerequisite. Expect runs to take several minutes, or even hours for large code bases.

./reviewCodeBase.sh <folder_to_analyze> [report_file.md]

./reviewCodeBase.sh ./src/db /tmp/report.md # Example

Architecture and approach

This is a multi-agent system coordinated by an orchestrator agent. The orchestrator agent is responsible for creating tasks and assigning them to agents that each specialize in one specific task (complexity analysis, for example). The agents are responsible for completing tasks and reporting back to the orchestrator. The orchestrator is also responsible for aggregating the results and tasking a summarizing agent with creating a report for the code base.

Static code analysis is used to enrich the context for the complexity analysis.
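
To make the flow concrete, here is a minimal TypeScript sketch of the orchestration loop described above. The names (ReviewTask, Agent, reviewFolder) are hypothetical, not the actual classes; the real implementation lives in src/agents/OrchestratorAgent.ts.

```ts
// Illustrative sketch only; names are placeholders, not the project's classes.

interface ReviewTask {
  file: string;
  staticAnalysis?: string; // context enrichment from the static analysis step
  review?: string;
}

interface Agent {
  run(task: ReviewTask): Promise<ReviewTask>;
}

class Orchestrator {
  constructor(
    private complexityAgent: Agent, // specialist: complexity analysis
    private reviewAgent: Agent,     // specialist: LLM code review
    private reporter: { summarize(tasks: ReviewTask[]): Promise<string> },
  ) {}

  async reviewFolder(files: string[]): Promise<string> {
    const completed: ReviewTask[] = [];
    for (const file of files) {
      // One task per file, routed through the specialist agents in turn.
      let task: ReviewTask = { file };
      task = await this.complexityAgent.run(task);
      task = await this.reviewAgent.run(task);
      completed.push(task);
    }
    // Aggregate the results and task the summarizing agent with the report.
    return this.reporter.summarize(completed);
  }
}
```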

Models used

The models are set in the .env file. The best results come from a mix of models trained on code and on summarization.
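
As a rough illustration of how this plugs in (the actual variable names are defined in .env_example and may differ, and the model names below are only examples of a code-trained and a general model):

```ts
// Hypothetical variable names; check .env_example for the real ones.
const codeReviewModel = process.env.REVIEW_MODEL ?? "qwen2.5-coder";
const summaryModel = process.env.SUMMARY_MODEL ?? "llama3.1";
```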

Overall agentic flow

Application Architecture

The application architecture is fairly straightforward. Each run creates a dedicated queue and db tables for the batch and sets up workers with concurrency and rate limiting so the system is not overwhelmed. The queue workers pick up jobs from the queues and invoke the corresponding agents. All completed reviews are stored in the database, and the reporter agent summarizes them from the database contents. A good place to start reading the code is src/agents/OrchestratorAgent.ts.
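
The worker setup can be pictured roughly with the BullMQ sketch below. Queue names, the numbers, and the runReviewAgent helper are placeholders for illustration, not the project's actual values:

```ts
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };
const runId = "run-1234"; // placeholder: each run gets its own queue and tables

// Hypothetical stand-in for invoking the actual review agent.
async function runReviewAgent(data: { file: string }): Promise<string> {
  return `review of ${data.file}`;
}

// Dedicated queue for this run.
const reviewQueue = new Queue(`review-${runId}`, { connection });

// Worker with concurrency and rate limiting so the local Ollama instance
// is not overwhelmed. The numbers are placeholders, not the project defaults.
const worker = new Worker(
  `review-${runId}`,
  async (job) => runReviewAgent(job.data),
  {
    connection,
    concurrency: 2,                        // at most two jobs in flight
    limiter: { max: 5, duration: 60_000 }, // at most five jobs per minute
  },
);

// Enqueue one job per file.
await reviewQueue.add("review-file", { file: "src/db/index.ts" });
```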

Application Infrastructure

The "infrastructure" of the application is on purpose a little over engineered, as I wanted to explore and sketch out some enterprise requirement aspects. The (very) long Ollama response times also calls for some robustness using message queues, as calls regularly timeout.

Learnings and ideas for future work

This is a useful testbed for trying out agent collaboration and experimenting with prompts, different models and model settings (temperature, top_k, repeat_penalty, ...). It does so without the overhead of learning an agent framework or potentially a new programming language (Python). The codebase is also very small (currently fewer than 20 source files), so it is easy to refactor and understand. The domain (code review) is familiar to any developer, which also makes the results easy to evaluate.

Ideas:

  • Add a feedback/QA loop to improve quality when agents produce poor quality (create a QualityAssuranceAgent to "review the review" and send back for rework before setting task state COMPLETED)
  • Enrich the context for the main review by supplying a description of the purpose of the codebase (ex. "This is a transactional system for invoice processing. [...]")
  • Create subject matter expert agents that focus on things like Frontend/React or backend enterprise code and dispatch tasks accordingly. The static code analysis stage already bins files by type, which is a start.
  • Experimenting with domain expertise, for example eCommerce or security, could be useful too.
  • Improve report quality with graphs and more details about the most problematic files.
  • Generate three reports from the database and have a final agent pick the best one.
  • Add all per-file reviews to a vector database and create a chatbot that can answer questions about the code and the reviews, and suggest improvements (RAG style)

Next level/new direction

  • Create Jira tickets for most problematic files and supply the review text and improvement suggestions in the description
  • Git awareness and Github integration: Create a Github action that runs the review on every PR and posts the report as a comment.

Development

This is a Node.js project. It is written in TypeScript and uses BullMQ for job queueing and Redis for job persistence. The per-file results are stored in a sqlite database. The final report is generated from the database and output to either a file or to standard out.
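
A minimal sketch of the per-file storage and report read-back, assuming better-sqlite3 as the driver (the project may use a different sqlite client, and the per-run table name is just a placeholder):

```ts
import Database from "better-sqlite3"; // assumption: the actual sqlite client may differ

const db = new Database("data/sqlite/reports.db");

// Placeholder per-run table; each run gets its own dedicated tables.
db.exec(`CREATE TABLE IF NOT EXISTS reviews_run_1234 (
  file TEXT PRIMARY KEY,
  review TEXT NOT NULL
)`);

// Store one per-file review result.
db.prepare("INSERT OR REPLACE INTO reviews_run_1234 (file, review) VALUES (?, ?)")
  .run("src/db/index.ts", "Example review text");

// The reporter agent reads everything back and summarizes it into the markdown report.
const rows = db.prepare("SELECT file, review FROM reviews_run_1234").all();
```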

To develop individual agents and tune prompts, work in src/devhelpers/evalAndTune.ts. It can easily be run inside an IDE and saves a lot of time.

Commands

$ ./scripts/startServices.sh # Start background services (redis and BullMQ monitor)

$ npm run dev # start in dev mode (colorful debug level logging with hard-coded test folder './src/db')

$ npm run start # alias for './reviewCodeBase.sh ./src/db'

$ npm run test # run tests

$ npm run clean && npm run build # clean and build the project

$ ./scripts/stopServices.sh # stop docker services

$ ./scripts/stopServices.sh -p # stop docker services and delete containers and data volumes

Development tips

* Prevent your Mac from sleeping

Jobs take a long time to run and your Mac might decide to go to bed.

$ caffeinate

* Recover from any inconsistent state during development (full data reset)

The code has data cleanup built in, including at process exit. Inevitably, you will still end up in an inconsistent state when experimenting. To recover, stop the services with the prune option ('-p') and delete the sqlite database file. This removes all data and containers, and the next start will be a clean start.

$ ./scripts/stopServices.sh -p

$ rm -rf data/sqlite/reports.db

* Monitor BullMQ

The queues can be monitored and inspected using Bull Monitor. Useful for checking progress on long-running jobs (all jobs on my old machine...). http://localhost:3000/queues/

* Detecting when the prompt is too large for a selected model

When experimenting with prompts and models, you may end up with prompts that are too large for the selected model. This typically happens when optimizing for speed by using small models.

Assuming a local Ollama installation, monitor the Ollama logs

$ tail -f ~/.ollama/logs/server.log

If the chosen model cannot accommodate the prompt size, you will see something like this

time=2025-01-04T16:00:37.950+01:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=2084 keep=5 new=2048

* Use a command-line markdown reader (glow)

Install glow and use it to read the reports. It helps speed up evaluation a lot.

$ ./reviewCodeBase.sh ./src/db /tmp/report.md

$ glow /tmp/report.md

References

Screenshots

Since this is a CLI tool, there is not much to show. Anyway, here are a few screenshots.

An example run.

A typical report.

Monitoring dashboard provided by Bull Master.

LICENSE

GPL v3. See LICENSE file for more information.
