diff --git a/README.md b/README.md
index 5d0c5c43..78c065c6 100644
--- a/README.md
+++ b/README.md
@@ -389,30 +389,8 @@ print(f"Syncode augmented LLM output:\n{output}")
 ```
 
-## How Does **SynCode** Compare to Other Constrained Decoders?
-| Tool | Regex | CFG* | Pre-Computed* | GPL* |
-|---------------------------------------------------- |-----------|-----------|:-------------:|------|
-| [`LMQL`](https://github.com/eth-sri/lmql) | ✅ | ❌ | ❌ | ❌ |
-| [`GUIDANCE`](https://github.com/guidance-ai/guidance) | ✅ | ✅ | ❌ | ❌ |
-| [`OUTLINES`](https://github.com/outlines-dev/outlines) | ✅ | ✅ | ✅ | ❌ |
-| [`PICARD`](https://github.com/ServiceNow/picard) | ✅ | ✅ | ❌ | ❌ |
-| [`SYNCHROMESH`](https://arxiv.org/abs/2201.11227) | ✅ | ✅ | ❌ | ❌ |
-| [`LLAMA.CPP`](https://github.com/ggerganov/llama.cpp) | ✅ | ✅ | ❌ | ❌ |
-| [`GCD`](https://arxiv.org/abs/2305.13971) | ✅ | ✅ | ❌ | ❌ |
-| **SynCode** | **✅** | **✅** | **✅** | **✅** |
----
-
-**CFG***: Guide generation with a Context Free Grammar (CFG)
-
-**Pre-Computed***: Precompute masks over the vocabulary to significantly improve generation speed
-
-**GPL***: Support general-purpose programming languages, which involve non-context-free fragments, such as indentation in Python and end-of-scope markers in Golang.
-
-[test-img]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml/badge.svg
-[tests]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml
-
 ## 📜 Citation

@@ -437,6 +415,30 @@ print(f"Syncode augmented LLM output:\n{output}")
 In the SynCode workflow, the LLM takes partial code _Ck_ and generates a distribution for the next token _tk+1_. The incremental parser processes _Ck_ to generate accept sequences _A_, the sequences of terminals that can follow partial code called accept sequences. Simultaneously, the incremental parser computes a remainder _r_ from the partial code, representing the suffix that may change its terminal type in subsequent generations. The backbone of SynCode is the offline construction of a DFA mask store, a lookup table derived from regular expressions representing the terminals of the language grammar. The DFA mask store facilitates efficient traversal of DFA states, enabling the retrieval of masks mapped to each state and accept sequence. SynCode walks over the DFA using the remainder and uses the mask store to compute the mask specific to each accept sequence. By unifying masks for each accept sequence SynCode gets the set of syntactically valid tokens. The LLM iteratively generates a token _tk+1_ using the distribution and the mask, appending it to _Ck_ to create the updated code _Ck+1_. The process continues until the LLM returns the final code _Cn_ based on the defined stop condition.
 
+## How Does **SynCode** Compare to Other Constrained Decoders?
+
+
+| Tool | Regex | CFG* | Pre-Computed* | GPL* |
+|---------------------------------------------------- |-----------|-----------|:-------------:|------|
+| [`LMQL`](https://github.com/eth-sri/lmql) | ✅ | ❌ | ❌ | ❌ |
+| [`GUIDANCE`](https://github.com/guidance-ai/guidance) | ✅ | ✅ | ❌ | ❌ |
+| [`OUTLINES`](https://github.com/outlines-dev/outlines) | ✅ | ✅ | ✅ | ❌ |
+| [`PICARD`](https://github.com/ServiceNow/picard) | ✅ | ✅ | ❌ | ❌ |
+| [`SYNCHROMESH`](https://arxiv.org/abs/2201.11227) | ✅ | ✅ | ❌ | ❌ |
+| [`LLAMA.CPP`](https://github.com/ggerganov/llama.cpp) | ✅ | ✅ | ❌ | ❌ |
+| [`GCD`](https://arxiv.org/abs/2305.13971) | ✅ | ✅ | ❌ | ❌ |
+| **SynCode** | **✅** | **✅** | **✅** | **✅** |
+---
+
+**CFG***: Guide generation with a Context Free Grammar (CFG)
+
+**Pre-Computed***: Precompute masks over the vocabulary to significantly improve generation speed
+
+**GPL***: Support general-purpose programming languages, which involve non-context-free fragments, such as indentation in Python and end-of-scope markers in Golang.
+
+[test-img]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml/badge.svg
+[tests]: https://github.com/shubhamugare/llm-cfg/actions/workflows/run_tests.yml
+
 ## Contact
 For questions, please contact [Shubham Ugare](mailto:shubhamdugare@gmail.com).