Merged
109 changes: 109 additions & 0 deletions .github/workflows/publish_to_pypi.yml
@@ -0,0 +1,109 @@
name: Build and publish to PyPI

on:
  release:
    types: [published]
  workflow_dispatch: # Allows you to run this workflow manually from the Actions tab

jobs:
  build_wheels:
    name: Build wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
        # exclude:
        #   # Add any exclusions if certain OS/Python combinations are problematic
        #   - os: macos-latest
        #     python-version: '3.12'

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Gets all history for proper versioning

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build wheel setuptools

      - name: Build wheels
        run: |
          python -m build --wheel --outdir dist/

      - name: Upload wheels
        uses: actions/upload-artifact@v3
        with:
          name: wheels-${{ matrix.os }}-${{ matrix.python-version }}
          path: dist/*.whl

  build_sdist:
    name: Build source distribution
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Gets all history for proper versioning

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine

      - name: Build sdist
        run: |
          python -m build --sdist --outdir dist/

      - name: Check metadata
        run: |
          twine check dist/*.tar.gz

      - name: Upload sdist
        uses: actions/upload-artifact@v3
        with:
          name: sdist
          path: dist/*.tar.gz

  publish:
    name: Publish to PyPI
    needs: [build_wheels, build_sdist]
    runs-on: ubuntu-latest
    # Only publish on release
    if: github.event_name == 'release' && github.event.action == 'published'
    environment:
      name: pypi
      url: https://pypi.org/project/syncode/
    permissions:
      id-token: write # For PyPI trusted publishing

    steps:
      - name: Download all artifacts
        uses: actions/download-artifact@v3
        with:
          path: dist

      - name: Flatten dist directory
        run: |
          mkdir -p flat_dist
          find dist -type f \( -name "*.whl" -o -name "*.tar.gz" \) -exec cp {} flat_dist \; # parentheses make -exec apply to both patterns
          ls -la flat_dist

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: flat_dist
          verbose: true
          # skip-existing: true # Uncomment if you want to skip existing versions
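The "Flatten dist directory" step is there because `actions/download-artifact` behaves a particular way when it gets only a `path` and no artifact name. It downloads every artifact into its own subdirectory of `dist/`. The publish step, however, is pointed at a single flat directory (`packages-dir: flat_dist`). `find` gives its implicit AND a higher precedence than `-o`. For `-exec` to apply to both the wheel pattern and the sdist pattern, the two `-name` tests must therefore be grouped with `\( ... \)`.

The sketch below is a minimal Python version of the intended behavior. The `flatten_dist` helper and the artifact and file names are hypothetical and are not part of this PR.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import List


def flatten_dist(src: Path, dest: Path) -> List[str]:
    """Hypothetical helper: copy every wheel and sdist under `src` into `dest`.

    Returns the sorted names of the copied files.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = set()
    # rglob("*") walks every artifact subdirectory (e.g. dist/sdist/).
    for path in sorted(src.rglob("*")):
        # Test both patterns together, like the grouped
        # \( -name "*.whl" -o -name "*.tar.gz" \) expression in find.
        if path.is_file() and path.name.endswith((".whl", ".tar.gz")):
            # If two artifacts contain a file with the same name, the later
            # copy overwrites the earlier one, as repeated `cp` calls would.
            shutil.copy2(path, dest / path.name)
            copied.add(path.name)
    return sorted(copied)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Simulated layout after downloading two artifacts (names are made up).
        wheel_dir = root / "dist" / "wheels-ubuntu-latest-3.12"
        sdist_dir = root / "dist" / "sdist"
        wheel_dir.mkdir(parents=True)
        sdist_dir.mkdir(parents=True)
        (wheel_dir / "syncode-0.4.1-py3-none-any.whl").write_text("")
        (sdist_dir / "syncode-0.4.1.tar.gz").write_text("")
        print(flatten_dist(root / "dist", root / "flat_dist"))
        # ['syncode-0.4.1-py3-none-any.whl', 'syncode-0.4.1.tar.gz']
```

Files that match neither pattern, such as logs or metadata, are left behind. This mirrors what the grouped `find` expression copies.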
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,3 +8,5 @@ tmp*
 cache/
 .ipynb_checkpoints/
 *.prof
+dist/
+syncode.egg-info/
37 changes: 24 additions & 13 deletions README.md
@@ -48,27 +48,30 @@ Define your own grammar using simple EBNF syntax. Check out our [notebooks direc
 | 🎲 Sample with any existing decoding strategy (eg. greedy, beam search, nucleus sampling) |
 
 
-## 📖 More About **SynCode**
+## 🚀 Quick Start
+### Python Installation and Usage Instructions
 
-### How **SynCode** works?
+You can install SynCode via PyPI:
 
-<img width="750" alt="Screenshot 2024-03-21 at 2 22 15 AM" src="https://github.com/uiuc-focal-lab/syncode/assets/14147610/d9d73072-3c9b-47d4-a941-69d5cf8fb1bf">
+```bash
+pip install syncode
+```
 
-In the SynCode workflow, the LLM takes partial code _C<sub>k</sub>_ and generates a distribution for the next token _t<sub>k+1</sub>_. The incremental parser processes _C<sub>k</sub>_ to generate accept sequences _A_, the sequences of terminals that can follow partial code called accept sequences. Simultaneously, the incremental parser computes a remainder _r_ from the partial code, representing the suffix that may change its terminal type in subsequent generations. The backbone of SynCode is the offline construction of a DFA mask store, a lookup table derived from regular expressions representing the terminals of the language grammar. The DFA mask store facilitates efficient traversal of DFA states, enabling the retrieval of masks mapped to each state and accept sequence. SynCode walks over the DFA using the remainder and uses the mask store to compute the mask specific to each accept sequence. By unifying masks for each accept sequence SynCode gets the set of syntactically valid tokens. The LLM iteratively generates a token _t<sub>k+1</sub>_ using the distribution and the mask, appending it to _C<sub>k</sub>_ to create the updated code _C<sub>k+1</sub>_. The process continues until the LLM returns the final code _C<sub>n</sub>_ based on the defined stop condition.
+Alternatively, you can install the latest development version directly from GitHub:
 
-## 🚀 Quick Start
-### Python Installation and Usage Instructions
-Simply install SynCode via PyPi using the following command:
-``` bash
+```bash
 pip install git+https://github.com/uiuc-focal-lab/syncode.git
 ```
 
-Note: SynCode depends on HuggingFace [transformers](https://github.com/huggingface/transformers):
-| SynCode version | Recommended transformers version |
-| -------------- | -------------------------------- |
-| `v0.1.4` (latest) | `v4.44.0` |
-| `v0.1.2` | `v4.42.0` |
+#### Version Compatibility
 
+SynCode depends on HuggingFace [transformers](https://github.com/huggingface/transformers):
+
+| SynCode version | Required transformers version | Python version |
+| -------------- | ----------------------------- | -------------- |
+| `v0.4.1` (latest) | `v4.44.0` | 3.6 - 3.12 |
+
+**Note:** Python 3.13 is not currently supported due to dependency constraints.
 
 ### Usage option 1:
 SynCode can be used as a simple logit processor with HuggingFace [transformers](https://github.com/huggingface/transformers) library interface. Check this [notebook](./notebooks/example_logits_processor.ipynb) for example.
@@ -426,6 +429,14 @@ print(f"Syncode augmented LLM output:\n{output}")
 }
 ```
 
+## 📖 More About **SynCode**
+
+### How **SynCode** works?
+
+<img width="750" alt="Screenshot 2024-03-21 at 2 22 15 AM" src="https://github.com/uiuc-focal-lab/syncode/assets/14147610/d9d73072-3c9b-47d4-a941-69d5cf8fb1bf">
+
+In the SynCode workflow, the LLM takes the partial code _C<sub>k</sub>_ and generates a distribution over the next token _t<sub>k+1</sub>_. The incremental parser processes _C<sub>k</sub>_ to compute the accept sequences _A_: the sequences of terminals that can follow the partial code. At the same time, the parser computes a remainder _r_, the suffix of the partial code whose terminal type may still change in subsequent generation steps. The backbone of SynCode is a DFA mask store, a lookup table built offline from the regular expressions that define the terminals of the language grammar. The mask store enables efficient traversal of DFA states and retrieval of the masks mapped to each DFA state and accept sequence. SynCode walks the DFA over the remainder and uses the mask store to obtain the mask for each accept sequence. The union of these masks is the set of syntactically valid tokens. The LLM then generates the token _t<sub>k+1</sub>_ from the distribution and the mask and appends it to _C<sub>k</sub>_, producing the updated code _C<sub>k+1</sub>_. This process repeats until the stop condition is met, yielding the final code _C<sub>n</sub>_.
+
 ## Contact
 For questions, please contact [Shubham Ugare](mailto:shubhamdugare@gmail.com).
 
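The "How SynCode works?" paragraph can be illustrated with a toy, self-contained Python sketch. Everything in it is hypothetical and is not SynCode's API:

- two hand-written terminal DFAs (`NUMBER` for `[0-9]+` and `PLUS` for `+`);
- a hand-supplied remainder and accept sequence standing in for the incremental parser's output;
- helper names such as `matches`, `grammar_mask` and `masked_greedy`.

Unlike SynCode, the sketch recomputes each mask by walking the DFAs on the fly instead of looking it up in a precomputed DFA mask store. It also uses a simplified acceptance rule, assumed for illustration. A token is allowed if the remainder followed by the token is still a live prefix of the accept sequence, or if a prefix of that string already matches the whole sequence.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

DIGITS = "0123456789"


@dataclass(frozen=True)
class TerminalDFA:
    """A tiny DFA for one terminal; `step` returns None for the dead state."""
    start: int
    accepting: FrozenSet[int]
    step: Callable[[int, str], Optional[int]]


# Hypothetical toy terminals: NUMBER = [0-9]+ and PLUS = "+".
TERMINALS: Dict[str, TerminalDFA] = {
    "NUMBER": TerminalDFA(0, frozenset({1}), lambda q, c: 1 if c in DIGITS else None),
    "PLUS": TerminalDFA(0, frozenset({1}), lambda q, c: 1 if q == 0 and c == "+" else None),
}

Config = Tuple[int, int]  # (position in the accept sequence, DFA state)


def _close(seq: Sequence[str], configs: Set[Config]) -> Set[Config]:
    """Let a terminal that is in an accepting state hand over to the next one."""
    out = set(configs)
    frontier = list(configs)
    while frontier:
        i, q = frontier.pop()
        if q in TERMINALS[seq[i]].accepting and i + 1 < len(seq):
            nxt = (i + 1, TERMINALS[seq[i + 1]].start)
            if nxt not in out:
                out.add(nxt)
                frontier.append(nxt)
    return out


def matches(seq: Sequence[str], text: str) -> bool:
    """Simplified rule: `text` is a live prefix of L(seq[0] seq[1] ...), or a
    prefix of `text` already matches the whole sequence (the rest is unchecked)."""
    configs = _close(seq, {(0, TERMINALS[seq[0]].start)})
    for ch in text:
        if any(i == len(seq) - 1 and q in TERMINALS[seq[i]].accepting for i, q in configs):
            return True
        nxt: Set[Config] = set()
        for i, q in configs:
            q2 = TERMINALS[seq[i]].step(q, ch)
            if q2 is not None:
                nxt.add((i, q2))
        configs = _close(seq, nxt)
        if not configs:
            return False
    return True


def grammar_mask(remainder: str, accept_sequences: List[Tuple[str, ...]],
                 vocab: List[str]) -> List[bool]:
    """Union of the per-accept-sequence masks over the vocabulary."""
    return [any(matches(seq, remainder + tok) for seq in accept_sequences) for tok in vocab]


def masked_greedy(logits: List[float], mask: List[bool]) -> int:
    """Greedy decoding restricted to the tokens allowed by the mask."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    return max(range(len(masked)), key=lambda j: masked[j])


if __name__ == "__main__":
    vocab = ["3", "+", "4+", "+5", ")", "x"]
    logits = [0.1, 0.5, 0.2, 0.3, 2.0, 1.5]
    # Hand-supplied stand-ins for the parser output on the partial code "1+2".
    remainder, accept_sequences = "2", [("NUMBER", "PLUS")]
    mask = grammar_mask(remainder, accept_sequences, vocab)
    print(mask)                                # [True, True, True, True, False, False]
    print(vocab[masked_greedy(logits, mask)])  # +
```

In this example the unconstrained argmax would pick `")"`, while the masked step picks `"+"`. The token `"+5"` is also allowed because the simplified rule stops checking once the whole accept sequence has matched. Longer accept sequences would make the mask tighter.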
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -4,7 +4,8 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "syncode"
-version = "0.4.0"
+version = "0.4.1"
+requires-python = ">=3.6,<3.13"
 description = "Grammar-guided code generation tool"
 readme = "README.md"
 authors = [
2 changes: 1 addition & 1 deletion requirements.txt
@@ -3,6 +3,6 @@ interegular
 regex==2023.8.8
 torch
 tqdm
-transformers==4.44.0
+transformers==4.44.0; python_version < "3.13"
 datasets
 jsonschema
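This PR introduces two version bounds: `requires-python = ">=3.6,<3.13"` in `pyproject.toml`, and the `python_version < "3.13"` environment marker on the `transformers` pin. Both compare versions numerically rather than as plain strings. The minimal stdlib-only sketch below assumes that only `major.minor` matters. The `parse_version` and `supported` helpers are hypothetical simplifications. Real installers apply full PEP 440 specifier rules, for example through the `packaging` library.

```python
from __future__ import annotations

from typing import Tuple

# Bounds taken from requires-python = ">=3.6,<3.13"; the environment marker
# python_version < "3.13" uses the same upper bound.
LOWER: Tuple[int, int] = (3, 6)   # inclusive
UPPER: Tuple[int, int] = (3, 13)  # exclusive


def parse_version(text: str) -> Tuple[int, int]:
    """Hypothetical helper: keep only (major, minor), e.g. "3.12.4" -> (3, 12)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def supported(python_version: str) -> bool:
    """Simplified check of the requires-python range (no pre-release handling)."""
    return LOWER <= parse_version(python_version) < UPPER


if __name__ == "__main__":
    for v in ["3.5", "3.6", "3.9", "3.12.4", "3.13"]:
        print(v, supported(v))
    # Plain string comparison gets this case wrong: "9" sorts after "1",
    # so this prints False even though 3.9 is older than 3.13.
    print("3.9" < "3.13")
```

Under these bounds, 3.9 and 3.12.x fall inside the supported range and 3.13 falls outside it, even though the string `"3.9"` sorts after `"3.13"`.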
3 changes: 2 additions & 1 deletion setup.py
@@ -17,7 +17,8 @@
 
 setuptools.setup(
     name="syncode",
-    version="0.4.0",
+    version="0.4.1",
+    python_requires=">=3.6,<3.13",
     author="Shubham Ugare",
     author_email="shubhamugare@gmail.com",
     description="This package provides the tool for grammar augmented LLM generation.",