Merged
43 changes: 15 additions & 28 deletions .github/workflows/ci.yml
```diff
@@ -17,37 +17,24 @@ jobs:
     steps:
       - name: Checkout code
         uses: actions/checkout@v4

-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
-        with:
-          python-version: ${{ matrix.python-version }}
-
-      - name: Install Poetry
-        uses: snok/install-poetry@v1.3.4
-        with:
-          version: latest
-          virtualenvs-create: true
-          virtualenvs-in-project: true
-
-      - name: Load cached venv
-        id: cached-poetry-dependencies
+      - name: Install uv
+        uses: astral-sh/setup-uv@v1
+
+      - name: Ensure Python ${{ matrix.python-version }}
+        run: uv python install ${{ matrix.python-version }}
+
+      - name: Cache virtual environment
+        id: cache-uv-venv
         uses: actions/cache@v4
         with:
           path: .venv
-          key: venv-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/poetry.lock') }}
-
-      - name: Install dependencies
-        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
-        run: poetry install --no-interaction --no-root --extras "all"
-
-      - name: Install project
-        run: poetry install --no-interaction --extras "all"
-
-      - name: Run tests
-        run: poetry run pytest tests/ -v
+          key: uv-venv-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('uv.lock') }}
+
+      - name: Sync dependencies
+        run: uv sync --all-extras --python ${{ matrix.python-version }}

       - name: Run tests with coverage
         run: |
-          poetry run pytest tests/ --cov=pyspark_datasources --cov-report=xml --cov-report=term-missing
+          uv run pytest tests/ --cov=pyspark_datasources --cov-report=xml --cov-report=term-missing
```
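The new cache key ties the cached `.venv` to the lockfile: any edit to `uv.lock` changes the `hashFiles` digest, so the stale environment is discarded and `uv sync` rebuilds it. The mechanism can be sketched in plain Python (the hashing details and inputs here are illustrative, not what GitHub Actions does internally):

```python
import hashlib

def cache_key(os_name: str, python_version: str, lock_contents: bytes) -> str:
    # Same shape as: uv-venv-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('uv.lock') }}
    digest = hashlib.sha256(lock_contents).hexdigest()[:16]
    return f"uv-venv-{os_name}-{python_version}-{digest}"

# Identical lockfiles reuse the cached .venv; any change invalidates it.
same = cache_key("Linux", "3.12", b"lock v1") == cache_key("Linux", "3.12", b"lock v1")
changed = cache_key("Linux", "3.12", b"lock v1") != cache_key("Linux", "3.12", b"lock v2")
print(same, changed)  # True True
```

Note that the key also embeds `runner.os` and the matrix Python version, so each OS/interpreter combination keeps its own cache entry.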

43 changes: 15 additions & 28 deletions .github/workflows/docs.yml
```diff
@@ -29,38 +29,25 @@ jobs:
     steps:
       - name: Checkout
        uses: actions/checkout@v4

-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Install Poetry
-        run: |
-          curl -sSL https://install.python-poetry.org | python - -y
-          echo "$HOME/.local/bin" >> $GITHUB_PATH
-
-      - name: Configure Poetry
-        run: |
-          poetry config virtualenvs.create true
-          poetry config virtualenvs.in-project true
-
-      - name: Load cached venv
-        id: cached-poetry-dependencies
+      - name: Install uv
+        uses: astral-sh/setup-uv@v1
+
+      - name: Ensure Python 3.11
+        run: uv python install 3.11
+
+      - name: Cache virtual environment
+        id: cache-uv-venv
         uses: actions/cache@v4
         with:
           path: .venv
-          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
-
-      - name: Install dependencies
-        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
-        run: poetry install --no-interaction --no-root
-
-      - name: Install project
-        run: poetry install --no-interaction
+          key: uv-venv-${{ runner.os }}-3.11-${{ hashFiles('uv.lock') }}
+
+      - name: Sync dependencies
+        run: uv sync --python 3.11 --group dev

       - name: Build MkDocs
-        run: poetry run mkdocs build
+        run: uv run mkdocs build

       - name: Setup Pages
         uses: actions/configure-pages@v5
```
49 changes: 25 additions & 24 deletions contributing/DEVELOPMENT.md
```diff
@@ -4,7 +4,7 @@

 ### Prerequisites
 - Python 3.9-3.12
-- Poetry for dependency management
+- [uv](https://docs.astral.sh/uv/) for dependency management
 - Apache Spark 4.0+ (or Databricks Runtime 15.4 LTS+)

 ### Installation
```
````diff
@@ -14,14 +14,14 @@

 git clone https://github.com/allisonwang-db/pyspark-data-sources.git
 cd pyspark-data-sources

-# Install dependencies
-poetry install
+# Install dependencies (creates .venv/ automatically)
+uv sync

 # Install with all optional dependencies
-poetry install --extras all
+uv sync --extra all

-# Activate virtual environment
-poetry shell
+# Activate virtual environment (optional)
+source .venv/bin/activate
 ```
````

### macOS Setup
````diff
@@ -84,27 +84,27 @@ This project uses [Ruff](https://github.com/astral-sh/ruff) for code formatting

 ```bash
 # Format code
-poetry run ruff format .
+uv run ruff format .

 # Run linter
-poetry run ruff check .
+uv run ruff check .

 # Run linter with auto-fix
-poetry run ruff check . --fix
+uv run ruff check . --fix

 # Check specific file
-poetry run ruff check pyspark_datasources/fake.py
+uv run ruff check pyspark_datasources/fake.py
 ```
````
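Ruff reads its settings from `pyproject.toml`, so the commands above need no flags. A minimal configuration consistent with that workflow might look like this (the values shown are illustrative, not this project's actual settings):

```toml
[tool.ruff]
line-length = 100
target-version = "py39"

[tool.ruff.lint]
select = ["E", "F", "I"]  # pycodestyle errors, pyflakes, import sorting
```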

````diff
 ### Pre-commit Hooks (Optional)

 ```bash
 # Install pre-commit hooks
-poetry add --group dev pre-commit
-pre-commit install
+uv add --dev pre-commit
+uv run pre-commit install

 # Run manually
-pre-commit run --all-files
+uv run pre-commit run --all-files
 ```
````

## Documentation
````diff
@@ -245,24 +245,24 @@ Add your data source to the table in README.md with examples.

 ```bash
 # Add required dependency
-poetry add requests
+uv add requests

 # Add optional dependency
-poetry add --optional faker
+uv add --optional faker faker

 # Add dev dependency
-poetry add --group dev pytest-cov
+uv add --dev pytest-cov

 # Update dependencies
-poetry update
+uv sync --upgrade
 ```
````

### Managing Extras

Edit `pyproject.toml` to add optional dependency groups:

````diff
 ```toml
-[tool.poetry.extras]
+[project.optional-dependencies]
 mynewsource = ["special-library"]
 all = ["faker", "datasets", "special-library", ...]
 ```
````
````diff
@@ -363,17 +363,18 @@
 import os
 os.environ["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"
 ```
````
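If you prefer not to patch every script, the same macOS fork-safety flag can be exported from the shell before launching Python or Spark (add it to `~/.zshrc` to make it permanent):

```shell
# Equivalent to the os.environ line above, applied shell-wide (macOS only).
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
```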

````diff
-### Poetry Issues
+### uv Troubleshooting

 ```bash
 # Clear cache
-poetry cache clear pypi --all
+uv cache clean

-# Update lock file
-poetry lock --no-update
+# Check that the lockfile matches pyproject
+uv lock --check

-# Reinstall
-poetry install --remove-untracked
+# Recreate the virtual environment
+rm -rf .venv
+uv sync
 ```
````

### Spark Session Issues