Merged
25 commits
278bce5
Fix part of out-of-date docs and examples
jpcompartir Nov 6, 2025
9e1b34b
add the .handle_output_dir to start replacing .handle_output_file
jpcompartir Nov 6, 2025
16b2c47
bump version and add arrow to deps
jpcompartir Nov 18, 2025
a1d9018
add metadata.json to the output_dir - will be useful in the future fo…
jpcompartir Nov 18, 2025
79b615e
add final alert for hf_embed_df function reporting number of successe…
jpcompartir Nov 18, 2025
56e977d
start the hf_classify_chunks func
jpcompartir Nov 18, 2025
7b263bc
add input val to hf_classify_chunks
jpcompartir Nov 18, 2025
d6b4305
metadata and alerts etc. for hf_classify_chunks
jpcompartir Nov 18, 2025
1e23d54
fix typos in hf_classify_chunks
jpcompartir Nov 18, 2025
f18b46d
add texts to failures and successes for the hf_classify_chunks function
jpcompartir Nov 19, 2025
c626c6f
add the hf_classify_dev (with encrypted endpoints)
jpcompartir Nov 19, 2025
1f4af91
add the hf_get_model_max_length function for amendments to hf_classi…
jpcompartir Nov 19, 2025
46e46a2
write inference parameters to metadata in hf_embed_df
jpcompartir Nov 19, 2025
de64e16
add max_length parameter to hf_classify_df and write inference parame…
jpcompartir Nov 19, 2025
cec839d
remove ... option for args passing in hf_classify_chunks/df
jpcompartir Nov 19, 2025
c8773f0
Remove max_length from hf_embed_df, and hf_classify_df - the solution…
jpcompartir Nov 20, 2025
7bcc538
add the `hf_get_endpoint_info()` function to retrieve endpoint details
jpcompartir Nov 20, 2025
ff48329
add @returns for hf_get_endpoint_info
jpcompartir Nov 20, 2025
c1bc774
Update test_embed tests following changes to file writing and arguments
jpcompartir Nov 20, 2025
7f9c7f2
Update README following changes to hf_*_df functions and move to chunks
jpcompartir Nov 20, 2025
af3ae0b
add chunking and updated tests for hf_classify_*, similar to hf_embed_*
jpcompartir Nov 20, 2025
84f9532
add new functions to _pkgdown.yml and new section for HF utilities
jpcompartir Nov 20, 2025
88d2ac1
update the hugging_face_inference vignette in line with recent change…
jpcompartir Nov 20, 2025
fbf5096
update roxygen2 docs for classify
jpcompartir Nov 20, 2025
618c173
Re-factor such that chunks doesn't overwrite variable names - writes …
jpcompartir Nov 20, 2025
3 changes: 2 additions & 1 deletion .Rbuildignore
@@ -2,7 +2,7 @@
^EndpointR\.Rproj$
^\.Rproj\.user$
todos.md
-dev_docs/
+^dev_docs/
README\.Rmd
CONTRIBUTORS\.md
todos\.qmd
@@ -11,3 +11,4 @@ todos\.qmd
^docs$
^pkgdown$
^\.github$
+^test_dir/
5 changes: 2 additions & 3 deletions .gitignore
@@ -36,12 +36,11 @@ rsconnect/
.Rproj.user
inst/doc
EndpointR.Rproj

*.html
*_dev_files*

dev_docs/project_test_run.qmd
docs

# testing /dev_docs/ artifacts
*.csv
test_dir
metadata_test_dir
5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: EndpointR
Title: Connects to various Machine Learning inference providers
-Version: 0.1.1
+Version: 0.1.2
Authors@R:
person("Jack", "Penzer", , "Jack.penzer@sharecreative.com", role = c("aut", "cre"))
Description: EndpointR is a 'batteries included', open-source R package for connecting to various APIs for Machine Learning model predictions. EndpointR is built for company-specific use cases, so may not be useful to a wide audience.
@@ -32,7 +32,8 @@ Imports:
tibble,
S7,
jsonvalidate,
-readr
+readr,
+arrow
VignetteBuilder: knitr
Depends:
R (>= 3.5)
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -6,12 +6,15 @@ export(hf_build_request)
export(hf_build_request_batch)
export(hf_build_request_df)
export(hf_classify_batch)
export(hf_classify_chunks)
export(hf_classify_df)
export(hf_classify_text)
export(hf_embed_batch)
export(hf_embed_chunks)
export(hf_embed_df)
export(hf_embed_text)
export(hf_get_endpoint_info)
export(hf_get_model_max_length)
export(hf_perform_request)
export(json_dump)
export(json_schema)
26 changes: 24 additions & 2 deletions NEWS.md
@@ -1,6 +1,27 @@
-# Endpointr 0.1.2
+# EndpointR 0.1.2

-- [ ] `hf_embed_df()`, `hf_classify_df()` improved to write to files similarly to the upgrades applied in 0.1qq.1
+- **File writing improvements**: `hf_embed_df()` and `hf_classify_df()` now write intermediate results as `.parquet` files to `output_dir` directories, similar to improvements in 0.1.1 for OpenAI functions
+
+- **Parameter changes**: Moved from `batch_size` to `chunk_size` argument across `hf_embed_df()`, `hf_classify_df()`, and `oai_complete_df()` for consistency
+
+- **New chunking functions**: Introduced `hf_embed_chunks()` and `hf_classify_chunks()` for more efficient batch processing with better error handling
+
+- **Dependency update**: Package now depends on `arrow` for faster `.parquet` file writing and reading
+
+- **Metadata tracking**: Hugging Face functions that write to files (`hf_embed_df()`, `hf_classify_df()`, `hf_embed_chunks()`, `hf_classify_chunks()`) now write `metadata.json` to output directories containing:
+  - Endpoint URL and API key name used
+  - Processing parameters (chunk_size, concurrent_requests, timeout, max_retries)
+  - Inference parameters (truncate, max_length)
+  - Timestamp and row counts
+  - Useful for debugging, reproducibility, and tracking which models/endpoints were used
+
+- **max_length parameter**: Added `max_length` parameter to `hf_classify_df()` and `hf_classify_chunks()` for text truncation control. Note: `hf_embed_df()` handles truncation automatically via endpoint configuration (set `AUTO_TRUNCATE` in endpoint settings)
+
+- **New utility functions**:
+  - `hf_get_model_max_length()` - Retrieve maximum token length for a Hugging Face model
+  - `hf_get_endpoint_info()` - Retrieve detailed information about a Hugging Face Inference Endpoint
+
+- **Improved reporting**: Chunked/batch processing functions now report total successes and failures at completion

# EndpointR 0.1.1

@@ -15,3 +36,4 @@ Initial BETA release, ships with:
- Support for text completion using OpenAI models via the Chat Completions API
- Support for embeddings with the OpenAI Embeddings API
- Structured outputs via JSON schemas and validators
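
The 0.1.2 NEWS entries in this diff describe chunked processing that writes intermediate `.parquet` files and a `metadata.json` to an `output_dir`. The pattern might look roughly like the sketch below, which assumes the `arrow` and `jsonlite` packages are installed. The helper `write_chunks_with_metadata()`, the `chunk_%03d.parquet` file naming, and the metadata field names are hypothetical illustrations and are not taken from EndpointR's source.

```r
library(arrow)     # write_parquet() / read_parquet()
library(jsonlite)  # write_json() / read_json()

# Hypothetical helper (not part of EndpointR): split a data frame into chunks,
# write each chunk to its own .parquet file, then record run metadata as JSON.
write_chunks_with_metadata <- function(df, output_dir, chunk_size = 2L) {
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

  # Assign every row to a chunk: rows 1..chunk_size go to chunk 1, and so on.
  chunk_ids <- ceiling(seq_len(nrow(df)) / chunk_size)
  chunks <- split(df, chunk_ids)

  # One file per chunk, so a failure part-way through keeps earlier results.
  for (i in seq_along(chunks)) {
    chunk_path <- file.path(output_dir, sprintf("chunk_%03d.parquet", i))
    write_parquet(chunks[[i]], chunk_path)
  }

  # Assumed metadata fields, loosely modelled on the NEWS description.
  metadata <- list(
    chunk_size = chunk_size,
    n_chunks   = length(chunks),
    n_rows     = nrow(df),
    timestamp  = format(Sys.time(), "%Y-%m-%dT%H:%M:%S%z")
  )
  write_json(metadata, file.path(output_dir, "metadata.json"),
             auto_unbox = TRUE, pretty = TRUE)

  invisible(metadata)
}

# Usage: write five rows in chunks of two, then read the chunks back.
texts <- data.frame(id = 1:5, text = c("a", "b", "c", "d", "e"))
out_dir <- file.path(tempdir(), "chunk_demo")
meta <- write_chunks_with_metadata(texts, out_dir, chunk_size = 2L)

parquet_files <- list.files(out_dir, pattern = "\\.parquet$", full.names = TRUE)
restored <- do.call(rbind, lapply(parquet_files, read_parquet))
```

Writing one file per chunk means a run that fails part-way through leaves the completed chunks on disk, and the metadata file records which settings produced them.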
