
Refactor/hf df functions #33

Merged
jpcompartir merged 25 commits into main from refactor/hf_df_functions
Nov 20, 2025

Conversation

@jpcompartir
Owner

Version bump to 0.1.2

Release notes:

EndpointR 0.1.2

  • File writing improvements: hf_embed_df() and hf_classify_df() now write intermediate results as .parquet files to output_dir directories, similar to improvements in 0.1.1 for OpenAI functions

  • Parameter changes: Renamed the batch_size argument to chunk_size across hf_embed_df(), hf_classify_df(), and oai_complete_df() for consistency

  • New chunking functions: Introduced hf_embed_chunks() and hf_classify_chunks() for more efficient batch processing with better error handling

  • Dependency update: Package now depends on arrow for faster .parquet file writing and reading

  • Metadata tracking: Hugging Face functions that write to files (hf_embed_df(), hf_classify_df(), hf_embed_chunks(), hf_classify_chunks()) now write metadata.json to output directories containing:

    • Endpoint URL and API key name used
    • Processing parameters (chunk_size, concurrent_requests, timeout, max_retries)
    • Inference parameters (truncate, max_length)
    • Timestamp and row counts
    • Useful for debugging, reproducibility, and tracking which models/endpoints were used
  • max_length parameter: Added max_length parameter to hf_classify_df() and hf_classify_chunks() for text truncation control. Note: hf_embed_df() handles truncation automatically via endpoint configuration (set AUTO_TRUNCATE in endpoint settings)

  • New utility functions:

    • hf_get_model_max_length() - Retrieve maximum token length for a Hugging Face model
    • hf_get_endpoint_info() - Retrieve detailed information about a Hugging Face Inference Endpoint
  • Improved reporting: Chunked/batch processing functions now report total successes and failures at completion
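The chunked file-writing workflow described above can be sketched in R as follows. This is a hedged illustration, not the package's documented interface: chunk_size, concurrent_requests, and output_dir come from the notes, but the text_var argument name and the defaults shown here are assumptions.

```r
# Sketch: embed a data frame in chunks, writing intermediate .parquet
# files plus metadata.json to output_dir (argument names beyond those
# in the release notes are illustrative assumptions)
library(EndpointR)

results <- hf_embed_df(
  df = reviews,                 # a data frame with a text column
  text_var = text,              # column to embed (assumed argument name)
  endpoint_url = Sys.getenv("HF_ENDPOINT_URL"),
  key_name = "HF_API_KEY",      # name of the env var holding the API key
  chunk_size = 256,             # rows per chunk (formerly batch_size)
  concurrent_requests = 4,
  output_dir = "embeddings_out" # .parquet chunks + metadata.json land here
)

# intermediate chunks are .parquet files; read them back with arrow:
embeddings <- arrow::open_dataset("embeddings_out") |> dplyr::collect()

# metadata.json records the endpoint URL, key name, processing and
# inference parameters, timestamp, and row counts:
meta <- jsonlite::fromJSON(file.path("embeddings_out", "metadata.json"))
```

Reading the chunks back through arrow::open_dataset() rather than individual read calls is one way to reassemble the output; the metadata.json read is useful for checking which endpoint and parameters produced a given directory of results.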

pass output_file to hf_embed_chunks from inside hf_embed_df to fix the filename issue
…r debugging people's code/errors (including my own)
add max_length to hf_classify_df and hf_classify_chunks
move hf_classify_df over to hf_classify_chunks not hf_classify_batch

remove old comments from hf_embed_df
… is to turn on 'AUTO_TRUNCATE' in the setup of the endpoint
build rd files
add comma in embed test
add test/dev docs to .Rbuildignore
@jpcompartir jpcompartir merged commit 3531dd6 into main Nov 20, 2025
1 check passed
@jpcompartir jpcompartir deleted the refactor/hf_df_functions branch December 3, 2025 17:10
