Commit ce47b8f

v0.2.2 (#48)
* OpenAI args updated: `model_name` deprecated in favour of `model`, making the whole experience more uniform
* Unlocked OpenAI parameters: users can now pass any parameter to OpenAI
* Update OpenAI.md
* Typo fixes
* Create Example OpenAI Response Format.py
* Utility added - Sanitize prompts: a utility that makes it simple to sanitize prompts
* Utility added - Unpack JSON Response
* Refactor a bit
* Index doc
* Create Sanitize Prompt.md
* Update utilities.py
* Update Sanitize Prompt.md
* Create Unpack Json Response.md
* Updated documentation
* New utilities - Sanitize prompts, unpack JSON response
* Ruffed
* Bump version
1 parent a29f66e commit ce47b8f

23 files changed: +805 −48 lines
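The `model_name` → `model` rename above means existing option dicts need a one-line update. A minimal migration sketch (the helper name is illustrative and not part of llmworkbook):

```python
def migrate_options(options: dict) -> dict:
    """Hypothetical helper: rename the deprecated 'model_name' key to 'model'."""
    migrated = dict(options)  # copy so the caller's dict is untouched
    if "model_name" in migrated and "model" not in migrated:
        migrated["model"] = migrated.pop("model_name")
    return migrated

old = {"model_name": "gpt-4o-mini", "temperature": 1, "max_tokens": 1024}
print(migrate_options(old))
# {'model': 'gpt-4o-mini', 'temperature': 1, 'max_tokens': 1024}
```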

Examples/Example Arrays.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -29,7 +29,7 @@ def main():
         provider="openai",
         system_prompt="Process these Data rows as per the provided prompt",
         options={
-            "model_name": "gpt-4o-mini",
+            "model": "gpt-4o-mini",
             "temperature": 1,
             "max_tokens": 1024,
         },
```

Examples/Example Batch Processing.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -25,7 +25,7 @@ def main():
         provider="openai",
         system_prompt="Process these Data rows as per the provided prompt",
         options={
-            "model_name": "gpt-4o-mini",
+            "model": "gpt-4o-mini",
             "temperature": 1,
             "max_tokens": 2048,  # Ensure token limit is set
         },
```

Examples/Example DataFrames.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -23,7 +23,7 @@ def main():
         provider="openai",
         system_prompt="Process these Data rows as per the provided prompt",
         options={
-            "model_name": "gpt-4o-mini",
+            "model": "gpt-4o-mini",
             "temperature": 1,
             "max_tokens": 1024,
         },
```

Examples/Example Excel.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -25,7 +25,7 @@ def main():
         provider="openai",
         system_prompt="Process these Data rows as per the provided prompt",
         options={
-            "model_name": "gpt-4o-mini",
+            "model": "gpt-4o-mini",
             "temperature": 1,
             "max_tokens": 1024,
         },
```
Lines changed: 60 additions & 0 deletions

```python
"""
Example usage script for OpenAI's response format with the unpacking utility.

The utility unpack_json_responses allows users to unpack a JSON response into separate columns.

This utility supports DataFrames, arrays, and lists.
"""

import pandas as pd
from llmworkbook import LLMConfig, LLMRunner, LLMDataFrameIntegrator, unpack_json_responses
from dotenv import load_dotenv

load_dotenv()


def main():
    # 1. Create a sample dataframe
    data = {
        "id": [1, 2, 3, 4, 5],
        "prompt_text": [
            "Extract key entities (persons, places, organizations) from this text: 'OpenAI, based in San Francisco, is a leading AI research lab founded by Sam Altman and Greg Brockman.'",
            "Convert this product description into structured data: 'The iPhone 15 Pro features a 6.1-inch display, A17 Bionic chip, and a titanium body.'",
            "Provide a breakdown of this sentence into subject, verb, and object: 'The cat chased the mouse across the room.'",
            "Generate a JSON object with three random trivia questions along with their answers.",
            "Summarize the given customer review into structured categories like 'sentiment', 'key topics', and 'rating' from this text: 'The camera quality of this phone is fantastic, but the battery life could be better. I would rate it 4 out of 5.'",
        ],
    }

    df = pd.DataFrame(data)

    # 2. Create an LLM configuration
    config = LLMConfig(
        provider="openai",
        system_prompt="Process these data rows as per the provided prompt. Ensure the response is strictly in JSON format.",
        options={
            "model": "gpt-4o-mini",
            "temperature": 1,
            "max_tokens": 1024,
            "response_format": {"type": "json_object"},
        },
    )

    # 3. Instantiate the runner and the integrator
    runner = LLMRunner(config)
    integrator = LLMDataFrameIntegrator(runner=runner, df=df)

    # 4. Add LLM responses to the df
    updated_df = integrator.add_llm_responses(
        prompt_column="prompt_text", response_column="llm_response", async_mode=True
    )

    # 5. Unpack JSON responses
    updated_df = unpack_json_responses(updated_df)

    print("DataFrame with unpacked LLM responses:\n", updated_df)

    updated_df.to_excel("testdf.xlsx")


if __name__ == "__main__":
    main()
```
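The real unpacking lives in llmworkbook's `unpack_json_responses`; as a rough plain-Python illustration of the idea (assuming the response column holds JSON strings, with a hypothetical helper name), the step might look like:

```python
import json

def unpack_json_column(rows: list, column: str = "llm_response") -> list:
    """Sketch: merge each row's JSON-string column into top-level keys (columns)."""
    unpacked = []
    for row in rows:
        fields = json.loads(row[column])  # parse the JSON response
        merged = {k: v for k, v in row.items() if k != column}
        merged.update(fields)  # each JSON key becomes its own column
        unpacked.append(merged)
    return unpacked

rows = [{"id": 3, "llm_response": '{"subject": "The cat", "verb": "chased", "object": "the mouse"}'}]
print(unpack_json_column(rows))
# [{'id': 3, 'subject': 'The cat', 'verb': 'chased', 'object': 'the mouse'}]
```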
Lines changed: 58 additions & 0 deletions

```python
"""
Example usage script for OpenAI's response format.
"""

import pandas as pd
from llmworkbook import LLMConfig, LLMRunner, LLMDataFrameIntegrator
from dotenv import load_dotenv

load_dotenv()


def main():
    # 1. Create a sample dataframe
    data = {
        "id": [1, 2, 3, 4, 5],
        "prompt_text": [
            "Extract key entities (persons, places, organizations) from this text: 'OpenAI, based in San Francisco, is a leading AI research lab founded by Sam Altman and Greg Brockman.'",
            "Convert this product description into structured data: 'The iPhone 15 Pro features a 6.1-inch display, A17 Bionic chip, and a titanium body.'",
            "Provide a breakdown of this sentence into subject, verb, and object: 'The cat chased the mouse across the room.'",
            "Generate a JSON object with three random trivia questions along with their answers.",
            "Summarize the given customer review into structured categories like 'sentiment', 'key topics', and 'rating' from this text: 'The camera quality of this phone is fantastic, but the battery life could be better. I would rate it 4 out of 5.'",
        ],
    }

    df = pd.DataFrame(data)

    # 2. Create an LLM configuration
    config = LLMConfig(
        provider="openai",
        system_prompt="Process these data rows as per the provided prompt. Ensure the response is strictly in JSON format.",
        options={
            "model": "gpt-4o-mini",
            "temperature": 1,
            "max_tokens": 1024,
            "response_format": {"type": "json_object"},
        },
    )

    # 3. Instantiate the runner and the integrator
    runner = LLMRunner(config)
    integrator = LLMDataFrameIntegrator(runner=runner, df=df)

    # 4. Add LLM responses to the df
    updated_df = integrator.add_llm_responses(
        prompt_column="prompt_text", response_column="llm_response", async_mode=True
    )

    print("DataFrame with LLM responses:\n", updated_df)

    # Expected output (for the sentence-breakdown prompt) -
    # {
    #     "subject": "The cat",
    #     "verb": "chased",
    #     "object": "the mouse"
    # }


if __name__ == "__main__":
    main()
```

Examples/Example OpenAI.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -26,7 +26,7 @@ def main():
         provider="openai",
         system_prompt="Process these Data rows as per the provided prompt",
         options={
-            "model_name": "gpt-4o-mini",
+            "model": "gpt-4o-mini",
             "temperature": 1,
             "max_tokens": 1024,
         },
```

Examples/Example PromptSeries.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -26,7 +26,7 @@ def main():
         provider="openai",
         system_prompt="Process these prompts",
         options={
-            "model_name": "gpt-4o-mini",
+            "model": "gpt-4o-mini",
             "temperature": 1,
             "max_tokens": 1024,
         },
```

Examples/Example Sanitize Prompts.py

Lines changed: 62 additions & 0 deletions

```python
"""
LLMWORKBOOK provides an easy-to-use utility function that lets you quickly clean prompt inputs.

This utility allows developers to ensure that prompts passed to wrappers or integrators are secure and clean.
"""

from llmworkbook import sanitize_prompt

prompt: str = "Some placeholder prompt"

# Example 1: Sanitizing a single string prompt
prompt = " Tell me about AI <script>alert('XSS')</script> with [markdown] formatting! "
clean_prompt = sanitize_prompt(prompt)
print(clean_prompt)
# Output: "Tell me about AI alert('XSS') with markdown formatting!"

# Example 2: Sanitizing a list of prompts
prompt_list = [
    "Tell me about {Python}",
    " <script>alert('XSS')</script> ",
    "Analyze the following data: [1, 2, 3, 4, 5]",
]
clean_list = sanitize_prompt(prompt_list)
print(clean_list)
# Output: ['Tell me about Python', "alert('XSS')", 'Analyze the following data: 1, 2, 3, 4, 5']

# Example 3: Sanitizing a pandas DataFrame column
import pandas as pd  # noqa: E402

# Create a sample DataFrame with prompts
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "prompt": [
        "Generate a *summary* of this article",
        " <script>malicious code</script> ",
        "What is the answer to [4+5]?",
    ],
})

# Sanitize the 'prompt' column
df["clean_prompt"] = sanitize_prompt(df["prompt"])
print(df)
# Output:
#    user_id                                prompt                        clean_prompt
# 0        1  Generate a *summary* of this article  Generate a summary of this article
# 1        2       <script>malicious code</script>                      malicious code
# 2        3          What is the answer to [4+5]?           What is the answer to 45?

# Example 4: Sanitizing a numpy array
import numpy as np  # noqa: E402

# Create an array of prompts
prompt_array = np.array([
    "Calculate 2+2=?",
    " What is the ```result```? ",
    "<script>alert('XSS')</script>",
])

clean_array = sanitize_prompt(prompt_array)
print(clean_array)


# The cleaned prompts can be passed through the wrapper or integrator as needed.
```
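The actual sanitization rules live in llmworkbook's `sanitize_prompt`. A rough stand-in, with behavior assumed only from the example outputs above (drop `<script>` tags, strip markdown-style punctuation, trim whitespace), might look like:

```python
import re

def sanitize_prompt_sketch(prompt: str) -> str:
    """Hypothetical stand-in for llmworkbook's sanitize_prompt; the exact rules
    are assumed from the example outputs, not taken from the library source."""
    cleaned = re.sub(r"</?script[^>]*>", "", prompt)  # remove <script> tags
    cleaned = re.sub(r"[\[\]{}*`]", "", cleaned)      # strip markdown-style characters
    cleaned = re.sub(r"\s+", " ", cleaned)            # collapse runs of whitespace
    return cleaned.strip()

print(sanitize_prompt_sketch(
    " Tell me about AI <script>alert('XSS')</script> with [markdown] formatting! "
))
# Tell me about AI alert('XSS') with markdown formatting!
```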

README.md

Lines changed: 3 additions & 7 deletions

```diff
@@ -21,14 +21,10 @@
 
 ---
 
-🚀 New Feature: Batch & Row-wise Processing
-LLMWorkbook v0.2.1 now shows progress bars 🦦:
+## LLMWorkbook v0.2.2 new utilities 🦦:
+✔ New Utilities - Sanitize prompts, unpack json response
 ✔ Rich Console Progress bar
-✔ Row-wise Processing (Default) – Each row is sent individually to the LLM.
-✔ Row-wise Processing (Default) – Each row is sent individually to the LLM.
-✔ Batch Processing – Multiple rows are grouped together and sent as one request for efficiency.
-✔ Automatic Token Limit Handling – Ensures batch prompts stay within max_tokens limits.
+✔ Row-wise or batch-wise Processing – Choose what meets your need.
 
 ---
```

docs/CLI Usage.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -38,7 +38,7 @@ llmworkbook version
 
 - **Test LLM Connectivity:**
   ```bash
-  llmworkbook test YOUR_API_KEY --model_name gpt-4
+  llmworkbook test YOUR_API_KEY --model gpt-4
   ```
 
 - **See Version of package:**
````

docs/Providers/OpenAI.md

Lines changed: 18 additions & 2 deletions

````diff
@@ -8,7 +8,7 @@ Each provider function uses specific keys from the configuration’s `options` d
 
 **Configuration Keys in `options`:**
 
-- **`model`**
+- **`model` (earlier `model_name`)**
   - **Type:** `str`
   - **Description:** Specifies the model to use for generating responses (e.g., `"gpt-4o-mini"`).
   - **Default Behavior:** If not provided, the code defaults to `"gpt-4o-mini"`.
@@ -17,7 +17,23 @@
   - **Type:** `float` or `int`
   - **Description:** Controls the randomness of the output. A higher temperature produces more varied results.
 
-**Additional Configurations (Outside `options`):**
+**Additional OpenAI API Parameters**
+Any valid OpenAI API parameter (e.g., `max_tokens`, `top_p`, `frequency_penalty`) can be provided via `options`. This ensures full control over the API request without modifying the function.
+
+Example -
+```
+config = {
+    "options": {
+        "model": "gpt-4o-mini",
+        "temperature": 0.7,
+        # Additional parameters as needed
+        "max_tokens": 500,
+        "top_p": 0.9,
+        "frequency_penalty": 0.5,
+        # Output format
+        "response_format": ...,
+    },
+}
+```
 
 - **`api_key`**
   - **Description:** Your OpenAI API key. If not provided in the config, the code will attempt to read it from the environment variable `OPENAI_API_KEY`.
````
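The pass-through design above means the provider can forward `options` without enumerating each parameter. A minimal sketch of the idea (`build_payload` is a hypothetical name, not llmworkbook's actual code):

```python
def build_payload(messages: list, options: dict) -> dict:
    """Hypothetical sketch: forward every options key straight into the API payload."""
    reserved = {"api_key"}  # keys that belong to config, not the request body
    payload = {"model": options.get("model", "gpt-4o-mini"), "messages": messages}
    payload.update({k: v for k, v in options.items()
                    if k not in reserved and k != "model"})
    return payload

payload = build_payload(
    [{"role": "user", "content": "Hi"}],
    {"model": "gpt-4o-mini", "temperature": 0.7, "top_p": 0.9},
)
print(payload["temperature"])  # 0.7
```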

docs/README.md

Lines changed: 19 additions & 0 deletions

```markdown
# Documentation Index

Welcome to the documentation for **LLMWorkbook**. Below is the index of all available documents:

Topics -
- [Batch and Row Processing](Batch%20and%20Row%20Processing.md)
- [CLI Usage](CLI%20Usage.md)
- [Wrapping](wrapping.md)

- [Providers](Providers)
  - [GPT4All](Providers/Gpt4All.md)
  - [Ollama](Providers/Ollama.md)
  - [OpenAI](Providers/OpenAI.md)

- [Utilities](Utilities)
  - [Sanitize Prompt](Utilities/Sanitize%20Prompt.md)
  - [Unpack Json Response](Utilities/Unpack%20Json%20Response.md)
```