Address review comments: add README prerequisites, move filter comment, update and pin requirements

tukwila · tukwila · commit 2a2880565b9b · 2025-09-17T18:49:37.000+08:00
Signed-off-by: guangli.bao <guangli.bao@daocloud.io>
diff --git a/contrib/sharegpt_preprocess/README.md b/contrib/sharegpt_preprocess/README.md
@@ -2,6 +2,12 @@
 
 You can use ShareGPT_V3_unfiltered_cleaned_split.json as benchmark datasets.
 
+## Prerequisites
+Before you begin, ensure you have the following installed:
+
+* Python 3.9 or higher
+* pip (Python package manager)
+
 ## Example Commands
 
 Download and prepare the ShareGPT dataset; You can specify the proportion of data to process by providing a number between 0 and 1 as an argument to the script.
diff --git a/contrib/sharegpt_preprocess/preprocessing_sharegpt_data.py b/contrib/sharegpt_preprocess/preprocessing_sharegpt_data.py
@@ -62,8 +62,9 @@ def extract_and_save_with_filtering(file):
                         and
                         # except special characters
                         not re.search(r"[<>{}[\]\\]", prompt_text)
+                        # except pure numbers
                         and not prompt_text.isdigit()
-                    ):  # except pure numbers
+                    ):
                         filtered_prompts.append(
                             {
                                 "from": turn.get("from"),
diff --git a/contrib/sharegpt_preprocess/requirements.txt b/contrib/sharegpt_preprocess/requirements.txt
@@ -1,4 +1,5 @@
-tqdm
-pandas
-openai
-pyyaml
+tqdm==4.67.1
+pandas==2.3.1
+openai==1.99.9
+datasets==4.0.0
+transformers==4.55.4
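
Note on the filter touched in `preprocessing_sharegpt_data.py`: the two conditions visible in the hunk can be sketched in isolation as below. This is a minimal illustration, not the script's actual structure. The helper name `passes_char_filters` and the constant `SPECIAL_CHARS` are hypothetical, and the conditions that precede the `and` at the top of the hunk are not shown in the diff, so they are omitted here.

```python
import re

# Same character class as in the diff: matches <, >, {, }, [, ], or a backslash.
SPECIAL_CHARS = re.compile(r"[<>{}[\]\\]")


def passes_char_filters(prompt_text: str) -> bool:
    """Hypothetical helper covering only the two checks visible in the hunk."""
    # except special characters
    has_special = SPECIAL_CHARS.search(prompt_text) is not None
    # except pure numbers
    is_pure_number = prompt_text.isdigit()
    return not has_special and not is_pure_number


if __name__ == "__main__":
    for sample in ["Explain TCP slow start", "12345", "x[0] = 1", "3.14"]:
        print(repr(sample), passes_char_filters(sample))
```

Under these assumptions, a string such as `"3.14"` is kept, because `str.isdigit()` only returns true when every character is a digit.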