@@ -127,9 +127,12 @@ vectorizer = KeyphraseCountVectorizer()
# Print parameters
print(vectorizer.get_params())
+ ```
+ ``` plaintext
>>> {'binary': False, 'dtype': <class 'numpy.int64'>, 'lowercase': True, 'max_df': None, 'min_df': None, 'pos_pattern': '<J.*>*<N.*>+', 'spacy_exclude': ['parser', 'attribute_ruler', 'lemmatizer', 'ner'], 'spacy_pipeline': 'en_core_web_sm', 'stop_words': 'english', 'workers': 1}
```
+
By default, the vectorizer is initialized for the English language. That means an English `spacy_pipeline` is
specified, English `stop_words` are removed, and the `pos_pattern` extracts keywords that have 0 or more adjectives,
followed by 1 or more nouns using the English spaCy part-of-speech tags. In addition, the spaCy pipeline
@@ -255,14 +258,11 @@ vectorizer = KeyphraseTfidfVectorizer()
# Print parameters
print(vectorizer.get_params())
- >> > {' binary' : False , ' custom_pos_tagger' : None , ' decay' : None , ' delete_min_df' : None , ' dtype' : <
-
-
- class ' numpy.int64' > , ' lowercase' : True , ' max_df' : None
-
- , ' min_df' : None , ' pos_pattern' : ' <J.*>*<N.*>+' , ' spacy_exclude' : [' parser' , ' attribute_ruler' , ' lemmatizer' , ' ner' ,
- ' textcat' ], ' spacy_pipeline' : ' en_core_web_sm' , ' stop_words' : ' english' , ' workers' : 1 }
```
+ ``` plaintext
+ >>> {'binary': False, 'custom_pos_tagger': None, 'decay': None, 'delete_min_df': None, 'dtype': <class 'numpy.int64'>, 'lowercase': True, 'max_df': None, 'min_df': None, 'pos_pattern': '<J.*>*<N.*>+', 'spacy_exclude': ['parser', 'attribute_ruler', 'lemmatizer', 'ner', 'textcat'], 'spacy_pipeline': 'en_core_web_sm', 'stop_words': 'english', 'workers': 1}
+ ```
+
To calculate tf values instead, set `use_idf=False`.
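As an aside, the effect of toggling `use_idf` can be sketched without the library. The sketch below uses scikit-learn's smoothed idf formula, idf = ln((1 + n) / (1 + df)) + 1, which the tf-idf vectorizers build on; the function name and sample numbers are purely illustrative and not part of the `keyphrase_vectorizers` API:

```python
import math

def keyphrase_weights(tf_counts, doc_freqs, n_docs, use_idf=True):
    """Weight one document's keyphrase counts.

    With use_idf=True, each count is scaled by the smoothed idf
    idf = ln((1 + n_docs) / (1 + df)) + 1; with use_idf=False,
    the plain term frequencies are kept. The result is
    L2-normalized in both cases, as scikit-learn does by default.
    """
    if use_idf:
        weights = [tf * (math.log((1 + n_docs) / (1 + df)) + 1)
                   for tf, df in zip(tf_counts, doc_freqs)]
    else:
        weights = [float(tf) for tf in tf_counts]
    norm = math.sqrt(sum(w * w for w in weights)) or 1.0
    return [w / norm for w in weights]

# Two keyphrases in one document: the first appears in all 10 documents,
# the second in only one, so idf weighting boosts the rare keyphrase.
print(keyphrase_weights([2, 1], doc_freqs=[10, 1], n_docs=10, use_idf=True))
print(keyphrase_weights([2, 1], doc_freqs=[10, 1], n_docs=10, use_idf=False))
```

With `use_idf=False` the second (rare) keyphrase keeps half the weight of the first; with idf enabled its relative weight rises, since a keyphrase occurring in every document carries little discriminative information.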