-In the original implementation ZoeDepth model performs inference on both the original and flipped images and averages out the results. The post_process_depth_estimation
-function can handle this for us by passing the flipped outputs to the optional outputs_flipped
-argument:
->>> with torch.no_grad():
-...     outputs = model(pixel_values)
-...     outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
->>> post_processed_output = image_processor.post_process_depth_estimation(
-...     outputs,
-...     source_sizes=[(image.height, image.width)],
-...     outputs_flipped=outputs_flipped,
-... )
-
+> In the original implementation ZoeDepth model performs inference on both the original and flipped images and averages out the results. The post_process_depth_estimation
+> function can handle this for us by passing the flipped outputs to the optional outputs_flipped
+> argument:
+> >>> with torch.no_grad():
+> ...     outputs = model(pixel_values)
+> ...     outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
+> >>> post_processed_output = image_processor.post_process_depth_estimation(
+> ...     outputs,
+> ...     source_sizes=[(image.height, image.width)],
+> ...     outputs_flipped=outputs_flipped,
+> ... )
+>
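For readers following the hunk above, here is a minimal, self-contained sketch of the flipped-image averaging workflow it documents. The `Intel/zoedepth-nyu-kitti` checkpoint name and the `"predicted_depth"` key in the post-processed output are assumptions based on the usual ZoeDepth depth-estimation API, not something this diff establishes.

```python
# Minimal sketch of ZoeDepth inference with test-time flipping (assumptions noted above).
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Assumed checkpoint; any ZoeDepth checkpoint with a matching image processor should work.
checkpoint = "Intel/zoedepth-nyu-kitti"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)
model = ZoeDepthForDepthEstimation.from_pretrained(checkpoint)

inputs = image_processor(images=image, return_tensors="pt")
pixel_values = inputs.pixel_values

with torch.no_grad():
    outputs = model(pixel_values)
    # Flip along the width axis (dim 3 of NCHW) and run a second forward pass.
    outputs_flipped = model(pixel_values=torch.flip(pixel_values, dims=[3]))

# When outputs_flipped is given, post-processing averages the two passes and
# resizes the prediction back to the original image size.
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    source_sizes=[(image.height, image.width)],
    outputs_flipped=outputs_flipped,
)
predicted_depth = post_processed_output[0]["predicted_depth"]  # assumed output key
print(predicted_depth.shape)
```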
"`): Prefix token used for infilling. diff --git a/src/transformers/models/codegen/tokenization_codegen.py b/src/transformers/models/codegen/tokenization_codegen.py index 4d08c6acd5bb..6ecc575b8530 100644 --- a/src/transformers/models/codegen/tokenization_codegen.py +++ b/src/transformers/models/codegen/tokenization_codegen.py @@ -99,11 +99,8 @@ class CodeGenTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. diff --git a/src/transformers/models/codegen/tokenization_codegen_fast.py b/src/transformers/models/codegen/tokenization_codegen_fast.py index 72c8d66c829a..08835c3f845e 100644 --- a/src/transformers/models/codegen/tokenization_codegen_fast.py +++ b/src/transformers/models/codegen/tokenization_codegen_fast.py @@ -59,11 +59,8 @@ class CodeGenTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. diff --git a/src/transformers/models/cohere/tokenization_cohere_fast.py b/src/transformers/models/cohere/tokenization_cohere_fast.py index 8072cbe7c17c..7cf9beca7237 100644 --- a/src/transformers/models/cohere/tokenization_cohere_fast.py +++ b/src/transformers/models/cohere/tokenization_cohere_fast.py @@ -66,11 +66,8 @@ class CohereTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. diff --git a/src/transformers/models/cpm/tokenization_cpm.py b/src/transformers/models/cpm/tokenization_cpm.py index 5ecfedd0a614..c15c9080692c 100644 --- a/src/transformers/models/cpm/tokenization_cpm.py +++ b/src/transformers/models/cpm/tokenization_cpm.py @@ -75,22 +75,16 @@ def __init__( The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. 
-- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of - sequence. The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of + > sequence. The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be diff --git a/src/transformers/models/cpm/tokenization_cpm_fast.py b/src/transformers/models/cpm/tokenization_cpm_fast.py index 3e828ca9e0b5..77ec8b781d4f 100644 --- a/src/transformers/models/cpm/tokenization_cpm_fast.py +++ b/src/transformers/models/cpm/tokenization_cpm_fast.py @@ -68,22 +68,16 @@ def __init__( The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. - - - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of - sequence. The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of + > sequence. The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be diff --git a/src/transformers/models/csm/generation_csm.py b/src/transformers/models/csm/generation_csm.py index cf8bc141f5d1..63a3f0c6eddf 100644 --- a/src/transformers/models/csm/generation_csm.py +++ b/src/transformers/models/csm/generation_csm.py @@ -356,12 +356,10 @@ def generate( 3. Use these generated codebook tokens as `input_ids` to sample the next first codebook token using the backbone model 4. Repeat until stopping criteria is met - - - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, do_sample=True)`. - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, do_sample=True)`. 
Parameters: inputs_ids (`torch.Tensor` of shape (batch_size, seq_length), *optional*): diff --git a/src/transformers/models/deberta/tokenization_deberta.py b/src/transformers/models/deberta/tokenization_deberta.py index 74e958c8030b..159d9261dfab 100644 --- a/src/transformers/models/deberta/tokenization_deberta.py +++ b/src/transformers/models/deberta/tokenization_deberta.py @@ -90,11 +90,8 @@ class DebertaTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. diff --git a/src/transformers/models/deberta/tokenization_deberta_fast.py b/src/transformers/models/deberta/tokenization_deberta_fast.py index c2f2e6552d9d..5775169c91bd 100644 --- a/src/transformers/models/deberta/tokenization_deberta_fast.py +++ b/src/transformers/models/deberta/tokenization_deberta_fast.py @@ -49,11 +49,8 @@ class DebertaTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. diff --git a/src/transformers/models/deit/modeling_deit.py b/src/transformers/models/deit/modeling_deit.py index 8c9b7e89ecd8..74e8508dbc09 100644 --- a/src/transformers/models/deit/modeling_deit.py +++ b/src/transformers/models/deit/modeling_deit.py @@ -495,12 +495,9 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor: custom_intro=""" DeiT Model with a decoder on top for masked image modeling, as proposed in [SimMIM](https://huggingface.co/papers/2111.09886). -- - Note that we provide a script to pre-train this model on custom data in our [examples - directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). - - + > [!TIP] + > Note that we provide a script to pre-train this model on custom data in our [examples + > directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). """ ) class DeiTForMaskedImageModeling(DeiTPreTrainedModel): diff --git a/src/transformers/models/deprecated/efficientformer/modeling_efficientformer.py b/src/transformers/models/deprecated/efficientformer/modeling_efficientformer.py index 2167df912d87..f1901c65911b 100644 --- a/src/transformers/models/deprecated/efficientformer/modeling_efficientformer.py +++ b/src/transformers/models/deprecated/efficientformer/modeling_efficientformer.py @@ -704,12 +704,9 @@ class token). 
state of the [CLS] token and a linear layer on top of the final hidden state of the distillation token) e.g. for ImageNet. -- - This model supports inference-only. Fine-tuning with distillation (i.e. with a teacher) is not yet - supported. - - + > [!WARNING] + > This model supports inference-only. Fine-tuning with distillation (i.e. with a teacher) is not yet + > supported. """, EFFICIENTFORMER_START_DOCSTRING, ) diff --git a/src/transformers/models/deprecated/jukebox/tokenization_jukebox.py b/src/transformers/models/deprecated/jukebox/tokenization_jukebox.py index 473d23d49565..825e550e6d8c 100644 --- a/src/transformers/models/deprecated/jukebox/tokenization_jukebox.py +++ b/src/transformers/models/deprecated/jukebox/tokenization_jukebox.py @@ -63,11 +63,8 @@ class JukeboxTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - If nothing is provided, the genres and the artist will either be selected randomly or set to None - - + > [!TIP] + > If nothing is provided, the genres and the artist will either be selected randomly or set to None This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to: this superclass for more information regarding those methods. diff --git a/src/transformers/models/deprecated/tapex/tokenization_tapex.py b/src/transformers/models/deprecated/tapex/tokenization_tapex.py index fa74d8aa3b55..6c332e01cbc7 100644 --- a/src/transformers/models/deprecated/tapex/tokenization_tapex.py +++ b/src/transformers/models/deprecated/tapex/tokenization_tapex.py @@ -205,22 +205,16 @@ class TapexTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for @@ -678,11 +672,8 @@ def batch_encode_plus( **kwargs, ) -> BatchEncoding: """ -- - This method is deprecated, `__call__` should be used instead. - - + > [!WARNING] + > This method is deprecated, `__call__` should be used instead. 
""" # Backward compatibility for 'truncation_strategy', 'pad_to_max_length' padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies( diff --git a/src/transformers/models/deprecated/tvlt/feature_extraction_tvlt.py b/src/transformers/models/deprecated/tvlt/feature_extraction_tvlt.py index 3c65f4314616..66984515ceae 100644 --- a/src/transformers/models/deprecated/tvlt/feature_extraction_tvlt.py +++ b/src/transformers/models/deprecated/tvlt/feature_extraction_tvlt.py @@ -139,12 +139,9 @@ def __call__( Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature_extractor's default. [What are attention masks?](../glossary#attention-mask) -- - For TvltTransformer models, `attention_mask` should always be passed for batched inference, to avoid - subtle bugs. - - + > [!TIP] + > For TvltTransformer models, `attention_mask` should always be passed for batched inference, to avoid + > subtle bugs. sampling_rate (`int`, *optional*): The sampling rate at which the `raw_speech` input was sampled. It is strongly recommended to pass diff --git a/src/transformers/models/deprecated/xlm_prophetnet/tokenization_xlm_prophetnet.py b/src/transformers/models/deprecated/xlm_prophetnet/tokenization_xlm_prophetnet.py index 77431b13c49f..9defb6ab6cd4 100644 --- a/src/transformers/models/deprecated/xlm_prophetnet/tokenization_xlm_prophetnet.py +++ b/src/transformers/models/deprecated/xlm_prophetnet/tokenization_xlm_prophetnet.py @@ -54,22 +54,16 @@ class XLMProphetNetTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `"[SEP]"`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"[SEP]"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `"[SEP]"`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/donut/modeling_donut_swin.py b/src/transformers/models/donut/modeling_donut_swin.py index d388e386ae49..4b2ce8c9c3a5 100644 --- a/src/transformers/models/donut/modeling_donut_swin.py +++ b/src/transformers/models/donut/modeling_donut_swin.py @@ -919,13 +919,10 @@ def forward( DonutSwin Model transformer with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet. -- - Note that it's possible to fine-tune DonutSwin on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. 
- - + > [!TIP] + > Note that it's possible to fine-tune DonutSwin on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. """ ) # Copied from transformers.models.swin.modeling_swin.SwinForImageClassification with Swin->DonutSwin,swin->donut diff --git a/src/transformers/models/dots1/modeling_dots1.py.bak b/src/transformers/models/dots1/modeling_dots1.py.bak new file mode 100644 index 000000000000..26fdc9f76ce4 --- /dev/null +++ b/src/transformers/models/dots1/modeling_dots1.py.bak @@ -0,0 +1,614 @@ +# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 +# This file was automatically generated from src/transformers/models/dots1/modular_dots1.py. +# Do NOT edit this file manually as any edits will be overwritten by the generation of +# the file from the modular. If any change should be done, please apply the change to the +# modular_dots1.py file directly. One of our CI enforces this. +# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 +# coding=utf-8 +# Copyright 2025 The rednote-hilab team and the HuggingFace Inc. team. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+from typing import Callable, Optional, Union + +import torch +import torch.nn.functional as F +from torch import nn + +from ...activations import ACT2FN +from ...cache_utils import Cache, DynamicCache +from ...generation import GenerationMixin +from ...integrations import use_kernel_forward_from_hub +from ...masking_utils import create_causal_mask, create_sliding_window_causal_mask +from ...modeling_flash_attention_utils import FlashAttentionKwargs +from ...modeling_layers import GradientCheckpointingLayer +from ...modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast +from ...modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update +from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel +from ...processing_utils import Unpack +from ...utils import TransformersKwargs, auto_docstring, can_return_tuple +from ...utils.generic import check_model_inputs +from .configuration_dots1 import Dots1Config + + +@use_kernel_forward_from_hub("RMSNorm") +class Dots1RMSNorm(nn.Module): + def __init__(self, hidden_size, eps=1e-6): + """ + Dots1RMSNorm is equivalent to T5LayerNorm + """ + super().__init__() + self.weight = nn.Parameter(torch.ones(hidden_size)) + self.variance_epsilon = eps + + def forward(self, hidden_states): + input_dtype = hidden_states.dtype + hidden_states = hidden_states.to(torch.float32) + variance = hidden_states.pow(2).mean(-1, keepdim=True) + hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon) + return self.weight * hidden_states.to(input_dtype) + + def extra_repr(self): + return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}" + + +class Dots1RotaryEmbedding(nn.Module): + def __init__(self, config: Dots1Config, device=None): + super().__init__() + # BC: "rope_type" was originally "type" + if hasattr(config, "rope_scaling") and isinstance(config.rope_scaling, dict): + self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type")) + else: + self.rope_type = "default" + self.max_seq_len_cached = config.max_position_embeddings + self.original_max_seq_len = config.max_position_embeddings + + self.config = config + self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type] + + inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device) + self.register_buffer("inv_freq", inv_freq, persistent=False) + self.original_inv_freq = self.inv_freq + + @torch.no_grad() + @dynamic_rope_update # power user: used with advanced RoPE types (e.g. dynamic rope) + def forward(self, x, position_ids): + inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device) + position_ids_expanded = position_ids[:, None, :].float() + + device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu" + with torch.autocast(device_type=device_type, enabled=False): # Force float32 + freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2) + emb = torch.cat((freqs, freqs), dim=-1) + cos = emb.cos() * self.attention_scaling + sin = emb.sin() * self.attention_scaling + + return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype) + + +def rotate_half(x): + """Rotates half the hidden dims of the input.""" + x1 = x[..., : x.shape[-1] // 2] + x2 = x[..., x.shape[-1] // 2 :] + return torch.cat((-x2, x1), dim=-1) + + +def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1): + """Applies Rotary Position Embedding to the query and key tensors. + + Args: + q (`torch.Tensor`): The query tensor. 
+ k (`torch.Tensor`): The key tensor. + cos (`torch.Tensor`): The cosine part of the rotary embedding. + sin (`torch.Tensor`): The sine part of the rotary embedding. + position_ids (`torch.Tensor`, *optional*): + Deprecated and unused. + unsqueeze_dim (`int`, *optional*, defaults to 1): + The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and + sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note + that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and + k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes + cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have + the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2. + Returns: + `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding. + """ + cos = cos.unsqueeze(unsqueeze_dim) + sin = sin.unsqueeze(unsqueeze_dim) + q_embed = (q * cos) + (rotate_half(q) * sin) + k_embed = (k * cos) + (rotate_half(k) * sin) + return q_embed, k_embed + + +def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor: + """ + This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch, + num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim) + """ + batch, num_key_value_heads, slen, head_dim = hidden_states.shape + if n_rep == 1: + return hidden_states + hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim) + return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim) + + +def eager_attention_forward( + module: nn.Module, + query: torch.Tensor, + key: torch.Tensor, + value: torch.Tensor, + attention_mask: Optional[torch.Tensor], + scaling: float, + dropout: float = 0.0, + **kwargs: Unpack[TransformersKwargs], +): + key_states = repeat_kv(key, module.num_key_value_groups) + value_states = repeat_kv(value, module.num_key_value_groups) + + attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling + if attention_mask is not None: + causal_mask = attention_mask[:, :, :, : key_states.shape[-2]] + attn_weights = attn_weights + causal_mask + + attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype) + attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training) + attn_output = torch.matmul(attn_weights, value_states) + attn_output = attn_output.transpose(1, 2).contiguous() + + return attn_output, attn_weights + + +class Dots1Attention(nn.Module): + """Multi-headed attention from 'Attention Is All You Need' paper""" + + def __init__(self, config: Dots1Config, layer_idx: int): + super().__init__() + self.config = config + self.layer_idx = layer_idx + self.head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads) + self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads + self.scaling = self.head_dim**-0.5 + self.attention_dropout = config.attention_dropout + self.is_causal = True + + self.q_proj = nn.Linear( + config.hidden_size, config.num_attention_heads * self.head_dim, bias=config.attention_bias + ) + self.k_proj = nn.Linear( + config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias + ) + self.v_proj = nn.Linear( + 
config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias + ) + self.o_proj = nn.Linear( + config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias + ) + self.q_norm = Dots1RMSNorm(self.head_dim, eps=config.rms_norm_eps) # unlike olmo, only on the head dim! + self.k_norm = Dots1RMSNorm(self.head_dim, eps=config.rms_norm_eps) # thus post q_norm does not need reshape + self.sliding_window = config.sliding_window if config.layer_types[layer_idx] == "sliding_attention" else None + + def forward( + self, + hidden_states: torch.Tensor, + position_embeddings: tuple[torch.Tensor, torch.Tensor], + attention_mask: Optional[torch.Tensor], + past_key_value: Optional[Cache] = None, + cache_position: Optional[torch.LongTensor] = None, + **kwargs: Unpack[FlashAttentionKwargs], + ) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]]: + input_shape = hidden_states.shape[:-1] + hidden_shape = (*input_shape, -1, self.head_dim) + + query_states = self.q_norm(self.q_proj(hidden_states).view(hidden_shape)).transpose(1, 2) + key_states = self.k_norm(self.k_proj(hidden_states).view(hidden_shape)).transpose(1, 2) + value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2) + + cos, sin = position_embeddings + query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin) + + if past_key_value is not None: + # sin and cos are specific to RoPE models; cache_position needed for the static cache + cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position} + key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs) + + attention_interface: Callable = eager_attention_forward + if self.config._attn_implementation != "eager": + attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation] + + attn_output, attn_weights = attention_interface( + self, + query_states, + key_states, + value_states, + attention_mask, + dropout=0.0 if not self.training else self.attention_dropout, + scaling=self.scaling, + sliding_window=self.sliding_window, # diff with Llama + **kwargs, + ) + + attn_output = attn_output.reshape(*input_shape, -1).contiguous() + attn_output = self.o_proj(attn_output) + return attn_output, attn_weights + + +class Dots1MLP(nn.Module): + def __init__(self, config, hidden_size=None, intermediate_size=None): + super().__init__() + self.config = config + self.hidden_size = config.hidden_size if hidden_size is None else hidden_size + self.intermediate_size = config.intermediate_size if intermediate_size is None else intermediate_size + + self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False) + self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False) + self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False) + self.act_fn = ACT2FN[config.hidden_act] + + def forward(self, x): + down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x)) + return down_proj + + +class Dots1MoE(nn.Module): + """ + A mixed expert module containing shared experts. 
+ """ + + def __init__(self, config): + super().__init__() + self.config = config + self.experts = nn.ModuleList( + [Dots1MLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(config.n_routed_experts)] + ) + self.gate = Dots1TopkRouter(config) + self.shared_experts = Dots1MLP( + config=config, intermediate_size=config.moe_intermediate_size * config.n_shared_experts + ) + + def moe(self, hidden_states: torch.Tensor, topk_indices: torch.Tensor, topk_weights: torch.Tensor): + r""" + CALL FOR CONTRIBUTION! I don't have time to optimise this right now, but expert weights need to be fused + to not have to do a loop here (deepseek has 256 experts soooo yeah). + """ + final_hidden_states = torch.zeros_like(hidden_states, dtype=topk_weights.dtype) + expert_mask = torch.nn.functional.one_hot(topk_indices, num_classes=len(self.experts)) + expert_mask = expert_mask.permute(2, 0, 1) + + for expert_idx in range(len(self.experts)): + expert = self.experts[expert_idx] + mask = expert_mask[expert_idx] + token_indices, weight_indices = torch.where(mask) + + if token_indices.numel() > 0: + expert_weights = topk_weights[token_indices, weight_indices] + expert_input = hidden_states[token_indices] + expert_output = expert(expert_input) + weighted_output = expert_output * expert_weights.unsqueeze(-1) + final_hidden_states.index_add_(0, token_indices, weighted_output) + + # in original deepseek, the output of the experts are gathered once we leave this module + # thus the moe module is itelsf an IsolatedParallel module + # and all expert are "local" meaning we shard but we don't gather + return final_hidden_states.type(hidden_states.dtype) + + def forward(self, hidden_states): + residuals = hidden_states + orig_shape = hidden_states.shape + topk_indices, topk_weights = self.gate(hidden_states) + hidden_states = hidden_states.view(-1, hidden_states.shape[-1]) + hidden_states = self.moe(hidden_states, topk_indices, topk_weights).view(*orig_shape) + hidden_states = hidden_states + self.shared_experts(residuals) + return hidden_states + + +class Dots1TopkRouter(nn.Module): + def __init__(self, config): + super().__init__() + self.config = config + self.top_k = config.num_experts_per_tok + self.n_routed_experts = config.n_routed_experts + self.routed_scaling_factor = config.routed_scaling_factor + self.n_group = config.n_group + self.topk_group = config.topk_group + self.norm_topk_prob = config.norm_topk_prob + + self.weight = nn.Parameter(torch.empty((self.n_routed_experts, config.hidden_size))) + self.register_buffer("e_score_correction_bias", torch.zeros(self.n_routed_experts)) + + @torch.no_grad() + def get_topk_indices(self, scores): + scores_for_choice = scores.view(-1, self.n_routed_experts) + self.e_score_correction_bias.unsqueeze(0) + group_scores = ( + scores_for_choice.view(-1, self.n_group, self.n_routed_experts // self.n_group) + .topk(2, dim=-1)[0] + .sum(dim=-1) + ) + group_idx = torch.topk(group_scores, k=self.topk_group, dim=-1, sorted=False)[1] + group_mask = torch.zeros_like(group_scores) + group_mask.scatter_(1, group_idx, 1) + score_mask = ( + group_mask.unsqueeze(-1) + .expand(-1, self.n_group, self.n_routed_experts // self.n_group) + .reshape(-1, self.n_routed_experts) + ) + scores_for_choice = scores_for_choice.masked_fill(~score_mask.bool(), 0.0) + topk_indices = torch.topk(scores_for_choice, k=self.top_k, dim=-1, sorted=False)[1] + return topk_indices + + def forward(self, hidden_states): + hidden_states = hidden_states.view(-1, self.config.hidden_size) + router_logits = 
F.linear(hidden_states.type(torch.float32), self.weight.type(torch.float32)) + scores = router_logits.sigmoid() + topk_indices = self.get_topk_indices(scores) + topk_weights = scores.gather(1, topk_indices) + if self.norm_topk_prob: + denominator = topk_weights.sum(dim=-1, keepdim=True) + 1e-20 + topk_weights /= denominator + topk_weights = topk_weights * self.routed_scaling_factor + return topk_indices, topk_weights + + +class Dots1DecoderLayer(GradientCheckpointingLayer): + def __init__(self, config: Dots1Config, layer_idx: int): + super().__init__() + self.hidden_size = config.hidden_size + + self.self_attn = Dots1Attention(config=config, layer_idx=layer_idx) + + if layer_idx >= config.first_k_dense_replace: + self.mlp = Dots1MoE(config) + else: + self.mlp = Dots1MLP(config) + + self.input_layernorm = Dots1RMSNorm(config.hidden_size, eps=config.rms_norm_eps) + self.post_attention_layernorm = Dots1RMSNorm(config.hidden_size, eps=config.rms_norm_eps) + self.attention_type = config.layer_types[layer_idx] + + def forward( + self, + hidden_states: torch.Tensor, + attention_mask: Optional[torch.Tensor] = None, + position_ids: Optional[torch.LongTensor] = None, + past_key_value: Optional[Cache] = None, + use_cache: Optional[bool] = False, + cache_position: Optional[torch.LongTensor] = None, + position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None, # necessary, but kept here for BC + **kwargs: Unpack[TransformersKwargs], + ) -> tuple[torch.Tensor]: + residual = hidden_states + hidden_states = self.input_layernorm(hidden_states) + # Self Attention + hidden_states, _ = self.self_attn( + hidden_states=hidden_states, + attention_mask=attention_mask, + position_ids=position_ids, + past_key_value=past_key_value, + use_cache=use_cache, + cache_position=cache_position, + position_embeddings=position_embeddings, + **kwargs, + ) + hidden_states = residual + hidden_states + + # Fully Connected + residual = hidden_states + hidden_states = self.post_attention_layernorm(hidden_states) + hidden_states = self.mlp(hidden_states) + hidden_states = residual + hidden_states + return hidden_states + + +@auto_docstring +class Dots1PreTrainedModel(PreTrainedModel): + config: Dots1Config + base_model_prefix = "model" + supports_gradient_checkpointing = True + _no_split_modules = ["Dots1DecoderLayer"] + _skip_keys_device_placement = ["past_key_values"] + _supports_flash_attn = True + _supports_sdpa = True + _supports_flex_attn = True + + _can_compile_fullgraph = True + _supports_attention_backend = True + _can_record_outputs = { + "hidden_states": Dots1DecoderLayer, + "attentions": Dots1Attention, + } + + def _init_weights(self, module): + super()._init_weights(module) + if isinstance(module, Dots1TopkRouter): + module.weight.data.normal_(mean=0.0, std=self.config.initializer_range) + + +@auto_docstring +class Dots1Model(Dots1PreTrainedModel): + def __init__(self, config: Dots1Config): + super().__init__(config) + self.padding_idx = config.pad_token_id + self.vocab_size = config.vocab_size + + self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx) + self.layers = nn.ModuleList( + [Dots1DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)] + ) + self.norm = Dots1RMSNorm(config.hidden_size, eps=config.rms_norm_eps) + self.rotary_emb = Dots1RotaryEmbedding(config=config) + self.gradient_checkpointing = False + self.has_sliding_layers = "sliding_attention" in self.config.layer_types + + # Initialize weights and apply final processing + 
self.post_init() + + @check_model_inputs + @auto_docstring + def forward( + self, + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.Tensor] = None, + position_ids: Optional[torch.LongTensor] = None, + past_key_values: Optional[Cache] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + use_cache: Optional[bool] = None, + cache_position: Optional[torch.LongTensor] = None, + **kwargs: Unpack[TransformersKwargs], + ) -> BaseModelOutputWithPast: + if (input_ids is None) ^ (inputs_embeds is not None): + raise ValueError("You must specify exactly one of input_ids or inputs_embeds") + + if inputs_embeds is None: + inputs_embeds = self.embed_tokens(input_ids) + + if use_cache and past_key_values is None: + past_key_values = DynamicCache() + + if cache_position is None: + past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0 + cache_position = torch.arange( + past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device + ) + + if position_ids is None: + position_ids = cache_position.unsqueeze(0) + + # It may already have been prepared by e.g. `generate` + if not isinstance(causal_mask_mapping := attention_mask, dict): + # Prepare mask arguments + mask_kwargs = { + "config": self.config, + "input_embeds": inputs_embeds, + "attention_mask": attention_mask, + "cache_position": cache_position, + "past_key_values": past_key_values, + "position_ids": position_ids, + } + # Create the masks + causal_mask_mapping = { + "full_attention": create_causal_mask(**mask_kwargs), + } + # The sliding window alternating layers are not always activated depending on the config + if self.has_sliding_layers: + causal_mask_mapping["sliding_attention"] = create_sliding_window_causal_mask(**mask_kwargs) + + hidden_states = inputs_embeds + + # create position embeddings to be shared across the decoder layers + position_embeddings = self.rotary_emb(hidden_states, position_ids) + + for decoder_layer in self.layers[: self.config.num_hidden_layers]: + hidden_states = decoder_layer( + hidden_states, + attention_mask=causal_mask_mapping[decoder_layer.attention_type], + position_ids=position_ids, + past_key_value=past_key_values, + use_cache=use_cache, + cache_position=cache_position, + position_embeddings=position_embeddings, + **kwargs, + ) + + hidden_states = self.norm(hidden_states) + return BaseModelOutputWithPast( + last_hidden_state=hidden_states, + past_key_values=past_key_values if use_cache else None, + ) + + +@auto_docstring +class Dots1ForCausalLM(Dots1PreTrainedModel, GenerationMixin): + _tied_weights_keys = ["lm_head.weight"] + _tp_plan = {"lm_head": "colwise_rep"} + _pp_plan = {"lm_head": (["hidden_states"], ["logits"])} + + def __init__(self, config): + super().__init__(config) + self.model = Dots1Model(config) + self.vocab_size = config.vocab_size + self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False) + + # Initialize weights and apply final processing + self.post_init() + + def set_decoder(self, decoder): + self.model = decoder + + def get_decoder(self): + return self.model + + @can_return_tuple + @auto_docstring + def forward( + self, + input_ids: Optional[torch.LongTensor] = None, + attention_mask: Optional[torch.Tensor] = None, + position_ids: Optional[torch.LongTensor] = None, + past_key_values: Optional[Cache] = None, + inputs_embeds: Optional[torch.FloatTensor] = None, + labels: Optional[torch.LongTensor] = None, + use_cache: Optional[bool] = None, + cache_position: 
Optional[torch.LongTensor] = None, + logits_to_keep: Union[int, torch.Tensor] = 0, + **kwargs: Unpack[TransformersKwargs], + ) -> CausalLMOutputWithPast: + r""" + labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): + Labels for computing the masked language modeling loss. Indices should either be in `[0, ..., + config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored + (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`. + + Example: + + ```python + >>> from transformers import AutoTokenizer, Dots1ForCausalLM + + >>> model = Dots1ForCausalLM.from_pretrained("rednote-hilab/dots1.llm1.inst") + >>> tokenizer = AutoTokenizer.from_pretrained("rednote-hilab/dots1.llm1.inst") + + >>> prompt = "Hey, are you conscious? Can you talk to me?" + >>> inputs = tokenizer(prompt, return_tensors="pt") + + >>> # Generate + >>> generate_ids = model.generate(inputs.input_ids, max_length=30) + >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0] + "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you." + ```""" + outputs: BaseModelOutputWithPast = self.model( + input_ids=input_ids, + attention_mask=attention_mask, + position_ids=position_ids, + past_key_values=past_key_values, + inputs_embeds=inputs_embeds, + use_cache=use_cache, + cache_position=cache_position, + **kwargs, + ) + + hidden_states = outputs.last_hidden_state + # Only compute necessary logits, and do not upcast them to float if we are not computing the loss + slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep + logits = self.lm_head(hidden_states[:, slice_indices, :]) + + loss = None + if labels is not None: + loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **kwargs) + + return CausalLMOutputWithPast( + loss=loss, + logits=logits, + past_key_values=outputs.past_key_values, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + ) + + +__all__ = ["Dots1PreTrainedModel", "Dots1Model", "Dots1ForCausalLM"] diff --git a/src/transformers/models/encodec/modeling_encodec.py b/src/transformers/models/encodec/modeling_encodec.py index c3c32f5bd61d..c99801e5d382 100644 --- a/src/transformers/models/encodec/modeling_encodec.py +++ b/src/transformers/models/encodec/modeling_encodec.py @@ -737,13 +737,10 @@ def forward( - 1 for tokens that are **not masked**, - 0 for tokens that are **masked**. -- - `padding_mask` should always be passed, unless the input was truncated or not padded. This is because in - order to process tensors effectively, the input audio should be padded so that `input_length % stride = - step` with `step = chunk_length-stride`. This ensures that all chunks are of the same shape - - + > [!WARNING] + > `padding_mask` should always be passed, unless the input was truncated or not padded. This is because in + > order to process tensors effectively, the input audio should be padded so that `input_length % stride = + > step` with `step = chunk_length-stride`. This ensures that all chunks are of the same shape bandwidth (`float`, *optional*): The target bandwidth. Must be one of `config.target_bandwidths`. If `None`, uses the smallest possible bandwidth. bandwidth is represented as a thousandth of what it is, e.g. 
6kbps bandwidth is represented as diff --git a/src/transformers/models/flaubert/modeling_flaubert.py b/src/transformers/models/flaubert/modeling_flaubert.py index 5812aa457cbc..2bfbc5d90307 100644 --- a/src/transformers/models/flaubert/modeling_flaubert.py +++ b/src/transformers/models/flaubert/modeling_flaubert.py @@ -355,12 +355,9 @@ def forward( Mask for tokens at invalid position, such as query and special symbols (PAD, SEP, CLS). 1.0 means token should be masked. -- - One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides - `start_states`. - - + > [!TIP] + > One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides + > `start_states`. Returns: `torch.FloatTensor`: The end logits for SQuAD. @@ -422,12 +419,9 @@ def forward( cls_index (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Position of the CLS token for each sentence in the batch. If `None`, takes the last token. -- - One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides - `start_states`. - - + > [!TIP] + > One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides + > `start_states`. Returns: `torch.FloatTensor`: The SQuAD 2.0 answer class. diff --git a/src/transformers/models/flaubert/tokenization_flaubert.py b/src/transformers/models/flaubert/tokenization_flaubert.py index dee653450eba..f3cb98697ddb 100644 --- a/src/transformers/models/flaubert/tokenization_flaubert.py +++ b/src/transformers/models/flaubert/tokenization_flaubert.py @@ -146,12 +146,9 @@ class FlaubertTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/focalnet/modeling_focalnet.py b/src/transformers/models/focalnet/modeling_focalnet.py index 9b5d4daed70c..98e340f925bf 100644 --- a/src/transformers/models/focalnet/modeling_focalnet.py +++ b/src/transformers/models/focalnet/modeling_focalnet.py @@ -681,12 +681,9 @@ def forward( This follows the same implementation as in [SimMIM](https://huggingface.co/papers/2111.09886). -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. sep_token (`str`, *optional*, defaults to `"- - Note that we provide a script to pre-train this model on custom data in our [examples - directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). - - + > [!TIP] + > Note that we provide a script to pre-train this model on custom data in our [examples + > directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). 
""" ) class FocalNetForMaskedImageModeling(FocalNetPreTrainedModel): diff --git a/src/transformers/models/fsmt/tokenization_fsmt.py b/src/transformers/models/fsmt/tokenization_fsmt.py index 5a4446d8e90b..c30a411797e4 100644 --- a/src/transformers/models/fsmt/tokenization_fsmt.py +++ b/src/transformers/models/fsmt/tokenization_fsmt.py @@ -141,12 +141,9 @@ class FSMTTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/gpt2/tokenization_gpt2.py b/src/transformers/models/gpt2/tokenization_gpt2.py index 608164ef2d83..e4bcaa05824b 100644 --- a/src/transformers/models/gpt2/tokenization_gpt2.py +++ b/src/transformers/models/gpt2/tokenization_gpt2.py @@ -93,11 +93,8 @@ class GPT2Tokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. sep_token (`str`, *optional*, defaults to `"- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. diff --git a/src/transformers/models/gpt2/tokenization_gpt2_fast.py b/src/transformers/models/gpt2/tokenization_gpt2_fast.py index f81c155e8644..6f49b429e2ed 100644 --- a/src/transformers/models/gpt2/tokenization_gpt2_fast.py +++ b/src/transformers/models/gpt2/tokenization_gpt2_fast.py @@ -49,11 +49,8 @@ class GPT2TokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. 
diff --git a/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py b/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py index a3b190a60eb1..46d93916ccc5 100644 --- a/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py +++ b/src/transformers/models/gpt_neox/tokenization_gpt_neox_fast.py @@ -49,11 +49,8 @@ class GPTNeoXTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. diff --git a/src/transformers/models/groupvit/modeling_groupvit.py b/src/transformers/models/groupvit/modeling_groupvit.py index 598845750da2..a38673a31677 100644 --- a/src/transformers/models/groupvit/modeling_groupvit.py +++ b/src/transformers/models/groupvit/modeling_groupvit.py @@ -272,13 +272,10 @@ class GroupViTModelOutput(ModelOutput): segmentation_logits (`torch.FloatTensor` of shape `(batch_size, config.num_labels, logits_height, logits_width)`): Classification scores for each pixel. -- - The logits returned do not necessarily have the same size as the `pixel_values` passed as inputs. This is - to avoid doing two interpolations and lose some quality when a user needs to resize the logits to the - original image size as post-processing. You should always check your logits shape and resize as needed. - - + > [!WARNING] + > The logits returned do not necessarily have the same size as the `pixel_values` passed as inputs. This is + > to avoid doing two interpolations and lose some quality when a user needs to resize the logits to the + > original image size as post-processing. You should always check your logits shape and resize as needed. text_embeds (`torch.FloatTensor` of shape `(batch_size, output_dim`): The text embeddings obtained by applying the projection layer to the pooled output of [`GroupViTTextModel`]. diff --git a/src/transformers/models/hiera/modeling_hiera.py b/src/transformers/models/hiera/modeling_hiera.py index 499c0b454600..eddb542f75c4 100644 --- a/src/transformers/models/hiera/modeling_hiera.py +++ b/src/transformers/models/hiera/modeling_hiera.py @@ -1080,12 +1080,9 @@ def forward(self, feature_maps: list[torch.Tensor]) -> torch.Tensor: custom_intro=""" The Hiera Model transformer with the decoder on top for self-supervised pre-training. -- - Note that we provide a script to pre-train this model on custom data in our [examples - directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). - - + > [!TIP] + > Note that we provide a script to pre-train this model on custom data in our [examples + > directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). """ ) class HieraForPreTraining(HieraPreTrainedModel): @@ -1222,13 +1219,10 @@ def forward( Hiera Model transformer with an image classification head on top (a linear layer on top of the final hidden state with average pooling) e.g. for ImageNet. 
-- - Note that it's possible to fine-tune Hiera on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune Hiera on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. """ ) class HieraForImageClassification(HieraPreTrainedModel): diff --git a/src/transformers/models/ijepa/modeling_ijepa.py b/src/transformers/models/ijepa/modeling_ijepa.py index 2a15c40da4d3..bdb9906beaad 100644 --- a/src/transformers/models/ijepa/modeling_ijepa.py +++ b/src/transformers/models/ijepa/modeling_ijepa.py @@ -466,13 +466,10 @@ def forward( IJepa Model transformer with an image classification head on top (a linear layer on top of the final hidden states) e.g. for ImageNet. -- - Note that it's possible to fine-tune IJepa on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune IJepa on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. """ ) class IJepaForImageClassification(IJepaPreTrainedModel): diff --git a/src/transformers/models/ijepa/modular_ijepa.py b/src/transformers/models/ijepa/modular_ijepa.py index b37bc41d13bf..9fb579660c06 100644 --- a/src/transformers/models/ijepa/modular_ijepa.py +++ b/src/transformers/models/ijepa/modular_ijepa.py @@ -128,13 +128,10 @@ def __init__(self, config: IJepaConfig, add_pooling_layer: bool = False, use_mas IJepa Model transformer with an image classification head on top (a linear layer on top of the final hidden states) e.g. for ImageNet. -- - Note that it's possible to fine-tune IJepa on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune IJepa on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. """ ) class IJepaForImageClassification(IJepaPreTrainedModel, ViTForImageClassification): diff --git a/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py b/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py index fdf95a34d58d..c6afd35a251c 100644 --- a/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py +++ b/src/transformers/models/layoutlmv3/tokenization_layoutlmv3.py @@ -203,22 +203,16 @@ class LayoutLMv3Tokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. 
-- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/layoutlmv3/tokenization_layoutlmv3_fast.py b/src/transformers/models/layoutlmv3/tokenization_layoutlmv3_fast.py index d0407638595d..7e4b93deb24e 100644 --- a/src/transformers/models/layoutlmv3/tokenization_layoutlmv3_fast.py +++ b/src/transformers/models/layoutlmv3/tokenization_layoutlmv3_fast.py @@ -64,22 +64,16 @@ class LayoutLMv3TokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/layoutxlm/tokenization_layoutxlm.py b/src/transformers/models/layoutxlm/tokenization_layoutxlm.py index 9c1d5c05a9f9..b39bcb3ad5f1 100644 --- a/src/transformers/models/layoutxlm/tokenization_layoutxlm.py +++ b/src/transformers/models/layoutxlm/tokenization_layoutxlm.py @@ -158,22 +158,16 @@ class LayoutXLMTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. 
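The tip that keeps recurring in these tokenizer docstrings, that `bos_token`/`eos_token` are not the tokens actually used to frame an encoded sequence (that job belongs to `cls_token` and `sep_token`), is easy to verify directly. A short check with a RoBERTa-style checkpoint; the checkpoint name is only an example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# bos/eos and cls/sep happen to share the same strings here, but the special-token
# insertion logic is written in terms of cls_token and sep_token
print(tokenizer.bos_token, tokenizer.cls_token)  # <s> <s>
print(tokenizer.eos_token, tokenizer.sep_token)  # </s> </s>

ids = tokenizer("Hello world")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))  # ['<s>', 'Hello', 'Ġworld', '</s>']
```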
sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/layoutxlm/tokenization_layoutxlm_fast.py b/src/transformers/models/layoutxlm/tokenization_layoutxlm_fast.py index 7b08a3aa5f0e..d793d30f6304 100644 --- a/src/transformers/models/layoutxlm/tokenization_layoutxlm_fast.py +++ b/src/transformers/models/layoutxlm/tokenization_layoutxlm_fast.py @@ -160,22 +160,16 @@ class LayoutXLMTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/led/tokenization_led.py b/src/transformers/models/led/tokenization_led.py index d110ac30d969..324a67a017d4 100644 --- a/src/transformers/models/led/tokenization_led.py +++ b/src/transformers/models/led/tokenization_led.py @@ -96,11 +96,8 @@ class LEDTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -116,22 +113,16 @@ class LEDTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. 
sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/led/tokenization_led_fast.py b/src/transformers/models/led/tokenization_led_fast.py index baea10f23516..151d6d22da7c 100644 --- a/src/transformers/models/led/tokenization_led_fast.py +++ b/src/transformers/models/led/tokenization_led_fast.py @@ -53,11 +53,8 @@ class LEDTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -73,22 +70,16 @@ class LEDTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/longformer/tokenization_longformer.py b/src/transformers/models/longformer/tokenization_longformer.py index 104bdd7a9b99..2b1be5a69d2b 100644 --- a/src/transformers/models/longformer/tokenization_longformer.py +++ b/src/transformers/models/longformer/tokenization_longformer.py @@ -93,11 +93,8 @@ class LongformerTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -113,22 +110,16 @@ class LongformerTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. 
-"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/longformer/tokenization_longformer_fast.py b/src/transformers/models/longformer/tokenization_longformer_fast.py index bde6bb55fec6..e2872c704df7 100644 --- a/src/transformers/models/longformer/tokenization_longformer_fast.py +++ b/src/transformers/models/longformer/tokenization_longformer_fast.py @@ -53,11 +53,8 @@ class LongformerTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -73,22 +70,16 @@ class LongformerTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. 
two sequences for diff --git a/src/transformers/models/luke/tokenization_luke.py b/src/transformers/models/luke/tokenization_luke.py index 4bb19bb5ee73..4b051900f959 100644 --- a/src/transformers/models/luke/tokenization_luke.py +++ b/src/transformers/models/luke/tokenization_luke.py @@ -192,11 +192,8 @@ class LukeTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. It also creates entity sequences, namely @@ -230,22 +227,16 @@ class LukeTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/markuplm/tokenization_markuplm.py b/src/transformers/models/markuplm/tokenization_markuplm.py index 0a6f7c3bd6a0..46c4d69620b6 100644 --- a/src/transformers/models/markuplm/tokenization_markuplm.py +++ b/src/transformers/models/markuplm/tokenization_markuplm.py @@ -142,22 +142,16 @@ class MarkupLMTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. 
two sequences for diff --git a/src/transformers/models/markuplm/tokenization_markuplm_fast.py b/src/transformers/models/markuplm/tokenization_markuplm_fast.py index 4033ef319ff8..bd2cc823120d 100644 --- a/src/transformers/models/markuplm/tokenization_markuplm_fast.py +++ b/src/transformers/models/markuplm/tokenization_markuplm_fast.py @@ -101,22 +101,16 @@ class MarkupLMTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/mluke/tokenization_mluke.py b/src/transformers/models/mluke/tokenization_mluke.py index d63129c7b7e4..454e5e6f22b1 100644 --- a/src/transformers/models/mluke/tokenization_mluke.py +++ b/src/transformers/models/mluke/tokenization_mluke.py @@ -146,22 +146,16 @@ class MLukeTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/mpnet/tokenization_mpnet.py b/src/transformers/models/mpnet/tokenization_mpnet.py index bf035cf8e4bd..21f92bb3891f 100644 --- a/src/transformers/models/mpnet/tokenization_mpnet.py +++ b/src/transformers/models/mpnet/tokenization_mpnet.py @@ -68,22 +68,16 @@ class MPNetTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pre-training. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. 
- - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/mpnet/tokenization_mpnet_fast.py b/src/transformers/models/mpnet/tokenization_mpnet_fast.py index 1a470565a845..d5854fbb2c9d 100644 --- a/src/transformers/models/mpnet/tokenization_mpnet_fast.py +++ b/src/transformers/models/mpnet/tokenization_mpnet_fast.py @@ -46,22 +46,16 @@ class MPNetTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/musicgen/modeling_musicgen.py b/src/transformers/models/musicgen/modeling_musicgen.py index 7326ede89e71..49a20294a814 100644 --- a/src/transformers/models/musicgen/modeling_musicgen.py +++ b/src/transformers/models/musicgen/modeling_musicgen.py @@ -488,16 +488,13 @@ def forward( [What are input IDs?](../glossary#input-ids) -- - The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `input_ids`. - - + > [!WARNING] + > The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `input_ids`. 
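The Musicgen warning above describes concrete bookkeeping when the codes come from an audio encoder such as [`EncodecModel`]. A small sketch of that reshape with dummy codes; the dimensions are invented for illustration:

```python
import torch

# Codes as returned by an Encodec-style encoder: (frames, batch_size, num_codebooks, target_sequence_length)
frames, batch_size, num_codebooks, seq_len = 1, 2, 4, 50
audio_codes = torch.randint(0, 2048, (frames, batch_size, num_codebooks, seq_len))

assert frames == 1  # required, per the warning above
input_ids = audio_codes[0].reshape(batch_size * num_codebooks, seq_len)
print(input_ids.shape)  # torch.Size([8, 50])

# The forward pass then views this back as (batch_size, num_codebooks, target_sequence_length)
```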
encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder. @@ -729,16 +726,13 @@ def forward( [What are input IDs?](../glossary#input-ids) -- - The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `input_ids`. - - + > [!WARNING] + > The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `input_ids`. encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder. @@ -848,16 +842,13 @@ def forward( [What are input IDs?](../glossary#input-ids) -- - The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `input_ids`. - - + > [!WARNING] + > The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `input_ids`. encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*): Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder. @@ -1077,16 +1068,13 @@ def generate( Generates sequences of token ids for models with a language modeling head. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. 
- - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: inputs (`torch.Tensor` of varying shape depending on the modality, *optional*): @@ -1664,16 +1652,13 @@ def forward( [What are decoder input IDs?](../glossary#decoder-input-ids) -- - The `decoder_input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `decoder_input_ids`. - - + > [!WARNING] + > The `decoder_input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `decoder_input_ids`. decoder_attention_mask (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*): Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also be used by default. @@ -2116,16 +2101,13 @@ def generate( Generates sequences of token ids for models with a language modeling head. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). 
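The `generate()` warning above amounts to: defaults live in the model's `generation_config`, and any of them can be overridden per call. A hedged sketch with the public Musicgen checkpoint; the prompt and parameter values are arbitrary:

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(text=["80s pop track with bassy drums"], padding=True, return_tensors="pt")

# Per-call arguments take precedence over model.generation_config defaults
audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3.0, max_new_tokens=256)
```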
Parameters: inputs (`torch.Tensor` of varying shape depending on the modality, *optional*): diff --git a/src/transformers/models/musicgen_melody/feature_extraction_musicgen_melody.py b/src/transformers/models/musicgen_melody/feature_extraction_musicgen_melody.py index 744471bab553..28163b1960b2 100644 --- a/src/transformers/models/musicgen_melody/feature_extraction_musicgen_melody.py +++ b/src/transformers/models/musicgen_melody/feature_extraction_musicgen_melody.py @@ -69,12 +69,9 @@ class MusicgenMelodyFeatureExtractor(SequenceFeatureExtractor): [What are attention masks?](../glossary#attention-mask) -- - For Whisper models, `attention_mask` should always be passed for batched inference, to avoid subtle - bugs. - - + > [!TIP] + > For Whisper models, `attention_mask` should always be passed for batched inference, to avoid subtle + > bugs. stem_indices (`list[int]`, *optional*, defaults to `[3, 2]`): Stem channels to extract if demucs outputs are passed. """ @@ -219,9 +216,8 @@ def __call__( [What are attention masks?](../glossary#attention-mask) -- For Musicgen Melody models, audio `attention_mask` is not necessary. - + > [!TIP] + > For Musicgen Melody models, audio `attention_mask` is not necessary. padding (`bool`, `str` or [`~utils.PaddingStrategy`], *optional*, defaults to `True`): Select a strategy to pad the returned sequences (according to the model's padding side and padding diff --git a/src/transformers/models/musicgen_melody/modeling_musicgen_melody.py b/src/transformers/models/musicgen_melody/modeling_musicgen_melody.py index cea583599ee2..e86884150d14 100644 --- a/src/transformers/models/musicgen_melody/modeling_musicgen_melody.py +++ b/src/transformers/models/musicgen_melody/modeling_musicgen_melody.py @@ -461,16 +461,13 @@ def forward( [What are input IDs?](../glossary#input-ids) -- - The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `input_ids`. - - + > [!WARNING] + > The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `input_ids`. encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*): Sequence of hidden-states representing the concatenation of the text encoder output and the processed audio encoder output. Used as a conditional signal and will thus be concatenated to the projected `decoder_input_ids`. @@ -683,16 +680,13 @@ def forward( [What are input IDs?](../glossary#input-ids) -- - The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. 
If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `input_ids`. - - + > [!WARNING] + > The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `input_ids`. encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*): Sequence of hidden-states representing the concatenation of the text encoder output and the processed audio encoder output. Used as a conditional signal and will thus be concatenated to the projected `decoder_input_ids`. @@ -802,16 +796,13 @@ def forward( [What are input IDs?](../glossary#input-ids) -- - The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `input_ids`. - - + > [!WARNING] + > The `input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `input_ids`. encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, encoder_sequence_length, hidden_size)`, *optional*): Sequence of hidden-states representing the concatenation of the text encoder output and the processed audio encoder output. Used as a conditional signal and will thus be concatenated to the projected `decoder_input_ids`. @@ -1045,16 +1036,13 @@ def generate( Generates sequences of token ids for models with a language modeling head. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. 
You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: inputs (`torch.Tensor` of varying shape depending on the modality, *optional*): @@ -1577,16 +1565,13 @@ def forward( [What are decoder input IDs?](../glossary#decoder-input-ids) -- - The `decoder_input_ids` will automatically be converted from shape `(batch_size * num_codebooks, - target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If - you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of - frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, - target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as - `decoder_input_ids`. - - + > [!WARNING] + > The `decoder_input_ids` will automatically be converted from shape `(batch_size * num_codebooks, + > target_sequence_length)` to `(batch_size, num_codebooks, target_sequence_length)` in the forward pass. If + > you obtain audio codes from an audio encoding model, such as [`EncodecModel`], ensure that the number of + > frames is equal to 1, and that you reshape the audio codes from `(frames, batch_size, num_codebooks, + > target_sequence_length)` to `(batch_size * num_codebooks, target_sequence_length)` prior to passing them as + > `decoder_input_ids`. decoder_attention_mask (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*): Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also be used by default. @@ -2007,16 +1992,13 @@ def generate( Generates sequences of token ids for models with a language modeling head. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: inputs (`torch.Tensor` of varying shape depending on the modality, *optional*): diff --git a/src/transformers/models/mvp/tokenization_mvp.py b/src/transformers/models/mvp/tokenization_mvp.py index f6039df2dc02..63a1438b1a1c 100644 --- a/src/transformers/models/mvp/tokenization_mvp.py +++ b/src/transformers/models/mvp/tokenization_mvp.py @@ -92,11 +92,8 @@ class MvpTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. 
-- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -112,22 +109,16 @@ class MvpTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/mvp/tokenization_mvp_fast.py b/src/transformers/models/mvp/tokenization_mvp_fast.py index ca0bc6b165f7..1adf757055b5 100644 --- a/src/transformers/models/mvp/tokenization_mvp_fast.py +++ b/src/transformers/models/mvp/tokenization_mvp_fast.py @@ -54,11 +54,8 @@ class MvpTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -74,22 +71,16 @@ class MvpTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. 
sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/nllb/tokenization_nllb.py b/src/transformers/models/nllb/tokenization_nllb.py index 4962a642bb31..cf7a29a3146e 100644 --- a/src/transformers/models/nllb/tokenization_nllb.py +++ b/src/transformers/models/nllb/tokenization_nllb.py @@ -64,22 +64,16 @@ class NllbTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/nllb/tokenization_nllb_fast.py b/src/transformers/models/nllb/tokenization_nllb_fast.py index 5300b3942b5d..0cc028219c10 100644 --- a/src/transformers/models/nllb/tokenization_nllb_fast.py +++ b/src/transformers/models/nllb/tokenization_nllb_fast.py @@ -69,22 +69,16 @@ class NllbTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/parakeet/feature_extraction_parakeet.py b/src/transformers/models/parakeet/feature_extraction_parakeet.py index d28f1a214a21..1106c3fac2f3 100644 --- a/src/transformers/models/parakeet/feature_extraction_parakeet.py +++ b/src/transformers/models/parakeet/feature_extraction_parakeet.py @@ -163,12 +163,9 @@ def __call__( [What are attention masks?](../glossary#attention-mask) -- - For Parakeet models, `attention_mask` should always be passed for batched inference, to avoid subtle - bugs. 
- - + > [!TIP] + > For Parakeet models, `attention_mask` should always be passed for batched inference, to avoid subtle + > bugs. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors instead of list of python integers. Acceptable values are: diff --git a/src/transformers/models/pegasus/tokenization_pegasus.py b/src/transformers/models/pegasus/tokenization_pegasus.py index b8a4a1c737d1..033d88aae927 100644 --- a/src/transformers/models/pegasus/tokenization_pegasus.py +++ b/src/transformers/models/pegasus/tokenization_pegasus.py @@ -51,12 +51,9 @@ class PegasusTokenizer(PreTrainedTokenizer): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/pegasus/tokenization_pegasus_fast.py b/src/transformers/models/pegasus/tokenization_pegasus_fast.py index 92a37c44ff2e..57531f9db10b 100644 --- a/src/transformers/models/pegasus/tokenization_pegasus_fast.py +++ b/src/transformers/models/pegasus/tokenization_pegasus_fast.py @@ -53,12 +53,9 @@ class PegasusTokenizerFast(PreTrainedTokenizerFast): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. - - - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/perceiver/modeling_perceiver.py b/src/transformers/models/perceiver/modeling_perceiver.py index 499d01774d06..cb23b557a313 100755 --- a/src/transformers/models/perceiver/modeling_perceiver.py +++ b/src/transformers/models/perceiver/modeling_perceiver.py @@ -575,13 +575,10 @@ def _init_weights(self, module): custom_intro=""" The Perceiver: a scalable, fully attentional architecture. - - - Note that it's possible to fine-tune Perceiver on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune Perceiver on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. 
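The interpolate-position-encodings tip repeated for Hiera, I-JEPA and Perceiver comes down to one extra flag in the forward call. A sketch of the call signature with a randomly initialised `HieraForImageClassification`; the config, label count and 448x448 resolution are assumptions, and in practice you would load a pretrained checkpoint instead:

```python
import torch
from transformers import HieraConfig, HieraForImageClassification

# Randomly initialised model only to show the signature; use from_pretrained(...) in practice
model = HieraForImageClassification(HieraConfig(num_labels=10))

# Image larger than the 224x224 pre-training resolution: the flag interpolates the
# position embeddings to the new grid instead of failing on a size mismatch
pixel_values = torch.randn(1, 3, 448, 448)
with torch.no_grad():
    logits = model(pixel_values, interpolate_pos_encoding=True).logits
```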
""" ) class PerceiverModel(PerceiverPreTrainedModel): diff --git a/src/transformers/models/perceiver/tokenization_perceiver.py b/src/transformers/models/perceiver/tokenization_perceiver.py index f17e7e99ac9d..8109d0e7e0e7 100644 --- a/src/transformers/models/perceiver/tokenization_perceiver.py +++ b/src/transformers/models/perceiver/tokenization_perceiver.py @@ -38,12 +38,9 @@ class PerceiverTokenizer(PreTrainedTokenizer): eos_token (`str`, *optional*, defaults to `"[EOS]"`): The end of sequence token (reserved in the vocab, but not actually used). -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. mask_token (`str`, *optional*, defaults to `"[MASK]"`): The MASK token, useful for masked language modeling. diff --git a/src/transformers/models/phobert/tokenization_phobert.py b/src/transformers/models/phobert/tokenization_phobert.py index 61ac8194b45c..624fe9501f4d 100644 --- a/src/transformers/models/phobert/tokenization_phobert.py +++ b/src/transformers/models/phobert/tokenization_phobert.py @@ -63,22 +63,16 @@ class PhobertTokenizer(PreTrainedTokenizer): bos_token (`st`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/pop2piano/modeling_pop2piano.py b/src/transformers/models/pop2piano/modeling_pop2piano.py index 94c2a7515a44..0f8183aee655 100644 --- a/src/transformers/models/pop2piano/modeling_pop2piano.py +++ b/src/transformers/models/pop2piano/modeling_pop2piano.py @@ -1193,14 +1193,11 @@ def generate( """ Generates token ids for midi outputs. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. For an overview of generation - strategies and code examples, check out the [following guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. 
For an overview of generation + > strategies and code examples, check out the [following guide](./generation_strategies). Parameters: input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): diff --git a/src/transformers/models/reformer/modeling_reformer.py b/src/transformers/models/reformer/modeling_reformer.py index e2cb1c4657a8..b28ecac35bbe 100755 --- a/src/transformers/models/reformer/modeling_reformer.py +++ b/src/transformers/models/reformer/modeling_reformer.py @@ -2402,12 +2402,9 @@ def forward( config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the loss is only computed for the tokens with labels -- - This example uses a false checkpoint since we don't have any available pretrained model for the masked language - modeling task with the Reformer architecture. - - + > [!WARNING] + > This example uses a false checkpoint since we don't have any available pretrained model for the masked language + > modeling task with the Reformer architecture. Example: diff --git a/src/transformers/models/reformer/tokenization_reformer.py b/src/transformers/models/reformer/tokenization_reformer.py index 458b72df4ff6..175c35a60ba8 100644 --- a/src/transformers/models/reformer/tokenization_reformer.py +++ b/src/transformers/models/reformer/tokenization_reformer.py @@ -48,12 +48,9 @@ class ReformerTokenizer(PreTrainedTokenizer): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/reformer/tokenization_reformer_fast.py b/src/transformers/models/reformer/tokenization_reformer_fast.py index d68528de5872..40ef8a5ef783 100644 --- a/src/transformers/models/reformer/tokenization_reformer_fast.py +++ b/src/transformers/models/reformer/tokenization_reformer_fast.py @@ -51,12 +51,9 @@ class ReformerTokenizerFast(PreTrainedTokenizerFast): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. - - - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/rembert/tokenization_rembert.py b/src/transformers/models/rembert/tokenization_rembert.py index cf27a7b3bae6..ff8a21b6b850 100644 --- a/src/transformers/models/rembert/tokenization_rembert.py +++ b/src/transformers/models/rembert/tokenization_rembert.py @@ -45,22 +45,16 @@ class RemBertTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `"[CLS]"`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. 
- - - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"[SEP]"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/rembert/tokenization_rembert_fast.py b/src/transformers/models/rembert/tokenization_rembert_fast.py index fb358746e6d2..52e454c31478 100644 --- a/src/transformers/models/rembert/tokenization_rembert_fast.py +++ b/src/transformers/models/rembert/tokenization_rembert_fast.py @@ -55,12 +55,9 @@ class RemBertTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `"[CLS]"`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. - - - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"[SEP]"`): The end of sequence token. .. note:: When building a sequence using special tokens, this is not the token diff --git a/src/transformers/models/roberta/tokenization_roberta.py b/src/transformers/models/roberta/tokenization_roberta.py index 67cdcbbf488a..394ec17f32a0 100644 --- a/src/transformers/models/roberta/tokenization_roberta.py +++ b/src/transformers/models/roberta/tokenization_roberta.py @@ -93,11 +93,8 @@ class RobertaTokenizer(PreTrainedTokenizer): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer will add a space before each word (even the first one). This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -113,22 +110,16 @@ class RobertaTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. 
eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/roberta/tokenization_roberta_fast.py b/src/transformers/models/roberta/tokenization_roberta_fast.py index d9ddcfc82d49..d23236a79983 100644 --- a/src/transformers/models/roberta/tokenization_roberta_fast.py +++ b/src/transformers/models/roberta/tokenization_roberta_fast.py @@ -52,11 +52,8 @@ class RobertaTokenizerFast(PreTrainedTokenizerFast): You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer or when you call it on some text, but since the model was not pretrained this way, it might yield a decrease in performance. -- - When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. - - + > [!TIP] + > When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. @@ -72,22 +69,16 @@ class RobertaTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py b/src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py index d0a8a07b3a9e..f6151f7e2516 100644 --- a/src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py +++ b/src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py @@ -194,12 +194,9 @@ def __call__( [What are attention masks?](../glossary#attention-mask) -- - For SeamlessM4T models, `attention_mask` should always be passed for batched inference, to avoid subtle - bugs. - - + > [!TIP] + > For SeamlessM4T models, `attention_mask` should always be passed for batched inference, to avoid subtle + > bugs. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors instead of list of python integers. 
Acceptable values are: diff --git a/src/transformers/models/seamless_m4t/modeling_seamless_m4t.py b/src/transformers/models/seamless_m4t/modeling_seamless_m4t.py index 9332e18856a2..07ea919e0e7d 100755 --- a/src/transformers/models/seamless_m4t/modeling_seamless_m4t.py +++ b/src/transformers/models/seamless_m4t/modeling_seamless_m4t.py @@ -2611,16 +2611,13 @@ def generate( """ Generates sequences of token ids. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: input_ids (`torch.Tensor` of varying shape depending on the modality, *optional*): @@ -2870,16 +2867,13 @@ def generate( """ Generates sequences of token ids. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, num_banks)`): @@ -3137,19 +3131,16 @@ def generate( """ Generates translated audio waveforms. -- - This method successively calls the `.generate` function of two different sub-models. You can specify keyword - arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments - that will be passed to one of them. - - For example, calling `.generate(input_ids, num_beams=4, speech_do_sample=True)` will successively perform - beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!TIP] + > This method successively calls the `.generate` function of two different sub-models. You can specify keyword + > arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments + > that will be passed to one of them. 
+ > + > For example, calling `.generate(input_ids, num_beams=4, speech_do_sample=True)` will successively perform + > beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Args: input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`): @@ -3461,19 +3452,16 @@ def generate( """ Generates translated audio waveforms. -- - This method successively calls the `.generate` function of two different sub-models. You can specify keyword - arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments - that will be passed to one of them. - - For example, calling `.generate(input_features, num_beams=4, speech_do_sample=True)` will successively perform - beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!TIP] + > This method successively calls the `.generate` function of two different sub-models. You can specify keyword + > arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments + > that will be passed to one of them. + > + > For example, calling `.generate(input_features, num_beams=4, speech_do_sample=True)` will successively perform + > beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Args: input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, num_banks)`): @@ -3853,19 +3841,16 @@ def generate( """ Generates translated token ids and/or translated audio waveforms. -- - This method successively calls the `.generate` function of two different sub-models. You can specify keyword - arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments - that will be passed to one of them. - - For example, calling `.generate(input_ids=input_ids, num_beams=4, speech_do_sample=True)` will successively - perform beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!TIP] + > This method successively calls the `.generate` function of two different sub-models. You can specify keyword + > arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments + > that will be passed to one of them. + > + > For example, calling `.generate(input_ids=input_ids, num_beams=4, speech_do_sample=True)` will successively + > perform beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). 
Args: diff --git a/src/transformers/models/seamless_m4t/tokenization_seamless_m4t.py b/src/transformers/models/seamless_m4t/tokenization_seamless_m4t.py index fd773316580c..9de5a7b5d798 100644 --- a/src/transformers/models/seamless_m4t/tokenization_seamless_m4t.py +++ b/src/transformers/models/seamless_m4t/tokenization_seamless_m4t.py @@ -71,22 +71,16 @@ class SeamlessM4TTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/seamless_m4t/tokenization_seamless_m4t_fast.py b/src/transformers/models/seamless_m4t/tokenization_seamless_m4t_fast.py index 0318336332c3..6f0cd5ec8301 100644 --- a/src/transformers/models/seamless_m4t/tokenization_seamless_m4t_fast.py +++ b/src/transformers/models/seamless_m4t/tokenization_seamless_m4t_fast.py @@ -71,22 +71,16 @@ class SeamlessM4TTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py b/src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py index 4836416bced6..6fcfac5dfd3a 100644 --- a/src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py +++ b/src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py @@ -2818,16 +2818,13 @@ def generate( """ Generates sequences of token ids. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. 
You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: input_ids (`torch.Tensor` of varying shape depending on the modality, *optional*): @@ -3085,16 +3082,13 @@ def generate( """ Generates sequences of token ids. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, num_banks)`): @@ -3359,19 +3353,16 @@ def generate( """ Generates translated audio waveforms. -- - This method successively calls the `.generate` function of two different sub-models. You can specify keyword - arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments - that will be passed to one of them. - - For example, calling `.generate(input_ids, num_beams=4, speech_do_sample=True)` will successively perform - beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!TIP] + > This method successively calls the `.generate` function of two different sub-models. You can specify keyword + > arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments + > that will be passed to one of them. + > + > For example, calling `.generate(input_ids, num_beams=4, speech_do_sample=True)` will successively perform + > beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Args: input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`): @@ -3721,19 +3712,16 @@ def generate( """ Generates translated audio waveforms. -- - This method successively calls the `.generate` function of two different sub-models. 
You can specify keyword - arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments - that will be passed to one of them. - - For example, calling `.generate(input_features, num_beams=4, speech_do_sample=True)` will successively perform - beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!TIP] + > This method successively calls the `.generate` function of two different sub-models. You can specify keyword + > arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments + > that will be passed to one of them. + > + > For example, calling `.generate(input_features, num_beams=4, speech_do_sample=True)` will successively perform + > beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Args: input_features (`torch.FloatTensor` of shape `(batch_size, sequence_length, num_banks)`): @@ -4150,19 +4138,16 @@ def generate( """ Generates translated token ids and/or translated audio waveforms. -- - This method successively calls the `.generate` function of two different sub-models. You can specify keyword - arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments - that will be passed to one of them. - - For example, calling `.generate(input_ids=input_ids, num_beams=4, speech_do_sample=True)` will successively - perform beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!TIP] + > This method successively calls the `.generate` function of two different sub-models. You can specify keyword + > arguments at two different levels: general arguments that will be passed to both models, or prefixed arguments + > that will be passed to one of them. + > + > For example, calling `.generate(input_ids=input_ids, num_beams=4, speech_do_sample=True)` will successively + > perform beam-search decoding on the text model, and multinomial beam-search sampling on the speech model. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Args: diff --git a/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py b/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py index fe6698e9ebec..b77b60ce3fde 100644 --- a/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py +++ b/src/transformers/models/speech_to_text/feature_extraction_speech_to_text.py @@ -219,12 +219,9 @@ def __call__( [What are attention masks?](../glossary#attention-mask) -- - For Speech2TextTransformer models, `attention_mask` should always be passed for batched inference, to - avoid subtle bugs. - - + > [!TIP] + > For Speech2TextTransformer models, `attention_mask` should always be passed for batched inference, to + > avoid subtle bugs. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors instead of list of python integers. 
Acceptable values are: diff --git a/src/transformers/models/swin/modeling_swin.py b/src/transformers/models/swin/modeling_swin.py index c9fdc0d7d044..d9b2a1a0ec46 100644 --- a/src/transformers/models/swin/modeling_swin.py +++ b/src/transformers/models/swin/modeling_swin.py @@ -942,12 +942,9 @@ def forward( custom_intro=""" Swin Model with a decoder on top for masked image modeling, as proposed in [SimMIM](https://huggingface.co/papers/2111.09886). -- - Note that we provide a script to pre-train this model on custom data in our [examples - directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). - - + > [!TIP] + > Note that we provide a script to pre-train this model on custom data in our [examples + > directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). """ ) class SwinForMaskedImageModeling(SwinPreTrainedModel): @@ -1056,13 +1053,10 @@ def forward( Swin Model transformer with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet. -- - Note that it's possible to fine-tune Swin on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune Swin on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. """ ) class SwinForImageClassification(SwinPreTrainedModel): diff --git a/src/transformers/models/swinv2/modeling_swinv2.py b/src/transformers/models/swinv2/modeling_swinv2.py index 33be714f96b3..7e1c24addac8 100644 --- a/src/transformers/models/swinv2/modeling_swinv2.py +++ b/src/transformers/models/swinv2/modeling_swinv2.py @@ -1019,12 +1019,9 @@ def forward( Swinv2 Model with a decoder on top for masked image modeling, as proposed in [SimMIM](https://huggingface.co/papers/2111.09886). -- - Note that we provide a script to pre-train this model on custom data in our [examples - directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). - - + > [!TIP] + > Note that we provide a script to pre-train this model on custom data in our [examples + > directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). """ ) # Copied from transformers.models.swin.modeling_swin.SwinForMaskedImageModeling with swin->swinv2, base-simmim-window6-192->tiny-patch4-window8-256,SWIN->SWINV2,Swin->Swinv2,192->256 @@ -1134,13 +1131,10 @@ def forward( Swinv2 Model transformer with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet. -- - Note that it's possible to fine-tune SwinV2 on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune SwinV2 on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. 
""" ) # Copied from transformers.models.swin.modeling_swin.SwinForImageClassification with SWIN->SWINV2,Swin->Swinv2,swin->swinv2 diff --git a/src/transformers/models/t5/tokenization_t5.py b/src/transformers/models/t5/tokenization_t5.py index 0a25271345cf..136e7ffe73f4 100644 --- a/src/transformers/models/t5/tokenization_t5.py +++ b/src/transformers/models/t5/tokenization_t5.py @@ -56,12 +56,9 @@ class T5Tokenizer(PreTrainedTokenizer): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/t5/tokenization_t5_fast.py b/src/transformers/models/t5/tokenization_t5_fast.py index bdba1a7928c8..1e9f02dddf2c 100644 --- a/src/transformers/models/t5/tokenization_t5_fast.py +++ b/src/transformers/models/t5/tokenization_t5_fast.py @@ -53,12 +53,9 @@ class T5TokenizerFast(PreTrainedTokenizerFast): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. - - - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/tapas/tokenization_tapas.py b/src/transformers/models/tapas/tokenization_tapas.py index 7277f562a118..97ddc1ed8c86 100644 --- a/src/transformers/models/tapas/tokenization_tapas.py +++ b/src/transformers/models/tapas/tokenization_tapas.py @@ -653,11 +653,8 @@ def batch_encode_plus( """ Prepare a table and a list of strings for the model. - - - This method is deprecated, `__call__` should be used instead. - - + > [!WARNING] + > This method is deprecated, `__call__` should be used instead. Args: table (`pd.DataFrame`): diff --git a/src/transformers/models/udop/tokenization_udop.py b/src/transformers/models/udop/tokenization_udop.py index a5833333e10a..35bff072e680 100644 --- a/src/transformers/models/udop/tokenization_udop.py +++ b/src/transformers/models/udop/tokenization_udop.py @@ -163,12 +163,9 @@ class UdopTokenizer(PreTrainedTokenizer): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this @@ -833,11 +830,8 @@ def encode_plus_boxes( """ Tokenize and prepare for the model a sequence or a pair of sequences. - - - This method is deprecated, `__call__` should be used instead. 
- - + > [!WARNING] + > This method is deprecated, `__call__` should be used instead. Args: text (`str`, `list[str]` or (for non-fast tokenizers) `list[int]`): diff --git a/src/transformers/models/udop/tokenization_udop_fast.py b/src/transformers/models/udop/tokenization_udop_fast.py index 9751f5d65ddf..fbd38f936ab9 100644 --- a/src/transformers/models/udop/tokenization_udop_fast.py +++ b/src/transformers/models/udop/tokenization_udop_fast.py @@ -162,12 +162,9 @@ class UdopTokenizerFast(PreTrainedTokenizerFast): eos_token (`str`, *optional*, defaults to `""`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for @@ -465,11 +462,8 @@ def batch_encode_plus_boxes( """ Tokenize and prepare for the model a list of sequences or a list of pairs of sequences. -- - This method is deprecated, `__call__` should be used instead. - - + > [!WARNING] + > This method is deprecated, `__call__` should be used instead. Args: batch_text_or_text_pairs (`list[str]`, `list[tuple[str, str]]`, `list[list[str]]`, `list[tuple[list[str], list[str]]]`, and for not-fast tokenizers, also `list[list[int]]`, `list[tuple[list[int], list[int]]]`): @@ -812,11 +806,8 @@ def encode_plus_boxes( """ Tokenize and prepare for the model a sequence or a pair of sequences. -- - This method is deprecated, `__call__` should be used instead. - - + > [!WARNING] + > This method is deprecated, `__call__` should be used instead. Args: text (`str`, `list[str]` or (for non-fast tokenizers) `list[int]`): diff --git a/src/transformers/models/vit/modeling_vit.py b/src/transformers/models/vit/modeling_vit.py index 849085bc08b1..6268ca19d4f1 100644 --- a/src/transformers/models/vit/modeling_vit.py +++ b/src/transformers/models/vit/modeling_vit.py @@ -499,12 +499,9 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor: custom_intro=""" ViT Model with a decoder on top for masked image modeling, as proposed in [SimMIM](https://huggingface.co/papers/2111.09886). -- - Note that we provide a script to pre-train this model on custom data in our [examples - directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). - - + > [!TIP] + > Note that we provide a script to pre-train this model on custom data in our [examples + > directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). """ ) class ViTForMaskedImageModeling(ViTPreTrainedModel): @@ -613,13 +610,10 @@ def forward( ViT Model transformer with an image classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for ImageNet. -- - Note that it's possible to fine-tune ViT on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune ViT on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. 
This will interpolate the pre-trained + > position embeddings to the higher resolution. """ ) class ViTForImageClassification(ViTPreTrainedModel): diff --git a/src/transformers/models/vit_mae/modeling_vit_mae.py b/src/transformers/models/vit_mae/modeling_vit_mae.py index 2db4df13bc95..1e4c30b4f0c8 100755 --- a/src/transformers/models/vit_mae/modeling_vit_mae.py +++ b/src/transformers/models/vit_mae/modeling_vit_mae.py @@ -752,12 +752,9 @@ def forward(self, hidden_states: torch.Tensor, ids_restore: torch.Tensor, interp custom_intro=""" The ViTMAE Model transformer with the decoder on top for self-supervised pre-training. -- - Note that we provide a script to pre-train this model on custom data in our [examples - directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). - - + > [!TIP] + > Note that we provide a script to pre-train this model on custom data in our [examples + > directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining). """ ) class ViTMAEForPreTraining(ViTMAEPreTrainedModel): diff --git a/src/transformers/models/vivit/modeling_vivit.py b/src/transformers/models/vivit/modeling_vivit.py index 7170d3ff7de3..f7b2810e78e0 100755 --- a/src/transformers/models/vivit/modeling_vivit.py +++ b/src/transformers/models/vivit/modeling_vivit.py @@ -545,13 +545,10 @@ def forward( ViViT Transformer model with a video classification head on top (a linear layer on top of the final hidden state of the [CLS] token) e.g. for Kinetics-400. -- - Note that it's possible to fine-tune ViT on higher resolution images than the ones it has been trained on, by - setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained - position embeddings to the higher resolution. - - + > [!TIP] + > Note that it's possible to fine-tune ViT on higher resolution images than the ones it has been trained on, by + > setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained + > position embeddings to the higher resolution. """ ) class VivitForVideoClassification(VivitPreTrainedModel): diff --git a/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py b/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py index 3b830c314b31..e56a88548b51 100644 --- a/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py +++ b/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py @@ -49,18 +49,15 @@ class Wav2Vec2FeatureExtractor(SequenceFeatureExtractor): return_attention_mask (`bool`, *optional*, defaults to `False`): Whether or not [`~Wav2Vec2FeatureExtractor.__call__`] should return `attention_mask`. -- - Wav2Vec2 models that have set `config.feat_extract_norm == "group"`, such as - [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base-960h), have **not** been trained using - `attention_mask`. For such models, `input_values` should simply be padded with 0 and no `attention_mask` - should be passed. - - For Wav2Vec2 models that have set `config.feat_extract_norm == "layer"`, such as - [wav2vec2-lv60](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), `attention_mask` should be - passed for batched inference. - - """ + > [!TIP] + > Wav2Vec2 models that have set `config.feat_extract_norm == "group"`, such as + > [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base-960h), have **not** been trained using + > `attention_mask`. 
For such models, `input_values` should simply be padded with 0 and no `attention_mask` + > should be passed. + > + > For Wav2Vec2 models that have set `config.feat_extract_norm == "layer"`, such as + > [wav2vec2-lv60](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), `attention_mask` should be + > passed for batched inference.""" model_input_names = ["input_values", "attention_mask"] @@ -144,18 +141,15 @@ def __call__( [What are attention masks?](../glossary#attention-mask) -- - Wav2Vec2 models that have set `config.feat_extract_norm == "group"`, such as - [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base-960h), have **not** been trained using - `attention_mask`. For such models, `input_values` should simply be padded with 0 and no - `attention_mask` should be passed. - - For Wav2Vec2 models that have set `config.feat_extract_norm == "layer"`, such as - [wav2vec2-lv60](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), `attention_mask` should - be passed for batched inference. - - + > [!TIP] + > Wav2Vec2 models that have set `config.feat_extract_norm == "group"`, such as + > [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base-960h), have **not** been trained using + > `attention_mask`. For such models, `input_values` should simply be padded with 0 and no + > `attention_mask` should be passed. + > + > For Wav2Vec2 models that have set `config.feat_extract_norm == "layer"`, such as + > [wav2vec2-lv60](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), `attention_mask` should + > be passed for batched inference. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors instead of list of python integers. Acceptable values are: diff --git a/src/transformers/models/wav2vec2/modeling_wav2vec2.py b/src/transformers/models/wav2vec2/modeling_wav2vec2.py index c517c26288c1..399089257365 100755 --- a/src/transformers/models/wav2vec2/modeling_wav2vec2.py +++ b/src/transformers/models/wav2vec2/modeling_wav2vec2.py @@ -1168,23 +1168,17 @@ def load_adapter(self, target_lang: str, force_load=True, **kwargs): git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. -- - To test a pull request you made on the Hub, you can pass `revision="refs/pr/ + > [!TIP] + > To test a pull request you made on the Hub, you can pass `revision="refs/pr/"`. - - "`. mirror (`str`, *optional*): Mirror source to accelerate downloads in China. If you are from China and have an accessibility problem, you can set this option to resolve it. Note that we do not guarantee the timeliness or safety. Please refer to the mirror site for more information. - - - Activate the special ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to - use this method in a firewalled environment. - - + > [!TIP] + > Activate the special ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to + > use this method in a firewalled environment. Examples: diff --git a/src/transformers/models/wav2vec2/tokenization_wav2vec2.py b/src/transformers/models/wav2vec2/tokenization_wav2vec2.py index e9f9ce04b1ba..613c85c5e641 100644 --- a/src/transformers/models/wav2vec2/tokenization_wav2vec2.py +++ b/src/transformers/models/wav2vec2/tokenization_wav2vec2.py @@ -469,25 +469,19 @@ def batch_decode( Whether or not to output character offsets. 
Character offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed characters. -- - Please take a look at the Example of [`~Wav2Vec2CTCTokenizer.decode`] to better understand how to make - use of `output_char_offsets`. [`~Wav2Vec2CTCTokenizer.batch_decode`] works the same way with batched - output. - - + > [!TIP] + > Please take a look at the Example of [`~Wav2Vec2CTCTokenizer.decode`] to better understand how to make + > use of `output_char_offsets`. [`~Wav2Vec2CTCTokenizer.batch_decode`] works the same way with batched + > output. output_word_offsets (`bool`, *optional*, defaults to `False`): Whether or not to output word offsets. Word offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed words. -- - Please take a look at the Example of [`~Wav2Vec2CTCTokenizer.decode`] to better understand how to make - use of `output_word_offsets`. [`~Wav2Vec2CTCTokenizer.batch_decode`] works the same way with batched - output. - - + > [!TIP] + > Please take a look at the Example of [`~Wav2Vec2CTCTokenizer.decode`] to better understand how to make + > use of `output_word_offsets`. [`~Wav2Vec2CTCTokenizer.batch_decode`] works the same way with batched + > output. kwargs (additional keyword arguments, *optional*): Will be passed to the underlying model specific decode method. @@ -542,21 +536,15 @@ def decode( Whether or not to output character offsets. Character offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed characters. -- - Please take a look at the example below to better understand how to make use of `output_char_offsets`. - - + > [!TIP] + > Please take a look at the example below to better understand how to make use of `output_char_offsets`. output_word_offsets (`bool`, *optional*, defaults to `False`): Whether or not to output word offsets. Word offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed words. -- - Please take a look at the example below to better understand how to make use of `output_word_offsets`. - - + > [!TIP] + > Please take a look at the example below to better understand how to make use of `output_word_offsets`. kwargs (additional keyword arguments, *optional*): Will be passed to the underlying model specific decode method. @@ -665,18 +653,15 @@ class Wav2Vec2Tokenizer(PreTrainedTokenizer): return_attention_mask (`bool`, *optional*, defaults to `False`): Whether or not [`~Wav2Vec2Tokenizer.__call__`] should return `attention_mask`. -- - Wav2Vec2 models that have set `config.feat_extract_norm == "group"`, such as - [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base-960h), have **not** been trained using - `attention_mask`. For such models, `input_values` should simply be padded with 0 and no `attention_mask` - should be passed. - - For Wav2Vec2 models that have set `config.feat_extract_norm == "layer"`, such as - [wav2vec2-lv60](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), `attention_mask` should be - passed for batched inference. - - + > [!TIP] + > Wav2Vec2 models that have set `config.feat_extract_norm == "group"`, such as + > [wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base-960h), have **not** been trained using + > `attention_mask`. For such models, `input_values` should simply be padded with 0 and no `attention_mask` + > should be passed. 
+ > + > For Wav2Vec2 models that have set `config.feat_extract_norm == "layer"`, such as + > [wav2vec2-lv60](https://huggingface.co/facebook/wav2vec2-large-960h-lv60-self), `attention_mask` should be + > passed for batched inference. **kwargs Additional keyword arguments passed along to [`PreTrainedTokenizer`] diff --git a/src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py b/src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py index c819e63fd6cf..8b7bff2552dd 100644 --- a/src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py +++ b/src/transformers/models/wav2vec2_phoneme/tokenization_wav2vec2_phoneme.py @@ -468,14 +468,11 @@ def decode( Whether or not to output character offsets. Character offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed characters. -- - Please take a look at the Example of [`~models.wav2vec2.tokenization_wav2vec2.decode`] to better - understand how to make use of `output_word_offsets`. - [`~model.wav2vec2_phoneme.tokenization_wav2vec2_phoneme.batch_decode`] works the same way with - phonemes. - - + > [!TIP] + > Please take a look at the Example of [`~models.wav2vec2.tokenization_wav2vec2.decode`] to better + > understand how to make use of `output_word_offsets`. + > [`~model.wav2vec2_phoneme.tokenization_wav2vec2_phoneme.batch_decode`] works the same way with + > phonemes. kwargs (additional keyword arguments, *optional*): Will be passed to the underlying model specific decode method. @@ -521,14 +518,11 @@ def batch_decode( Whether or not to output character offsets. Character offsets can be used in combination with the sampling rate and model downsampling rate to compute the time-stamps of transcribed characters. -- - Please take a look at the Example of [`~models.wav2vec2.tokenization_wav2vec2.decode`] to better - understand how to make use of `output_word_offsets`. - [`~model.wav2vec2_phoneme.tokenization_wav2vec2_phoneme.batch_decode`] works analogous with phonemes - and batched output. - - + > [!TIP] + > Please take a look at the Example of [`~models.wav2vec2.tokenization_wav2vec2.decode`] to better + > understand how to make use of `output_word_offsets`. + > [`~model.wav2vec2_phoneme.tokenization_wav2vec2_phoneme.batch_decode`] works analogous with phonemes + > and batched output. kwargs (additional keyword arguments, *optional*): Will be passed to the underlying model specific decode method. diff --git a/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py b/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py index beb22ca86749..6a174308a2f7 100644 --- a/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py +++ b/src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py @@ -122,16 +122,13 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): r""" Instantiate a [`Wav2Vec2ProcessorWithLM`] from a pretrained Wav2Vec2 processor. -- - This class method is simply calling the feature extractor's - [`~feature_extraction_utils.FeatureExtractionMixin.from_pretrained`], Wav2Vec2CTCTokenizer's - [`~tokenization_utils_base.PreTrainedTokenizerBase.from_pretrained`], and - [`pyctcdecode.BeamSearchDecoderCTC.load_from_hf_hub`]. - - Please refer to the docstrings of the methods above for more information. 
- - + > [!TIP] + > This class method is simply calling the feature extractor's + > [`~feature_extraction_utils.FeatureExtractionMixin.from_pretrained`], Wav2Vec2CTCTokenizer's + > [`~tokenization_utils_base.PreTrainedTokenizerBase.from_pretrained`], and + > [`pyctcdecode.BeamSearchDecoderCTC.load_from_hf_hub`]. + > + > Please refer to the docstrings of the methods above for more information. Args: pretrained_model_name_or_path (`str` or `os.PathLike`): @@ -309,15 +306,12 @@ def batch_decode( """ Batch decode output logits to audio transcription with language model support. -- - This function makes use of Python's multiprocessing. Currently, multiprocessing is available only on Unix - systems (see this [issue](https://github.com/kensho-technologies/pyctcdecode/issues/65)). - - If you are decoding multiple batches, consider creating a `Pool` and passing it to `batch_decode`. Otherwise, - `batch_decode` will be very slow since it will create a fresh `Pool` for each call. See usage example below. - - + > [!TIP] + > This function makes use of Python's multiprocessing. Currently, multiprocessing is available only on Unix + > systems (see this [issue](https://github.com/kensho-technologies/pyctcdecode/issues/65)). + > + > If you are decoding multiple batches, consider creating a `Pool` and passing it to `batch_decode`. Otherwise, + > `batch_decode` will be very slow since it will create a fresh `Pool` for each call. See usage example below. Args: logits (`np.ndarray`): @@ -327,12 +321,9 @@ def batch_decode( should be instantiated *after* `Wav2Vec2ProcessorWithLM`. Otherwise, the LM won't be available to the pool's sub-processes. -- - Currently, only pools created with a 'fork' context can be used. If a 'spawn' pool is passed, it will - be ignored and sequential decoding will be used instead. - - + > [!TIP] + > Currently, only pools created with a 'fork' context can be used. If a 'spawn' pool is passed, it will + > be ignored and sequential decoding will be used instead. num_processes (`int`, *optional*): If `pool` is not set, number of processes on which the function should be parallelized over. Defaults @@ -365,13 +356,10 @@ def batch_decode( lists of floats, where the length of the outer list will correspond to the batch size and the length of the inner list will correspond to the number of returned hypotheses . The value should be >= 1. -- - Please take a look at the Example of [`~Wav2Vec2ProcessorWithLM.decode`] to better understand how to - make use of `output_word_offsets`. [`~Wav2Vec2ProcessorWithLM.batch_decode`] works the same way with - batched output. - - + > [!TIP] + > Please take a look at the Example of [`~Wav2Vec2ProcessorWithLM.decode`] to better understand how to + > make use of `output_word_offsets`. [`~Wav2Vec2ProcessorWithLM.batch_decode`] works the same way with + > batched output. Returns: [`~models.wav2vec2.Wav2Vec2DecoderWithLMOutput`]. @@ -523,11 +511,8 @@ def decode( of strings, `logit_score` will be a list of floats, and `lm_score` will be a list of floats, where the length of these lists will correspond to the number of returned hypotheses. The value should be >= 1. -- - Please take a look at the example below to better understand how to make use of `output_word_offsets`. - - + > [!TIP] + > Please take a look at the example below to better understand how to make use of `output_word_offsets`. Returns: [`~models.wav2vec2.Wav2Vec2DecoderWithLMOutput`]. 
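The `Wav2Vec2ProcessorWithLM.batch_decode` docstring above recommends creating one multiprocessing `Pool` (with a `'fork'` context) and reusing it across calls instead of letting each call spawn its own. A minimal sketch of that pattern, assuming `pyctcdecode` and `kenlm` are installed; the checkpoint name and the random audio below are illustrative placeholders, not part of this diff:

```python
# Sketch: batch-decoding CTC logits with a shared 'fork' Pool, as the docstring above suggests.
from multiprocessing import get_context

import numpy as np
import torch
from transformers import AutoModelForCTC, AutoProcessor

# Illustrative LM-equipped checkpoint; AutoProcessor resolves to Wav2Vec2ProcessorWithLM here.
checkpoint = "patrickvonplaten/wav2vec2-base-100h-with-lm"
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForCTC.from_pretrained(checkpoint)

# Two synthetic 1-second clips at 16 kHz stand in for real speech.
audio = [np.random.randn(16000).astype(np.float32) for _ in range(2)]
inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Only pools created with a 'fork' context are used; reuse one pool across batch_decode calls.
with get_context("fork").Pool(processes=2) as pool:
    transcriptions = processor.batch_decode(logits.numpy(), pool=pool).text

print(transcriptions)
```

Reusing a single pool this way avoids the per-call `Pool` creation that the docstring warns makes repeated `batch_decode` calls slow.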
diff --git a/src/transformers/models/whisper/feature_extraction_whisper.py b/src/transformers/models/whisper/feature_extraction_whisper.py index e11895191f95..f49204f06eff 100644 --- a/src/transformers/models/whisper/feature_extraction_whisper.py +++ b/src/transformers/models/whisper/feature_extraction_whisper.py @@ -226,12 +226,9 @@ def __call__( [What are attention masks?](../glossary#attention-mask) -- - For Whisper models, `attention_mask` should always be passed for batched inference, to avoid subtle - bugs. - - + > [!TIP] + > For Whisper models, `attention_mask` should always be passed for batched inference, to avoid subtle + > bugs. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors instead of list of python integers. Acceptable values are: diff --git a/src/transformers/models/whisper/generation_whisper.py b/src/transformers/models/whisper/generation_whisper.py index 9c4f0f6e1d63..75011dd5647a 100644 --- a/src/transformers/models/whisper/generation_whisper.py +++ b/src/transformers/models/whisper/generation_whisper.py @@ -416,16 +416,13 @@ def generate( """ Transcribes or translates log-mel input features to a sequence of auto-regressively generated token ids. -- - Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the - model's default generation configuration. You can override any `generation_config` by passing the corresponding - parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. - - For an overview of generation strategies and code examples, check out the [following - guide](./generation_strategies). - - + > [!WARNING] + > Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the + > model's default generation configuration. You can override any `generation_config` by passing the corresponding + > parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`. + > + > For an overview of generation strategies and code examples, check out the [following + > guide](./generation_strategies). Parameters: input_features (`torch.Tensor` of shape `(batch_size, feature_size, sequence_length)`, *optional*): diff --git a/src/transformers/models/xglm/tokenization_xglm.py b/src/transformers/models/xglm/tokenization_xglm.py index 9e0a8706683f..090bbaa89f1d 100644 --- a/src/transformers/models/xglm/tokenization_xglm.py +++ b/src/transformers/models/xglm/tokenization_xglm.py @@ -47,22 +47,16 @@ class XGLMTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. 
sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/xglm/tokenization_xglm_fast.py b/src/transformers/models/xglm/tokenization_xglm_fast.py index a9c8b3aac257..82d0b50cf3c3 100644 --- a/src/transformers/models/xglm/tokenization_xglm_fast.py +++ b/src/transformers/models/xglm/tokenization_xglm_fast.py @@ -48,22 +48,16 @@ class XGLMTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/xlm/modeling_xlm.py b/src/transformers/models/xlm/modeling_xlm.py index 6fd21d0490de..c5fd12212203 100755 --- a/src/transformers/models/xlm/modeling_xlm.py +++ b/src/transformers/models/xlm/modeling_xlm.py @@ -184,12 +184,9 @@ def forward( Mask for tokens at invalid position, such as query and special symbols (PAD, SEP, CLS). 1.0 means token should be masked. -- - One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides - `start_states`. - - + > [!TIP] + > One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides + > `start_states`. Returns: `torch.FloatTensor`: The end logits for SQuAD. @@ -250,12 +247,9 @@ def forward( cls_index (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Position of the CLS token for each sentence in the batch. If `None`, takes the last token. -- - One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides - `start_states`. - - + > [!TIP] + > One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides + > `start_states`. Returns: `torch.FloatTensor`: The SQuAD 2.0 answer class. diff --git a/src/transformers/models/xlm/tokenization_xlm.py b/src/transformers/models/xlm/tokenization_xlm.py index 8c4471a38436..5bf7500f53e5 100644 --- a/src/transformers/models/xlm/tokenization_xlm.py +++ b/src/transformers/models/xlm/tokenization_xlm.py @@ -160,12 +160,9 @@ class XLMTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The separator token, which is used when building a sequence from multiple sequences, e.g. 
two sequences for diff --git a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py index 149a09f5ed61..650ea33cb695 100644 --- a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py +++ b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta.py @@ -47,22 +47,16 @@ class XLMRobertaTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. sep_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for diff --git a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py index bcdea2325fc1..0155f07cda40 100644 --- a/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py +++ b/src/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py @@ -49,22 +49,16 @@ class XLMRobertaTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. sep_token (`str`, *optional*, defaults to `""`): The separator token, which is used when building a sequence from multiple sequences, e.g. 
two sequences for diff --git a/src/transformers/models/xlnet/configuration_xlnet.py b/src/transformers/models/xlnet/configuration_xlnet.py index d32f05c875bb..7a371afd8a5b 100644 --- a/src/transformers/models/xlnet/configuration_xlnet.py +++ b/src/transformers/models/xlnet/configuration_xlnet.py @@ -105,16 +105,13 @@ class XLNetConfig(PretrainedConfig): use_mems_train (`bool`, *optional*, defaults to `False`): Whether or not the model should make use of the recurrent memory mechanism in train mode. -- - For pretraining, it is recommended to set `use_mems_train` to `True`. For fine-tuning, it is recommended to - set `use_mems_train` to `False` as discussed - [here](https://github.com/zihangdai/xlnet/issues/41#issuecomment-505102587). If `use_mems_train` is set to - `True`, one has to make sure that the train batches are correctly pre-processed, *e.g.* `batch_1 = [[This - line is], [This is the]]` and `batch_2 = [[ the first line], [ second line]]` and that all batches are of - equal size. - - + > [!TIP] + > For pretraining, it is recommended to set `use_mems_train` to `True`. For fine-tuning, it is recommended to + > set `use_mems_train` to `False` as discussed + > [here](https://github.com/zihangdai/xlnet/issues/41#issuecomment-505102587). If `use_mems_train` is set to + > `True`, one has to make sure that the train batches are correctly pre-processed, *e.g.* `batch_1 = [[This + > line is], [This is the]]` and `batch_2 = [[ the first line], [ second line]]` and that all batches are of + > equal size. Examples: diff --git a/src/transformers/models/xlnet/modeling_xlnet.py b/src/transformers/models/xlnet/modeling_xlnet.py index 48fb1b41a61f..95acab50f504 100755 --- a/src/transformers/models/xlnet/modeling_xlnet.py +++ b/src/transformers/models/xlnet/modeling_xlnet.py @@ -433,12 +433,9 @@ def forward( Mask for tokens at invalid position, such as query and special symbols (PAD, SEP, CLS). 1.0 means token should be masked. -- - One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides - `start_states`. - - + > [!TIP] + > One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides + > `start_states`. Returns: `torch.FloatTensor`: The end logits for SQuAD. @@ -500,12 +497,9 @@ def forward( cls_index (`torch.LongTensor` of shape `(batch_size,)`, *optional*): Position of the CLS token for each sentence in the batch. If `None`, takes the last token. -- - One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides - `start_states`. - - + > [!TIP] + > One of `start_states` or `start_positions` should be not `None`. If both are set, `start_positions` overrides + > `start_states`. Returns: `torch.FloatTensor`: The SQuAD 2.0 answer class. diff --git a/src/transformers/models/xlnet/tokenization_xlnet.py b/src/transformers/models/xlnet/tokenization_xlnet.py index 9186db33d788..9477ca9ff4b6 100644 --- a/src/transformers/models/xlnet/tokenization_xlnet.py +++ b/src/transformers/models/xlnet/tokenization_xlnet.py @@ -60,22 +60,16 @@ class XLNetTokenizer(PreTrainedTokenizer): bos_token (`str`, *optional*, defaults to `""`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. 
- - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/models/xlnet/tokenization_xlnet_fast.py b/src/transformers/models/xlnet/tokenization_xlnet_fast.py index 56cd2a50e1b2..ca6af63eb12b 100644 --- a/src/transformers/models/xlnet/tokenization_xlnet_fast.py +++ b/src/transformers/models/xlnet/tokenization_xlnet_fast.py @@ -65,22 +65,16 @@ class XLNetTokenizerFast(PreTrainedTokenizerFast): bos_token (`str`, *optional*, defaults to `" "`): The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. -"`): The end of sequence token. -- - When building a sequence using special tokens, this is not the token that is used for the beginning of - sequence. The token used is the `cls_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the beginning of + > sequence. The token used is the `cls_token`. eos_token (`str`, *optional*, defaults to `"- - When building a sequence using special tokens, this is not the token that is used for the end of sequence. - The token used is the `sep_token`. - - + > [!TIP] + > When building a sequence using special tokens, this is not the token that is used for the end of sequence. + > The token used is the `sep_token`. unk_token (`str`, *optional*, defaults to `""`): The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this diff --git a/src/transformers/pipelines/__init__.py b/src/transformers/pipelines/__init__.py index a029bb32df03..7a55d1599246 100755 --- a/src/transformers/pipelines/__init__.py +++ b/src/transformers/pipelines/__init__.py @@ -540,12 +540,11 @@ def pipeline( - A [model](model) that generates predictions from the inputs. - Optional post-processing steps to refine the model's output, which can also be handled by processors. - - While there are such optional arguments as `tokenizer`, `feature_extractor`, `image_processor`, and `processor`, - they shouldn't be specified all at once. If these components are not provided, `pipeline` will try to load - required ones automatically. In case you want to provide these components explicitly, please refer to a - specific pipeline in order to get more details regarding what components are required. - + > [!TIP] + > While there are such optional arguments as `tokenizer`, `feature_extractor`, `image_processor`, and `processor`, + > they shouldn't be specified all at once. If these components are not provided, `pipeline` will try to load + > required ones automatically. In case you want to provide these components explicitly, please refer to a + > specific pipeline in order to get more details regarding what components are required. Args: task (`str`): @@ -652,11 +651,8 @@ def pipeline( [here](https://huggingface.co/docs/accelerate/main/en/package_reference/big_modeling#accelerate.cpu_offload) for more information). 
-- - Do not use `device_map` AND `device` at the same time as they will conflict - - + > [!WARNING] + > Do not use `device_map` AND `device` at the same time as they will conflict dtype (`str` or `torch.dtype`, *optional*): Sent directly as `model_kwargs` (just a simpler shortcut) to use the available precision for this model diff --git a/src/transformers/pipelines/automatic_speech_recognition.py b/src/transformers/pipelines/automatic_speech_recognition.py index 1f3c21526169..53e61461c90d 100644 --- a/src/transformers/pipelines/automatic_speech_recognition.py +++ b/src/transformers/pipelines/automatic_speech_recognition.py @@ -149,24 +149,18 @@ class AutomaticSpeechRecognitionPipeline(ChunkPipeline): chunk_length_s (`float`, *optional*, defaults to 0): The input length for in each chunk. If `chunk_length_s = 0` then chunking is disabled (default). -- - For more information on how to effectively use `chunk_length_s`, please have a look at the [ASR chunking - blog post](https://huggingface.co/blog/asr-chunking). - - + > [!TIP] + > For more information on how to effectively use `chunk_length_s`, please have a look at the [ASR chunking + > blog post](https://huggingface.co/blog/asr-chunking). stride_length_s (`float`, *optional*, defaults to `chunk_length_s / 6`): The length of stride on the left and right of each chunk. Used only with `chunk_length_s > 0`. This enables the model to *see* more context and infer letters better than without this context but the pipeline discards the stride bits at the end to make the final reconstitution as perfect as possible. -- - For more information on how to effectively use `stride_length_s`, please have a look at the [ASR chunking - blog post](https://huggingface.co/blog/asr-chunking). - - + > [!TIP] + > For more information on how to effectively use `stride_length_s`, please have a look at the [ASR chunking + > blog post](https://huggingface.co/blog/asr-chunking). device (Union[`int`, `torch.device`], *optional*): Device ordinal for CPU/GPU supports. Setting this to `None` will leverage CPU, a positive will run the diff --git a/src/transformers/pipelines/fill_mask.py b/src/transformers/pipelines/fill_mask.py index 11810bc2bea3..10d7f7f0a1ad 100644 --- a/src/transformers/pipelines/fill_mask.py +++ b/src/transformers/pipelines/fill_mask.py @@ -54,31 +54,24 @@ class FillMaskPipeline(Pipeline): which includes the bi-directional models in the library. See the up-to-date list of available models on [huggingface.co/models](https://huggingface.co/models?filter=fill-mask). -- - This pipeline only works for inputs with exactly one token masked. Experimental: We added support for multiple - masks. The returned values are raw model output, and correspond to disjoint probabilities where one might expect - joint probabilities (See [discussion](https://github.com/huggingface/transformers/pull/10222)). - - - -- - This pipeline now supports tokenizer_kwargs. For example try: - - ```python - >>> from transformers import pipeline - - >>> fill_masker = pipeline(model="google-bert/bert-base-uncased") - >>> tokenizer_kwargs = {"truncation": True} - >>> fill_masker( - ... "This is a simple [MASK]. " + "...with a large amount of repeated text appended. " * 100, - ... tokenizer_kwargs=tokenizer_kwargs, - ... ) - ``` - - - + > [!TIP] + > This pipeline only works for inputs with exactly one token masked. Experimental: We added support for multiple + > masks. 
The returned values are raw model output, and correspond to disjoint probabilities where one might expect + > joint probabilities (See [discussion](https://github.com/huggingface/transformers/pull/10222)). + + > [!TIP] + > This pipeline now supports tokenizer_kwargs. For example try: + > + > ```python + > >>> from transformers import pipeline + > + > >>> fill_masker = pipeline(model="google-bert/bert-base-uncased") + > >>> tokenizer_kwargs = {"truncation": True} + > >>> fill_masker( + > ... "This is a simple [MASK]. " + "...with a large amount of repeated text appended. " * 100, + > ... tokenizer_kwargs=tokenizer_kwargs, + > ... ) + > ``` """ diff --git a/src/transformers/pipelines/text_to_audio.py b/src/transformers/pipelines/text_to_audio.py index d43695b37399..c08aab16231b 100644 --- a/src/transformers/pipelines/text_to_audio.py +++ b/src/transformers/pipelines/text_to_audio.py @@ -50,29 +50,26 @@ class TextToAudioPipeline(Pipeline): Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial) -- - You can specify parameters passed to the model by using [`TextToAudioPipeline.__call__.forward_params`] or - [`TextToAudioPipeline.__call__.generate_kwargs`]. - - Example: - - ```python - >>> from transformers import pipeline - - >>> music_generator = pipeline(task="text-to-audio", model="facebook/musicgen-small") - - >>> # diversify the music generation by adding randomness with a high temperature and set a maximum music length - >>> generate_kwargs = { - ... "do_sample": True, - ... "temperature": 0.7, - ... "max_new_tokens": 35, - ... } - - >>> outputs = music_generator("Techno music with high melodic riffs", generate_kwargs=generate_kwargs) - ``` - - + > [!TIP] + > You can specify parameters passed to the model by using [`TextToAudioPipeline.__call__.forward_params`] or + > [`TextToAudioPipeline.__call__.generate_kwargs`]. + > + > Example: + > + > ```python + > >>> from transformers import pipeline + > + > >>> music_generator = pipeline(task="text-to-audio", model="facebook/musicgen-small") + > + > >>> # diversify the music generation by adding randomness with a high temperature and set a maximum music length + > >>> generate_kwargs = { + > ... "do_sample": True, + > ... "temperature": 0.7, + > ... "max_new_tokens": 35, + > ... } + > + > >>> outputs = music_generator("Techno music with high melodic riffs", generate_kwargs=generate_kwargs) + > ``` This pipeline can currently be loaded from [`pipeline`] using the following task identifiers: `"text-to-speech"` or `"text-to-audio"`. diff --git a/src/transformers/pipelines/zero_shot_audio_classification.py b/src/transformers/pipelines/zero_shot_audio_classification.py index 7d5e36e5dd08..0c74d7c940eb 100644 --- a/src/transformers/pipelines/zero_shot_audio_classification.py +++ b/src/transformers/pipelines/zero_shot_audio_classification.py @@ -35,11 +35,8 @@ class ZeroShotAudioClassificationPipeline(Pipeline): Zero shot audio classification pipeline using `ClapModel`. This pipeline predicts the class of an audio when you provide an audio and a set of `candidate_labels`. -- - The default `hypothesis_template` is : `"This is a sound of {}."`. Make sure you update it for your usage. - - + > [!WARNING] + > The default `hypothesis_template` is : `"This is a sound of {}."`. Make sure you update it for your usage. 
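Since the `hypothesis_template` warning above is easy to miss, here is a small sketch of overriding it in the zero-shot audio classification pipeline; the checkpoint name and the audio path are assumptions.

```python
# Sketch: overriding the default hypothesis template, as the warning above advises.
# "laion/clap-htsat-unfused" and "dog_bark.wav" are placeholder choices.
from transformers import pipeline

classifier = pipeline(task="zero-shot-audio-classification", model="laion/clap-htsat-unfused")

result = classifier(
    "dog_bark.wav",  # hypothetical local audio file (ffmpeg is needed to decode file paths)
    candidate_labels=["dog barking", "vacuum cleaner", "rain"],
    hypothesis_template="This is a sound of {}.",  # adapt the template to your labels and domain
)
print(result)
```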
Example: ```python diff --git a/src/transformers/processing_utils.py b/src/transformers/processing_utils.py index 5f3f455662e3..a60bec885907 100644 --- a/src/transformers/processing_utils.py +++ b/src/transformers/processing_utils.py @@ -715,13 +715,10 @@ def save_pretrained(self, save_directory, push_to_hub: bool = False, legacy_seri Saves the attributes of this processor (feature extractor, tokenizer...) in the specified directory so that it can be reloaded using the [`~ProcessorMixin.from_pretrained`] method. -- - This class method is simply calling [`~feature_extraction_utils.FeatureExtractionMixin.save_pretrained`] and - [`~tokenization_utils_base.PreTrainedTokenizerBase.save_pretrained`]. Please refer to the docstrings of the - methods above for more information. - - + > [!TIP] + > This class method is simply calling [`~feature_extraction_utils.FeatureExtractionMixin.save_pretrained`] and + > [`~tokenization_utils_base.PreTrainedTokenizerBase.save_pretrained`]. Please refer to the docstrings of the + > methods above for more information. Args: save_directory (`str` or `os.PathLike`): @@ -1344,15 +1341,12 @@ def from_pretrained( r""" Instantiate a processor associated with a pretrained model. -- - This class method is simply calling the feature extractor - [`~feature_extraction_utils.FeatureExtractionMixin.from_pretrained`], image processor - [`~image_processing_utils.ImageProcessingMixin`] and the tokenizer - [`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`] methods. Please refer to the docstrings of the - methods above for more information. - - + > [!TIP] + > This class method is simply calling the feature extractor + > [`~feature_extraction_utils.FeatureExtractionMixin.from_pretrained`], image processor + > [`~image_processing_utils.ImageProcessingMixin`] and the tokenizer + > [`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`] methods. Please refer to the docstrings of the + > methods above for more information. Args: pretrained_model_name_or_path (`str` or `os.PathLike`): diff --git a/src/transformers/tokenization_mistral_common.py b/src/transformers/tokenization_mistral_common.py index d8ea3688efae..db85177d2d2c 100644 --- a/src/transformers/tokenization_mistral_common.py +++ b/src/transformers/tokenization_mistral_common.py @@ -1150,13 +1150,10 @@ def pad( Padding side (left/right) padding token ids are defined at the tokenizer level (with `self.padding_side`, `self.pad_token_id`). -- - If the `encoded_inputs` passed are dictionary of numpy arrays, PyTorch tensors, the - result will use the same type unless you provide a different tensor type with `return_tensors`. In the case of - PyTorch tensors, you will lose the specific device of your tensors however. - - + > [!TIP] + > If the `encoded_inputs` passed are dictionary of numpy arrays, PyTorch tensors, the + > result will use the same type unless you provide a different tensor type with `return_tensors`. In the case of + > PyTorch tensors, you will lose the specific device of your tensors however. 
Args: encoded_inputs ([`BatchEncoding`], list of [`BatchEncoding`], `Dict[str, List[int]]`, `Dict[str, List[List[int]]` or `List[Dict[str, List[int]]]`): diff --git a/src/transformers/tokenization_utils.py b/src/transformers/tokenization_utils.py index b89e57093152..6a919fd07211 100644 --- a/src/transformers/tokenization_utils.py +++ b/src/transformers/tokenization_utils.py @@ -599,12 +599,9 @@ def num_special_tokens_to_add(self, pair: bool = False) -> int: """ Returns the number of added tokens when encoding a sequence with special tokens. -- - This encodes a dummy input and checks the number of added tokens, and is therefore not efficient. Do not put - this inside your training loop. - - + > [!TIP] + > This encodes a dummy input and checks the number of added tokens, and is therefore not efficient. Do not put + > this inside your training loop. Args: pair (`bool`, *optional*, defaults to `False`): diff --git a/src/transformers/tokenization_utils_base.py b/src/transformers/tokenization_utils_base.py index 74550cb0f6ab..d3bd45776850 100644 --- a/src/transformers/tokenization_utils_base.py +++ b/src/transformers/tokenization_utils_base.py @@ -1855,11 +1855,8 @@ def from_pretrained( `eos_token`, `unk_token`, `sep_token`, `pad_token`, `cls_token`, `mask_token`, `additional_special_tokens`. See parameters in the `__init__` for more details. -- - Passing `token=True` is required when you want to use a private model. - - + > [!TIP] + > Passing `token=True` is required when you want to use a private model. Examples: @@ -3051,11 +3048,8 @@ def encode_plus( """ Tokenize and prepare for the model a sequence or a pair of sequences. -- - This method is deprecated, `__call__` should be used instead. - - + > [!WARNING] + > This method is deprecated, `__call__` should be used instead. Args: text (`str`, `list[str]` or (for non-fast tokenizers) `list[int]`): @@ -3158,11 +3152,8 @@ def batch_encode_plus( """ Tokenize and prepare for the model a list of sequences or a list of pairs of sequences. -- - This method is deprecated, `__call__` should be used instead. - - + > [!WARNING] + > This method is deprecated, `__call__` should be used instead. Args: batch_text_or_text_pairs (`list[str]`, `list[tuple[str, str]]`, `list[list[str]]`, `list[tuple[list[str], list[str]]]`, and for not-fast tokenizers, also `list[list[int]]`, `list[tuple[list[int], list[int]]]`): @@ -3261,13 +3252,10 @@ def pad( Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. -- - If the `encoded_inputs` passed are dictionary of numpy arrays, or PyTorch tensors, the - result will use the same type unless you provide a different tensor type with `return_tensors`. In the case of - PyTorch tensors, you will lose the specific device of your tensors however. - - + > [!TIP] + > If the `encoded_inputs` passed are dictionary of numpy arrays, or PyTorch tensors, the + > result will use the same type unless you provide a different tensor type with `return_tensors`. In the case of + > PyTorch tensors, you will lose the specific device of your tensors however. 
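For the deprecation notes above (`encode_plus` and `batch_encode_plus` versus `__call__`), a short sketch of the preferred call style; the checkpoint name is an assumption.

```python
# Sketch: calling the tokenizer directly instead of the deprecated encode_plus variants.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint

# Instead of tokenizer.encode_plus("Hello world", truncation=True):
single = tokenizer("Hello world", truncation=True)

# Instead of tokenizer.batch_encode_plus(["Hello world", "Another example"], padding=True):
batch = tokenizer(["Hello world", "Another example"], padding=True, return_tensors="pt")

print(single["input_ids"], batch["input_ids"].shape)
```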
Args: encoded_inputs ([`BatchEncoding`], list of [`BatchEncoding`], `dict[str, list[int]]`, `dict[str, list[list[int]]` or `list[dict[str, list[int]]]`): diff --git a/src/transformers/tokenization_utils_fast.py b/src/transformers/tokenization_utils_fast.py index fe4873d61b37..9bc4e6fb62dc 100644 --- a/src/transformers/tokenization_utils_fast.py +++ b/src/transformers/tokenization_utils_fast.py @@ -386,12 +386,9 @@ def num_special_tokens_to_add(self, pair: bool = False) -> int: """ Returns the number of added tokens when encoding a sequence with special tokens. -- - This encodes a dummy input and checks the number of added tokens, and is therefore not efficient. Do not put - this inside your training loop. - - + > [!TIP] + > This encodes a dummy input and checks the number of added tokens, and is therefore not efficient. Do not put + > this inside your training loop. Args: pair (`bool`, *optional*, defaults to `False`): diff --git a/src/transformers/trainer.py b/src/transformers/trainer.py index 27adca9c836e..1133ce863418 100755 --- a/src/transformers/trainer.py +++ b/src/transformers/trainer.py @@ -311,13 +311,10 @@ class Trainer: model ([`PreTrainedModel`] or `torch.nn.Module`, *optional*): The model to train, evaluate or use for predictions. If not provided, a `model_init` must be passed. -- - [`Trainer`] is optimized to work with the [`PreTrainedModel`] provided by the library. You can still use - your own models defined as `torch.nn.Module` as long as they work the same way as the 🤗 Transformers - models. - - + > [!TIP] + > [`Trainer`] is optimized to work with the [`PreTrainedModel`] provided by the library. You can still use + > your own models defined as `torch.nn.Module` as long as they work the same way as the 🤗 Transformers + > models. args ([`TrainingArguments`], *optional*): The arguments to tweak for training. Will default to a basic instance of [`TrainingArguments`] with the @@ -3541,14 +3538,11 @@ def hyperparameter_search( by `compute_objective`, which defaults to a function returning the evaluation loss when no metric is provided, the sum of all metrics otherwise. -- - To use this method, you need to have provided a `model_init` when initializing your [`Trainer`]: we need to - reinitialize the model at each new run. This is incompatible with the `optimizers` argument, so you need to - subclass [`Trainer`] and override the method [`~Trainer.create_optimizer_and_scheduler`] for custom - optimizer/scheduler. - - + > [!WARNING] + > To use this method, you need to have provided a `model_init` when initializing your [`Trainer`]: we need to + > reinitialize the model at each new run. This is incompatible with the `optimizers` argument, so you need to + > subclass [`Trainer`] and override the method [`~Trainer.create_optimizer_and_scheduler`] for custom + > optimizer/scheduler. Args: hp_space (`Callable[["optuna.Trial"], dict[str, float]]`, *optional*): @@ -4268,17 +4262,14 @@ def evaluate( evaluate on each dataset, prepending the dictionary key to the metric name. Datasets must implement the `__len__` method. -- - If you pass a dictionary with names of datasets as keys and datasets as values, evaluate will run - separate evaluations on each dataset. This can be useful to monitor how training affects other - datasets or simply to get a more fine-grained evaluation. - When used with `load_best_model_at_end`, make sure `metric_for_best_model` references exactly one - of the datasets. 
If you, for example, pass in `{"data1": data1, "data2": data2}` for two datasets - `data1` and `data2`, you could specify `metric_for_best_model="eval_data1_loss"` for using the - loss on `data1` and `metric_for_best_model="eval_data2_loss"` for the loss on `data2`. - - + > [!TIP] + > If you pass a dictionary with names of datasets as keys and datasets as values, evaluate will run + > separate evaluations on each dataset. This can be useful to monitor how training affects other + > datasets or simply to get a more fine-grained evaluation. + > When used with `load_best_model_at_end`, make sure `metric_for_best_model` references exactly one + > of the datasets. If you, for example, pass in `{"data1": data1, "data2": data2}` for two datasets + > `data1` and `data2`, you could specify `metric_for_best_model="eval_data1_loss"` for using the + > loss on `data1` and `metric_for_best_model="eval_data2_loss"` for the loss on `data2`. ignore_keys (`list[str]`, *optional*): A list of keys in the output of your model (if it is a dictionary) that should be ignored when @@ -4370,13 +4361,10 @@ def predict( An optional prefix to be used as the metrics key prefix. For example the metrics "bleu" will be named "test_bleu" if the prefix is "test" (default) -- - If your predictions or labels have different sequence length (for instance because you're doing dynamic padding - in a token classification task) the predictions will be padded (on the right) to allow for concatenation into - one array. The padding index is -100. - - + > [!TIP] + > If your predictions or labels have different sequence length (for instance because you're doing dynamic padding + > in a token classification task) the predictions will be padded (on the right) to allow for concatenation into + > one array. The padding index is -100. Returns: *NamedTuple* A namedtuple with the following keys: diff --git a/src/transformers/trainer_callback.py b/src/transformers/trainer_callback.py index c72bdbb70bcd..5f2e356418c4 100644 --- a/src/transformers/trainer_callback.py +++ b/src/transformers/trainer_callback.py @@ -38,13 +38,10 @@ class TrainerState: A class containing the [`Trainer`] inner state that will be saved along the model and optimizer when checkpointing and passed to the [`TrainerCallback`]. -- - In all this class, one step is to be understood as one update step. When using gradient accumulation, one update - step may require several forward and backward passes: if you use `gradient_accumulation_steps=n`, then one update - step requires going through *n* batches. - - + > [!TIP] + > In all this class, one step is to be understood as one update step. When using gradient accumulation, one update + > step may require several forward and backward passes: if you use `gradient_accumulation_steps=n`, then one update + > step requires going through *n* batches. 
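The dictionary-of-datasets evaluation described above is easier to follow with a sketch. Everything below (the tiny checkpoint, the toy datasets, and the metric names) is an assumption chosen only to keep the example self-contained.

```python
# Sketch: evaluating on a dict of datasets and tying `metric_for_best_model` to one split.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")  # placeholder tiny checkpoint
model = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny", num_labels=2)

def make_split(texts, labels):
    ds = Dataset.from_dict({"text": texts, "label": labels})
    return ds.map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=32))

train_ds = make_split(["good movie", "bad movie"] * 8, [1, 0] * 8)
data1 = make_split(["great", "awful"] * 4, [1, 0] * 4)
data2 = make_split(["nice", "terrible"] * 4, [1, 0] * 4)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=1,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_data1_loss",  # "best" is judged on the data1 split only
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset={"data1": data1, "data2": data2},
)
trainer.train()
print(trainer.evaluate(eval_dataset={"data1": data1, "data2": data2}))
# Metrics come back prefixed per split, e.g. "eval_data1_loss" and "eval_data2_loss".
```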
Args: epoch (`float`, *optional*): diff --git a/src/transformers/trainer_pt_utils.py b/src/transformers/trainer_pt_utils.py index b1cb1f551ac5..3a25228112f3 100644 --- a/src/transformers/trainer_pt_utils.py +++ b/src/transformers/trainer_pt_utils.py @@ -636,16 +636,13 @@ class IterableDatasetShard(IterableDataset): - the shard on process 0 will yield `[0, 1, 4, 5, 8, 9]` so will see batches `[0, 1]`, `[4, 5]`, `[8, 9]` - the shard on process 1 will yield `[2, 3, 6, 7, 10, 11]` so will see batches `[2, 3]`, `[6, 7]`, `[10, 11]` -- - If your IterableDataset implements some randomization that needs to be applied the same way on all processes - (for instance, a shuffling), you should use a `torch.Generator` in a `generator` attribute of the `dataset` to - generate your random numbers and call the [`~trainer_pt_utils.IterableDatasetShard.set_epoch`] method of this - object. It will set the seed of this `generator` to `seed + epoch` on all processes before starting the - iteration. Alternatively, you can also implement a `set_epoch()` method in your iterable dataset to deal with - this. - - + > [!WARNING] + > If your IterableDataset implements some randomization that needs to be applied the same way on all processes + > (for instance, a shuffling), you should use a `torch.Generator` in a `generator` attribute of the `dataset` to + > generate your random numbers and call the [`~trainer_pt_utils.IterableDatasetShard.set_epoch`] method of this + > object. It will set the seed of this `generator` to `seed + epoch` on all processes before starting the + > iteration. Alternatively, you can also implement a `set_epoch()` method in your iterable dataset to deal with + > this. Args: dataset (`torch.utils.data.IterableDataset`): diff --git a/src/transformers/trainer_seq2seq.py b/src/transformers/trainer_seq2seq.py index ca6842bc0ff3..5b918bb9004b 100644 --- a/src/transformers/trainer_seq2seq.py +++ b/src/transformers/trainer_seq2seq.py @@ -219,13 +219,10 @@ def predict( gen_kwargs: Additional `generate` specific kwargs. -- - If your predictions or labels have different sequence lengths (for instance because you're doing dynamic - padding in a token classification task) the predictions will be padded (on the right) to allow for - concatenation into one array. The padding index is -100. - - + > [!TIP] + > If your predictions or labels have different sequence lengths (for instance because you're doing dynamic + > padding in a token classification task) the predictions will be padded (on the right) to allow for + > concatenation into one array. The padding index is -100. Returns: *NamedTuple* A namedtuple with the following keys: diff --git a/src/transformers/training_args.py b/src/transformers/training_args.py index 2abf0d5c883d..fc2270f0c785 100644 --- a/src/transformers/training_args.py +++ b/src/transformers/training_args.py @@ -254,12 +254,9 @@ class TrainingArguments: gradient_accumulation_steps (`int`, *optional*, defaults to 1): Number of updates steps to accumulate the gradients for, before performing a backward/update pass. -- - When using gradient accumulation, one step is counted as one step with backward pass. Therefore, logging, - evaluation, save will be conducted every `gradient_accumulation_steps * xxx_step` training examples. - - + > [!WARNING] + > When using gradient accumulation, one step is counted as one step with backward pass. Therefore, logging, + > evaluation, save will be conducted every `gradient_accumulation_steps * xxx_step` training examples. 
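Following the gradient-accumulation warning just above, a short sketch of how the step-based schedules are counted; all values are arbitrary.

```python
# Sketch: one "step" is one optimizer update, i.e. `gradient_accumulation_steps` micro-batches.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size of 32 per device
    logging_steps=50,               # logs every 50 update steps = 50 * 4 = 200 micro-batches
    save_steps=500,                 # likewise counted in update steps, not micro-batches
)
print(args.train_batch_size, args.gradient_accumulation_steps)
```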
eval_accumulation_steps (`int`, *optional*): Number of predictions steps to accumulate the output tensors for, before moving the results to the CPU. If @@ -271,11 +268,8 @@ class TrainingArguments: torch_empty_cache_steps (`int`, *optional*): Number of steps to wait before calling `torch..empty_cache()`. If left unset or set to None, cache will not be emptied. - - - This can help avoid CUDA out-of-memory errors by lowering peak VRAM usage at a cost of about [10% slower performance](https://github.com/huggingface/transformers/issues/31372). - - + > [!TIP] + > This can help avoid CUDA out-of-memory errors by lowering peak VRAM usage at a cost of about [10% slower performance](https://github.com/huggingface/transformers/issues/31372). learning_rate (`float`, *optional*, defaults to 5e-5): The initial learning rate for [`AdamW`] optimizer. @@ -333,12 +327,9 @@ class TrainingArguments: Whether to filter `nan` and `inf` losses for logging. If set to `True` the loss of every step that is `nan` or `inf` is filtered and the average loss of the current logging window is taken instead. -- - `logging_nan_inf_filter` only influences the logging of loss values, it does not change the behavior the - gradient is computed or applied to the model. - - + > [!TIP] + > `logging_nan_inf_filter` only influences the logging of loss values, it does not change the behavior the + > gradient is computed or applied to the model. save_strategy (`str` or [`~trainer_utils.SaveStrategy`], *optional*, defaults to `"steps"`): The checkpoint save strategy to adopt during training. Possible values are: @@ -449,12 +440,9 @@ class TrainingArguments: [`save_total_limit`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.save_total_limit) for more. -- - When set to `True`, the parameters `save_strategy` needs to be the same as `eval_strategy`, and in - the case it is "steps", `save_steps` must be a round multiple of `eval_steps`. - - + > [!TIP] + > When set to `True`, the parameters `save_strategy` needs to be the same as `eval_strategy`, and in + > the case it is "steps", `save_steps` must be a round multiple of `eval_steps`. metric_for_best_model (`str`, *optional*): Use in conjunction with `load_best_model_at_end` to specify the metric to use to compare two different @@ -550,10 +538,9 @@ class TrainingArguments: evolve in the future. The value is either the location of DeepSpeed json config file (e.g., `ds_config.json`) or an already loaded json file as a `dict`" -- If enabling any Zero-init, make sure that your model is not initialized until - *after* initializing the `TrainingArguments`, else it will not be applied. - + > [!WARNING] + > If enabling any Zero-init, make sure that your model is not initialized until + > *after* initializing the `TrainingArguments`, else it will not be applied. accelerator_config (`str`, `dict`, or `AcceleratorConfig`, *optional*): Config to be used with the internal `Accelerator` implementation. The value is either a location of @@ -643,12 +630,9 @@ class TrainingArguments: will be pushed each time a save is triggered (depending on your `save_strategy`). Calling [`~Trainer.save_model`] will also trigger a push. -- - If `output_dir` exists, it needs to be a local clone of the repository to which the [`Trainer`] will be - pushed. - - + > [!WARNING] + > If `output_dir` exists, it needs to be a local clone of the repository to which the [`Trainer`] will be + > pushed. 
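The `load_best_model_at_end` constraint mentioned above (matching strategies, and `save_steps` a round multiple of `eval_steps`) can be checked with a quick sketch; the numbers are arbitrary.

```python
# Sketch: a configuration that satisfies the load_best_model_at_end constraints.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",   # must match eval_strategy
    save_steps=200,          # OK: 200 is a round multiple of eval_steps=100
    load_best_model_at_end=True,
    metric_for_best_model="loss",
)
# With save_steps=250 instead, TrainingArguments would raise a ValueError at construction,
# since 250 is not a round multiple of eval_steps=100.
```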
resume_from_checkpoint (`str`, *optional*): The path to a folder with a valid checkpoint for your model. This argument is not directly used by @@ -2332,11 +2316,8 @@ def set_training( """ A method that regroups all basic arguments linked to the training. -- - Calling this method will automatically set `self.do_train` to `True`. - - + > [!TIP] + > Calling this method will automatically set `self.do_train` to `True`. Args: learning_rate (`float`, *optional*, defaults to 5e-5): @@ -2356,13 +2337,10 @@ def set_training( gradient_accumulation_steps (`int`, *optional*, defaults to 1): Number of updates steps to accumulate the gradients for, before performing a backward/update pass. -- - When using gradient accumulation, one step is counted as one step with backward pass. Therefore, - logging, evaluation, save will be conducted every `gradient_accumulation_steps * xxx_step` training - examples. - - + > [!WARNING] + > When using gradient accumulation, one step is counted as one step with backward pass. Therefore, + > logging, evaluation, save will be conducted every `gradient_accumulation_steps * xxx_step` training + > examples. seed (`int`, *optional*, defaults to 42): Random seed that will be set at the beginning of training. To ensure reproducibility across runs, use @@ -2463,11 +2441,8 @@ def set_testing( """ A method that regroups all basic arguments linked to testing on a held-out dataset. -- - Calling this method will automatically set `self.do_predict` to `True`. - - + > [!TIP] + > Calling this method will automatically set `self.do_predict` to `True`. Args: batch_size (`int` *optional*, defaults to 8): @@ -2582,12 +2557,9 @@ def set_logging( Whether to filter `nan` and `inf` losses for logging. If set to `True` the loss of every step that is `nan` or `inf` is filtered and the average loss of the current logging window is taken instead. -- - `nan_inf_filter` only influences the logging of loss values, it does not change the behavior the - gradient is computed or applied to the model. - - + > [!TIP] + > `nan_inf_filter` only influences the logging of loss values, it does not change the behavior the + > gradient is computed or applied to the model. on_each_node (`bool`, *optional*, defaults to `True`): In multinode distributed training, whether to log using `log_level` once per node, or only on the main @@ -2630,13 +2602,10 @@ def set_push_to_hub( """ A method that regroups all arguments linked to synchronizing checkpoints with the Hub. -- - Calling this method will set `self.push_to_hub` to `True`, which means the `output_dir` will begin a git - directory synced with the repo (determined by `model_id`) and the content will be pushed each time a save is - triggered (depending on your `self.save_strategy`). Calling [`~Trainer.save_model`] will also trigger a push. - - + > [!TIP] + > Calling this method will set `self.push_to_hub` to `True`, which means the `output_dir` will begin a git + > directory synced with the repo (determined by `model_id`) and the content will be pushed each time a save is + > triggered (depending on your `self.save_strategy`). Calling [`~Trainer.save_model`] will also trigger a push. 
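The `set_training`, `set_logging`, and related helpers documented above can be chained on an existing `TrainingArguments`; a small sketch with arbitrary values:

```python
# Sketch: the set_* helpers return the arguments object, and set_training also flips
# `do_train` to True, as noted in the tip above.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="working_dir")
args = args.set_training(learning_rate=1e-4, batch_size=32, gradient_accumulation_steps=2)
args = args.set_logging(strategy="steps", steps=100)
print(args.do_train, args.learning_rate, args.per_device_train_batch_size)
```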
Args: model_id (`str`): diff --git a/src/transformers/utils/auto_docstring.py b/src/transformers/utils/auto_docstring.py index 9bf44c8bb426..fe5ee8fc674d 100644 --- a/src/transformers/utils/auto_docstring.py +++ b/src/transformers/utils/auto_docstring.py @@ -1202,13 +1202,10 @@ def add_intro_docstring(func, class_name, indent_level=0): if func.__name__ == "forward": intro_docstring = rf"""The [`{class_name}`] forward method, overrides the `__call__` special method. -- - Although the recipe for forward pass needs to be defined within this function, one should call the [`Module`] - instance afterwards instead of this since the former takes care of running the pre and post processing steps while - the latter silently ignores them. - - + > [!TIP] + > Although the recipe for forward pass needs to be defined within this function, one should call the [`Module`] + > instance afterwards instead of this since the former takes care of running the pre and post processing steps while + > the latter silently ignores them. """ intro_docstring = equalize_indent(intro_docstring, indent_level + 4) diff --git a/src/transformers/utils/doc.py b/src/transformers/utils/doc.py index f9a787a74a13..fd0b4881774a 100644 --- a/src/transformers/utils/doc.py +++ b/src/transformers/utils/doc.py @@ -47,13 +47,10 @@ def docstring_decorator(fn): class_name = f"[`{fn.__qualname__.split('.')[0]}`]" intro = rf""" The {class_name} forward method, overrides the `__call__` special method. -- - Although the recipe for forward pass needs to be defined within this function, one should call the [`Module`] - instance afterwards instead of this since the former takes care of running the pre and post processing steps while - the latter silently ignores them. - - + > [!TIP] + > Although the recipe for forward pass needs to be defined within this function, one should call the [`Module`] + > instance afterwards instead of this since the former takes care of running the pre and post processing steps while + > the latter silently ignores them. """ correct_indentation = get_docstring_indentation_level(fn) @@ -180,13 +177,10 @@ def _prepare_output_docstrings(output_type, config_class, min_indent=None, add_i FAKE_MODEL_DISCLAIMER = """ -- - This example uses a random model as the real ones are all very big. To get proper results, you should use - {real_checkpoint} instead of {fake_checkpoint}. If you get out-of-memory when loading that checkpoint, you can try - adding `device_map="auto"` in the `from_pretrained` call. - - + > [!WARNING] + > This example uses a random model as the real ones are all very big. To get proper results, you should use + > {real_checkpoint} instead of {fake_checkpoint}. If you get out-of-memory when loading that checkpoint, you can try + > adding `device_map="auto"` in the `from_pretrained` call. """ diff --git a/src/transformers/utils/generic.py b/src/transformers/utils/generic.py index b39ed65251b2..4e558d2629d7 100644 --- a/src/transformers/utils/generic.py +++ b/src/transformers/utils/generic.py @@ -249,12 +249,9 @@ class ModelOutput(OrderedDict): tuple) or strings (like a dictionary) that will ignore the `None` attributes. Otherwise behaves like a regular python dictionary. -- - You can't unpack a `ModelOutput` directly. Use the [`~utils.ModelOutput.to_tuple`] method to convert it to a tuple - before. - - + > [!WARNING] + > You can't unpack a `ModelOutput` directly. Use the [`~utils.ModelOutput.to_tuple`] method to convert it to a tuple + > before. 
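To make the `ModelOutput` note above concrete, a small sketch of the supported access patterns; the checkpoint name is an assumption.

```python
# Sketch: ModelOutput supports attribute and key access; call to_tuple() before unpacking.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")  # placeholder tiny checkpoint
model = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny", num_labels=2)

inputs = tokenizer("ModelOutput behaves like a dict and a dataclass.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits          # attribute access
same_logits = outputs["logits"]  # dict-style access
as_tuple = outputs.to_tuple()    # explicit conversion before tuple-style unpacking
print(type(outputs).__name__, logits.shape, len(as_tuple))
```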
""" def __init_subclass__(cls) -> None: diff --git a/src/transformers/utils/hub.py b/src/transformers/utils/hub.py index dab357941b81..8189603a50ee 100644 --- a/src/transformers/utils/hub.py +++ b/src/transformers/utils/hub.py @@ -298,11 +298,8 @@ def cached_file( repo_type (`str`, *optional*): Specify the repo type (useful when downloading from a space for instance). -- - Passing `token=True` is required when you want to use a private model. - - + > [!TIP] + > Passing `token=True` is required when you want to use a private model. Returns: `Optional[str]`: Returns the resolved file (to the cache folder if downloaded from a repo). @@ -386,11 +383,8 @@ def cached_files( passed when we are chaining several calls to various files (e.g. when loading a tokenizer or a pipeline). If files are cached for this commit hash, avoid calls to head and get from the cache. -- - Passing `token=True` is required when you want to use a private model. - - + > [!TIP] + > Passing `token=True` is required when you want to use a private model. Returns: `Optional[str]`: Returns the resolved file (to the cache folder if downloaded from a repo). diff --git a/src/transformers/utils/logging.py b/src/transformers/utils/logging.py index e383653871bf..9e5329c9b9e6 100644 --- a/src/transformers/utils/logging.py +++ b/src/transformers/utils/logging.py @@ -165,17 +165,14 @@ def get_verbosity() -> int: Returns: `int`: The logging level. -- - 🤗 Transformers has following logging levels: - - - 50: `transformers.logging.CRITICAL` or `transformers.logging.FATAL` - - 40: `transformers.logging.ERROR` - - 30: `transformers.logging.WARNING` or `transformers.logging.WARN` - - 20: `transformers.logging.INFO` - - 10: `transformers.logging.DEBUG` - - """ + > [!TIP] + > 🤗 Transformers has following logging levels: + > + > - 50: `transformers.logging.CRITICAL` or `transformers.logging.FATAL` + > - 40: `transformers.logging.ERROR` + > - 30: `transformers.logging.WARNING` or `transformers.logging.WARN` + > - 20: `transformers.logging.INFO` + > - 10: `transformers.logging.DEBUG`""" _configure_library_root_logger() return _get_library_root_logger().getEffectiveLevel() diff --git a/src/transformers/utils/peft_utils.py b/src/transformers/utils/peft_utils.py index e3976acf168b..09e6177aca82 100644 --- a/src/transformers/utils/peft_utils.py +++ b/src/transformers/utils/peft_utils.py @@ -65,11 +65,8 @@ def find_adapter_config_file( git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git. -- - To test a pull request you made on the Hub, you can pass `revision="refs/pr/ + > [!TIP] + > To test a pull request you made on the Hub, you can pass `revision="refs/pr/". - - ". local_files_only (`bool`, *optional*, defaults to `False`): If `True`, will only try to load the tokenizer configuration from local files. diff --git a/src/transformers/video_processing_utils.py b/src/transformers/video_processing_utils.py index 4d0e9c58f314..db6e9e95db9e 100644 --- a/src/transformers/video_processing_utils.py +++ b/src/transformers/video_processing_utils.py @@ -484,11 +484,8 @@ def from_pretrained( identifier allowed by git. - - - To test a pull request you made on the Hub, you can pass `revision="refs/pr/ + > [!TIP] + > To test a pull request you made on the Hub, you can pass `revision="refs/pr/"`. - - "`. return_unused_kwargs (`bool`, *optional*, defaults to `False`): If `False`, then this function returns just the final video processor object. 
If `True`, then this @@ -850,11 +847,8 @@ def register_for_auto_class(cls, auto_class="AutoVideoProcessor"): Register this class with a given auto class. This should only be used for custom video processors as the ones in the library are already mapped with `AutoVideoProcessor`. - <Tip warning={true}> - - This API is experimental and may have some slight breaking changes in the next releases. - - </Tip> - + > [!WARNING] + > This API is experimental and may have some slight breaking changes in the next releases. Args: auto_class (`str` or `type`, *optional*, defaults to `"AutoVideoProcessor"`): diff --git a/utils/deprecate_models.py b/utils/deprecate_models.py index 8cbe319fdb65..eda05f8d0ffb 100644 --- a/utils/deprecate_models.py +++ b/utils/deprecate_models.py @@ -45,14 +45,11 @@ def get_last_stable_minor_release(): def build_tip_message(last_stable_release): return ( """ -<Tip warning={true}> - -This model is in maintenance mode only, we don't accept any new PRs changing its code. -""" - + f"""If you run into any issues running this model, please reinstall the last version that supported this model: v{last_stable_release}. -You can do so by running the following command: `pip install -U transformers=={last_stable_release}`. - -</Tip>""" +> [!WARNING] +> This model is in maintenance mode only, we don't accept any new PRs changing its code. +> """ + + f"""If you run into any issues running this model, please reinstall the last version that supported this model: v{last_stable_release}. +> You can do so by running the following command: `pip install -U transformers=={last_stable_release}`.""" )
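Assuming the added lines in the hunk above are read correctly, the updated `build_tip_message` renders the maintenance notice as a GitHub-style admonition. A sketch of what it produces (the version string is made up, and the exact string layout is an approximation, not a verbatim copy of the utility):

```python
# Sketch of the new helper, mirroring the added lines in the hunk above.
def build_tip_message(last_stable_release):
    return (
        """
> [!WARNING]
> This model is in maintenance mode only, we don't accept any new PRs changing its code.
> """
        + f"""If you run into any issues running this model, please reinstall the last version that supported this model: v{last_stable_release}.
> You can do so by running the following command: `pip install -U transformers=={last_stable_release}`."""
    )

print(build_tip_message("4.55.0"))
# (after a leading blank line)
# > [!WARNING]
# > This model is in maintenance mode only, we don't accept any new PRs changing its code.
# > If you run into any issues running this model, please reinstall the last version that supported this model: v4.55.0.
# > You can do so by running the following command: `pip install -U transformers==4.55.0`.
```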