Sourcery Starbot ⭐ refactored guyrosin/temporal_attention#2
SourceryAI wants to merge 1 commit into guyrosin:main from
Conversation
```diff
     )
-    dataset = DatasetDict({"train": train_dataset, "validation": test_dataset})
-    return dataset
+    return DatasetDict({"train": train_dataset, "validation": test_dataset})
```
Function load_train_test_datasets refactored with the following changes:
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)
```diff
-    exclude_similar_sentences = True if corpus_name.startswith("liverpool") else False
+    exclude_similar_sentences = bool(corpus_name.startswith("liverpool"))
```
Function split_temporal_dataset_files refactored with the following changes:
- Simplify boolean if expression (`boolean-if-exp-identity`)
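As a side note, `str.startswith` already returns a `bool`, so even the `bool()` wrapper in the suggested code is strictly redundant. A minimal standalone sketch of the pattern (the corpus name here is a made-up value):

```python
corpus_name = "liverpool_2013"  # hypothetical example value

# Before: redundant conditional expression around a boolean-valued call
exclude_similar_sentences = True if corpus_name.startswith("liverpool") else False

# After: startswith() already returns a bool, so its result can be used directly
exclude_similar_sentences = corpus_name.startswith("liverpool")

assert exclude_similar_sentences is True
```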
| logger.info(f"Finding relevant sentences in the corpus...") | ||
| logger.info("Finding relevant sentences in the corpus...") |
Function find_sentences_of_words refactored with the following changes:
- Replace f-string with no interpolated values with string (`remove-redundant-fstring`)
```diff
-    kwargs.update(additional_kwargs)
-    config = AutoConfig.from_pretrained(
+    kwargs |= additional_kwargs
+    return AutoConfig.from_pretrained(
         model_args.model_name_or_path, cache_dir=model_args.cache_dir, **kwargs
     )
-    return config
```
Function _load_auto_config refactored with the following changes:
- Merge dictionary updates via the union operator (`dict-assign-update-to-union`)
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)
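For context, the in-place dict union operator comes from PEP 584 and requires Python 3.9+; on older interpreters `dict.update` is still needed. A minimal sketch with made-up option names:

```python
# Base kwargs plus caller-supplied overrides (key names are illustrative)
kwargs = {"hidden_dropout_prob": 0.1, "num_labels": 2}
additional_kwargs = {"num_labels": 5}

# Before (any Python 3): merge in place with update()
kwargs.update(additional_kwargs)

# After (Python 3.9+): the same merge written as an augmented assignment
kwargs |= additional_kwargs

assert kwargs["num_labels"] == 5
```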
| f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}, " | ||
| + f"distributed training: {bool(training_args.local_rank != -1)}" | ||
| ( | ||
| f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}, " | ||
| + f"distributed training: {training_args.local_rank != -1}" | ||
| ) |
Function init_run refactored with the following changes:
- Remove unnecessary casts to int, str, float or bool (`remove-unnecessary-cast`)
```diff
-        # For backward compatibility, allow to try to setup 'max_len_sentences_pair'.
         if (
-            value == self.model_max_length - self.num_special_tokens_to_add(pair=True)
-            and self.verbose
+            value
+            != self.model_max_length - self.num_special_tokens_to_add(pair=True)
+            or not self.verbose
         ):
-            if not self.deprecation_warnings.get("max_len_sentences_pair", False):
-                logger.warning(
-                    "Setting 'max_len_sentences_pair' is now deprecated. "
-                    "This value is automatically set up."
-                )
-            self.deprecation_warnings["max_len_sentences_pair"] = True
-        else:
             raise ValueError(
                 "Setting 'max_len_sentences_pair' is now deprecated. "
                 "This value is automatically set up."
             )
+        if not self.deprecation_warnings.get("max_len_sentences_pair", False):
+            logger.warning(
+                "Setting 'max_len_sentences_pair' is now deprecated. "
+                "This value is automatically set up."
+            )
+        self.deprecation_warnings["max_len_sentences_pair"] = True
```
Function TempoPreTrainedTokenizerBase.max_len_sentences_pair refactored with the following changes:
- Swap if/else branches (`swap-if-else-branches`)
- Remove unnecessary else after guard condition (`remove-unnecessary-else`)

This removes the following comments (why?):

`# For backward compatibility, allow to try to setup 'max_len_sentences_pair'.`
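Taken together, the two changes produce the usual guard-clause shape: invert the condition, raise early, and de-indent the happy path. A generic sketch of that shape (not the tokenizer code itself):

```python
def set_deprecated_option(value, expected, verbose=True):
    # Before-style: happy path nested under the if, error path in the else
    #     if value == expected and verbose:
    #         warn_and_record()
    #     else:
    #         raise ValueError("unexpected value")

    # After-style: guard clause first, main logic left-aligned
    if value != expected or not verbose:
        raise ValueError("unexpected value")
    return f"accepted {value}"

print(set_deprecated_option(3, 3))  # accepted 3
```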
| "Model name '{}' not found in model shortcut name list ({}). " | ||
| "Assuming '{}' is a path, a model identifier, or url to a directory containing tokenizer files.".format( | ||
| pretrained_model_name_or_path, | ||
| ", ".join(s3_models), | ||
| pretrained_model_name_or_path, | ||
| ) | ||
| f"""Model name '{pretrained_model_name_or_path}' not found in model shortcut name list ({", ".join(s3_models)}). Assuming '{pretrained_model_name_or_path}' is a path, a model identifier, or url to a directory containing tokenizer files.""" |
Function TempoPreTrainedTokenizerBase.from_pretrained refactored with the following changes:
- Replace call to format with f-string (`use-fstring-for-formatting`)
- Swap if/else branches (`swap-if-else-branches`)
- Remove unnecessary else after guard condition (`remove-unnecessary-else`)
- Simplify sequence length comparison (`simplify-len-comparison`)
| "It may result in unexpected tokenization. \n" | ||
| f"The tokenizer class you load from this checkpoint is '{config_tokenizer_class}'. \n" | ||
| f"The class this function is called from is '{cls.__name__}'." | ||
| ) |
Function TempoPreTrainedTokenizerBase._from_pretrained refactored with the following changes:
- Merge nested if conditions (`merge-nested-ifs`)
- Replace list(), dict() or set() with comprehension (`collection-builtin-to-comprehension`)
- Remove unnecessary casts to int, str, float or bool (`remove-unnecessary-cast`)
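The visible fragment shows only the warning text; the `merge-nested-ifs` part of the change follows this general pattern (the variable values below are illustrative, not the checkpoint code):

```python
config_tokenizer_class = "BertTokenizer"  # hypothetical values
calling_class = "TempoBertTokenizerFast"
verbose = True

# Before: two nested if statements
if config_tokenizer_class is not None:
    if config_tokenizer_class != calling_class and verbose:
        print("The tokenizer class you load from this checkpoint differs.")

# After: a single condition joined with `and`
if (
    config_tokenizer_class is not None
    and config_tokenizer_class != calling_class
    and verbose
):
    print("The tokenizer class you load from this checkpoint differs.")
```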
| (filename_prefix + "-" if filename_prefix else "") | ||
| + SPECIAL_TOKENS_MAP_FILE, | ||
| ( | ||
| (f"{filename_prefix}-" if filename_prefix else "") | ||
| + SPECIAL_TOKENS_MAP_FILE | ||
| ), | ||
| ) | ||
| tokenizer_config_file = os.path.join( | ||
| save_directory, | ||
| (filename_prefix + "-" if filename_prefix else "") + TOKENIZER_CONFIG_FILE, | ||
| (f"{filename_prefix}-" if filename_prefix else "") | ||
| + TOKENIZER_CONFIG_FILE, |
Function TempoPreTrainedTokenizerBase.save_pretrained refactored with the following changes:
- Use f-string instead of string concatenation [×2] (`use-fstring-for-concatenation`)
- Replace list(), dict() or set() with comprehension (`collection-builtin-to-comprehension`)
| (filename_prefix + "-" if filename_prefix else "") + ADDED_TOKENS_FILE, | ||
| (f"{filename_prefix}-" if filename_prefix else "") + ADDED_TOKENS_FILE, | ||
| ) | ||
| added_vocab = self.get_added_vocab() | ||
| if added_vocab: | ||
| if added_vocab := self.get_added_vocab(): |
Function TempoPreTrainedTokenizerBase._save_pretrained refactored with the following changes:
- Use named expression to simplify assignment and conditional (`use-named-expression`)
- Use f-string instead of string concatenation (`use-fstring-for-concatenation`)
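The named-expression (walrus) form requires Python 3.8+. A minimal before/after sketch using a stand-in for `get_added_vocab`:

```python
def get_added_vocab():
    # Stand-in for the tokenizer method; returns a possibly empty dict
    return {"[TIME_2020]": 30522}

# Before: separate assignment and truthiness test
added_vocab = get_added_vocab()
if added_vocab:
    print(f"saving {len(added_vocab)} added tokens")

# After (Python 3.8+): assign and test in a single expression
if added_vocab := get_added_vocab():
    print(f"saving {len(added_vocab)} added tokens")
```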
```diff
-            if max_length is not None:
-                if truncation is False or truncation == "do_not_truncate":
-                    warnings.warn(
-                        "`max_length` is ignored when `padding`=`True` and there is no truncation strategy. "
-                        "To pad to max length, use `padding='max_length'`."
-                    )
+            if max_length is not None and (
+                truncation is False or truncation == "do_not_truncate"
+            ):
+                warnings.warn(
+                    "`max_length` is ignored when `padding`=`True` and there is no truncation strategy. "
+                    "To pad to max length, use `padding='max_length'`."
+                )
```
Function TempoPreTrainedTokenizerBase._get_padding_truncation_strategies refactored with the following changes:
- Merge nested if conditions (`merge-nested-ifs`)
- Remove redundant conditional [×3] (`remove-redundant-if`)
```diff
-            inputs = dict((k, v[i]) for k, v in encoded_inputs.items())
+            inputs = {k: v[i] for k, v in encoded_inputs.items()}
```
Function TempoPreTrainedTokenizerBase.pad refactored with the following changes:
- Replace list(), dict() or set() with comprehension (`collection-builtin-to-comprehension`)
```diff
-        if token_ids_1 is None:
-            return token_ids_0
-        return token_ids_0 + token_ids_1
+        return token_ids_0 if token_ids_1 is None else token_ids_0 + token_ids_1
```
Function TempoPreTrainedTokenizerBase.build_inputs_with_special_tokens refactored with the following changes:
- Lift code into else after jump in control flow (`reintroduce-else`)
- Replace if statement with if expression (`assign-if-exp`)
```diff
-        pair = bool(pair_ids is not None)
+        pair = pair_ids is not None
```
Function TempoPreTrainedTokenizerBase.prepare_for_model refactored with the following changes:
- Remove unnecessary casts to int, str, float or bool (`remove-unnecessary-cast`)
- Swap positions of nested conditionals [×2] (`swap-nested-ifs`)
- Hoist nested repeated code outside conditional statements [×2] (`hoist-similar-statement-from-if`)
- Split conditional into multiple branches [×2] (`split-or-ifs`)
- Merge duplicate blocks in conditional [×2] (`merge-duplicate-blocks`)
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)
- Remove redundant conditional [×2] (`remove-redundant-if`)
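The fragment only shows the removed `bool()` cast; of the other listed transforms, hoisting repeated code out of a conditional is the least obvious, and it looks roughly like this (a generic sketch with stand-in names, not the actual `prepare_for_model` body):

```python
def truncate(ids, n):
    # Stand-in for sequence truncation: drop the last n ids
    return ids[:-n] if n else ids

ids, num_overflow = [1, 2, 3, 4, 5], 2

# Before: the identical truncate() call is duplicated in both branches
#     if num_overflow > 0:
#         ids = truncate(ids, num_overflow)
#         overflowing = ids[-2:]
#     else:
#         ids = truncate(ids, num_overflow)
#         overflowing = []

# After: the repeated statement is hoisted above the conditional
ids = truncate(ids, num_overflow)
overflowing = ids[-2:] if num_overflow > 0 else []
```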
```diff
-            error_msg = (
-                error_msg + "Please select another truncation strategy than "
-                f"{truncation_strategy}, for instance 'longest_first' or 'only_second'."
-            )
+            error_msg = f"{error_msg}Please select another truncation strategy than {truncation_strategy}, for instance 'longest_first' or 'only_second'."
```
Function TempoPreTrainedTokenizerBase.truncate_sequences refactored with the following changes:
- Use f-string instead of string concatenation (`use-fstring-for-concatenation`)
```diff
-            clean_text = self.clean_up_tokenization(text)
-            return clean_text
+            return self.clean_up_tokenization(text)
```
Function TempoPreTrainedTokenizerFast._decode refactored with the following changes:
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)
| (filename_prefix + "-" if filename_prefix else "") + ADDED_TOKENS_FILE, | ||
| (f"{filename_prefix}-" if filename_prefix else "") | ||
| + ADDED_TOKENS_FILE, | ||
| ) | ||
| added_vocab = self.get_added_vocab() | ||
| if added_vocab: | ||
| if added_vocab := self.get_added_vocab(): |
Function TempoPreTrainedTokenizerFast._save_pretrained refactored with the following changes:
- Use named expression to simplify assignment and conditional (`use-named-expression`)
- Use f-string instead of string concatenation [×2] (`use-fstring-for-concatenation`)
```diff
-    if data_args.line_by_line:
-        tokenized_dataset = tokenize_dataset_line_by_line(
+    return (
+        tokenize_dataset_line_by_line(
```
Function load_data refactored with the following changes:
- Replace if statement with if expression (`assign-if-exp`)
- Inline variable that is immediately returned (`inline-immediately-returned-variable`)
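The diff cuts off mid-expression; the combined effect of the two changes is to return a conditional expression directly. A sketch with stand-in tokenization helpers (the real functions take more arguments):

```python
def tokenize_dataset_line_by_line(dataset):
    return f"line-by-line({dataset})"  # stand-in

def tokenize_dataset_concatenated(dataset):
    return f"concatenated({dataset})"  # stand-in

def load_data(dataset, line_by_line):
    # Before:
    #     if line_by_line:
    #         tokenized_dataset = tokenize_dataset_line_by_line(dataset)
    #     else:
    #         tokenized_dataset = tokenize_dataset_concatenated(dataset)
    #     return tokenized_dataset

    # After: the conditional expression is returned directly
    return (
        tokenize_dataset_line_by_line(dataset)
        if line_by_line
        else tokenize_dataset_concatenated(dataset)
    )

print(load_data("corpus", line_by_line=True))
```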
| "configuration_tempobert": [ | ||
| "TempoBertConfig", | ||
| "configuration_tempobert": ["TempoBertConfig"], | ||
| "tokenization_tempobert_fast": ["TempoBertTokenizerFast"], | ||
| "modeling_tempobert": [ | ||
| "TempoBertForMaskedLM", | ||
| "TempoBertModel", | ||
| "TempoBertForPreTraining", | ||
| "TempoBertForSequenceClassification", | ||
| "TempoBertForTokenClassification", | ||
| ], | ||
| } | ||
|
|
||
| _import_structure["tokenization_tempobert_fast"] = ["TempoBertTokenizerFast"] | ||
|
|
||
| _import_structure["modeling_tempobert"] = [ | ||
| "TempoBertForMaskedLM", | ||
| "TempoBertModel", | ||
| "TempoBertForPreTraining", | ||
| "TempoBertForSequenceClassification", | ||
| "TempoBertForTokenClassification", | ||
| ] |
Lines 6-19 refactored with the following changes:
- Merge dictionary assignment with declaration [×2] (`merge-dict-assign`)
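`merge-dict-assign` simply folds later key assignments into the dict literal itself; a generic sketch (shortened, illustrative keys):

```python
# Before: keys assigned one by one after the declaration
_imports = {"configuration": ["Config"]}
_imports["tokenization"] = ["TokenizerFast"]
_imports["modeling"] = ["ForMaskedLM", "Model"]

# After: everything declared in a single literal
_imports = {
    "configuration": ["Config"],
    "tokenization": ["TokenizerFast"],
    "modeling": ["ForMaskedLM", "Model"],
}
```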
```diff
-        SPECIAL_TIMES_COUNT = 2  # NOTE: hardcoded (see TempoSpecialTokensMixin)
         if "attention" in self.time_embedding_type:
+            SPECIAL_TIMES_COUNT = 2  # NOTE: hardcoded (see TempoSpecialTokensMixin)
```
Function TempoBertEmbeddings.init_time_embeddings refactored with the following changes:
- Move assignments closer to their usage (`move-assign`)
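`move-assign` just delays defining the constant until the branch that actually reads it. Sketched generically (the surrounding embedding setup is omitted, and the added count of 10 is a made-up value):

```python
time_embedding_type = "temporal_attention"  # hypothetical value

# Before: the constant is assigned even when the branch is not taken
#     SPECIAL_TIMES_COUNT = 2  # NOTE: hardcoded (see TempoSpecialTokensMixin)
#     if "attention" in time_embedding_type:
#         times_vocab_size = 10 + SPECIAL_TIMES_COUNT

# After: the assignment sits next to its only use
if "attention" in time_embedding_type:
    SPECIAL_TIMES_COUNT = 2  # NOTE: hardcoded (see TempoSpecialTokensMixin)
    times_vocab_size = 10 + SPECIAL_TIMES_COUNT
```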
Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨
Here's your pull request refactoring your most popular Python repo.
If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.
Review changes via command line
To manually merge these changes, make sure you're on the main branch, then run: