Conversation
| directory = "testing/data/" | ||
| location = "output/genemark/" | ||
|
|
||
| gff_ext = ".genemark.txt" | ||
|
|
||
| gff_file = directory + location + id + "/" + id + gff_ext | ||
| gff_file = f"testing/data/{location}{id}/{id}{gff_ext}" |
There was a problem hiding this comment.
Function run_file refactored with the following changes:
- Move assignment closer to its usage within a block (
move-assign-in-block) - Merge append into list declaration [×2] (
merge-list-append) - Use named expression to simplify assignment and conditional (
use-named-expression) - Merge extend into list declaration (
merge-list-extend) - Use f-string instead of string concatenation [×5] (
use-fstring-for-concatenation) - Inline variable that is only used once (
inline-variable)
This removes the following comments ( why? ):
# rows.append(metadata)
| output = [] | ||
| output.append(["id", "correct", "total", "gene_count"]) | ||
| output = [["id", "correct", "total", "gene_count"]] |
There was a problem hiding this comment.
Function run_files refactored with the following changes:
- Merge append into list declaration (
merge-list-append)
| if not type(files) == list: | ||
| if type(files) != list: | ||
| files = [files] | ||
|
|
||
| all_file_count = len(files) | ||
| print(files) | ||
|
|
||
| output = [] | ||
| output.append(["id", "correct", "total", "gene_count"]) | ||
| output = [["id", "correct", "total", "gene_count"]] |
There was a problem hiding this comment.
Function run refactored with the following changes:
- Simplify logical expression using De Morgan identities (
de-morgan) - Merge append into list declaration (
merge-list-append)
| directory = "testing/data/" | ||
| location = "output/genemark/" | ||
|
|
||
| gff_ext = ".genemark.txt" | ||
|
|
||
| gff_file = directory + location + id + "/" + id + gff_ext | ||
| gff_file = f"testing/data/{location}{id}/{id}{gff_ext}" |
There was a problem hiding this comment.
Function run_file refactored with the following changes:
- Move assignment closer to its usage within a block (
move-assign-in-block) - Merge append into list declaration [×2] (
merge-list-append) - Use named expression to simplify assignment and conditional (
use-named-expression) - Merge extend into list declaration (
merge-list-extend) - Use f-string instead of string concatenation [×5] (
use-fstring-for-concatenation) - Inline variable that is only used once (
inline-variable)
This removes the following comments ( why? ):
# rows.append(metadata)
| output = [] | ||
| output.append(["id", "correct", "total", "gene_count"]) | ||
| output = [["id", "correct", "total", "gene_count"]] |
There was a problem hiding this comment.
Function run_files refactored with the following changes:
- Merge append into list declaration (
merge-list-append)
| for ref in GeneDatabaseReference.objects.filter(gene=gene): | ||
| references.append(ref) | ||
|
|
||
| for ref in gene.cds.database_set: | ||
| references.append(ref) | ||
|
|
||
| references.extend(iter(GeneDatabaseReference.objects.filter(gene=gene))) | ||
| references.extend(iter(gene.cds.database_set)) |
There was a problem hiding this comment.
Function get_gene refactored with the following changes:
- Replace a for append loop with list extend [×2] (
for-append-to-extend) - Simplify generator expression [×2] (
simplify-generator)
| if not response.status_code == 200: | ||
| if response.status_code != 200: |
There was a problem hiding this comment.
Function get_interpro_info refactored with the following changes:
- Simplify logical expression using De Morgan identities (
de-morgan)
| base = "https://www.orthodb.org/search?" | ||
| query = f"query={id}&" | ||
| tags = "ncbi=1" | ||
| response = requests.get(base + query + tags) | ||
| if not response.status_code == 200: | ||
| return [] | ||
| if results := response.json().get("data", []): | ||
| return results | ||
| else: | ||
| response = requests.get(f"https://www.orthodb.org/search?{query}{tags}") | ||
| if response.status_code != 200: | ||
| return [] | ||
| return results if (results := response.json().get("data", [])) else [] |
There was a problem hiding this comment.
Function get_ortho_db_set refactored with the following changes:
- Move assignment closer to its usage within a block (
move-assign-in-block) - Simplify logical expression using De Morgan identities (
de-morgan) - Replace if statement with if expression (
assign-if-exp) - Use f-string instead of string concatenation [×2] (
use-fstring-for-concatenation) - Inline variable that is only used once (
inline-variable)
| print("________________FAILS________________________") | ||
| print(fails) | ||
| file = output + "gene_population_failures.csv" | ||
| file = f"{output}gene_population_failures.csv" |
There was a problem hiding this comment.
Function main refactored with the following changes:
- Use f-string instead of string concatenation (
use-fstring-for-concatenation)
| parser = MinimalGenbankParser(f) | ||
| for p in parser: | ||
| genes.append(p) | ||
| genes.extend(iter(parser)) |
There was a problem hiding this comment.
Function parse_file refactored with the following changes:
- Replace a for append loop with list extend (
for-append-to-extend) - Simplify generator expression (
simplify-generator)
| genes = parse_file(file) | ||
| if genes: | ||
| if genes := parse_file(file): |
There was a problem hiding this comment.
Function run refactored with the following changes:
- Use named expression to simplify assignment and conditional (
use-named-expression)
| type = "search" | ||
| query = f"?query={query}" | ||
|
|
||
| response = requests.get(ORTHO_DB_ENDPOINT + type + query) | ||
| response = requests.get(f"{ORTHO_DB_ENDPOINT}search{query}") |
There was a problem hiding this comment.
Function get_orthoDB_id refactored with the following changes:
- Move assignment closer to its usage within a block (
move-assign-in-block) - Use f-string instead of string concatenation [×2] (
use-fstring-for-concatenation) - Inline variable that is only used once (
inline-variable)
| type = "orthologs" | ||
| query = f"?id={id}" | ||
|
|
||
| response = requests.get(ORTHO_DB_ENDPOINT + type + query) | ||
| response = requests.get(f"{ORTHO_DB_ENDPOINT}orthologs{query}") |
There was a problem hiding this comment.
Function get_orthologs_from_orthoDB refactored with the following changes:
- Move assignment closer to its usage within a block (
move-assign-in-block) - Use f-string instead of string concatenation [×2] (
use-fstring-for-concatenation) - Inline variable that is only used once (
inline-variable)
| if self.kegg_id: | ||
| return self.kegg_id | ||
|
|
||
| else: | ||
| if not self.kegg_id: | ||
| self.kegg_link() | ||
| if not self.kegg_id: | ||
| raise ValueError( | ||
| f"Database reference does not have an associated link in kegg db." | ||
| ) | ||
| return self.kegg_id | ||
| if not self.kegg_id: | ||
| raise ValueError( | ||
| "Database reference does not have an associated link in kegg db." | ||
| ) | ||
| return self.kegg_id |
There was a problem hiding this comment.
Function Gene.get_kegg_id refactored with the following changes:
- Swap if/else to remove empty if body (
remove-pass-body) - Hoist repeated code outside conditional statement (
hoist-statement-from-if) - Hoist conditional out of nested conditional (
hoist-if-from-if) - Replace f-string with no interpolated values with string (
remove-redundant-fstring)
| if not "NC_000964" in a: | ||
| if "NC_000964" not in a: | ||
| accs.append(a) | ||
| print(f"{i} -- {a=}") | ||
| print(f"{i} -- {accs=}") | ||
|
|
||
| raise ValueError(accs) | ||
| for ref in high_counts.keys(): | ||
| get_sequence(ref) |
There was a problem hiding this comment.
Function get_references refactored with the following changes:
- Remove unreachable code (
remove-unreachable-code) - Simplify logical expression using De Morgan identities (
de-morgan)
| i = 0 | ||
| while i < len(db_string): | ||
| for i in range(0, len(db_string), 2): | ||
| database = db_string[i] | ||
| location = db_string[i + 1] | ||
| i += 2 |
There was a problem hiding this comment.
Function create_database_objects refactored with the following changes:
- Replace while with for (
while-to-for) - Use named expression to simplify assignment and conditional (
use-named-expression)
| if y == "y" or y == "Y": | ||
| print(f"please enter a description for the {object_name}") | ||
| description = input() | ||
| print(f"are you happy with {description=} (y/n)") | ||
| y = input() | ||
| if y != "y" or y != "Y": | ||
| description = None | ||
| return name, description | ||
| else: | ||
| if y not in ["y", "Y"]: | ||
| return user_input_name_and_description(object_name) | ||
| print(f"please enter a description for the {object_name}") | ||
| description = input() | ||
| print(f"are you happy with {description=} (y/n)") | ||
| y = input() | ||
| return name, None |
There was a problem hiding this comment.
Function user_input_name_and_description refactored with the following changes:
- Swap if/else branches (
swap-if-else-branches) - Remove unnecessary else after guard condition (
remove-unnecessary-else) - Replace multiple comparisons of same variable with
inoperator (merge-comparisons) - Remove redundant conditional (
remove-redundant-if) - Inline variable that is only used once (
inline-variable)
| for p in parser: | ||
| genes.append(p) | ||
|
|
||
| genes.extend(iter(parser)) |
There was a problem hiding this comment.
Lines 20-22 refactored with the following changes:
- Replace a for append loop with list extend (
for-append-to-extend) - Simplify generator expression (
simplify-generator)
| result = {} | ||
| type_, data = cogent.block_consolidator(lines) | ||
| result["type"] = type_ | ||
| result = {"type": type_} |
There was a problem hiding this comment.
Function GenbankParser.parse_feature refactored with the following changes:
- Move assignment closer to its usage within a block (
move-assign-in-block) - Replace a conditional string slice with a call to
str.removesuffixorstr.removeprefix, where applicable (use-string-remove-affix) - Merge dictionary assignment with declaration (
merge-dict-assign)
| rows = [] | ||
| correct = 0 | ||
| rows.append(['Gene name (genemark)', 'Annotated Gene Name', 'Start', 'End', 'Equal', 'Raw Location']) | ||
| rows = [ | ||
| [ | ||
| 'Gene name (genemark)', | ||
| 'Annotated Gene Name', | ||
| 'Start', | ||
| 'End', | ||
| 'Equal', | ||
| 'Raw Location', | ||
| ] | ||
| ] |
There was a problem hiding this comment.
Lines 50-52 refactored with the following changes:
- Move assignment closer to its usage within a block (
move-assign-in-block) - Merge append into list declaration (
merge-list-append)
There was a problem hiding this comment.
PR Type: Refactoring
Summary of PR: This PR introduces a series of refactoring changes aimed at improving code readability and conciseness. It includes the use of f-strings for string formatting, list comprehensions, and the walrus operator for inline assignments. The changes are spread across multiple files and functions within the codebase.
General PR suggestions
- Verify that the Python version in the deployment environment supports the walrus operator (Python 3.8+), as it is used in several places in the refactored code.
- Review the changes to ensure that the refactoring does not alter the intended logic, especially in cases where the original code's behavior may have been modified inadvertently.
- Consider the readability of the code, particularly when using newer Python syntax like the walrus operator, which may not be familiar to all developers. Splitting dense lines could improve clarity.
- Ensure that the refactoring does not introduce potential bugs, such as index errors when iterating over lists with steps or when making assumptions about list contents.
- Revisit any changes that may have introduced regressions or altered the behavior of the code, such as the handling of
self.kegg_idin theGenemodel, and correct them to maintain the original functionality.
Your trial expires on December 18, 2023. Please email tim@sourcery.ai to continue using Sourcery ✨
|
|
||
| m = annotated_genes.filter(Q(first_base=first) | Q(last_base=last)) | ||
| if m: | ||
| if m := annotated_genes.filter( |
There was a problem hiding this comment.
suggestion (llm): Using the walrus operator (:=) inside the if statement is a Python 3.8+ feature. Ensure that the deployment environment supports this Python version. If compatibility with earlier versions is required, consider reverting to the previous approach.
| if m := annotated_genes.filter( | |
| m = annotated_genes.filter(Q(first_base=first) | Q(last_base=last)) | |
| if m: |
| ]) | ||
|
|
||
| all_rows.extend(rows) | ||
| all_rows = [['correct', success, 'total', total], *rows] |
There was a problem hiding this comment.
suggestion (llm): While using unpacking to extend lists is concise, it may not be immediately clear to all readers. Consider if the readability could be improved by using the more explicit all_rows.extend(rows) after initializing all_rows with the summary row.
| all_rows = [['correct', success, 'total', total], *rows] | |
| all_rows = [['correct', success, 'total', total]] | |
| all_rows.extend(rows) |
| databases = [] | ||
| i = 0 | ||
| while i < len(db_string): | ||
| for i in range(0, len(db_string), 2): |
There was a problem hiding this comment.
suggestion (llm): The use of a range with a step of 2 is a good way to iterate over pairs in the list. However, ensure that the list always contains an even number of elements to avoid an IndexError on location = db_string[i + 1].
| response = requests.get(f"https://www.orthodb.org/search?{query}{tags}") | ||
| if response.status_code != 200: | ||
| return [] | ||
| return results if (results := response.json().get("data", [])) else [] |
There was a problem hiding this comment.
suggestion (llm): The inline assignment and conditional expression make this line quite dense. Consider splitting it for clarity, especially since the walrus operator is a relatively new addition to Python and may not be familiar to all developers.
| return self.kegg_id | ||
|
|
||
| else: | ||
| if not self.kegg_id: |
There was a problem hiding this comment.
issue (llm): The refactored code has changed the logic. Previously, if self.kegg_id was not set, self.kegg_link() was called before checking self.kegg_id again. Now, if self.kegg_id is not set, it raises a ValueError immediately without attempting to set it via self.kegg_link(). This could lead to unintended behavior.
| if not self.kegg_id: | |
| if not self.kegg_id: | |
| self.kegg_link() | |
| if not self.kegg_id: | |
| raise ValueError("Database reference does not have an associated link in kegg db.") | |
| return self.kegg_id |
Branch
mainrefactored by Sourcery.If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.
See our documentation here.
Run Sourcery locally
Reduce the feedback loop during development by using the Sourcery editor plugin:
Review changes via command line
To manually merge these changes, make sure you're on the
mainbranch, then run:Help us improve this pull request!