ctk info table-traffic #528
Conversation
Walkthrough

Adds a new CLI command `table-traffic` that invokes a new job analysis pipeline. Introduces `cratedb_toolkit/info/job.py` with SQL classification, job log reading, operation modeling, and rendering. Adjusts `get_table_names` to return fully qualified table names. The CLI wires context to a `DatabaseCluster` and `TableTraffic` to render results.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as CLI (table-traffic)
    participant DC as DatabaseCluster
    participant TA as TableTraffic
    participant DB as CrateDB
    participant Class as SqlStatementClassifier
    participant Polars as Polars/DataFrame
    User->>CLI: crate info table-traffic
    CLI->>DC: init(address from ctx.meta)
    CLI->>TA: init(adapter=DC.adapter)
    CLI->>TA: render()
    activate TA
    TA->>DB: SELECT ... FROM sys.jobs_log WHERE ... (time window, non-system)
    DB-->>TA: jobs rows (stmt, timestamps)
    TA->>TA: read_jobs(map rows -> parse_expression)
    loop per stmt
        TA->>Class: classify(stmt)
        Class-->>TA: operation, table_names
    end
    TA->>Polars: build/analyze Operations
    Polars-->>TA: aggregated results
    TA-->>CLI: output rendered
    deactivate TA
    CLI-->>User: display table usage/traffic
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 8
🧹 Nitpick comments (7)

cratedb_toolkit/info/job.py (7)

`41-56`: Parameterize SQL to avoid SQL injection vectors and improve readability.

Even though `begin`/`end` are computed locally, binding parameters is safer and consistent with the rest of the codebase.

Apply this diff:

```diff
     def read_jobs_database(self, begin: int = 0, end: int = 0):
         logger.info("Reading sys.jobs_log")
         now = int(time.time() * 1000)
         end = end or now
         begin = begin or now - 600 * 60 * 1000
-        stmt = (
-            f"SELECT "
-            f"started, ended, classification, stmt, username, node "
-            f"FROM sys.jobs_log "
-            f"WHERE "
-            f"stmt NOT LIKE '%sys.%' AND "
-            f"stmt NOT LIKE '%information_schema.%' "
-            f"AND ended BETWEEN {begin} AND {end} "
-            f"ORDER BY ended ASC"
-        )
-        return self.adapter.run_sql(stmt, records=True)
+        stmt = (
+            "SELECT started, ended, classification, stmt, username, node "
+            "FROM sys.jobs_log "
+            "WHERE stmt NOT LIKE :sys AND stmt NOT LIKE :info "
+            "AND ended BETWEEN :begin AND :end "
+            "ORDER BY ended ASC"
+        )
+        params = {"sys": "%sys.%", "info": "%information_schema.%", "begin": begin, "end": end}
+        return self.adapter.run_sql(stmt, parameters=params, records=True)
```
`1-12`: Pick one data model library (attrs or dataclasses) for consistency.

This module mixes `@attr.define` and `@dataclasses.dataclass`. Prefer one approach across the module for maintainability.

If the project favors `attrs`, convert `SqlStatementClassifier` to `@attr.define`. Otherwise, convert `Operation`/`Operations` to `@dataclasses.dataclass`. I can provide a patch if you confirm the preferred style; a sketch of the attrs-only variant follows below.
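A minimal sketch of the attrs-only style, assuming `Operation` carries the fields that appear in the diffs elsewhere in this review (`op`, `stmt`, `tables_symbols`); this illustrates the shape, not the module's actual contents:

```python
from typing import List, Optional

import attr


@attr.define
class Operation:
    op: Optional[str] = None
    stmt: Optional[str] = None
    tables_symbols: List[str] = attr.field(factory=list)


@attr.define
class Operations:
    data: List[Operation] = attr.field(factory=list)
```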
`47-54`: Time window readability.

`600 * 60 * 1000` is "10 hours" but non-obvious. Consider `10 * 60 * 60 * 1000`, or use `datetime.timedelta(hours=10)` to compute milliseconds for clarity. I can send a small patch using `timedelta` if you prefer.
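A minimal sketch of the `timedelta` variant, assuming the defaults from `read_jobs_database` shown above:

```python
import time
from datetime import timedelta

# Self-documenting equivalent of `600 * 60 * 1000` (milliseconds in 10 hours).
WINDOW_MS = int(timedelta(hours=10).total_seconds() * 1000)

now = int(time.time() * 1000)
begin = now - WINDOW_MS
```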
`27-34`: Unit-test coverage for aggregation and rendering paths.

Add tests that:

- verify a single statement with two tables yields two counts after explode;
- verify duplicates within the same statement are counted once (via your new dedup logic);
- ensure `render()` returns a list of dicts suitable for `jd()`.

I can scaffold `tests/info/test_table_traffic.py` with fixtures for `sys.jobs_log` rows and mocked adapter responses; a sketch follows below.

Also applies to: 83-88
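A hypothetical scaffold, assuming the refactored `render()` that returns a list of dicts (proposed later in this review); the fixture shapes mirror the `sys.jobs_log` columns queried by `read_jobs_database`:

```python
# Hypothetical tests/info/test_table_traffic.py; names and shapes are
# assumptions based on the code discussed in this review.
from unittest import mock

import pytest

from cratedb_toolkit.info.job import TableTraffic


@pytest.fixture
def jobs_log_rows():
    # One statement touching two tables: should yield one count per table.
    return [
        {
            "started": 0,
            "ended": 1,
            "classification": None,
            "stmt": "SELECT * FROM doc.t1 JOIN doc.t2 ON true",
            "username": "crate",
            "node": "n1",
        }
    ]


def test_render_counts_each_table_once(jobs_log_rows):
    adapter = mock.Mock()
    adapter.run_sql.return_value = jobs_log_rows
    traffic = TableTraffic(adapter=adapter)
    data = traffic.render()
    assert isinstance(data, list)  # suitable for jd()
    counted = {item["tables_symbols"] for item in data}
    assert {"doc.t1", "doc.t2"} <= counted
```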
`30-30`: Remove prints per Ruff T201.

All output should go through the structured return and CLI serialization.

Also applies to: 33-33

`21-21`: Remove commented-out code per Ruff ERA001.

Delete the commented fields/lines to keep the module clean.

Also applies to: 31-31
`151-156`: Leverage FQN change upstream; consider normalization.

`flatten(get_table_names(self.expression))` now yields FQNs. If you anticipate comparing or joining across sources, consider normalizing (e.g., lowercasing schema/table where appropriate) to avoid case/quote mismatches, especially when tables are referenced with quotes. I can provide a helper to canonicalize identifiers if needed; a sketch follows below.
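A minimal sketch of such a canonicalization helper; this is an illustration of the idea, not an API the module currently provides. It assumes FQNs arrive as `schema.table`, optionally with double-quoted parts; quoted identifiers that themselves contain dots would need a real tokenizer.

```python
def canonicalize_fqn(fqn: str) -> str:
    """Normalize a possibly-quoted, schema-qualified name for comparison.

    Unquoted parts fold to lowercase (CrateDB treats unquoted identifiers
    case-insensitively); quoted parts keep their case but drop the quotes.
    """
    parts = []
    for part in fqn.split("."):  # naive split; fine for simple names
        if part.startswith('"') and part.endswith('"'):
            parts.append(part[1:-1])
        else:
            parts.append(part.lower())
    return ".".join(parts)


assert canonicalize_fqn('doc."MyTable"') == "doc.MyTable"
assert canonicalize_fqn("DOC.mytable") == "doc.mytable"
```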
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- cratedb_toolkit/info/cli.py (2 hunks)
- cratedb_toolkit/info/job.py (1 hunks)
- cratedb_toolkit/util/database.py (1 hunks)
🧰 Additional context used

🧬 Code graph analysis (2)

cratedb_toolkit/info/cli.py (3)

- cratedb_toolkit/cluster/core.py (5): info (208-211), DatabaseCluster (623-676), from_options (636-640), adapter (399-405), adapter (484-488)
- cratedb_toolkit/info/job.py (2): TableTraffic (36-87), render (82-87)
- cratedb_toolkit/util/app.py (1): cli (21-50)

cratedb_toolkit/info/job.py (1)

- cratedb_toolkit/util/database.py (4): DatabaseAdapter (43-421), get_table_names (477-488), run_sql (35-36), run_sql (109-137)
🪛 Ruff (0.12.2)

cratedb_toolkit/info/cli.py

95-95: Local variable `scrub` is assigned to but never used. Remove assignment to unused variable `scrub`. (F841)

cratedb_toolkit/info/job.py

21-21: Found commented-out code. Remove commented-out code. (ERA001)

30-30: `print` found. Remove `print`. (T201)

31-31: Found commented-out code. Remove commented-out code. (ERA001)

33-33: `print` found. Remove `print`. (T201)

47-54: Possible SQL injection vector through string-based query construction. (S608)
⏰ Context from checks skipped due to timeout of 90000ms (9). You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms).
- GitHub Check: build-and-publish
- GitHub Check: build-and-test
- GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
- GitHub Check: Generic: Python 3.13 on OS ubuntu-latest
- GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
- GitHub Check: CFR for OS ubuntu-latest
- GitHub Check: CFR for OS windows-latest
- GitHub Check: CFR for OS macos-13
- GitHub Check: CFR for OS macos-latest
🔇 Additional comments (3)

cratedb_toolkit/util/database.py (1)

`478-489`: Ensure safe fallback to bare table names when fully-qualified names might break downstream.

The change from `table.name` to `table.fqn` will now always surface schema-qualified names (including surrounding quotes), which may not be expected by consumers like the Rockset adapter or MLflow/LangChain integrations. To avoid unintended breakage:

- Update the docstring in `cratedb_toolkit/util/database.py` to reflect that the function returns fully-qualified names when available, but will fall back to the plain table name.
- Modify the loop at lines 480-488 in `cratedb_toolkit/util/database.py` to prefer `table.fqn` yet safely fall back to `table.name`.
- Manually review all callsites of `get_table_names(sql)`, notably in `cratedb_toolkit/info/job.py` (populating `tables_symbols`) and `cratedb_toolkit/adapter/rockset/server/api/query.py` (setting `"collections"`), to confirm that passing FQNs (or bare names) behaves as intended in each context.

Proposed diff in `cratedb_toolkit/util/database.py`:

```diff
 def get_table_names(sql: str) -> t.List[t.List[str]]:
-    """
-    Decode table names from SQL statements.
-    """
+    """
+    Decode table names from SQL statements.
+
+    Returns fully-qualified names (schema.table) when available;
+    falls back to the plain table name otherwise.
+    """
     names = []
     statements = sqlparse_cratedb(sql)
     for statement in statements:
         local_names = []
         for table in statement.metadata.tables:
-            local_names.append(table.fqn)
+            # Prefer FQN (with schema); fall back to the bare table name.
+            local_names.append(getattr(table, "fqn", None) or table.name)
         names.append(local_names)
     return names
```

Please verify this change against each downstream consumer to ensure compatibility.
cratedb_toolkit/info/cli.py (1)
9-9
: Import looks fine.No issues with the new import. Keeping imports top-level matches existing style in this module.
cratedb_toolkit/info/job.py (1)
118-134
: Semantics of DQL detection look good.The
is_dql
logic combiningis_select and not is_camouflage
is a sensible baseline.
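For reference, a minimal runnable sketch of that predicate; `is_select` and `is_camouflage` are names taken from this review's description, and the toy class below only illustrates the boolean combination:

```python
import attr


@attr.define
class ClassifierSketch:
    is_select: bool = False
    is_camouflage: bool = False

    @property
    def is_dql(self) -> bool:
        # A SELECT that does not camouflage a write
        # (e.g. SELECT ... INTO) counts as plain DQL.
        return self.is_select and not self.is_camouflage


assert ClassifierSketch(is_select=True).is_dql
assert not ClassifierSketch(is_select=True, is_camouflage=True).is_dql
```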
```python
@make_command(cli, "table-traffic", "Display information about table use.")
@click.pass_context
def table_traffic(ctx: click.Context):
    """
    Display ad hoc job information.
    """
    scrub = ctx.meta.get("scrub", False)
    dc = DatabaseCluster.from_options(ctx.meta["address"])
    traffic = TableTraffic(adapter=dc.adapter)
    traffic.render()
```
🛠️ Refactor suggestion

Wire scrub, return structured data, and emit JSON like other info commands.

- `scrub` is read but unused (Ruff F841).
- `TableTraffic.render()` currently logs and prints in its implementation; the CLI should output structured JSON like the other commands.

Refactor to pass `scrub`, return data, and call `jd()`.

Apply this diff:

```diff
-@make_command(cli, "table-traffic", "Display information about table use.")
+@make_command(cli, "table-traffic", "Display information about table usage.")
 @click.pass_context
 def table_traffic(ctx: click.Context):
     """
-    Display ad hoc job information.
+    Display aggregate table usage derived from sys.jobs_log.
     """
     scrub = ctx.meta.get("scrub", False)
     dc = DatabaseCluster.from_options(ctx.meta["address"])
-    traffic = TableTraffic(adapter=dc.adapter)
-    traffic.render()
+    traffic = TableTraffic(adapter=dc.adapter, scrub=scrub)
+    data = traffic.render()
+    jd(data)
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
@make_command(cli, "table-traffic", "Display information about table usage.")
@click.pass_context
def table_traffic(ctx: click.Context):
    """
    Display aggregate table usage derived from sys.jobs_log.
    """
    scrub = ctx.meta.get("scrub", False)
    dc = DatabaseCluster.from_options(ctx.meta["address"])
    traffic = TableTraffic(adapter=dc.adapter, scrub=scrub)
    data = traffic.render()
    jd(data)
```
🧰 Tools

🪛 Ruff (0.12.2)

95-95: Local variable `scrub` is assigned to but never used. Remove assignment to unused variable `scrub`. (F841)
🤖 Prompt for AI Agents
In cratedb_toolkit/info/cli.py around lines 89-99, the CLI reads scrub but
doesn't use it and calls TableTraffic.render() which prints/logs instead of
returning structured data; update the function to pass scrub into TableTraffic
(e.g., TableTraffic(adapter=dc.adapter, scrub=scrub)), call the TableTraffic
method that returns structured data (assign it to a variable), emit that data
via jd(data) like the other info commands, and return the data so the scrub
variable is actually used and the output is JSON-serializable.
```python
def foo(self):
    fj = [attr.asdict(j) for j in self.data]
    df = pl.from_records(fj)
    print(df)
    #grouped = df.group_by("tables_symbols").agg([pl.sum("tables_symbols"), pl.sum("op")])
    grouped = df.sql("SELECT tables_symbols, COUNT(op) FROM self GROUP BY tables_symbols")
    print(grouped)
```
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion

Replace prints and fragile DataFrame SQL with a deterministic aggregation API.

- `print` in library code is undesirable (Ruff T201).
- Grouping by the list column `tables_symbols` counts per-statement arrays, not per-table usage.
- `df.sql("... FROM self ...")` is brittle; prefer a stable Polars API.

Refactor to explode the list column, then group and count, and return a DataFrame.

Apply this diff:

```diff
-    def foo(self):
-        fj = [attr.asdict(j) for j in self.data]
-        df = pl.from_records(fj)
-        print(df)
-        #grouped = df.group_by("tables_symbols").agg([pl.sum("tables_symbols"), pl.sum("op")])
-        grouped = df.sql("SELECT tables_symbols, COUNT(op) FROM self GROUP BY tables_symbols")
-        print(grouped)
+    def summarize_by_table(self) -> pl.DataFrame:
+        fj = [attr.asdict(j) for j in self.data]
+        if not fj:
+            return pl.DataFrame({"tables_symbols": [], "count": []})
+        df = pl.from_records(fj)
+        # Count statements per table (explode list column first, then group).
+        df = df.explode("tables_symbols")
+        return (
+            df.group_by("tables_symbols")
+            .agg(pl.len().alias("count"))
+            .sort("count", descending=True)
+        )
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def summarize_by_table(self) -> pl.DataFrame:
    fj = [attr.asdict(j) for j in self.data]
    if not fj:
        return pl.DataFrame({"tables_symbols": [], "count": []})
    df = pl.from_records(fj)
    # Count statements per table (explode list column first, then group).
    df = df.explode("tables_symbols")
    return (
        df.group_by("tables_symbols")
        .agg(pl.len().alias("count"))
        .sort("count", descending=True)
    )
```
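To illustrate the explode-then-count semantics on a toy frame (standalone Polars, independent of the module):

```python
import polars as pl

df = pl.DataFrame(
    {
        "op": ["SELECT", "INSERT"],
        "tables_symbols": [["doc.t1", "doc.t2"], ["doc.t1"]],
    }
)

# Exploding the list column yields one row per (statement, table) pair,
# so the subsequent group_by counts statements per table.
summary = (
    df.explode("tables_symbols")
    .group_by("tables_symbols")
    .agg(pl.len().alias("count"))
    .sort("count", descending=True)
)
print(summary)  # doc.t1 -> 2, doc.t2 -> 1
```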
🧰 Tools

🪛 Ruff (0.12.2)

30-30: `print` found. Remove `print`. (T201)

31-31: Found commented-out code. Remove commented-out code. (ERA001)

33-33: `print` found. Remove `print`. (T201)
```python
class TableTraffic:

    def __init__(self, adapter: DatabaseAdapter):
        self.adapter = adapter
```
🛠️ Refactor suggestion

Pass scrub via constructor for consistent PII handling.

Expose `scrub` on `TableTraffic` to allow the CLI to control redaction.

Apply this diff:

```diff
-class TableTraffic:
-
-    def __init__(self, adapter: DatabaseAdapter):
-        self.adapter = adapter
+class TableTraffic:
+
+    def __init__(self, adapter: DatabaseAdapter, scrub: bool = False):
+        self.adapter = adapter
+        self.scrub = scrub
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
class TableTraffic:

    def __init__(self, adapter: DatabaseAdapter, scrub: bool = False):
        self.adapter = adapter
        self.scrub = scrub
```
🤖 Prompt for AI Agents
In cratedb_toolkit/info/job.py around lines 36 to 40, TableTraffic's constructor
must accept and store a scrub parameter so the CLI can control PII redaction;
add a scrub parameter (typed appropriately, e.g., Callable[[str], str] or a
Scrubber type) to __init__, assign it to self.scrub, and update any code that
constructs TableTraffic to pass the desired scrub function (for example the
adapter's scrub or a CLI-provided scrub). Ensure the attribute is used wherever
redaction is performed and update type hints/imports as needed.
```python
def read_jobs(self, jobs) -> List[Operation]:
    result = []
    for job in jobs:
        sql = job["stmt"]
        result.append(self.parse_expression(sql))
    return result
```
🛠️ Refactor suggestion

Redact raw SQL when scrubbing and prepare data for aggregation.

Use `scrub` to avoid leaking query text, and deduplicate tables per statement to count per-table usage once per statement.

Apply this diff:

```diff
     def read_jobs(self, jobs) -> List[Operation]:
         result = []
         for job in jobs:
             sql = job["stmt"]
-            result.append(self.parse_expression(sql))
+            op = self.parse_expression(sql)
+            if self.scrub:
+                op.stmt = None  # redact raw SQL text
+            result.append(op)
         return result
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def read_jobs(self, jobs) -> List[Operation]:
    result = []
    for job in jobs:
        sql = job["stmt"]
        op = self.parse_expression(sql)
        if self.scrub:
            op.stmt = None  # redact raw SQL text
        result.append(op)
    return result
```
🤖 Prompt for AI Agents
In cratedb_toolkit/info/job.py around lines 58 to 64, the code returns raw SQL
and may double-count tables; call scrub(sql) before parsing to avoid leaking
query text (e.g., scrubbed = scrub(sql)) and pass the scrubbed SQL to
parse_expression, then deduplicate any per-operation table list so each table is
counted once per statement (e.g., replace op.tables with a unique list/set)
before appending to the result; ensure the returned Operation objects contain
scrubbed SQL and deduplicated tables for aggregation.
```python
@staticmethod
def parse_expression(sql: str) -> Operation:
    logger.debug(f"Analyzing SQL: {sql}")
    classifier = SqlStatementClassifier(expression=sql)
    if not classifier.operation:
        logger.warning(f"Unable to determine operation: {sql}")
    if not classifier.table_names:
        logger.warning(f"Unable to determine table names: {sql}")
    return Operation(
        op=classifier.operation,
        stmt=sql,
        tables_symbols=classifier.table_names,
    )
```
Harden parsing: default unknown op, deduplicate table names.

Avoid `None` for `op`, and ensure tables per statement are unique to prevent overcounting.

Apply this diff:

```diff
     @staticmethod
     def parse_expression(sql: str) -> Operation:
         logger.debug(f"Analyzing SQL: {sql}")
         classifier = SqlStatementClassifier(expression=sql)
         if not classifier.operation:
             logger.warning(f"Unable to determine operation: {sql}")
         if not classifier.table_names:
             logger.warning(f"Unable to determine table names: {sql}")
-        return Operation(
-            op=classifier.operation,
+        table_names = list(dict.fromkeys(classifier.table_names))
+        return Operation(
+            op=classifier.operation or "UNKNOWN",
             stmt=sql,
-            tables_symbols=classifier.table_names,
+            tables_symbols=table_names,
         )
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
@staticmethod
def parse_expression(sql: str) -> Operation:
    logger.debug(f"Analyzing SQL: {sql}")
    classifier = SqlStatementClassifier(expression=sql)
    if not classifier.operation:
        logger.warning(f"Unable to determine operation: {sql}")
    if not classifier.table_names:
        logger.warning(f"Unable to determine table names: {sql}")
    table_names = list(dict.fromkeys(classifier.table_names))
    return Operation(
        op=classifier.operation or "UNKNOWN",
        stmt=sql,
        tables_symbols=table_names,
    )
```
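As a side note on the dedup idiom: `dict.fromkeys` drops duplicates while preserving first-seen order, unlike a plain `set`:

```python
names = ["doc.t1", "doc.t2", "doc.t1"]
assert list(dict.fromkeys(names)) == ["doc.t1", "doc.t2"]
```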
🤖 Prompt for AI Agents
In cratedb_toolkit/info/job.py around lines 65 to 78, the parse_expression
function currently can return None for the operation and duplicate table names;
ensure you set a default op value (e.g., "UNKNOWN" or Operation.UNKNOWN) when
classifier.operation is falsy, and replace the tables_symbols assignment with a
deduplicated list preserving order (e.g., use dict.fromkeys or an OrderedDict)
so repeated table names are collapsed before constructing the Operation; keep
the existing logging but still return the safe default op and the unique tables
list.
```python
def analyze_jobs(self, ops: Operations):
    ops.foo()

def render(self):
    jobs = self.read_jobs_database()
    logger.info(f"Analyzing {len(jobs)} jobs")
    ops = Operations(self.read_jobs(jobs))
    jobsa = self.analyze_jobs(ops)
    logger.info(f"Result: {jobsa}")
```
🛠️ Refactor suggestion

Make analyze/render return structured results; avoid side-effect-only logging.

Currently `analyze_jobs` returns `None` and `render()` logs that `None`. Return the computed summary and let the CLI serialize it.

Apply this diff:

```diff
-    def analyze_jobs(self, ops: Operations):
-        ops.foo()
+    def analyze_jobs(self, ops: Operations) -> pl.DataFrame:
+        return ops.summarize_by_table()

     def render(self):
         jobs = self.read_jobs_database()
         logger.info(f"Analyzing {len(jobs)} jobs")
-        ops = Operations(self.read_jobs(jobs))
-        jobsa = self.analyze_jobs(ops)
-        logger.info(f"Result: {jobsa}")
+        ops = Operations(self.read_jobs(jobs))
+        summary_df = self.analyze_jobs(ops)
+        return summary_df.to_dicts()
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def analyze_jobs(self, ops: Operations) -> pl.DataFrame:
    return ops.summarize_by_table()

def render(self):
    jobs = self.read_jobs_database()
    logger.info(f"Analyzing {len(jobs)} jobs")
    ops = Operations(self.read_jobs(jobs))
    summary_df = self.analyze_jobs(ops)
    return summary_df.to_dicts()
```
🤖 Prompt for AI Agents
In cratedb_toolkit/info/job.py around lines 79 to 88, analyze_jobs currently
calls ops.foo() and returns None, and render() logs that None; change
analyze_jobs to compute and return a structured summary (e.g., dict or
dataclass) of the operations instead of returning None, and update render() to
call result = self.analyze_jobs(ops), stop printing/logging the raw None, and
return that structured result so the CLI can serialize it; ensure no
side-effect-only logging remains — keep logging minimal or move it to the
caller.
```python
def operation(self) -> str:
    """
    The SQL operation: SELECT, INSERT, UPDATE, DELETE, CREATE, etc.
    """
    parsed = self.parse_sqlparse()
    return parsed[0].get_type().upper()
```
Guard against empty parses in `operation` to prevent IndexError.

`sqlparse.parse("")` returns an empty list; `parsed[0]` would crash.

Apply this diff:

```diff
     @property
     def operation(self) -> str:
         """
         The SQL operation: SELECT, INSERT, UPDATE, DELETE, CREATE, etc.
         """
-        parsed = self.parse_sqlparse()
-        return parsed[0].get_type().upper()
+        parsed = self.parse_sqlparse()
+        if not parsed:
+            return ""
+        return (parsed[0].get_type() or "").upper()
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
@property
def operation(self) -> str:
    """
    The SQL operation: SELECT, INSERT, UPDATE, DELETE, CREATE, etc.
    """
    parsed = self.parse_sqlparse()
    if not parsed:
        return ""
    return (parsed[0].get_type() or "").upper()
```
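A quick standalone check of the behavior the guard protects against (per the review's note that empty input parses to an empty sequence):

```python
import sqlparse

# Empty input yields no statements, so unguarded indexing with [0]
# would raise IndexError.
assert not sqlparse.parse("")

stmt = sqlparse.parse("SELECT 1")[0]
assert stmt.get_type() == "SELECT"
```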
🤖 Prompt for AI Agents
In cratedb_toolkit/info/job.py around lines 143-149, the operation() method
currently assumes sqlparse.parse() returns at least one statement and accessing
parsed[0] can raise IndexError for empty SQL; update the method to check whether
parsed is non-empty before accessing parsed[0] and return an empty string (or a
sensible default) if it is empty, otherwise call get_type().upper() on the first
parsed statement; ensure the guard covers both an empty list and any unexpected
None so no IndexError occurs.
```python
def is_select_into(self) -> bool:
    """
    Use traditional `sqlparse` for catching `SELECT ... INTO ...` statements.

    Examples:

        SELECT * INTO foobar FROM bazqux
        SELECT * FROM bazqux INTO foobar
    """
    # Flatten all tokens (including nested ones) and match on type+value.
    statement = self.parse_sqlparse()[0]
    return any(
        token.ttype is Keyword and token.value.upper() == "INTO"
        for token in statement.flatten()
    )
```
🛠️ Refactor suggestion

Guard `is_select_into` against empty parses.

Prevent a potential `IndexError` when no statements are parsed.

Apply this diff:

```diff
     # Flatten all tokens (including nested ones) and match on type+value.
-    statement = self.parse_sqlparse()[0]
+    parsed = self.parse_sqlparse()
+    if not parsed:
+        return False
+    statement = parsed[0]
     return any(
         token.ttype is Keyword and token.value.upper() == "INTO"
         for token in statement.flatten()
     )
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def is_select_into(self) -> bool:
    """
    Use traditional `sqlparse` for catching `SELECT ... INTO ...` statements.

    Examples:

        SELECT * INTO foobar FROM bazqux
        SELECT * FROM bazqux INTO foobar
    """
    # Flatten all tokens (including nested ones) and match on type+value.
    parsed = self.parse_sqlparse()
    if not parsed:
        return False
    statement = parsed[0]
    return any(
        token.ttype is Keyword and token.value.upper() == "INTO"
        for token in statement.flatten()
    )
```
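And a standalone check of the token-flattening approach itself; `sqlparse` classifies `INTO` as a `Keyword` token, which the guarded method relies on:

```python
import sqlparse
from sqlparse.tokens import Keyword

stmt = sqlparse.parse("SELECT * INTO foobar FROM bazqux")[0]
assert any(
    token.ttype is Keyword and token.value.upper() == "INTO"
    for token in stmt.flatten()
)
```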
🤖 Prompt for AI Agents
In cratedb_toolkit/info/job.py around lines 165 to 177, the is_select_into
method assumes parse_sqlparse() always returns at least one statement and can
raise IndexError; add a guard to handle empty or falsy parses by checking the
result (e.g., assign statements = self.parse_sqlparse(); if not statements:
return False) before accessing statements[0], then proceed with the existing
flatten-and-check logic so the method safely returns False when no statements
are parsed.
About

Find out about table usage frequency.