Sanitize and refactor repeated utility functions #243

FedericoTartarini merged 9 commits into CenterForTheBuiltEnvironment:development

Conversation
…ring in template_graphs.py to reduce duplication
…ult value lists, and data range calculations
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI. You can disable this status message in the CodeRabbit configuration.

Walkthrough: Trimmed development dependencies and consolidated repeated utility logic.

Changes
Sequence Diagram(s):

```mermaid
sequenceDiagram
    autonumber
    participant UI as UI / Pages
    participant Extract as extract_df.py
    participant Utils as utils.py
    participant Template as template_graphs.py
    participant Plot as plotting modules
    UI->>Extract: create_df / prepare data
    activate Extract
    Extract->>Extract: expand_to_hours(...)
    Extract->>Extract: utci_calc(...) / add_utci_variants(...)
    Extract->>Extract: add_utci_categories(...)
    Extract->>Extract: temperature(...) for adaptive cols
    deactivate Extract
    UI->>Template: request plot data (filter by month/hour)
    activate Template
    Template->>Template: time_filtering(df, start, end, time_col, target_col)
    Template->>Utils: get_max_min_value(series)
    deactivate Template
    Template->>Plot: render (violin/heatmap/profile) with ranges
    activate Plot
    Plot->>Utils: get_max_min_value(series) (non-global)
    deactivate Plot
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pages/natural_ventilation.py (1)

347-357: Fix the empty-data check column and a typo in the user-facing alert. The check should use the plotted variable column, not MONTH. Also fix the "thedew-point" spacing.

```diff
-    if df.dropna(subset=[ColNames.MONTH]).shape[0] == 0:
+    if df.dropna(subset=[var]).shape[0] == 0:
         return (
             dbc.Alert(
                 "Natural ventilation is not available in this location under these"
                 " conditions. Please either select a different outdoor dry-bulb air"
                 " temperature range, change the month and hour filter, or increase"
-                " thedew-point temperature.",
+                " the dew-point temperature.",
                 color="danger",
                 style={"text-align": "center", "marginTop": "2rem"},
             ),
         )
```

pages/lib/charts_data_explorer.py (1)
24-31: Avoid SettingWithCopy; use .loc to mutate safely. Chained assignment can silently fail. Consolidate the mask and use .loc.

```diff
-    if data_filter:
-        if min_val <= max_val:
-            mask = (df[filter_var] < min_val) | (df[filter_var] > max_val)
-            df[var][mask] = None
-        else:
-            mask = (df[filter_var] >= max_val) & (df[filter_var] <= min_val)
-            df[var][mask] = None
+    if data_filter:
+        if min_val <= max_val:
+            mask = (df[filter_var] < min_val) | (df[filter_var] > max_val)
+        else:
+            mask = (df[filter_var] >= max_val) & (df[filter_var] <= min_val)
+        df.loc[mask, var] = None
```
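A minimal runnable sketch of the suggested pattern (toy frame; `filter_var` and `var` here are placeholder column names, not the app's real ones):

```python
import pandas as pd

df = pd.DataFrame({"filter_var": [1, 5, 9], "var": [10.0, 20.0, 30.0]})
min_val, max_val = 2, 8

# Build the mask once, then write through .loc so pandas mutates the
# original frame; chained assignment (df[var][mask] = None) may write
# to a temporary copy and be silently lost.
mask = (df["filter_var"] < min_val) | (df["filter_var"] > max_val)
df.loc[mask, "var"] = None

print(df["var"].tolist())  # [nan, 20.0, nan]
```

Rows outside the [min_val, max_val] band are nulled in place; the in-band row keeps its value.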
🧹 Nitpick comments (11)
pages/lib/utils.py (1)
295-301: Harden helpers against invalid input (base<=0) and document behavior. Add a simple guard and a short docstring so misuse fails fast.

```diff
-def get_data_max(series, base=5):
-    return base * math.ceil(series.max() / base)
+def get_data_max(series, base=5):
+    """Round the series max up to the nearest multiple of `base` (inclusive)."""
+    if base <= 0:
+        raise ValueError("base must be > 0")
+    return base * math.ceil(series.max() / base)

-def get_data_min(series, base=5):
-    return base * math.floor(series.min() / base)
+def get_data_min(series, base=5):
+    """Round the series min down to the nearest multiple of `base` (inclusive)."""
+    if base <= 0:
+        raise ValueError("base must be > 0")
+    return base * math.floor(series.min() / base)
```

pages/lib/template_graphs.py (1)
865-870: Optional: assign the return from time_filtering for clarity. The function mutates in place, but reassigning improves readability and future refactors.

```diff
-    # Month filter
-    time_filtering(df, start_month, end_month, ColNames.MONTH, var)
-    # Hour filter
-    time_filtering(df, start_hour, end_hour, ColNames.HOUR, var)
+    # Month filter
+    df = time_filtering(df, start_month, end_month, ColNames.MONTH, var)
+    # Hour filter
+    df = time_filtering(df, start_hour, end_hour, ColNames.HOUR, var)
```

pages/lib/extract_df.py (5)
80-83: UTCI bins/labels: confirm scheme and edge handling.
- Ensure these bins match the intended UTCI category thresholds and cover the full numeric range including NaNs. Consider defining as numpy arrays for faster take/lookups later.
Would you like me to add a short docstring table mapping ranges to labels and unit notes (°C, % RH, m/s)?
85-88: utci_calc: consider explicit input validation and limit_inputs behavior.

- Validate columns exist; otherwise raise with a clear message.
- Decide whether to pass limit_inputs=False to pythermalcomfort.models.utci to avoid implicit clipping.

Proposed guard:

```diff
 def utci_calc(df: pd.DataFrame, t_air_col, t_rad_col, wind_col, rh_col=ColNames.RH):
     """Call utci() using values from df columns."""
+    missing = [c for c in (t_air_col, t_rad_col, wind_col, rh_col) if c not in df]
+    if missing:
+        raise KeyError(f"UTCI calc missing columns: {missing}")
     return utci(df[t_air_col], df[t_rad_col], df[wind_col], df[rh_col])  # consider: limit_inputs=False
```
90-119: add_utci_variants: solid consolidation; add type hints/docs and preflight check.
- Nice centralization. Add a preflight check that all recipe columns exist before assignment to fail fast.
296-306: check_value default length can desync with df length; derive dynamically.
- Using a hardcoded 8760 risks mismatch if the data slice ever changes. Generate arrays sized from epw_df instead.
Apply this diff:

```diff
-    sharp = check_value(45)
+    sharp = [45] * len(epw_df)
     sol_radiation_dir = epw_df[ColNames.DIR_NOR_RAD]
-    sol_transmittance = check_value(1)  # CHECK VALUE
-    f_svv = check_value(1)  # CHECK VALUE
-    f_bes = check_value(1)  # CHECK VALUE
-    asw = check_value(0.7)  # CHECK VALUE
-    posture = check_value("standing")
-    floor_reflectance = check_value(0.6)  # EXPOSE AS A VARIABLE?
+    sol_transmittance = [1] * len(epw_df)  # CHECK VALUE
+    f_svv = [1] * len(epw_df)  # CHECK VALUE
+    f_bes = [1] * len(epw_df)  # CHECK VALUE
+    asw = [0.7] * len(epw_df)  # CHECK VALUE
+    posture = ["standing"] * len(epw_df)
+    floor_reflectance = [0.6] * len(epw_df)  # Consider exposing via config
```

Alternatively, add a helper check_value_like(df, value).
445-447: check_value: consider numpy and naming.
- Return numpy arrays for downstream vector ops and rename to reflect behavior (e.g., fill_hours).
```diff
 def check_value(value, hours=8760):
-    return [value] * hours
+    return np.full(hours, value)
```
13-17: Imports: avoid coupling UI to data-extract module for a single helper.
- Importing temperature from extract_df pulls in heavy deps. Consider moving unit helpers to a small, standalone utils module.
364-365: C→F conversion in loop works; minor performance/readability tweak.
- Using the helper per element is fine; alternatively, vectorize once: dbt_list_convert = [(x*1.8+32) for x in dbt_list_convert].
Additionally, if coloring by UTCI categories, remove .2f from hover for that var or ensure source provides numeric values (see extract_df fix).
Pipfile (2)
7-25: Trim runtime deps; move tooling to dev; drop deprecated Dash extras.
- pre-commit, black, bump2version, ruff, pytest, jsbeautifier, editorconfig, cleanpy should be under [dev-packages] only.
- dash-core-components, dash-html-components, dash-table are bundled in dash≥2 and can be removed.
- Prefer tenacity over retrying; having both is redundant.
Proposed edits:

```diff
 [packages]
-pre-commit = "*"
-ansi2html = "==1.9.2"
-black = "==25.1.0"
-bump2version = "==1.0.1"
-...
-dash-core-components = "==2.0.0"
-dash-html-components = "==2.0.0"
-dash-table = "==5.0.0"
-...
-jsbeautifier = "==1.15.4"
-...
-pytest = "==8.4.1"
-...
-ruff = "==0.12.7"
-...
-retrying = "==1.4.1"
+dash = "==2.15.0"
+# keep only runtime libs here (dash-bootstrap-components, dash-extensions, numpy,
+# pandas, plotly, pvlib, pythermalcomfort, requests, scipy, flask, flask-caching, etc.)

 [dev-packages]
-cleanpy = "*"
-pytest = "*"
-bump2version = "*"
-black = "*"
-ruff = "*"
-pre-commit = "*"
+pre-commit = "*"
+black = "*"
+ruff = "*"
+pytest = "*"
+bump2version = "*"
+cleanpy = "*"
+jsbeautifier = "*"
+ansi2html = "*"
```

Also applies to: 27-67
55-56: Pinning tzdata/pytz is fine; ensure single time lib usage.
- Prefer dateutil/tzdata over pytz in new code; avoid mixed tz libs to reduce surprises.
Also applies to: 63-64
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)

- Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (8)

- Pipfile (1 hunks)
- pages/lib/charts_data_explorer.py (3 hunks)
- pages/lib/charts_sun.py (3 hunks)
- pages/lib/extract_df.py (5 hunks)
- pages/lib/template_graphs.py (8 hunks)
- pages/lib/utils.py (2 hunks)
- pages/natural_ventilation.py (2 hunks)
- pages/psy-chart.py (3 hunks)
🔇 Additional comments (13)
pages/lib/utils.py (1)

4-4: LGTM on import. Importing math is appropriate for the new helpers.

pages/natural_ventilation.py (1)

18-21: LGTM on the centralized range helpers import. Nice consolidation; reduces duplication.

pages/lib/charts_sun.py (2)

2-2: LGTM on math import trimming. Only importing what’s needed keeps things tidy.

9-12: LGTM on using shared range helpers. Consistent with the refactor goals.

pages/lib/charts_data_explorer.py (3)

4-7: LGTM on the shared helpers import. Matches the PR objective.

47-49: LGTM on localized range computation. custom_heatmap already checks for empty data (Line 32), so this is safe.

125-127: LGTM on the three_var_graph range change. Consistent with the refactor.
pages/lib/template_graphs.py (1)

7-10: LGTM on the shared helpers import. Keeps range logic centralized.
pages/lib/extract_df.py (2)
338-341: Good: variant + category pipeline clearly sequenced.
- This improves readability and reuse.
429-433: Gating for adaptive temperature conversion is already handled. convert_data is only invoked inside the if si_ip_input == UnitSystem.IP branch in pages/select.py (lines 227–229), so adaptive columns aren’t converted for SI. No extra guard needed. (Likely an incorrect or invalid review comment.)
pages/psy-chart.py (1)
327-329: LGTM: centralized range rounding via utils.
- Consistent with other modules.
Pipfile (2)
27-38: Compatibility Verified: Flask 2.3.3 with Werkzeug 3.0.6 — pip’s resolver and runtime import succeed; no changes required.
37-43: Numba & llvmlite wheels confirmed for Python 3.11 on all major platforms
Both numba==0.61.2 and llvmlite==0.44.0 publish pre-built wheels for CPython 3.11 on Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows x86_64, and pip download succeeded on Linux (Python 3.11.2) — no source builds required. (pypi.org)
pages/lib/template_graphs.py
Outdated
```python
def time_filtering(df, start_time, end_time, time_col, target_col):
    if start_time <= end_time:
        mask = (df[time_col] < start_time) | (df[time_col] > end_time)
    else:
        mask = (df[time_col] >= end_time) & (df[time_col] <= start_time)
    df.loc[mask, target_col] = None
    return df
```
Wrap-around time filter masks endpoints; use strict inequalities.
For ranges like 20→6, endpoints (20 and 6) must be kept. Current mask excludes them. Switch to strict inner-band masking.
```diff
 def time_filtering(df, start_time, end_time, time_col, target_col):
     if start_time <= end_time:
         mask = (df[time_col] < start_time) | (df[time_col] > end_time)
     else:
-        mask = (df[time_col] >= end_time) & (df[time_col] <= start_time)
+        # keep values >= start_time or <= end_time; mask the inner band exclusively
+        mask = (df[time_col] > end_time) & (df[time_col] < start_time)
     df.loc[mask, target_col] = None
     return df
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def time_filtering(df, start_time, end_time, time_col, target_col):
    if start_time <= end_time:
        # non-wrap-around: mask values strictly before start_time or strictly after end_time
        mask = (df[time_col] < start_time) | (df[time_col] > end_time)
    else:
        # wrap-around: keep values >= start_time or <= end_time; mask the inner band exclusively
        mask = (df[time_col] > end_time) & (df[time_col] < start_time)
    df.loc[mask, target_col] = None
    return df
```
🤖 Prompt for AI Agents
In pages/lib/template_graphs.py around lines 849 to 856, the wrap-around branch
currently masks the interval inclusively ((df[time_col] >= end_time) &
(df[time_col] <= start_time)), which removes the endpoints for ranges like 20→6;
change the wrap-around mask to use strict inequalities so endpoints are
preserved by using (df[time_col] > end_time) & (df[time_col] < start_time) while
leaving the non-wrap branch as-is.
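A small self-contained check of the endpoint behavior the comment describes (toy hourly frame, not the project's data):

```python
import pandas as pd

def time_filtering(df, start_time, end_time, time_col, target_col):
    if start_time <= end_time:
        mask = (df[time_col] < start_time) | (df[time_col] > end_time)
    else:
        # Wrap-around: strict inequalities so the endpoints themselves survive.
        mask = (df[time_col] > end_time) & (df[time_col] < start_time)
    df.loc[mask, target_col] = None
    return df

df = pd.DataFrame({"hour": [5, 6, 12, 20, 21], "t": [1.0, 2.0, 3.0, 4.0, 5.0]})
time_filtering(df, 20, 6, "hour", "t")  # keep 20:00 through 06:00

# Only hour 12 (strictly inside the 6..20 band) is masked;
# the endpoints 6 and 20 keep their values.
print(df["t"].tolist())  # [1.0, 2.0, nan, 4.0, 5.0]
```

With the original inclusive mask, hours 6 and 20 would also have been nulled.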
FedericoTartarini
left a comment
Please carefully review my comments, address them, then mark the comment as solved.
why did you delete this documentation?
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 6
♻️ Duplicate comments (4)
pages/lib/template_graphs.py (3)
865-889: Add type hints to the new helper for consistency with project guidance. A previous review asked for annotations; add them here too.

```diff
-def filter_df_by_month_and_hour(
-    df, time_filter, month, hour, invert_month, invert_hour, var
-):
+def filter_df_by_month_and_hour(
+    df: pd.DataFrame,
+    time_filter: bool,
+    month: list[int] | tuple[int, int],
+    hour: list[int] | tuple[int, int],
+    invert_month: list[str] | list[int] | list[object],
+    invert_hour: list[str] | list[int] | list[object],
+    var: str,
+) -> pd.DataFrame:
```
841-860: Preserve endpoints for wrap-around ranges; tighten inequalities. The current wrap-around mask excludes endpoints (e.g., 20 and 6 for 20→6). Use a strict inner-band mask.

```diff
     if start_time <= end_time:
         mask = (df[time_col] < start_time) | (df[time_col] > end_time)
     else:
-        mask = (df[time_col] >= end_time) & (df[time_col] <= start_time)
+        # keep values >= start_time or <= end_time; mask the inner band exclusively
+        mask = (df[time_col] > end_time) & (df[time_col] < start_time)
```
428-429: Same guard for the heatmap z-range.

```diff
-    data_max, data_min = get_max_min_value(df[var])
-    range_z = [data_min, data_max]
+    series = pd.to_numeric(df[var], errors="coerce").dropna()
+    if series.empty:
+        range_z = var_range
+    else:
+        data_max, data_min = get_max_min_value(series)
+        if data_min is not None and data_max is not None:
+            if data_min == data_max:
+                pad = 0.5 if data_min == 0 else abs(data_min) * 0.05
+                data_min, data_max = data_min - pad, data_max + pad
+        range_z = [data_min, data_max]
```

pages/lib/extract_df.py (1)
121-132: Bug: pd.cut creates Categorical dtype; convert to numeric for Plotly. The current pd.cut returns categorical values, which break numeric color scales/formatting downstream (psy-chart). Map bins to numeric labels or midpoints.

Apply:

```diff
-    for src_col, dst_col in mapping.items():
-        df[dst_col] = pd.cut(x=df[src_col], bins=UTCI_BINS, labels=UTCI_LABELS)
+    for src_col, dst_col in mapping.items():
+        # numeric bin codes; keep NaN for out-of-range values
+        codes = pd.cut(x=df[src_col], bins=UTCI_BINS, labels=False, include_lowest=True)
+        df[dst_col] = np.where(
+            pd.isna(codes),
+            np.nan,
+            np.take(UTCI_LABELS, codes.fillna(0).astype(int)),
+        ).astype("float32")
```

(The fillna(0) before astype(int) avoids a cast error when codes contain NaN; np.where then restores NaN for those rows.)
🧹 Nitpick comments (5)
pages/lib/extract_df.py (5)
85-88: Clarify units in the docstring. Explicitly state expected units: t_air/tr in °C, wind in m/s, RH in %. This prevents silent misuse.

Apply:

```diff
 def utci_calc(df: pd.DataFrame, t_air_col, t_rad_col, wind_col, rh_col=ColNames.RH):
-    """Call utci() using values from df columns."""
+    """Call utci() using values from df columns.
+
+    Units: t_air/tr [°C], wind [m/s], RH [%].
+    """
     return utci(df[t_air_col], df[t_rad_col], df[wind_col], df[rh_col])
```
298-306: Hard-coded “CHECK VALUE” defaults should be configurable. These constants materially affect MRT/UTCI. Promote them to module-level constants with comments/units and/or plumb them through as function args so callers can override. Also consider leap years (8784 hours) if/when the time range changes.

Example:

```diff
+SHARP_DEG = 45.0  # degrees
+SOL_TRANSMITTANCE = 1.0  # unitless
+F_SVV = 1.0  # unitless
+F_BES = 1.0  # unitless
+ASW = 0.7  # absorptivity, unitless
+POSTURE = "standing"
+FLOOR_REFLECTANCE = 0.6  # unitless
 ...
-    sharp = expand_to_hours(45)
+    sharp = expand_to_hours(SHARP_DEG)
 ...
-    sol_transmittance = expand_to_hours(1)  # CHECK VALUE
-    f_svv = expand_to_hours(1)  # CHECK VALUE
-    f_bes = expand_to_hours(1)  # CHECK VALUE
-    asw = expand_to_hours(0.7)  # CHECK VALUE
-    posture = expand_to_hours("standing")
-    floor_reflectance = expand_to_hours(0.6)  # EXPOSE AS A VARIABLE?
+    sol_transmittance = expand_to_hours(SOL_TRANSMITTANCE)
+    f_svv = expand_to_hours(F_SVV)
+    f_bes = expand_to_hours(F_BES)
+    asw = expand_to_hours(ASW)
+    posture = expand_to_hours(POSTURE)
+    floor_reflectance = expand_to_hours(FLOOR_REFLECTANCE)
```
429-433: Risk of double temperature conversion + function naming. You now always convert these five adaptive columns to °F before applying mapping_json conversions. If mapping_json also specifies a temperature conversion for any of these keys, they’ll be converted twice.

- Prefer driving all conversions via mapping_json (single source of truth), or gate these calls behind a flag.
- Also, per prior feedback, rename the generic temperature() to a more explicit convert_t_to_f() and keep a backward-compat alias.

Patch calls:

```diff
-    temperature(df, ColNames.ADAPTIVE_COMFORT)
-    temperature(df, ColNames.ADAPTIVE_CMF_80_LOW)
-    temperature(df, ColNames.ADAPTIVE_CMF_80_UP)
-    temperature(df, ColNames.ADAPTIVE_CMF_90_LOW)
-    temperature(df, ColNames.ADAPTIVE_CMF_90_UP)
+    convert_t_to_f(df, ColNames.ADAPTIVE_COMFORT)
+    convert_t_to_f(df, ColNames.ADAPTIVE_CMF_80_LOW)
+    convert_t_to_f(df, ColNames.ADAPTIVE_CMF_80_UP)
+    convert_t_to_f(df, ColNames.ADAPTIVE_CMF_90_LOW)
+    convert_t_to_f(df, ColNames.ADAPTIVE_CMF_90_UP)
```

And add (outside this hunk):

```python
def convert_t_to_f(df: pd.DataFrame, name: str) -> None:
    """Convert a temperature column from °C to °F in place."""
    df[name] = df[name].astype(float) * 1.8 + 32


# Back-compat alias if mapping_json still references "temperature"
temperature = convert_t_to_f
```
435-442: Avoid the double json.loads and speed up the loop. Minor perf/readability improvement.

Apply:

```diff
     mapping_dict = json.loads(mapping_json)
-    for key in json.loads(mapping_json):
+    for key in mapping_dict:
         if ColNames.CONVERSION_FUNCTION in mapping_dict[key]:
             conversion_function_name = mapping_dict[key][ColNames.CONVERSION_FUNCTION]
             if conversion_function_name is not None:
                 conversion_function = globals()[conversion_function_name]
                 conversion_function(df, key)
```
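On the same loop: the globals() lookup works, but an explicit registry is easier to audit. A toy sketch of the idea (to_fahrenheit/CONVERTERS are illustrative names, not the app's):

```python
import json

def to_fahrenheit(record, key):
    # °C → °F, in place
    record[key] = record[key] * 1.8 + 32

# Whitelist of converters reachable from the mapping JSON; a typo in the
# JSON now fails fast with a clear KeyError instead of a lookup through
# module globals.
CONVERTERS = {"to_fahrenheit": to_fahrenheit}

mapping_json = json.dumps({"temp": {"conversion_function": "to_fahrenheit"}})
record = {"temp": 25.0}

mapping_dict = json.loads(mapping_json)
for key, spec in mapping_dict.items():
    name = spec.get("conversion_function")
    if name is not None:
        CONVERTERS[name](record, key)

print(record["temp"])  # 77.0
```

The registry also limits what the mapping JSON can invoke, which matters if the mapping ever comes from outside the codebase.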
445-456: Return numpy arrays for better interoperability, and optionally auto-size. Lists work, but numpy arrays are faster and match vectorized code better. Optionally allow hours=None to auto-size from a reference array (e.g., df length).

Apply:

```diff
 def expand_to_hours(value, hours=8760):
-    """Return a list with the input value repeated for a given number of hours.
+    """Return a numpy array with the input value repeated for `hours`.
@@
     Returns:
-        A list containing the value repeated `hours` times.
+        A numpy array containing the value repeated `hours` times.
     """
-    return [value] * hours
+    return np.full(shape=hours, fill_value=value)
```
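A quick sketch of the np.full variant (same semantics as the list version, just vectorized):

```python
import numpy as np

def expand_to_hours(value, hours=8760):
    """Repeat a scalar once per hour of a (non-leap) year as a numpy array."""
    return np.full(hours, value)

hourly = expand_to_hours(0.7)
print(hourly.shape)  # (8760,)

# np.full also handles non-numeric fill values (the dtype becomes a string type).
postures = expand_to_hours("standing", hours=24)
print(postures[0])  # standing
```

Downstream vector ops (e.g., passing the array straight into pythermalcomfort) then avoid a list→array conversion per call.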
📜 Review details
⛔ Files ignored due to path filters (1)

- Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (9)

- Pipfile (0 hunks)
- docs/contributing/contributing.md (0 hunks)
- pages/lib/charts_data_explorer.py (3 hunks)
- pages/lib/charts_sun.py (3 hunks)
- pages/lib/extract_df.py (5 hunks)
- pages/lib/template_graphs.py (7 hunks)
- pages/lib/utils.py (2 hunks)
- pages/natural_ventilation.py (2 hunks)
- pages/psy-chart.py (3 hunks)
💤 Files with no reviewable changes (2)
- docs/contributing/contributing.md
- Pipfile
🚧 Files skipped from review as they are similar to previous changes (3)
- pages/lib/charts_sun.py
- pages/psy-chart.py
- pages/lib/charts_data_explorer.py
🔇 Additional comments (6)
pages/lib/utils.py (1)
4-4: Import is appropriate. math is required by the new rounding helper.
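A standalone sanity check of the rounding helpers this PR centralizes, with the guard suggested in the nitpicks (plain max()/min() stand in for the pandas Series methods):

```python
import math

def get_data_max(values, base=5):
    """Round the maximum up to the nearest multiple of `base`."""
    if base <= 0:
        raise ValueError("base must be > 0")
    return base * math.ceil(max(values) / base)

def get_data_min(values, base=5):
    """Round the minimum down to the nearest multiple of `base`."""
    if base <= 0:
        raise ValueError("base must be > 0")
    return base * math.floor(min(values) / base)

# 21.1 rounds up to 25; 3.2 rounds down to 0 — a tidy axis range [0, 25].
print(get_data_max([3.2, 17.8, 21.1]))  # 25
print(get_data_min([3.2, 17.8, 21.1]))  # 0
```

This is the behavior the plotting modules rely on for "nice" axis bounds.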
pages/natural_ventilation.py (1)
18-18: Good centralization. Switching to the shared get_max_min_value keeps range logic consistent.

pages/lib/template_graphs.py (1)

7-7: OK to import the shared helper.

pages/lib/extract_df.py (3)

80-83: Validate UTCI bin edges and label semantics against UI/color mapping. Bins and labels look consistent (11 edges → 10 labels). Ensure these stay in sync with any Plotly color scales and legends elsewhere, or category thresholds will drift.

90-119: LGTM: clean, DRY UTCI variants generation. The recipe-driven loop is clear and extensible.

338-341: LGTM: centralizing the UTCI variant & category pipeline. Nice reduction of duplication and a clearer flow.
Please revert the changes to the Pipfile.lock and I will then accept the pull request @Sallie00
78daff4 into CenterForTheBuiltEnvironment:development
Refactored repeated calculations
DRY principle improvements
Summary by CodeRabbit

- New Features
- Bug Fixes
- Refactor
- Documentation
- Chores