fix problems with the new versions #49

Merged
KFilippopolitis merged 1 commit into main from monetdb-toggle
Oct 30, 2025

@KFilippopolitis
Contributor

No description provided.

@KFilippopolitis KFilippopolitis merged commit f8b00c3 into main Oct 30, 2025
3 checks passed
@KFilippopolitis KFilippopolitis deleted the monetdb-toggle branch October 30, 2025 12:45
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +399 to +410
def _validate_datasets(self, csv_path, cdes, copy_from_file):
    csv_columns = pd.read_csv(csv_path, nrows=0).columns.tolist()
    if DATASET_COLUMN_NAME not in csv_columns:
        raise InvalidDatasetError(
            "The 'dataset' column is required to exist in the csv."
        )
    metadata_table = MetadataTable.from_db(data_model, self.sqlite_db)
    cdes = metadata_table.table

    if self.monetdb is None:
        LOGGER.debug("MonetDB disabled – skipping primary data table creation.")
        df = pd.read_csv(csv_path, usecols=["dataset"])
        unique_datasets = df["dataset"].unique().tolist()
        return unique_datasets

P1: Skipping dataset schema validation when MonetDB is disabled

When ValidateDataset._validate_datasets sees that self.monetdb is None, it now returns immediately after reading only the dataset column. This short-circuit means that all other validations (type checks, min/max boundaries, and enumeration validations other than the dataset name) never run when the CLI is invoked with --no-monetdb. Previously the same method always invoked validate_csv/validate_csv_with_volume, so invalid rows were rejected before import. After this change, any CSV with wrong types or values is accepted silently in SQLite-only mode, allowing bad data into the system. The MonetDB dependency is only needed for the COPY-FROM-FILE branch; the pure Pandas validation should still run without MonetDB. Consider removing the early return and running the same validation logic regardless of MonetDB availability.

Useful? React with 👍 / 👎.
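
For illustration only, here is a minimal sketch of the shape the review suggests: the schema checks run unconditionally and only the MonetDB-specific step is gated. It is not the project's code. It uses the standard-library csv module instead of pandas to stay dependency-free, and validate_rows, validate_datasets, EXAMPLE_CDES, and the rule format are hypothetical stand-ins for validate_csv and the CDE metadata, whose real signatures are not shown in this PR.

```python
import csv
import io

DATASET_COLUMN_NAME = "dataset"

# Hypothetical rules standing in for the CDE metadata; the real format may differ.
EXAMPLE_CDES = {"age": {"type": int, "min": 0, "max": 120}}


class InvalidDatasetError(Exception):
    """Stand-in for the project's error type."""


def validate_rows(rows, cdes):
    """Hypothetical schema check: type and min/max bounds per column."""
    for line_no, row in enumerate(rows, start=2):  # the header is line 1
        for column, rule in cdes.items():
            raw = row.get(column)
            if raw in (None, ""):
                continue  # missing values are not checked in this sketch
            try:
                value = rule["type"](raw)
            except ValueError:
                raise InvalidDatasetError(
                    f"line {line_no}: {column}={raw!r} has the wrong type"
                ) from None
            if not rule["min"] <= value <= rule["max"]:
                raise InvalidDatasetError(
                    f"line {line_no}: {column}={value} is out of bounds"
                )


def validate_datasets(csv_file, cdes, monetdb=None):
    """Validate every row, then return the sorted unique dataset names."""
    reader = csv.DictReader(csv_file)
    if DATASET_COLUMN_NAME not in (reader.fieldnames or []):
        raise InvalidDatasetError(
            "The 'dataset' column is required to exist in the csv."
        )
    rows = list(reader)

    # Runs whether or not MonetDB is available: no early return before this.
    validate_rows(rows, cdes)

    if monetdb is not None:
        # Assumption: only MonetDB-specific work (e.g. COPY FROM FILE) goes here.
        pass

    return sorted({row[DATASET_COLUMN_NAME] for row in rows})


good = io.StringIO("dataset,age\nds1,30\nds2,41\n")
print(validate_datasets(good, EXAMPLE_CDES, monetdb=None))  # ['ds1', 'ds2']
```

With this structure, a CSV containing a non-numeric or out-of-range age raises InvalidDatasetError even when monetdb is None, which is the behaviour the review asks to preserve in SQLite-only mode.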
