Skip to content

Commit ff408ff

Browse files
authored
refactor: Remove redundant parameters from SnapshotProducer validation methods (#1853)
## Which issue does this PR close? - Closes #. ## What changes are included in this PR? ## Summary Refactor `SnapshotProducer` validation methods to use internal state instead of requiring redundant parameters. ## Problem I've noticed that while the current **SnapshotProducer** API design already equips SnapshotProducer with all necessary state, the current invocations still redundantly pass parameters externally. I believe this could lead to some issues. 1. **Data mismatch risk**: Callers could pass different data than what's stored in `SnapshotProducer`, leading to validating one set of files but committing another 2. **API complexity**: As more validations are added (e.g., delete files, file existence checks), each method would require additional parameters, making the API harder to use 3. **Redundant passing**: The same data that was already provided during construction has to be passed again ## Changes - Modified `validate_added_data_files()` and `validate_duplicate_files()` to operate on `self.added_data_files` directly - Updated `FastAppendAction::commit()` to call validation methods without passing `added_data_files` parameter ## Motivation Previously, `added_data_files` was passed as a parameter to validation methods even though it was already stored in `SnapshotProducer`: ```rust // Before snapshot_producer.validate_added_data_files(&self.added_data_files)?; // After snapshot_producer.validate_added_data_files()?; ``` ## Benefits 1. Better encapsulation - validation operates on object's own state 2. Safer API - eliminates possibility of data mismatch 3. Simpler interface - no redundant parameters needed ## Discussion Since **SnapshotProducer** already holds all necessary state, can we further refine validation by performing it during the **new** function's execution to improve data consistency and encapsulation? ## Are these changes tested?
1 parent 12c4c21 commit ff408ff

File tree

2 files changed

+7
-11
lines changed

2 files changed

+7
-11
lines changed

crates/iceberg/src/transaction/append.rs

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -93,13 +93,11 @@ impl TransactionAction for FastAppendAction {
9393
);
9494

9595
// validate added files
96-
snapshot_producer.validate_added_data_files(&self.added_data_files)?;
96+
snapshot_producer.validate_added_data_files()?;
9797

9898
// Checks duplicate files
9999
if self.check_duplicate {
100-
snapshot_producer
101-
.validate_duplicate_files(&self.added_data_files)
102-
.await?;
100+
snapshot_producer.validate_duplicate_files().await?;
103101
}
104102

105103
snapshot_producer

crates/iceberg/src/transaction/snapshot.rs

Lines changed: 5 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -99,8 +99,8 @@ impl<'a> SnapshotProducer<'a> {
9999
}
100100
}
101101

102-
pub(crate) fn validate_added_data_files(&self, added_data_files: &[DataFile]) -> Result<()> {
103-
for data_file in added_data_files {
102+
pub(crate) fn validate_added_data_files(&self) -> Result<()> {
103+
for data_file in &self.added_data_files {
104104
if data_file.content_type() != crate::spec::DataContentType::Data {
105105
return Err(Error::new(
106106
ErrorKind::DataInvalid,
@@ -123,11 +123,9 @@ impl<'a> SnapshotProducer<'a> {
123123
Ok(())
124124
}
125125

126-
pub(crate) async fn validate_duplicate_files(
127-
&self,
128-
added_data_files: &[DataFile],
129-
) -> Result<()> {
130-
let new_files: HashSet<&str> = added_data_files
126+
pub(crate) async fn validate_duplicate_files(&self) -> Result<()> {
127+
let new_files: HashSet<&str> = self
128+
.added_data_files
131129
.iter()
132130
.map(|df| df.file_path.as_str())
133131
.collect();

0 commit comments

Comments
 (0)