
fixed AsynchronousXopt.to_json() and dump #343

Open
ndwang wants to merge 2 commits into xopt-org:main from ndwang:main

Conversation

@ndwang

ndwang commented Jul 23, 2025

I put in a fix for #234 and #235.

The error is caused by duplicate indices in AsynchronousXopt.data.

When adding new data, Xopt.add_data() checks whether self.data already exists. If this is the first batch of data, the incoming indices are kept; if data already exists, the new data is reindexed and then concatenated.

Xopt/xopt/base.py, lines 379 to 388 in 7dc1b5c:

# Set internal dataframe.
if self.data is not None:
    new_data = pd.DataFrame(new_data, copy=True)  # copy for reindexing
    new_data.index = np.arange(len(self.data), len(self.data) + len(new_data))
    self.data = pd.concat([self.data, new_data], axis=0)
else:
    if new_data.index.dtype != np.int64:
        new_data.index = new_data.index.astype(np.int64)
    self.data = new_data
self.generator.add_data(new_data)

new_data.index starts from 0 in Xopt; in AsynchronousXopt, however, it starts from 1! This causes AsynchronousXopt.data to end up with two entries at index 1.
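As a minimal, hypothetical standalone repro (plain pandas, not Xopt code), the collision can be reproduced by letting the first batch start at index 1 and then applying the same reindexing that Xopt.add_data() uses for later batches:

import numpy as np
import pandas as pd

# First batch: AsynchronousXopt.prepare_input_data() starts indexing at 1.
first = pd.DataFrame({"x": [0.1]}, index=[1])
data = first  # Xopt.add_data() keeps the index of the first batch

# Second batch: Xopt.add_data() reindexes from len(data) == 1.
second = pd.DataFrame({"x": [0.2, 0.3]})
second.index = np.arange(len(data), len(data) + len(second))  # -> [1, 2]
data = pd.concat([data, second], axis=0)

print(data.index.tolist())   # [1, 1, 2] -- index 1 appears twice
print(data.index.is_unique)  # False, which is what later breaks serialization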

This index shift traces back to AsynchronousXopt.prepare_input_data():

# Reindex input dataframe
input_data.index = np.arange(
    self._ix_last + 1, self._ix_last + 1 + len(input_data)
)

It's not clear to me whether this shift is required to manage futures, so my proposal is simply to force Xopt.add_data() to always reindex the first batch to start from 0.
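For illustration, here is a minimal sketch of that proposal as a standalone helper (hypothetical code, not the actual diff in this PR); the only change relative to the current behaviour is that the first batch is always reindexed from 0:

from typing import Optional

import numpy as np
import pandas as pd

def add_data(existing: Optional[pd.DataFrame], new_data: pd.DataFrame) -> pd.DataFrame:
    new_data = pd.DataFrame(new_data, copy=True)  # copy before reindexing
    if existing is not None:
        # continue numbering after the existing rows
        new_data.index = np.arange(len(existing), len(existing) + len(new_data))
        return pd.concat([existing, new_data], axis=0)
    # first batch: drop whatever index the caller supplied and start at 0
    new_data.index = np.arange(len(new_data))
    return new_data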

@nikitakuklev
Collaborator

As far as I can recall, the indices are intentionally not reset, so as to support step removal.

If I make steps [1,2,3], remove [1], and then try to add a dataframe with [2,3] to a new generator, the expectation would be for that generator to keep the [2,3] indexing.

Is there a reason to not change AsynchronousXopt instead? Setting _ix_last: int = -1 would start counting at 0.
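A quick check of that suggestion (hypothetical standalone snippet), plugging _ix_last = -1 into the reindexing line from prepare_input_data() quoted above:

import numpy as np

_ix_last = -1
n_points = 3  # size of a hypothetical first batch
print(np.arange(_ix_last + 1, _ix_last + 1 + n_points))  # [0 1 2] -- starts at 0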

@ndwang
Author

ndwang commented Jul 29, 2025

Thanks for taking a look at this.

Is there a reason to not change AsynchronousXopt instead? Setting _ix_last: int = -1 would start counting at 0.

This also works. The reason I changed Xopt.add_data() is that it enforces a unique index internally and makes sure that's always the case. But if the rest of the codebase expects it to keep the index of the first batch of data, I can look into changing only AsynchronousXopt.

If I make steps [1,2,3], remove [1], and then try to add dataframe with [2,3] to a new generator, the expectation would be for that generator to keep [2,3] indexing.

The current code does keep the [2,3] indexing, but if you then try to add more data, say steps [1,2,3], Xopt.data becomes [2,3,2,3,4].
Generator.add_data() already doesn't reindex, so Generator.data is [2,3,1,2,3]. But the generator also never serializes its internal dataframe, so the duplicate index is not a problem there.

Xopt/xopt/generator.py, lines 114 to 124 in 7dc1b5c:

def add_data(self, new_data: pd.DataFrame):
    """
    update dataframe with results from new evaluations.
    This is intended for generators that maintain their own data.
    """
    if self.data is not None:
        self.data = pd.concat([self.data, new_data], axis=0)
    else:
        self.data = new_data
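A minimal standalone repro of the scenario above (hypothetical code, outside Xopt), showing both behaviours side by side:

import numpy as np
import pandas as pd

# Existing data keeps the [2, 3] indexing; three more steps are then added.
existing = pd.DataFrame({"x": [0.2, 0.3]}, index=[2, 3])
new = pd.DataFrame({"x": [0.4, 0.5, 0.6]}, index=[1, 2, 3])

# Xopt.add_data() path: reindex from len(existing) == 2 -> [2, 3, 4]
xopt_new = new.copy()
xopt_new.index = np.arange(len(existing), len(existing) + len(xopt_new))
xopt_data = pd.concat([existing, xopt_new], axis=0)
print(xopt_data.index.tolist())  # [2, 3, 2, 3, 4]

# Generator.add_data() path: plain concat, no reindexing
gen_data = pd.concat([existing, new], axis=0)
print(gen_data.index.tolist())   # [2, 3, 1, 2, 3]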

@roussel-ryan
Collaborator

@nikitakuklev what do you think about this answer?

@nikitakuklev
Collaborator

My concern with reindexing is that external tools/scripts get confused when hand-manipulating data if the indices of specific points change. X.data[10] should give the same answer for the whole optimization run. Moreover, X.generator.data[10] should always agree with X.data[10] - partial reindexing is even worse.

@ndwang makes a good point that there is no enforcement of this - indexing can get duplicated and messed up. This doesn't break BO, but might cause issues for MOGA - seems like a bug.

I'd propose to:

  1. change async xopt to _ix_last: int = -1 for logical consistency
  2. change add_data to use the last index of the existing data for indexing new points: last_idx+1, last_idx+1+len(new_data) instead of len(self.data), len(self.data) + len(new_data)
  3. raise an Exception if there are any duplicate indices in data (points 2 and 3 are sketched below)
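A minimal sketch of points 2 and 3 as a standalone helper (hypothetical code, not a final Xopt patch):

from typing import Optional

import numpy as np
import pandas as pd

def add_data(existing: Optional[pd.DataFrame], new_data: pd.DataFrame) -> pd.DataFrame:
    new_data = pd.DataFrame(new_data, copy=True)  # copy before reindexing
    if existing is not None:
        # continue numbering from the last existing index, not from len(existing)
        last_idx = existing.index.max()
        new_data.index = np.arange(last_idx + 1, last_idx + 1 + len(new_data))
        combined = pd.concat([existing, new_data], axis=0)
    else:
        combined = new_data
    if not combined.index.is_unique:
        raise ValueError("duplicate indices in data")
    return combined

With existing data indexed [2,3], adding three new points would then yield [2,3,4,5,6] instead of [2,3,2,3,4].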

@nikitakuklev
Collaborator

@ndwang any update on your end/thoughts about proposed fixes?

