fix : include datatype in memo key in sanitization function #62226

vignesh14052002 · 2025-08-31T12:27:54Z

closes BUG: Datatypes not preserved on pd.read_excel #60088
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jbrockmendel · 2025-08-31T17:59:47Z

Perf impact? This is here for a reason.

vignesh14052002 · 2025-09-02T09:46:12Z

This is the commit that introduces the change
6c31cab

I don't understand why this was added, but reverting solves the below issue

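import numpy as np
from pandas._libs.parsers import sanitize_objects

# Object array mixing an int, an NA marker string, and a bool
values = np.array([1, "NA", True], dtype=object)
print("Values before sanitization:", values)
# sanitize_objects modifies the array in place
sanitize_objects(values, na_values={"NA"})
print("Values after sanitization:", values)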

Output:

Values before sanitization: [1 'NA' True]
Values after sanitization: [1 nan 1]

Even though the sanitization part works fine (NA -> nan), it is converting True to 1, and that is due to the memo.
I have had some issues setting up the environment to run performance tests.
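To illustrate why a memo can turn True into 1, here is a minimal pure-Python sketch. It is not the Cython code in `sanitize_objects`; the helper name `memoize_values` is hypothetical, and the only assumption is that the memo behaves like a dict keyed by the value itself. Since `True == 1` and `hash(True) == hash(1)`, the two share a single dict slot, and whichever was stored first is handed back for both.

```python
def memoize_values(values):
    """Replace each value with the first equal object seen (hypothetical helper)."""
    memo = {}
    out = []
    for val in values:
        # setdefault returns the already-stored object when an equal key exists
        out.append(memo.setdefault(val, val))
    return out


result = memoize_values([1, "NA", True])
print(result)                            # [1, 'NA', 1]
print([type(v).__name__ for v in result])  # ['int', 'str', 'int']
```

The bool is lost because the lookup for `True` hits the entry created for `1`.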

jbrockmendel · 2025-09-02T17:47:57Z

> I don't understand why this was added, but reverting solves the below issue

The commit message was "memoize objects when reading from file to reduce memory footprint". So removing it will likely balloon the memory footprint. Instead of removing it, it might be more effective to just check for 0, 1, True, False explicitly and let other values be memoized?
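A rough sketch of what that suggestion could look like, in plain Python rather than the actual Cython implementation. The helper name `memoize_skip_ambiguous` is hypothetical, and the memo is assumed to be a dict keyed by value. Bools and the ints 0 and 1 bypass the memo, everything else is still deduplicated.

```python
def memoize_skip_ambiguous(values):
    """Memoize values, but pass 0, 1, True and False through untouched (hypothetical helper)."""
    memo = {}
    out = []
    for val in values:
        # Check by exact type so that bools and the ints 0/1 never share a memo slot
        if type(val) is bool or (type(val) is int and val in (0, 1)):
            out.append(val)
        else:
            out.append(memo.setdefault(val, val))
    return out


skipped = memoize_skip_ambiguous([1, True, 0, False, "a", "a"])
print([type(v).__name__ for v in skipped])  # ['int', 'bool', 'int', 'bool', 'str', 'str']
```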

vignesh14052002 · 2025-09-03T08:39:22Z

Thanks, now I understand the memory footprint concern. Skipping memoization just for those 4 values might not be a good approach: what if the data contains only a mixture of those 4 values? It could blow up the memory.

I have included the type of the value in the memo key as well, which solves this.
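The idea of keying the memo on both type and value can be sketched in plain Python as follows. This is an illustration only, not the code in the PR; the helper name `memoize_values_typed` is hypothetical and the memo is assumed to be a dict. The tuples `(int, 1)` and `(bool, True)` compare unequal because their first elements differ, so the two values no longer collide, while repeated values of the same type are still shared.

```python
def memoize_values_typed(values):
    """Memoize values using (type, value) as the key (hypothetical helper)."""
    memo = {}
    out = []
    for val in values:
        # Including the type keeps 1, True and 1.0 in separate memo slots
        key = (type(val), val)
        out.append(memo.setdefault(key, val))
    return out


typed_result = memoize_values_typed([1, "NA", True, 1, True, 0, False, 1.0])
print([type(v).__name__ for v in typed_result])
# ['int', 'str', 'bool', 'int', 'bool', 'int', 'bool', 'float']
```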

vignesh14052002 added 2 commits August 31, 2025 17:51

fix : remove memo usage

6047584

fix linting

6eb738a

include type in memo key to handle 0,1,True and False conflict

3078807

vignesh14052002 changed the title ~~fix : remove memo usage in sanitization function~~ fix : include datatype in memo key in sanitization function Sep 3, 2025
