Skip to content

Conversation

@gwbischof
Copy link
Contributor

sanitize_doc was a bottleneck for mongo inserting. sanitize_np fixes this problem. I made it a separate function because sanitize_doc also changes tuples to lists in addition to sanitizing numpy, and this feature is useful for some tests.

@tacaswell
Copy link
Contributor

tacaswell commented Apr 20, 2019

This does not look like it will catch:

d = {'a': {'b': np.arange(5)}}

@danielballan
Copy link
Member

Can this be updated to handle a list of arrays? Example:

In [3]: sanitize_np({'a': [numpy.arange(5)]})
Out[3]: {'a': [array([0, 1, 2, 3, 4])]}

We encountered that in the wild at some point and that was our reason for switching to the approach currently used in sanitize_doc. Would be nice to have something that can handle that but is fast, which a round-trip through JSON is not.

@codecov-io
Copy link

Codecov Report

Merging #71 into master will decrease coverage by 0.8%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
- Coverage   79.89%   79.08%   -0.81%     
==========================================
  Files           3        3              
  Lines        1104     1119      +15     
==========================================
+ Hits          882      885       +3     
- Misses        222      234      +12
Impacted Files Coverage Δ
event_model/__init__.py 83.59% <100%> (-1.95%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f026ee...6170f54. Read the comment docs.

@danielballan
Copy link
Member

It would be better to duck-type if we can. Resorting to isinstance creates fragility because it won't work on things are dict-like or list-like. For example, this won't recurse into tuples. See for example https://github.com/NSLS-II/databroker/blob/30e9c707171df5ad14cd58a21382284af7c7db87/databroker/utils.py#L29-L42 where we sniffed out dicts based on the existence of an .items() attribute. A good trick for catching lists/tuples/etc. is to use isinstance(x, collections.abc.Iterable).

@gwbischof
Copy link
Contributor Author

I'll add some tests before trying to merge, I'll work on that later.

else:
return sanitize_item(doc)

return sanitize_np(doc)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreachable line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants