| } | ||
| stmgr_->HandleInstanceData(iter->second, instance_info_[iter->second]->local_spout_, _message); | ||
| __global_protobuf_pool_release__(_message); | ||
| auto message_size = _message->ByteSize(); |
I could not find ByteSizeLong API in generated code. Let me try again...
seems this is introduced in 3.1.0 and then deprecated in 3.4.0
Also, ByteSize calculates the size of the serialized message, which seems to be slightly larger than the actual size in memory.
So if the message is 60MB but only a small part of it is used to hold the incoming message (because the incoming message is small), we will not delete it, because I believe the serialized size is much smaller than 60MB. I'm not sure about
|
Did some experiment: gives us |
|
That's an interesting observation. So ByteSize is probably more related to the wire-format size, and SpaceUsed is related to the actual in-memory representation. In that case, shouldn't you use SpaceUsed? But isn't that very slow? |
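To make the distinction above concrete, here is a small analogy in plain C++. It does not use protobuf at all: `ReuseBuffer` and `Footprint` are hypothetical names, and `std::vector` stands in for a reused message whose buffer once held a large payload. The logical size (what a wire-format measure would roughly track) stays small, while the reserved memory stays large.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: logical content size vs. memory still reserved.
struct Footprint {
  std::size_t logical_bytes;   // bytes currently holding data
  std::size_t reserved_bytes;  // bytes the buffer still owns
};

// Fill a buffer with `large` bytes, clear it, then store only `small` bytes.
// std::vector::clear() keeps the capacity, so the reservation does not shrink.
inline Footprint ReuseBuffer(std::size_t large, std::size_t small) {
  assert(small <= large);
  std::vector<char> buf(large, 'x');  // a big message arrives
  buf.clear();                        // buffer is recycled for reuse
  buf.assign(small, 'y');             // a small message arrives next
  return Footprint{buf.size(), buf.capacity()};
}
```

Calling `ReuseBuffer(1 << 20, 16)` reports 16 logical bytes while at least 1 MiB remains reserved, which is the kind of gap a size check based only on serialized size would miss.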
|
@srkukarni It seems that we have no choice but to use |
|
If performance is a concern, I would suggest the old method: put it in the mempool and run a garbage collection against the mempool every minute to remove large tuples. |
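A minimal sketch of that alternative, assuming a hypothetical `SweptPool` that recycles every buffer and a `SweepLargerThan` method that a timer would call about once a minute. None of these names come from Heron, the timer itself is not shown, and `std::vector` capacity stands in for the in-memory size of a tuple set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pooled message.
struct PooledBuffer {
  std::vector<char> bytes;
};

// Hypothetical pool: Release() is cheap and never measures the object;
// the size check is deferred to an occasional sweep.
class SweptPool {
 public:
  void Release(std::unique_ptr<PooledBuffer> buf) {
    assert(buf != nullptr);
    buf->bytes.clear();  // keeps capacity so the buffer can be reused
    free_.push_back(std::move(buf));
  }

  // Frees pooled buffers whose reserved memory exceeds max_bytes.
  // Returns how many buffers were freed.
  std::size_t SweepLargerThan(std::size_t max_bytes) {
    const std::size_t before = free_.size();
    free_.erase(
        std::remove_if(free_.begin(), free_.end(),
                       [max_bytes](const std::unique_ptr<PooledBuffer>& b) {
                         return b->bytes.capacity() > max_bytes;
                       }),
        free_.end());
    return before - free_.size();
  }

  std::size_t PooledCount() const { return free_.size(); }

 private:
  std::vector<std::unique_ptr<PooledBuffer>> free_;
};
```

The trade-off in this sketch is that the per-message path does no size computation, at the cost of large buffers staying resident until the next sweep.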
|
Some benchmarking for |
|
Could you also share the throughput figures? Particularly in exclamation or other topologies that are used for Heron benchmarking? |
|
Word count, parallelism=20: Stmgr CPU user time doubled, and Data Tuples from Instances dropped ~25%. We need to fix #1908 first to see more metrics. |
Fix #2234. We limit the size of HeronTupleSet. If it is larger than the maximum size, we release it back to the allocator instead of the memory pool. Tested on local machine.
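
The idea in the description can be sketched roughly as follows. This is not the code from the PR: `FakeMessage`, `MemoryFootprint`, and `CappedPool` are hypothetical stand-ins so the example compiles without protobuf, and the footprint is approximated from `std::vector` capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pooled message such as HeronTupleSet.
struct FakeMessage {
  std::vector<char> payload;
  // Rough analogue of an in-memory size query; not a protobuf call.
  std::size_t MemoryFootprint() const {
    return sizeof(*this) + payload.capacity();
  }
  void Clear() { payload.clear(); }  // keeps capacity for reuse
};

// Hypothetical pool: small messages are recycled, oversized ones are freed.
class CappedPool {
 public:
  explicit CappedPool(std::size_t max_bytes) : max_bytes_(max_bytes) {}

  std::unique_ptr<FakeMessage> Acquire() {
    if (free_.empty()) return std::unique_ptr<FakeMessage>(new FakeMessage());
    std::unique_ptr<FakeMessage> msg = std::move(free_.back());
    free_.pop_back();
    return msg;
  }

  // Returns true if the message went back into the pool, false if it was
  // too large and was handed back to the allocator instead.
  bool Release(std::unique_ptr<FakeMessage> msg) {
    assert(msg != nullptr);
    if (msg->MemoryFootprint() > max_bytes_) {
      return false;  // msg is destroyed when this function returns
    }
    msg->Clear();
    free_.push_back(std::move(msg));
    return true;
  }

  std::size_t PooledCount() const { return free_.size(); }

 private:
  std::size_t max_bytes_;
  std::vector<std::unique_ptr<FakeMessage>> free_;
};
```

With a 1 KiB cap, releasing a message holding 16 bytes returns it to the pool, while releasing one that grew to 4 KiB frees it, so one unusually large tuple set does not stay pinned in the pool.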