Skip to content

Conversation

@sneaky-potato
Copy link
Member

@sneaky-potato sneaky-potato commented Jan 24, 2026

Methods are now wrapped once in lunatik_newclass(). This reduces metatable churn and avoids per-access closure allocation in performance-sensitive paths.

Tested performance with the following script

local linux = require("linux")
local data = require("data")

local d = data.new(1024)
local iterations = 1000000

local start_ns = linux.time()
for i = 1, iterations do
    d:getbyte(0)
end
local end_ns = linux.time()
local total_ns = end_ns - start_ns

local ns_per_call = total_ns / iterations

print("Total ns: " .. total_ns)
print("Ns per call: " .. ns_per_call)


local m_getbyte = d.getbyte 

start_ns = linux.time()
for i = 1, iterations do
    m_getbyte(d, 0)
end
end_ns = linux.time()
total_ns = end_ns - start_ns

ns_per_call = total_ns / iterations

print("Localized method total ns: " .. total_ns)
print("Localized method Ns per call: " .. ns_per_call)

master:

Total ns: 275151687
Ns per call: 275
Localized method total ns: 137531899
Localized method Ns per call: 137

this branch:

Total ns: 128897733
Ns per call: 128
Localized method total ns: 130847719
Localized method Ns per call: 130

Methods are now wrapped once in lunatik_newclass().
This reduces metatable churn and avoids per-access closure
allocation in performance-sensitive paths.

Signed-off-by: Ashwani Kumar Kamal <ashwanikamal.im421@gmail.com>
{"decrypt", luacrypto_aead_decrypt},
{"__gc", lunatik_deleteobject},
{"__close", lunatik_closeobject},
{"__index", lunatik_monitorobject},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need another way to signalize where a class (actually, it would better to have it per object) need a monitor or not. Please notice, that on this patch all classes will use monitor, and we cannot do this. For instance, it probably will break RCU. I thought on having a bitfield carrying the object flags, including "shared" ("monitored") and "sleep". What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I agree here. Currently I am pre wrapping all the objects which do not have index in the metatable. It would be better to check if an object really needs monitoring.
Couple of questions regarding structs lunatik_class_t and lunatik_object_t:

  • Why is there bool sleep in both lunatik_class_t and lunatik_object_t. Is it safe to assume that sleep in lunatik_class_t is redundant?
  • Lock type is unclear here (mutex vs spinlock), for instance, what happens if a wrong lock is taken?

I thought on having a bitfield carrying the object flags, including "shared" ("monitored") and "sleep". What do you think?

Yes, a flag will be helpful here. What fields (other than monitoring) do you think we should track?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(...)

  • Why is there bool sleep in both lunatik_class_t and lunatik_object_t. Is it safe to assume that sleep in lunatik_class_t is redundant?

No, it isn't. Object has precedence over class. For classes, sleep = true means that their objects are allowed to sleep, that is, their objects can either be used on contexts that might sleep or not. Thus, classes with sleep = true can have objects with both sleep = false and sleep = true. However, classes with sleep = false can only have objects with sleep = false. A good example is luadata and luanetfilter, the latter cannot sleep because it's called from softirqs; the former can be used on both contexts. Please, let me know if you need more clarification on this.

  • Lock type is unclear here (mutex vs spinlock), for instance, what happens if a wrong lock is taken?

How do you see the wrong lock being taken? It shouldn't, but perhaps we have a bug there.

I thought on having a bitfield carrying the object flags, including "shared" ("monitored") and "sleep". What do you think?

Yes, a flag will be helpful here. What fields (other than monitoring) do you think we should track?

I was thinking on having a RCU mode in the future where we lock only for writings, otherwise we use rcu_read_lock. Maybe some allocation flags. I didn't anticipate much on this, but I think it would be good to have some capabilities.

lunatik.h Outdated
lua_pop(L, 1);
continue;
}
if (reg->name[0] == '_' && reg->name[1] == '_') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think strcmp would be clearer here, and even clearer to have a macro defined with it.. something like lunatik_ismetamethod, perhaps carrying the whole OR'ed cases.

@lneto
Copy link
Contributor

lneto commented Jan 25, 2026

Methods are now wrapped once in lunatik_newclass(). This reduces metatable churn and avoids per-access closure allocation in performance-sensitive paths.

Tested performance with the following script

local linux = require("linux")
local data = require("data")

local d = data.new(1024)
local iterations = 1000000

local start_ns = linux.time()
for i = 1, iterations do
    d:getbyte(0)
end
local end_ns = linux.time()
local total_ns = end_ns - start_ns

local ns_per_call = total_ns / iterations

print("Total ns: " .. total_ns)
print("Ns per call: " .. ns_per_call)

master: ~300 ns/call this branch: ~166 ns/call

can you also add to your tests that case of local d_getbyte = d.getbyte? so we can always compare the three cases? thanks!

.name = "crypto_aead",
.methods = luacrypto_aead_mt,
.release = luacrypto_aead_release,
.sleep = true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add .monitor here (perhaps .shared is a better name).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, indeed. I have pushed a fix which checks closure creation on class level. We need to effectively test the modules having .shared = false also. Perhaps luarcu could be used for that.

lunatik.h Outdated
return hasindex;
}

static int lunatik_dispatch(lua_State *L)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you missed inline here.. however, I think this should be moved to lunatik_obj.c, we don't need to expose these internals to the bindings..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this dispatch, I think, I will push another fixup since we dont need to wrap anything in case class has .shared = false

lunatik.h Outdated
lunatik_object_t *obj = lunatik_checkobject(L, 1);
lua_CFunction fn = lua_tocfunction(L, lua_upvalueindex(1));

/* fast path, direct function call */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need comments like this and the one bellow.. you code is quite clear in those points ;-).. let's use comments as exceptions, not the norm..

lunatik.h Outdated
const luaL_Reg *reg;
for (reg = class->methods; reg->name != NULL; reg++) {
lua_getfield(L, -1, reg->name);
if (lunatik_ismetamethod(reg)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can avoid wrapping entirely if class is not monitored..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could speedup even more the "fast path"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I realized this. Pushing another fixup to speed up the fast path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the description with latest results. Looks like we have some performance gains :D

lunatik_obj.c Outdated
lunatik_setobject(object, class, class->sleep);
lunatik_setclass(L, class);

object->monitored = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for setting this, we could change our C APIs to pass opt instead sleep, then we bundle all options we need as this argument.. for now, it can be only sleep and shared/monitored, then we can evolve.. does it make sense to you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants