
[BUG] ClassLoader Leak / Metaspace OOM: Unmanaged static ThreadLocal<Kryo> in PersonKryoSerializer #763

@QiuYucheng2003

Description

Summary:
In PersonKryoSerializer, a static final ThreadLocal<Kryo> (KRYO_THREAD_LOCAL) is used to cache expensive Kryo instances. However, the code has no cleanup mechanism at all: remove() is never called.

Root Cause:
When a Hazelcast internal worker thread (e.g., a partition operation thread or an I/O thread) executes the write or read method, the Kryo instance is attached to that thread's ThreadLocalMap. Because these threads are pooled and live as long as the Hazelcast member, each thread's Kryo instance stays strongly reachable and is never garbage collected.
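The retention described above can be reproduced without Hazelcast or Kryo. The sketch below is an assumption-laden stand-in: a single-thread ExecutorService plays the role of a pooled Hazelcast worker, and a plain Object replaces the Kryo instance. Because remove() is never called, every task on that thread gets back the same cached value, which stays reachable until the thread itself dies.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledThreadRetention {
    // Counts how many times a per-thread value has been created.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Stand-in for KRYO_THREAD_LOCAL; a plain Object replaces Kryo (assumption).
    static final ThreadLocal<Object> CACHE = ThreadLocal.withInitial(() -> {
        CREATED.incrementAndGet();
        return new Object();
    });

    // Runs `tasks` lookups on one long-lived worker thread and returns how many
    // values were created. remove() is never called, mirroring the reported code.
    public static int run(int tasks) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Callable<Object> lookup = CACHE::get;
        try {
            Object first = worker.submit(lookup).get();
            for (int i = 1; i < tasks; i++) {
                // The same instance comes back every time: it stays attached to the thread.
                if (worker.submit(lookup).get() != first) {
                    throw new IllegalStateException("expected the cached instance");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("values created: " + run(5));
    }
}
```

Only one value is ever created for the worker thread; in a Hazelcast member those threads are not shut down between requests, so the cached value (and everything it references) is pinned for the life of the member.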

Impact (Critical):

  1. Memory Leak: Accumulation of Kryo instances in Hazelcast worker threads.

  2. ClassLoader Leak (Metaspace OOM): The Kryo instance registers the domain class: kryo.register(Person.class). If this serializer is deployed within a web container (e.g., Tomcat) or an OSGi environment, the Kryo instance holds a strong reference to Person.class, and every Class object holds a reference to its defining classloader, here the WebappClassLoader. The unmanaged ThreadLocal therefore forms a strong reference chain (Worker Thread -> ThreadLocalMap -> Kryo -> Person.class -> WebappClassLoader) that prevents the application from being undeployed cleanly. Each redeployment leaves another classloader and all of its classes pinned, so repeated redeployments eventually result in java.lang.OutOfMemoryError: Metaspace.

Code Snippet
Location: PersonKryoSerializer.java

```java
import com.esotericsoftware.kryo.Kryo;

// (excerpt from the class body)
// Definition: static ThreadLocal holding application classes
private static final ThreadLocal<Kryo> KRYO_THREAD_LOCAL = new ThreadLocal<Kryo>() {
    @Override
    protected Kryo initialValue() {
        Kryo kryo = new Kryo();
        kryo.register(Person.class); // <--- Danger: holds a reference to WebappClassLoader
        return kryo;
    }
};

// Usage: get() is called, but remove() is NEVER called in write() or read()
```
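For comparison, the missing cleanup can be sketched as a try/finally around each operation that calls remove() when the work is done. This is a minimal, self-contained illustration, not the project's code: StringBuilder stands in for Kryo, and withCached is a hypothetical helper name.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ScopedThreadLocal {
    // Counts how many per-thread values have been created.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Stand-in for the Kryo cache; StringBuilder plays the expensive object (assumption).
    static final ThreadLocal<StringBuilder> CACHE = ThreadLocal.withInitial(() -> {
        CREATED.incrementAndGet();
        return new StringBuilder();
    });

    // Hypothetical helper: use the per-thread value for one operation, then always clear it.
    static <R> R withCached(Function<StringBuilder, R> op) {
        try {
            return op.apply(CACHE.get());
        } finally {
            // Drop this thread's entry so the worker thread no longer references the
            // value (or anything it references, such as application classes).
            CACHE.remove();
        }
    }

    public static void main(String[] args) {
        String a = withCached(sb -> sb.append("first").toString());
        String b = withCached(sb -> sb.append("second").toString());
        System.out.println(a + " " + b + " created=" + CREATED.get());
    }
}
```

The trade-off is visible in the counter: after remove(), the next get() calls the initializer again, so a fresh instance is built for every write/read. That removes the leak but also removes the caching benefit, which is why a pool with explicit borrow/return is attractive for expensive objects like Kryo.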

Expected Behavior
ThreadLocals used in serialization components must not pin application classloaders.

Proposed Fix:

  1. Avoid static ThreadLocals for stateful serializers: if the serializer must be loaded by a child (application) classloader, manage the Kryo instance lifecycle explicitly so that every cached instance is released when the application is undeployed.

  2. Use Object Pooling: Instead of ThreadLocal, consider using an object pool (like Apache Commons Pool) where instances are explicitly borrowed and returned/cleared in a try-finally block during the write/read operations.
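The pooling approach in item 2 can be sketched with only the JDK. SimplePool below is a hypothetical minimal pool, not Apache Commons Pool's API: instances are borrowed, reset, and returned in a finally block, and close() drops every idle instance so nothing keeps application classes reachable after undeploy. StringBuilder again stands in for Kryo.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical minimal pool; a stand-in for Apache Commons Pool or a similar library.
public class SimplePool<T> implements AutoCloseable {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final Consumer<T> reset;
    private volatile boolean closed;

    public SimplePool(int capacity, Supplier<T> factory, Consumer<T> reset) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
        this.reset = reset;
    }

    // Take an idle instance, or create a new one if none is available.
    public T borrow() {
        T instance = idle.poll();
        return instance != null ? instance : factory.get();
    }

    // Reset the instance and keep it for reuse; if the pool is full it is simply dropped.
    public void giveBack(T instance) {
        reset.accept(instance);
        idle.offer(instance);
        // If close() ran concurrently, make sure the returned instance is not retained.
        if (closed) {
            idle.clear();
        }
    }

    // Run one operation with a borrowed instance, always returning it in a finally block.
    public <R> R with(Function<T, R> op) {
        T instance = borrow();
        try {
            return op.apply(instance);
        } finally {
            giveBack(instance);
        }
    }

    public int idleCount() {
        return idle.size();
    }

    // Call on undeploy/shutdown: release all pooled instances.
    @Override
    public void close() {
        closed = true;
        idle.clear();
    }
}
```

Usage, under the same assumptions:

```java
SimplePool<StringBuilder> pool = new SimplePool<>(4, StringBuilder::new, sb -> sb.setLength(0));
String out = pool.with(sb -> sb.append("payload").toString());
pool.close(); // e.g. from the serializer's shutdown/undeploy hook
```

Unlike a ThreadLocal, the pool is owned by one object, so releasing it does not depend on each worker thread running cleanup code. Kryo 5 also ships its own pool class (com.esotericsoftware.kryo.util.Pool), which may be worth evaluating before adding a dependency.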
