@GrantPSpencer
Contributor

@GrantPSpencer GrantPSpencer commented Nov 20, 2025

Issues

An NPE occurs when a lookup on partitionMap returns null, likely because a disabled instance still has a task current state, which leaves the two maps with different key sets. The current logic assumes that additionPartitionMap.keySet() == partitionMap.keySet().

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Change Map.get to Map.getOrDefault with a default of 0 to prevent the NPE in the following error log:

"message": Exception while executing TASK pipeline for cluster <cluster_name_here>. Will not continue to next pipeline,
"exceptionChain": [
	{
		"index": 0,
		"message": "Cannot invoke \"java.lang.Integer.intValue()\" because the return value of \"java.util.Map.get(Object)\" is null",
		"stackTrace": [
			{
				"index": 0,
				"call": "fillActiveTaskCount",
				"columnNumber": null,
				"fileName": "WorkflowControllerDataProvider.java",
				"lineNumber": 192,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider"
			},
			{
				"index": 1,
				"call": "resetActiveTaskCount",
				"columnNumber": null,
				"fileName": "WorkflowControllerDataProvider.java",
				"lineNumber": 178,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider"
			},
			{
				"index": 2,
				"call": "process",
				"columnNumber": null,
				"fileName": "TaskSchedulingStage.java",
				"lineNumber": 81,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.stages.task.TaskSchedulingStage"
			},
			{
				"index": 3,
				"call": "handle",
				"columnNumber": null,
				"fileName": "Pipeline.java",
				"lineNumber": 75,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.pipeline.Pipeline"
			},
			{
				"index": 4,
				"call": "handleEvent",
				"columnNumber": null,
				"fileName": "GenericHelixController.java",
				"lineNumber": 905,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.GenericHelixController"
			},
			{
				"index": 5,
				"call": "run",
				"columnNumber": null,
				"fileName": "GenericHelixController.java",
				"lineNumber": 1556,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.GenericHelixController$ClusterEventProcessor"
			}
		],
		"type": "java.lang.NullPointerException"
	}
],
"level": ERROR,

Tests

N/A

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@xyuanlu
Contributor

xyuanlu commented Nov 20, 2025

Thanks for the fix. LGTM.
Could you please also create an issue explaining the bug and link it to this PR?

Thanks

@GrantPSpencer
Contributor Author

GrantPSpencer commented Nov 20, 2025

Pull request approved by: @xyuanlu
Commit message: This PR fixes an NPE in the task pipeline caused by a mismatch between the key sets of two maps. In fillActiveTaskCount(..) we iterate over the key set of one map but call .get(key) on a second map, which assumes that both maps have the same key set. The map we take the keys from is built from nodes with task current states, while the other is built from live and enabled nodes. If a node has a current state but has been disabled, the keys do not match and .get(key) returns null; applying the + operator to that null value then throws an NPE. Interestingly, this can only occur after a controller reset, since the second map's keys must be cleared and repopulated without the disabled node. The fix adds a default value of 0 to both get operations to prevent the NPE.
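
For reference, here is a minimal, self-contained sketch of the failure mode and the fix described above. It is not the actual Helix source: the class name, method names, map names, and instance names are all hypothetical stand-ins, and the loop body is only an assumption about the general shape of the counting logic (iterate one map's keys, read from another, add with +).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the counting logic; NOT the actual Helix code.
class ActiveTaskCountSketch {

  // Pre-fix pattern: keys come from `source`, but the value is read from `target`.
  // If `target` lacks the key, get() returns null and unboxing it for + throws an NPE.
  static void fillUnsafe(Map<String, Integer> source, Map<String, Integer> target) {
    for (String instance : source.keySet()) {
      target.put(instance, target.get(instance) + source.get(instance));
    }
  }

  // Post-fix pattern: both lookups fall back to 0 when the key is absent.
  static void fillSafe(Map<String, Integer> source, Map<String, Integer> target) {
    for (String instance : source.keySet()) {
      target.put(instance,
          target.getOrDefault(instance, 0) + source.getOrDefault(instance, 0));
    }
  }

  public static void main(String[] args) {
    // Assumed scenario: "disabledNode" still has a task current state,
    // but is missing from the map built from live and enabled nodes.
    Map<String, Integer> fromCurrentStates = new HashMap<>();
    fromCurrentStates.put("liveNode", 2);
    fromCurrentStates.put("disabledNode", 1);

    Map<String, Integer> liveAndEnabled = new HashMap<>();
    liveAndEnabled.put("liveNode", 0);

    try {
      fillUnsafe(fromCurrentStates, new HashMap<>(liveAndEnabled));
    } catch (NullPointerException e) {
      System.out.println("unsafe variant threw: " + e.getMessage());
    }

    fillSafe(fromCurrentStates, liveAndEnabled);
    System.out.println("safe variant result: " + liveAndEnabled);
  }
}
```

Note that in this sketch the safe variant also writes an entry for the previously missing key into the second map. Whether the real method does the same depends on how it stores the result, which this PR description does not show.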

@xyuanlu xyuanlu merged commit d0708e1 into apache:master Nov 20, 2025
3 checks passed