MYRIAD-249 Should set NodeManager vcores more flexibly #100
taojieterry wants to merge 5 commits into apache:master
|
I like the idea, but I think the profiles should reflect the vcores of the NM. Maybe change vcoreMultiplier to vcoreFraction (vcoreRatio?), so that a medium-size NM would still have 4 vcores but use only 2.2 CPUs, which makes for easier bookkeeping. Also, you'll need to do some checks for FGS in NMHeartBeatHandler and YarnNodeCapacityManager. |
|
@DarinJ, I don't quite understand what you mean by the profiles reflecting the vcores of the NM. Should we have a different vcoreRatio for each NM size, or is a single global field enough? |
|
@taojieterry sorry if I wasn't clear, so I think the config should look like And if you start a medium profile NM, it would launch with 4 vcores and 4096 MB mem but only consume 2.2 Mesos CPU shares. I think this will eliminate a small amount of confusion vs. the original, where the NM would have 8 vcores and 4096 MB mem. Make sense? I think that things should work with cgroups. I can help you with the configuration necessary for testing and can test myself on some AWS resources. |
|
@DarinJ thank you for the reply; it makes sense to me now. |
|
I think you're on the right track |
|
@DarinJ, is there anything else I can do before this patch gets merged? |
|
@taojieterry see my comment about mem vs cpu on line 55 of myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/OfferUtils.java. Also, we'll need to address the vcore ratio for fine grained scaling. The files you'll need to modify are incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/NMHeartBeatHandler.java and incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/YarnNodeCapacityManager.java. @yufeldman do you have any comments? |
|
@taojieterry I'll probably be testing your other two PRs and merging them late this week. I'd like to merge this one at the same time, but correctness is more important than expediency. |
    cpus += resource.getScalar().getValue();
    } else if (resource.getName().equalsIgnoreCase("mem")) {
-   mem += resource.getScalar().getValue();
+   mem += resource.getScalar().getValue() / vcoreRatio;
We shouldn't do this with memory. You'll end up with OOMs.
This should apply to CPU; looks like a typo.
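To make the intended fix concrete, here is a minimal sketch, not the actual OfferUtils code. `OfferSketch` and its nested `Res` class are hypothetical stand-ins for the Mesos resource types, and it assumes vcoreRatio means Mesos CPUs per YARN vcore, so only the cpus total is divided by it while mem passes through unchanged.

```java
import java.util.List;

// Hypothetical sketch: Res stands in for a Mesos scalar resource.
final class OfferSketch {
    static final class Res {
        final String name;
        final double value;

        Res(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Returns {vcores, memMb}; only the cpu total is scaled by vcoreRatio. */
    static double[] toYarn(List<Res> resources, double vcoreRatio) {
        double cpus = 0.0;
        double mem = 0.0;
        for (Res r : resources) {
            if (r.name.equalsIgnoreCase("cpus")) {
                cpus += r.value / vcoreRatio; // CPU is converted into vcores
            } else if (r.name.equalsIgnoreCase("mem")) {
                mem += r.value;               // memory is left untouched
            }
        }
        return new double[] {cpus, mem};
    }
}
```

With a ratio of 0.5, an offer of 2 cpus and 4096 MB would be advertised as 4 vcores and 4096 MB under these assumptions.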
     * Min vcore ratio for NodeManager
     */
    public static final double MIN_VCORE_RATIO = 0.1;
@DarinJ
We'd better have a minimum value to guard against division by zero or a very small ratio.
0.1 is a somewhat empirical value here: I think having more than 10 tasks share one CPU would not work efficiently. I am also OK with a different minimum if you think another value is better.
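A rough sketch of how such a lower bound might be applied before dividing. The class and method names are hypothetical; only the constant and its value come from the patch, and the clamping behaviour is an assumption about how the minimum would be used.

```java
// Hypothetical helper; only MIN_VCORE_RATIO and its value come from the patch.
final class VcoreRatioGuard {
    static final double MIN_VCORE_RATIO = 0.1;

    /** Clamps the configured ratio so it never falls below the minimum. */
    static double effectiveRatio(double configured) {
        return Math.max(configured, MIN_VCORE_RATIO);
    }

    /** Converts Mesos cpus to YARN vcores using the clamped ratio. */
    static double vcoresFor(double cpus, double configuredRatio) {
        return cpus / effectiveRatio(configuredRatio);
    }
}
```

A configured ratio of 0 (or any value below 0.1) is treated as 0.1 here, so the division stays finite and no more than 10 vcores are carved out of one CPU.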
    profileResourceMap.containsKey("mem")) {
      Long vcore = profileResourceMap.containsKey("vcore") ?
          Long.parseLong(profileResourceMap.get("vcore")) :
          Long.parseLong(profileResourceMap.get("cpu"));
Worth a warning somewhere in case both cpu and vcore are set?
Not sure why we need to use vcore and cpu interchangeably here. If cpu and the vcore ratio are defined, then vcores = cpu / ratio; if vcores are defined, then there is no need for the ratio, I think.
Should it be two separate params?
I think the cpu and vcore params here should be separate, rather than just setting vcore when cpu is defined, since cpu is a calculated param in this case.
@yufeldman
I understand your concern. In my opinion, separating the cpu and vcore params would be more confusing. It seems to me that we would have to set exactly two of the three params (cpu, vcore, ratio) and could not set all of them at the same time.
Also, we could set both cpu and vcore while leaving the ratio unset, in which case the ratio of a small NM and a medium NM may not be the same. As a result, the real CPU behind one vcore would differ across NMs.
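The "exactly two of three" constraint discussed in this thread could be sketched roughly as follows. This is an illustration of the trade-off only, not code from the patch; `ProfileCpuSketch` and `resolve` are hypothetical names, and it assumes ratio = cpu / vcore.

```java
// Hypothetical sketch of deriving the third value from the other two.
final class ProfileCpuSketch {
    /** Returns {cpu, vcore, ratio}; pass null for the one value left unset. */
    static double[] resolve(Double cpu, Double vcore, Double ratio) {
        int set = (cpu != null ? 1 : 0) + (vcore != null ? 1 : 0) + (ratio != null ? 1 : 0);
        if (set != 2) {
            throw new IllegalArgumentException("set exactly two of cpu, vcore, ratio");
        }
        for (Double v : new Double[] {cpu, vcore, ratio}) {
            if (v != null && v <= 0) {
                throw new IllegalArgumentException("values must be positive");
            }
        }
        if (cpu == null) {
            return new double[] {vcore * ratio, vcore, ratio}; // cpu derived
        }
        if (vcore == null) {
            return new double[] {cpu, cpu / ratio, ratio};     // vcore derived
        }
        return new double[] {cpu, vcore, cpu / vcore};         // ratio derived
    }
}
```

The last branch shows the concern raised above: when cpu and vcore are both given per profile, each profile ends up with its own implied ratio.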
yufeldman left a comment:
Also please test JHS, not just NMs
|
Sorry for updating this so late. I am out of town this week, and I will update the patch before next Friday. |
|
@taojieterry no worries, looking forward to the updates when they're ready! |
|
@DarinJ, I took a look at NMHeartBeatHandler.java and YarnNodeCapacityManager.java and found that all the logic that converts resources from a Mesos offer to YARN resources is in OfferUtils.getYarnResourcesFromMesosOffers. It seems to me that there is no longer any need to modify NMHeartBeatHandler.java and YarnNodeCapacityManager.java. Is that right? |
|
@taojieterry I'll take a look to make sure but you're likely correct. |
|
Updated the patch and added some unit tests. I also tested manually in my own environment (including FGS). |
|
@taojieterry Will check it out this week. Do you think it's at the point where you'd like it to be merged? |
|
Thank you @DarinJ, I think this patch and MYRIAD-247 and MYRIAD-250 are ready now. It has been running stably in my environment for a while. |
|
Hi @DarinJ, would you mind getting those patches merged into the master branch soon? They have been waiting here for a while :) |
|
@taojieterry sorry, I got caught up in a new job. Will try to find some time this week. |
Add a vcoreMultiplier field to NodeManagerConfiguration. The vcores of a NodeManager would then be the cpu in its profile multiplied by vcoreMultiplier.
e.g.:
Once a medium-size NodeManager is flexed up, it consumes 4.4 CPUs on Mesos, and the launched NodeManager has 8 vcores.
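The multiplication described here can be sketched as below. The class and method names are hypothetical, and the inputs (a medium profile with 4 cpus and a vcoreMultiplier of 2) are assumptions chosen to reproduce the 8 vcores mentioned; the 4.4 CPUs figure is not modeled, and rounding to a whole number of vcores is also an assumption.

```java
// Hypothetical sketch of the original proposal: vcores = profile cpu * vcoreMultiplier.
final class VcoreMultiplierSketch {
    /** Number of vcores the NodeManager would advertise, rounded to the nearest integer. */
    static long vcores(long profileCpu, double vcoreMultiplier) {
        return Math.round(profileCpu * vcoreMultiplier);
    }
}
```

For example, `VcoreMultiplierSketch.vcores(4, 2.0)` gives 8 under these assumptions.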