Skip to content

MYRIAD-249 Should set NodeManager vcores more flexibly#100

Open
taojieterry wants to merge 5 commits intoapache:masterfrom
taojieterry:MYRIAD-249
Open

MYRIAD-249 Should set NodeManager vcores more flexibly#100
taojieterry wants to merge 5 commits intoapache:masterfrom
taojieterry:MYRIAD-249

Conversation

@taojieterry
Copy link

add field vcoreMultplier in NodeManagerConfiguration. Then vcore of NodeManager would be cpu in profile muliply vcoreMultifier.
eg:

profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
  small:
    cpu: 2
    mem: 2048
  medium:
    cpu: 4
    mem: 4096
  large:
    cpu: 10
    mem: 12288
nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # <profile_name : instances>
rebalancer: false
haEnabled: false
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: false
  vcoreMultiplier: 2

Once flex up medium-size nodemanager, it consumes 4.4 CPUs on mesos, and launched nodemanager has 8 vcores.

@DarinJ
Copy link
Contributor

DarinJ commented Nov 29, 2016

I like the idea but think the profiles should reflect the vCores of the NM, so maybe change vcoreMultiplier to vcoreFraction (vcoreRatio??), so a medium size NM would still have 4 vcores but use only 2.2 cpus, easier book-keeping.

Also, you'll need to do some checks for FGS in NMHeartBeatManage and YarnNodeCapacityManager.

@taojieterry
Copy link
Author

@DarinJ ,I don't quite understand that the profiles should reflect the vCores of the NM.
Did you mean that we should config it like:

profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
    vcoreRatio: 2
  small:
    cpu: 2
    mem: 2048
    vcoreRatio: 2
  medium:
    cpu: 4
    mem: 4096
    vcoreRatio: 3
  large:
    cpu: 10
    mem: 12288
    vcoreRatio: 3

Should we have different vcoreRatio for each size of NM or a global field is enough?
Further more, I have tested it with CGroup disabled. I am not sure it will go well if I turn Cgroup on.

@DarinJ
Copy link
Contributor

DarinJ commented Nov 30, 2016

@taojieterry sorry if I wasn't clear, so I think the config should look like

profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
  small:
    cpu: 2
    mem: 2048
  medium:
    cpu: 4
    mem: 4096
  large:
    cpu: 10
    mem: 12288
nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # <profile_name : instances>
rebalancer: false
haEnabled: false
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: false
  vcoreRatio: .5

And if you start a medium profile NM it would launch with 4 vcores and 4096G mem, but only consume 2.2 mesos cpu shares. I think this will eliminate a small amount of confusion vs the original where the NM would have 8 vcores and 4096GB mem.

Make sense?

I think that things should work with cgroups. I can help you with the configuration necessary for testing and can test myself on some aws resources.

@taojieterry
Copy link
Author

@DarinJ thank you for reply and it makes sense to me now.
BTW, should we change "cpu" in profiles to "vcore" to make it less confusing? Maybe "cpu" should be still acceptable to keep compatible with earlier config file.

@DarinJ
Copy link
Contributor

DarinJ commented Dec 1, 2016

I think you're on the right track

@taojieterry
Copy link
Author

@DarinJ , is there anything else I can do before this patch get merged?

@DarinJ
Copy link
Contributor

DarinJ commented Dec 6, 2016

@taojieterry see my comment about mem vs cpu on line 55 of myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/OfferUtils.java. Also, we'll need to address the vcore ratio for fine grained scaling. The files you'll need to modify are incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/NMHeartBeatHandler.java and incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/YarnNodeCapacityManager.java.

@yufeldman do you have any comments?

@DarinJ
Copy link
Contributor

DarinJ commented Dec 6, 2016

@taojieterry I'll probably be testing you're other two PRs and merging the late this week. I'd like to merge this one at the same time but correctness is more important than expediency.

cpus += resource.getScalar().getValue();
} else if (resource.getName().equalsIgnoreCase("mem")) {
mem += resource.getScalar().getValue();
mem += resource.getScalar().getValue() / vcoreRatio;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do this with memory. Youll end up with ooms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should do CPU, looks like typo.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will fix it soon.

* Min vcore ratio for NodeManager
*/
public static final double MIN_VCORE_RATIO = 0.1;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a min of 1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed the .1

Copy link
Author

@taojieterry taojieterry Dec 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DarinJ
We'd better have a min value in case of devide by 0 or a very small value.
0.1 is somehow a empirical value here. I think more than 10 tasks share one CPU would not work efficiently. Also I am OK if you think another min value is better.

profileResourceMap.containsKey("mem")) {
Long vcore = profileResourceMap.containsKey("vcore") ?
Long.parseLong(profileResourceMap.get("vcore")) :
Long.parseLong(profileResourceMap.get("cpu"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a warning somewhere Incase both CPU and vcore set?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why do we need to use vcore or cpu interchangeably here. If it is cpu and vcore ratio is defined then vcores = cpu /ratio and if vcores are defined then no need for ratio I think.
Should it be two separate params?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cpu and vcore params here should be separate and not just setting vcore when cpu is defined, as cpu is calculated param here in this case

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yufeldman
I understand your concern. In my opinion, when we separate cpu and vcore params, it would be more confusing. It seems to me that we should set exactly two of the three params (cpu, vcore, ratio) but could not set all of them at the same time.
Also it seems that we could set both cpu and vcore at the same time, but ratio is not set. And the ratio of small NM and medium NM may not be the same. As a result, the real cpu of one vcore on NMs would be different.

Copy link

@yufeldman yufeldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please test JHS, not just NMs

profileResourceMap.containsKey("mem")) {
Long vcore = profileResourceMap.containsKey("vcore") ?
Long.parseLong(profileResourceMap.get("vcore")) :
Long.parseLong(profileResourceMap.get("cpu"));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why do we need to use vcore or cpu interchangeably here. If it is cpu and vcore ratio is defined then vcores = cpu /ratio and if vcores are defined then no need for ratio I think.
Should it be two separate params?

profileResourceMap.containsKey("mem")) {
Long vcore = profileResourceMap.containsKey("vcore") ?
Long.parseLong(profileResourceMap.get("vcore")) :
Long.parseLong(profileResourceMap.get("cpu"));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cpu and vcore params here should be separate and not just setting vcore when cpu is defined, as cpu is calculated param here in this case

@taojieterry
Copy link
Author

Sorry for update this so late. I am out of town this week, and the patch will be updated before next Friday.

@DarinJ
Copy link
Contributor

DarinJ commented Dec 12, 2016

@taojieterry no worries, looking forward to the updates when they're ready!

@taojieterry
Copy link
Author

@DarinJ , I took a look into NMHeartBeatHandler.java and YarnNodeCapacityManager.java, and found all logic that convert resource from mesos offer to Yarn resource is in OfferUtils.getYarnResourcesFromMesosOffers. It seems to me that there is no need to modify NMHeartBeatHandler.java and YarnNodeCapacityManager.java any more. is it?

@DarinJ
Copy link
Contributor

DarinJ commented Dec 14, 2016

@taojieterry I'll take a look to make sure but you're likely correct.

@taojieterry
Copy link
Author

Updated the patch and added some unittests, also I tested manually in my own environment(also tested fgs).
@DarinJ please take another look on this patch, thank you in advance.

@DarinJ
Copy link
Contributor

DarinJ commented Dec 20, 2016

@taojieterry Will checkout this week. Do you think it's at the point you'd like it to be merged?

@taojieterry
Copy link
Author

Thank you @DarinJ, I think this patch and Myraid-247&250 are ready now. It has been running stably in my environment for a while.
However I found it difficult to add unit test for Myriad-250, since we don't have something like mimiCluster to simulate the whole environment. If you have some advise, I would like to pay more effort on this.

@taojieterry
Copy link
Author

Hi @DarinJ , would you mind getting those patches merged to master branch recently? It has been waiting here for a while:)

@DarinJ
Copy link
Contributor

DarinJ commented Feb 5, 2017

@taojieterry sorry got caught up in a new job. Will try to find some time this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants