Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 19 additions & 1 deletion Source/Common/CommonBase.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -98,15 +98,26 @@ void CommonBase::MPISpinner()

int MSGBUFSIZ = 1024; // Individual incoming message buffer
char * MSGBUF = new char[MSGBUFSIZ]; // Pull it off the heap

const int MAX_PROBE_IDLE_BACKOFF_US=100000; // Maximum is 100,000us, i.e. 10hz
int probe_idle_backoff_us=0;
for (;;) {
// See if there are any MPI packets coming down the pipe
MPI_Status status; // Note the multi-threaded MPI probe
int flag;
MPI_Iprobe(MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,&flag,&status);
if (flag==0) { // Nothing there....
OnIdle(); // Guess
if(HaveIdleWork()){
OnIdle(); // Guess
Copy link
Contributor

@heliosfa heliosfa Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just having a think about this - shouldn't the sleep period be reset if there is idle work and OnIdle() is called? 🍠

Currently, the period is incremented any time there is no idle work, but it is not reset if there is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mental model was that the OnIdle was mostly draining work that arrived in messages, so it was something like:

  • Message arrives, causing work to queue up, resetting delay to zero
  • HaveIdleWork returns true until all the work is finished, with delay staying at zero
  • HaveIdleWork returns false (and presumably keeps returning false), so delay slowly increases

But if idle work can arrive from elsewhere (i.e. not from a message), then I guess that you'd
want any return from HaveIdleWork to reset the counter too.

}else{
probe_idle_backoff_us=std::min<int>(MAX_PROBE_IDLE_BACKOFF_US, std::max<int>(10, (probe_idle_backoff_us*5)/4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see some documentation about this, explaning how probe_idle_backoff_us decreases and increases incrementally as a function of workload.

Small note on styling - we generally try to adhere to the same style as neighbouring source when introducing modifications. In this case, lines of source in the Orchestrator source shouldn't span more than 79 characters, and we generally avoid underscores in variable names over camel case.

We do have a style guide, at https://github.com/POETSII/Orchestrator/wiki/Code-Style, but I don't want to dissuade you from contributing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really think about the schedule, it's just exponential backoff with a slow-ish
multiplier and a reset to zero on activity. I'm afraid there is no particular thought that went into it. So feel free to change it to better match work-load and so on.

In terms of style, I generally try to match where possible, but it is a balance
between fixing things for my ultimate goals (getting DPD working) versus trying
to package up into small patches that can be easily applied. So probably that
means more work to integrate the PRs, but I don't really expect them
to be directly applicable, as I don't really understand the orchestrator.
So feel free to improve however you want, and I'll just merge 1.0.0-alpha
back into my copy.

OSFixes::sleep( probe_idle_backoff_us / 1000 );
}
continue; // And try again
}

probe_idle_backoff_us=0;

int count;
MPI_Get_count(&status,MPI_CHAR,&count);
if (count > MSGBUFSIZ) { // Ensure we have the space for it
Expand Down Expand Up @@ -142,6 +153,13 @@ void CommonBase::OnIdle()

//------------------------------------------------------------------------------

bool CommonBase::HaveIdleWork()
{
return true; // For compatibility with existing users of OnIdle
}

//------------------------------------------------------------------------------

unsigned CommonBase::OnPmap(PMsg_p * Z)
{
pPmap->Register(Z);
Expand Down
1 change: 1 addition & 0 deletions Source/Common/CommonBase.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ virtual unsigned Decode(PMsg_p *) = 0;
void Dump(unsigned = 0,FILE * = stdout);
unsigned OnExit(PMsg_p *);
virtual void OnIdle();
virtual bool HaveIdleWork(); // Return false if there is something we want to do in OnIdle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this comment be // Return false if there is ... 🍠

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, comment is flipped. I think originally I had it as NoIdleWork, then changed the sense.

unsigned OnPmap(PMsg_p *);
unsigned OnSystPingAck(PMsg_p *);
unsigned OnSystPingReq(PMsg_p *);
Expand Down
19 changes: 16 additions & 3 deletions Source/LogServer/LogServer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -155,14 +155,27 @@ while (recd!=0) { // Walk the records in the section

//------------------------------------------------------------------------------

// DT10 : This was previously a static within LogServer::OnIdle. A static
// global here seems as problematic as there, so moved it out. It looked
// like a hack anyway.
static bool OnIdle_flag0 = false; // All the processes registered?

bool LogServer::HaveIdleWork()
{
if(logfp==0) return false;
if((OnIdle_flag0==false)&&(pPmap->vPmap.size()>=unsigned(Usize))){
return true;
}
return false;
}

void LogServer::OnIdle()
{
if (logfp==0) return; // May not yet have an output channel
static bool flag0 = false; // All the processes registered?
if ((flag0==false)&&(pPmap->vPmap.size()>=unsigned(Usize))) {
if ((OnIdle_flag0==false)&&(pPmap->vPmap.size()>=unsigned(Usize))) {
if (pPmap->vPmap.size()>unsigned(Usize))Post(113);
pPmap->Show(logfp);
flag0 = true; // Make sure we just do this once
OnIdle_flag0 = true; // Make sure we just do this once
}
CommonBase::OnIdle(); // Any base actions?
}
Expand Down
1 change: 1 addition & 0 deletions Source/LogServer/LogServer.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ string Assemble(int,vector<string> &);
void Dump(unsigned = 0,FILE * = stdout);
void InitFile();
void LoadMessages(string);
bool HaveIdleWork();
void OnIdle();
unsigned OnLoad(PMsg_p *);
unsigned OnLogP(PMsg_p *);
Expand Down
7 changes: 7 additions & 0 deletions Source/OrchBase/Handlers/CmCall.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,13 @@ fflush(fp);

//------------------------------------------------------------------------------

bool CmCall::IsEmpty()
{
return Equeue.empty();
}

//------------------------------------------------------------------------------

Cli CmCall::Front()
// Provides a valid Cm object: if there was one in the Q, it gets popped off
// the front; otherwise an empty one is returned.
Expand Down
1 change: 1 addition & 0 deletions Source/OrchBase/Handlers/CmCall.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ void CaEcho(Cli::Cl_t);
void CaFile(Cli::Cl_t);
void Dump(unsigned = 0,FILE * = stdout);
Cli Front();
bool IsEmpty();
void Show(FILE * = stdout);
unsigned operator()(Cli *);

Expand Down
6 changes: 6 additions & 0 deletions Source/RTCL/RTCL.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,12 @@ fprintf(fp,"%sRTCL dump ------------------------------------------------\n",os);
fflush(stdout);
}

bool RTCL::HaveIdleWork()
{
// We don't have OnIdle, so must never have work.
return false;
}

//------------------------------------------------------------------------------

unsigned RTCL::OnExit(PMsg_p * Z)
Expand Down
2 changes: 2 additions & 0 deletions Source/RTCL/RTCL.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ void Dump(unsigned = 0,FILE * = stdout);
unsigned OnExit(PMsg_p *);
unsigned OnRTCL(PMsg_p *);

bool HaveIdleWork();

public:
struct comms_t {
RTCL * pthis;
Expand Down
11 changes: 11 additions & 0 deletions Source/Root/Root.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,17 @@ fflush(fp);

//------------------------------------------------------------------------------

bool Root::HaveIdleWork()
{
// If a scheduled exit condition is satisfied, we have work to do.
if((pCmCall->IsEmpty() and exitOnEmpty) or (appJustStopped and exitOnStop)){
return true;
}

// Otherwise, simply check the queue.
return !pCmCall->IsEmpty();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return !pCmCall->IsEmpty()? ;p


void Root::OnIdle()
{
Cli Cm = pCmCall->Front(); // Anything in the batch queue?
Expand Down
1 change: 1 addition & 0 deletions Source/Root/Root.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ unsigned CmRetu(Cli *);
bool Config();
#include "Decode.cpp"
void Dump(unsigned = 0,FILE * = stdout);
bool HaveIdleWork();
void OnIdle();
unsigned OnInje(PMsg_p *);
unsigned OnKeyb(PMsg_p *);
Expand Down