Lazily allocate and initialise JTs. The goal is to support very large numbers of threads (JTALIGNBDY=1<<31, say, or perhaps just 26 for the sake of the rwlocks) while still having minimal startup resource usage on normal-size machines. I haven't added any allowance for decommitting these; presumably, if you need O(big) threads then you need O(big) threads, so it just doesn't seem very important.
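As an illustration of the idea in the commit message (this is not code from the commit), one common way to keep startup cost low while leaving room for many thread blocks is to reserve address space up front and commit each block only when a thread first needs it. A minimal POSIX sketch, where `BLOCK_SIZE`, `reserve_blocks` and `commit_block` are hypothetical names and the block size is an assumption:

```c
#define _DEFAULT_SOURCE   // for MAP_ANONYMOUS on glibc in strict modes
#include <stddef.h>
#include <sys/mman.h>

#define BLOCK_SIZE ((size_t)1 << 16)  // hypothetical per-thread block size (a page multiple)

// Reserve address space for n blocks without making any of it accessible.
// Returns NULL on failure.
static void *reserve_blocks(size_t n) {
    void *p = mmap(NULL, n * BLOCK_SIZE, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

// Make block i readable and writable. Anonymous pages read as zero on first touch.
// Returns the block address, or NULL on failure.
static void *commit_block(void *base, size_t i) {
    char *blk = (char *)base + i * BLOCK_SIZE;
    return mprotect(blk, BLOCK_SIZE, PROT_READ | PROT_WRITE) == 0 ? blk : NULL;
}
```

Under this scheme only the blocks that are actually committed and touched consume physical memory; nothing is ever decommitted, which matches the trade-off described in the message.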
@@ -86,7 +86,7 @@ I jtextendunderlock(J jt, A *abuf, US *alock, I flags){A z;
  // wakeallct keeps track of the number of wakealls being processed. When this is nonzero, a waiter must not allow the block pointed to by futexwt to go away. And, it must
  // consider that wakeall may have sampled futexwt before it was cleared. So, the waiter must wait for wakeallct to go to 0 before exiting.
  // Only a couple of threads can call wakeall (the leader, and a JBreak), so 1 byte suffices for wakeallct.
@@ -104,11 +104,11 @@ A jtsystemlock(J jt,I priority,A (*lockedfunction)(J)){A z;
  // Process the request. We don't know what the highest-priority request is until we have heard from all the
  // threads. Thus, it is possible that our request will still be pending when we finish. In that case, loop till it is satisfied
  while(priority!=0){
- S xxx=0; I leader=__atomic_compare_exchange_n(&JT(jt,systemlock), &xxx, (S)1, 0, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED); // go to state 1; set leader if we are the first to do so
+ I leader=__atomic_compare_exchange_n(&JT(jt,systemlock), &(S){0}, (S)1, 0, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED); // go to state 1; set leader if we are the first to do so
  I nrunning=0; JTT *jjbase=JTTHREAD0(jt); // #running threads, base of thread blocks
  // In the leader task only, go through all tasks (including master), turning on the SYSLOCK task flag in each thread. Count how many are running after the flag is set
  // Also, wake up all tasks that are in a loop that needs interrupting on system action. Those loops will honor it when we are in state 1/2
  // state 2: lock requesters indicate request priority and we wait for all tasks to come to a stop. We wake all threads that are waiting on pyx/mutex
  C oldpriority; DOINSTATE(leader,2,oldpriority=__atomic_fetch_or(&JT(jt,adbreak)[1],priority,__ATOMIC_ACQ_REL);) // remember priority before we made our request
  // state 3: all threads get the final request priorities
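The changed line in this hunk replaces the named temporary `xxx` with a C99 compound literal as the "expected" argument of the compare-exchange. The sketch below shows that pattern in isolation; `try_become_leader` is a hypothetical name and the `S` typedef is an assumed stand-in for J's short integer type:

```c
#include <stdint.h>

typedef int16_t S;  // assumption: stand-in for J's short integer type

// Try to move *state from 0 to 1. Returns 1 for the single caller that
// succeeds (the "leader") and 0 for every other caller.
static int try_become_leader(S *state) {
    // &(S){0} is the address of an unnamed temporary holding the expected value 0.
    // On failure the builtin writes the observed value into that temporary,
    // which is simply discarded.
    return __atomic_compare_exchange_n(state, &(S){0}, (S)1, 0,
                                       __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
}
```

The compound literal is only appropriate when the observed value is not needed after a failed exchange, as is the case here, where only the success flag is used.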
@@ -132,7 +132,7 @@ A jtsystemlock(J jt,I priority,A (*lockedfunction)(J)){A z;
  if(executor){
  __atomic_store_n(&((C*)&JT(jt,breakbytes))[1],0,__ATOMIC_RELEASE); // clear the error flag from the interrupt request
  // go through all threads, turning off SYSLOCK in each. This allows other tasks to run and new tasks to start
// we use the 6 LSBs of jobq->ht[0] as the lock, so that when we get the lock we also have the job pointer. The job is always on a cacheline boundary
  // We take JOBLOCK before taking the mutex, always. By measurement (20220516 SkylakeX, 4 cores) the job lock keeps contention low until the tasks are < 400ns
  // long, while using the mutex gives out at < 1000ns
- _Static_assert(MAXTHREADS<64,"JOBLOCK fails if > 63 threads");
+ _Static_assert(MAXTHREADSINPOOL<64,"JOBLOCK fails if > 63 threads");
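The comments above rely on the job block being cacheline-aligned, which leaves the 6 low bits of its address free to carry a small value; that is why the assertion bounds the thread count below 64. The following sketch shows only the bit packing, not the locking protocol itself; `LOCKMASK`, `pack`, `unpack_ptr` and `unpack_field` are hypothetical names, and a 64-byte cacheline is assumed:

```c
#include <stdint.h>

#define LOCKMASK ((uintptr_t)63)  // low 6 bits, free when the block is 64-byte aligned

// Combine a 64-byte-aligned pointer with a 6-bit field (0..63) in one word.
static uintptr_t pack(void *p, unsigned field) {
    return (uintptr_t)p | (field & LOCKMASK);
}

// Recover the pointer by clearing the low 6 bits.
static void *unpack_ptr(uintptr_t w) { return (void *)(w & ~LOCKMASK); }

// Recover the 6-bit field.
static unsigned unpack_field(uintptr_t w) { return (unsigned)(w & LOCKMASK); }
```

Because pointer and field share one word, a single atomic operation on that word can update the field and read the pointer together, which is the property the comment describes.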
jt->rngdata=(RNG*)(((I)malloc(sizeof(RNG)+CACHELINESIZE)+CACHELINESIZE-1)&-CACHELINESIZE); mvc(sizeof(RNG),jt->rngdata,1,MEMSET00); // place to hold RNG data, aligned to cacheline
- }
+ // initialise shared buffers
+ static B jtbufferinits(JS jjt){
+ R !!(INITJT(jjt,breakfn)=calloc(1,NPATH)); // place to hold the break filename
+ }
+
+ // initialise thread-local buffers for thread threadno. Requires synchronisation
jmrelease(jt,sizeof(JST)); //the jt block itself can be released; we effectively orphan any blocks pointed to there by, because they are used by the globals we've just initialised
  }

  // Init for a new J instance. Globals have already been initialized.
  // Create a new jt, which will be the one we use for the entirety of the instance.
  JS _stdcall JInit(void){
- if(!dll_initialized)R 0; // constructor failed
- JS jtnobdy;
- RZ(jtnobdy=malloc(sizeof(JST)+JTALIGNBDY-1));
- JS jt= (JS)(((I)jtnobdy+JTALIGNBDY-1)&-JTALIGNBDY); // force to SDRAM page boundary
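The removed `JInit` lines obtain an aligned block by over-allocating with `malloc` and rounding the address up to the next boundary. A self-contained sketch of that rounding, with hypothetical helper names `align_up` and `alloc_aligned` and assuming the alignment is a power of two:

```c
#include <stdint.h>
#include <stdlib.h>

// Round addr up to the next multiple of align (align must be a power of two).
// For unsigned values, -align equals ~(align-1), the mask that clears the low bits.
static uintptr_t align_up(uintptr_t addr, uintptr_t align) {
    return (addr + align - 1) & -align;
}

// Allocate size bytes aligned to align. *raw receives the pointer that must
// later be passed to free(); the aligned pointer itself must not be freed.
static void *alloc_aligned(size_t size, size_t align, void **raw) {
    *raw = malloc(size + align - 1);
    if (*raw == NULL) return NULL;
    return (void *)align_up((uintptr_t)*raw, align);
}
```

The cost of this approach grows with the alignment, since up to `align-1` bytes are wasted per allocation, which becomes significant when the boundary is as large as the values mentioned in the commit message.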