Atomic store operations are not detected on cached private pages

### Issue
When an ArgoDSM page is cached by a remote node, and the page is in private state (**caching node is single writer** or **no writer, caching node is sharer**), subsequent atomic writes to the page are not detected on this node by regular reads even after self-invalidation.

The question is whether this is a bug (I believe so), or whether the semantics of the ArgoDSM atomic functions simply do not define such behavior in the first place. This is related to #20, but is further exposed by Ioannis' work on memory allocation policies, which causes some of the atomic tests to fail when the allocated data is owned by a node other than 0.

### Reproduction
The following test exposes the bug on 2 or more nodes with the naive allocation policy. To display the second case (no writer, caching node is sharer), substitute `*counter = 0;` with `volatile int temp = *counter;`. The important point is that `counter` is allocated on node 0, and that _exactly one_ other node performs a read or write to it.
```
#include <cstdio>
#include "argo/argo.hpp"

int main(){
	argo::init(1*1024*1024);
	argo::data_distribution::global_ptr<int> counter(argo::conew_<int>());
	
	if(argo::node_id()==1) {
		*counter = 0; // Node 0 owns the data, node 1 becomes single writer
	}
	argo::barrier(); // Barrier makes every node aware of the initialization
	
	// Atomically increment counter on each node
	for(int i=0; i<10; i++){
		argo::backend::atomic::fetch_add(counter, 1);
	}

	argo::barrier(); // Make sure every node has completed execution
	if(*counter == argo::number_of_nodes()*10) {
		printf("Node %d successful (counter: %d).\n", argo::node_id(), *counter);
	}else{
		printf("Node %d failed (counter: %d).\n", argo::node_id(), *counter);
	}
	
	argo::finalize();
}

```

### Detail
This issue is courtesy of the following optimization:
https://github.com/etascale/argodsm/blob/5f9b5728abf3967f8f6725375a690e1e1c61d6f1/src/backend/mpi/swdsm.cpp#L984-L994
The reason is that ArgoDSM atomics do not alter the Pyxis directory (`globalSharers`) state. Cached remote pages in _single writer_ or _no writer, shared_ state are therefore not invalidated upon self-invalidation, causing the node to miss updates until the state of the page changes.
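To make the two private cases concrete, here is a minimal standalone sketch of the keep-or-invalidate decision. `DirEntry` and `kept_on_self_invalidation` are hypothetical names for illustration and are not part of ArgoDSM; the sketch assumes, as in the referenced condition, that each node is identified by a single bit in the sharer and writer masks.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two directory words kept per page.
struct DirEntry {
    std::uint64_t sharers; // bitmask of nodes registered as readers
    std::uint64_t writers; // bitmask of nodes registered as writers
};

// Returns true if node `id` (one bit set) keeps its cached copy
// of the page when it self-invalidates.
bool kept_on_self_invalidation(const DirEntry& e, std::uint64_t id) {
    // Case 1: this node is the only registered writer.
    bool single_writer = (e.writers == id);
    // Case 2: nobody writes and this node is registered as a sharer.
    bool no_writer_sharer = (e.writers == 0) && ((e.sharers & id) == id);
    return single_writer || no_writer_sharer;
}
```

Since an atomic write from another node leaves the entry unchanged, an entry such as `{2, 2}` (node 1 is single writer) stays in the "kept" case no matter how many atomic updates reach the home node.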

### Solution?
The fact that cached private pages are not downgraded to shared on atomic writes means that it is never completely safe to mix atomic writes and regular reads/writes. I believe that the correct solution would be to write atomic changes to the cache _and_ to update local (and remote when needed as a result) Pyxis directories to the correct state.
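A rough way to see the effect of this proposal is a two-node toy model, sketched below. This is not ArgoDSM code: `ToyDsm`, `run`, and the `update_directory` flag are hypothetical, write-back is collapsed into an immediate store, and the directory is a single shared struct rather than per-node copies. With `update_directory == false` the atomics bypass the directory as described above; with `true` they register the writer and refresh the writer's own cached copy.

```cpp
#include <array>
#include <cstdint>

constexpr int kNodes = 2; // hypothetical two-node toy system

struct ToyDsm {
    int home = 0;                     // authoritative value at the home node
    std::array<int, kNodes> cache{};  // per-node cached copy of the page
    std::array<bool, kNodes> valid{}; // whether that cached copy is valid
    std::uint64_t sharers = 0;        // directory: reader bitmask
    std::uint64_t writers = 0;        // directory: writer bitmask
};

inline std::uint64_t bit(int node) { return std::uint64_t{1} << node; }

// Regular write: registers the node as sharer and writer and caches the value.
void regular_write(ToyDsm& d, int node, int v) {
    d.sharers |= bit(node);
    d.writers |= bit(node);
    d.cache[node] = v;
    d.valid[node] = true;
    d.home = v; // written back immediately to keep the toy short
}

// Atomic add on the home copy; optionally also updates directory and cache.
int atomic_fetch_add(ToyDsm& d, int node, int v, bool update_directory) {
    int old = d.home;
    d.home += v;
    if (update_directory) {
        d.sharers |= bit(node);
        d.writers |= bit(node);
        if (d.valid[node]) d.cache[node] = d.home; // keep own copy current
    }
    return old;
}

// Self-invalidation: private pages are kept, everything else is dropped.
void self_invalidate(ToyDsm& d, int node) {
    bool single_writer = (d.writers == bit(node));
    bool no_writer_sharer =
        (d.writers == 0) && ((d.sharers & bit(node)) == bit(node));
    if (!(single_writer || no_writer_sharer)) d.valid[node] = false;
}

// Regular read: served from the cache if valid, otherwise fetched from home.
int regular_read(ToyDsm& d, int node) {
    if (!d.valid[node]) {
        d.cache[node] = d.home;
        d.valid[node] = true;
        d.sharers |= bit(node);
    }
    return d.cache[node];
}

// Replays the reproduction: node 1 initializes, every node adds 1 ten times,
// then node 1 self-invalidates and reads the counter with a regular read.
int run(bool update_directory) {
    ToyDsm d;
    regular_write(d, 1, 0);
    for (int n = 0; n < kNodes; ++n)
        for (int i = 0; i < 10; ++i)
            atomic_fetch_add(d, n, 1, update_directory);
    self_invalidate(d, 1);
    return regular_read(d, 1);
}
```

In this model `run(false)` returns the stale 0 on node 1, while `run(true)` returns 20, because node 0's atomic writes turn the page into a multiple-writer page that node 1 no longer keeps across self-invalidation.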

	if(
		// node is single writer
		(globalSharers[classidx+1]==id)
		||
		// No writer and assert that the node is a sharer
		((globalSharers[classidx+1]==0) && ((globalSharers[classidx]&id)==id))
	){
		MPI_Win_unlock(workrank, sharerWindow);
		touchedcache[i] =1;
		/*nothing - we keep the pages, SD is done in flushWB*/
	}
