Table "assigned_resources" may be inconsistent, leading to phoenix ignoring some nodes #177

**Symptoms**:
By default, Phoenix does not reboot suspected nodes that still have jobs running. It does this by excluding nodes that have a resource in the `CURRENT` state in the `assigned_resources` table. We noticed that our phoenix instance always ignores some nodes that no longer have any jobs running on them.

**The suspected bug**:
A closer look at our OAR database revealed that, for at least one job, we had the following error:
`2020-05-24 00:02:14> EXIT_VALUE_OAREXEC:[bipbip 36324341] error of oarexec, exit value = 61; the job 36324341 is in Error and the node luke17 is Suspected; If this job is of type cosystem or deploy, check if the oar server is able to connect to the corresponding nodes, oar-node started`
The `luke17` node was never rebooted by phoenix after this date.
We also found that the corresponding resource was still in the `CURRENT` state in the `assigned_resources` table, while the job's moldable description had already moved to `LOG`:

```
 moldable_job_id | resource_id | assigned_resource_index
-----------------+-------------+-------------------------
        36324736 |         391 | CURRENT

 moldable_id | moldable_job_id | moldable_walltime | moldable_index
-------------+-----------------+-------------------+----------------
    36324736 |        36324341 |              3600 | LOG
```

Removing the inconsistent `CURRENT` entry solved the problem.
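
For anyone who wants to check their own database, here is a minimal sketch of how such stale entries could be detected. It runs against an in-memory SQLite database that mimics the two tables shown above, so it is only an illustration and not the actual OAR schema. The table name `moldable_job_descriptions` is my assumption (the output above only shows its columns), and the second, consistent row is made-up data added for contrast.

```python
import sqlite3

# Find assigned_resources rows still marked CURRENT although the matching
# moldable description is already LOG. The join uses the columns visible in
# the output above; the name "moldable_job_descriptions" is an assumption.
STALE_CURRENT_QUERY = """
SELECT ar.moldable_job_id, ar.resource_id, m.moldable_job_id AS job_id
FROM assigned_resources AS ar
JOIN moldable_job_descriptions AS m
  ON m.moldable_id = ar.moldable_job_id
WHERE ar.assigned_resource_index = 'CURRENT'
  AND m.moldable_index = 'LOG'
ORDER BY ar.moldable_job_id, ar.resource_id
"""


def find_stale_current(conn):
    """Return (moldable_id, resource_id, job_id) tuples for stale CURRENT rows."""
    return conn.execute(STALE_CURRENT_QUERY).fetchall()


def demo_connection():
    """Build an in-memory stand-in for the two tables (not the real OAR schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assigned_resources (
            moldable_job_id INTEGER,
            resource_id INTEGER,
            assigned_resource_index TEXT
        );
        CREATE TABLE moldable_job_descriptions (
            moldable_id INTEGER,
            moldable_job_id INTEGER,
            moldable_walltime INTEGER,
            moldable_index TEXT
        );
        -- The inconsistent pair reported in this issue.
        INSERT INTO assigned_resources VALUES (36324736, 391, 'CURRENT');
        INSERT INTO moldable_job_descriptions VALUES (36324736, 36324341, 3600, 'LOG');
        -- Hypothetical consistent pair (made-up ids), which must not be reported.
        INSERT INTO assigned_resources VALUES (1, 2, 'LOG');
        INSERT INTO moldable_job_descriptions VALUES (1, 10, 3600, 'LOG');
        """
    )
    return conn


if __name__ == "__main__":
    for moldable_id, resource_id, job_id in find_stale_current(demo_connection()):
        print(f"job {job_id}: resource {resource_id} still CURRENT (moldable {moldable_id})")
```

The same `SELECT` could probably be run directly against the real database to list candidates before deleting or fixing anything by hand.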

So maybe the `EXIT_VALUE_OAREXEC` error case, when launching a job, does not switch the `CURRENT` entry to `LOG` in `assigned_resources`?

