This file gives an overview of the structure of the defect HT project and
should be updated whenever important new information comes up. It also describes
how we plan to do things in order to obtain consistent results and have a consistent
way of running calculations, extracting data, and visualizing results.
It is structured the following way:
0. Folder structure
1. General
2. Submitting and running calculations
3. How to deal with failed calculations
4. How to extract data
5. How to plot things
6. Misc
0. Folder Structure
===================
Let's try to keep the folder structure as simple as possible. I have structured the
current version of our project in the following way (changes can be applied, and I am
happy to hear good suggestions from you; a sketch of the resulting layout is given at
the end of this chapter):
- Our projects directory is: /home/niflheim2/cmr/WIP/defects
- First, we have three trees, one for each of us to run our respective calculations
in, i.e. tree-sajid, tree-simone, tree-fabian. Within those folders we have the host
materials, as well as all of our defect folders and materials. How the defect folders
are structured is described in the docstring of asr.setup.defects.
- For each of those trees, I set up separate queues for MyQueue such that all the
calculations we run are correctly tracked.
- Second, the folder 'venv/': in this folder, I set up the virtual environment with all
of the packages we need (GPAW, ASE, ASR, MyQueue and more). In general, we should not
change anything here, unless it is absolutely vital. If so, I would take care of the
specifics here.
- Third, there are some important files in here. For example, 'workflow.py', 'materials.db',
'get_materials.py'. I suggest that we aim to keep it to the essentials here. All of the
special scripts, plots, databases and so on should be stored in the designated folders, as
will be discussed in the next points.
- The 'databases/' folder: of course, we want to have our big final database in the end.
On top of that, it might also be necessary to do some pre-analysis and have several
smaller databases to figure things out. We store all of those databases in here, and
also make several backups in order to ensure we don't mess anything up.
- The 'scripts/' and 'plots/' folders: the names speak for themselves here. All of the
extraction, analysis, ... scripts should be in the first one of those, whereas all
scripts that are connected to actually plotting data will be stored in the 'plots/'
folder. Note that within those respective folders, I have created subfolders for the
different phases of the project; see chapter 1 (General). Depending on which phase
of the project your script/plotting routine focuses on, it needs to be stored in the
respective folder.
- Naming convention for scripts and plotting routines:
* All plotting scripts and extraction scripts are supposed to be stored in the
'scripts/' folder and the respective subfolder 'phase{1,2,3,4}/'.
* Naming convention: '{type}.{label}.py'.
* 'type' specifies what kind of script you are dealing with. Possible types
are: 'db' (a script that extracts some kind of database), 'plot' (plotting script),
'analyse' (script to analyse outcomes of calculations like validity checks, error
analysis, general analysis that doesn't end up in a database), 'add' (adding information
or changing files for compatibility with analysis scripts), ...
* 'label' should give a very short label on what is done within the script.
* For example, a script that extracts a database containing information about the
host systems without defects could be called 'db.hosts.py'.
* Any other suggestions for naming conventions here?
- I created a file ('ISSUES.txt') where we should collect all of the things that need
to be fixed or that are not working at the moment. With this, we can keep track of
all open issues without forgetting them.
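- To summarize this chapter, the layout of the projects directory looks roughly as
sketched below (only the folders and files mentioned above are shown; the defect
subfolders inside the trees follow asr.setup.defects):

    /home/niflheim2/cmr/WIP/defects/
        workflow.py               # the common workflow
        materials.db              # host materials
        get_materials.py
        ISSUES.txt                # collect open issues here
        venv/                     # shared virtual environment (don't change)
        tree-sajid/               # one tree per person, each with a 'misc/' subfolder
        tree-simone/
        tree-fabian/
        databases/                # final database, smaller databases, backups
        scripts/phase{1,2,3,4}/   # extraction/analysis scripts, '{type}.{label}.py'
        plots/phase{1,2,3,4}/     # plotting-related scripts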
1. General
==========
- We want to always use the same versions of software. Therefore, I created a virtual
environment, in which we should aim to run everything. Although MyQueue always detects
the correct virtual environment when submitting jobs, I suggest we always activate
the environment before we start working on the project. To do so, all you need is:
source '/home/niflheim2/cmr/WIP/defects/venv/activate'.
- I have set up MyQueue for us already, and I think it is best if each of us has
their own queue within their tree. Having said that, I ensured that all of us have
the correct permissions so that we can still submit jobs in each other's trees. That
might come in handy during the extraction phase of the project (phase 3). Note: in
general, let's still aim to restrict ourselves to running things in our own trees and
only go into the other ones for specific purposes.
- As mentioned in chapter 0 (Folder Structure), I propose to split the project into
four phases. They are:
* Phase 1: Setting things up for the big project, analysis of different parts
before the actual launch, setting up the workflow, and so on...
* Phase 2: Running the workflow, analysing the progress as we go, conducting
sanity checks, tracking the progress, possibly rescoping, ...
* Phase 3: After the calculations have run, we focus on extracting data, analysing
results, doing performance analysis, and so on.
* Phase 4: Presenting the database
2. Submitting and Running Calculations
======================================
- In general, we aim to always submit the calculations using the workflow we have set up.
You can find it in the main folder under 'workflow.py'. Within this workflow, I have
added some simple checks of whether you submitted it in the right folders, but always
take care to submit it correctly. Also, keep in mind to always use the '-z' option of
MyQueue first, in order to do a dry run. After that is done, you can submit the
workflow for good. Note: the workflow in the main folder will probably be updated many
times and, therefore, it might be a good idea to also copy it to your respective
tree. Note: always submit a workflow with 'mq workflow workflow.py folders/' (a minimal
sketch of such a workflow file is shown at the end of this chapter).
- Sometimes, there might be tasks that you want to submit without the workflow. When doing
that, please ensure to also submit it with MyQueue such that we can always keep track
of what we actually ran and what we didn't.
- For calculations and analyses that are not part of the workflow or the general outline of
the database, I would suggest that we always create individual folders (but not within
the respective materials, in order to make sure we can easily extract things afterwards).
Thus, I have created a 'misc/' folder within each of our trees that is dedicated to those
kinds of 'unplanned' calculations. This way, we keep a clear structure in our material
trees.
- Within that 'misc/' folder, you can set up whichever folder structure you need or want.
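- For reference, here is a minimal sketch of what a MyQueue workflow file can look like.
The recipe names and resource strings below are only placeholders, and the exact API
depends on the MyQueue version (the create_tasks() style is assumed here); the actual
'workflow.py' in the main folder is the authoritative version:

    # Hypothetical workflow sketch -- see the real workflow.py for the actual recipes.
    from myqueue.task import task

    def create_tasks():
        return [
            task('asr.relax@24:10h'),                 # placeholder resources
            task('asr.gs@8:2h', deps=['asr.relax']),  # runs after the relaxation
        ]

As mentioned above, do a dry run first and then submit for real:

    mq workflow -z workflow.py folders/
    mq workflow workflow.py folders/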
3. How to Deal with Failed, Running, and Finished Calculations
==============================================================
- Always try to avoid removing calculations from MyQueue. When calculations fail, run out
of time, are cancelled, or the like, try resubmitting them with improved parameters,
settings, or similar, in order to ensure we log all of our calculations with MyQueue
correctly.
- When calculations fail, try to figure out what went wrong and, depending on the outcome,
resubmit the calculation.
- If possible, try to implement scripts that can analyse running and finished calculations and
conduct simple validity checks in order to see whether a calculation makes sense. Those
checks should be more thorough than the MyQueue status, but also should not amount to
analysing the entire calculation (a sketch of such a check is given at the end of this
chapter). We should probably discuss this at some point.
- At some point we need to write scripts that give us an overview of the status of the
calculations. (Just as a reminder to keep in mind.)
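- As a starting point for such validity checks, a script along the lines of the sketch
below could be used (following the naming convention it might be called
'analyse.status.py'). The tree path and the results-file name are assumptions and
should be adapted to whatever we actually want to check:

    # analyse.status.py -- hypothetical example of a simple check:
    # list all folders in a tree that are missing a results file.
    from pathlib import Path

    tree = Path('/home/niflheim2/cmr/WIP/defects/tree-fabian')
    missing = [folder for folder in sorted(tree.glob('*'))
               if folder.is_dir()
               and not (folder / 'results-asr.gs.json').is_file()]

    for folder in missing:
        print(f'Missing results-asr.gs.json: {folder}')
    print(f'{len(missing)} folders without results')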
4. How to Extract Data
======================
- Aim to store everything in ASE database format. This can come in handy when plotting
results at a later stage. Of course, in some cases that is not possible; there, you
can choose other methods to achieve your goal.
- One important point is how we specifically store results in the big database; we
should discuss this soon.
- Whenever we extract data, we should try to write general scripts that can be run on
the entire tree (a sketch of such a script is given at the end of this chapter).
Depending on the phase your script belongs to, save it in the corresponding folder as
discussed in chapter 0 (Folder Structure).
- Before extracting the big database, it might be advantageous to also check different
smaller parts or save subsets. When doing that, save the resulting databases in the
'databases/' folder.
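- As an illustration of such an extraction script (which, following the naming
convention, could be called 'db.hosts.py'), here is a sketch. The folder layout,
file names, and key-value pairs are placeholders; it assumes relaxed structures are
stored as 'structure.json' in each host folder:

    # db.hosts.py -- hypothetical example: collect host structures into an
    # ASE database stored in the shared databases/ folder.
    from pathlib import Path
    from ase.io import read
    from ase.db import connect

    tree = Path('/home/niflheim2/cmr/WIP/defects/tree-fabian')
    db = connect('/home/niflheim2/cmr/WIP/defects/databases/hosts.db')

    for folder in sorted(tree.glob('*')):
        structure = folder / 'structure.json'   # assumed file name
        if not (folder.is_dir() and structure.is_file()):
            continue
        atoms = read(structure)
        # Key-value pairs make it easy to select and plot rows later on.
        db.write(atoms, host=folder.name)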
5. How to Plot Things
=====================
- Similar to the extraction part, always aim to develop general plotting scripts that
are applicable to all of our systems and that can be reused. If possible, plot
things using the data that is saved in an ASE database (see the sketch at the end of
this chapter).
- All plotting scripts are supposed to be saved in the 'scripts/' folder and the
respective 'phase{1,2,3,4}/' folder, depending on the stage of the project they belong
to. Always name plotting scripts in the following way: 'plot.{}.py', where you put
a short label within the '{}' that lets others know what can be found in the script.
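- As an illustration, a plotting script of this kind could look like the sketch below
(e.g. 'plot.gaps.py'). The database name and the 'gap' key are assumptions; it simply
plots one stored quantity for all rows that have it:

    # plot.gaps.py -- hypothetical example of a reusable plotting script that
    # works directly on an ASE database instead of the calculation folders.
    import matplotlib.pyplot as plt
    from ase.db import connect

    db = connect('/home/niflheim2/cmr/WIP/defects/databases/hosts.db')

    names, gaps = [], []
    for row in db.select('gap'):   # only rows that actually have a 'gap' key
        names.append(row.formula)
        gaps.append(row.gap)

    plt.bar(range(len(gaps)), gaps)
    plt.xticks(range(len(names)), names, rotation=90)
    plt.ylabel('Band gap [eV]')
    plt.tight_layout()
    plt.savefig('plot.gaps.png')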
6. Misc
=======