DisplacedTauAnalysis

To run skimming on NanoAODs:

  1. Run ./shell coffeateam/coffea-dask-almalinux9:latest to enter a Singularity container

    a. These are all my current package versions:

    - uproot 5.6.2
    - awkward 2.7.4
    - dask-awkward 2025.2.0
    - dask 2024.8.0
    - coffea 2025.1.1
    - numpy 1.24.0
    - hist 2.8.0
    - dask-histogram 2025.2.0
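
    To check that your environment matches, one convenience snippet (not part of the repo) prints the installed versions:

    import importlib.metadata as md
    # packages pinned in the list above; the printed versions should match it
    for pkg in ["uproot", "awkward", "dask-awkward", "dask",
                "coffea", "numpy", "hist", "dask-histogram"]:
        print(pkg, md.version(pkg))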
    
  2. Next run ulimit -n 20000 -- this circumvents a "Too many open files" error
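
    If you'd rather raise the limit from inside Python instead (an alternative sketch, not what the repo does), the standard-library resource module works on Linux:

    import resource
    # raise the soft limit on open file descriptors, up to the hard limit
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(20000, hard), hard))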

  3. Create a dictionary of the datasets like in fileset.py
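
    The exact contents live in fileset.py; as a rough sketch, coffea's dataset tools expect a mapping from dataset name to files, with each file path mapped to its tree name (the dataset name and path below are made up):

    fileset = {
        "Stau_signal_example": {
            "files": {
                "root://cmseos.fnal.gov//store/user/someone/nano_1.root": "Events",
            },
        },
    }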

  4. Add the dictionary to preprocess.py and copy the structure of one of the loops to dump the output of the preprocess step to a pkl file
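
    The core of that step looks roughly like this (a sketch built on coffea's dataset_tools; the step size is a placeholder, and the output name matches the file loaded in step 5b):

    import pickle
    from coffea.dataset_tools import preprocess

    # split each input file into uniform chunks and record the result
    dataset_runnable, dataset_updated = preprocess(
        fileset, step_size=100_000, skip_bad_files=True
    )
    with open("preprocessed_fileset.pkl", "wb") as f:
        pickle.dump(dataset_runnable, f)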

  5. Once that's done running, the first skimming step is handled by mutau_bkgd_skim.py

    a. The second argument in Line 243 changes where the output root files are written

    out_file = uproot.dask_write(out_dict, "root://cmseos.fnal.gov//store/user/dally/first_skim/"+events.metadata['dataset'], compute=False, tree_name='Events')
    b. Line 264 is where you load the pkl file from preprocess.py
    with open("preprocessed_fileset.pkl", "rb") as f:
        Stau_QCD_DY_dataset_runnable = pickle.load(f)
    c. Line 296 is there so that a dataset isn't skimmed again when you rerun after other datasets have failed. Change the directory to wherever you're saving the root files
    if samp not in os.listdir("/eos/uscms/store/user/dally/first_skim/"):
    d. Line 297 is there so that if there are only some datasets you want to run over, you can skip the ones you don't. You can comment it out if it's not useful to you
    if "TT" in samp or "DY" in samp or "Stau" in samp or "QCD" in samp or "MET" in samp: continue
    e. If a dataset has been processed successfully, you should see the printout from line 315
    print(f"Finished in {elapsed:.1f}s")
    f. If it errors out, the jobs should close themselves
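
    For orientation, the overall pattern inside a skim script like this is roughly the following (a generic coffea dataset_tools sketch, not the script's actual code; the skim function and its cut are illustrative):

    import dask
    import awkward as ak
    from coffea.dataset_tools import apply_to_fileset
    from coffea.nanoevents import NanoAODSchema

    def skim(events):
        # keep events with at least one muon above 20 GeV (illustrative cut only)
        return events[ak.any(events.Muon.pt > 20, axis=1)]

    out = apply_to_fileset(skim, Stau_QCD_DY_dataset_runnable,
                           schemaclass=NanoAODSchema)
    dask.compute(out)  # triggers the actual processing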

  6. Next, I usually run mergeFiles.py in the directory the root files have been written to, but this is not necessary.

  7. Create another dictionary with the output root files from the first skimming step and follow step 5 for this dictionary
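
    One way to build that dictionary (a hypothetical helper, not a repo script) is to list the skim output directory:

    import os

    skim_dir = "/eos/uscms/store/user/dally/first_skim/"  # adjust to your area
    second_fileset = {
        samp: {
            "files": {
                "root://cmseos.fnal.gov//store/user/dally/first_skim/"
                + samp + "/" + fname: "Events"
                for fname in os.listdir(skim_dir + samp)
                if fname.endswith(".root")
            },
        }
        for samp in os.listdir(skim_dir)
    }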

  8. The second skimming step is done by selection_cuts.py. Steps 5a-5f apply here at the respective lines.

  9. Again, I usually run mergeFiles.py here since it makes plotting faster

  10. Plotting is done in plotting_processor_mu.py

    a. Lines 327-329 will dump the QCD histograms into a pkl file to use later on in ROCC-coffea.py, which is where we plot the isolation ROC curves: https://github.com/dally96/DisplacedTauAnalysis/blob/main/plotting_processor_mu.py#L327-L329
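
    The dump amounts to something like this (a sketch; qcd_hists is a placeholder name for the histograms the processor builds, and the file name matches the one loaded in step 11):

    import pickle

    with open("muon_QCD_hists_Iso_Displaced.pkl", "wb") as f:
        pickle.dump(qcd_hists, f)  # qcd_hists is a hypothetical name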

  11. Load the pkl file into ROCC-coffea.py

    with open("muon_QCD_hists_Iso_Displaced.pkl", "rb") as file:
        QCD_hists = pickle.load(file)  # illustrative variable name

  12. Make sure the bins match the ones from plotting_processor_mu.py
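
    For example, if plotting_processor_mu.py booked its isolation axis like the illustrative definition below, ROCC-coffea.py needs identical edges (copy the real values from the processor):

    import hist

    # illustrative binning only; use the actual values from plotting_processor_mu.py
    iso_axis = hist.axis.Regular(50, 0, 5, name="iso", label="Muon isolation")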

  13. Now run the script, and it should output ROC curves
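
    For reference, the ROC points can be built from a signal and a background isolation histogram with cumulative sums (a generic sketch; sig_h and bkg_h are hypothetical 1D hist.Hist objects, not names from the script):

    import numpy as np

    sig, bkg = sig_h.values(), bkg_h.values()
    # efficiency to pass an upper cut on isolation, scanned over the bin edges
    sig_eff = np.cumsum(sig) / sig.sum()
    bkg_eff = np.cumsum(bkg) / bkg.sum()
    # plotting sig_eff against bkg_eff traces out the ROC curve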
