Description
Currently, HistProducerFileTask can quickly end up submitting a very large number of jobs.
The job (branch) map scales as nInputFiles * nPlots.
Since it is common to have thousands of input files (each dataset is split down to individual nano files) and tens of plots, the branch map can become very large; see the sketch after the status output below.
(flaf_env) [daebi@lxplus930 HH_bbWW]$ law run HistProducerFileTask --period Run3_2022 --version may22 --print-status 0,1
print task status with max_depth 0 and target_depth 1
0 > HistProducerFileTask(effective_workflow=htcondor, branch=-1, version=may22, period=Run3_2022, customisations=, test=False, n_cpus=1, workflow=htcondor)
jobs: LocalFileTarget(fs=local_fs, path=/afs/cern.ch/work/d/daebi/diHiggs/HH_bbWW/data/HistProducerFileTask/may22/Run3_2022/htcondor_jobs_0To12780.json, optional)
existent
collection: TargetCollection(len=12780, threshold=12780.0)
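For context, here is a minimal sketch (not the actual HistProducerFileTask code; the function and argument names are assumptions) of how a branch map built as the cartesian product of input files and plots reaches this size:

```python
# Hypothetical sketch of a branch map built as the cartesian product of
# input files and plots; not taken from the actual HistProducerFileTask code.
import itertools

def create_branch_map(input_files, plots):
    """Map each branch index to one (input file, plot) pair."""
    return {
        i: {"file": f, "plot": p}
        for i, (f, p) in enumerate(itertools.product(input_files, plots))
    }

# Thousands of nano files times tens of plots quickly reaches the ~12780
# branches reported by --print-status above.
```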
The accepted solution was to use --tasks-per-job 10 or so, reducing the number of condor jobs by a factor of 10.
But this has another issue: since --tasks-per-job groups adjacent branches, it bundles similar files together, in order.
The problem is that a job full of quick 'small' files finishes very quickly, while a job full of slow 'large' files finishes very, very slowly.
Take TTto4Q vs TTto2L2Nu as an example -- this method will put 10 TTto4Q files together, which finish quickly because almost no events pass the lepton requirements, while 10 TTto2L2Nu files grouped together finish very slowly because each file has many events passing the lepton requirements.
I do not have a solution to this, but any better branch organization would help. Even running --print-status 0,1 takes 20 minutes, since it must index roughly 13,000 jobs. Could HistProducerFileTask run on a hadded file for the whole dataset instead? Or could --tasks-per-job mix branches randomly to smooth out per-job runtimes? A rough sketch of the latter idea is below.
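To illustrate that second idea, here is a minimal sketch (not FLAF or law code; the function and argument names are assumptions) of grouping branches into jobs after a deterministic shuffle, so that fast and slow datasets end up mixed within each job:

```python
# Hypothetical sketch of a grouping strategy that mixes branches from
# different datasets into each condor job, instead of chunking them in order.
import random

def grouped_branches(branch_map, tasks_per_job, seed=0):
    """Shuffle branch ids before chunking so fast and slow datasets mix."""
    branch_ids = list(branch_map)
    random.Random(seed).shuffle(branch_ids)  # seeded, so the grouping is reproducible
    return [
        branch_ids[i:i + tasks_per_job]
        for i in range(0, len(branch_ids), tasks_per_job)
    ]

# Each resulting group holds ~tasks_per_job branches drawn from across datasets,
# so a single job is unlikely to get 10 slow TTto2L2Nu files at once.
```

A round-robin over datasets would give a similar effect if a fixed, non-random ordering is preferred.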