
Reduce file duplication in MadGraph #389

@tomeichlersmith

Description


Just reporting some statistics here: the current method for storing MadGraph files for later running duplicates a lot of code. MG5 is bad, but MG4 is even worse, with some files copied over 40 times. The tarball below has full listings of all the files and their md5sums, as well as a sorted list of unique md5sums, each with a corresponding example file. This does not catch duplicated code within files, but it is a start.

mg-unique-file-listing.tar.gz

How

calculate md5sum of each file [1]

cd generators/madgraphN
fd -tf -x md5sum | sort > md5sum.list

get unique files sorted by number of copies (-w 32 makes uniq compare only the 32-character md5 hash at the start of each line)

uniq -c -w 32 md5sum.list | sort -nr > uniq.list
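
To see where the copies of a heavily duplicated file actually live, the listing can be queried again. The following is only a sketch: `list_copies` is a hypothetical helper name, and it assumes `md5sum.list` has the default md5sum layout (32-character hash, two separator characters, then the path) and that no file names needed escaping.

```shell
# Hypothetical helper: print every path that shares the most-duplicated checksum.
# Assumes each line of the listing is "<32-char md5>  <path>".
list_copies() {
    list="${1:-md5sum.list}"
    # Count occurrences of each hash and keep the most frequent one.
    top=$(cut -c1-32 "$list" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')
    # The path starts at column 35 (32 hash characters plus 2 separators).
    grep "^$top" "$list" | cut -c35-
}
```

Running `list_copies md5sum.list` inside generators/madgraphN should then list all locations of the single most-copied file, which may help decide where a shared copy could live.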

Footnotes

  1. Using fd instead of find here since it's faster. The find equivalent is find -type f -exec md5sum {} ';' | sort > md5sum.list
