# amrith/file-dedup
A simple file deduplication utility
Identifies files that are potentially duplicates of each other.
The package also includes the file-dedup library, which can be used on its own:
```python
from file_dedup.utils import compute_hashes

hashes = compute_hashes(path)
```
`hashes` is a dictionary that looks like:

```python
{ 'hashvalue1': [
    { 'name': name, 'size': size },
    ... ],
  'hashvalue2': [...]
}
```

In other words, for each hash (key), the value is a list of
dictionaries, where each element has a name and a size.
For example:

```json
{
  "62779c8df215502849dce5f6d8321caa": [
    {"name": "/tmp/utils.pyc", "size": 1899}],
  "3f04e1bcf72988422be91bf9d2791aea": [
    {"name": "/tmp/dups.py", "size": 526},
    {"name": "/tmp/dups2.py", "size": 526}],
  "5cba436660471bf7a9de6fe412b29e64": [
    {"name": "/tmp/utils.py", "size": 1235}],
  "3ec03444b4db80ef14f5448f19731858": [
    {"name": "/tmp/hashes.py", "size": 493}]
}
```
An example (hashes.py) is provided which illustrates this.
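To make the returned structure concrete, here is a minimal sketch of what a function like `compute_hashes` might do, assuming it walks the directory tree and groups files by an MD5 digest of their contents (the 32-character hex keys in the example above are consistent with MD5, but the real implementation may differ):

```python
import hashlib
import os

def compute_hashes_sketch(path):
    """Sketch only: walk `path`, hash every file's contents, and group
    {"name": ..., "size": ...} entries by digest, mirroring the dict
    shape that compute_hashes returns."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            with open(full, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            entry = {"name": full, "size": os.path.getsize(full)}
            hashes.setdefault(digest, []).append(entry)
    return hashes
```

Two files with identical contents end up in the same list under one digest, which is what makes the later duplicate-filtering step possible.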
To retrieve only those files which are duplicates:

```python
from file_dedup.utils import duplicates_hashes

hashes = duplicates_hashes(path)
```
An example (duplicates.py) is provided which illustrates this.
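Presumably `duplicates_hashes` keeps only the hash groups containing more than one file; a sketch of that filtering, applied to a `compute_hashes`-style dictionary (the function name `duplicates_only` is illustrative, not the library's API):

```python
def duplicates_only(hashes):
    """Sketch only: given {digest: [{"name": ..., "size": ...}, ...]},
    keep the groups with more than one entry, i.e. the files that are
    potential duplicates of each other."""
    return {digest: files for digest, files in hashes.items() if len(files) > 1}

# Example with a hand-built dict in the documented shape:
sample = {
    "h1": [{"name": "/tmp/a", "size": 1}],
    "h2": [{"name": "/tmp/b", "size": 2}, {"name": "/tmp/c", "size": 2}],
}
print(duplicates_only(sample))  # only the "h2" group remains
```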