Consider submission to other public archives (archive.org, archive.is) #12

@zmanion

ArchiveBox supports submission to archive.org:

SAVE_ARCHIVE_DOT_ORG=True # set to False to disable submitting all URLs to Archive.org when archiving

https://github.com/ArchiveBox/ArchiveBox/blob/dev/archivebox/pkgs/abx-plugin-archivedotorg/abx_plugin_archivedotorg/archive_org.py

Not sure what our current ArchiveBox configuration is, but as best I recall, ArchiveBox doesn't do a lot of management, just fires off an archive.org curl command and does some minimal error/retry checking.

For the CVE capability, consider independently submitting references to at least archive.org and archive.is (which I believe is also archive.ph and maybe archive.today). Manage submissions better than ArchiveBox does. For example, check whether the domain is live, whether the URL works, and whether the URL is already in the archive and how old the last capture is; use an API key; check responses; and be sane/polite about retries.
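
To make the checks above concrete, here is a rough Python sketch of the decision and retry logic only. Everything in it is an assumption for illustration: the function names `should_submit` and `retry_delay`, the 30-day freshness window, and the backoff numbers are hypothetical, not anything ArchiveBox or either archive prescribes. Liveness checking and the actual submission call are left out on purpose.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_submit(
    url_is_live: bool,
    last_capture: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumed freshness window
) -> bool:
    """Decide whether a URL is worth submitting to an archive.

    Hypothetical helper: skip dead URLs, submit anything never captured,
    and re-submit only when the newest capture is older than max_age.
    """
    if not url_is_live:
        return False
    if last_capture is None:
        return True
    return now - last_capture > max_age


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 5.0,   # assumed starting delay in seconds
    cap: float = 300.0,  # assumed upper bound for our own backoff
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    If the server supplied a Retry-After value, honor it (never waiting
    less than `base`); otherwise use capped exponential backoff.
    """
    if retry_after is not None:
        return max(float(retry_after), base)
    return min(base * (2 ** attempt), cap)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_submit(True, now - timedelta(days=90), now))
    print([retry_delay(n) for n in range(4)])
```

Keeping these as pure functions means the policy can be tuned and tested without touching the network; the HTTP client, API key handling, and response checking would wrap around them.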

I've hacked this up for archive.org and am happy to share details, for example that available?url sometimes lies and cdx/search/cdx?url is more reliable.
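
As a sketch of the cdx/search/cdx?url approach (not the code I actually run), the Python below builds a CDX query and pulls the newest capture time out of the JSON response. Assumptions to verify against the Wayback CDX server documentation: the query parameters `output=json`, `fl`, `filter=statuscode:200`, and `limit=-1`, and that the JSON response is a list of rows whose first row is the field-name header. The helper names `build_cdx_query` and `latest_capture` are made up for this example, and the HTTP fetch itself is omitted.

```python
import json
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

# Wayback Machine CDX endpoint (the cdx/search/cdx path mentioned above).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str) -> str:
    """Build a CDX query URL asking for the most recent HTTP 200 capture.

    Parameter names are assumed from the CDX server documentation.
    """
    params = {
        "url": url,
        "output": "json",
        "fl": "timestamp,statuscode",
        "filter": "statuscode:200",
        "limit": "-1",  # assumed: negative limit returns the last N rows
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def latest_capture(cdx_json: str) -> Optional[datetime]:
    """Return the newest capture time in a CDX JSON response, or None.

    Assumes the first row is a header naming the fields and that
    timestamps are 14-digit UTC strings (YYYYMMDDhhmmss).
    """
    text = cdx_json.strip()
    if not text:
        return None
    rows = json.loads(text)
    if len(rows) < 2:  # empty result, or header only
        return None
    header, *records = rows
    ts_index = header.index("timestamp")
    # Take the max rather than trusting row order.
    newest = max(record[ts_index] for record in records)
    return datetime.strptime(newest, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
```

The result of `latest_capture` is what you would compare against a freshness threshold before deciding whether to submit the URL again.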

I have not done this for archive.is, but others have:

https://github.com/HRDepartment/archivetoday

https://webapps.stackexchange.com/questions/148066/how-do-i-archive-a-webpage-to-archive-today-using-wget-or-curl
