
Command to export a site to Hugo #117

@veselosky

Exporting Content to Hugo

Create a management command that exports a site to a directory in a format suitable for use as Hugo content.

Requirements

Required arguments:

  • site: the site to export, identified by id or domain. Only one site can be exported at a time. By default, the entire site is exported.
  • outdir: the directory where output should be written. This should be the top-level Hugo project directory. The exporter will create a content subdirectory and, if necessary, an assets subdirectory. The command will create outdir if it does not exist. If outdir already exists, there are three possible behaviors: remove and recreate, overwrite, and skip. The default is skip. See below for details of each behavior.
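The arguments above could be registered roughly as follows. This is a minimal sketch: the issue names the behaviors and a force argument but not the flag spellings, so `--behavior`, the choice value `recreate`, `--force`, and the helper names are hypothetical. Django's `BaseCommand.add_arguments()` receives a `CommandParser`, which subclasses `argparse.ArgumentParser`, so the same calls should work there.

```python
import argparse

# Hypothetical spellings for the three outdir behaviors described in the issue.
BEHAVIORS = ("skip", "overwrite", "recreate")


def add_export_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register the exporter's arguments on an argparse-compatible parser."""
    parser.add_argument("site", help="Site to export, by id or domain.")
    parser.add_argument("outdir", help="Top-level Hugo project directory.")
    parser.add_argument(
        "--behavior",
        choices=BEHAVIORS,
        default="skip",  # safest default: never overwrite existing files
        help="What to do if outdir already exists (default: skip).",
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="In overwrite mode, rewrite files even if they are newer.",
    )
    return parser


def parse_site_arg(value: str):
    """Treat an all-digit value as a site id, anything else as a domain."""
    return int(value) if value.isdigit() else value
```

For example, `add_export_arguments(argparse.ArgumentParser()).parse_args(["example.com", "hugo-site"])` yields `behavior="skip"` and `force=False`.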

To facilitate using Common Content as an admin interface for Hugo, we should support incremental export.

Create an actions.py file to house the export functions. Provide separate functions: export_page (used for all BasePage subclasses) and export_site. The management command will call export_site.

When exporting a site to an existing directory, care must be taken not to destroy existing content unless specifically directed. Therefore the default export behavior is "skip".

The "skip" export behavior means no files will be overwritten during export. If a page or resource already exists in the output directory, it is skipped. Only newly added pages or resources will be exported.

The "overwrite" export behavior allows incremental updates to previously exported content: pages and resources are updated to their latest versions in Common Content. For efficiency, in "overwrite" mode the exporter should check the modification time of each existing file. If the existing file is newer than the object being exported, it may be skipped unless the force argument is true.
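One way to express the per-file decision for "skip" and "overwrite" is sketched below. The function name `should_write` and the behavior string `"recreate"` are hypothetical, and `obj_modified` is assumed to be a timezone-aware datetime such as the object's date_modified.

```python
import os
from datetime import datetime, timezone


def should_write(path: str, behavior: str, obj_modified: datetime, force: bool = False) -> bool:
    """Decide whether the exported file at `path` should be (re)written."""
    if not os.path.exists(path):
        return True  # new pages and resources are always exported
    if behavior == "skip":
        return False  # never overwrite existing files
    if behavior == "overwrite":
        if force:
            return True
        file_mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        # Skip when the file on disk is newer than the object being exported.
        return file_mtime <= obj_modified
    if behavior == "recreate":
        return True  # outdir was emptied beforehand, so writing is safe
    raise ValueError(f"unknown export behavior: {behavior!r}")
```

Comparing against the file's mtime assumes nothing else touches exported files; hand edits made in the Hugo project would also make a file look "newer" and cause it to be skipped, which matches the intent of not clobbering newer content.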

The "remove and recreate" behavior deletes all files in the target directory before the export begins. Because this could result in permanent data loss, the command should require user confirmation before using this behavior.
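A minimal sketch of preparing outdir with a confirmation step. The name `prepare_outdir`, the behavior string `"recreate"`, and the injected `confirm` callback are hypothetical; in a management command, `confirm` might be something like `lambda msg: input(msg).strip().lower() == "y"`, while tests can pass a stub.

```python
import os
import shutil
from typing import Callable


def prepare_outdir(outdir: str, behavior: str, confirm: Callable[[str], bool]) -> bool:
    """Create outdir, wiping it first for "recreate" only if the user agrees.

    Returns False when the user declines the destructive reset, so the
    caller can abort without exporting anything.
    """
    if behavior == "recreate" and os.path.exists(outdir):
        if not confirm(f"Delete ALL files in {outdir!r} before exporting? [y/N] "):
            return False
        shutil.rmtree(outdir)
    os.makedirs(outdir, exist_ok=True)  # create outdir if it does not exist
    return True
```

Injecting the prompt keeps the destructive path testable without simulating keyboard input.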

Content to be exported

Content should be exported in breadth-first order by model.

First: HomePage. Only the home page that is current at the time of export will be exported.
Second: Page. Export all.
Third: Section. Export all.
Fourth: Article. Export all.
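The model-by-model ordering could drive export_site as sketched here. Passing `get_pages` and `export_page` in as callables is an assumption made for illustration (it also makes it easy to mock export_page when testing export_site, as the Testing section allows); `get_pages(site, model_name)` is a hypothetical hook that, for "HomePage", should yield only the site's current home page.

```python
from typing import Callable, Iterable

# Model order from the issue: HomePage, then Page, Section, Article.
EXPORT_ORDER = ("HomePage", "Page", "Section", "Article")


def export_site(site, outdir: str,
                get_pages: Callable[[object, str], Iterable],
                export_page: Callable[[object, str], None]) -> int:
    """Export every page of `site`, finishing one model before the next.

    Returns the number of pages passed to export_page.
    """
    count = 0
    for model_name in EXPORT_ORDER:
        for page in get_pages(site, model_name):
            export_page(page, outdir)
            count += 1
    return count
```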

For each page exported:

  • if the page has a share_image, that image should also be exported.
  • if the page has an associated image collection (article_images, page_images, section_images, homepage_images), all images in the collection should be exported.
  • if the page has an attachment_set, all attachments should be exported.
  • if the HTML content of the page contains img tags that reference local URLs, the corresponding media assets or static files should be exported. (If the file cannot be located in media or static, log an error and continue.)

See Exporting Media below for details.
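Finding the img tags that reference local URLs can be done with the standard library's `html.parser`, as in this sketch. Treating "local" as "no host, or the site's own domain" is an assumption; mapping the returned paths onto MEDIA_URL/STATIC_URL and locating the files (logging an error when one is missing) would be a separate step. The helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a fragment."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; <img .../> also lands here.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def local_image_paths(html: str, site_domain: str) -> list:
    """Return URL paths of <img> sources that point at this site."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    collector.close()
    paths = []
    for src in collector.srcs:
        url = urlparse(src)
        # Excludes data: URIs and images hosted on other domains.
        if url.scheme in ("", "http", "https") and url.netloc in ("", site_domain):
            paths.append(url.path)
    return paths
```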

In addition to the content stored in models, any files from static/$domain/ should be copied to the assets directory of the exported site, preserving their relative paths (a file at static/$domain/css/site.css becomes assets/css/site.css).
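Copying the per-domain static files might look like the sketch below. `static_dir` is assumed to be the directory containing the per-domain folders (the Testing section suggests it is derived from STATIC_ROOT); the function name and the `skip_existing` flag are hypothetical, with `skip_existing=True` mirroring the default "skip" behavior.

```python
import os
import shutil


def copy_static_assets(static_dir: str, domain: str, outdir: str,
                       skip_existing: bool = True) -> list:
    """Copy <static_dir>/<domain>/ into <outdir>/assets/, keeping relative paths.

    Returns the copied files' paths relative to the assets directory.
    """
    src_root = os.path.join(static_dir, domain)
    dest_root = os.path.join(outdir, "assets")
    copied = []
    if not os.path.isdir(src_root):
        return copied  # this site has no static files
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            dest = os.path.normpath(os.path.join(dest_root, rel_dir, name))
            if skip_existing and os.path.exists(dest):
                continue  # "skip" mode: never overwrite
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(os.path.join(dirpath, name), dest)  # also preserves mtime
            copied.append(os.path.relpath(dest, dest_root))
    return copied
```

Because `shutil.copy2` preserves the source file's modification time, a later "overwrite" run that compares mtimes sees the static file's own age rather than the time of the previous export.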

Exporting pages

By default, pages will be exported in the HTML content format as supported by Hugo. The Page models store content in HTML format, so this is the best way to ensure no data or formatting is lost.

Each page will be exported as a Hugo page bundle (regardless of whether it has any page resources). The output path within the content directory will be the path returned by instance.get_absolute_url(). The file will, by Hugo conventions, be called index.html for Page and Article, and _index.html for HomePage and Section.
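The mapping from get_absolute_url() to the bundle's index file can be a small pure function, sketched here under the assumption that get_absolute_url() returns a URL path such as "/news/my-story/". HomePage and Section become branch bundles (`_index.html`), the rest leaf bundles (`index.html`); the function name is hypothetical.

```python
import posixpath

# Branch bundles use _index.html; leaf bundles use index.html.
BRANCH_MODELS = {"HomePage", "Section"}


def bundle_index_path(absolute_url: str, model_name: str) -> str:
    """Return the index file's path, relative to the Hugo project root."""
    rel = absolute_url.strip("/")  # "/" becomes "", i.e. the content root
    filename = "_index.html" if model_name in BRANCH_MODELS else "index.html"
    return posixpath.join("content", rel, filename)
```

For example, `bundle_index_path("/news/", "Section")` gives `content/news/_index.html`.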

The exported HTML pages will contain YAML front matter. The following is a list of front matter fields and their sources. Fields with empty or null values should be omitted.

  • title: from instance.title
  • subtitle: from instance.subtitle
  • seo_title: from instance.seo_title
  • description: from instance.seo_description
  • draft: false if instance.status is USABLE, otherwise true
  • dateCreated: from instance.date_created
  • publishdate: from instance.date_published
  • lastmod: from instance.date_modified
  • expirydate: from instance.expires
  • copyright: from instance.copyright_notice
  • icon_name: from instance.icon_name
  • cover: from instance.share_image.url
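The field list above can be encoded as a table, with empty or null values dropped, as in this sketch. `USABLE = "usable"` is a placeholder for the project's real status constant, and the function names are hypothetical. To avoid a PyYAML dependency, each value is written as a JSON scalar; YAML 1.2 is designed as a superset of JSON, so quoted strings, booleans and numbers written this way should parse as YAML (`yaml.safe_dump` would be a reasonable alternative).

```python
import json
from datetime import date

USABLE = "usable"  # hypothetical: replace with the project's actual USABLE status

# (front matter key, model attribute), in the order listed in the issue.
FIELD_SOURCES = [
    ("title", "title"), ("subtitle", "subtitle"), ("seo_title", "seo_title"),
    ("description", "seo_description"), ("dateCreated", "date_created"),
    ("publishdate", "date_published"), ("lastmod", "date_modified"),
    ("expirydate", "expires"), ("copyright", "copyright_notice"),
    ("icon_name", "icon_name"),
]


def build_front_matter(instance) -> dict:
    fm = {}
    for key, attr in FIELD_SOURCES:
        value = getattr(instance, attr, None)
        if value is None or value == "":
            continue  # omit empty or null values
        if isinstance(value, date):  # also matches datetime
            value = value.isoformat()
        fm[key] = value
    fm["draft"] = instance.status != USABLE
    share_image = getattr(instance, "share_image", None)
    if share_image:
        fm["cover"] = share_image.url
    return fm


def render_front_matter(fm: dict) -> str:
    # ensure_ascii=False keeps non-ASCII text literal instead of \u escapes.
    lines = [f"{key}: {json.dumps(value, ensure_ascii=False)}" for key, value in fm.items()]
    return "---\n" + "\n".join(lines) + "\n---\n"
```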

The content for the index file will come from instance.body.

If the page has a value for instance.description, this should be output as a page resource called rich_description.html.
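Writing the bundle's files could then be a thin helper, sketched below. It assumes the front matter has already been rendered to text and leaves the skip/overwrite decision to the caller; the function name and parameters are hypothetical.

```python
import os


def write_page_bundle(bundle_dir: str, index_name: str, front_matter: str,
                      body: str, rich_description: str = "") -> list:
    """Write the index file and, when present, rich_description.html.

    Returns the paths written, index file first.
    """
    os.makedirs(bundle_dir, exist_ok=True)
    written = []
    index_path = os.path.join(bundle_dir, index_name)
    with open(index_path, "w", encoding="utf-8") as f:
        f.write(front_matter)
        f.write(body or "")  # content comes from instance.body
    written.append(index_path)
    if rich_description:  # only when instance.description has a value
        desc_path = os.path.join(bundle_dir, "rich_description.html")
        with open(desc_path, "w", encoding="utf-8") as f:
            f.write(rich_description)
        written.append(desc_path)
    return written
```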

Exporting Media

Uploaded media will be exported as Hugo page resources. The resource path should be determined by the .url property of the MediaObject subclass. (Note that all resource paths are relative to the page bundle directory.)

To prevent data loss, Common Content MediaObjects need some special attention when exporting to Hugo. Hugo does not have a native data model for Resources, so we will export a YAML file containing metadata for each resource. For ease of association, the name of this file will be the name of the resource file with .yaml appended (example: resource=cover.jpg, metadata=cover.jpg.yaml).

For Attachments, the metadata fields exported should be: title, mime_type, upload_date, tags.

For Images, the metadata fields exported include those from Attachment, plus: alt_text, width, height.
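The sidecar naming rule and the two field sets might be expressed as follows. This sketch assumes `tags` is already a plain list of strings (a tag manager on a Django model would need converting first), keeps null values rather than omitting them since the issue does not say, and reuses the JSON-scalar approach to emit YAML. All names are hypothetical.

```python
import json
from datetime import date

ATTACHMENT_FIELDS = ("title", "mime_type", "upload_date", "tags")
IMAGE_FIELDS = ATTACHMENT_FIELDS + ("alt_text", "width", "height")


def metadata_filename(resource_name: str) -> str:
    """cover.jpg -> cover.jpg.yaml"""
    return resource_name + ".yaml"


def resource_metadata(obj, is_image: bool) -> dict:
    fields = IMAGE_FIELDS if is_image else ATTACHMENT_FIELDS
    meta = {}
    for field in fields:
        value = getattr(obj, field, None)
        if isinstance(value, date):  # also matches datetime
            value = value.isoformat()
        meta[field] = value
    return meta


def render_metadata_yaml(meta: dict) -> str:
    # JSON scalars and arrays are valid YAML 1.2 (lists become flow sequences).
    return "".join(f"{k}: {json.dumps(v, ensure_ascii=False)}\n" for k, v in meta.items())
```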

Testing

The export_page and export_site functions should be thoroughly tested. Use Django's TestCase. Use database fixtures rather than mocks to set up test scenarios for export_page (for testing export_site, the export_page function may be mocked). For testing export of Image and Attachment, tiny test files may be included in the test data, for example an empty image or text file. Ensure any created temporary files are cleaned up.

To test export of site static files, create a static/$domain directory in the test data, and use @override_settings to set the STATIC_ROOT for the duration of that test.
