Exporting Content to Hugo
Create a management command to export a site to a directory in a format suitable to use as Hugo content.
Requirements
Required arguments:
- site: the site to be exported, by id or domain. Only one site at a time can be exported. By default, the entire site is exported.
- outdir: the directory where output should be written. This should be the top-level Hugo project directory. The exporter will create a content subdirectory and if necessary an assets subdirectory. The command will create outdir if it does not exist. If outdir does exist, there are 3 possible behaviors: remove and recreate, overwrite, and skip. Default is skip. See below for behavior details.
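The arguments above could be declared roughly as follows. This is a minimal sketch using plain argparse, which is what Django passes to a command's add_arguments method. The option names --existing and --force, their values, and the program name export_hugo are assumptions made for illustration and are not fixed by this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the export command (all option names are illustrative)."""
    parser = argparse.ArgumentParser(prog="export_hugo")
    # The two required arguments named in the requirements.
    parser.add_argument("site", help="Site to export, given as an id or a domain")
    parser.add_argument("outdir", help="Top-level Hugo project directory")
    # Hypothetical option selecting the behavior when outdir already exists.
    parser.add_argument(
        "--existing",
        choices=["skip", "overwrite", "recreate"],
        default="skip",
        help="What to do when outdir already exists (default: skip)",
    )
    # Hypothetical flag: in overwrite mode, ignore file modification dates.
    parser.add_argument("--force", action="store_true")
    return parser


args = build_parser().parse_args(["example.com", "hugo-site"])
print(args.site, args.outdir, args.existing, args.force)
```

In a real management command the same add_argument calls would go inside add_arguments, and handle would pass the parsed values on to export_site.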
To facilitate using Common Content as an admin interface for Hugo, we should support incremental export.
Create an actions.py file to house the export functions. Ensure separate functions for export_page (used for all BasePages) and export_site. The management command will call export_site.
When exporting a site to an existing directory, care must be taken not to destroy existing content unless specifically directed. Therefore the default export behavior is "skip".
The "skip" export behavior means no files will be overwritten during export. If a page or resource already exists, we will skip exporting it again. Only newly added pages or resources will be exported.
The "overwrite" export behavior allows incremental site updates to previously exported content. Page content or resources will be updated to the latest version in Common Content. For efficiency, in "overwrite" mode, the exporter should check the modified date of existing files. If the existing file is newer than the object being exported, it may be skipped, unless the force argument is true.
The "remove and recreate" behavior deletes all files in the target directory before the export begins. Because this could result in permanent data loss, the command should require user confirmation before using this behavior.
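The per-file decision implied by the three behaviors can be sketched as a single function. The name should_write, the mode strings, and the parameter names are hypothetical. The sketch assumes the object's modification time is timezone-aware, and that in "recreate" mode the directory has already been emptied by the time files are written.

```python
from datetime import datetime, timezone
from pathlib import Path


def should_write(dest: Path, mode: str, modified: datetime, force: bool = False) -> bool:
    """Decide whether an exported file should be written to dest.

    mode is "skip", "overwrite", or "recreate"; modified is the timezone-aware
    modification time of the object being exported.
    """
    if mode not in {"skip", "overwrite", "recreate"}:
        raise ValueError(f"unknown export mode: {mode!r}")
    if not dest.exists():
        return True  # new pages and resources are exported in every mode
    if mode == "skip":
        return False  # never touch a file that is already there
    if mode == "recreate" or force:
        return True
    # Overwrite mode: skip when the file on disk is newer than the object.
    file_time = datetime.fromtimestamp(dest.stat().st_mtime, tz=timezone.utc)
    return file_time <= modified
```

Keeping this check in one place lets export_page and the media export share the same rules.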
Content to be exported
Content should be exported in breadth-first order by model.
First: HomePage: export only the current home page at the time of export.
Second: Page: export all.
Third: Section: export all.
Fourth: Article: export all.
For each page exported:
- if the page has a share_image, that image should also be exported.
- if the page has an associated image collection (article_images, page_images, section_images, homepage_images), all images in the collection should be exported.
- if the page has an attachment_set, all attachments should be exported.
- if the HTML content of the page contains img tags that reference local URLs, the corresponding media assets or static files should be exported. (If the file cannot be located in media or static, log an error and continue.)
See under Exporting Media for details.
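Finding the img tags that reference local URLs can be done with the standard library HTML parser. The sketch below is one possible approach, not a prescribed one. The names LocalImageCollector and local_image_sources are hypothetical, and "local" is assumed to mean a URL with neither a scheme nor a host.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LocalImageCollector(HTMLParser):
    """Collect the paths of img tags whose src points at a local URL."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        parts = urlsplit(src)
        # Assumption: a URL without a scheme or host is local to the site.
        if not parts.scheme and not parts.netloc and parts.path:
            self.sources.append(parts.path)


def local_image_sources(html: str) -> list[str]:
    """Return the local image paths referenced by the given HTML, in order."""
    collector = LocalImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

Each returned path would then be looked up in media and static, logging an error and continuing when neither contains the file.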
In addition to the content stored in models, any files from static/$domain/ should be copied to the assets directory of the exported site, using the same relative path (that is, relative to static/$domain/ and assets).
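Copying the site's static files while preserving relative paths might look like the sketch below. The function name copy_site_static and its parameters are assumptions. For brevity it always copies, so a real implementation would also need to apply the skip and overwrite rules described earlier.

```python
import shutil
from pathlib import Path


def copy_site_static(static_root: Path, domain: str, outdir: Path) -> list[Path]:
    """Copy static/<domain>/ files into outdir/assets, keeping relative paths."""
    source_dir = static_root / domain
    copied: list[Path] = []
    if not source_dir.is_dir():
        return copied  # nothing to copy for this site
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        # Same path relative to static/<domain>/ and to assets/.
        target = outdir / "assets" / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```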
Exporting pages
By default, pages will be exported in the HTML content format as supported by Hugo. The Page models store content in HTML format, so this is the best way to ensure no data or formatting is lost.
Each page will be exported as a Hugo page bundle (regardless of whether it has any page resources). The output path within the content directory will be the path returned by instance.get_absolute_url(). The file will, by Hugo conventions, be called index.html for Page and Article, and _index.html for HomePage and Section.
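As an illustration of the path rule, the following sketch maps a page URL and model name to the index file of its bundle. The helper name bundle_index_path and the BRANCH_MODELS set are hypothetical, and the model is identified by its class name purely for simplicity.

```python
from pathlib import Path

# Models exported as _index.html, per the rule above.
BRANCH_MODELS = {"HomePage", "Section"}


def bundle_index_path(content_dir: Path, absolute_url: str, model_name: str) -> Path:
    """Return the index file path for a page bundle under content_dir."""
    filename = "_index.html" if model_name in BRANCH_MODELS else "index.html"
    # Strip slashes so "/" maps to content_dir itself and "/a/b/" to content_dir/a/b.
    relative = absolute_url.strip("/")
    return content_dir / relative / filename


print(bundle_index_path(Path("content"), "/news/hello/", "Article"))
```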
The exported HTML pages will contain YAML front matter. The following is a list of front matter fields and their sources. Fields with empty or null values should be omitted.
title: from instance.title
subtitle: from instance.subtitle
seo_title: from instance.seo_title
description: from instance.seo_description
draft: false if the value of instance.status is USABLE, otherwise true
dateCreated: from instance.date_created
publishdate: from instance.date_published
lastmod: from instance.date_modified
expirydate: from instance.expires
copyright: from instance.copyright_notice
icon_name: from instance.icon_name
cover: from instance.share_image.url
The content for the index file will come from instance.body.
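The field mapping above can be sketched without a YAML library, because JSON scalars are also valid YAML. The function names, the FIELD_SOURCES table, the comparison of status against the string "USABLE", and the choice to write dates as ISO 8601 strings are all assumptions of this sketch; a real implementation may prefer a YAML library.

```python
import json
from datetime import date

# (front matter key, attribute on the page instance) for the plain fields.
FIELD_SOURCES = [
    ("title", "title"),
    ("subtitle", "subtitle"),
    ("seo_title", "seo_title"),
    ("description", "seo_description"),
    ("dateCreated", "date_created"),
    ("publishdate", "date_published"),
    ("lastmod", "date_modified"),
    ("expirydate", "expires"),
    ("copyright", "copyright_notice"),
    ("icon_name", "icon_name"),
]


def front_matter(page, usable_status="USABLE") -> dict:
    """Build the front matter mapping, omitting empty or null values."""
    data = {}
    for key, attr in FIELD_SOURCES:
        value = getattr(page, attr, None)
        if value is not None and value != "":
            data[key] = value
    # Assumption: status compares equal to the string "USABLE" when usable.
    data["draft"] = getattr(page, "status", None) != usable_status
    share_image = getattr(page, "share_image", None)
    if share_image:
        data["cover"] = share_image.url
    return data


def render_index(page) -> str:
    """Return the index file text: YAML front matter followed by the body."""
    lines = ["---"]
    for key, value in front_matter(page).items():
        if isinstance(value, date):  # also matches datetime, a date subclass
            value = value.isoformat()
        # JSON scalars are valid YAML, so json.dumps handles quoting and escaping.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n" + (page.body or "")
```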
If the page has a value for instance.description, this should be output as a page resource called rich_description.html.
Exporting Media
Uploaded media will be exported as Hugo page resources. The resource path should be determined by the .url property of the MediaObject subclass. (Note that all resource paths are relative to the page bundle directory.)
To prevent data loss, Common Content MediaObjects need some special attention when exporting to Hugo. Hugo does not have a native data model for Resources, so we will export a YAML file containing metadata for each resource. For ease of association, the name of this file will be the name of the resource file with .yaml appended (example: resource=cover.jpg, metadata=cover.jpg.yaml).
For Attachments, the metadata fields exported should be: title, mime_type, upload_date, tags.
For Images, the metadata fields exported include those from Attachment, plus: alt_text, width, height.
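The naming rule and field lists for resource metadata could be captured as below. The helper names are hypothetical, and the sketch assumes each field is readable as a plain attribute; in particular, tags is assumed to have been converted to a list of strings already. Writing the mapping out as YAML is left to the caller.

```python
from pathlib import Path

ATTACHMENT_FIELDS = ["title", "mime_type", "upload_date", "tags"]
IMAGE_FIELDS = ATTACHMENT_FIELDS + ["alt_text", "width", "height"]


def metadata_path(resource_path: Path) -> Path:
    """Return the sidecar path: the resource file name with .yaml appended."""
    return resource_path.with_name(resource_path.name + ".yaml")


def resource_metadata(media, is_image: bool) -> dict:
    """Collect the metadata fields for an Attachment or an Image."""
    fields = IMAGE_FIELDS if is_image else ATTACHMENT_FIELDS
    return {name: getattr(media, name, None) for name in fields}


print(metadata_path(Path("news/hello/cover.jpg")))
```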
Testing
The export_page and export_site functions should be thoroughly tested. Use Django's TestCase. Use database fixtures rather than mocks to set up test scenarios for export_page (for testing export_site, the export_page function may be mocked). For testing export of Image and Attachment, tiny test files may be included in the test data, for example an empty image or text file. Ensure any created temporary files are cleaned up.
To test export of site static files, create a static/$domain directory in the test data, and use @override_settings to set the STATIC_ROOT for the duration of that test.