-
Notifications
You must be signed in to change notification settings - Fork 5
Feature: Add a persistent backend to the cloud-init server which uses duckdb via quack/quack #68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces support for a new persistent storage backend (quackstore) using duckdb via quack/quack while maintaining support for the in-memory memstore. It updates the main application flags and initialization logic, adds new tests for quackstore, and refactors model JSON marshalling functions.
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/cistore/testing/store_test_suite.go | New tests for verifying store implementations. |
| pkg/cistore/models_test.go | Tests for CloudConfigFile custom JSON marshaling/unmarshaling. |
| pkg/cistore/models.go | Refactored JSON logic for CloudConfigFile serialization. |
| internal/quackstore/quackstore_test.go | Tests for quackstore including DB initialization and cleanup. |
| internal/quackstore/quackstore.go | Implementation of quackstore; schema initialization and API methods. |
| internal/memstore/ciMemStore_test.go | Standard in-memory store tests. |
| cmd/cloud-init-server/main.go | Updated flag parsing and backend switching logic for storage backend. |
Files not reviewed (1)
- go.mod: Language not supported
|
Any progress on this? |
|
Thanks @alexlovelltroy. Do you have instructions for testing this? |
|
Also:
|
|
@synackd This PR adds two flags to control the behavior of the storage backend. In addition to running the unit tests in the In a containerized deployment, the files should be stored in a volume that is remounted correctly when restarting the container. It is easier to test running manually though. |
|
I'm currently testing using a modification of the cloud-init quadlet in the release repo, using a locally-built cloud-init container using /etc/containers/systemd/cloud-init-server.container I've also created the corresponding volume: /etc/containers/systemd/cloud-init.volume I've modified /etc/openchami/configs/openchami.env to have: STORAGE_BACKEND=quack
DB_PATH=/cloud-init/cloud-init.dbAfter reloading systemd and (re)starting the container, the following fatal error appears in the journal: Adding the |
|
Building via goreleaser yields the same thing. |
|
Cross-platform builds with CGO are easier on ubuntu and should require a glibc container rather than a musl one. I've updated the container and added a build-in-container script that should replicate the way github will do the builds. |
|
With the most recent commit, I get the following error in Goreleaser when running |
|
I'll work on that in the morning. Can you test the standalone binary in the meantime? |
|
Build automation is noble yak shaving... The yak never gets any smaller. At least this time I think the builds are happy. |
synackd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally got the container to build with the script after applying the suggested edits.
However, when starting the container, I see:
cloud-init-server[2766863]: /usr/local/bin/cloud-init-server: /lib64/libm.so.6: version `GLIBC_2.38' not found (required by /usr/local/bin/cloud-init-server)
cloud-init-server[2766863]: /usr/local/bin/cloud-init-server: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /usr/local/bin/cloud-init-server)
Run running the binary itself via
./cloud-init-server --storage-backend quack --db-path ./cloud-init.db
it panics:
No JWKS URL provided; secure route will be disabled
5:32PM ERR Failed to get SMD data error="Get \"http://smd:27779/hsm/v2/Inventory/EthernetInterfaces/\": dial tcp: lookup smd: no such host"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xab17f4]
goroutine 1 [running]:
github.com/go-chi/chi/v5.chain({0xc00038a0c8, 0x1, 0x2779680?}, {0x2925720?, 0x281f2a0?})
/root/go/pkg/mod/github.com/go-chi/chi/v5@v5.2.1/chain.go:43 +0x34
github.com/go-chi/chi/v5.Middlewares.Handler(...)
/root/go/pkg/mod/github.com/go-chi/chi/v5@v5.2.1/chain.go:13
github.com/go-chi/chi/v5.(*Mux).handle(0xc0003842a0, 0x8, {0x27ca46e, 0xa}, {0x2925720, 0x281f2a0})
/root/go/pkg/mod/github.com/go-chi/chi/v5@v5.2.1/mux.go:436 +0x1ae
github.com/go-chi/chi/v5.(*Mux).Get(0xc000384240?, {0x27ca46e?, 0x1?}, 0x735e79?)
/root/go/pkg/mod/github.com/go-chi/chi/v5@v5.2.1/mux.go:162 +0x2d
main.initCiClientRouter({0x293f9c0, 0xc000384240}, 0xc000051c40, 0x0)
/opt/shared/cloud-init/cmd/cloud-init-server/main.go:265 +0xeb
main.startServer()
/opt/shared/cloud-init/cmd/cloud-init-server/main.go:228 +0xbeb
main.main.func1(0xc000058400?, {0x27c4c6e?, 0x4?, 0x27c4b4a?})
/opt/shared/cloud-init/cmd/cloud-init-server/main.go:65 +0xf
github.com/spf13/cobra.(*Command).execute(0xc0001d5b08, {0xc000034060, 0x4, 0x4})
/root/go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1015 +0xa94
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001d5b08)
/root/go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1148 +0x40c
github.com/spf13/cobra.(*Command).Execute(...)
/root/go/pkg/mod/github.com/spf13/cobra@v1.9.1/command.go:1071
main.main()
/opt/shared/cloud-init/cmd/cloud-init-server/main.go:75 +0x6f
Though this doesn't seem to be related to this pull request. Applying the following fix:
diff --git a/cmd/cloud-init-server/main.go b/cmd/cloud-init-server/main.go
index 9b5113a..9cb98f6 100644
--- a/cmd/cloud-init-server/main.go
+++ b/cmd/cloud-init-server/main.go
@@ -262,10 +262,17 @@ func initCiClientRouter(router chi.Router, handler *CiHandler, wgInterfaceManage
// Add cloud-init endpoints to router
router.Get("/openapi.json", DocsHandler)
router.Get("/version", VersionHandler)
- router.With(wireGuardMiddleware).Get("/user-data", UserDataHandler)
- router.With(wireGuardMiddleware).Get("/meta-data", MetaDataHandler(handler.sm, handler.store))
- router.With(wireGuardMiddleware).Get("/vendor-data", VendorDataHandler(handler.sm, handler.store))
- router.With(wireGuardMiddleware).Get("/{group}.yaml", GroupUserDataHandler(handler.sm, handler.store))
+ if wireGuardMiddleware != nil {
+ router.With(wireGuardMiddleware).Get("/user-data", UserDataHandler)
+ router.With(wireGuardMiddleware).Get("/meta-data", MetaDataHandler(handler.sm, handler.store))
+ router.With(wireGuardMiddleware).Get("/vendor-data", VendorDataHandler(handler.sm, handler.store))
+ router.With(wireGuardMiddleware).Get("/{group}.yaml", GroupUserDataHandler(handler.sm, handler.store))
+ } else {
+ router.Get("/user-data", UserDataHandler)
+ router.Get("/meta-data", MetaDataHandler(handler.sm, handler.store))
+ router.Get("/vendor-data", VendorDataHandler(handler.sm, handler.store))
+ router.Get("/{group}.yaml", GroupUserDataHandler(handler.sm, handler.store))
+ }
router.Post("/phone-home/{id}", PhoneHomeHandler(wgInterfaceManager, handler.sm))
router.Post("/wg-init", wgtunnel.AddClientHandler(wgInterfaceManager, handler.sm))
}and I was finally able to test! :)
I did some very basic tests and I was able to persistently store cluster-defaults and group data.
|
The container successfully builds now with e626748. |
|
I wonder if the issue is that we are building the binary using an Ubuntu container but it is being run in a Rocky Linux 9 container. |
e626748 to
a29eaca
Compare
|
Finally got the DCO to pass. @synackd can you recheck? |
|
Glibc version errors are still occurring in the container. This is because I think the solution is to build in the same container which cloud-init will run, With docker: With Podman (what I've tested with, Podman socket needs to be enable): |
|
Now that I've gotten the container running, I've tried setting: STORAGE_BACKEND=quack
DB_PATH=/cloud-init/cloud-init.db
DEBUG=trueand am using the following quadlet with an existing OpenCHAMI installation (the [Unit]
Description=The cloud-init-server container
Wants=smd.service
After=smd.service opaal.service
PartOf=openchami.target
[Container]
ContainerName=cloud-init-server
HostName=cloud-init
Image=ghcr.io/openchami/cloud-init:v1.2.3-amd64
Volume=cloud-init:/cloud-init:rw,Z
# Environment Variables
EnvironmentFile=/etc/openchami/configs/openchami.env
# Networks for the Container to use
Network=openchami-internal.network
# Proxy settings
PodmanArgs=--http-proxy=false
[Service]
Restart=alwaysI can see the DB path and enablement get picked up, but DuckDB is attempting to create a directory in the root directory at This seems to be occurring in the |
Upon further discussion, it might be better to change the container that cloud-init runs in from |
10137fc to
ea28eb1
Compare
|
I tried several different things, but switching to an Ubuntu base container makes the CGO stuff work better with the fewest complications. |
synackd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was attempting to test with ochami and was getting a bunch of 404s. Looks like the API got modified in an undocumented way which breaks functionality. See my review comment.
Other than my review above, the container build works with the new ubuntu changes. I still see the DuckDB extension permission error when running the container, but cloud-init runs in the container now. |
- Added QuackStore implementation for managing groups, instances, and cluster defaults using Quack for persistence. - Introduced unit tests for QuackStore to ensure functionality and data integrity. - Updated go.mod to include new dependencies and replaced the quack module path. - Enhanced CloudConfigFile JSON marshaling and unmarshaling for better encoding handling. test(memstore): add unit tests for MemStore implementation - Introduced a new test file for MemStore to validate its functionality. - Integrated standard test suite from cistore testing package to ensure comprehensive testing. feat(cloud-init-server): add support for multiple storage backends - Introduced a new command-line flag for selecting the storage backend (mem or quack). - Default storage backend set to memstore, with a new option for quackstore using a specified database path. - Updated the datastore initialization logic to handle both memstore and quackstore, including error handling for quackstore initialization. fix: correct UnmarshalJSON and add MarshalJSON for CloudConfigFile chore: clarify comments in custom marshal/unmarshal funcs test: add unit tests for CloudConfigFile JSON marshal/unmarshal fix(tests): use base64-encoded data for base64 marshal test refactor: Use an alternate Unmarshal implementation chore: add base64 encoding import for future functionality refactor: simplify UnmarshalJSON and add MarshalJSON for CloudConfigFile - Removed base64 encoding handling from UnmarshalJSON for clarity. - Added MarshalJSON implementation to ensure proper JSON serialization of CloudConfigFile. feat(quackstore): implement unique instance ID generation - Added functionality to generate a unique instance ID in the format "i-XXXXXX" using a random 6-digit hexadecimal string. - Updated the comment to clarify the instance ID generation process. fix: correct UnmarshalJSON and add MarshalJSON for CloudConfigFile chore: clarify comments in custom marshal/unmarshal funcs feat(quackstore): implement QuackStore for persistent storage with tests - Added QuackStore implementation for managing groups, instances, and cluster defaults using Quack for persistence. - Introduced unit tests for QuackStore to ensure functionality and data integrity. - Updated go.mod to include new dependencies and replaced the quack module path. - Enhanced CloudConfigFile JSON marshaling and unmarshaling for better encoding handling. fix: correct UnmarshalJSON and add MarshalJSON for CloudConfigFile Update dependencies and remove "replace" statement fro quack/quack feat: add versions property to API documentation and update build settings feat: update Dockerfile for Red Hat base image and add build script for GoReleaser feat: add GoReleaser configuration for Darwin builds and update build-in-container script for Docker support fix: update build-in-container script to use configurable Docker socket path and improve router middleware handling Signed-off-by: Alex Lovell-Troy <alovelltroy@lanl.gov> Signed-off-by: Alex Lovell-Troy <alex@lovelltroy.org> Signed-off-by: Alex Lovell-Troy <alovelltroy@lanl.gov>
synackd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of places where errors are being ignored, many of which I don't quite understand if there's a legitimate reason or not. If there is a legitimate reason, they should be commented (I did see some comments already) since ignoring errors in production code is generally not a good practice since it makes it hard to troubleshoot where problems are arising.
Besides that, there were a few idiomatic suggestions I added.
This was purely a code review and I still need to test it.
|
Other than the DuckDB plugins error still occurring, everything else appears to work. |
|
Here's the error I'm still getting, which occurs when cloud-init starts: |
Signed-off-by: Alex Lovell-Troy <alovelltroy@lanl.gov>
…critical. Add comments where errors are ignored Signed-off-by: Alex Lovell-Troy <alovelltroy@lanl.gov>
Signed-off-by: Alex Lovell-Troy <alovelltroy@lanl.gov>
While it's functionally feasible to ignore some errors, from a troubleshooting perspective it's better to log certain errors to pinpoint the cause of an issue in case unusual conditions arise. Signed-off-by: Devon Bautista <17506592+synackd@users.noreply.github.com>
synackd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some of the errors we are ignoring, I'd be more comfortable if we logged them even if we don't actually handle them. That way any unexpected behavior presents enough information for the administrator to troubleshoot where an issue originates. I pushed a commit that adds this.
Besides that, now I'm seeing:
cloud-init-server[178104]: 12:30AM ERR Failed to load DuckDB extensions error="IO Error: Failed to download extension \"json\" at URL \"http://extensions.duckdb.org/v1.1.3/linux_amd64/json.duckdb_extension.gz\"\nExtension \"json\" is an existing extension.\n (ERROR Could not establish connection)" home=/tmp/duckdbhome
when cloud-init starts.
…plugins Signed-off-by: Alex Lovell-Troy <alovelltroy@lanl.gov>
|
@synackd Can you confirm that you don't get that error anymore? I'm building the container differently now so we can preload all plugins which is also safer. |
|
This is what I see now: |
|
This is with the goreleaser-built docker container? |
|
Yep, built with: |
… in Dockerfile fix: bump quack dependency version to v0.0.3 in go.mod and go.sum Signed-off-by: Alex Lovell-Troy <alovelltroy@lanl.gov>
|
I added more error handling to the quack library and refactored the way extensions are loaded. |
|
I no longer see the error! Are there any more impending changes that need to be made? |
|
Not that I'm aware of. Once you approve this PR, we can merge it. |
synackd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit updating the documentation of the removal of /cloud-init. Let me know your thoughts on this. I think this is the last change needed before merging.
| // Client sub-router | ||
| router_client := chi.NewRouter() | ||
| initCiClientRouter(router_client, ciHandler, wgInterfaceManager) | ||
| router.Mount("/cloud-init", router_client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mounting things at / instead of /cloud-init is a welcome change, but it will be a breaking change to users of the release RPM unless
http-request replace-path ^/cloud-init(/.*) \1
is added to the cloud-init backend in the HAproxy config.
We can certainly make this change when incorporating this PR into the release repo, but I think it's worth pointing out here for posterity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reflecting some more, wouldn't this technically be an API change? Not sure we want to do that in this PR. Thoughts?
dcbf35b to
891c556
Compare
Signed-off-by: Devon Bautista <17506592+synackd@users.noreply.github.com>
891c556 to
ebf96e5
Compare
|
Missed a few things, but commit is up-to-date now. |
|
this PR requires an update to haproxy before we can make it a release. |
synackd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ready. Making a note here that the endpoints are mounted at / with this PR instead of /cloud-init, so we'll need to note that when making a release.
This pull request introduces a new storage backend,
quackstore, and includes various updates to support this addition. The most significant changes involve modifying the main application logic to handle the new backend, adding the necessary dependencies, and implementing and testing thequackstorefunctionality.Main Application Updates:
quackstoreimport and new flags for selecting the storage backend and database path incmd/cloud-init-server/main.go. [1] [2] [3]memstoreorquackstore).New
quackstoreImplementation:quackstorepackage with methods for managing groups, instances, and cluster defaults, including initialization and schema setup.Testing Enhancements:
memstoreto ensure it adheres to the standard test suite.quackstoreto ensure proper functionality and integration.Model Serialization:
CloudConfigFilestruct to implement custom JSON marshaling and unmarshaling, handling different encoding types (base64andplain).