Skip to content

Commit 7d86914

Browse files
committed
(GH-4013) Clean up stranded partitions
If a partition is fully detached, but fails to be dropped during its GC operation, subsequent GC operations will not see that partition at all. It will be stranded and PuppetDB will never remove it. During GC, search for stranded partitions that need to be removed and add them to the list of partitions that need to be dropped. There is no structural way to tell the difference between a non-partitioned table and a detached partition table. This PR uses a regular expression, which means that PuppetDB cannot have any non-partitioned tables matching the regular expressions used to identify stranded partitions.
1 parent 9348675 commit 7d86914

File tree

3 files changed

+59
-14
lines changed

3 files changed

+59
-14
lines changed

documentation/release_notes_7.markdown

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,10 @@ Released TBD.
2525
partially detached and block future garbage collection progress. Garbage
2626
collection will now finalize the partition detach operation and remove the
2727
table. ([GitHub #4013](https://github.com/puppetlabs/puppetdb/issues/4013))
28+
* Fixed an issue with report garbage collection where a partition would be
29+
detached, but the table was never deleted. Garbage collection will now
30+
identify and clean-up these tables.
31+
([GitHub #4013](https://github.com/puppetlabs/puppetdb/issues/4013))
2832

2933
## PuppetDB 7.20.0
3034

src/puppetlabs/puppetdb/scf/storage.clj

Lines changed: 31 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1689,6 +1689,23 @@
16891689
(jdbc/do-commands (format "ALTER TABLE %s DETACH PARTITION %s FINALIZE" parent pending))
16901690
(str pending))))
16911691

1692+
(defn find-stranded-partitions
1693+
"Identify tables that match the child format of a partitioned table (like reports_historical)
1694+
that are not present in the pg_inherits table. These partitions have been detached, but failed
1695+
to be deleted.
1696+
1697+
Tables that are not partitioned will also not be in the pg_inherits table, so you MUST
1698+
write a child-format that does not match any non-partitioned tables.
1699+
1700+
Returns a list of strings. Each string is a stranded partition that should be removed."
1701+
[child-format]
1702+
(->> [(str "SELECT tablename"
1703+
" FROM pg_tables WHERE tablename ~ ?"
1704+
" AND tablename NOT IN (SELECT inhrelid::regclass::text FROM pg_catalog.pg_inherits)")
1705+
child-format]
1706+
jdbc/query-to-vec
1707+
(map (comp str :tablename))))
1708+
16921709
(defn prune-daily-partitions
16931710
"Either detaches or drops obsolete day-oriented partitions
16941711
older than the date. Deletes or detaches only the oldest such candidate if
@@ -1774,7 +1791,7 @@
17741791
"Drops the given set of tables. Will throw an SQLException termination if the
17751792
operation takes much longer than PDB_GC_DAILY_PARTITION_DROP_LOCK_TIMEOUT_MS."
17761793
[old-partition-tables update-lock-status status-key]
1777-
(let [drop #(doseq [table old-partition-tables]
1794+
(let [drop #(doseq [table (distinct old-partition-tables)]
17781795
(try
17791796
(update-lock-status status-key inc)
17801797
(jdbc/do-commands
@@ -1798,9 +1815,10 @@
17981815
;; PG14+
17991816
(let [detached-tables
18001817
(detach-daily-partitions "resource_events" date incremental?
1801-
update-lock-status :write-locking-resource-events)]
1818+
update-lock-status :write-locking-resource-events)
1819+
stranded-tables (find-stranded-partitions "^resource_events_\\d\\d\\d\\d\\d\\d\\d\\dz$")]
18021820
(jdbc/with-db-transaction []
1803-
(drop-partition-tables! detached-tables
1821+
(drop-partition-tables! (concat detached-tables stranded-tables)
18041822
update-lock-status :write-locking-resource-events)))))
18051823

18061824
(defn cleanup-dropped-report-certnames
@@ -1853,13 +1871,15 @@
18531871
;; PG14+
18541872
;; Detach partition concurrently must take place outside of a transaction.
18551873
(let [detached-resource-event-tables
1856-
(detach-daily-partitions "resource_events" effective-resource-events-ttl
1857-
incremental? update-lock-status
1858-
:write-locking-resource-events)
1874+
(detach-daily-partitions "resource_events" effective-resource-events-ttl
1875+
incremental? update-lock-status
1876+
:write-locking-resource-events)
1877+
stranded-events-tables (find-stranded-partitions "^resource_events_\\d\\d\\d\\d\\d\\d\\d\\dz$")
18591878
detached-report-tables
1860-
(detach-daily-partitions "reports" report-ttl
1861-
incremental? update-lock-status
1862-
:write-locking-reports)]
1879+
(detach-daily-partitions "reports" report-ttl
1880+
incremental? update-lock-status
1881+
:write-locking-reports)
1882+
stranded-reports-tables (find-stranded-partitions "^reports_\\d\\d\\d\\d\\d\\d\\d\\dz$")]
18631883
;; Now we can delete the partitions with less intrusive locking.
18641884
(jdbc/with-db-transaction []
18651885
;; Nothing should acquire locks on the detached tables, but to be safe, acquire
@@ -1869,9 +1889,9 @@
18691889
;; force a resource-events GC. prior to partitioning, this would have happened
18701890
;; via a cascade when the report was deleted, but now we just drop whole tables
18711891
;; of resource events.
1872-
(drop-partition-tables! detached-resource-event-tables
1892+
(drop-partition-tables! (concat detached-resource-event-tables stranded-events-tables)
18731893
update-lock-status :write-locking-resource-events)
1874-
(drop-partition-tables! detached-report-tables
1894+
(drop-partition-tables! (concat detached-report-tables stranded-reports-tables)
18751895
update-lock-status :write-locking-reports)
18761896
;; since we cannot cascade back to the certnames table anymore, go clean up
18771897
;; the latest_report_id column after a GC.

test/puppetlabs/puppetdb/cli/services_test.clj

Lines changed: 24 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -584,7 +584,7 @@
584584
(with-test-db
585585
(let [config (-> (create-temp-config)
586586
(assoc :database *db* :read-database *read-db*)
587-
(assoc-in [:database :gc-interval] "0.01"))
587+
(assoc-in [:database :gc-interval] "60"))
588588
store-report #(sync-command-post (svc-utils/pdb-cmd-url)
589589
example-certname
590590
"store report"
@@ -610,7 +610,7 @@
610610
:db-lock-status db-lock-status})
611611
(is (empty? (jdbc/query ["SELECT * FROM reports"]))))
612612

613-
;; This test is not applicable unless our Postgres version is new enough
613+
;; These tests are not applicable unless our Postgres version is new enough
614614
;; to support the concurrent partition detach feature.
615615
(when (scf-store/detach-partitions-concurrently?)
616616
(testing "a partition stuck in the pending state is finalized and removed"
@@ -656,4 +656,25 @@
656656
(is (empty?
657657
(jdbc/query ["SELECT tablename FROM pg_tables WHERE tablename = ?" partition-table])))
658658

659-
(jdbc/do-commands "DELETE FROM reports")))))))))
659+
(jdbc/do-commands "DELETE FROM reports")))
660+
661+
(testing "a detached partition that was not removed is cleaned up by gc"
662+
(let [old-ts (-> 2 time/days time/ago)
663+
partition-table (format "reports_%s"
664+
(part/date-suffix (part/to-zoned-date-time (time/to-timestamp old-ts))))]
665+
(store-report (time/to-string old-ts))
666+
(store-report (to-string (now)))
667+
668+
;; Strand the partition before calling GC
669+
(jdbc/do-commands-outside-txn
670+
(format "ALTER TABLE reports DETACH PARTITION %s CONCURRENTLY" partition-table))
671+
672+
(svcs/sweep-reports! *db* {:incremental? false
673+
:report-ttl (time/parse-period "1d")
674+
:resource-events-ttl (time/parse-period "1d")
675+
:db-lock-status db-lock-status})
676+
677+
(is (empty?
678+
(jdbc/query ["SELECT tablename FROM pg_tables WHERE tablename = ?" partition-table])))
679+
680+
(jdbc/do-commands "DELETE FROM reports")))))))))

0 commit comments

Comments
 (0)