Description
For small response sizes, the total HTTP call time (from ausplotsR to this service) is perfectly acceptable. However, some of the "data frames" (to use R terminology) that are returned are quite large. The response from /veg_pi is currently ~900k rows, and it seems to be the one that takes the longest.
Currently we have the nginx reverse proxy cache enabled, but only for 10 minutes.
Where is the bottleneck?
The total time for the HTTP call to /veg_pi is ~12 seconds. Most of that time appears to be spent in the json_agg() call that postgREST issues. The query itself (select * from api.veg_pi) is quite fast, about 1.4 seconds on my machine, so the rest is essentially the overhead postgREST adds to convert the result into a JSON response. The full request (excluding setting up all the local variables) that postgREST makes is:
WITH
pgrst_source AS (SELECT "api"."veg_pi".* FROM "api"."veg_pi" )
SELECT
null::bigint AS total_result_set,
pg_catalog.count(_postgrest_t) AS page_total,
array[]::text[] AS header,
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM ( SELECT * FROM pgrst_source) _postgrest_t;
We can focus in on that json_agg() line. For a performance breakdown, here is the EXPLAIN ANALYZE output (on my machine, with pg 13):
swarm=# explain analyze SELECT
swarm-# coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
swarm-# FROM ( SELECT * FROM api.veg_pi) _postgrest_t;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=49773.88..49773.89 rows=1 width=32) (actual time=7916.745..7916.751 rows=1 loops=1)
-> Hash Left Join (cost=7525.81..48548.88 rows=489997 width=464) (actual time=59.974..1541.044 rows=902338 loops=1)
Hash Cond: (hd.veg_barcode = wfod.veg_barcode)
-> Hash Left Join (cost=2105.13..27319.84 rows=489997 width=80) (actual time=14.656..956.797 rows=902338 loops=1)
Hash Cond: ((pi.veg_barcode)::text = hd.veg_barcode)
-> Hash Join (cost=254.51..24182.91 rows=489997 width=59) (actual time=1.002..695.140 rows=902338 loops=1)
Hash Cond: (slv.site_location_id = sl.site_location_id)
-> Hash Join (cost=168.92..22804.10 rows=489997 width=52) (actual time=0.737..456.859 rows=902338 loops=1)
Hash Cond: (pi.site_location_visit_id = slv.site_location_visit_id)
-> Seq Scan on point_intercept pi (cost=0.00..20050.93 rows=979993 width=48) (actual time=0.004..95.871 rows=979993 loops=1)
-> Hash (cost=163.13..163.13 rows=463 width=8) (actual time=0.729..0.730 rows=807 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Seq Scan on site_location_visit slv (cost=80.56..163.13 rows=463 width=8) (actual time=0.257..0.631 rows=807 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 119
SubPlan 1
-> Seq Scan on site_location_visit (cost=0.00..80.26 rows=119 width=4) (actual time=0.005..0.230 rows=119 loops=1)
Filter: (NOT ok_to_publish)
Rows Removed by Filter: 807
-> Hash (cost=75.82..75.82 rows=782 width=15) (actual time=0.261..0.261 rows=782 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 45kB
-> Seq Scan on site_location sl (cost=0.00..75.82 rows=782 width=15) (actual time=0.006..0.159 rows=782 loops=1)
-> Hash (cost=1303.05..1303.05 rows=43805 width=31) (actual time=13.566..13.566 rows=43805 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 3260kB
-> Seq Scan on herbarium_determination hd (cost=0.00..1303.05 rows=43805 width=31) (actual time=0.004..5.410 rows=43805 loops=1)
-> Hash (cost=2838.45..2838.45 rows=40178 width=394) (actual time=45.184..45.185 rows=40178 loops=1)
Buckets: 16384 Batches: 8 Memory Usage: 1042kB
-> Seq Scan on wfo_determination wfod (cost=0.00..2838.45 rows=40178 width=394) (actual time=0.021..29.078 rows=40178 loops=1)
Planning Time: 1.777 ms
Execution Time: 7982.191 ms
(30 rows)
As an aside, requesting CSV instead is faster at 7.6 seconds for the HTTP call:
http 'http://129.127.180.34/veg_pi' accept:text/csv > /dev/null
Solution: indexes on the joined tables
The relations that feed the postgREST API are all views. Those views make a few joins, and the underlying tables don't have indexes. The reason indexes aren't present is that we use the pgsync tool to copy the data from the prod SWARM database, and this tool does not copy/create indexes.
I tested this solution by adding indexes with this code:
CREATE UNIQUE INDEX IF NOT EXISTS site_location_visit_pkey
ON site_location_visit (site_location_visit_id);
CREATE UNIQUE INDEX IF NOT EXISTS site_location_pkey
ON site_location (site_location_id);
CREATE INDEX IF NOT EXISTS site_location_point_pkey
ON site_location_point (site_location_id);
CREATE INDEX IF NOT EXISTS structural_summary_pkey
ON structural_summary (site_location_visit_id);
CREATE INDEX IF NOT EXISTS soil_bulk_density_pkey
ON soil_bulk_density (site_location_visit_id);
CREATE INDEX IF NOT EXISTS point_intercept_pkey
ON point_intercept (site_location_visit_id);
CREATE UNIQUE INDEX IF NOT EXISTS herbarium_determination_pkey
ON herbarium_determination (veg_barcode);
CREATE UNIQUE INDEX IF NOT EXISTS wfo_determination_pkey
ON wfo_determination (veg_barcode);
-- FIXME would also need to create indexes for
-- soil_characterisation
-- soil_subsite_observations
-- veg_vouchers
-- genetic_vouchers
-- basal_area
You can confirm the indexes were created with the query in this guide.
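For reference, a query along the following lines against pg_indexes (a sketch of my own, not necessarily the exact query from that guide) lists what actually got created:
-- Sketch: list the indexes on the tables touched above.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN (
  'site_location_visit', 'site_location', 'site_location_point',
  'structural_summary', 'soil_bulk_density', 'point_intercept',
  'herbarium_determination', 'wfo_determination'
)
ORDER BY tablename, indexname;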
It made no difference.
Solution: materialized views
We can pre-bake all the results so we don't need to worry about join performance. We do this by changing all the views in script.sql:
-CREATE VIEW api.site_inc_unpub AS
+CREATE MATERIALIZED VIEW api.site_inc_unpub AS
This does make a slight difference, but for the overall HTTP call to /veg_pi it only drops the elapsed time from 12 to 11 seconds. The SQL query is now running in ~100ms (instead of the previous 1.4s), which is great, but overall it doesn't have much effect.
Note: If we went for this solution, we'd also have to refresh the materialized views each night when we sync new data.
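That refresh would be something along these lines, run at the end of the nightly sync (only the two views named in this issue are shown; the real list would mirror everything script.sql creates):
-- Sketch: refresh the materialized views after the nightly pgsync run.
REFRESH MATERIALIZED VIEW api.site_inc_unpub;
REFRESH MATERIALIZED VIEW api.veg_pi;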
Solution: update postgres
We are currently running postgres 10. I moved to 13 and it's slightly faster, but the improvement is still less than 1 second.
Solution: caching
Currently we only cache, at the nginx level, for 10 minutes. This is a read-only mirror, so we know the data will only change nightly when we do the sync. This means we can:
- cache for the whole 24 hours
- pre-heat the cache (for popular queries) right after the nightly sync
With a cache hit, the response time for the /veg_pi query is 1.1 seconds. That's the kind of response time we want to see.
Now we just have to implement the code to make this caching setup work.
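As a starting point, the pre-heat step could be a small script like the sketch below, run right after the nightly sync. The endpoint list is an assumption and would grow to cover whatever queries turn out to be popular:
#!/bin/sh
# Sketch: warm the nginx cache right after the nightly sync.
# The host is the current reverse proxy; the endpoint list is a guess.
BASE='http://129.127.180.34'
for endpoint in veg_pi site_inc_unpub; do
  # request both the JSON and CSV representations
  http "$BASE/$endpoint" > /dev/null
  http "$BASE/$endpoint" accept:text/csv > /dev/null
done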