Description
For small response sizes, the total HTTP call time (from ausplotsR to this service) is perfectly acceptable. However, some of the "data frames" (to use R terminology) that are returned are quite large. The response from /veg_pi is currently ~900k rows, and it seems to be the one that takes the longest.
Currently we have the nginx reverse proxy cache enabled, but only for 10 minutes.
Where is the bottleneck?
The total time for the HTTP call to /veg_pi is ~12 seconds. Most of that time appears to be spent in the json_agg() call that postgREST issues. The query itself (select * from api.veg_pi) is quite fast, about 1.4 seconds on my machine, so the rest is essentially the overhead postgREST adds to convert the result into a JSON response. The full request (excluding setting up all the local variables) that postgREST makes is:
WITH
pgrst_source AS (SELECT "api"."veg_pi".* FROM "api"."veg_pi" )
SELECT
null::bigint AS total_result_set,
pg_catalog.count(_postgrest_t) AS page_total,
array[]::text[] AS header,
coalesce(json_agg(_postgrest_t), '[]')::character varying AS body,
coalesce(nullif(current_setting('response.headers', true), ''), '[]') AS response_headers
FROM ( SELECT * FROM pgrst_source) _postgrest_t;
We can focus in on that json_agg() line. For a performance breakdown, here is the EXPLAIN ANALYZE output (on my machine, with pg 13):
swarm=# explain analyze SELECT
swarm-# coalesce(json_agg(_postgrest_t), '[]')::character varying AS body
swarm-# FROM ( SELECT * FROM api.veg_pi) _postgrest_t;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=49773.88..49773.89 rows=1 width=32) (actual time=7916.745..7916.751 rows=1 loops=1)
-> Hash Left Join (cost=7525.81..48548.88 rows=489997 width=464) (actual time=59.974..1541.044 rows=902338 loops=1)
Hash Cond: (hd.veg_barcode = wfod.veg_barcode)
-> Hash Left Join (cost=2105.13..27319.84 rows=489997 width=80) (actual time=14.656..956.797 rows=902338 loops=1)
Hash Cond: ((pi.veg_barcode)::text = hd.veg_barcode)
-> Hash Join (cost=254.51..24182.91 rows=489997 width=59) (actual time=1.002..695.140 rows=902338 loops=1)
Hash Cond: (slv.site_location_id = sl.site_location_id)
-> Hash Join (cost=168.92..22804.10 rows=489997 width=52) (actual time=0.737..456.859 rows=902338 loops=1)
Hash Cond: (pi.site_location_visit_id = slv.site_location_visit_id)
-> Seq Scan on point_intercept pi (cost=0.00..20050.93 rows=979993 width=48) (actual time=0.004..95.871 rows=979993 loops=1)
-> Hash (cost=163.13..163.13 rows=463 width=8) (actual time=0.729..0.730 rows=807 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Seq Scan on site_location_visit slv (cost=80.56..163.13 rows=463 width=8) (actual time=0.257..0.631 rows=807 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 119
SubPlan 1
-> Seq Scan on site_location_visit (cost=0.00..80.26 rows=119 width=4) (actual time=0.005..0.230 rows=119 loops=1)
Filter: (NOT ok_to_publish)
Rows Removed by Filter: 807
-> Hash (cost=75.82..75.82 rows=782 width=15) (actual time=0.261..0.261 rows=782 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 45kB
-> Seq Scan on site_location sl (cost=0.00..75.82 rows=782 width=15) (actual time=0.006..0.159 rows=782 loops=1)
-> Hash (cost=1303.05..1303.05 rows=43805 width=31) (actual time=13.566..13.566 rows=43805 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 3260kB
-> Seq Scan on herbarium_determination hd (cost=0.00..1303.05 rows=43805 width=31) (actual time=0.004..5.410 rows=43805 loops=1)
-> Hash (cost=2838.45..2838.45 rows=40178 width=394) (actual time=45.184..45.185 rows=40178 loops=1)
Buckets: 16384 Batches: 8 Memory Usage: 1042kB
-> Seq Scan on wfo_determination wfod (cost=0.00..2838.45 rows=40178 width=394) (actual time=0.021..29.078 rows=40178 loops=1)
Planning Time: 1.777 ms
Execution Time: 7982.191 ms
(30 rows)
As an aside, requesting CSV instead is faster at 7.6 seconds for the HTTP call:
http 'http://129.127.180.34/veg_pi' accept:text/csv > /dev/null
Solution: indexes on the joined tables
The relations that feed the postgREST API are all views. Those views make a few joins, and the underlying tables don't have indexes. The reason indexes aren't present is that we use the pgsync tool to copy the data from the prod SWARM database, and this tool does not copy/create indexes.
I tested this solution by adding indexes with this code:
CREATE UNIQUE INDEX IF NOT EXISTS site_location_visit_pkey
ON site_location_visit (site_location_visit_id);
CREATE UNIQUE INDEX IF NOT EXISTS site_location_pkey
ON site_location (site_location_id);
CREATE INDEX IF NOT EXISTS site_location_point_pkey
ON site_location_point (site_location_id);
CREATE INDEX IF NOT EXISTS structural_summary_pkey
ON structural_summary (site_location_visit_id);
CREATE INDEX IF NOT EXISTS soil_bulk_density_pkey
ON soil_bulk_density (site_location_visit_id);
CREATE INDEX IF NOT EXISTS point_intercept_pkey
ON point_intercept (site_location_visit_id);
CREATE UNIQUE INDEX IF NOT EXISTS herbarium_determination_pkey
ON herbarium_determination (veg_barcode);
CREATE UNIQUE INDEX IF NOT EXISTS wfo_determination_pkey
ON wfo_determination (veg_barcode);
-- FIXME would also need to create indexes for
-- soil_characterisation
-- soil_subsite_observations
-- veg_vouchers
-- genetic_vouchers
-- basal_area
You can confirm the indexes were created with the query in this guide.
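For reference, a query along the following lines against pg_indexes (a sketch of my own, not necessarily the exact query from that guide) lists what actually got created:
-- Sketch: list the indexes on the tables touched above.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN (
  'site_location_visit', 'site_location', 'site_location_point',
  'structural_summary', 'soil_bulk_density', 'point_intercept',
  'herbarium_determination', 'wfo_determination'
)
ORDER BY tablename, indexname;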
It made no difference.
Solution: materialized views
We can pre-bake all the results so we don't need to worry about join performance. We do this by changing all the views in script.sql:
-CREATE VIEW api.site_inc_unpub AS
+CREATE MATERIALIZED VIEW api.site_inc_unpub AS
This does make a slight difference, but for the overall HTTP call to /veg_pi it only drops the elapsed time from 12 to 11 seconds. The SQL query is now running in ~100ms (instead of the previous 1.4s), which is great, but overall it doesn't have much effect.
Note: If we went for this solution, we'd also have to refresh the materialized views each night when we sync new data.
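That refresh would be something along these lines, run at the end of the nightly sync (only the two views named in this issue are shown; the real list would mirror everything script.sql creates):
-- Sketch: refresh the materialized views after the nightly pgsync run.
REFRESH MATERIALIZED VIEW api.site_inc_unpub;
REFRESH MATERIALIZED VIEW api.veg_pi;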
Solution: update postgres
We are currently running postgres 10. I moved to 13 and it's slightly faster, but the improvement is still less than 1 second.
Solution: caching
Currently we only cache, at the nginx level, for 10 minutes. This is a read-only mirror, so we know the data will only change nightly when we do the sync. This means we can:
- cache for the whole 24 hours
- pre-heat the cache (for popular queries) right after the nightly sync
With a cache hit, the response time for the /veg_pi query is 1.1 seconds. That's the kind of response time we want to see.
Now we just have to implement the code to make this caching setup work.
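As a starting point, the pre-heat step could be a small script like the sketch below, run right after the nightly sync. The endpoint list is an assumption and would grow to cover whatever queries turn out to be popular:
#!/bin/sh
# Sketch: warm the nginx cache right after the nightly sync.
# The host is the current reverse proxy; the endpoint list is a guess.
BASE='http://129.127.180.34'
for endpoint in veg_pi site_inc_unpub; do
  # request both the JSON and CSV representations
  http "$BASE/$endpoint" > /dev/null
  http "$BASE/$endpoint" accept:text/csv > /dev/null
done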