Improving performance when joining to a large table in PostgreSQL 11.11

I’m working with a scientific database (total size 1877 GB) that stores data from spatial simulations and have noticed that least one of the queries is extremely slow. Using EXPLAIN I’ve managed to trace the problem to a table that is 233 GB (147 GB index) and a fair number of rows (about 82 billion). This isn’t the largest table (853 GB, with a 642 GB index) but gets used a fair amount so some performance tuning on it is usually warranted.

The database is PostgreSQL 11.11 running on a Ubuntu 11.11-1 virtual machine with 32 GB of RAM, 8 cores, and 2.9TB of virtual disk space. shared_buffers and effective_cache_size are set to 4GB; however, increasing effective_cache_size for the second to 28 GB didn’t seem to have much impact?

Query in question selects for average EIR, PfPR, and some additional configuration values:

SELECT replicateid,
    cast((regexp_matches(filename, '^(d*)-(d*)-((.d)*)-((.d)*)'))(1) as integer) AS zone,
    cast((regexp_matches(filename, '^(d*)-(d*)-((.d)*)-((.d)*)'))(2) as integer) AS population,
    cast((regexp_matches(filename, '^(d*)-(d*)-((.d)*)-((.d)*)'))(3) as float) AS access,
    cast((regexp_matches(filename, '^(d*)-(d*)-((.d)*)-((.d)*)'))(4) as float) AS beta,
    avg(eir) AS eir, 
    avg(pfpr2to10) AS pfpr2to10
FROM sim.configuration c
    INNER JOIN sim.replicate r on r.configurationid =
    INNER JOIN sim.monthlydata md on md.replicateid =
    INNER JOIN sim.monthlysitedata msd on msd.monthlydataid =
WHERE studyid = %(studyId)s
    AND md.dayselapsed BETWEEN 4015 AND 4380
GROUP BY replicateid, filename
ORDER BY zone, population, access, pfpr2to10

Explain shows a parallel sequence scan on the monthlysitedata table, queries that don’t access that table are much more performant.

Sort  (cost=44195879.19..44202884.88 rows=2802276 width=69)
  Sort Key: (((regexp_matches((c.filename)::text, '^(d*)-(d*)-((.d)*)-((.d)*)'::text))(1))::integer), (((regexp_matches((c.filename)::text, '^(d*)-(d*)-((.d)*)-((.d)*)'::text))(2))::integer), (((regexp_matches((c.filename)::text, '^(d*)-(d*)-((.d)*)-((.d)*)'::text))(3))::double precision), (avg(msd.pfpr2to10))
  ->  Result  (cost=43272520.35..43780841.10 rows=2802276 width=69)
        ->  ProjectSet  (cost=43272520.35..43668750.06 rows=2802276 width=77)
              ->  Finalize GroupAggregate  (cost=43272520.35..43633721.61 rows=2802276 width=45)
                    Group Key: md.replicateid, c.filename
                    ->  Gather Merge  (cost=43272520.35..43568335.17 rows=2335230 width=93)
                          Workers Planned: 2
                          ->  Partial GroupAggregate  (cost=43271520.33..43297791.66 rows=1167615 width=93)
                                Group Key: md.replicateid, c.filename
                                ->  Sort  (cost=43271520.33..43274439.36 rows=1167615 width=45)
                                      Sort Key: md.replicateid, c.filename
                                      ->  Hash Join  (cost=113080.71..43117932.65 rows=1167615 width=45)
                                            Hash Cond: (md.replicateid =
                                            ->  Parallel Hash Join  (cost=112257.19..42871234.25 rows=62453068 width=20)
                                                  Hash Cond: (msd.monthlydataid =
                                                  ->  Parallel Seq Scan on monthlysitedata msd  (cost=0.00..40218889.52 rows=967651852 width=20)
                                                  ->  Parallel Hash  (cost=109727.94..109727.94 rows=202340 width=8)
                                                        ->  Parallel Seq Scan on monthlydata md  (cost=0.00..109727.94 rows=202340 width=8)
                                                              Filter: ((dayselapsed >= 4015) AND (dayselapsed   Hash  (cost=814.96..814.96 rows=685 width=29)
                                                  ->  Hash Join  (cost=46.37..814.96 rows=685 width=29)
                                                        Hash Cond: (r.configurationid =
                                                        ->  Seq Scan on replicate r  (cost=0.00..672.40 rows=36640 width=8)
                                                        ->  Hash  (cost=37.95..37.95 rows=674 width=29)
                                                              ->  Index Scan using fki_fk_configuration_studyid on configuration c  (cost=0.29..37.95 rows=674 width=29)
                                                                    Index Cond: (studyid = 12)

Is there anything that can be done to improve performance, or is this just an issue of the query on big table is slow?