I have a table with 150 million rows in a Postgres database hosted in Cloud SQL (in Google Cloud Platform).
I obviously try to avoid scanning the table, but occasionally I want to run an ad hoc query that I know will always require a full scan. For example,
SELECT COUNT(*) FROM table WHERE col1 IS NOT NULL. Most recently I ran that query out of curiosity, to see whether I had set any values in col1 yet, but it essentially ran forever.
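For reference, this is roughly what I mean (table and column names are placeholders for my real schema); EXPLAIN without ANALYZE just prints the plan, so it confirms the scan without actually running it:

```sql
-- my_table / col1 are placeholder names.
-- EXPLAIN (without ANALYZE) only prints the plan, so it returns instantly.
EXPLAIN
SELECT COUNT(*) FROM my_table WHERE col1 IS NOT NULL;
-- I'd expect a plan along these lines:
--   Finalize Aggregate
--     ->  Gather (Workers Planned: 2)
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on my_table
--                       Filter: (col1 IS NOT NULL)
```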
Fortunately the only time I run into this need is for speculative manual queries like that one, but it made me curious whether there's anything I can do to improve the performance, or whether I'm basically stuck with a table I can't run certain queries against due to its size.
Right now I have 4 cores, 500 GB of storage, and 15 GB of RAM, but it seems unlikely that boosting CPU or RAM would help (as-is, that query only consumes 1-2 cores). I might benefit from greater I/O capacity, but at least on Cloud SQL, 500 GB of storage already maxes out the I/O capacity associated with a 4-core instance. You need to go to something like 14 cores to have a shot at any higher I/O specs.
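For what it's worth, the 1-2 cores I see is consistent with Postgres's default of max_parallel_workers_per_gather = 2. As a sketch (again with placeholder names), raising it per session should let the scan use more workers, though presumably the query stays I/O-bound either way:

```sql
-- Default is 2 workers per Gather node, which lines up with the 1-2 cores observed.
-- SET only affects the current session; the scan itself remains I/O-bound.
SET max_parallel_workers_per_gather = 4;
SELECT COUNT(*) FROM my_table WHERE col1 IS NOT NULL;  -- placeholder names
```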
As far as I can tell, partitioning wouldn't help here either. It seems like it would help if I could somehow give each partition its own physical SSD and parallelize reads across more hardware that way, but my sense is that partitioning is more about reducing the total number of operations needed (e.g. via partition pruning) than about taking advantage of separate SSDs in parallel.
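To illustrate what I mean: in self-managed Postgres you can place partitions in separate tablespaces on separate disks, but as far as I know Cloud SQL doesn't expose the filesystem, so something like this isn't an option there (all names and paths below are hypothetical):

```sql
-- Hypothetical sketch; CREATE TABLESPACE needs filesystem access,
-- which a managed Cloud SQL instance doesn't provide.
CREATE TABLESPACE ssd1 LOCATION '/mnt/ssd1';
CREATE TABLESPACE ssd2 LOCATION '/mnt/ssd2';

CREATE TABLE my_table (id bigint, col1 text) PARTITION BY HASH (id);
CREATE TABLE my_table_p0 PARTITION OF my_table
  FOR VALUES WITH (MODULUS 2, REMAINDER 0) TABLESPACE ssd1;
CREATE TABLE my_table_p1 PARTITION OF my_table
  FOR VALUES WITH (MODULUS 2, REMAINDER 1) TABLESPACE ssd2;
```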
Just curious if there’s anything I might be missing here.