I have a table,
account_balance, with primary key
(address, timestamp, currency). This table is pretty big, having nearly 1 billion rows and taking ~3T of space. I am doing a simple query on it:
SELECT address, EXISTS(SELECT address FROM account_balance WHERE address = a.address) ex FROM (VALUES ('ab566dceac0d857dbf05682d1ceb548667cf4580')) AS a(address);
This should be a simple index lookup but instead Postges produces a seq scan. Here’s explain result:
Result (cost=0.37..0.38 rows=1 width=33) InitPlan 1 (returns $0) -> Seq Scan on account_balance (cost=0.00..355882209.06 rows=965801108 width=0) Filter: ((address)::text = 'ab566dceac0d857dbf05682d1ceb548667cf4580'::text)
If I set
set enable_seqscan = off then as expected:
Result (cost=0.99..1.00 rows=1 width=33) InitPlan 1 (returns $0) -> Index Only Scan using account_balance_pkey on account_balance (cost=0.83..153234426.09 rows=965801135 width=0) Index Cond: (address = 'ab566dceac0d857dbf05682d1ceb548667cf4580'::text)
So the question is – why it chooses the seq scan? It should never make sense to prefer seq scan to index lookup on such a huge table. What could compel it to choose to go sequentially over a 3T table (which of course takes forever) instead of going for the index (which takes microseconds)? Is some parameter wrong or some system variable I need to check? I tried
SET random_page_cost = 1 and for some cases it helps, but for others it still insists on using seq scan. Only setting
off seems to fix it consistently, but I feel it’s a wrong approach.
I am not sure how to interpret the costs numbers – I feel like they may be the key for it, but I can’t figure out where they come from and why they are what they are.
This is on PostgreSQL 12.
P.S. Tried to do
ANALYZE, didn’t change anything.