postgresql – Postgres: Faster query when adding useless JOIN

I was looking at a query to make it more performant and I encountered an interesting case. This query executes much faster when I add a (redundant) JOIN that doesn’t change the actual result set.

Original Query:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
select
    *
from
    product_reservations l1_
inner join product_occupancy_items l0_ on l1_.id = l0_.id
inner join products l2_ on l0_.product_id = l2_.id
where
    l2_.customer_id = 'a19917c2-5ee8-47c2-a757-7799c0e54b0d'
    and l0_.date_range && '(2019-08-09,2019-08-11)' = true

The query executes in ~88 ms. Here is the EXPLAIN:
https://explain.dalibo.com/plan/yJG

Adding a redundant JOIN:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)
select
    *
from
    product_reservations l1_
inner join product_occupancy_items l0_ on l1_.id = l0_.id
inner join products l2_ on l0_.product_id = l2_.id
inner join products l3_ on l0_.product_id = l3_.id
where
    l2_.customer_id = 'a19917c2-5ee8-47c2-a757-7799c0e54b0d'
    and l0_.date_range && '(2019-08-09,2019-08-11)' = true

This produces the following EXPLAIN: https://explain.dalibo.com/plan/Op3

This query runs in ~20 ms, about 4 times faster. As you may have noticed, the only difference is a useless JOIN on a table that is already JOINed (products).

When looking at the EXPLAIN for differences, we can find something in the product_occupancy_items section:

  • In the 2nd (faster) query, the planner uses only the foreign-key index towards products and fetches all the matching rows, applying the date-range filter without an index. This is much faster even though there are ~12k matching rows to deal with (only 65 remain after the filter).
  • In the 1st (original) query, the planner uses the index idx_occupancy_items_date_range (a GiST index on the date range) to filter rows by overlap. That is good on paper, but it takes a whole 80 ms, which is understandable since it is used as a bitmap index scan and there are many rows (~40k) matching that date range.
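
One way to confirm that the bitmap scan is what the planner should be avoiding (a diagnostic sketch only, not a fix; enable_bitmapscan is a standard planner GUC and SET LOCAL keeps the change inside the transaction):

```sql
-- Diagnostic only: disable bitmap scans for this transaction and re-run the plan.
-- If the original query then switches to the foreign-key-index plan and gets
-- fast, the planner's cost estimate for the bitmap scan is the culprit.
BEGIN;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS)
select *
from product_reservations l1_
inner join product_occupancy_items l0_ on l1_.id = l0_.id
inner join products l2_ on l0_.product_id = l2_.id
where l2_.customer_id = 'a19917c2-5ee8-47c2-a757-7799c0e54b0d'
  and l0_.date_range && '(2019-08-09,2019-08-11)' = true;
ROLLBACK;
```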

What’s going on is fairly clear from a “what” point of view; I’m not sure I understand the “why”. It looks to me like, purely from an implementation point of view, the extra join changed the part of the plan that accesses product_occupancy_items, and we got lucky.

I’m wondering if we can do anything to allow the planner to make the same decision without requiring that JOIN.
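
One thing that sometimes lets the planner make that choice on its own is giving it better row estimates for the range column, so the bitmap scan over ~40k overlapping rows costs out as too expensive (a sketch; the statistics target of 1000 is an arbitrary example value, the table and column names are taken from the query above):

```sql
-- Collect more detailed statistics for the range column, then re-analyze.
-- With a better estimate of how many rows overlap a typical range, the
-- planner may prefer the foreign-key-index plan without the redundant JOIN.
ALTER TABLE product_occupancy_items
    ALTER COLUMN date_range SET STATISTICS 1000;
ANALYZE product_occupancy_items;
```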

Note that while the difference may sound like just a few ms, this query is executed many times as part of a bulk process. The whole bulk run takes about 3 s with the double join and about 30 s without it, so the difference is significant.