Difficult areas to estimate

If no data has ever been collected about a table using ANALYZE, the planner has to make up completely arbitrary estimates. This is the worst sort of statistics issue to have.

Another area where statistics can fail is situations where the query optimizer just doesn't know how to estimate something. You can watch it utterly fail to estimate this simple query correctly:

EXPLAIN ANALYZE SELECT * FROM customers WHERE customerid=customerid;
QUERY PLAN 
----------
     Seq Scan on customers  (cost=0.00..726.00 rows=100 width=268) (actual time=0.012..30.012 rows=20000 loops=1)
       Filter: (customerid = customerid)
Total runtime: 55.549 ms 

It's obvious to a person that all 20,000 rows will be returned. What's happening is that, ...

Get PostgreSQL 10 High Performance now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.