When PostgreSQL executes the following query, it takes about 30 ms:
select crm_contact_id from crm_contact
where crm_contact_id in
      (select co.crm_contact_id
       from crm_contact co
       join crm_department d on d.crm_department_id = co.crm_department_id
       where d.crm_company_id = 1607)
  and owner_id = 'cf0df836-ddf4-4ecd-8b8e-b602d4f8303c'
order by contact_start_dt
But when limit 200 is added to this query, performance drops dramatically: the query takes about 30000 ms!
select crm_contact_id from crm_contact
where crm_contact_id in
      (select co.crm_contact_id
       from crm_contact co
       join crm_department d on d.crm_department_id = co.crm_department_id
       where d.crm_company_id = 1607)
  and owner_id = 'cf0df836-ddf4-4ecd-8b8e-b602d4f8303c'
order by contact_start_dt
limit 200
Our problem is that Servoy adds ‘limit ?’ to certain PostgreSQL queries (this didn’t happen with Sybase), and this is why our performance drops dramatically. We also cannot get rid of the IN clause, because it is the result of using filter params.
Can we somehow prevent Servoy from adding the ‘limit’? Or can we force PostgreSQL to use a different query plan?
michel:
But when limit 200 is added to this query, performance drops dramatically: the query takes about 30000 ms!
How did you measure that? Have you tried running it in pgAdmin/psql and timing it there?
michel:
Our problem is that Servoy adds ‘limit ?’ to certain PostgreSQL queries (this didn’t happen with Sybase), and this is why our performance drops dramatically.
In Sybase/MSSQL you wouldn’t see a LIMIT 200 at the end of the query; instead you would see SELECT TOP 200 at the beginning. Both do the same thing.
A 30-second difference between the two queries doesn’t make much sense. Both queries should have the same query plan; only the fetched result is smaller when using a limit. So overall (including fetching) the second query should be faster, which makes me wonder what is really going on.
I suggest you run both queries in a query tool and analyze the query plans (EXPLAIN ANALYZE) to rule out the database server.
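For reference, comparing the two plans in psql or pgAdmin could look like the sketch below (names taken from the queries in this thread; note that EXPLAIN ANALYZE actually executes the statement, so use a test box if that matters):

```sql
-- Plan and execute the query without the limit, showing actual row
-- counts and timings per plan node:
EXPLAIN ANALYZE
select crm_contact_id from crm_contact
where crm_contact_id in
      (select co.crm_contact_id
       from crm_contact co
       join crm_department d on d.crm_department_id = co.crm_department_id
       where d.crm_company_id = 1607)
  and owner_id = 'cf0df836-ddf4-4ecd-8b8e-b602d4f8303c'
order by contact_start_dt;

-- Repeat with "limit 200" appended, then compare which scan and sort
-- nodes the planner picked in each of the two outputs.
```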
This was already timed in pgAdmin. Both queries have different query plans; that’s the problem. We found out that if we change the query to:
select crm_contact_id from crm_contact
where crm_contact_id in
      (select distinct co.crm_contact_id
       from crm_contact co
       join crm_department d on d.crm_department_id = co.crm_department_id
       where d.crm_company_id = 1607)
  and owner_id = 'cf0df836-ddf4-4ecd-8b8e-b602d4f8303c'
order by contact_start_dt
limit 200
the query plan changes and it runs in 15 ms!
By the way: I don’t see a SELECT TOP 200 in Sybase; there is no limit in that query at all.
Both queries should have the same query plan, only the fetched result is smaller when using a limit.
FYI: introducing the limit clause in the SQL query changes the query plan completely! In pgAdmin you can see this nicely in graphical form.
The PostgreSQL manual says:
The query planner takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for LIMIT and OFFSET.
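To see which strategy the planner is switching between, one session-local experiment (for diagnosis only, not something to leave enabled in production) is to disable individual plan node types with PostgreSQL’s standard enable_* planner settings and re-run the statement under EXPLAIN ANALYZE:

```sql
-- Discourage index scans, pushing the planner toward the sort-based
-- plan it uses without the limit:
set enable_indexscan = off;
-- EXPLAIN ANALYZE <the limit 200 query from above>;
reset enable_indexscan;

-- Or the reverse: discourage explicit sorts, pushing it toward the
-- fast-startup, index-ordered plan:
set enable_sort = off;
-- EXPLAIN ANALYZE <the query without the limit>;
reset enable_sort;
```

These settings only raise the estimated cost of the disabled node type; comparing the resulting plans and timings shows which strategy actually wins on your data.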
PS. We are migrating a large database from Sybase to PostgreSQL and noticed a huge performance loss. As we know now, this is due to the query-plan selection PostgreSQL does for LIMIT…
FYI: introducing the limit clause in the SQL query changes the query plan completely! In pgAdmin you can see this nicely in graphical form.
The PostgreSQL manual says:
The query planner takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for LIMIT and OFFSET.
Correct. But since you are also using an ORDER BY, it has to process the whole dataset anyway. This is touched on in the manual in the line right after the one you quoted, I believe.
In any case, I will look into this a little deeper.
I was talking about PostgreSQL query plans and performance, not about the returned data. The returned data is the same and correct in both cases (thanks to the ORDER BY, I know).
Have a look at which indexes are defined and which are used in each of the two query plans.
I expect you can force a better query plan by dropping an index or creating a new one.
Can you provide the textual output of an EXPLAIN ANALYZE of both queries (preferably between [code][/code] tags)? It contains more information than the graphical output in the screenshot does.
Also, what version of PostgreSQL is this (SELECT version())?
And how large is your work_mem (SHOW work_mem)?
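work_mem is relevant here because it caps the memory a single sort operation may use; if the sort on contact_start_dt does not fit, PostgreSQL falls back to an external merge sort on disk, and the planner prices the sort-based plan accordingly. A sketch, assuming the server permits per-session changes:

```sql
show work_mem;           -- the default is only a few MB
set work_mem = '64MB';   -- session-local; size it to your available RAM
-- Re-run EXPLAIN ANALYZE on the slow query and check whether the
-- Sort node now reports an in-memory quicksort instead of an
-- "external merge" on disk.
reset work_mem;
```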
I checked with the #postgresql IRC channel on freenode, and the word is the following:
The planner is faced with a decision between the fast-startup plan that uses indexes for ordering and the fast-completion plan that sorts. It has to estimate which is faster based on statistics, and it’s often wrong. Fixing the query might be easier than addressing the performance issue in other ways.
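One commonly cited way of “fixing the query” in this situation is to keep the LIMIT but hide the ordered query inside a subquery ending in OFFSET 0, which in the PostgreSQL versions of that era acted as an optimization fence: the subquery is not flattened, so its plan is chosen as if no outer LIMIT existed. A sketch only, not tested against this schema, and it assumes you can restructure the statement Servoy sends:

```sql
select crm_contact_id
from (
    select crm_contact_id, contact_start_dt
    from crm_contact
    where crm_contact_id in
          (select co.crm_contact_id
           from crm_contact co
           join crm_department d on d.crm_department_id = co.crm_department_id
           where d.crm_company_id = 1607)
      and owner_id = 'cf0df836-ddf4-4ecd-8b8e-b602d4f8303c'
    order by contact_start_dt
    offset 0   -- optimization fence: prevents subquery pull-up
) as fenced
limit 200;
```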
Is this query generated by Servoy, or is it a custom one? Because you could simply use the subselect as your main query, like so:
SELECT DISTINCT co.crm_contact_id, co.contact_start_dt
FROM crm_contact co
JOIN crm_department d ON d.crm_department_id = co.crm_department_id
WHERE d.crm_company_id = 1607
  AND co.owner_id = 'cf0df836-ddf4-4ecd-8b8e-b602d4f8303c'
ORDER BY co.contact_start_dt
The subquery is ours (a parent-child query we defined ourselves in our Servoy framework program data). The child table (crm_contact) holds about 500,000 rows.
We can only make that filter query persistent within Servoy through the addFoundSetFilterParam command, with pkField IN subquery.
Servoy fires that on loadRecords() to PostgreSQL with the (very important) addition limit 200.
We discovered ourselves that by adding the distinct clause to the subquery, the PostgreSQL query planner takes a different route (with much better performance)…
The Sybase query planner took the right route without the distinct clause.