Friday, March 30, 2012

Poor plan choice

Hello,
A couple of days ago one of our queries suddenly started to perform
abysmally. The query is pretty straightforward - it joins several
tables all on foreign keys and includes a GROUP BY with COUNT(*). I
looked over the query plan and it looked a little odd so I tried
cutting the query down to see where the issue might be. I eventually
came up with the following:
SELECT COUNT(*)
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
WHERE T1.my_date = '2004-11-18'
The table and column names have been changed to protect the innocent,
but that is the exact format of the tables. Table1 has about 35M
records. Table2 has about 6.5M records. For the date in question,
Table1 has about 165K records.
There is a non-clustered index on T1.my_date and there is a clustered
index on T2.table2_id.
The query plan for this simple query does an index seek on T1.my_date
as I expected then it does a bookmark lookup (presumably because it
needs T1.table2_id). It then includes parallelism, a hash, and then a
bitmap creation. Meanwhile, it does an index scan using an index on
Table2 that includes a single column that isn't even mentioned in the
query(?!?!). It then uses parallelism and does a hash match/inner
join.
I've done UPDATE STATISTICS using WITH FULLSCAN for both tables and
I've done a DBCC CHECKTABLE on both tables. Neither had any effect. I
also tried to force the query to use the clustered index for Table2.
For the simple query above it doesn't seem to help performance as the
clustered index scan has a very large cost to it (I'm not sure that I
entirely understand why). In the original query it helps substantially
though. Instead of joining the 6.5M records to a lookup table first, it
joins them to Table1 first, which cuts the number of records down to
about 165K before moving on to the other joins.
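For what it's worth, this is roughly how I forced the clustered index on
Table2 (just a sketch - INDEX(1) refers to the clustered index, so I don't
have to spell out its name):
SELECT COUNT(*)
FROM Table1 T1
INNER JOIN Table2 T2 WITH (INDEX(1)) -- 1 = the clustered index on table2_id
    ON T2.table2_id = T1.table2_id
WHERE T1.my_date = '2004-11-18'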
What I'm looking for is any advice on other things that I can look at
or any ideas on why SQL Server might be making these kinds of choices.
I would have thought that the simple query above would have performed
much better than it is currently (~30-35 seconds). I realize that
there has to be a bookmark lookup, but I was still expecting a quick
response from the server based on the indexes.
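In case anyone wants to poke at something similar, the text equivalent of
the plans I'm describing comes from something like:
SET SHOWPLAN_TEXT ON -- estimated plan only; the query is not executed
GO
SELECT COUNT(*)
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
WHERE T1.my_date = '2004-11-18'
GO
SET SHOWPLAN_TEXT OFF
GO
SET STATISTICS IO ON -- or run it for real and compare reads per table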
Because of the table sizes, etc. I don't expect anyone to reproduce my
results, so please don't ask me to provide DDL for all of the tables
involved. If you have some ideas or even just guesses great, if not
then that's ok too.
Thanks,
-Tom.

Thomas, see inline
"Thomas R. Hummel" wrote:
> Hello,
> A couple of days ago one of our queries suddenly started to perform
> abysmally. The query is pretty straightforward - it joins several
> tables all on foreign keys and includes a GROUP BY with COUNT(*). I
> looked over the query plan and it looked a little odd so I tried
> cutting the query down to see where the issue might be. I eventually
> came up with the following:
> SELECT COUNT(*)
> FROM Table1 T1
> INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
> WHERE T1.my_date = '2004-11-18'
> The table and column names have been changed to protect the innocent,
> but that is the exact format of the tables. Table1 has about 35M
> records. Table2 has about 6.5M records. For the date in question,
> Table1 has about 165K records.
> There is a non-clustered index on T1.my_date and there is a clustered
> index on T2.table2_id.
Consider adding a nonclustered index on T1(my_date,table2_id). This will
prevent the (quite expensive) bookmark lookups.
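For example (the index name is just a placeholder, use whatever fits your
naming convention):
CREATE NONCLUSTERED INDEX IX_Table1_my_date_table2_id
ON Table1 (my_date, table2_id)
With both columns in the index, the seek on my_date already carries
table2_id along, so the join can be resolved without touching the base
table at all.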
> The query plan for this simple query does an index seek on T1.my_date
> as I expected then it does a bookmark lookup (presumably because it
> needs T1.table2_id). It then includes parallelism, a hash, and then a
> bitmap creation. Meanwhile, it does an index scan using an index on
> Table2 that includes a single column that isn't even mentioned in the
> query(?!?!). It then uses parallelism and does a hash match/inner
> join.
Apparently SQL-Server estimates that the parallel plan will be faster.
If you expect differently, then you could add the hint OPTION (MAXDOP 1)
to force the serial plan.
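For example:
SELECT COUNT(*)
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
WHERE T1.my_date = '2004-11-18'
OPTION (MAXDOP 1) -- limit this query to a serial plan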
Since the index on T2(table2_id) is clustered it is very wide at the
page level. In this case, SQL-Server estimates that it is faster to scan
a nonclustered index of table T2 (which also includes the clustered
index key) than it is to seek (or partially scan) the clustered index
for the estimated rows of the query.
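You can get a rough feel for the difference by comparing page counts per
index (sysindexes figures are approximate and may need DBCC UPDATEUSAGE to
be current):
SELECT name, indid, dpages
FROM sysindexes
WHERE id = OBJECT_ID('Table2')
AND indid BETWEEN 1 AND 250 -- skip the heap (0) and text/image (255) rows
The narrower the index, the fewer pages a full scan has to read.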
Hope this helps,
Gert-Jan
> I've done UPDATE STATISTICS using WITH FULLSCAN for both tables and
> I've done a DBCC CHECKTABLE on both tables. Neither had any effect. I
> also tried to force the query to use the clustered index for Table2.
> For the simple query above it doesn't seem to help performance as the
> clustered index scan has a very large cost to it (I'm not sure that I
> entirely understand why). In the original query it helps substantially
> though. Instead of joining the 6.5M records to a lookup table first, it
> joins them to Table1 first, which cuts the number of records down to
> about 165K before moving on to the other joins.
> What I'm looking for is any advice on other things that I can look at
> or any ideas on why SQL Server might be making these kinds of choices.
> I would have thought that the simple query above would have performed
> much better than it is currently (~30-35 seconds). I realize that
> there has to be a bookmark lookup, but I was still expecting a quick
> response from the server based on the indexes.
> Because of the table sizes, etc. I don't expect anyone to reproduce my
> results, so please don't ask me to provide DDL for all of the tables
> involved. If you have some ideas or even just guesses great, if not
> then that's ok too.
> Thanks,
> -Tom.

I fully agree. OPTION (MAXDOP 1) should resolve the problem. In my
experience, UPDATE STATISTICS would temporarily fix it. An index hint or
query hint would also force the right plan (not a best practice, though).
If it's from a stored procedure, WITH RECOMPILE would also fix it (also
not a best practice).
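For the stored procedure case the option goes in the procedure header,
something like this (the procedure and parameter names here are made up):
CREATE PROCEDURE dbo.usp_CountByDate
    @my_date datetime
WITH RECOMPILE -- build a fresh plan on every call
AS
SELECT COUNT(*)
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
WHERE T1.my_date = @my_date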
Gary
> SELECT COUNT(*)
> FROM Table1 T1
> INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
> WHERE T1.my_date = '2004-11-18'
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:419E4013.6DBC44D@.toomuchspamalready.nl...
> Thomas, see inline
> "Thomas R. Hummel" wrote:
> >
> > Hello,
> >
> > A couple days ago one of our queries suddenly started to perform
> > abyssmally. The query is pretty straightforward - it joins several
> > tables all on foreign keys and includes a GROUP BY with COUNT(*). I
> > looked over the query plan and it looked a little odd so I tried
> > cutting the query down to see where the issue might be. I eventually
> > came up with the following:
> >
> > SELECT COUNT(*)
> > FROM Table1 T1
> > INNER JOIN Table2 T2 ON T2.table2_id = T1.table2_id
> > WHERE T1.my_date = '2004-11-18'
> >
> > The table and column names have been changed to protect the innocent,
> > but that is the exact format of the tables. Table1 has about 35M
> > records. Table2 has about 6.5M records. For the date in question,
> > Table1 has about 165K records.
> >
> > There is a non-clustered index on T1.my_date and there is a clustered
> > index on T2.table2_id.
> Consider adding a nonclustered index on T1(my_date,table2_id). This will
> prevent the (quite expensive) bookmark lookups.
> > The query plan for this simple query does an index seek on T1.my_date
> > as I expected then it does a bookmark lookup (presumably because it
> > needs T1.table2_id). It then includes parallelism, a hash, and then a
> > bitmap creation. Meanwhile, it does an index scan using an index on
> > Table2 that includes a single column that isn't even mentioned in the
> > query(?!?!). It then uses parallelism and does a hash match/inner
> > join.
> Apparently SQL-Server estimates that the parallel plan will be faster.
> If you expect differently, then you could add the hint OPTION (MAXDOP 1)
> to force the serial plan.
> Since the index on T2(table2_id) is clustered it is very wide at the
> page level. In this case, SQL-Server estimates that it is faster to scan
> a nonclustered index of table T2 (which also includes the clustered
> index key) than it is to seek (or partially scan) the clustered index
> for the estimated rows of the query.
> Hope this helps,
> Gert-Jan
> > I've done UPDATE STATISTICS using WITH FULLSCAN for both tables and
> > I've done a DBCC CHECKTABLE on both tables. Neither had any effect. I
> > also tried to force the query to use the clustered index for Table2.
> > For the simple query above it doesn't seem to help performance as the
> > clustered index scan has a very large cost to it (I'm not sure that I
> > entirely understand why). In the original query it helps substantially
> > though. Instead of joining the 6.5M records to a lookup table first it
> > joins it to Table1 first, which cuts down the number of records to the
> > 165K before going about with other joins.
> >
> > What I'm looking for is any advice on other things that I can look at
> > or any ideas on why SQL Server might be making these kinds of choices.
> > I would have thought that the simple query above would have performed
> > much better than it is currently (~30-35 seconds). I realize that
> > there has to be a bookmark lookup, but I was still expecting a quick
> > response from the server based on the indexes.
> >
> > Because of the table sizes, etc. I don't expect anyone to reproduce my
> > results, so please don't ask me to provide DDL for all of the tables
> > involved. If you have some ideas or even just guesses great, if not
> > then that's ok too.
> >
> > Thanks,

Thanks for the suggestions. I had thought of the nonclustered index,
but while that would help with the pared down query that I came up
with, it wouldn't help with the underlying query because that one
involves a lot of additional columns. I'm still considering a covering
index, but I'm not sure why it should be necessary. Due to the number
of columns in the query as well as the number of rows in the table,
I'm a little hesitant to create a covering index.
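Just to be concrete, what I'd be looking at is something along these lines,
with placeholder names standing in for the real columns, which is exactly
why the width worries me:
CREATE NONCLUSTERED INDEX IX_Table1_my_date_covering
ON Table1 (my_date, table2_id, other_col1, other_col2, other_col3)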
I may try changing the clustered index for Table1. Right now it is on
an identity column (not my design...) that is also being used as a
surrogate primary key. It makes more sense to me to have that on the
date column since most reports run off of that date column and use
date ranges. This should prevent bookmark lookups for these large
groups of rows as well. When looking up by the primary key it is
usually to grab one row anyway.
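If I go that route it would look roughly like this (constraint and column
names are made up, and any foreign keys referencing the primary key would
have to be dropped and recreated around it):
ALTER TABLE Table1 DROP CONSTRAINT PK_Table1 -- the existing clustered PK
ALTER TABLE Table1 ADD CONSTRAINT PK_Table1
    PRIMARY KEY NONCLUSTERED (table1_id)
CREATE CLUSTERED INDEX IX_Table1_my_date ON Table1 (my_date)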
Thanks!
-Tom.
