Friday, March 30, 2012

Poor performance with NEWID()

We're experiencing very poor performance on successive runs of queries such
as the following:
'---
SET NOCOUNT ON
TRUNCATE TABLE tblSurveyTemp
INSERT INTO tblSurveyTemp (FirstName, LastName,
Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em
ploymentStatus,Occupation,Industry,Income,Ethnicit y,DateSent)
SELECT TOP 250 FirstName, LastName,
Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em
ploymentStatus,Occupation,Industry,Income,Ethnicit y,getdate()
FROM tblMember t2
WHERE Country IN ('U.S.') AND Age in ('13','20','30','40','50') AND Sex in
('F')
AND Not Exists ( Select email FROM tblSurvey t WHERE t.email=t2.email AND
t.survey='grainactivity')
ORDER BY NEWID()
SELECT FirstName,LastName,Email FROM tblSurveyTemp
'---
We're using the NEWID() function to randomize the sample. CPU is always at
100% when we run the query. The first time it runs successfully takes about
20 seconds. Second time, maybe 1 min. Third time it timed out.
Does anyone have any advice?
Thank you!!!
I dont have an answer, but a question. Why would you ever want to order by
newid()?
TIA,
ChrisR
"Dean J Garrett" wrote:

> We're experiencing very poor performance on successive runs of queries such
> as the following:
> '---
> SET NOCOUNT ON
> TRUNCATE TABLE tblSurveyTemp
> INSERT INTO tblSurveyTemp (FirstName, LastName,
> Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em
> ploymentStatus,Occupation,Industry,Income,Ethnicit y,DateSent)
> SELECT TOP 250 FirstName, LastName,
> Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em
> ploymentStatus,Occupation,Industry,Income,Ethnicit y,getdate()
> FROM tblMember t2
> WHERE Country IN ('U.S.') AND Age in ('13','20','30','40','50') AND Sex in
> ('F')
> AND Not Exists ( Select email FROM tblSurvey t WHERE t.email=t2.email AND
> t.survey='grainactivity')
> ORDER BY NEWID()
> SELECT FirstName,LastName,Email FROM tblSurveyTemp
> '---
>
> We're using the NEWID() function to randomize the sample. CPU is always at
> 100% when we run the query. The first time it runs successfully takes about
> 20 seconds. Second time, maybe 1 min. Third time it timed out.
> Does anyone have any advice?
> Thank you!!!
>
>
|||On Fri, 4 Nov 2005 14:12:01 -0800, ChrisR wrote:

>I dont have an answer, but a question. Why would you ever want to order by
>newid()?
Hi Chris,
The combination of TOP ... and ORDER BY NEWID() is often used to get a
pseudo-random sample. The ORDER BY NEWID() makes sure that the rows are
scrambled in an unpredictabable way; the TOP then takes only the few
rows that happen to be first.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)
|||This method of choosing a random set of records (250 in your case) out of a
larger set of records is pretty effective when there are not many records in
the filtered select. In your case tblMember filtered by your where clause
must produce quite a few records. SQL Server has to pull those records
togther and sort them by the newid() value. Sorting a lot of records can take
a long time. To see how many records we're talking run:
SELECT COUNT(*)
FROM tblMember t2
WHERE Country IN ('U.S.') AND Age in ('13','20','30','40','50') AND Sex in
('F')
AND Not Exists ( Select email FROM tblSurvey t WHERE t.email=t2.email AND
t.survey='grainactivity')
In the end you may need to choose another method to choose your records
randomly.
Good luck!
-Phil
"Dean J Garrett" wrote:

> We're experiencing very poor performance on successive runs of queries such
> as the following:
> '---
> SET NOCOUNT ON
> TRUNCATE TABLE tblSurveyTemp
> INSERT INTO tblSurveyTemp (FirstName, LastName,
> Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em
> ploymentStatus,Occupation,Industry,Income,Ethnicit y,DateSent)
> SELECT TOP 250 FirstName, LastName,
> Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em
> ploymentStatus,Occupation,Industry,Income,Ethnicit y,getdate()
> FROM tblMember t2
> WHERE Country IN ('U.S.') AND Age in ('13','20','30','40','50') AND Sex in
> ('F')
> AND Not Exists ( Select email FROM tblSurvey t WHERE t.email=t2.email AND
> t.survey='grainactivity')
> ORDER BY NEWID()
> SELECT FirstName,LastName,Email FROM tblSurveyTemp
> '---
>
> We're using the NEWID() function to randomize the sample. CPU is always at
> 100% when we run the query. The first time it runs successfully takes about
> 20 seconds. Second time, maybe 1 min. Third time it timed out.
> Does anyone have any advice?
> Thank you!!!
>
>
|||Here is the article we used to figure out this technique
http://www.windowsitpro.com/Articles...842/19842.html but we don't
know if it is the best performing!
Thanks
"ChrisR" <ChrisR@.discussions.microsoft.com> wrote in message
news:B2EF1E86-E0AF-4FA3-A787-67C30B2AEA32@.microsoft.com...[vbcol=seagreen]
> I dont have an answer, but a question. Why would you ever want to order by
> newid()?
> --
> TIA,
> ChrisR
>
> "Dean J Garrett" wrote:
such[vbcol=seagreen]
Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em[vbcol=seagreen]
Email,Sex,Age,City,State,Country,ZipCode,Married,C hildrenAtHome,Education,Em[vbcol=seagreen]
in[vbcol=seagreen]
AND[vbcol=seagreen]
at[vbcol=seagreen]
about[vbcol=seagreen]
|||Very kinky.
I have no idea why you would get such differential results, if indeed
you are using the same parameters each time.
On a toy table, you can see that the newid() gets called BEFORE the
top function.
select top 10 * from mytable
order by newid()
StmtText
-----
|--Sort(TOP 10, ORDER BY[Expr1002] ASC))
|--Compute Scalar(DEFINE[Expr1002]=newid()))
|--Clustered Index Scan(OBJECT[HaxPlans].[dbo].[MyTable].[PK_MyTable]))
So, I guess you can do a little hack like this:
select * from
(
select top 10 * from mytable
) x
order by newid()
And get fewer calls to newid()
StmtText
-----
|--Sort(ORDER BY[Expr1002] ASC))
|--Compute Scalar(DEFINE[Expr1002]=newid()))
|--Top(10)
|--Clustered Index Scan(OBJECT[HaxPlans].[dbo].[MyTable].[PK_MyTable]))
Oh, wait a minute, you were doing an INSERT, maybe there is something
funky about the table you're inserting into? Maybe you're really
doing large numbers than 250?
But wait another minute, if you're doing an insert, WHY ARE YOU
ORDERING THE RECORDS ANYWAY? Is the destination table "flat", with no
indexes, just really an output buffer? Well, hmm, that should WORK,
and I still don't understand in that case especially why the
performance would vary so much. Just noodling around with it.
J.
On Fri, 4 Nov 2005 13:05:25 -0800, "Dean J Garrett" <info@.amuletc.com>
wrote:
>We're experiencing very poor performance on successive runs of queries such
>as the following:
>'---
>SET NOCOUNT ON
>TRUNCATE TABLE tblSurveyTemp
>INSERT INTO tblSurveyTemp (FirstName, LastName,
>Email,Sex,Age,City,State,Country,ZipCode,Married, ChildrenAtHome,Education,Em
>ploymentStatus,Occupation,Industry,Income,Ethnici ty,DateSent)
>SELECT TOP 250 FirstName, LastName,
>Email,Sex,Age,City,State,Country,ZipCode,Married, ChildrenAtHome,Education,Em
>ploymentStatus,Occupation,Industry,Income,Ethnici ty,getdate()
>FROM tblMember t2
>WHERE Country IN ('U.S.') AND Age in ('13','20','30','40','50') AND Sex in
>('F')
>AND Not Exists ( Select email FROM tblSurvey t WHERE t.email=t2.email AND
>t.survey='grainactivity')
>ORDER BY NEWID()
>SELECT FirstName,LastName,Email FROM tblSurveyTemp
>'---
>
>We're using the NEWID() function to randomize the sample. CPU is always at
>100% when we run the query. The first time it runs successfully takes about
>20 seconds. Second time, maybe 1 min. Third time it timed out.
>Does anyone have any advice?
>Thank you!!!
>
|||You just need to generate random number to select 250 random records. So a
variation of following query can be used for this purpose.
select au_id,au_lname, au_fname,
convert(smallint,rand() * ascii(left(au_lname,1)) *
ascii(right(au_lname,1))) % 77 value1 from authors
order by value1
The newid() creates a unique value of type uniqueidentifier which is a
16-byte binary values. Thus the filtered records are being sorted on a very
wide column. The query suggested by me will be sorted on a smallint data type
column which takes 2 bytes only.

No comments:

Post a Comment