Could you guys give me some feedback on the below query and suggest the
best possible query assuming this is the only query that will be ran
against these tables?
The destination has 200 million records and the staging table has 30
million records.
col1_int has 1500 unique values
col2_int has 50000 unique values
col3_char_55 has 50000 unique values
INSERT INTO myTable
SELECT a.*
--SELECT count(*)
FROM myTable_stage a with(nolock)
LEFT OUTER JOIN myTable b with(nolock) ON b.col1_int =
a.col1_int
AND b.col2_int = a.
col2_int
AND b.col3_char_55 =
a.col3_char_55
WHERE b.col1_int IS NULLCorrection... I meant the best possible index or indexes not "best
possible query".|||Are you actually plannign on returning every single row every time you run
this view?
I can't imagine this is the case, based on your 200 million * 30 million
rows.
The fastest way to get back every row would probably be to put an index on
the 200 million row table on collumns (col3_char_55, col2_int, col1_int),
although havign the same collumns indexed on the table with 30 million rows
may work as well.
Returning this many rows will take ages, just to move the data over the
network, no matter how how fast the SQL Server engine performs the query.
What are you actually trying to do?
"Dave" <daveg.01@.gmail.com> wrote in message
news:1140103779.299793.117660@.g43g2000cwa.googlegroups.com...
> Could you guys give me some feedback on the below query and suggest the
> best possible query assuming this is the only query that will be ran
> against these tables?
> The destination has 200 million records and the staging table has 30
> million records.
>
> col1_int has 1500 unique values
> col2_int has 50000 unique values
> col3_char_55 has 50000 unique values
>
> INSERT INTO myTable
> SELECT a.*
> --SELECT count(*)
> FROM myTable_stage a with(nolock)
> LEFT OUTER JOIN myTable b with(nolock) ON b.col1_int =
> a.col1_int
> AND b.col2_int = a.
> col2_int
> AND b.col3_char_55 =
> a.col3_char_55
> WHERE b.col1_int IS NULL
>|||The above query will not return every row. Only the rows that are not
already in myTable.
The query inserts the new records from the staging table into the base
table.
Your index suggestion is what I was looking for. I was thinking
something like a clustered index on (col2_int, col3_char_55, col1_int)
would be best.|||Yes, I should have noticed the outer join and the b.col1_int IS NULL, and
realized what you were doing. I was sloppy and just looked at the join,
ignoring the rest of it.
Since you are doing the outer join, I believe the index would need to on the
destination table, and I think the columns (col3_char_55, col2_int,
col1_int) would work best.
A clustered index on your destination table may actually add overhead that
you don't want. You'll have to get someone elses opinion on whether a
clustered index is best for a table of this size, although more information
is probably needed to determine that. A regular index may work as well or
better.
You might also try using a not exists instead of the outer join just to see
the difference in performance.
"Dave" <daveg.01@.gmail.com> wrote in message
news:1140108936.361531.64340@.f14g2000cwb.googlegroups.com...
> The above query will not return every row. Only the rows that are not
> already in myTable.
> The query inserts the new records from the staging table into the base
> table.
> Your index suggestion is what I was looking for. I was thinking
> something like a clustered index on (col2_int, col3_char_55, col1_int)
> would be best.
>
No comments:
Post a Comment