SQL Server - SQL query optimization using IN over INNER JOIN
given:
table y
id int clustered index
name nvarchar(25)
table anothertable
id int clustered index
name nvarchar(25)
function somefunction
- does some math, then returns a valid id
compare:
SELECT y.name FROM y WHERE dbo.somefunction(y.id) IN (SELECT anothertable.id FROM anothertable)
vs:
SELECT y.name FROM y JOIN anothertable ON dbo.somefunction(y.id) = anothertable.id
question:
While timing these 2 queries, I found that on large data sets the first query, using IN, is faster than the second query, using INNER JOIN. I do not understand why. Can someone explain, please?
Generally speaking, IN is different from JOIN in that a JOIN can return additional rows where a row has more than 1 match in the JOIN-ed table.
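To make that difference concrete, here is a minimal T-SQL sketch. The table variables @y and @other and their contents are hypothetical, made up for illustration; they are not the tables from the question, and the function call is left out so the example stays self-contained.

```sql
-- Hypothetical sample data, not taken from the question.
DECLARE @y TABLE (id int, name nvarchar(25));
DECLARE @other TABLE (id int);

INSERT INTO @y (id, name) VALUES (1, N'a'), (2, N'b');
INSERT INTO @other (id) VALUES (1), (1), (3);  -- id 1 appears twice

-- IN: each row of @y is returned at most once, however many matches exist.
SELECT y.name
FROM @y AS y
WHERE y.id IN (SELECT o.id FROM @other AS o);

-- JOIN: a row of @y is returned once per matching row in @other.
SELECT y.name
FROM @y AS y
JOIN @other AS o ON y.id = o.id;
```

With this data the IN query returns the row for 'a' once, while the JOIN query returns it twice.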
From the estimated execution plan, though, it can be seen that in this case the 2 queries are semantically the same:
SELECT a.col1, dbo.foo(a.col1), MAX(a.col2) FROM a WHERE dbo.foo(a.col1) IN (SELECT col1 FROM b) GROUP BY a.col1, dbo.foo(a.col1)
versus
SELECT a.col1, dbo.foo(a.col1), MAX(a.col2) FROM a JOIN b ON dbo.foo(a.col1) = b.col1 GROUP BY a.col1, dbo.foo(a.col1)
Even if duplicates are introduced by the JOIN, they will be removed by the GROUP BY, as it only references columns from the left-hand table. Additionally, these duplicate rows will not alter the result, as MAX(a.col2) will not change. This is not the case for all aggregates, however. If you were to use SUM(a.col2) (or AVG or COUNT), the presence of the duplicates would change the result.
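The effect of duplicates on different aggregates can be sketched as follows. The table variables @a and @b and their values are hypothetical, chosen only to show the behaviour, and the dbo.foo call is omitted so the example runs on its own.

```sql
-- Hypothetical sample data, not taken from the question.
DECLARE @a TABLE (col1 int, col2 int);
DECLARE @b TABLE (col1 int);

INSERT INTO @a (col1, col2) VALUES (1, 10), (1, 20);
INSERT INTO @b (col1) VALUES (1), (1);  -- duplicate match for col1 = 1

-- IN: each row of @a is counted once.
SELECT a.col1, MAX(a.col2) AS max_col2, SUM(a.col2) AS sum_col2
FROM @a AS a
WHERE a.col1 IN (SELECT b.col1 FROM @b AS b)
GROUP BY a.col1;

-- JOIN: each row of @a is counted once per matching row in @b.
SELECT a.col1, MAX(a.col2) AS max_col2, SUM(a.col2) AS sum_col2
FROM @a AS a
JOIN @b AS b ON a.col1 = b.col1
GROUP BY a.col1;
```

Both queries return a single group with max_col2 = 20, but sum_col2 is 30 for the IN version and 60 for the JOIN version, because the JOIN feeds every row of @a into the aggregate twice.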
It seems that SQL Server doesn't have any logic to differentiate between aggregates such as MAX and those such as SUM, and quite possibly it is expanding out all the duplicates, then aggregating them later, and so doing a lot more work.
The estimated number of rows being aggregated is 2893.54 for IN vs 28271800 for JOIN, but these estimates won't be reliable, as the join predicate is unsargable.