SQL Server - SQL query optimization using IN over INNER JOIN
Given:
Table y
id int clustered index
name nvarchar(25)
Table anothertable
id int clustered index
name nvarchar(25)
Function somefunction
- does some math and returns a valid id
Compare:
SELECT y.name FROM y WHERE dbo.somefunction(y.id) IN (SELECT anothertable.id FROM anothertable)
vs:
SELECT y.name FROM y JOIN anothertable ON dbo.somefunction(y.id) = anothertable.id
Question:
While timing these two queries, I found that on large data sets the first query, using IN, is faster than the second query, using INNER JOIN. I do not understand why. Can someone explain, please?
Generally speaking, IN is different from JOIN in that a JOIN can return additional rows where a row has more than one match in the joined table.
From your estimated execution plan, though, it can be seen that in this case the two queries are semantically the same:
SELECT a.col1, dbo.foo(a.col1), MAX(a.col2) FROM a WHERE dbo.foo(a.col1) IN (SELECT col1 FROM b) GROUP BY a.col1, dbo.foo(a.col1)
versus
SELECT a.col1, dbo.foo(a.col1), MAX(a.col2) FROM a JOIN b ON dbo.foo(a.col1) = b.col1 GROUP BY a.col1, dbo.foo(a.col1)
Even if duplicates are introduced by the JOIN, they are removed by the GROUP BY, as it only references columns from the left-hand table. Additionally, these duplicate rows do not alter the result, as MAX(a.col2) does not change. This is not the case for all aggregates, however. If you use SUM(a.col2) (or AVG or COUNT), the presence of duplicates changes the result.
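To make the point about duplicates concrete, here is a minimal sketch that runs both query shapes against a tiny in-memory SQLite database through Python's sqlite3 module. It only illustrates the semantics, not SQL Server's optimizer behavior. The function foo (here simply x % 10), the table contents, and the helper run are assumptions invented for the illustration; they stand in for dbo.foo and the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for dbo.foo: maps a.col1 to a value that may exist in b.col1.
conn.create_function("foo", 1, lambda x: x % 10)

conn.executescript("""
    CREATE TABLE a (col1 INTEGER, col2 INTEGER);
    CREATE TABLE b (col1 INTEGER);
    -- foo(11) = 1 and foo(12) = 2
    INSERT INTO a VALUES (11, 5), (11, 7), (12, 3);
    -- b contains the value 1 three times, so each matching row of a
    -- is tripled by the JOIN but kept once by IN.
    INSERT INTO b VALUES (1), (1), (1), (9);
""")


def run(agg, form):
    """Run the IN form or the JOIN form of the query with the given aggregate."""
    select = f"SELECT a.col1, foo(a.col1), {agg}(a.col2) FROM a "
    if form == "in":
        body = "WHERE foo(a.col1) IN (SELECT col1 FROM b) "
    else:
        body = "JOIN b ON foo(a.col1) = b.col1 "
    group = "GROUP BY a.col1, foo(a.col1)"
    return conn.execute(select + body + group).fetchall()


for agg in ("MAX", "SUM", "COUNT"):
    print(agg, "IN:  ", run(agg, "in"))
    print(agg, "JOIN:", run(agg, "join"))
```

With this data, MAX gives the same answer for both forms, while SUM and COUNT are inflated by the duplicated rows in the JOIN form. That is why the two queries are interchangeable only for duplicate-insensitive aggregates such as MAX or MIN.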
It seems that SQL Server doesn't have logic to differentiate between aggregates such as MAX and those such as SUM, so quite possibly it is expanding out the duplicates, aggregating them later, and doing a lot more work.
The estimated number of rows being aggregated is 2893.54 for IN vs 28271800 for JOIN, but these estimates won't necessarily be reliable, as the join predicate is unsargable.