sql server - SQL query optimization using IN over INNER JOIN


Given:

table y

  • id int clustered index
  • name nvarchar(25)

table anothertable

  • id int clustered index
  • name nvarchar(25)

function somefunction

  • does some math and returns a valid id

Compare:

SELECT y.name FROM y WHERE dbo.somefunction(y.id) IN (SELECT anothertable.id FROM anothertable)

vs:

SELECT y.name FROM y JOIN anothertable ON dbo.somefunction(y.id) = anothertable.id
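A minimal sketch of the two queries above, run against SQLite through Python's sqlite3 module rather than SQL Server, so it only illustrates the logical results and says nothing about SQL Server's plans or timings. somefunction here is a hypothetical stand-in (the question only says it "does some math and returns a valid id"), the dbo. schema prefix is dropped because SQLite has no schemas for functions, and anothertable.id is assumed to be unique (declared as a primary key). Under that assumption both forms return the same names.

```python
import sqlite3


# Hypothetical stand-in for dbo.somefunction: maps an id to 1, 2 or 3.
def somefunction(id_):
    return id_ % 3 + 1


def run_question_queries():
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call somefunction(...).
    conn.create_function("somefunction", 1, somefunction)
    conn.executescript("""
        CREATE TABLE y (id INTEGER PRIMARY KEY, name TEXT);
        -- Assumption: anothertable.id is unique.
        CREATE TABLE anothertable (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO y VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
        INSERT INTO anothertable VALUES (1, 'x'), (2, 'z');
    """)
    # First form from the question: IN with a subquery.
    in_rows = conn.execute(
        "SELECT y.name FROM y "
        "WHERE somefunction(y.id) IN (SELECT anothertable.id FROM anothertable) "
        "ORDER BY y.name"
    ).fetchall()
    # Second form from the question: INNER JOIN on the function result.
    join_rows = conn.execute(
        "SELECT y.name FROM y "
        "JOIN anothertable ON somefunction(y.id) = anothertable.id "
        "ORDER BY y.name"
    ).fetchall()
    conn.close()
    return in_rows, join_rows
```

With this data, somefunction maps ids 1, 3 and 4 to values present in anothertable, so both lists are [('a',), ('c',), ('d',)].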

Question:

While timing these two queries, I found that on large data sets the first query, using IN, is faster than the second query, using INNER JOIN. I do not understand why; can someone explain, please?

[Image: execution plan]

Answer:

Generally speaking, IN is different from JOIN in that a JOIN can return additional rows where a row has more than one match in the joined table.
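A small sketch of that difference, again using SQLite via Python's sqlite3 module as a stand-in for SQL Server. The tables a and b and their contents are made up for illustration; b deliberately has no uniqueness constraint, so the value 1 appears three times.

```python
import sqlite3


def in_vs_join_with_duplicates():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (col1 INTEGER, col2 INTEGER);
        CREATE TABLE b (col1 INTEGER);  -- no uniqueness constraint
        INSERT INTO a VALUES (1, 10), (2, 20);
        INSERT INTO b VALUES (1), (1), (1);  -- value 1 appears three times
    """)
    # IN keeps each row of a at most once, however many matches b has.
    in_count = conn.execute(
        "SELECT COUNT(*) FROM a WHERE a.col1 IN (SELECT col1 FROM b)"
    ).fetchone()[0]
    # JOIN emits one output row per matching pair, so a's row 1 appears 3 times.
    join_count = conn.execute(
        "SELECT COUNT(*) FROM a JOIN b ON a.col1 = b.col1"
    ).fetchone()[0]
    conn.close()
    return in_count, join_count
```

Here in_vs_join_with_duplicates() returns (1, 3): one row for the IN form, three for the JOIN form.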

From the estimated execution plan, though, it can be seen that in this case the two queries are semantically the same:

    SELECT a.col1,
           dbo.foo(a.col1),
           MAX(a.col2)
    FROM   a
    WHERE  dbo.foo(a.col1) IN (SELECT col1 FROM b)
    GROUP  BY a.col1,
              dbo.foo(a.col1)

versus

    SELECT a.col1,
           dbo.foo(a.col1),
           MAX(a.col2)
    FROM   a
           JOIN b
             ON dbo.foo(a.col1) = b.col1
    GROUP  BY a.col1,
              dbo.foo(a.col1)

Even if duplicates are introduced by the JOIN, they will be removed by the GROUP BY, as it only references columns from the left-hand table. Additionally, these duplicate rows will not alter the result, as MAX(a.col2) will not change. This would not be the case for all aggregates, however. If you were to use SUM(a.col2) (or AVG or COUNT), the presence of the duplicates would change the result.
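A minimal sketch of why MAX is safe and SUM is not, using SQLite via Python's sqlite3 module. The data is hypothetical, and dbo.foo is left out (grouping on a.col1 directly) since it does not affect how the duplicates arise. The agg argument is interpolated into the SQL text, so it is only meant for trusted literals such as "MAX(a.col2)".

```python
import sqlite3


def aggregate_in_vs_join(agg):
    """Run the IN and JOIN forms with the given aggregate expression."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (col1 INTEGER, col2 INTEGER);
        CREATE TABLE b (col1 INTEGER);
        INSERT INTO a VALUES (1, 10), (1, 5), (2, 20);
        INSERT INTO b VALUES (1), (1), (2);  -- col1 = 1 matches twice
    """)
    in_sql = (f"SELECT a.col1, {agg} FROM a "
              "WHERE a.col1 IN (SELECT col1 FROM b) "
              "GROUP BY a.col1 ORDER BY a.col1")
    join_sql = (f"SELECT a.col1, {agg} FROM a "
                "JOIN b ON a.col1 = b.col1 "
                "GROUP BY a.col1 ORDER BY a.col1")
    result = (conn.execute(in_sql).fetchall(),
              conn.execute(join_sql).fetchall())
    conn.close()
    return result
```

With MAX(a.col2) both forms give [(1, 10), (2, 20)], because repeating the rows 10 and 5 does not change their maximum. With SUM(a.col2) the IN form gives [(1, 15), (2, 20)] while the JOIN form gives [(1, 30), (2, 20)], because each row with col1 = 1 is counted twice.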

It seems that SQL Server doesn't have any logic to differentiate between aggregates such as MAX and those such as SUM, and so it is quite possibly expanding out all the duplicates and then aggregating them later, doing a lot more work.

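The estimated number of rows being aggregated is 2893.54 for the IN query versus 28271800 for the JOIN query, but these estimates won't be reliable, as the join predicate is unsargable (the column is wrapped in a call to dbo.foo, so the optimizer cannot seek on it and has to guess how many rows will match).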
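To make the "expanding out duplicates" point concrete, here is a sketch that counts how many rows would reach the aggregate in each query shape, using SQLite via Python's sqlite3 module and made-up data where every value in b is heavily duplicated. It shows logical row counts only, not SQL Server's actual plan. The third shape, joining to a SELECT DISTINCT derived table, is a common rewrite that is not part of the answer above; whether it helps in SQL Server would need to be checked against the real execution plan.

```python
import sqlite3


def rows_before_aggregation():
    """Count the rows that would reach an aggregate in three query shapes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (col1 INTEGER, col2 INTEGER)")
    conn.execute("CREATE TABLE b (col1 INTEGER)")
    # 100 rows in a: col1 takes the values 0..9, ten rows each.
    conn.executemany("INSERT INTO a VALUES (?, ?)",
                     [(i % 10, i) for i in range(100)])
    # 500 rows in b: each value 0..9 appears 50 times.
    conn.executemany("INSERT INTO b VALUES (?)",
                     [(i % 10,) for i in range(500)])

    # Semi-join: each row of a is kept at most once.
    in_rows = conn.execute(
        "SELECT COUNT(*) FROM a WHERE a.col1 IN (SELECT col1 FROM b)"
    ).fetchone()[0]
    # Inner join: each row of a is repeated once per matching row of b.
    join_rows = conn.execute(
        "SELECT COUNT(*) FROM a JOIN b ON a.col1 = b.col1"
    ).fetchone()[0]
    # Join against a de-duplicated b: back to one row per row of a.
    distinct_rows = conn.execute(
        "SELECT COUNT(*) FROM a "
        "JOIN (SELECT DISTINCT col1 FROM b) AS d ON a.col1 = d.col1"
    ).fetchone()[0]
    conn.close()
    return in_rows, join_rows, distinct_rows
```

This returns (100, 5000, 100): the plain JOIN feeds 50 times as many rows into the aggregate, which is the kind of extra work the answer suspects.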

