sql - How to write the quantile aggregate function? -


i have following table:

create table #temp (cola varchar(max), colb varchar(max), date date, value int)  insert #temp values('a','b','7/1/2010','11143274') insert #temp values('a','b','7/1/2010','13303527') insert #temp values('a','b','7/1/2010','17344238') insert #temp values('a','b','7/1/2010','13236525') insert #temp values('a','b','7/1/2010','10825232') insert #temp values('a','b','7/1/2010','13567253') insert #temp values('a','b','7/1/2010','10726342') insert #temp values('a','b','7/1/2010','11605647')  insert #temp values('a','b','7/2/2010','13236525') insert #temp values('a','b','7/2/2010','10825232') insert #temp values('a','b','7/2/2010','13567253') insert #temp values('a','b','7/2/2010','10726342') insert #temp values('a','b','7/2/2010','11605647') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238')  select * #temp  drop table #temp 

in r (a statistical software), calculate 95th percentile value of last column, doing this:

ddply(data, c("cola", "colb", "date"), summarize, value95=quantile(value, 0.95)) 

and output following:

a b 2010-07-01 16022293 b 2010-07-02 17344238 

all doing performing group by operation on cola, colb , date , applying aggregate function quantile function. far should have way in sql server because aggregate operation can cleanly done in sql , when data in order of millions, want in sql statistical software.

my problem not able find way write quantile function itself. tried using ntile not make sense using ntile(100) when number of rows under particular group by less 100. there way this?

update: more output r if helps:

> quantile(c(1,2,3,4,5,5), 0.95) 95%    5  > quantile(c(1,2,3,4,5,5), 0.0) 0%   1  > quantile(c(1,2,3,4,5,5), 1.0) 100%     5  > quantile(c(1,2,3,4,5,5), 0.5) // median 50%  3.5  

here how (the code little bit messy)

create table #temp (cola varchar(max), colb varchar(max), date date, value int)  insert #temp values('a','b','7/1/2010','11143274') insert #temp values('a','b','7/1/2010','13303527') insert #temp values('a','b','7/1/2010','17344238') insert #temp values('a','b','7/1/2010','13236525') insert #temp values('a','b','7/1/2010','10825232') insert #temp values('a','b','7/1/2010','13567253') insert #temp values('a','b','7/1/2010','10726342') insert #temp values('a','b','7/1/2010','11605647')  insert #temp values('a','b','7/2/2010','13236525') insert #temp values('a','b','7/2/2010','10825232') insert #temp values('a','b','7/2/2010','13567253') insert #temp values('a','b','7/2/2010','10726342') insert #temp values('a','b','7/2/2010','11605647') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238')  insert #temp values('a','c','7/2/2010','1') insert #temp values('a','c','7/2/2010','2') insert #temp values('a','c','7/2/2010','3') insert #temp values('a','c','7/2/2010','4') insert #temp values('a','c','7/2/2010','5') insert #temp values('a','c','7/2/2010','5')   declare @perc decimal(6,5) set @perc = 1.0  select cola, colb,date, sum(value)/convert(decimal,count(value)) (  select     row_number() over(partition x.cola, x.colb, x.date order x.value) id,    x.*,    convert(int, y.zz) j,    case when (y.zz - convert(int, y.zz)) = 0 convert(int, y.zz) + 1 else convert(int, y.zz) end k,    y.zz  #temp x join  (    select        cola,        colb,        date,        count(*)*@perc zz            #temp      group        cola,        colb,        date )y on x.cola = y.cola , x.colb = y.colb , x.date = y.date  )xxx id = j or id = k group cola, colb, date 

there more ways ho calculate (in terms of method used). using sas 5 (r-2) method.


Comments

Popular posts from this blog

objective c - Change font of selected text in UITextView -

php - Accessing POST data in Facebook cavas app -

c# - Getting control value when switching a view as part of a multiview -