sql - How to write the quantile aggregate function? -
i have following table:
create table #temp (cola varchar(max), colb varchar(max), date date, value int) insert #temp values('a','b','7/1/2010','11143274') insert #temp values('a','b','7/1/2010','13303527') insert #temp values('a','b','7/1/2010','17344238') insert #temp values('a','b','7/1/2010','13236525') insert #temp values('a','b','7/1/2010','10825232') insert #temp values('a','b','7/1/2010','13567253') insert #temp values('a','b','7/1/2010','10726342') insert #temp values('a','b','7/1/2010','11605647') insert #temp values('a','b','7/2/2010','13236525') insert #temp values('a','b','7/2/2010','10825232') insert #temp values('a','b','7/2/2010','13567253') insert #temp values('a','b','7/2/2010','10726342') insert #temp values('a','b','7/2/2010','11605647') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') select * #temp drop table #temp in r (a statistical software), calculate 95th percentile value of last column, doing this:
ddply(data, c("cola", "colb", "date"), summarize, value95=quantile(value, 0.95)) and output following:
a b 2010-07-01 16022293 b 2010-07-02 17344238 all doing performing group by operation on cola, colb , date , applying aggregate function quantile function. far should have way in sql server because aggregate operation can cleanly done in sql , when data in order of millions, want in sql statistical software.
my problem not able find way write quantile function itself. tried using ntile not make sense using ntile(100) when number of rows under particular group by less 100. there way this?
update: more output r if helps:
> quantile(c(1,2,3,4,5,5), 0.95) 95% 5 > quantile(c(1,2,3,4,5,5), 0.0) 0% 1 > quantile(c(1,2,3,4,5,5), 1.0) 100% 5 > quantile(c(1,2,3,4,5,5), 0.5) // median 50% 3.5
here how (the code little bit messy)
create table #temp (cola varchar(max), colb varchar(max), date date, value int) insert #temp values('a','b','7/1/2010','11143274') insert #temp values('a','b','7/1/2010','13303527') insert #temp values('a','b','7/1/2010','17344238') insert #temp values('a','b','7/1/2010','13236525') insert #temp values('a','b','7/1/2010','10825232') insert #temp values('a','b','7/1/2010','13567253') insert #temp values('a','b','7/1/2010','10726342') insert #temp values('a','b','7/1/2010','11605647') insert #temp values('a','b','7/2/2010','13236525') insert #temp values('a','b','7/2/2010','10825232') insert #temp values('a','b','7/2/2010','13567253') insert #temp values('a','b','7/2/2010','10726342') insert #temp values('a','b','7/2/2010','11605647') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','c','7/2/2010','1') insert #temp values('a','c','7/2/2010','2') insert #temp values('a','c','7/2/2010','3') insert #temp values('a','c','7/2/2010','4') insert #temp values('a','c','7/2/2010','5') insert #temp values('a','c','7/2/2010','5') declare @perc decimal(6,5) set @perc = 1.0 select cola, colb,date, sum(value)/convert(decimal,count(value)) ( select row_number() over(partition x.cola, x.colb, x.date order x.value) id, x.*, convert(int, y.zz) j, case when (y.zz - convert(int, y.zz)) = 0 convert(int, y.zz) + 1 else convert(int, y.zz) end k, y.zz #temp x join ( select cola, colb, date, count(*)*@perc zz #temp group cola, colb, date )y on x.cola = y.cola , x.colb = y.colb , x.date = y.date )xxx id = j or id = k group cola, colb, date there more ways ho calculate (in terms of method used). using sas 5 (r-2) method.
Comments
Post a Comment