sql - How to write the quantile aggregate function? -
i have following table:
create table #temp (cola varchar(max), colb varchar(max), date date, value int) insert #temp values('a','b','7/1/2010','11143274') insert #temp values('a','b','7/1/2010','13303527') insert #temp values('a','b','7/1/2010','17344238') insert #temp values('a','b','7/1/2010','13236525') insert #temp values('a','b','7/1/2010','10825232') insert #temp values('a','b','7/1/2010','13567253') insert #temp values('a','b','7/1/2010','10726342') insert #temp values('a','b','7/1/2010','11605647') insert #temp values('a','b','7/2/2010','13236525') insert #temp values('a','b','7/2/2010','10825232') insert #temp values('a','b','7/2/2010','13567253') insert #temp values('a','b','7/2/2010','10726342') insert #temp values('a','b','7/2/2010','11605647') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') select * #temp drop table #temp
in r (a statistical software), calculate 95th percentile value of last column, doing this:
ddply(data, c("cola", "colb", "date"), summarize, value95=quantile(value, 0.95))
and output following:
a b 2010-07-01 16022293 b 2010-07-02 17344238
all doing performing group by
operation on cola
, colb
, date
, applying aggregate function quantile
function. far should have way in sql server because aggregate operation can cleanly done in sql , when data in order of millions, want in sql statistical software.
my problem not able find way write quantile function itself. tried using ntile not make sense using ntile(100)
when number of rows under particular group by
less 100. there way this?
update: more output r
if helps:
> quantile(c(1,2,3,4,5,5), 0.95) 95% 5 > quantile(c(1,2,3,4,5,5), 0.0) 0% 1 > quantile(c(1,2,3,4,5,5), 1.0) 100% 5 > quantile(c(1,2,3,4,5,5), 0.5) // median 50% 3.5
here how (the code little bit messy)
create table #temp (cola varchar(max), colb varchar(max), date date, value int) insert #temp values('a','b','7/1/2010','11143274') insert #temp values('a','b','7/1/2010','13303527') insert #temp values('a','b','7/1/2010','17344238') insert #temp values('a','b','7/1/2010','13236525') insert #temp values('a','b','7/1/2010','10825232') insert #temp values('a','b','7/1/2010','13567253') insert #temp values('a','b','7/1/2010','10726342') insert #temp values('a','b','7/1/2010','11605647') insert #temp values('a','b','7/2/2010','13236525') insert #temp values('a','b','7/2/2010','10825232') insert #temp values('a','b','7/2/2010','13567253') insert #temp values('a','b','7/2/2010','10726342') insert #temp values('a','b','7/2/2010','11605647') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','b','7/2/2010','17344238') insert #temp values('a','c','7/2/2010','1') insert #temp values('a','c','7/2/2010','2') insert #temp values('a','c','7/2/2010','3') insert #temp values('a','c','7/2/2010','4') insert #temp values('a','c','7/2/2010','5') insert #temp values('a','c','7/2/2010','5') declare @perc decimal(6,5) set @perc = 1.0 select cola, colb,date, sum(value)/convert(decimal,count(value)) ( select row_number() over(partition x.cola, x.colb, x.date order x.value) id, x.*, convert(int, y.zz) j, case when (y.zz - convert(int, y.zz)) = 0 convert(int, y.zz) + 1 else convert(int, y.zz) end k, y.zz #temp x join ( select cola, colb, date, count(*)*@perc zz #temp group cola, colb, date )y on x.cola = y.cola , x.colb = y.colb , x.date = y.date )xxx id = j or id = k group cola, colb, date
there more ways ho calculate (in terms of method used). using sas 5 (r-2) method.
Comments
Post a Comment