How to get MFCC from an FFT on a signal?
Short and simple: Hi, I simply want to know the steps involved in getting an MFCC from an FFT.
Detailed:
Hi all. I am working on a drum application in which I want to classify sounds. It is just a matching application; it returns the name of the note you play on the drum.
It is a simple, loud, big Indian drum. There are only a few notes that one can play on it.
I've implemented an FFT algorithm and can obtain a spectrum. I now want to take it one step further and return the MFCC from the FFT.
This is what I understand so far: it is based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.
It uses triangular filters to filter out the frequencies and get the desired coefficients. http://instruct1.cit.cornell.edu/courses/ece576/finalprojects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png
So if you have around 1000 values returned from the FFT algorithm (the spectrum of the sound), then ideally you'll get around 12 elements (i.e., coefficients). This 12-element vector is used to classify the instrument, including the drum played...
This is exactly what I want.
Could someone please help me with how to do this? My programming skills are alright. I'm currently creating an application for the iPhone with openFrameworks.
Any help would be appreciated. Cheers
First, you have to split the signal into small frames of 10 to 30 ms, apply a windowing function (Hamming is recommended for sound applications), and compute the Fourier transform of each frame. With the DFT, to compute the mel-frequency cepstral coefficients you have to follow these steps:
- Get the power spectrum: |DFT|^2
- Compute a triangular filter bank to transform the Hz scale to the mel scale
- Get the log spectrum
- Apply the discrete cosine transform
A Python code example:
    import math

    import numpy
    from scipy.fftpack import dct
    from scipy.io import wavfile

    numcoefficients = 13  # choose the size of the mfcc array
    minhz = 0
    maxhz = 22050.0  # upper edge of the filter bank (assumes 44.1 kHz audio)


    def freqtomel(freq):
        return 1127.01048 * math.log(1 + freq / 700.0)


    def meltofreq(mel):
        return 700 * (math.exp(mel / 1127.01048) - 1)


    def melfilterbank(blocksize):
        numbands = int(numcoefficients)
        maxmel = int(freqtomel(maxhz))
        minmel = int(freqtomel(minhz))

        # create a matrix for triangular filters, one row per filter
        filtermatrix = numpy.zeros((numbands, blocksize))

        melrange = numpy.array(range(numbands + 2))
        melcenterfilters = melrange * (maxmel - minmel) / (numbands + 1) + minmel

        # each array index represents the center of each triangular filter
        # (mel -> Hz -> spectrum bin, with 22050 Hz mapped to the last bin)
        aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
        aux = (numpy.exp(melcenterfilters * aux) - 1) / 22050
        aux = 0.5 + 700 * blocksize * aux
        aux = numpy.floor(aux)  # round down
        centerindex = numpy.array(aux, int)  # get int values

        for i in range(numbands):
            start, centre, end = centerindex[i:i + 3]
            k1 = numpy.float32(centre - start)
            k2 = numpy.float32(end - centre)
            up = (numpy.array(range(start, centre)) - start) / k1
            down = (end - numpy.array(range(centre, end))) / k2

            filtermatrix[i][start:centre] = up
            filtermatrix[i][centre:end] = down

        return filtermatrix.transpose()


    # for brevity the whole (mono) file is treated as a single frame here
    samplerate, signal = wavfile.read("file.wav")

    complexspectrum = numpy.fft.rfft(signal)
    powerspectrum = abs(complexspectrum) ** 2
    filteredspectrum = numpy.dot(powerspectrum, melfilterbank(len(powerspectrum)))
    logspectrum = numpy.log(filteredspectrum)
    dctspectrum = dct(logspectrum, type=2)  # mfcc :)
This code is based on the MFCC Vamp example. Hope it helps you!
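The example above runs on the file as a whole, so here is a rough sketch of the framing and windowing step described at the start of this answer. The function name `frame_power_spectra`, the 20 ms frame length, the 10 ms hop and the 440 Hz test tone are all assumptions made for illustration; they are not taken from the Vamp example.

```python
import numpy


def frame_power_spectra(signal, samplerate, frame_ms=20.0, hop_ms=10.0):
    """Split a mono signal into overlapping frames, apply a Hamming
    window to each frame, and return one power spectrum per row.

    frame_ms and hop_ms are illustrative defaults, not fixed rules.
    """
    signal = numpy.asarray(signal, dtype=float)
    framelen = int(round(samplerate * frame_ms / 1000.0))
    hop = int(round(samplerate * hop_ms / 1000.0))
    if len(signal) < framelen:
        raise ValueError("signal is shorter than one frame")

    window = numpy.hamming(framelen)
    # only full frames are kept; a trailing partial frame is dropped
    numframes = 1 + (len(signal) - framelen) // hop

    spectra = []
    for n in range(numframes):
        frame = signal[n * hop:n * hop + framelen] * window
        spectra.append(numpy.abs(numpy.fft.rfft(frame)) ** 2)
    return numpy.array(spectra)


# one second of a 440 Hz tone at 44.1 kHz, used as stand-in data
samplerate = 44100
t = numpy.arange(samplerate) / float(samplerate)
spectra = frame_power_spectra(numpy.sin(2 * numpy.pi * 440.0 * t), samplerate)
print(spectra.shape)  # (number of frames, number of frequency bins)
```

Each row of `spectra` can then go through the filter bank, log and DCT steps in place of `powerspectrum`, which should give one MFCC vector per frame rather than one for the whole recording.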