C# - Use convolution to find a reference audio sample in a continuous stream of sound
In my previous question on finding a reference audio sample in a bigger audio sample, it was proposed that I should use convolution.
Using DSPUtil, I was able to do this. I played with it a little and tried different combinations of audio samples to see what the result was. To visualize the data, I dumped the raw audio numbers into Excel and created a chart using these numbers. A peak is visible, but I don't know how this helps me. I have these problems:
- I don't know how to infer the starting position of the match in the original audio sample from the location of the peak.
- I don't know how I should apply this to a continuous stream of audio, so that I can react as soon as the reference audio sample occurs.
- I don't understand why picture 2 and picture 4 (see below) differ so much, although both represent an audio sample convolved with itself...
Any help is highly appreciated.
The following pictures are the result of the analysis using Excel:
- A longer audio sample with the reference audio (a beep) near the end: http://img801.imageshack.us/img801/976/values1.png
- The beep convolved with itself: http://img96.imageshack.us/img96/6720/values2i.png
- A longer audio sample without the beep, convolved with the beep: http://img845.imageshack.us/img845/1091/values3.png
- The longer audio sample of point 3 convolved with itself: http://img38.imageshack.us/img38/1272/values4.png
Update and solution:
With the extensive help of Han, I was able to achieve my goal.
After I rolled my own slow implementation without FFT, I found that ALGLIB provides a fast implementation. There is one basic assumption for my problem: one of the audio samples is contained within the other.
So, the following code returns the offset in samples in the larger of the two audio samples and the normalized cross-correlation value at that offset. 1 means complete correlation, 0 means no correlation at all, and -1 means complete negative correlation:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;

private void CalcCrossCorrelation(IEnumerable<double> data1,
                                  IEnumerable<double> data2,
                                  out int offset,
                                  out double maximumNormalizedCrossCorrelation)
{
    var data1Array = data1.ToArray();
    var data2Array = data2.ToArray();
    double[] result;
    // ALGLIB: data1 is the "signal", data2 the "pattern";
    // result has data1Array.Length + data2Array.Length - 1 entries
    alglib.corrr1d(data1Array, data1Array.Length, data2Array, data2Array.Length, out result);

    var max = double.MinValue;
    var index = 0;
    var i = 0;
    // find the maximum cross-correlation value and its index
    foreach (var d in result)
    {
        if (d > max)
        {
            index = i;
            max = d;
        }
        ++i;
    }

    // if the index is bigger than the length of the first array, it has
    // to be interpreted as a negative lag
    if (index >= data1Array.Length)
    {
        index *= -1;
    }

    IEnumerable<double> matchingData1 = data1Array;
    IEnumerable<double> matchingData2 = data2Array;
    var biggerSequenceCount = Math.Max(data1Array.Length, data2Array.Length);
    var smallerSequenceCount = Math.Min(data1Array.Length, data2Array.Length);
    offset = index;
    if (index > 0)
    {
        matchingData1 = data1Array.Skip(offset).Take(smallerSequenceCount).ToList();
    }
    else if (index < 0)
    {
        // ALGLIB stores lag -k at result[N + M - 1 - k], so k = N + M - 1 - |index|
        offset = biggerSequenceCount + smallerSequenceCount - 1 + index;
        matchingData2 = data2Array.Skip(offset).Take(smallerSequenceCount).ToList();
        matchingData1 = data1Array.Take(smallerSequenceCount).ToList();
    }

    // normalize the peak by the energies of the overlapping parts
    var mx = matchingData1.Average();
    var my = matchingData2.Average();
    var denom1 = Math.Sqrt(matchingData1.Sum(x => (x - mx) * (x - mx)));
    var denom2 = Math.Sqrt(matchingData2.Sum(y => (y - my) * (y - my)));
    maximumNormalizedCrossCorrelation = max / (denom1 * denom2);
}
```
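The sign handling in the code above relies on how ALGLIB's `corrr1d` lays out its result array. A small helper can make that mapping explicit. This is a sketch based on the layout given in ALGLIB's reference documentation (positive lags in `r[0..n-1]`, negative lag `-k` stored at `r[n+m-1-k]`); the class and method names are hypothetical and not part of ALGLIB.

```csharp
using System;

// Hypothetical helper, not part of ALGLIB. Maps an index into the result of
// corrr1d(signal, n, pattern, m, out r) to a signed lag, assuming the
// documented layout:
//   r[0 .. n-1]   -> lag 0 .. n-1                (pattern starts inside signal)
//   r[n .. n+m-2] -> lag index - (n + m - 1) < 0 (signal starts inside pattern)
public static class CorrLayout
{
    public static int IndexToLag(int index, int n, int m)
    {
        if (index < 0 || index > n + m - 2)
            throw new ArgumentOutOfRangeException(nameof(index));
        return index < n ? index : index - (n + m - 1);
    }
}
```

A positive lag d means the pattern (`data2`) starts at sample d of the signal (`data1`); a negative lag -k means the signal starts at sample k of the pattern.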
Bounty:
No new answers are required! I started the bounty to award Han for his continued effort on this question!
Here you go with your bounty :)
To find a particular reference signal in a larger audio fragment, you need to use a cross-correlation algorithm. The basic formulae can be found in the Wikipedia article.
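For real-valued sampled audio, the cross-correlation of a pattern y of length M with a longer signal x at lag d, and the convolution of the two, can be written as below. The symbols x, y, M and the time-reversed pattern $\tilde{y}$ are notation chosen here for the comparison, with x taken as zero outside its range:

$$
(y \star x)[d] = \sum_{j=0}^{M-1} y[j]\,x[j+d]
\qquad\qquad
(x * y)[n] = \sum_{k} x[k]\,y[n-k]
$$

$$
\tilde{y}[j] = y[M-1-j] \quad\Longrightarrow\quad (y \star x)[d] = (x * \tilde{y})[d+M-1]
$$

Convolving the signal with a time-reversed copy of the reference therefore yields the cross-correlation, shifted by M - 1 samples; without the reversal, the peak of a convolution does not in general mark where the reference lies. Likewise, the autocorrelation of a signal always has its maximum at lag 0, since $\left|\sum_j x[j]\,x[j+d]\right| \le \sum_j x[j]^2$ by the Cauchy-Schwarz inequality, while the convolution of a signal with itself has no such guarantee.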
Cross-correlation is a process in which two signals are compared. This is done by multiplying both signals sample by sample and summing the results over all samples. Then one of the signals is shifted (usually by one sample), and the calculation is repeated. If you try to visualize this for very simple signals, such as a single impulse (e.g. one sample has a certain value while the remaining samples are zero) or a pure sine wave, you will see that the result of the cross-correlation is indeed a measure of how much both signals are alike and of the delay between them. Another article that may provide more insight can be found here.
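The shift, multiply and sum procedure described above can be sketched as a direct time-domain loop. This is a minimal illustration, not the code from Paul Bourke's article or from DSPUtil; the class name is hypothetical, and it only evaluates lags at which the pattern lies fully inside the signal, matching the assumption that the reference is contained in the longer sample.

```csharp
using System;

// Hypothetical sketch: plain time-domain cross-correlation, O(N * M).
public static class DirectCrossCorrelation
{
    // r[d] = sum over j of pattern[j] * signal[d + j], for every lag d at which
    // the pattern fits completely inside the signal (d = 0 .. N - M).
    public static double[] Correlate(double[] signal, double[] pattern)
    {
        int lags = signal.Length - pattern.Length + 1;
        if (lags <= 0)
            throw new ArgumentException("pattern must not be longer than signal");
        var r = new double[lags];
        for (int d = 0; d < lags; d++)
        {
            double sum = 0;
            for (int j = 0; j < pattern.Length; j++)
                sum += pattern[j] * signal[d + j]; // multiply and sum
            r[d] = sum;                            // then shift by one sample
        }
        return r;
    }

    // The lag with the largest correlation value is the estimated delay in samples.
    public static int BestLag(double[] r)
    {
        int best = 0;
        for (int d = 1; d < r.Length; d++)
            if (r[d] > r[best]) best = d;
        return best;
    }
}
```

For example, embedding the pattern {0.5, 1, 0.5} at sample 4 of an otherwise silent ten-sample signal gives a result whose largest value (1.5) is at lag 4, which is exactly the delay between the two signals.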
This article by Paul Bourke contains source code for a straightforward time-domain implementation. Note that the article is written for a general signal. Audio has the special property that its long-time average is usually 0. This means that the averages used in Paul Bourke's formula (mx and my) can be left out. There are also fast implementations of cross-correlation based on the FFT (see ALGLIB).
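The idea behind such FFT-based implementations can be sketched without relying on ALGLIB's API: zero-pad both signals to a power of two of at least N + M - 1 samples, transform them, multiply the signal spectrum by the complex conjugate of the pattern spectrum, and transform back. The recursive radix-2 FFT below is a deliberately simple textbook version built on `System.Numerics.Complex`, and the class name is hypothetical; in practice an optimized library FFT would take its place.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of FFT-based cross-correlation (not ALGLIB's code).
public static class FftCrossCorrelation
{
    // Recursive radix-2 FFT, in place; a.Length must be a power of two.
    // invert = true computes the inverse transform without the 1/n scaling.
    static void Fft(Complex[] a, bool invert)
    {
        int n = a.Length;
        if (n == 1) return;
        var even = new Complex[n / 2];
        var odd = new Complex[n / 2];
        for (int i = 0; i < n / 2; i++)
        {
            even[i] = a[2 * i];
            odd[i] = a[2 * i + 1];
        }
        Fft(even, invert);
        Fft(odd, invert);
        double sign = invert ? 1.0 : -1.0;
        for (int k = 0; k < n / 2; k++)
        {
            Complex w = Complex.FromPolarCoordinates(1.0, sign * 2.0 * Math.PI * k / n);
            a[k] = even[k] + w * odd[k];
            a[k + n / 2] = even[k] - w * odd[k];
        }
    }

    // Returns r[d] = sum over j of pattern[j] * signal[j + d] for lags
    // d = 0 .. signal.Length - 1 (signal treated as zero past its end).
    public static double[] Correlate(double[] signal, double[] pattern)
    {
        int size = 1;
        while (size < signal.Length + pattern.Length - 1) size *= 2; // avoid wrap-around
        var s = new Complex[size];
        var p = new Complex[size];
        for (int i = 0; i < signal.Length; i++) s[i] = signal[i];
        for (int i = 0; i < pattern.Length; i++) p[i] = pattern[i];
        Fft(s, false);
        Fft(p, false);
        for (int i = 0; i < size; i++) s[i] *= Complex.Conjugate(p[i]);
        Fft(s, true);
        var r = new double[signal.Length];
        for (int d = 0; d < signal.Length; d++) r[d] = s[d].Real / size;
        return r;
    }
}
```

It returns the same positive-lag values as a direct time-domain loop, but in O(L log L) time for a padded length L instead of O(N*M).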
The (maximum) value of the correlation depends on the sample values in the audio signals. In Paul Bourke's algorithm, however, the maximum is scaled to 1.0. In cases where one of the signals is contained entirely within the other signal, the maximum value will reach 1. In the more general case the maximum will be lower, and a threshold value will have to be determined to decide whether the signals are sufficiently alike.
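A normalized correlation combined with a threshold also suggests one way to handle the continuous-stream case from the question: keep the most recent M samples (M being the reference length) in a ring buffer and score them against the reference after every new sample. The sketch below uses the mean-subtracted normalization from Paul Bourke's formula at a single lag. The class name, the `Push` method and the threshold value are assumptions for illustration only. It does O(M) work per incoming sample, which may be too slow for long references at audio sample rates; a block-wise FFT approach is a common alternative there.

```csharp
using System;

// Hypothetical sketch: sliding-window matcher for a continuous stream.
public sealed class StreamMatcher
{
    private readonly double[] reference;
    private readonly double[] window; // ring buffer with the last M samples
    private readonly double threshold;
    private int pos;                  // index where the next sample is written
    private long count;               // number of samples seen so far

    public StreamMatcher(double[] reference, double threshold)
    {
        if (reference.Length == 0) throw new ArgumentException("empty reference");
        this.reference = (double[])reference.Clone();
        window = new double[reference.Length];
        this.threshold = threshold;
    }

    // Normalized cross-correlation of two equal-length arrays at lag 0, with
    // means subtracted; result lies in -1 .. 1 (0 if either input is constant).
    public static double Normalized(double[] x, double[] y)
    {
        double mx = 0, my = 0;
        for (int i = 0; i < x.Length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.Length;
        my /= y.Length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denom = Math.Sqrt(sxx * syy);
        return denom > 0 ? sxy / denom : 0.0;
    }

    // Feeds one sample; returns true when the last M samples match the
    // reference with a normalized correlation at or above the threshold.
    public bool Push(double sample, out double score)
    {
        window[pos] = sample;
        pos = (pos + 1) % window.Length;
        count++;
        score = 0;
        if (count < window.Length) return false; // buffer not filled yet
        // Unroll the ring buffer so that ordered[0] is the oldest sample.
        var ordered = new double[window.Length];
        for (int i = 0; i < window.Length; i++)
            ordered[i] = window[(pos + i) % window.Length];
        score = Normalized(ordered, reference);
        return score >= threshold;
    }
}
```

With a threshold of 0.99, feeding the stream {0.1, 0.3, -0.2, 1, -1, 2, -2, 0.5} against the reference {1, -1, 2, -2} reports the first match when the sample at index 6 arrives, i.e. when the last four samples equal the reference.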