c# - Use convolution to find a reference audio sample in a continuous stream of sound


In my previous question on finding a reference audio sample in a bigger audio sample, it was proposed that I should use convolution.
Using DSPUtil, I was able to do this. I played with it a little and tried different combinations of audio samples to see what the result was. To visualize the data, I dumped the raw audio numbers into Excel and created a chart from them. A peak is visible, but I don't know how that helps me. I have these problems:

  • I don't know how to infer the starting position of the match in the original audio sample from the location of the peak.
  • I don't know how I should apply this to a continuous stream of audio, so that I can react as soon as the reference audio sample occurs.
  • I don't understand why picture 2 and picture 4 (see below) differ so much, although both represent an audio sample convolved with itself...

Any help is highly appreciated.

The following pictures are the result of the analysis using Excel:

  1. A longer audio sample with the reference audio (a beep) near the end: http://img801.imageshack.us/img801/976/values1.png
  2. The beep convolved with itself: http://img96.imageshack.us/img96/6720/values2i.png
  3. A longer audio sample without the beep, convolved with the beep: http://img845.imageshack.us/img845/1091/values3.png
  4. The longer audio sample of point 3 convolved with itself: http://img38.imageshack.us/img38/1272/values4.png

Update and solution:
With the extensive help of Han, I was able to achieve my goal.
After I rolled my own slow implementation without FFT, I found ALGLIB, which provides a fast implementation. There is one basic assumption to my problem: one of the audio samples is contained within the other.
So, the following code returns the offset in samples in the larger of the two audio samples and the normalized cross-correlation value at that offset. 1 means complete correlation, 0 means no correlation at all, and -1 means complete negative correlation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Requires the ALGLIB C# library for alglib.corrr1d.
private void CalcCrossCorrelation(IEnumerable<double> data1,
                                  IEnumerable<double> data2,
                                  out int offset,
                                  out double maximumNormalizedCrossCorrelation)
{
    var data1Array = data1.ToArray();
    var data2Array = data2.ToArray();
    double[] result;
    // result has data1Array.Length + data2Array.Length - 1 entries
    alglib.corrr1d(data1Array, data1Array.Length,
                   data2Array, data2Array.Length, out result);

    var max = double.MinValue;
    var index = 0;
    var i = 0;
    // find the maximum cross-correlation value and its index
    foreach (var d in result)
    {
        if (d > max)
        {
            index = i;
            max = d;
        }
        ++i;
    }

    int overlap;
    double[] matchingData1, matchingData2;
    if (index < data1Array.Length)
    {
        // positive lag: data2 is aligned with data1 starting at data1[index]
        offset = index;
        overlap = Math.Min(data1Array.Length - offset, data2Array.Length);
        matchingData1 = data1Array.Skip(offset).Take(overlap).ToArray();
        matchingData2 = data2Array.Take(overlap).ToArray();
    }
    else
    {
        // if the index is at or beyond the length of the first array, it has to be
        // interpreted as a negative lag: ALGLIB stores lag -k at R[N + M - 1 - k],
        // so data1 is aligned with data2 starting at data2[offset]
        offset = data1Array.Length + data2Array.Length - 1 - index;
        overlap = Math.Min(data2Array.Length - offset, data1Array.Length);
        matchingData1 = data1Array.Take(overlap).ToArray();
        matchingData2 = data2Array.Skip(offset).Take(overlap).ToArray();
    }

    // normalize the peak by the energy of the two overlapping segments
    var mx = matchingData1.Average();
    var my = matchingData2.Average();
    var denom1 = Math.Sqrt(matchingData1.Sum(x => (x - mx) * (x - mx)));
    var denom2 = Math.Sqrt(matchingData2.Sum(y => (y - my) * (y - my)));
    maximumNormalizedCrossCorrelation = max / (denom1 * denom2);
}
```

Bounty:
No new answers required! I started the bounty to award Han for his continued effort on this question!

Here you go with the bounty :)

To find a particular reference signal in a larger audio fragment, you need to use a cross-correlation algorithm. The basic formulae can be found in the Wikipedia article.

Cross-correlation is a process in which 2 signals are compared. This is done by multiplying both signals and summing the results over all samples. Then one of the signals is shifted (usually by 1 sample), and the calculation is repeated. If you try to visualize this for very simple signals such as a single impulse (e.g. 1 sample has a certain value while the remaining samples are zero), or a pure sine wave, you will see that the result of the cross-correlation is indeed a measure of how much both signals are alike and of the delay between them. Another article that may provide more insight can be found here.
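As a rough illustration of the multiply, sum and shift loop described above, here is a minimal time-domain sketch. The class and method names are hypothetical. It only evaluates non-negative lags, i.e. it assumes the pattern lies entirely inside the signal (the same assumption as in the question's update). The lag with the largest sum is the sample index in the signal where the best match starts, which is how the peak location translates into a starting position.

```csharp
using System;

// Hypothetical helper illustrating direct (non-FFT) cross-correlation.
public static class DirectCrossCorrelation
{
    // Computes r[lag] = sum over j of signal[lag + j] * pattern[j]
    // for every lag at which the pattern fits entirely inside the signal.
    public static double[] Correlate(double[] signal, double[] pattern)
    {
        if (pattern.Length == 0 || pattern.Length > signal.Length)
            throw new ArgumentException("pattern must be non-empty and no longer than signal");

        var result = new double[signal.Length - pattern.Length + 1];
        for (int lag = 0; lag < result.Length; lag++)
        {
            double sum = 0.0;
            // multiply both signals sample by sample and sum the products
            for (int j = 0; j < pattern.Length; j++)
                sum += signal[lag + j] * pattern[j];
            result[lag] = sum;
        }
        return result;
    }

    // Returns the lag with the largest correlation value, i.e. the index
    // in the signal at which the best match begins.
    public static int BestLag(double[] signal, double[] pattern)
    {
        var r = Correlate(signal, pattern);
        int best = 0;
        for (int lag = 1; lag < r.Length; lag++)
            if (r[lag] > r[best]) best = lag;
        return best;
    }
}
```

For an impulse delayed by 3 samples, `DirectCrossCorrelation.BestLag(new double[] { 0, 0, 0, 1, 0, 0 }, new double[] { 1, 0 })` returns 3. Note that the raw sum grows with loudness, so on real audio a loud passage can outscore the true match; that is what the normalization discussed below addresses.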

This article by Paul Bourke contains source code for a straightforward time-domain implementation. Note that the article is written for a general signal. Audio has the special property that its long-time average is usually 0, which means that the averages used in Paul Bourke's formula (mx and my) can be left out. There are also fast implementations of cross-correlation based on the FFT (see ALGLIB).
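A minimal sketch of a normalized variant, following the shape of Bourke's formula r = Σ(x − mx)(y − my) / (√Σ(x − mx)² · √Σ(y − my)²). This is not Bourke's code: as an assumption of this sketch, the means and energies are taken over the window of the signal that overlaps the pattern at each lag, and the names are hypothetical. The `subtractMean` flag shows the shortcut for audio: with it set to false, mx and my are simply treated as 0.

```csharp
using System;

// Hypothetical helper: windowed normalized cross-correlation.
public static class NormalizedCrossCorrelation
{
    // Normalized correlation of 'pattern' against the window of 'signal' that
    // starts at 'lag'. Assumes lag + pattern.Length <= signal.Length.
    public static double At(double[] signal, double[] pattern, int lag, bool subtractMean = true)
    {
        int m = pattern.Length;
        double mx = 0.0, my = 0.0;
        if (subtractMean)
        {
            // means over the overlapping window only
            for (int j = 0; j < m; j++) { mx += signal[lag + j]; my += pattern[j]; }
            mx /= m;
            my /= m;
        }
        double num = 0.0, sxx = 0.0, syy = 0.0;
        for (int j = 0; j < m; j++)
        {
            double dx = signal[lag + j] - mx;
            double dy = pattern[j] - my;
            num += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denom = Math.Sqrt(sxx) * Math.Sqrt(syy);
        // a silent window (or a constant pattern) has no defined correlation; report 0
        return denom == 0.0 ? 0.0 : num / denom;
    }

    // Scans every lag at which the pattern fits inside the signal and returns
    // the best lag together with its normalized value in [-1, 1].
    public static (int Lag, double Value) Best(double[] signal, double[] pattern, bool subtractMean = true)
    {
        int bestLag = 0;
        double bestValue = double.NegativeInfinity;
        for (int lag = 0; lag + pattern.Length <= signal.Length; lag++)
        {
            double v = At(signal, pattern, lag, subtractMean);
            if (v > bestValue) { bestValue = v; bestLag = lag; }
        }
        return (bestLag, bestValue);
    }
}
```

Because each value is divided by the energy of both segments, a scaled copy of the pattern scores (up to rounding) 1.0 regardless of its volume. This direct version costs O(N·M); for long recordings an FFT-based routine such as ALGLIB's is the faster route.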

The (maximum) value of the correlation depends on the sample values in the audio signals. In Paul Bourke's algorithm, however, the maximum is scaled to 1.0. In cases where one of the signals is contained entirely within the other signal, the maximum value will reach 1. In the more general case the maximum will be lower, and a threshold value will have to be determined to decide whether the signals are sufficiently alike.
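For the continuous-stream part of the question, one possible approach is to keep the most recent reference-length samples in a ring buffer and, after every new sample, compare that window with the reference using the normalized value and a threshold. The sketch below assumes zero-mean audio (so the means are left out, as suggested above); the class name and the threshold you pass in are hypothetical and have to be tuned on real recordings.

```csharp
using System;

// Hypothetical helper: watches a stream of samples and reports when the most
// recent reference.Length samples match the reference closely enough.
public sealed class StreamingMatcher
{
    private readonly double[] _reference;
    private readonly double _referenceNorm;
    private readonly double[] _ring;   // circular buffer holding the latest samples
    private readonly double _threshold;
    private int _next;                 // slot the next sample is written to (also the oldest sample)
    private long _count;               // total number of samples pushed so far

    public StreamingMatcher(double[] reference, double threshold)
    {
        if (reference.Length == 0) throw new ArgumentException("empty reference");
        _reference = (double[])reference.Clone();
        _ring = new double[reference.Length];
        _threshold = threshold;
        double s = 0.0;
        foreach (var r in reference) s += r * r;
        _referenceNorm = Math.Sqrt(s);
    }

    // Feeds one sample. Returns true if the window ending with this sample has a
    // normalized correlation with the reference at or above the threshold.
    public bool Push(double sample)
    {
        _ring[_next] = sample;
        _next = (_next + 1) % _ring.Length;
        _count++;
        if (_count < _ring.Length) return false; // window not full yet

        double num = 0.0, energy = 0.0;
        for (int j = 0; j < _reference.Length; j++)
        {
            // walk the window from oldest to newest so it lines up with reference[j]
            double x = _ring[(_next + j) % _ring.Length];
            num += x * _reference[j];
            energy += x * x;
        }
        double denom = Math.Sqrt(energy) * _referenceNorm;
        return denom > 0.0 && num / denom >= _threshold;
    }

    // Index (0-based, from the start of the stream) of the first sample in the
    // current window; this is the match position right after Push returned true.
    public long WindowStart => _count - _ring.Length;
}
```

This does O(M) work per incoming sample, which is simple but slow for long references; processing the stream in overlapping blocks with an FFT-based correlation is the usual faster alternative. A real detector would also likely suppress repeated hits on neighbouring samples around the same peak.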

