I'm running into an input overflow problem and can't figure out how to handle it.
Let me show you my code and explain it:
```python
import numpy as np
import pyaudio
import tensorflow as tf
from python_speech_features import mfcc

# FORMAT, SAMPLERATE, CHUNK, RECORD_SECONDS, WINDOW_SIZE, WINDOW_STEP,
# NFFT, chosen_device_index and model are defined elsewhere in my script.

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=1,
                rate=SAMPLERATE,
                input_device_index=chosen_device_index,
                input=True,
                frames_per_buffer=CHUNK)

frames = []  # a list, not a tuple: a tuple has no append()
for i in range(0, int(SAMPLERATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    decoded = np.frombuffer(data, 'int16')
    mfcc_feat = mfcc(decoded,
                     samplerate=SAMPLERATE/3,
                     winlen=WINDOW_SIZE,
                     winstep=WINDOW_STEP,
                     nfft=NFFT)
    if len(frames) < 299:
        frames.append(mfcc_feat)
    else:  # len(frames) >= 299
        predict_test = tf.convert_to_tensor(frames)
        result = model.predict(predict_test)
        frames = []
        frames.append(mfcc_feat)

stream.stop_stream()
stream.close()
p.terminate()
```
So basically what I'm doing here is using a trained TensorFlow model to predict on audio features that are generated in real time.
First I open a stream. Then I read the audio data from that stream in a for loop. With the sample rate set to 48000 and the chunk size set to 192, this happens exactly 250 times a second.
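A quick sanity check of that timing (the constants are repeated here just for the arithmetic):

```python
SAMPLERATE = 48000
CHUNK = 192

reads_per_second = SAMPLERATE / CHUNK  # 48000 / 192 = 250.0 reads per second
time_per_read = CHUNK / SAMPLERATE     # 0.004 s: the budget for one loop iteration

print(reads_per_second, time_per_read)
```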
So 250 times a second I read in the next chunk, decode it using `numpy.frombuffer`, and then calculate the features. The features are stored in the list `frames`. Every time the length of `frames` reaches 299, I use that array to predict with my TensorFlow model.
And there is the problem: the for loop iterates 250 times a second, so each iteration has 0.004 seconds to complete; otherwise the stream's input buffer overflows on `data = stream.read(CHUNK)`. When I only calculate the features in an iteration, it finishes in under 0.004 s and the input does not overflow. But since the model prediction takes longer than 0.004 seconds, the input overflows.
What can I do so that my model's prediction runs every 299 iteration steps without making the next for loop iteration wait?
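Here is a sketch of the direction I'm considering: moving `model.predict()` into a worker thread fed by a queue, so the read loop only does a cheap, non-blocking `put()`. The names `predict_queue`, `prediction_worker` and `processed` are hypothetical, and the actual model call is left commented out:

```python
import queue
import threading

# Batches of 299 MFCC frames would be handed off here by the audio loop.
predict_queue = queue.Queue()
processed = []  # stands in for the prediction results in this sketch

def prediction_worker():
    # Runs off the audio thread, so a slow model.predict() can no longer
    # delay stream.read() and cause the input buffer to overflow.
    while True:
        batch = predict_queue.get()
        if batch is None:  # sentinel: shut the worker down
            break
        # result = model.predict(tf.convert_to_tensor(batch))  # real work
        processed.append(len(batch))
        predict_queue.task_done()

worker = threading.Thread(target=prediction_worker, daemon=True)
worker.start()

# Inside the read loop, the inline predict would become a non-blocking put:
# if len(frames) >= 299:
#     predict_queue.put(frames)
#     frames = [mfcc_feat]
```

Would this kind of producer/consumer split work here, or is there a more idiomatic way with PyAudio (e.g. its callback mode)?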