Discussion:
[SciPy-Dev] scipy.io.wavfile to read byte array directly?
Miles Dowe
2016-09-10 20:35:06 UTC
Hi all,

I am interested in adding a function to the scipy.io.wavfile module.
Rather than requiring that read() be performed on a file, I'd like to
add a read-style function that accepts a byte array of WAV data
directly.

Here's some background behind this motivation. I am a student with the
University of Washington and I have been working with a former student's
machine learning algorithm. The aim of the algorithm is to detect human
laughter and it utilizes SciPy and NumPy.

We're aiming to create a service-oriented architecture maintained in AWS
and our audio data is stored within S3. I've been experimenting with the
Boto3 library, which returns a byte array, and I'd like to provide that
data directly to the machine learning script (instead of writing to the
disk and reading from it).

I'd like to hear your thoughts. In the meantime I may experiment with this
idea while waiting for feedback from the community.

Thank you for your time,


Miles
Joseph Booker
2016-09-10 20:53:22 UTC
Miles,

Are you aware of io.BytesIO? I don't know the performance implications of
using a wrapper, but I'd expect loading the data to take negligible time
compared to training your ML model.

--
Joseph
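
For readers following along, here is a minimal sketch of the io.BytesIO approach suggested above. It assumes the WAV data is already held in memory as a bytes object. The helper make_wav_bytes is hypothetical and only stands in for bytes fetched from some remote store; it builds a tiny mono 16-bit file with the standard-library wave module so that the example runs without any external data.

```python
import io
import struct
import wave

from scipy.io import wavfile


def make_wav_bytes(sample_rate=8000, n_samples=800):
    """Hypothetical stand-in for WAV bytes obtained elsewhere (e.g. a download)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)            # mono
        writer.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        # A simple repeating ramp, kept well inside the int16 range.
        values = [(i % 100) * 100 for i in range(n_samples)]
        writer.writeframes(struct.pack("<%dh" % n_samples, *values))
    return buffer.getvalue()


wav_bytes = make_wav_bytes()

# Wrap the raw bytes in a file-like object; nothing is written to disk.
sample_rate, samples = wavfile.read(io.BytesIO(wav_bytes))
print(sample_rate, samples.dtype, samples.shape)
```

The wrapper adds little overhead here, since BytesIO only exposes the existing bytes through the file interface that wavfile.read expects.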
_______________________________________________
SciPy-Dev mailing list
https://mail.scipy.org/mailman/listinfo/scipy-dev
Ralf Gommers
2016-09-10 21:46:45 UTC
Post by Joseph Booker
Miles,
Are you aware of io.BytesIO? I don't know the performance implications of
using a wrapper, but I'd expect loading the data to take marginal time
compared to training your ML model.
BytesIO would be useful if the data is already in an in-memory array. It's
not clear from the question that that's the case. If not, what Miles would
like to reuse is the parsing of the .wav file format.

wavfile.read already takes a file or a file-like object. The docs don't
specify exactly which methods the file-like object needs to have. A quick
browse of the source suggests: read, seek, tell and close. It would be nice
to get that documented and tested. Does that help?

If you can make this work with the existing read() function, that would be
useful. A separate function shouldn't be needed.

Ralf
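
One way to check which file methods wavfile.read actually touches is to pass it an instrumented file-like object. The sketch below is only an illustration: SpyBytesIO and tiny_wav_bytes are hypothetical names introduced here, and the set of methods recorded may differ between SciPy versions, so no particular result is claimed.

```python
import io
import struct
import wave

from scipy.io import wavfile


class SpyBytesIO(io.BytesIO):
    """BytesIO subclass (hypothetical helper) that records which methods are called."""

    def __init__(self, data):
        super().__init__(data)
        self.calls = set()

    def read(self, *args):
        self.calls.add("read")
        return super().read(*args)

    def seek(self, *args):
        self.calls.add("seek")
        return super().seek(*args)

    def tell(self):
        self.calls.add("tell")
        return super().tell()

    def close(self):
        self.calls.add("close")
        super().close()


def tiny_wav_bytes():
    """Build a four-sample mono 16-bit WAV in memory with the stdlib wave module."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(8000)
        writer.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))
    return buffer.getvalue()


spy = SpyBytesIO(tiny_wav_bytes())
rate, data = wavfile.read(spy)

# The recorded names show which parts of the file interface were exercised.
print(sorted(spy.calls))
```

A check along these lines could serve as a starting point for the documentation and tests mentioned above.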
Miles
2016-09-11 18:52:11 UTC
Joseph, Ralf,

Thank you both for your responses! I'm still very new to Python and
its libraries, so I appreciate your quick and courteous replies.

io.BytesIO was exactly what I needed. I was able to load the WAV byte
array data into SciPy using it as a wrapper.

If it's of any interest, my code looked roughly like this:

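```
import boto3
import scipy.io.wavfile as sciwav
from io import BytesIO

s3 = boto3.resource('s3')

# Access the bucket and get the WAV data (i.e., b'RIFF\x86x.\x01WAVEfmt ...').
# bucket_name and key are assumed to be defined elsewhere.
s3_object = s3.Object(bucket_name, key)  # avoid shadowing the builtin `object`
result = s3_object.get()['Body'].read()

# Wrap the bytes in a file-like object and pass it to SciPy.
wrapper = BytesIO(result)
# read() returns a (sample_rate, data) tuple.
sample_rate, data = sciwav.read(wrapper)
```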


I will also go back and review the documentation on file-like objects
and can add those details.

Thank you again,

Miles
