Conceptually replicating the analysis of our Interspeech paper is a useful goal to guide scikit-talk development. To that end I'm going to create a proof of concept of our code for identifying continuers and selecting a set of utterances that can then feed into audio clip extraction and clustering analysis. I'll be using the IFADV data so that it can be fully open.
I'll start work in the playground repo and will try to get to it within the next few days. From my side this will be based on our R code. For the audio extraction and clustering, it will be based on the code in the existing OSF repo. Both will require some degree of porting and editing to be made to work with the open IFADV data.
The paper:
Liesenfeld, Andreas, and Mark Dingemanse. 2022. “Bottom-up Discovery of Structure and Variation in Response Tokens (‘Backchannels’) across Diverse Languages.” In Proceeding of Interspeech 2022. https://doi.org/10.21437/Interspeech.2022-11288.
Alright @bvreede @n400peanuts @liesenf — the playground repo now contains a first go at a dataset similar to the one that underlies the first half of our paper, but now using only the IFADV package.
The R code for generating this should be fairly straightforward to port to Python. I have tried to comment as needed. Let me know if you need any further guidance. To preview the steps:
We add a column `streak` that holds a streak counter built with the `cumsum()` function. The counter increments each time a speaker repeats the same utterance in immediate succession, and resets when the speaker or the utterance changes.
We select items that occur in streaks of >2: these are our candidate continuers.
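For the Python port, the two steps above can be sketched with pandas. This is a minimal sketch under assumptions: the toy data and the column names `speaker` and `utterance` are placeholders, not the actual IFADV schema, and the R code may differ in detail.

```python
import pandas as pd

# Toy conversation data; speaker/utterance column names are assumptions,
# not the actual IFADV schema.
df = pd.DataFrame({
    "speaker":   ["A", "A", "A", "B", "A", "A"],
    "utterance": ["ja", "ja", "ja", "hum", "nee", "ja"],
})

# Step 1: a new streak starts whenever the speaker or the utterance
# changes; cumsum() over that boolean gives each streak a unique id.
new_streak = (df["speaker"] != df["speaker"].shift()) | (
    df["utterance"] != df["utterance"].shift()
)
df["streak_id"] = new_streak.cumsum()

# Position within each streak (1, 2, 3, ...) — the streak counter.
df["streak"] = df.groupby("streak_id").cumcount() + 1

# Step 2: keep items occurring in streaks of length > 2 — these are
# the candidate continuers.
candidates = df[df.groupby("streak_id")["streak"].transform("max") > 2]
print(candidates)
```

In the toy data, only the initial run of three `ja` utterances by speaker A survives the `> 2` filter, mirroring the selection step described above.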
Surprise! It so happens that in the exotic language of the IFADV dataset, the top three formats found in streaks are *ja*, *ja ja*, and *hum*, as depicted in this quick-and-dirty convplot of a few sample sequences: