scripts-conversation-acoustic-analysis

Scripts to analyze audio files of conversations.

All code was created on Windows 11 and is provided for reference. Changes will likely be required to run the code on your system.

Script types

.py    Python (3.11.2) script
.sql   SQL script (I used SQL Server)
.praat Praat (6.3.03) script
.txt   text to run from the command line
.R     R (4.3.1) script

Code

scrape_CF.py                download the wav files from the CallFriend corpus
scrape_CH.py                download the wav files from the CallHome corpus
code to create same_pitch.csv
    c_pitch.sql             create table in a relational database 
    c_pitch.xml             bcp format file
    pitches.praat           extract pitch for each channel of each file
    pitch.py                create a class for pitch data
    pitch_all.py            combine the pitch data into pitch_all.csv, with a row for each channel for each conversation for each subcorpus
    bcp c_pitch.txt         load pitch_all.csv into table
    same_pitch_ratio.sql    count how often the two channels are near each other in pitch, even if there is a systematic offset to the match
single_speaker_segments.py  Main module of this project. Note that it requires praat-textgrids (https://pypi.org/project/praat-textgrids/)
    Main steps:
        call_praat          create sound/silence textgrids for each file in the corpora, for each of the parameters specified
        textgrids_to_csv    pull the sound times from the Praat textgrids into sound_times_csv
        find_turn_begin     analyze the sound times for transitions and create files for the turn times on each channel and for transitions 
        trans_textgrids     create textgrids of transitions to help explain the data
    Example call: 
        python single_speaker_segments.py c:\temp\codes.csv c:\temp\params.xlsx c:\temp\sound_times.csv c:\temp\sound_silence_turn.csv c:\temp\trans.csv c:\temp\sss.csv
silences-param.praat        create sound/silence textgrid for each channel for a file
dfs-CF,CH only.R            create data frames from the various data files
graph functions.R           graph functions to evaluate models
tests.R                     statistical tests
graphs, tables.R            graphs and tables
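
The idea behind same_pitch_ratio.sql (counting how often the two channels are near each other in pitch, allowing for a systematic offset) can be illustrated in Python. This is only a sketch, not a translation of the SQL: the function name, the per-frame list layout with None for unvoiced frames, the use of the median difference as the offset, and the 10 Hz tolerance are all assumptions made for illustration.

```python
from statistics import median


def same_pitch_ratio(ch1, ch2, tolerance_hz=10.0):
    """Among frames where both channels are voiced, return the fraction whose
    pitch difference is within tolerance_hz of the median difference.

    ch1, ch2: equal-length sequences of pitch values in Hz, with None for
    unvoiced frames (hypothetical layout, not the format of pitch_all.csv).
    """
    # Keep only frames where both channels have a pitch value.
    pairs = [(a, b) for a, b in zip(ch1, ch2) if a is not None and b is not None]
    if not pairs:
        return 0.0
    # Assumed definition of the systematic offset: the median difference.
    offset = median(a - b for a, b in pairs)
    # Count frames that match once the offset is removed.
    near = sum(1 for a, b in pairs if abs((a - b) - offset) <= tolerance_hz)
    return near / len(pairs)


# Made-up values: channel 1 sits about 100 Hz above channel 2 in two of the
# three frames where both channels are voiced.
ratio = same_pitch_ratio([200, 210, None, 220, 300], [100, 110, 120, None, 100])
print(ratio)
```

Using the median rather than the mean keeps a few outlier frames from shifting the estimated offset.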

Manually created files

codes.csv 
    information about subcorpora
    logical PK is Code (col H)
    columns
        Corpus          Corpus name
        Language        Language name
        Description     Subcorpus full description
        LangCd          Language code (ISO 639-3)
        RegionLangCd    LangCd, with an abbreviation for the region, if one is specified
        Mode            Monomodal (phone) or Multimodal (face-to-face)
        Designation     Distinguishing feature (required only for some corpora)
        Code            Subcorpus code, made up of abbreviation of Corpus + RegionLangCd
        wavDir          Directory where the audio files are located
        TextGridDir     Directory where the text grids will be placed
        PitchDir        Directory where the pitch files will be placed
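
Since Code is the logical PK, it can be worth checking that it is unique before running anything against codes.csv. The following is a minimal sketch using only the standard library. The function name load_codes and the sample row are hypothetical (the row is made up, not taken from the real codes.csv), and this is not how single_speaker_segments.py reads the file.

```python
import csv
import io
from collections import Counter


def load_codes(csv_file):
    """Read codes.csv rows as dicts keyed by Code, checking that Code is unique."""
    rows = list(csv.DictReader(csv_file))
    counts = Counter(row["Code"] for row in rows)
    duplicates = sorted(code for code, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Code is not unique: {duplicates}")
    return {row["Code"]: row for row in rows}


# Made-up row for illustration only, using the column order listed above.
sample = io.StringIO(
    "Corpus,Language,Description,LangCd,RegionLangCd,Mode,Designation,"
    "Code,wavDir,TextGridDir,PitchDir\n"
    "ExampleCorpus,English,Example subcorpus,eng,eng,Monomodal,,"
    "EXeng,c:\\temp\\wav,c:\\temp\\tg,c:\\temp\\pitch\n"
)
codes = load_codes(sample)
print(codes["EXeng"]["TextGridDir"])
```

For a real file, pass an open file object instead of the StringIO, for example `open(path, newline="", encoding="utf-8")`; the encoding is an assumption about how the file was saved.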
    
params.xlsx 
    acoustic parameters to be evaluated
    logical PK is folder (col H) - iteration column is not actually used
    script only uses sheet 1, so you can keep a library of parameters in sheet 2 and just paste some of them into sheet 1 to check results
    columns
        iter        iteration (used just to help distinguish which row is which)
        sound       sound threshold in s
        silence     silence threshold in s
        ints        intensity threshold as a ratio
        sound_ms    sound threshold in ms
        sil_ms      silence threshold in ms
        ints_pct    intensity threshold as a percentage
        folder      the concatenation of the above three fields (sound_ms, sil_ms, ints_pct), to be appended onto the end of the value of TextGridDir in codes.csv
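
The relationship between the columns can be sketched in Python. This is an illustration under stated assumptions, not the formulas used in params.xlsx: the function name derive_params is hypothetical, the "_" separator is a guess (the description above does not say how the three fields are joined), the threshold values are made up, and treating "appended" as a subfolder of TextGridDir is also an assumption.

```python
from pathlib import PureWindowsPath


def derive_params(sound, silence, ints, sep="_"):
    """Derive the ms / percent columns and a folder name from one params row.

    sound, silence: thresholds in seconds; ints: intensity threshold as a ratio.
    sep is a hypothetical separator between the three concatenated fields.
    """
    sound_ms = round(sound * 1000)   # seconds -> milliseconds
    sil_ms = round(silence * 1000)   # seconds -> milliseconds
    ints_pct = round(ints * 100)     # ratio -> percentage
    folder = sep.join(str(v) for v in (sound_ms, sil_ms, ints_pct))
    return {"sound_ms": sound_ms, "sil_ms": sil_ms, "ints_pct": ints_pct, "folder": folder}


# Made-up parameter values for illustration.
row = derive_params(sound=0.1, silence=0.2, ints=0.03)
# Assumption: the folder is a subfolder of TextGridDir (here a made-up path).
out_dir = PureWindowsPath(r"c:\temp\tg") / row["folder"]
print(out_dir)
```

PureWindowsPath is used so that the Windows-style path is built the same way on any operating system.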
