A Simple Tutorial on Extracting MFCCs with openSMILE (Mac)
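openSMILE is a software tool designed specifically for extracting audio features. Introductions and installation guides are easy to find online, so I will not repeat them here. While working out how to use openSMILE, however, I found very few tutorials on actually using it, so I am writing up my own experience on this blog in the hope that others can avoid some of the detours I took.
When I tried to install the software, it kept conflicting with Visual Studio, so I installed it on macOS instead and wrote a shell script to run the feature extraction.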
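In openSMILE, the parameters that determine which features you extract are all stored in the configuration file you use, including the frame size, the number of MFCC coefficients, whether deltas are included, and so on. Beginners can simply modify one of the official configuration files to extract the features their own model needs.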
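As a minimal sketch of what modifying an official configuration file can look like in practice, the shell function below writes a copy of a configuration file with one parameter changed. The function name set_conf_value and the file names in the usage comment are hypothetical, not part of openSMILE. It assumes the key occurs only once in the file and that the new value is a plain number or word (no "/" or "&" characters).

```bash
#!/bin/bash
# set_conf_value SRC DST KEY VALUE  (hypothetical helper, not part of openSMILE)
# Copies SRC to DST, replacing the value on every "KEY = ..." line with VALUE.
# Every line whose key matches is rewritten, so use it only for keys that
# occur once in the file (keys such as copyInputName appear in many sections).
set_conf_value() {
    local src=$1 dst=$2 key=$3 value=$4
    # Keep "KEY =" with its original spacing and swap only the value.
    sed -E "s/^([[:space:]]*${key}[[:space:]]*=[[:space:]]*).*/\\1${value}/" "$src" > "$dst"
}

# Example (file names are placeholders):
#   set_conf_value config/MFCC12_0_f config/my_mfcc lastMfcc 19
```

Writing to a separate file leaves the official configuration untouched, so it is always possible to go back to the original settings.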
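By default, openSMILE stores the extracted features in an HTK-compatible format. I did not find a suitable function for reading this file type in Python, so in the end I used the readhtk() function from MATLAB's voice toolbox. I will describe how to build the model in the next post.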
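Before loading a file in MATLAB, it can be useful to confirm from the shell that the output looks sane. The sketch below prints the fields of the 12-byte header defined by the HTK parameter file format (number of frames, sample period in 100 ns units, bytes per frame, and parameter kind code). It assumes the standard big-endian header layout; the function name htk_header is hypothetical, and the values-per-frame figure additionally assumes uncompressed 4-byte floats.

```bash
#!/bin/bash
# htk_header FILE  (hypothetical helper)
# Decodes the 12-byte HTK header, assuming big-endian byte order:
#   bytes 1-4   number of frames        (32-bit integer)
#   bytes 5-8   sample period in 100 ns (32-bit integer)
#   bytes 9-10  bytes per frame         (16-bit integer)
#   bytes 11-12 parameter kind code     (16-bit integer)
htk_header() {
    local file=$1
    # Read the first 12 bytes as unsigned decimal numbers into $1..$12.
    set -- $(od -An -tu1 -N12 "$file")
    if [ $# -ne 12 ]; then
        echo "htk_header: $file is shorter than 12 bytes" >&2
        return 1
    fi
    local frames=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    local period=$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))
    local size=$(( ($9 << 8) | ${10} ))
    local kind=$(( (${11} << 8) | ${12} ))
    # size / 4 assumes each value is a 4-byte float
    echo "frames=$frames period_100ns=$period bytes_per_frame=$size values_per_frame=$((size / 4)) parm_kind=$kind"
}
```

With a frame step of 0.010 s the period should read 100000, and with 13 MFCCs plus their deltas and accelerations, 39 values per frame would be expected.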
Let us first look at the official configuration file MFCC12_0_f.
///////////////////////////////
///// > openSMILE configuration file to extract MFCC features < //////
///// HTK target kind: MFCC_0_D_A, numCeps=12 //////////
///////////
///// (c) 2013-2016 audEERING. //////////
///// All rights reserverd. See file COPYING for details. //////
///////////////////////////////
///////////////////////////////
;
; This section is always required in openSMILE configuration files
; it configures the componentManager and gives a list of all components which are to be loaded
; The order in which the components are listed should match
; the order of the data flow for most efficient processing
;
///////////////////////////////
[componentInstances:cComponentManager]
instance[dataMemory].type=cDataMemory
\{shared/standard_f.inc}
[componentInstances:cComponentManager]
; audio framer
instance[frame].type=cFramer
; speech pre-emphasis (on a per frame basis as HTK does it)
instance[pe].type=cVectorPreemphasis
; apply a window function to pre-emphasised frames
instance[win].type=cWindower
; transform to the frequency domain using FFT
instance[fft].type=cTransformFFT
; compute magnitude of the complex fft from the previous component
instance[fftmag].type=cFFTmagphase
; compute Mel-bands from magnitude spectrum
instance[melspec].type=cMelspec
; compute MFCC from Mel-band spectrum
instance[mfcc].type=cMfcc
; compute delta coefficients from mfcc and energy
instance[delta].type=cDeltaRegression
; compute acceleration coefficients from delta coefficients of mfcc and energy
instance[accel].type=cDeltaRegression
; run single threaded (nThreads=1)
; NOTE: a single thread is more efficient for processing small files, since multi-threaded processing involves more
; overhead during startup, which will make the system slower in the end
nThreads=1
; do not show any internal dataMemory level settings
; (if you want to see them set the value to 1, 2, 3, or 4, depending on the amount of detail you wish)
printLevelStats=0
/////////////////////////////////
///////// component configuration ////////////
/////////////////////////////////
; the following sections configure the components listed above
; a help on configuration parameters can be obtained with
; SMILExtract -H
; or
; SMILExtract -H configTypeName (= componentTypeName)
/////////////////////////////////
[frame:cFramer]
reader.dmLevel=wave
writer.dmLevel=frames
noPostEOIprocessing = 1
copyInputName = 1
frameSize = 0.0250
frameStep = 0.010
frameMode = fixed
frameCenterSpecial = left
[pe:cVectorPreemphasis]
reader.dmLevel=frames
writer.dmLevel=framespe
k = 0.97
de = 0
[win:cWindower]
reader.dmLevel=framespe
writer.dmLevel=winframes
copyInputName = 1
processArrayFields = 1
; hamming window
winFunc = ham
; no gain, no offset
gain = 1.0
offset = 0
[fft:cTransformFFT]
reader.dmLevel=winframes
writer.dmLevel=fft
copyInputName = 1
processArrayFields = 1
inverse = 0
; for compatibility with 2.2.0 and older versions
zeroPadSymmetric = 0
[fftmag:cFFTmagphase]
reader.dmLevel=fft
writer.dmLevel=fftmag
copyInputName = 1
processArrayFields = 1
inverse = 0
magnitude = 1
phase = 0
[melspec:cMelspec]
reader.dmLevel=fftmag
writer.dmLevel=melspec
copyInputName = 1
processArrayFields = 1
; htk compatible sample value scaling
htkcompatible = 1
nBands = 26
; use power spectrum instead of magnitude spectrum
usePower = 1
lofreq = 0
hifreq = 8000
specScale = mel
inverse = 0
[mfcc:cMfcc]
reader.dmLevel=melspec
writer.dmLevel=ft0
copyInputName = 1
processArrayFields = 1
firstMfcc = 0
lastMfcc = 12
cepLifter = 22.0
htkcompatible = 1
[delta:cDeltaRegression]
reader.dmLevel=ft0
writer.dmLevel=ft0de
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1
[accel:cDeltaRegression]
reader.dmLevel=ft0de
writer.dmLevel=ft0dede
nameAppend = de
copyInputName = 1
noPostEOIprocessing = 0
deltawin=2
blocksize=1
//////////////////////////
/////// data output configuration //////
//////////////////////////
[componentInstances:cComponentManager]
instance[audspec_lldconcat].type=cVectorConcat
[audspec_lldconcat:cVectorConcat]
reader.dmLevel = ft0;ft0de;ft0dede
writer.dmLevel = lld
includeSingleElementFields = 1
\{shared/standard_data_f.inc}
//---------------------- END -------------------------///
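The file is roughly divided into three parts:
The first part lists the components involved in computing the MFCCs and the instance names they are given in the file;
the second part sets the parameter values for each of those components;
the third part configures the output of the results and can generally be left unchanged.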
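To make the link between the parameter values and the resulting features concrete, here is a small sketch that reads firstMfcc and lastMfcc from a configuration file and computes the length of each output vector. It assumes, as in the file above, that the static coefficients, their deltas, and their accelerations are concatenated into one vector. The helper names conf_value and feature_dim are hypothetical.

```bash
#!/bin/bash
# conf_value FILE KEY  (hypothetical helper)
# Prints the value of the first "KEY = value" line in FILE.
conf_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[[:space:]]/, "", k) }   # key with whitespace removed
        k == key { v = $2; gsub(/[[:space:]]/, "", v); print v; exit }
    ' "$1"
}

# feature_dim FILE  (hypothetical helper)
# Values per frame = (lastMfcc - firstMfcc + 1) * 3, where the factor 3
# covers the static, delta, and acceleration coefficients.
feature_dim() {
    local first last
    first=$(conf_value "$1" firstMfcc)
    last=$(conf_value "$1" lastMfcc)
    echo $(( (last - first + 1) * 3 ))
}
```

With firstMfcc = 0 and lastMfcc = 12 this gives 13 x 3 = 39 values per frame.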
Once the configuration file is set up as needed, the extraction itself is very simple.
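First, open a blank document and define the input directory, the output directory, and the directory where openSMILE is located. In this example, every file in the input directory is a .wav file, so the step of checking the file format is skipped. Then change to the openSMILE directory and use a loop to extract features from every file in the input directory and store them in the output directory, naming each output file xx.mfcc.htk. Save the document with the .sh extension; to run it, simply drag it into a Terminal window.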
The sample code is as follows.
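#!/bin/bash
# Make sure SMILExtract can be found (the same line can be added via: vi .bash_profile)
PATH=$PATH:$HOME/bin
dir=/Users/lemon/Documents/wan/test            # input directory (.wav files only)
OPATH=/Users/lemon/Documents/wanzhi/test_mfcc  # output directory
os=/Users/wan/Downloads/opensmile-2.3.0        # openSMILE directory
mkdir -p "$OPATH"
cd "$os" || exit 1
# Use a glob rather than parsing ls output, so file names with spaces work
for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue
    # Strip the directory and the .wav suffix so the output is xx.mfcc.htk
    name=$(basename "$wav" .wav)
    if SMILExtract -C config/MFCC12_0_f -I "$wav" -O "$OPATH/$name.mfcc.htk"; then
        echo "$name.wav is extracted"
    fi
done
echo "work finished!"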
At this point, if we look in the output path, we will find the extracted MFCC features stored there in .htk format.
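A quick way to do this check without opening the folder is sketched below: the function lists every .wav file in the input directory that has no matching, non-empty output file. It assumes the xx.mfcc.htk naming described above; the function name missing_outputs is hypothetical.

```bash
#!/bin/bash
# missing_outputs IN_DIR OUT_DIR  (hypothetical helper)
# Prints the name of each IN_DIR/xx.wav that has no non-empty
# OUT_DIR/xx.mfcc.htk next to it. Prints nothing if all files were extracted.
missing_outputs() {
    local in_dir=$1 out_dir=$2 wav name
    for wav in "$in_dir"/*.wav; do
        [ -e "$wav" ] || continue        # input directory has no .wav files
        name=$(basename "$wav" .wav)
        # -s is true only if the file exists and is not empty
        [ -s "$out_dir/$name.mfcc.htk" ] || echo "$name.wav"
    done
}

# Example, using the directories from the extraction script:
#   missing_outputs /Users/lemon/Documents/wan/test /Users/lemon/Documents/wanzhi/test_mfcc
```

An empty result means every input file has a corresponding feature file.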
The next post will explain how to import this data into MATLAB for classification.