Choosing and Using an Open-Source TTS (Text-to-Speech) Engine
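TTS stands for Text To Speech. It is one part of human-machine dialogue: it lets a machine speak.
TTS is one kind of speech synthesis application. It converts file contents or on-screen text in an application, such as menus or web pages, into natural-sounding speech.
TTS not only helps visually impaired people read information on a computer, it also makes text documents easier to take in.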
I. Popular open-source TTS projects
MARY -- Text-to-Speech System
MARY is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It supports German, British and American English, Telugu, Turkish, and Russian.
SpeakRight Framework -- Helps to Build Speech Recognition Applications
SpeakRight is a Java framework for writing speech recognition applications in VoiceXML. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Although VoiceXML uses a web architecture similar to HTML's, the needs of a speech app are very different. SpeakRight lives in the application code layer, typically in a servlet. The SpeakRight runtime dynamically generates VoiceXML pages, one per HTTP request.
Festival -- Speech Synthesis System
Festival offers a general framework for building speech synthesis systems and includes examples of various modules. It offers full text-to-speech through several APIs, including a shell-level interface and a Scheme command interpreter. It has native support for Apple OS and supports English and Spanish.
FreeTTS -- Speech Synthesizer in Java
FreeTTS is a speech synthesis system written entirely in Java. It is based on Flite, a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University. FreeTTS supports a subset of the Java Speech API (JSAPI) 1.0 synthesis specification.
Festvox -- Builds New Synthetic Voices
The Festvox project aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Voices built with Festvox are the basis for several of the synthesis engines listed here, including Festival and Flite.
Kaldi -- Speech Recognition Toolkit
Kaldi is a speech recognition research toolkit. It is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend.
eSpeak -- Text to Speech
eSpeak is a compact open-source software speech synthesizer for English and other languages. eSpeak uses a formant synthesis method, which allows many languages to be provided in a small size. It is available as a SAPI5 version for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Flite -- Fast Run-Time Synthesis Engine
Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools.
II. Choosing a project
Our requirements call for a C/C++ project, which leaves three main candidates:
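(1) Festival
It provides a general framework for building speech synthesis systems and includes example implementations of various modules. It offers a complete text-to-speech API, has native support for Apple OS, and supports English and Spanish.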
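(2) eSpeak
It is an open-source speech synthesizer that supports English and many other languages. It uses formant synthesis, which keeps the data for each language small. A SAPI5 version is available for Windows, so it also works with screen readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it can serve as the front end for another speech synthesis engine.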
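(3) Flite
Flite (Festival-lite) is a small, fast run-time synthesis engine developed at CMU, designed mainly for small embedded machines or large servers. It is an alternative to Festival as a synthesis engine and uses voices built with the FestVox voice-building tool suite.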
The sections below describe how to use each of these three projects.
Environment: VMware Workstation + Lubuntu 16.04.2 (32-bit)
III. Using the open-source TTS projects (Part 1): Festival
1. Download
2. Build
3. Usage
4. Problems and solutions
IV. Using the open-source TTS projects (Part 2): eSpeak
1. Download
eSpeak depends on PortAudio for playback, so PortAudio must be downloaded as well.
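2. Build
Building eSpeak:
cd src
make
make install
Building PortAudio:
After the build, the library is in the lib/.libs/ directory. Create a symbolic link to it. Run this from the PortAudio source root; the link target must be an absolute path, otherwise the link in /usr/lib dangles:
ln -s "$(pwd)/lib/.libs/libportaudio.so.2.0.0" /usr/lib/libportaudio.so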
3. Usage
espeak "hello world" -w hello.wav
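4. Problems and solutions
(1) Build problem
g++ -o speak speak.o compiledict.o dictionary.o intonation.o readclause.o setlengths.o numbers.o synth_mbrola.o synthdata.o synthesize.o translate.o mbrowrap.o tr_languages.o voices.o wavegen.o phonemelist.o klatt.o sonic.o -lstdc++ -lportaudio -lpthread
wavegen.o: In function `WavegenOpenSound() [clone .part.2]':
wavegen.cpp:(.text+0x23a): undefined reference to `Pa_StreamActive'
wavegen.o: In function `WavegenCloseSound()':
wavegen.cpp:(.text+0x552): undefined reference to `Pa_StreamActive'
collect2: error: ld returned 1 exit status
Makefile:105: recipe for target 'speak' failed
make: *** [speak] Error 1
(1) Solution
The link fails because the portaudio.h used by the eSpeak sources targets the older PortAudio V18 API, while the installed library is V19. In the eSpeak src directory, replace the header with the V19 version shipped alongside it, then rebuild:
cp portaudio19.h portaudio.h
make clean
make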
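5. Example application
#include "./speak_lib.h" // eSpeak header
#include <string.h>      // strlen
#include <unistd.h>      // sleep
int main(int argc, char **argv)
{
    // UTF-8 Chinese tongue twister used as the test sentence
    char word[] = "吃葡萄不吐葡萄皮";
    // Play directly on the sound card; default buffer length, data path, and options
    espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0);
    espeak_SetVoiceByName("zh+f2"); // Mandarin voice with the f2 variant
    // The size argument includes the terminating NUL
    espeak_Synth(word, strlen(word) + 1, 0, POS_CHARACTER, 0,
                 espeakCHARS_UTF8, NULL, NULL);
    sleep(3); // crude wait for the asynchronous playback to finish
    espeak_Terminate();
    return 0;
}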
To save the synthesized speech to a WAV file, you need to implement a callback. Send a private message if you would like a concrete code example.
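As a rough sketch of the file-writing half of such a callback: eSpeak can deliver 16-bit mono samples to a function registered with espeak_SetSynthCallback when it is initialized in a retrieval mode such as AUDIO_OUTPUT_SYNCHRONOUS instead of AUDIO_OUTPUT_PLAYBACK. The helper below is not part of eSpeak; save_wav and put_le are hypothetical names, and the code only shows how collected samples could be written out with a standard 44-byte WAV header. It does not call eSpeak itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write `value` as a little-endian integer of `bytes` bytes (2 or 4). */
static void put_le(FILE *f, uint32_t value, int bytes)
{
    for (int i = 0; i < bytes; i++)
        fputc((int)((value >> (8 * i)) & 0xFF), f);
}

/* Hypothetical helper (not an eSpeak function): save 16-bit mono PCM
 * samples as a WAV file. Returns 0 on success, -1 on any I/O error. */
int save_wav(const char *path, const short *samples, int count, int rate)
{
    assert(path != NULL && samples != NULL && count >= 0 && rate > 0);

    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;

    uint32_t data_bytes = (uint32_t)count * 2u;

    fwrite("RIFF", 1, 4, f);
    put_le(f, 36u + data_bytes, 4);      /* bytes following this field */
    fwrite("WAVE", 1, 4, f);
    fwrite("fmt ", 1, 4, f);
    put_le(f, 16, 4);                    /* size of the fmt chunk */
    put_le(f, 1, 2);                     /* format tag 1 = PCM */
    put_le(f, 1, 2);                     /* one channel (mono) */
    put_le(f, (uint32_t)rate, 4);        /* samples per second */
    put_le(f, (uint32_t)rate * 2u, 4);   /* bytes per second */
    put_le(f, 2, 2);                     /* bytes per sample frame */
    put_le(f, 16, 2);                    /* bits per sample */
    fwrite("data", 1, 4, f);
    put_le(f, data_bytes, 4);

    for (int i = 0; i < count; i++)
        put_le(f, (uint16_t)samples[i], 2);

    int failed = ferror(f);
    return (fclose(f) != 0 || failed) ? -1 : 0;
}
```

In a real program the callback would presumably append each chunk it receives to a buffer, and save_wav would be called once synthesis has finished, passing the sample rate returned by espeak_Initialize. For one-off conversions the espeak -w option shown earlier is simpler.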
V. Using the open-source TTS projects (Part 3): Flite
1. Download
2. Build
sudo su
./configure
make
make install
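3. Usage
flite -t hello
Speaks the word "hello" (-t explicitly marks the argument as text).
flite "hello world."
Speaks "hello world": an argument that contains a space is treated as text.
flite hello
Speaks the contents of the file named "hello": an argument without a space is treated as a file name.
flite -f "hello world"
Speaks the contents of the file named "hello world" (-f explicitly marks the argument as a file name).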
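4. Problems
(1) Problem
root@lubuntu:# flite "hello world"
oss_audio: failed to open audio device /dev/dsp
(1) Solution
ls /dev/dsp shows that the device node does not exist. A search revealed that flite uses the OSS framework for audio playback.
root@lubuntu:# cat /proc/asound/version
Advanced Linux Sound Architecture Driver Version k4.8.0-36-generic.
This shows that the system uses the ALSA audio driver framework. Two approaches were tried:
Method 1:
Install padsp, which redirects OSS requests to PulseAudio (and from there to ALSA):
apt install pulseaudio-utils
padsp flite
This failed.
Method 2:
sudo apt-get install pulseaudio
sudo apt-get install libpulse-dev
sudo apt-get install osspd
This partly worked: /dev/dsp now exists, but flite still reports "failed to open".
In the end, the cause turned out to be the virtual machine: once the VMware sound card device was connected, the error disappeared and audio played normally.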
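The checks above (does the device node exist, and can it be opened) can be sketched as a small C helper. This is a minimal diagnostic under assumptions, not something flite provides: probe_audio_device and report_audio_device are hypothetical names, and the code only reports the errno value from open(), which tells a missing /dev/dsp apart from one that exists but cannot be opened.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open a device node for writing.
 * Returns 0 if it opens, otherwise the errno value set by open(). */
int probe_audio_device(const char *path)
{
    assert(path != NULL);
    /* O_NONBLOCK keeps the call from hanging if the device is busy. */
    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Print a one-line diagnosis for the given device path. */
void report_audio_device(const char *path)
{
    int err = probe_audio_device(path);
    if (err == 0)
        printf("%s: opened successfully\n", path);
    else if (err == ENOENT)
        printf("%s: missing (no OSS device node or emulation)\n", path);
    else
        printf("%s: cannot be opened: %s\n", path, strerror(err));
}
```

Calling report_audio_device("/dev/dsp") before and after installing osspd, and again after connecting the virtual sound card, would show which of the two conditions is the one blocking playback.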