Commit cbd7842

Upload nltk data
xusenlin committed
1 parent 2b2731d commit cbd7842

43 files changed, +2463203 -0 lines changed
@@ -0,0 +1,76 @@
The Carnegie Mellon Pronouncing Dictionary [cmudict.0.7a]

ftp://ftp.cs.cmu.edu/project/speech/dict/
https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/cmudict.0.7a

Copyright (C) 1993-2008 Carnegie Mellon University. All rights reserved.

File Format: Each line consists of an uppercased word,
a counter (for alternative pronunciations), and a transcription.
Vowels are marked for stress (1=primary, 2=secondary, 0=no stress).
E.g.: NATURAL 1 N AE1 CH ER0 AH0 L
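
The format above can be read by splitting each line on whitespace. The following is a minimal sketch, not part of the dictionary distribution; the names `Entry` and `parse_entry` are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One dictionary line (hypothetical container, not an NLTK type)."""
    word: str            # uppercased headword
    variant: int         # counter for alternative pronunciations
    phonemes: List[str]  # transcription, stress digits kept on vowels


def parse_entry(line: str) -> Entry:
    """Split a 'WORD COUNTER PHONEME...' line into its three parts."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"malformed entry: {line!r}")
    word, counter, *phonemes = fields
    return Entry(word, int(counter), phonemes)


entry = parse_entry("NATURAL 1 N AE1 CH ER0 AH0 L")
print(entry.word, entry.variant, entry.phonemes)
```

The sketch assumes that fields are separated by runs of spaces or tabs and that the headword itself contains no whitespace.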

The dictionary contains 127069 entries. Of these, 119400 words are assigned
a unique pronunciation, 6830 words have two pronunciations, and 839 words have
three or more pronunciations. Many of these are fast-speech variants.
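
The three figures add up to the stated total (119400 + 6830 + 839 = 127069). A breakdown of this kind could be recomputed by counting lines per headword, as in the sketch below. The `SAMPLE` lines are illustrative entries written in the documented format, not quoted from the file, and the function names are hypothetical.

```python
from collections import Counter

# Illustrative lines in the documented format (assumed, not quoted from the file).
SAMPLE = """\
FIRE 1 F AY1 ER0
FIRE 2 F AY1 R
FIRE'S 1 F AY1 ER0 Z
NATURAL 1 N AE1 CH ER0 AH0 L
"""


def pronunciations_per_word(lines):
    """Map each headword to the number of lines (pronunciations) it has."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:  # word, counter, at least one phoneme
            counts[fields[0]] += 1
    return counts


def histogram(per_word):
    """Map 'number of pronunciations' to 'number of words with that many'."""
    return Counter(per_word.values())


per_word = pronunciations_per_word(SAMPLE.splitlines())
print(histogram(per_word))
```

Run over the full dictionary file, the histogram keys 1 and 2 would correspond to the first two figures above, and the sum over keys of 3 or more to the third.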

Phonemes: There are 39 phonemes, as shown below:

Phoneme Example Translation    Phoneme Example Translation
------- ------- -----------    ------- ------- -----------
AA      odd     AA D           AE      at      AE T
AH      hut     HH AH T        AO      ought   AO T
AW      cow     K AW           AY      hide    HH AY D
B       be      B IY           CH      cheese  CH IY Z
D       dee     D IY           DH      thee    DH IY
EH      Ed      EH D           ER      hurt    HH ER T
EY      ate     EY T           F       fee     F IY
G       green   G R IY N       HH      he      HH IY
IH      it      IH T           IY      eat     IY T
JH      gee     JH IY          K       key     K IY
L       lee     L IY           M       me      M IY
N       knee    N IY           NG      ping    P IH NG
OW      oat     OW T           OY      toy     T OY
P       pee     P IY           R       read    R IY D
S       sea     S IY           SH      she     SH IY
T       tea     T IY           TH      theta   TH EY T AH
UH      hood    HH UH D        UW      two     T UW
V       vee     V IY           W       we      W IY
Y       yield   Y IY L D       Z       zee     Z IY
ZH      seizure S IY ZH ER
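
In dictionary entries, vowels carry a trailing stress digit (0, 1, or 2) while the table lists the bare symbols. The sketch below separates the two and checks symbols against the 39-phoneme inventory. The helper names are hypothetical, and treating each stress-marked vowel as one syllable is an assumption made for illustration.

```python
# The 39 phoneme symbols from the table above.
PHONEMES = frozenset(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K "
    "L M N NG OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def split_stress(phoneme):
    """Return (symbol, stress); stress is None when no digit is present."""
    if phoneme[-1].isdigit():
        symbol, stress = phoneme[:-1], int(phoneme[-1])
    else:
        symbol, stress = phoneme, None
    if symbol not in PHONEMES:
        raise ValueError(f"unknown phoneme: {phoneme!r}")
    return symbol, stress


def stress_pattern(phonemes):
    """List the stress digits in order, one per stress-marked vowel."""
    return [s for _, s in map(split_stress, phonemes) if s is not None]


natural = "N AE1 CH ER0 AH0 L".split()
print([split_stress(p)[0] for p in natural])  # bare symbols
print(stress_pattern(natural))                # stress digit per vowel
```

Under the one-vowel-per-syllable assumption, `len(stress_pattern(natural))` gives a syllable count for the word.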

(For NLTK, entries have been sorted so that, e.g. FIRE 1 and FIRE 2
are contiguous, and not separated by FIRE'S 1.)
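
Contiguity matters for single-pass readers: `itertools.groupby` merges only adjacent items with equal keys, so with this ordering all pronunciations of a word can be collected in one pass without building a dictionary first. A minimal sketch follows; the sample lines are illustrative entries in the documented format (not quoted from the file), and `group_pronunciations` is a hypothetical helper.

```python
from itertools import groupby

# Illustrative lines, already in the contiguous order described above.
SORTED_LINES = [
    "FIRE 1 F AY1 ER0",
    "FIRE 2 F AY1 R",
    "FIRE'S 1 F AY1 ER0 Z",
]


def group_pronunciations(lines):
    """Yield (word, [transcription, ...]) for each run of equal headwords."""
    parsed = (line.split() for line in lines)
    for word, rows in groupby(parsed, key=lambda fields: fields[0]):
        # fields[1] is the counter; the transcription starts at fields[2].
        yield word, [fields[2:] for fields in rows]


for word, prons in group_pronunciations(SORTED_LINES):
    print(word, len(prons))
```

If FIRE'S 1 sat between FIRE 1 and FIRE 2, the same code would yield FIRE twice, each time with a single pronunciation.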

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
   The contents of this file are deemed to be source code.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in
   the documentation and/or other materials provided with the
   distribution.

This work was supported in part by funding from the Defense Advanced
Research Projects Agency, the Office of Naval Research and the National
Science Foundation of the United States of America, and by member
companies of the Carnegie Mellon Sphinx Speech Consortium. We acknowledge
the contributions of many volunteers to the expansion and improvement of
this dictionary.

THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND
ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
