Grammar Format Specification(1)

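My English is not strong, so I wrote these notes down on the first read to avoid translating the text again later. Sharing them here.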


1. Introduction

1.1 Related Documentation

2. Definitions

2.1 Grammar Names and Package Names
Each grammar defined by the Java Speech Grammar Format (JSGF) has a unique name that is declared in the grammar header. Legal structures for grammar names are:
packageName.simpleGrammarName
grammarName

The first form (package name + simple grammar name) is a full grammar name. The second form is a simple grammar name (grammar name only). Examples of full grammar names and a simple grammar name:
com.sun.speech.apps.numbers
edu.unsw.med.people
examples

The package name and grammar name have the same format as packages and classes in the Java programming language. A full grammar name is a dot-separated list of Java identifiers (see GJS96, §3.8 and §6.5).
The grammar naming convention also follows the naming convention for classes in the Java programming language (see GJS96). The convention minimizes the chance of naming conflicts. The package name should be:
reversedDomainName.localPackaging
For example, in com.sun.speech.apps.numbers, the com.sun part is Sun's reversed Internet domain name, speech.apps is the local package name for the Sun-wide division of the name space, and numbers is the simple grammar name.
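The naming rules above can be sketched in Java. This is only an illustration: the class GrammarNameSketch and its methods are hypothetical and are not part of the Java Speech API. It assumes that a full grammar name is split at its last dot, and it does not reject Java reserved words, which a stricter check would.

```java
// Hypothetical helper for illustration; not part of the Java Speech API.
final class GrammarNameSketch {
    final String packageName; // null when the name is a simple grammar name
    final String simpleName;

    private GrammarNameSketch(String packageName, String simpleName) {
        this.packageName = packageName;
        this.simpleName = simpleName;
    }

    // Splits "packageName.simpleGrammarName" or "grammarName" into its parts.
    static GrammarNameSketch parse(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("empty grammar name");
        }
        // The -1 limit keeps empty segments, so "a..b" and "a." are rejected.
        for (String part : name.split("\\.", -1)) {
            if (!isJavaIdentifier(part)) {
                throw new IllegalArgumentException("not a Java identifier: '" + part + "'");
            }
        }
        int lastDot = name.lastIndexOf('.');
        if (lastDot < 0) {
            return new GrammarNameSketch(null, name);
        }
        return new GrammarNameSketch(name.substring(0, lastDot), name.substring(lastDot + 1));
    }

    // Character-level check only; Java keywords are not filtered out here.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        return s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        for (String n : new String[] {"com.sun.speech.apps.numbers", "edu.unsw.med.people", "examples"}) {
            GrammarNameSketch g = parse(n);
            System.out.println(n + " -> package=" + g.packageName + ", simple=" + g.simpleName);
        }
    }
}
```

With this sketch, the three example names from the text all parse, and only "examples" has no package part.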
2.2 Rulenames
A grammar is composed of a set of rules that together define what may be spoken. Rules are combinations of speakable text and references to other rules. Each rule has a unique rulename. A reference to a rule is written as the rule's name enclosed in <> characters (less-than and greater-than).
A legal rulename is similar to a Java identifier but allows additional symbols. A legal rulename is an unlimited-length sequence of Unicode characters matching the following:
Characters matching java.lang.Character.isJavaIdentifierPart including the Unicode letters and numbers plus other symbols.
The following additional punctuation symbols:
+ - : ; , = | / \ ( ) [ ] @ # % ! ^ & ~

Grammar developers should be aware of two specific constraints. First, rulenames are compared with exact Unicode string matches, so case is significant. For example, <name>, <Name> and <NAME> are different. Second, whitespace is not permitted in rulenames.
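As a rough sketch of the character rules and the two constraints, the check below accepts a rulename (written without its surrounding <>) only if every character is a Java identifier part or one of the listed punctuation symbols, and no character is whitespace. The class RulenameSketch is hypothetical, and treating the empty string as illegal is an assumption rather than something the text states.

```java
// Hypothetical helper for illustration; not part of the Java Speech API.
final class RulenameSketch {
    // The additional punctuation symbols listed in the text.
    private static final String EXTRA_PUNCTUATION = "+-:;,=|/\\()[]@#%!^&~";

    // Returns true if the name uses only permitted characters.
    // Assumption: an empty rulename is treated as illegal.
    static boolean isLegal(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        return name.codePoints().allMatch(cp ->
                !Character.isWhitespace(cp)
                        && (Character.isJavaIdentifierPart(cp)
                                || EXTRA_PUNCTUATION.indexOf(cp) >= 0));
    }

    // Rulenames are compared as exact Unicode strings, so case matters.
    static boolean sameRule(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        for (String n : new String[] {"$100", "1+2=3", "user test", "a*b"}) {
            System.out.println("<" + n + "> legal: " + isLegal(n));
        }
        System.out.println("name vs Name: " + sameRule("name", "Name"));
    }
}
```

The names $100 and 1+2=3 pass because `$` and digits are Java identifier parts and `+` and `=` are in the punctuation list, whereas a name containing a space or `*` is rejected.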
The rulenames <NULL> and <VOID> are reserved. These special rules are discussed later in this section.
The Unicode character set includes most writing scripts from the world's living languages, so rulenames can be written in Chinese, Japanese, Korean, Thai, Arabic, European languages, and many more. The following are examples of rulenames.
1:
2:
3:
4: <$100>
5: <1+2=3>
6: <>
7:

2.2.1 Qualified and Fully-Qualified Names

Although rulenames are unique within a grammar, separate grammars may reuse the same simple rulename. A later section introduces the import statement, which allows ...
... () [] {} /* */ //
A token is a reference to an entry in a recognizer's vocabulary, often referred to as the lexicon. The recognizer's vocabulary defines the pronunciation of the token. With the pronunciation, the recognizer is able to listen for that token.
The Java Speech Grammar Format allows multi-lingual grammars, that is, grammars that include tokens from more than one language.
Most recognizers have a comprehensive vocabulary for each language they support. However, it is never possible to include 100% of a language. For example, names, technical terms and foreign words are often missing from the vocabulary. For tokens missing from the vocabulary, there are three possibilities:
An application or user can add the token and pronunciation to the recognizer's vocabulary to ensure consistent recognition.

Good recognizers are able to guess the pronunciation of many words not in the vocabulary.

If neither of the previous points applies, the behavior is determined by the software interface of the recognizer. In most cases, an undefined token will be unspeakable (equivalent to <VOID>), or it will cause an error or exception. For the Java Speech API, undefined tokens are unspeakable.
Tokens do not need to be normal written words of a language, provided that the token is properly defined in the recognizer's vocabulary. For example, to handle the two pronunciations of "read" (past tense sounds like "red", present tense sounds like "reed"), an application could define two separate tokens, "read_past" and "read_present", with appropriate pronunciations.
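The relationship between tokens and the recognizer's vocabulary can be pictured as a lookup table. The sketch below is a toy model only: LexiconSketch is a hypothetical class, real recognizers expose their vocabulary through their own interfaces, and the pronunciation strings are informal sound-alike placeholders taken from the "read" example rather than any real phonetic notation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of a recognizer vocabulary; hypothetical, not a real recognizer API.
final class LexiconSketch {
    private final Map<String, String> pronunciations = new HashMap<>();

    // An application or user adds a token together with its pronunciation.
    void add(String token, String pronunciation) {
        pronunciations.put(token, pronunciation);
    }

    // Empty when the token is not defined in the vocabulary.
    Optional<String> pronunciationOf(String token) {
        return Optional.ofNullable(pronunciations.get(token));
    }

    // Mirrors the behavior described for the Java Speech API:
    // an undefined token is simply unspeakable, not an error.
    boolean isSpeakable(String token) {
        return pronunciations.containsKey(token);
    }

    public static void main(String[] args) {
        LexiconSketch lexicon = new LexiconSketch();
        lexicon.add("read_past", "red");      // placeholder sound-alike
        lexicon.add("read_present", "reed");  // placeholder sound-alike
        for (String t : new String[] {"read_past", "read_present", "read_future"}) {
            System.out.println(t + " speakable: " + lexicon.isSpeakable(t));
        }
    }
}
```

Modeling the undefined case as "not speakable" rather than throwing an exception follows the third possibility in the list above. A recognizer that guesses pronunciations would instead fall back to some guessing step, which this sketch leaves out.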