Context Free Grammars
So far we have looked at models of language that capture only local phenomena, namely what we can analyze when looking at only a small window of words in a sentence. To move towards more sophisticated applications that require some form of understanding, we need to be able to determine larger structures and to make decisions based on information that may be further away than we can capture in any fixed window. Here are some properties of language that we would like to be able to analyze:
1. Structure and Meaning–A sentence has a semantic structure that is critical to determine for many tasks. For instance, there is a big difference between Foxes eat rabbits and Rabbits eat foxes. Both describe some eating event, but in the first the fox is the eater, and in the second it is the one being eaten. We need some way to represent overall sentence structure that captures such relationships.
2. Agreement–In English, as in most languages, there are constraints between forms of words that should be maintained, and which are useful in analysis for eliminating implausible interpretations. For instance, English requires number agreement between the subject and the verb (among other things). Thus we can say Foxes eat rabbits, but “Foxes eats rabbits” is ill-formed. In contrast, The fox eats a rabbit is fine, whereas “The fox eat a rabbit” is not.
3. Recursion–Natural languages have a recursive structure that is important to capture. For instance, Foxes that eat chickens eat rabbits is best analyzed as having an embedded sentence (i.e., that eat chickens) modifying the noun phrase Foxes. This one additional rule allows us to also interpret Foxes that eat chickens that eat corn eat rabbits and a host of other sentences.
4. Long Distance Dependencies–Because of the recursion, word dependencies like number agreement can be arbitrarily far apart from each other, e.g., “Foxes that eat chickens that eat corn eats rabbits” is awkward because the subject (foxes) is plural while the main verb (eats) is singular. In principle, we could find sentences that have an arbitrary number of words between the subject and its verb.
Tree Representations of Structure
The standard representation of sentence structure that captures these properties is a tree, shown in Figure 1 for the sentence The birds in the field eat the corn. A tree consists of labeled nodes with arcs indicating a parent-child relationship. A tree has a unique root node (S in Figure 1); every node except the root has exactly one parent, and each node has zero or more children. Nodes without children are called leaf nodes. This tree captures the following structural analysis: there is a sentence (S) that consists of an NP followed by a VP. The NP is an article (ART) followed by a noun (N) and a prepositional phrase (PP). The ART is the, the N is birds, and the PP consists of a preposition (P) followed by an NP. The P is in, and that NP consists of an ART (the) followed by an N (field). Returning to the VP, it consists of a V followed by an NP, and so on through the structure. Note that the tree form readily captures the recursive nature of language: in this tree we have an NP as a subpart of a larger NP. Thus this structure could also easily represent the sentence The birds in the field in the forest eat the corn, where the inner NP itself contains a PP that contains another NP.
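In code, such a tree can be written down as a nested structure. The following is a minimal sketch (in Python, which is not part of the original notes) encoding the Figure 1 tree as (label, children) pairs with words at the leaves; the bracketing follows the prose description above.

# A minimal sketch: each node is (label, children); a leaf is just a word string.
figure1_tree = (
    "S", [
        ("NP", [
            ("ART", ["The"]),
            ("N", ["birds"]),
            ("PP", [
                ("P", ["in"]),
                ("NP", [("ART", ["the"]), ("N", ["field"])]),
            ]),
        ]),
        ("VP", [
            ("V", ["eat"]),
            ("NP", [("ART", ["the"]), ("N", ["corn"])]),
        ]),
    ],
)

def leaves(node):
    """Return the words at the leaf nodes, left to right."""
    if isinstance(node, str):          # a leaf: just a word
        return [node]
    _label, children = node
    return [w for child in children for w in leaves(child)]

print(" ".join(leaves(figure1_tree)))  # -> "The birds in the field eat the corn"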
Note also that the structure allows us to state dependencies between constituents in a concise way. For instance, English requires number agreement between the subject NP (however complex) and the main verb. This constraint is easily captured as additional information on when an NP followed by a VP can produce a reasonable sentence S. To make properties like agreement work in practice would require developing a more general formalism for expressing constraints on trees, which takes us beyond our focus in this course (but see a text on Natural Language Understanding).
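As a rough illustration of the idea (not a general solution), the sketch below checks subject-verb number agreement on a tree in the (label, children) encoding used earlier. The NUMBER lexicon and the simple head-finding strategy are invented purely for this example.

# A toy illustration: check subject-verb number agreement on a (label, children)
# tree. The tiny lexicon below is invented for this example only.
NUMBER = {
    "birds": "plural", "bird": "singular",
    "eat": "plural", "eats": "singular",
}

def find_first(node, label):
    """Depth-first search for the first subtree with the given label."""
    if isinstance(node, str):
        return None
    node_label, children = node
    if node_label == label:
        return node
    for child in children:
        found = find_first(child, label)
        if found is not None:
            return found
    return None

def words_under(node):
    if isinstance(node, str):
        return [node]
    _label, children = node
    return [w for c in children for w in words_under(c)]

def subject_verb_agree(s_tree):
    """True if the subject NP's first noun and the main verb agree in number.
    Assumes S -> NP VP and that the NP contains an N and the VP a V."""
    np, vp = s_tree[1][0], s_tree[1][1]
    head_noun = words_under(find_first(np, "N"))[0]
    main_verb = words_under(find_first(vp, "V"))[0]
    return NUMBER.get(head_noun.lower()) == NUMBER.get(main_verb.lower())

# e.g. subject_verb_agree(figure1_tree) -> True for the tree sketched earlier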
The Context-Free Grammar Formalism
Given a tree analysis, how do we tell which trees characterize acceptable structures in the language? The space of allowable trees is specified by a grammar, in our case a Context Free Grammar (CFG). A CFG consists of a set of rules of the form

category_0 -> category_1 ... category_n

which states that a node labeled category_0 is allowed in a tree when it has n children labeled (from left to right) category_1 ... category_n. For example, Figure 2 gives a set of rules that would allow all the trees in the previous figures.
You can easily verify that every parent-child relationship shown in each of the trees is listed as a rule in the grammar. Typically, a CFG defines an infinite space of possible trees, and for any given sentence there may be a number of possible trees. These different trees can account for some of the ambiguities that are inherent in language. For example, Figure 3 provides two trees associated with the sentence I hate annoying neighbors. One tree captures the interpretation that we don't like doing things (like playing loud music) that annoy our neighbors; the other captures the interpretation that we don't like neighbors who are annoying.
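Returning to the grammar itself: since Figure 2 is not reproduced here, the sketch below reconstructs a plausible rule set from the trees discussed so far and shows the kind of check just described, namely that every parent-child configuration in a tree corresponds to some rule in the grammar. The specific rule list is an assumption, not a copy of Figure 2.

# A sketch of a CFG as a set of (parent, children) rules. This rule list is
# reconstructed from the trees discussed in the text, not copied from Figure 2.
GRAMMAR = {
    ("S",  ("NP", "VP")),
    ("NP", ("ART", "N")),
    ("NP", ("ART", "N", "PP")),
    ("NP", ("PRO",)),
    ("NP", ("N",)),
    ("NP", ("ADJ", "N")),
    ("PP", ("P", "NP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "VP")),
}

def licensed(node, grammar):
    """Check that every internal parent-child configuration is a grammar rule."""
    if isinstance(node, str):
        return True                      # a word (leaf) is always fine
    label, children = node
    if all(isinstance(c, str) for c in children):
        return True                      # lexical rule, e.g. N -> birds (handled separately)
    child_labels = tuple(c[0] for c in children)
    if (label, child_labels) not in grammar:
        return False
    return all(licensed(c, grammar) for c in children)

# licensed(figure1_tree, GRAMMAR) would return True under this reconstructed rule set.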
Figure 1: A tree representation of sentence structure
Figure 3: Two tree representations of the same sentence

This ambiguity arises from choosing a different lexical category for the word annoying, which then forces different structural analyses. There are many other cases of ambiguity that are purely structural, where the words are interpreted the same way in each case. The classic example is the sentence I saw the man with a telescope, which is ambiguous between me seeing a man holding a telescope and me using a telescope to see a man. This ambiguity would be reflected in where the prepositional phrase with a telescope attaches in the parse tree.
Probabilistic Context Free Grammars
Given a grammar and a sentence, there are two natural questions to ask: 1) is there any tree that accounts for the sentence (i.e., is the sentence grammatical)? and 2) if there are many trees, which is the right interpretation (i.e., disambiguation)? For the first question, we need to be able to build a parse tree for a sentence if one exists. To answer the second, we need some way of comparing the likelihood of one tree against another. There have been many attempts to find inherent preferences based on the structure of trees. For instance, we might prefer trees that are smaller over ones that are larger. But none of these approaches has provided a satisfactory account of parse preferences. Putting this into a probabilistic framework: given a grammar G, a sentence s, and a set of trees T_s that could account for s (i.e., each t in T_s has s as its leaf nodes), we want the parse tree that is
argmax_{t in T_s} P_G(t | s)
To compute this directly, we would need to know the probability distribution P_G(T), where T ranges over all possible trees. Clearly we will never be able to estimate this distribution directly. As usual, we must make some independence assumptions. The most radical assumption is that each node in the tree is decomposed independently of the rest of the nodes in the tree. In other words, say we have a tree T whose root node T_1 has children T_1.1, ..., T_1.n, with each node T_i.j in turn having children T_i.j.1, ..., T_i.j.m, and so on, as shown in Figure 4. Making this independence assumption, we'd say

P(T_1) = P(T_1 -> T_1.1 T_1.2) * P(T_1.1) * P(T_1.2)

Using the independence assumption repeatedly, we can expand P(T_1.1) to P(T_1.1 -> T_1.1.1 T_1.1.2 T_1.1.3) * P(T_1.1.1) * P(T_1.1.2) * P(T_1.1.3), and so on until we have expanded out the probabilities of all the subtrees. Having done this, we end up with the following:
P(T) = Π_r P(r), where r ranges over all rules used to construct T
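In code, this assumption means the probability of a tree is simply one factor per rule application. A minimal sketch, reusing the (label, children) tree encoding from earlier and assuming a rule_probs table mapping rules to probabilities:

# Sketch: P(T) as the product of the probabilities of the rules used to build T.
def tree_probability(node, rule_probs):
    """node is (label, children) with word strings at the leaves;
    rule_probs maps (lhs, rhs_tuple) -> probability."""
    if isinstance(node, str):
        return 1.0                                   # words contribute via the lexical rules above them
    label, children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs.get((label, rhs), 0.0)            # 0 if the rule is not in the grammar
    for child in children:
        p *= tree_probability(child, rule_probs)
    return p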
The probability of a rule r is the probability that the rule will be used to rewrite the category on its left-hand side. Thus, if 3/4 of NPs are built by the rule NP -> ART N, then the probability of this rule would be .75. This formalism is called a probabilistic context-free grammar (PCFG).

Estimating Rule Probabilities
There are a number of ways to estimate the probabilities for a PCFG. The best, but most labor-intensive, way is to hand-construct a corpus of parse trees for a set of sentences, and then estimate the probability of each rule by counting over the corpus. Specifically, the MLE estimate for a rule LHS -> RHS would be

P_MLE(LHS -> RHS) = Count(LHS -> RHS) / Σ_c Count(LHS -> c)
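A hedged sketch of this counting, assuming the training corpus is a list of trees in the same (label, children) form used above:

from collections import Counter

def mle_rule_probs(treebank):
    """Estimate P(LHS -> RHS) = Count(LHS -> RHS) / sum_c Count(LHS -> c)."""
    rule_counts = Counter()
    lhs_counts = Counter()

    def collect(node):
        if isinstance(node, str):
            return
        label, children = node
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for child in children:
            collect(child)

    for tree in treebank:
        collect(tree)
    return {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}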
Figure 4: The notation for analyzing trees

For example, given the trees in Figures 1 and 3 as our training corpus, we would obtain the estimates for the rule probabilities shown in Figure 5.
Rule               Count   Total for LHS   MLE Estimate
S -> NP VP           3           3             1
NP -> PRO            2           7             .29
NP -> ART N          2           7             .29
NP -> N              1           7             .15
NP -> ADJ N          1           7             .15
NP -> ART N PP       1           7             .15
VP -> V NP           3           4             .75
VP -> V VP           1           4             .25
PP -> P NP           1           1             1
PRO -> I             2           2             1
V -> hate            2           4             .5
V -> annoying        1           4             .25
ADJ -> annoying      1           1             1
N -> neighbors       1           4             .25

Figure 5: The MLE estimates from the trees in Figures 1 and 3
Given this model, we can now ask which interpretation in Figure 3 is more likely. The tree on the left (ignoring for the moment the words) involves the rules S -> NP VP, NP -> PRO, VP -> V NP, NP -> ADJ N, PRO -> I, V -> hate, ADJ -> annoying, and N -> neighbors, and thus has probability 1 * .29 * .75 * .15 * 1 * .5 * 1 * .25 ≈ .004. The tree on the right involves the rules S -> NP VP, NP -> PRO, VP -> V VP, VP -> V NP, NP -> N, PRO -> I, V -> hate, V -> annoying, and N -> neighbors, and thus has probability 1 * .29 * .25 * .75 * .15 * 1 * .5 * .25 * .25 ≈ .0003. Thus the tree on the left appears more likely. Note that one reason there is a large difference in probabilities is that the tree on the right involves one more rule. In general, PCFGs will tend to prefer interpretations that involve fewer rules.
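The comparison amounts to multiplying out the Figure 5 estimates for each tree; a quick check (Python, for illustration only):

# Left tree of Figure 3: annoying analyzed as an ADJ.
left = 1 * .29 * .75 * .15 * 1 * .5 * 1 * .25            # about .004
# Right tree of Figure 3: annoying analyzed as a V, adding the VP -> V VP rule.
right = 1 * .29 * .25 * .75 * .15 * 1 * .5 * .25 * .25   # about .0003
print(left > right)   # True: the left tree is preferred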
Note that the probabilities of the lexical rules are the same as the tag output probabilities we looked at when doing part-of-speech tagging, i.e.,

P(ADJ -> annoying) = P(annoying | ADJ)
Parsing Probabilistic Context Free Grammars
To find the most likely parse tree, it might seem that we need to search an exponential number of possible trees. But using the probability model described above with its strong independence assumptions, we can use dynamic programming techniques to incrementally build up all possible constituents that cover one word, then two, and so on, over all words in the sentence. Since the probability of a tree is independent of how its subtrees were built, we can collapse the search at each point by just remembering the maximum probability parse. This algorithm looks similar to the min edit distance and Viterbi algorithms we developed earlier.
To describe the algorithm, let's number the words in the sentence I hate annoying neighbors: I would be position 1, hate would be position 2, and so on. Also, to simplify matters, let's assume all grammar rules are of only two possible forms, namely
X -> Y Z
X -> w, where w is a word
(i.e., there are either two categories or a single word on the right-hand side). This is called Chomsky Normal Form (CNF), and it can be shown that every context-free grammar has an equivalent grammar in this form. The parsing algorithm is called the probabilistic CKY algorithm. It first builds all possible hypotheses of length one by considering all rules that could produce each of the words. Figure 6 shows all non-zero probabilities of the words, where we assume that I can be a pronoun (.9) or a noun (.1 - the letter “i”), hate is a verb, annoying is a verb (.7) or an ADJ (.3), and neighbors is a noun. We first add all the lexical probabilities, and then use the rules to build any other constituents of length 1, as shown in Figure 6. For example, at position 1 we have an NP constructed from the rule NP -> PRO (prob .29) and the constituent PRO_1 (prob 1), yielding NP_1 with prob .29. Likewise, at position 4 we have an N with probability .25 (N_4), which with the rule NP -> N (prob .15) produces an NP with probability .0375.
The next iteration finds the maximum probability for all possible constituents of length two. In essence, we iterate through every non-terminal and every pair of positions to find the maximum probability combination that produces that non-terminal (if there is any at all). Consider this for NP, where NP_i,j will be an NP from position i to position j:
P(NP_1,2) = 0 (no rule that produces an NP can be built from the constituents found at these positions)
P(NP_2,3) = 0, for the same reason
P(NP_3,4) = P(NP -> ADJ N) * P(ADJ_3) * P(N_4) = .15 * 1 * .25 = .0375
The only other non-zero constituent of length 2 in this simple example is VP_3,4, which has probability P(VP -> V NP) * P(V_3) * P(NP_4) = .75 * .25 * .0375 ≈ .007. The results after this iteration are shown in Figure 7.
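Putting the pieces together, here is a compact sketch of probabilistic CKY for a small grammar. The toy rule and lexical probabilities below are taken loosely from the worked example (they are illustrative, not a complete grammar), and unary rules such as NP -> PRO are handled directly rather than binarized; chart[i][j] stores, for each category, the best probability of any analysis of the words from position i to position j.

# A sketch of probabilistic CKY. The toy probabilities are illustrative only,
# loosely based on the worked example in the text.
LEXICAL = {   # (category, word) -> probability of category -> word
    ("PRO", "i"): 1.0, ("V", "hate"): 0.5,
    ("V", "annoying"): 0.25, ("ADJ", "annoying"): 1.0,
    ("N", "neighbors"): 0.25,
}
BINARY = {    # (parent, left_child, right_child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.75, ("VP", "V", "VP"): 0.25,
    ("NP", "ADJ", "N"): 0.15,
}
UNARY = {     # unary nonterminal rules, e.g. NP -> PRO (not strictly CNF,
              # but handled directly for this toy grammar)
    ("NP", "PRO"): 0.29, ("NP", "N"): 0.15,
}

def pcky(words):
    n = len(words)
    # chart[i][j] maps a category to the best probability for words[i:j]
    chart = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]

    def add_unaries(cell):
        changed = True
        while changed:                      # apply unary rules until nothing improves
            changed = False
            for (parent, child), p_rule in UNARY.items():
                if child in cell:
                    p = p_rule * cell[child]
                    if p > cell.get(parent, 0.0):
                        cell[parent] = p
                        changed = True

    for i, w in enumerate(words):           # length-1 constituents
        cell = chart[i][i + 1]
        for (cat, word), p in LEXICAL.items():
            if word == w.lower():
                cell[cat] = max(cell.get(cat, 0.0), p)
        add_unaries(cell)

    for span in range(2, n + 1):            # longer constituents, shortest first
        for i in range(0, n - span + 1):
            j = i + span
            cell = chart[i][j]
            for k in range(i + 1, j):       # split point between the two children
                for (parent, left, right), p_rule in BINARY.items():
                    if left in chart[i][k] and right in chart[k][j]:
                        p = p_rule * chart[i][k][left] * chart[k][j][right]
                        if p > cell.get(parent, 0.0):
                            cell[parent] = p
            add_unaries(cell)
    return chart[0][n]

print(pcky("I hate annoying neighbors".split()))   # best probability per category for the whole sentence

Running this on I hate annoying neighbors yields an S probability of about .004, matching the left tree of Figure 3: because the chart keeps only the maximum-probability analysis for each category over each span, the search collapses exactly as described above.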
The next iteration builds constituents of length three. Here we have the first case where we have
