IELR(1)--688IT编程网

IELR(1):Practical LR(1)Parser Tables for Non-LR(1)Grammars with

Conﬂict Resolution

Joel E.Denny and Brian A.Malloy

School of Computing

Clemson University

Clemson,SC29634,USA

{jdenny,malloy}@cs.clemson.edu

ABSTRACT

There has been a recent eﬀort in the literature to recon-sider grammar-dependent software development from an en-gineering point of view.As part of that eﬀort,we examine a deﬁciency in the state of the art of practical LR parser table generation.Speciﬁcally,LALR sometimes generates parser tables that do not accept the full language that the gram-mar developer expects,but canonical LR is t

oo ineﬃcient to be practical.In response,many researchers have attempted to develop minimal LR parser table generation algorithms. In this paper,we demonstrate that a well known algorithm described by David Pager and implemented in Menhir,the most robust minimal LR(1)implementation we have dis-covered,does not always achieve the full power of canonical LR(1)when the given grammar is non-LR(1)coupled with a speciﬁcation for resolving conﬂicts.We also outline an original minimal LR(1)algorithm,IELR(1),which we have implemented as an extension of GNU Bison and which does not exhibit this deﬁciency.Finally,using our implemen-tation,we demonstrate the relevance of this deﬁciency for several real-world grammars,and we show that our imple-mentation is feasible for generating minimal LR(1)parsers for those grammars.

Categories and Subject Descriptors

D.3.4[Programming Languages]:Processors—parsing;

D.3.1[Programming Languages]:Formal Deﬁnitions and Theory—syntax;D.2.m[Software Engineering]:Miscel-laneous

General Terms

Algorithms,Languages,Theory

Keywords

grammars,grammarware,LALR,LR,Yacc

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proﬁt or commercial advantage and that copies bear this notice and the full citation on theﬁrst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior speciﬁc permission and/or a fee.

SAC’08March16-20,2008,Fortaleza,Cear´a,Brazil

Grammar-dependent software is omnipresent in software development[18].For example,compilers,document pro-cessors,browsers,import/export tools,and generative pro-gramming tools are used in software development in all phases. These phases include comprehension,analysis,maintenance, reverse-engineering,code manipulation,and visualization of the application program under study.However,construc-tion of these tools relies on the correct recognition of the language constructs speciﬁed by the grammar.

Some aspects of grammar engineering are reasonably well understood.For example,the study of grammars as deﬁ-nitions of formal languages,including the study of LL,LR, LALR,and SLR algorithms and the Chomsky hierarchy, form an essential part of most computer science curricula. Nevertheless,parsing as a disciplined study must be recon-sidered from an engineering point of view[18,19].Many parser developers eschew the use of parser generators be-cause it is too diﬃcult to customize the generated parser or because the generated parser requires considerable mod-iﬁcation to incorporate suﬃcient power to handle modern grammars such as the C++and C#grammars.Thus,in-dustrial strength parser development requires considerable eﬀort,and many approaches to parser generation are ad hoc [26,27].

From an engineering point of view,one source of diﬃ-culty in parser development stems from a deﬁciency in the state of the art of practical LR parser table generation.LR parsing is“the most general nonbacktracking shift-reduce parsing method known”,and canonical LR is the most gen-eral technique for generating LR parser tables from a given grammar[12].As a result,canonical LR parser tables ac-cept the language that a grammar developer expects a given grammar to deﬁne.Unfortunately,canonical LR tables re-quire“too much space and time to be useful in practice”

[12].In contrast,LALR parser tables are feasible and are thus employed by widely used tools like Yac

c[17,11]and its GNU implementation Bison[1].Unfortunately,as Bison’s manual points out,LALR parser tables contain“mysterious conﬂicts”that“don’t look warranted”[16].These conﬂicts cause the parser to encounter“unnatural errors”because LALR“is not powerful enough to match the intuition of the grammar writer”[24,25].In this way,LALR can worsen the diﬃculty of developing a correct grammar.Even for an existing correct LALR grammar,LALR can interfere with incremental changes[24,25],which are inevitable in the face

1.S →aAa

2.→bAb

3.A →a

→aa

S R1

ÕÕÕW

W W a

A R3a a

S R1

ÔÔÔX

X X

A R4a aa S R2

ÖÖÖV

V V b

A R3b a

S R2

ÔÔÔX

X X

A R4

Figure 1:A Non-LR(1)Grammar Example.This grammar deﬁnes a language consisting of 4sentences,each of which corresponds to 1parse tree:aaa ,aaaa ,bab ,and baab .Assuming that the parser is LR(1)and that a is left-associative,the sentence aaaa is dropped from the lan-guage.

of software evolution.

In response,many researchers have developed minimal LR algorithms,which attempt to generate parser tables with the power of canonical LR but with nearly the eﬃciency of LALR [20,22,21,24,25,23,8,6].Menhir,hosted at [7],is an implementation of Pager’s algorithm,which is de-scribed in [21].Menhir is the most robust minimal LR(1)implementation we have discovered available.

In this paper,we show that Pager’s algorithm and thus Menhir are not always able to generate parser tables with the full power of canonical LR(1)if the given grammar is non-LR(1)coupled with a speciﬁcation for resolving con-ﬂicts.We also describe an original minimal LR(1)algo-rithm,IELR(1),tha

t does not suﬀer from this deﬁciency.In section 2,we present an original non-LR(1)grammar exam-ple to demonstrate the deﬁciency of Pager’s algorithm.We also introduce some original terminology to facilitate fur-ther discussion.In section 3,we outline our IELR(1)algo-rithm,which we have implemented as an extension of Bison.In section 4,we compare the Bison LALR(1)implementa-tion with our IELR(1)implementation using our example grammar plus ﬁve grammars for popular languages as case studies.In section 5,we review related work.In section 6,we use the results of our case studies to conclude that (1)Menhir’s deﬁciency does aﬀect real-world parsers,(2)it can create bugs relative to the intended design of such parsers,and (3)IELR(1)is feasible for generating minimal LR(1)parsers for sophisticated real-world grammars.

2.MOTIV ATING EXAMPLE

In this section,we demonstrate an original non-LR(1)grammar example.We analyze the language deﬁned by that grammar in section 2.1.In sections 2.2and 2.3,we describe its canonical LR(1)and LALR(1)parser tables in order to introduce some original terminology to facilitate discussion.In section 2.4,we identify a deﬁciency of Pager’s algorithm and thus of Menhir.

2.1A Non-LR(1)Grammar

As written,the grammar in Figure 1deﬁnes a language consisting of 4sentences:aaa ,aaaa ,bab ,and baab .Consider how an LR parser behaves when it reaches the ·in either of

Canonical LR(1)LALR(1)0.S →·S ,{λ}G10.

→·S ,{λ}G1S →·aAa ,{λ}S2S →·aAa ,{λ}S2→·bAb ,{λ}S3→·bAb ,{λ}S31.S

→S ·,{λ}Acc 1.S

→S ·,{λ}Acc 2.S →a ·Aa ,{λ}G4 2.

S →a ·Aa ,{λ}G4A →·a ,{a }S8A →·a ,{a }S8→·aa ,{a }S8→·aa ,{a }S83.S →b ·Ab ,{λ}G5 3.

S →b ·Ab ,{λ}G5A →·a ,{b }S10A →·a ,{b }S8→·aa ,{b }S10→·aa ,{b }S84.S →aA ·a ,{λ}S6 4.S →aA ·a ,{λ}S65.S →bA ·b ,{λ}S7 5.S →bA ·b ,{λ}S76.S →aAa ·,{λ}R1 6.S →aAa ·,{λ}R17.S →bAb ·,{λ}R27.S →bAb ·,{λ}R2*8.A →a ·,{a }R3

*8.A →a ·,{ab }R3A →a ·a ,{a }S9A →a ·a ,{ab }S99.A →aa ·,{a }R49.

A →aa ·,{ab }

10.A →a ·,{b }R3

A →a ·a ,{b }S1111.A →aa ·,{b }R4

Table 1:Example Parser Tables.These are the canon-ical LR(1)and LALR(1)parser tables for the grammar of Figure 1.λin a lookahead set represents the end of the input.Diﬀerences between canonical LR(1)and LALR(1)are shown in bold.In both cases,state 8has a S/R conﬂict on a resolved as a reduce since a is left-associative.

the marked input sentences,ba ·b and ba ·ab .These sentences look the same before the ·and so the parser performs identi-cal actions until this point.If the parser then looks ahead 1token and sees b ,it knows it must reduce the previous a to A as in the third parse tree in Figure 1.If it sees a instead,it must shift the a so that it can then reduce the previous aa to A as in the fourth parse tree.Similarly,at the ·in aa ·a and aa ·aa ,the parser must choose to reduce in order to accept aaa as in the ﬁrst parse tree,but it must choose to shift in order to accept aaaa as in the second parse tree.However,in this ca

se,the ﬁrst token of lookahead is the same,a ,and so does not distinguish between the possible parser actions.Assume that the grammar author has designed the gram-mar with LR(1)in mind.If he has declared a as left-associative,the parser chooses to reduce in the cases of both aa ·a and aa ·aa .In this way,the grammar author has speci-ﬁed a new language consisting of only 3sentences:aaa ,bab ,and baab .Notice how we seem to be able to determine the language without computing any parser tables.

2.2New Canonical LR(1)Terminology

The ﬁrst column of Table 1shows the canonical LR(1)parser tables for the grammar of Figure 1when a is left-associative.We introduce the term isocores to describe states that have the same core.For example,state 8is an isocore of state 10,but their lookahead sets and actions are diﬀerent.When we speak of a particular state’s set of viable preﬁxes,we are speaking of the set of possible viable preﬁxes that could be on the stack when that state is at the top of the stack.For example,state 8’s only viable preﬁx is

Canonical LR(1)LALR(1)

Stack Input Action Stack Input Action 0baab S30baab S3

0,3aab S100,3aab S8 0,3,10ab S110,3,8ab R3 0,3,10,11b R40,3Aab G5 0,3Ab G50,3,5ab Err

0,3,5b S7

0,3,5,7R2

0S G1

0,1Acc

Table2:An Example Parse.This table shows how the canonical LR(1)and LALR(1)parser tables of Table1parse the sentence baab.Because LALR(1)merges state10into state8,it rejects the sentence even though the canonical LR(1)parser accepts it.

aa,and state10’s only viable preﬁx is ba.State8contains the S/R conﬂict on token a whose resolution as reduce(1) permits a successful completion of the parse of aa·a but(2) leads the parse of aa·aa to a syntax error.We say the shift and reduce actions are contributions to the conﬂict.We call the reduce action the dominant contribution.

2.3LALR(1)Conﬂicts Reconsidered

The second column of Table1shows the LALR(1)parser tables for the grammar in Figure1.They are nearly identical to the canonical LR(1)parser tables but isocores are merged together.For example,canonical LR(1)state10is merged into state8.Merging isocores can create new conﬂicts that are not present in the canonical LR(1)parser tables.[16] calls these mysterious conﬂicts because they can be a source of confusion for grammar authors.The trouble is that a sim-ple grammar analysis as we performed in section2.1does not easily reveal mysterious conﬂicts.To discover them,the grammar author is forced to examine parser tables.In this paper,we rename[16]’s mysterious conﬂicts to mysterious new conﬂicts since we wish to emphasize that this term al-ways refers to conﬂicts that do not exist in the canonical LR(1)parser tables.As[16]points out,such conﬂicts are always R/R conﬂicts since,as[12]points out,merging iso-cores can never create new S/R conﬂicts.

Because canonical LR(1)states10and8in Table1are merged,the LALR(1)parser encounters the conﬂict of state 8even after the viable preﬁx of state10.Unfortunately for ba·ab,which starts with that viable preﬁx and which requires a shift in state10,the conﬂict is resolved as a reduce.As Table2demonstrates,this leads the LALR(1)parse of ba·ab to a syntax error even though canonical LR(1)is able to parse it successfully.The conﬂict here is not a mysterious new conﬂict since it already

exists in canonical LR(1)state8. Instead,we have identiﬁed a second category of mysterious conﬂicts that we call mysterious invasive conﬂicts.That is,switching from canonical LR(1)to LALR(1)can cause a parser to encounter existing conﬂicts after viable preﬁxes for which it never would have encountered those conﬂicts before and,as a result,to perform incorrect actions.This switch can also cause a parser to perform incorrect actions after viable preﬁxes for which it would have encountered the existing conﬂict before.We call this a mysterious mutated conﬂict.Since mysterious conﬂicts of these two categories stem from existing conﬂicts in canonical LR(1)tables,they are only possible for non-LR(1)grammars.

In summary,given the grammar of Figure1,given that a is left-associative,and assuming an LR(1)parser,the lan-guage so speciﬁed is a set of3sentences:aaa,bab,and baab. As expected,the canonical LR(1)parser tables for this spec-iﬁcation accept exactly that language.However,LALR(1) diminishes the language to only2sentences:aaa and bab. The culprit is not a mysterious new conﬂict.The culprit is

a mysterious invasive conﬂict.

2.4The Deﬁciency of Pager’s Algorithm

Most algorithms for generating LR(1)parser tables merge states only if they pass some sort of compatibility test.Canon-ical LR(1)parser tables are relatively large because the com-patibility test is relatively restrictive:states must be isocores and their lookahead sets must be identical.LALR(1)parser tables are sometimes too small to maintain the full power of LR(1)because the compatibility test is too lenient:states must simply be isocores.[21]describes an algorithm to gen-erate LR(1)parser tables while employing tests for two kinds of compatibility,weak compatibility and strong compatibil-ity,both of which lie somewhere in between.Throughout this paper,we refer to this algorithm as Pager’s algorithm. We identify two problems with Pager’s algorithm for our purposes.First,its weak compatibility test is not designed for non-LR(1)grammars.Thus,the only category of mys-terious conﬂicts that the weak compatibility test can always avoid is mysterious new conﬂicts.For example,it does not avoid the mysterious invasive conﬂict in Table1.Second, both its weak and strong compatibility tests permit isocores to be merged if the merged state would not contain a poten-tial conﬂict,but the deﬁnition of potential conﬂict employed does not include potential or even real S/R conﬂicts.Thus, even using the strong compatibility test by itself,Pager’s algorithm does not avoid the mysterious invasive conﬂict in Table1.

Like Bison,Menhir is designed to accept non-LR(1)gram-mars with a speciﬁcation for resolving conﬂi

cts.Thus,even though Menhir employs Pager’s algorithm to try to achieve the power of canonical LR(1),it fails to in some cases.For example,we have conﬁrmed that,when given the gram-mar of Figure1combined with the declaration of a as left-associative,Menhir generates a parser that does not accept the sentence baab.Again,a canonical LR(1)parser does accept this sentence.

3.METHODOLOGY

In this section,we outline our IELR(1)algorithm in6 phases.We label these phases Phase0through Phase5 and describe them in sections3.1through3.6.Due to space restrictions,we omit the details.For a complete description, see our technical report at[14].

3.1Phase0:LALR(1)

Pager’s algorithm avoids some mysterious conﬂicts by re-fusing to merge isocores if its compatibility tests predict that doing so would induce an inadequacy in the parser tables. In contrast,IELR(1)computes up front all possible inade-quacies,and then afterwards it eliminates all the inadequa-cies that would not appear in canonical LR(1)parser tables. Since LALR(1)parser tables have all isocores merged fully,

they contain all such inadequacies.Thus,Phase0computes LALR(1)parser tables.It does so in two steps:(1)com-pute LR(0)parser tables,and(2)compute the reduction lookahead sets using the technique described by[15].As part of its algorithm,step2also computes a set of goto ta-bles that the remaining IELR(1)phases require.In our im-plementation of Phase0,we use Bison’s existing LALR(1) implementation,which already computes those goto tables.

3.2Phase1:Compute Auxiliary Tables

The remaining phases of IELR(1)require three additional tables beyond those computed in Phase0,and they are computed in Phase1.follow kernel items records depen-dencies of goto follow sets on kernel item lookahead sets. always follows records goto follow tokens that do not de-pend on the kernel item lookahead sets of predecessor states. That is,this table records goto follows that will never change no matter how IELR(1)may split the LALR(1)parse states. predecessors records transition predecessor relations between LALR(1)states.

3.3Phase2:Compute Annotations

Phase2annotates inadequate LALR(1)states.Each an-notation describes how isocores of the annotated state might contribute to an inadequacy.The predecessors table en-ables reverse iteration

from the conﬂicted state to all the predecessor states that might contribute to the same in-adequacy.This iteration is similar to lane tracing as de-scribed by[20]and[22].The follow kernel items table, always follows table,and goto tables enable the computa-tion of the inadequacy contributions made by each state. 3.4Phase3:Split States

Phase3recomputes the parse states in a manner simi-lar to Phase0step1.Whereas Phase0step1merges all isocores together,Phase3only merges isocores if they will maintain the same dominant contribution to every inade-quacy according to their core’s annotations.In order to do this,Phase3must compute partial kernel item lookahead sets by using the follow kernel items and always follows tables and by propagating lookaheads from each parse state to its successor if the lookaheads appear in the annotations.

3.5Phase4:Compute Reduction Lookaheads Phase4runs step2of Phase0again without modiﬁcation. That is,it computes the full lookahead sets on reductions in all IELR(1)parse states.

3.6Phase5:Resolve Remaining Conﬂicts

All that’s left is to resolve the remaining conﬂicts in the parser tables.Our IELR(1)implementation uses Bison’s existing conﬂict resolution algorithm without modiﬁcation.

4.RESULTS OF USING IELR(1)

In this section,we compare the Bison LALR(1)implemen-tation with our IELR(1)implementation using six grammars as case studies,which are summarized in Table3.In section 4.1,we describe the grammars in detail.In section4.2,we compare the parser tables that the algorithms generate for the grammars.In section4.3,we compare the performance of the algorithms in terms of time and space.

4.1Case Studies

Grammar Version|T||V||T∪V||P|

Figure12244

Gawk Gawk3.1.06145106163

Gpic Groﬀ1.18.113845183247

C GCC4.0.492208300573

Java GCC4.2.1109164273516

C++ISO2003117184301481 Table3:Grammar Characteristics.These counts measure the size of each case study’s grammar G= (V,T,P,S),such that V is the set of nonterminals,T is the set of terminals or tokens,P is the set of productions, and S is the start symbol.These counts include the produc-tions and nonterminals that Bison generates implicitly for mid-rule actions.

Table3characterizes the grammars of our case studies. Ourﬁrst case study is the example grammar of Figure1 including the declaration of a as left-associative.

Our next four case studies are mature grammars from widely used software products employing LALR(1)parser generators.Gawk(GNU AWK),a text-based data process-ing language,wasﬁrst written in1986but is based on the original AWK,which was written in1977and which is stan-dardized in SUSv3(the Single UNIX Speciﬁcation,Version 3)[11,2].Groﬀ(GNU Troﬀ)is a document formatting sys-tem for UNIX that includes Gpic(GNU Pic),a Groﬀpre-processor for specifying diagrams.Groﬀwasﬁrst released in 1990and is based on Troﬀwhich has existed since the early 1970’s[13,4].We copied our C and Java grammars from GCC(the GNU Compiler Collection),which is a widely used collection of compilers developed by the GNU Project[3]. The latest version of the C++programming language is C++2003,the formal speciﬁcation for which is[10].Annex A of that speciﬁcation presents a formal C++grammar.As ourﬁnal case study,we formatted this grammar as a Bi-son grammarﬁle except that,f

or section A.2,Lexical con-ventions,we replaced the integer literal,character literal, floating literal,and string literal nonterminals with to-kens.

4.2LALR(1)vs.IELR(1)Parser Tables

Table4describes the parser tables that the Bison LALR(1) implementation and our IELR(1)implementation generate for each of our case studies’grammars.Since some gram-mars include token precedence and associativity declarations that resolve most of their conﬂicts,we also describe their parser tables when generated without these declarations in order to better demonstrate grammar analysis complexity. When IELR(1)splits states into isocores,each conﬂict from the original LALR(1)state might be duplicated among several isocores,and Bison then counts the conﬂict sepa-rately for each.Sometimes the multiple count is a mis-leading representation of grammar complexity because all isocores have perfect duplicates of the conﬂict and thus the same precedence and associativity declarations can resolve them all.Therefore,from each IELR(1)unresolved conﬂict count in Table4,we subtract all but one copy of each unre-solved conﬂict that is perfectly duplicated among isocores. Table5describes the parser actions that are corrected in the parser tables by switching from LALR(1)to IELR(1).

Grammar

States S/R R/R LA IE LA IE LA IE

Figure1101200-000-0

no prec/assoc111111-000-0 Gawk3203296565-000-0

no prec/assoc320320410410-000-0 Gpic42342800-000-0

no prec/assoc426426803803-088-0 C9339331313-000-0

no prec/assoc933933329329-000-0 Java79279200-06262-0

C++822836407410-3135169-34 Table4:Parser Tables.In this table,we report the num-ber of states,the number of unresolved S/R conﬂicts,and the number of unresolved R/R conﬂicts in the parser tables that the Bison LALR(1)implementation and our IELR(1) implementation generate for our case studies’grammars. For IELR(1)conﬂict counts,we show adjustments to ac-count for unresolved conﬂicts that are perfectly duplicated among isocores.

Grammar Actions States Tokens

Figure1111

Gawk933

Gpic212

C000

Java000minimal

C++442

Table5:Action Corrections.For each of our case studies’grammars,this table reports the number of parser actions that are corrected by switching from LALR(1)to IELR(1),the number of parser states containing corrected parser actions,and the number of unique tokens in the gram-mar on which there are corrected parser actions.

For the Figure1,Gawk,and Gpic grammars,every corrected action originates from a mysterious invasive LALR(1)S/R conﬂict resolved as a reduce action,and only the shift action remains in each of the new corrected IELR(1)isocores.

In the case of the Figure1grammar,the LALR(1)and IELR(1)parser tables generated by Bison are ne

arly the same as the LALR(1)and canonical LR(1)parser tables shown in Table1.That is,when a is declared left-associative, LALR(1)accepts only2input sentences,but IELR(1)ac-cepts the same set of3input sentences that canonical LR(1) does.

After exploring the Gpic grammar’s source comments,we conclude that IELR(1)corrects the behavior of a Gpic fea-ture that was intentionally created by the author of Gpic’s grammar.However,we also contacted the current Gpic de-velopers for their opinion.They seem to have been pre-viously unaware that the aﬀected feature even exists,and some members expressed an interest in seeing it removed. For the full discussion,see[5].

We have conﬁrmed that Menhir is unable to recognize the need to split oﬀany corrected isocore for the Figure1, Gawk,or Gpic grammar and so leaves the incorrect actions.

Grammar

Real Run Peak Mem

Annotations

Time(s)Usage(KB)

LA IE LA IE

Figure10.0380.03860621

Gawk0.0650.0841101952537

Gpic0.1170.1621154506123

C0.1180.1502256506573

Java0.1490.2503759004636

C++0.0800.2594259755346

Table6:Parser Tables Computation.This table de-scribes the performance of the Bison LALR(1)implementa-tion and our IELR(1)implementation for each of our case studies’grammars.The real run time measures the full run time for Bison.However,we report peak memory usage only for the duration of the LALR(1)or IELR(1)algorithm.We also report the total number of inadequacy annotations at-tached to the LALR(1)states during phase2of IELR(1).

In this way,LALR(1)and Menhir fail to generate parsers that accurately implement these grammars,but IELR(1)is successful.

4.3LALR(1)vs.IELR(1)Performance

In Table6,we report the performance of the Bison LALR(1) implementation and our IELR(1)implementation for each of our case studies’grammars.We performed these measure-ments on a Toshiba Tecra M4-S435with an Intel Pentium M Processor740[1.73GHz,2MB L2,533MHz FSB]with 1024MB DDR2SDRAM running the Slackware10.2.0dis-tribution of GNU/Linux with Linux kernel version2.6.13. Using GNU time1.7,we measured the full real run time of Bison,and we report the average for3trial runs.We estimated peak memory usage during the execution of the LALR(1)and IELR(1)algorithms based on diagrams gen-erated by Massif from Valgrind3.2.3[9].

5.RELATED WORK

[18,19,26,27]discuss the need to improve the state of the art in grammar-dependent software engineering.[20,22,21, 24,25,23,8,6]develop minimal LR algorithms.Menhir, hosted at[7],is an implementation of Pager’s algorithm, which is described in[21].It is the most robust minimal LR(1)implementation we have discovered available,but it is not always able to generate parser tables

with the full power of canonical LR(1)if the given grammar is non-LR(1) coupled with a speciﬁcation for resolving conﬂicts.

6.CONCLUDING REMARKS

We are surprised to discover that mature grammars from widely used software products employing LALR(1)parser generators should suﬀer from any incorrect parser actions that result from the misuse of the LALR(1)algorithm.The Gawk and Gpic case studies provide strong evidence that such incorrect actions do occur in real-world parsers.Such actions are unintuitive and thus may impede the develop-ment of a correct parser.Moreover,the Gpic case study shows that such incorrect actions can create actual bugs relative to the intended design of a real-world LALR(1)-generated parser.Pager’s algorithm and thus Menhir do not

address the incorrect actions in either of these case studies. IELR(1)corrects them for both.

Because of the maturity of the GCC project,we are not surprised that its C and Java grammars suﬀer from no incor-rect parser actions that result from a misuse of the LALR(1) algorithm.These case studies demonstrate IELR(1)’s ability to recognize when LALR(1)parser tables are suﬃcient with-out unnecessarily splitting its states into additional canoni-cal LR(1)states.

The performance measurements from all of our case stud-ies show that IELR(1)is feasible for generating minimal LR(1)parsers for sophisticated real-world grammars.Specif-ically,for each of our case studies when using the IELR(1) algorithm,Bison did not run longer than0.3seconds,and the IELR(1)algorithm never required as much as1MB of memory at any point in time.

7.REFERENCES

[1]Bison.The GNU Project.

/software/bison/.

[2]Gawk.The GNU Project.

/software/gawk/.

[3]GCC.The GNU Project./.

[4]Groﬀ.The GNU Project.

/software/groff/.

[5][Groﬀ]parsing a corner speciﬁcation.Groﬀmailing

list archives./archive/html/

groff/2007-08/msg00051.html.

[6]LRGen.P.B.Mann.

www.paulbmann/lrgen/.

[7]Menhir.F.Pottier and Y.R´e gis-Gianas.

cristal.inria.fr/∼fpottier/menhir/. [8]The Honalee LR(k)Algorithm.D.R.Tribble.

ibble/text/honalee.html.

[9]Valgrind./.

[10]International Standard,Programming Languages–

C++.Number ISO/IEC14882:2003(E).American

National Standards Institute,October2003.

[11]Single UNIX Speciﬁcation,Version3.The IEEE and

the Open Group,April2004.http:

///bookstore/catalog/t041.htm.

[12]A.V.Aho,R.Sethi,and J.D.Ullman.Compilers:

Principles,Techniques,and Tools.Addison-Wesley,

1986.[13]ﬀ.org,The Text Processor for

Typesetters.

[14]Joel E.Denny.

[15]F.DeRemer and T.Pennello.Eﬃcient Computation

of LALR(1)Look-Ahead Sets.ACM Transactions on

Programming Languages and Systems,4(4):615–649,

October1982.

[16]The GNU Project.Bison,2.3edition,May2006.

[17]S.Johnson.Yacc:Yet Another Compiler Compiler.In

UNIX Programmer’s Manual,volume2,pages

353–387.Holt,Rinehart,and Winston,New York,

NY,USA,1979.

[18]P.Klint,R.L¨a mmel,and Chris Verhoef.Toward an

engineering discipline for grammarware.ACM

Transactions on Software Engineering Methodology,

14(3):331–380,2005.

[19]R.L¨a mmel.Grammar Testing.In FASE,pages

201–216,2001.

[20]D.Pager.The lane tracing algorithm for constructing

LR(k)parsers.In STOC’73:Proceedings of theﬁfth

annual ACM symposium on Theory of computing,

pages172–181,New York,NY,USA,1973.ACM

Press.

[21]D.Pager.A Practical General Method for

Constructing LR(k)Parsers.Act Informatica,

7(3):249–268,September1977.

[22]D.Pager.The Lane-Tracing Algorithm for

Constructing LR(k)Parsers and Ways of Enhancing

Its Eﬃciency.Information Sciences,12(1):19–42,1977.

[23]P.Pepper.LR Parsing=Grammar Transformation+

LL Parsing,Making LR Parsing More Understandable And More Eﬃcient.Technical Report99-05,TU

Berlin,1999.

[24]D.Spector.Full LR(1)parser generation.SIGPLAN

Not.,16(8):58–66,1981.

[25]D.Spector.Eﬃcient full LR(I)parser generation.

SIGPLAN Not.,23(12):143–150,1988.

[26]M.van den Brand and E.Visser.Generation of

formatters for context-free languages.ACM Trans.

Softw.Eng.Methodol.,5(1):1–41,1996.

[27]M.G.J.van den Brand,A.Sellink,and C.Verhoef.

Current Parsing Techniques in Software Renovation

Considered Harmful.In IWPC,page108,Washington, DC,USA,1998.

688IT编程网

IELR(1)

发表评论

推荐文章

随机森林算法介绍及R语言实现

基于随机森林优化的神经网络算法在冬小麦产量预测中的应用研究_百度文 ...

基于正则化贪心森林算法的情感分析方法研究

随机森林算法和grandientboosting算法

基于随机森林的图像分类算法研究

热门文章

随机森林算法的改进方法

基于随机森林算法的风险预警模型研究

Python中的随机森林算法详解

随机森林发展历史

如何使用随机森林进行时间序列数据模式识别(八)

随机森林回归模型原理

如何使用随机森林进行时间序列数据模式识别(六)

如何使用随机森林进行时间序列数据预测(四)

如何使用随机森林进行异常检测(六)

随机森林算法和grandientboosting算法 -回复

随机森林方法总结全面

随机森林算法原理和步骤

随机森林的原理

随机森林重要性

随机森林算法

机器学习中随机森林的原理

随机森林算法原理

使用计算机视觉技术进行动物识别的技巧

基于crf命名实体识别实验总结

transformer预测模型训练方法

最新文章

随机森林算法介绍及R语言实现

基于随机森林优化的神经网络算法在冬小麦产量预测中的应用研究_百度文 ...

基于正则化贪心森林算法的情感分析方法研究

随机森林算法和grandientboosting算法

基于随机森林的图像分类算法研究

随机森林结合直接正交信号校正的模型传递方法

标签列表

688IT编程网

IELR(1)

发表评论

推荐文章

随机森林算法介绍及R语言实现

基于随机森林优化的神经网络算法在冬小麦产量预测中的应用研究_百度文 ...

基于正则化贪心森林算法的情感分析方法研究

随机森林算法和grandientboosting算法

基于随机森林的图像分类算法研究

热门文章

随机森林算法的改进方法

基于随机森林算法的风险预警模型研究

Python中的随机森林算法详解

随机森林发展历史

如何使用随机森林进行时间序列数据模式识别(八)

随机森林回归模型原理

如何使用随机森林进行时间序列数据模式识别(六)

如何使用随机森林进行时间序列数据预测(四)

如何使用随机森林进行异常检测(六)

随机森林算法和grandientboosting算法 -回复

随机森林方法总结全面

随机森林算法原理和步骤

随机森林的原理

随机森林 重要性

随机森林算法

机器学习中随机森林的原理

随机森林算法原理

使用计算机视觉技术进行动物识别的技巧

基于crf命名实体识别实验总结

transformer预测模型训练方法

最新文章

随机森林算法介绍及R语言实现

基于随机森林优化的神经网络算法在冬小麦产量预测中的应用研究_百度文 ...

基于正则化贪心森林算法的情感分析方法研究

随机森林算法和grandientboosting算法

基于随机森林的图像分类算法研究

随机森林结合直接正交信号校正的模型传递方法

标签列表

随机森林重要性