IELR(1):Practical LR(1)Parser Tables for Non-LR(1)Grammars with
Conflict Resolution
Joel E.Denny and Brian A.Malloy
School of Computing
Clemson University
Clemson,SC29634,USA
{jdenny,malloy}@cs.clemson.edu
ABSTRACT
There has been a recent effort in the literature to recon-sider grammar-dependent software development from an en-gineering point of view.As part of that effort,we examine a deficiency in the state of the art of practical LR parser table generation.Specifically,LALR sometimes generates parser tables that do not accept the full language that the gram-mar developer expects,but canonical LR is t
oo inefficient to be practical.In response,many researchers have attempted to develop minimal LR parser table generation algorithms. In this paper,we demonstrate that a well known algorithm described by David Pager and implemented in Menhir,the most robust minimal LR(1)implementation we have dis-covered,does not always achieve the full power of canonical LR(1)when the given grammar is non-LR(1)coupled with a specification for resolving conflicts.We also outline an original minimal LR(1)algorithm,IELR(1),which we have implemented as an extension of GNU Bison and which does not exhibit this deficiency.Finally,using our implemen-tation,we demonstrate the relevance of this deficiency for several real-world grammars,and we show that our imple-mentation is feasible for generating minimal LR(1)parsers for those grammars.
Categories and Subject Descriptors
D.3.4[Programming Languages]:Processors—parsing;
D.3.1[Programming Languages]:Formal Definitions and Theory—syntax;D.2.m[Software Engineering]:Miscel-laneous
General Terms
Algorithms,Languages,Theory
Keywords
grammars,grammarware,LALR,LR,Yacc
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.
SAC’08March16-20,2008,Fortaleza,Cear´a,Brazil
Copyright2008ACM978-1-59593-753-7/$5.00.1.INTRODUCTION
Grammar-dependent software is omnipresent in software development[18].For example,compilers,document pro-cessors,browsers,import/export tools,and generative pro-gramming tools are used in software development in all phases. These phases include comprehension,analysis,maintenance, reverse-engineering,code manipulation,and visualization of the application program under study.However,construc-tion of these tools relies on the correct recognition of the language constructs specified by the grammar.
Some aspects of grammar engineering are reasonably well understood.For example,the study of grammars as defi-nitions of formal languages,including the study of LL,LR, LALR,and SLR algorithms and the Chomsky hierarchy, form an essential part of most computer science curricula. Nevertheless,parsing as a disciplined study must be recon-sidered from an engineering point of view[18,19].Many parser developers eschew the use of parser generators be-cause it is too difficult to customize the generated parser or because the generated parser requires considerable mod-ification to incorporate sufficient power to handle modern grammars such as the C++and C#grammars.Thus,in-dustrial strength parser development requires considerable effort,and many approaches to parser generation are ad hoc [26,27].
From an engineering point of view,one source of diffi-culty in parser development stems from a deficiency in the state of the art of practical LR parser table generation.LR parsing is“the most general nonbacktracking shift-reduce parsing method known”,and canonical LR is the most gen-eral technique for generating LR parser tables from a given grammar[12].As a result,canonical LR parser tables ac-cept the language that a grammar developer expects a given grammar to define.Unfortunately,canonical LR tables re-quire“too much space and time to be useful in practice”
[12].In contrast,LALR parser tables are feasible and are thus employed by widely used tools like Yac
c[17,11]and its GNU implementation Bison[1].Unfortunately,as Bison’s manual points out,LALR parser tables contain“mysterious conflicts”that“don’t look warranted”[16].These conflicts cause the parser to encounter“unnatural errors”because LALR“is not powerful enough to match the intuition of the grammar writer”[24,25].In this way,LALR can worsen the difficulty of developing a correct grammar.Even for an existing correct LALR grammar,LALR can interfere with incremental changes[24,25],which are inevitable in the face
1.S →aAa
2.→bAb
3.A →a
4.
→aa
S R1
Õ
ÕÕÕW
W
W W a
A R3a a
S R1
Ô
ÔÔÔX
X
X X
a
A R4a aa S R2
Ö
ÖÖÖV
V
V V b
A R3b a
S R2
Ô
ÔÔÔX
X
X X
b
A R4
b
aa
Figure 1:A Non-LR(1)Grammar Example.This grammar defines a language consisting of 4sentences,each of which corresponds to 1parse tree:aaa ,aaaa ,bab ,and baab .Assuming that the parser is LR(1)and that a is left-associative,the sentence aaaa is dropped from the lan-guage.
of software evolution.
In response,many researchers have developed minimal LR algorithms,which attempt to generate parser tables with the power of canonical LR but with nearly the efficiency of LALR [20,22,21,24,25,23,8,6].Menhir,hosted at [7],is an implementation of Pager’s algorithm,which is de-scribed in [21].Menhir is the most robust minimal LR(1)implementation we have discovered available.
In this paper,we show that Pager’s algorithm and thus Menhir are not always able to generate parser tables with the full power of canonical LR(1)if the given grammar is non-LR(1)coupled with a specification for resolving con-flicts.We also describe an original minimal LR(1)algo-rithm,IELR(1),tha
t does not suffer from this deficiency.In section 2,we present an original non-LR(1)grammar exam-ple to demonstrate the deficiency of Pager’s algorithm.We also introduce some original terminology to facilitate fur-ther discussion.In section 3,we outline our IELR(1)algo-rithm,which we have implemented as an extension of Bison.In section 4,we compare the Bison LALR(1)implementa-tion with our IELR(1)implementation using our example grammar plus five grammars for popular languages as case studies.In section 5,we review related work.In section 6,we use the results of our case studies to conclude that (1)Menhir’s deficiency does affect real-world parsers,(2)it can create bugs relative to the intended design of such parsers,and (3)IELR(1)is feasible for generating minimal LR(1)parsers for sophisticated real-world grammars.
2.MOTIV ATING EXAMPLE
In this section,we demonstrate an original non-LR(1)grammar example.We analyze the language defined by that grammar in section 2.1.In sections 2.2and 2.3,we describe its canonical LR(1)and LALR(1)parser tables in order to introduce some original terminology to facilitate discussion.In section 2.4,we identify a deficiency of Pager’s algorithm and thus of Menhir.
2.1A Non-LR(1)Grammar
As written,the grammar in Figure 1defines a language consisting of 4sentences:aaa ,aaaa ,bab ,and baab .Consider how an LR parser behaves when it reaches the ·in either of
Canonical LR(1)LALR(1)0.S →·S ,{λ}G10.
S
→·S ,{λ}G1S →·aAa ,{λ}S2S →·aAa ,{λ}S2→·bAb ,{λ}S3→·bAb ,{λ}S31.S
→S ·,{λ}Acc 1.S
→S ·,{λ}Acc 2.S →a ·Aa ,{λ}G4 2.
S →a ·Aa ,{λ}G4A →·a ,{a }S8A →·a ,{a }S8→·aa ,{a }S8→·aa ,{a }S83.S →b ·Ab ,{λ}G5 3.
S →b ·Ab ,{λ}G5A →·a ,{b }S10A →·a ,{b }S8→·aa ,{b }S10→·aa ,{b }S84.S →aA ·a ,{λ}S6 4.S →aA ·a ,{λ}S65.S →bA ·b ,{λ}S7 5.S →bA ·b ,{λ}S76.S →aAa ·,{λ}R1 6.S →aAa ·,{λ}R17.S →bAb ·,{λ}R27.S →bAb ·,{λ}R2*8.A →a ·,{a }R3
*8.A →a ·,{ab }R3A →a ·a ,{a }S9A →a ·a ,{ab }S99.A →aa ·,{a }R49.
A →aa ·,{ab }
R4
10.A →a ·,{b }R3
A →a ·a ,{b }S1111.A →aa ·,{b }R4
Table 1:Example Parser Tables.These are the canon-ical LR(1)and LALR(1)parser tables for the grammar of Figure 1.λin a lookahead set represents the end of the input.Differences between canonical LR(1)and LALR(1)are shown in bold.In both cases,state 8has a S/R conflict on a resolved as a reduce since a is left-associative.
the marked input sentences,ba ·b and ba ·ab .These sentences look the same before the ·and so the parser performs identi-cal actions until this point.If the parser then looks ahead 1token and sees b ,it knows it must reduce the previous a to A as in the third parse tree in Figure 1.If it sees a instead,it must shift the a so that it can then reduce the previous aa to A as in the fourth parse tree.Similarly,at the ·in aa ·a and aa ·aa ,the parser must choose to reduce in order to accept aaa as in the first parse tree,but it must choose to shift in order to accept aaaa as in the second parse tree.However,in this ca
se,the first token of lookahead is the same,a ,and so does not distinguish between the possible parser actions.Assume that the grammar author has designed the gram-mar with LR(1)in mind.If he has declared a as left-associative,the parser chooses to reduce in the cases of both aa ·a and aa ·aa .In this way,the grammar author has speci-fied a new language consisting of only 3sentences:aaa ,bab ,and baab .Notice how we seem to be able to determine the language without computing any parser tables.
2.2New Canonical LR(1)Terminology
The first column of Table 1shows the canonical LR(1)parser tables for the grammar of Figure 1when a is left-associative.We introduce the term isocores to describe states that have the same core.For example,state 8is an isocore of state 10,but their lookahead sets and actions are different.When we speak of a particular state’s set of viable prefixes,we are speaking of the set of possible viable prefixes that could be on the stack when that state is at the top of the stack.For example,state 8’s only viable prefix is
Canonical LR(1)LALR(1)
Stack Input Action Stack Input Action 0baab S30baab S3
0,3aab S100,3aab S8 0,3,10ab S110,3,8ab R3 0,3,10,11b R40,3Aab G5 0,3Ab G50,3,5ab Err
0,3,5b S7
0,3,5,7R2
0S G1
0,1Acc
Table2:An Example Parse.This table shows how the canonical LR(1)and LALR(1)parser tables of Table1parse the sentence baab.Because LALR(1)merges state10into state8,it rejects the sentence even though the canonical LR(1)parser accepts it.
aa,and state10’s only viable prefix is ba.State8contains the S/R conflict on token a whose resolution as reduce(1) permits a successful completion of the parse of aa·a but(2) leads the parse of aa·aa to a syntax error.We say the shift and reduce actions are contributions to the conflict.We call the reduce action the dominant contribution.
2.3LALR(1)Conflicts Reconsidered
The second column of Table1shows the LALR(1)parser tables for the grammar in Figure1.They are nearly identical to the canonical LR(1)parser tables but isocores are merged together.For example,canonical LR(1)state10is merged into state8.Merging isocores can create new conflicts that are not present in the canonical LR(1)parser tables.[16] calls these mysterious conflicts because they can be a source of confusion for grammar authors.The trouble is that a sim-ple grammar analysis as we performed in section2.1does not easily reveal mysterious conflicts.To discover them,the grammar author is forced to examine parser tables.In this paper,we rename[16]’s mysterious conflicts to mysterious new conflicts since we wish to emphasize that this term al-ways refers to conflicts that do not exist in the canonical LR(1)parser tables.As[16]points out,such conflicts are always R/R conflicts since,as[12]points out,merging iso-cores can never create new S/R conflicts.
Because canonical LR(1)states10and8in Table1are merged,the LALR(1)parser encounters the conflict of state 8even after the viable prefix of state10.Unfortunately for ba·ab,which starts with that viable prefix and which requires a shift in state10,the conflict is resolved as a reduce.As Table2demonstrates,this leads the LALR(1)parse of ba·ab to a syntax error even though canonical LR(1)is able to parse it successfully.The conflict here is not a mysterious new conflict since it already
exists in canonical LR(1)state8. Instead,we have identified a second category of mysterious conflicts that we call mysterious invasive conflicts.That is,switching from canonical LR(1)to LALR(1)can cause a parser to encounter existing conflicts after viable prefixes for which it never would have encountered those conflicts before and,as a result,to perform incorrect actions.This switch can also cause a parser to perform incorrect actions after viable prefixes for which it would have encountered the existing conflict before.We call this a mysterious mutated conflict.Since mysterious conflicts of these two categories stem from existing conflicts in canonical LR(1)tables,they are only possible for non-LR(1)grammars.
In summary,given the grammar of Figure1,given that a is left-associative,and assuming an LR(1)parser,the lan-guage so specified is a set of3sentences:aaa,bab,and baab. As expected,the canonical LR(1)parser tables for this spec-ification accept exactly that language.However,LALR(1) diminishes the language to only2sentences:aaa and bab. The culprit is not a mysterious new conflict.The culprit is
a mysterious invasive conflict.
2.4The Deficiency of Pager’s Algorithm
Most algorithms for generating LR(1)parser tables merge states only if they pass some sort of compatibility test.Canon-ical LR(1)parser tables are relatively large because the com-patibility test is relatively restrictive:states must be isocores and their lookahead sets must be identical.LALR(1)parser tables are sometimes too small to maintain the full power of LR(1)because the compatibility test is too lenient:states must simply be isocores.[21]describes an algorithm to gen-erate LR(1)parser tables while employing tests for two kinds of compatibility,weak compatibility and strong compatibil-ity,both of which lie somewhere in between.Throughout this paper,we refer to this algorithm as Pager’s algorithm. We identify two problems with Pager’s algorithm for our purposes.First,its weak compatibility test is not designed for non-LR(1)grammars.Thus,the only category of mys-terious conflicts that the weak compatibility test can always avoid is mysterious new conflicts.For example,it does not avoid the mysterious invasive conflict in Table1.Second, both its weak and strong compatibility tests permit isocores to be merged if the merged state would not contain a poten-tial conflict,but the definition of potential conflict employed does not include potential or even real S/R conflicts.Thus, even using the strong compatibility test by itself,Pager’s algorithm does not avoid the mysterious invasive conflict in Table1.
Like Bison,Menhir is designed to accept non-LR(1)gram-mars with a specification for resolving confli
cts.Thus,even though Menhir employs Pager’s algorithm to try to achieve the power of canonical LR(1),it fails to in some cases.For example,we have confirmed that,when given the gram-mar of Figure1combined with the declaration of a as left-associative,Menhir generates a parser that does not accept the sentence baab.Again,a canonical LR(1)parser does accept this sentence.
3.METHODOLOGY
In this section,we outline our IELR(1)algorithm in6 phases.We label these phases Phase0through Phase5 and describe them in sections3.1through3.6.Due to space restrictions,we omit the details.For a complete description, see our technical report at[14].
3.1Phase0:LALR(1)
Pager’s algorithm avoids some mysterious conflicts by re-fusing to merge isocores if its compatibility tests predict that doing so would induce an inadequacy in the parser tables. In contrast,IELR(1)computes up front all possible inade-quacies,and then afterwards it eliminates all the inadequa-cies that would not appear in canonical LR(1)parser tables. Since LALR(1)parser tables have all isocores merged fully,
they contain all such inadequacies.Thus,Phase0computes LALR(1)parser tables.It does so in two steps:(1)com-pute LR(0)parser tables,and(2)compute the reduction lookahead sets using the technique described by[15].As part of its algorithm,step2also computes a set of goto ta-bles that the remaining IELR(1)phases require.In our im-plementation of Phase0,we use Bison’s existing LALR(1) implementation,which already computes those goto tables.
3.2Phase1:Compute Auxiliary Tables
The remaining phases of IELR(1)require three additional tables beyond those computed in Phase0,and they are computed in Phase1.follow kernel items records depen-dencies of goto follow sets on kernel item lookahead sets. always follows records goto follow tokens that do not de-pend on the kernel item lookahead sets of predecessor states. That is,this table records goto follows that will never change no matter how IELR(1)may split the LALR(1)parse states. predecessors records transition predecessor relations between LALR(1)states.
3.3Phase2:Compute Annotations
Phase2annotates inadequate LALR(1)states.Each an-notation describes how isocores of the annotated state might contribute to an inadequacy.The predecessors table en-ables reverse iteration
from the conflicted state to all the predecessor states that might contribute to the same in-adequacy.This iteration is similar to lane tracing as de-scribed by[20]and[22].The follow kernel items table, always follows table,and goto tables enable the computa-tion of the inadequacy contributions made by each state. 3.4Phase3:Split States
Phase3recomputes the parse states in a manner simi-lar to Phase0step1.Whereas Phase0step1merges all isocores together,Phase3only merges isocores if they will maintain the same dominant contribution to every inade-quacy according to their core’s annotations.In order to do this,Phase3must compute partial kernel item lookahead sets by using the follow kernel items and always follows tables and by propagating lookaheads from each parse state to its successor if the lookaheads appear in the annotations.
3.5Phase4:Compute Reduction Lookaheads Phase4runs step2of Phase0again without modification. That is,it computes the full lookahead sets on reductions in all IELR(1)parse states.
3.6Phase5:Resolve Remaining Conflicts
All that’s left is to resolve the remaining conflicts in the parser tables.Our IELR(1)implementation uses Bison’s existing conflict resolution algorithm without modification.
4.RESULTS OF USING IELR(1)
In this section,we compare the Bison LALR(1)implemen-tation with our IELR(1)implementation using six grammars as case studies,which are summarized in Table3.In section 4.1,we describe the grammars in detail.In section4.2,we compare the parser tables that the algorithms generate for the grammars.In section4.3,we compare the performance of the algorithms in terms of time and space.
4.1Case Studies
Grammar Version|T||V||T∪V||P|
Figure12244
Gawk Gawk3.1.06145106163
Gpic Groff1.18.113845183247
C GCC4.0.492208300573
Java GCC4.2.1109164273516
C++ISO2003117184301481 Table3:Grammar Characteristics.These counts measure the size of each case study’s grammar G= (V,T,P,S),such that V is the set of nonterminals,T is the set of terminals or tokens,P is the set of productions, and S is the start symbol.These counts include the produc-tions and nonterminals that Bison generates implicitly for mid-rule actions.
Table3characterizes the grammars of our case studies. Ourfirst case study is the example grammar of Figure1 including the declaration of a as left-associative.
Our next four case studies are mature grammars from widely used software products employing LALR(1)parser generators.Gawk(GNU AWK),a text-based data process-ing language,wasfirst written in1986but is based on the original AWK,which was written in1977and which is stan-dardized in SUSv3(the Single UNIX Specification,Version 3)[11,2].Groff(GNU Troff)is a document formatting sys-tem for UNIX that includes Gpic(GNU Pic),a Groffpre-processor for specifying diagrams.Groffwasfirst released in 1990and is based on Troffwhich has existed since the early 1970’s[13,4].We copied our C and Java grammars from GCC(the GNU Compiler Collection),which is a widely used collection of compilers developed by the GNU Project[3]. The latest version of the C++programming language is C++2003,the formal specification for which is[10].Annex A of that specification presents a formal C++grammar.As ourfinal case study,we formatted this grammar as a Bi-son grammarfile except that,f
or section A.2,Lexical con-ventions,we replaced the integer literal,character literal, floating literal,and string literal nonterminals with to-kens.
4.2LALR(1)vs.IELR(1)Parser Tables
Table4describes the parser tables that the Bison LALR(1) implementation and our IELR(1)implementation generate for each of our case studies’grammars.Since some gram-mars include token precedence and associativity declarations that resolve most of their conflicts,we also describe their parser tables when generated without these declarations in order to better demonstrate grammar analysis complexity. When IELR(1)splits states into isocores,each conflict from the original LALR(1)state might be duplicated among several isocores,and Bison then counts the conflict sepa-rately for each.Sometimes the multiple count is a mis-leading representation of grammar complexity because all isocores have perfect duplicates of the conflict and thus the same precedence and associativity declarations can resolve them all.Therefore,from each IELR(1)unresolved conflict count in Table4,we subtract all but one copy of each unre-solved conflict that is perfectly duplicated among isocores. Table5describes the parser actions that are corrected in the parser tables by switching from LALR(1)to IELR(1).
Grammar
States S/R R/R LA IE LA IE LA IE
Figure1101200-000-0
no prec/assoc111111-000-0 Gawk3203296565-000-0
no prec/assoc320320410410-000-0 Gpic42342800-000-0
no prec/assoc426426803803-088-0 C9339331313-000-0
no prec/assoc933933329329-000-0 Java79279200-06262-0
C++822836407410-3135169-34 Table4:Parser Tables.In this table,we report the num-ber of states,the number of unresolved S/R conflicts,and the number of unresolved R/R conflicts in the parser tables that the Bison LALR(1)implementation and our IELR(1) implementation generate for our case studies’grammars. For IELR(1)conflict counts,we show adjustments to ac-count for unresolved conflicts that are perfectly duplicated among isocores.
Grammar Actions States Tokens
Figure1111
Gawk933
Gpic212
C000
Java000minimal
C++442
Table5:Action Corrections.For each of our case studies’grammars,this table reports the number of parser actions that are corrected by switching from LALR(1)to IELR(1),the number of parser states containing corrected parser actions,and the number of unique tokens in the gram-mar on which there are corrected parser actions.
For the Figure1,Gawk,and Gpic grammars,every corrected action originates from a mysterious invasive LALR(1)S/R conflict resolved as a reduce action,and only the shift action remains in each of the new corrected IELR(1)isocores.
In the case of the Figure1grammar,the LALR(1)and IELR(1)parser tables generated by Bison are ne
arly the same as the LALR(1)and canonical LR(1)parser tables shown in Table1.That is,when a is declared left-associative, LALR(1)accepts only2input sentences,but IELR(1)ac-cepts the same set of3input sentences that canonical LR(1) does.
After exploring the Gpic grammar’s source comments,we conclude that IELR(1)corrects the behavior of a Gpic fea-ture that was intentionally created by the author of Gpic’s grammar.However,we also contacted the current Gpic de-velopers for their opinion.They seem to have been pre-viously unaware that the affected feature even exists,and some members expressed an interest in seeing it removed. For the full discussion,see[5].
We have confirmed that Menhir is unable to recognize the need to split offany corrected isocore for the Figure1, Gawk,or Gpic grammar and so leaves the incorrect actions.
Grammar
Real Run Peak Mem
Annotations
Time(s)Usage(KB)
LA IE LA IE
Figure10.0380.03860621
Gawk0.0650.0841101952537
Gpic0.1170.1621154506123
C0.1180.1502256506573
Java0.1490.2503759004636
C++0.0800.2594259755346
Table6:Parser Tables Computation.This table de-scribes the performance of the Bison LALR(1)implementa-tion and our IELR(1)implementation for each of our case studies’grammars.The real run time measures the full run time for Bison.However,we report peak memory usage only for the duration of the LALR(1)or IELR(1)algorithm.We also report the total number of inadequacy annotations at-tached to the LALR(1)states during phase2of IELR(1).
In this way,LALR(1)and Menhir fail to generate parsers that accurately implement these grammars,but IELR(1)is successful.
4.3LALR(1)vs.IELR(1)Performance
In Table6,we report the performance of the Bison LALR(1) implementation and our IELR(1)implementation for each of our case studies’grammars.We performed these measure-ments on a Toshiba Tecra M4-S435with an Intel Pentium M Processor740[1.73GHz,2MB L2,533MHz FSB]with 1024MB DDR2SDRAM running the Slackware10.2.0dis-tribution of GNU/Linux with Linux kernel version2.6.13. Using GNU time1.7,we measured the full real run time of Bison,and we report the average for3trial runs.We estimated peak memory usage during the execution of the LALR(1)and IELR(1)algorithms based on diagrams gen-erated by Massif from Valgrind3.2.3[9].
5.RELATED WORK
[18,19,26,27]discuss the need to improve the state of the art in grammar-dependent software engineering.[20,22,21, 24,25,23,8,6]develop minimal LR algorithms.Menhir, hosted at[7],is an implementation of Pager’s algorithm, which is described in[21].It is the most robust minimal LR(1)implementation we have discovered available,but it is not always able to generate parser tables
with the full power of canonical LR(1)if the given grammar is non-LR(1) coupled with a specification for resolving conflicts.
6.CONCLUDING REMARKS
We are surprised to discover that mature grammars from widely used software products employing LALR(1)parser generators should suffer from any incorrect parser actions that result from the misuse of the LALR(1)algorithm.The Gawk and Gpic case studies provide strong evidence that such incorrect actions do occur in real-world parsers.Such actions are unintuitive and thus may impede the develop-ment of a correct parser.Moreover,the Gpic case study shows that such incorrect actions can create actual bugs relative to the intended design of a real-world LALR(1)-generated parser.Pager’s algorithm and thus Menhir do not
address the incorrect actions in either of these case studies. IELR(1)corrects them for both.
Because of the maturity of the GCC project,we are not surprised that its C and Java grammars suffer from no incor-rect parser actions that result from a misuse of the LALR(1) algorithm.These case studies demonstrate IELR(1)’s ability to recognize when LALR(1)parser tables are sufficient with-out unnecessarily splitting its states into additional canoni-cal LR(1)states.
The performance measurements from all of our case stud-ies show that IELR(1)is feasible for generating minimal LR(1)parsers for sophisticated real-world grammars.Specif-ically,for each of our case studies when using the IELR(1) algorithm,Bison did not run longer than0.3seconds,and the IELR(1)algorithm never required as much as1MB of memory at any point in time.
7.REFERENCES
[1]Bison.The GNU Project.
/software/bison/.
[2]Gawk.The GNU Project.
/software/gawk/.
[3]GCC.The GNU Project./.
[4]Groff.The GNU Project.
/software/groff/.
[5][Groff]parsing a corner specification.Groffmailing
list archives./archive/html/
groff/2007-08/msg00051.html.
[6]LRGen.P.B.Mann.
www.paulbmann/lrgen/.
[7]Menhir.F.Pottier and Y.R´e gis-Gianas.
cristal.inria.fr/∼fpottier/menhir/. [8]The Honalee LR(k)Algorithm.D.R.Tribble.
ibble/text/honalee.html.
[9]Valgrind./.
[10]International Standard,Programming Languages–
C++.Number ISO/IEC14882:2003(E).American
National Standards Institute,October2003.
[11]Single UNIX Specification,Version3.The IEEE and
the Open Group,April2004.http:
///bookstore/catalog/t041.htm.
[12]A.V.Aho,R.Sethi,and J.D.Ullman.Compilers:
Principles,Techniques,and Tools.Addison-Wesley,
1986.[13]ff.org,The Text Processor for
Typesetters.
[14]Joel E.Denny.
[15]F.DeRemer and T.Pennello.Efficient Computation
of LALR(1)Look-Ahead Sets.ACM Transactions on
Programming Languages and Systems,4(4):615–649,
October1982.
[16]The GNU Project.Bison,2.3edition,May2006.
[17]S.Johnson.Yacc:Yet Another Compiler Compiler.In
UNIX Programmer’s Manual,volume2,pages
353–387.Holt,Rinehart,and Winston,New York,
NY,USA,1979.
[18]P.Klint,R.L¨a mmel,and Chris Verhoef.Toward an
engineering discipline for grammarware.ACM
Transactions on Software Engineering Methodology,
14(3):331–380,2005.
[19]R.L¨a mmel.Grammar Testing.In FASE,pages
201–216,2001.
[20]D.Pager.The lane tracing algorithm for constructing
LR(k)parsers.In STOC’73:Proceedings of thefifth
annual ACM symposium on Theory of computing,
pages172–181,New York,NY,USA,1973.ACM
Press.
[21]D.Pager.A Practical General Method for
Constructing LR(k)Parsers.Act Informatica,
7(3):249–268,September1977.
[22]D.Pager.The Lane-Tracing Algorithm for
Constructing LR(k)Parsers and Ways of Enhancing
Its Efficiency.Information Sciences,12(1):19–42,1977.
[23]P.Pepper.LR Parsing=Grammar Transformation+
LL Parsing,Making LR Parsing More Understandable And More Efficient.Technical Report99-05,TU
Berlin,1999.
[24]D.Spector.Full LR(1)parser generation.SIGPLAN
Not.,16(8):58–66,1981.
[25]D.Spector.Efficient full LR(I)parser generation.
SIGPLAN Not.,23(12):143–150,1988.
[26]M.van den Brand and E.Visser.Generation of
formatters for context-free languages.ACM Trans.
Softw.Eng.Methodol.,5(1):1–41,1996.
[27]M.G.J.van den Brand,A.Sellink,and C.Verhoef.
Current Parsing Techniques in Software Renovation
Considered Harmful.In IWPC,page108,Washington, DC,USA,1998.
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系QQ:729038198,我们将在24小时内删除。
发表评论