Bits of Learning

Learning sometimes happens in big jumps, but mostly in little tiny steps. I share my baby steps of learning here, mostly on topics around programming, programming languages, software engineering, and computing in general. But occasionally, even on other disciplines of engineering or even science. I mostly learn through examples and doing. And this place is a logbook of my experiences in learning something. You may find several things interesting here: little cute snippets of (hopefully useful) code, a bit of backing theory, and a lot of gyan on how learning can be so much fun.

Sunday, March 26, 2006

Separate Parsers in The Same Application

It might often happen that you would like to have two parsers coexist in the same program. Here's an example of this situation in my current implementation of Modest -- the Model Based Testing tool.

There're the following two modules:

cg : This reads API specifications (written in a language, say, A) from a spec file. It then generates the GraphMaker code. This, when built and executed, will generate the state space graph (written in a language, say, B) of the given application.

pf : This reads the state space graph, again written in B, reads test specifications, written in a language, say, C, and computes the test sequences.

We observe that there are three languages to be recognised -- The API specification language A, the graph description language B, and the test specification language C. We need parsers for all three of them. Incidentally, it our case, B = C (in context-free grammatical sense). However, the data-structure into which they are read is different. Hence different parsers are anyway required. But the lexical analyser for both B and C is the same.

Say the lexical analyser for A, B and C are l(A), l(B) and l(C), and let the syntax analysers be p(A), p(B), and p(C) respectively. I used yacc (in fact bison) to write the specs for p(A). I hand-coded p(B) and p(C).

l(A) and l(B) were written in lex (in fact flex). And, as mentioned above, l(B) = l(C).

Initially cg and pf were developed separately. Hence, the parsers and the lexical analysers didn't interfere with each. However, when I tried integrating them into Modest, I ran into trouble due to the following:

1. Name Conflicts among globals
--------------------------------------------

When I did flex(Vocab (A)), it generated the lexical analyser function yylex(), which is global. Similarly when I did flex(Vocab(B)), it too generated a lexical analyser function yylex(). Both global, and hence, while linking, gave redefinition error.

Solution:
As mentioned above, the default name of the lexical analyser function generated by flex is yylex(). Similarly, the default name of the syntax analyser function generated by bison is yyparse(). Both these names can, however, be changed with the following.

Running flex as follows:

flex -Pprefixname inputfilename.flex

will generate lexical analyser function with the name prefixnamelex() instead of yylex().

Similarly running bison as follows:

bison --name-prefix prefixname inputfilename.yy

will generate the syntax analyser function with the name prefixnameparse() instead of yyparse(). Corresponding changes happen to many important tokens in the generated parser. For instance, the calls to yylex() in the generated code will all now be to prefixnamelex(). Hence, it is necessary to have the prefixname same for both the flex and the bison commands, so that the linker finds the prefixnamelex() function that the prefixnameparse() functions calls.

This solves the name conflict problems for the lexical analyser and syntax analyser functions for more than than one analysers in the same program. The name coflicts arising between other globals that you might have created can be easily resolved by encapsulating them into namespaces of those modules (I am assuming C++).

2. Name conflict between the input source file-pointer
--------------------------------------------------------------------------

The way to direct flex to generate a lexical analyser that reads from a file pointer of a particular name, say fin, is to embed the following preprocessor directive in the flex input file:

#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
if ( (result = fread( (char*)buf, sizeof(char), max_size, fin)) <>
YY_FATAL_ERROR( "read() in flex scanner failed");

For example, for cg, the above was

#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
if ( (result = fread( (char*)buf, sizeof(char), max_size, cgfin)) <>
YY_FATAL_ERROR( "read() in flex scanner failed");

And for pf, it was

#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
if ( (result = fread( (char*)buf, sizeof(char), max_size, pffin)) <>
YY_FATAL_ERROR( "read() in flex scanner failed");

Of course, it's our responsibility that the lexical analyser finds this FILE * open when it tries to read from it.

No comments: