1

I am in the middle of developing a compiler for a C like language and am having some difficulties in the semantic analysis and code generation phases. My questions are the following: 1) For an if statement, the following is the syntax:

if (expression) then
statement1;
statement2;
else
statement3;
end if;

Now, in my target code, it has to be a 3-address code with go to statements, so it should

look something like:
if (Rx)  // Rx is the register where the expression is evaluated and stored
go to X1 //for if part
X2 // for else part;

So now, my question is, how do I generate the addresses for "go to" statements?

2) This question is about semantic analysis: I have been able to build and use a symbol table for a single function. What is the approach I should be using to build symbol tables for function calls? In other words for different lexical levels? I know this should somehow involve having multiple trees. One tree for one function. But what's the approach to point to a different tree from somewhere in the middle of a program?

I'm a beginner and thus any suggestions/thoughts would be greatly appreciated.

12
  • What do you mean by "symbol table"?
    – SK-logic
    Commented Feb 16, 2011 at 10:41
  • My symbol table presently has all the declared identifiers with their type (and size in case of arrays) arranged in the form of a binary tree.
    – beanyblue
    Commented Feb 16, 2011 at 10:50
  • 1
    @SK-logic: "symbol table" is a perfectly normal term in the context of developing a compiler; I suggest just Googling. I don't really see why it would be implemented as a tree, though, unless it's a search tree (i.e. to sort the symbols left to right). Commented Feb 16, 2011 at 10:51
  • When you call your function, why do you need the symbol table for it's scope? Can you nest functions in your language?
    – Skurmedel
    Commented Feb 16, 2011 at 11:01
  • @Karl Knechtel - I used a tree because it would be easier when I have to look for those symbols when they are used later in the program. @Skurmedel - yes, I can nest functions, functions can be called recursively. I need different symbol tables for different functions because, the variables would be declared newly at each function level.
    – beanyblue
    Commented Feb 16, 2011 at 11:07

2 Answers 2

1

It depends on how and when your compiler will generate code.

If your compiler generates the code sequentially (from the first line of the code to the last one), then the only thing you can do is to remember the places where you want to jump to (store them in a table), and patch the code after everything has been generated.

If your compiler generates the code bottom-up (from the most inside statement to the most outside statement), and your underlying machine (physical or virtual) supports relative jumps, then you can simply generate the relative jumps when generating the code. For example.

Suppose you have this piece of code:

if (condition) then
   someexpressionsA
else
   someexpressionsB
endif;

The bottom-up compilation means that the code will be generated like this:

  • first the code for someexpressionsA
  • then the code for someexpressionsB
  • then the code for the if-then-else-endif statement

Suppose that our compiler has generated the code for someexpressionsA, called codeblockA (same for B). Then the code for the if-then-else-endif statement can be written like this (pseudo code):

  • Check condition
  • if condition is false jump sizeof(codeblockA+1) instructions further
  • codeblockA
  • jump sizeof(codeblockB) further
  • codeblockB

Things might get trickier if the condition contains multiple conditions (and, or, ...) but the example above should get you started.

4
  • 1
    Thanks Patrick. I am generating the code sequentially. So I guess I'm looking for a way to save the locations that I want to jump to. I don't know how I can do this. Like, in the example you gave, I have to be saving the addresses of someexpreA and someexpreB. How do I do that? I think this is something very basic, but please bear with me, I am not able to figure out :(
    – beanyblue
    Commented Feb 16, 2011 at 11:12
  • @beanyblue, give them sequential numbers or symbolic names, and use that numbers or names to look them up. This is what each and every existing assembler is doing with the labels.
    – SK-logic
    Commented Feb 16, 2011 at 12:25
  • Okay, let me put my question this way: How will my C code look when I am saving the memory location, Will it be something like : MEM[lab] = &label1 (assuming label 1 is one of the locs i am saving) and then when I branch I would use : GO TO MEM[lab]; Is this way of thinking correct? I understand the idea of using labels now, my question was about implementing it. Thank you so much for your inputs! I'm really grateful!
    – beanyblue
    Commented Feb 16, 2011 at 12:38
  • @beanyblue, this is specific for the machine for which you are generating the code. You will probably have to generate something like "JMP address". For x86 (see faydoc.tripod.com/cpu/jmp.htm) this could be the instruction "EA xx" where xx represents the address you want to jump to. Remember the place where you have stored this instruction, then at the end replace the xx with the correct address.
    – Patrick
    Commented Feb 16, 2011 at 13:40
0

Are you generating the code immediately, without an "assembler" pass which will resolve symbolic labels? In this case you have to build a table of all the labels and branching instructions, and then backpatch your code once all the labels are generated.

4
  • Are you saying that, i should have a phase after parsing where I should add labels for branching instructions? In this case, assuming that I generate an intermediate code similar to: if (expr) then label1: st1; else label2: st2; , how should I be saving this and using them in my final 3 address code?
    – beanyblue
    Commented Feb 16, 2011 at 10:53
  • Leave all the branch targets uninitialised (e.g., set them to 0), save their positions in a table, and save all the label positions in another table. When all the labels are generated, substitute them for all of the uninitialised branch targets. It is not even a pass, just a backpatching. Of course, a dedicated assembler pass will be a much better solution, but if you really want to do everything in a single pass, backpatching is the only option.
    – SK-logic
    Commented Feb 16, 2011 at 11:16
  • Thanks SK-logic. I have a very basic doubt actually. Please excuse me if it's very silly :( The thing is, I am not sure how to "save" the positions. How would my C code look when I am trying to save a particular location? If I include an assembler pass, then I would add labels to all branches,right? It would look like : if (expre) label1: st1; else label2: st2; end if; Now, how do I "save" the label in the memory? I have currently defined my memory space as an array of int.
    – beanyblue
    Commented Feb 16, 2011 at 11:38
  • @beanyblue, how do you handle backward branching? Do the same thing with forward branching, simply wait for a label to be known and substitute it wherever it was referenced in an already emited code.
    – SK-logic
    Commented Feb 16, 2011 at 12:13

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.