CSCI 380 - Compilers - Daily Work Plan - Winter 2007

(Jump to current week)

Explanation of codes in this workplan:

   Day: 1.3 means week 1, day 3 (that is, Wednesday)
   Note: homework shown for 1.3 is done the evening of 1.3
      and is therefore due at classtime on 1.4

   C means a classroom activity
   J means an entry in the journal
   R means reading Parsons
   P means a programming activity or due date
   Q means a quiz!
   W means a written assignment due by the next class

Day  Task

Week 1: 3 January 2007
Before the first class!
Please read Parsons (xiii - 32) and the syllabus. Be prepared to do the first eight problems at the end of Chapter Two. Be prepared for a quiz on this reading.

 1.3 CQ Write your email on handout sheet, review syllabus & plan, compiler concepts, elements; tokens, lexemes; token handout; page 6, 8; finite state automata, DFA (parity checker, 3 ones, 25-cent drink machine), NFA 
  Sample Modula-2 source with token list
 1.3 R pp. xiii-32. Read Web-based syllabus.
 1.3 W Problem 2.1 (on page 62). 
 1.3 R pp. 33-49
 1.3 W Problems on pp. 62-3/ 2.2bc, 2.3bc, 2.4 - 2.7 
 1.3 W Draw on a single sheet of paper an class/object diagram for your compiler.

 1.5 C review homework in pairs, DFAs, NFAs, constructing DFAs from NFAs and lexers, regular expressions, 
 1.5 C review diagrams, homework, hear the wisdom of Niklaus Wirth, "Tokens are described as regular expressions," pumping lemma, build an FSA for recognizing tokens
 1.5 R pp. 49-62, and read Chapter 6 of the Modula-2 User's Manual to begin learning the language. The manuals (User's, Reference, and Tutorial) are in the corner cabinet in Room 129. 
     This project is the start of your lexer assignment.
 1.5 C (optional, of course, as always) review the code in  pickup directory, CS380 asm handout
 1.5 J remember to email your journal by 3:00 pm today
 1.6 P First programming project (assignment 0) due: single character reader.

Week 2: 8 January 2007
 2.1 C review object diagram, review lexer code in Parsons, review TomPile, progress and problems on token list, return quiz 
 2.3 Q quiz; work on token list (lexer) 
 2.5 C introduce parsing and the symbol table. Second version of OOD due at classtime.
 2.6 J remember to email your journal by 11:59 pm today
 2.6 P lexer due Saturday at 11:59 pm

Week 3: 15 January 2007
 3.1 P start on symbol table
 3.1 R pp. 65-71, and 281-286 symbol table, look up symbol table in the index
 3.1 C Review .mod programs; list needs for symbol table; questions about parse.[h,cpp] in detail, symbol table, a driver function to test the symbol table (EnterScope, Add vars, Enter Scope, Add more, etc.), scope stack, run-time stack and variable management at run-time
 3.3 C More about symbols 
 3.5 C and more questions about the symbol table, life, the universe, and everything
 3.6 J remember to email your journal by 11:59 pm today
 3.6 P symbol table and related implementation due at 6:00 pm
 3.6 P symbol table due Saturday at 11:59 pm

Week 4: 22 January 2007
 4.1 C Microsoft Macro Assembler, emitter.h/cpp, parsing simple Modula-2 programs.
 4.3 C More about the emitter and MASM.
 4.5 C Answer MASM and other questions.
 4.6 J Remember to email your journal by 11:59 pm today
 4.6 P working Modula-2 compiler due at 11:59 pm for the first four sample programs:
         test01.mod, test02.mod, test03.mod, test04.mod 

Week 5: 29 January 2007
 5.1 C expressions, reverse Polish notation (RPN), grammars, and the stack
 5.1 R pp. 65-82 (You have read some of this before.)
 5.1 W pp. 114ff./3.1, 3.2, 3.3, 3.4, 3.5, 3.6
 5.3 C more about expressions and grammars. Paren parsing and the compiler's run-time stack (cf. parse tree)
 5.3 R pp. 82-114
 5.3 W pp. 114ff./3.7, 3.8, 3.9, 3.10, 3.11
 5.3 C Review assembler arithmetic (Note that there is no class Friday.)
 5.6 J remember to email your journal by 11:59 pm today
 5.6 P working Modula-2 compiler due at 11:59 pm for TestExpr.mod

Note the resources for assembly language in the classroom, such as Dandamudi

Week 6: 5 February 2007
 6.1 C class: FIRST/FOLLOW Enlarging Expr() to handle conditional expressions (preparing for IF and LOOP), jumps
 6.3 C more about conditional expressions and jumps
 6.3 Q Quiz 2 (parsing, first, follow), Review of language theory, thinking more about the 
          stack, labels, and the symbol table
 6.6 J remember to email your journal by 11:59 pm today 
Week 7: 12 February 2007 7.1 R Chapter 4 of Parsons 7.1 C explaining Chapter 4 and bottom-up compilers 7.3 C bottom-up compilers, continued, intro to arrays (and possibly procedures) 7.3 P working Modula-2 compiler due at 11:59 pm for loops, and conditional statements (Test05.mod, Test06.mod, and much of FullExpr.mod) 7.6 J remember to email your journal by 11:59 pm today.
Week 8: 19 February 2007 8.1 Q Quiz 3 bottom-up parsing 8.3 C procedures 8.6 J remember to email your journal by 11:59 pm today 8.6 P working Modula-2 compiler due at 11:59 pm for arrays This compiler should be able to handle square.mod and sieve.mod
Week 9: 26 February 2007 9.3 C short class to review final presentation of compilers, and procedure design and implementation 9.6 J Remember to email your journal by 11:59 pm today
9.6 P working Modula-2 compiler due at 11:59 pm for procedures with value parameters. This compiler should be able to handle test07.mod, ProcTest.mod, and towers.mod. Week 10: 5 March 2007 10.1 Course evals 10.5 J remember to email your final journal by 1:30 pm today 10.5 P working Modula-2 compiler due at 1:29 pm for procedures with reference parameters. (such as bubl.mod) 10.5 P present your compiler during exam period: 1:30 pm - 3:30 pm This means that you should show the class two or three particular features or clever implementations of which you are proud. This final class session is an important part of the educational mission of our class. Please give it due preparation. The actual presentation should take about 10 minutes. Remember to bring the sheet documenting your compiler's features.  


What's the big idea?

The following words are "thought-nudgers;" that is, they suggest ideas and concepts that you should increasingly understand as the course progresses. They are arranged by meaningful groups where possible. These may reasonably appear on quizzes.

machine code, assembler, high-level language

source language, object code, object file, target machine, executable, run-time library

lexer, parser, parse tree, optimization, emitter

compiler, linker, loader, interpreter, three-address code

Chomsky's language hierarchy (you should know all four and examples of each), FSA, DFA, NFA, Turing Machine, runtime stack, reverse Polish notation (RPN), and more.

 

 


Assignment Descriptions:

Token List

Lexically analyze Modula-2 source code to produce a token list. The following tokens should be recognized and listed in accordance with the description of Modula-2 given in the TopSpeed Modual-2 User's Manual:

enum TOKENTYPE {

/* 0*/ T_ERROR=0, T_AND, T_ARRAY, T_BEGIN,
/* 4*/ T_CARDINAL, T_CONST, T_DIV, T_DO,
/* 8*/ T_ELSE, T_END, T_EXIT, T_FOR, T_IF,
/*13*/ T_INTEGER, T_LOOP, T_MOD, T_MODULE,
/*17*/ T_NOT, T_OF, T_OR, T_PROCEDURE,
/*21*/ T_REAL, T_THEN, T_TYPE, T_VAR,
/*25*/ T_WHILE, T_WRCARD, T_WRINT, T_WRLN,
/*29*/ T_WRREAL, T_RDCARD, T_RDINT, T_RDREAL, T_WRSTR,

/*34*/ T_ID, T_CARD_NUM, T_REAL_NUM, T_INT_NUM,
/*38*/ T_STRING,

/*39*/ T_ASSIGN, T_COLON, T_COMMA, T_DOT,
/*43*/ T_DOT_DOT, T_EQUAL, T_LEFT_BRACK, T_LEFT_PAREN,
/*47*/ T_MINUS, T_MULT, T_NOT_EQ, T_PLUS,
/*51*/ T_RIGHT_BRACK,T_RIGHT_PAREN, T_SEMI_COLON, T_SLASH,
/*55*/ T_GRTR_THAN, T_GRTR_THAN_EQ, T_LESS_THAN, T_LESS_THAN_EQ,

/*59*/ T_COMMENT_BEG, T_COMMENT_END,
/*61*/ T_EOF

};

Working examples are available in the pickups directory.

 


Symbol Table

Create a symbol table class (symtable.cs) using Hashtable and related data structures to retain information about the constants, types, variables, and procedures encountered while parsing the token list. The symbol table must include attention to values, lexical scope (current scope stack, and the scope of previously-declared variables), procedure parameters (reference and value), local variables, and arrays. It must be able to dump its contents to a text file. Some needed member functions might be: add a symbol, retrieve a symbol and its attributes, modify the attributes for a symbol, enter a new scope, leave a scope, get the total memory needed for a particular scope, find the innermost variable with a particular name in the current scope stack, and dump the table info to a file

Although you don't really need to do much about procedure calls for awhile, it's helpful to at least think about what you will need. The symbol table will be consulted as you are parsing both the procedure call (to load the parameters and local variables onto the run-time stack), and while you are parsing statements within the procedure (to retrieve variables and constants from the run-time stack). Remember the run-time stack built within Medusa. What information is needed in the symbol table to support such a stack?

You might ask about the "right" way to use a Hashtable to build the symbol table. There isn't one! I've seen some students store the identifier in the Hashtable with a pointer to a linked list. Each element of the linked list was a struct (or class) with all needed attributes including scope. Thus a search within a particular scope would find the linked list and then traverse it looking for the current scope. Others have decided to combine the scope number with the identifier -- thus creating a unique name to store in the Hashtable for each. Either will work.

Similarly, you can use one class to store every possible type of attribute (which means much of it will go unused for most identifiers) or have different classes derived from a common parent class for each kind of identifier (one for CONST, one for PROCEDURE, etc.) I do the former mostly but have a special class to deal with PROCEDUREs.

There is an odd little class (or struct) to deal with a declaration like:

PROCEDURE doStuff (x, y, z : INTEGER);

While parsing the x, the y, and the z, I don't know what their type is, so I save them all in an ArrayList until I find the INTEGER. Then I add all three to the symbol table (now that I know their type).

To test your symbol table, you will need to "hard code" several invented identifiers into the table (some INTEGERs, CONSTs, etc.) and then call the DumpSymTable() function to prove that it remembered what you put into it.


Actual working compiler with output of the executable:

Week 4

Beginning at week 4, you are submitting a working compiler and the output generated by the executable it creates. The compiler is measured against increasingly sophisticated Modula-2 programs.

Developer Studio sometimes runs into trouble when working from a network drive (Windows NT permission oddities, I guess). It's best to copy your complete directory from your network drive (H:\) into a corresponding C:\ or D:\ drive. When you're done, remember to copy back to the H:\ drive and to remove everything from the local hard drives.

Note that the TomPile C++ source for emitter.cpp includes several hard-coded references to directories where the compiler can find io.mac and io.obj. These will be different for your compilers. The correct code for the Unit Lab and CS Lab computers is:

	// copy some files that will be needed for the linking
	fAsm << "\tcopy c:\\programs\\masm611\\ntsample\\src\\io.mac .\n";
	fAsm << "\tcopy c:\\programs\\masm611\\ntsample\\src\\io.obj .\n";


Actual working compiler with output of the executable. (Please include the *.asm and *.exe files for each *.mod file you test -- my *.mod files and some that you might create.)

Week 5
Expressions are the main subject for this week.

Weeks 6 and 7
Conditional statements (IF-THEN-ELSE and several types of loops). Your compiler should correctly parse and execute the test programs: test05.mod, test06.mod, and LOOP_IF.mod. (Note that AND and OR move your program into the A-zone. Aim for basic loops and conditional statements first. Save AND and OR for last.)

Week 8
Arrays. Your compiler should correctly parse and execute the test programs: square.mod and sieve.mod.

Week 9
Procedures with parameters passed by value. Your compiler should correctly parse and execute the test programs: test07.mod, ProcTest.mod, and towers.mod.

Week 10
Pass by reference (Bubl.mod) and individual extensions.


What to turn in:

Please drop into the CS 380 drop box (\\es-filer-01\drops\courses\cs380). Be sure that your root folder begins with a unique prefix from your last name and the number of the current week. (My week three submission would start with "full3" for example.)

Complete source code with all .NET project files so that I can compile it if necessary.

Test results (dump files, etc.) to prove correct operation of the program. These are usually in their own directory (e.g. FullTest).

Comments in your dialog header (e.b. TomPileDlg.h) explaining where I should look when grading your programs. (Most of the code files will not have been modified between submissions.)

Don't turn in profligate debug or release directories. They consume too much drive space. Please clean eveything but the executables

How will the programs be graded?


Please email tom to report errors or make suggestions. We welcome contributed pages!

Revised by Tom: March 7, 2007 5:43 PM