Sunday, June 24, 2007

Semantic Merge into the Data Traffic Lane - Data Structures and the Symbol Table

After completing and testing the individual sentence syntax objects of Mynx (with approximately 300-test cases) the next phase is what I term uniqueness and existence semantic checks -- in context within the class or program. These are the usual semantic checks in compiler texts involving the symbol table. The checks are semantic context as they are in the unit of context -- the class or program. Consider the Mynx EBNF for a class body:

CLASS_BODY := ( CLASS_METHOD | CLASS_ATTRIBUTE | CLASS_OVERLOAD | CLASS_OBVIATE
| CONSTRUCT | DESTRUCT | DEFAULT)*

PROGRAM_BODY := (PROGRAM_ATTRIBUTE | PROGRAM_METHOD)+

Another important construct is the module and namespace inclusion that is:

MODULE := [ module (default | NAME) ‘;’ ]
WITH := ( with NAME [‘.’ ‘*’] ‘;’ )

The class body has 0 or more class elements, and a program body at least 1 or more program elements. The namespace elements are 0 or more. The module plus the program or class name creates an implicit namespace statement, but being optional, is again 0 or 1.

The semantic merge phase follows the sentence semantic check, and the semantic merge has two aspects:

  1. Uniqueness - context syntax elements are unique - declaration only once.

  2. Existence - context syntax elements exist before use - declaration before use.


Each syntax object must check as semantically unique and entered into the data structures within the symbol table before the existence checks can be performed as the overall semantic context checks.

Uniqueness - there can only be one



The uniqueness check is that syntax elements are unique. The major question of concern is: unique on what within the syntax object?

For each class element:

  1. Attribute - unique on name identifier

  2. Constructor - unique on type identifiers in parameter list

  3. Default - only one

  4. Destructor - only one

  5. Method - unique on name identifier, parameter list, and if covariant return type identifier

  6. Namespace - unique on sequence of identifiers

  7. Obviate - unique on method name identifier, parameter list; if an all methods obviate, only one unique on method identifier

  8. Overload - unique on operator


For each program element:

  1. Attribute - unique on name identifier

  2. Method - unique on name identifier, parameter list


For each syntax element is unique on identifier, sequence of identifiers, or operator.

After the semantic sentence check then follows the semantic merge, and the semantic context checks. In the process, the context elements are entered in the symbol table.

Symbol Table - Collection of Data in Structures



The symbol table is a composition of data structures, for each of the context elements of a class or program.

Each syntax object implements a specific interface to get the unique string for use in the symbol table. Depending upon the syntax object, and what information is to be stored, a different data structure is used.

Three data structures: ordered vector, ordered map, and ordered multimap.

  1. Ordered Vector - elements are unique, but no association with syntax object

  2. Ordered Map - elements are unique, but association with syntax object

  3. Ordered Multimap - elements are unique, but association in multiple syntax objects


All three data structures are implemented from scratch (to foster portability to C++ and C#...so no specific data structures library), and are linear data structures that maintain ordering -- sorted for O(lg n) access performance (to check for duplicates). The ordered map is an ordered vector, only with a pair key to value association class, and the ordered multi map is with a tuple key to an array of values (dynamically resized if need be).

Many compiler texts spend an inordinate amount of time discussing implementation of the symbol table, but the reason is that the implementation of the symbol table impacts performance. Using object-oriented data structures hides and simplifies many of the details.

Implementation of Semantic Merge



The symbol table is within the context object -- the singleton class that contains context information about the class or program being semantically analyzed.

The merge occurs within the sentence, after the sentence semantic check, the semantic merge checks for duplicates on the sentence itself, and reports errors. If no errors, then the sentence is merged by calling a method on the symbol table.

The sentences are the driving point of the semantic merge and error reporting, the symbol table and context object act as a single point or locus for access and storage.

Existence Semantic Checks



The existence semantic checks will follow once the uniqueness semantic checks are complete. Existence is the declaration before use principle common to many of the strongly typed programming languages such as Algol, Pascal, Java, Oberon, C++.

Labels: , , , , ,

Website Spy Software