Wednesday, August 29, 2007

Mynx Semantic Checks and Language Design for Declaration Before Usage

In implementing the Mynx semantic checks, the general semantic constraints of uniqueness in declaration and existence in executable that for a single pass compiler declaration before use is necessary. Otherwise a multiple pass system is needed, one for checking declarations are unique, and the other that the declaration exists. In effect, should the compiler pass require mutually exclusive semantic checks (2-pass compiler) or not? If not, the language design must require declaration before use.

One goal of mine in the compiler implementation is to try and do things minimally - the fewer passes the better. Even better is to do things on-the-fly so that information does not accumulate and consume memory, and the time to iteratively process the Mynx source being compiled.

At first, it seems there are one of two choices. Nominally, I’d opt for the second choice as type annotation, validation, and compatibility follow in the compilation process. But then I realized with the appropriate data structure, the same approach used to store information about type, the identifier, and the sentence can be used if a declaration does not exist. Simply put, existence and uniqueness semantic checks do not have to be mutually exclusive and then require other compiler passes. The dilemma of two design choices was binary thinking, a false dilemma.

During the compiler pass, if a declaration sentence is encountered, the declaration is stored in a map with the identifier as the key, and the sentence as the value. For an executable or non-declaration, the type information is accessed in the map, and the type information annotated to the sentence. If it is not found, normally with declaration before use it would flag a semantic error in a one pass compiler.

When a declaration is not found, the expected declaration identifier is stored in the multi map, along with the sentence the expected declaration is not found. A multimap is used so that any other sentences in which an expected declaration is not found is stored multiply under the same identifier. Essentially a forward declaration is implicit, a declaration later on within the class scope.

As other sentences are processed declarations are handled by a check if the declaration identifier is a forward declaration. If so, the various sentences have the declaration type annotated and existence check as if the declaration type existed when the sentence was processed. The existence semantic check is performed on the fly along with the type annotation.

Example of Forward Declaration with the Class in Mynx:


with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;

y = x; //x nonexistent,add (x, y = x>)

IO <<< x <<< eoln; //x nonexistent,add (x, y = x;,IO <<< x <<< eoln;)
IO <<< i <<< eoln; //i nonexistent,add (i, IO <<< i <<< eoln;)

end doMethod;

public Int x to 0; //update found x declaration
public Ord i to 0; //update found i declaration

end class;


If all the sentences that use a forward declaration are cleared from the multimap, then the semantic checks of existence and uniqueness are satisfied.

However, not all the identifiers need be declared. Left over identifiers are at first glance, erroneous, but in Mynx there is an implicit static method. An implicit static method is included by using an absolute namespace inclusion.

The implicit static method uses identifiers not declared as internal variables or constants. After a pass through the source code, the remaining identifiers if not associated with a declaration are checked against any absolute namespaces for inclusion as a static method.

Example of Implicit Static Methods with the Class in Mynx:


with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;
y = x;

IO <<< x <<< eoln; //IO nonexistent, add (IO, IO <<< x <<< eoln;)
//eoln nonexistent, add (eoln,IO <<< x <<< eoln;)

IO <<< i <<< eoln; //IO nonexistent, add (IO, IO <<< x <<< eoln;)
//eoln nonexistent, add (eoln,IO <<< x <<< eoln;)

end doMethod;

public Int x to 0; //update found x declaration
public Ord i to 0; //update found i declaration

end class;


The same Mynx class has two static methods - IO, eoln. Rewriting the Mynx class with the static methods:

Example of Explicit Static Methods with the Class in Mynx:


with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;
y = x;

IOStream.IO <<< x <<< IOStream.eoln;
IOStream.IO <<< i <<< IOStream.eoln;

end doMethod;

public Int x to 0;
public Ord i to 0;

end class;


The interesting point is that handling implicit static methods was not by design, but a result of the properties of using the multi map. The solution is flexible to encompass that aspect of the semantic checks, which is a indication of adaptability and generality.

One other type information annotation is which operator is associated with method for a given variable. Substituting that external type validation information in the class:


with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;
y.set(x);

IOStream.IO.put(x).put(IOStream.eoln);
IOStream.IO.put(i).put(IOStream.eoln);

end doMethod;

public Int x to 0;
public Ord i to 0;

end class;


At the end of the semantic process, there is explicitness in methods (static or otherwise) and operators. This is for the next stage of the compilation process using a sentential compiler -- code synthesis. For code synthesis, all internal declarations must exist and are unique; externally all methods and operators are resolved, and type compatibility is valid.

A further goal is to do the external semantic checks for the existence of class elements and validation of types and type compatibility on the fly if possible. One design faux pas is to do things in batch mode, a compiler pass for each major compiler function. A batch process is repetitive and inefficient with storing information between each pass.

And one other kind of semantic check is sentence-method and sentence-unit, context checks that are not specifically external type or internal uniqueness/existence. For example, an abstract method in a static class, or generic variable in a default class method. Another semantic context check is obviating a method declared in the class (a semantic mistake in the language definition).

Labels: , , ,

Mynx Semantic Checks and Language Design for Declaration Before Usage

In implementing the Mynx semantic checks, the general semantic constraints of uniqueness in declaration and existence in executable that for a single pass compiler declaration before use is necessary. Otherwise a multiple pass system is needed, one for checking declarations are unique, and the other that the declaration exists. In effect, should the compiler pass require mutually exclusive semantic checks (2-pass compiler) or not? If not, the language design must require declaration before use.

One goal of mine in the compiler implementation is to try and do things minimally - the fewer passes the better. Even better is to do things on-the-fly so that information does not accumulate and consume memory, and the time to iteratively process the Mynx source being compiled.

At first, it seems there are one of two choices. Nominally, I’d opt for the second choice as type annotation, validation, and compatibility follow in the compilation process. But then I realized with the appropriate data structure, the same approach used to store information about type, the identifier, and the sentence can be used if a declaration does not exist. Simply put, existence and uniqueness semantic checks do not have to be mutually exclusive and then require other compiler passes. The dilemma of two design choices was binary thinking, a false dilemma.

During the compiler pass, if a declaration sentence is encountered, the declaration is stored in a map with the identifier as the key, and the sentence as the value. For an executable or non-declaration, the type information is accessed in the map, and the type information annotated to the sentence. If it is not found, normally with declaration before use it would flag a semantic error in a one pass compiler.

When a declaration is not found, the expected declaration identifier is stored in the multi map, along with the sentence the expected declaration is not found. A multi map is used so that any other sentences in which an expected declaration is not found is stored multiply under the same identifier. Essentially a forward declaration is implicit, a declaration later on within the class scope.

As other sentences are processed declarations are handled by a check if the declaration identifier is a forward declaration. If so, the various sentences have the declaration type annotated and existence check as if the declaration type existed when the sentence was processed. The existence semantic check is performed on the fly along with the type annotation.

Example of Forward Declaration with the Class in Mynx:



with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;

y = x; //x nonexistent,add (x, y = x>)

IO <<< x <<< eoln; //x nonexistent,add (x, y = x;,IO <<< x <<< eoln;)
IO <<< i <<< eoln; //i nonexistent,add (i, IO <<< i <<< eoln;)

end doMethod;

public Int x to 0; //update found x declaration
public Ord i to 0; //update found i declaration

end class;

If all the sentences that use a forward declaration are cleared from the multi map, then the semantic checks of existence and uniqueness are satisfied.

However, not all the identifiers need be declared. Left over identifiers are at first glance, erroneous, but in Mynx there is an implicit static method. An implicit static method is included by using an absolute namespace inclusion.

The implicit static method uses identifiers not declared as internal variables or constants. After a pass through the source code, the remaining identifiers if not associated with a declaration are checked against any absolute namespaces for inclusion as a static method.

Example of Implicit Static Methods with the Class in Mynx:



with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;
y = x;

IO <<< x <<< eoln; //IO nonexistent, add (IO, IO <<< x <<< eoln;)
//eoln nonexistent, add (eoln,IO <<< x <<< eoln;)

IO <<< i <<< eoln; //IO nonexistent, add (IO, IO <<< x <<< eoln;)
//eoln nonexistent, add (eoln,IO <<< x <<< eoln;)

end doMethod;

public Int x to 0; //update found x declaration
public Ord i to 0; //update found i declaration

end class;

The same Mynx class has two static methods - IO, eoln. Rewriting the Mynx class with the static methods:

Example of Explicit Static Methods with the Class in Mynx:



with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;
y = x;

IOStream.IO <<< x <<< IOStream.eoln;
IOStream.IO <<< i <<< IOStream.eoln;

end doMethod;

public Int x to 0;
public Ord i to 0;

end class;

The interesting point is that handling implicit static methods was not by design, but a result of the properties of using the multi map. The solution is flexible to encompass that aspect of the semantic checks, which is a indication of adaptability and generality.

One other type information annotation is which operator is associated with method for a given variable. Substituting that external type validation information in the class:

with mynx.io.IOStream;

class X is

public void doMethod is

var Int y to null;
y.set(x);

IOStream.IO.put(x).put(IOStream.eoln);
IOStream.IO.put(i).put(IOStream.eoln);

end doMethod;

public Int x to 0;
public Ord i to 0;

end class;

At the end of the semantic process, there is explicitness in methods (static or otherwise) and operators. This is for the next stage of the compilation process using a sentential compiler -- code synthesis. For code synthesis, all internal declarations must exist and are unique; externally all methods and operators are resolved, and type compatibility is valid.

A further goal is to do the external semantic checks for the existence of class elements and validation of types and type compatibility on the fly if possible. One design faux pas is to do things in batch mode, a compiler pass for each major compiler function. A batch process is repetitive and inefficient with storing information between each pass.

And one other kind of semantic check is sentence-method and sentence-unit, context checks that are not specifically external type or internal uniqueness/existence. For example, an abstract method in a static class, or generic variable in a default class method. Another semantic context check is obviating a method declared in the class (a semantic mistake in the language definition).

Labels: , , ,

Tuesday, August 14, 2007

Language Design and Semantics-- Language Semantics to Compiler Design

In a previous blog entry, language design can impact the design of the compiler, especially the semantic checks. A one pass compiler seems to require a more restrictive language akin to Pascal, with strict declaration before use semantics. That simplifies the compiler design and compiler writers task of creating an efficient (efficient in being a one pass design) compiler.

Moving Forward -- with Forward Method Syntax


One possibility from Pascal (actually stemming from a limitation of Pascal) is a forward declaration -- adding a feature into the language (instead constraining a feature). As the compiler for Mynx uses sentential or bounded context of a sentence parsing, it is simply adding a keyword into the language, and the prefix and suffix for a sentence. Something like:

method blah is forward;

The disadvantage is adding another sentence and keyword into the language, and for the convenience of the compiler writer and the compiler design.

Alternatively, Mynx already has the feature of method equivalents, so once could define a dummy method as a forward declaration, as:

method blah(Void) is to null; //dummy method takes a Void parameter

This has the comparative advantage that no new language feature is necessary, and uses a feature already in the language (and not designed for that purpose, but that illustrates a level of adaptability). The real disadvantage is a Mynx class will have many dummy methods that are confusing and difficult to comprehend, and couples the compiler design to an expected language feature that is not part of the language. There is no explicit syntax for the semantic feature, not like obviating a method which has syntax to explicitly remove a method from a class.

Type Annotation


Type annotation is the first part of the last phase of semantic checks of type checks. After uniqueness, and existence semantic checks, type checking verifies that if the elements of a class or program are unique, and exist (are declared and initialized), type checking verifies the elements can be connected in expressions and statements -- are type compatible. Before checking type compatibility, the elements must have the type information annotated.

Ideally type annotation is done on the fly (OTF...sorta like WTF?!) in a one pass compiler. In terms of read and write:

  1. declaration of attribute or variable - write type into symbol table.

  2. expression of attribute or variable - read type from symbol table.


Within the class or program scope, all methods and attributes are visible within the context of the unit -- hence there is no declaration before use, so first pass writes the type information into the symbol table, the second pass reads the information from the symbol table.

Type Annotation Forward


A forward declaration is a stub for the real declaration later on at some point in the class or program. It is a low-fat declaration that writes type into the symbol table.

As previously stated, the added complexity of another syntax construct, and a new lexeme or keyword. A further complication is that after a forward method is parsed, and entered into the symbol table, a post-check requires storing all forward method declarations, and verifying that an actual method was declared exactly but later on in the class.

Another semantic twist is that a forward semantic declaration would require a uniqueness (not declared before), and that an actual method declaration does not proceed the forward method declaration (not declared exactly). Instead of simplifying semantics at the cost and trade-off of added lexical and syntactic elements and processing, more semantic complexity is added in the process.

Zero Sum Gain Forward



The result in the compiler design is zero sum (to borrow a term from game theory), so in simplifying of the semantics the complexity has increased in all three phases: lexical, syntax, and semantic.

The simpler approach (and simplicity is better than complexity without any real benefit) is to implicitly require declaration before use, and have the compiler flag a semantic error if an expected attribute or method is not in the symbol table to annotate the type. Of course, this approach is under the requirement of a one pass compiler, for a two pass compiler it would be superfluous.

The real question and the subsequent issues that follow is one pass or two pass, which are driven by the design of the Mynx programming language. In a strange way, its almost a vicious circle with feedback from the design of the compiler design by the compiler writer impacts the design of the language and the programming language designer. The trade-off is that language design complexity is inversely related to the compiler design complexity.

In this case, I am doing both so it is not separate people, but the design issues even in the same head of one person are paradoxical and counter-intuitive. The eventual goal is to find a point of equilibrium between the language design and the compiler design.

As a philosophy professor once might have said, "A long contemplative meditation upon the problem is necessitated."

Labels: , , , , , ,

Tuesday, August 07, 2007

Semantic Checks - Uniqueness Context Check--the Nuances of Semantics

The uniqueness context check is of two distinct parts:

  1. global - context of class or program.

  2. local - contex of constructor/destructor/method.


The approach to verify that all attributes, constants, variables are unique is either a two pass approach, or a one pass approach through the source code.

Uniqueness is verified by storing the identifiers and the descriptor of the declaration in a symbol table structure. At the completion, all identifiers are known to be unique in both the global and local contexts within the class or program.

Significance of Uniqueness - Existence and Type Semantic Checks



The decision to use a two or one pass approach for uniqueness seems overkill in design. However, the next step is existence in context semantic checks, in which the original declaration information of type is annotated to the syntactic descriptors (which are elements of essentially an abstract syntax tree (AST), and then type checking is performed. Hence a one or two pass approach to uniqueness will impact the existence semantic check, and then the type semantic check.

Either way there is at least one pass, but both approaches have trade-offs in terms of speed and storage a.k.a. time and space.

Both approaches, the two pass, and the one pass depend on the semantic rules of context constraint of existence. Within a class or program does declaration of a method or attribute preceed use?

Two Pass Approach to Context Uniqueness Check



The two pass approach uses two passes that are:


  1. first pass is for the global context, done on the fly as each sentence is parsed. The body of a method/constructor/destructor is ignored.

  2. second pass is for the local context, done afterward on a buffer of the parsed sentences -- the descriptors -- handling the body of the methods.


Requires 2-passes - one for class elements, second for methods.
Requires buffer of syntax descriptors.

One Pass Approach



The single pass approach handles both the global context and local context within a single pass through the source code, done on the fly as each sentence parsed.

The second pass is essentially integrated into the first, so instead of ignoring the body of a method, it is handled. If a method uses an attribute or other method that is declared later on, the use is stored in a forward attribute or forward method map. Later the maps must be processed with the attribute and method declarations.

Requires a map for forward declarations for attributes and methods. Requires post-pass processing of the forward declarations.

Language Design and Impact on the Compiler



The declaration before use semantics allow one pass compilation, but make the language more rigid like Pascal which had a very strict order of declaration.

C-style of using separate headers dot-h, then everything is known in one central source when compiling the dot-c, but duplicates the declaration, and separates it from the definition.

Originally, I considered Mynx to have strict sections like the visiblity sections of a C++ class, but opted not to use that approach as it is too rigid.

Language design is not just about features and functionality, but also a strong bearing on compiler design and efficiency. A compiler for a given language is a trade-off in efficiency to features of the language.

I am opting towards some declaration before use semantics in some features such as attributes and methods (including the constructor) but not other elements. I am aiming for a single pass compiler to avoid the buffering and additional processing, as well as the time involved.

Uniqueness Semantic Check as First Step in Context Semantic Check



The uniqueness context check of semantics is followed by existence -- a check of existence in an expression or statement. Hence uniqueness checks are directly related to existence, the declaration before use semantics. The uniqueness checks ensure that identifiers are unique, and the existence checks that identifiers exist -- and that type information associated with the identifer is annotated to the syntax element.

The uniqueness semantic context check is the first context semantic check on the class/program and any constructors/methods/destructor written. How the uniqueness checks are implemented later will impact the further semantic check functionality.

The context semantic checks and actions are:


  1. Uniqueness in Context (check)

  2. Existence in Context (check)

  3. Annotation in Context (action)


It should be noted that each is not a discrete check or action, all three will occur (once fully implemented) in the single pass on the fly as sentences are parsed.

Labels: , , , , ,

Website Spy Software