Think Out Loud: Mynx Programming Language Design: January 2008

Wednesday, January 30, 2008

Mynx Method Obviates All for One, but One for All or Something Like That...

Method obviation allows a software developer to simplify a class's functionality by removing obsolete or deprecated methods in a sub-class.

Mynx Method Obviation

Mynx has a feature to obviate, or remove from the class interface/protocol, methods inherited from a super-class, which simplifies the sub-class interface/protocol. (If you do not want a method of the class itself to be part of its interface, then do not create it, or make it private...)

Method obviation is of one of two forms:

All - obviate all methods by the method name.

One - obviate one method by the method name and parameter list.

There are semantic rules for method obviation listed in the Mynx Programming Language Manual (MPLM), but one specific rule impacts the implementation and design of the compiler.

Mynx Obviate Semantic Rules

The specific semantic rule that impacts the compiler is that Mynx method obviates must be mutually exclusive to avoid obviation ambiguity--a method cannot be obviated specifically by name and parameter list, and then all the methods of the same name obviated as well. For example, consider the Mynx class:


class badObviate as someSuperClass is

     //obviate specific method
     method noDuh(Int) as void;
 
     //obviate all methods
     method doNop as all void;   

     //obviate all but conflicts with specific
     method noDuh as all void;  

end class;

Why the concern about overlapping method obviates? The ambiguity is not whether something is obviated, but what specifically: the one method, or all of them? All or one, one or all? (To paraphrase from the Three Musketeers...)

Compiler Internals for Method Obviates

Internally, the compiler uses a map data structure (called an OrderedMap--a hand-crafted class for efficiency and efficacy in the compiler design) to associate the specific signature of an obviated method with the sentence that declares the method obviate. Each method obviate supplies its own key as a String.

Two maps are used: one for the obviate of all methods (keyed by the method identifier as a string), and one for the obviate of one method (keyed by the method identifier and the parameter list of specific types as a string).

Both maps ensure another semantic constraint--uniqueness. No method obviate can be repeated or re-declared, just as any other declaration must be unique in the context of a unit or method.

The snafu is in the mutual exclusiveness constraint. Originally I had intended to check obviate mutual exclusiveness on the fly, as obviates are inserted into each map. However, in contemplating the use cases of possible Mynx classes, two separate maps have a problem: an obviate for all methods inserted later might conflict with an obviate for one method inserted earlier, and a plain lookup will not find it--remember the key in the one-method map is the method name and parameter signature, not just the method name.
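As a rough illustration of the two maps and their keys, here is a minimal sketch in Python. It is not the compiler's code: plain dicts stand in for OrderedMap, a "sentence" is just the declaration text, and the names `key_for_all`, `key_for_one`, `insert_unique`, and `DuplicateObviate` are made up for the example.

```python
# Hypothetical stand-ins: plain dicts play the role of the compiler's OrderedMap,
# and a "sentence" is simply the source text of the obviate declaration.

class DuplicateObviate(Exception):
    """Raised when the same obviate is declared twice (assumed error type)."""

def key_for_all(name):
    # Obviate-all key: the method identifier alone.
    return name

def key_for_one(name, param_types):
    # Obviate-one key: the identifier plus the parameter types.
    return f"{name}({','.join(param_types)})"

def insert_unique(table, key, sentence):
    # Each map enforces uniqueness of its own keys on insertion.
    if key in table:
        raise DuplicateObviate(key)
    table[key] = sentence

obviate_all = {}
obviate_one = {}

# The three obviates from the badObviate class above.
insert_unique(obviate_one, key_for_one("noDuh", ["Int"]), "method noDuh(Int) as void;")
insert_unique(obviate_all, key_for_all("doNop"), "method doNop as all void;")
insert_unique(obviate_all, key_for_all("noDuh"), "method noDuh as all void;")
```

All three inserts succeed: each map is unique on its own keys, yet neither map can see that "noDuh" and "noDuh(Int)" overlap.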

Implementing Method Obviate Semantics

One approach is to use a short method name map that shadows the obviate map for one method. After an obviate declaration for all is checked against duplicates in its own map, it is then checked against the shadow map. Of course, this duplicates the obviate map for single method obviation. Another problem with the on-the-fly approach is that when an error occurs, the error is muddled or obfuscated.

When a single obviate conflicts with an overlapping obviate for all, the error is limited to being reported as a Boolean true or false. In the parser, a semantic error of duplicate obviate declaration is reported if the method used to add the obviate returns a Boolean true; the error cannot be the more specific error of conflicting method obviate declaration.
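To make the muddled error concrete, here is a hedged sketch of an on-the-fly checker with a shadow set of short names. The class `ObviateTable` and its methods are hypothetical and only mirror the description above: an add method returns True on error, and the caller cannot tell a duplicate from a conflict.

```python
class ObviateTable:
    """Hypothetical on-the-fly checker; names are illustrative, not the Mynx compiler's."""

    def __init__(self):
        self.all_by_name = {}        # obviate-all: name -> sentence
        self.one_by_signature = {}   # obviate-one: "name(types)" -> sentence
        self.one_names = set()       # shadow: short names used by obviate-one

    def add_one(self, name, param_types, sentence):
        key = f"{name}({','.join(param_types)})"
        if key in self.one_by_signature or name in self.all_by_name:
            return True              # error: duplicate or conflict? Caller cannot tell.
        self.one_by_signature[key] = sentence
        self.one_names.add(name)     # the duplicated information
        return False

    def add_all(self, name, sentence):
        if name in self.all_by_name or name in self.one_names:
            return True              # same single signal for two different errors
        self.all_by_name[name] = sentence
        return False

table = ObviateTable()
table.add_one("noDuh", ["Int"], "method noDuh(Int) as void;")
table.add_all("doNop", "method doNop as all void;")
conflict = table.add_all("noDuh", "method noDuh as all void;")   # overlaps noDuh(Int)
duplicate = table.add_all("doNop", "method doNop as all void;")  # plain re-declaration
```

Both `conflict` and `duplicate` come back as the same Boolean, which is exactly why the parser can only report a duplicate obviate declaration.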

Semantic Implementation of Method Obviate Declarations

Summarizing, an on-the-fly check of method obviate declarations is possible, yet leads to both a muddled error report and duplicate information in the compiler. On the fly has far too much of a cost to be used. Later, after the first semantic pass and before the second semantic pass (or as the first stage of the second pass...), the obviates are already unique within each kind, so each kind can be checked against the other for overlap and a more informative error message given--without an extra map data structure.
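The deferred check can be sketched as a single pass over the two finished maps. This is only an assumption about how such a check might look; the function name and the error wording are invented, and the key format "name(types)" follows the description of the one-method map.

```python
def find_obviate_conflicts(obviate_all, obviate_one):
    """Return (one_key, all_key) pairs where an obviate-one overlaps an obviate-all."""
    conflicts = []
    for one_key in obviate_one:
        name = one_key.split("(", 1)[0]   # strip the parameter list from the key
        if name in obviate_all:
            conflicts.append((one_key, name))
    return conflicts

# The two maps as they would stand after the first pass over badObviate.
obviate_all = {"doNop": "method doNop as all void;", "noDuh": "method noDuh as all void;"}
obviate_one = {"noDuh(Int)": "method noDuh(Int) as void;"}

for one_key, all_key in find_obviate_conflicts(obviate_all, obviate_one):
    print(f"conflicting obviate: '{obviate_one[one_key]}' overlaps '{obviate_all[all_key]}'")
```

Because both declaring sentences are at hand, the message can name each of them, and no shadow map is needed.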

It also occurred to me that the purpose of the first semantic pass, in tandem with parsing, is to ensure that unit elements are unique in context -- overlaps in method obviates are a semantic constraint checked before the second semantic phase is performed.

For the second semantic phase, the Context object that contains the symbol table and the Store of Mynx sentences is passed to a class which will perform the semantic analysis. It is in that class in the second phase of semantic checks that the method obviates are checked for being mutually exclusive. One for all, all for one, and none for all is one...whatever that means or could mean.

Labels: context semantic, Mynx compiler, Mynx compiler design, mynx obviates, mynx semantics

Friday, January 18, 2008

The Art of the Trade-Off in Programming Language Design and Compiler Construction

In continuing to implement the Mynx compiler, I've had to re-visit questions and original decisions that require a re-think as the design and construction push further and unexpected twists and turns appear in the compiler road (to use a loose metaphor).

Number of Passes in a Compiler - Semantics and Synthesis

Originally I intended and had designed for a single-pass or one-pass compiler. But it required a great deal of forwarding of information which had to be checked at the end, creating a large post-pass batch job. So fixating on the efficacy of a one-pass compiler creates overhead and a large, time-consuming check. Time to re-consider whether one-pass is really better, as it creates a large delay while post-checks are done for all the forwarded information. The single pass seems faster, but then the compiler pays with time and space as interest on the quickness.

The two-pass compiler has a concurrent phase in tandem with the parser, and then a second phase. The second phase or second pass processes the stored sentences put into the code store during the first phase. No forwarded information, no post-checks or extra required storage. In a way it is one separate phase, with a concurrent phase during parsing that is parallel. One mistaken impression from compiler theory (I took a course that used the infamous/ubiquitous "Dragon Book") is that each compiler phase is mutually exclusive and distinct--it can be, but it is more overlapping shades of grey than black and white.

Inside the Compiler

Internally, each sentence is partitioned, parsed with semantic checks, and then stored in the store, an array of sentences on which the next phase of semantic analysis is performed. The code store is used later for code synthesis (code generation), sentence by sentence, into a code synthesis object. Hence the use of the code store in code synthesis is a natural stepping stone: concurrent semantic analysis with parsing, then a separate semantic check after parsing but before code synthesis.

The concurrent semantic analysis in parallel with the parse is done by the sentences in conjunction with a context object--so essentially the sentences in good object-oriented design do their own checks internally, and pass some information to a common context object--which is where the code store is located. Every sentence has a method check that performs a semantic check internally and in context, being passed the Context object.

This occurs after a successful parse; the last action (or the only action for some sentences) is to store the sentence in the code store. During the second, independent semantic analysis phase, a separate class for semantics is used--the Analyzer class, which processes the sentences in the code store.
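The shape of the two phases might look something like the following sketch. Only the names Context, Analyzer, and the check method come from the text; the Declaration sentence, the fields, and the particular checks are assumptions chosen to keep the example small.

```python
class Context:
    """Hypothetical context: a symbol table plus the code store (a list of sentences)."""
    def __init__(self):
        self.symbols = {}
        self.store = []

class Declaration:
    """Hypothetical sentence kind; real Mynx sentences are certainly richer."""
    def __init__(self, name, type_alias):
        self.name = name
        self.type_alias = type_alias

    def check(self, context):
        # First phase, run right after a successful parse: uniqueness in context.
        if self.name in context.symbols:
            return False
        context.symbols[self.name] = self
        context.store.append(self)   # last semantic action: store the sentence
        return True

class Analyzer:
    """Second phase: walks the stored sentences once parsing has finished."""
    def __init__(self, known_types):
        self.known_types = known_types

    def analyze(self, context):
        # Illustrative check only: report declarations whose type alias is unknown.
        return [s.name for s in context.store if s.type_alias not in self.known_types]

context = Context()
first_pass = [
    Declaration("attr", "Int").check(context),
    Declaration("chr", "Char").check(context),
    Declaration("attr", "Real").check(context),   # duplicate name, rejected at once
]
second_pass = Analyzer({"Int"}).analyze(context)  # "Char" is unknown to this analyzer
```

The first phase needs only the sentence and the context, while the second phase sees the whole store, so nothing has to be forwarded for a post-check.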

External Semantics of Type and Inclusion--Language Design and Compiler Implementation

In a previous blog entry I left open a question about the Mynx semantics of type inclusion, class name, and namespace. The entry mentions the question "Are Mynx classes unique by the class name or within the namespace--the module and class name?" I discussed the trade-offs in each approach. But in contemplating the question, one important point is in the area of language design and compiler implementation.

The point is about making a language design decision that impacts compiler implementation for the language feature resolved in the decision. A feature can be permissive or open for flexibility, or constrained or closed for efficiency. Flexibility gives some latitude in using the language but increases compiler complexity, whereas a constraint puts limits on using the language but makes the compiler simpler. There is a correlation between language design and compiler complexity.

Note that some features in some languages seem like great simplifications. For example, not requiring declaration before use simplifies the compiler or interpreter, but creates a headache for the software developer, who must track where a variable or constant is first used, and its type. The complexity is shifted from compile-time to run-time, although it does simplify the compiler.

For example, Pascal insists on declaration before use, which greatly simplifies the compiler--but it is a restrictive language feature. Later Pascal implementations deviated and allowed forward declarations (which became part of the language standard), and a few allowed external declarations (an idea carried further in Modula-2, Wirth's successor to Pascal). The decision which led to a constraint on declarations before use impacted the compiler, making it very simple to implement. But the constraint could have been relaxed later, with more sophisticated Pascal compilers, yet still be backward compatible with existing Pascal code.

Language Design Constraints--Easier to Relax than Restrict

Principle of programming language feature: "It is easier to rescind or relax a language design constraint than to impose a constraint later on a programming language."

The difficulty is that a language design feature is sometimes set in stone, like the constraint on declaration before use in Pascal. Wirth could have later extended Pascal into Pascal2 or Pascal++ or Pascal#, relaxing the constraint but keeping it backward compatible with the original Pascal. C++ by Stroustrup did this to great effect with the syntax, keeping backward compatibility with C while adding new features, whereas Cox with Objective-C, although also built on top of C, added an unfamiliar Smalltalk-style syntax alongside the C-style syntax. There are other philosophical differences, but that is not the point (reminds me of quibbling over cement or concrete, meter vs. foot, Ford vs. Chevy, Coke vs. Pepsi, ad nauseam...) - the point is that constraint versus flexibility of a language feature is important. Constrained source code in a programming language can be upwardly compatible with a compiler that handles more flexible source code, but not vice-versa.

The answer to the question of namespace, class, and uniqueness is the class name--a class is required to be unique on name, not namespace. Namespaces organize classes, but a class is unique on name alone--this is a language feature as a constraint.

How does the language design question of what a class is unique on--namespace in conjunction with class name, or class name alone--impact the Mynx compiler? Unique on class name simplifies resolution of type--a type alias such as Int, Real, or String resolves to one unique name, with namespaces only organizing the classes--whereas unique on namespace requires resolving the type name. Identical class names lead to multiple types for a type alias; Vector might be any of several different Vector classes in the namespaces known to the compiler.

During type annotation, when the full type name and type information is annotated to each sentence, each use of the type must be checked to eliminate multiple types from the type alias. By the end of the phase of semantic checks of type annotation and compatibility every type alias in a declaration must resolve to one type. There is an added amount of complexity to resolve each type alias to a single class, and then to verify that all the type aliases have resolved to a single class.

The Design Trade-Off in Mynx Class to Type Alias

A trade-off is that each class is unique on the class name, not the namespace. This simplifies semantics of namespaces and classes--a type alias that has more than one potential class from different namespaces is reported as a unit context error--type reference ambiguity, where a type alias resolves to more than one type. In the language design of Mynx (and in the Mynx Programming Language Manual (MPLM), to which it is being added) what was an open feature is now a language design constraint. But, to get back to the original point, the constraint can later be relaxed, allowing flexibility by a class being unique by namespace and not name. The more strict Mynx classes and programs will then be upwardly compatible with the new compiler--but for the initial implementation, some trade-offs are necessary to simplify things. Later the constraint can be relaxed and the ability to resolve multiple type aliases incorporated into the Mynx compiler.

With a class unique by name and not namespace, Mynx classes and programs can use absolute namespace inclusion for a specific class, and avoid duplicating class names in the Mynx language's namespaces--such as classes in mynx.core and mynx.io. Later this constraint can be relaxed once the compiler is up and running.

Labels: class name, compiler writing, Mynx compiler, mynx language design, namespace, programming language constraint, programming language feature, types

Monday, January 07, 2008

Module, Class Name, Namespace, and Type--That is the Semantic Question

In the process of implementing semantic checks within the context of unit (class/program) and methods, I have been trying to organize the semantic checks and actions into discrete phases. It is a process of pushing and poking the operations into some semi-ordered chaos, fleshing out the entire process so that the implementation source code is ready to implement the functionality. I've had to restart (but I did so with the syntax and lexical stages of the compiler, so it's a learn by doing, then re-do but with the knowledge gained).

Currently I'm implementing the functionality to read and write MOXI (Mynx Object XML Interface) files--files that contain the semantic information of an external class, such as a method, the parameters, and its mode. The overloading of an operator by a method is also defined. XML is used as a platform neutral means to provide external semantic data. Unlike Java with .class files or C# with assemblies, which can load the binary class files and by reflection check if a class has a method or attribute, Mynx does not have a platform binary format for an external object. A MOXIObject is read from and written to an external XML file by MOXIReader and MOXIWriter. MOXI files also will at some future date allow a graphical tool to display class information and namespaces hierarchically.

But it is tedious to implement and requires much processing of XML tokens when reading or writing an external file. I've considered a binary format MOBI (Mynx Object Binary Interface) for external semantic class information. But external type information is needed for type compatibility checks, and later in the code generation/synthesis for operators overloaded by class methods. Part of the compilation process is to create a MOXI file for a compiled class for re-use in other Mynx classes or programs.

One complexity is that, in a declaration, a type is a type alias: a short identifier for a dotted name, the canonical type name.
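As a very rough idea of the round trip, the sketch below writes and reads a small class description with Python's standard xml.etree.ElementTree. The actual MOXI layout is not given here, so the element and attribute names (class, method, param, name, type) are pure assumptions, as are the functions `write_moxi` and `read_moxi`, and the `add` and `toString` methods on Int are made-up sample data.

```python
import xml.etree.ElementTree as ET

def write_moxi(class_name, methods):
    """Serialize {method_name: [param types]} to an XML string (assumed layout)."""
    root = ET.Element("class", name=class_name)
    for method_name, params in methods.items():
        method = ET.SubElement(root, "method", name=method_name)
        for param_type in params:
            ET.SubElement(method, "param", type=param_type)
    return ET.tostring(root, encoding="unicode")

def read_moxi(xml_text):
    """Parse the XML produced by write_moxi back into (class_name, methods)."""
    root = ET.fromstring(xml_text)
    methods = {
        m.get("name"): [p.get("type") for p in m.findall("param")]
        for m in root.findall("method")
    }
    return root.get("name"), methods

# Made-up sample data for the round trip.
methods = {"add": ["mynx.core.Int"], "toString": []}
xml_text = write_moxi("mynx.core.Int", methods)
restored = read_moxi(xml_text)
```

A real MOXI file would also carry parameter modes and operator overloads, which this sketch leaves out.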


with mynx.core.*;

class exampleTypes is

    public Int attr to null;  //Int  => mynx.core.Int

    public void exampleMethod is

        var Char chr to null; //Char => mynx.core.Char

    end exampleMethod;

end class;

For the given class, the type aliases

Int

Char

are used in the declarations of any types in the class. The inclusion statement

with mynx.core.*

acts as the means to include all necessary classes within that namespace, each specified by its type alias. It is convenient, both for specifying the namespace to utilize, and in the declaration.

Despite its utility and convenience, there is a design decision--one that must be firmly and clearly stated to avoid any ambiguity in the semantics of Mynx. The design decision can be (excuse the obvious allusion to a popular game show) put in the form of a question. The answer to the question is the resulting choice of design decision.

The question is: "Are Mynx classes unique by the class name or within the namespace--the module and class name?"

For example, the class declared with the type alias of "Int" can exist in multiple namespaces if the class is unique within a namespace, for the module and classes:

mynx.core.Int

org.apache.mynxtype.Int

com.williamgilreath.mynx.numeric.Int

The same class name can be used in different namespaces if the class is unique within the namespace.

Alternatively, a class that must be unique on name only must be unique for any and all possible namespaces. Thus

Int

can be resolved to only one namespace and class; all other classes require a unique name of their own. Thus

Int

is not unique by the namespace, so then the namespaces require unique class names:

mynx.core.Int

org.apache.mynxtype.Integer

com.williamgilreath.mynx.numeric.MathInt

Thus there are two options to the semantics of the type alias and its resolution to the full canonical type name:

Class is unique by name alone independent of namespace.

Class is unique by name in conjunction with namespace.

Why the incredible concern and focus on what seems a trivial semantic question of type names?

Simple: if Mynx uses (as was originally intended but not overtly stated) a class name that is unique within its namespace, it is more flexible, but it requires more semantic processing. A type list is created during the initial semantic checks, but classes unique in a namespace necessitate a type map, a mapping of the type alias to the canonical type names. Then later, during further semantic phases, the potential canonical type names must be resolved to one specific canonical type name.
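A type map of this kind could be sketched as below. The function names and the idea of passing in a list of known canonical class names are assumptions for illustration; the class names are the ones used in the example above.

```python
def build_type_map(inclusions, known_classes):
    """Map each type alias to every canonical name the inclusions make visible.

    inclusions: namespaces named in 'with <namespace>.*;' statements.
    known_classes: canonical names available to the compiler (assumed input).
    """
    type_map = {}
    for canonical in known_classes:
        namespace, _, alias = canonical.rpartition(".")
        if namespace in inclusions:
            type_map.setdefault(alias, []).append(canonical)
    return type_map

def ambiguous_aliases(type_map):
    # Aliases that a later semantic phase must still narrow to one canonical name.
    return sorted(alias for alias, names in type_map.items() if len(names) > 1)

known_classes = ["mynx.core.Int", "mynx.core.Char", "org.apache.mynxtype.Int"]
type_map = build_type_map({"mynx.core", "org.apache.mynxtype"}, known_classes)
pending = ambiguous_aliases(type_map)
```

Here Int maps to two canonical names and Char to one, so Int is the alias left for a later phase to resolve.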

If Mynx uses uniqueness on a class name independent of a namespace, the namespace becomes a method of organizing classes, but not of creating a unique type name for a class. The class name is unique independent of namespace, so a specific class can be included by giving the class name in the declaration and the namespace in the inclusion. Effectively it puts the onus on the Mynx software developer--the user of the language--instead of the implementor (me for this compiler).

Unique by class name alone also simplifies the compiler, as it would be a semantic error to have multiple canonical type names for a type alias. If a type alias is not resolved to one, and only one, canonical type name it is a fatal semantic error--class inclusion ambiguity (essentially the very thing that unique by class name with namespace allows). A type map would be unnecessary (the type list is fully expanded from type alias to canonical type name), as would the resolution of multiple canonical type names for a type alias.

The question stands; the only remaining thing is which approach? That is the question to be considered, along with examining other programming languages including the "Big Three" of C++, Java, and C#. Mynx will be synthesized into code for Java and C#, and potentially C++. In the meantime semantic checks can be implemented up to the point of type checks, type annotation, and any other future type-specific semantic handling.
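Under the unique-by-name rule the expansion collapses to one alias, one canonical name, with an error on the first collision. The sketch below assumes the same hypothetical inputs as a list of known canonical class names; `ClassInclusionAmbiguity` and `expand_type_list` are invented names for the example.

```python
class ClassInclusionAmbiguity(Exception):
    """Hypothetical name for the fatal semantic error described above."""

def expand_type_list(inclusions, known_classes):
    """Expand each alias to its single canonical name, failing on any ambiguity."""
    canonical_by_alias = {}
    for canonical in known_classes:
        namespace, _, alias = canonical.rpartition(".")
        if namespace not in inclusions:
            continue
        if alias in canonical_by_alias:
            raise ClassInclusionAmbiguity(
                f"{alias}: {canonical_by_alias[alias]} or {canonical}?")
        canonical_by_alias[alias] = canonical
    return canonical_by_alias

# Unique class names across namespaces: every alias expands cleanly.
expanded = expand_type_list(
    {"mynx.core", "org.apache.mynxtype"},
    ["mynx.core.Int", "mynx.core.Char", "org.apache.mynxtype.Integer"],
)
```

With a second class named Int in an included namespace, the same call raises the error instead of building a list of candidates to resolve later.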

Labels: Mynx classes, Mynx compiler, mynx semantics, name collisions, namespace, uniqueness, uniqueness semantics
