Friday, January 18, 2008

The Art of the Trade-Off in Programming Language Design and Compiler Construction

In continuing to implement the Mynx compiler, I've had to revisit questions and original design decisions that, as the design and construction push further, require a re-think as unexpected twists and turns appear in the compiler road (to use a loose metaphor).

Number of Passes in a Compiler - Semantics and Synthesis



Originally I had intended and designed a single-pass, or one-pass, compiler. But it required a great deal of forwarding of information that had to be checked at the end, creating a large post-pass batch job. Insisting on the efficacy of a one-pass compiler thus creates overhead: a large, time-consuming check, since post-checks must be done for all the forwarded information. The single pass seems faster, but then the compiler pays in time and space as interest on that quickness.
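To make the cost concrete, here is a minimal sketch in Python of the one-pass pattern described above. All the names (OnePassCompiler, declare, use, post_check) are hypothetical, invented for illustration; the point is that every unresolved use is forwarded and re-checked in a final batch.

```python
# Minimal sketch of one-pass compilation with forwarded information.
# All class and method names here are hypothetical, for illustration only.

class OnePassCompiler:
    def __init__(self):
        self.declared = set()   # names seen so far during the single pass
        self.forwarded = []     # uses that could not be checked yet

    def declare(self, name):
        self.declared.add(name)

    def use(self, name):
        # In one pass we cannot reject an unknown name immediately:
        # it may be declared later, so forward it for a post-check.
        if name not in self.declared:
            self.forwarded.append(name)

    def post_check(self):
        # The large post-pass batch job: every forwarded use must be
        # re-checked against the now-complete set of declarations.
        return [name for name in self.forwarded if name not in self.declared]

c = OnePassCompiler()
c.use("f")          # forward reference, legal if "f" appears later
c.declare("f")
c.use("g")          # never declared
errors = c.post_check()
# errors == ["g"]
```

The forwarded list, and hence the post-check, grows with the program: the time and space "interest" paid for the single pass.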

The two-pass compiler runs one phase concurrently, in tandem with the parser, and then a second phase. The second pass processes the stored sentences placed into the code store during the first phase. No forwarded information, no post-checks, no extra required storage. In effect it is one separate phase, plus a concurrent phase that runs in parallel with parsing. One misconception in compiler theory (I took a course that used the infamous/ubiquitous "Dragon Book") is that each compiler phase is mutually exclusive and distinct--it can be, but in practice the phases are overlapping shades of grey, not black and white.

Inside the Compiler



Internally, each sentence is partitioned, parsed with semantic checks, and then stored in the code store, an array of sentences on which the next phase of semantic analysis is performed. The code store is later used for code synthesis, or code generation, sentence by sentence into a code synthesis object. Hence using the code store in code synthesis later is a natural stepping stone: semantic analysis concurrent with parsing, then a separate semantic check after parsing but before code synthesis.
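The synthesis step over the code store can be sketched as a simple walk, each sentence generating its own code into the synthesis object. The names here (Synthesis, Print, generate) are hypothetical placeholders, not the actual Mynx compiler classes.

```python
# Sketch of code synthesis from the code store, sentence by sentence.
# Class and method names are hypothetical, for illustration only.

class Synthesis:
    """Code synthesis object that accumulates generated code."""
    def __init__(self):
        self.code = []

    def emit(self, instruction):
        self.code.append(instruction)

class Print:
    """An example sentence kind stored in the code store."""
    def __init__(self, text):
        self.text = text

    def generate(self, synth):
        synth.emit(("PRINT", self.text))

def synthesize(code_store, synth):
    # The code store is simply an array of sentences; synthesis walks
    # it in order, each sentence generating code into the synthesis object.
    for sentence in code_store:
        sentence.generate(synth)

store = [Print("hello"), Print("world")]
synth = Synthesis()
synthesize(store, synth)
# synth.code == [("PRINT", "hello"), ("PRINT", "world")]
```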

The semantic analysis concurrent with the parse is done by the sentences in conjunction with a context object--so, in good object-oriented design, the sentences do their own checks internally and pass some information to a common context object, which is where the code store is located. Every sentence has a check method that performs a semantic check, both internally and in context, being passed the Context object.

This occurs after a successful parse; the last action (or the only action, for some sentences) is to store the sentence in the code store as the final semantic action. During the second, independent semantic analysis phase, a separate class for semantics is used--the Analyzer class, which processes the sentences in the code store.
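The two-phase design above can be sketched as follows. This is a hedged illustration, not the Mynx compiler itself: the Context, Assign, and Analyzer names and the particular checks are invented for the example; only the shape (check during parsing, store as the last semantic action, Analyzer over the code store afterward) comes from the text.

```python
# Sketch of the concurrent check plus second-pass Analyzer.
# All names and checks are hypothetical, for illustration only.

class Context:
    """Common context shared by all sentences; holds the code store."""
    def __init__(self):
        self.code_store = []   # sentences stored for the second pass
        self.errors = []

class Assign:
    """An example sentence kind with its own internal check."""
    def __init__(self, name, value):
        self.name, self.value = name, value

    def check(self, context):
        # Internal semantic check, done concurrently with parsing...
        if not self.name.isidentifier():
            context.errors.append(f"bad name: {self.name}")
        # ...then the last semantic action: store the sentence.
        context.code_store.append(self)

class Analyzer:
    """Second, independent semantic analysis over the code store."""
    def __init__(self, context):
        self.context = context

    def analyze(self):
        seen = set()
        for sentence in self.context.code_store:
            if sentence.name in seen:
                self.context.errors.append(f"redefined: {sentence.name}")
            seen.add(sentence.name)

ctx = Context()
Assign("x", 1).check(ctx)
Assign("x", 2).check(ctx)
Analyzer(ctx).analyze()
# ctx.errors == ["redefined: x"]
```

Note the division of labor: checks local to one sentence happen in check during the parse, while checks that need the whole program (like the duplicate definition here) wait for the Analyzer pass.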

External Semantics of Type and Inclusion--Language Design and Compiler Implementation



In a previous blog entry I left open a question about the Mynx semantics of type inclusion, class name, and namespace. The entry posed the question: "Are Mynx classes unique by the class name alone, or within the namespace--the module and class name together?" I discussed the trade-offs of each approach. But in contemplating the question, one important point emerged in the area of language design and compiler implementation.

The point is about making a language design decision that impacts the compiler implementation of the feature resolved by that decision. A feature can be permissive, or open, for flexibility, or constrained, or closed, for efficiency. Flexibility gives some latitude in using the language but increases compiler complexity, whereas a constraint puts limits on using the language but makes the compiler simpler. There is a correlation between language design and compiler complexity.

Note that some features in some languages seem like great simplifications. Take, for example, no declaration before use: it simplifies the compiler or interpreter, but it creates a headache for the software developer, who must track where a variable or constant is first used, and its type. Although it simplifies the compiler, the complexity is shifted from compile-time to run-time.

For example, Pascal insists on declaration before use, which greatly simplifies the compiler--but it is a restrictive language feature. Later Pascal implementations deviated and allowed forward declarations (which eventually became part of the language standard), and a few allowed external declarations (an idea carried further in Modula-2, Wirth's successor to Pascal). The decision to constrain declaration before use impacted the compiler, making it very simple to implement. But the constraint could have been relaxed later, with more sophisticated Pascal compilers, while still remaining backward compatible with existing Pascal code.

Language Design Constraints--Easier to Relax than Restrict



A principle of programming language features: "It is easier to rescind or relax a language design constraint than to impose a constraint later on a programming language."

The difficulty is that a language design feature is sometimes set in stone, like the constraint on declaration before use in Pascal. Wirth could have later extended Pascal into a Pascal2 or Pascal++ or Pascal#, relaxing the constraint while remaining backward compatible with the original Pascal. C++ by Stroustrup did this to great effect with the syntax--backward compatibility with C plus new features--whereas Cox's Objective-C was not syntactically backward compatible (grafting unfamiliar Smalltalk syntax onto C-style syntax...). There are other philosophical differences, but that is not the point (it reminds me of quibbling over cement vs. concrete, meter vs. foot, Ford vs. Chevy, Coke vs. Pepsi, ad nauseam...)--the point is that constraint versus flexibility of a language feature is important. Source code written under a constrained language feature can be upwardly compatible with a compiler that handles a more flexible version of the feature, and not vice-versa.

The question of namespace, class, and uniqueness is resolved on the class name--a class is required to be unique by name, not by namespace. Namespaces are there to organize classes, but a class is unique on name alone; this is a language feature imposed as a constraint.

How does this language design question--is a class unique on the namespace in conjunction with the class name, or on the class name alone--impact the Mynx compiler? Uniqueness on name simplifies the resolution of types: a type alias such as Int, Real, or String resolves to a unique name, with namespaces used only for organization, whereas uniqueness on namespace requires resolution--resolving the type name. Duplicate class names would lead to multiple candidate types for a type alias: Vector might be several different Vector classes across the namespaces known to the compiler.

During type annotation, when the full type name and type information are annotated onto each sentence, each use of a type must be checked to eliminate multiple candidate types for the type alias. By the end of the semantic phase of type annotation and compatibility checking, every type alias in a declaration must resolve to exactly one type. There is added complexity in resolving each type alias to a single class, and then verifying that all the type aliases have resolved to a single class.
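The alias-resolution step can be sketched as a lookup across namespaces that fails loudly on ambiguity. This is a hypothetical illustration, assuming invented names (resolve_alias, a dict of namespaces) and an invented error format--only the mynx.core/mynx.io-style namespace naming comes from the post.

```python
# Sketch of resolving a type alias to exactly one class across namespaces.
# Function name, data layout, and error wording are hypothetical.

def resolve_alias(alias, namespaces):
    """Return (namespace, class) for the single class matching the alias."""
    candidates = [(ns, cls)
                  for ns, classes in namespaces.items()
                  for cls in classes
                  if cls == alias]
    if not candidates:
        raise LookupError(f"unknown type: {alias}")
    if len(candidates) > 1:
        # The ambiguity that uniqueness-by-name turns into an error:
        where = ", ".join(ns for ns, _ in candidates)
        raise LookupError(f"ambiguous type {alias}: found in {where}")
    return candidates[0]

namespaces = {
    "mynx.core": ["Int", "Real", "String", "Vector"],
    "mynx.math": ["Vector"],   # a second Vector: resolution is ambiguous
}

ns, cls = resolve_alias("Int", namespaces)   # resolves uniquely
try:
    resolve_alias("Vector", namespaces)      # more than one candidate
except LookupError:
    pass  # reported as an ambiguity error, as described above
```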

The Design Trade-Off in Mynx Class to Type Alias



The trade-off chosen is that each class is unique on the class name, not the namespace. This simplifies the semantics of namespaces and classes--a type alias that has more than one candidate class from different namespaces is reported as a unit context error: a type reference ambiguity, where a type alias resolves to more than one type. In the language design of Mynx (and now added to the Mynx Programming Language Manual (MPLM)), what was an open feature is now a language design constraint. But, to return to the original point, the constraint can later be relaxed, allowing the flexibility of a class being unique by namespace rather than by name. The stricter Mynx classes and programs will then be upwardly compatible with the new compiler--but for the initial implementation, some trade-offs are necessary to simplify things. Later the constraint can be relaxed and the ability to resolve multiple type aliases incorporated into the Mynx compiler.

Because a class is unique by name rather than by namespace, Mynx classes and programs can use absolute namespace inclusion for a specific class, and avoid duplicating class names found in the Mynx language's own namespaces--such as the classes in mynx.core and mynx.io. Later, this constraint can be relaxed once the compiler is up and running.
