Think Out Loud: Mynx Programming Language Design

Wednesday, April 01, 2009

Mynx nor Never, Not

Mynx Compiler Development

I've been doing some work on the Mynx compiler, but because the Acer 3000 laptop I'm using is malfunctioning and erratic, my development efforts have slowed to a crawl. I keep reinstalling Windows XP, but each time I then have to reinstall the tools and the software directories for the Mynx compiler, documents, and tools.

Given my current economic situation (unemployed), I simply cannot buy another laptop or notebook.

Development at a Crawl

So Mynx is not cancelled, but it is going to have to wait. The current economy limits my options for another job as a software engineer, and living in south Mississippi, I have few options, and consequently so does Mynx.

To quote the character Robert Goren from Law and Order: Criminal Intent, "Bing! Reality"

Unfortunately, I need a Windows PC to develop the parallel compiler for the .NET CLR in C#.

So the software development goes on the back burner, to be returned to someday.

Quod fiat, fiat (what will be, will be).

Labels: malfunctioning laptop, mynx, Mynx compiler

Sunday, November 02, 2008

Start with a Little MOXI, end with a Compiler?

Problem - Global Scope

A unit (a class or program) in Mynx is compiled as a whole, but the complication is that an element in a unit (a method, attribute, or operator overload) does not have to be declared before use. Effectively, all elements in a unit share a global scope, so any attribute or method is visible even if it is referenced before it is declared.

Declaration before use would simplify type checking, the last stage of semantic analysis, by avoiding the need to store a reference to a unit element and later verify that the element exists, is unique, and is type compatible. In short, global scope and visibility require more than one pass. As Pascal shows, the design of a programming language impacts its compiler, and Pascal was elegant for its one-pass compilation.
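To make the one-pass versus two-pass point concrete, here is a minimal Java sketch, not code from the Mynx compiler. It assumes a unit can be reduced to a flat sequence of items, each either a declaration of an element name or a reference to one; the names `UnitItem`, `onePass`, and `twoPass` are hypothetical. A one-pass checker (Pascal-style declaration before use) rejects a reference to an element declared later, while a two-pass checker first collects every declaration in the unit and only then checks references.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a unit is a sequence of items, each either a
// declaration of an element name or a reference to one.
final class UnitItem {
    final boolean isDeclaration;
    final String name;

    UnitItem(boolean isDeclaration, String name) {
        this.isDeclaration = isDeclaration;
        this.name = name;
    }
}

final class ForwardRefDemo {
    // One pass: a reference is valid only if its declaration was already seen.
    static List<String> onePass(List<UnitItem> unit) {
        Set<String> declared = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (UnitItem item : unit) {
            if (item.isDeclaration) {
                declared.add(item.name);
            } else if (!declared.contains(item.name)) {
                errors.add("undeclared: " + item.name);
            }
        }
        return errors;
    }

    // Two passes: pass 1 collects every declaration (global scope),
    // pass 2 checks each reference against the complete set.
    static List<String> twoPass(List<UnitItem> unit) {
        Set<String> declared = new HashSet<>();
        for (UnitItem item : unit) {
            if (item.isDeclaration) {
                declared.add(item.name);
            }
        }
        List<String> errors = new ArrayList<>();
        for (UnitItem item : unit) {
            if (!item.isDeclaration && !declared.contains(item.name)) {
                errors.add("undeclared: " + item.name);
            }
        }
        return errors;
    }
}
```

A unit that references `helper` before declaring it fails the one-pass check but passes the two-pass check, which is exactly the flexibility Mynx wants and the extra pass it costs.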

But the trade-off, as Kernighan strongly points out in "Why Pascal is Not My Favorite Programming Language," is declaration before use. To quote: "The result is that a typical Pascal program reads from the bottom up - all the procedures and functions are displayed before any of the code that calls them, at all levels."

An aside: I had a teacher in college who loved Pascal and detested C (it was funny to find for loops in C with an index starting at 1, and other Pascalized statements, in a Unix software project). The teacher pointed out, "but you know where it [the declaration] is at..." That is good for a learning language, and it simplifies the compiler, but it is too rigid and sacrifices flexibility for software development. Just for fun (if you can call it that), take a Java, C++, or C# class and move all the declarations to the beginning of the class or method; now you know where they are...

Kernighan also makes the point: "This means that all declarations of one kind (types, for instance) must be grouped together for the convenience of the compiler, even when the programmer would like to keep together things that are logically related so as to understand the program better."

Convenience for the compiler writer and the language designer has an impact on the software developer, and a programming language, once created, is widely used, for better or worse.

MOXI

MOXI is an XML file that contains the semantic information of a Mynx class API. Because Mynx is compiled to either a JVM .class file or a CLR assembly, a platform-neutral means of storing class API information is needed instead of relying on the .class file itself.

Type checking uses a MOXI file and a MOXI object: the MOXI file is read to create a MOXI object, which can be searched to check a type, whether a class contains an element, the mode and kind of that element, and operator overloads. Hence the MOXI object is the software representation of the external semantics used in type checking.
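The read-then-search idea can be sketched in Java with the JDK's built-in DOM parser. The XML layout here is an assumption for illustration only (a `moxi` root with a `class` attribute, child elements whose tag is the element kind, and `operator` entries); the real MOXI schema is not shown in this post, and the `MoxiObject` class is hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical in-memory form of a MOXI file. The tag and attribute names
// are assumptions, not the real MOXI schema.
final class MoxiObject {
    final String className;
    private final Map<String, String> elementKinds = new HashMap<>(); // name -> kind
    private final Map<String, String> elementModes = new HashMap<>(); // name -> mode
    private final Set<String> operators = new HashSet<>();

    private MoxiObject(String className) {
        this.className = className;
    }

    // Read the XML text and index every element so later checks are lookups.
    static MoxiObject fromXml(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed MOXI XML", e);
        }
        Element root = doc.getDocumentElement();
        MoxiObject moxi = new MoxiObject(root.getAttribute("class"));
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            Element e = (Element) node;
            if (e.getTagName().equals("operator")) {
                moxi.operators.add(e.getAttribute("symbol"));
            } else {
                // The tag name is the element kind (method, attribute, ...).
                // One entry per name: overloads are not modeled in this sketch.
                moxi.elementKinds.put(e.getAttribute("name"), e.getTagName());
                moxi.elementModes.put(e.getAttribute("name"), e.getAttribute("mode"));
            }
        }
        return moxi;
    }

    boolean hasElement(String name) { return elementKinds.containsKey(name); }
    String kindOf(String name) { return elementKinds.get(name); }
    String modeOf(String name) { return elementModes.get(name); }
    boolean hasOperator(String symbol) { return operators.contains(symbol); }
}
```

Parsing once and answering questions from maps keeps the type checker independent of XML; a real MOXIObject would also need signatures to handle overloaded methods.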

Approach to Solution

There are two possible approaches to a solution:

batch mode checking--store and check types.

on-the-fly checking, with code synthesis.

Batch

The batch mode approach is problematic for memory usage and forward referencing. Every type check that is deferred must be stored and checked later, and for each reference the line and position must be tracked so that an error can be reported where it occurred. Deferring type checks and external semantic checks, while possible, is infeasible because of the complexity it adds to a single-pass compiler.
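A small Java sketch of what batch mode has to carry around may help show the cost. The `DeferredCheck` and `BatchChecker` classes are hypothetical; they assume a deferred check needs only the referenced name, the expected type, and the source line and position. Every pending check stays in memory until the whole unit has been seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record of one deferred check: the name, the expected type,
// and where in the source the reference appeared.
final class DeferredCheck {
    final String name;
    final String expectedType;
    final int line;
    final int position;

    DeferredCheck(String name, String expectedType, int line, int position) {
        this.name = name;
        this.expectedType = expectedType;
        this.line = line;
        this.position = position;
    }
}

final class BatchChecker {
    private final List<DeferredCheck> pending = new ArrayList<>();

    // Called while parsing: nothing can be verified yet, so store everything.
    void defer(String name, String expectedType, int line, int position) {
        pending.add(new DeferredCheck(name, expectedType, line, position));
    }

    int pendingCount() {
        return pending.size();
    }

    // Called once the whole unit is known; 'declared' maps element name -> type.
    List<String> checkAll(Map<String, String> declared) {
        List<String> errors = new ArrayList<>();
        for (DeferredCheck c : pending) {
            String actual = declared.get(c.name);
            if (actual == null) {
                errors.add(c.line + ":" + c.position + " '" + c.name + "' not found");
            } else if (!actual.equals(c.expectedType)) {
                errors.add(c.line + ":" + c.position + " '" + c.name + "' is "
                        + actual + ", expected " + c.expectedType);
            }
        }
        return errors;
    }
}
```

The list grows with every reference in the unit, and the position bookkeeping is needed only because the check happens far from where the reference was parsed.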

On the Fly

A workable solution is a two pass system. For each pass:

1st pass, create MOXI object, parse sentence, sentence semantic checks, buffer sentences.

2nd pass, unit type checks, code synthesis, store MOXI object for class.

After the first pass, any sentence semantic errors prevent the second pass: if there are errors in the discrete sentences of the source code being compiled, there is no point in continuing. In the first pass, a MOXI object is created. Another important step is to create a typeMap object, so that for a short, discrete name like Type, the complete namespace of the associated class, such as com.williamgilreath.mynx.Type, is available.
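A typeMap can be as simple as a map from short name to full name. This Java sketch is hypothetical (the `TypeMap` class, its methods, and the rule that the short name is the text after the last dot are assumptions); it also refuses to let two different classes claim the same short name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typeMap: maps a short class name to its complete namespace.
final class TypeMap {
    private final Map<String, String> shortToFull = new HashMap<>();

    // Register a fully qualified name; the short name is the text after the last '.'.
    // Returns false if a different class already claimed that short name.
    boolean register(String fullName) {
        String shortName = fullName.substring(fullName.lastIndexOf('.') + 1);
        String existing = shortToFull.putIfAbsent(shortName, fullName);
        return existing == null || existing.equals(fullName);
    }

    // Full name for a short name, or null if the short name is unknown.
    String fullNameOf(String shortName) {
        return shortToFull.get(shortName);
    }
}
```

For example, after `register("com.williamgilreath.mynx.Type")`, a lookup of `Type` yields the full name, and a later attempt to register a different class ending in `.Type` is rejected.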

The second pass begins by checking the newly created MOXI object against the MOXI objects of any classes it inherits from; if the classes are not mutually disjoint, compilation terminates. With no MOXI object errors, compilation proceeds, using the class's own MOXI object for semantic checks. In tandem, code synthesis creates code for each sentence and appends it to the overall CodeStore object.
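A sketch of the two second-pass structures, in Java and under stated assumptions: "mutually disjoint" is taken here to mean that no element name appears in more than one of the classes (the class itself and each inherited class), and `SecondPass` and this `CodeStore` are hypothetical stand-ins, not the compiler's classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SecondPass {
    // Each set holds the element names of one class (own class and inherited ones).
    // Returns every name found in more than one set; empty means mutually disjoint.
    static Set<String> overlaps(List<Set<String>> classes) {
        Set<String> seen = new HashSet<>();
        Set<String> clashes = new TreeSet<>(); // sorted for stable error messages
        for (Set<String> elements : classes) {
            for (String name : elements) {
                if (!seen.add(name)) {
                    clashes.add(name); // already seen in an earlier class
                }
            }
        }
        return clashes;
    }
}

// Hypothetical CodeStore: synthesized code for each sentence is appended in order.
final class CodeStore {
    private final List<String> lines = new ArrayList<>();

    void append(String synthesized) {
        lines.add(synthesized);
    }

    String contents() {
        return String.join("\n", lines);
    }
}
```

A non-empty result from `overlaps` would terminate compilation; otherwise each checked sentence contributes its synthesized code to the store.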

At the end of the second pass, type checking and code synthesis are complete. The MOXI object is then exported to an external XML file. Compilation is not finished, however: the CodeStore object must be passed to the high-level language compiler, the underlying compiler that compiles the synthesized standard source code to platform code.

If a compiler option to compile, analyze, and synthesize but not build (such as -nobuild) were passed at runtime, all the phases of the compiler (scanning, partitioning, parsing, analysis, synthesis) would be complete, and a MOXI file and HLL source code would be the results of the compilation.
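The effect of such an option can be sketched in a few lines of Java. The phase names follow the text, with "build" standing for the hand-off to the host language compiler; `PhaseRunner` and the exact handling of `-nobuild` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PhaseRunner {
    // Phases as listed in the text; "build" is the hand-off to the HLL compiler.
    static final List<String> PHASES = List.of(
            "scan", "partition", "parse", "analyze", "synthesize", "build");

    // Returns the phases that would run for the given command-line arguments.
    static List<String> phasesFor(String... args) {
        boolean noBuild = Arrays.asList(args).contains("-nobuild");
        List<String> run = new ArrayList<>(PHASES);
        if (noBuild) {
            run.remove("build"); // stop with the MOXI file and HLL source only
        }
        return run;
    }
}
```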

Labels: code synthesis, external semantics, moxi, mynx semantic, mynx semantic check

Saturday, July 19, 2008

High-Level Source Code Synthesis and the Host Language

High level source code synthesis

The Mynx programming language uses high-level language (HLL) code synthesis. Instead of generating a binary directly for a JVM .class file or CLR assembly, Java source (of the most elementary kind, to avoid depending on a specific version of Java) and C# source (with the same restriction) are created: raw source code.

Later, a C/C++ compiler could be used to generate binary libraries and executables for a platform. Code synthesis is code generation, but I use different terminology to emphasize the distinction between code generation and code synthesis.

Raw Source Code

The raw source code (which must be a valid class in the host language) is then compiled into a binary using a back-end compiler. The binary can then be moved to another directory, put in an archive file, or processed by another library.

The CodeDOM is a really cool idea in .NET, one that Java finally caught up with and formalized in Java 6 with the Java Compiler API (javax.tools), although before that there were workarounds and libraries in Java to compile code on the fly. Effectively, instead of assembly language that is interpreted or assembled into a binary form, the HLL source code is the assembly.
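For the Java back end, compiling synthesized source on the fly can look like the following sketch, which uses the standard `javax.tools` API (`ToolProvider.getSystemJavaCompiler`, `SimpleJavaFileObject`). The `StringSource` and `HostCompiler` names are hypothetical, and the sketch assumes a full JDK is present (on a bare JRE the system compiler is `null`). Class files go to a fresh temporary directory, which this sketch does not clean up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// A synthesized source file held in memory instead of on disk.
final class StringSource extends SimpleJavaFileObject {
    private final String code;

    StringSource(String className, String code) {
        super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
              Kind.SOURCE);
        this.code = code;
    }

    @Override
    public CharSequence getCharContent(boolean ignoreEncodingErrors) {
        return code;
    }
}

final class HostCompiler {
    // Compiles synthesized Java source with the JDK's built-in compiler.
    // Returns true on success; diagnostics are collected rather than printed.
    static boolean compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler (running on a JRE?)");
        }
        try {
            Path outDir = Files.createTempDirectory("mynx-out");
            DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
            JavaCompiler.CompilationTask task = compiler.getTask(
                    null, null, diagnostics,
                    List.of("-d", outDir.toString()),
                    null,
                    List.of(new StringSource(className, source)));
            return task.call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a compiler, the collected diagnostics would be mapped back to Mynx source positions rather than shown to the user as host-language errors.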

Advantages of Raw Source Code Synthesis

The advantages of using HLL source code synthesis are:

simplicity - simplifies code synthesis or code generation.

flexibility - code synthesis can be adapted for features.

versatility - synthesized code can add code for tracing, debugging, profiling.

utility - HLL source code can be used in the host language (in theory, making the languages interoperable).

abstraction - source code can use HLL features without specifics (such as reflection).

Disadvantages of Raw Source Code Synthesis

There is no free lunch, so it follows that there are disadvantages to HLL code synthesis. The disadvantages of using HLL source code synthesis are:

collision - Java and C# have a rooted class hierarchy, so the names of Mynx methods and attributes might collide with inherited members of the host language during HLL code synthesis.

entanglement - quirks in HLL features can cause problems; code synthesis for one host language might not be the same as for another.
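One common way to handle the collision problem is name mangling. This Java sketch is illustrative only: the `mynx_` prefix, the `NameMangler` class, and the short keyword list are hypothetical choices. The `OBJECT_MEMBERS` set does list the methods every Java class inherits from `java.lang.Object`; C# has the analogous `System.Object` members (`Equals`, `GetHashCode`, `ToString`, `GetType`). A real compiler might instead deliberately map some Mynx methods onto these, for example a Mynx string conversion onto `toString`.

```java
import java.util.Set;

final class NameMangler {
    // Methods every Java class inherits from java.lang.Object; a Mynx element
    // with one of these names could clash after synthesis.
    private static final Set<String> OBJECT_MEMBERS = Set.of(
            "equals", "hashCode", "toString", "getClass", "notify", "notifyAll",
            "wait", "clone", "finalize");

    // Some Java reserved words that might be ordinary identifiers in Mynx
    // (illustrative, not a complete list).
    private static final Set<String> JAVA_KEYWORDS = Set.of(
            "goto", "const", "native", "transient", "volatile", "synchronized",
            "strictfp", "throws");

    // Hypothetical scheme: prefix clashing names so the synthesized Java stays
    // legal and never accidentally overrides an Object method.
    static String javaName(String mynxName) {
        if (OBJECT_MEMBERS.contains(mynxName) || JAVA_KEYWORDS.contains(mynxName)) {
            return "mynx_" + mynxName;
        }
        return mynxName;
    }
}
```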

Summary

Like anything in engineering, there are trade-offs, but the disadvantages of HLL code synthesis, compared to its advantages, are not intractable. The really big advantage is that code synthesized in the HLL and compiled can be used on any platform the host language is available for. Another big advantage is that as improvements are made in the host HLL and its platform, those advantages translate upward (no pun intended) to the language implemented in the host HLL. Together, the advantages make compensating for and working around the disadvantages worthwhile for programming language implementation.

Labels: code generation, code synthesis, high-level language, mynx, Mynx compiler, source code synthesis

Saturday, June 07, 2008

Mynx Code Synthesis Compiling to a High-level Language not Platform

The Mynx compiler uses a high-level language as the means of implementation: Mynx source code is analyzed, processed, and compiled into a high-level programming language, in this case the "Big Three" of Java, C#, and C++. This avoids platform specificity, and having to map the high-level language structures onto the platform or operating system. The original Eiffel was compiled into the C programming language; as Bertrand Meyer put it (to paraphrase), using the C language as a form of high-level assembly language.

Trade-offs in Software Engineering

Compiler construction and programming language design are computer science (I like the term "informatics," or Informatik from German), but science when applied is engineering, and in engineering there is the concept of a trade-off. Computer science, or programming language engineering, is no exception.

Using the HLL-code approach to implement Mynx is simpler in approach (and when effort is simplified and complexity reduced, so much the better...), as a Mynx class or program is translated or mapped to the HLL.

The Trade-off

Compiling to an HLL gains simplicity and portability, but in any trade-off something is paid or lost:

semantics - code synthesis must avoid host language idiosyncrasies and semantic inconsistencies, yet semantic inconsistencies remain between the host language and other languages.

boiler-plate code - code to use the implemented host language unit as part of the "overhead" to manage and manipulate the code--such as for reflection, instantiation, serialization.

In each case a 1:1 mapping is unattainable, and the synthesized code can become more complex as a result. The trade-off is code complexity for using a host high-level language.
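To show where the boiler-plate comes from, here is a hypothetical Java generator (`JavaSynth`) for a Mynx class with only attributes. It assumes Mynx types have already been mapped to Java type names. Only the attribute lines correspond one-to-one to Mynx source; the `Serializable` marker, `serialVersionUID`, and the public no-argument constructor (so the class can be instantiated reflectively) are host-language overhead.

```java
import java.util.List;

final class JavaSynth {
    // Emits Java source for a hypothetical Mynx class with the given attributes.
    // Each attribute is {javaType, name}; type mapping is assumed done earlier.
    static String synthesize(String className, List<String[]> attributes) {
        StringBuilder out = new StringBuilder();
        // Boiler-plate: serialization support required by the host platform.
        out.append("public class ").append(className)
           .append(" implements java.io.Serializable {\n");
        out.append("    private static final long serialVersionUID = 1L;\n");
        // The only part that maps 1:1 from the Mynx source.
        for (String[] attr : attributes) {
            out.append("    private ").append(attr[0]).append(' ')
               .append(attr[1]).append(";\n");
        }
        // Boiler-plate: public no-argument constructor for reflective instantiation.
        out.append("    public ").append(className).append("() { }\n");
        out.append("}\n");
        return out.toString();
    }
}
```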

Host Language

One obvious but very important requirement is that the synthesized host language code for a Mynx unit must compile without any problem on the host language compiler; the trade-off is that it must also meet the semantic expectations for Mynx.

Niklaus Wirth, in a paper about programming language design (Niklaus Wirth, On the Design of Programming Languages. In Information Processing 74, pages 386-393), states that a user expects the compiler to enforce every language rule. In this case that means the Mynx rules, but also the host language rules, without the host language semantics overriding the Mynx rules. Wirth's statement is very apt, but implementing the compiler with a host language and a host language compiler requires that both the synthesized host source code and the host language compiler be transparent to the user. Otherwise, the host language semantics and nuances will bubble up into the Mynx source code, and then it is a Mynx-host hybrid programming language, not pure Mynx.

Another, less obvious consideration is that code synthesis for one host language might entail different semantics than for another. For example, synthesized Java code is similar to but different from C++, and similar to but different from C#. The Big Three programming languages are semantically at the same level, using classes, instances, and statics, but they are not quite the same. C++ does not have reflection or the interface construct, whereas C# and Java do (and this is not meant as a critique or criticism of C++). Different host languages will entail different semantics, and possibly different synthesized boiler-plate code.

Synopsis--The Most with Mynx and Host, Language

In summary, compiling Mynx and then synthesizing high-level host language code has a trade-off in additional code complexity: the code is not a 1-to-1, or bijective, Mynx:Host mapping. The semantic considerations of the host language, and the code needed to implement the housekeeping functionality, are the price of using an HLL host language.

Labels: code generation, code synthesis, mynx, Mynx compiler, programming language design

Tuesday, April 22, 2008

Got MOXI, What's Next: In Betwixt and Between the Second Compiler Phase

I have implemented the necessary MOXI functionality to read, write, and create instances of a MOXI object for the next, or second, compiler stage: TEAC (type existence, annotation, compatibility). But between the first and second stages of the compiler is an intermediate step that translates types into full type names, the pre-stage of type existence analysis.

Need and Necessity for Type Existence Analysis

The principle of type necessity is this: if a type is not found, why bother to check existence internally (declaration before use in a method, or declaration of an attribute) and then annotate and type check? A missing type is a fatal semantic error, like a fatal error in scanning, partitioning, or parsing: the lexeme, sentence, or syntax is flawed, so compilation cannot continue.

Type Existence Analysis the Pre-TEAC Step

Check existence and create moxiMap of MOXIObject for absolute namespace types

Resolve existence and create moxiMap of MOXIObject for relative namespace types

An important point is the distinction between resolving a class or type and checking its external existence. Resolution creates the full name, such as mynx.core.Integer, for a class or type given by its short name, and then checks that a MOXI file exists for it. Checking starts from a full name and only verifies that the class or type exists as an external MOXI file, avoiding the full-name creation step.

Absolute Namespace Type Checking

For classes included with an absolute namespace, the process is simply to check whether the files exist within the compiler's search path for MOXI files (the MOXI path), and then read each one as a MOXIObject. The internal compiler structure that holds the instantiated MOXIObjects is (quite obviously...) a moxiMap.

Relative Namespace Type Resolution

Relative namespaces must be resolved; that is, a short type or class name such as Int, Ordinal, Real, or String must be appended to relative namespace inclusions such as mynx.core.* or mynx.trove.*, so resolution finds the full name, such as mynx.core.Int for the short type alias Int.

Naturally, the check is to see whether a MOXI file exists. Remember that the order of declaration determines which namespace is used, and a class name is unique within the unit; hence a short class name like Int resolves to exactly one full name, taken from the first relative namespace, in declaration order, that contains it. So a unit cannot end up with both mynx.core.Int and mynx.type.Int: the short name is unique, and resolution cannot produce two inclusions. First come, first resolved.
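First-come resolution and plain checking can be sketched in Java as follows. `NamespaceResolver` is hypothetical, inclusions are assumed always to end in `.*`, and the `moxiExists` predicate stands in for whatever actually looks for a MOXI file on the MOXI path.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class NamespaceResolver {
    // Resolve a short name against relative inclusions such as "mynx.core.*",
    // in declaration order; the first candidate with a MOXI file wins.
    static Optional<String> resolve(String shortName, List<String> inclusions,
                                    Predicate<String> moxiExists) {
        for (String inclusion : inclusions) {
            // "mynx.core.*" -> "mynx.core." + "Int"
            String prefix = inclusion.substring(0, inclusion.length() - 1);
            String fullName = prefix + shortName;
            if (moxiExists.test(fullName)) {
                return Optional.of(fullName);
            }
        }
        return Optional.empty(); // not found: a fatal semantic error
    }

    // Absolute names skip resolution and are only checked.
    static boolean check(String fullName, Predicate<String> moxiExists) {
        return moxiExists.test(fullName);
    }
}
```

With inclusions `mynx.core.*` then `mynx.type.*`, and MOXI files for both `mynx.core.Int` and `mynx.type.Int`, the short name `Int` resolves to `mynx.core.Int` because that inclusion was declared first.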

Again, like absolute namespaces, a MOXIObject is instantiated and stored in moxiMap.

No MOXI, No Problem--No Compilation

Should any MOXIObject not be found, or should an error occur while instantiating a MOXIObject (unmarshaling it from XML to an instance object within the compiler), the semantic error is fatal. The compiler will continue trying to instantiate the other MOXIObjects, but will not proceed with semantic checks or the rest of the compilation. The error is fatal because a specific class type could not be found, and the process cannot continue without the semantic information in its MOXI file.
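The "keep loading, but never proceed" policy can be sketched in Java. `MoxiLoader` is hypothetical and generic over the loaded type; the `load` function stands in for reading and unmarshaling a MOXI file, returning `null` when the file is missing and throwing when it cannot be read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class MoxiLoader<T> {
    final Map<String, T> moxiMap = new LinkedHashMap<>();
    final List<String> fatalErrors = new ArrayList<>();

    // Try every type, even after a failure, so that all missing or broken
    // MOXI files are reported in one run.
    boolean loadAll(List<String> fullNames, Function<String, T> load) {
        for (String name : fullNames) {
            try {
                T moxi = load.apply(name);
                if (moxi == null) {
                    fatalErrors.add("MOXI not found: " + name);
                } else {
                    moxiMap.put(name, moxi);
                }
            } catch (RuntimeException e) {
                fatalErrors.add("MOXI unreadable: " + name + " (" + e.getMessage() + ")");
            }
        }
        // Any error is fatal: the caller must not proceed to semantic checks.
        return fatalErrors.isEmpty();
    }
}
```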

Compiler Interface to External Environment

An interesting and unexpected complication is the implementation of the base functionality for checking and resolving a namespace. The complication is accessing the external environment: the compiler can potentially become strongly coupled to the runtime environment or platform.

Host from Modula-2

A compiler book (and an excellent one at that...), "Compiler Construction: A Recursive Descent Model" by John Elder (1994), examines this possibility in its implementation of the programming language Model. As the author puts it (p. 37, Chapter 4.4), "...an interface to these facilities of the compiler's environment."

Having recognized the approach to decoupled access to the platform, the remaining question is how to realize it: an interface with an implementing class, a singleton class, and so on.

Namespace, Where Art Thou Class?

Another design consideration is the namespace to place the Host class in--such as proxima.semantic or proxima.compiler, etcetera. Since the Host class allows platform independent access to the resources of the environment it is not necessarily specific to semantic analysis. Thus the namespace to place and organize the Host class is a concern for future compiler design and development.

The decision is two-fold and simple: use a specific semantics host class for semantic access, and keep it in the proxima.semantic namespace. The class will possibly be named moxiHost, hostMOXI, or perhaps ExternMOXI.
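In Java, the host idea might look like the sketch below. `MoxiHost` and `InMemoryMoxiHost` are hypothetical names in the spirit of moxiHost/ExternMOXI, and the two methods are an assumed minimal surface. The semantic checks would depend only on the interface, so an in-memory host can stand in during testing and a file-system host can be swapped in without touching them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical host interface: the semantic checks see only this,
// never the file system or platform directly.
interface MoxiHost {
    boolean exists(String fullName);            // is there a MOXI file for this type?
    Optional<String> readXml(String fullName);  // its XML text, if any
}

// In-memory host, useful for testing the compiler without a real MOXI path.
final class InMemoryMoxiHost implements MoxiHost {
    private final Map<String, String> files = new HashMap<>();

    void put(String fullName, String xml) {
        files.put(fullName, xml);
    }

    @Override
    public boolean exists(String fullName) {
        return files.containsKey(fullName);
    }

    @Override
    public Optional<String> readXml(String fullName) {
        return Optional.ofNullable(files.get(fullName));
    }
}
```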

MOXI Path Parameter

To determine whether a particular MOXI file exists, the compiler must know where to look, hence a MOXI path parameter. The MOXI path parameter is used to search for the MOXI files of classes, including where Myna (Mynx Archive) files are found. How the MOXI path parameter is provided (command-line parameter, configuration file) is irrelevant, since it is accessed through the hosting class.
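A file-system search along a MOXI path might look like this Java sketch. The directory layout (`mynx.core.Int` stored as `mynx/core/Int.moxi` under some root) and the `.moxi` extension are assumptions, `MoxiPath` is hypothetical, and searching inside Myna archives is left out. The path string is split on the platform separator (`:` or `;`), like a Java classpath.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class MoxiPath {
    private final List<Path> roots = new ArrayList<>();

    // Parse a path string like "lib/moxi:/opt/mynx/moxi" using the platform separator.
    MoxiPath(String moxiPath) {
        for (String entry : moxiPath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                roots.add(Paths.get(entry));
            }
        }
    }

    // Map mynx.core.Int to mynx/core/Int.moxi and return the first match on the path.
    Optional<Path> find(String fullName) {
        String relative = fullName.replace('.', '/') + ".moxi";
        for (Path root : roots) {
            Path candidate = root.resolve(relative);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

As with classpaths, the first root that contains the file wins, which keeps lookups deterministic when the same class appears in more than one directory.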

Labels: existence semantics, moxi, Mynx compiler, mynx semantic

Monday, March 24, 2008

Uniqueness in Method a Sanity Semantic Check for a Unit Method

In the semantic checks for uniqueness in context, declarations within the semantic context of a method were originally not checked for uniqueness. I originally thought the existence checks were complete, but then I realized that for a method, the internal declarations require a uniqueness check.

A Premise of Semantics

The premise is that a fatal semantic error terminates compilation: if the compiler finds an ambiguity from a non-unique declaration within a method, it bails out. To put it more colloquially, if the Mynx source code has a defect, the compiler must reject; if the Mynx code does not semantically fit, the compiler must acquit...err, quit. This avoids wasting effort and compiler cycles only to detect, later, an error that terminates compilation anyway. Better to stop compiling sooner rather than later.

Concept - What a Concept

Getting from the premise to the concept is simple enough: within a class or program method, check whether local declarations are unique in context, enforcing a local uniqueness semantic constraint with a temporary symbol map.


class Exemplar is

    public methodDeclCheck is

        var Int x to default;

        var Ord y to null;

        //semantic method context error:
        //variable 'x' not unique in method
        var Str x to default;

    end methodDeclCheck;

end class;

From Idea to Implementation in Context

The implementation uses the Context object to create a temporary symbol map, and a long-term type table to gather the types used in declarations within a method. A method can be a method, constructor, or destructor. The temporary symbol map is transitory: it is created at the start of a method and disposed of at the end. As Mynx does not allow nesting of methods (method scope is explicit in the method header, not implicit in nesting within another method), the process is simple: start the symbol map, add each declaration while checking whether it is already declared, and close the symbol map. Variables declared in a method header (the parameters, initialized by the method invocation) are the first added to the symbol map.
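The start/declare/close cycle can be sketched in Java. `MethodContext` is a hypothetical stand-in for the compiler's Context object. Because Mynx methods do not nest, a single map, not a stack of scopes, is enough. Header parameters are declared first, and the type table outlives every method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical Context: a temporary per-method symbol map plus a long-term type table.
final class MethodContext {
    private Map<String, String> symbolMap;                 // name -> type, one method only
    private final Set<String> typeTable = new HashSet<>(); // every type seen in the unit
    private final List<String> errors = new ArrayList<>();

    // Start of a method: header parameters are the first declarations added.
    void startMethod(String method, Map<String, String> parameters) {
        symbolMap = new HashMap<>();
        parameters.forEach((name, type) -> declare(method, name, type));
    }

    // Add a local declaration, reporting a semantic error if the name is taken.
    void declare(String method, String name, String type) {
        typeTable.add(type);
        if (symbolMap.putIfAbsent(name, type) != null) {
            errors.add("variable '" + name + "' not unique in method " + method);
        }
    }

    // End of the method: the temporary map is discarded.
    void closeMethod() {
        symbolMap = null;
    }

    List<String> errors() { return errors; }
    Set<String> typeTable() { return typeTable; }
}
```

Run against the Exemplar class, the second declaration of `x` in methodDeclCheck produces one error, while reusing the name `x` in a different method is fine because each method gets a fresh map.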


class methodSymbolMap is

    public construct is           //=>Context start symbol map

        var myType to null;       //=>Context check declaration unique

    end construct;                //=>Context close symbol map


    public void doIt(in Int i) is //=>Context start symbol map, add parameter i first

        var Ordinal j to null;    //=>Context check unique in method

    end doIt;                     //=>Context close symbol map


    public void doIt is           //=>Context start symbol map
        to doIt(0);               //=>Context close symbol map


    public peerless is            //=>Context start symbol map
        implied;                  //=>Context close symbol map

end class;

For a class method, there are two variations: an implied method and a method equivalent. In either case, the symbol map is simply disposed of, as there is no method body with declarations. The type map stores all types or classes used, including those within method declarations.

And After Uniqueness in Method

Once the first pass is done and the uniqueness constraints hold for class or program elements and for the declarations within each method, the next stage is type existence, annotation, compatibility (TEAC). If there are no semantic errors in the first stage, the compiler is assured that types are internally unique and not replicated, so it can check existence semantics without any variations or complications relating to uniqueness in context.

Because the declarations within a method are unique, the existence checks need only focus on existence, first externally and then internally. If the declarations are not unique, the semantic error is fatal, and the compiler terminates compilation before proceeding to the next phase.

Labels: Mynx compiler design, mynx semantic, mynx semantic check, mynx semantics, uniqueness semantics

Sunday, March 16, 2008

The Catch-22 of MOXI--Need some Mynx to get MOXI

I've been down and out of late, but I'm continuing with Mynx, albeit slowly.

The Catch-22 with MOXI and Mynx

In implementing the external semantic checks for Mynx, one curious paradox has arisen. The external semantic information files for Mynx classes, MOXI files, are strict XML read into an internal MOXIObject. The Catch-22 is that a Mynx compiler is needed to create a MOXI file for a Mynx class, but to further develop, test, and implement the Mynx compiler, a set of MOXI files is needed. Doh! It is a chicken-and-egg paradox.

Resolution of the Paradox: A Bit of MOXI

Unlike philosophical or logical paradoxes and conundrums that have no immediately apparent solution, the solution here is simply to stop thinking in terms of a Mynx compiler creating MOXI files and MOXI files feeding a Mynx compiler.


    Mynx compiler => MOXI files, MOXI files => Mynx compiler, etc. ad nauseam...

The MOXI files are needed, and the paradox is that it seems MOXI files must only come from a Mynx compiler, and a MOXI file is needed to further test, implement, and develop the Mynx compiler. But a MOXI file need NOT come from a Mynx compiler. Realizing this, the Catch-22 is no longer a paradox. The implicit presumption is that a MOXI file must come from a Mynx compiler.

MOXI without Mynx

A MOXI file can be written by hand, which is exactly what I did while writing the MOXI-related classes. But hand coding an XML file is tedious if the MOXI file is for a class with many methods, constructors, and operators. The resolution is to use the introspection (reflection) capability of both Java and C#. Instead of a screen dump of a class's elements, the information is extracted and then transformed into a valid MOXI XML file, which semantically appears to describe a real Mynx class. I've implemented this capability and can now generate a MOXI file for any class in the Java classpath. For some classes, operator information can be added by hand editing the XML file; for example, the MOXI file for Java's java.lang.String can be edited so that the + operator is overloaded for concatenation in Mynx semantics.
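A minimal version of the reflection approach, using only standard `java.lang.reflect` calls, is sketched below. The XML element and attribute names are assumptions, not the real MOXI schema, and `MoxiGen` is hypothetical. Operator entries, such as `+` for `java.lang.String`, would still be added by hand afterward.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

final class MoxiGen {
    // Emit MOXI-style XML for the public methods of a Java class via reflection.
    // Tag and attribute names are assumptions, not the real MOXI schema.
    static String toMoxiXml(Class<?> type) {
        StringBuilder xml = new StringBuilder();
        xml.append("<moxi class=\"").append(type.getName()).append("\">\n");
        Method[] methods = type.getMethods(); // public methods, including inherited
        // getMethods() has no guaranteed order; sort for deterministic output.
        Arrays.sort(methods, Comparator.comparing(Method::getName)
                .thenComparing(Method::toString));
        for (Method m : methods) {
            xml.append("  <method name=\"").append(m.getName()).append('"')
               .append(" static=\"").append(Modifier.isStatic(m.getModifiers())).append('"')
               .append(" returns=\"").append(m.getReturnType().getName()).append('"')
               .append(">\n");
            for (Class<?> p : m.getParameterTypes()) {
                xml.append("    <param type=\"").append(p.getName()).append("\"/>\n");
            }
            xml.append("  </method>\n");
        }
        xml.append("</moxi>\n");
        return xml.toString();
    }
}
```

For example, `MoxiGen.toMoxiXml(String.class)` lists `length` as a non-static method returning `int` and the `valueOf` overloads as static methods; C# could do the same with `Type.GetMethods()`.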

MOXI on board-MOXI information inside a Mynx class

One possibility to avoid an external MOXI file or MOXI files in a Myna (Mynx Archive, a ZIP file with Mynx specific information added internally) file is to have the synthesized Java and C# code include a method to read and extract the XML, or even the MOXIObject for the class (such as marshaling and unmarshaling the MOXI information via serialization into an array of bytes). This simplifies things, as the compiler would generate a MOXIObject and then either export and store it to an XML file, or it could be internalized within the synthesized class source as an immutable array of bytes, or immutable MOXI as an XML string.
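The "immutable MOXI as an XML string" variant mostly comes down to emitting a correctly escaped string literal into the synthesized class. This Java sketch is hypothetical: the member name `MOXI`, the accessor `moxi()`, and the `EmbeddedMoxi` helper are illustrative, and it escapes only backslashes, quotes, and line breaks, which is enough for typical XML text.

```java
final class EmbeddedMoxi {
    // Escape text so it can be placed inside a Java string literal.
    static String javaLiteral(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    // Hypothetical synthesized members: the class carries its own immutable MOXI XML.
    static String moxiMember(String moxiXml) {
        return "    public static final String MOXI = " + javaLiteral(moxiXml) + ";\n"
             + "    public static String moxi() { return MOXI; }\n";
    }
}
```

The trade-off noted below applies directly: every synthesized class grows by the size of its MOXI text.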

As with Java itself, external semantics could then be checked using the bytecode class file in Java, or the .NET IL assembly for C#. The downside is that the MOXI information could add bulk to the generated binary, and the code synthesis (the next compiler stage after semantics) would need to implement a specific interface.

Later this might be an option for the Mynx compiler, but for now the implementation will use an external MOXI file as XML. The deciding factor is that simplicity, to get an implementation and working prototype of the Mynx compiler, takes precedence. Changing the design and implementation should only be done if really necessary, and in this case the need is more a variation than the correction of a design flaw or implementation glitch.

Labels: external semantics, moxi, Mynx compiler, Mynx compiler design, mynx semantic check