Think Out Loud: Mynx Programming Language Design: October 2007

Tuesday, October 30, 2007

The Semantics of Multiple Disjoint Inheritance and Operator Overloading in Mynx

The Mynx programming language has the capacity for multiple inheritance similar to C++, but it avoids referential ambiguity (the "deadly diamond" of a UML diagram) by requiring that multiply inherited super-classes be disjoint: no two methods are the same, and each method is unique in name, parameters, and possibly (if a covariant function) return type. Multiple-disjoint inheritance could be termed "multiple unique inheritance", but disjoint is more explicit in terms of a set.
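The disjointness requirement can be sketched as a set check. This is an illustration in Python rather than Mynx, not the Mynx compiler's code: the encoding of a signature as (name, parameter types), the helper names find_clashes and is_disjoint, and the sample classes are all assumptions made for the example, and the covariant return type is left out for brevity.

```python
from itertools import combinations

# Assumption: a method signature is modeled as (name, tuple of parameter types).
def find_clashes(supers):
    """Return (class_a, class_b, signature) for every signature shared by two super-classes."""
    clashes = []
    for (name_a, sigs_a), (name_b, sigs_b) in combinations(supers.items(), 2):
        for sig in sorted(sigs_a & sigs_b):
            clashes.append((name_a, name_b, sig))
    return clashes

def is_disjoint(supers):
    """True when no two super-classes share a method signature."""
    return not find_clashes(supers)

# Hypothetical super-class sets: the first is disjoint, the second is not.
shapes = {
    "Circle": {("radius", ())},
    "Label": {("text", ())},
}
diamond = {
    "Left": {("show", ())},
    "Right": {("show", ())},
}

print(is_disjoint(shapes))
print(find_clashes(diamond))
```

Under this model a sub-class may inherit from every class in shapes, while diamond would be rejected because both super-classes supply show().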

Semantic Context Quandary of Multiple Disjoint Inheritance

The methods of a set of multiply inherited super-classes can each be associated with an overloadable operator (operator overloading). In implementing context semantics (the semantics of the overall unit, a class or program), one consideration is how operator overloading interacts with multiple-disjoint inheritance. The methods of the multiply inherited classes are disjoint from one another, but it is very possible for the sub-class to have non-disjoint, multiply inherited operators. The semantic question is: for a multiply inherited operator, which method is invoked for an expression using the operator?

Example Operator Overload on Different Methods

Consider the following Mynx classes:


class X is

   public X cat(in X x) is
       //method definition for concatenation
   end cat;

   overload default + as cat;

end class;

class Y is

    public Y add(in Y y) is
        //method definition for addition
    end add;

    overload default + as add;

end class;

class Z is

    public Z join(in Z z) is
        //method definition for join
    end join;

    overload default + as join;

end class;

The classes are disjoint (X=cat, Y=add, Z=join), so they can be multiply inherited.

Consider the Mynx class that derives from the Mynx classes of X,Y,Z (great names, remember the X,Y,Z affair in history?) that is:


class A as X,Y,Z is

    public construct is to null;

    public void doNothing is
        null;
    end doNothing;

end class;

The following code fragment illustrates the semantic ambiguity:


    A myVar1  to default;
    A myVar2  to default;

    A result to null;

    result is myVar1 + myVar2; //uh-oh! which method to use in overload?? 

    //three possibilities are:
    result is myVar1.add(myVar2);
    result is myVar1.cat(myVar2);
    result is myVar1.join(myVar2);

The Problem of Semantic Ambiguity

The semantic consideration is to avoid ambiguity: to consistently substitute the expected method invocation for an overloaded operator. The Mynx semantics have to avoid the unexpected or inconsistent, something both Wirth and Hoare strongly admonish a language designer to consider in designing a language.

Thus there is definitely semantic ambiguity, and that is a major problem: ambiguity is anathema to a programming language, and a major frustration for the software developer using the language.

A solution is possible: a semantic context rule for operator overloading with multiple inheritance. But, while many solutions are possible, consistency is important. Otherwise you wind up with many exceptions to a general rule, making the rule a footnote in the programming language semantics. Spoken, natural languages have lots of little rules. I remember learning German and the article to use for a word: the gender (masculine, feminine, neuter) and the case (nominative, accusative, genitive, dative). English of course is no different, but a programming language is designed with intention, so lots of little rules are illogical. And a software developer has to carry the cognitive overhead of knowing all the rules in order to use the language effectively.

One solution is the ostrich-buried-head-in-the-sand approach: pretend the semantic ambiguity does not exist, or, like most managers stressing teamwork and consensus, do not make any decision. Managers delegate the decision to team members and build consensus. Unfortunately that is problem avoidance.

Unresolved Semantic Considerations Create Semantic Ambiguity

What happens in a programming language for a semantic consideration that is solved (an oxymoron) by problem avoidance? The answer is better given with an anecdote and illustrative example.

Take a programming language that was immensely popular, Pascal. The creation of Niklaus Wirth, it was popular for many years after its introduction in the early 1970s, when structured/procedural programming was emerging as the dominant paradigm.

Pascal was limited in some of its core features (as the papers critiquing Pascal's limitations point out), and for some features the semantic considerations had no clear resolution.

For example, the for loop in Pascal had an interesting semantic ambiguity for (no pun intended) the loop variable. Consider the Pascal code fragment:


    var
        K, I: Integer;
    begin
        K := 0;
        for I := 1 to 10 do
        begin
            K := K + I;
        end;

The semantic consideration, and resulting ambiguity is the for loop variable: I. What is the value of I after the for loop terminates, is it 10, or 11, or 9?
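To make the question concrete, here is a small Python simulation of two ways a compiler could plausibly translate the loop. Both lowerings are hypothetical illustrations (the function names are invented for this sketch) and are not claims about how any particular Pascal compiler generates code.

```python
def loop_test_at_top(first, last):
    """Lowering A: test 'i <= last' before each pass, increment after the body."""
    total = 0
    i = first
    while i <= last:
        total += i      # loop body: K := K + I
        i += 1
    return total, i     # i has stepped past 'last'

def loop_test_at_bottom(first, last):
    """Lowering B: after the body, stop if i == last, otherwise increment."""
    total = 0
    i = first
    if i <= last:       # guard so an empty range runs zero passes
        while True:
            total += i  # loop body: K := K + I
            if i == last:
                break
            i += 1
    return total, i     # i stays at 'last'

print(loop_test_at_top(1, 10))
print(loop_test_at_bottom(1, 10))
```

Both versions run the body ten times and compute the same total of 55, yet the first leaves the loop variable at 11 and the second at 10. Code that reads the variable after the loop sees the difference.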

Once I was implementing a program in Pascal as an undergrad (that was the learning language until the computer science department went over to C++), and I had a for loop that always did something on the last pass through the loop (I forget what it was; I think it wrote a total to a file and closed it). I originally implemented the program using (you guessed it...) Turbo Pascal on an 80386 PC. But I found out later that I needed much bigger arrays than the 64K limit of Turbo Pascal allowed, so I opted to use the SunOS system and the Sun Pascal compiler "pc". I was able to define huge arrays, but then, looking at the created file, I saw that the for loop went one further: instead of 1 to 10, it was 1 to 11.

I wrote a simple dummy program with a bunch of writeln statements, and lo and behold, the for loop generated code that went from 1 to 11. It got worse when I thought (you learn as an undergrad not to think...) of e-mailing my short Pascal program to a friend at another university, in the guise of duplicating a scientific experiment. My friend had access to an esoteric minicomputer system that the university had bought into, and it had a Pascal compiler that loyally followed Wirth's definition of Pascal (something most Pascal implementations, like Turbo and Sun, did not do in order to be usable).

A few days later my friend e-mailed me one of the text files created, and the for loop went from 1 to 9; apparently it behaved like the C idiom for(x=1;x<10;x++). The next semester, I was in a course on programming languages, and I recall the lecture about semantics mentioning that one quirk of Pascal. The point was that any semantic consideration not decided by the language designer is left up to the compiler implementer, so it is idiosyncratic or specific to the platform. If the language designer does not say, the compiler writer will. The difficulty is that semantics can be very subtle, an interaction between two features of the language; in this case, the semantic context of operator overloading and multiple-disjoint inheritance in Mynx.

Hence the solution of leaving the semantic consideration unresolved leads to semantic ambiguity, leaving the solution up to a compiler writer. That is a potential inconsistency that leads to non-portable, idiosyncratic software for classes. So a formal semantic specification is needed for the solution, not semantic ambiguity. Now it becomes a question of how to resolve the semantic ambiguity.

Solution to Operator Overloading with Multiple Inheritance

The solution is simple, and uses by extension an existing rule for operator overloading.

For a constant operator overload, one that is immutable, in a sub-class the operator overload must be explicitly declared and reference the super-class.

Using this rule, for a sub-class that has super-classes with multiple operator overloads, the solution is to require the sub-class to specify which operator overload to use, one of the super-class overloads, or one in the class itself. For super-classes with multiple constant overloads, the rule of explicit declaration is used, and the specific super-class referenced.

Thus for the Mynx class A, the overload is specified:


class A as X,Y,Z is

    public construct is to null;

    public void doNothing is
        null;
    end doNothing;

    overload default + as super.cat;

end class;

The class references the super-class with the cat method. Then the Mynx code fragment is not semantically ambiguous:


    A myVar1  to default;
    A myVar2  to default;

    A result to null;

    result is myVar1 + myVar2; //no ambiguity: the overload resolves to cat

    result is myVar1.cat(myVar2);
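The rule can be sketched as a semantic check. This is a Python illustration, not the Mynx compiler's implementation. The function name resolve_overload, the table layout, and the use of ValueError are assumptions for the example. It also assumes that a lone inherited overload is picked up implicitly.

```python
def resolve_overload(op, supers, explicit=None):
    """Pick the method bound to operator 'op' for a sub-class.

    supers:   dict of super-class name -> dict of operator -> method name
    explicit: method named in the sub-class's own overload declaration, if any
    """
    # Collect every super-class that overloads this operator.
    candidates = {cls: ops[op] for cls, ops in supers.items() if op in ops}
    if explicit is not None:
        # Methods are disjoint, so the method name identifies the super-class.
        if explicit not in candidates.values():
            raise ValueError(f"no super-class overloads {op!r} as {explicit!r}")
        return explicit
    if len(candidates) > 1:
        raise ValueError(f"ambiguous overload of {op!r}: {sorted(candidates.values())}")
    if not candidates:
        raise ValueError(f"operator {op!r} is not overloaded")
    # Exactly one inherited overload: treated here as implicitly included.
    return next(iter(candidates.values()))

supers_of_a = {"X": {"+": "cat"}, "Y": {"+": "add"}, "Z": {"+": "join"}}
print(resolve_overload("+", supers_of_a, explicit="cat"))
```

Without the explicit declaration the same call raises an error listing add, cat, and join, which is the ambiguity the rule is meant to surface at compile time.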

The existing operator overload rule for a default operator overload is that it is implicitly included, although I wonder if it should be explicit so that intention is always manifest in a Mynx class.

Also imagine if a super-class has a deep sub-class that has implicitly included an operator overload, it would be very tedious and time consuming to find the original operator overload in the super-class.

The explicit but verbose approach is preferable; appealing to total typing torpidity is not an objective metric for assessing a programming language. Intention by verbosity over succinctness via implicitness.

Oh, and have a safe, fun, and Happy Hallowe'en!!!

Labels: multiple inheritance, mynx semantic, operator overload

Wednesday, October 24, 2007

Mynx Type and Casts in the Design of a Language

In the October revision of The Mynx Programming Language Manual (TMPLM), one important clarification and specification is the rules for types and type casting. Before, the general rule was strong typing, but compiler implementation requires a more specific explanation.

Language Complexity

Other programming languages (such as the synthesized programming languages Java and C#) can have very labyrinthine type rules. The complexity stems from the programming language having many different entities, such as Java having primitive types, classes, and interfaces, or C# with struct, class, and interface.

Mynx has only one reusable entity that has type: a class. Programs are never reused, so there are no type rules for using them. There are no other entities, and in Mynx class is type (a point that creates much religious-level conflict in programming language theory); class and type are synonymous.

Mynx only has references to instances, no primitives or pointers. Again, only one conceptual entity, one thing. One software entity--class, and one kind of runtime entity--a reference.

Mynx is strongly typed, so the overall guiding principle is that like is compatible only with like. The general rule of strong typing: two references (identifiers or literals) must both be of the same type, i.e. type compatible.

Mynx has only three rules for type and casting. All are in some ways common sense, but must be formalized.

Type Span

A type can only be cast to an immediate super-class. In effect an upward cast can only be to a super-class inherited by the base class.

Type Intent

A type can only be intentionally or explicitly type cast, never implicitly. There is no implicit upcast to a super-class.

Type Limit

A type that has been cast upward can be re-cast downward to the base class, but never to a sub-class derived from the base class.

Type and Casting Rules Synopsis

Type Span - Cast upward to immediate super-class type not further up the hierarchy.

Type Intent - Cast to alternate class type is explicit, not implicit.

Type Limit - Cast to own type or super-type; but not to sub-class types.
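One possible reading of the three rules, sketched in Python for illustration. This is not the Mynx compiler's code. The class hierarchy, the Ref record, and the function names are assumptions, and "base class" is taken to mean the class the instance was created as.

```python
from dataclasses import dataclass

# Hypothetical hierarchy: class name -> list of immediate super-classes.
SUPERS = {"Animal": [], "Dog": ["Animal"], "Puppy": ["Dog"]}

@dataclass(frozen=True)
class Ref:
    base: str   # class the instance was created as
    view: str   # type the reference is currently cast to

def new_ref(cls):
    return Ref(base=cls, view=cls)

def cast(ref, target):
    """Explicit cast. Allowed targets are the base class itself (Type Limit)
    or one of its immediate super-classes (Type Span)."""
    if target == ref.base or target in SUPERS[ref.base]:
        return Ref(base=ref.base, view=target)
    raise TypeError(f"cannot cast {ref.base} to {target}")

def assignable(ref, declared):
    """Type Intent: no implicit upcast, so the current view must match exactly."""
    return ref.view == declared

dog = new_ref("Dog")
as_animal = cast(dog, "Animal")       # upward to an immediate super-class
back = cast(as_animal, "Dog")         # back down to the base class
print(as_animal.view, back.view, assignable(dog, "Animal"))
```

In this model cast(as_animal, "Puppy") fails under Type Limit, and cast(new_ref("Puppy"), "Animal") fails under Type Span because Animal is not an immediate super-class of Puppy.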

A Note on Operator Overloading and Methods

The rules for Mynx cast and type apply to Mynx classes. However, it does not mean the following code fragment is type invalid:


    Int i to 0;
    Real r to null;

    r as Real(0);
 
    r = r + i; //type incompatible??

Remember though, that a class can have a method to take any type and operators can be overloaded.

Rewriting the Mynx source code with the explicit method calls instead of the operators:


    Int i to 0;
    Real r to null;

    r as Real(0);
 
    r.set(r.add(i)); //type incompatible -- no!

Provided the class Real (Real.mynx) has a method add(Int) that returns a type Real, the expression is type compatible. The method set(Real) assigns a type Real instance to the variable identifier.
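The same idea can be mimicked in Python, where an operator is also just a name for a method. The Int and Real classes below are stand-ins written for this sketch, not the Mynx library classes. Only the method names add and set come from the fragment above.

```python
class Int:
    def __init__(self, value):
        self.value = int(value)

class Real:
    def __init__(self, value):
        self.value = float(value)

    def add(self, other):
        """Accepts an Int or a Real and always returns a Real."""
        if isinstance(other, (Int, Real)):
            return Real(self.value + other.value)
        raise TypeError("add expects an Int or a Real")

    def set(self, other):
        """Assignment only accepts a Real: like with like."""
        if not isinstance(other, Real):
            raise TypeError("set expects a Real")
        self.value = other.value

    # Associate the + operator with the add method.
    __add__ = add

i = Int(3)
r = Real(0)
r.set(r + i)        # same as r.set(r.add(i))
print(r.value)
```

The mixed-type expression is valid only because the class designer supplied add for Int. Passing an Int directly to set is still rejected.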

I Think, Therefore I Code

The software developer, as class designer, must design a class to mix and match the variety of types the class can work with, and the operator overloads that associate a method with an operator. For a class designer, the immediate super-classes (including virtual classes) will impact the ability of a software developer using the class to type cast to those classes.

In short, object-oriented software design in Mynx requires thinking. Unfortunately, it seems much software is ad hoc, designed for the immediate need but then grandfathered into the future, and other software developers are stuck with it.

Labels: mynx, mynx cast, mynx language design, mynx semantics, mynx type

Wednesday, October 17, 2007

Forward Declaration and Static References in Mynx

Mynx uses forward declaration within unit scope: anything declared in unit scope (i.e. attributes or methods) is global within that scope, hence the declaration-before-use rule that applies within a method is inapplicable at unit scope.

As previously discussed, the implementation does not require multiple compiler passes, nor does the language design require declaration-before-use semantics. Using a multi-map, every undeclared identifier encountered is tracked, and can be resolved on the fly as declarations are processed.
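A minimal sketch of the multi-map idea in Python (a dictionary of lists stands in for the multi-map). The class name ForwardResolver, its methods, and the "line N" use-site labels are assumptions for illustration and do not describe the actual Mynx compiler.

```python
from collections import defaultdict

class ForwardResolver:
    """Single-pass tracking of identifiers that are used before they are declared."""

    def __init__(self):
        self.declared = {}                 # name -> kind of declaration
        self.pending = defaultdict(list)   # name -> use sites still awaiting a declaration

    def use(self, name, site):
        """Record a use; if the name is not yet declared, park the use site."""
        if name not in self.declared:
            self.pending[name].append(site)

    def declare(self, name, kind):
        """Record a declaration and return the use sites it resolves."""
        self.declared[name] = kind
        return self.pending.pop(name, [])

    def unresolved(self):
        """Names still without a declaration at the end of the unit."""
        return dict(self.pending)

resolver = ForwardResolver()
resolver.use("helper", "line 3")
resolver.use("helper", "line 7")
resolver.use("IO", "line 9")
print(resolver.declare("helper", "method"))
print(resolver.unresolved())
```

Each declaration clears its own pending entries as it is processed, so no second pass is needed. Whatever is left at the end of the unit is either an error or a candidate for some further lookup.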

Working through examples case by case, implicit static references emerge as one possible use of implicit forward declaration context semantics in Mynx source code.


    with mynx.io.IOStream; //absolute inclusion of class 
                           //implicit static reference
    
    class ImplicitForward is
    
        public theMethod is
        
            var Int x to 0;
            
            IO <<< x <<< eoln; //IOStream.IO <<< x <<< IOStream.eoln;
                              
        end theMethod;
    
    end class;

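At the end of processing the class, the variable identifiers IO and eoln do not have a corresponding declaration in the unit or class context. They remain unresolved in the multi-map, where they can be matched against the static members of the included class IOStream.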

There is a natural fit with implicit static methods, not by intention or design, but by properties or effect.

Programming language designers often force things together (sorta like taking the stickers off a Rubik’s cube so it is “solved”) to put distinct features on a common foundation. Java and C# have primitives alongside references (autoboxing, carried from C# to Java, connects the features); C++ has union, struct, and class; C# has struct, class, and interface; Java has simplified somewhat to class and interface.

When features are naturally joined, it indicates a good language design by its effect. A language with many features that are not well joined or need other features to connect them is not as well designed. Implicit static having a natural resolution with forward declarations in a multi-map is a case in point with Mynx.

Labels: mynx, Mynx rationale, mynx semantic

Wednesday, October 10, 2007

Mynx Constructor and Destructor Separating Definition from Reference

The original design of the Mynx super-class and this-class semantics for a constructor and destructor used super.construct, this.construct, and super.destruct. (Note: this.destruct is a semantic contradiction; as only one destructor is defined, there is no need to define an additional destructor, which would be semantically erroneous, and a self-reference would be infinitely recursive.)

Problems of Mynx Constructor, Destructor Design

The syntax as designed had the problem of ambiguity in reference for constructors, and with the destructor explicit semantics which are redundant and contradictory. Both indicate that the solution can work, but it is not the best solution--a solution without those problems is clearly better.

Using a reference to the destructor is also explicitly calling the destructor -- and the destructor, like the constructor is never invoked explicitly. The only minor variant is that the constructor is referenced in definition of another constructor -- either within the same class, or in the super-class.

Constructor Reference


 class X is
 
  public construct is to this.construct(0);
 
  public construct(in Int i) is
   //...constructor definition
  end construct;
 
 end class;

Constructor Ambiguity of Reference by Keyword

This approach works for constructor reference within the class. But a super-class reference is not so neat and clean.


 class A as X,Y,Z is
 
  public construct is to super.construct;
    
 end class;

If only one of the super-classes has a null constructor, then the constructor reference is clearly unambiguous. But more than one null constructor creates ambiguity: which super-class constructor is being used to define the class constructor? Constructors are not inherited, so a super-class constructor cannot be disambiguated as one of the class’s own constructors. Remember, ambiguity is the source of uncertainty that creates problems in a programming language. This solution to constructor and destructor semantics creates ambiguity, so it is problematic at best, unworkable at worst.

Explicit Constructor Reference by Name

Constructor reference is explicit both within and without a class.

Explicit Constructor Inside a Class

A constructor is distinguished in definition in contrast to a constructor reference. The important feature is that a constructor definition uses ‘construct’ keyword and a constructor reference uses ‘this’ keyword.


 class X as A,B,C is
 
        //this keyword to reference a constructor
  public construct is to this(0); 
 
        //construct keyword to define constructor
  public construct(in Int i) is   
   //...constructor definition
  end construct;
 
 end class;

Using the ‘this’ keyword makes the intent, constructor definition versus constructor reference, more explicit. Using the ‘this’ keyword instead of this.construct is also a nice simplification.

Explicit Constructor Outside a Class

Constructor reference outside a class to the super-class is the source of ambiguity. It is appropriately expressed in the form of a question (start counting down from 30-seconds to 0-seconds with catchy elevator music playing in the background...) as “How do you clearly refer to a super-class constructor?”

The syntax of super.construct(...) is inadequate, but adding the class name as an identifier does neatly and clearly resolve the ambiguity. So then the syntax super.construct.A(...) is explicit. But the keyword ‘construct’ is for a definition, and for a reference is redundant. A more generalized form is to use the syntax of a method call in the super-class, that of - super.(...).


 class A as X,Y,Z is
 
  public construct is to super.X; //instead of super.construct
    
 end class;

A semantic consideration is the question of how does the Mynx compiler know that the method is a constructor of the super-class and not a method? Simple, the Mynx compiler determines that the method does not exist in the super-class method list, and the class name is referenced, not a method. As a class method cannot be the class name, there is no overlap -- and the constructor definition uses the keyword ‘construct’ not the class name (as in Java, C++, C#).

In Mynx, a super-class reference to a constructor uses the class name, along with the prefix ‘super’ keyword. The super-class constructor can be clearly referenced, and the intention explicit.

The “Big Three” of C++, Java, C# use the class name as the constructor name; it is strange, but it does have the advantage that for reference to a constructor and destructor in super-classes (particularly in multiple-inheritance in C++) it avoids any possibility of ambiguity. The intention of definition versus reference is somewhat masked, but the clarity in reference is a gain.

The revised Mynx approach of distinguishing definition from reference, and using the class name to reference a constructor, is a much better solution than the one designed before. (Think drum versus disk brakes or piston versus turbines.)

One important semantic consideration is if a super-class constructor is referenced by a sub-class constructor, is it valid to reference a super-class constructor from a non-constructor (class method, destructor, default method)?

A non-existent class method referenced in a class constructor with the ‘super’ keyword must be a super-class constructor. If it is referenced in another method, is a check for a super-class constructor performed, or is it simply reported as a non-existent method? The added semantic check, reporting a semantic error for a super-class constructor reference outside of a constructor, is informative but adds extra processing time. Conversely, reporting a non-existent method in a class method is semantically misleading, although less costly in terms of processing time for semantics.
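The lookup order described above can be sketched as follows. This is an illustrative Python model, not the Mynx compiler's code. The function name classify_super_reference and the tuple results are assumptions, and it takes the more informative of the two error-reporting options discussed.

```python
def classify_super_reference(name, super_classes, in_constructor):
    """Classify 'super.<name>' inside a sub-class.

    super_classes:  dict of super-class name -> set of its method names
    in_constructor: True when the reference appears in a constructor definition
    """
    # 1. A method in a super-class method list wins; methods are disjoint,
    #    so at most one super-class can own the name.
    for cls, methods in super_classes.items():
        if name in methods:
            return ("method", cls)
    # 2. Not a method: a super-class name means a constructor reference.
    if name in super_classes:
        if in_constructor:
            return ("constructor", name)
        return ("error", "super-class constructor referenced outside a constructor")
    # 3. Neither a method nor a super-class name.
    return ("error", "no such method")

supers = {"X": {"cat"}, "Y": {"add"}, "Z": {"join"}}
print(classify_super_reference("X", supers, in_constructor=True))
print(classify_super_reference("cat", supers, in_constructor=False))
print(classify_super_reference("X", supers, in_constructor=False))
```

Dropping the in_constructor test in step 2 gives the cheaper variant, where a misplaced constructor reference falls through to the "no such method" report.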

Destructor Semantics Implicit

The previously designed Mynx destructor semantics were complex. The new design for the destructor semantics is for all super-class destructors to be implicitly invoked first, before the sub-class destructor is executed. The destructor invocation is automatic, synthesized by the compiler. Each destructor is mutually exclusive when invoked, and the destructors are invoked in the order of inclusion in the class super-class list. Only destructors that are protect or public access can be invoked. The semantic constraint of not explicitly invoking a destructor is maintained. There is no keyword for explicit super-class destructor reference. The keyword ‘destruct’ is used in the definition of the one and only class destructor. For a software developer, the implicitness reduces the cognitive load of explicitly invoking a destructor and referencing it.


 class X as A,B,C is
 
  public destruct is
   
   //implicit invocation of A.destruct, B.destruct, C.destruct
   null;
 
  end destruct;
 
 end class;
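The invocation order the compiler would synthesize can be modeled in a few lines of Python. This is a sketch under assumptions, not the Mynx implementation. The function name destruct_order and the access table are invented for the example, super-classes with private destructors are skipped silently, and only one level of the hierarchy is modeled.

```python
def destruct_order(cls, supers, access):
    """Order in which destructors run when an instance of 'cls' is destroyed.

    supers: dict of class name -> super-class list, in order of inclusion
    access: dict of class name -> access of its destructor
    """
    # Super-class destructors first, in the order the super-classes are listed,
    # keeping only those with protect or public access.
    order = [s for s in supers.get(cls, []) if access.get(s) in ("public", "protect")]
    # The class's own destructor body runs last.
    order.append(cls)
    return order

print(destruct_order("X", {"X": ["A", "B", "C"]},
                     {"A": "public", "B": "public", "C": "protect"}))
```

For the class X above this yields A, B, C and then X, matching the comment in the destructor body.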

Modifying the Mynx Grammar

The updated Mynx grammar from version 9.3.2 to 9.3.3 is void (no pun meant). The new grammar for the re-design is Mynx EBNF 9.3.3.1. Interestingly enough, the older grammar supported the super-class reference semantics, and the implicit destructor semantics. Only one syntax production rule is modified to allow class reference within the class.

Labels: mynx, mynx constructor, mynx destructor, mynx grammar, mynx semantic
