Tuesday, April 24, 2007

Namespace - Probabilistic Name Collisions and Ambiguity in Classes

In a recent conversation about C++ and Java, the person I spoke with was adamant that both programming languages are inadequate at resolving ambiguity in sub-classing - particularly with multiple inheritance. C++ has full multiple inheritance; Java's interfaces are a kind of multiple inheritance (but inherited methods have to be re-implemented); Mynx uses multiple disjoint inheritance. The major point seemed to be that ambiguity occurs (not with Java, as you must re-implement - so it is both a deficiency and a strength) but that the programming language should facilitate avoiding ambiguity. My point was that handling ambiguity is more important than trying to prevent it.

Ambiguity needs to be quantified. In C++ it is a question of which method of a given name from a super-class is used in the sub-class. Mynx is strict in that there is no ambiguity, as any super-classes must be disjoint. In C++, a good compiler will indicate a problem in determining which super-class method is referenced, but back when C++ was the hot new object-oriented language, some compilers used an ad hoc approach - it was up to the compiler writer.
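
Mynx's disjoint rule and old C++'s compiler-dependent behavior can be contrasted with Python, which permits the collision and resolves it by a fixed rule (method resolution order). A minimal sketch - the class names are invented for illustration:

```python
# Two base classes that both define report(): a name collision.
class Sensor:
    def report(self):
        return "sensor reading"

class Logger:
    def report(self):
        return "log entry"

# Python resolves the ambiguity by method resolution order (MRO):
# the leftmost listed base wins, rather than the compiler rejecting it
# (as Mynx would) or behaving ad hoc (as some early C++ compilers did).
class Device(Sensor, Logger):
    pass

d = Device()
print(d.report())        # "sensor reading" - Sensor comes first in the MRO
print(Logger.report(d))  # explicit qualification reaches the masked method
```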

One point I made in this conversation is that language ambiguity comes from the language designer - an unresolved point that needs to be resolved formally. Beyond that, the ability to handle ambiguity is a language feature; if the feature is not present, the question is moot.

Collisions in the namespace are avoided with a module or package - a namespace construct. The class name is part of the namespace, which encloses the class elements of attributes and methods.

While the namespace mechanism avoids name collisions, it does not guarantee a collision will not occur. Java builds its namespace from the Internet domain and other information. That helps, but it is still possible for collisions to occur between classes created within the same organization, such as an institute, educational organization, or large company. Names can collide because the same name has different meanings.

Once I worked with a database where one of the tables was named “TEMP” - which to a software developer means a temporary table. In actuality the table held temperatures - it was a climatology database. TEMP as a name is ambiguous, a collision between two different perspectives - one computer science, the other climate. Hence a name collision can occur when different domains of science use a mnemonic identifier with multiple meanings. (And the database was very amateurish - reflected in the grossly ambiguous and vague names used for tables.)
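
The same kind of collision can be sketched in Python terms (the package names are hypothetical): two modules can export the same bare class name with entirely different meanings, and only the qualified name keeps them apart.

```python
import types

# Two hypothetical packages from the same organization, both exporting a class
# named "Temp" - one meaning a temporary buffer, the other a temperature.
storage = types.ModuleType("org.example.storage")
climate = types.ModuleType("org.example.climate")

exec("class Temp:\n    kind = 'temporary buffer'", storage.__dict__)
exec("class Temp:\n    kind = 'temperature record'", climate.__dict__)

# The qualified names remain distinct even though the bare name collides -
# but nothing stopped both modules from choosing the same bare name.
print(storage.Temp.kind)   # temporary buffer
print(climate.Temp.kind)   # temperature record
```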

The idea of using a language construct to resolve ambiguity depends on the degree of collision avoidance; if the method names of two classes collide, all the ambiguity resolution might not help. Prevention helps to avoid, but does not completely avoid, namespace ambiguity in a name collision. Prevention and a language mechanism to resolve ambiguity (a reverse of prevention) do not, by themselves, handle the problem as discussed.

As prevention is not a solution, it follows that a language needs a means to handle name collisions and super-class reference ambiguity. Mynx has two programming language features:

1. Privatize a method in a class; remove or mask the ambiguous name - obviate.
2. Rename a method, or all methods of a given name - method equivalents.
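
Neither feature exists outside Mynx as described, but both can be approximated in Python to illustrate the idea - renaming an inherited method and masking the old name (the class and method names here are invented):

```python
class Counter:
    def next(self):
        return 1

class SafeCounter(Counter):
    # "Rename": advance() becomes a method equivalent of the inherited next().
    advance = Counter.next

    # "Obviate": the colliding name is masked and removed from use.
    def next(self):
        raise AttributeError("next() is obviated; call advance() instead")

c = SafeCounter()
print(c.advance())   # 1 - the renamed method still reaches Counter's code
```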

Mynx avoids C++ ambiguity by requiring that class names and class element names be disjoint. The Mynx compiler will not have to use an ad hoc approach; ambiguity is a critical semantic error.

The language features of Mynx allow two possibilities:

1. Sub-class the classes with common names, rename the colliding methods to unique names, then inherit those sub-classes. N classes externally resolved to one class.

2. In the sub-class, rename the methods and/or obviate the duplicate method names - more compiler work to verify internal and external semantics. N classes internally resolved to one class.
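
Possibility 1 - external resolution - can be sketched in Python with hypothetical adapter classes that rename each colliding method before the final class inherits them. (Note that in Python, unlike Mynx, the original name still resolves by method resolution order rather than being rejected.)

```python
class Engine:
    def start(self):
        return "engine started"

class Timer:
    def start(self):
        return "timer started"

# Each colliding class is first sub-classed to rename its method to something
# unique; the renamed adapters are then inherited together without collision.
class EngineAdapter(Engine):
    start_engine = Engine.start

class TimerAdapter(Timer):
    start_timer = Timer.start

class Machine(EngineAdapter, TimerAdapter):
    pass

m = Machine()
print(m.start_engine())   # engine started
print(m.start_timer())    # timer started
```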

Language features must work to resolve and handle the unanticipated. Using an artist metaphor, the software developer as a painter needs the tools in their palette to handle the unanticipated in painting a canvas.

A programming language must avoid adding a new feature for every possibility and nuance; the language must have the capacity to adapt - from the features already in the language. One reason why C is still around (and not a fad, as some claim) while Pascal has fallen by the wayside is that C can handle the twists and turns of the software development labyrinth over time. Pascal required proprietary extensions.

This adaptivity makes a programming language malleable if it can adapt, or brittle if it is inflexible.


Thursday, April 19, 2007

Look within Yourself - Sentence Semantic Checks in Mynx

In Mynx, the parser operates on source code in the forms of sentence, statement, method, and unit. These four elements are the organizing structures in terms of syntax, and following a syntax-directed approach to semantic checks, there is a semantic check within each of the organizing structures.

Semantic checks work from the basic element - a sentence - to the more complex - a unit in the form of a class or program.

The interesting observation about sentence semantic checks is that outside information, such as type, is unnecessary. The sentence semantic checks are more a structural semantic check within the sentence, made without outside information.

For example, the following expression statement sentence is invalid:

this = x;

What is the type of ‘x’? Is it a class attribute, a method variable, or a method call?

Interesting information, but unnecessary. The structure of the sentence, judged by the information within the sentence, is semantically invalid. What ‘x’ is does not matter; since the sentence is flawed, determining the nature and type of ‘x’ is irrelevant.

That is the entire notion of semantic checks on sentences - look for errors with minimal information; otherwise an error might be detected much further along in the process of semantic analysis - a waste of time and processing.
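
A minimal sketch of such a structural check, written in Python with the `ast` module (using 'this' as a stand-in reserved name, since Mynx's actual grammar is not shown here): the check rejects the sentence purely by its structure, with no type information about the right-hand side.

```python
import ast

# Reserved receiver names that may never be assignment targets.
RESERVED_TARGETS = {"this"}

def check_sentence(source):
    """Structural sentence check: flag assignment to a reserved name,
    using only the parsed structure - no symbol table, no types."""
    errors = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in RESERVED_TARGETS:
                    errors.append(f"cannot assign to '{target.id}'")
    return errors

print(check_sentence("this = x"))   # ["cannot assign to 'this'"]
print(check_sentence("y = x"))      # [] - structurally fine; types come later
```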

From the organization of the semantic checks, type checking is the most expensive and final phase of the semantic checks.

The sentence semantic checks also involve collective semantic constraints - semantics taken together. One such constraint is uniqueness: within a higher structure such as a statement, method, or class, the sentences are unique.

The difficult part of implementing semantic checks is to do so consistently, and to extract the necessary information from each sentence. This can be difficult for some sentences because the sentence contains other syntax objects - and the semantic check implementation avoids any direct coupling. So part of the process is to add query and accessor methods that expose information in each syntax object for the semantic check.

After the sentence semantic check is the semantic merge - putting data into a symbol table that is accessed during other semantic checks. The semantic merge enforces the semantic notion of uniqueness - a constraint that certain elements are unique. Class attributes, if semantically valid within a sentence, are merged into the symbol table, so a duplicate is immediately flagged as a semantic error.
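
A minimal sketch of such a semantic merge in Python (the names and structure are illustrative, not Mynx's actual implementation):

```python
class DuplicateSymbolError(Exception):
    """Raised when the uniqueness constraint is violated at merge time."""

class SymbolTable:
    def __init__(self):
        self._symbols = {}

    def merge(self, name, info):
        # The merge itself enforces uniqueness: a duplicate declaration
        # is flagged immediately, not discovered in a later phase.
        if name in self._symbols:
            raise DuplicateSymbolError(f"duplicate declaration of '{name}'")
        self._symbols[name] = info

table = SymbolTable()
table.merge("count", {"access": "public"})
try:
    table.merge("count", {"access": "private"})
except DuplicateSymbolError as e:
    print(e)    # duplicate declaration of 'count'
```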

Writing the semantic checks involves implementing much of the infrastructure code - the symbol table, the code to access the syntax object information, and so on. I thought the parsing phase would be the most difficult; creating syntax objects and then checking them is much more difficult. Never say "It's a snap."


Monday, April 09, 2007

Implementing Semantic Checks - Ambiguity over Intention

I’ve finished implementing the sentence semantic checks for a Mynx class attribute. Originally there were 20 checks, but during the implementation 3 checks were deleted, leaving 17.

The class attribute is a basic sentence, but it is similar to a program attribute and to a declaration expression statement. So the intention is for some of the sentence semantic checks to “map” to the other two sentences. The class attribute had the most semantic checks and was the most complex, so in re-doing the semantic checks (the first time, the design was too rigid - based upon existing compiler parsing algorithms - and so untenable) the most difficult seemed the best starting point.

In implementing the semantic checks, it was often the case that some were ambiguous or duplicated information. As an undergrad I often griped about why the compiler could not just make the correction. It is not that the compiler (and the compiler writer whose code is driving the compiler) cannot correct certain semantic errors; it is just that in doing so, the original compiler writer’s intentions take precedence over those of the compiler’s user.

These kinds of issues really show up in programming language ambiguity, where the original language spec does not clarify a detail. For example, in Pascal (ye olden language of the old ones...), after a for loop finishes, what is the value of the loop variable? Is it the n+1 value, or n? I had many a frustrating night on a programming project: I used Turbo Pascal, but then, using the Sun Pascal compiler, something didn’t quite work...often the for loop. Later, I simply assigned the value to another variable, and moved the question from the idiosyncratic fiat of the compiler writer to the intention of the user.
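
That workaround can be illustrated in Python, where the post-loop value happens to be well defined; the point is that copying the value makes the intent explicit rather than leaving it to compiler fiat.

```python
# In Pascal the loop variable's value after the loop was compiler-dependent.
# Copying it to another variable removes the question entirely.
last = None
for i in range(1, 4):
    last = i    # intent is explicit: keep the final iterated value

print(i)     # 3 in Python; in old Pascal compilers this was n or n+1
print(last)  # 3, by the programmer's explicit choice - portable either way
```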

Implementing the semantic checks for a class attribute, I can see why it might be useful to do something to avoid an error, and make it a compiler alert with a correction. But it is more a fault of the language specification at some points (needing clarification - one reason I keep the Mynx Programming Language Manual handy to annotate things...) and at others it is simple intention ambiguity, where by the formalism of the class attribute the correction is obvious.

But, from experience, auto-correction of source code is an easy temptation...what was coded with one intent but is semantically invalid, and what is correct, could be as different as night and day. It is more sensible to give a semantic error and stop. For example:

public generic gVar to default; //initialize variable using default constructor


What is wrong? A generic variable is type-anonymous, so there is no guarantee of a default constructor when the generic class is instantiated and bound to a type. The only valid initial value for a generic class attribute is null - no value. So why not just change the code and continue compiling? The compiler and compiler writer do not know whether the error was in declaring a generic attribute, or in the initializer. Even in a generic class there are non-generic attributes.

Did the user mean:


public generic gVar to null;


or some non-generic:


public TYPE gVar to default;


Only the user knows, so the compiler should reject the input and flag a sentence semantic error.
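
The check described above can be sketched as a small predicate (a hypothetical simplification in Python; the real check operates on parsed Mynx sentences, not flags):

```python
# Sketch of the rule: a generic attribute may only be initialized to null,
# so "default" must be rejected - never silently corrected - because the
# compiler cannot know which half of the declaration carries the mistake.
def check_attribute(is_generic, initializer):
    if is_generic and initializer != "null":
        return f"generic attribute cannot be initialized to '{initializer}'"
    return None  # no error

print(check_attribute(True, "default"))   # error reported, compilation stops
print(check_attribute(True, "null"))      # None - the only valid generic form
print(check_attribute(False, "default"))  # None - non-generic may use default
```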

A C compiler often would make assumptions about functions and types, which made for easy compiling - but then you would spend hours tracking down a bug later, caused by an implicit correction that was never highlighted as problematic. Yes, you could use lint or some of the other tools that find problems, but the compiler should be the first source, not the last. I don’t blame the C language, but I avoid compilers which are permissive. It is frustrating to have the compiler whine and complain, but it is a wake-up call. Auto-fixing a problem with an implicit assumption adds more ambiguity to the ambiguity. The syntax fragment that doesn’t make sense is now insensible, because what is coded is not exactly what the compiler views internally - and ultimately not what the binary is created from.

Unfortunately, intention is more difficult to determine from what is written, especially if what is written in code is ambiguous.

Garbage-in should be errors reported, not garbage-out.
