Implementing Semantic Checks - Ambiguity over Intention
I’ve finished implementing the sentence semantic checks for a Mynx class attribute. Originally there were 20 checks, but 3 were deleted during the implementation, leaving 17.
The class attribute is a basic sentence, but it is similar to a program attribute and to a declaration expression statement, so the intention is for some of its sentence semantic checks to “map” onto those other two sentences. The class attribute had the most semantic checks and was the most complex. In redoing the semantic checks (the first design was too rigid and was based upon existing compiler parsing algorithms, so it proved untenable), the most difficult sentence seemed the best starting point.
In implementing the semantic checks, I often found that some were ambiguous or duplicated information. As an undergrad I often griped about why the compiler could not just make the correction. It is not that the compiler (and the compiler writer whose code is driving the compiler) cannot correct certain semantic errors; it’s just that in doing so, the compiler writer’s intentions take precedence over those of the compiler’s user.
These kinds of issues really show up in programming language ambiguity, where the original language spec does not clarify a detail. For example, in Pascal (ye olden languages of the old ones...), what is the value of the loop variable after a for loop finishes? Is it n+1, or n? I had many a frustrating night on a programming project when I used Turbo Pascal and then, using the Sun Pascal compiler, something didn’t quite work...often the for loop. Later, I simply assigned the value to another variable, which moved the question from the idiosyncratic fiat of the compiler writer to the intention of the user.
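The workaround of assigning the value to another variable can be sketched roughly as follows. This is Python rather than Pascal, chosen only for illustration, and the function name find_first_negative is a made-up example, not anything from the original project. The point is that the result lives in a variable the programmer controls, so the code never depends on what the loop variable holds after the loop ends.

```python
def find_first_negative(values):
    """Return the index of the first negative value, or None if there is none."""
    found_at = None  # explicit result variable, owned by the programmer
    for i in range(len(values)):
        if values[i] < 0:
            found_at = i  # record the value we actually care about
            break
    # The loop variable `i` is deliberately not read here; the answer
    # is carried by `found_at`, so its meaning is set by the user's intent.
    return found_at
```

With this shape, an empty list or a list with no negative values gives None, instead of whatever the loop variable happened to hold.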
Implementing the semantic checks for a class attribute, I can see why it might be useful to do something to avoid an error, and make it a compiler alert with a correction. At some points the problem is more a fault of the language specification (which needs clarification, one reason I keep the Mynx Programming Language Manual handy to annotate things...), and at others it’s simple ambiguity of intention where, by the formalism of the class attribute, the correction is obvious.
But, from experience, auto-correcting source code is an easy temptation...the difference between what was coded with one intent but is semantically invalid and what is correct could be night and day. It is more sensible to give a semantic error and stop. For example:
public generic gVar to default; //initialize variable using default constructor
What is wrong? Well, a generic variable has an anonymous type, so there is no guarantee of a default constructor when the generic class is instantiated and bound to a type. The only valid initial value for a generic class attribute is null (no value). So why not just change the code and continue compiling? Because the compiler and compiler writer do not know whether the error was in making the attribute generic or in initializing it with default. Even in a generic class there are non-generic attributes.
Did the user mean:
public generic gVar to null;
or some non-generic:
public TYPE gVar to default;
Only the user knows, so the compiler should reject the input and scream with a sentence semantic error.
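A check that rejects instead of repairing might look roughly like the sketch below. It is written in Python, not taken from the actual Mynx compiler, and every name in it (ClassAttribute, SemanticError, check_generic_initializer) is hypothetical. The attribute model is reduced to just the three pieces the example above uses.

```python
from dataclasses import dataclass


class SemanticError(Exception):
    """Raised when a sentence fails a semantic check."""


@dataclass
class ClassAttribute:
    # Hypothetical, simplified model of a parsed class attribute sentence.
    type_name: str    # "generic" or a concrete type name
    name: str
    initializer: str  # "null" or "default"


def check_generic_initializer(attr: ClassAttribute) -> None:
    """Reject a generic attribute initialized with anything but null.

    The check reports the problem and stops. It deliberately does not
    rewrite the attribute, because either fix could be the intended one.
    """
    if attr.type_name == "generic" and attr.initializer != "null":
        raise SemanticError(
            f"attribute '{attr.name}': a generic attribute can only be "
            f"initialized to null, not '{attr.initializer}'"
        )


if __name__ == "__main__":
    try:
        check_generic_initializer(ClassAttribute("generic", "gVar", "default"))
    except SemanticError as err:
        print("semantic error:", err)
```

The check never touches attr. Changing the initializer to null or the type to something concrete would each be a guess at the user's intent, so the only thing it does is report.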
C compilers often would make assumptions about functions and types, which made code easy to compile, but then you would spend hours tracking down a bug caused by an implicit correction that was never highlighted as problematic. Yeah, you could use lint or some of the other tools that find problems, but the compiler should be the first source, not the last. I don’t blame the C language, but I avoid compilers that are permissive. It is frustrating to have the compiler whine and complain, but it’s a wake-up call. Auto-fixing a problem with an implicit assumption adds more ambiguity to the ambiguity. The syntax fragment that didn’t make sense is now insensible, because what is coded is not exactly what the compiler views internally, or ultimately what ends up in the binary.
Unfortunately, intention is difficult to determine from what is written, especially if what is written in code is ambiguous.
Garbage-in should be errors reported, not garbage-out.
Labels: ambiguity, intention, semantic analysis, semantics

