Update feature-specification.md #3472

Open: wants to merge 1 commit into base: main

Member

@lrhn lrhn commented Nov 16, 2023

Tweak a wording in Extension Type spec.

@lrhn lrhn requested a review from eernstg November 16, 2023 10:05
@@ -1326,7 +1326,7 @@ fresh, non-late, final variable `v` is created. An initializing formal
 argument passed to this formal. An initializer list element of the
 form `id = e` or `this.id = e` is evaluated by evaluating `e` to an
 object `o` and binding `v` to `o`. During the execution of the
-constructor body, `this` and `id` are bound to the value of `v`. The
+constructor body, `this` is bound to the value of `v`. The
Member Author

@lrhn lrhn Nov 16, 2023

The id is referenced normally using the scope or implicit this.id.
That makes a difference for something like:

extension type E._(Object? id) {
  E(Object? base, Object? id) : id = base {
    assert(identical(this.id, id));
  }
}

Member

Sure, but it is still not working as intended if id evaluates to an arbitrary object; I'd expect identical(this, id) to be true.

Member Author

Would you assume identical(this.id, id) to be true in the example above, even if the two arguments are not the same object?
(I'm using assert, maybe that's confusing, I'm not claiming it must be true, I'm just doing some operation that can be not true. Should have used !identical I guess.)

So more precisely:

extension type E._(Object? id) {
  /// The second argument must not be the same as the first.
  E(Object? base, Object? id) : id = base {
    assert(!identical(this.id, id));
  }
}

I expect this extension type to work such that E(2, 3) is successful, doesn't fail the assert because it compares 2 and 3, and ends up having 2 as representation value.

I expect that because I expect the same for a similar class:

class C {
  final Object? id;
  C._(this.id);
  
  /// The second argument must not be the same as the first.
  C(Object? base, Object? id) : id = base {
    assert(!identical(this.id, id));
  }
}

That works as I described above.
And I want the extension type to work the same.

Which means that the claim that "... and id are bound to the value of v" during the execution of the constructor body is incorrect. It's not in this case, where it's bound to the value of the second argument, not v which is the value of the first argument.

(If the claim should be interpreted in any other way than that the identifier id in the lexical scope available to the constructor body is bound to the value v, then it should be rephrased, because that's the only reading I can see.)

Every instance method and generative constructor body (anything which can access this) can access the representation variable as if it was an instance getter on this. It's not added to the scope of those members, like this text says, it's added to the member scope of the extension type declaration itself.
Then it's looked up as normal for a this reference.

So the "and id are" shouldn't be here, because the only thing I can read it as saying, is not correct.
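
For reference, a runnable version of the comparison above, as a minimal sketch. It assumes a Dart SDK with extension types (3.3 or later) and assertions enabled; `main` and the `print` calls are added only for illustration and are not part of the proposal.

```dart
extension type E._(Object? id) {
  /// The second argument must not be the same as the first.
  E(Object? base, Object? id) : id = base {
    // Plain `id` resolves to the parameter; `this.id` reads the representation.
    assert(!identical(this.id, id));
  }
}

class C {
  final Object? id;
  C._(this.id);

  /// The second argument must not be the same as the first.
  C(Object? base, Object? id) : id = base {
    // Same lookup rules as in the extension type above.
    assert(!identical(this.id, id));
  }
}

void main() {
  // Both constructor bodies compare 2 with 3 and keep 2 as `id`.
  print(E(2, 3).id);
  print(C(2, 3).id);
}
```

If the identifier `id` were bound to the value of `v` inside the body, the assert in `E` would compare 2 with 2 and fail, which is the discrepancy described above.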

Member

Sorry, I missed the point about multiple declarations with the same name.

I think the following change that I mentioned would suffice:

During the execution of the constructor body, this is bound to the value of v, and so is the representation variable.

Member Author

Probably.

But we'll also have to define "representation variable" then, because (unless my browser's search function is busted), the phrase "representation variable" only occurs once in this document, where we say that it can be promoted.

Member

The representation variable is implied in many locations (and we should probably adjust the text to make it explicit in many of those situations, because that's needed in order to avoid the imprecision about shadowing).

For example, we have a couple of occurrences of text like this:

this and the name of the representation are bound as with the getter invocation

which is used to specify the dynamic semantics of an extension type member invocation/tear-off, and we shouldn't say that "the name" is bound to anything: It's the representation variable which is bound to said receiver object, and the name of the representation is an identifier which may or may not resolve to the representation variable in any given expression. We don't need any special rules for that, name resolution is determined by the syntactic structure (nested scopes) and by implicit insertion of this., and I think it would be a significant source of complexity if we introduce a different set of rules about extension types than the ones that we're using for classes/mixins/enums/etc.

Another example is commentary saying "the static analysis considers the representation as a final instance variable".

Another example is "The static type of the representation name is the representation type" which should again specify the static type of the representation variable because the name may resolve to other declarations.

By the way, I noticed at least one occurrence of 'representation\nvariable'.

I agree, we do have to make the representation variable more explicit in a number of locations in the feature specification.

Member

@eernstg eernstg left a comment

I'm not quite sure why we'd eliminate a specific bit of text.

@@ -1326,7 +1326,7 @@ fresh, non-late, final variable `v` is created. An initializing formal
 argument passed to this formal. An initializer list element of the
 form `id = e` or `this.id = e` is evaluated by evaluating `e` to an
 object `o` and binding `v` to `o`. During the execution of the
-constructor body, `this` and `id` are bound to the value of `v`. The
+constructor body, `this` is bound to the value of `v`. The
Member

Why would it be an improvement to eliminate the rule that the representation variable is bound to the representation object? As far as I can see this means that a conforming implementation could bind the representation variable to an arbitrary object (OK, an arbitrary object whose run-time type is a subtype of the actual value of the representation type).

Member Author

It says id is bound to the value v. And id is not the representation variable, that's the name of the representation variable.
Binding the representation variable to v is fine, if that's what it wants to say.
Binding an identifier to a value during execution of code means something else.

There can be other things in scope with the same name, and I read this as saying that the identifier is bound to the value v while executing the constructor body. (I really couldn't read it any other way.)
Which it isn't.

Member

> It says id is bound to the value v

No, it says that id is bound to the value of v. As is common in the language specification we introduce a fresh local variable (here: v) whose allocation is unspecified, but it occurs in the context of the semantics of the evaluation of an expression e, so it's presumably always possible to allocate it as an additional local variable in the scope which is the current scope for e.

This is slightly magic in cases like var x = e; because there is no syntactic support for allocating local variables in the initializing expression of a variable. On the other hand, the language specification does use let expressions in order to specify the semantics of certain constructs.

So let's say that we can in fact create a fresh variable like that. (Otherwise it's certainly a broader discussion if we insist that the language specification should be rewritten to avoid using this kind of local variables to specify the semantics of some expressions, and we should handle that separately.)

It is also common for our specification documents (including the language specification) to refer to a variable (of any kind) using its name. In some cases we say "the variable 'someName' is bound to ...", but in other cases we just say "'someName' is bound to ...".

We do not have the concept that an identifier is bound to anything.

I would not have a problem with "this and the variable id are bound to ...", but it does seem somewhat verbose, considering that we don't always do so.

> Binding the representation variable to v is fine

We never bind a variable to a variable. In this case I really don't think we have a habit of abbreviating "the value of v" to "v", and I wouldn't want to start doing that. (Well, why would that be worse? I guess it's because there is no way we could misunderstand "bind v to ..." to mean that we're binding the identifier to an object, but both the variable and the value of the variable are run-time entities, so that's more likely to be an actual source of confusion).

> Binding an identifier to a value during execution of code means something else.

We do have the notion of a run-time namespace (as well as a compile-time namespace). A run-time namespace could actually (if we squint) be considered to bind an identifier to a storage location, that is, it binds the identifier to a variable (and that storage location would in turn hold a pointer to an object, except that we don't specify the run-time semantics with that much detail, and it doesn't have to be a pointer, e.g., it could be a SmallInteger).

Nevertheless, the language specification and feature specifications don't usually include these details. I'd prefer to say that we (slightly magically) obtain a fresh variable, and that variable is bound to an object.

That's how we talk about objects that don't occur as the value of any variable when we specify the semantics of a construct.

> Since there can be other things in scope with the same name ...

That's a good point! We need to mention the scopes where the representation name resolves to the representation variable, because there could be other variables (parameters or locals) shadowing it. The specification of the semantics of an initializing formal is still fine, and so is the specification of the semantics of initializer list elements of the form id = e and this.id = e (they will both initialize the representation variable, even in the case where there is a formal parameter named id).

But the part about the binding of the representation variable should not refer to its name.

So maybe:

During the execution of the constructor body, this is bound to the value of v, and so is the representation variable.
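
A small sketch of the two initializer-list forms mentioned above, in the case where a formal parameter is also named `id`. The constructor names `bare` and `qualified` are hypothetical, and the sketch assumes a Dart SDK with extension types (3.3 or later).

```dart
extension type E._(Object? id) {
  // Hypothetical named constructors, one per initializer-list form.
  // In both, the left-hand side names the representation variable,
  // while the parameter `id` is simply ignored.
  E.bare(Object? base, Object? id) : id = base;
  E.qualified(Object? base, Object? id) : this.id = base;
}

void main() {
  // Each call passes 2 as `base` and 3 as the shadowing parameter `id`.
  print(E.bare(2, 3).id);
  print(E.qualified(2, 3).id);
}
```

Under the rule described in the comment, both constructors should initialize the representation to the value of `base`, regardless of the parameter that shares its name.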

Member

@eernstg eernstg left a comment

We must specify the binding of the representation variable at run time.

(Alternatively, we could start from scratch and specify that it's a getter that returns this as R, but that's a really twisted perspective because this would reasonably be specified to be a getter that returns id as Name<...> ;-)

I believe we can achieve this by making a small adjustment and talk about the representation variable as such, and avoid mentioning its name in that single sentence where shadowing is relevant.

Member Author

lrhn commented Nov 17, 2023

I actually think we should define the meaning of the representation variable as a getter.

Something like:

An extension type declaration of the form extension type constopt Name opt .constructorNameopt(R id) { ... }
introduces an implicit non-redirecting generative constructor declaration equivalent to constopt Name.constructorNameopt(R this.id);,
and it introduces an implicit extension type getter declaration with a member signature of R get id, which,
when invoked as an instance member, returns the value bound to this.
For all purposes, including naming conflicts, those declarations count as if they were declared as members of the extension type.

Then we only have to define what those constructors and extension type getters do, which is something we have to do anyway for other constructors or getter declarations. So saying something like:

A non-redirecting generative constructor (of an extension type) is executed in a context with a temporary instance variable that it can, and must, initialize, equivalent to an instance variable V declared as final R id;. An initializing formal or initializer-list assignment may initialize this variable, and no other variables are available to initialize (so the only valid initializing formal is this.id and the only initializer list assignment is to id/this.id). It's a compile-time error if the constructor would not initialize the V variable, or if it would initialize it more than once.
After execution of the initializer list has completed, let v be the value that the V variable is initialized to.
Then any constructor body is executed as an instance method body with this bound to the value v.
Finally, the result of invoking the constructor is the value v.
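
A minimal sketch of what those steps would mean observably. The names `Wrapper`, `value` and `raw` are hypothetical, and the sketch assumes a Dart SDK with extension types (3.3 or later) and assertions enabled.

```dart
extension type Wrapper._(Object value) {
  // Hypothetical constructor with an initializer list and a body.
  Wrapper(Object raw) : value = raw {
    // The body runs after the initializer list, with `this` bound to
    // the value that the representation was initialized to.
    assert(identical(this, raw));
  }
}

void main() {
  final o = Object();
  // The result of the constructor invocation is that same value.
  final w = Wrapper(o);
  print(identical(w, o)); // prints true
}
```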

That accounts for every non-redirecting generative constructor, including the implicit one introduced by the extension type declaration "header", so it's something we need to say anyway, and by not saying it twice, we avoid discrepancies.

Similarly, we need to say (and probably do) that execution of the body of an extension type method, what happens when invoking it, will happen in a scope where this is bound to the value of the receiver expression of the invocation (as normal for instance member invocations).
Having done that, also specifying how you access the representation variable separately from that, is another discrepancy waiting to happen.

Take:

extension type N(num numValue) implements num {
  num get negativeNumValue => -numValue;
  int get intValue => this.toInt();
}
extension type I(int intValue) implements N, int {
  int get negativeIntValue => I(-intValue);
}
void main() {
  print(I(42).negativeNumValue); // Prints -42
  print(I(42).numValue); // Prints 42
}

If we don't just say that the N(num numValue) introduces an implicit extension type getter, effectively equivalent to num get numValue => this as num;, then we have to special-case every part of this to ensure that I(42).numValue works.
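
To make the equivalence concrete, here is a hedged sketch that writes such a getter out by hand next to the representation getter. The name `viaThis` is hypothetical, `I` is simplified to implement only `N`, and the sketch assumes a Dart SDK with extension types (3.3 or later).

```dart
extension type N(num numValue) implements num {
  // Hypothetical explicit getter, written the way the implicit
  // representation getter is described above.
  num get viaThis => this as num;
}

// Simplified variant of I: it only implements N.
extension type I(int intValue) implements N {}

void main() {
  final i = I(42);
  // `numValue` is found on I through its superinterface N.
  print(i.numValue); // prints 42
  // The hand-written getter yields the same object.
  print(identical(i.numValue, i.viaThis)); // prints true
}
```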

For example, currently:

We say that an extension type declaration DV has an extension type
member named n in the case where DV declares a member named n, and in
the case where DV has no such declaration, but DV has a direct
extension type superinterface V that has an extension type member named
n. In both cases, when this is unique, the extension type member
declaration named n that DV has
is said declaration.

By this description, N does not have an extension type member named numValue since it doesn't (explicitly) declare a member named numValue. And therefore I doesn't have an extension type member named numValue.

But I should have one, which means that N should count as having an extension type member named numValue. But that requires a reference to a declaration.
The path of least resistance is to introduce that declaration, and then we're just done! Everything else falls out of the normal rules for an extension type declaration with such an extension type member declaration, rules we already have.

As long as we make it perfectly clear that implicit and explicit declarations are otherwise equivalent in all following steps, which means they count towards naming conflicts in declarations, towards "has an extension type member", etc., then their actual behavior should be subsumed by the rules for such declarations that we already have. Making the primary constructor and "representation variable" into implicit declarations means we don't have to do anything again just for those. (And we won't forget a case.)

(And "because this would reasonably be specified to be a getter that returns id as Name<...>" is not how this works in Dart. For a number of reasons, not limited to this then being in the lexical scope for static members. The this is not a variable name, it's an operator which evaluates to an object set during the invocation of the member that refers to the this. Sure, like a parameter, but still not a parameter. It's not a local variable, it's not subject to promotion, which is a mistake, and it's the implicit target of unqualified invocations that are not in the lexical scope, but that's a specification artifact, they're called on the same receiver, which can also be accessed as this, we don't have to make them invoked on this itself, that was just the easy way to specify it, "as if prefixed by this.".)

Actually, more generally, we could say that a generative constructor is executed in an "initialization scope" containing a set of variables to assign a value to, some of which must be assigned a value by the constructor. For a class, the variables in that scope are the instance variables declared by the class, except those that are final and have an initializer expression, and a variable must be assigned a value if it is final or non-nullable, and not late. The constructor initializing those means that the instance variables are thus initialized when the constructor ends. For an extension type, a generative constructor's initialization scope contains the single temporary variable named id with type R, which must be initialized. At the end of initializer-list execution, the constructor provides the object which is the initialized value created by the constructor. Then any constructor body is executed as an instance member body with this bound to the created value.
That would allow us to use the same specification of what a non-redirecting generative constructor does for both classes and extension types, without having to repeat everything for extension types as if they are a new thing, and probably forgetting something.
(Repeating ourselves is a good way to do so incorrectly, and have slight deviations in specified behavior between things that should have been the same, just because we wrote it twice instead of reusing. And since I also frown at specification by "works the same as a similar class would", the solution I'd prefer is abstracting over the differences and defining the shared requirements and behaviors once, then parameterizing only with the necessary differences.)
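
As a rough illustration of which variables such an "initialization scope" would contain, here is a sketch with one class and one extension type. The names `Point`, `Id` and their members are hypothetical, and the comments restate the rule proposed above rather than existing specification text.

```dart
class Point {
  final int x; // in scope and required: final, no initializer expression
  final int y = 0; // not in scope: final with an initializer expression
  int? label; // in scope but optional: nullable and not final
  late int cache; // in scope but optional: late

  // Initializes the required variable and one of the optional ones.
  Point(this.x) : label = null;
}

extension type Id(int id) {
  // The scope holds only the representation `id`, which is required.
  Id.doubled(int n) : id = n * 2;
}

void main() {
  print(Point(1).x); // prints 1
  print(Id.doubled(21).id); // prints 42
}
```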

Member

eernstg commented Nov 20, 2023

Something like: ...

I think these are great comments, and we do need to tighten the language about the semantics of constructors!

I'm not convinced, though, that it is helpful to specify a representation getter that returns this as R, and deny the existence of the representation variable. But I do think that the representation variable should be mentioned explicitly in several locations in the feature specification where it currently isn't, because the mere reference to the representation name breaks when that identifier resolves to some other (shadowing) declaration.
