1 // Copyright 2012 The Rust Project Developers. See the COPYRIGHT
2 // file at the top-level directory of this distribution and at
3 // http://rust-lang.org/COPYRIGHT.
4 //
5 // Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
6 // http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
7 // <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
8 // option. This file may not be copied, modified, or distributed
9 // except according to those terms.
10
11 /*!
12
13 # Type inference engine
14
15 This is loosely based on standard HM-type inference, but with an
16 extension to try and accommodate subtyping. There is nothing
17 principled about this extension; it's sound---I hope!---but it's a
18 heuristic, ultimately, and does not guarantee that it finds a valid
19 typing even if one exists (in fact, there are known scenarios where it
20 fails, some of which may eventually become problematic).
21
22 ## Key idea
23
24 The main change is that each type variable T is associated with a
25 lower-bound L and an upper-bound U. L and U begin as bottom and top,
26 respectively, but gradually narrow in response to new constraints
27 being introduced. When a variable is finally resolved to a concrete
28 type, it can (theoretically) select any type that is a supertype of L
29 and a subtype of U.
30
31 There are several critical invariants which we maintain:
32
33 - the upper-bound of a variable only becomes lower and the lower-bound
34 only becomes higher over time;
35 - the lower-bound L is always a subtype of the upper bound U;
36 - the lower-bound L and upper-bound U never refer to other type variables,
37 but only to types (though those types may contain type variables).
38
39 > An aside: if the terms upper- and lower-bound confuse you, think of
40 > "supertype" and "subtype". The upper-bound is a "supertype"
41 > (super=upper in Latin, or something like that anyway) and the lower-bound
42 > is a "subtype" (sub=lower in Latin). I find it helps to visualize
43 > a simple class hierarchy, like Java minus interfaces and
44 > primitive types. The class Object is at the root (top) and other
45 > types lie in between. The bottom type is then the Null type.
46 > So the tree looks like:
47 >
48 > ```notrust
49 > Object
50 > / \
51 > String Other
52 > \ /
53 > (null)
54 > ```
55 >
56 > So the upper bound type is the "supertype" and the lower bound is the
57 > "subtype" (also, super and sub mean upper and lower in Latin, or something
58 > like that anyway).
59
60 ## Satisfying constraints
61
62 At a primitive level, there is only one form of constraint that the
63 inference understands: a subtype relation. So the outside world can
64 say "make type A a subtype of type B". If there are variables
65 involved, the inferencer will adjust their upper- and lower-bounds as
66 needed to ensure that this relation is satisfied. (We also allow "make
67 type A equal to type B", but this is translated into "A <: B" and "B
68 <: A")
69
70 As stated above, we always maintain the invariant that type bounds
71 never refer to other variables. This keeps the inference relatively
72 simple, avoiding the scenario of having a kind of graph where we have
73 to pump constraints along and reach a fixed point, but it does impose
74 some heuristics in the case where the user is relating two type
75 variables A <: B.
76
77 Combining two variables such that variable A will forever be a subtype
78 of variable B is the trickiest part of the algorithm because there is
79 often no right choice---that is, the right choice will depend on
80 future constraints which we do not yet know. The problem comes about
81 because both A and B have bounds that can be adjusted in the future.
82 Let's look at some of the cases that can come up.
83
84 Imagine, to start, the best case, where both A and B have an upper and
85 lower bound (that is, the bounds are not top nor bot respectively). In
86 that case, if we're lucky, A.ub <: B.lb, and so we know that whatever
87 A and B should become, they will forever have the desired subtyping
88 relation. We can just leave things as they are.
89
90 ### Option 1: Unify
91
92 However, suppose that A.ub is *not* a subtype of B.lb. In
93 that case, we must make a decision. One option is to unify A
94 and B so that they are one variable whose bounds are:
95
96 UB = GLB(A.ub, B.ub)
97 LB = LUB(A.lb, B.lb)
98
99 (Note that we will have to verify that LB <: UB; if it does not, the
100 types are not intersecting and there is an error) In that case, A <: B
101 holds trivially because A==B. However, we have now lost some
102 flexibility, because perhaps the user intended for A and B to end up
103 as different types and not the same type.
104
105 Pictorally, what this does is to take two distinct variables with
106 (hopefully not completely) distinct type ranges and produce one with
107 the intersection.
108
109 ```notrust
110 B.ub B.ub
111 /\ /
112 A.ub / \ A.ub /
113 / \ / \ \ /
114 / X \ UB
115 / / \ \ / \
116 / / / \ / /
117 \ \ / / \ /
118 \ X / LB
119 \ / \ / / \
120 \ / \ / / \
121 A.lb B.lb A.lb B.lb
122 ```
123
124
125 ### Option 2: Relate UB/LB
126
127 Another option is to keep A and B as distinct variables but set their
128 bounds in such a way that, whatever happens, we know that A <: B will hold.
129 This can be achieved by ensuring that A.ub <: B.lb. In practice there
130 are two ways to do that, depicted pictorally here:
131
132 ```notrust
133 Before Option #1 Option #2
134
135 B.ub B.ub B.ub
136 /\ / \ / \
137 A.ub / \ A.ub /(B')\ A.ub /(B')\
138 / \ / \ \ / / \ / /
139 / X \ __UB____/ UB /
140 / / \ \ / | | /
141 / / / \ / | | /
142 \ \ / / /(A')| | /
143 \ X / / LB ______LB/
144 \ / \ / / / \ / (A')/ \
145 \ / \ / \ / \ \ / \
146 A.lb B.lb A.lb B.lb A.lb B.lb
147 ```
148
149 In these diagrams, UB and LB are defined as before. As you can see,
150 the new ranges `A'` and `B'` are quite different from the range that
151 would be produced by unifying the variables.
152
153 ### What we do now
154
155 Our current technique is to *try* (transactionally) to relate the
156 existing bounds of A and B, if there are any (i.e., if `UB(A) != top
157 && LB(B) != bot`). If that succeeds, we're done. If it fails, then
158 we merge A and B into same variable.
159
160 This is not clearly the correct course. For example, if `UB(A) !=
161 top` but `LB(B) == bot`, we could conceivably set `LB(B)` to `UB(A)`
162 and leave the variables unmerged. This is sometimes the better
163 course, it depends on the program.
164
165 The main case which fails today that I would like to support is:
166
167 ```notrust
168 fn foo<T>(x: T, y: T) { ... }
169
170 fn bar() {
171 let x: @mut int = @mut 3;
172 let y: @int = @3;
173 foo(x, y);
174 }
175 ```
176
177 In principle, the inferencer ought to find that the parameter `T` to
178 `foo(x, y)` is `@const int`. Today, however, it does not; this is
179 because the type variable `T` is merged with the type variable for
180 `X`, and thus inherits its UB/LB of `@mut int`. This leaves no
181 flexibility for `T` to later adjust to accommodate `@int`.
182
183 ### What to do when not all bounds are present
184
185 In the prior discussion we assumed that A.ub was not top and B.lb was
186 not bot. Unfortunately this is rarely the case. Often type variables
187 have "lopsided" bounds. For example, if a variable in the program has
188 been initialized but has not been used, then its corresponding type
189 variable will have a lower bound but no upper bound. When that
190 variable is then used, we would like to know its upper bound---but we
191 don't have one! In this case we'll do different things depending on
192 how the variable is being used.
193
194 ## Transactional support
195
196 Whenever we adjust merge variables or adjust their bounds, we always
197 keep a record of the old value. This allows the changes to be undone.
198
199 ## Regions
200
201 I've only talked about type variables here, but region variables
202 follow the same principle. They have upper- and lower-bounds. A
203 region A is a subregion of a region B if A being valid implies that B
204 is valid. This basically corresponds to the block nesting structure:
205 the regions for outer block scopes are superregions of those for inner
206 block scopes.
207
208 ## Integral and floating-point type variables
209
210 There is a third variety of type variable that we use only for
211 inferring the types of unsuffixed integer literals. Integral type
212 variables differ from general-purpose type variables in that there's
213 no subtyping relationship among the various integral types, so instead
214 of associating each variable with an upper and lower bound, we just
215 use simple unification. Each integer variable is associated with at
216 most one integer type. Floating point types are handled similarly to
217 integral types.
218
219 ## GLB/LUB
220
221 Computing the greatest-lower-bound and least-upper-bound of two
222 types/regions is generally straightforward except when type variables
223 are involved. In that case, we follow a similar "try to use the bounds
224 when possible but otherwise merge the variables" strategy. In other
225 words, `GLB(A, B)` where `A` and `B` are variables will often result
226 in `A` and `B` being merged and the result being `A`.
227
228 ## Type coercion
229
230 We have a notion of assignability which differs somewhat from
231 subtyping; in particular it may cause region borrowing to occur. See
232 the big comment later in this file on Type Coercion for specifics.
233
234 ### In conclusion
235
236 I showed you three ways to relate `A` and `B`. There are also more,
237 of course, though I'm not sure if there are any more sensible options.
238 The main point is that there are various options, each of which
239 produce a distinct range of types for `A` and `B`. Depending on what
240 the correct values for A and B are, one of these options will be the
241 right choice: but of course we don't know the right values for A and B
242 yet, that's what we're trying to find! In our code, we opt to unify
243 (Option #1).
244
245 # Implementation details
246
247 We make use of a trait-like impementation strategy to consolidate
248 duplicated code between subtypes, GLB, and LUB computations. See the
249 section on "Type Combining" below for details.
250
251 */