1 // Copyright 2012 The Rust Project Developers. See the COPYRIGHT
2 // file at the top-level directory of this distribution and at
3 // http://rust-lang.org/COPYRIGHT.
4 //
5 // Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
6 // http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
7 // <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
8 // option. This file may not be copied, modified, or distributed
9 // except according to those terms.
10
11 /*!
12
13 # The Borrow Checker
14
15 This pass has the job of enforcing memory safety. This is a subtle
16 topic. This docs aim to explain both the practice and the theory
17 behind the borrow checker. They start with a high-level overview of
18 how it works, and then proceed to dive into the theoretical
19 background. Finally, they go into detail on some of the more subtle
20 aspects.
21
22 # Table of contents
23
24 These docs are long. Search for the section you are interested in.
25
26 - Overview
27 - Formal model
28 - Borrowing and loans
29 - Moves and initialization
30 - Future work
31
32 # Overview
33
34 The borrow checker checks one function at a time. It operates in two
35 passes. The first pass, called `gather_loans`, walks over the function
36 and identifies all of the places where borrows (e.g., `&` expressions
37 and `ref` bindings) and moves (copies or captures of a linear value)
38 occur. It also tracks initialization sites. For each borrow and move,
39 it checks various basic safety conditions at this time (for example,
40 that the lifetime of the borrow doesn't exceed the lifetime of the
41 value being borrowed, or that there is no move out of an `&T`
42 referent).
43
44 It then uses the dataflow module to propagate which of those borrows
45 may be in scope at each point in the procedure. A loan is considered
46 to come into scope at the expression that caused it and to go out of
47 scope when the lifetime of the resulting reference expires.
48
49 Once the in-scope loans are known for each point in the program, the
50 borrow checker walks the IR again in a second pass called
51 `check_loans`. This pass examines each statement and makes sure that
52 it is safe with respect to the in-scope loans.
53
54 # Formal model
55
56 Throughout the docs we'll consider a simple subset of Rust in which
57 you can only borrow from lvalues, defined like so:
58
59 ```notrust
60 LV = x | LV.f | *LV
61 ```
62
63 Here `x` represents some variable, `LV.f` is a field reference,
64 and `*LV` is a pointer dereference. There is no auto-deref or other
65 niceties. This means that if you have a type like:
66
67 ```notrust
68 struct S { f: uint }
69 ```
70
71 and a variable `a: Box<S>`, then the rust expression `a.f` would correspond
72 to an `LV` of `(*a).f`.
73
74 Here is the formal grammar for the types we'll consider:
75
76 ```notrust
77 TY = () | S<'LT...> | Box<TY> | & 'LT MQ TY | @ MQ TY
78 MQ = mut | imm | const
79 ```
80
81 Most of these types should be pretty self explanatory. Here `S` is a
82 struct name and we assume structs are declared like so:
83
84 ```notrust
85 SD = struct S<'LT...> { (f: TY)... }
86 ```
87
88 # Borrowing and loans
89
90 ## An intuitive explanation
91
92 ### Issuing loans
93
94 Now, imagine we had a program like this:
95
96 ```notrust
97 struct Foo { f: uint, g: uint }
98 ...
99 'a: {
100 let mut x: Box<Foo> = ...;
101 let y = &mut (*x).f;
102 x = ...;
103 }
104 ```
105
106 This is of course dangerous because mutating `x` will free the old
107 value and hence invalidate `y`. The borrow checker aims to prevent
108 this sort of thing.
109
110 #### Loans and restrictions
111
112 The way the borrow checker works is that it analyzes each borrow
113 expression (in our simple model, that's stuff like `&LV`, though in
114 real life there are a few other cases to consider). For each borrow
115 expression, it computes a `Loan`, which is a data structure that
116 records (1) the value being borrowed, (2) the mutability and scope of
117 the borrow, and (3) a set of restrictions. In the code, `Loan` is a
118 struct defined in `middle::borrowck`. Formally, we define `LOAN` as
119 follows:
120
121 ```notrust
122 LOAN = (LV, LT, MQ, RESTRICTION*)
123 RESTRICTION = (LV, ACTION*)
124 ACTION = MUTATE | CLAIM | FREEZE
125 ```
126
127 Here the `LOAN` tuple defines the lvalue `LV` being borrowed; the
128 lifetime `LT` of that borrow; the mutability `MQ` of the borrow; and a
129 list of restrictions. The restrictions indicate actions which, if
130 taken, could invalidate the loan and lead to type safety violations.
131
132 Each `RESTRICTION` is a pair of a restrictive lvalue `LV` (which will
133 either be the path that was borrowed or some prefix of the path that
134 was borrowed) and a set of restricted actions. There are three kinds
135 of actions that may be restricted for the path `LV`:
136
137 - `MUTATE` means that `LV` cannot be assigned to;
138 - `CLAIM` means that the `LV` cannot be borrowed mutably;
139 - `FREEZE` means that the `LV` cannot be borrowed immutably;
140
141 Finally, it is never possible to move from an lvalue that appears in a
142 restriction. This implies that the "empty restriction" `(LV, [])`,
143 which contains an empty set of actions, still has a purpose---it
144 prevents moves from `LV`. I chose not to make `MOVE` a fourth kind of
145 action because that would imply that sometimes moves are permitted
146 from restrictived values, which is not the case.
147
148 #### Example
149
150 To give you a better feeling for what kind of restrictions derived
151 from a loan, let's look at the loan `L` that would be issued as a
152 result of the borrow `&mut (*x).f` in the example above:
153
154 ```notrust
155 L = ((*x).f, 'a, mut, RS) where
156 RS = [((*x).f, [MUTATE, CLAIM, FREEZE]),
157 (*x, [MUTATE, CLAIM, FREEZE]),
158 (x, [MUTATE, CLAIM, FREEZE])]
159 ```
160
161 The loan states that the expression `(*x).f` has been loaned as
162 mutable for the lifetime `'a`. Because the loan is mutable, that means
163 that the value `(*x).f` may be mutated via the newly created reference
164 (and *only* via that pointer). This is reflected in the
165 restrictions `RS` that accompany the loan.
166
167 The first restriction `((*x).f, [MUTATE, CLAIM, FREEZE])` states that
168 the lender may not mutate, freeze, nor alias `(*x).f`. Mutation is
169 illegal because `(*x).f` is only supposed to be mutated via the new
170 reference, not by mutating the original path `(*x).f`. Freezing is
171 illegal because the path now has an `&mut` alias; so even if we the
172 lender were to consider `(*x).f` to be immutable, it might be mutated
173 via this alias. They will be enforced for the lifetime `'a` of the
174 loan. After the loan expires, the restrictions no longer apply.
175
176 The second restriction on `*x` is interesting because it does not
177 apply to the path that was lent (`(*x).f`) but rather to a prefix of
178 the borrowed path. This is due to the rules of inherited mutability:
179 if the user were to assign to (or freeze) `*x`, they would indirectly
180 overwrite (or freeze) `(*x).f`, and thus invalidate the reference
181 that was created. In general it holds that when a path is
182 lent, restrictions are issued for all the owning prefixes of that
183 path. In this case, the path `*x` owns the path `(*x).f` and,
184 because `x` is an owned pointer, the path `x` owns the path `*x`.
185 Therefore, borrowing `(*x).f` yields restrictions on both
186 `*x` and `x`.
187
188 ### Checking for illegal assignments, moves, and reborrows
189
190 Once we have computed the loans introduced by each borrow, the borrow
191 checker uses a data flow propagation to compute the full set of loans
192 in scope at each expression and then uses that set to decide whether
193 that expression is legal. Remember that the scope of loan is defined
194 by its lifetime LT. We sometimes say that a loan which is in-scope at
195 a particular point is an "outstanding loan", and the set of
196 restrictions included in those loans as the "outstanding
197 restrictions".
198
199 The kinds of expressions which in-scope loans can render illegal are:
200 - *assignments* (`lv = v`): illegal if there is an in-scope restriction
201 against mutating `lv`;
202 - *moves*: illegal if there is any in-scope restriction on `lv` at all;
203 - *mutable borrows* (`&mut lv`): illegal there is an in-scope restriction
204 against claiming `lv`;
205 - *immutable borrows* (`&lv`): illegal there is an in-scope restriction
206 against freezing `lv`.
207
208 ## Formal rules
209
210 Now that we hopefully have some kind of intuitive feeling for how the
211 borrow checker works, let's look a bit more closely now at the precise
212 conditions that it uses. For simplicity I will ignore const loans.
213
214 I will present the rules in a modified form of standard inference
215 rules, which looks as follows:
216
217 ```notrust
218 PREDICATE(X, Y, Z) // Rule-Name
219 Condition 1
220 Condition 2
221 Condition 3
222 ```
223
224 The initial line states the predicate that is to be satisfied. The
225 indented lines indicate the conditions that must be met for the
226 predicate to be satisfied. The right-justified comment states the name
227 of this rule: there are comments in the borrowck source referencing
228 these names, so that you can cross reference to find the actual code
229 that corresponds to the formal rule.
230
231 ### Invariants
232
233 I want to collect, at a high-level, the invariants the borrow checker
234 maintains. I will give them names and refer to them throughout the
235 text. Together these invariants are crucial for the overall soundness
236 of the system.
237
238 **Mutability requires uniqueness.** To mutate a path
239
240 **Unique mutability.** There is only one *usable* mutable path to any
241 given memory at any given time. This implies that when claiming memory
242 with an expression like `p = &mut x`, the compiler must guarantee that
243 the borrowed value `x` can no longer be mutated so long as `p` is
244 live. (This is done via restrictions, read on.)
245
246 **.**
247
248
249 ### The `gather_loans` pass
250
251 We start with the `gather_loans` pass, which walks the AST looking for
252 borrows. For each borrow, there are three bits of information: the
253 lvalue `LV` being borrowed and the mutability `MQ` and lifetime `LT`
254 of the resulting pointer. Given those, `gather_loans` applies four
255 validity tests:
256
257 1. `MUTABILITY(LV, MQ)`: The mutability of the reference is
258 compatible with the mutability of `LV` (i.e., not borrowing immutable
259 data as mutable).
260
261 2. `ALIASABLE(LV, MQ)`: The aliasability of the reference is
262 compatible with the aliasability of `LV`. The goal is to prevent
263 `&mut` borrows of aliasability data.
264
265 3. `LIFETIME(LV, LT, MQ)`: The lifetime of the borrow does not exceed
266 the lifetime of the value being borrowed. This pass is also
267 responsible for inserting root annotations to keep managed values
268 alive.
269
270 4. `RESTRICTIONS(LV, LT, ACTIONS) = RS`: This pass checks and computes the
271 restrictions to maintain memory safety. These are the restrictions
272 that will go into the final loan. We'll discuss in more detail below.
273
274 ## Checking mutability
275
276 Checking mutability is fairly straightforward. We just want to prevent
277 immutable data from being borrowed as mutable. Note that it is ok to
278 borrow mutable data as immutable, since that is simply a
279 freeze. Formally we define a predicate `MUTABLE(LV, MQ)` which, if
280 defined, means that "borrowing `LV` with mutability `MQ` is ok. The
281 Rust code corresponding to this predicate is the function
282 `check_mutability` in `middle::borrowck::gather_loans`.
283
284 ### Checking mutability of variables
285
286 *Code pointer:* Function `check_mutability()` in `gather_loans/mod.rs`,
287 but also the code in `mem_categorization`.
288
289 Let's begin with the rules for variables, which state that if a
290 variable is declared as mutable, it may be borrowed any which way, but
291 otherwise the variable must be borrowed as immutable or const:
292
293 ```notrust
294 MUTABILITY(X, MQ) // M-Var-Mut
295 DECL(X) = mut
296
297 MUTABILITY(X, MQ) // M-Var-Imm
298 DECL(X) = imm
299 MQ = imm | const
300 ```
301
302 ### Checking mutability of owned content
303
304 Fields and owned pointers inherit their mutability from
305 their base expressions, so both of their rules basically
306 delegate the check to the base expression `LV`:
307
308 ```notrust
309 MUTABILITY(LV.f, MQ) // M-Field
310 MUTABILITY(LV, MQ)
311
312 MUTABILITY(*LV, MQ) // M-Deref-Unique
313 TYPE(LV) = Box<Ty>
314 MUTABILITY(LV, MQ)
315 ```
316
317 ### Checking mutability of immutable pointer types
318
319 Immutable pointer types like `&T` and `@T` can only
320 be borrowed if MQ is immutable or const:
321
322 ```notrust
323 MUTABILITY(*LV, MQ) // M-Deref-Borrowed-Imm
324 TYPE(LV) = &Ty
325 MQ == imm | const
326
327 MUTABILITY(*LV, MQ) // M-Deref-Managed-Imm
328 TYPE(LV) = @Ty
329 MQ == imm | const
330 ```
331
332 ### Checking mutability of mutable pointer types
333
334 `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
335
336 ```notrust
337 MUTABILITY(*LV, MQ) // M-Deref-Borrowed-Mut
338 TYPE(LV) = &mut Ty
339 ```
340
341 ## Checking aliasability
342
343 The goal of the aliasability check is to ensure that we never permit
344 `&mut` borrows of aliasable data. Formally we define a predicate
345 `ALIASABLE(LV, MQ)` which if defined means that
346 "borrowing `LV` with mutability `MQ` is ok". The
347 Rust code corresponding to this predicate is the function
348 `check_aliasability()` in `middle::borrowck::gather_loans`.
349
350 ### Checking aliasability of variables
351
352 Local variables are never aliasable as they are accessible only within
353 the stack frame.
354
355 ```notrust
356 ALIASABLE(X, MQ) // M-Var-Mut
357 ```
358
359 ### Checking aliasable of owned content
360
361 Owned content is aliasable if it is found in an aliasable location:
362
363 ```notrust
364 ALIASABLE(LV.f, MQ) // M-Field
365 ALIASABLE(LV, MQ)
366
367 ALIASABLE(*LV, MQ) // M-Deref-Unique
368 ALIASABLE(LV, MQ)
369 ```
370
371 ### Checking mutability of immutable pointer types
372
373 Immutable pointer types like `&T` are aliasable, and hence can only be
374 borrowed immutably:
375
376 ```notrust
377 ALIASABLE(*LV, imm) // M-Deref-Borrowed-Imm
378 TYPE(LV) = &Ty
379 ```
380
381 ### Checking mutability of mutable pointer types
382
383 `&mut T` can be frozen, so it is acceptable to borrow it as either imm or mut:
384
385 ```notrust
386 ALIASABLE(*LV, MQ) // M-Deref-Borrowed-Mut
387 TYPE(LV) = &mut Ty
388 ```
389
390 ## Checking lifetime
391
392 These rules aim to ensure that no data is borrowed for a scope that exceeds
393 its lifetime. In addition, these rules manage the rooting of `@` values.
394 These two computations wind up being intimately related. Formally, we define
395 a predicate `LIFETIME(LV, LT, MQ)`, which states that "the lvalue `LV` can be
396 safely borrowed for the lifetime `LT` with mutability `MQ`". The Rust
397 code corresponding to this predicate is the module
398 `middle::borrowck::gather_loans::lifetime`.
399
400 ### The Scope function
401
402 Several of the rules refer to a helper function `SCOPE(LV)=LT`. The
403 `SCOPE(LV)` yields the lifetime `LT` for which the lvalue `LV` is
404 guaranteed to exist, presuming that no mutations occur.
405
406 The scope of a local variable is the block where it is declared:
407
408 ```notrust
409 SCOPE(X) = block where X is declared
410 ```
411
412 The scope of a field is the scope of the struct:
413
414 ```notrust
415 SCOPE(LV.f) = SCOPE(LV)
416 ```
417
418 The scope of a unique referent is the scope of the pointer, since
419 (barring mutation or moves) the pointer will not be freed until
420 the pointer itself `LV` goes out of scope:
421
422 ```notrust
423 SCOPE(*LV) = SCOPE(LV) if LV has type Box<T>
424 ```
425
426 The scope of a managed referent is also the scope of the pointer. This
427 is a conservative approximation, since there may be other aliases fo
428 that same managed box that would cause it to live longer:
429
430 ```notrust
431 SCOPE(*LV) = SCOPE(LV) if LV has type @T
432 ```
433
434 The scope of a borrowed referent is the scope associated with the
435 pointer. This is a conservative approximation, since the data that
436 the pointer points at may actually live longer:
437
438 ```notrust
439 SCOPE(*LV) = LT if LV has type &'LT T or &'LT mut T
440 ```
441
442 ### Checking lifetime of variables
443
444 The rule for variables states that a variable can only be borrowed a
445 lifetime `LT` that is a subregion of the variable's scope:
446
447 ```notrust
448 LIFETIME(X, LT, MQ) // L-Local
449 LT <= SCOPE(X)
450 ```
451
452 ### Checking lifetime for owned content
453
454 The lifetime of a field or owned pointer is the same as the lifetime
455 of its owner:
456
457 ```notrust
458 LIFETIME(LV.f, LT, MQ) // L-Field
459 LIFETIME(LV, LT, MQ)
460
461 LIFETIME(*LV, LT, MQ) // L-Deref-Send
462 TYPE(LV) = Box<Ty>
463 LIFETIME(LV, LT, MQ)
464 ```
465
466 ### Checking lifetime for derefs of references
467
468 References have a lifetime `LT'` associated with them. The
469 data they point at has been guaranteed to be valid for at least this
470 lifetime. Therefore, the borrow is valid so long as the lifetime `LT`
471 of the borrow is shorter than the lifetime `LT'` of the pointer
472 itself:
473
474 ```notrust
475 LIFETIME(*LV, LT, MQ) // L-Deref-Borrowed
476 TYPE(LV) = <' Ty OR <' mut Ty
477 LT <= LT'
478 ```
479
480 ### Checking lifetime for derefs of managed, immutable pointers
481
482 Managed pointers are valid so long as the data within them is
483 *rooted*. There are two ways that this can be achieved. The first is
484 when the user guarantees such a root will exist. For this to be true,
485 three conditions must be met:
486
487 ```notrust
488 LIFETIME(*LV, LT, MQ) // L-Deref-Managed-Imm-User-Root
489 TYPE(LV) = @Ty
490 LT <= SCOPE(LV) // (1)
491 LV is immutable // (2)
492 LV is not moved or not movable // (3)
493 ```
494
495 Condition (1) guarantees that the managed box will be rooted for at
496 least the lifetime `LT` of the borrow, presuming that no mutation or
497 moves occur. Conditions (2) and (3) then serve to guarantee that the
498 value is not mutated or moved. Note that lvalues are either
499 (ultimately) owned by a local variable, in which case we can check
500 whether that local variable is ever moved in its scope, or they are
501 owned by the referent of an (immutable, due to condition 2) managed or
502 references, in which case moves are not permitted because the
503 location is aliasable.
504
505 If the conditions of `L-Deref-Managed-Imm-User-Root` are not met, then
506 there is a second alternative. The compiler can attempt to root the
507 managed pointer itself. This permits great flexibility, because the
508 location `LV` where the managed pointer is found does not matter, but
509 there are some limitations. The lifetime of the borrow can only extend
510 to the innermost enclosing loop or function body. This guarantees that
511 the compiler never requires an unbounded amount of stack space to
512 perform the rooting; if this condition were violated, the compiler
513 might have to accumulate a list of rooted objects, for example if the
514 borrow occurred inside the body of a loop but the scope of the borrow
515 extended outside the loop. More formally, the requirement is that
516 there is no path starting from the borrow that leads back to the
517 borrow without crossing the exit from the scope `LT`.
518
519 The rule for compiler rooting is as follows:
520
521 ```notrust
522 LIFETIME(*LV, LT, MQ) // L-Deref-Managed-Imm-Compiler-Root
523 TYPE(LV) = @Ty
524 LT <= innermost enclosing loop/func
525 ROOT LV at *LV for LT
526 ```
527
528 Here I have written `ROOT LV at *LV FOR LT` to indicate that the code
529 makes a note in a side-table that the box `LV` must be rooted into the
530 stack when `*LV` is evaluated, and that this root can be released when
531 the scope `LT` exits.
532
533 ## Computing the restrictions
534
535 The final rules govern the computation of *restrictions*, meaning that
536 we compute the set of actions that will be illegal for the life of the
537 loan. The predicate is written `RESTRICTIONS(LV, LT, ACTIONS) =
538 RESTRICTION*`, which can be read "in order to prevent `ACTIONS` from
539 occuring on `LV`, the restrictions `RESTRICTION*` must be respected
540 for the lifetime of the loan".
541
542 Note that there is an initial set of restrictions: these restrictions
543 are computed based on the kind of borrow:
544
545 ```notrust
546 &mut LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM|FREEZE)
547 &LV => RESTRICTIONS(LV, LT, MUTATE|CLAIM)
548 &const LV => RESTRICTIONS(LV, LT, [])
549 ```
550
551 The reasoning here is that a mutable borrow must be the only writer,
552 therefore it prevents other writes (`MUTATE`), mutable borrows
553 (`CLAIM`), and immutable borrows (`FREEZE`). An immutable borrow
554 permits other immutable borows but forbids writes and mutable borows.
555 Finally, a const borrow just wants to be sure that the value is not
556 moved out from under it, so no actions are forbidden.
557
558 ### Restrictions for loans of a local variable
559
560 The simplest case is a borrow of a local variable `X`:
561
562 ```notrust
563 RESTRICTIONS(X, LT, ACTIONS) = (X, ACTIONS) // R-Variable
564 ```
565
566 In such cases we just record the actions that are not permitted.
567
568 ### Restrictions for loans of fields
569
570 Restricting a field is the same as restricting the owner of that
571 field:
572
573 ```notrust
574 RESTRICTIONS(LV.f, LT, ACTIONS) = RS, (LV.f, ACTIONS) // R-Field
575 RESTRICTIONS(LV, LT, ACTIONS) = RS
576 ```
577
578 The reasoning here is as follows. If the field must not be mutated,
579 then you must not mutate the owner of the field either, since that
580 would indirectly modify the field. Similarly, if the field cannot be
581 frozen or aliased, we cannot allow the owner to be frozen or aliased,
582 since doing so indirectly freezes/aliases the field. This is the
583 origin of inherited mutability.
584
585 ### Restrictions for loans of owned referents
586
587 Because the mutability of owned referents is inherited, restricting an
588 owned referent is similar to restricting a field, in that it implies
589 restrictions on the pointer. However, owned pointers have an important
590 twist: if the owner `LV` is mutated, that causes the owned referent
591 `*LV` to be freed! So whenever an owned referent `*LV` is borrowed, we
592 must prevent the owned pointer `LV` from being mutated, which means
593 that we always add `MUTATE` and `CLAIM` to the restriction set imposed
594 on `LV`:
595
596 ```notrust
597 RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Send-Pointer
598 TYPE(LV) = Box<Ty>
599 RESTRICTIONS(LV, LT, ACTIONS|MUTATE|CLAIM) = RS
600 ```
601
602 ### Restrictions for loans of immutable managed/borrowed referents
603
604 Immutable managed/borrowed referents are freely aliasable, meaning that
605 the compiler does not prevent you from copying the pointer. This
606 implies that issuing restrictions is useless. We might prevent the
607 user from acting on `*LV` itself, but there could be another path
608 `*LV1` that refers to the exact same memory, and we would not be
609 restricting that path. Therefore, the rule for `&Ty` and `@Ty`
610 pointers always returns an empty set of restrictions, and it only
611 permits restricting `MUTATE` and `CLAIM` actions:
612
613 ```notrust
614 RESTRICTIONS(*LV, LT, ACTIONS) = [] // R-Deref-Imm-Managed
615 TYPE(LV) = @Ty
616 ACTIONS subset of [MUTATE, CLAIM]
617
618 RESTRICTIONS(*LV, LT, ACTIONS) = [] // R-Deref-Imm-Borrowed
619 TYPE(LV) = <' Ty
620 LT <= LT' // (1)
621 ACTIONS subset of [MUTATE, CLAIM]
622 ```
623
624 The reason that we can restrict `MUTATE` and `CLAIM` actions even
625 without a restrictions list is that it is never legal to mutate nor to
626 borrow mutably the contents of a `&Ty` or `@Ty` pointer. In other
627 words, those restrictions are already inherent in the type.
628
629 Clause (1) in the rule for `&Ty` deserves mention. Here I
630 specify that the lifetime of the loan must be less than the lifetime
631 of the `&Ty` pointer. In simple cases, this clause is redundant, since
632 the `LIFETIME()` function will already enforce the required rule:
633
634 ```
635 fn foo(point: &'a Point) -> &'static f32 {
636 &point.x // Error
637 }
638 ```
639
640 The above example fails to compile both because of clause (1) above
641 but also by the basic `LIFETIME()` check. However, in more advanced
642 examples involving multiple nested pointers, clause (1) is needed:
643
644 ```
645 fn foo(point: &'a &'b mut Point) -> &'b f32 {
646 &point.x // Error
647 }
648 ```
649
650 The `LIFETIME` rule here would accept `'b` because, in fact, the
651 *memory is* guaranteed to remain valid (i.e., not be freed) for the
652 lifetime `'b`, since the `&mut` pointer is valid for `'b`. However, we
653 are returning an immutable reference, so we need the memory to be both
654 valid and immutable. Even though `point.x` is referenced by an `&mut`
655 pointer, it can still be considered immutable so long as that `&mut`
656 pointer is found in an aliased location. That means the memory is
657 guaranteed to be *immutable* for the lifetime of the `&` pointer,
658 which is only `'a`, not `'b`. Hence this example yields an error.
659
660 As a final twist, consider the case of two nested *immutable*
661 pointers, rather than a mutable pointer within an immutable one:
662
663 ```
664 fn foo(point: &'a &'b Point) -> &'b f32 {
665 &point.x // OK
666 }
667 ```
668
669 This function is legal. The reason for this is that the inner pointer
670 (`*point : &'b Point`) is enough to guarantee the memory is immutable
671 and valid for the lifetime `'b`. This is reflected in
672 `RESTRICTIONS()` by the fact that we do not recurse (i.e., we impose
673 no restrictions on `LV`, which in this particular case is the pointer
674 `point : &'a &'b Point`).
675
676 #### Why both `LIFETIME()` and `RESTRICTIONS()`?
677
678 Given the previous text, it might seem that `LIFETIME` and
679 `RESTRICTIONS` should be folded together into one check, but there is
680 a reason that they are separated. They answer separate concerns.
681 The rules pertaining to `LIFETIME` exist to ensure that we don't
682 create a borrowed pointer that outlives the memory it points at. So
683 `LIFETIME` prevents a function like this:
684
685 ```
686 fn get_1<'a>() -> &'a int {
687 let x = 1;
688 &x
689 }
690 ```
691
692 Here we would be returning a pointer into the stack. Clearly bad.
693
694 However, the `RESTRICTIONS` rules are more concerned with how memory
695 is used. The example above doesn't generate an error according to
696 `RESTRICTIONS` because, for local variables, we don't require that the
697 loan lifetime be a subset of the local variable lifetime. The idea
698 here is that we *can* guarantee that `x` is not (e.g.) mutated for the
699 lifetime `'a`, even though `'a` exceeds the function body and thus
700 involves unknown code in the caller -- after all, `x` ceases to exist
701 after we return and hence the remaining code in `'a` cannot possibly
702 mutate it. This distinction is important for type checking functions
703 like this one:
704
705 ```
706 fn inc_and_get<'a>(p: &'a mut Point) -> &'a int {
707 p.x += 1;
708 &p.x
709 }
710 ```
711
712 In this case, we take in a `&mut` and return a frozen borrowed pointer
713 with the same lifetime. So long as the lifetime of the returned value
714 doesn't exceed the lifetime of the `&mut` we receive as input, this is
715 fine, though it may seem surprising at first (it surprised me when I
716 first worked it through). After all, we're guaranteeing that `*p`
717 won't be mutated for the lifetime `'a`, even though we can't "see" the
718 entirety of the code during that lifetime, since some of it occurs in
719 our caller. But we *do* know that nobody can mutate `*p` except
720 through `p`. So if we don't mutate `*p` and we don't return `p`, then
721 we know that the right to mutate `*p` has been lost to our caller --
722 in terms of capability, the caller passed in the ability to mutate
723 `*p`, and we never gave it back. (Note that we can't return `p` while
724 `*p` is borrowed since that would be a move of `p`, as `&mut` pointers
725 are affine.)
726
727 ### Restrictions for loans of const aliasable referents
728
729 Freeze pointers are read-only. There may be `&mut` or `&` aliases, and
730 we can not prevent *anything* but moves in that case. So the
731 `RESTRICTIONS` function is only defined if `ACTIONS` is the empty set.
732 Because moves from a `&const` or `@const` lvalue are never legal, it
733 is not necessary to add any restrictions at all to the final
734 result.
735
736 ```notrust
737 RESTRICTIONS(*LV, LT, []) = [] // R-Deref-Freeze-Borrowed
738 TYPE(LV) = &const Ty or @const Ty
739 ```
740
741 ### Restrictions for loans of mutable borrowed referents
742
743 Mutable borrowed pointers are guaranteed to be the only way to mutate
744 their referent. This permits us to take greater license with them; for
745 example, the referent can be frozen simply be ensuring that we do not
746 use the original pointer to perform mutate. Similarly, we can allow
747 the referent to be claimed, so long as the original pointer is unused
748 while the new claimant is live.
749
750 The rule for mutable borrowed pointers is as follows:
751
752 ```notrust
753 RESTRICTIONS(*LV, LT, ACTIONS) = RS, (*LV, ACTIONS) // R-Deref-Mut-Borrowed
754 TYPE(LV) = <' mut Ty
755 LT <= LT' // (1)
756 RESTRICTIONS(LV, LT, ACTIONS) = RS // (2)
757 ```
758
759 Let's examine the two numbered clauses:
760
761 Clause (1) specifies that the lifetime of the loan (`LT`) cannot
762 exceed the lifetime of the `&mut` pointer (`LT'`). The reason for this
763 is that the `&mut` pointer is guaranteed to be the only legal way to
764 mutate its referent -- but only for the lifetime `LT'`. After that
765 lifetime, the loan on the referent expires and hence the data may be
766 modified by its owner again. This implies that we are only able to
767 guarantee that the referent will not be modified or aliased for a
768 maximum of `LT'`.
769
770 Here is a concrete example of a bug this rule prevents:
771
772 ```
773 // Test region-reborrow-from-shorter-mut-ref.rs:
774 fn copy_pointer<'a,'b,T>(x: &'a mut &'b mut T) -> &'b mut T {
775 &mut **p // ERROR due to clause (1)
776 }
777 fn main() {
778 let mut x = 1;
779 let mut y = &mut x; // <-'b-----------------------------+
780 // +-'a--------------------+ |
781 // v v |
782 let z = copy_borrowed_ptr(&mut y); // y is lent |
783 *y += 1; // Here y==z, so both should not be usable... |
784 *z += 1; // ...and yet they would be, but for clause 1. |
785 } // <------------------------------------------------------+
786 ```
787
788 Clause (2) propagates the restrictions on the referent to the pointer
789 itself. This is the same as with an owned pointer, though the
790 reasoning is mildly different. The basic goal in all cases is to
791 prevent the user from establishing another route to the same data. To
792 see what I mean, let's examine various cases of what can go wrong and
793 show how it is prevented.
794
795 **Example danger 1: Moving the base pointer.** One of the simplest
796 ways to violate the rules is to move the base pointer to a new name
797 and access it via that new name, thus bypassing the restrictions on
798 the old name. Here is an example:
799
800 ```
801 // src/test/compile-fail/borrowck-move-mut-base-ptr.rs
802 fn foo(t0: &mut int) {
803 let p: &int = &*t0; // Freezes `*t0`
804 let t1 = t0; //~ ERROR cannot move out of `t0`
805 *t1 = 22; // OK, not a write through `*t0`
806 }
807 ```
808
809 Remember that `&mut` pointers are linear, and hence `let t1 = t0` is a
810 move of `t0` -- or would be, if it were legal. Instead, we get an
811 error, because clause (2) imposes restrictions on `LV` (`t0`, here),
812 and any restrictions on a path make it impossible to move from that
813 path.
814
815 **Example danger 2: Claiming the base pointer.** Another possible
816 danger is to mutably borrow the base path. This can lead to two bad
817 scenarios. The most obvious is that the mutable borrow itself becomes
818 another path to access the same data, as shown here:
819
820 ```
821 // src/test/compile-fail/borrowck-mut-borrow-of-mut-base-ptr.rs
822 fn foo<'a>(mut t0: &'a mut int,
823 mut t1: &'a mut int) {
824 let p: &int = &*t0; // Freezes `*t0`
825 let mut t2 = &mut t0; //~ ERROR cannot borrow `t0`
826 **t2 += 1; // Mutates `*t0`
827 }
828 ```
829
830 In this example, `**t2` is the same memory as `*t0`. Because `t2` is
831 an `&mut` pointer, `**t2` is a unique path and hence it would be
832 possible to mutate `**t2` even though that memory was supposed to be
833 frozen by the creation of `p`. However, an error is reported -- the
834 reason is that the freeze `&*t0` will restrict claims and mutation
835 against `*t0` which, by clause 2, in turn prevents claims and mutation
836 of `t0`. Hence the claim `&mut t0` is illegal.
837
838 Another danger with an `&mut` pointer is that we could swap the `t0`
839 value away to create a new path:
840
841 ```
842 // src/test/compile-fail/borrowck-swap-mut-base-ptr.rs
843 fn foo<'a>(mut t0: &'a mut int,
844 mut t1: &'a mut int) {
845 let p: &int = &*t0; // Freezes `*t0`
846 swap(&mut t0, &mut t1); //~ ERROR cannot borrow `t0`
847 *t1 = 22;
848 }
849 ```
850
851 This is illegal for the same reason as above. Note that if we added
852 back a swap operator -- as we used to have -- we would want to be very
853 careful to ensure this example is still illegal.
854
855 **Example danger 3: Freeze the base pointer.** In the case where the
856 referent is claimed, even freezing the base pointer can be dangerous,
857 as shown in the following example:
858
859 ```
860 // src/test/compile-fail/borrowck-borrow-of-mut-base-ptr.rs
861 fn foo<'a>(mut t0: &'a mut int,
862 mut t1: &'a mut int) {
863 let p: &mut int = &mut *t0; // Claims `*t0`
864 let mut t2 = &t0; //~ ERROR cannot borrow `t0`
865 let q: &int = &*t2; // Freezes `*t0` but not through `*p`
866 *p += 1; // violates type of `*q`
867 }
868 ```
869
870 Here the problem is that `*t0` is claimed by `p`, and hence `p` wants
871 to be the controlling pointer through which mutation or freezes occur.
872 But `t2` would -- if it were legal -- have the type `& &mut int`, and
873 hence would be a mutable pointer in an aliasable location, which is
874 considered frozen (since no one can write to `**t2` as it is not a
875 unique path). Therefore, we could reasonably create a frozen `&int`
876 pointer pointing at `*t0` that coexists with the mutable pointer `p`,
877 which is clearly unsound.
878
879 However, it is not always unsafe to freeze the base pointer. In
880 particular, if the referent is frozen, there is no harm in it:
881
882 ```
883 // src/test/run-pass/borrowck-borrow-of-mut-base-ptr-safe.rs
884 fn foo<'a>(mut t0: &'a mut int,
885 mut t1: &'a mut int) {
886 let p: &int = &*t0; // Freezes `*t0`
887 let mut t2 = &t0;
888 let q: &int = &*t2; // Freezes `*t0`, but that's ok...
889 let r: &int = &*t0; // ...after all, could do same thing directly.
890 }
891 ```
892
893 In this case, creating the alias `t2` of `t0` is safe because the only
894 thing `t2` can be used for is to further freeze `*t0`, which is
895 already frozen. In particular, we cannot assign to `*t0` through the
896 new alias `t2`, as demonstrated in this test case:
897
898 ```
899 // src/test/run-pass/borrowck-borrow-mut-base-ptr-in-aliasable-loc.rs
900 fn foo(t0: & &mut int) {
901 let t1 = t0;
902 let p: &int = &**t0;
903 **t1 = 22; //~ ERROR cannot assign
904 }
905 ```
906
907 This distinction is reflected in the rules. When doing an `&mut`
908 borrow -- as in the first example -- the set `ACTIONS` will be
909 `CLAIM|MUTATE|FREEZE`, because claiming the referent implies that it
910 cannot be claimed, mutated, or frozen by anyone else. These
911 restrictions are propagated back to the base path and hence the base
912 path is considered unfreezable.
913
914 In contrast, when the referent is merely frozen -- as in the second
915 example -- the set `ACTIONS` will be `CLAIM|MUTATE`, because freezing
916 the referent implies that it cannot be claimed or mutated but permits
917 others to freeze. Hence when these restrictions are propagated back to
918 the base path, it will still be considered freezable.
919
920
921
922 **FIXME #10520: Restrictions against mutating the base pointer.** When
923 an `&mut` pointer is frozen or claimed, we currently pass along the
924 restriction against MUTATE to the base pointer. I do not believe this
925 restriction is needed. It dates from the days when we had a way to
926 mutate that preserved the value being mutated (i.e., swap). Nowadays
927 the only form of mutation is assignment, which destroys the pointer
928 being mutated -- therefore, a mutation cannot create a new path to the
929 same data. Rather, it removes an existing path. This implies that not
930 only can we permit mutation, we can have mutation kill restrictions in
931 the dataflow sense.
932
933 **WARNING:** We do not currently have `const` borrows in the
934 language. If they are added back in, we must ensure that they are
935 consistent with all of these examples. The crucial question will be
936 what sorts of actions are permitted with a `&const &mut` pointer. I
937 would suggest that an `&mut` referent found in an `&const` location be
938 prohibited from both freezes and claims. This would avoid the need to
939 prevent `const` borrows of the base pointer when the referent is
940 borrowed.
941
942 # Moves and initialization
943
944 The borrow checker is also in charge of ensuring that:
945
946 - all memory which is accessed is initialized
947 - immutable local variables are assigned at most once.
948
949 These are two separate dataflow analyses built on the same
950 framework. Let's look at checking that memory is initialized first;
951 the checking of immutable local variabe assignments works in a very
952 similar way.
953
954 To track the initialization of memory, we actually track all the
955 points in the program that *create uninitialized memory*, meaning
956 moves and the declaration of uninitialized variables. For each of
957 these points, we create a bit in the dataflow set. Assignments to a
958 variable `x` or path `a.b.c` kill the move/uninitialization bits for
959 those paths and any subpaths (e.g., `x`, `x.y`, `a.b.c`, `*a.b.c`).
960 The bits are also killed when the root variables (`x`, `a`) go out of
961 scope. Bits are unioned when two control-flow paths join. Thus, the
962 presence of a bit indicates that the move may have occurred without an
963 intervening assignment to the same memory. At each use of a variable,
964 we examine the bits in scope, and check that none of them are
965 moves/uninitializations of the variable that is being used.
966
967 Let's look at a simple example:
968
969 ```
970 fn foo(a: Box<int>) {
971 let b: Box<int>; // Gen bit 0.
972
973 if cond { // Bits: 0
974 use(&*a);
975 b = a; // Gen bit 1, kill bit 0.
976 use(&*b);
977 } else {
978 // Bits: 0
979 }
980 // Bits: 0,1
981 use(&*a); // Error.
982 use(&*b); // Error.
983 }
984
985 fn use(a: &int) { }
986 ```
987
988 In this example, the variable `b` is created uninitialized. In one
989 branch of an `if`, we then move the variable `a` into `b`. Once we
990 exit the `if`, therefore, it is an error to use `a` or `b` since both
991 are only conditionally initialized. I have annotated the dataflow
992 state using comments. There are two dataflow bits, with bit 0
993 corresponding to the creation of `b` without an initializer, and bit 1
994 corresponding to the move of `a`. The assignment `b = a` both
995 generates bit 1, because it is a move of `a`, and kills bit 0, because
996 `b` is now initialized. On the else branch, though, `b` is never
997 initialized, and so bit 0 remains untouched. When the two flows of
998 control join, we union the bits from both sides, resulting in both
999 bits 0 and 1 being set. Thus any attempt to use `a` uncovers the bit 1
1000 from the "then" branch, showing that `a` may be moved, and any attempt
1001 to use `b` uncovers bit 0, from the "else" branch, showing that `b`
1002 may not be initialized.
1003
1004 ## Initialization of immutable variables
1005
1006 Initialization of immutable variables works in a very similar way,
1007 except that:
1008
1009 1. we generate bits for each assignment to a variable;
1010 2. the bits are never killed except when the variable goes out of scope.
1011
1012 Thus the presence of an assignment bit indicates that the assignment
1013 may have occurred. Note that assignments are only killed when the
1014 variable goes out of scope, as it is not relevant whether or not there
1015 has been a move in the meantime. Using these bits, we can declare that
1016 an assignment to an immutable variable is legal iff there is no other
1017 assignment bit to that same variable in scope.
1018
1019 ## Why is the design made this way?
1020
1021 It may seem surprising that we assign dataflow bits to *each move*
1022 rather than *each path being moved*. This is somewhat less efficient,
1023 since on each use, we must iterate through all moves and check whether
1024 any of them correspond to the path in question. Similar concerns apply
1025 to the analysis for double assignments to immutable variables. The
1026 main reason to do it this way is that it allows us to print better
1027 error messages, because when a use occurs, we can print out the
1028 precise move that may be in scope, rather than simply having to say
1029 "the variable may not be initialized".
1030
1031 ## Data structures used in the move analysis
1032
1033 The move analysis maintains several data structures that enable it to
1034 cross-reference moves and assignments to determine when they may be
1035 moving/assigning the same memory. These are all collected into the
1036 `MoveData` and `FlowedMoveData` structs. The former represents the set
1037 of move paths, moves, and assignments, and the latter adds in the
1038 results of a dataflow computation.
1039
1040 ### Move paths
1041
1042 The `MovePath` tree tracks every path that is moved or assigned to.
1043 These paths have the same form as the `LoanPath` data structure, which
1044 in turn is the "real world version of the lvalues `LV` that we
1045 introduced earlier. The difference between a `MovePath` and a `LoanPath`
1046 is that move paths are:
1047
1048 1. Canonicalized, so that we have exactly one copy of each, and
1049 we can refer to move paths by index;
1050 2. Cross-referenced with other paths into a tree, so that given a move
1051 path we can efficiently find all parent move paths and all
1052 extensions (e.g., given the `a.b` move path, we can easily find the
1053 move path `a` and also the move paths `a.b.c`)
1054 3. Cross-referenced with moves and assignments, so that we can
1055 easily find all moves and assignments to a given path.
1056
1057 The mechanism that we use is to create a `MovePath` record for each
1058 move path. These are arranged in an array and are referenced using
1059 `MovePathIndex` values, which are newtype'd indices. The `MovePath`
1060 structs are arranged into a tree, representing using the standard
1061 Knuth representation where each node has a child 'pointer' and a "next
1062 sibling" 'pointer'. In addition, each `MovePath` has a parent
1063 'pointer'. In this case, the 'pointers' are just `MovePathIndex`
1064 values.
1065
1066 In this way, if we want to find all base paths of a given move path,
1067 we can just iterate up the parent pointers (see `each_base_path()` in
1068 the `move_data` module). If we want to find all extensions, we can
1069 iterate through the subtree (see `each_extending_path()`).
1070
1071 ### Moves and assignments
1072
1073 There are structs to represent moves (`Move`) and assignments
1074 (`Assignment`), and these are also placed into arrays and referenced
1075 by index. All moves of a particular path are arranged into a linked
1076 lists, beginning with `MovePath.first_move` and continuing through
1077 `Move.next_move`.
1078
1079 We distinguish between "var" assignments, which are assignments to a
1080 variable like `x = foo`, and "path" assignments (`x.f = foo`). This
1081 is because we need to assign dataflows to the former, but not the
1082 latter, so as to check for double initialization of immutable
1083 variables.
1084
1085 ### Gathering and checking moves
1086
1087 Like loans, we distinguish two phases. The first, gathering, is where
1088 we uncover all the moves and assignments. As with loans, we do some
1089 basic sanity checking in this phase, so we'll report errors if you
1090 attempt to move out of a borrowed pointer etc. Then we do the dataflow
1091 (see `FlowedMoveData::new`). Finally, in the `check_loans.rs` code, we
1092 walk back over, identify all uses, assignments, and captures, and
1093 check that they are legal given the set of dataflow bits we have
1094 computed for that program point.
1095
1096 # Future work
1097
1098 While writing up these docs, I encountered some rules I believe to be
1099 stricter than necessary:
1100
1101 - I think restricting the `&mut` LV against moves and `ALIAS` is sufficient,
1102 `MUTATE` and `CLAIM` are overkill. `MUTATE` was necessary when swap was
1103 a built-in operator, but as it is not, it is implied by `CLAIM`,
1104 and `CLAIM` is implied by `ALIAS`. The only net effect of this is an
1105 extra error message in some cases, though.
1106 - I have not described how closures interact. Current code is unsound.
1107 I am working on describing and implementing the fix.
1108 - If we wish, we can easily extend the move checking to allow finer-grained
1109 tracking of what is initialized and what is not, enabling code like
1110 this:
1111
1112 a = x.f.g; // x.f.g is now uninitialized
1113 // here, x and x.f are not usable, but x.f.h *is*
1114 x.f.g = b; // x.f.g is not initialized
1115 // now x, x.f, x.f.g, x.f.h are all usable
1116
1117 What needs to change here, most likely, is that the `moves` module
1118 should record not only what paths are moved, but what expressions
1119 are actual *uses*. For example, the reference to `x` in `x.f.g = b`
1120 is not a true *use* in the sense that it requires `x` to be fully
1121 initialized. This is in fact why the above code produces an error
1122 today: the reference to `x` in `x.f.g = b` is considered illegal
1123 because `x` is not fully initialized.
1124
1125 There are also some possible refactorings:
1126
1127 - It might be nice to replace all loan paths with the MovePath mechanism,
1128 since they allow lightweight comparison using an integer.
1129
1130 */