// Copyright 2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

/*!

# Documentation for the trans module

This module contains high-level summaries of how the various modules
in trans work. It is a work in progress. For detailed comments,
naturally, you can refer to the individual modules themselves.

## The Expr module

The expr module handles translation of expressions. The most general
translation routine is `trans()`, which will translate an expression
into a datum. `trans_into()` is also available, which will translate
an expression and write the result directly into memory, sometimes
avoiding the need for a temporary stack slot. Finally,
`trans_to_lvalue()` is available if you'd like to ensure that the
result has cleanup scheduled.

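The difference between `trans()` and `trans_into()` can be illustrated with a toy model. This is not rustc code. `Expr`, `Frame`, `eval`, `trans`, and `trans_into` are hypothetical names, values are plain integers, and a "stack slot" is just an index into a vector. The only point is that a destination-passing translation writes straight into the caller's slot, while a datum-style translation produces a temporary that must then be copied.

```rust
// Toy model, not rustc code: `Expr`, `Frame`, `trans`, and `trans_into` are
// hypothetical. A "stack slot" is just an index into `Frame::slots`.
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
    }
}

#[derive(Default)]
struct Frame {
    slots: Vec<i64>,
}

impl Frame {
    // Allocate a new stack slot and return its index.
    fn alloca(&mut self) -> usize {
        self.slots.push(0);
        self.slots.len() - 1
    }
}

// Datum-style: the result lands in a fresh temporary slot, which the caller
// must then copy to wherever the value is really needed.
fn trans(frame: &mut Frame, e: &Expr) -> usize {
    let tmp = frame.alloca();
    frame.slots[tmp] = eval(e);
    tmp
}

// Destination-passing style: the caller supplies the destination slot, so
// no temporary is allocated.
fn trans_into(frame: &mut Frame, e: &Expr, dest: usize) {
    frame.slots[dest] = eval(e);
}
```

In this model, going through `trans` leaves two slots in the frame (the destination plus a temporary). `trans_into` needs only the destination.
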
Internally, each of these functions dispatches to various other
expression functions depending on the kind of expression. We divide
expressions into:

- **Datum expressions:** Those that most naturally yield values.
Examples would be `22`, `box x`, or `a + b` (when not overloaded).
- **DPS (destination-passing style) expressions:** Those that most
naturally write into a location in memory. Examples would be `foo()`
or `Point { x: 3, y: 4 }`.
- **Statement expressions:** Those that do not generate a meaningful
result. Examples would be `while cond { ... }` or `return 44`.

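Here is a minimal sketch of the three-way classification. It uses made-up expression forms that mirror the examples above. It assumes, based on the "(when not overloaded)" remark, that an overloaded `a + b` is handled like a call. rustc's actual classification covers many more cases.

```rust
// Hypothetical, simplified expression forms; rustc's real AST is richer.
enum Expr {
    Lit,                         // `22`
    BoxNew,                      // `box x`
    Binary { overloaded: bool }, // `a + b`
    Call,                        // `foo()`
    StructLit,                   // `Point { x: 3, y: 4 }`
    While,                       // `while cond { ... }`
    Return,                      // `return 44`
}

#[derive(Debug, PartialEq)]
enum Category {
    Datum,     // most naturally yields a value
    Dps,       // most naturally writes into a destination in memory
    Statement, // yields no meaningful result
}

fn classify(e: &Expr) -> Category {
    match e {
        Expr::Lit | Expr::BoxNew => Category::Datum,
        Expr::Binary { overloaded: false } => Category::Datum,
        // Assumption: an overloaded operator is a method call, so it is
        // classified like `foo()`.
        Expr::Binary { overloaded: true } => Category::Dps,
        Expr::Call | Expr::StructLit => Category::Dps,
        Expr::While | Expr::Return => Category::Statement,
    }
}
```
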
## The Datum module

A `Datum` encapsulates the result of evaluating a Rust expression. It
contains a `ValueRef` indicating the result, a `ty::t` describing
the Rust type, and also a *kind*. The kind indicates whether the datum
has cleanup scheduled (lvalue) or not (rvalue) and -- in the case of
rvalues -- whether or not the value is "by ref" or "by value".

The datum API is designed to help you avoid memory errors like
forgetting to arrange cleanup or duplicating a value. The type of the
datum incorporates the kind, and thus reflects whether it has cleanup
scheduled:

- `Datum<Lvalue>` -- by ref, cleanup scheduled
- `Datum<Rvalue>` -- by value or by ref, no cleanup scheduled
- `Datum<Expr>` -- either `Datum<Lvalue>` or `Datum<Rvalue>`

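The kinds listed above map naturally onto marker types. The following is a simplified sketch in that spirit, not the actual definitions in datum.rs. `ValueRef` and `Ty` are placeholder aliases, and `describe` and `to_expr_datum` are hypothetical helpers.

```rust
// Simplified sketch: `ValueRef` and `Ty` are placeholders for an LLVM value
// handle and a Rust type; the kind types mirror the list above.
type ValueRef = usize;
type Ty = &'static str;

#[derive(Clone, Copy, Debug, PartialEq)]
enum RvalueMode {
    ByRef,
    ByValue,
}

struct Lvalue; // by ref, cleanup scheduled

struct Rvalue {
    mode: RvalueMode, // by value or by ref, no cleanup scheduled
}

enum Expr {
    LvalueExpr,
    RvalueExpr(Rvalue),
}

struct Datum<K> {
    val: ValueRef,
    ty: Ty,
    kind: K,
}

impl Datum<Lvalue> {
    // Any lvalue datum can be viewed as an expression datum.
    fn to_expr_datum(self) -> Datum<Expr> {
        Datum { val: self.val, ty: self.ty, kind: Expr::LvalueExpr }
    }
}

// Describes what an expression datum turned out to be.
fn describe(d: &Datum<Expr>) -> &'static str {
    match d.kind {
        Expr::LvalueExpr => "lvalue (by ref, cleanup scheduled)",
        Expr::RvalueExpr(Rvalue { mode: RvalueMode::ByRef }) => "rvalue by ref",
        Expr::RvalueExpr(Rvalue { mode: RvalueMode::ByValue }) => "rvalue by value",
    }
}
```
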
Rvalue and expr datums are noncopyable, and most of the methods on
datums consume the datum itself (with some notable exceptions). This
reflects the fact that datums may represent affine values which ought
to be consumed exactly once, and if you were to try to (for example)
store an affine value multiple times, you would be duplicating it,
which would certainly be a bug.

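The "consume exactly once" rule falls out of Rust's move semantics when consuming methods take `self` by value. In this hypothetical sketch, `store_to` is an invented method and a `Box<String>` stands in for an affine value.

```rust
// Hypothetical sketch: `val` holds an owned heap value, standing in for an
// affine Rust value such as a `Box<T>`.
struct Rvalue;

struct Datum<K> {
    val: Box<String>,
    kind: K,
}

impl Datum<Rvalue> {
    // Takes `self` by value: after the store, the value lives only at `dest`,
    // and the compiler rejects any further use of this datum.
    fn store_to(self, dest: &mut Option<Box<String>>) {
        *dest = Some(self.val);
    }
}
```

Adding a second `d.store_to(...)` call after the first is rejected at compile time as a use of a moved value. That is exactly the duplication bug described above.
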
Some of the datum methods, however, are designed to work only on
copyable values such as ints or pointers. Those methods may borrow the
datum (`&self`) rather than consume it, but they always include
assertions on the type of the value represented to check that this
makes sense. An example is `shallow_copy_and_take()`, which duplicates
a datum value.

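Here is a hypothetical sketch of a borrowing method guarded by a type assertion. `Ty::is_copy` stands in for a real type-system query. `shallow_copy` is a simplified illustration, not the actual `shallow_copy_and_take()`.

```rust
// Hypothetical sketch: `Ty::is_copy` stands in for a real type-system query,
// and `shallow_copy` is a simplified cousin of `shallow_copy_and_take()`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Int,
    RawPtr,
    OwnedBox,
}

impl Ty {
    fn is_copy(self) -> bool {
        matches!(self, Ty::Int | Ty::RawPtr)
    }
}

struct Rvalue;

struct Datum<K> {
    val: u64,
    ty: Ty,
    kind: K,
}

impl<K> Datum<K> {
    // Borrows instead of consuming, which is only sound for copyable types,
    // so the type is checked before the value is duplicated.
    fn shallow_copy(&self) -> Datum<Rvalue> {
        assert!(self.ty.is_copy(), "cannot shallow-copy a value of type {:?}", self.ty);
        Datum { val: self.val, ty: self.ty, kind: Rvalue }
    }
}
```
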
Translating an expression always yields a `Datum<Expr>` result, but
the methods `to_[lr]value_datum()` can be used to coerce a
`Datum<Expr>` into a `Datum<Lvalue>` or `Datum<Rvalue>` as
needed. Coercing to an lvalue is fairly common, and generally occurs
whenever it is necessary to inspect a value and pull out its
subcomponents (for example, in a match or an indexing expression).
Coercing to an rvalue is more unusual; it occurs when moving values
from place to place, such as in an assignment expression or parameter
passing.

### Lvalues in detail

An lvalue datum is one for which cleanup has been scheduled. Lvalue
datums are always located in memory, and thus the `ValueRef` is always
an LLVM pointer to the actual Rust value. This means that if the Datum
has a Rust type of `int`, then the LLVM type of the `ValueRef` will be
`int*` (pointer to int).

Because lvalues already have cleanups scheduled, moving a value out of
an lvalue requires zeroing the source memory; otherwise the scheduled
cleanup would later drop the moved value a second time. (This only
matters if the Rust type needs drop in the first place.) The Datum code
automatically performs this zeroing when the value is stored to a new
location, for example.

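Here is a toy model of why moving out of an lvalue zeroes the source. It is not rustc code. `Slot`, `move_out`, and `run_cleanup` are hypothetical. `Option::take` plays the role of writing zeros, so the cleanup that was already scheduled finds nothing to drop.

```rust
// Toy model, not rustc code: a stack slot holding an owned value, with
// `None` playing the role of zeroed memory.
struct Slot {
    contents: Option<String>,
}

impl Slot {
    // Move the value out of this lvalue and zero the source, so the
    // cleanup that is already scheduled for the slot becomes a no-op.
    fn move_out(&mut self) -> String {
        self.contents.take().expect("moved out of a zeroed slot")
    }

    // The scheduled cleanup: drop whatever is still in the slot.
    // Returns true if a value was actually dropped.
    fn run_cleanup(&mut self) -> bool {
        self.contents.take().is_some()
    }
}
```
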
Lvalues usually result from evaluating lvalue expressions. For
example, evaluating a local variable `x` yields an lvalue, as does a
reference to a field like `x.f` or an index `x[i]`.

Lvalue datums can also arise by *converting* an rvalue into an lvalue.
This is done with the `to_lvalue_datum` method defined on
`Datum<Expr>`. Basically this method just schedules cleanup if the
datum is an rvalue, possibly storing the value into a stack slot first
if needed. Converting rvalues into lvalues occurs in constructs like
`&foo()` or `match foo() { ref x => ... }`, where the user is
implicitly requesting a temporary.

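Here is a sketch of the conversion just described, under stated assumptions. `FunctionContext`, `alloca`, `store`, and `schedule_clean` are stand-ins with invented signatures, and values and pointers are plain integers. The sketch covers three cases:

- An lvalue is returned unchanged.
- A by-ref rvalue only gains a cleanup.
- A by-value rvalue is first stored into a fresh stack slot.

```rust
// Hypothetical sketch: `FunctionContext`, `alloca`, `store`, and
// `schedule_clean` are stand-ins; values and pointers are plain numbers.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RvalueMode {
    ByRef,
    ByValue,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Lvalue,
    Rvalue(RvalueMode),
}

#[derive(Debug)]
struct Datum {
    val: usize,
    kind: Kind,
}

#[derive(Default)]
struct FunctionContext {
    next_slot: usize,
    stores: Vec<(usize, usize)>, // (value, pointer) pairs written to memory
    cleanups: Vec<usize>,        // pointers with cleanup scheduled
}

impl FunctionContext {
    fn alloca(&mut self) -> usize {
        self.next_slot += 1;
        1000 + self.next_slot // fake address of a new stack slot
    }
    fn store(&mut self, val: usize, ptr: usize) {
        self.stores.push((val, ptr));
    }
    fn schedule_clean(&mut self, ptr: usize) {
        self.cleanups.push(ptr);
    }
}

fn to_lvalue_datum(fcx: &mut FunctionContext, d: Datum) -> Datum {
    match d.kind {
        // Already an lvalue: cleanup is scheduled, nothing to do.
        Kind::Lvalue => d,
        // By-ref rvalue: already in memory, so only cleanup is needed.
        Kind::Rvalue(RvalueMode::ByRef) => {
            fcx.schedule_clean(d.val);
            Datum { val: d.val, kind: Kind::Lvalue }
        }
        // By-value rvalue: store into a stack slot first, then schedule cleanup.
        Kind::Rvalue(RvalueMode::ByValue) => {
            let slot = fcx.alloca();
            fcx.store(d.val, slot);
            fcx.schedule_clean(slot);
            Datum { val: slot, kind: Kind::Lvalue }
        }
    }
}
```
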
Somewhat surprisingly, not all lvalue expressions yield lvalue datums
when trans'd. Ultimately the reason for this is to micro-optimize
the resulting LLVM IR. For example, consider the following code:

    fn foo() -> Box<int> { ... }
    let x = *foo();

The expression `*foo()` is an lvalue, but if you invoke `expr::trans`,
it will return an rvalue datum. See `deref_once` in expr.rs for
more details.

### Rvalues in detail

Rvalue datums are values with no cleanup scheduled. One must be
careful with rvalue datums to ensure that cleanup is properly
arranged, usually by converting to an lvalue datum or by invoking the
`add_clean` method.

### Scratch datums

Sometimes you need some temporary scratch space. The functions
`[lr]value_scratch_datum()` can be used to get temporary stack
space. As their names suggest, they yield lvalues and rvalues
respectively. That is, the slot from `lvalue_scratch_datum` has
cleanup arranged, and the slot from `rvalue_scratch_datum` does not.

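Here is a hypothetical sketch of the two scratch functions; the signatures are invented for illustration. The lvalue variant initializes its slot before arranging cleanup, following the guidance under "When to schedule cleanup". The rvalue variant leaves both initialization and cleanup to the caller.

```rust
// Hypothetical sketch: stack memory is a vector of optional strings and a
// cleanup is the index of a slot to drop.
#[derive(Debug, PartialEq)]
enum Kind {
    Lvalue,
    Rvalue,
}

struct Datum {
    slot: usize,
    kind: Kind,
}

#[derive(Default)]
struct FunctionContext {
    memory: Vec<Option<String>>,
    cleanups: Vec<usize>,
}

impl FunctionContext {
    fn alloca(&mut self) -> usize {
        self.memory.push(None);
        self.memory.len() - 1
    }
}

// Initializes the slot and only then arranges cleanup for it.
fn lvalue_scratch_datum(fcx: &mut FunctionContext, init: String) -> Datum {
    let slot = fcx.alloca();
    fcx.memory[slot] = Some(init);
    fcx.cleanups.push(slot);
    Datum { slot, kind: Kind::Lvalue }
}

// Hands out raw scratch space; arranging cleanup is the caller's job.
fn rvalue_scratch_datum(fcx: &mut FunctionContext) -> Datum {
    let slot = fcx.alloca();
    Datum { slot, kind: Kind::Rvalue }
}
```
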
## The Cleanup module

The cleanup module tracks what values need to be cleaned up as scopes
are exited, either via failure or just normal control flow. The basic
idea is that the function context maintains a stack of cleanup scopes
that are pushed/popped as we traverse the AST. There is typically
at least one cleanup scope per AST node; some AST nodes may introduce
additional temporary scopes.

Cleanup items can be scheduled into any of the scopes on the stack.
Typically, when a scope is popped, we will also generate the code for
each of its cleanups at that time. This corresponds to a normal exit
from a block (for example, an expression completing evaluation
successfully without failure). However, it is also possible to pop a
scope *without* executing its cleanups; this is typically used to
guard intermediate values that must be cleaned up on failure, but not
if everything goes right. See the section on custom scopes below for
more details.

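The scope stack described above can be modeled in a few lines. In this hypothetical sketch, a cleanup is just a label and "generating code" means appending it to `emitted`. Cleanups within a scope are emitted in reverse order of scheduling. That ordering is an assumption of the sketch, chosen to match Rust's reverse-order drops.

```rust
// Hypothetical sketch of a cleanup-scope stack. A "cleanup" is a label, and
// "generating code" for it means appending the label to `emitted`.
#[derive(Default)]
struct CleanupScopes {
    stack: Vec<Vec<&'static str>>, // one list of cleanups per scope
    emitted: Vec<&'static str>,    // cleanup code generated so far
}

impl CleanupScopes {
    fn push_scope(&mut self) -> usize {
        self.stack.push(Vec::new());
        self.stack.len() - 1
    }

    // Cleanups may be scheduled into any scope on the stack, not just the top.
    fn schedule_clean(&mut self, scope: usize, cleanup: &'static str) {
        self.stack[scope].push(cleanup);
    }

    // Normal exit: pop the top scope and emit its cleanups, newest first.
    fn pop_and_trans(&mut self) {
        let scope = self.stack.pop().expect("no scope to pop");
        self.emitted.extend(scope.into_iter().rev());
    }

    // Pop without emitting anything (the cleanups only guarded failure).
    fn pop_without_trans(&mut self) {
        assert!(self.stack.pop().is_some(), "no scope to pop");
    }
}
```
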
Cleanup scopes come in three kinds:

- **AST scopes:** each AST node in a function body has a corresponding
AST scope. We push the AST scope when we start generating code for an AST
node and pop it once the AST node has been fully generated.
- **Loop scopes:** loops have an additional cleanup scope. Cleanups are
never scheduled into loop scopes; instead, they are used to record the
basic blocks that we should branch to when a `continue` or `break` statement
is encountered.
- **Custom scopes:** custom scopes are typically used to ensure cleanup
of intermediate values.

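Here is a hypothetical sketch of the three scope kinds, with basic blocks represented as plain numbers. It shows how a `break` or `continue` can find its target by searching outward for the matching loop scope. The field names and the `jump_target` helper are invented.

```rust
// Hypothetical sketch: basic blocks are plain numbers and loops are
// identified by a numeric id.
#[derive(Debug)]
enum ScopeKind {
    Ast { node_id: u32 },
    Loop { loop_id: u32, break_block: u32, continue_block: u32 },
    Custom,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Exit {
    Break,
    Continue,
}

// Search from the innermost scope outward for the loop being exited and
// return the basic block that control should branch to.
fn jump_target(stack: &[ScopeKind], loop_id: u32, exit: Exit) -> Option<u32> {
    stack.iter().rev().find_map(|scope| match *scope {
        ScopeKind::Loop { loop_id: id, break_block, continue_block } if id == loop_id => {
            Some(match exit {
                Exit::Break => break_block,
                Exit::Continue => continue_block,
            })
        }
        _ => None,
    })
}
```
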
### When to schedule cleanup

Although the cleanup system is intended to *feel* fairly declarative,
it's still important to time calls to `schedule_clean()` correctly.
Basically, you should not schedule cleanup for memory until it has
been initialized, because if an unwind should occur before the memory
is fully initialized, then the cleanup will run and try to free or
drop uninitialized memory. If the initialization itself produces
byproducts that need to be freed, then you should use temporary custom
scopes to ensure that those byproducts will get freed on unwind. For
example, an expression like `box foo()` will first allocate a box in the
heap and then call `foo()` -- if `foo()` should fail, this box needs
to be *shallowly* freed.

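Here is a toy model of the timing rule. It assumes that failure is modeled as an early return and that unwinding runs every scheduled cleanup. The names `Slot`, `unwind`, and `alloc_then_fail` are hypothetical. Scheduling the cleanup before the initializer has run makes the unwind path touch uninitialized memory; scheduling it afterwards does not.

```rust
// Hypothetical model: "unwinding" runs every scheduled cleanup, most recent
// first, and a slot is either uninitialized or holds an owned value.
#[derive(Debug)]
enum Slot {
    Uninit,
    Init(String),
}

// Runs the scheduled cleanups; reaching an uninitialized slot is the bug
// this section warns about.
fn unwind(slots: &mut [Slot], scheduled: &[usize]) -> Result<usize, String> {
    let mut dropped = 0;
    for &i in scheduled.iter().rev() {
        match std::mem::replace(&mut slots[i], Slot::Uninit) {
            Slot::Init(_) => dropped += 1,
            Slot::Uninit => return Err(format!("cleanup ran on uninitialized slot {}", i)),
        }
    }
    Ok(dropped)
}

// Allocates a slot, then the initializer (say, `foo()` in `box foo()`)
// fails before writing to it; reports what the unwind cleanups would do.
fn alloc_then_fail(schedule_before_init: bool) -> Result<usize, String> {
    let mut slots = vec![Slot::Uninit];
    let mut scheduled = Vec::new();
    if schedule_before_init {
        scheduled.push(0); // BUG: the slot is not initialized yet
    }
    // With correct ordering, no cleanup was ever scheduled for the slot.
    unwind(&mut slots, &scheduled)
}
```
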
### Long-distance jumps

In addition to popping a scope, which corresponds to normal control
flow exiting the scope, we may also *jump out* of a scope into some
earlier scope on the stack. This can occur in response to a `return`,
`break`, or `continue` statement, but also in response to failure. In
any of these cases, we will generate a series of cleanup blocks for
each of the scopes that are exited. So, if the stack contains scopes
A ... Z, and we break out of a loop whose corresponding cleanup scope
is X, we would generate cleanup blocks for the cleanups in X, Y, and
Z. After cleanup is done we would branch to the exit point for scope
X. But if failure should occur, we would generate cleanups for all the
scopes from A to Z and then resume the unwind process afterwards.

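Here is a sketch of which cleanups a long-distance jump runs. It assumes the stack is ordered from the outermost scope A to the innermost scope Z, and that cleanups run innermost first. The `Scope` alias and both helpers are hypothetical.

```rust
// Hypothetical sketch: the scope stack is ordered outermost (A) to innermost
// (Z), and each scope is the list of cleanups scheduled in it.
type Scope = Vec<&'static str>;

// Jumping out to scope `target` (e.g. a `break` whose loop scope is X) runs
// the cleanups of `target` and every scope inside it: innermost scope first,
// and within a scope, most recently scheduled first.
fn cleanups_for_exit(stack: &[Scope], target: usize) -> Vec<&'static str> {
    stack[target..]
        .iter()
        .rev()
        .flat_map(|scope| scope.iter().rev().copied())
        .collect()
}

// Unwinding on failure exits every scope on the stack before resuming.
fn cleanups_for_unwind(stack: &[Scope]) -> Vec<&'static str> {
    cleanups_for_exit(stack, 0)
}
```
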
To avoid generating tons of code, we cache the cleanup blocks that we
create for breaks, returns, unwinds, and other jumps. Whenever a new
cleanup is scheduled, though, we must clear these cached blocks, since
they would not run the new cleanup. A possible improvement would be to
keep the cached blocks but simply generate a new block which performs
the additional cleanup and then branches to the existing cached blocks.

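Here is a hypothetical sketch of the exit-block cache and its invalidation. `ExitKind`, `Scope`, and `Builder` are invented, and a "block" is just a number handed out by the builder.

```rust
use std::collections::HashMap;

// Hypothetical sketch: basic blocks are numbered, and each scope caches the
// cleanup block it built for each kind of exit.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ExitKind {
    Break,
    Return,
    Unwind,
}

#[derive(Default)]
struct Builder {
    blocks_created: u32,
}

#[derive(Default)]
struct Scope {
    cleanups: Vec<&'static str>,
    cached_exits: HashMap<ExitKind, u32>,
}

impl Scope {
    fn schedule_clean(&mut self, c: &'static str) {
        self.cleanups.push(c);
        // Cached blocks would not run the new cleanup, so discard them.
        self.cached_exits.clear();
    }

    // Reuse the cached cleanup block for this exit, or "generate" a new one.
    fn exit_block(&mut self, b: &mut Builder, kind: ExitKind) -> u32 {
        *self.cached_exits.entry(kind).or_insert_with(|| {
            b.blocks_created += 1;
            b.blocks_created
        })
    }
}
```
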
### AST and loop cleanup scopes

AST cleanup scopes are pushed when we begin processing an AST node and
popped when we finish. They are used to house cleanups related to rvalue
temporaries that get referenced (e.g., due to an expression like `&Foo()`).
Whenever an AST scope is popped, we always trans all the cleanups, adding
the cleanup code after the postdominator of the AST node.

AST nodes that represent breakable loops also push a loop scope. The
loop scope never has any actual cleanups; it is just used to point to
the basic blocks where control should flow after a "continue" or
"break" statement. Popping a loop scope never generates code.

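Here is a hypothetical sketch of how the two scope kinds differ when they are popped. An AST scope emits its cleanups. A loop scope only carries jump targets and emits nothing. The `Scope` enum and `pop_scope` are invented for illustration.

```rust
// Hypothetical sketch: popping an AST scope emits its cleanups after the
// node's code; popping a loop scope emits nothing.
enum Scope {
    Ast { cleanups: Vec<&'static str> },
    Loop { break_block: u32, continue_block: u32 },
}

fn pop_scope(stack: &mut Vec<Scope>, emitted: &mut Vec<&'static str>) {
    match stack.pop().expect("scope stack is empty") {
        Scope::Ast { cleanups } => emitted.extend(cleanups.into_iter().rev()),
        // Loop scopes only record jump targets; they never hold cleanups.
        Scope::Loop { .. } => {}
    }
}
```
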
### Custom cleanup scopes

Custom cleanup scopes are used for a variety of purposes. The most
common, though, is to handle temporary byproducts, where cleanup only
needs to occur on failure. The general strategy is to push a custom
cleanup scope, schedule *shallow* cleanups into the custom scope, and
then pop the custom scope (without transing the cleanups) when
execution succeeds normally. This way the cleanups are only trans'd on
unwind, and only up until the point where execution succeeded, at
which time the complete value should be stored in an lvalue or some
other place where normal cleanup applies.

To spell it out, here is an example. Imagine an expression `box expr`.
We would basically:

1. Push a custom cleanup scope C.
2. Allocate the box.
3. Schedule a shallow free in the scope C.
4. Trans `expr` into the box.
5. Pop the scope C (without transing its cleanups).
6. Return the box as an rvalue.

This way, if a failure occurs while transing `expr`, the custom
cleanup scope C is still on the stack, and hence the box will be
freed. The trans code for `expr` itself is responsible for freeing any
other byproducts that may be in play.

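The six steps above can be sketched as follows, under these assumptions:

- Failure is modeled as `Err`.
- A cleanup is a string label.
- `Fcx` and `trans_box` are hypothetical names.

Because Rust itself frees the real `Box` here, `freed` merely records which cleanups the unwind path would have run.

```rust
// Hypothetical sketch of the six steps for `box expr`. Failure is modeled as
// `Err`, a cleanup is a label, and `freed` records cleanups run on unwind.
#[derive(Default)]
struct Fcx {
    scopes: Vec<Vec<&'static str>>,
    freed: Vec<&'static str>,
}

impl Fcx {
    fn unwind(&mut self) {
        // Run every pending cleanup, innermost scope first.
        while let Some(scope) = self.scopes.pop() {
            self.freed.extend(scope.into_iter().rev());
        }
    }
}

fn trans_box(fcx: &mut Fcx, trans_expr: impl FnOnce() -> Result<i64, ()>) -> Result<Box<i64>, ()> {
    fcx.scopes.push(Vec::new()); // 1. push custom scope C
    let mut boxed = Box::new(0_i64); // 2. allocate the box
    fcx.scopes.last_mut().unwrap().push("shallow free of box"); // 3. schedule in C
    match trans_expr() {
        Ok(v) => {
            *boxed = v; // 4. trans `expr` into the box
            fcx.scopes.pop(); // 5. pop C without running its cleanups
            Ok(boxed) // 6. return the box as an rvalue
        }
        Err(()) => {
            fcx.unwind(); // C is still on the stack, so the box is freed
            Err(())
        }
    }
}
```
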
*/