| Preface |
|
vi | |
| 1 Introduction to High-Performance Memory Systems |
|
|
Haldun Hadimioglu, David Kaeli, Jeffrey Kuskin, Ashwini Nanda, and Josep Torrellas |
|
|
1 | (10) |
|
1.1 Coherence, Synchronization, and Allocation |
|
|
1 | (1) |
|
1.2 Power-Aware, Reliable, and Reconfigurable Memory |
|
|
2 | (1) |
|
1.3 Software-Based Memory Tuning |
|
|
3 | (2) |
|
1.4 Architecture-Based Memory Tuning |
|
|
5 | (2) |
|
1.5 Workload Considerations |
|
|
7 | (4) |
| Part I Coherence, Synchronization, and Allocation |
|
|
2 Speculative Locks: Concurrent Execution of Critical Sections in Shared-Memory Multiprocessors |
|
|
|
José F. Martinez and Josep Torrellas |
|
|
11 | (14) |
|
|
|
11 | (1) |
|
|
|
12 | (9) |
|
|
|
21 | (1) |
|
|
|
22 | (1) |
|
|
|
23 | (2) |
|
3 Dynamic Verification of Cache Coherence Protocols |
|
|
|
Jason F. Cantin, Mikko H. Lipasti and James E. Smith |
|
|
25 | (18) |
|
|
|
25 | (4) |
|
3.2 Dynamic Verification of Cache Coherence |
|
|
29 | (4) |
|
3.3 SMP Coherence Checker Correctness, Coverage, and Specificity |
|
|
33 | (1) |
|
3.4 Coherence Checker Overhead |
|
|
34 | (4) |
|
|
|
38 | (1) |
|
|
|
39 | (1) |
|
|
|
39 | (1) |
|
|
|
40 | (3) |
|
4 Timestamp-Based Selective Cache Allocation |
|
|
|
Martin Karlsson and Erik Hagersten |
|
|
43 | (20) |
|
|
|
43 | (1) |
|
|
|
44 | (2) |
|
4.3 Evaluation Methodology |
|
|
46 | (1) |
|
|
|
46 | (2) |
|
|
|
48 | (5) |
|
|
|
53 | (1) |
|
|
|
54 | (2) |
|
|
|
56 | (2) |
|
|
|
58 | (5) |
| Part II Power-Aware, Reliable, and Reconfigurable Memory |
|
|
5 Power-Efficient Cache Coherence |
|
|
|
Craig Saldanha and Mikko H. Lipasti |
|
|
63 | (16) |
|
|
|
63 | (1) |
|
5.2 Snoopy Coherence Protocols |
|
|
64 | (2) |
|
|
|
66 | (6) |
|
|
|
72 | (3) |
|
|
|
75 | (2) |
|
|
|
77 | (1) |
|
|
|
78 | (1) |
|
6 Improving Power Efficiency with an Asymmetric Set-Associative Cache |
|
|
|
Zhigang Hu, Stefanos Kaxiras and Margaret Martonosi |
|
|
79 | (18) |
|
|
|
79 | (2) |
|
|
|
81 | (3) |
|
6.3 Methodology and Modeling |
|
|
84 | (1) |
|
6.4 Asymmetric Set-Associative Cache |
|
|
85 | (5) |
|
|
|
90 | (2) |
|
6.6 Discussion and Future Work |
|
|
92 | (2) |
|
|
|
94 | (1) |
|
|
|
95 | (2) |
|
7 Memory Issues in Hardware-Supported Software Safety |
|
|
|
Diana Keen, Frederic T. Chong, Premkumar Devanbu, Matthew Farrens, Jeremy Brown, Jennifer Hollfelder and Xiu Ting Zhuang
|
|
97 | (16) |
|
|
|
97 | (1) |
|
|
|
98 | (2) |
|
7.3 Motivating Applications |
|
|
100 | (4) |
|
7.4 Architectural Mechanisms |
|
|
104 | (4) |
|
|
|
108 | (2) |
|
|
|
110 | (1) |
|
|
|
111 | (2) |
|
8 Reconfigurable Memory Module in the RAMP System for Stream Processing |
|
|
|
Vason P. Srini, John Thendean and J.M. Rabaey |
|
|
113 | (22) |
|
|
|
113 | (2) |
|
|
|
115 | (2) |
|
|
|
117 | (2) |
|
8.4 Memory Module Architecture |
|
|
119 | (1) |
|
|
|
119 | (4) |
|
|
|
123 | (3) |
|
|
|
126 | (1) |
|
|
|
127 | (1) |
|
|
|
127 | (2) |
|
|
|
129 | (1) |
|
|
|
130 | (5) |
| Part III Software-Based Memory Tuning |
|
|
9 Performance of Memory Expansion Technology (MXT) |
|
|
|
Dan E. Poff, Mohammad Banikazemi, Robert Saccone, Hubertus Franke, Bulent Abali and T. Basil Smith |
|
|
135 | (18) |
|
|
|
135 | (2) |
|
9.2 Overview of MXT Hardware |
|
|
137 | (2) |
|
9.3 The MXT Memory Management Software |
|
|
139 | (1) |
|
9.4 Performance Evaluation |
|
|
140 | (8) |
|
|
|
148 | (3) |
|
|
|
151 | (1) |
|
|
|
151 | (2) |
|
10 Profile-Tuned Heap Access |
|
|
|
Efe Yardımcı and David Kaeli
|
|
153 | (12) |
|
|
|
153 | (3) |
|
|
|
156 | (1) |
|
|
|
157 | (4) |
|
|
|
161 | (1) |
|
|
|
161 | (1) |
|
|
|
162 | (3) |
|
11 Array Merging: A Technique for Improving Cache and TLB Behavior |
|
|
|
Daniela Genius, Siddhartha Chatterjee, and Alvin R. Lebeck |
|
|
165 | (16) |
|
|
|
165 | (1) |
|
|
|
166 | (2) |
|
|
|
168 | (2) |
|
11.4 Cache-Conscious Merging
|
|
170 | (3) |
|
|
|
173 | (2) |
|
11.6 Experimental Results |
|
|
175 | (3) |
|
|
|
178 | (1) |
|
|
|
178 | (3) |
|
12 Software Logging under Speculative Parallelization |
|
|
|
Maria Jesús Garzarán, Milos Prvulovic, José Maria Llaberia, Victor Viñals, Lawrence Rauchwerger, and Josep Torrellas |
|
|
181 | (18) |
|
|
|
181 | (2) |
|
12.2 Speculative Parallelization and Versioning |
|
|
183 | (3) |
|
12.3 Speculation Protocol Used |
|
|
186 | (1) |
|
12.4 Efficient Software Logging |
|
|
187 | (2) |
|
12.5 Evaluation Methodology |
|
|
189 | (2) |
|
|
|
191 | (1) |
|
|
|
192 | (1) |
|
|
|
193 | (1) |
|
|
|
193 | (6) |
| Part IV Architecture-Based Memory Tuning |
|
|
13 An Analysis of Scalar Memory Accesses in Embedded and Multimedia Systems |
|
|
|
Osman S. Unsal, Zhenlin Wang, Israel Koren, C. Mani Krishna, and Csaba Andras Moritz
|
|
199 | (14) |
|
13.1 Introduction and Motivation |
|
|
199 | (1) |
|
|
|
200 | (1) |
|
|
|
201 | (1) |
|
|
|
202 | (7) |
|
13.5 Conclusion and Future Work |
|
|
209 | (1) |
|
|
|
210 | (3) |
|
14 Bandwidth-Based Prefetching for Constant-Stride Arrays |
|
|
|
Steven O. Hobbs, John S. Pieper, and Stephen C. Root |
|
|
213 | (14) |
|
|
|
213 | (1) |
|
|
|
214 | (1) |
|
|
|
215 | (2) |
|
|
|
217 | (1) |
|
|
|
218 | (4) |
|
|
|
222 | (2) |
|
|
|
224 | (1) |
|
|
|
225 | (2) |
|
15 Performance Potential of Effective Address Prediction of Load Instructions |
|
|
|
Pritpal S. Ahuja, Joel Emer, Artur Klauser and Shubhendu S. Mukherjee
|
|
227 | (22) |
|
|
|
227 | (3) |
|
15.2 Effective Address Predictors |
|
|
230 | (4) |
|
15.3 Evaluation Methodology |
|
|
234 | (4) |
|
|
|
238 | (4) |
|
|
|
242 | (1) |
|
15.6 Conclusion and Future Work |
|
|
243 | (2) |
|
|
|
245 | (4) |
| Part V Workload Considerations |
|
|
16 Evaluating Novel Memory System Alternatives for Speculative Multithreaded Computer Systems |
|
|
|
A.J. KleinOsowski and David J. Lilja (University of Minnesota)
|
|
249 | (14) |
|
|
|
249 | (1) |
|
16.2 Background and Motivation |
|
|
250 | (1) |
|
16.3 The Superthreaded Architecture Model |
|
|
251 | (1) |
|
|
|
252 | (2) |
|
|
|
254 | (3) |
|
|
|
257 | (3) |
|
|
|
260 | (3) |
|
17 Evaluation of Large L3 Caches Using TPC-H Trace Samples |
|
|
|
Jaeheon Jeong, Ramendra Sahoo, Krishnan Sugavanam, Ashwini Nanda and Michel Dubois |
|
|
263 | (16) |
|
|
|
263 | (1) |
|
|
|
264 | (3) |
|
17.3 Evaluation Methodology |
|
|
267 | (1) |
|
|
|
268 | (6) |
|
|
|
274 | (1) |
|
|
|
275 | (1) |
|
|
|
276 | (3) |
|
18 Exploiting Intelligent Memory for Database Workloads |
|
|
|
Pedro Trancoso and Josep Torrellas |
|
|
279 | (14) |
|
|
|
279 | (1) |
|
|
|
280 | (1) |
|
|
|
281 | (1) |
|
|
|
282 | (3) |
|
|
|
285 | (2) |
|
18.6 Experimental Results |
|
|
287 | (3) |
|
18.7 Conclusion and Future Work |
|
|
290 | (1) |
|
|
|
290 | (3) |
| Author Index |
|
293 | (2) |
| Subject Index |
|
295 | |