High Performance Computing: 5th International Symposium, ISHPC 2003, Tokyo-Odaiba, Japan, October 20-22, 2003, Proceedings
Alex Veidenbaum
The 5th International Symposium on High Performance Computing (ISHPC-V) was held in Odaiba, Tokyo, Japan, October 20–22, 2003. The symposium was thoughtfully planned, organized, and supported by the ISHPC Organizing Committee and its collaborating organizations. The ISHPC-V program included two keynote speeches, several invited talks, two panel discussions, and technical sessions covering theoretical and applied research topics in high-performance computing and representing both academia and industry.
One of the regular sessions highlighted the research results of the ITBL project (IT-based research laboratory, http://www.itbl.riken.go.jp/). ITBL is a Japanese national project started in 2001 with the objective of realizing a virtual joint research environment using information technology. ITBL aims to connect 100 supercomputers located in main Japanese scientific research laboratories via high-speed networks.
A total of 58 technical contributions from 11 countries were submitted to ISHPC-V. Each paper received at least three peer reviews. After a thorough evaluation process, the program committee selected 14 regular (12-page) papers for presentation at the symposium. In addition, several other papers with favorable reviews were recommended for a poster session presentation. They are also included in the proceedings as short (8-page) papers.
The program committee gave a distinguished paper award and a best student paper award to two of the regular papers. The distinguished paper award was given for “Code and Data Transformations for Improving Shared Cache Performance on SMT Processors” by Dimitrios S. Nikolopoulos. The best student paper award was given for “Improving Memory Latency Aware Fetch Policies for SMT Processors” by Francisco J. Cazorla.
Contents
High Performance Computing Trends | 1 |
Kiloinstruction Processors | 10 |
Overview of an Adaptive Multithreaded Architecture | 26 |
Numerical Simulator III | 39 |
Award Papers | 54 |
Best Student Paper Award | 70 |
Tolerating Branch Predictor Latency on SMT | 86 |
A Simple Low-Energy Instruction Wakeup Mechanism | 99 |
Power-Performance Trade-Offs in Wide and Clustered VLIW Cores | 113 |
Field Array Compression in Data Caches | 127 |
Software | 146 |
Evaluating Heuristic Scheduling Algorithms | 160 |
Pursuing Laziness for Efficient Implementation | 174 |
SPEC HPG Benchmarks for Large Systems | 189 |
Distribution-Insensitive Parallel External Sorting on PC Clusters | 202 |
Is Cook's Theorem Correct for DNA-Based Computing? | 222 |
ITBL | 245 |
A Visual Resource Integration Environment | 258 |
Performance of Network Intrusion Detection Cluster System | 278 |
Ryutaro Himeno | 298 |
Virtual Experiment Platform for Materials Design | 320 |
Short Papers | 342 |
Broadcast in a MANET Based on the Beneficial Area | 360 |
An Improved Algorithm of Multicast Topology Inference | 376 |
Distributed Location of Shared Resources and Its Application | 393 |
Design and Implementation of Parallel Modified PrefixSpan Method | 412 |
Parallel Matrix Multiplication and LU Factorization | 431 |
Performance Study of a Whole Genome Comparison Tool | 450 |
Large Scale Structures of Turbulent Shear Flow via DNS | 468 |
Performance Evaluation of Low Level Multithreaded BLAS Kernels | 500 |
On the Implementation of OpenMP 2.0 Extensions | 523 |
OpenMP for Adaptive Master-Slave Message Passing Applications | 540 |
Author Index | 565 |