High Performance Computing: 5th International Symposium, ISHPC 2003, Tokyo-Odaiba, Japan, October 20-22, 2003, Proceedings

Alex Veidenbaum, Kazuki Joe, Hideharu Amano, Hideo Aiso

The 5th International Symposium on High Performance Computing (ISHPC-V) was held in Odaiba, Tokyo, Japan, October 20–22, 2003. The symposium was thoughtfully planned, organized, and supported by the ISHPC Organizing Committee and its collaborating organizations. The ISHPC-V program included two keynote speeches, several invited talks, two panel discussions, and technical sessions covering theoretical and applied research topics in high-performance computing and representing both academia and industry.

One of the regular sessions highlighted the research results of the ITBL project (IT-based research laboratory, http://www.itbl.riken.go.jp/). ITBL is a Japanese national project started in 2001 with the objective of realizing a virtual joint research environment using information technology. ITBL aims to connect 100 supercomputers located in main Japanese scientific research laboratories via high-speed networks.

A total of 58 technical contributions from 11 countries were submitted to ISHPC-V. Each paper received at least three peer reviews. After a thorough evaluation process, the program committee selected 14 regular (12-page) papers for presentation at the symposium. In addition, several other papers with favorable reviews were recommended for a poster session presentation. They are also included in the proceedings as short (8-page) papers.

The program committee gave a distinguished paper award and a best student paper award to two of the regular papers. The distinguished paper award was given for “Code and Data Transformations for Improving Shared Cache Performance on SMT Processors” by Dimitrios S. Nikolopoulos. The best student paper award was given for “Improving Memory Latency Aware Fetch Policies for SMT Processors” by Francisco J. Cazorla.
Contents
Overview of an Adaptive Multithreaded Architecture
Numerical Simulator III
Award Papers
Best Student Paper Award
Tolerating Branch Predictor Latency on SMT
A Simple Low-Energy Instruction Wakeup Mechanism
Power-Performance Trade-Offs in Wide and Clustered VLIW Cores
Field Array Compression in Data Caches
A Generalized Framework for Auto-tuning Software
Evaluating Heuristic Scheduling Algorithms
Pursuing Laziness for Efficient Implementation
SPEC HPG Benchmarks for Large Systems
Distribution-Insensitive Parallel External Sorting on PC Clusters
Is Cook's Theorem Correct for DNA-Based Computing?
ITBL
A Visual Resource Integration Environment
Performance of Network Intrusion Detection Cluster System
Evaluation of High-Speed VPN Using CFD Benchmark
Virtual Experiment Platform for Materials Design
Short Papers
Broadcast in a MANET Based on the Beneficial Area
An Improved Algorithm of Multicast Topology Inference
Distributed Location of Shared Resources and Its Application
Design and Implementation of Parallel Modified PrefixSpan Method
Parallel Matrix Multiplication and LU Factorization
Performance Study of a Whole Genome Comparison Tool
Large Scale Structures of Turbulent Shear Flow via DNS
Performance Evaluation of Low Level Multithreaded BLAS Kernels
On the Implementation of OpenMP 2.0 Extensions
OpenMP for Adaptive Master-Slave Message Passing Applications
Common terms and phrases
algorithm allocation analysis application array auto-tuning bandwidth benchmark beneficial area Berlin Heidelberg 2003 block branch predictor buffer bzip2 cache misses calculation Cilk circulant graphs clathrate compiler Computer Architecture configuration cycles data layout disk distributed dynamic Earth Simulator environment Ethernet evaluated execution fetch Figure FLUSH FORTRAN frequent patterns function graph grid gzip hardware hyper-threading IEEE ILP MIX MEM implementation improved integer interface ISHPC ITBL Japan latency load loop matrix multiplication mechanism memory module multithreaded node NP-complete OpenMP operations optimization packet parallel computers parameters PC VPN pointer problem Proc processors proposed scalability scheduling sequences server shared shows simulation Simultaneous Multithreaded speedup strand structure supercomputer superscalar synchronization task technique throughput tile topology UPACS vector Veidenbaum visualization VPN router workloads