{"product_id":"algorithms-and-parallel-computing-hardback-9780470902103","title":"Algorithms and Parallel Computing (Hardback) 9780470902103","description":"\u003cfont face=\"Georgia\"\u003e\r\n\u003cp\u003e\u003cfont size=\"6\"\u003eAlgorithms and Parallel Computing\u003c\/font\u003e\u003cbr\u003e\r\n\r\n\r\n\r\n\r\n\r\n\u003c\/p\u003e\n\u003cp\u003e\u003cfont size=\"4\"\u003eFayez Gebali (Author)\u003c\/font\u003e\u003c\/p\u003e\r\n\r\n\u003cp\u003e\u003cfont size=\"3\"\u003e9780470902103, Wiley\u003c\/font\u003e\u003c\/p\u003e\r\n\r\n\u003cp\u003e\u003cfont size=\"3\"\u003eHardback, published 18 March 2011\u003c\/font\u003e\u003c\/p\u003e\r\n\r\n\u003cp\u003e\u003cfont size=\"3\"\u003e368 pages\u003cbr\u003e24.1 x 16.3 x 2.4 cm, 0.644 kg\u003c\/font\u003e\u003c\/p\u003e\r\n\r\n\r\n\r\n\r\n\r\n\u003cp align=\"justify\"\u003e\u003cstrong\u003e\u003cfont size=\"3\"\u003eThere is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. 
This book provides the techniques to explore the possible ways to program a parallel computer for a given application.\u003c\/font\u003e\u003c\/strong\u003e\u003c\/p\u003e\r\n\r\n\u003cp\u003e\u003cfont size=\"3\"\u003e\u003cp\u003ePreface xiii\u003c\/p\u003e \u003cp\u003eList of Acronyms xix\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Introduction 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e1.1 Introduction 1\u003c\/p\u003e \u003cp\u003e1.2 Toward Automating Parallel Programming 2\u003c\/p\u003e \u003cp\u003e1.3 Algorithms 4\u003c\/p\u003e \u003cp\u003e1.4 Parallel Computing Design Considerations 12\u003c\/p\u003e \u003cp\u003e1.5 Parallel Algorithms and Parallel Architectures 13\u003c\/p\u003e \u003cp\u003e1.6 Relating Parallel Algorithm and Parallel Architecture 14\u003c\/p\u003e \u003cp\u003e1.7 Implementation of Algorithms: A Two-Sided Problem 14\u003c\/p\u003e \u003cp\u003e1.8 Measuring Benefits of Parallel Computing 15\u003c\/p\u003e \u003cp\u003e1.9 Amdahl’s Law for Multiprocessor Systems 19\u003c\/p\u003e \u003cp\u003e1.10 Gustafson–Barsis’s Law 21\u003c\/p\u003e \u003cp\u003e1.11 Applications of Parallel Computing 22\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 Enhancing Uniprocessor Performance 29\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e2.1 Introduction 29\u003c\/p\u003e \u003cp\u003e2.2 Increasing Processor Clock Frequency 30\u003c\/p\u003e \u003cp\u003e2.3 Parallelizing ALU Structure 30\u003c\/p\u003e \u003cp\u003e2.4 Using Memory Hierarchy 33\u003c\/p\u003e \u003cp\u003e2.5 Pipelining 39\u003c\/p\u003e \u003cp\u003e2.6 Very Long Instruction Word (VLIW) Processors 44\u003c\/p\u003e \u003cp\u003e2.7 Instruction-Level Parallelism (ILP) and Superscalar Processors 45\u003c\/p\u003e \u003cp\u003e2.8 Multithreaded Processor 49\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 Parallel Computers 53\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e3.1 Introduction 53\u003c\/p\u003e \u003cp\u003e3.2 Parallel Computing 53\u003c\/p\u003e \u003cp\u003e3.3 Shared-Memory 
Multiprocessors (Uniform Memory Access [UMA]) 54\u003c\/p\u003e \u003cp\u003e3.4 Distributed-Memory Multiprocessor (Nonuniform Memory Access [NUMA]) 56\u003c\/p\u003e \u003cp\u003e3.5 SIMD Processors 57\u003c\/p\u003e \u003cp\u003e3.6 Systolic Processors 57\u003c\/p\u003e \u003cp\u003e3.7 Cluster Computing 60\u003c\/p\u003e \u003cp\u003e3.8 Grid (Cloud) Computing 60\u003c\/p\u003e \u003cp\u003e3.9 Multicore Systems 61\u003c\/p\u003e \u003cp\u003e3.10 SM 62\u003c\/p\u003e \u003cp\u003e3.11 Communication Between Parallel Processors 64\u003c\/p\u003e \u003cp\u003e3.12 Summary of Parallel Architectures 67\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 Shared-Memory Multiprocessors 69\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e4.1 Introduction 69\u003c\/p\u003e \u003cp\u003e4.2 Cache Coherence and Memory Consistency 70\u003c\/p\u003e \u003cp\u003e4.3 Synchronization and Mutual Exclusion 76\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5 Interconnection Networks 83\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e5.1 Introduction 83\u003c\/p\u003e \u003cp\u003e5.2 Classification of Interconnection Networks by Logical Topologies 84\u003c\/p\u003e \u003cp\u003e5.3 Interconnection Network Switch Architecture 91\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 Concurrency Platforms 105\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e6.1 Introduction 105\u003c\/p\u003e \u003cp\u003e6.2 Concurrency Platforms 105\u003c\/p\u003e \u003cp\u003e6.3 Cilk++ 106\u003c\/p\u003e \u003cp\u003e6.4 OpenMP 112\u003c\/p\u003e \u003cp\u003e6.5 Compute Unified Device Architecture (CUDA) 122\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7 Ad Hoc Techniques for Parallel Algorithms 131\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e7.1 Introduction 131\u003c\/p\u003e \u003cp\u003e7.2 Defining Algorithm Variables 133\u003c\/p\u003e \u003cp\u003e7.3 Independent Loop Scheduling 133\u003c\/p\u003e \u003cp\u003e7.4 Dependent Loops 134\u003c\/p\u003e \u003cp\u003e7.5 Loop Spreading for Simple Dependent Loops 135\u003c\/p\u003e 
\u003cp\u003e7.6 Loop Unrolling 135\u003c\/p\u003e \u003cp\u003e7.7 Problem Partitioning 136\u003c\/p\u003e \u003cp\u003e7.8 Divide-and-Conquer (Recursive Partitioning) Strategies 137\u003c\/p\u003e \u003cp\u003e7.9 Pipelining 139\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8 Nonserial–Parallel Algorithms 143\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e8.1 Introduction 143\u003c\/p\u003e \u003cp\u003e8.2 Comparing DAG and DCG Algorithms 143\u003c\/p\u003e \u003cp\u003e8.3 Parallelizing NSPA Algorithms Represented by a DAG 145\u003c\/p\u003e \u003cp\u003e8.4 Formal Technique for Analyzing NSPAs 147\u003c\/p\u003e \u003cp\u003e8.5 Detecting Cycles in the Algorithm 150\u003c\/p\u003e \u003cp\u003e8.6 Extracting Serial and Parallel Algorithm Performance Parameters 151\u003c\/p\u003e \u003cp\u003e8.7 Useful Theorems 153\u003c\/p\u003e \u003cp\u003e8.8 Performance of Serial and Parallel Algorithms on Parallel Computers 156\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9 z-Transform Analysis 159\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e9.1 Introduction 159\u003c\/p\u003e \u003cp\u003e9.2 Definition of z-Transform 159\u003c\/p\u003e \u003cp\u003e9.3 The 1-D FIR Digital Filter Algorithm 160\u003c\/p\u003e \u003cp\u003e9.4 Software and Hardware Implementations of the z-Transform 161\u003c\/p\u003e \u003cp\u003e9.5 Design 1: Using Horner’s Rule for Broadcast Input and Pipelined Output 162\u003c\/p\u003e \u003cp\u003e9.6 Design 2: Pipelined Input and Broadcast Output 163\u003c\/p\u003e \u003cp\u003e9.7 Design 3: Pipelined Input and Output 164\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10 Dependence Graph Analysis 167\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e10.1 Introduction 167\u003c\/p\u003e \u003cp\u003e10.2 The 1-D FIR Digital Filter Algorithm 167\u003c\/p\u003e \u003cp\u003e10.3 The Dependence Graph of an Algorithm 168\u003c\/p\u003e \u003cp\u003e10.4 Deriving the Dependence Graph for an Algorithm 169\u003c\/p\u003e \u003cp\u003e10.5 The Scheduling Function for the 1-D FIR 
Filter 171\u003c\/p\u003e \u003cp\u003e10.6 Node Projection Operation 177\u003c\/p\u003e \u003cp\u003e10.7 Nonlinear Projection Operation 179\u003c\/p\u003e \u003cp\u003e10.8 Software and Hardware Implementations of the DAG Technique 180\u003c\/p\u003e \u003cp\u003e\u003cb\u003e11 Computational Geometry Analysis 185\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e11.1 Introduction 185\u003c\/p\u003e \u003cp\u003e11.2 Matrix Multiplication Algorithm 185\u003c\/p\u003e \u003cp\u003e11.3 The 3-D Dependence Graph and Computation Domain D 186\u003c\/p\u003e \u003cp\u003e11.4 The Facets and Vertices of D 188\u003c\/p\u003e \u003cp\u003e11.5 The Dependence Matrices of the Algorithm Variables 188\u003c\/p\u003e \u003cp\u003e11.6 Nullspace of Dependence Matrix: The Broadcast Subdomain B 189\u003c\/p\u003e \u003cp\u003e11.7 Design Space Exploration: Choice of Broadcasting versus Pipelining Variables 192\u003c\/p\u003e \u003cp\u003e11.8 Data Scheduling 195\u003c\/p\u003e \u003cp\u003e11.9 Projection Operation Using the Linear Projection Operator 200\u003c\/p\u003e \u003cp\u003e11.10 Effect of Projection Operation on Data 205\u003c\/p\u003e \u003cp\u003e11.11 The Resulting Multithreaded\/Multiprocessor Architecture 206\u003c\/p\u003e \u003cp\u003e11.12 Summary of Work Done in this Chapter 207\u003c\/p\u003e \u003cp\u003e\u003cb\u003e12 Case Study: One-Dimensional IIR Digital Filters 209\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e12.1 Introduction 209\u003c\/p\u003e \u003cp\u003e12.2 The 1-D IIR Digital Filter Algorithm 209\u003c\/p\u003e \u003cp\u003e12.3 The IIR Filter Dependence Graph 209\u003c\/p\u003e \u003cp\u003e12.4 z-Domain Analysis of 1-D IIR Digital Filter Algorithm 216\u003c\/p\u003e \u003cp\u003e\u003cb\u003e13 Case Study: Two- and Three-Dimensional Digital Filters 219\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e13.1 Introduction 219\u003c\/p\u003e \u003cp\u003e13.2 Line and Frame Wraparound Problems 219\u003c\/p\u003e \u003cp\u003e13.3 2-D Recursive Filters 
221\u003c\/p\u003e \u003cp\u003e13.4 3-D Digital Filters 223\u003c\/p\u003e \u003cp\u003e\u003cb\u003e14 Case Study: Multirate Decimators and Interpolators 227\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e14.1 Introduction 227\u003c\/p\u003e \u003cp\u003e14.2 Decimator Structures 227\u003c\/p\u003e \u003cp\u003e14.3 Decimator Dependence Graph 228\u003c\/p\u003e \u003cp\u003e14.4 Decimator Scheduling 230\u003c\/p\u003e \u003cp\u003e14.5 Decimator DAG for s1 = [1 0] 231\u003c\/p\u003e \u003cp\u003e14.6 Decimator DAG for s2 = [1 −1] 233\u003c\/p\u003e \u003cp\u003e14.7 Decimator DAG for s3 = [1 1] 235\u003c\/p\u003e \u003cp\u003e14.8 Polyphase Decimator Implementations 235\u003c\/p\u003e \u003cp\u003e14.9 Interpolator Structures 236\u003c\/p\u003e \u003cp\u003e14.10 Interpolator Dependence Graph 237\u003c\/p\u003e \u003cp\u003e14.11 Interpolator Scheduling 238\u003c\/p\u003e \u003cp\u003e14.12 Interpolator DAG for s1 = [1 0] 239\u003c\/p\u003e \u003cp\u003e14.13 Interpolator DAG for s2 = [1 −1] 241\u003c\/p\u003e \u003cp\u003e14.14 Interpolator DAG for s3 = [1 1] 243\u003c\/p\u003e \u003cp\u003e14.15 Polyphase Interpolator Implementations 243\u003c\/p\u003e \u003cp\u003e\u003cb\u003e15 Case Study: Pattern Matching 245\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e15.1 Introduction 245\u003c\/p\u003e \u003cp\u003e15.2 Expressing the Algorithm as a Regular Iterative Algorithm (RIA) 245\u003c\/p\u003e \u003cp\u003e15.3 Obtaining the Algorithm Dependence Graph 246\u003c\/p\u003e \u003cp\u003e15.4 Data Scheduling 247\u003c\/p\u003e \u003cp\u003e15.5 DAG Node Projection 248\u003c\/p\u003e \u003cp\u003e15.6 DESIGN 1: Design Space Exploration When s = [1 1]t 249\u003c\/p\u003e \u003cp\u003e15.7 DESIGN 2: Design Space Exploration When s = [1 −1]t 252\u003c\/p\u003e \u003cp\u003e15.8 DESIGN 3: Design Space Exploration When s = [1 0]t 253\u003c\/p\u003e \u003cp\u003e\u003cb\u003e16 Case Study: Motion Estimation for Video Compression 255\u003c\/b\u003e\u003c\/p\u003e 
\u003cp\u003e16.1 Introduction 255\u003c\/p\u003e \u003cp\u003e16.2 FBMAs 256\u003c\/p\u003e \u003cp\u003e16.3 Data Buffering Requirements 257\u003c\/p\u003e \u003cp\u003e16.4 Formulation of the FBMA 258\u003c\/p\u003e \u003cp\u003e16.5 Hierarchical Formulation of Motion Estimation 259\u003c\/p\u003e \u003cp\u003e16.6 Hardware Design of the Hierarchy Blocks 261\u003c\/p\u003e \u003cp\u003e\u003cb\u003e17 Case Study: Multiplication over GF(2m) 267\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e17.1 Introduction 267\u003c\/p\u003e \u003cp\u003e17.2 The Multiplication Algorithm in GF(2m) 268\u003c\/p\u003e \u003cp\u003e17.3 Expressing Field Multiplication as an RIA 270\u003c\/p\u003e \u003cp\u003e17.4 Field Multiplication Dependence Graph 270\u003c\/p\u003e \u003cp\u003e17.5 Data Scheduling 271\u003c\/p\u003e \u003cp\u003e17.6 DAG Node Projection 273\u003c\/p\u003e \u003cp\u003e17.7 Design 1: Using d1 = [1 0]t 275\u003c\/p\u003e \u003cp\u003e17.8 Design 2: Using d2 = [1 1]t 275\u003c\/p\u003e \u003cp\u003e17.9 Design 3: Using d3 = [1 −1]t 277\u003c\/p\u003e \u003cp\u003e17.10 Applications of Finite Field Multipliers 277\u003c\/p\u003e \u003cp\u003e\u003cb\u003e18 Case Study: Polynomial Division over GF(2) 279\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e18.1 Introduction 279\u003c\/p\u003e \u003cp\u003e18.2 The Polynomial Division Algorithm 279\u003c\/p\u003e \u003cp\u003e18.3 The LFSR Dependence Graph 281\u003c\/p\u003e \u003cp\u003e18.4 Data Scheduling 282\u003c\/p\u003e \u003cp\u003e18.5 DAG Node Projection 283\u003c\/p\u003e \u003cp\u003e18.6 Design 1: Design Space Exploration When s1 = [1 −1] 284\u003c\/p\u003e \u003cp\u003e18.7 Design 2: Design Space Exploration When s2 = [1 0] 286\u003c\/p\u003e \u003cp\u003e18.8 Design 3: Design Space Exploration When s3 = [1 −0.5] 289\u003c\/p\u003e \u003cp\u003e18.9 Comparing the Three Designs 291\u003c\/p\u003e \u003cp\u003e\u003cb\u003e19 The Fast Fourier Transform 293\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e19.1 Introduction 
293\u003c\/p\u003e \u003cp\u003e19.2 Decimation-in-Time FFT 295\u003c\/p\u003e \u003cp\u003e19.3 Pipeline Radix-2 Decimation-in-Time FFT Processor 298\u003c\/p\u003e \u003cp\u003e19.4 Decimation-in-Frequency FFT 299\u003c\/p\u003e \u003cp\u003e19.5 Pipeline Radix-2 Decimation-in-Frequency FFT Processor 303\u003c\/p\u003e \u003cp\u003e\u003cb\u003e20 Solving Systems of Linear Equations 305\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e20.1 Introduction 305\u003c\/p\u003e \u003cp\u003e20.2 Special Matrix Structures 305\u003c\/p\u003e \u003cp\u003e20.3 Forward Substitution (Direct Technique) 309\u003c\/p\u003e \u003cp\u003e20.4 Back Substitution 312\u003c\/p\u003e \u003cp\u003e20.5 Matrix Triangularization Algorithm 312\u003c\/p\u003e \u003cp\u003e20.6 Successive over Relaxation (SOR) (Iterative Technique) 317\u003c\/p\u003e \u003cp\u003e20.7 Problems 321\u003c\/p\u003e \u003cp\u003e\u003cb\u003e21 Solving Partial Differential Equations Using Finite Difference Method 323\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e21.1 Introduction 323\u003c\/p\u003e \u003cp\u003e21.2 FDM for 1-D Systems 324\u003c\/p\u003e \u003cp\u003eReferences 331\u003c\/p\u003e \u003cp\u003eIndex 337\u003c\/p\u003e\u003c\/font\u003e\u003c\/p\u003e\r\n\r\n\u003cp\u003e\u003cfont size=\"3\"\u003eSubject Areas: Computer networking \u0026amp; communications [\u003ca title=\"See our other books on Computer networking \u0026amp; communications\" href=\"https:\/\/freshlyprintedbooks.co.uk\/search?q=%22Computer%20networking%20\u0026amp;%20communications%20%5BUT%5D%22\"\u003eUT\u003c\/a\u003e]\u003c\/font\u003e\u003c\/p\u003e\r\n\r\n\r\n\u003c\/font\u003e","brand":"Wiley","offers":[{"title":"Brand 
New","offer_id":52278088433944,"sku":"9780470902103","price":84.69,"currency_code":"GBP","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0730\/2037\/5320\/files\/9780470902103.jpg?v=1781458107","url":"https:\/\/freshlyprintedbooks.co.uk\/products\/algorithms-and-parallel-computing-hardback-9780470902103","provider":"Freshly Printed Books","version":"1.0","type":"link"}