Introduction to Parallel Programming
This book introduces students to the full gamut of parallel programming styles and their relationship to hardware architecture.
Subodh Kumar (Author)
ISBN 9781009069533, Cambridge University Press
Paperback, published 5 January 2023
350 pages
24.1 x 18.5 x 1.5 cm, 0.5 kg
In modern computer science, there is no truly sequential computing system, and most advanced programming is parallel programming. This is particularly evident in modern application domains such as scientific computation, data science and machine intelligence. This lucid introductory textbook will be invaluable to students of computer science and technology, acting as a self-contained primer to parallel programming. It takes the reader from introduction to expertise, addressing a broad gamut of issues: it covers different parallel programming styles, describes parallel architecture, includes parallel programming frameworks and techniques, presents algorithmic and analysis techniques, and discusses parallel design and performance issues. With its broad coverage, the book suits a wide range of courses and also serves as a ready reckoner for professionals in the field.
List of Figures
Introduction
Concurrency and Parallelism
Why Study Parallel Programming
What is in this Book
1. An Introduction to Parallel Computer Architecture
1.1 Parallel Organization
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MIMD: Multiple Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
1.2 System Architecture
1.3 CPU Architecture
1.4 Memory and Cache
1.5 GPU Architecture
1.6 Interconnect Architecture
Routing
Links
Types and Quality of Networks
Torus Network
Hypercube Network
Cross-Bar Network
Shuffle-Exchange Network
Clos Network
Tree Network
Network Comparison
1.7 Summary
2. Parallel Programming Models
2.1 Distributed-Memory Programming Model
2.2 Shared-Memory Programming Model
2.3 Task Graph Model
2.4 Variants of Task Parallelism
2.5 Summary
3. Parallel Performance Analysis
3.1 Simple Parallel Model
3.2 Bulk-Synchronous Parallel Model
BSP Computation Time
BSP Example
3.3 PRAM Model
PRAM Computation Time
PRAM Example
3.4 Parallel Performance Evaluation
Latency and Throughput
Speed-up
Cost
Efficiency
Scalability
Iso-efficiency
3.5 Parallel Work
Brent's Work-Time Scheduling Principle
3.6 Amdahl's Law
3.7 Gustafson's Law
3.8 Karp–Flatt Metric
3.9 Summary
4. Synchronization and Communication Primitives
4.1 Threads and Processes
4.2 Race Condition and Consistency of State
Sequential Consistency
Causal Consistency
FIFO and Processor Consistency
Weak Consistency
Linearizability
4.3 Synchronization
Synchronization Condition
Protocol Control
Progress
Synchronization Hazards
4.4 Mutual Exclusion
Lock
Peterson's Algorithm
Bakery Algorithm
Compare and Swap
Transactional Memory
Barrier and Consensus
4.5 Communication
Point-to-Point Communication
RPC
Collective Communication
4.6 Summary
5. Parallel Program Design
5.1 Design Steps
Granularity
Communication
Synchronization
Load Balance
5.2 Task Decomposition
Domain Decomposition
Functional Decomposition
Task Graph Metrics
5.3 Task Execution
Preliminary Task Mapping
Task Scheduling Framework
Centralized Push Scheduling Strategy
Distributed Push Scheduling
Pull Scheduling
5.4 Input/Output
5.5 Debugging and Profiling
5.6 Summary
6. Middleware: The Practice of Parallel Programming
6.1 OpenMP
Preliminaries
OpenMP Thread Creation
OpenMP Memory Model
OpenMP Reduction
OpenMP Synchronization
Sharing a Loop's Work
Other Work-Sharing Pragmas
SIMD Pragma
Tasks
6.2 MPI
MPI Send and Receive
Message-Passing Synchronization
MPI Data Types
MPI Collective Communication
MPI Barrier
MPI Reduction
One-Sided Communication
MPI File IO
MPI Groups and Communicators
MPI Dynamic Parallelism
MPI Process Topology
6.3 Chapel
Partitioned Global Address Space
Chapel Tasks
Chapel Variable Scope
6.4 Map-Reduce
Parallel Implementation
Hadoop
6.5 GPU Programming
OpenMP GPU Off-Load
Data and Function on Device
Thread Blocks in OpenMP
CUDA
CUDA Programming Model
CPU–GPU Memory Transfer
Concurrent Kernels
CUDA Synchronization
CUDA Shared Memory
CUDA Parallel Memory Access
False Sharing
6.6 Summary
7. Parallel Algorithms and Techniques
7.1 Divide and Conquer: Prefix-Sum
Parallel Prefix-Sum: Method 1
Parallel Prefix-Sum: Method 2
Parallel Prefix-Sum: Method 3
7.2 Divide and Conquer: Merge Two Sorted Lists
Parallel Merge: Method 1
Parallel Merge: Method 2
Parallel Merge: Method 3
Parallel Merge: Method 4
7.3 Accelerated Cascading: Find Minima
7.4 Recursive Doubling: List Ranking
7.5 Recursive Doubling: Euler Tour
7.6 Recursive Doubling: Connected Components
7.7 Pipelining: Merge-Sort
Basic Merge-Sort
Pipelined Merges
4-Cover Property Analysis
Merge Operation per Tick
7.8 Application of Prefix-Sum: Radix-Sort
7.9 Exploiting Parallelism: Quick-Sort
7.10 Fixing Processor Count: Sample-Sort
7.11 Exploiting Pa
Subject Areas: Parallel processing [UYFP], Computer science [UY], Computer programming / software development [UM]