Parallel

Introduction to Parallel Computing

  • Shared Memory Parallelism
    • Many operations are performed simultaneously
    • Runs on a single computer
    • Multiple processors perform multiple operations
    • Memory may be shared or distributed
    • Processors communicate with each other through a bus
    • Improves system performance
  • Distributed Memory Parallelism
    • System components are located at different physical locations
    • Uses multiple computers
    • Multiple computers perform multiple operations
    • Has distributed memory only
    • Computers communicate with each other through message passing.
    • Improves system scalability, fault tolerance and resource sharing capabilities
  • Amdahl’s law (fixed problem size): speedup = 1/(rs + rp/p), where rs is the serial fraction, rp = 1 − rs the parallel fraction, and p the number of processors; the speedup can never exceed 1/rs
  • Gustafson’s law (scaled problem size): speedup = rs + p·rp = p − rs·(p − 1); both laws are evaluated in the sketch after this list
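
Both laws are easy to get a feel for by plugging in numbers. Below is a minimal Python sketch (function names are my own) that evaluates Amdahl's fixed-size speedup and Gustafson's scaled speedup for a serial fraction rs and processor count p:

    def amdahl_speedup(rs, p):
        # Fixed-size speedup: S = 1 / (rs + rp/p), with rp = 1 - rs.
        return 1.0 / (rs + (1.0 - rs) / p)

    def gustafson_speedup(rs, p):
        # Scaled speedup: S = rs + p * rp = p - rs * (p - 1).
        return p - rs * (p - 1)

    for p in (2, 8, 64, 1024):
        print(p, round(amdahl_speedup(0.05, p), 2), round(gustafson_speedup(0.05, p), 2))

With a 5% serial fraction, Amdahl's speedup saturates below 1/rs = 20 however many processors are added, while Gustafson's scaled speedup keeps growing because the parallel part of the problem grows with p.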

Parallel Computing on Shared Memory

  • Each thread has its own instruction pointer pointing to the next instruction it will execute. Each thread also needs its own stack and its own register state, but shares the code, heap and other parts of the process. A process can have more than one thread.
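
A hedged Python sketch of that thread model: two threads of one process share the global counter (so it needs a lock), while each keeps its own local variables on its own stack.

    import threading

    counter = 0                      # shared: one copy in the process's address space
    lock = threading.Lock()          # protects the shared counter

    def worker(name, n):
        global counter
        local_total = 0              # private: lives on this thread's own stack
        for _ in range(n):
            local_total += 1
            with lock:               # mutual exclusion on the shared data
                counter += 1
        print(name, "local:", local_total)

    threads = [threading.Thread(target=worker, args=("t%d" % i, 10000)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("shared counter:", counter)   # 20000: both threads updated the same object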

IPC & RPC

  • Inter-process Communication
  • Information sharing among two or more processes
  • Synchronous and Asynchronous
  • List the Types of Failure in IPC: Loss of request message, Loss of response message, Unsuccessful execution of the request
  • How to implement idempotency? Add a sequence number to the request message and introduce a ‘reply cache’ (see the sketch after this list)
  • Three main types of Group Communications? One to many, Many to one, Many to many
  • One of the greatest challenges in Many to Many? Ordered Delivery
  • Name an all-purpose IPC protocol? Remote Procedure Call (RPC)
  • Name a few ways to optimize RPC?
    • Concurrent Access to Multiple Servers
    • Serving Multiple Requests Concurrently
    • Reducing Call Workload per Server
  • Three different techniques for implementing Concurrent Access to Multiple Servers? Threads, Early Reply, Call Buffering
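
A minimal sketch of the idempotency idea (class and method names are my own, not from a specific RPC library): the client tags each request with a sequence number, and the server's reply cache turns a retransmitted request into a replay of the saved reply instead of a second execution.

    class IdempotentServer:
        def __init__(self):
            self.reply_cache = {}            # (client_id, seq) -> saved reply

        def handle(self, client_id, seq, op, *args):
            key = (client_id, seq)
            if key in self.reply_cache:      # duplicate / retransmitted request
                return self.reply_cache[key] # replay the old reply, do not re-execute
            result = op(*args)               # execute the request exactly once
            self.reply_cache[key] = result
            return result

    srv = IdempotentServer()
    print(srv.handle("c1", 1, lambda x: x + 1, 41))   # 42, executed
    print(srv.handle("c1", 1, lambda x: x + 1, 41))   # 42, replayed from the cache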

Synchronization, Mutex, Deadlocks

  • Real time Clock Synchronization Methods
    • Cristian’s Method - ask a central time server and compensate for the round-trip delay.
    • Berkeley Algorithm - a master polls every machine, estimates their clocks from the round-trip times, averages the values, and tells each machine how to adjust.
    • Averaging Algorithm - A machine collects all other broadcasts for a certain interval and sets the local clock by the average of their arrival times.
  • Logical Clock Synchronization Techniques
    • Lamport Clock - captures the happens-before relation (a minimal sketch follows this list)
    • Vector Clock - in addition, allows concurrent events to be detected
  • Mutual Exclusion Approaches
    • Centralised - a single coordinator grants access; simple, but a single point of failure
    • Distributed - all processes agree on each entry; requires too many messages
    • GME (Group Mutual Exclusion)
  • Deadlock detection and handling
      • Necessary conditions for occurrence
      • communication deadlock - processes wait for messages from each other
      • resource deadlock - processes wait for resources held by each other; requires all four of:
        • Mutual exclusion.
        • Hold and wait.
        • Non-preemption.
        • Circular wait.
    • Deadlock Detection
      • build a wait-for graph (WFG); a cycle in the graph indicates a deadlock
    • Deadlock Handling (timestamp-based schemes; see the sketch after this list)
      • Wait-die - an older process waits; a younger process requesting from an older one aborts (“dies”) itself
      • Wound-wait - an older process preempts (“wounds”, kills) the younger holder; a younger process waits
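
As promised above, a minimal sketch of a Lamport clock, assuming the two standard rules: increment before every local event or send, and on receive set the clock to max(local, message) + 1.

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):                  # local event
            self.time += 1
            return self.time

        def send(self):                  # timestamp piggybacked on the message
            return self.tick()

        def receive(self, msg_time):     # merge rule: max(local, message) + 1
            self.time = max(self.time, msg_time) + 1
            return self.time

    a, b = LamportClock(), LamportClock()
    t = a.send()        # a.time == 1
    b.receive(t)        # b.time == 2, so the send precedes the receive in clock order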
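
And the two timestamp-based deadlock-handling rules in a few lines (a sketch; a smaller timestamp means an older process):

    def wait_die(requester_ts, holder_ts):
        # Older requester may wait; a younger requester aborts ("dies") itself.
        return "wait" if requester_ts < holder_ts else "abort requester"

    def wound_wait(requester_ts, holder_ts):
        # Older requester preempts ("wounds") the younger holder; a younger requester waits.
        return "abort holder" if requester_ts < holder_ts else "wait"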

Election Algorithms, Distributed Transactions, Concurrency Control

  • Election Algorithms
    • Bully Algorithm - a process sends an ELECTION message to all higher-numbered processes; if any higher-numbered process answers, it takes over, otherwise the sender becomes coordinator
    • Ring Algorithm - the ELECTION message circulates around a logical ring collecting process numbers, and the highest-numbered process becomes coordinator
  • Distributed Transactions
    • System Model
    • Properties of transaction
      • Atomic
      • Consistent
      • Isolated
      • Durable
    • Implementation of transaction
      • Private workspace
      • Write-ahead Log
    • Distributed Commit Protocols
      • 2 Phase Commit - a voting phase followed by a decision phase (a coordinator sketch follows this list)
      • 3 Phase Commit
        • non-blocking, under the assumption that at most one site fails during the execution of the transaction
  • Concurrency Control in Distributed System: Mechanisms for implementing Concurrency control
    • Lock based CC
    • Timestamp based CC
    • Optimistic CC
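
A hedged sketch of the two-phase commit coordinator (the participant objects with vote/commit/abort methods are stand-ins of my own):

    def two_phase_commit(participants):
        # Phase 1 (voting): every participant prepares and votes.
        votes = [p.vote() for p in participants]   # True = commit, False = abort
        # Phase 2 (decision): commit only if the vote was unanimous.
        if all(votes):
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:
            p.abort()
        return "aborted"

If the coordinator crashes between the two phases, prepared participants are blocked until it recovers; removing that blocking, under the at-most-one-failure assumption above, is what the extra phase of 3PC buys.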

Faults & Distributed Consensus

  • Classification of failures
    • Crash Failure
    • Omission Failure
    • Transient Failure
    • Byzantine Failure
    • Software Failure
    • Temporal Failure
    • Security Failure
  • Fault-Tolerant System
    • Masking tolerance
    • Non-masking tolerance
    • Fail-safe tolerance
    • Graceful degradation

Introduction To ILP Processors

  • Evolution and overview of ILP Processors
    • non-pipelined -> pipelined -> multiple pipelines -> VLIW -> superscalar
    • Pipelining overlaps the execution of consecutive instructions
    • VLIW and superscalar processors issue several instructions in parallel: VLIW relies on the compiler to pack independent instructions, superscalar on the hardware to issue them
  • Understanding of inter-instruction dependencies
    • Data dependencies
      • RAW (read after write) - a true dependency, cannot be removed
      • WAR and WAW - false dependencies, removable by register renaming (see the sketch after this list)
    • Control dependencies
    • Resource dependencies
  • Instruction Scheduling
    • Static (compile-time) & dynamic (run-time) scheduling
  • Sequential Consistency
    • When instructions are executed in parallel, the processor must be careful to preserve sequential consistency.
  • How fast can we go?
    • Performance is limited by the underlying algorithm, the compiled code, and the actual hardware (resource restrictions)
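
Renaming can be illustrated even at the source level. In this small Python sketch the reuse of r1 creates WAW and WAR hazards; giving the second write a fresh name r2 leaves only the true RAW dependencies, so the two pairs of statements could run in parallel:

    a, b, d, e = 1, 2, 3, 4

    # With r1 reused: WAW between (1) and (3), WAR between (2) and (3).
    r1 = a + b        # (1) write r1
    c  = r1 * 2       # (2) read  r1 - RAW on (1), a true dependency
    r1 = d + e        # (3) write r1 again
    f  = r1 * 2       # (4) read  r1 - RAW on (3)

    # After renaming the destination of (3) to r2, only RAW remains,
    # and (1)-(2) is independent of (3')-(4'):
    r1 = a + b        # (1)
    c  = r1 * 2       # (2)
    r2 = d + e        # (3')
    f  = r2 * 2       # (4')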

Vector Architectures

  • Generic vector processors
  • Vectorized & Un-vectorized computation
  • Pipelined computation
    • Cache Memories
    • Interleaving
      • without interleaving, the memory is busy for the whole duration of an access and no other access can proceed
      • in an interleaved memory system there are a number of banks, each corresponding to a certain range of addresses; accesses that fall in different banks can overlap
  • Multiple data elements can be loaded at once.
  • One vector operation applies to all of the loaded elements (see the sketch below).
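
A minimal NumPy sketch of the contrast (NumPy stands in for vector hardware here: one call loads and operates on all elements at once):

    import numpy as np

    x = np.arange(1_000_000, dtype=np.float64)
    y = np.arange(1_000_000, dtype=np.float64)

    # Un-vectorized: one scalar operation per loop iteration.
    z_loop = np.empty_like(x)
    for i in range(len(x)):
        z_loop[i] = 2.0 * x[i] + y[i]

    # Vectorized: whole arrays are combined in bulk operations.
    z_vec = 2.0 * x + y

    assert np.allclose(z_loop, z_vec)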

Introduction To MIMD Architectures

  • Distributed Memory MIMD
    • Replicate the processor/memory pairs
    • Connect them via an interconnection network
    • Messages passed between processes
      • Synchronization
      • Data movement
    • Packet switching: latency = (packet length × distance) / channel bandwidth
    • Circuit switching: latency = (probe length × distance) / channel bandwidth + message length / channel bandwidth
    • Virtual cut-through: latency = (header flit length × distance) / channel bandwidth + message length / channel bandwidth
    • Wormhole routing - latency similar to virtual cut-through, but only flit-sized buffers are needed at each node (the formulas above are evaluated in the sketch after this list)
  • Shared Memory MIMD
    • Replicate the processors
    • Replicate the memories
    • Connect them via an interconnection network
    • Synchronization is difficult
    • Lack of scalability
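
The three latency formulas above, in one sketch (a direct transcription; units are assumed consistent, e.g. bits, hops and bits per time unit):

    def packet_switching(packet_len, distance, bandwidth):
        return (packet_len * distance) / bandwidth

    def circuit_switching(probe_len, distance, bandwidth, msg_len):
        return (probe_len * distance) / bandwidth + msg_len / bandwidth

    def virtual_cut_through(header_flit_len, distance, bandwidth, msg_len):
        return (header_flit_len * distance) / bandwidth + msg_len / bandwidth

    # Example: a 1 Mbit message over a 10-hop path with 1 Gbit/s channels.
    print(packet_switching(1e6, 10, 1e9))          # 0.01 s: the full message is stored per hop
    print(virtual_cut_through(128, 10, 1e9, 1e6))  # ~0.001 s: the distance term almost vanishes

Because the header flit is far shorter than the message, virtual cut-through (and wormhole routing) makes latency nearly independent of distance.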

GPU Computing

Written on November 29, 2020