Multiprocessor

A multiprocessor is a computer system with more than one processor.
There are many types of multiprocessor systems. These can be classified based on:
* Loosely coupled vs tightly coupled multiprocessor system
* Homogeneous vs heterogeneous multiprocessor system
* Shared memory vs distributed memory
* UMA vs cc-NUMA system
* Hybrid system - shared system memory for global data and local memory for local data
Loosely coupled multiprocessor system
In a loosely coupled (distributed memory) multiprocessor system, each processor has its own local memory and I/O channels and runs an independent operating system. Processors exchange data by message passing over a high-speed interconnection network.
System characteristics
* These systems can run multiple instruction, multiple data (MIMD) programs.
* This type of architecture allows parallel processing.
* The distributed memory allows high scalability.
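The message-passing model described above can be illustrated with a minimal sketch. It assumes two hypothetical nodes, each with a private memory and an inbox; Python threads and `queue.Queue` stand in for independent processors and the interconnection network, and the names `Node` and `run_exchange` are invented for this example.

```python
import queue
import threading


class Node:
    """One loosely coupled node: private memory plus an inbox for messages."""

    def __init__(self, name):
        self.name = name
        self.local_memory = {}      # visible only to this node
        self.inbox = queue.Queue()  # stands in for the node's network port

    def send(self, other, key, value):
        # Data leaves the node only as an explicit message.
        other.inbox.put((self.name, key, value))

    def receive(self):
        # Block until a message arrives, then copy it into local memory.
        sender, key, value = self.inbox.get()
        self.local_memory[key] = value
        return sender


def run_exchange():
    a, b = Node("A"), Node("B")
    a.local_memory["x"] = 42

    def node_a():
        a.send(b, "x", a.local_memory["x"])
        a.receive()  # wait for B's reply

    def node_b():
        b.receive()
        b.send(a, "x_doubled", b.local_memory["x"] * 2)

    threads = [threading.Thread(target=node_a), threading.Thread(target=node_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.local_memory, b.local_memory
```

Neither node ever reads the other's `local_memory` directly; every value crosses the boundary as a message, which is the property that distinguishes this model from shared memory.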
Tightly coupled multiprocessor system
(or shared memory system)
Multiprocessor system with a shared memory closely connected to the processors.
A symmetric multiprocessor system is a multiprocessor system with two or more homogeneous processors and a centralized shared memory, called main memory (MM), operating under a single operating system.
There are two types of system:
* UMA system
* NUMA system
UMA system
(Uniform Memory Access)
* Heterogeneous Multiprocessor System
* Symmetric Multiprocessor System
Heterogeneous multiprocessor system
A heterogeneous multiprocessing system contains multiple processing units that are not all of the same kind: central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), or any type of application-specific integrated circuit (ASIC). The system architecture allows any accelerator, for instance a graphics processor, to operate at the same processing level as the system's CPU.
Symmetric multiprocessor system
(SMP)
Systems operating under a single OS (Operating System) with two or more homogeneous processors and with a centralized shared Main Memory.
An SMP is a system with a pool of homogeneous processors running independently of each other. Each processor, executing different programs and working on different sets of data, can share common resources (memory, I/O devices, interrupt system and so on) that are connected using a system bus, a crossbar, or a mix of the two: a bus for addresses and a crossbar for data (data crossbar).
Each processor has its own cache that acts as a bridge between the processor and main memory. The cache speeds up access to MM data and, most importantly in shared memory multiprocessor systems, reduces traffic on the system bus and MM, which is one of the major bottlenecks of these systems. In these systems the cache is an essential element.
The shared memory gives every processor the same memory access time, hence the name Uniform Memory Access (UMA).
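How a per-processor cache cuts system bus traffic can be shown with a small simulation. This is a sketch under simplifying assumptions that are not taken from any real machine: a read-only, direct-mapped cache with one-word lines, and a hypothetical access trace that sweeps the same eight words repeatedly.

```python
class DirectMappedCache:
    """Tiny read-only direct-mapped cache with one-word lines (simplified model)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # address currently held in each line
        self.bus_reads = 0              # reads that had to go to main memory

    def read(self, address):
        index = address % self.num_lines
        if self.tags[index] != address:
            # Miss: the word is fetched over the system bus.
            self.tags[index] = address
            self.bus_reads += 1
        # Hit: served by the cache, no bus traffic.


def bus_reads_for(trace, num_lines):
    """Count the bus reads generated by an address trace for a given cache size."""
    cache = DirectMappedCache(num_lines)
    for address in trace:
        cache.read(address)
    return cache.bus_reads


# A loop that sweeps the same 8 words 100 times.
trace = list(range(8)) * 100
uncached = len(trace)              # without a cache every read crosses the bus
cached = bus_reads_for(trace, 16)  # with 16 lines only the first sweep misses
```

With this trace the cached processor issues 8 bus reads instead of 800. A cache that is too small for the working set loses the benefit: with 4 lines, addresses that map to the same line keep evicting each other and every read misses.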
cc-NUMA system
(cache-coherent Non-Uniform Memory Access)
SMP systems are limited in scalability, because the shared system bus and main memory become a bottleneck as processors are added.
To overcome this limitation, the architecture called "cc-NUMA" is normally used.
A cc-NUMA system is a cluster of SMP systems, called nodes, connected via a high-speed interconnection network. The network can be a single or double-reverse ring, a multi-ring, point-to-point connections or a mix of these (e.g. IBM Power Systems), a bus interconnection (e.g. NUMAq), a crossbar, a segmented bus (NUMA Bull HN ISI ex Honeywell), a mesh router, etc.
The main characteristic of a cc-NUMA system is a single shared global memory, distributed across the nodes and directly accessible by all the processors of all the nodes.
In a cc-NUMA system, a processor's access to the remote memory of another node is slower than an access to its local memory. For this reason the system is called NUMA (Non-Uniform Memory Access).
cc-NUMA is also called Distributed Shared Memory (DSM) architecture.
Each node is usually an SMP system, where a processor can be a single-core processor, a multi-core processor, a mix of the two, or any other kind of architecture. The figure alongside is just an example.
The difference between local and remote access times can be as much as an order of magnitude, depending on the kind of interconnection network used (faster with segmented bus, crossbar and point-to-point interconnections, slower with serial ring connections).
To reduce this penalty, a large remote cache (see ) is normally used. With this solution a cc-NUMA system behaves very much like a large SMP system.
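The effect of the local/remote gap and of a remote cache can be estimated with a weighted average. The function below is a back-of-the-envelope sketch, not a model of any specific machine: it assumes that a remote-cache hit costs the same as a local access, and the latencies used in the example (100 and 1000 time units) are illustrative values chosen only to reflect an order-of-magnitude gap.

```python
def average_access_time(t_local, t_remote, local_fraction, remote_cache_hit_ratio=0.0):
    """Mean memory access time seen by one processor in a NUMA node.

    Assumption: a hit in the remote cache costs the same as a local access.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    if not 0.0 <= remote_cache_hit_ratio <= 1.0:
        raise ValueError("remote_cache_hit_ratio must be between 0 and 1")

    remote_fraction = 1.0 - local_fraction
    # Accesses served at local speed: truly local ones plus remote-cache hits.
    served_locally = local_fraction + remote_fraction * remote_cache_hit_ratio
    # Accesses that still have to cross the interconnection network.
    served_remotely = remote_fraction * (1.0 - remote_cache_hit_ratio)
    return served_locally * t_local + served_remotely * t_remote


# 80% of accesses are local; remote memory is ten times slower.
no_remote_cache = average_access_time(100, 1000, 0.8)          # roughly 280
with_remote_cache = average_access_time(100, 1000, 0.8, 0.9)   # roughly 118
```

Under these assumptions a remote cache with a 90% hit ratio brings the average access time from about 2.8 times the local latency down to about 1.2 times, which is why the system then behaves much like a large SMP.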
Tightly coupled vs Loosely coupled architecture
Both architectures have advantages and trade-offs, which may be summarized as follows:
* Loosely coupled architectures offer high performance for each individual processor but do not allow easy real-time balancing of the load among processors.
* Tightly coupled architectures, conversely, allow easy load balancing and distribution among processors but suffer from the bottleneck of sharing common resources through one or more buses (which are themselves a common resource).
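The load-balancing side of this trade-off can be sketched with a toy scheduling model. It makes two assumptions that are not in the text: tasks on a loosely coupled system are pre-assigned round-robin and never migrate, while processors on a tightly coupled system take the next task from one shared queue as soon as they are idle. The cost of bus contention is deliberately ignored, so the sketch shows only the balancing effect.

```python
import heapq


def static_makespan(task_times, num_processors):
    """Loosely coupled style: tasks are pre-assigned round-robin and never migrate."""
    loads = [0] * num_processors
    for i, t in enumerate(task_times):
        loads[i % num_processors] += t
    return max(loads)


def shared_queue_makespan(task_times, num_processors):
    """Tightly coupled style: the first idle processor takes the next queued task."""
    # Min-heap holding the time at which each processor becomes free.
    free_at = [0] * num_processors
    heapq.heapify(free_at)
    for t in task_times:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + t)
    return max(free_at)


# Alternating long and short tasks, two processors.
tasks = [8, 1, 8, 1, 8, 1, 8, 1]
static_result = static_makespan(tasks, 2)        # one processor gets every long task
shared_result = shared_queue_makespan(tasks, 2)  # work evens out across processors
```

For this task list the static assignment finishes after 32 time units and the shared queue after 18, the ideal split of 36 units of work over two processors. In a real tightly coupled system part of that gain is paid back as contention on the shared bus and memory.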
Multiprocessor system featuring global data multiplication
An intermediate approach between the two previous architectures has both common resources and local resources, such as a local memory (LM) for each processor. The common resources are accessible to all processors through the system bus, while the local resources are accessible only to the processor they belong to. Cache memories may be viewed in this perspective as local memories.
This system (patented by F. Zulian), used on the Unix-based DPX/2 300 system (Bull HN Information Systems Italia, ex Honeywell), is a mix of tightly coupled and loosely coupled systems and combines the advantages of both architectures.
The local memory is divided into two sectors: global data (GD) and local data (LD).
The basic concept of this architecture is global data: modifiable information required by more than one processor. This information is duplicated and stored in each local memory. Each time global data is modified in one local memory, a hardware write-broadcast is sent over the system bus to all other local memories to keep the global data coherent. Thus each processor can read global data from its own local memory without involving the system bus. System bus access is required only when global data is modified in a local memory.
Local data can be exchanged via message passing, as in a loosely coupled system.
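The global data scheme can be modelled in software as follows. This is only an illustrative sketch of the behaviour described above, not the patented hardware mechanism: the class names `SystemBus` and `LocalMemory` and the transaction counter are hypothetical, and dictionaries stand in for the GD and LD sectors.

```python
class SystemBus:
    """Shared bus that broadcasts global-data writes to every attached local memory."""

    def __init__(self):
        self.memories = []
        self.transactions = 0  # number of times the bus was used

    def attach(self, memory):
        self.memories.append(memory)

    def broadcast_write(self, source, key, value):
        self.transactions += 1
        for memory in self.memories:
            if memory is not source:
                memory.global_data[key] = value  # update every other replica


class LocalMemory:
    """Per-processor local memory split into a GD sector and an LD sector."""

    def __init__(self, bus):
        self.global_data = {}  # GD: replicated in every local memory
        self.local_data = {}   # LD: private to the owning processor
        self.bus = bus
        bus.attach(self)

    def read_global(self, key):
        return self.global_data[key]  # served locally, no bus access

    def write_global(self, key, value):
        self.global_data[key] = value
        self.bus.broadcast_write(self, key, value)  # keep replicas coherent

    def write_local(self, key, value):
        self.local_data[key] = value  # never touches the bus


bus = SystemBus()
memories = [LocalMemory(bus) for _ in range(3)]
memories[0].write_global("counter", 7)
reads = [m.read_global("counter") for m in memories]  # each replica is read locally
memories[1].write_local("scratch", 1)
```

After this sequence all three replicas hold the value 7, while the bus has been used exactly once, for the single global write. The three reads and the local write generate no bus traffic.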