Simultaneous multithreading
Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.
Details
Conventional multitasking operating systems let multiple processes and threads use the processor one at a time, giving one thread exclusive ownership for a time slice on the order of milliseconds. This is called temporal multithreading. Quite often, a thread will stall for hundreds of cycles while waiting for some external resource (for example, a load from RAM), which lowers processor efficiency.
A further improvement is super-threading, where the processor can execute instructions from a different thread each cycle. Cycles left unused by one thread can then be used by another thread that is ready to run.
Still, a single thread rarely uses all of the execution units of a modern processor at the same time. Simultaneous multithreading allows multiple threads to execute different instructions in the same clock cycle, using the execution units that the first thread left spare. This is done without great changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to hold data from multiple threads. The number of concurrent threads is decided by the chip designers, but practical restrictions on chip complexity usually limit it to 2, 4 or sometimes 8 concurrent threads.
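The difference between temporal multithreading, super-threading and SMT can be illustrated with a toy model. The sketch below is not a processor simulator. It assumes a hypothetical 4-wide core and a made-up trace that gives, for each thread and cycle, how many instructions the thread could issue (0 meaning it is stalled). The names ISSUE_WIDTH, TRACE, temporal, interleaved and smt are illustrative only.

```python
from typing import List

ISSUE_WIDTH = 4  # assumed number of issue slots per cycle (hypothetical)

# Hypothetical trace: TRACE[t][c] is how many instructions thread t
# could issue in cycle c; 0 means the thread is stalled.
TRACE = [
    [2, 2, 0, 0, 0, 2, 2, 1],  # thread A
    [1, 0, 3, 1, 2, 0, 0, 2],  # thread B
]


def temporal(ready: List[List[int]], slice_len: int) -> int:
    """Temporal multithreading: one thread owns the core for a whole slice."""
    n_threads, n_cycles = len(ready), len(ready[0])
    issued = 0
    for cycle in range(n_cycles):
        owner = (cycle // slice_len) % n_threads
        issued += min(ISSUE_WIDTH, ready[owner][cycle])
    return issued


def interleaved(ready: List[List[int]]) -> int:
    """Super-threading: each cycle, one ready thread issues (round robin)."""
    n_threads, n_cycles = len(ready), len(ready[0])
    issued = 0
    start = 0  # thread that gets first chance in the current cycle
    for cycle in range(n_cycles):
        for k in range(n_threads):
            t = (start + k) % n_threads
            if ready[t][cycle] > 0:
                issued += min(ISSUE_WIDTH, ready[t][cycle])
                start = (t + 1) % n_threads
                break
    return issued


def smt(ready: List[List[int]]) -> int:
    """SMT: slots left spare by one thread are filled by the others."""
    n_threads, n_cycles = len(ready), len(ready[0])
    issued = 0
    for cycle in range(n_cycles):
        free = ISSUE_WIDTH
        # Threads are served in index order, so thread 0 has priority here.
        for t in range(n_threads):
            take = min(free, ready[t][cycle])
            issued += take
            free -= take
    return issued


if __name__ == "__main__":
    total_slots = ISSUE_WIDTH * len(TRACE[0])
    print("temporal   :", temporal(TRACE, slice_len=4), "of", total_slots)
    print("interleaved:", interleaved(TRACE), "of", total_slots)
    print("smt        :", smt(TRACE), "of", total_slots)
```

In this model SMT never fills fewer slots than the interleaved scheme, because it is the only one of the three that can combine instructions from several threads within a single cycle. Real processors also contend for caches, fetch bandwidth and other shared resources, which the model ignores.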
Since the technique is really an efficiency solution, and there is inevitable increased conflict on shared resources, measuring or agreeing on the "goodness" of the solution can be difficult. Some researchers have shown that the extra threads can be used to proactively seed a shared resource like a cache, to improve the performance of another single thread, and claim this shows that SMT is not just an efficiency solution. Others use SMT to provide redundant computation, for some level of error detection and recovery.
But, in most current cases, SMT is about efficiency and increased throughput of computations, per amount of hardware used.
Taxonomy
In processor design, there are two ways to increase on-chip parallelism with modest resource requirements: the superscalar technique, which tries to increase Instruction Level Parallelism (ILP), and the multithreading approach, which exploits Thread Level Parallelism (TLP). Superscalar means executing multiple instructions at the same time, while chip-level multithreading (CMT) executes instructions from multiple threads within one processor chip at the same time. There are several ways to support more than one thread within a chip, namely:
- Multithreaded: interleaved issue of multiple instructions from different threads
- Simultaneous multithreading (SMT): issue of multiple instructions from multiple threads in one cycle
- Chip-level multiprocessing (CMP or multicore): integration of two or more superscalar processors into one chip, each executing one thread independently
- Any combination of multithreaded/SMT/CMP
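As a rough illustration of how these categories combine, the following sketch labels a chip from three properties. The Chip class, its field names and the classify function are hypothetical and exist only for this example. The sample values follow the descriptions given elsewhere in this article (a two-thread SMT Pentium 4, a dual-core POWER5 with two-thread SMT per core, and an 8-core, 4-thread RMI part running in fine-grain mode).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    cores: int              # processor cores on the chip
    threads_per_core: int   # hardware thread contexts per core
    same_cycle_issue: bool  # can a core issue from several threads in one cycle?


def classify(chip: Chip) -> str:
    """Return a label built from the taxonomy in the list above."""
    labels = []
    if chip.cores > 1:
        labels.append("CMP")
    if chip.threads_per_core > 1:
        # Several threads per core: SMT if they can issue in the same
        # cycle, otherwise interleaved ("multithreaded") issue.
        labels.append("SMT" if chip.same_cycle_issue else "multithreaded")
    return " + ".join(labels) if labels else "single-threaded"


if __name__ == "__main__":
    print(classify(Chip(cores=1, threads_per_core=2, same_cycle_issue=True)))
    print(classify(Chip(cores=2, threads_per_core=2, same_cycle_issue=True)))
    print(classify(Chip(cores=8, threads_per_core=4, same_cycle_issue=False)))
```

The label only records which techniques are combined. It says nothing about how well a given design performs.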
Historical implementations
While multithreading CPUs have been around since the 1950s, simultaneous multithreading was first researched by IBM in 1968. The first major commercial CPU developed with SMT was the DEC 21464 (EV8). This chip was developed by DEC in coordination with Dean Tullsen of the University of California, San Diego. The processor was never released, since the Alpha line of processors was discontinued shortly before HP acquired Compaq. Dean Tullsen's work was also used to develop the Hyper-Threading versions of the Intel Pentium 4 processor. The technology developed for the EV8 may eventually find its way into Tukwila, a CPU being developed at Intel by many of the engineers who designed the EV8.
Modern commercial implementations
The Intel Pentium 4 was the first modern desktop processor to implement simultaneous multithreading, starting with the 3.06 GHz model released in 2002; Intel has since introduced it into a number of its processors. Intel calls the functionality Hyper-Threading Technology (HTT), and it provides a basic two-thread SMT engine. Intel claims up to a 30% speed improvement compared with an otherwise identical, non-SMT Pentium 4. The performance improvement seen is very application dependent, however, and some programs actually slow down slightly when HTT is turned on. This is due to the replay system of the Pentium 4 tying up valuable execution resources, thereby starving the other thread. However, any performance degradation is unique to the Pentium 4 (due to various architectural nuances), and is not characteristic of SMT in general.
The latest MIPS architecture designs include a two-thread SMT system known as "MIPS MT". RMI, a Cupertino-based startup, is the first MIPS vendor to provide a processor SOC based on 8 cores, each of which runs 4 threads. The threads can be run in fine-grain mode, where a different thread can be executed each cycle. The threads can also be assigned priorities.
The IBM POWER5, announced in May 2004, is a dual-core processor, with each core including a two-thread SMT engine. IBM's implementation is more sophisticated than the previous ones: it can assign different priorities to the threads, it is more fine-grained, and the SMT engine can be turned on and off dynamically, to better execute workloads for which an SMT processor would not increase performance. This is IBM's second implementation of generally available hardware multithreading.
Although many people reported that Sun Microsystems' UltraSPARC T1 (known as "Niagara" until its 14 November 2005 release) and the upcoming processor codenamed "Rock" (to be launched ~2007) are implementations of SPARC focused almost entirely on exploiting SMT and CMP techniques, Niagara does not actually use SMT. Sun refers to these combined approaches as "CMT", and to the overall concept as "Throughput Computing". The Niagara chip uses fine-grained multithreading. Unlike SMT, where instructions from multiple threads can be issued simultaneously, the processor uses a round-robin policy to issue instructions from a single thread each cycle. The designers of the Montecito processor have also chosen not to use SMT.
See also
- Chip-level multiprocessing, a complementary technique
- Thread (computer science), what is executed
- Symmetric multiprocessing uses multiple processors
External links
- [SMT news articles and academic papers]
- [SMT research at the University of Washington]
- [Real World Technologies] - an overview of SMT by Paul DeMone, with a view towards the implementation in the EV8
- [Timeline of multithreading technologies]
References
L. E. Shar and E. S. Davidson, "A Multiminiprocessor System Implemented through Pipelining", Computer, February 1974
Sources
Replay: Unknown Features of the NetBurst Core [link]
From Wikipedia, the Free Encyclopedia. All text is available under the terms of the GNU Free Documentation License.
