Itanium 2

From Wikipedia, the free encyclopedia

Itanium 2
Central processing unit

Itanium 2 processor
Produced: From mid 2002 to present
Manufacturer: Intel
CPU Speeds: 900 MHz to 1.67 GHz
FSB Speeds: 200 MHz to 533 MHz
Instruction Set: IA-64
Cores:
  • McKinley
  • Madison
  • Hondo
  • Deerfield

The Itanium 2 is an IA-64 64-bit microprocessor developed jointly by Hewlett-Packard (HP) and Intel, and introduced on July 8, 2002. The first Itanium 2 processor (code-named McKinley) was substantially more powerful than the original Itanium, Intel's first IA-64 product. Several generations of Itanium 2 processors have followed, most recently a dual-core version (code-named Montecito) which, according to Intel, roughly doubles the performance of the single-core Itanium 2 processors. [1]


Computing capabilities of Itanium

Floating-Point Performance

Floating-point performance depends on both the number of floating-point operations a processor can perform in parallel and the cycle time it needs to execute them. The Itanium raises the first figure with functional units that perform two operations in a single pass. It is configured with two floating-point multiply-accumulate units (FMACs), each of which can multiply two values and add the result to a third value. (Such operations are at the heart of many technical calculations.) Thus, an Itanium running at 800MHz can produce four floating-point results per cycle, for a peak 64-bit performance rating of 3.2 billion floating-point operations per second (3.2 GFLOPS). The Itanium architecture also includes two single-precision (32-bit) FMACs tuned for 3D graphics, which can each perform an additional four floating-point operations per cycle, for a 6.4GFLOPS single-precision rating on an 800MHz processor. These peak figures scale directly with clock rate across the Itanium processor family. In addition, the Itanium architecture is designed to allow future versions of the processor to be configured with additional FMACs.

The above analysis presents a best-case scenario in which the functional units are always busy. Although processors can maintain peak performance for only brief periods, Intel has incorporated a number of features in the Itanium architecture that help to maximize sustained performance. These include:

  • Pipelined functional units

Arithmetic operations generally require more than one machine cycle to complete. A pipelining scheme is used to allow the FMACs to produce results each cycle. The arithmetic operations are broken into a set of independent steps, each requiring one machine cycle to complete. The FMACs perform arithmetic operations in an assembly-line fashion, with each step accepting data from the previous step and sending results to the next step. Thus, after the pipeline is full, a result is produced each cycle.

  • Dual-function arithmetic units

A secondary benefit of the dual function FMAC strategy is that the processor is able to use both functional units even when the distribution of adds and multiplies is biased toward one operation. For example, if a section of code performs only additions, both FMACs can be employed on the task. In contrast, a system with separate addition and multiplication functional units would use the adder but would have to leave the multiply unit idle.

  • Large register sets

Intel designed the Itanium processor with 128 integer, 128 floating-point, 8 branch and 64 predicate registers (for comparison, IA-32 processors have 8 general-purpose registers and most RISC processors have 32). These registers allow more database data and intermediate results to be kept on-chip, reducing repeated loads and stores of intermediate values. The more data that is directly available to the FMACs, the less likely a functional unit is to stall for lack of data. In addition, the large register sets provide a buffer for the memory system as it moves data in and out of memory. These capabilities combine to improve the overall response time of an application’s database manipulation requests.

  • Internal parallelism

The Itanium can issue up to six instructions per cycle, drawn from a fixed set of combinations of four integer arithmetic/logical operations, two load/store operations, two floating-point operations, and three branch operations. The advantage of double-precision (64-bit) operations over single-precision (32-bit) operations is that the former allow larger sets of calculations to be performed before accumulated round-off errors begin to affect the accuracy of the final results. Because 64-bit systems can produce 64-bit results in a single cycle, as opposed to two cycles for 32-bit systems, operations on 64-bit data types (such as doubles) are considerably faster. Issuing multiple operations per cycle not only keeps as much of the processor working as possible but also allows data to be pre-fetched from memory into registers and cache, minimizing processor stalls due to data unavailability. The processor also provides a load-double-pair instruction, which helps feed the floating-point units at a balance of one memory operation per floating-point operation.

  • Compiler support for parallelism

The IA-64 architecture was designed to allow for closer coordination between the processor and compilers which generate the machine instructions for the processor. Three instructions are bundled along with a template field where the compiler can provide “hints” to the hardware on the interactions between the instructions. These hints are used by the processor to schedule instructions in real time and for pre-fetching of data for future operations.
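
The peak ratings quoted at the start of this section, and the benefit of pipelining the functional units, can be checked with a little arithmetic. The sketch below is only illustrative: the function names are made up for this example, and the pipeline depth is an assumed value, since the text does not state how many stages an FMAC has.

```python
def peak_gflops(clock_mhz: float, units: int, ops_per_unit_per_cycle: int) -> float:
    """Peak rate in GFLOPS: clock rate times operations completed per cycle."""
    return clock_mhz * 1e6 * units * ops_per_unit_per_cycle / 1e9


def cycles_unpipelined(operations: int, stages: int) -> int:
    """Each operation occupies the unit for all of its stages before the next starts."""
    return operations * stages


def cycles_pipelined(operations: int, stages: int) -> int:
    """The first result appears after `stages` cycles, then one result per cycle."""
    return 0 if operations == 0 else stages + operations - 1


# Figures from the text: 800 MHz clock, two FMACs.
double_peak = peak_gflops(800, units=2, ops_per_unit_per_cycle=2)  # multiply + add
single_peak = peak_gflops(800, units=2, ops_per_unit_per_cycle=4)

# HYPOTHETICAL pipeline depth, chosen only for illustration.
ASSUMED_STAGES = 4
n_ops = 1000
slow = cycles_unpipelined(n_ops, ASSUMED_STAGES)
fast = cycles_pipelined(n_ops, ASSUMED_STAGES)

print(f"double precision peak: {double_peak:.1f} GFLOPS")
print(f"single precision peak: {single_peak:.1f} GFLOPS")
print(f"{n_ops} operations: {slow} cycles unpipelined, {fast} cycles pipelined")
```

With the assumed four-stage pipeline, a long run of independent operations approaches one result per cycle, which is the condition under which the peak figures hold.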

Memory Performance

Memory performance is measured in terms of both latency (how many cycles it takes to get data from memory to the processor) and bandwidth (how many bytes of data can be moved per cycle). Many current systems address latency and insufficient bandwidth through memory hierarchies, which place various levels of cache memory between main memory and the processor. Although this solution is effective, it is costly in terms of the memory involved.

The Itanium can read or write 16 bytes of data to or from memory during every bus cycle; thus, for a 133MHz bus, the memory bandwidth is 2.1GBps. The 460GX chipset, which supports the Itanium processor, can also write an additional 2.1GBps from I/O to memory, for a total memory bandwidth of 4.2GBps. The Itanium processor uses a 4MB L3 (level 3) cache for quick access to large data structures such as texture maps for digital content applications. The L3 cache communicates with the 96KB L2 cache and the register file, moving data at 12.8GBps (16 bytes per 800MHz system clock) with a 24-cycle latency for floating-point numbers.

The L2 cache feeds data directly into the floating-point registers at a rate of 32 bytes per clock tick, with a 9-clock latency. Although floating-point data bypasses the L1 cache, it is worth noting that L1 is divided into a 16KB instruction cache (L1I) and a 16KB integer data cache (L1D). Both operate with a 2-clock latency, providing local access to instructions and integer data that is faster than retrieving them from memory.

Because 64-bit systems have larger address spaces, they can hold more memory: up to 1.8TB on the Itanium (the 460GX enables 64GB of physical memory; other original equipment manufacturer (OEM) systems can enable more). Itanium systems with sufficient RAM can therefore load more of a program directly into memory, reducing the time spent on read/write operations on the hard disk, which are ordinarily the most time-consuming operations for a computer (reading and writing data on a disk takes orders of magnitude longer than the same operation on memory).

Support for Large Data Sets

The need to operate on larger data sets in turn requires computer systems to provide larger real and virtual memories. A computer system’s addressable memory is usually determined by the size of its integer or address registers. 32-bit architectures can directly address 4GB of either real or virtual memory. Beyond this limit, some form of memory segmentation must be employed. 64-bit architectures can in theory address 2^64 bytes of data, about 1.8 × 10^19 bytes or 16 exbibytes. This is an enormous amount of memory, far more than is currently possessed by any single computer system, so for the time being, 64-bit architectures allow computer systems to expand memory virtually indefinitely without having to resort to some form of segmentation.
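
The address-space limits follow directly from the register width. A minimal sketch of the arithmetic is shown here; the helper names are invented for this example, and sizes are reported in binary units (GiB, EiB).

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with a flat address of this width."""
    return 2 ** address_bits


def format_binary(n: int) -> str:
    """Render a byte count using binary (1024-based) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(n)
    index = 0
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:g} {units[index]}"


for bits in (32, 64):
    size = addressable_bytes(bits)
    print(f"{bits}-bit: {size} bytes = {format_binary(size)}")
```

Each additional address bit doubles the reachable memory, so moving from 32 to 64 bits multiplies the flat address space by 2^32.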

Programming model

One of the major advantages of 64-bit architectures is that, because they allow larger amounts of both real and virtual memory, application developers can design programs without having to divide the code into memory-sized segments. Such segmentation still exists, although it is less common now that nearly all servers have at least a gigabyte of memory available. It requires developers to write code for managing the memory segments, which hampers the program's performance.

Architectural Features and Attributes

The IA-64 architecture is based on a derivative of VLIW, dubbed Explicitly Parallel Instruction Computing (EPIC). It is theoretically capable of performing roughly 8 times more work per clock cycle than a non-superscalar CISC or RISC architecture, owing to its parallel computing microarchitecture. However, performance depends heavily on compilers and their ability to generate code that makes efficient use of the processor's execution units. The Itanium 2 has seen heavy use in compute-bound supercomputers and large corporate database servers, where parallelism and compile-time optimizations are most effective.
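
EPIC code reaches the processor in bundles of three instructions plus a template field written by the compiler. The sketch below packs and unpacks such a bundle. The bit widths used (a 5-bit template in the low bits followed by three 41-bit instruction slots, 128 bits in total) are taken from the published IA-64 bundle format rather than from this article, and the slot and template values are arbitrary placeholders, not real instruction encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template: int, slots: tuple[int, int, int]) -> int:
    """Combine a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError(f"slot {index} does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, tuple[int, ...]]:
    """Split a 128-bit bundle back into its template and three slots."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


# Placeholder values only; they do not correspond to real IA-64 instructions.
example = pack_bundle(0x10, (0x123, 0x1FFFFFFFFFF, 0x0ABCDEF))
print(f"bundle = 0x{example:032x}")
print(unpack_bundle(example))
```

Because the template travels with the instructions, the hardware can read the compiler's grouping decisions directly from the bundle instead of rediscovering them at run time.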

All Itanium 2 processors to date share a common cache hierarchy. They have 16 KiB of Level 1 instruction cache and 16 KiB of Level 1 data cache. The L2 cache is unified (both instruction and data) and is 256 KiB. The Level 3 cache is also unified and varies in size from 1.5 MiB to 24 MiB. In an interesting design choice, the L2 cache contains sufficient logic to handle semaphore operations without disturbing the main ALU. The latest Itanium processor, however, features a split L2 cache: it adds a dedicated 1 MiB L2 cache for instructions, so that the original 256 KiB L2 cache becomes a dedicated data cache and the total L2 capacity grows.

The Itanium 2 bus is occasionally referred to as the Scalability Port, but much more frequently as the McKinley bus. It is a 200 MHz, 128-bit wide, double pumped bus capable of 6.4 GB/s — more than three times the bandwidth of the original Itanium bus, known as the Merced bus. In 2004, Intel released processors with a 266 MHz bus, increasing bandwidth to 8.5 GB/s. In early 2005, processors with a 10.6 GB/s, 333 MHz bus were released.
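
The quoted bus bandwidths can be reproduced from the clock rate, the bus width and the double-pumped signalling. This is a rough sketch that uses the nominal clock figures from the text; the function name and its parameters are illustrative only.

```python
def bus_bandwidth_gb_per_s(clock_mhz: float, width_bits: int,
                           transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s (10^9 bytes per second) of a multi-pumped bus."""
    bytes_per_transfer = width_bits // 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# 128-bit, double-pumped bus at the three nominal clock rates from the text.
bandwidths = {clock: bus_bandwidth_gb_per_s(clock, 128) for clock in (200, 266, 333)}

for clock, bandwidth in bandwidths.items():
    print(f"{clock} MHz bus: {bandwidth:.3f} GB/s")

# Ratio to the 2.1 GB/s quoted for the original Itanium bus.
print(f"speed-up over the original bus: {bandwidths[200] / 2.1:.2f}x")
```

The 200 MHz case gives exactly 6.4 GB/s; the 266 MHz and 333 MHz results agree with the quoted 8.5 GB/s and 10.6 GB/s to within rounding of the nominal clock rates.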

Most systems sold by enterprise server vendors that contain 4 or more processor sockets use proprietary Non-Uniform Memory Access (NUMA) architectures that supersede the more limited front side bus of 1 and 2 CPU socket servers.

Itanium's major competitors include Sun Microsystems' UltraSPARC T1, IBM's Power5, AMD's Opteron, and Intel's own Xeon servers. In general, Itanium competes against Sun and IBM systems, and Opterons, for running enterprise-class workloads on large, multi-processor servers in the back end of corporate datacenters. It competes against Opteron and Xeon-based servers in smaller configurations and in cluster configurations.

Supercomputers

The best position ever achieved by an Itanium 2 based system in the TOP500 ranking of the fastest supercomputers came in June 2004, when Thunder at LLNL entered the list at #2 with an Rmax of 19.94 Teraflops. It was second only to the Japanese Earth Simulator. It had 4,096 1.4 GHz Itanium 2 processors connected by a Quadrics QsNet II interconnect. As of November 2006, this system has slipped to position #19.

In the November 2006 list there were three other Itanium 2 based systems in the top 20:

  • #7 Tera-10, Commissariat à l'Énergie Atomique (CEA), France. Machine: Bull SMP Cluster, NovaScale 5160. CPU: 8,704 Itanium 2 (1.6 GHz). Connection: Quadrics QsNet II. Main Memory: 26,112 GB. Rmax: 42.9 Teraflops.
  • #8 Columbia, NASA/Ames Research Center/NAS. Machine: SGI Altix 3700. CPU: 10,160 Itanium 2 (1.5 GHz). Connection: SGI NUMAlink / Voltaire Infiniband. Rmax: 51.87 Teraflops.
  • #18 HLRB II, Leibniz Rechenzentrum, Bavaria, Germany. Machine: SGI Altix 4700. CPU: 4,096 Itanium 2 (1.6 GHz). Connection: SGI NUMAlink. Rmax: 24.36 Teraflops.

The share of Itanium-based machines on the list peaked on the November 2004 list at 16.8%. In November 2006 the share is 7.0%.

Itanium 2 processor versions

McKinley

McKinley was the first version of the Itanium 2 processor, manufactured in a 180 nm process. It was released at speeds of 900 MHz and 1 GHz, with cache sizes of 1.5 MiB and 3 MiB. It added hardware support for the branch long (brl) instruction of the IA-64 instruction set. IA-32 performance, while improved, was still much slower than that of contemporary x86 processors; McKinley's x86 performance was similar to that of a Pentium II at 2/3 the clock speed.

Madison

Madison was introduced on June 30, 2003. It was initially available in three versions: 1.3 GHz with 3 MiB of cache, 1.4 GHz with 4 MiB of cache and 1.5 GHz with 6 MiB of cache. Manufactured in a 130 nm process, it had a die size of 374 mm². Its power envelope remained unchanged from McKinley at 130 watts. On September 8, 2003, a 1.4 GHz version with 1.5 MiB of cache was released. 1.4 GHz and 1.6 GHz versions with 3 MiB of cache were launched on April 13, 2004. November 8, 2004 saw the release of the first processor in the Madison 9M series, at 1.6 GHz with 9 MiB of cache. On July 18, 2005, more variations of the Madison 9M were introduced, including 1.67 GHz models with a 333 MHz FSB and either 6 MiB or 9 MiB of cache. On introduction, the latter part set a record SPECfp result of 2,801 in a Hitachi, Ltd. computing blade.

In January 2005, OpenVMS was added to the lineup of operating systems able to run on these processors.

Hondo

Hondo was announced as the HP mx2 dual-processor module on February 18, 2003, and started shipping in early 2004. It consists of two Madison cores with 32 MiB of L4 cache and fits in the same space as a normal Itanium 2 CPU. It is only available from HP. Currently the cores run at 1.1 GHz with 4 MiB of L3 cache each.

OpenVMS for Itanium is able to use the mx2 variant.

Deerfield

Deerfield was released on September 8, 2003. With 1.5 MiB of cache, running at 1 GHz, this was the first low voltage Itanium processor. Its 62 watt power envelope made it more suited for blade and 1U servers.

Fanwood

The Fanwood core debuted on November 8, 2004. Versions include a 1.6 GHz edition with 3 MiB of L3 cache with either 200 MHz or 266 MHz FSB and a low voltage 1.3 GHz version with 3 MiB L3 cache at 200 MHz.

Montecito

Main article: Montecito (processor)

The Dual-Core Intel® Itanium® 2 processor 9000 series (code-named Montecito) was released on July 18, 2006. Montecito is the first Itanium processor to have two cores per die. It was originally planned to feature advanced power and thermal management improvements. However, the planned Foxton dynamic clock speed feature was removed due to unspecified engineering issues (Intel is considering it for future Itanium 2 processor versions). Despite the elimination of this feature, Intel reports that Montecito doubles the performance of its single-core predecessor while reducing power consumption by approximately 20 percent. [2] It also adds multi-threading capabilities (two threads per core), a greatly expanded cache subsystem (12 MiB of L3 per core), and silicon support for virtualization. Manufactured in a 90 nm process, Montecito debuted at speeds ranging from 1.4 GHz in a low-power configuration to 1.6 GHz with 12 + 12 MiB of L3 cache at the high end. The FSB runs at 400 MHz or 533 MHz.

Upcoming revisions

This article contains information about a scheduled or expected future product.
It may contain preliminary or speculative information, and may not reflect the final version of the product.

The future of the Itanium family apparently lies in multi-core chips, as the available information about coming generations such as Montvale and Tukwila shows. (Those are internal code names; the final products will most likely also bear the Itanium brand, possibly as Itanium 3 or perhaps just Itanium 2.)

Montvale

Main article: Montvale (processor)

Montvale is expected to be a revision of Montecito bringing higher clock speeds, larger caches, and a faster FSB.

Tukwila

Main article: Tukwila (processor)

Tukwila, the first 65 nm design, is due in 2008. Tukwila will consist of 4 cores, each of them multithreaded. It will feature a new bus called the Common System Interface (CSI) and an on-die memory controller. Ultimately, CSI is intended to provide socket compatibility with Xeon processors; however, as of October 2005, the CSI roadmap for Xeon processors has been delayed until at least 2009.

Poulson

Main article: Poulson (processor)

Few details are known, other than the existence of the codename.

