My original plan was to go processor by processor. My paper tries to cover all the major architectural features for each processor. In some places I was limited by the information available, but for most of the processors I was able to find all the same information. I want this to be understandable to people who are computer literate but are not architecture experts. As I started with the first processor, the Itanium 2, I realized I was writing a paragraph of explanation for every sentence. In the end I'd end up with five blog entries just for the Itanium 2 so I could explain what everything meant, and then rush through the rest. That didn't sound like what I wanted, so instead I am going to pick one to a few architectural features and talk about all the processors.
I will be using the text of my paper as I wrote it, but I will insert explanations and hyperlinks to explain what I am talking about. The explanations will be in grey text; my paper is in normal black text. I'll be including a handful of my sources in each entry for those who are interested in more information.
Itanium 2
A cooperation between Intel and Hewlett-Packard led to the release of the Itanium processor in 2001. The Itanium was intended to replace the x86 hardware and dominate the server and workstation markets. It was also not expected that AMD would be able to clone it. The second generation, the Itanium 2, was released in July 2002. Intel has since focused on a new Itanium processor, code named Montecito. HP sells Itanium 2 servers that range from single-processor blades to 128-processor high-end servers. The Itanium's IA-64 architecture is completely different from the IA-32 architecture of the x86 family, though it provides backwards compatibility for 32-bit x86 applications. This paper will be focusing on the versions of the Itanium 2 after the first one (McKinley), code named Madison, Deerfield, and Fanwood.
What does this mean? Hardware companies have working or code names for each processor they release. For a specific processor there may be several different versions with different features. For the Itanium 2, Intel had four different versions starting with McKinley. I have found the code name is the most convenient way to reference different versions of the same processor so expect to see them used heavily in this paper.
The Itanium 2 has a core speed of 1.3 to 1.6 GHz and uses 130 watts of power.
What does this mean? Processor clock frequency is probably one of the most misunderstood pieces of information about processors. So, what does GHz mean for a processor? A processor clock tick is the smallest unit of time for a processor. In the simplest processor, it can do one operation each clock tick. The clock frequency is how many of those ticks occur in one second. The more ticks, the more operations that can be completed in a second. So far it sounds like GHz does just mean speed. Ah, but there is more. What is an operation, and how many can a given processor do each tick? These can vary significantly. Programs consist of instructions. Each instruction is like a step in a recipe. The difference is that for different processor types, the number of instructions needed to run the same program is different. So, already we cannot know which processor will run an application faster because they are not doing the same steps. Each instruction can be broken down into many operations. Some processors break their instructions down into smaller operations than others. Having smaller operations means each one can be done faster, which allows a higher clock speed. But this does not mean the whole instruction is completed any faster. So, what does knowing the clock speed tell us? For the same processor, and sometimes for a processor family, you can tell which processor is faster. It just cannot be used to compare between processor families (like comparing a Pentium to a G5), and sometimes not even within a processor family (like between the Pentium D and the Xeon). Clock speed is also often tied to energy consumption (higher clock speed, more energy used).
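To make the point about clock speed concrete, here is a rough sketch in Python of the usual back-of-the-envelope estimate: run time is the number of instructions, times the average cycles each instruction takes, divided by the clock frequency. The two processors and every number below are made up for illustration; they are not measurements of any chip in this paper.

```python
def run_time_seconds(instructions, cycles_per_instruction, clock_hz):
    """Estimated run time: total cycles needed divided by cycles per second."""
    return instructions * cycles_per_instruction / clock_hz

# Hypothetical processor A: high clock speed, but it needs more instructions
# and more cycles per instruction to run the same program.
time_a = run_time_seconds(instructions=12e9, cycles_per_instruction=1.5, clock_hz=3.0e9)

# Hypothetical processor B: half the clock speed, but fewer instructions
# and fewer cycles per instruction.
time_b = run_time_seconds(instructions=10e9, cycles_per_instruction=0.8, clock_hz=1.5e9)

print(f"A (3.0 GHz): {time_a:.2f} s")
print(f"B (1.5 GHz): {time_b:.2f} s")
```

With these assumed numbers, the 1.5 GHz processor finishes first, which is why GHz alone cannot be used to compare different processor families.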
The Itanium 2 is one of the few single-core 64-bit processors still commonly available. There is a dual-core Itanium 2, called Hondo, available only from HP, which uses two Madison cores operating at 1.1 GHz; it is not covered in this paper due to the lack of SPEC results for it. Montecito will be a dual-core processor.
What does this mean? A multiple-core processor is basically like having that many separate processors together on a single chip, which share some resources such as a memory bus. An advantage of multiple-core processors is that the cores can communicate with each other faster than separate processors can. They also take up less physical space on the system motherboard. Many dual-core processors have the same footprint size as their older, single-core predecessors. Most multiple-core processors also use less energy than if they were all separate processors.
The later versions of the Itanium 2 support two threads using
coarse-grained multithreading.
What does this mean? In a processor without threading, each cycle, one or more instructions from the same program are issued into the processor to be run. The problem is that often the processor is not doing any actual work because the program's instructions are waiting for something, such as data from memory (a very slow operation that can take hundreds of cycles). In order to make better use of the processor, a technique called threading was created. There are three general types of threading. In coarse-grained multithreading (CGMT), the processor switches to running instructions from a different program when the current thread encounters a long-latency event. This means at any given time, the processor is still only running one program.
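The idea of switching threads on a long-latency event can be sketched as a toy simulation. This is only an illustration under simple assumptions, not a model of the Itanium 2: each thread is a string where 'c' is one cycle of work and 'm' is a memory access, the stall length MISS_LATENCY is an arbitrary number I picked, and switching threads is assumed to cost nothing.

```python
MISS_LATENCY = 20  # assumed stall length in cycles, purely illustrative

def run_cgmt(threads):
    """Simulate a core that runs one thread at a time and switches on a miss.

    Each thread is a string: 'c' is one cycle of work, 'm' is a memory
    access that takes one cycle to issue and then blocks that thread for
    MISS_LATENCY cycles. Returns (total_cycles, busy_cycles).
    """
    pos = [0] * len(threads)       # index of the next instruction per thread
    ready_at = [0] * len(threads)  # cycle at which each thread may run again
    current = 0
    cycle = 0
    busy = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        # Unfinished threads that are not waiting on memory.
        runnable = [i for i in range(len(threads))
                    if pos[i] < len(threads[i]) and ready_at[i] <= cycle]
        if not runnable:
            cycle += 1             # every thread is stalled: a wasted cycle
            continue
        if current not in runnable:
            current = runnable[0]  # switch only when the current thread cannot run
        op = threads[current][pos[current]]
        pos[current] += 1
        cycle += 1
        busy += 1
        if op == 'm':
            ready_at[current] = cycle + MISS_LATENCY
    # Count any memory access still outstanding when the last instruction issues.
    return max(cycle, max(ready_at)), busy

program = "ccccm" * 4  # four bursts of work, each ending in a memory access

alone_cycles, alone_busy = run_cgmt([program])
pair_cycles, pair_busy = run_cgmt([program, program])
print(f"one thread:  {alone_cycles} cycles, {alone_busy} of them doing work")
print(f"two threads: {pair_cycles} cycles, {pair_busy} of them doing work")
```

In this toy setup one copy of the program takes 100 cycles with the core busy for only 20 of them, while two copies sharing the core finish in 105 cycles rather than 200, because one thread's work fills part of the time the other spends waiting on memory.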
Opteron
The eighth generation of AMD's Hammer architecture, the Opteron processor (code names SledgeHammer - 130 nm and Venus - 90 nm), was introduced in April 2003. Designed to compete with Intel's Itanium 2, the Opteron is the most powerful of AMD's 64-bit processors. It was designed for server and enterprise applications. It has arguably become the most popular x86-based 64-bit processor. A variety of computer system producers, including all of the largest enterprise-level UNIX vendors (Fujitsu, IBM, HP, and Sun), sell Opteron systems.
What does nm mean? This refers to the manufacturing process for the processor, measured in nanometers (130 nm is the same as 0.13 μm). The smaller the number, the smaller the size of the circuits. This allows the processors to be smaller and use less power. See Wikipedia's page about 90 nanometer for more details.
The Opteron chip comes with either one or two cores with clock speeds from 1.8 to 2.8 GHz. Both the single and dual core Opteron processors can run two threads using
simultaneous multithreading and support
out-of-order execution.
What does this mean? In simultaneous multithreading, instructions from more than one program (in the case of the Opteron, from two programs) issue into the processor to be run in the same cycle. In an in-order processor, instructions for a program are executed in the same order as they occur in the program. However, many instructions in a program are independent of each other and do not have to be executed in order for the program to produce the correct result. This is called instruction level parallelism, or ILP. Why does it matter if instructions can be run out of order? There are many operations which take more than a single cycle to execute, such as floating-point math or loads and stores from memory. With in-order execution, all later instructions have to wait for these operations to finish before they can continue. Out-of-order execution allows a processor to execute other instructions rather than stall the program waiting for a high-latency operation to finish.
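Here is a small sketch of why out-of-order execution helps, using the simplified picture from the explanation above. The five-instruction program, its latencies, and its dependencies are invented for the example, and both models are deliberately crude: the in-order one makes every instruction wait for the previous one to finish, and the out-of-order one issues at most one instruction per cycle, picking the oldest one whose inputs are ready. Real processors are far more elaborate.

```python
# Each instruction: (name, latency in cycles, names of instructions it depends on).
PROGRAM = [
    ("load_a", 10, []),          # slow: a load from memory
    ("add_b",   1, []),          # independent of the load
    ("add_c",   1, ["add_b"]),
    ("mul_d",   3, ["add_c"]),
    ("use_a",   1, ["load_a"]),  # needs the loaded value
]

def in_order_cycles(program):
    """Simplest in-order model: each instruction waits for the one before it to finish."""
    return sum(latency for _, latency, _ in program)

def out_of_order_cycles(program):
    """Issue at most one instruction per cycle, choosing the oldest whose inputs are done."""
    finish = {}              # name -> cycle at which its result is available
    pending = list(program)
    cycle = 0
    while pending:
        for instr in pending:
            name, latency, deps = instr
            if all(d in finish and finish[d] <= cycle for d in deps):
                finish[name] = cycle + latency
                pending.remove(instr)
                break        # only one instruction issues per cycle
        cycle += 1
    return max(finish.values())

print("in-order cycles:    ", in_order_cycles(PROGRAM))
print("out-of-order cycles:", out_of_order_cycles(PROGRAM))
```

In this made-up program the out-of-order model finishes in 11 cycles instead of 16, because the independent adds and the multiply run while the load is still waiting on memory.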
The average power consumption of an Opteron processor is 89-90 watts.
What does this mean? Even if you aren't an environmentalist
who's worried about our global energy usage, how much power your computer uses
is something you should care about. If you're Joe-Average, your computer eating
power means fewer burgers you get to eat. My dual processor Dell uses around
350 watts (possibly more at peak). That is more than if I turned on
every light in my house (we use fluorescents). I make sure that machine is
off or sleeping whenever it is not in use. If you're Bob-Admin, multiply that
by however many machines you have. Bob-Admin also has to think about how he's
going to keep his server room cool, because 50 machines using that much
power make a great sauna in a few hours. If Bob-Admin puts his machines in
a co-location facility, where he is paying by the sq ft and the watt, not only
does he pay more for energy usage, but for space too. The racks in a co-lo can
only support so much power draw per sq ft, so that means fewer machines per
rack.
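For a feel of what those watts cost, here is a quick calculation in Python. The 350 watts and the 50 machines come from the text above; the electricity price is an assumption I picked for the example, so plug in the rate from your own bill. Cooling and co-location fees are not included.

```python
def yearly_cost_dollars(watts, hours_per_day, dollars_per_kwh):
    """Cost of running a device for a year at a constant power draw."""
    kwh_per_year = watts / 1000 * hours_per_day * 365
    return kwh_per_year * dollars_per_kwh

RATE = 0.10  # assumed electricity price in dollars per kWh; not a real quote

one_machine = yearly_cost_dollars(watts=350, hours_per_day=24, dollars_per_kwh=RATE)
server_room = 50 * one_machine  # Bob-Admin's 50 machines, before cooling costs

print(f"One 350 W machine left on all year: ${one_machine:,.2f}")
print(f"Fifty of them: ${server_room:,.2f}")
```

At the assumed rate, one machine left on around the clock comes to a bit over $300 a year, and Bob-Admin's fifty come to over $15,000 before he pays to cool the room.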
Pentium D
In May 2005, Intel introduced the Pentium D (code name Smithfield), a
dual-core processor, which contains two essentially unmodified Pentium 4
Prescott processors. Unlike the Prescott, the Pentium D adds support for
64-bit through Intel's
EM64T technology.
Although some Pentium Prescott processors utilize Intel's
Hyper Threading
technology, the Pentium D examined
in this paper does not. In early 2006, Intel released a 65 nm version
of the Pentium D, code named Presler. Like Smithfield, the Presler chip does
not support multithreading.
There is a dual-thread Pentium
D, the 3.2GHz Pentium Extreme, but no CPU2000 benchmarks have been published
for that processor so it was not included in this paper. An Extreme Edition
of Presler is scheduled to be released in mid-2006.
The two cores in the Pentium D Smithfield are on the same die and have a clock
speed of 2.8, 3.0, or 3.2 GHz. The Presler cores are each on their own die,
which decreases production cost since a defect in a die affects only one core.
What does this mean? Two cores on the "same die" means that
both cores are manufactured on the same integrated circuit. Cores on separate
dies means the cores are on separate integrated circuits, though they are still
on the same chip. For machines with both cores on one die, communication
is faster, but a defect in one core makes both cores unusable since the
integrated circuit must be thrown away. With the separate die approach, there
is some loss in communication speed, but a defect in one core means only that
core must be thrown away.
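The cost argument can be put into rough numbers. The sketch below assumes each core independently has some chance of a defect, which is a simplification, and the 10% figure is invented for the example rather than taken from any manufacturer.

```python
def usable_fraction_same_die(p_defect):
    """Both cores share one die, so a defect in either core scraps the pair."""
    return (1 - p_defect) ** 2

def usable_fraction_separate_dies(p_defect):
    """Each core is its own die, so only the defective dies are thrown away."""
    return 1 - p_defect

P_DEFECT = 0.10  # assumed chance that any one core has a defect; not a real figure

same = usable_fraction_same_die(P_DEFECT)
separate = usable_fraction_separate_dies(P_DEFECT)
print(f"same die:      {same:.0%} of manufactured cores end up usable")
print(f"separate dies: {separate:.0%} of manufactured cores end up usable")
```

Under these assumptions about 81% of cores survive with the shared die versus 90% with separate dies, which is the kind of difference that lowers production cost.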
The cores of the Presler chip operate at 2.8, 3.0, 3.2, and 3.4GHz. With 230
million transistors, the Smithfield is significantly smaller than the
dual-core Itanium processor, which has 1.7 billion transistors, yet the maximum
power usage for a Pentium D is about 130W - 155W while the dual-core Itanium
uses 100W. The cores in the Pentium D are clocked significantly lower than the
single-core Prescott in order to minimize power consumption.
What does this mean? Usually, the number of transistors
in an integrated circuit correlates with the amount of power used; however,
comparing the Pentium D to the Montecito, this is not the case. Each core in the
Pentium D operates at a lower clock frequency than the single-core equivalent
so that the power usage is still reasonable. Operating at the same frequency,
the Pentium D would likely be over 200W in power usage.
Power5
The IBM Power5, a close relative of the G5, was released in June 2003. IBM uses the Power5 for
a range of machines from single processor entry-level servers to high-end
multi-processor servers. Like its predecessor, the Power4, the Power5 is a
dual-core processor. Both cores are on the same die. The clock speed of the
Power5 ranges from 2.0 to 2.7GHz. The power usage is about 100W. The Power5 can run two threads in each core using
simultaneous multi-threading. It can also operate in single thread mode.
UltraSPARC IV+
Code named Panther, the fifth generation processor in the SPARC family, the
UltraSPARC IV+, was designed for enterprise computing and released in September 2005. Panther is a dual-core processor that supports two threads using what Sun calls "chip multi-threading", or CMT.
Sun's CMT does not quite match the definition of threading commonly used when
talking about processors. Threading normally means running instructions for different
programs in the same core. CMT in the IV+ means running a different program in each
core, not in the same core.
The UltraSPARC IV+ has twice the computing power of the UltraSPARC IV yet
reduces the power consumption from 108W to 90W.
What does this mean? The IV+ shows how much the manufacturing process can
improve processor power usage. The UltraSPARC IV used a 130 nm process
but the IV+ uses a 90 nm process. This allows the processor to stay the same
physical size even though it is much more complex and powerful. This also
helps it use less energy than the IV.
UltraSPARC T1
The UltraSPARC T1, released in November of 2005, is the newest of the SPARC
processor line by Sun Microsystems. The T1 has generated a lot of interest
due to its departure in design from other 64-bit processors currently on the
market. The T1 has eight cores, and the processor is available in a 1.0 or
1.2GHz version; all cores on the processor operate at the same frequency.
Each core can execute four threads, making the T1 a 32-way processor. Despite the large
number of cores, the T1 only consumes 75W on average, 79W peak.
Xeon
The 64-bit Intel Pentium 4 Xeon (code named Nocona) was released in June 2004. It is designed to be an enterprise-level processor for business computing. It
comes in single and dual core models (the dual core is code named Paxville,
released in October 2005) and supports Intel's Hyper Threading technology. The Xeon has
clock speeds from 2.83 to 3.66GHz, the highest of any of the processors
examined. The single core Xeon uses 110-120W of power; the dual-core uses
135-150W.
Sources
This is not a complete list... I'll be putting a handful
at the end of each entry.
1. P. Kongetira, K. Aingaran, K. Olukotun - "Niagara: A 32-Way Multithreaded Sparc Processor". IEEE Micro, March/April 2005, Vol. 25, No. 2, pg. 21-29, 2005
2. C. McNairy, D. Soltis - "Itanium 2 Processor Microarchitecture". IEEE Micro,
March/April 2003, Vol. 23, No. 2, pg. 44-55, 2003
3. R. Kalla, B. Sinharoy, J. Tendler - "IBM Power5 Chip: A Dual-Core
Multithreaded Processor". IEEE Micro, March/April 2004, Vol. 24, No. 2, pg.
40-47, 2004
4. C. McNairy, R. Bhatia - "Montecito: A Dual-Core Dual-Threaded Itanium Processor". IEEE Micro, March/April 2005, Vol. 25, No. 2, pg. 10-20, 2005
5. C. Keltcher, K. McGrath, A. Ahmed, P. Conway - "The AMD Opteron Processor for Multiprocessor Servers". IEEE Micro, March/April 2003, Vol. 23, No. 2, pg. 66-76, 2003
Power Consumption Sources
Itanium 2 -
http://www.intel.com/products/processor/itanium2/index.htm
Opteron -
http://www.epinions.com/content_18680811072
Pentium D -
PCStats.com and
wikipedia.com
Power5 -
http://www.xlr8yourmac.com/G5/xserveG5.html
UltraSPARC IV+ -
http://www.extremetech.com/article2/0,1558,1667444,00.asp
UltraSPARC T1 -
http://www.sun.com/processors/UltraSPARC-T1/index.xml
Xeon -
www.news.com
A few errors here..
Montecito uses switch-on-event multithreading (SoEMT), not SMT. McKinley and Madison have no multithreading at all.
POWER5 and PowerPC 970 (aka G5) are quite different. POWER5+ is up to 2.2GHz, probably over 100W, and is not used in any Apple products. The single-core PowerPC 970FX runs up to 2.7GHz and the dual-core PowerPC 970MP is used at up to 2.5Ghz. Since it is closely derived from POWER4, the 970 has no multithreading.
The UltraSPARC IV+ has no multithreading (as the term is defined by computer architects).
Posted by Wes Felter (IBM Research) on March 27, 2006 at 04:13 PM PST #
Posted by Kristin on March 27, 2006 at 06:12 PM PST #