Three kids, a dog, a cat, sunny days, ocean breezes, and way too much time online
SLO Life

 
Monday March 27, 2006
64-bit processors - Clocks, cores, and power usage
 
 
My original plan was to go processor by processor. My paper tries to cover all the major architectural features of each processor. In some places I was limited by the information available, but for most of the processors I was able to find the same information. I want this to be understandable to people who are computer literate but are not architecture experts. When I started with the first processor, the Itanium 2, I realized I was writing a paragraph of explanation for every sentence. I would have ended up with five blog entries just for the Itanium 2 explaining what everything meant, and then rushed through the rest. That isn't what I wanted, so instead I am going to pick one or a few architectural features at a time and talk about all the processors.  
 
I will be using the text of my paper as I wrote it, but I will insert explanations and hyperlinks to explain what I am talking about. The explanations will be in grey text; my paper is in normal black text. I'll be including a handful of my sources in each entry for those who want more information.  
 
Itanium 2  
 
A collaboration between Intel and Hewlett-Packard led to the release of the Itanium processor in 2001. The Itanium was intended to replace x86 hardware and dominate the server and workstation markets. It was also not expected that AMD would be able to clone it. The second-generation Itanium 2 was released in July 2002. Intel has since focused on a new Itanium processor, code named Montecito. HP sells Itanium 2 servers that range from single-processor blades to 128-processor high-end servers. The Itanium's IA-64 architecture is completely different from the IA-32 architecture of the x86 family, though it provides backward compatibility for 32-bit x86 applications. This paper focuses on the versions of the Itanium 2 that followed the first one (McKinley), code named Madison, Deerfield, and Fanwood. What does this mean? Hardware companies have working or code names for each processor they release. A specific processor may come in several versions with different features. For the Itanium 2, Intel had four different versions, starting with McKinley. I have found the code name is the most convenient way to refer to different versions of the same processor, so expect to see code names used heavily in this paper.  
 
The Itanium 2 runs at 1.3 to 1.6 GHz and uses 130 watts of power. What does this mean? Processor clock frequency is probably one of the most misunderstood pieces of information about processors. So, what does GHz mean for a processor? A clock tick is the smallest unit of time for a processor. In the simplest processor, one operation can be done each clock tick. The clock frequency is how many of those ticks occur in one second, so 1.6 GHz means 1.6 billion ticks per second. The more ticks, the more operations that can be completed in a second. So far it sounds like GHz does just mean speed. Ah, but there is more. What is an operation, and how many can a given processor do each tick? These can vary significantly. Programs consist of instructions. Each instruction is like a step in a recipe. The difference is that for different processor types, the number of instructions needed to run the same program is different. So already we cannot know which processor will run an application faster, because they are not doing the same steps. Each instruction can be broken down into many operations, and some processors break their instructions down into smaller operations than others. Smaller operations can each be done faster, which allows a higher clock speed, but this does not mean the whole instruction is completed any faster. So, what does knowing the clock speed tell us? For the same processor, and sometimes for a processor family, you can tell which processor is faster. It just cannot be used to compare between processor families (like a Pentium versus a G5), and sometimes not even within a processor family (like between the Pentium D and the Xeon). Clock speed is also often tied to energy consumption (higher clock speed, more energy used). The Itanium 2 is one of the few single-core 64-bit processors still commonly available. 
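One common way to make the clock-speed point concrete is the textbook "iron law" of processor performance: run time equals the number of instructions, times the average number of cycles each instruction needs (CPI), divided by the clock rate. The sketch below is a minimal illustration; the two processors and all of their numbers are hypothetical, not measurements of any chip in this paper.

```python
def run_time_seconds(instructions: float, cpi: float, clock_hz: float) -> float:
    """Run time = instruction count x average cycles per instruction / clock rate."""
    cycles = instructions * cpi
    return cycles / clock_hz


# Hypothetical processor A: high clock, but each instruction needs more cycles.
time_a = run_time_seconds(instructions=1.0e9, cpi=2.0, clock_hz=3.2e9)
# Hypothetical processor B: half the clock, but the same program needs fewer
# instructions, and each one needs fewer cycles on average.
time_b = run_time_seconds(instructions=0.8e9, cpi=1.2, clock_hz=1.6e9)

print(f"A at 3.2 GHz: {time_a:.3f} s")
print(f"B at 1.6 GHz: {time_b:.3f} s")
```

With these made-up numbers, B finishes first (0.600 s versus 0.625 s) even though its clock is half as fast, which is why GHz alone cannot be compared across processor families.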
There is a dual-core Itanium 2, called Hondo, available only from HP, which uses two Madison cores operating at 1.1 GHz; it is not covered in this paper due to the lack of SPEC results for it. Montecito will be a dual-core processor. What does this mean? A multiple-core processor is basically like having that many separate processors together on a single chip, sharing some resources such as a memory bus. One advantage of multiple-core processors is that the cores can communicate with each other faster than separate processors can. They also take up less physical space on the system motherboard; many dual-core processors have the same footprint as their older, single-core predecessors. Most multiple-core processors also use less energy than the same number of separate processors would. The later versions of the Itanium 2 support two threads using coarse-grained multithreading. What does this mean? In a processor without threading, each cycle, one or more instructions from the same program are issued into the processor to be run. The problem is that often the processor is not doing any actual work, because the program's instructions are waiting for something such as data from memory (a very slow operation that can take hundreds of cycles). To make better use of the processor, a technique called threading was created. There are three general types of threading. In coarse-grained multithreading (CGMT), the processor switches which program it is running instructions from when the current thread encounters a long-latency event. This means that at any given time, the processor is still only running one program.  
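To see why switching on long-latency events helps, here is a toy cycle-count model of coarse-grained multithreading. It assumes every ordinary instruction takes one cycle, every cache miss stalls its thread for a fixed number of cycles, and switching threads costs a fixed penalty. The names (WORK, MISS, cgmt_cycles) and all numbers are hypothetical, and this is only a sketch of the general technique, not how the Itanium 2 or Montecito actually implement threading.

```python
WORK = "work"   # hypothetical: an instruction that finishes in one cycle
MISS = "miss"   # hypothetical: a load that misses the cache and waits for memory


def single_thread_cycles(programs, miss_latency):
    """No multithreading: run each program to completion, one after another.
    A WORK instruction takes 1 cycle; a MISS ties up the core for miss_latency cycles."""
    total = 0
    for program in programs:
        for op in program:
            total += miss_latency if op == MISS else 1
    return total


def cgmt_cycles(programs, miss_latency, switch_penalty):
    """Coarse-grained multithreading: keep running one thread until it misses,
    then switch (paying switch_penalty cycles) to another thread that is ready."""
    n = len(programs)
    pcs = [0] * n        # index of the next instruction in each thread
    ready_at = [0] * n   # first cycle at which each thread may issue again
    cycle = 0
    current = 0

    def unfinished(i):
        return pcs[i] < len(programs[i])

    while any(unfinished(i) for i in range(n)):
        if unfinished(current) and ready_at[current] <= cycle:
            op = programs[current][pcs[current]]
            pcs[current] += 1
            if op == MISS:
                # The thread's next instruction can issue only after the memory
                # latency has passed, the same cost as in the single-thread model.
                ready_at[current] = cycle + miss_latency
            cycle += 1
            if op != MISS:
                continue  # no long-latency event, so keep running this thread
        # The current thread just missed, is still waiting, or has finished:
        # look round-robin for another thread that can issue right now.
        for step in range(1, n):
            other = (current + step) % n
            if unfinished(other) and ready_at[other] <= cycle:
                current = other
                cycle += switch_penalty
                break
        else:
            cycle += 1  # no other thread is ready, so the core sits idle this cycle
    # A miss at the very end of a program still has to complete.
    return max(cycle, max(ready_at))


# Two hypothetical programs, each with one cache miss in the middle.
programs = [
    [WORK, WORK, MISS, WORK, WORK],
    [WORK, WORK, MISS, WORK, WORK],
]
single = single_thread_cycles(programs, miss_latency=20)
cgmt = cgmt_cycles(programs, miss_latency=20, switch_penalty=2)
print(f"no threading: {single} cycles")
print(f"CGMT:         {cgmt} cycles")
```

With a 20-cycle miss and a 2-cycle switch penalty, the threaded core finishes both programs in 31 cycles instead of 48, because one thread's work fills in while the other waits for memory. Only one thread ever issues in a given cycle, which is what separates CGMT from simultaneous multithreading.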
 
Opteron  
 
The Opteron processor, AMD's eighth-generation design based on its Hammer architecture (code names SledgeHammer at 130 nm and Venus at 90 nm), was introduced in April 2003. Designed to compete with Intel's Itanium 2, the Opteron is the most powerful of AMD's 64-bit processors. It was designed for server and enterprise applications, and it has arguably become the most popular x86-based 64-bit processor. A variety of computer system producers, including all of the largest enterprise-level UNIX vendors (Fujitsu, IBM, HP, and Sun), sell Opteron systems. What does nm mean? This refers to the manufacturing process for the processor, measured in nanometers. The smaller the number, the smaller the circuits. This allows the processors to be smaller and use less power. See Wikipedia's page about 90 nanometer for more details.  
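As a rough illustration of why a smaller process matters: if every dimension of a circuit shrank exactly in proportion to the process size, its area would shrink with the square of the ratio. The sketch below assumes that ideal scaling; real shrinks do not scale every feature perfectly, so treat the result as a best case.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Fraction of the original area the same circuit needs after a shrink,
    assuming (idealization) that every dimension scales with the process size."""
    linear = new_nm / old_nm
    return linear ** 2


scale = ideal_area_scale(130, 90)
print(f"130 nm -> 90 nm: about {scale:.0%} of the original area")
```

Under that assumption, moving from 130 nm to 90 nm leaves the same circuit needing about 48% of its original area.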
 
The Opteron chip comes with either one or two cores, with clock speeds from 1.8 to 2.8 GHz. The dual-core Opteron can run two threads at once, one on each core, though neither version uses simultaneous multithreading; both support out-of-order execution. What does this mean? In simultaneous multithreading (used by the Power5, for example), instructions from more than one program issue into the same core to be run in the same cycle. The dual-core Opteron instead runs one program on each of its two cores. In an in-order processor, instructions for a program are executed in the same order as they occur in the program. However, many instructions in a program are independent of each other and do not have to be executed in order for the program to produce the correct result. This is called instruction-level parallelism, or ILP. Why does it matter if instructions can be run out of order? Many operations take more than a single cycle to execute, such as floating-point math or loads and stores from memory. With in-order execution, all later instructions have to wait for these operations to finish before they can continue. Out-of-order execution allows a processor to execute other, independent instructions rather than stall the program waiting for a high-latency operation to finish. The average power consumption of an Opteron processor is 89-90 watts. What does this mean? Even if you aren't an environmentalist worried about our global energy usage, how much power your computer uses is something you should care about. If you're Joe-Average, your computer eating power means fewer burgers you get to eat. My dual-processor Dell uses around 350 watts (possibly more at peak). That is more than if I turned on every light in my house (we use fluorescents). I make sure that machine is off or sleeping whenever it is not in use. If you're Bob-Admin, multiply that by however many machines you have. Bob-Admin also has to think about how he's going to keep his server room cool, because 50 machines using that much power make a great sauna in a few hours. 
If Bob-Admin puts his machines in a co-location facility, where he pays by the square foot and by the watt, not only does he pay more for energy usage, but for space too. The racks in a co-lo can only support so much power draw per square foot, which means fewer machines per rack.  
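Here is the Joe-Average and Bob-Admin arithmetic worked out. The 350 W figure comes from my Dell above; the electricity price is a made-up placeholder, the draw is assumed to be constant, and cooling and co-lo space charges are left out entirely.

```python
def yearly_energy_kwh(watts, machines=1, hours_per_day=24):
    """Energy in kWh for identical machines drawing a constant average power."""
    return watts * machines * hours_per_day * 365 / 1000


def yearly_cost(watts, machines, price_per_kwh, hours_per_day=24):
    """Electricity cost per year, ignoring cooling and space charges."""
    return yearly_energy_kwh(watts, machines, hours_per_day) * price_per_kwh


PRICE_PER_KWH = 0.10  # hypothetical electricity price in dollars

joe_always_on = yearly_cost(350, 1, PRICE_PER_KWH)                    # never turned off
joe_four_hours = yearly_cost(350, 1, PRICE_PER_KWH, hours_per_day=4)  # on 4 hours a day
bob_server_room = yearly_cost(350, 50, PRICE_PER_KWH)                 # 50 machines, always on
bob_heat_kw = 350 * 50 / 1000  # nearly all of that power ends up as heat in the room

print(f"Joe, always on:   ${joe_always_on:,.2f} per year")
print(f"Joe, 4 hours/day: ${joe_four_hours:,.2f} per year")
print(f"Bob, 50 machines: ${bob_server_room:,.2f} per year, {bob_heat_kw} kW of heat")
```

At that hypothetical price, one always-on machine costs about $307 a year, using it only 4 hours a day cuts that to about $51, and Bob's 50 machines cost about $15,330 a year while putting roughly 17.5 kW of heat into the server room.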
 
Pentium D  
 
In May 2005, Intel introduced the Pentium D (code name Smithfield), a dual-core processor that contains two essentially unmodified Pentium 4 Prescott cores. Unlike the Prescott, the Pentium D adds support for 64-bit computing through Intel's EM64T technology. Although some Pentium 4 Prescott processors use Intel's Hyper-Threading technology, the Pentium D examined in this paper does not. In early 2006, Intel released a 65 nm version of the Pentium D, code named Presler. Like Smithfield, the Presler chip does not support multithreading. There is a Pentium D with Hyper-Threading, the 3.2GHz Pentium Extreme Edition, but no CPU2000 benchmarks have been published for that processor, so it was not included in this paper. An Extreme Edition of Presler is scheduled to be released in mid-2006.  
 
The two cores in the Pentium D Smithfield are on the same die and have a clock speed of 2.8, 3.0, or 3.2GHz. The Presler cores are each on their own die, which decreases production cost, since a defect in a die affects only one core. What does this mean? Two cores on the "same die" means that both cores are manufactured on the same integrated circuit. Cores on separate dies are on separate integrated circuits, though they are still in the same chip package. With both cores on one die, communication between the cores is faster, but a defect in one core makes both cores unusable, since the whole integrated circuit must be thrown away. With the separate-die approach, there is some loss in communication speed, but a defect in one core means only that core must be thrown away. The cores of the Presler chip operate at 2.8, 3.0, 3.2, and 3.4GHz. With 230 million transistors, the Smithfield is significantly smaller than the dual-core Itanium processor, which has 1.7 billion transistors; yet the maximum power usage of a Pentium D is about 130-155W, while the dual-core Itanium uses 100W. The cores in the Pentium D are clocked significantly lower than the single-core Prescott in order to minimize power consumption. What does this mean? Usually, the number of transistors in an integrated circuit correlates with the amount of power used; however, with the Pentium D compared to the Montecito, this is not the case. Each core in the Pentium D operates at a lower clock frequency than its single-core equivalent so that the power usage is still reasonable. Operating at the same frequency, the Pentium D would likely use over 200W.  
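The cost argument for separate dies can be sketched with the simple Poisson yield model often used in textbooks, where the chance that a die has no defects is exp(-defect density x area). The defect density and core area below are hypothetical, and the model ignores other real costs, such as the extra packaging work of putting two dies into one chip.

```python
import math


def poisson_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Probability that a die of the given area has zero defects (Poisson model)."""
    return math.exp(-defects_per_cm2 * area_cm2)


D = 0.5          # hypothetical defect density, defects per cm^2
CORE_AREA = 1.0  # hypothetical area of one core, cm^2

# Both cores on one die: the cores are usable only if the whole two-core die
# is defect-free, so one defect costs both cores.
shared_die = poisson_yield(D, 2 * CORE_AREA)
# One core per die: each die is tested on its own, and good dies are paired up.
separate_dies = poisson_yield(D, CORE_AREA)

print(f"cores in sellable chips, shared die:    {shared_die:.1%}")
print(f"cores in sellable chips, separate dies: {separate_dies:.1%}")
```

With these numbers only about 37% of cores end up in sellable chips when both share one die, versus about 61% when each core has its own die, because a single defect throws away one core instead of two.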
 
Power5  
 
The IBM Power5 was released in June 2003. It should not be confused with the PowerPC 970 (G5), which is derived from the Power4. IBM uses the Power5 for a range of machines from single-processor entry-level servers to high-end multi-processor servers. Like its predecessor, the Power4, the Power5 is a dual-core processor, with both cores on the same die. The clock speed of the Power5 ranges from 2.0 to 2.7GHz, and its power usage is about 100W. The Power5 can run two threads in each core using simultaneous multithreading. It can also operate in single-thread mode.  
 
UltraSPARC IV+  
 
Code named Panther, the UltraSPARC IV+, the fifth-generation processor in the SPARC family, was designed for enterprise computing and released in September 2005. Panther is a dual-core processor that supports two threads using what Sun calls "chip multi-threading", or CMT. Sun's CMT does not use quite the same definition of threading as is commonly used when talking about processors. Threading normally means running instructions from different programs in the same core. CMT in the IV+ means running a different program in each core, not in the same core. The UltraSPARC IV+ has twice the computing power of the UltraSPARC IV, yet it reduces power consumption from 108W to 90W. What does this mean? The IV+ shows how much improvements in the manufacturing process can reduce processor power usage. The UltraSPARC IV used a 130 nm process, but the IV+ uses a 90 nm process. This allows the processor to stay the same physical size even though it is much more complex and powerful. It also helps it use less energy than the IV.  
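A common first-order model for the switching (dynamic) power of a CMOS chip is P ≈ α·C·V²·f, where C is the capacitance being switched, V the supply voltage, f the clock frequency, and α how often the circuits switch. A process shrink typically lowers C and allows a lower V, and voltage counts twice. The ratios below are hypothetical, not the UltraSPARC IV and IV+ figures, and the model ignores leakage power, which becomes significant at 90 nm.

```python
def relative_dynamic_power(cap_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """New dynamic power as a fraction of the old, using P ~ C * V^2 * f.
    Each argument is new value / old value; switching activity is assumed unchanged."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


# Hypothetical shrink: 30% less switched capacitance, 10% lower voltage, same clock.
shrink = relative_dynamic_power(cap_ratio=0.7, voltage_ratio=0.9, freq_ratio=1.0)
# Hypothetical down-clock: same chip and voltage, clock lowered by 20%.
downclock = relative_dynamic_power(cap_ratio=1.0, voltage_ratio=1.0, freq_ratio=0.8)

print(f"shrink:    {shrink:.0%} of the original dynamic power")
print(f"downclock: {downclock:.0%} of the original dynamic power")
```

Here the shrink leaves 57% of the original dynamic power and the down-clock leaves 80%. Because voltage is squared, a modest voltage drop can save more than a clock cut, and a lower clock often allows a lower voltage as well.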
 
UltraSPARC T1  
 
The UltraSPARC T1, released in November 2005, is the newest of the SPARC processor line by Sun Microsystems. The T1 has generated a lot of interest due to its departure in design from other 64-bit processors currently on the market. The T1 has eight cores operating at 1.0 or 1.2GHz. All cores on a chip operate at the same frequency; the processor is available in 1.0GHz and 1.2GHz versions. Each core can execute four threads, making the T1 a 32-way processor (8 cores × 4 threads). Despite the large number of cores, the T1 consumes only 75W on average and 79W at peak.  
 
Xeon  
 
The 64-bit Intel Pentium 4 Xeon (code named Nocona) was released in June 2004. It is designed to be an enterprise-level processor for business computing. It comes in single-core and dual-core models (the dual-core, code named Paxville, was released in October 2005) and supports Intel's Hyper-Threading technology. The Xeon has clock speeds from 2.83 to 3.66GHz, the fastest of any of the processors examined. The single-core Xeon uses 110-120W of power; the dual-core uses 135-150W.  
 
Sources  
 
This is not a complete list... I'll be putting a handful at the end of each entry.  
 
1. P. Kongetira, K. Aingaran, K. Olukotun, "Niagara: A 32-Way Multithreaded Sparc Processor," IEEE Micro, vol. 25, no. 2, March/April 2005, pp. 21-29.  

2. C. McNairy, D. Soltis, "Itanium 2 Processor Microarchitecture," IEEE Micro, vol. 23, no. 2, March/April 2003, pp. 44-55.  

3. R. Kalla, B. Sinharoy, J. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor," IEEE Micro, vol. 24, no. 2, March/April 2004, pp. 40-47.  

4. C. McNairy, R. Bhatia, "Montecito: A Dual-Core Dual-Threaded Itanium Processor," IEEE Micro, vol. 25, no. 2, March/April 2005, pp. 10-20.  

5. C. Keltcher, K. McGrath, A. Ahmed, P. Conway, "The AMD Opteron Processor for Multiprocessor Servers," IEEE Micro, vol. 23, no. 2, March/April 2003, pp. 66-76.  
 
Power Consumption Sources  
 
Itanium 2 - http://www.intel.com/products/processor/itanium2/index.htm  
Opteron - http://www.epinions.com/content_18680811072  
Pentium D - PCStats.com and wikipedia.com  
Power5 - http://www.xlr8yourmac.com/G5/xserveG5.html  
UltraSPARC IV+ - http://www.extremetech.com/article2/0,1558,1667444,00.asp  
UltraSPARC T1 - http://www.sun.com/processors/UltraSPARC-T1/index.xml  
Xeon - www.news.com

posted by kamundse Mar 27 2006, 02:52:05 PM PST

Comments:

A few errors here..

Montecito uses switch-on-event multithreading (SoEMT), not SMT. McKinley and Madison have no multithreading at all.

POWER5 and PowerPC 970 (aka G5) are quite different. POWER5+ is up to 2.2GHz, probably over 100W, and is not used in any Apple products. The single-core PowerPC 970FX runs up to 2.7GHz and the dual-core PowerPC 970MP is used at up to 2.5Ghz. Since it is closely derived from POWER4, the 970 has no multithreading.

The UltraSPARC IV+ has no multithreading (as the term is defined by computer architects).

Posted by Wes Felter (IBM Research) on March 27, 2006 at 04:13 PM PST #

Hi Wes, thanks for your corrections. I did say the Itanium 2 uses SMT, which I have corrected. I do not think I said what type of threading Montecito uses; later in the paper I do discuss the fact that it uses coarse-grained multithreading. I have also fixed the information for the Power5 and US IV+. With the IV+, there is a lot of confusing information out there, and I was not sure when writing it that I had it right. While Sun's documentation makes it seem that it does not thread, other sources (like infoworld.com) say it does. I think the confusion comes from the Sun term "chip multi-threading", which is not threading in the common use of the term.

Posted by Kristin on March 27, 2006 at 06:12 PM PST #
