I wanted to clarify a few concepts from last time that I got questions on.
The first one is pretty fundamental to the whole discussion... what exactly is 64-bit? All data inside a computer is stored as a series of 1's and 0's, called bits. Each bit is like a digit in a normal (base-10) number. The more bits you have, the bigger the data you can represent, just like with the numbers you use every day. Until the last 10 years or so, computers were primarily 32-bit. What does this mean? It means that memory addresses, integers, etc. were at most 32 bits in size. If something could not be represented in just 32 bits, it had to be split into multiple parts. When doing an operation on these bigger numbers or addresses, the processor had to handle that data in parts rather than as a whole, which is slow. A 64-bit processor means the addresses, integers, and other data are at most 64 bits in size. This means the processor can handle much larger data in one piece, which increases speed when working with that data. Take a look at the link above for more details.
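To make the "handle that data in parts" idea a bit more concrete, here is a minimal Python sketch of how a 64-bit addition can be built from 32-bit pieces. The function name `add64_in_32bit_parts` and the constant `MASK32` are made up for illustration, and real 32-bit processors do this with an add followed by an add-with-carry instruction rather than anything resembling Python.

```python
MASK32 = 0xFFFFFFFF  # largest value that fits in 32 bits


def add64_in_32bit_parts(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide pieces."""
    # Split each operand into a low half and a high half.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves; anything above bit 31 is the carry.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Step 2: add the high halves plus the carry from step 1.
    # Masking drops any overflow past 64 bits, as real hardware would.
    hi = (a_hi + b_hi + carry) & MASK32

    # Glue the two 32-bit halves back into one 64-bit value.
    return (hi << 32) | lo


a = 0x00000001FFFFFFFF
b = 0x0000000000000001
result = add64_in_32bit_parts(a, b)
print(hex(result))  # 0x200000000
```

A 64-bit processor can do the same addition in a single step, which is the speed difference described above.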
Another one is simultaneous multithreading. To clarify this one we're going to take a big step back. How do programs get run at all? Way back when people had to walk both ways to school uphill in the snow (and they liked it), computers were not very sophisticated. They ran one program from beginning to end, then you could load and run another program from beginning to end. As you may imagine, this was fairly limiting as to what you could do with a computer, and people got tired of listening to Bob and Joe fight over whose turn it was to run a program.
So, the idea of scheduling was a big hit in the computer world. Scheduling is primarily the job of your operating system. At any given time, even if you have not launched any other application, there are several different programs running. Processes which are ready to run are placed in a queue. The operating system takes a process from the queue and starts it running on the processor. Every process has a time limit so others get a chance to run. A process either runs until its time limit or until something happens that causes it to sleep. Then the operating system picks another process to run, and so on. It turns out that most processes spend a lot of time waiting, so during their turn on the processor they may actually be doing nothing but sitting there. While they sit doing nothing, other programs which could be running are waiting.
Another interesting thing about many programs is that the work they do can be split up into several independent but related tasks, called threads. Take a web server for example: at a given time it may be serving pages to several users, but each of those requests is independent of the others.
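To make the scheduling description above a bit more concrete, here is a minimal sketch of a round-robin queue with a time limit. The process names, the work units, and the `round_robin` function are all hypothetical, and the sketch leaves out sleeping processes and priorities, which real operating system schedulers have to handle.

```python
from collections import deque


def round_robin(processes, time_limit):
    """Simulate a simple ready queue.

    processes: dict mapping a process name to units of work remaining.
    Returns the list of (name, units_run) turns in the order they ran.
    """
    queue = deque(processes.items())
    timeline = []
    while queue:
        # Take the next ready process from the front of the queue.
        name, remaining = queue.popleft()
        # It runs until it finishes or its time limit expires.
        ran = min(time_limit, remaining)
        timeline.append((name, ran))
        remaining -= ran
        # Unfinished processes go to the back of the queue.
        if remaining > 0:
            queue.append((name, remaining))
    return timeline


timeline = round_robin({"editor": 3, "browser": 5, "mail": 1}, time_limit=2)
for name, ran in timeline:
    print(f"{name} ran for {ran} unit(s)")
```

Each process gets a turn of at most `time_limit` units, so a long-running process cannot keep the others waiting forever.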
Threading allows a programmer to split these tasks into multiple threads, which run independently of each other on the computer. Doing this can make the program run a lot faster. So, how can we make the processor better able to deal with these two things? Well, one way is to add more processors to the machine. This gives each process more places to run, so it addresses threading, but what about the problem of processes wasting time? What if, when a process is waiting, we take it off the processor and let some other process run? This is called "coarse-grained multithreading" and is a type of temporal multithreading. The Montecito processor (next generation Itanium) uses this type of threading.
Another way to let processes share the processor is similar to how the operating system schedules processes. If we have a queue of threads ready to run, each cycle we can pick a thread and issue instructions from that thread into the processor. The big difference from coarse-grained multithreading is that at any given time, instructions from more than one thread are in the processor core. This is called "fine-grained" multithreading. The UltraSPARC T1 uses fine-grained multithreading.
The problem with temporal multithreading is that a processor which can issue more than one instruction per core per cycle (which is all of them but the T1) may not be able to issue the maximum number in a given cycle, which leads to waste. What do I mean by this? Say processor X can issue 5 instructions per cycle. The thread chosen to run this cycle may not have 5 instructions which can be issued this cycle; maybe only 3 are issued. That means the processor wasted 2 issue slots this cycle, and now there are fewer instructions running inside the processor than there could be.
Simultaneous multithreading, or SMT, addresses this by allowing instructions from more than one thread to issue each cycle. So for the cycle we just mentioned on processor X, it could try to fill those two unused issue slots with instructions from another thread. All the remaining processors which do hardware multithreading use this type of threading. We'll discuss threading more in a future blog entry but hopefully this gives you a better idea about what SMT is.
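Here is a small sketch of the issue-slot arithmetic for "processor X" in a single cycle. The `ready` counts and the function names are invented for illustration, and a real core picks instructions using far more detail (functional units, dependencies, and so on) than this toy model has.

```python
ISSUE_WIDTH = 5  # "processor X" can issue up to 5 instructions per cycle


def issue_fine_grained(ready, cycle):
    """Temporal multithreading: one thread per cycle, chosen round-robin."""
    thread = cycle % len(ready)
    return min(ready[thread], ISSUE_WIDTH)


def issue_smt(ready):
    """SMT: fill leftover issue slots with instructions from other threads."""
    issued = 0
    for count in ready:
        issued += min(count, ISSUE_WIDTH - issued)
    return issued


# ready[i] = instructions thread i could issue this cycle (made-up numbers)
ready = [3, 4]

fine = issue_fine_grained(ready, cycle=0)
smt = issue_smt(ready)
wasted_fine = ISSUE_WIDTH - fine
wasted_smt = ISSUE_WIDTH - smt

print(f"fine-grained: issued {fine}, wasted {wasted_fine}")
print(f"SMT:          issued {smt}, wasted {wasted_smt}")
```

With these numbers the fine-grained version issues 3 instructions and wastes 2 slots, while the SMT version borrows 2 instructions from the second thread and wastes none.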
The next was what programs need to do to take advantage of a dual- (or more) core processor. Applications should not need to do anything special. Operating systems need to be written to properly use systems with multiple processors (either multi-core or multi-processor, or both). Unless you are running a *very* old operating system, your OS should already handle this. Applications which are threaded will be able to get more out of a multi-processor system, but they also gain benefits on a single-processor system. Even if your application is not threaded, you will most likely see an increase in system performance anyway, because your application can get more time to run with more than one processing core available on the system.
Another one was out-of-order execution. Let's use the recipe analogy again. Imagine a program is like a recipe. To make a cake, or whatever you want to cook, you follow steps and the end result is a cake. With an in-order execution processor, each step is followed in the order it appears in the recipe. But many people who cook know that not all steps have to be in order for the cake to come out right. Say the steps are:
- In a bowl, mix the dry ingredients.
- In another bowl, mix the wet ingredients.
- Combine the wet and dry mixtures and mix thoroughly.
- Grease cake pan.
- Pour batter into greased cake pan.
- Preheat oven to 350 degrees.
- Bake for 20 minutes.
Looking at these steps, we can see some that must happen in a particular order. We could not pour the batter into the pan before we grease it. We cannot bake the cake without heating the oven. But there are other steps which can be reordered. We could preheat the oven earlier in the recipe. We could grease the cake pan at any time before we pour the batter in. We could mix the wet ingredients before the dry as long as we did both before combining. The same is true of a program. It is possible to take instructions and reorder them and still end up with the correct result. This is called out-of-order execution and is done by all the processors except the UltraSPARC T1 and the yet-to-be-released Montecito. How this is done is a detailed discussion for another day.
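As a rough illustration of the recipe reasoning, the sketch below writes the seven steps down with the steps each one depends on, then checks whether a given ordering would still produce a cake. The `DEPENDS_ON` table is my own reading of the recipe and `is_valid_order` is a hypothetical helper; it says nothing about how a real processor tracks dependencies.

```python
# Step number -> the steps that must be finished first (assumed from the recipe).
DEPENDS_ON = {
    1: set(),    # mix the dry ingredients
    2: set(),    # mix the wet ingredients
    3: {1, 2},   # combine the wet and dry mixtures
    4: set(),    # grease the cake pan
    5: {3, 4},   # pour the batter into the greased pan
    6: set(),    # preheat the oven
    7: {5, 6},   # bake
}


def is_valid_order(order):
    """Return True if every step runs only after all the steps it depends on."""
    done = set()
    for step in order:
        if not DEPENDS_ON[step] <= done:
            return False  # a prerequisite has not been finished yet
        done.add(step)
    return done == set(DEPENDS_ON)  # every step was performed


print(is_valid_order([1, 2, 3, 4, 5, 6, 7]))  # the recipe as written
print(is_valid_order([6, 4, 2, 1, 3, 5, 7]))  # oven and pan first, wet before dry
print(is_valid_order([1, 2, 3, 5, 4, 6, 7]))  # pours batter before greasing
```

The first two orderings are accepted and the third is rejected, which matches the reasoning about which steps can and cannot move.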
Next is instruction level parallelism.
Instruction level parallelism, or ILP, is a measure of how many instructions in a program can be run at the same time. Take the recipe above: I can find four steps (instructions) which could be done simultaneously, assuming we had enough cooks: 1, 2, 4, and 6. That is pretty much it. It is a fairly simple concept with some big fancy wording.
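One way to see where "1, 2, 4, and 6" comes from is to group the recipe steps into waves, where every step in a wave has all of its prerequisites finished. This is only a sketch: the `DEPENDS_ON` table is my own encoding of the recipe, `parallel_waves` is a hypothetical helper, and it assumes an unlimited number of cooks.

```python
# Step number -> the steps that must be finished first (assumed from the recipe).
DEPENDS_ON = {
    1: set(),    # mix the dry ingredients
    2: set(),    # mix the wet ingredients
    3: {1, 2},   # combine the wet and dry mixtures
    4: set(),    # grease the cake pan
    5: {3, 4},   # pour the batter into the greased pan
    6: set(),    # preheat the oven
    7: {5, 6},   # bake
}


def parallel_waves(depends_on):
    """Group steps into waves that could run at the same time."""
    done = set()
    waves = []
    while len(done) < len(depends_on):
        # A step is ready once everything it depends on is done.
        ready = sorted(
            step
            for step, deps in depends_on.items()
            if step not in done and deps <= done
        )
        if not ready:
            raise ValueError("steps depend on each other in a cycle")
        waves.append(ready)
        done.update(ready)
    return waves


waves = parallel_waves(DEPENDS_ON)
print(waves)  # [[1, 2, 4, 6], [3], [5], [7]]
```

The first wave is the four steps that can be done simultaneously; after that the recipe has only one step available at a time.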
The last is EM64T. This one is really simple: EM64T is just the name for Intel's version of AMD's 64-bit extension to the x86 architecture.
Presumably what you really meant to write was: "EM64T is Intel's name for AMD's 64-bit extensions .."?
Posted by Paul Jakma on March 30, 2006 at 04:45 PM PST #
Posted by Kristin on March 30, 2006 at 07:17 PM PST #
- A Pentium II CPU can do basic arithmetic with 64-bit integers (using MMX instructions), and it can fetch data from memory 64-bits at a time, but it's still called a 32-bit CPU.
- An Opteron CPU does not support 64-bit addresses (it supports a 48-bit virtual address space and a 40-bit physical address space), but it is called a 64-bit CPU.
Can you come up with a definition which encompasses both of these cases?
Posted by Luke on March 30, 2006 at 09:08 PM PST #
You present an excellent question here. I agree that the notion of what 64-bit and 32-bit actually mean is completely confusing to most people. And, if you read my blog for today, about cache and memory, you'll see it is even more convoluted than you show here! Take the Opteron for example: its memory controller can handle 128-bit data. All of the processors have different combinations of physical and virtual address sizes, and few of those are 64-bit. Later in the paper we get into the sizes of numerical values that the integer and floating-point ALUs can handle, and we'll see various bit sizes there too.
What it all seems to boil down to is register size. In the processor, data is stored in registers while it is being used by currently executing instructions. In a 32-bit processor, these registers are 32 bits in size; in a 64-bit processor, they are 64 bits in size. I think you will find that definition is consistent among all the processors.
Posted by Kristin on March 31, 2006 at 11:45 AM PST #