A Tale of Two Processors: Revisiting the RISC-CISC Debate
Ciji Isen1, Lizy John1, and Eugene John2
1 ECE Department, The University of Texas at Austin
2 ECE Department, The University of Texas at San Antonio
{isen,ljohn}@ece.utexas.edu, ejohn@utsa.edu

Abstract. The contentious debates between RISC and CISC have died down, and a CISC ISA, the x86, continues to be popular. Nowadays, processors with CISC ISAs translate the CISC instructions into RISC-style micro-operations (e.g., uops of Intel and ROPS of AMD). The use of uops (or ROPS) allows the use of RISC-style execution cores and of various micro-architectural techniques that can be easily implemented in RISC cores. This can easily allow CISC processors to approach RISC performance. However, CISC ISAs do have the additional burden of translating instructions to micro-operations. In a 1991 study between VAX and MIPS, Bhandarkar and Clark showed that after canceling out the code size advantage of CISC and the CPI advantage of RISC, the MIPS processor had an average 2.7x advantage over the studied CISC processor (VAX). A 1997 study on the Alpha 21064 and the Intel Pentium Pro still showed a 5% to 200% advantage for RISC for various SPEC CPU95 programs. A decade later, and after the introduction of interesting techniques such as fusion of micro-operations in the x86, we set off to compare a recent RISC and a recent CISC processor, the IBM POWER5+ and the Intel Woodcrest. We find that the SPEC CPU2006 programs are divided between those showing an advantage on POWER5+ and those showing an advantage on Woodcrest, narrowing down the 2.7x advantage to nearly 1.0. Our study points to the fact that if aggressive micro-architectural techniques for ILP and high performance can be carefully applied, a CISC ISA can be implemented to yield similar performance as RISC processors. Another interesting observation is that approximately 40% of all work done on the Woodcrest is wasteful execution in the mispredicted path.

1 Introduction
Interesting debates on CISC and RISC instruction set architecture styles were fought over the years, e.g., the Hennessy-Gelsinger debate at the Microprocessor Forum [8] and the Bhandarkar publications [3, 4]. In the Bhandarkar and Clark study of 1991 [3], the comparison was between Digital's VAX and an early RISC processor, the MIPS. As expected, MIPS had larger instruction counts (the expected disadvantage for RISC) and VAX had larger CPIs (the expected disadvantage for CISC). Bhandarkar et al. presented a metric to indicate the advantage of RISC, called the RISC factor. The average RISC factor on SPEC89 benchmarks was shown to be approximately 2.7. Not even one of the SPEC89 programs showed an advantage on the CISC.
D. Kaeli and K. Sachs (Eds.): SPEC Benchmark Workshop 2009, LNCS 5419, pp. 57–76, 2009. © Springer-Verlag Berlin Heidelberg 2009


The Microprocessor Forum debate between John Hennessy and Pat Gelsinger included the following two quotes:

"Over the last five years, the performance gap has been steadily diminishing. It is an unfounded myth that the gap between RISC and CISC, or between x86 and everyone else, is large. It's not large today. Furthermore, it is getting smaller." - Pat Gelsinger, Intel

"At the time that the CISC machines were able to do 32-bit microprocessors, the RISC machines were able to build pipelined 32-bit microprocessors. At the time you could do a basic pipelining in CISC machine, in a RISC machine you could do superscalar designs, like the RS/6000, or superpipelined designs like the R4000. I think that will continue. At the time you can do multiple instruction issue with reasonable efficiency on an x86, I believe you will be able to put second-level caches, or perhaps even two processors on the same piece of silicon, with a RISC machine." - John Hennessy, Stanford

Many things have changed since the early RISC comparisons such as the VAX-MIPS comparison in 1991 [3]. The debates have died down in the last decade, and most of the new ISAs conceived during the last two decades have been RISC. However, a CISC ISA, the x86, continues to be popular. Its implementations translate the x86 macro-instructions into micro-operations (uops of Intel and ROPS of AMD). The use of uops (or ROPS) allows the use of RISC-style execution cores and of various micro-architectural techniques that can be easily implemented in RISC cores. A 1997 study of the Alpha and the Pentium Pro [4] showed that the performance gap was narrowing; however, the RISC Alpha still showed a significant performance advantage. Many see CISC performance approaching RISC performance, but exceeding it is probably unlikely. The hardware for translating the CISC instructions to RISC style is expected to consume area, power and delay.
Uniform-width RISC ISAs have an advantage in decoding, and the runtime translation that CISC requires is certainly no advantage for CISC. Fifteen years after the heated debates and comparisons, and at a time when all the architectural ideas in Hennessy's quote (on-chip second-level caches, multiple processors) have been put into practice, we set out to compare a modern CISC and a modern RISC processor. The processors are Intel's Woodcrest (Xeon 5160) and IBM's POWER5+ [11, 16]. A quick comparison of key processor features can be found in Table 1. Though the processors do not have identical micro-architectures, there is significant similarity. They were released in the same time frame and have similar transistor counts (276 million for POWER5+ and 291 million for Woodcrest). The main difference between the processors is in the memory hierarchy: the Woodcrest has a larger L2 cache, while the POWER5+ includes a large L3 cache. The SPEC CPU2006 results of Woodcrest (18.9 for INT / 17.1 for FP) are significantly higher than those of POWER5+ (10.5 for INT / 12.9 for FP). The Woodcrest has a 3 GHz frequency while the POWER5+ has a 2.2 GHz frequency. Even if one scales up the POWER5+ results by the frequency ratio and compares the scores for the CPU2006 integer programs, it is clear that, even ignoring the frequency advantage, the CISC processor exhibits an advantage over the RISC processor. In this paper, we set out to investigate the performance differences of these two processors.
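The frequency argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that a SPEC score scales linearly with clock frequency, which is an optimistic simplification in favor of POWER5+, and the function name is ours rather than anything from the study.

```python
def frequency_scaled_score(score, own_ghz, target_ghz):
    """Scale a benchmark score linearly with clock frequency.

    Linear scaling is an assumption made for illustration; real scores
    scale less than linearly because memory latency does not shrink.
    """
    return score * target_ghz / own_ghz


# SPECint2006 scores and clock frequencies quoted in the text and Table 1.
power5_int = 10.5      # POWER5+ at 2.2 GHz
woodcrest_int = 18.9   # Woodcrest at 3 GHz

scaled_power5_int = frequency_scaled_score(power5_int, 2.2, 3.0)
print(f"POWER5+ INT score scaled to 3 GHz: {scaled_power5_int:.1f}")
print(f"Woodcrest INT score:               {woodcrest_int:.1f}")
```

Even under this generous assumption the scaled POWER5+ integer score (about 14.3) stays below the Woodcrest score of 18.9.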

Table 1. Key Features of the IBM POWER5+ and Intel Woodcrest [13]

Feature                       IBM POWER5+        Intel Woodcrest (Xeon 5160)
Bit width                     64bit              32/64bit
Cores/chip * Threads/core     2x2                2x1
Clock Frequency               2.2GHz             3GHz
L1 I/D                        2x64/32k           2x32k/32k
L2                            1.92M              4M
L3                            36M (off-chip)     None
Execution Rate/Core           5 issue            5 uops
Pipeline Stages               15                 14
Out of Order                  200 inst           126 uops
Memory B/W                    12.8GB/s           10.5GB/s
Process technology            90nm               65nm
Die Size                      245mm2             144mm2
Transistors                   276 million        291 million
Power (Max)                   100W               80W
SPECint/fp2006 [cores]        10.5 / 12.9        18.9 / 17.1 [4]
SPECint/fp2006_rate [cores]   197 / 229 [16]     60.0 / 44.1 [4]


Other interesting processor studies in the past include a comparison of the PowerPC601 and Alpha 21064 [12], a detailed study of the Pentium Pro processor [5], a comparison of the SPARC and MIPS [7], etc.

2 The Two Chips
2.1 POWER5+
The IBM POWER5+ is an out-of-order superscalar processor. The core contains one instruction fetch unit, one decode unit, two load/store pipelines, two fixed-point execution pipelines, two floating-point execution pipelines, and two branch execution pipelines. It can fetch up to 8 instructions per cycle and dispatch and retire 5 instructions per cycle. POWER5+ is a multi-core chip with two processor cores per chip. Each core has a 64KB L1 instruction cache and a 32KB L1 data cache. The chip has a 1.9MB unified L2 cache shared by the two cores. An additional 36MB L3 cache is available off-chip, with its controller and directory on the chip. The POWER5+ memory management unit has three types of caches to help address translation: a translation look-aside buffer (TLB), a segment look-aside buffer (SLB) and an effective-to-real address table (ERAT). The translation process starts its search with the ERAT. Only if that fails does it search the SLB and TLB. This processor supports simultaneous multithreading.
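The lookup order described above (ERAT first, SLB and TLB only on an ERAT miss) can be pictured with a small software model. This is a rough sketch under our own assumptions, not a description of the POWER5+ hardware: the structures are modeled as unbounded dictionaries, the SLB is taken to map an effective segment to a virtual segment and the TLB to map a virtual page to a real page, and refilling the ERAT after a miss is assumed. All names are hypothetical.

```python
def translate(esid, page, erat, slb, tlb):
    """Translate (effective segment, page) to a real page number.

    Returns (real_page, source), where source names the structure that
    supplied the translation. Misses in the SLB or TLB are not modeled.
    """
    key = (esid, page)
    if key in erat:                  # fast path: ERAT hit
        return erat[key], "ERAT"
    vsid = slb[esid]                 # effective segment -> virtual segment
    real_page = tlb[(vsid, page)]    # virtual page -> real page
    erat[key] = real_page            # assumed refill so the next access hits
    return real_page, "SLB+TLB"


erat = {}
slb = {0x1: 0xABC}
tlb = {(0xABC, 0x10): 0x7F}

first = translate(0x1, 0x10, erat, slb, tlb)    # ERAT miss, slow path
second = translate(0x1, 0x10, erat, slb, tlb)   # ERAT hit
print(first, second)
```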


Fig. 1. IBM POWER5+ Processor [16]

2.2 Woodcrest
The Xeon 5160 is based on Intel's Woodcrest microarchitecture, the server variant of the Core microarchitecture. It is a dual-core, 64-bit, 4-issue superscalar, moderately pipelined (14 stages), out-of-order MPU, implemented in a 65nm process. The processor can address 36 bits of physical memory and 48 bits of virtual memory. An 8-way 32KB L1 I cache and a dual-ported 32KB L1 D cache, along with a shared 4MB L2 cache, feed data and instructions to the core. Unlike the POWER5+, it has no L3 cache. Branch prediction occurs inside the Instruction Fetch Unit. The Core microarchitecture employs the traditional Branch Target Buffer (BTB), a Branch Address Calculator (BAC) and the Return Address Stack (RAS), plus two more predictors: the loop detector (LD), which predicts loop exits, and the Indirect Branch Predictor (IBP), which picks targets based on global history and helps for branches to a calculated address. A queue has been added between the branch target predictors and the instruction fetch to hide single-cycle bubbles introduced by taken branches. The x86 instructions are generally broken down into simpler micro-operations (uops), but in certain specialized cases the processor fuses micro-operations to create integrated or chained operations. Two types of fusion are used: macro-fusion and micro-fusion.

Fig. 2. Front-End of the Intel Woodcrest processor [17]

3 Methodology
In this study we use the 12 integer and 17 floating-point programs of the SPEC CPU2006 [18] benchmark suite and measure performance using the on-chip performance counters. Both the POWER5+ and Woodcrest microprocessors provide on-chip logic to monitor processor-related performance events. The POWER5+ Performance Monitor Unit contains two dedicated registers that count instructions completed and total cycles, as well as four programmable registers, which can count more than 300 hardware events occurring in the processor or memory system. The Woodcrest architecture has a similar set of registers: two dedicated and two programmable registers. These registers can count various performance events such as cache misses, TLB misses, instruction types, branch mispredictions and so forth. The perfex utility from the Perfctr tool is used to perform the counter measurements on Woodcrest. A tool from IBM was used for making the measurements on POWER5+.

The Intel Woodcrest processor supports both 32-bit and 64-bit binaries. The data we present for Woodcrest corresponds to the best runtime for each benchmark (hence it is a mix of 64-bit and 32-bit applications). Except for gcc, gobmk, omnetpp, xalancbmk and soplex, all programs were run in 64-bit mode. The benchmarks for POWER5+ were compiled using XL Fortran Enterprise Edition 10.01 for AIX and XL C/C++ Enterprise Edition 8.0 for AIX. The POWER5+ binaries were compiled using the following flags:
C/C++: -O5 -qlargepage -qipa=noobject -D_ILS_MACROS -qalias=noansi -qalloca + PDF (-qpdf1/-qpdf2)
FP: -O5 -qlargepage -qsmallstack=dynlenonheap -qalias=nostd + PDF (-qpdf1/-qpdf2)
The OS used was AIX 5L V5.3 TL05. The benchmarks on Woodcrest were compiled using Intel's compilers: Intel(R) C Compiler for 32-bit applications/EM64T-based applications, Version 9.1, and Intel(R) Fortran Compiler for 32-bit applications/EM64T-based applications, Version 9.1. The binaries were compiled using the flags: -xP -O3 -ipo -no-prec-div / -prof-gen -prof-use. Woodcrest was configured to run SUSE LINUX 10.1 (X86-64).

4 Execution Characteristics of the Two Processors
4.1 Instruction Count (path length) and CPI
According to the traditional RISC vs. CISC tradeoff, we expect POWER5+ to have a larger instruction count and a lower CPI than the Intel Woodcrest, but we observe that this distinction is blurred. Figure 3 shows the path length (dynamic instruction count) of the two systems for SPEC CPU2006. As expected, the instruction count on the RISC POWER5+ is higher in most cases; however, the POWER5+ has a lower instruction count than the Woodcrest in 5 out of 12 integer programs and 7 out of 17 floating-point programs (indicated with * in Figure 3). The path length ratio is defined as the ratio of the number of instructions retired by POWER5+ to the number of instructions retired by Woodcrest. The path length ratio (instruction count ratio) ranges from 0.7 to 1.23 for integer programs and from 0.73 to 1.83 for floating-point programs. The lack of bias is evident since the geometric mean is about 1 for both integer and floating-point applications.

Fig. 3. a) Instruction Count (Path Length) – INT
Fig. 3. b) Instruction Count (Path Length) – FP

Figure 4 presents the CPIs of the two systems for SPEC CPU2006. As expected, the POWER5+ has better CPIs than the Woodcrest in most cases. However, in 5 out of 12 integer programs and 7 out of 17 floating-point programs, the Woodcrest CPI is better (indicated with * in Figure 4). The CPI ratio is the ratio of the CPI of Woodcrest to that of POWER5+. The CPI ratio ranges from 0.78 to 4.3 for integer programs and from 0.75 to 4.4 for floating-point applications. This data is in sharp contrast to what was observed in the Bhandarkar-Clark study. They obtained an instruction count ratio in the range of 1 to 4 and a CPI ratio ranging from 3 to 10.5. In their study, the RISC instruction count was always higher than the CISC count and the CISC CPI was always higher than the RISC CPI.
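The two ratios defined in this section, and the geometric mean used to summarize them, can be written down directly. The sketch below is illustrative only: the function names are ours, and the sample inputs are made-up values rather than measurements from the study.

```python
import math


def path_length_ratio(power5_insts, woodcrest_insts):
    """Instructions retired on POWER5+ divided by those retired on Woodcrest."""
    return power5_insts / woodcrest_insts


def cpi_ratio(woodcrest_cpi, power5_cpi):
    """CPI of Woodcrest divided by CPI of POWER5+."""
    return woodcrest_cpi / power5_cpi


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios across benchmarks."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up example: one program favors each machine by the same factor,
# so the geometric mean of the two ratios shows no bias.
example_ratios = [path_length_ratio(50, 100), path_length_ratio(200, 100)]
no_bias = geometric_mean(example_ratios)
print(example_ratios, round(no_bias, 3))
```

A geometric mean close to 1 is what the text refers to as a lack of bias: ratios above and below 1 cancel out multiplicatively.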


Fig. 4. a) CPI of the 2 processors for INT

Fig. 4. b) CPI of the 2 processors for FP

Figure 5 illustrates an interesting metric, the RISC factor and its change from the Bhandarkar-Clark study to our study. Bhandarkar–Clark defined RISC factor as the ratio of CPI ratio to path length (instruction count) ratio. The x-axis indicates the CPI ratio (CISC to RISC) and the y-axis indicates the instruction count ratio (RISC to CISC). The SPEC 89 data-points from the Bhandarkar-Clark study are clustered to the right side of the figure, whereas most of the SPEC CPU2006 points are located closer to the line representing RISC factor=1 (i.e. no advantage for RISC or CISC). This line represents the situation where the CPI advantage for RISC is cancelled out by the path length advantage for CISC. The shift highlights the sharp contrast between the results observed in the early days of RISC and the current results.
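As a worked illustration of the definition above, the RISC factor can be computed from the two ratios. Because cycles are instruction count times CPI, the same number is also the ratio of CISC cycles to RISC cycles. The function name and the sample numbers below are ours; the inputs were chosen only so that the result matches the 2.7 average mentioned earlier and are not data points from either study.

```python
def risc_factor(cisc_cpi, risc_cpi, risc_insts, cisc_insts):
    """RISC factor = (CPI ratio, CISC to RISC) / (path length ratio, RISC to CISC)."""
    cpi_ratio = cisc_cpi / risc_cpi
    path_length_ratio = risc_insts / cisc_insts
    return cpi_ratio / path_length_ratio


# Illustrative values: the CISC machine needs 5.4x the cycles per instruction,
# the RISC machine executes 2x the instructions.
factor = risc_factor(cisc_cpi=5.4, risc_cpi=1.0, risc_insts=2.0, cisc_insts=1.0)

# Same quantity expressed as total cycles (instructions x CPI) on each machine.
cisc_cycles = 1.0 * 5.4
risc_cycles = 2.0 * 1.0
print(round(factor, 2), round(cisc_cycles / risc_cycles, 2))
```

A factor of 1 means the CPI advantage of RISC is exactly cancelled by the path length advantage of CISC, which is the line drawn in Figure 5.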


Fig. 5. (a) CPI ratio vs. Path length ratio - INT
Fig. 5. (b) CPI ratio vs. Path length ratio - FP

4.2 Micro-operations Per Instruction (uops/inst)
Woodcrest converts its instructions into simpler instructions called micro-ops (uops). The number of uops per instruction gives an indication of the complexity of the x86 instructions used in each benchmark. Past studies by Bhandarkar and Ding [5] have recorded the uops per instruction to be in the 1.2 to 1.7 range for SPEC 89 benchmarks. A higher uops/inst ratio would imply that more work is done per instruction for CISC, something that is expected of CISC. Our observation on Woodcrest shows the uops per instruction ratio to be much lower than in past studies [5]: an average very close to 1. Table 2 presents the uops/inst for both the SPEC CPU2006 integer and floating-point suites. The integer programs have an average of 1.03 uops/inst and the FP programs have an average of 1.07 uops/inst. Only 482.sphinx3 has a uops/inst ratio similar to what was observed by Bhandarkar et al. [5] (a ratio of 1.34). Among the integer benchmarks, mcf has the highest uops/inst ratio: 1.14.

Table 2. Micro-ops per instruction for CPU2006 on Intel Woodcrest

BENCHMARK (INT)      uops/inst
400.perlbench        1.06
401.bzip2            1.03
403.gcc              0.97
429.mcf              1.14
445.gobmk            0.93
456.hmmer            1.08
458.sjeng            1.06
462.libquantum       1.05
464.h264ref          1.02
471.omnetpp          0.98
473.astar            1.07
483.xalancbmk        0.96
INT - geomean        1.03

BENCHMARK (FP)       uops/inst
433.milc             1.01
434.zeusmp           1.02
435.gromacs          1.01
436.cactusADM        1.12
437.leslie3d         1.09
444.namd             1.02
447.dealII           1.04
450.soplex           1.00
453.povray           1.07
454.calculix         1.05
459.GemsFDTD         1.16
465.tonto            1.08
470.lbm              1.00
481.wrf              1.16
482.sphinx3          1.34
410.bwaves.input1    1.01
416.gamess           1.02
FP - geomean         1.07

4.3 Instruction Mix
In this section, we present the instruction mix to help the reader better understand the later sections on branch predictor performance and cache performance. The instruction mix can give us an indication of the differences between the benchmarks. It is far from a clear indicator of bottlenecks, but it can still provide some useful information. Table 3 contains the instruction mix for the integer programs, while Table 4 contains the same information for the floating-point benchmarks. Comparing the composition of instructions in the binaries of POWER5+ and Woodcrest, the instruction mix seems to be largely similar for both architectures. We do observe that some Woodcrest binaries have a larger fraction of load instructions than their POWER5+ counterparts. For example, the execution of hmmer on POWER5+ has 28% load instructions while the Woodcrest version has 41% loads. Among integer programs, gcc, gobmk and xalancbmk are other programs where the percentage of loads on Woodcrest is higher than on POWER5+.

Table 3. Instruction mix for SPEC CPU2006 integer benchmarks

                  POWER5+                            Woodcrest
BENCHMARK         Branches  Stores  Loads  Others    Branches  Stores  Loads  Others
400.perlbench     18%       15%     25%    41%       23%       11%     24%    41%
401.bzip2         15%       8%      23%    54%       15%       9%      26%    49%
403.gcc           19%       17%     18%    46%       22%       13%     26%    39%
429.mcf           17%       9%      26%    48%       19%       9%      31%    42%
445.gobmk         16%       11%     20%    53%       21%       14%     28%    37%
456.hmmer         14%       11%     28%    47%       8%        16%     41%    35%
458.sjeng         18%       6%      20%    56%       21%       8%      21%    50%
462.libquantum    21%       8%      21%    50%       27%       5%      14%    53%
464.h264ref       7%        16%     35%    42%       8%        12%     35%    45%
471.omnetpp       19%       17%     26%    38%       21%       18%     34%    27%
473.astar         13%       8%      27%    52%       17%       5%      27%    52%
483.xalancbmk     20%       9%      23%    47%       26%       9%      32%    33%

Table 4. Instruction mix for SPEC CPU2006 floating-point benchmarks

                  POWER5+                            Woodcrest
BENCHMARK         Branches  Stores  Loads  Others    Branches  Stores  Loads  Others
410.bwaves        1%        7%      46%    46%       1%        8%      47%    44%
416.gamess        8%        8%      31%    53%       8%        9%      35%    48%
433.milc          3%        18%     34%    46%       2%        11%     37%    50%
434.zeusmp        2%        11%     26%    61%       4%        8%      29%    59%
435.gromacs       4%        14%     28%    54%       3%        14%     29%    53%
436.cactusADM     0%        14%     38%    48%       0%        13%     46%    40%
437.leslie3d      1%        12%     28%    59%       3%        11%     45%    41%
444.namd          5%        6%      28%    61%       5%        6%      23%    66%
447.dealII        15%       9%      32%    45%       17%       7%      35%    41%
450.soplex        15%       6%      26%    53%       16%       8%      39%    37%
453.povray        12%       14%     31%    44%       14%       9%      30%    47%
454.calculix      4%        6%      25%    65%       5%        3%      32%    60%
459.GemsFDTD      2%        10%     31%    57%       1%        10%     45%    43%
465.tonto         6%        13%     29%    52%       6%        11%     35%    49%
470.lbm           1%        9%      18%    72%       1%        9%      26%    64%
481.wrf           4%        11%     31%    54%       6%        8%      31%    56%
482.sphinx3       8%        3%      31%    59%       10%       3%      30%    56%

We also find a difference in the fraction of branch instructions, though not as significant as the differences observed for load instructions. For example, xalancbmk has 20% branches in a POWER5+ execution and 26% branches in the case of Woodcrest. A similar difference exists for gobmk and libquantum. In the case of hmmer, unlike the previous cases, the fraction of branches is lower for Woodcrest (14% for POWER5+ and only 8% for Woodcrest). Similar differences in the fraction of load and branch instructions can be found in the floating-point programs. A few examples are cactusADM, leslie3d, soplex, GemsFDTD and lbm. FP programs have traditionally had a lower fraction of branch instructions, but three of the programs exhibit more than 12% branches. This observation holds for both POWER5+ and Woodcrest. Interestingly, these three programs (dealII, soplex and povray) are C++ programs.

4.4 Branch Prediction
Branch prediction is a key feature in modern processors, allowing out-of-order execution. The branch misprediction rate and the misprediction penalty significantly influence the stalls in the pipeline and the number of instructions that are executed speculatively and wastefully in the mispredicted path. In Figure 6 we present the branch misprediction statistics for both architectures. We find that Woodcrest outperforms POWER5+ in this respect. The misprediction rate for Woodcrest among integer benchmarks ranges from a low of 1% for xalancbmk to a high of 14% for astar. Only gobmk and astar have a misprediction rate higher than 10% for Woodcrest. On the other hand, the misprediction rate for POWER5+ ranges from 1.74% for xalancbmk to 15% for astar. On average the misprediction rate for integer benchmarks is 7% for POWER5+ and 5.5% for Woodcrest. In the case of floating-point benchmarks it is 5% for POWER5+ and 2% for Woodcrest. We see that, in the case of the floating-point programs, POWER5+ branch prediction performs poorly relative to Woodcrest. This is particularly noticeable in programs like gamess, dealII, tonto and sphinx3.
Fig. 6. a) Branch misprediction – INT (series: P5+ branch mispred %, WC branch mispred %)

Fig. 6. b) Branch misprediction – FP (series: P5+ branch mispred %, WC branch mispred %)


4.5 Cache Misses
The cache hierarchy is one of the important micro-architectural features that differ between the systems. POWER5+ has a smaller L2 cache (1.9M instead of 4M in Woodcrest), but it has a large shared L3 cache. This makes the performance of the cache hierarchies of the two processors of particular interest. Figure 7 shows the L1 data cache misses per thousand instructions for both integer and floating-point benchmarks. Among integer programs mcf stands out, while no floating-point program shows similar behavior. POWER5+ has a higher L1 D cache miss rate for gcc, milc and lbm even though both processors have the same L1 D cache size. In general, the L1 data cache miss rates are under 40 misses per 1k instructions. In spite of the smaller L2 cache, the L2 miss ratio on POWER5+ is lower than that on Woodcrest. While no data is available to analyze this further, we suspect that differences in the amount of loads in the instruction mix (as discussed earlier), differences in the instruction cache misses (POWER5+ has a bigger I-cache), etc. can lead to this.

Fig. 7. a) L1 D cache misses per 1k Instructions – INT (series: P5+ and WC L1 D miss/1k inst)
Fig. 7. b) L1 D cache misses per 1k Instructions - FP (series: P5+ and WC L1 D miss/1k inst)
Fig. 8. a) L2 cache misses per 1k Instructions – INT (series: P5+ and WC L2 miss/1k inst)
Fig. 8. b) L2 cache misses per 1k Instructions – FP (series: P5+ and WC L2 miss/1k inst)

4.6 Speculative Execution
Over the years out-of-order processors have achieved significant performance gains from various speculation techniques. The techniques have primarily focused on control flow prediction and memory disambiguation. In Figure 9 we present the speculation percentage, a measure of the amount of wasteful execution, for different benchmarks. We define the speculation % as the ratio of instructions that are executed speculatively but not retired to the number of instructions retired (i.e. (dispatched_inst_cnt / retired_inst_cnt) - 1). We find the amount of speculation in integer benchmarks to be higher than in floating-point benchmarks, which is not surprising considering the higher percentage of branches and branch mispredictions in integer programs. In general, the Woodcrest micro-architecture speculates much more aggressively than POWER5+. On average, an excess of 40% of instructions on Woodcrest and 29% of instructions on POWER5+ are speculative for integer benchmarks. The amount of speculation for FP programs is on average 20% for Woodcrest and 9% for POWER5+. Despite concerns about power consumption, the fraction of instructions spent in the mispredicted path has increased from the average of 20% (25% for INT and 15% for FP) seen in the 1997 Pentium Pro study. Among the floating-point programs, POWER5+ speculates more than Woodcrest in four of the benchmarks: dealII, soplex, povray and sphinx3. It is interesting to note that 3 of these benchmarks are C++ programs. With limitations on power and energy consumption, wastage from execution in the speculative path is of great concern.

Fig. 9. (a) Percentage of instructions executed speculatively - INT (series: P5+ inst disp/compl, WC uops disp/retired)
Fig. 9. (b) Percentage of instructions executed speculatively - FP (series: P5+ inst disp/compl, WC uops disp/retired)
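The speculation % defined in this section is a one-line computation from two counter values. The sketch below follows the formula given in the text; the function name and the sample counter values are ours and do not come from the measurements.

```python
def speculation_pct(dispatched_inst_cnt, retired_inst_cnt):
    """Speculation % = (dispatched / retired - 1), expressed as a percentage."""
    return (dispatched_inst_cnt / retired_inst_cnt - 1) * 100


# Made-up counter values: 140 instructions dispatched for every 100 retired.
example_pct = speculation_pct(dispatched_inst_cnt=140, retired_inst_cnt=100)
print(f"{example_pct:.0f}% of retired work was additionally executed and discarded")
```

A value of 0 would mean every dispatched instruction retires; the 40% integer average reported for Woodcrest corresponds to a dispatched-to-retired ratio of about 1.4.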

5 Techniques That Aid Woodcrest
Part of Woodcrest's performance advantage comes from the reduction of micro-operations through fusion. Another important technique is early load address resolution. In this section, we analyze these specific techniques.

5.1 Macro-fusion and Micro-op Fusion
Although the Woodcrest breaks instructions into micro-operations, in certain cases it also fuses specific uops into integrated operations, thus taking advantage of simple or complex operations as it sees fit. Macro-fusion [11] is a new feature of Intel's Core micro-architecture, designed to decrease the number of micro-ops in the instruction stream. Select pairs of compare and branch instructions are fused together during the pre-decode phase and then sent through any one of the four decoders. The decoder then produces a single micro-op from the fused pair of instructions. The hardware can perform a maximum of one macro-fusion per cycle. Table 5 and Table 6 show the percentage of fused operations for integer and floating-point benchmarks. In the tables, fused operations are classified as macro-fusion and micro-fusion. Micro-fusion is further classified into two types: loads that are fused with arithmetic operations or an indirect branch (LD_IND_BR), and store address computations fused with the data store (STD_STA). As stated before, the version of the benchmark selected (32-bit vs. 64-bit) depends on the overall performance. This was done to give the maximum performance benefit to CISC. It turns out that most of the programs performed best in 64-bit mode, but in this mode macro-fusion does not work well. Since our primary focus is on comparing POWER5+ with Woodcrest, we used the binaries that yielded the best performance for this study too. The best-case runs (runs with the highest performance) for integer benchmarks have an average of 19% of operations that can be fused by micro- or macro-fusion. This implies that the average uops/inst would go up from 1.03 to 1.23 if there were no fusion.
The majority of the fusion comes from micro-fusion, an average of 14%, and the rest from macro-fusion. Macro-fusion in integer benchmarks ranges from 0.13% in hmmer to 21% for xalancbmk. Micro-fusion ranges from 6% (astar) to 29% (hmmer). Of the two sub-components of micro-fusion, store address computation fusion is predominant. 'Store address and store' fusion ranges from 4% for astar to 18% for omnetpp. On the other hand, load fusion (LD_IND_BR, loads fused with arithmetic operations or an indirect branch) is lowest for mcf and highest for hmmer. The best-case runs (runs with the highest performance) for FP benchmarks have an average of 15% of uops that can be fused by micro- or macro-fusion. Almost all of the fusion is from micro-fusion. The percentage of uops that can be fused via micro-fusion in FP programs ranges from 4% (sphinx3) to 21% (leslie3d).
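The step from 1.03 to 1.23 uops/inst can be reproduced with simple arithmetic. The sketch below is one reading that matches the reported figures, not a formula stated by the authors: it assumes the fused percentage is relative to the counted uops and that each fused uop would otherwise appear as two separate uops. The function name is hypothetical.

```python
def uops_per_inst_without_fusion(uops_per_inst, fused_fraction):
    """Estimate uops/inst if every fused uop were split into two uops.

    Assumes fused_fraction is the share of counted uops that are fused,
    so splitting them adds fused_fraction * uops_per_inst extra uops.
    """
    return uops_per_inst * (1 + fused_fraction)


# Averages quoted in the text: 19% fused for INT, 15% fused for FP.
int_no_fusion = uops_per_inst_without_fusion(1.03, 0.19)
fp_no_fusion = uops_per_inst_without_fusion(1.07, 0.15)
print(round(int_no_fusion, 2), round(fp_no_fusion, 2))
```

Under these assumptions both suites land at roughly 1.23 uops/inst, in line with the values quoted for the no-fusion case.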

Table 5. Micro & macro-fusion in SPEC CPU2006 integer benchmarks

BENCHMARK         uops/inst  %macro-fusion uop  %micro-fusion uop  %fusion uop  %LD_IND_BR uops  %STD_STA uops
400.perlbench     1.06       0%                 13%                13%          3%               11%
401.bzip2         1.03       0%                 12%                12%          4%               9%
403.gcc           0.97       15%                16%                31%          4%               13%
429.mcf           1.14       0%                 8%                 8%           0%               8%
445.gobmk         0.93       12%                19%                31%          5%               15%
456.hmmer         1.08       0%                 29%                29%          14%              15%
458.sjeng         1.06       0%                 9%                 9%           2%               7%
462.libquantum    1.05       0%                 8%                 8%           3%               5%
464.h264ref       1.02       0%                 18%                18%          6%               12%
471.omnetpp       0.98       10%                22%                31%          5%               18%
473.astar         1.07       0%                 6%                 6%           1%               4%
483.xalancbmk     0.96       21%                13%                34%          13%              10%
Average           1.03       5%                 14%                19%          5%               11%

Table 6. Micro & macro-fusion in SPEC CPU2006 – FP benchmarks

BENCHMARK         uops/inst  %macro-fusion uop  %micro-fusion uop  %fusion uop  %LD_IND_BR uops  %STD_STA uops
410.bwaves        1.01       0%                 19%                19%          11%              8%
416.gamess        1.02       0%                 20%                20%          11%              9%
433.milc          1.01       0%                 13%                13%          3%               11%
434.zeusmp        1.02       0%                 13%                13%          5%               8%
435.gromacs       1.01       0%                 18%                18%          3%               14%
436.cactusADM     1.12       0%                 20%                20%          8%               12%
437.leslie3d      1.09       0%                 21%                21%          12%              10%
444.namd          1.02       0%                 9%                 9%           3%               6%
447.dealII        1.04       0%                 19%                19%          12%              7%
450.soplex        1.00       4%                 15%                20%          8%               7%
453.povray        1.07       0%                 13%                13%          5%               8%
454.calculix      1.05       0%                 9%                 9%           6%               3%
459.GemsFDTD      1.16       0%                 13%                13%          5%               9%
465.tonto         1.08       0%                 20%                20%          10%              10%
470.lbm           1.00       0%                 19%                19%          10%              9%
481.wrf           1.16       0%                 13%                13%          7%               6%
482.sphinx3       1.34       0%                 4%                 4%           2%               2%
Average           1.07       0%                 15%                15%          7%               8%


Hypothetically, not having fusion would increase the uops/inst for floating-point programs from 1.07 to 1.23 and for integer programs from 1.03 to 1.23. It is clear that this micro-architectural technique has played a significant part in blunting the advantage of RISC by reducing the number of uops that are executed per instruction.

5.2 Early Load Address Resolution
The cost of memory access has been accentuated by the higher performance of the logic unit of the processor (the memory wall). The Woodcrest architecture is said to perform an optimization aimed at reducing the load latencies of operations that involve the stack pointer [2]. The work by Bekerman et al. [2] proposes tracking the ESP register and simple operations on it of the form reg±immediate, to enable quick resolution of the load address at decode time. The ESP register in IA32 holds the stack pointer and is almost never used for any other purpose. Instructions such as CALL/RET, PUSH/POP, and ENTER/LEAVE can implicitly modify the stack pointer. There can also be general-purpose instructions that modify the ESP in the fashion ESP←ESP±immediate. These instructions are heavily used for procedure calls and are translated into uops as given below in Table 7. The value of the immediate operand is provided explicitly in the uop.
Table 7. Early load address prediction - Example
Instruction            uops
PUSH EAX               ESP ← ESP - immediate; mem[ESP] ← EAX
POP EAX                EAX ← mem[ESP]; ESP ← ESP + immediate
LOAD EAX from stack    EAX ← mem[ESP+imm]

These ESP modifications can be tracked easily after decode. Once the initial ESP value is known, later values can be computed after each instruction is decoded. In essence, this method caches a copy of the ESP value in the decode unit. Whenever a simple modification to the ESP value is detected, the cached value is used to compute the new ESP value without waiting for the uops to reach the execution stage, and the cached copy is updated with the newly computed value. In some cases the uops perform operations that are not easy to track and compute, for example loads from memory into the ESP or computations that involve other registers. In these cases the cached value of ESP is flagged and is not used for computations until the uop passes the execution stage and the new ESP value is obtained. In the meantime, if any following instruction attempts to modify the ESP value, the decoder tracks the operation and the delta it causes. Once the new ESP value is obtained from the uop that passed the execution stage, the accumulated delta is applied to it to bring the ESP register up to date. Having the ESP value at hand allows quick resolution of load addresses, thereby avoiding the associated stalls. This technique is expected to bear fruit in workloads that make significant use of the stack, most likely for function calls. Further details on this optimization can be found in Bekerman et al. [2].
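The tracking scheme described above can be sketched as a small software model. This is an illustration only, not the Woodcrest hardware or the exact mechanism of Bekerman et al. [2]: the class name, method names, and the 4-byte immediate are assumptions made for the example, and the model handles only one untrackable uop in flight at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EspTracker:
    """Hypothetical model of a decode-stage cached copy of ESP."""
    cached: Optional[int] = None  # last ESP value known at decode
    valid: bool = False           # False while waiting on the execution stage
    delta: int = 0                # ESP +/- immediate updates seen while invalid

    def adjust(self, imm: int) -> Optional[int]:
        """Decode ESP <- ESP + imm; return the new ESP if known at decode."""
        if self.valid:
            self.cached += imm
            return self.cached
        self.delta += imm         # remember the change until the sync
        return None

    def untrackable(self) -> None:
        """Decode a uop whose effect on ESP cannot be computed at decode."""
        self.valid = False
        self.delta = 0

    def executed(self, esp: int) -> int:
        """The execution stage produced `esp`; apply the pending delta."""
        self.cached = esp + self.delta
        self.delta = 0
        self.valid = True
        return self.cached

    def load_address(self, imm: int) -> Optional[int]:
        """Address of mem[ESP + imm] if resolvable at decode, else None."""
        return self.cached + imm if self.valid else None


tracker = EspTracker()
tracker.executed(0x1000)          # initial ESP becomes known
tracker.adjust(-4)                # PUSH EAX: ESP is now 0x0FFC
print(tracker.load_address(8))    # resolved at decode: 0x0FFC + 8
tracker.untrackable()             # e.g. a load from memory into ESP
tracker.adjust(-4)                # another PUSH: only the delta is recorded
print(tracker.load_address(0))    # not resolvable until the sync
print(tracker.executed(0x2000))   # sync: new ESP plus the recorded delta
```

While the cached value is valid, loads of the form mem[ESP+imm] resolve immediately; after an untrackable uop, they wait for the value from the execution stage, which is the case the synchronization count in the next table measures.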

In Table 8 we present data related to the ESP optimization. % ESP.SYNC is the number of times the ESP value had to be synchronized with the delta value, as a percentage of the total number of instructions. A high number is undesirable because it implies a frequent need to synchronize the ESP data, i.e., the ESP value cannot be computed at the decoder because it has to wait for the value from the execution stage. % ESP.ADDITIONS is the corresponding percentage for the number of ESP addition operations performed in the decode unit, an indication of the scope of this optimization. A high value for this metric is desirable because the larger the percentage of instructions that use the addition operation, the more cycles are saved. The stack optimization is more prevalent in the integer benchmarks than in the floating-point benchmarks. The % ESP addition in the integer benchmarks ranges from 0.11% for hmmer to 11.3% for xalancbmk. The % ESP synchronization is low even for benchmarks with a high % ESP addition. For example, xalancbmk exhibits 11.3% ESP addition and only 3.76% ESP synchronization. C++ programs are expected to have more function calls and hence more scope for this optimization. Among the integer programs, omnetpp and xalancbmk, both C++ applications, have a large % ESP addition. The others are gcc and gobmk; the modular and highly control-flow-intensive nature of gcc allows for these optimizations. Although astar is a C++ application, it makes very little use of C++ features [19], and we find that it has a low % ESP addition. Among the floating-point applications, dealII and povray, both C++ applications, have a higher % ESP addition.
Table 8. Percentage of instructions on which early load address resolutions were applied

BENCHMARK        % ESP SYNCH   % ESP ADDITIONS
400.perlbench    0.90%         6.88%
401.bzip2        0.30%         1.41%
403.gcc          1.80%         7.99%
429.mcf          0.17%         0.24%
445.gobmk        1.81%         8.45%
456.hmmer        0.00%         0.11%
458.sjeng        0.41%         3.19%
462.libquantum   0.12%         0.13%
464.h264ref      0.12%         1.44%
471.omnetpp      3.06%         7.60%
473.astar        0.01%         0.14%
483.xalancbmk    3.76%         11.30%
INT - geomean    1.04%         4.07%

BENCHMARK        % ESP SYNCH   % ESP ADDITIONS
433.milc         0.00%         0.04%
434.zeusmp       0.00%         0.00%
435.gromacs      0.03%         0.14%
436.cactusADM    0.00%         0.00%
437.leslie3d     0.00%         0.00%
444.namd         0.00%         0.01%
447.dealII       0.20%         3.05%
450.soplex       0.11%         0.54%
453.povray       0.67%         2.77%
454.calculix     0.03%         0.09%
459.GemsFDTD     0.08%         0.33%
465.tonto        0.26%         0.77%
470.lbm          0.00%         0.00%
481.wrf          0.19%         0.35%
482.sphinx3      0.17%         0.90%
410.bwaves       0.03%         0.04%
416.gamess       0.15%         0.76%
FP - geomean     0.12%         0.60%
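As a quick consistency check on the integer rows of Table 8, the summary values can be recomputed from the per-benchmark figures. The sketch below is not part of the original study; the variable names are made up for the example. It shows that, although the summary row is labeled "geomean", the listed integer values reproduce it as a plain arithmetic mean (a geometric mean of the SYNCH column would be zero, since hmmer's entry is 0.00%).

```python
from statistics import mean

# Integer rows of Table 8: benchmark -> (% ESP SYNCH, % ESP ADDITIONS)
int_rows = {
    "400.perlbench": (0.90, 6.88),
    "401.bzip2": (0.30, 1.41),
    "403.gcc": (1.80, 7.99),
    "429.mcf": (0.17, 0.24),
    "445.gobmk": (1.81, 8.45),
    "456.hmmer": (0.00, 0.11),
    "458.sjeng": (0.41, 3.19),
    "462.libquantum": (0.12, 0.13),
    "464.h264ref": (0.12, 1.44),
    "471.omnetpp": (3.06, 7.60),
    "473.astar": (0.01, 0.14),
    "483.xalancbmk": (3.76, 11.30),
}

# Arithmetic means of each column, rounded to the table's precision.
synch_mean = round(mean(s for s, _ in int_rows.values()), 2)
additions_mean = round(mean(a for _, a in int_rows.values()), 2)
print(synch_mean, additions_mean)  # 1.04 4.07
```

These match the 1.04% and 4.07% reported in the integer summary row.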

On average, the benefit from the ESP-based optimization is 4% for integer programs and 0.6% for FP programs. Each ESP-based addition that is avoided amounts to avoiding the execution of one uop. Although the average benefit is low, some applications benefit significantly from the reduction in unnecessary computation, which helps their performance relative to their POWER5+ counterparts.

6 Conclusion
Using the SPEC CPU2006 benchmarks, we compare the performance of a recent CISC processor, the Intel Woodcrest (Xeon 5160), with that of a recent RISC processor, the IBM POWER5+. In a CISC-RISC comparison in 1991, the RISC processor showed an advantage of 2.7x, and in a 1997 study of the Alpha 21064 and the Pentium Pro, the RISC Alpha showed a 5% to 200% advantage on the SPEC CPU95 benchmarks. Our study shows that the performance difference between RISC and CISC has narrowed further. In contrast to the earlier studies, where the RISC processors dominated on all SPEC CPU programs, neither RISC nor CISC dominates in this study. In our experiments, the Woodcrest shows an advantage on several of the SPEC CPU2006 programs and the POWER5+ shows an advantage on several others. Various factors have helped the Woodcrest obtain its RISC-like performance. Splitting x86 instructions into micro-operations of uniform complexity has helped; interestingly, however, the Woodcrest also combines (fuses) some micro-operations into a single macro-operation. In some programs, up to a third of all micro-operations are seen to benefit from fusion, resulting in chained operations that are executed in a single step by the relevant functional unit. Fusion also reduces the demand on reservation station and reorder buffer entries, and it reduces the net uops per instruction. The average uops per instruction for the Woodcrest in 2007 is 1.03 for integer programs and 1.07 for floating-point programs, while in Bhandarkar and Ding's 1997 study [5] using SPEC CPU95 programs, the average was around 1.35 uops/inst. Although the POWER5+ has a smaller L2 cache than the Woodcrest, it is seen to achieve equal or better L2 cache performance. The Woodcrest has better branch prediction performance than the POWER5+. Approximately 40%/20% (int/fp) of instructions in the Woodcrest and 29%/9% (int/fp) of instructions in the POWER5+ are seen to be in the speculative path.
Our study points out that with aggressive micro-architectural techniques for ILP, CISC and RISC ISAs can be implemented to yield very similar performance.

Acknowledgement
We would like to acknowledge Alex Mericas, Venkat R. Indukuru and Lorena Pesantez at IBM Austin for their guidance. The authors are supported in part by NSF grant 0702694, and an IBM Faculty award. Any opinions, findings and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation (NSF) or other research sponsors.

References
1. Agerwala, T., Cocke, J.: High-performance reduced instruction set processors. Technical report, IBM Computer Science (1987)
2. Bekerman, M., Yoaz, A., Gabbay, F., Jourdan, S., Kalaev, M., Ronen, R.: Early load address resolution via register tracking. In: Proceedings of the 27th Annual International Symposium on Computer Architecture, pp. 306–315
3. Bhandarkar, D., Clark, D.W.: Performance from architecture: comparing a RISC and a CISC with similar hardware organization. In: Proceedings of ASPLOS 1991, pp. 310–319 (1991)
4. Bhandarkar, D.: A Tale of two Chips. ACM SIGARCH Computer Architecture News 25(1), 1–12 (1997)
5. Bhandarkar, D., Ding, J.: Performance Characterization of the Pentium® Pro Processor. In: Proceedings of the 3rd IEEE Symposium on High Performance Computer Architecture, February 01-05, 1997, pp. 288–297 (1997)
6. Chow, F., Correll, S., Himelstein, M., Killian, E., Weber, L.: How many addressing modes are enough. In: Proceedings of ASPLOS-2, pp. 117–121 (1987)
7. Cmelik, et al.: An analysis of MIPS and SPARC instruction set utilization on the SPEC benchmarks. In: ASPLOS 1991, pp. 290–302 (1991)
8. Hennessy, Gelsinger Debate: Can the 386 Architecture Keep Up? John Hennessy and Pat Gelsinger Debate the Future of RISC vs. CISC: Microprocessor Report
9. Hennessy, J.: VLSI Processor Architecture. IEEE Transactions on Computers C-33(11), 1221–1246 (1984)
10. Hennessy, J.: VLSI RISC Processors. VLSI Systems Design, VI:10, pp. 22–32 (October 1985)
11. Inside Intel Core Microarchitecture: Setting New Standards for Energy-Efficient Performance, http://www.intel.com/technology/architecture-silicon/core/
12. Smith, J.E., Weiss, S.: PowerPC 601 and Alpha 21064. A Tale of Two RISCs, IEEE Computer
13. Microprocessor Report – Chart Watch - Server Processors. Data as of (October 2007) http://www.mdronline.com/mpr/cw/cw_wks.html
14. Patterson, D.A., Ditzel, D.R.: The case for the reduced instruction set computer. Computer Architecture News 8(6), 25–33 (1980)
15. Patterson, D.: Reduced Instruction Set Computers. Communications of the ACM 28(1), 8–21 (1985)
16. Kanter, D.: Fall Processor Forum 2006: IBM’s POWER6, http://www.realworldtech.com/
17. Kanter, D.: Intel’s Next Generation Microarchitecture Unveiled. Real World Technologies (March 2006), http://www.realworldtech.com
18. SPEC Benchmarks, http://www.spec.org
19. Wong, M.: C++ benchmarks in SPEC CPU 2006. SIGARCH Computer Architecture News 35(1), 77–83 (2007)
