
Study Based on Simplescalar


Submitted By nescio
PROJECT: STUDY BASED ON SIMPLESCALAR

CS202, SPRING 2011

Project
Study Based on SimpleScalar
Problem 1: Cache Simulation and Associativity
Description
Configurations:
• least-recently-used (LRU) replacement policy
• 128 to 2048 sets
• 1-way to 4-way associativity
• 16-byte cache lines

Command line:
./sim-cheetah -R lru -a 7 -b 11 -n 2 -l 4 go.ss 2 8 go.in
The simulation was run under Ubuntu.
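To make the miss-ratio numbers concrete, here is a minimal Python sketch of a set-associative cache with LRU replacement. It is a hypothetical helper written for illustration, not SimpleScalar code. As far as I know, sim-cheetah evaluates every set count and associativity in a single pass, whereas this sketch simulates one configuration at a time. The names `LRUCache`, `access`, `miss_ratio` and `capacity_bytes` are my own. The flags in the command above appear to give sizes as powers of two: 2^7 = 128 to 2^11 = 2048 sets, up to 2^2 = 4 ways, and 2^4 = 16-byte lines. Capacity is therefore sets × ways × 16 bytes, ranging from 2 KB (128 sets, 1-way) to 128 KB (2048 sets, 4-way).

```python
from collections import OrderedDict

LINE_SIZE = 16  # bytes, matching the 16-byte lines in Problem 1


class LRUCache:
    """Hypothetical set-associative cache with LRU replacement (not SimpleScalar code)."""

    def __init__(self, num_sets: int, ways: int, line_size: int = LINE_SIZE):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set; iteration order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Simulate one reference; return True on a hit, False on a miss."""
        self.accesses += 1
        block = address // self.line_size  # drop the byte offset within the line
        index = block % self.num_sets      # set index
        tag = block // self.num_sets       # remaining high-order bits
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)      # evict the least recently used line
        lines[tag] = None
        return False

    def miss_ratio(self) -> float:
        return self.misses / self.accesses if self.accesses else 0.0

    def capacity_bytes(self) -> int:
        return self.num_sets * self.ways * self.line_size


# Addresses 0 and 32 map to the same set when there are 2 sets of 16-byte lines.
trace = [0, 32, 0, 32]
direct_mapped = LRUCache(num_sets=2, ways=1)
two_way = LRUCache(num_sets=2, ways=2)
for addr in trace:
    direct_mapped.access(addr)
    two_way.access(addr)
print(direct_mapped.miss_ratio(), two_way.miss_ratio())  # 1.0 0.5
```

In this toy trace, the two blocks keep evicting each other in the direct-mapped cache. The conflict disappears once a set can hold both of them, which is the kind of effect behind the large drop in miss ratio from 1-way to 2-way in the results.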

COMPUTER ORGANIZATION AND ARCHITECTURE

Results
• Simulation statistics

sim_num_insn        31394965        # total number of instructions executed
sim_num_refs        8154766         # total number of loads and stores executed
sim_elapsed_time    2               # total simulation time in seconds
sim_inst_rate       15697482.5000   # simulation speed (in insts/sec)

Addresses processed: 8155568
Line size: 16 bytes
• Miss Ratios

Associativity \ No. of sets   128        256        512        1024       2048
1                             0.18586    0.12974    0.09492    0.05787    0.03718
2                             0.094562   0.062298   0.043441   0.025081   0.014161
3                             0.065089   0.043251   0.030266   0.017734   0.007808
4                             0.051197   0.034149   0.024275   0.012934   0.004801

Analysis
Based on the data above, I drew two diagrams for ease of comparison.

[Figures: miss ratio vs. number of sets (128 to 2048) for associativity 1 to 4, shown as two charts]

Discussion and Conclusion
From the diagrams, we can clearly see that the miss ratio goes down as the number of sets increases and also as the associativity increases, so increasing either one helps. More specifically, when the associativity is 1, the miss ratio drops rapidly as the number of sets grows. When the associativity is 4, the absolute drop in miss ratio from adding sets is much smaller. So adding sets does not always pay off as much as it does for a direct-mapped cache. Likewise, when the associativity increases from 1 to 2, the miss ratio drops dramatically (for example, from 0.186 to 0.095 with 128 sets). Going to 3-way or 4-way gives much less additional benefit. So I think that in a real cache design, 4-way associativity is enough, at least for this benchmark.
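The diminishing return from higher associativity can be checked directly against the LRU table above. This short Python sketch uses my own helper, `assoc_gains`, which is not part of SimpleScalar. It computes the absolute drop in miss ratio for each step from (k-1)-way to k-way at every set count. On these numbers, each step is smaller than the previous one at every set count.

```python
# Miss ratios from the LRU table: keys = associativity, columns = number of sets.
SETS = [128, 256, 512, 1024, 2048]
LRU = {
    1: [0.18586, 0.12974, 0.09492, 0.05787, 0.03718],
    2: [0.094562, 0.062298, 0.043441, 0.025081, 0.014161],
    3: [0.065089, 0.043251, 0.030266, 0.017734, 0.007808],
    4: [0.051197, 0.034149, 0.024275, 0.012934, 0.004801],
}


def assoc_gains(table):
    """Absolute miss-ratio drop going from (k-1)-way to k-way, per set count."""
    return {
        k: [prev - cur for prev, cur in zip(table[k - 1], table[k])]
        for k in range(2, max(table) + 1)
    }


for k, drops in assoc_gains(LRU).items():
    cells = ", ".join(f"{s} sets: {d:.4f}" for s, d in zip(SETS, drops))
    print(f"{k - 1}-way -> {k}-way | {cells}")
```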

Problem 3: Cache Replacement Policies
Description
The MIN set-associative cache replacement policy was first proposed by Laszlo Belady. This unimplementable (but simulatable) policy uses oracle information to evict the block in a set that will be used farthest in the future.
Configurations: Apart from the MIN replacement policy, all configurations are exactly the same as in Problem 1.
Command line (in sim-cheetah, -R opt selects the MIN replacement policy):
./sim-cheetah -R opt -a 7 -b 11 -n 2 -l 4 go.ss 2 8 go.in
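Here is a minimal Python sketch of Belady's MIN for a single cache set, with LRU alongside for comparison. It is an illustration under my own assumptions, not sim-cheetah's implementation. The oracle is a naive look-ahead over the whole reference list, and `next_use`, `min_misses` and `lru_misses` are hypothetical names. Consider the 2-way trace A B C A B C. LRU misses on every reference, while MIN misses only 4 times. When C arrives, MIN evicts B, which is reused later than A. When B returns, MIN evicts A, which is never used again.

```python
import math


def next_use(refs, start, block):
    """Index of the next reference to `block` at or after `start`; infinity if none."""
    for j in range(start, len(refs)):
        if refs[j] == block:
            return j
    return math.inf


def min_misses(refs, ways):
    """Misses for one cache set under Belady's MIN: on a miss with a full set,
    evict the resident block whose next use lies farthest in the future."""
    resident = set()
    misses = 0
    for i, block in enumerate(refs):
        if block in resident:
            continue  # hit
        misses += 1
        if len(resident) >= ways:
            # Oracle step: this look-ahead is what makes MIN unimplementable in hardware.
            victim = max(resident, key=lambda b: next_use(refs, i + 1, b))
            resident.remove(victim)
        resident.add(block)
    return misses


def lru_misses(refs, ways):
    """Misses for the same set under LRU, for comparison."""
    stack = []  # index 0 holds the least recently used block
    misses = 0
    for block in refs:
        if block in stack:
            stack.remove(block)  # hit: move to the most recently used position
        else:
            misses += 1
            if len(stack) >= ways:
                stack.pop(0)  # evict the least recently used block
        stack.append(block)
    return misses


trace = list("ABCABC")  # block references that all map to one 2-way set
print("LRU misses:", lru_misses(trace, ways=2))  # 6
print("MIN misses:", min_misses(trace, ways=2))  # 4
```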

Results
• Simulation statistics

sim_num_insn        31394965        # total number of instructions executed
sim_num_refs        8154766         # total number of loads and stores executed
sim_elapsed_time    1               # total simulation time in seconds
sim_inst_rate       31394965.0000   # simulation speed (in insts/sec)

Addresses processed: 8155568
Line size: 16 bytes
• Miss Ratios

Associativity \ No. of sets   128        256        512        1024       2048
1                             0.14945    0.10623    0.07926    0.05042    0.03269
2                             0.071916   0.048559   0.033936   0.019199   0.010186
3                             0.048655   0.032539   0.021734   0.011578   0.005191
4                             0.037641   0.024930   0.015905   0.007579   0.003463

Analysis
A comparison between MIN and LRU for associativity 1:

[Figure: miss ratio vs. number of sets (128 to 2048) for 1-way caches, LRU vs. MIN]

Define the reduction in miss-rate:
Reduction in miss-rate = (original miss-rate - new miss-rate) / original miss-rate

Compute the reduction in miss-rate when changing from the LRU to the MIN replacement policy for all cache sizes and associativities.
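The reduction table can be recomputed from the two miss-ratio tables with a few lines of Python. This is a sketch; the dictionary layout and the `reduction` helper are my own. Because it starts from the rounded miss ratios printed above, its results may differ from the table in the last few digits, most visibly in the 1-way row.

```python
SETS = [128, 256, 512, 1024, 2048]
LRU = {
    1: [0.18586, 0.12974, 0.09492, 0.05787, 0.03718],
    2: [0.094562, 0.062298, 0.043441, 0.025081, 0.014161],
    3: [0.065089, 0.043251, 0.030266, 0.017734, 0.007808],
    4: [0.051197, 0.034149, 0.024275, 0.012934, 0.004801],
}
MIN = {
    1: [0.14945, 0.10623, 0.07926, 0.05042, 0.03269],
    2: [0.071916, 0.048559, 0.033936, 0.019199, 0.010186],
    3: [0.048655, 0.032539, 0.021734, 0.011578, 0.005191],
    4: [0.037641, 0.024930, 0.015905, 0.007579, 0.003463],
}


def reduction(original, new):
    """Reduction in miss rate = (original - new) / original."""
    return (original - new) / original


reductions = {
    ways: [reduction(o, n) for o, n in zip(LRU[ways], MIN[ways])]
    for ways in LRU
}
for ways, row in reductions.items():
    print(f"{ways}-way:", "  ".join(f"{s}: {r:.4f}" for s, r in zip(SETS, row)))
```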


Reduction in miss-rate (LRU to MIN)

Associativity \ No. of sets   128          256          512          1024         2048
1                             0.19589      0.181239     0.16497      0.12875      0.12067
2                             0.23948309   0.220536775  0.218802514  0.234520155  0.28070052
3                             0.25248506   0.24767057   0.28190048   0.34712981   0.335169057
4                             0.26478114   0.26996398   0.34479918   0.41402505   0.27869194

Discussion and Conclusion
According to the diagram and table above, the MIN replacement policy does reduce the miss rate, so a good replacement policy helps improve cache performance. In most cases, the higher the associativity, the bigger the reduction, but there are exceptions. For example, with 2048 sets the reduction for 4-way associativity is smaller than for 3-way. For associativity 1, the more sets, the smaller the reduction; for the other associativities there is no clear trend. Because MIN needs to know future references, it cannot be implemented in a real cache; it only shows how far a replacement policy could go. In practice, we must use realizable policies such as LRU, FIFO (first in, first out) or LFU (least frequently used).

Problem 8: Branch Predictors
Description
Five kinds of branch predictors:
1. Never taken: Assume that the jump will not happen and always fetch the next instruction.
2. Always taken: Assume that the jump will happen and always fetch the target instruction.
3. Bimodal: A bimodal predictor keeps a table of state machines indexed by branch address. Each state machine is a two-bit counter with four states:
   • Strongly not taken
   • Weakly not taken
   • Weakly taken
   • Strongly taken
   When a branch is evaluated, the corresponding state machine is updated. Branches evaluated as not taken decrement the state towards strongly not taken, and branches evaluated as taken increment the state towards strongly taken. The advantage of the two-bit counter over a one-bit scheme is that a conditional jump has to deviate twice from what it has done most in the past before the prediction changes. For example, a loop-closing conditional jump is mispredicted once rather than twice.
4. Two-level: Conditional jumps that are taken every second time or have some other regularly recurring pattern are not predicted well by the bimodal predictor. A two-level adaptive predictor remembers the history of the last n occurrences of the branch and uses one bimodal predictor for each of the possible 2^n history patterns.
5. Combined: The combined predictor in SimpleScalar combines the bimodal and two-level adaptive predictors. A meta-predictor chooses which of the two to use for each branch.
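The counter behaviour described in the list can be sketched in Python. This is a simplified model of a single branch, not SimpleScalar's own predictor code. A real bimodal predictor indexes a table of such counters by branch address. SimpleScalar's two-level predictor is also configurable, so its history organisation may differ from the per-branch history assumed here. `TwoBitCounter`, `OneBit`, `TwoLevel` and `mispredictions` are hypothetical names.

The usage lines check two claims from the list. First, a loop-closing branch costs the two-bit counter one misprediction per loop exit, while the one-bit scheme also mispredicts the re-entry. Second, a branch taken every second time defeats the bimodal counter but not a two-level predictor with 2 bits of history.

```python
class TwoBitCounter:
    """Saturating counter: 0 = strongly not taken, 1 = weakly not taken,
    2 = weakly taken, 3 = strongly taken."""

    def __init__(self, state: int = 2):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Taken moves towards strongly taken, not taken towards strongly not taken.
        self.state = min(self.state + 1, 3) if taken else max(self.state - 1, 0)


class OneBit:
    """One-bit scheme: predict whatever the branch did last time."""

    def __init__(self, taken: bool = True):
        self.last = taken

    def predict(self) -> bool:
        return self.last

    def update(self, taken: bool) -> None:
        self.last = taken


class TwoLevel:
    """History of the last n outcomes selects one of 2^n two-bit counters."""

    def __init__(self, n: int):
        self.mask = (1 << n) - 1
        self.history = 0
        self.counters = [TwoBitCounter() for _ in range(1 << n)]

    def predict(self) -> bool:
        return self.counters[self.history].predict()

    def update(self, taken: bool) -> None:
        self.counters[self.history].update(taken)
        # Shift the newest outcome into the n-bit history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def mispredictions(predictor, outcomes) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


loop = [True, True, True, False] * 3   # loop-closing branch: 4 iterations, run 3 times
alternating = [True, False] * 8        # taken every second time
print("loop, one-bit:", mispredictions(OneBit(), loop))                     # 5
print("loop, two-bit:", mispredictions(TwoBitCounter(), loop))              # 3
print("alternating, bimodal:", mispredictions(TwoBitCounter(), alternating))  # 8
print("alternating, two-level n=2:", mispredictions(TwoLevel(2), alternating))  # 1
```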

sim-outorder also provides a perfect predictor, which has a 0% miss rate.
Command lines:
./sim-profile -iclass go.ss 2 8 go.in
./sim-outorder -bpred nottaken go.ss 2 8 go.in
./sim-outorder -bpred taken go.ss 2 8 go.in
./sim-outorder -bpred bimod go.ss 2 8 go.in
./sim-outorder -bpred 2lev go.ss 2 8 go.in
./sim-outorder -bpred comb go.ss 2 8 go.in
./sim-outorder -bpred perfect go.ss 2 8 go.in

Results
• Distribution of instruction types

Instruction type    Number      Percentage (%)
load                6175411     19.67
store               1979355     6.30
uncond. branch      996935      3.18
cond. branch        4001737     12.75
int computation     18241470    58.10
fp computation      0           0.00
trap                56          0.00
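The percentages, and how far apart conditional branches are, can be recomputed from the raw counts. In this plain Python sketch (my own, not a SimpleScalar tool), the counts are copied from the table. Working from the raw counts rather than the rounded 12.75% gives about 6.85 other instructions per conditional branch.

```python
counts = {
    "load": 6175411,
    "store": 1979355,
    "uncond. branch": 996935,
    "cond. branch": 4001737,
    "int computation": 18241470,
    "fp computation": 0,
    "trap": 56,
}
total = sum(counts.values())
for kind, n in counts.items():
    print(f"{kind:16s} {n:10d} {100 * n / total:6.2f}%")

# Average number of non-conditional-branch instructions per conditional branch.
cond = counts["cond. branch"]
other_per_cond = (total - cond) / cond
print(f"other instructions per conditional branch: {other_per_cond:.2f}")
```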

• Simulation statistics of the 6 branch predictors

Predictor    IPC       Branch direction-prediction rate
nottaken     0.6525    0.3214
taken        0.6599    0.3214
bimod        0.9680    0.8434
2lev         0.9005    0.7514
comb         0.9737    0.8457
perfect      1.1500    1.0000
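With only six data points, a quick check of how prediction rate relates to IPC takes two steps. First, sort the predictors by direction-prediction rate and see whether IPC ever decreases. Second, compute a Pearson correlation coefficient. This sketch uses my own `pearson` helper rather than a statistics library. On these numbers, IPC never drops as the prediction rate increases, and the correlation is close to 1. This describes go.ss only and is not evidence of causation.

```python
predictors = ["nottaken", "taken", "bimod", "2lev", "comb", "perfect"]
ipc = [0.6525, 0.6599, 0.9680, 0.9005, 0.9737, 1.1500]
rate = [0.3214, 0.3214, 0.8434, 0.7514, 0.8457, 1.0000]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Sort predictors by prediction rate (stable for ties) and check that IPC never drops.
order = sorted(range(len(predictors)), key=lambda i: rate[i])
ipc_sorted = [ipc[i] for i in order]
monotone = all(a <= b for a, b in zip(ipc_sorted, ipc_sorted[1:]))
print("order by rate:", [predictors[i] for i in order])
print("IPC non-decreasing in prediction rate:", monotone)
print(f"Pearson correlation: {pearson(rate, ipc):.3f}")
```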

Analysis
Branch direction-prediction rate comparison among the 6 predictors:
[Figure: direction-prediction rate of nottaken, taken, bimod, 2lev, comb and perfect]

How the prediction rate affects the processor IPC:
[Figure: IPC (y-axis) vs. branch direction-prediction rate (x-axis)]

Discussion and Conclusion
• Percentage of conditional branches

According to the statistics on the distribution of instruction types, 4001737 instructions (12.75% of all instructions) are conditional branches. That means that, on average, each conditional branch comes with about 6.84 instructions of other types (1 / 0.1275 - 1 ≈ 6.84). So conditional branches are very frequent, at least in go.ss, and thus branch prediction is necessary.
• Prediction accuracy

As the prediction-rate comparison above shows, for go.ss the nottaken and taken predictors have the lowest prediction rate. The bimod and 2lev predictors do a much better job, and the comb predictor is the best of the realizable predictors. In this program, the two-level adaptive predictor performs worse than the bimodal predictor, even though it has a more complex structure. Except for the straightforward nottaken and taken predictors, every predictor reaches a direction-prediction rate of at least 75%, compared with 100% for the perfect predictor.
• Prediction rate and IPC

The diagram above shows that a higher prediction rate goes with a higher IPC. However, this may not always hold. Here is an excerpt from a study of prediction for the L1 data cache, "Prediction System for L1 Data Cache" by Ankur Kath, Neha Paranjape, Sachin Kulkarni and Varun Pandit at the University of Minnesota:

“From the above graph we observe that the IPC rate for system with prediction is consistently lower than the system without prediction for all benchmarks. This is because though we aim to exploit the window between two consecutive memory instructions. In the case where this window is not enough for a prediction load to take place it will introduce a latency which cannot be hidden and therefore will count towards the IPC. This will especially be true for programs which have a number of random memory accesses.”

So, for go.ss, IPC increases as the prediction rate rises, but this conclusion may not generalize to all programs.
