IBM POWER6 Microprocessor (64 bit)
Term Paper: ECE312

Rahul Sihag
Section: K2103, Roll no: B26
B Tech CSE
Lovely Professional University
Phagwara, Punjab, India
rahulsihagg@gmail.com
Abstract— This term paper is about the IBM POWER6 microprocessor. It covers an introduction, core chapters (definition, description, history, and design), applications, future perspective, and a conclusion.
Index Terms— Introduction, Core chapters, Applications & Future perspective, Conclusion.
I. INTRODUCTION
A. Microprocessors
A microprocessor is a silicon chip that contains a CPU. In the world of personal computers, the terms microprocessor and CPU are used interchangeably. At the heart of all personal computers and most workstations sits a microprocessor. Microprocessors also control the logic of almost all digital devices, from clock radios to fuel-injection systems for automobiles. A microprocessor is a multipurpose, programmable device that accepts digital data as input, processes it according to instructions stored in its memory, and provides results as output. Intel introduced its first 4-bit microprocessor, the 4004, in 1971 and its 8-bit microprocessor, the 8008, in 1972.

B. IBM POWER6 Microprocessors
The POWER6 is a microprocessor developed by IBM that implements the Power ISA v.2.03. When it became available in systems in 2007, it succeeded the POWER5+ as IBM's flagship Power microprocessor. At its introduction, the POWER6 processor was the latest generation in the POWER line of PowerPC processors. Fabricated using IBM’s 65 nm partially-depleted SOI process, the 341 mm² POWER6 chip contains over 790 million transistors and 1953 signal I/Os connected using 4.5 km of wire on 10 copper metal layers (Fig. 1). Each chip includes two dual-threaded SMT processor cores implemented in a 13 FO4 design capable of running at speeds up to 5 GHz. In addition, a private 4 MB L2 cache per core, a shared 32 MB L3 cache controller, two integrated memory controllers, an on-board I/O controller, and nest support for large-scale SMP are included on the chip. In order to provide mainframe-like reliability, enhanced error detection and system monitoring capabilities are managed through a new recovery unit that provides full checkpointing facilities. This is supplemented by complete ECC protection of large caches and architected state, parity protection on more than 99% of register files and 70% of dataflow circuits, and extensive control checkers. In addition, improved virtualization support and decimal floating-point execution capability provide a rich set of features, while the design remains binary compatible with previous POWER designs.

II. CORE CHAPTERS
A. History
POWER6 was described at the International Solid-State Circuits Conference (ISSCC) in February 2006, and additional details were added at the Microprocessor Forum in October 2006 and at the next ISSCC in February 2007. It was formally announced on May 21, 2007. It was released on June 8, 2007 at speeds of 3.5, 4.2 and 4.7 GHz, but the company has noted prototypes have reached 6 GHz. POWER6 reached first silicon in the middle of 2005, and was bumped to 5.0 GHz in May 2008 with the introduction of the P595.

B. Description
POWER is a RISC instruction set architecture designed by IBM. (POWER stands for Performance Optimization With Enhanced RISC.)
• It is based on IBM POWER5 microprocessor technology (SMT, dual core) plus some extensions to increase performance.
• Its core is fabricated in 65-nm silicon-on-insulator (SOI) technology and operates at frequencies of more than 4 GHz.
• The microprocessor is a 13-FO4 design (a cycle time of 13 fanout-of-4 inverter delays) containing more than 790 million transistors, 1,953 signal I/Os, and more than 4.5 km of wire on ten copper metal layers.

Fig: Power6 chip with cores and L2 latch highlighted

The IBM POWER6 microprocessor core is fabricated using the IBM 65-nm silicon-on-insulator (SOI) process and provides a significant boost in frequency and performance to pSeries systems. Core operating frequencies of more than 5 GHz have been demonstrated.
The processor chip contains two cores, 8 MB of on-chip level 2 (L2) cache, a directory for a 32-MB L3 cache, two memory controllers, a GX I/O controller, and nest support circuitry for a 128-way symmetric multiprocessor (SMP). The chip shown in Figure 1 has an area of 341 mm² and contains 790 million transistors, 1,953 signal I/Os, 5,399 power and ground I/Os, and more than 4.5 km of wire.
The on-chip circuits are connected via ten levels of copper wire and are powered through multiple voltage domains. The core logic, array, and I/O circuits are designed to operate at nominal voltages of 1.15, 1.3, and 1.2 V, respectively. However, the actual logic and array voltages delivered to each chip vary between 0.85 V and 1.3 V and between 1.0 V and 1.4 V, respectively, depending on the speed of the part. Chips with shorter channels typically run faster but use considerably more power because of higher leakage. In previous-generation processors, these parts would have been discarded because of excessive power dissipation, but now they are usable by operating at lowered voltages. In addition, chips with longer channels typically run slower, so some of these parts also would not have been used in earlier-generation processors because of their low operating frequency, but now they also are made usable by increasing their operating voltages.
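
As a rough illustration of why lowering the supply voltage rescues fast, leaky parts, the sketch below applies the textbook first-order relation for switching power, P ∝ C·V²·f. This relation and the function name are assumptions introduced here, not taken from POWER6 design data, and leakage (which the paragraph above identifies as the main problem for short-channel parts) is not modeled.

```python
NOMINAL_LOGIC_V = 1.15  # nominal core logic voltage quoted in the text (V)

def relative_dynamic_power(voltage, nominal=NOMINAL_LOGIC_V):
    """Switching power relative to nominal, at fixed capacitance and frequency.

    Assumes the first-order model P ~ C * V^2 * f (an illustrative assumption).
    """
    return (voltage / nominal) ** 2

# Sweep the ends of the logic-voltage range given in the text.
for v in (0.85, 1.15, 1.30):
    ratio = relative_dynamic_power(v)
    print(f"{v:.2f} V -> {ratio:.2f}x nominal switching power")
```

Under this simplified model, a part run at 0.85 V dissipates roughly 55% of the nominal switching power, while a slow part raised to 1.3 V pays roughly 28% more.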

C. Architecture

Fig: Architecture of POWER6

• The POWER6 chip operates at twice the frequency of the POWER5.
• In place of speculative out-of-order execution, which requires costly register-renaming circuitry, the design concentrates on providing data prefetch.
• Limited out-of-order execution is implemented for floating-point (FP) instructions.
• Dispatch and completion are improved: up to 7 instructions can be dispatched from both threads simultaneously.
• SMT speedup is better because of increased cache size and associativity.
• The chip is designed to consume less power.

D. Circuit Design Methodologies
1) Latch Design:
The majority of state-saving devices used in POWER6, outside register files and SRAMs, are scannable master–slave flip-flops (FFs). In normal operation, each of these is controlled by two opposite-phase, slightly skewed clocks, C1 and C2, that drive the master latch (L1) and slave latch (L2), respectively. In order to reduce chip power, most flip-flops can be run in pulsed mode, where C1 is held high while C2 is pulsed (Fig. 3). Since only one clock signal is active in this mode, switching power is reduced. Table I describes various latch modes and their clocks. Delay C1 mode allows cycle stealing during the C2 rise and C1 fall overlap, which provides the capability to shift cycle boundaries and tune frequency in the hardware. Pulsed mode allows even more cycle stealing at the cost of extra padding needed to meet tighter hold time requirements. Designs were padded for minimum pulsewidth mode (2.9 FO4), while mid (4.2 FO4) and max (5.2 FO4) pulsewidth modes were supported to provide maximum flexibility when the chip was tuned in the lab (see Section IV). Finally, a Delay C2 mode, which delayed the C2 rise, was available for debugging frequency-limiting paths.
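
To put the FO4 figures in absolute terms, the sketch below converts them to picoseconds. It assumes the 13-FO4 cycle quoted earlier and takes 5 GHz as an illustrative clock frequency; the function names are hypothetical, and real FO4 delay varies with voltage, temperature, and process.

```python
CYCLE_FO4 = 13  # cycle time in FO4 delays, as stated for POWER6

# Pulse widths of the three pulsed-latch modes, in FO4 (from the text).
PULSE_MODES_FO4 = {"min": 2.9, "mid": 4.2, "max": 5.2}

def fo4_delay_ps(freq_ghz, cycle_fo4=CYCLE_FO4):
    """Implied delay of one FO4 inverter stage, in picoseconds."""
    cycle_ps = 1000.0 / freq_ghz  # a 1 GHz clock has a 1000 ps period
    return cycle_ps / cycle_fo4

def pulse_width_ps(width_fo4, freq_ghz):
    """Pulse width in picoseconds for a width given in FO4 delays."""
    return width_fo4 * fo4_delay_ps(freq_ghz)

# Illustrative frequency only; shipped parts ran at several clock speeds.
for mode, width in PULSE_MODES_FO4.items():
    print(f"{mode}: {pulse_width_ps(width, 5.0):.1f} ps")
```

At 5 GHz the cycle is 200 ps, so one FO4 works out to about 15.4 ps and the minimum pulse to about 45 ps, which shows how narrow the hold-time window is in pulsed mode.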

2) Library Cells
One of the driving forces behind the efficient design methodology of POWER6 was the RodRunner pcell-based gate library, which provided fine device size granularity while retaining the advantages of cell-based layout design. In addition, the resources required to create the full cell library were greatly reduced because layouts for each cell were generated and updated automatically.
Fig. 4. Custom design flow.
For synthesized random logic macros, the use of RodRunner cells allowed a very large library of standard cells to be created, giving synthesis maximum flexibility. Over 500 unique cells were available for each of three types supported in the 65 nm technology, without the enormous overhead that would normally be associated with maintaining a library of that size. As many as four different beta ratios were available for each size cell, with two- and three-input cells usually having multiple tapering ratios available as well.
A key benefit of RodRunner was the ability to make any DRC or methodology (METH) updates in a single location within the RodRunner cell. This change was instantly picked up across all instantiations of the cell, including in the standard cell library. While this occasionally required minor updates to existing layouts to ensure compatibility, these could be performed with minimal effort. This also allowed technology updates, which could affect transistor strengths and beta ratios, to be easily compensated for by the designer or Einstuner (IBM’s device tuning tool).

3) Custom Methodology
The tools used for the custom methodology maximized the possible number of iterations on a circuit, allowing the designer to rapidly approach an optimal solution. The methodology could be split into three design phases as illustrated in Fig. 4: high level design, schematic entry and placement, and layout.
During the high level design phase, physical abstracts were used to floorplan a macro as well as develop a pin/wiring contract with integrators. Early timing abstracts were generated based on circuit designer estimates of logic implementation.
Schematic entry and placement could be performed simultaneously with an innovative new tool called PIP (Placement with Instance Parameters), a GUI for a library of Skill functions used to place cells. PIP allowed circuit designers to floorplan and time their macros more accurately and easily. This, combined with STEP [6] (STeiner Estimated Parasitic), allowed fairly accurate wire models to be included in early timing abstracts. A circuit topology checker could then be used to verify that the circuits met project design rules prior to layout implementation.
During the layout phase, only routing was needed, as all cells had been previously placed. The use of RodRunner allowed automatic optimization tools, such as Einstuner, to easily update both schematics and layouts to improve timing, area, and power by optimizing device sizing and beta ratios. Additionally, the LAVA engine [6], which performed leakage calculations based on analyzing channel-connected components, could change the threshold voltage (VT) of cells, either to reduce leakage power on noncritical paths or to increase the speed of failing paths. Tools to add decoupling capacitors, gate arrays (see Section II-E), and redundant vias, or to tweak n-well/rox layers, could then be run on the completed layout to improve yield and performance. A number of physical checks, including DRC and LVS, methodology, DFT, extraction, power, and transistor-level timing, were performed to validate the design, followed by electromigration, noise, and IR drop analysis to ensure circuit reliability.

4) RLM Methodology
The synthesized random logic macro (RLM) methodology was designed to have as much commonality with customs as possible to allow maximum sharing of tools and checkers. RLMs were designed with the same bit image as customs and were generally allowed unrestricted use of M1–M3, while M4 (and higher for special cases) was shared with the unit via contracts. Pins were required to follow a more restrictive set of placement and spacing rules to provide the highest possible pin density while still ensuring accessibility to pins for both the unit and the RLM by automated routing tools.
The RLM process was broken into three major phases: synthesis and placement, routing, and physical validation. The first step was performed in an IBM tool suite that combined logical synthesis, mapping, placement, and timing capabilities in an integrated framework called PDSRTL [6]. Given a VHDL design, macro dimensions with pin locations, and a set of timing contracts, PDSRTL optimized the design for timing, power, area, and electrical constraints. A carefully tuned set of default parameters yielded high quality results for the majority of the designs, while these parameters could also be customized to adapt to the characteristics of individual RLMs.
The second step took the fully placed RLM, pre-routed wide clock nets based on LCB and latch placement, and added fill cells (see Section II-E) before the design was run through a grid-based routing tool. A fully redundant via set could be used on 95% of the designs for increased yield, with the remaining RLMs using a mixed via set. The final routed design was translated into a standard layout by removing floorplanning information and replacing abstracts of all the standard cells with actual layouts. At this point, the methodology aligned with the custom macro methodology and the same physical checks were performed to validate the design. Typically, RLMs were clean by construction and required only minimal tweaking to pass all requirements.
As with customs, Einstuner and LAVA substitution were available for post-layout tuning.

5) Filler Cell and ECO Methodology
The aggressive schedule of the POWER6 design required physical design (PD) to already be in late stages while verification work was still ongoing, resulting in an unusually large number of engineering change orders (ECOs). The RLM flow was capable of automatically taking a modified (and optionally placed) netlist, merging it into the existing design, and running incremental routing to update the layout. In past designs, once the front-end-of-the-line (FEOL) layers were locked and no further changes to cell sizing or placement were possible, back-end-of-the-line (BEOL) or wire-only ECO capability was severely limited by the number of spare cells of each type and their location in the macro. In POWER6, a special set of gate-array cells, each containing a single PFET and NFET device, were used to fill the unused area in both RLM and custom designs. When used as fill, these cells remained disconnected to have no impact on power, but in a BEOL ECO, they could be combined and replaced by functional cells that had the exact same FEOL layers, but connected the transistors to form any type of static gate. This capability, combined with spare latches that were scattered throughout all RLMs and some customs, allowed even the most complex changes to be completed using BEOL layers only.
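
The fill-cell idea can be sketched as a simple budget check. The transistor counts below are the standard ones for static CMOS gates (an n-input NAND or NOR uses n PFETs and n NFETs; an inverter uses one of each). The function and dictionary names are hypothetical, and the sketch assumes any free fill cell can be used regardless of where it sits in the macro, which real placement would not allow; it does not describe IBM's tools.

```python
# Each gate-array fill cell provides one PFET and one NFET (per the text),
# so a gate needs as many cells as it has PFET/NFET pairs.
CELLS_PER_GATE = {"INV": 1, "NAND2": 2, "NOR2": 2, "NAND3": 3, "NOR3": 3}

def cells_required(eco_gates):
    """Total fill cells consumed by an ECO, given {gate_type: count}."""
    return sum(CELLS_PER_GATE[gate] * count for gate, count in eco_gates.items())

def eco_fits(eco_gates, free_fill_cells):
    """True if the ECO can be built from the available fill cells."""
    return cells_required(eco_gates) <= free_fill_cells

# Hypothetical wire-only change: three NAND2s, two inverters, one NOR3.
eco = {"NAND2": 3, "INV": 2, "NOR3": 1}
print(cells_required(eco))   # 3*2 + 2*1 + 1*3 = 11
print(eco_fits(eco, 10))     # one cell short of the budget
print(eco_fits(eco, 12))
```

Because every fill cell is identical, the budget is a single number rather than a per-gate-type inventory, which is the advantage over a fixed set of spare cells described above.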
For extremely complex changes where VHDL did not easily correspond to the netlist, an experimental process was introduced for RLMs that made the PDSRTL tool aware of the gate-array methodology. By describing a delta-VHDL, a designer could essentially graft a new cone of logic into the design. PDSRTL would swap fill cells for gate-array cells as needed and, where possible, reuse existing cells to map the new logic. While results were generally not as efficient as manual ECOs, this new approach proved to be extremely valuable for complex situations where an ECO would have otherwise been infeasible.

6) Timing Methodology
In order to achieve timing closure on the POWER6 chip, parallel development at all levels of the design, rapid iteration, and early timing estimation were essential, in addition to highly accurate timing models. Hierarchies on POWER6 included chip, core, nest, unit, and macro levels. Using assertions or timing contracts to describe boundary conditions such as arrival times, slews, and capacitance loads, a top-down methodology was used, allowing each level of the design to be analyzed and iterated independently of the others.
Only the basic best case and nominal case timing corners needed to be run separately to evaluate timing at any given level of hierarchy, due to parallel modeling of all clock phases, voltage levels, and both pulsed and nonpulsed modes in a single run. A third run that modeled actual hostile capacitances and their impact on timing was used late in the design for detailed noise coupling analysis.
RLMs were treated in the same way as customs in unit, core, and chip timing environments through the use of transistor-level timing abstracts. This is a departure from previous POWER designs, where RLMs were modeled using standard cells and associated delay equations, and it yielded much more consistent and accurate timing information.
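
The contract-based approach described in this section can be illustrated with a small slack calculation. The data structure, field names, and numbers below are hypothetical; they only show how arrival-time and required-time assertions at a macro boundary let that macro be timed without the rest of the chip.

```python
from dataclasses import dataclass

@dataclass
class TimingContract:
    """Boundary assertions for one path through a macro (times in ps)."""
    input_arrival: float    # asserted arrival time at the macro input pin
    output_required: float  # asserted required time at the macro output pin

def slack_ps(contract, macro_delay):
    """Slack of the path; a positive value means the contract is met."""
    return contract.output_required - (contract.input_arrival + macro_delay)

# Made-up contract: data arrives at 40 ps and must leave by 160 ps.
contract = TimingContract(input_arrival=40.0, output_required=160.0)
print(slack_ps(contract, macro_delay=110.0))  # 160 - (40 + 110) = 10.0
print(slack_ps(contract, macro_delay=130.0))  # negative, so the path fails
```

A real contract would also carry slews and capacitive loads, as the text notes; they are left out here to keep the sketch short.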
E. SUMMARY/CONCLUSION
The POWER6 chip has been fabricated in the IBM 65-nm SOI process. This process technology incorporates multiple device thresholds and ten layers of copper wiring with a low-k dielectric. The logic circuits were predominantly implemented in static CMOS circuits in order to reduce power. The POWER6 chip employs three distinct latch designs: a scannable, dynamic front-end latch that incorporates logic function into the latch; a scannable, master–slave latch that can be operated in pulsed mode to save power; and a scannable, hybrid pulsed latch that can be operated in an L2-only mode in order to minimize latch insertion delay, or in a safety mode for burn-in. The low-skew high-frequency POWER6 processor global clock distribution network was described.
The POWER6 processor used a new custom macro design methodology to estimate parasitic resistances and capacitances earlier in the design flow. This methodology reduced the layout rework, extraction, and timing iterations needed to close all custom paths to a 13-FO4 cycle time.
The POWER6 processor parts have been demonstrated to operate in excess of 5 GHz and within the power constraints established for the chip. Chip power dissipation is reduced through modulation of operating voltages, fine-grained clock gating, latch and logic gate sizing, VT optimization, pulsed latches, and half-frequency operation of portions of the chip.
The POWER6 chip has been extensively tested at wafer, first-level package, and system levels. The evaluation was accomplished via LBIST, ABIST, and (real code) functional exercisers across wide voltage, frequency, and temperature ranges as well as process technology variations. The range of POWER6 systems includes "Express" models (the 520, 550 and 560) and Enterprise models (the 570 and 595).

ACKNOWLEDGMENT
It is not until you undertake a project like this one that you realize how massive the effort really is, or how much you must rely upon the selfless efforts and good will of others. There are many who helped me with this project, and I want to thank them all from the core of my heart. I owe special words of thanks to my teacher for her vision, thoughtful counseling, and encouragement at every step of the project. I am also thankful to the teachers of the Department for giving me the best of knowledge and guidance throughout the project. All this has become reality because of their blessings and, above all, by the grace of God.
REFERENCES
[1] http://en.wikipedia.org/wiki/POWER6#Products
[2] http://www.gizmag.com/go/7307/
[3] http://www.isham-research.co.uk/mainframe_2008.html
[4] http://www-03.ibm.com/systems/power/hardware/
[5] http://www.itjungle.com/tfh/tfh082205-story01.html
[6] http://www.realworldtech.com/page.cfm?ArticleID=RWT121905001634
[7] +Additional pdf’s
