
Week3 Pos/355


Submitted By importgirl82
POS/355
Professor Sumayao
June 9, 2014

Week 4 Individual Assignment: Failures

Types of Failure in Distributed Systems

December 5, 2012


To design a reliable distributed system that runs on unreliable communication networks, it is of the utmost importance to recognize the types of failure the system may have to handle. Broadly speaking, failures of a distributed system fall into two categories: hardware failures and software failures. A distributed system may suffer any of these. Each type of failure has its own nature, causes, and corresponding remedial actions to restore smooth operation (Ray, 2009).

The following are a few types of failure that may occur in a distributed system.

Transaction failure: Transaction failure is a failure of the kind found in centralized systems. It is generally caused by two types of errors: application software errors and system errors. If the application software used to access a database contains a logical error, the affected transaction cannot continue. The system may also enter a deadlock, which causes one or more transactions to fail; errors of this kind are called system errors.

Timing failures: Timing failures are also centralized failures and arise at the server of a distributed system. They occur when the server's response time to a client request exceeds the expected range. In such situations the clients may give up on their requests, because they cannot wait too long for a response from the server. The end result is that the server operations fail because of the timing failure (Sai, n.d.).
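As an illustration of the timing failure described above, the following minimal Python sketch shows a client that gives up when a response does not arrive within an expected range. The function name call_with_deadline, the deadline values, and the simulated server functions are assumptions made for this example and do not come from the cited sources.

```python
import concurrent.futures
import time


def call_with_deadline(request_fn, deadline_seconds):
    """Run request_fn in a worker thread and give up after deadline_seconds.

    Returns ("ok", value) when the response arrives in time, or
    ("timing_failure", None) when the expected response time is exceeded.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(request_fn)
    try:
        return ("ok", future.result(timeout=deadline_seconds))
    except concurrent.futures.TimeoutError:
        # The client stops waiting; the server may still finish later.
        return ("timing_failure", None)
    finally:
        # Do not block the client on a slow server.
        pool.shutdown(wait=False)


def fast_server():
    """Hypothetical server that answers immediately."""
    return "pong"


def slow_server():
    """Hypothetical server that answers later than the client expects."""
    time.sleep(0.3)
    return "late pong"


if __name__ == "__main__":
    print(call_with_deadline(fast_server, 1.0))
    print(call_with_deadline(slow_server, 0.05))
```

In this sketch the client treats a missed deadline as a failure even though the server may eventually produce a correct answer, which is what separates a timing failure from a crash.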

Byzantine failures: Byzantine failures, also called arbitrary failures, are likewise centralized failures that arise at the server of a distributed system. In such situations the server behaves arbitrarily: it responds in an arbitrary fashion at arbitrary times, producing inappropriate output. As a result, the chance of malicious events and duplicate messages from the server increases, and clients may receive arbitrary, unwanted duplicate updates and messages.

Site failure: Site failure is a localized failure, i.e., it occurs when any one system in the distributed environment fails. A system can fail for numerous reasons, involving both hardware and software. A system failure can also be partial or total. In a partial failure, a few sites in the distributed system are down while the others remain operational. In a total failure, all sites of the distributed system are down simultaneously and fail to respond (Ray, 2009).
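The distinction between partial and total failure can be expressed as a small classification over the status of each site. The sketch below is only an illustration; the function name classify_site_failure and the site names are hypothetical, and how a real system decides that a site is "down" is outside its scope.

```python
def classify_site_failure(site_status):
    """Classify a failure from a mapping of site name -> True if the site responds.

    Returns a (kind, down_sites) pair where kind is "no_failure",
    "partial_failure", or "total_failure".
    """
    if not site_status:
        raise ValueError("at least one site is required")
    down_sites = sorted(name for name, is_up in site_status.items() if not is_up)
    if not down_sites:
        return ("no_failure", [])
    if len(down_sites) == len(site_status):
        # Every site is down at the same time.
        return ("total_failure", down_sites)
    # Some sites are down while the others remain operational.
    return ("partial_failure", down_sites)


if __name__ == "__main__":
    print(classify_site_failure({"site_a": True, "site_b": False, "site_c": True}))
    print(classify_site_failure({"site_a": False, "site_b": False}))
```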

Depending on the failure state, various approaches are adopted to restore the distributed system to its original state. For example, in a transaction failure, a transaction manager must be able to communicate with all resource managers in the distributed system that are in use by the applications. For each resource manager, the transaction manager uses the XAResource recover method to retrieve the list of transactions that were in a prepared or heuristically completed state when the failure occurred. The transactional resource factories used by the applications deployed on a system are generally configured by the system administrator. Since XAResource objects do not persist across system failures, the transaction manager must be able to acquire the XAResource objects that represent the resource managers which might have participated in transactions before the system failed (Little, Dinn, Halliday, & Connor, 2011).
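The recovery scan described above can be pictured with a simplified Python model. This is a sketch under stated assumptions, not the JTA API: InMemoryResourceManager is a hypothetical stand-in for a resource manager, its recover, commit, and rollback methods only mimic the roles of the corresponding XAResource operations, and rolling back any in-doubt transaction with no logged commit decision is an assumed policy. Heuristically completed transactions are not modelled.

```python
class InMemoryResourceManager:
    """Hypothetical stand-in for a resource manager; not a real XA driver."""

    def __init__(self, name, in_doubt):
        self.name = name
        self.in_doubt = set(in_doubt)  # transaction ids left prepared at the failure
        self.committed = set()
        self.rolled_back = set()

    def recover(self):
        """Return the ids of transactions still in a prepared state."""
        return sorted(self.in_doubt)

    def commit(self, xid):
        self.in_doubt.remove(xid)
        self.committed.add(xid)

    def rollback(self, xid):
        self.in_doubt.remove(xid)
        self.rolled_back.add(xid)


def recover_after_failure(resource_managers, logged_commit_decisions):
    """Ask every resource manager for in-doubt transactions and resolve them.

    A transaction is committed only if the transaction manager's log
    (logged_commit_decisions) recorded a commit decision; otherwise it is
    rolled back. This policy is an assumption of the sketch.
    """
    outcome = {}
    for rm in resource_managers:
        for xid in rm.recover():
            if xid in logged_commit_decisions:
                rm.commit(xid)
                outcome[(rm.name, xid)] = "committed"
            else:
                rm.rollback(xid)
                outcome[(rm.name, xid)] = "rolled_back"
    return outcome


if __name__ == "__main__":
    db = InMemoryResourceManager("db", ["tx1", "tx2"])
    queue = InMemoryResourceManager("queue", ["tx1"])
    print(recover_after_failure([db, queue], {"tx1"}))
```

The key point of the sketch is that the transaction manager has to reach every resource manager again after the failure, which is why it needs a way to reacquire the objects that represent them.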

Also, various approaches have been used to overcome Byzantine failures, including a checkpointing protocol that assumes it is permissible to roll back the execution to a previous checkpoint (Agbaria & Friedman, n.d.).
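The roll-back assumption can be illustrated with a minimal Python sketch of saving and restoring state. It shows only the generic checkpoint and roll-back mechanism, not the protocol of Agbaria and Friedman, and it does not detect Byzantine behavior; the class name CheckpointedProcess and the example state are hypothetical.

```python
import copy


class CheckpointedProcess:
    """Hypothetical process whose state can be saved and restored."""

    def __init__(self, state):
        self.state = state
        self._checkpoints = []

    def checkpoint(self):
        """Save a deep copy of the current state and return its index."""
        self._checkpoints.append(copy.deepcopy(self.state))
        return len(self._checkpoints) - 1

    def rollback(self, index=-1):
        """Restore the state saved at the given checkpoint (latest by default)."""
        if not self._checkpoints:
            raise RuntimeError("no checkpoint has been taken")
        self.state = copy.deepcopy(self._checkpoints[index])


if __name__ == "__main__":
    process = CheckpointedProcess({"balance": 100})
    process.checkpoint()
    process.state["balance"] = -999  # stands in for a corrupted update
    process.rollback()
    print(process.state)
```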

References

Agbaria, A., & Friedman, R. (n.d.). Overcoming Byzantine Failures Using Checkpointing. Baidu. Retrieved from http://wenku.baidu.com/view/d4c11138580216fc700afd46.html

Little, M., Dinn, A., Halliday, J., & Connor, K. (2011). JBoss Enterprise Web Platform 5. Retrieved from https://access.redhat.com/knowledge/docs/en-US/JBoss_Enterprise_Web_Platform/5/html/Transactions_JTA_Development_Guide/index.html

Ray, C. (2009). Distributed Database Systems. New Delhi: Dorling Kindersley (Ind.) Pvt. Ltd.

Sai. (n.d.). Types of Failures in Distributed Systems. Retrieved from http://1000projects.org/types-of-failures-in-distributed-systems.html
