Distributed System Failures

Submitted By minori89
There are four types of failures that may be encountered when using and operating a distributed system. Hardware failures occur when a single component within the system fails. Network failures refer to the failure of links within the distributed system's network. Application failures are failures of the applications that run within the system; they occur when an application stops working or operates incorrectly. Synchronization failures occur when different points in the system do not synchronize correctly. Hardware and application failures can occur in a centralized system as well as in a distributed one. When an application fails, it is important first to distinguish operator error from software error in order to locate the point of failure. A hardware error, in turn, can usually be traced to a few simple causes.
Hardware failures occur when a single component within the system fails. The most common types are the failure of a link, the failure of a site, and the loss of a message. Hardware failures were once a common occurrence, but with recent innovations in hardware design and manufacturing they have become few and far between. Most physical failures that occur today tend to be network- or drive-related.
Network failures refer to the failure of links within the distributed system's network. Processors within a distributed system need to communicate with each other over the network. When a link fails, any function that depends on communication across that link stops running.
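One common way to notice a failed link is a heartbeat with a timeout. Each processor periodically sends an "I am up" message, and a peer that stays silent longer than the timeout is suspected of having failed. The following is a minimal Python sketch under that assumption. The class name `HeartbeatMonitor`, the node names, and the 3-second timeout are hypothetical. A timeout alone cannot tell a crashed processor from a broken link; it only shows that communication has stopped.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Suspects a peer (or the link to it) has failed when no heartbeat
    has arrived within `timeout` seconds. Illustrative sketch only."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the timing can be simulated
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, peer: str) -> None:
        # Called whenever an "I am up" message arrives from `peer`.
        self.last_seen[peer] = self.clock()

    def suspected(self) -> List[str]:
        # Peers whose most recent heartbeat is older than the timeout.
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)


# Usage with a fake clock so the timing is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_now[0])
monitor.record_heartbeat("node-a")
monitor.record_heartbeat("node-b")
fake_now[0] = 2.0
monitor.record_heartbeat("node-a")  # node-a keeps reporting; node-b goes silent
fake_now[0] = 4.5
print(monitor.suspected())          # ['node-b']
```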
Application failures are failures of the applications that run within the system; they occur when an application stops working or operates incorrectly. These failures may be caused by a variety of issues, including software bugs. Because software has numerous possible points of failure, the problem can be hard to replicate and solve.
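The two ways an application can fail (stopping outright or producing wrong results) can be told apart mechanically if there is some way to check a result. This minimal sketch assumes the caller supplies a validation function. `classify_run` and the deliberately buggy `buggy_average` are hypothetical examples and are not part of any real system.

```python
from typing import Any, Callable


def classify_run(app: Callable[[Any], Any], arg: Any,
                 is_valid: Callable[[Any], bool]) -> str:
    """Return 'crash' if the application stops working (raises an exception),
    'incorrect' if it returns a result that fails validation, else 'ok'."""
    try:
        result = app(arg)
    except Exception:
        return "crash"
    return "ok" if is_valid(result) else "incorrect"


def buggy_average(values):
    # Hypothetical application with two bugs: integer division truncates,
    # and an empty list raises ZeroDivisionError.
    return sum(values) // len(values)


print(classify_run(buggy_average, [2, 4], lambda r: r == 3))    # ok
print(classify_run(buggy_average, [1, 2], lambda r: r == 1.5))  # incorrect
print(classify_run(buggy_average, [], lambda r: True))          # crash
```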
Synchronization failures occur when different points in the system do not synchronize correctly. When individual processors in the distributed system fail to synchronize, processes that require two or more processors to complete are instead delayed or fail.
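The sketch below makes the synchronization problem concrete with Cristian's algorithm, a well-known way for a processor to estimate how far its clock is from a time server's. The essay does not name this technique; it is used here only for illustration. The sketch assumes the network delay is equal in both directions. It treats any offset larger than a chosen tolerance (an arbitrary 0.5 s) as a synchronization failure. The timestamps are made up.

```python
from typing import Tuple


def cristian_offset(t_send: float, server_time: float, t_recv: float) -> Tuple[float, float]:
    """Estimate the local clock's offset from a time server.

    t_send and t_recv are local times when the request left and the reply
    arrived; server_time is the timestamp in the server's reply. Assumes the
    one-way delay is half the round trip.
    """
    rtt = t_recv - t_send
    estimated_server_now = server_time + rtt / 2
    offset = estimated_server_now - t_recv  # positive: local clock is behind
    return offset, rtt / 2                  # rtt/2 bounds the error if delay can be ~0


def out_of_sync(offset: float, tolerance: float) -> bool:
    # Treat a clock that is off by more than the tolerance as unsynchronized.
    return abs(offset) > tolerance


offset, error_bound = cristian_offset(t_send=100.0, server_time=105.2, t_recv=100.4)
print(round(offset, 3), round(error_bound, 3))  # 5.0 0.2
print(out_of_sync(offset, tolerance=0.5))       # True
```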
Hardware and application failures can occur in a centralized system as well as in a distributed one. A hardware failure in a centralized system can be catastrophic if it occurs at the hub, since it affects operations on all outlying machines. Likewise, an application failure at the hub is more troublesome than one at an access point. Although application errors can usually be attributed to faulty code or a bug, they can also result from operator error.
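A reachability check shows the difference between losing the hub of a centralized system and losing one machine in a distributed system. This minimal sketch uses two hypothetical topologies. In the star, every machine connects only to the hub. A ring of four machines stands in for a distributed layout. A breadth-first search from machine `a` shows which machines it can still reach once a node has failed.

```python
from collections import deque
from typing import Dict, Set


def reachable(graph: Dict[str, Set[str]], start: str, failed: Set[str]) -> Set[str]:
    """Breadth-first search from `start`, skipping nodes that have failed."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in failed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


# Hypothetical centralized layout: every machine links only to the hub.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
# Hypothetical distributed layout: machines also link to each other (a ring).
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}

print(sorted(reachable(star, "a", failed={"hub"})))  # ['a']
print(sorted(reachable(ring, "a", failed={"b"})))    # ['a', 'c', 'd']
```

In the star, the hub's failure leaves `a` isolated. In the ring, losing `b` still leaves `a` connected to `c` and `d` by the other path.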
When an application fails, it is important first to distinguish operator error from software error in order to determine the point of failure. If the failure is due to operator error, the fix may be as simple as retraining the operator who made the error. In the case of an actual software error, you must first be able to replicate the problem before deciding how to fix it. Replicating the issue lets you narrow down where the bug is occurring.
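Replicating a software error is much easier when the failed run can be replayed exactly. One common approach, assumed here rather than taken from the essay, is to make every source of randomness in a job depend on a recorded seed. In this minimal Python sketch, `process_batch` is a hypothetical job with a planted division-by-zero bug. Once a failing seed is recorded, rerunning the job with that seed reproduces the same failure every time.

```python
import random
from typing import List


def process_batch(seed: int) -> List[int]:
    """Hypothetical job whose inputs come from a random source.
    Seeding a private Random instance makes each run reproducible."""
    rng = random.Random(seed)
    items = [rng.randint(-5, 5) for _ in range(5)]
    # Bug under investigation: the job assumes inputs are never zero.
    return [100 // x for x in items]


def find_failing_seed(max_tries: int = 1000) -> int:
    # Run the job with many seeds and record the first one that fails.
    for seed in range(max_tries):
        try:
            process_batch(seed)
        except ZeroDivisionError:
            return seed
    raise RuntimeError("no failure observed")


bad_seed = find_failing_seed()
# Replaying the recorded seed reproduces the same failure on every run.
for _ in range(3):
    try:
        process_batch(bad_seed)
    except ZeroDivisionError:
        print(f"reproduced failure with seed {bad_seed}")
```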
When a hardware error occurs, it can usually be traced to a few simple causes. The general rule is that the more hardware a system contains, the more likely a failure is to occur. By replicating the process that was running when the failure occurred, it is possible to determine which piece of hardware is malfunctioning.
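The rule that more hardware means more failures follows from simple probability. Assume each of n components fails independently with the same probability p over a given period. The chance that at least one component fails is then 1 - (1 - p)^n, which grows toward 1 as n increases. The independence assumption and the value p = 0.001 below are illustrative, not measured.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n components fails, assuming each
    fails independently with probability p over the same period."""
    return 1 - (1 - p) ** n


for n in (1, 10, 100, 1000):
    print(n, round(prob_any_failure(0.001, n), 4))
```

With p = 0.001, the chance of at least one failure is roughly 1% for 10 components, about 9.5% for 100, and about 63% for 1,000.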
Although failures occur in both centralized and distributed systems, replicating the failing process usually reveals the cause, though each type of failure presents its own challenge. Hardware failures can usually be repaired simply by replacing the faulty piece of equipment, while application failures are more complicated. Network and synchronization failures may spread across the network, delaying processes and making faulty processes harder to replicate.

References
Distributed System Failure Types. (2014, April). Studymode.com. Retrieved from http://www.studymode.com/essays/Distributed-System-Failure-Types-1602939.html
Krzyzanowski, P. (2009, April). Distributed Systems. Retrieved from https://www.cs.rutgers.edu/~pxk/rutgers/notes/content/ft.html
