Clustering - Hierarchical

Submitted By Nossier
1/15/2015

A Tutorial on Clustering Algorithms

Hierarchical Clustering Algorithms
How They Work
Given a set of N items to be clustered, and an N*N distance (or similarity) matrix, the basic process of hierarchical clustering (defined by S.C. Johnson in 1967) is this:
1. Start by assigning each item to its own cluster, so that if you have N items, you now have N clusters, each containing just one item. Let the distances (similarities) between the clusters be the same as the distances (similarities) between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one cluster fewer.
3. Compute distances (similarities) between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N. (*)
Step 3 can be done in different ways, which is what distinguishes single-linkage from complete-linkage and average-linkage clustering.
In single-linkage clustering (also called the connectedness or minimum method), we consider the distance between one cluster and another cluster to be equal to the shortest distance from any member of one cluster to any member of the other cluster. If the data consist of similarities, we consider the similarity between one cluster and another cluster to be equal to the greatest similarity from any member of one cluster to any member of the other cluster.
In complete-linkage clustering (also called the diameter or maximum method), we consider the distance between one cluster and another cluster to be equal to the greatest distance from any member of one cluster to any member of the other cluster.
In average-linkage clustering, we consider the distance between one cluster and another cluster to be equal to the average distance over all pairs made of one member from each cluster.
A variation on average-linkage clustering is the UCLUS method of R. D'Andrade (1978), which uses the median distance instead; the median is much less sensitive to outliers than the average.
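The linkage rules above differ only in how they summarize the item-to-item distances between two clusters. A minimal sketch of that idea follows, assuming clusters are plain lists and the item distance is passed in as a function; the name `cluster_distance`, the rule labels, and the toy data are illustrative choices, not part of the original tutorial.

```python
from statistics import mean, median


def cluster_distance(a, b, d, rule="single"):
    """Distance between clusters a and b (lists of items) under a linkage rule.

    d(x, y) is the item-to-item distance. The rule names are labels chosen
    for this sketch.
    """
    # Every distance between a member of a and a member of b.
    pairwise = [d(x, y) for x in a for y in b]
    if rule == "single":      # shortest cross-cluster distance
        return min(pairwise)
    if rule == "complete":    # greatest cross-cluster distance
        return max(pairwise)
    if rule == "average":     # mean over all cross-cluster pairs
        return mean(pairwise)
    if rule == "median":      # median over all cross-cluster pairs
        return median(pairwise)
    raise ValueError(f"unknown rule: {rule}")


def d(x, y):
    """Made-up one-dimensional items: distance is the absolute difference."""
    return abs(x - y)


a, b = [0, 1], [4, 5, 100]   # 100 plays the role of an outlier
for rule in ("single", "complete", "average", "median"):
    print(rule, cluster_distance(a, b, d, rule))
```

With this toy data the single outlier pulls the average far above the median, which is the robustness argument made for the median rule.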
This kind of hierarchical clustering is called agglomerative because it merges clusters iteratively. There is also divisive hierarchical clustering, which does the reverse: it starts with all objects in one cluster and subdivides them into smaller pieces. Divisive methods are not generally available and have rarely been applied.
(*) Of course there is no point in having all N items grouped in a single cluster but, once you have the complete hierarchical tree, if you want k clusters you just have to cut the k-1 longest links.

Single-Linkage Clustering: The Algorithm
Let’s now take a deeper look at how Johnson’s algorithm works in the case of single-linkage clustering.
The algorithm is an agglomerative scheme that erases rows and columns in the proximity matrix as old clusters are merged into new ones.
The N*N proximity matrix is D = [d(i,j)]. The clusterings are assigned sequence numbers 0, 1, ..., (N-1), and L(k) is the level of the kth clustering. A cluster with sequence number m is denoted (m), and the proximity between clusters (r) and (s) is denoted d[(r),(s)].
The algorithm is composed of the following steps:
1. Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0.
2. Find the least dissimilar pair of clusters in the current clustering, say pair (r), (s), according to
d[(r),(s)] = min d[(i),(j)]
where the minimum is over all pairs of clusters in the current clustering.
3. Increment the sequence number: m = m + 1. Merge clusters (r) and (s) into a single cluster to form the next clustering m. Set the level of this clustering to
L(m) = d[(r),(s)]
4. Update the proximity matrix, D, by deleting the rows and columns corresponding to clusters (r) and (s) and adding a row and column corresponding to the newly formed cluster. The proximity between the new cluster, denoted (r,s), and an old cluster (k) is defined as
d[(k),(r,s)] = min{ d[(k),(r)], d[(k),(s)] }
5. If all objects are in one cluster, stop. Else, go to step 2.
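The five steps above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not Johnson's original code: the function name `single_linkage`, the tuple representation of clusters, and the four-item toy matrix are all assumptions made for the sketch.

```python
def single_linkage(labels, dist):
    """Single-linkage clustering on a symmetric distance matrix.

    labels: item names; dist[i][j]: distance between items i and j.
    Returns a list of (m, level, merged_cluster) tuples, one per merge.
    """
    # Step 1: the disjoint clustering, one cluster per item (level 0, m = 0).
    clusters = [(name,) for name in labels]
    n = len(labels)
    d = {
        (clusters[i], clusters[j]): dist[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    }
    history = []
    m = 0
    while len(clusters) > 1:
        # Step 2: find the least dissimilar pair (r), (s).
        r, s = min(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda pair: d[pair],
        )
        # Step 3: increment m, merge, and set L(m) = d[(r),(s)].
        m += 1
        level = d[(r, s)]
        merged = tuple(sorted(r + s))
        # Step 4: remove (r) and (s) from the active clusters and add (r,s).
        # Stale entries for (r) and (s) stay in d but are never looked up again.
        clusters = [c for c in clusters if c not in (r, s)]
        for k in clusters:
            d[(k, merged)] = d[(merged, k)] = min(d[(k, r)], d[(k, s)])
        clusters.append(merged)
        history.append((m, level, merged))
        # Step 5: the loop stops once all objects are in one cluster.
    return history


# Made-up distances between four items, for illustration only.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
history = single_linkage(labels, dist)
for m, level, cluster in history:
    print(m, level, cluster)
```

The search in step 2 scans every pair of active clusters on each pass, which keeps the sketch close to the description but makes it slow for large N.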

An Example
Let’s now see a simple example: a hierarchical clustering of distances in kilometers between some Italian cities. The method used is single-linkage.
Input distance matrix (L = 0 for all the clusters):

      BA   FI   MI   NA   RM   TO
BA     0  662  877  255  412  996
FI   662    0  295  468  268  400
MI   877  295    0  754  564  138
NA   255  468  754    0  219  869
RM   412  268  564  219    0  669
TO   996  400  138  869  669    0

The nearest pair of cities is MI and TO, at distance 138. These are merged into a single cluster called "MI/TO". The level of the new cluster is L(MI/TO) = 138 and the new sequence number is m = 1.
Then we compute the distance from this new compound object to all other objects. In single-linkage clustering the rule is that the distance from the compound object to another object is equal to the shortest distance from any member of the cluster to the outside object. So the distance from "MI/TO" to RM is chosen to be 564, which is the distance from MI to RM (shorter than the 669 from TO to RM), and so on.
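As a quick check of this update rule, the row for "MI/TO" can be computed directly from the MI and TO rows of the input matrix. The snippet below is only a sketch; the variable names are illustrative.

```python
# Distances (km) from MI and from TO to the other cities, copied from the
# input distance matrix of the example.
d_mi = {"BA": 877, "FI": 295, "NA": 754, "RM": 564}
d_to = {"BA": 996, "FI": 400, "NA": 869, "RM": 669}

# Single-linkage update: for each outside city keep the shorter distance.
d_mi_to = {city: min(d_mi[city], d_to[city]) for city in d_mi}
print(d_mi_to)  # {'BA': 877, 'FI': 295, 'NA': 754, 'RM': 564}
```

In this particular case MI is closer than TO to every other city, so the new row coincides with the old MI row.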
After merging MI with TO we obtain the following matrix:

         BA   FI  MI/TO   NA   RM
BA        0  662    877  255  412
FI      662    0    295  468  268
MI/TO   877  295      0  754  564
NA      255  468    754    0  219
RM      412  268    564  219    0

min d(i,j) = d(NA,RM) = 219 => merge NA and RM into a new cluster called NA/RM
L(NA/RM) = 219
m = 2

         BA   FI  MI/TO  NA/RM
BA        0  662    877    255
FI      662    0    295    268
MI/TO   877  295      0    564
NA/RM   255  268    564      0

min d(i,j) = d(BA,NA/RM) = 255 => merge BA and NA/RM into a new cluster called BA/NA/RM
L(BA/NA/RM) = 255
m = 3

            BA/NA/RM   FI  MI/TO
BA/NA/RM           0  268    564
FI               268    0    295
MI/TO            564  295      0

min d(i,j) = d(BA/NA/RM,FI) = 268 => merge BA/NA/RM and FI into a new cluster called BA/FI/NA/RM
L(BA/FI/NA/RM) = 268
m = 4

               BA/FI/NA/RM  MI/TO
BA/FI/NA/RM              0    295
MI/TO                  295      0

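Finally, we merge the last two clusters at level 295 (m = 5).
The process is summarized by the following hierarchical tree, listed here by merge level:

138: MI + TO -> MI/TO
219: NA + RM -> NA/RM
255: BA + NA/RM -> BA/NA/RM
268: BA/NA/RM + FI -> BA/FI/NA/RM
295: BA/FI/NA/RM + MI/TO -> all six cities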
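Once the merges are known, the footnote (*) earlier in the tutorial can be tried out on this example: to get k clusters, undo the k-1 longest links. The sketch below does this by replaying all but the last k-1 merges. It assumes the merges are listed in order of nondecreasing level, as they are here; the function name `cut_tree` and the tuple representation are illustrative choices.

```python
def cut_tree(items, merges, k):
    """Return k clusters by replaying all but the last k-1 merges.

    merges: list of (level, cluster_a, cluster_b) in the order they happened;
    each cluster is a sorted tuple of item names.
    """
    clusters = {(item,) for item in items}
    for level, a, b in merges[: len(items) - k]:
        clusters -= {a, b}                  # the two old clusters disappear
        clusters.add(tuple(sorted(a + b)))  # the merged cluster replaces them
    return sorted(clusters)


cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
# Merges and levels (km) as read off the worked example.
merges = [
    (138, ("MI",), ("TO",)),
    (219, ("NA",), ("RM",)),
    (255, ("BA",), ("NA", "RM")),
    (268, ("BA", "NA", "RM"), ("FI",)),
    (295, ("BA", "FI", "NA", "RM"), ("MI", "TO")),
]
print(cut_tree(cities, merges, 2))
# [('BA', 'FI', 'NA', 'RM'), ('MI', 'TO')]
print(cut_tree(cities, merges, 3))
# [('BA', 'NA', 'RM'), ('FI',), ('MI', 'TO')]
```

Cutting at k = 2 undoes only the 295 km link, separating MI/TO from the other four cities; k = 3 also undoes the 268 km link and leaves FI on its own.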

Problems
The main weaknesses of agglomerative clustering methods are: they do not scale well, with a time complexity of at least O(n^2), where n is the total number of objects; and they can never undo what was done previously.

Bibliography
S. C. Johnson (1967): "Hierarchical Clustering Schemes", Psychometrika, 2:241-254
R. D'Andrade (1978): "U-Statistic Hierarchical Clustering", Psychometrika, 4:58-67
Andrew Moore: "K-means and Hierarchical Clustering - Tutorial Slides", http://www-2.cs.cmu.edu/~awm/tutorials/kmeans.html
Osmar R. Zaïane: "Principles of Knowledge Discovery in Databases - Chapter 8: Data Clustering", http://www.cs.ualberta.ca/~zaiane/courses/cmput690/slides/Chapter8/index.html
Stephen P. Borgatti: "How to explain hierarchical clustering", http://www.analytictech.com/networks/hiclus.htm
Maria Irene Miranda: "Clustering methods and algorithms", http://www.cse.iitb.ac.in/dbms/Data/Courses/CS632/1999/clustering/dbms.htm

http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
