Data Mining Algos

Top 10 data mining algorithms in plain English

Today, I’m going to explain in plain English the top 10 most influential data mining algorithms as voted on by 3 separate panels in this survey paper.
Once you know what they are, how they work, what they do and where you can find them, my hope is you’ll have this blog post as a springboard to learn even more about data mining.
What are we waiting for? Let’s get started!
Contents

1. C4.5
2. k-means
3. Support vector machines
4. Apriori
5. EM
6. PageRank
7. AdaBoost
8. kNN
9. Naive Bayes
10. CART
Interesting Resources
Now it’s your turn…

Update 16-May-2015: Thanks to Yuval Merhav and Oliver Keyes for their suggestions, which I’ve incorporated into the post.
Update 28-May-2015: Thanks to Dan Steinberg (yes, the CART expert!) for the suggested updates to the CART section which have now been added.

1. C4.5
What does it do? C4.5 constructs a classifier in the form of a decision tree. To do this, C4.5 is given a set of data representing things that are already classified.
Wait, what’s a classifier? A classifier is a tool in data mining that takes a bunch of data representing things we want to classify and attempts to predict which class the new data belongs to.
What’s an example of this? Sure, suppose a dataset contains a bunch of patients. We know various things about each patient like age, pulse, blood pressure, VO2max, family history, etc. These are called attributes.
Now:
Given these attributes, we want to predict whether the patient will get cancer. The patient can fall into 1 of 2 classes: will get cancer or won’t get cancer. C4.5 is told the class for each patient.
And here’s the deal:
Using a set of patient attributes and the patient’s corresponding class, C4.5 constructs a decision tree that can predict the class for new patients based on their attributes.
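
To make the idea of learning from already-classified patients a bit more concrete, here is a minimal Python sketch of how a decision tree learner might pick the first attribute to split on. Everything in it is an assumption for illustration: the four patients are made up, the attribute names and values are hypothetical, and the scoring uses plain information gain. Real C4.5 uses a refinement called gain ratio, handles numeric attributes and prunes the tree, none of which is shown here.

```python
from collections import Counter
from math import log2

# Hypothetical toy data, invented for illustration only:
# each patient is (attributes, class), where the class is already known.
patients = [
    ({"family_history": "yes", "blood_pressure": "high"}, "cancer"),
    ({"family_history": "yes", "blood_pressure": "normal"}, "cancer"),
    ({"family_history": "no", "blood_pressure": "high"}, "no cancer"),
    ({"family_history": "no", "blood_pressure": "normal"}, "no cancer"),
]


def entropy(labels):
    """How mixed the classes are: 0 when all labels agree, 1 for a 50/50 split."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute):
    """Drop in entropy after grouping the rows by one attribute's values."""
    labels = [cls for _, cls in rows]
    groups = {}
    for attrs, cls in rows:
        groups.setdefault(attrs[attribute], []).append(cls)
    # Weighted average entropy of the groups created by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows):
    """Pick the attribute whose split separates the classes most cleanly."""
    attributes = rows[0][0].keys()
    return max(attributes, key=lambda a: information_gain(rows, a))


print(best_attribute(patients))  # family_history
```

In this made-up data, family history separates the two classes perfectly while blood pressure tells us nothing, so the sketch chooses family history as the first question the tree asks. A full tree learner would repeat the same choice on each resulting group of patients.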
