Free Essay

Knn vs Nn

In:

Submitted By sbasavar
Words 377
Pages 2
The advantages of KNN algorithm are:

* KNN algorithm is very simple to implement. By using appropriate nearest neighbor search algorithm makes k-NN computationally tractable even for larger data sets, dealing with as high as 60000 data sets in our case.

* KNN algorithm is analytically tractable.

* KNN algorithm is highly adaptive to local information. We have used Euclidean distance metrics in our KNN algorithm to estimate the closest data points. Thus the algorithm is able to take full advantage of the local information in the training datasets to form a highly nonlinear, highly adaptive decision boundaries for each data point.

* KNN algorithm is easily implemented in parallel. For each data set to be evaluated, it checks against the training data-set for the k nearest neighbors around the data set to be evaluated. Since each of these data set is independent of the others, the search and validation can be implemented in parallel.

* Choosing higher value of K will yield smoother decision regions and provide probabilistic information.

The disadvantages of KNN algorithm are:

* KNN algorithm is computationally very intensive as it needs to search through training datasets (of size 60000) to find the nearest neighbors and determine the closest ones for each of the testing data sets which needs to be classified.

* Since the datasets is huge (as we are dealing with 60000 training sets), storage will be a concern as we need to store all the data sets in training sample and use when we need to validate the test datasets.

* As dimensionality of k increases, it becomes difficult to calculate the ‘Euclidean’ distance and finding the appropriate space to find the relevant datasets for classification and computational cost also increases.

* KNN algorithm is considered as a lazy learner as it does not learn anything from the training data and simply uses the training data for classification.

* KNN algorithm will compute the distance and sort all the training data at each prediction, which becomes slow in our case as we are dealing with large training dataset of 60000 samples.

* Since the algorithm does not learn anything from the training data, the KNN classifier may not be robust to noisy data. By changing the value of K, the resulting predicted class label can change as well.

Similar Documents

Premium Essay

Data Mning Algos

...Top 10 data mining algorithms in plain English 1.1K Today, I’m going to explain in plain English the top 10 most influential data mining algorithms as voted on by 3 separate panels in this survey paper. Once you know what they are, how they work, what they do and where you can find them, my hope is you’ll have this blog post as a springboard to learn even more about data mining. What are we waiting for? Let’s get started! Contents [hide]  1. C4.5  2. k-means  3. Support vector machines  4. Apriori  5. EM  6. PageRank  7. AdaBoost  8. kNN  9. Naive Bayes  10. CART  Interesting Resources  Now it’s your turn… Update 16-May-2015: Thanks to Yuval Merhav and Oliver Keyes for their suggestions which I’ve incorporated into the post. Update 28-May-2015: Thanks to Dan Steinberg (yes, the CART expert!) for the suggested updates to the CART section which have now been added. 1. C4.5 What does it do? C4.5 constructs a classifier in the form of a decision tree. In order to do this, C4.5 is given a set of data representing things that are already classified. Wait, what’s a classifier? A classifier is a tool in data mining that takes a bunch of data representing things we want to classify and attempts to predict which class the new data belongs to. What’s an example of this? Sure, suppose a dataset contains a bunch of patients. We know various things about each patient like age, pulse...

Words: 6478 - Pages: 26

Free Essay

Data

...Tutorial on Classification Igor Baskin and Alexandre Varnek Introduction The tutorial demonstrates possibilities offered by the Weka software to build classification models for SAR (Structure-Activity Relationships) analysis. Two types of classification tasks will be considered – two-class and multi-class classification. In all cases protein-ligand binding data will analyzed, ligands exhibiting strong binding affinity towards a certain protein being considered as “active” with respect to it. If it is not known about the binding affinity of a ligand towards the protein, such ligand is conventionally considered as “nonactive” one. In this case, the goal of classification models is to be able to predict whether a new ligand will exhibit strong binding activity toward certain protein biotargets. In the latter case one can expect that such ligands might possess the corresponding type of biological activity and therefore could be used as ‘’hits” for drug design. All ligands in this tutorial are described by means of an extended set of MACCS fingerprints, each of them comprising 1024 bits, the “on” value of each of them indicating the presence of a certain structural feature in ligand, otherwise its value being “off”. Part 1. Two-Class Classification Models. 1. Data and descriptors. The dataset for this tutorial contains 49 ligands of Angeotensin-Converting Enzyme (ACE) and 1797 decoy compounds chosen from the DUD database. The set of "extended" MACCS fingerprints is used as...

Words: 5674 - Pages: 23

Free Essay

Face Recognition

...REVIEWS, REFINEMENTS AND NEW IDEAS IN FACE RECOGNITION Edited by Peter M. Corcoran Reviews, Refinements and New Ideas in Face Recognition Edited by Peter M. Corcoran Published by InTech Janeza Trdine 9, 51000 Rijeka, Croatia Copyright © 2011 InTech All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits to copy, distribute, transmit, and adapt the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source. Statements and opinions expressed in the chapters are these of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book. Publishing Process Manager Mirna Cvijic Technical Editor Teodora Smiljanic Cover Designer Jan Hyrat Image Copyright hfng, 2010. Used under license from Shutterstock.com First published July, 2011 Printed in Croatia A free online edition of this book is available...

Words: 33246 - Pages: 133

Free Essay

There Once Was a Young Wild Pony

...ia ib ic id ie if ig ih ii ij ik il im in io ip iq ir is it iu iv iw ix iy iz ja jb jc jd je jf jg jh ji jj jk jl jm jn jo jp jq jr js jt ju jv jw jx jy jz ka kb kc kd ke kf kg kh ki kj kk kl km kn ko kp kq kr ks kt ku kv kw kx ky kz la lb lc ld le lf lg lh li lj lk ll lm ln lo lp lq lr ls lt lu lv lw lx ly lz ma mb mc md me mf mg mh mi mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq nr ns nt nu nv nw nx ny nz oa ob oc od oe of og oh oi oj ok ol om on oo op oq or os ot ou ov ow ox oy oz pa pb pc pd pe pf pg ph pi pj pk pl pm pn po pp pq pr ps pt pu pv pw px py pz qa qb qc qd qe qf qg qh qi qj qk ql qm qn qo qp qq qr qs qt qu qv qw qx qy qz ra rb rc rd re rf rg rh ri rj rk rl rm rn ro rp rq rr rs rt ru rv rw rx ry rz sa sb sc sd se sf sg sh si sj sk sl sm sn so sp sq sr ss st su sv sw sx sy sz ta tb tc td te tf tg th ti tj tk tl tm tn to tp tq tr ts tt tu tv tw tx ty tz ua ub uc ud ue uf ug uh ui uj uk ul um un uo up uq ur us ut uu uv uw ux uy uz va vb vc vd ve vf vg vh vi vj vk vl vm vn vo vp vq vr vs vt vu vv vw vx vy vz wa wb wc wd we wf wg wh wi wj wk wl wm wn wo wp wq wr ws wt wu wv ww wx wy wz xa xb xc xd xe xf xg xh xi xj xk xl xm xn xo xp xq xr...

Words: 29642 - Pages: 119

Free Essay

Nit-Silchar B.Tech Syllabus

...NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR Bachelor of Technology Programmes amï´>r¶ JH$s g§ñWmZ, m¡Úmo{ à VO o pñ Vw dZ m dY r V ‘ ñ Syllabi and Regulations for Undergraduate PROGRAMME OF STUDY (wef 2012 entry batch) Ma {gb Course Structure for B.Tech (4years, 8 Semester Course) Civil Engineering ( to be applicable from 2012 entry batch onwards) Course No CH-1101 /PH-1101 EE-1101 MA-1101 CE-1101 HS-1101 CH-1111 /PH-1111 ME-1111 Course Name Semester-1 Chemistry/Physics Basic Electrical Engineering Mathematics-I Engineering Graphics Communication Skills Chemistry/Physics Laboratory Workshop Physical Training-I NCC/NSO/NSS L 3 3 3 1 3 0 0 0 0 13 T 1 0 1 0 0 0 0 0 0 2 1 1 1 1 0 0 0 0 4 1 1 0 0 0 0 0 0 2 0 0 0 0 P 0 0 0 3 0 2 3 2 2 8 0 0 0 0 0 2 2 2 2 0 0 0 0 0 2 2 2 6 0 0 8 2 C 8 6 8 5 6 2 3 0 0 38 8 8 8 8 6 2 0 0 40 8 8 6 6 6 2 2 2 40 6 6 8 2 Course No EC-1101 CS-1101 MA-1102 ME-1101 PH-1101/ CH-1101 CS-1111 EE-1111 PH-1111/ CH-1111 Course Name Semester-2 Basic Electronics Introduction to Computing Mathematics-II Engineering Mechanics Physics/Chemistry Computing Laboratory Electrical Science Laboratory Physics/Chemistry Laboratory Physical Training –II NCC/NSO/NSS Semester-4 Structural Analysis-I Hydraulics Environmental Engg-I Structural Design-I Managerial Economics Engg. Geology Laboratory Hydraulics Laboratory Physical Training-IV NCC/NSO/NSS Semester-6 Structural Design-II Structural Analysis-III Foundation Engineering Transportation Engineering-II Hydrology &Flood...

Words: 126345 - Pages: 506

Free Essay

Linear Algebra

...SCHAUM’S outlines SCHAUM’S outlines Linear Algebra Fourth Edition Seymour Lipschutz, Ph.D. Temple University Marc Lars Lipson, Ph.D. University of Virginia Schaum’s Outline Series New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto Copyright © 2009, 2001, 1991, 1968 by The McGraw-Hill Companies, Inc. All rights reserved. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher. ISBN: 978-0-07-154353-8 MHID: 0-07-154353-8 The material in this eBook also appears in the print version of this title: ISBN: 978-0-07-154352-1, MHID: 0-07-154352-X. All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps. McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. To contact a representative please e-mail us at bulksales@mcgraw-hill.com. TERMS OF USE This is a copyrighted work and The McGraw-Hill Companies,...

Words: 229129 - Pages: 917