By Mohamed Ehab Sabry
feel free to reach out to me on LinkedIn
www.linkedin.com
k-NN in short
- non-parametric
to learn what that means, read this :
Parametric vs Nonparametric Machine Learning Algorithms
- uses proximity
- typically used as a classification algorithm
- it works based on the assumption that similar points can be found near one another.
- in classification, it relies on a “plurality vote”, not a majority vote: a majority requires more than 50% of the votes, which is not necessarily achievable when you have multiple classes; the winning class only needs more votes than any other class.
- we must first choose how to measure the distance between points; there are many metrics, and the most widely used is Euclidean distance.
- it belongs to the family of lazy learning models (it just stores the training dataset and doesn’t actually learn anything at training time)
- also called an instance-based or memory-based learning method (since it relies heavily on memory to store all of its training data)
- used in:
- simple recommendation systems
- pattern recognition
- data mining
- financial market predictions
- intrusion detection (e.g., detecting website breaches)
- more
- avoid it when:
- the dataset is getting “huge”
- because every prediction has to compute distances to all stored points, making it increasingly slow and memory-hungry
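The ideas above (lazy learning, Euclidean distance, plurality voting) can be sketched in a few lines of Python. This is a minimal illustration with made-up data and hypothetical function names, not an optimized implementation:

```python
# Minimal k-NN classifier sketch: "training" is just storing the data
# (lazy learning); prediction computes Euclidean distances to every
# stored point and takes a plurality vote among the k nearest labels.
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Rank all stored points by distance to the query point...
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda point: euclidean(point[0], query))
    # ...then take a plurality vote among the k nearest labels:
    # the label with the most votes wins, even without a 50% majority.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Toy example: two small clusters of 2-D points.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, query=(2, 2), k=3))  # → A
print(knn_predict(X, y, query=(9, 9), k=3))  # → B
```

Note that all the work happens inside `knn_predict`: for a huge dataset, every single prediction scans every stored point, which is exactly why k-NN becomes inefficient at scale.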
Compute k-NN: distance metrics
Determine your distance metrics:
To compute the distance, we first need to decide how it will be measured. That choice helps us determine the decision boundaries, i.e., if a point sits at exactly this location, which class does it belong to? Those boundaries can be visualized using Voronoi diagrams.
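To see why the choice of metric matters, here is a small sketch comparing Euclidean distance with Manhattan distance (another common choice) on the same pair of points. Different metrics can rank neighbors differently, which reshapes the decision boundaries:

```python
# Two common distance metrics applied to the same pair of points.
import math

def euclidean(a, b):
    # Straight-line ("as the crow flies") distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # → 5.0
print(manhattan(p, q))  # → 7
```

Because the two metrics disagree on how far apart points are, a neighbor that is "nearest" under one metric may not be nearest under the other, so the class assigned to a query point can change with the metric.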
