By Mohamed Ehab Sabry
feel free to reach out to me on LinkedIn
www.linkedin.com
k-NN in short
- non-parametric
to learn what that means, read this :
Parametric vs Nonparametric Machine Learning Algorithms
- uses proximity
- typically used as a classification algorithm
- it works based on the assumption that similar points can be found near one another.
- in classification, it relies on a “plurality vote”, not a majority vote: a majority requires more than 50% of the votes, which is not necessarily achievable when you have multiple classes; the winning class only needs more votes than any other class.
- we must first choose how to measure the distance between points; there are many metrics, and the most widely used is Euclidean distance.
- it belongs to the family of lazy learning models (it just stores the training dataset and doesn’t actually learn anything at training time)
- also called an instance-based or memory-based learning method (since it relies heavily on memory to store all of its training data)
- used in:
- simple recommendation systems
- pattern recognition
- data mining
- financial market predictions
- intrusion detection (e.g., detecting website breaches)
- more
- avoid it when:
- the dataset is getting “huge”
- because every prediction has to compute distances to all stored points, making it increasingly slow and memory-hungry
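The ideas above (lazy learning, Euclidean distance, plurality voting) can be sketched in a few lines of Python. This is a minimal illustration with made-up data and hypothetical function names, not an optimized implementation:

```python
# Minimal k-NN classifier sketch: "training" is just storing the data
# (lazy learning); prediction computes Euclidean distances to every
# stored point and takes a plurality vote among the k nearest labels.
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Rank all stored points by distance to the query point...
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda point: euclidean(point[0], query))
    # ...then take a plurality vote among the k nearest labels:
    # the label with the most votes wins, even without a 50% majority.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Toy example: two small clusters of 2-D points.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, query=(2, 2), k=3))  # → A
print(knn_predict(X, y, query=(9, 9), k=3))  # → B
```

Note that all the work happens inside `knn_predict`: for a huge dataset, every single prediction scans every stored point, which is exactly why k-NN becomes inefficient at scale.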
Compute k-NN: distance metrics
Determine your distance metrics:
To compute the distance, we first need to decide how it will be measured. That choice helps us determine the decision boundaries, i.e., if a point sits at exactly this location, which class does it belong to? Those boundaries can be visualized using Voronoi diagrams.
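To see why the choice of metric matters, here is a small sketch comparing Euclidean distance with Manhattan distance (another common choice) on the same pair of points. Different metrics can rank neighbors differently, which reshapes the decision boundaries:

```python
# Two common distance metrics applied to the same pair of points.
import math

def euclidean(a, b):
    # Straight-line ("as the crow flies") distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # → 5.0
print(manhattan(p, q))  # → 7
```

Because the two metrics disagree on how far apart points are, a neighbor that is "nearest" under one metric may not be nearest under the other, so the class assigned to a query point can change with the metric.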
