A Linear Time Partitioning Algorithm For Frequency Weighted Impurity Functions
Thuan Nguyen, Thinh Nguyen
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 10:05
Partitioning algorithms play a key role in machine learning, signal processing, and communications. They are used in many well-known NP-hard problems such as k-means clustering and vector quantization. The goodness of a partition scheme is measured by a given impurity function over the resulted partitions. The optimal partition is one(s) with the minimum impurity. Practical algorithms for finding an optimal partitioning are approximate, heuristic, and often assume certain properties of the given impurity function such as concavity/convexity. In this paper, we propose a heuristic, efficient (linear time) algorithm for finding the minimum impurity for a broader class of impurity functions which includes popular impurities such as Gini index and entropy. We also make a connection to a well-known result which states that the optimal partitions correspond to the regions separated by hyperplane cuts in the probability space of the posterior distribution.