I have extracted the words from a set of URLs and calculated the cosine similarity between each pair of URLs' contents. I have also normalized the values to the range 0-1 (using min-max). Now I need to cluster the URLs based on the cosine similarity values to find similar URLs. Which clustering algorithm is suitable? Please suggest a dynamic clustering method, because the number of URLs will grow on demand, so a dynamic method seems more natural. Please correct me if you feel I am heading in the wrong direction. Thanks in anticipation.
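For reference, the similarity step described above could look roughly like the sketch below. It is only an illustration: the URLs, the word lists, and the helper name `cosine_similarity` are made up, and raw word counts are assumed as the term weights.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bags of words (lists of tokens)."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    # Only words present in both pages contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical URLs and extracted words, for illustration only.
pages = {
    "http://example.com/a": ["python", "clustering", "kmeans", "python"],
    "http://example.com/b": ["python", "clustering", "cosine"],
    "http://example.com/c": ["cooking", "recipes", "pasta"],
}

urls = sorted(pages)
# Pairwise similarity for every ordered pair of URLs.
similarity = {
    (u, v): cosine_similarity(pages[u], pages[v]) for u in urls for v in urls
}
```

Because word counts are never negative, every value produced this way already falls between 0 and 1, with 0 meaning no shared words.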
K-means clustering can be used for online learning, but you need to select the number of clusters a priori. Also, I think you shouldn't normalize the data, because cosine similarity already yields values in the range [0, 1] for non-negative word-count vectors. Min-max normalization can lead to information loss.
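A minimal sketch of what online k-means could look like here, assuming each URL has already been turned into a word-count vector over a fixed vocabulary. The class name `OnlineKMeans`, the method name `partial_fit`, and the sample vectors are hypothetical. This is a sequential (one point at a time) variant that keeps centroids at unit length so the dot product equals the cosine similarity; it is not the only way to do it, and the first k points are simply taken as the initial centroids.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class OnlineKMeans:
    """Sequential k-means on unit vectors; k must be chosen up front."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # unit-length centroid vectors
        self.counts = []     # number of points assigned to each centroid

    def partial_fit(self, vec):
        """Assign one new vector to a cluster and update that centroid."""
        v = normalize(vec)
        # Assumption: the first k points seed the k centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(v)
            self.counts.append(1)
            return len(self.centroids) - 1
        # For unit vectors the dot product is the cosine similarity.
        best = max(
            range(self.k),
            key=lambda i: sum(c * x for c, x in zip(self.centroids[i], v)),
        )
        self.counts[best] += 1
        rate = 1.0 / self.counts[best]
        # Move the centroid toward the new point, then re-normalize it.
        moved = [c + rate * (x - c) for c, x in zip(self.centroids[best], v)]
        self.centroids[best] = normalize(moved)
        return best


# Hypothetical word-count vectors arriving one URL at a time.
stream = [[3, 1, 0], [0, 0, 2], [2, 1, 0], [0, 1, 5]]
model = OnlineKMeans(k=2)
labels = [model.partial_fit(v) for v in stream]
```

New URLs can be fed to `partial_fit` as they arrive, but the number of clusters stays fixed at the value passed in, which is the limitation mentioned above.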