Suffix tree clustering

From Wikipedia, the free encyclopedia

Suffix tree clustering, often abbreviated as STC, is an approach to clustering that uses suffix trees.[1] A suffix tree built over the word strings being clustered keeps track of all word n-grams of any length that occur in them, while allowing further strings to be inserted incrementally, one after another. This has the advantage that a large number of clusters can be handled sequentially. A potential disadvantage is that it also increases the number of candidate documents that must be examined when handling large data sets. Suffix tree clustering can be either decompositional (top-down) or agglomerative (bottom-up) in nature, depending on the type of data being handled.[2]
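
The idea of tracking every word n-gram while inserting strings one at a time can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the cited sources: it uses a naive, uncompressed word-level suffix trie rather than a true linear-time suffix tree, the class name `SuffixTrie`, the method `base_clusters`, and the `min_docs` threshold are hypothetical, and the three sample strings are made up for illustration. Each trie node stands for one phrase and records which documents contain it; any phrase shared by at least `min_docs` documents is reported as a candidate cluster.

```python
class SuffixTrie:
    """Naive word-level suffix trie (illustrative only, not a linear-time suffix tree)."""

    def __init__(self):
        self.children = {}   # next word -> child node
        self.docs = set()    # ids of documents containing the phrase ending at this node

    def insert(self, doc_id, text):
        """Insert every word-level suffix of one document; documents can be added one by one."""
        words = text.lower().split()
        for start in range(len(words)):
            node = self
            for word in words[start:]:
                node = node.children.setdefault(word, SuffixTrie())
                node.docs.add(doc_id)

    def base_clusters(self, min_docs=2, prefix=()):
        """Yield (phrase, document ids) for every phrase found in at least min_docs documents."""
        for word, child in self.children.items():
            phrase = prefix + (word,)
            if len(child.docs) >= min_docs:
                yield " ".join(phrase), frozenset(child.docs)
            yield from child.base_clusters(min_docs, phrase)


# Made-up sample documents, inserted incrementally.
trie = SuffixTrie()
for doc_id, text in enumerate(
    ["cat ate cheese", "mouse ate cheese too", "cat ate mouse too"]
):
    trie.insert(doc_id, text)

clusters = dict(trie.base_clusters())
for phrase, doc_ids in sorted(clusters.items()):
    print(phrase, sorted(doc_ids))
```

Because every suffix of every document is walked word by word, this sketch takes quadratic time and space in the document length; a compressed suffix tree would be needed for the incremental, large-scale behaviour described above.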

References

  1. ^ Branson, Steve; Greenberg, Ari. "Clustering Web Search Results Using Suffix Tree Methods, CS276A Final Project" (PDF). www.stanford.edu. Stanford University. Retrieved 2 January 2015.
  2. ^ Davis, Ernest. "Lecture 4: Clustering". www.cs.nyu.edu. New York University. Retrieved 2 January 2015.