k-Means Clustering

k-Means clustering may be performed on either the entire gene set or a selected subset by choosing Clustering > k-Means.

The k-Means Clustering method partitions data points into a fixed number of initially arbitrary groups and then repeatedly refines those groups. It first randomly selects one starting point for each cluster and assigns each data point to the closest starting point. The algorithm then defines a new center point for each group by computing the group's centroid, and each data point is reassigned to the closest center point. These two steps are repeated until they no longer yield an improvement. The resulting clusters are displayed in the Line Graph Thumbnails view.
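
The refinement loop described above can be sketched in a few lines of Python. This is only an illustration, not the application's own implementation: the function names are hypothetical, the sketch uses squared Euclidean distance rather than the default correlation-based metric, and "no longer yields improvement" is assumed to mean that no data point changes cluster.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance (an assumption made for this sketch).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    # Component-wise mean of a non-empty list of points.
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Randomly select one starting point for each cluster.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign each data point to the closest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Stop when no point changes cluster (assumed stopping rule).
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center as the centroid of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = centroid(members)
    return labels, centers
```

Calling k_means(points, 2) on two well-separated groups of points returns one label per point, with each group sharing a label, together with the final center points.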

k-Means Clustering may be performed as a single trial, or multiple trials may be run on one set of data points. Performing multiple trials creates multiple sets of clusters, each created independently of the others. Once the cluster sets are created, each set is scored, and the set with the best score is displayed.
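
The following Python sketch shows one way a best-of-several-trials selection could work. The scoring rule used by the application is not stated here, so the sketch assumes the within-cluster sum of squared distances to each centroid, where a lower value is better. The names within_cluster_ss, best_of_trials, and cluster_once are hypothetical; cluster_once stands for any single-trial clustering routine that returns one cluster label per data point.

```python
def within_cluster_ss(points, labels):
    # Assumed score: sum of squared distances from each point to the
    # centroid of its cluster. Lower means tighter clusters.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        center = [sum(column) / len(members) for column in zip(*members)]
        total += sum(
            (x - c) ** 2 for p in members for x, c in zip(p, center)
        )
    return total

def best_of_trials(points, cluster_once, n_trials=20):
    # Run independent trials and keep the labeling with the best score.
    best_labels, best_score = None, float("inf")
    for trial in range(n_trials):
        # A different seed per trial keeps the trials independent.
        labels = cluster_once(points, seed=trial)
        score = within_cluster_ss(points, labels)
        if score < best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score
```

Because each trial begins from different random starting points, separate trials can settle on different cluster sets. Scoring them and keeping the best one reduces the chance of displaying a poor result caused by an unlucky start.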

By default, k-Means Clustering creates 8 clusters and runs 20 trials. The default distance metric is the Standard Pearson’s correlation coefficient. All of these parameters can be changed in the Clustering Parameters window.
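
To illustrate a correlation-based metric, the Python sketch below computes Pearson's correlation coefficient r between two expression profiles and turns it into a distance. The conversion 1 - r is a common convention and is an assumption here; the exact formula the application uses is not given in this section. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    # Pearson's correlation coefficient between two equal-length profiles.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)

def pearson_distance(x, y):
    # Assumed conversion: near 0 for profiles that rise and fall together,
    # near 2 for profiles that move in opposite directions.
    return 1.0 - pearson_r(x, y)
```

With this kind of metric, two genes whose profiles have the same shape are treated as close even when their absolute expression levels differ, which is one reason correlation-based measures are often used for expression data.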

Although best displayed in the Line Graph Thumbnails view, k-Means clustering results can also be displayed in the Heat Map view by adjusting the Clustering Parameters.