I’ve been using the Jaccard similarity index in a lot of work recently. Given two sets, the Jaccard index is the ratio of the size of their intersection to the size of their union. In other words, divide the number of items the two collections share by the total number of unique items across both.
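As a rough illustration, that definition might be sketched in Python like this. The function name `jaccard_index` and the sample sentences are made up for the example, and returning 1.0 for two empty collections is an assumed convention rather than part of the definition.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty collections count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical sample data: two short sentences split into words.
first = "the quick brown fox".split()
second = "the lazy brown dog".split()

# Shared words: "the", "brown" (2). Unique words overall: 6.
print(jaccard_index(first, second))
```

Converting the inputs with `set()` means duplicates are ignored, so a word that appears twice in one sentence still counts once.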
The Jaccard index is a quick and easy way to compare the similarity of two collections that are not in ordinary Euclidean space (like a collection of words or shared neighbors in a friends-of-friends algorithm). You can convert it to the Jaccard distance by subtracting it from 1, that is, 1 - Jaccard index.
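For the shared-neighbors case, the distance could be sketched in Python as below. The `friends` dictionary and the names in it are invented for illustration, and treating two empty neighbor sets as distance 0.0 is an assumption.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two iterables, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty collections are zero distance apart.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical friend lists: each person maps to the set of their neighbors.
friends = {
    "ana": {"bo", "cy", "di"},
    "eli": {"cy", "di", "fay"},
}

# Shared neighbors: "cy", "di" (2). All neighbors: 4. Index 0.5, distance 0.5.
print(jaccard_distance(friends["ana"], friends["eli"]))  # 0.5
```

A distance of 0 means the two people have exactly the same neighbors, and 1 means they share none.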
Text input inspired by eesur.