Abstract
Generation of Object Cluster Hierarchies is a new variant of Hierarchical Clustering that is attracting growing interest in the field of Machine Learning. Because the approach is new, there is a lack of tools for the systematic analysis and comparison of Object Cluster Hierarchies, which inhibits its further development. In this paper, we propose a novel method for generating hierarchical structures of data, based on the Tree-Structured Stick Breaking Process, that can be used for benchmarking. The article presents a thorough empirical and theoretical analysis of the method, revealing its characteristics. More importantly, it provides intuition on how to set the model parameters, together with a set of benchmarking datasets. The conducted experiments show the usefulness of the model, which achieves high flexibility in generating a wide range of differently structured data. The developed generator and the proposed benchmarks are publicly available (this http URL).
URL
http://arxiv.org/abs/1606.05681