Abstract
Large-scale probabilistic representations, including statistical knowledge bases and graphical models, are increasingly in demand. They are built by mining massive sources of structured and unstructured data, the latter often derived from natural language processing techniques. The very nature of the enterprise makes the extracted representations probabilistic. In particular, inducing relations and facts from noisy and incomplete sources via statistical machine learning models means that the labels are either already probabilistic, or that the probabilities approximate confidence. While the progress is impressive, extracted representations essentially enforce the closed-world assumption (CWA), which means that all facts in the database are accorded their corresponding probability, but all other facts have probability zero. The CWA is deeply problematic in most machine learning contexts. A principled solution is needed for representing incomplete and indeterminate knowledge in such models, with imprecise probability models, such as credal networks, being one candidate. In this work, we are interested in the foundational problem of learning such open-world probabilistic models. However, since exact inference in probabilistic graphical models is intractable, the paradigm of tractable learning has emerged, in which one learns data structures (such as arithmetic circuits) that support efficient probabilistic querying. We show here how the computational machinery underlying tractable learning has to be generalized for imprecise probabilities. Our empirical evaluations demonstrate that our regime is also effective.
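To make the closed-world/open-world contrast concrete, here is a minimal illustrative sketch (not taken from the paper): under the CWA, any fact missing from an extracted probabilistic database is treated as having probability zero, whereas an open-world, credal-style reading assigns it an indeterminate probability interval. The facts, names, and numbers below are hypothetical.

```python
# Illustrative sketch only: contrasting the closed-world assumption (CWA)
# with an open-world, interval-valued (credal-style) view of an extracted
# probabilistic knowledge base. All facts and numbers are hypothetical.

# Facts extracted by an information-extraction pipeline, with confidences.
extracted = {
    ("bornIn", "Einstein", "Ulm"): 0.9,
    ("bornIn", "Einstein", "Berlin"): 0.1,
}

def cwa_probability(fact):
    """Closed-world reading: any fact not in the database has probability 0."""
    return extracted.get(fact, 0.0)

def open_world_interval(fact, lower=0.0, upper=1.0):
    """Open-world reading: an unseen fact gets an imprecise interval
    [lower, upper] instead of a point probability of zero."""
    if fact in extracted:
        p = extracted[fact]
        return (p, p)
    return (lower, upper)

unseen = ("bornIn", "Curie", "Warsaw")
print(cwa_probability(unseen))      # 0.0 under the CWA: treated as false
print(open_world_interval(unseen))  # (0.0, 1.0): indeterminate, not false
```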
URL
https://arxiv.org/abs/1901.05847