Abstract
We compute the transition probability between two learning tasks, and show that it decomposes into two factors. The first depends on the geometry of the loss landscape when a model is trained on each task, independent of the particular model used. It is related to an information-theoretic distance function, but is insufficient to predict success in transfer learning, as nearby tasks can be unreachable via fine-tuning. The second factor depends on the ease of traversing the path between the two tasks. With this dynamic component, we derive strict lower bounds on the complexity necessary to learn a task starting from the solution to another, which is one of the most common forms of transfer learning.
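As a rough schematic of the decomposition described above (the notation here is illustrative and not taken from the paper), the transition probability from a source task to a target task can be pictured as the product of a static, geometry-dependent term and a dynamic, path-dependent term:

% Schematic only: p, d, R, and \gamma are hypothetical symbols, not the paper's notation.
\[
  p(\mathcal{D}_1 \to \mathcal{D}_2)
    \;\propto\;
    \underbrace{e^{-\,d(\mathcal{D}_1,\,\mathcal{D}_2)}}_{\text{static: loss-landscape geometry}}
    \;\times\;
    \underbrace{R(\gamma_{1 \to 2})}_{\text{dynamic: ease of traversing the path}}
\]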
URL
http://arxiv.org/abs/1810.02440