Reference Priors for Pre-training Neural Nets

We start with the problem of estimating the bias of a coin. We know that we will receive $$n$$ data points.

We choose the prior that maximizes the information the data brings about the bias.
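To make the coin example concrete, here is a minimal sketch. It assumes that "information brought by the data" means the mutual information between the bias $$\theta$$ and the number of heads observed in $$n$$ flips, and that the bias is restricted to a finite grid. The optimizer is the Blahut-Arimoto algorithm, a standard way to maximize mutual information over an input distribution; it is a choice made for this illustration, not necessarily the procedure used later in the post. The function names (`binomial_likelihood`, `mutual_information`, `blahut_arimoto`) and the grid size are hypothetical.

```python
from math import comb, log

import numpy as np


def binomial_likelihood(thetas, n):
    """lik[i, k] = P(k heads in n flips | bias = thetas[i])."""
    ks = np.arange(n + 1)
    coeffs = np.array([comb(n, k) for k in ks], dtype=float)
    t = thetas[:, None]
    return coeffs * t**ks * (1.0 - t) ** (n - ks)


def mutual_information(prior, lik):
    """Return (I(theta; k) in nats, per-theta KL from the marginal over k)."""
    marginal = prior @ lik
    with np.errstate(divide="ignore", invalid="ignore"):
        # Entries with zero likelihood contribute nothing to the KL sum.
        log_ratio = np.where(lik > 0, np.log(lik / marginal), 0.0)
    kl = np.sum(lik * log_ratio, axis=1)
    return float(prior @ kl), kl


def blahut_arimoto(lik, iters=2000):
    """Approximate the grid prior that maximizes I(theta; k)."""
    prior = np.full(lik.shape[0], 1.0 / lik.shape[0])  # start from uniform
    for _ in range(iters):
        _, kl = mutual_information(prior, lik)
        prior = prior * np.exp(kl)  # upweight biases the data can identify
        prior /= prior.sum()
    return prior


# Assumed setup for illustration: n = 10 flips, 51 candidate biases.
thetas = np.linspace(0.0, 1.0, 51)
lik = binomial_likelihood(thetas, n=10)
uniform = np.full(len(thetas), 1.0 / len(thetas))
prior = blahut_arimoto(lik)
print("information under uniform prior:  ", mutual_information(uniform, lik)[0])
print("information under optimized prior:", mutual_information(prior, lik)[0])
```

The grid is only an approximation of the continuous interval $$[0, 1]$$, so the resulting prior depends on the grid resolution and the number of iterations. Note that the prior depends on $$n$$: rerunning the sketch with a different number of flips yields a different distribution.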

We want to tackle a supervised learning task with a limited amount of labeled data. In addition, we have unlabeled data from the same distribution. Can we learn a useful prior from the unlabeled data, such that

In this blog post, we seek to learn a prior, that is, a probability distribution over the model’s weights, from unlabeled data. The prior is selected to work well with small amounts of labeled data.

We want to leverage this prior on small amounts of labeled data. How do we optimally leverage this limited information? We consider the problem of estimating

To do so, we would like to restrict the model space