A good thread about some philosophical questions on PAC-Bayes, Occam’s Razor, and Bayesian priors.

An introduction to PAC-Bayesian learning theory.

PAC-Bayes bounds are a generalization of the Occam's razor bound to algorithms which output a distribution over classifiers rather than just a single classifier. Since a point-mass distribution concentrated on a single classifier is a special case, this is a strict generalization. Most learning algorithms do not output a distribution over base classifiers; instead, they output either a single classifier or an average over base classifiers. Nonetheless, PAC-Bayes bounds are interesting for several reasons:

PAC-Bayes bounds are, in practice, much tighter than most common VC-dimension-based approaches on continuous classifier spaces. This can be shown by applying them to stochastic neural networks (see section 13) as well as to other classifiers. It can also be seen directly: when the PAC-Bayes bound is specialized to discrete hypothesis spaces, only O(ln m) sample complexity is lost.
Due to this achievable tightness, the result motivates new learning algorithms that explicitly limit the amount of overfitting a learning algorithm incurs.
The result found here will turn out to be useful for averaging hypotheses.
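The discrete-space specialization mentioned above can be sketched numerically. The snippet below is a minimal illustration, assuming the McAllester form of the PAC-Bayes bound (the square-root form in terms of the KL divergence between posterior and prior); the function names are illustrative, not from the original text:

```python
from math import log, sqrt

def pac_bayes_bound(emp_err, kl, m, delta):
    """McAllester-style PAC-Bayes bound on the true error of the Gibbs
    classifier: with probability >= 1 - delta over an i.i.d. sample of
    size m, true_err <= emp_err + sqrt((KL + ln(m/delta)) / (2(m-1)))."""
    return emp_err + sqrt((kl + log(m / delta)) / (2 * (m - 1)))

def occam_style_bound(emp_err, num_hypotheses, m, delta):
    """Specialization to a discrete hypothesis space: a uniform prior over
    |H| hypotheses and a point-mass posterior on the chosen hypothesis
    give KL = ln |H|, recovering an Occam's-razor-style bound."""
    return pac_bayes_bound(emp_err, log(num_hypotheses), m, delta)

# Example: 1000 hypotheses, 10000 samples, 95% confidence.
b = occam_style_bound(emp_err=0.1, num_hypotheses=1000, m=10000, delta=0.05)
```

The extra term contributed by the hypothesis space is ln |H|, which enters the bound on the same footing as ln(m/delta); this is the sense in which only O(ln m) sample complexity is lost relative to the continuous case.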