Nothing is more practical than a good theory
WTF, how could you let Hank die? W. W., you are a completely disgusting, greedy, and cowardly loser.
An interesting paper about determinism, realism, and locality.
pickle.dump(model, open(sys.argv[1], 'wb'))
SystemError: error return without exception set
The error can be reproduced by the following code:
python2.7 -c "import cPickle; cPickle.dumps('\x00' * 2**31)"
Changing cPickle to pickle may work around this problem if you do not care about pickle's efficiency. To fully fix the issue, you need to switch to Python 3. Using numpy.save or HDF5 may be a good choice too.
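As a minimal sketch of the numpy.save workaround (assuming the model is a NumPy array; `model.npy` is a hypothetical filename), numpy.save writes the raw buffer directly and so avoids cPickle's 2**31-byte string limit:

```python
import numpy as np

# Hypothetical stand-in for a large model: a NumPy array.
model = np.zeros(1000, dtype=np.uint8)

# numpy.save serializes the array buffer itself, sidestepping
# the 2**31-byte cPickle string limit on Python 2.
np.save('model.npy', model)

restored = np.load('model.npy')
print(restored.shape)  # -> (1000,)
```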
An efficient estimator is one that achieves equality in the Cramer-Rao bound (CRB).
The Cramer-Rao bound says that the variance of an unbiased estimator cannot be smaller than the inverse of the Fisher information. However, a biased estimator can have variance smaller than the Cramer-Rao bound; this is an instance of the bias-variance trade-off.
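A minimal simulation sketch of this point (my own example, assuming i.i.d. Gaussian samples; the shrinkage factor 0.5 is arbitrary): the sample mean of n draws from N(mu, sigma^2) attains the CRB sigma^2/n, while the biased shrinkage estimator 0.5 * x̄ has variance 0.25 * sigma^2/n, below the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 2.0, 1.0, 10, 20000

# CRB for the mean of N(mu, sigma^2) from n i.i.d. samples:
# Var(unbiased estimator) >= sigma^2 / n (inverse Fisher information).
crb = sigma**2 / n

samples = rng.normal(mu, sigma, size=(trials, n))
unbiased = samples.mean(axis=1)  # sample mean: unbiased, attains the CRB
shrunk = 0.5 * unbiased          # biased toward 0, but lower variance

print(crb, unbiased.var(), shrunk.var())
```

The shrunk estimator's empirical variance comes out well below the CRB, at the price of a bias toward zero.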
A major attraction of the connectionist approach to language, apart from its natural relation to neural computation, is that the very same processing mechanisms apply across the full range of linguistic structure.
Douglas L. T. Rohde, David C. Plaut, Connectionist Models of Language Processing
A good thread about some philosophical questions on PAC-Bayes, Occam’s Razor, and Bayesian priors.
An introduction to PAC-Bayesian learning theory.
PAC-Bayes bounds are a generalization of the Occam’s razor bound for algorithms which output a distribution over classifiers rather than just a single classifier. This includes the possibility of a distribution over a single classifier, so it is a generalization. Most classifiers do not output a distribution over base classifiers. Instead, they output either a classifier, or an average over base classifiers. Nonetheless, PAC-Bayes bounds are interesting for several reasons:
PAC-Bayes bounds are much tighter (in practice) than most common VC-related approaches on continuous classifier spaces. This can be shown by application to stochastic neural networks (see section 13) as well as other classifiers. It also can be seen by observation: when specializing the PAC-Bayes bounds on discrete hypothesis spaces, only O(ln m) sample complexity is lost.
Due to the achievable tightness, the result motivates new learning algorithms which strongly limit the amount of overfitting that a learning algorithm will incur.
The result found here will turn out to be useful for averaging hypotheses.
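As a reference point (a standard statement of the theorem, not quoted from the thread): for a prior P over classifiers fixed before seeing the m samples, with probability at least 1 - δ, simultaneously for all posteriors Q,

```latex
% PAC-Bayes bound (McAllester / Langford form):
% \hat{e}_S(Q) is the average empirical error rate under Q,
% e(Q) the average true error rate under Q.
\mathrm{KL}\!\left(\hat{e}_S(Q) \,\middle\|\, e(Q)\right)
  \le \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{m+1}{\delta}}{m}
```

The KL(Q||P) term is what plays the role of the Occam's razor "description length" when Q concentrates on a single classifier.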
This text contains a lot of philosophical discussion about latent variable models.
rational-weighted recurrent NNs having boolean activation functions (simple thresholds) are equivalent to finite state automata (Minsky, “Computation: finite and infinite machines”, 1967);
rational-weighted recurrent NNs having linear sigmoid activation functions are equivalent to Turing Machines (Siegelmann and Sontag, “On the computational power of neural nets”, 1995);
real-weighted recurrent NNs having linear sigmoid activation functions are more powerful than Turing Machines (Siegelmann and Sontag, “Analog computation via neural networks”, 1993);
real-weighted recurrent NNs with Gaussian noise on the outputs cannot recognize arbitrary regular languages (Maass and Sontag, “Analog Neural Nets with Gaussian or Other Common Noise Distributions Cannot Recognize Arbitrary Regular Languages”, 1995);
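As an illustrative sketch of the first result (my own toy example, not taken from the sources cited): a recurrent net of threshold units with rational (here integer) weights can simulate a finite state automaton, e.g. the two-state parity automaton over {0, 1}, whose transition is state XOR input:

```python
# Toy threshold-unit recurrent net simulating the parity automaton.
def step(state, x):
    # Hidden threshold units computing OR and AND of (state, x).
    h_or = 1 if state + x >= 1 else 0
    h_and = 1 if state + x >= 2 else 0
    # Output unit: OR minus AND thresholded at 1 == XOR(state, x).
    return 1 if h_or - h_and >= 1 else 0

def run(bits):
    state = 0  # automaton start state
    for x in bits:
        state = step(state, x)
    return state  # 1 iff the number of 1-bits is odd

print(run([1, 0, 1, 1]))  # three 1-bits -> odd parity -> 1
```

Each automaton state is encoded in the binary hidden state, and the threshold units implement the transition function, which is the essence of the Minsky construction.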
To conclude: there is probably no true Turing machine (TM) in physical reality, and it is also unlikely that any “exact” analog neural net exists. Proving that analog neural nets have greater computational power than TMs is perfectly fine, but none of these models is physically implementable.