An interesting paper about determinism, realism, and locality.
http://www.yaronhadad.com/wp-content/uploads/2012/02/term_paper_yaron_hadad1.pdf
Tagged: research

wangxinxi
wangxinxi
An efficient estimator is one that attains equality in the Cramér-Rao bound (CRB).
The Cramér-Rao bound says that the variance of an unbiased estimator cannot be smaller than the inverse of the Fisher information. However, a biased estimator can have variance below the Cramér-Rao bound; accepting a small bias in exchange for lower variance (and often lower mean squared error) is the bias-variance tradeoff.
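A minimal numpy sketch of this effect (my own illustration, not from the original note): for i.i.d. Gaussian data, the unbiased variance estimator divides by n-1 while the biased maximum-likelihood estimator divides by n, and the biased one typically has both lower variance and lower mean squared error.

#!/usr/bin/env python
import numpy

numpy.random.seed(0)
n, trials, true_var = 10, 100000, 4.0

x = numpy.random.normal(0.0, numpy.sqrt(true_var), size=(trials, n))
unbiased = x.var(axis=1, ddof=1)  # divides by n-1: unbiased
mle = x.var(axis=1, ddof=0)       # divides by n: biased MLE

for name, est in [("unbiased", unbiased), ("mle", mle)]:
    bias = est.mean() - true_var
    mse = ((est - true_var) ** 2).mean()
    print("%s: bias=%+.4f  variance=%.4f  mse=%.4f" % (name, bias, est.var(), mse))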
wangxinxi
A major attraction of the connectionist approach to language, apart from its natural relation to neural computation, is that the very same processing mechanisms apply across the full range of linguistic structure.
Douglas L. T. Rohde, David C. Plaut, Connectionist Models of Language Processing

wangxinxi
A good thread about some philosophical questions on PAC-Bayes, Occam's razor, and Bayesian priors:
http://philtcs.wordpress.com/2011/11/03/class-8-occams-razor-the-universal-prior-and-pac-learning/
An introduction to PAC-Bayesian learning theory:
http://people.kyb.tuebingen.mpg.de/seldin/ICML_PACBayes_Tutorial.pdf
PAC-Bayes bounds are a generalization of the Occam's razor bound for algorithms which output a distribution over classifiers rather than just a single classifier. This includes the possibility of a distribution over a single classifier, so it is a generalization. Most classifiers do not output a distribution over base classifiers. Instead, they output either a classifier, or an average over base classifiers. Nonetheless, PAC-Bayes bounds are interesting for several reasons:
PAC-Bayes bounds are much tighter (in practice) than most common VC-related approaches on continuous classifier spaces. This can be shown by application to stochastic neural networks (see section 13) as well as other classifiers. It also can be seen by observation: when specializing the PAC-Bayes bounds to discrete hypothesis spaces, only O(ln m) sample complexity is lost.
Due to the achievable tightness, the result motivates new learning algorithms which strongly limit the amount of overfitting that a learning algorithm will incur.
The result found here will turn out to be useful for averaging hypotheses.
http://hunch.net/~jl/projects/prediction_bounds/thesis/mathml/thesisch6.xml
http://vserver1.cscs.lsa.umich.edu/~crshalizi/notebooks/learningtheory.html
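To get a feel for the shape of these bounds, here is a minimal sketch (my own illustration with made-up numbers, using one common McAllester/Maurer-style form): with probability at least 1 - delta over an i.i.d. sample of size m, simultaneously for all posteriors rho, the true risk exceeds the empirical risk by at most a slack term driven by KL(rho || pi).

import math

def pac_bayes_slack(kl, m, delta=0.05):
    # one common McAllester/Maurer-style PAC-Bayes bound:
    #   R(rho) <= R_hat(rho) + sqrt((KL(rho||pi) + ln(2*sqrt(m)/delta)) / (2*m))
    return math.sqrt((kl + math.log(2.0 * math.sqrt(m) / delta)) / (2.0 * m))

# hypothetical numbers: 10000 samples, posterior 5 nats away from the prior
print("slack: %.4f" % pac_bayes_slack(kl=5.0, m=10000))

Note how the slack shrinks when the posterior stays close to the prior (small KL); that is the Occam flavor of the bound.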

wangxinxi
A lot of philosophical discussion about latent variable models in this text:
http://dare.uva.nl/document/195746 
wangxinxi
rational-weighted recurrent NNs having boolean activation functions (simple thresholds) are equivalent to finite state automata (Minsky, "Computation: finite and infinite machines", 1967);
rational-weighted recurrent NNs having linear sigmoid activation functions are equivalent to Turing machines (Siegelmann and Sontag, "On the computational power of neural nets", 1995);
real-weighted recurrent NNs having linear sigmoid activation functions are more powerful than Turing machines (Siegelmann and Sontag, "Analog computation via neural networks", 1993);
but …
real-weighted recurrent NNs with Gaussian noise on the outputs cannot recognize arbitrary regular languages (Maass and Sontag, "Analog Neural Nets with Gaussian or Other Common Noise Distributions Cannot Recognize Arbitrary Regular Languages", 1995);
Some interesting slides:
http://gjclokhorst.nl/hypercomputation.helsinki.pdf
http://homepages.ipact.nl/~lokhorst/hypercomputationUCL.pdf
To conclude: there is probably no true Turing machine (TM) in physical reality, and it is just as unlikely that an "accurate" analog neural net exists. Proving that analog neural nets have more computational power than TMs is perfectly sound mathematics, but neither machine is physically implementable. A toy illustration of the first result above (threshold RNNs as finite automata) is sketched below.
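As a toy sketch of the Minsky-style result (my own illustration, not from the slides): a recurrent net of boolean threshold units can simulate a finite state automaton. Here two hidden threshold units and one recurrent state unit compute XOR, so the net recognizes the regular language of bit strings with an odd number of ones.

def step(v):
    # boolean threshold unit: fires iff its weighted input exceeds the threshold
    return 1 if v > 0 else 0

def odd_parity_rnn(bits):
    s = 0  # recurrent state unit: parity of the bits seen so far
    for x in bits:
        h_or = step(s + x - 0.5)       # fires if s OR x
        h_and = step(s + x - 1.5)      # fires if s AND x
        s = step(h_or - h_and - 0.5)   # s XOR x = (s OR x) AND NOT (s AND x)
    return s

print(odd_parity_rnn([1, 0, 1, 1]))  # 1: three ones, odd parity
print(odd_parity_rnn([1, 1]))        # 0: two ones, even parity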

wangxinxi
Gradient descent
Conjugate gradient: a first-order optimization method.
Newton's method: like gradient descent, but it multiplies the gradient by the inverse of the Hessian matrix. It converges much faster than gradient descent (quadratically near the optimum). However, since it has to maintain the Hessian matrix, it costs much more memory when there are many parameters.
Quasi-Newton methods: instead of calculating the exact Hessian matrix, they build an approximation to it from first-order information.
BFGS: a quasi-Newton method.
L-BFGS: solves the quadratic memory issue of BFGS by keeping only a few recent gradient updates. http://en.wikipedia.org/wiki/Limited-memory_BFGS
A good introduction to these methods can be found here: http://acdl.mit.edu/mdo/mdo_06/Multivariable.pdf
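For a quick feel of the difference, here is a sketch I'm adding (not from the original post) that compares SciPy's L-BFGS-B against naive fixed-step gradient descent on SciPy's built-in Rosenbrock test function:

#!/usr/bin/env python
import numpy
from scipy.optimize import minimize, rosen, rosen_der

x0 = numpy.array([-1.2, 1.0])

# L-BFGS: limited-memory quasi-Newton
res = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")
print("L-BFGS-B: x=%s after %d iterations" % (res.x, res.nit))

# naive gradient descent with a small fixed step size, for comparison
x = x0.copy()
for i in range(20000):
    x -= 1e-3 * rosen_der(x)
print("gradient descent: x=%s after %d iterations" % (x, i + 1))

L-BFGS-B typically reaches the optimum (1, 1) in a few dozen iterations, while fixed-step gradient descent needs many thousands of steps to get close.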

wangxinxi
Before compiling numpy, please ensure that you have removed ATLAS, OpenBLAS, etc.
To compile numpy with MKL, the site.cfg for numpy is as follows:
[mkl]
library_dirs = /opt/intel/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
mkl_libs = mkl_rt
lapack_libs = mkl_rt
To compile numpy with ACML, the site.cfg is as follows:
[blas]
blas_libs = cblas, acml_mp
library_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/lib
include_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/include
[lapack]
language = f77
lapack_libs = mkl_rt
library_dirs = /opt/intel/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
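After building, you can verify which BLAS/LAPACK libraries numpy actually linked against with numpy's built-in config dump:

import numpy
numpy.show_config()  # prints the blas/lapack sections numpy was compiled against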

wangxinxi
To test the efficiency of your numpy:
#!/usr/bin/env python
import timeit

# build a symmetric positive-definite matrix once in the setup
setup = "import numpy;\
import numpy.linalg as linalg;\
x = numpy.random.random((1000,1000));\
z = numpy.dot(x, x.T)"
count = 5

t = timeit.Timer("linalg.cholesky(z)", setup=setup)
print "cholesky:", t.timeit(count) / count, "sec"

t = timeit.Timer("linalg.inv(z)", setup=setup)
print "inv:", t.timeit(count) / count, "sec"
This is my result
cholesky: 0.0482553958893 sec
inv: 0.102989816666 sec
numpy.dot is a bit special: it depends on ATLAS by default, which means that if you have not installed ATLAS on your system, you can end up with a very slow numpy.dot. To make numpy use another implementation instead of ATLAS, you can follow this post: https://xinxiwang.wordpress.com/2014/02/03/efficient-numpy-dot-relies-on-atlas-because-the/
#!/usr/bin/env python
import numpy
import sys
import timeit

# numpy.core._dotblas only exists when numpy.dot is backed by a fast CBLAS
try:
    import numpy.core._dotblas
    print 'FAST BLAS'
except ImportError:
    print 'slow blas'

print "version:", numpy.__version__
print "maxint:", sys.maxint
print

x = numpy.random.random((1000,1000))
setup = "import numpy; x = numpy.random.random((1000,1000))"
count = 5

t = timeit.Timer("numpy.dot(x, x.T)", setup=setup)
print "dot:", t.timeit(count) / count, "sec"
The following is my result
FAST BLAS
version: 1.8.0
maxint: 9223372036854775807
dot: 0.0540486335754 sec
My CPU model:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
stepping        : 5
microcode       : 0x11
cpu MHz         : 1596.000
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 16
initial apicid  : 16
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 4255.80
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

wangxinxi
To get an efficient numpy.dot, numpy needs to be compiled against ATLAS. This is probably a bug in numpy, because numpy.dot should be able to work with any CBLAS implementation.
To fix this bug, we first comment out the following two lines of numpy/core/setup.py
if ('NO_ATLAS_INFO', 1) in blas_info.get('define_macros', []):
    return None  # dotblas needs ATLAS, Fortran compiled blas will not be sufficient.
Then, we compile numpy without ATLAS.
Finally, we manually compile _dotblas using the following command:
gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/numpy/core/blasdot/_dotblas.o -LCBLAS_PATH -lcblas -o build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so -lpython2.7
You need to replace CBLAS_PATH with the directory that holds your libcblas.so file.