Recent Updates (Page 2)

  • wangxinxi 01:37 on February 13, 2014 Permalink | Reply

    Gradient descent

Conjugate gradient: a first-order optimization method whose search directions are chosen to be mutually conjugate; on an n-dimensional quadratic it converges in at most n steps.

    Newton's method: like gradient descent, but it multiplies the gradient by the inverse of the Hessian matrix. It converges much faster than gradient descent (quadratically near the optimum). However, since it has to build and invert the Hessian, it costs O(n^2) memory and far more computation when there are many parameters.

    Quasi-Newton methods: instead of computing the exact Hessian, they build an approximation of it (or of its inverse) from first-order gradient information.

    BFGS: the most popular quasi-Newton method.
    L-BFGS: limited-memory BFGS; it avoids the quadratic memory cost of BFGS by storing only the last few gradient updates. http://en.wikipedia.org/wiki/Limited-memory_BFGS

    A good introduction to these methods can be found here: http://acdl.mit.edu/mdo/mdo_06/Multi-variable.pdf
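A toy sketch of the two update rules in Python (the quadratic, the starting point, and the step size are made up for illustration):

```python
def f_grad(x):
    return 2.0 * (x - 3.0)   # derivative of f(x) = (x - 3)**2 + 1

def f_hess(x):
    return 2.0               # second derivative; constant for a quadratic

# Gradient descent: many small steps along the negative gradient.
x = 0.0
for _ in range(100):
    x -= 0.1 * f_grad(x)
print(x)   # converges toward the minimum at 3

# Newton's method: scale the gradient by the inverse Hessian.
# On a quadratic the local model is exact, so one step lands on the minimum.
x = 0.0
x -= f_grad(x) / f_hess(x)
print(x)   # exactly 3
```

On non-quadratic functions Newton still typically needs far fewer iterations, but each iteration must form and solve against the Hessian, which is where the memory and compute cost comes from.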

     
  • wangxinxi 13:38 on February 11, 2014 Permalink | Reply

Before compiling numpy, please ensure that you have removed any other BLAS implementations (ATLAS, OpenBLAS, etc.), or the build may pick them up instead.

To compile numpy with MKL, the site.cfg configuration is as follows:

    [mkl]
    library_dirs = /opt/intel/mkl/lib/intel64
    include_dirs = /opt/intel/mkl/include
    mkl_libs = mkl_rt
    lapack_libs = mkl_rt
    

To compile numpy with ACML, the site.cfg configuration is as follows:

    [blas]
    blas_libs = cblas, acml_mp
    library_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/lib
    include_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/include
    
    [lapack]
    language = f77
    lapack_libs = acml_mp
    library_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/lib
    include_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/include
    
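After rebuilding with either configuration, it is worth confirming which BLAS/LAPACK numpy actually linked against:

```python
import numpy

# Prints the BLAS/LAPACK configuration this numpy build was compiled with;
# with the site.cfg above you should see mkl_rt (or acml_mp) in the output.
numpy.show_config()
```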
     
  • wangxinxi 12:55 on February 11, 2014 Permalink | Reply

    To test the efficiency of your numpy:

    #!/usr/bin/env python                                                           
    import timeit
     
     
    setup = "import numpy;\
            import numpy.linalg as linalg;\
            x = numpy.random.random((1000,1000));\
            z = numpy.dot(x, x.T)"
    count = 5
     
    t = timeit.Timer("linalg.cholesky(z)", setup=setup)
    print "cholesky:", t.timeit(count)/count, "sec"
     
    t = timeit.Timer("linalg.inv(z)", setup=setup)
    print "inv:", t.timeit(count)/count, "sec"
    

This is my result:

    cholesky: 0.0482553958893 sec
    inv: 0.102989816666 sec
    

numpy.dot is a bit special: it depends on ATLAS by default, which means that if you have not installed ATLAS on your system, numpy.dot can be very slow. To make numpy.dot use another BLAS implementation instead of ATLAS, you can follow this post: https://xinxiwang.wordpress.com/2014/02/03/efficient-numpy-dot-relies-on-atlas-because-the/

    #!/usr/bin/env python                                                           
    import numpy
    import sys
    import timeit
     
    try:
        import numpy.core._dotblas
        print 'FAST BLAS'
    except ImportError:
        print 'slow blas'
     
    print "version:", numpy.__version__
    print "maxint:", sys.maxint
    print
     
    x = numpy.random.random((1000,1000))
     
    setup = "import numpy; x = numpy.random.random((1000,1000))"
    count = 5
     
    t = timeit.Timer("numpy.dot(x, x.T)", setup=setup)
    print "dot:", t.timeit(count)/count, "sec"
    

The following is my result:

    FAST BLAS
    version: 1.8.0
    maxint: 9223372036854775807
    
    dot: 0.0540486335754 sec
    

    My CPU model:

    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 26
    model name      : Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
    stepping        : 5
    microcode       : 0x11
    cpu MHz         : 1596.000
    cache size      : 4096 KB
    physical id     : 1
    siblings        : 4
    core id         : 0
    cpu cores       : 4
    apicid          : 16
    initial apicid  : 16
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 11
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dtherm tpr_shadow vnmi flexpriority ept vpid
    bogomips        : 4255.80
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 40 bits physical, 48 bits virtual
    power management:
    
     
  • wangxinxi 01:23 on February 3, 2014 Permalink | Reply

To get an efficient numpy.dot, numpy needs to be compiled against ATLAS. This is probably a bug in numpy, because numpy.dot should be able to work with any CBLAS implementation.

To fix this bug, we first comment out the following two lines in numpy/core/setup.py:

                 if ('NO_ATLAS_INFO', 1) in blas_info.get('define_macros', []):
                    return None # dotblas needs ATLAS, Fortran compiled blas will not be sufficient.
    

    Then, we compile numpy without ATLAS.

    Finally, we manually compile _dotblas using the following command:

    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/numpy/core/blasdot/_dotblas.o -LCBLAS_PATH -lcblas -o build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so -lpython2.7
    

You need to replace CBLAS_PATH with the path on your system that holds the libcblas.so file.

     
  • wangxinxi 00:42 on February 2, 2014 Permalink | Reply

    Sample from the multivariate normal distribution

Let Z_1,\dots,Z_k \sim \mathcal{N}(0, 1) be independent, and Z = (Z_1,\dots,Z_k)^T; then \mu + Chol(\Sigma)Z \sim \mathcal{N}(\mu, \Sigma), where Chol(\Sigma) is the (lower-triangular) Cholesky factor of \Sigma, i.e. Chol(\Sigma)\,Chol(\Sigma)^T = \Sigma.
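A quick numpy check of this recipe (the mean, covariance, and sample count below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)          # lower-triangular, L @ L.T == Sigma
z = rng.standard_normal((2, 100000))   # k independent N(0, 1) draws per sample
samples = (mu[:, None] + L @ z).T      # each row ~ N(mu, Sigma)

print(samples.mean(axis=0))  # close to mu
print(np.cov(samples.T))     # close to Sigma
```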

     
  • wangxinxi 22:59 on January 30, 2014 Permalink | Reply

    Here are some steps I used with Ubuntu 13.04 and the new multi-package distribution Intel® SDK for OpenCL* Applications XE 2013:

    Download the 64-bit distribution from Intel: intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64.tgz
    Download the Intel public key. I had trouble finding it, but googling the filename pointed me to the correct page at Intel. Intel-E901-172E-EF96-900F-B8E1-4184-D7BE-0E73-F789-186F.pub

    $ sudo apt-get install -y rpm alien libnuma1

$ sudo rpm --import Intel-E901-172E-EF96-900F-B8E1-4184-D7BE-0E73-F789-186F.pub

    $ tar -xvf intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64.tgz
    $ cd intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64/

$ fakeroot alien --to-deb opencl-1.2-base-3.0.67279-1.x86_64.rpm
    $ fakeroot alien --to-deb opencl-1.2-intel-cpu-3.0.67279-1.x86_64.rpm

    $ sudo dpkg -i opencl-1.2-base_3.0.67279-2_amd64.deb
    $ sudo dpkg -i opencl-1.2-intel-cpu_3.0.67279-2_amd64.deb

The above installs the library files and installable client driver (ICD) registration in /opt/intel/opencl-1.2-3.0.67279.
    Two more steps were needed to run an OpenCL program.

    Add library to search path:
    $ sudo touch /etc/ld.so.conf.d/intelOpenCL.conf
    Edit this file, add the line:
    /opt/intel/opencl-1.2-3.0.67279/lib64

    Link to the intel icd file in the expected location:
    $ sudo ln /opt/intel/opencl-1.2-3.0.67279/etc/intel64.icd /etc/OpenCL/vendors/intel64.icd
    $ sudo ldconfig

At this point I could run an existing application. If doing development, install the developer headers and tools:

$ fakeroot alien --to-deb opencl-1.2-devel-3.0.67279-1.x86_64.rpm
    $ fakeroot alien --to-deb opencl-1.2-intel-devel-3.0.67279-1.x86_64.rpm

    $ sudo dpkg -i opencl-1.2-devel_3.0.67279-2_amd64.deb
    $ sudo dpkg -i opencl-1.2-intel-devel_3.0.67279-2_amd64.deb

    It is worth noting the include path for headers is: /opt/intel/opencl-1.2-3.0.67279/include
    The linking path for libraries is: /opt/intel/opencl-1.2-3.0.67279/lib64
    The developer tool binaries are installed in: /opt/intel/opencl-1.2-3.0.67279/bin
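To check the installation from code, one option is the third-party pyopencl package (not part of the Intel SDK; this is just a sketch assuming you have it installed):

```python
# List every OpenCL platform and its devices; after the steps above the
# Intel CPU platform should appear here. Falls back gracefully when
# pyopencl is not installed.
try:
    import pyopencl as cl
    for platform in cl.get_platforms():
        print(platform.name, [d.name for d in platform.get_devices()])
except ImportError:
    print("pyopencl not installed")
```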


     
  • wangxinxi 11:37 on January 30, 2014 Permalink | Reply

In general, you can avoid getting ill-conditioned covariance matrices by using one of the following precautions (these are options of MATLAB's gmdistribution.fit):

    - Pre-process your data to remove correlated features.
    - Set 'SharedCov' to true to use an equal covariance matrix for every component.
    - Set 'CovType' to 'diagonal'.
    - Use 'Regularize' to add a very small positive number to the diagonal of every covariance matrix.
    - Try another set of initial values.
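The 'Regularize' option corresponds to adding a small ridge to the diagonal; a numpy sketch of the idea (the matrix and epsilon below are made up for illustration):

```python
import numpy as np

def regularize_cov(Sigma, eps=1e-6):
    """Add a small ridge to the diagonal so the covariance stays well conditioned."""
    return Sigma + eps * np.eye(Sigma.shape[0])

# A nearly singular covariance (two almost perfectly correlated features):
Sigma = np.array([[1.0, 0.9999999],
                  [0.9999999, 1.0]])
print(np.linalg.cond(Sigma))                  # huge condition number
print(np.linalg.cond(regularize_cov(Sigma)))  # much smaller after regularization
```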

     
  • wangxinxi 13:14 on January 29, 2014 Permalink | Reply

How to check the memory usage and utilization of a GPU?

    [gold-c01]$ nvidia-smi 
    Wed Jan 29 13:13:55 2014       
    +------------------------------------------------------+                       
    | NVIDIA-SMI 5.319.72   Driver Version: 319.72         |                       
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M2090         Off  | 0000:11:00.0     Off |                    0 |
    | N/A   N/A   P12    31W /  N/A |       10MB /  5375MB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla M2090         Off  | 0000:14:00.0     Off |                    0 |
    | N/A   N/A    P0    80W /  N/A |      155MB /  5375MB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Compute processes:                                               GPU Memory |
    |  GPU       PID  Process name                                     Usage      |
    |=============================================================================|
    |    1     10223  python                                               142MB  |
    +-----------------------------------------------------------------------------+
    
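If you want the numbers programmatically rather than the full table, nvidia-smi also has a query mode. A small Python wrapper might look like this (the query fields below are standard nvidia-smi options, though very old drivers may not support them):

```python
import subprocess

# Ask nvidia-smi for just the fields we care about, in CSV form.
cmd = ["nvidia-smi",
       "--query-gpu=index,name,memory.used,memory.total,utilization.gpu",
       "--format=csv"]
try:
    print(subprocess.check_output(cmd).decode())
except (OSError, subprocess.CalledProcessError):
    # No NVIDIA driver on this machine, or the flags are unsupported.
    print("nvidia-smi not available")
```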
     
  • wangxinxi 18:59 on January 18, 2014 Permalink | Reply
    Tags:   

    Thread-safe map for C++ 

#ifndef CONCURRENT_MAP
    #define CONCURRENT_MAP
    
    #include <boost/thread.hpp>
    #include <boost/shared_ptr.hpp>
    #include <map>
    
    template<typename Key, typename Value>
    class ConcurrentMap
    {
        typedef boost::shared_ptr<boost::shared_mutex> MutexPtr;
        
    public:
        ConcurrentMap() {}
    
        bool has(Key k) const
        {
            boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
            return m.find(k) != m.end();
        }
        
        void erase(Key k)
        {
            boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
            boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);
    
            valueAccess.erase(k);
            m.erase(k);
        }
        
        void set(Key k, Value v)
        {
            boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
    
            // update k in place: only the per-value mutex needs to be exclusive
            if(m.find(k) != m.end()) {
                boost::upgrade_lock<boost::shared_mutex> valueLock(*valueAccess.at(k));
                boost::upgrade_to_unique_lock<boost::shared_mutex> valueUniqueLock(valueLock);
                
                m.at(k) = v;
            }
            // insert k, v: the map structure changes, so the schema lock must be exclusive
            else {
                lock.unlock(); // release the shared lock first, or the upgrade deadlocks
                boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
                boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);
                
                valueAccess.insert(std::make_pair(k, MutexPtr(new boost::shared_mutex())));
                m.insert(std::make_pair(k, v));
            }
        }
        
        Value get(Key k) const
        {
            boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
            return m.at(k);
        }
    
        void insert(Key k, Value v)
        {
            boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
            boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);
            
            valueAccess.insert(std::make_pair(k, MutexPtr(new boost::shared_mutex())));
            m.insert(std::make_pair(k, v));
        }
    
    private:
        std::map<Key, Value> m;
    
        // one mutex per key; schemaAccess is mutable so const readers can take shared locks
        std::map<Key, MutexPtr> valueAccess;
        mutable boost::shared_mutex schemaAccess;
    };
    
    #endif
    
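For comparison (not part of the original post), the same pattern is much shorter in Python, where one lock can guard the whole dict. CPython's GIL already makes individual dict operations atomic, so this is only a sketch of the locking idea, not something plain Python needs:

```python
import threading

class ConcurrentMap:
    """A dict guarded by a single lock, mirroring the C++ class's interface."""

    def __init__(self):
        self._lock = threading.RLock()
        self._m = {}

    def has(self, k):
        with self._lock:
            return k in self._m

    def set(self, k, v):
        with self._lock:
            self._m[k] = v

    def get(self, k):
        with self._lock:
            return self._m[k]

    def erase(self, k):
        with self._lock:
            del self._m[k]
```

A per-key mutex map like valueAccess in the C++ version only pays off when updates to individual values are expensive enough that readers of other keys should not be blocked.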
    
     
    • Maurice Smulders 16:14 on October 7, 2015 Permalink | Reply

      Xinxi,

Thanks for the example, but as posted, it doesn't compile… I did make it compile, except I need one more function, and the compiler is driving me nuts. The error message doesn't make sense…

      Here is the modified class (+ my new function)

#ifndef CONCURRENTMAP_H_
      #define CONCURRENTMAP_H_

      #include <boost/thread.hpp>
      #include <boost/shared_ptr.hpp>
      #include <map>

      namespace util
      {
      template<typename Key, typename Value>
      class ConcurrentMap
      {
          typedef boost::shared_ptr<boost::shared_mutex> MtxPtr;
          typedef Value(*mutate_func)(Value);

      public:
          ConcurrentMap();
          bool has(Key k) const
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
              return m.find(k) == m.end();
          }

          void erase(Key k)
          {
              boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
              boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);

              valueAccess.erase(k);
              m.erase(k);
          }

          void set(Key k, Value v)
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);

              // set k, v
              if(m.find(k) == m.end())
              {
                  boost::upgrade_lock<boost::shared_mutex> valueLock(*valueAccess[k]);
                  boost::upgrade_to_unique_lock<boost::shared_mutex> valueUniqueLock(valueLock);

                  m.at(k) = v;
              }
              // insert k, v
              else
              {
                  boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
                  lock.unlock();
                  boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);

                  valueAccess.insert(k, MtxPtr(new boost::shared_mutex()));
                  m.insert(std::pair<Key, Value>(k, v));
              }
          }

          /*
           * mutate()
           *
           * IN: key to find record
           *     default value if record doesn't exist
           *     mutate function to change old to new if exist
           */
          void mutate(Key k, Value v, mutate_func mf)
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);

              // set k, v
              if(m.find(k) == m.end())
              {
                  boost::upgrade_lock<boost::shared_mutex> valueLock(*valueAccess[k]);
                  boost::upgrade_to_unique_lock<boost::shared_mutex> valueUniqueLock(valueLock);
                  // if found, pass the current value into the mutate function
                  m.at(k) = mf(m.at(k));
              }
              // insert k, v
              else
              {
                  boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
                  lock.unlock();
                  boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);

                  valueAccess.insert(k, MtxPtr(new boost::shared_mutex()));
                  m.insert(std::pair<Key, Value>(k, v));
              }
          }

          Value get(Key k) const
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
              return m.at(k);
          }

          void insert(Key k, Value v)
          {
              boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
              boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);

              valueAccess.insert(k, MtxPtr(new boost::shared_mutex()));
              m.insert(std::pair<Key, Value>(k, v));
          }

      private:
          std::map<Key, Value> m;

          std::map<Key, MtxPtr> valueAccess;
          boost::shared_mutex schemaAccess;
      };

      } /* namespace util */

      #endif /* CONCURRENTMAP_H_ */

      • Maurice Smulders 16:16 on October 7, 2015 Permalink | Reply

        The function which doesn’t compile is mutate – using GCC 4.8

        ../src/ConcurrentMap.h: In instantiation of ‘void util::ConcurrentMap::mutate(Key, Value, util::ConcurrentMap::mutate_func) [with Key = std::basic_string; Value = long unsigned int; util::ConcurrentMap::mutate_func = long unsigned int (*)(long unsigned int)]’:
        xxx,.cpp:343:37: required from here
        ../src/ConcurrentMap.h:99:13: error: no matching function for call to ‘std::map<std::basic_string, boost::shared_ptr, std::less<std::basic_string >, std::allocator<std::pair<const std::basic_string, boost::shared_ptr > > >::insert(std::basic_string&, util::ConcurrentMap<std::basic_string, long unsigned int>::MtxPtr)’
        valueAccess.insert(k, MtxPtr(new boost::shared_mutex()));

        And the insert is totally the same as the set() call…

    • mauricesmulders 00:58 on October 9, 2015 Permalink | Reply

      There are a few issues in this code as posted. I moved it to use Boost shared_ptr (more portable) and fixed the compilation and deadlock issues.

      #include <boost/thread.hpp>
      #include <boost/shared_ptr.hpp>
      #include <boost/make_shared.hpp>
      #include <map>
      #include "Log.h"
      
      namespace util
      {
      template<typename Key, typename Value>
      class ConcurrentMap
      {
          typedef Value(*mutate_func)(Value);
      
      public:
          ConcurrentMap() {};
          bool has(Key k) const
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
              return m.find(k) == m.end();
          }
      
          void erase(Key k)
          {
              boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
              boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);
      
              valueAccess.erase(k);
              m.erase(k);
          }
      
      
          void set(Key k, Value v)
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
      
              // set k, v
              if(m.find(k) != m.end())
              {
      
                  boost::upgrade_lock<boost::shared_mutex> valueLock(*valueAccess[k]);
                  boost::upgrade_to_unique_lock<boost::shared_mutex> valueUniqueLock(valueLock);
      
                  m.at(k) = v;
              }
              // insert k, v
              else
              {
                  lock.unlock();
                  boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
                  boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);
      
                  boost::shared_ptr<boost::shared_mutex> mtx = boost::make_shared<boost::shared_mutex>();
                  valueAccess.insert(std::pair<Key, boost::shared_ptr<boost::shared_mutex> >(k, mtx));
                  m.insert(std::pair<Key,Value>(k,v));
              }
          }
      
          /*
           * mutate()
           *
           * IN: key to find record
           *     default value if record doesn't exist
           *     mutate function to change old to new if exist
           *
           * OUT: N/A
           *
           * This method looks for the map record to determine whether it exists
           * if it does, it calls the mutate_func with the value as a parameter
       * and takes its return as the new value.
           * If it doesn't, then the value is initialized to v.
           */
          void mutate(Key k, Value v, mutate_func mf)
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
      
              // set k, v
              // TODO: Use an iterator. Changes 3 lookups into 1
              if(m.find(k) != m.end())
              {
                  boost::upgrade_lock<boost::shared_mutex> valueLock(*valueAccess[k]);
                  boost::upgrade_to_unique_lock<boost::shared_mutex> valueUniqueLock(valueLock);
                  // if found, pass the current value into the mutate function
                  m.at(k) = mf(m.at(k));
              }
              // insert k, v
              else
              {
                  lock.unlock();
                  boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
                  boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);
      
                  boost::shared_ptr<boost::shared_mutex> mtx = boost::make_shared<boost::shared_mutex>();
                  valueAccess.insert(std::pair<Key, boost::shared_ptr<boost::shared_mutex> >(k, mtx));
                  m.insert(std::pair<Key,Value>(k,v));
              }
          }
      
          Value get(Key k) const
          {
              boost::shared_lock<boost::shared_mutex> lock(schemaAccess);
              return m.at(k);
          }
      
      
          void insert(Key k, Value v)
          {
              boost::upgrade_lock<boost::shared_mutex> schemaLock(schemaAccess);
              boost::upgrade_to_unique_lock<boost::shared_mutex> schemaUniqueLock(schemaLock);
      
              boost::shared_ptr<boost::shared_mutex> mtx = boost::make_shared<boost::shared_mutex>();
              valueAccess.insert(std::pair<Key, boost::shared_ptr<boost::shared_mutex> >(k, mtx));
              m.insert(std::pair<Key,Value>(k, v));
          }
      
          /*
           * Iterators are thread unsafe. (and access the internal map..)
           * I'm not even locking the map at this time...
           * Also, only the const iterator is provided for that reason.
           */
           typename std::map<Key,Value>::const_iterator unsafe_begin()
           {
          	 return m.begin();
           }
      
           typename std::map<Key,Value>::const_iterator unsafe_end()
           {
          	 return m.end();
           }
      
      private:
          std::map<Key, Value> m;
      
          std::map<Key, boost::shared_ptr<boost::shared_mutex> > valueAccess;
          boost::shared_mutex schemaAccess;
      };
      
    • wangxinxi 03:01 on October 9, 2015 Permalink | Reply

      Thank you!

  • wangxinxi 15:00 on December 8, 2013 Permalink | Reply

ACML, developed by AMD, is a great library for scientific computing. It boosted the performance of numpy on my server by hundreds of times.
    http://luiseth.wordpress.com/2012/04/08/accelerate-your-matrix-computations-with-acml-on-kubuntu-11-10/

     