Tagged: Linux

  • wangxinxi 13:38 on February 11, 2014 Permalink | Reply
    Tags: Linux,   

    Before compiling numpy, make sure that ATLAS, OpenBLAS, and similar libraries have been removed.

    To compile numpy with MKL, configure numpy's site.cfg as follows:

    [mkl]
    library_dirs = /opt/intel/mkl/lib/intel64
    include_dirs = /opt/intel/mkl/include
    mkl_libs = mkl_rt
    lapack_libs = mkl_rt
    

    To compile numpy with ACML, configure site.cfg as follows:

    [blas]
    blas_libs = cblas, acml_mp
    library_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/lib
    include_dirs = /opt/acml5.3.1/gfortran64_fma4_mp/include
    
    [lapack]
    language = f77
    lapack_libs = mkl_rt
    library_dirs = /opt/intel/mkl/lib/intel64
    include_dirs = /opt/intel/mkl/include
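
    Before building, it can help to confirm that the directories named in site.cfg actually exist. The following is a minimal Python 3 sketch, not part of numpy: the helper name missing_dirs is made up for illustration, and it assumes that multiple directories in one entry are joined with the operating system's path separator.

```python
import configparser
import os


def missing_dirs(site_cfg_text):
    """Return (section, key, path) for each configured directory that is absent."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(site_cfg_text)
    missing = []
    for section in cfg.sections():
        for key in ("library_dirs", "include_dirs"):
            if not cfg.has_option(section, key):
                continue
            # Assumption: several directories are joined with os.pathsep (":" on Linux).
            for path in cfg.get(section, key).split(os.pathsep):
                path = path.strip()
                if path and not os.path.isdir(path):
                    missing.append((section, key, path))
    return missing


if __name__ == "__main__" and os.path.isfile("site.cfg"):
    with open("site.cfg") as fh:
        for section, key, path in missing_dirs(fh.read()):
            print("[%s] %s: %s does not exist" % (section, key, path))
```

    Run it from the numpy source directory that holds site.cfg. An empty result only means the directories exist, not that the libraries inside them are usable.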
    
     
  • wangxinxi 12:55 on February 11, 2014 Permalink | Reply
    Tags: Linux,   

    To test the efficiency of your numpy:

    #!/usr/bin/env python                                                           
    import timeit
     
     
    setup = "import numpy;\
            import numpy.linalg as linalg;\
            x = numpy.random.random((1000,1000));\
            z = numpy.dot(x, x.T)"
    count = 5
     
    t = timeit.Timer("linalg.cholesky(z)", setup=setup)
    print "cholesky:", t.timeit(count)/count, "sec"
     
    t = timeit.Timer("linalg.inv(z)", setup=setup)
    print "inv:", t.timeit(count)/count, "sec"
    

    This is my result

    cholesky: 0.0482553958893 sec
    inv: 0.102989816666 sec
    

    numpy.dot is a bit special. By default it depends on ATLAS, so if ATLAS is not installed on your system, numpy.dot can be very slow. To use another implementation instead of ATLAS, follow this post: https://xinxiwang.wordpress.com/2014/02/03/efficient-numpy-dot-relies-on-atlas-because-the/

    #!/usr/bin/env python                                                           
    import numpy
    import sys
    import timeit
     
    try:
        import numpy.core._dotblas
        print 'FAST BLAS'
    except ImportError:
        print 'slow blas'
     
    print "version:", numpy.__version__
    print "maxint:", sys.maxint
    print
     
    x = numpy.random.random((1000,1000))
     
    setup = "import numpy; x = numpy.random.random((1000,1000))"
    count = 5
     
    t = timeit.Timer("numpy.dot(x, x.T)", setup=setup)
    print "dot:", t.timeit(count)/count, "sec"
    

    The following is my result

    FAST BLAS
    version: 1.8.0
    maxint: 9223372036854775807
    
    dot: 0.0540486335754 sec
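
    To compare the dot timing across machines, it can be converted to an approximate floating-point rate. This is a rough Python 3 sketch: the helper name matmul_gflops is hypothetical, and the 2*n^3 operation count is the usual estimate for a dense n x n matrix product, not something numpy reports.

```python
def matmul_gflops(n, seconds):
    """Approximate GFLOP/s for the product of two n x n matrices."""
    flops = 2.0 * n ** 3  # conventional estimate: n^3 multiplies plus n^3 adds
    return flops / seconds / 1e9


if __name__ == "__main__":
    # 1000 x 1000 product, using the dot timing reported above
    print("%.1f GFLOP/s" % matmul_gflops(1000, 0.0540486335754))
```

    For the timing above this gives roughly 37 GFLOP/s.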
    

    My CPU model:

    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 26
    model name      : Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
    stepping        : 5
    microcode       : 0x11
    cpu MHz         : 1596.000
    cache size      : 4096 KB
    physical id     : 1
    siblings        : 4
    core id         : 0
    cpu cores       : 4
    apicid          : 16
    initial apicid  : 16
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 11
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dtherm tpr_shadow vnmi flexpriority ept vpid
    bogomips        : 4255.80
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 40 bits physical, 48 bits virtual
    power management:
    
     
  • wangxinxi 01:23 on February 3, 2014 Permalink | Reply
    Tags: Linux,   

    To get an efficient numpy.dot, numpy needs to be compiled against ATLAS. This is probably a bug in numpy, because numpy.dot should be able to work with any CBLAS implementation.

    To work around this, we first comment out the following two lines of numpy/core/setup.py:

    if ('NO_ATLAS_INFO', 1) in blas_info.get('define_macros', []):
        return None # dotblas needs ATLAS, Fortran compiled blas will not be sufficient.
    

    Then, we compile numpy without ATLAS.

    Finally, we manually compile _dotblas using the following command:

    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro build/temp.linux-x86_64-2.7/numpy/core/blasdot/_dotblas.o -LCBLAS_PATH -lcblas -o build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so -lpython2.7
    

    Replace CBLAS_PATH with the directory that holds your libcblas.so file.
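
    Before linking, one way to check that the library in CBLAS_PATH really is a CBLAS build is to look for a standard CBLAS symbol in it. The following Python 3 sketch uses ctypes. The helper name exports_symbol is made up for illustration, and checking for cblas_dgemm is an assumption about which routine matters most for matrix products.

```python
import ctypes


def exports_symbol(library_path, symbol="cblas_dgemm"):
    """Return True if the shared library loads and exports the given symbol."""
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        # The file is missing, or one of its dependencies could not be loaded.
        return False
    # ctypes resolves symbols lazily; hasattr triggers the lookup.
    return hasattr(lib, symbol)
```

    For example, exports_symbol("CBLAS_PATH/libcblas.so") should return True for a usable library, with CBLAS_PATH replaced as described above.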

     
  • wangxinxi 22:59 on January 30, 2014 Permalink | Reply
    Tags: Linux,   

    Here are some steps I used with Ubuntu 13.04 and the new multi-package distribution Intel® SDK for OpenCL* Applications XE 2013:

    Download the 64-bit distribution from Intel: intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64.tgz
    Download the Intel public key. I had trouble finding it, but googling the filename pointed me to the correct page at Intel. Intel-E901-172E-EF96-900F-B8E1-4184-D7BE-0E73-F789-186F.pub

    $ sudo apt-get install -y rpm alien libnuma1

    $ sudo rpm --import Intel-E901-172E-EF96-900F-B8E1-4184-D7BE-0E73-F789-186F.pub

    $ tar -xvf intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64.tgz
    $ cd intel_sdk_for_ocl_applications_2013_xe_sdk_3.0.67279_x64/

    $ fakeroot alien --to-deb opencl-1.2-base-3.0.67279-1.x86_64.rpm
    $ fakeroot alien --to-deb opencl-1.2-intel-cpu-3.0.67279-1.x86_64.rpm

    $ sudo dpkg -i opencl-1.2-base_3.0.67279-2_amd64.deb
    $ sudo dpkg -i opencl-1.2-intel-cpu_3.0.67279-2_amd64.deb

    The above installs the library files and installable client driver registration in /opt/intel/opencl-1.2-3.0.67279.
    Two more steps were needed to run an OpenCL program.

    Add library to search path:
    $ sudo touch /etc/ld.so.conf.d/intelOpenCL.conf
    Edit this file, add the line:
    /opt/intel/opencl-1.2-3.0.67279/lib64

    Link to the intel icd file in the expected location:
    $ sudo ln /opt/intel/opencl-1.2-3.0.67279/etc/intel64.icd /etc/OpenCL/vendors/intel64.icd
    $ sudo ldconfig

    At this point I could run an existing application. If doing development, install the developer headers and tools:

    $ fakeroot alien --to-deb opencl-1.2-devel-3.0.67279-1.x86_64.rpm
    $ fakeroot alien --to-deb opencl-1.2-intel-devel-3.0.67279-1.x86_64.rpm

    $ sudo dpkg -i opencl-1.2-devel_3.0.67279-2_amd64.deb
    $ sudo dpkg -i opencl-1.2-intel-devel_3.0.67279-2_amd64.deb

    It is worth noting the include path for headers is: /opt/intel/opencl-1.2-3.0.67279/include
    The linking path for libraries is: /opt/intel/opencl-1.2-3.0.67279/lib64
    The developer tool binaries are installed in: /opt/intel/opencl-1.2-3.0.67279/bin
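
    To confirm that the ICD registration step worked, the registered vendor files can be listed. This is a small Python 3 sketch under the assumption that each .icd file in /etc/OpenCL/vendors holds the name or path of a vendor library. The helper name list_icds is hypothetical.

```python
import glob
import os


def list_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the library name or path it contains."""
    found = {}
    for icd_path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(icd_path) as fh:
            found[os.path.basename(icd_path)] = fh.read().strip()
    return found


if __name__ == "__main__":
    for name, library in list_icds().items():
        print("%s -> %s" % (name, library))
```

    After the steps above, intel64.icd should appear in the output. An empty result suggests the link into /etc/OpenCL/vendors was not created.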

     
  • wangxinxi 13:14 on January 29, 2014 Permalink | Reply
    Tags: Linux,   

    How to check GPU memory and utilization:

    [gold-c01]$ nvidia-smi 
    Wed Jan 29 13:13:55 2014       
    +------------------------------------------------------+                       
    | NVIDIA-SMI 5.319.72   Driver Version: 319.72         |                       
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla M2090         Off  | 0000:11:00.0     Off |                    0 |
    | N/A   N/A   P12    31W /  N/A |       10MB /  5375MB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla M2090         Off  | 0000:14:00.0     Off |                    0 |
    | N/A   N/A    P0    80W /  N/A |      155MB /  5375MB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Compute processes:                                               GPU Memory |
    |  GPU       PID  Process name                                     Usage      |
    |=============================================================================|
    |    1     10223  python                                               142MB  |
    +-----------------------------------------------------------------------------+
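
    The same numbers can be read from a script. The following Python 3 sketch assumes an nvidia-smi version that supports the --query-gpu and --format=csv options, and that every queried field comes back as an integer (older cards may report a field as unsupported, which this sketch does not handle). The function names are made up for illustration.

```python
import subprocess

QUERY_FIELDS = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(text):
    """Parse 'index, used, total, util' CSV rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total, util = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "memory_used_mb": int(used),
            "memory_total_mb": int(total),
            "util_percent": int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=" + QUERY_FIELDS,
        "--format=csv,noheader,nounits",
    ])
    return parse_gpu_csv(out.decode())
```

    Calling query_gpus() on the machine above would be expected to return two entries, one per Tesla M2090.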
    
     
  • wangxinxi 15:00 on December 8, 2013 Permalink | Reply
    Tags: Linux,   

    ACML, developed by AMD, is a great library for scientific computing. It made numpy on my server hundreds of times faster.
    http://luiseth.wordpress.com/2012/04/08/accelerate-your-matrix-computations-with-acml-on-kubuntu-11-10/

     
  • wangxinxi 01:53 on December 2, 2013 Permalink | Reply
    Tags: Linux   

    InnoDB on SSD is much faster than on HDD.

    A is an InnoDB table defined as follows:

    +-------+---------+------+-----+---------+-------+
    | Field | Type    | Null | Key | Default | Extra |
    +-------+---------+------+-----+---------+-------+
    | a     | int(11) | YES  |     | NULL    |       |
    +-------+---------+------+-----+---------+-------+
    

    We test the performance of SSD and HDD using the following script to commit 1000 transactions.

    #!/usr/bin/python
    import sys
    import MySQLdb as mdb
    
    try:
        con = mdb.connect('localhost', 'root', '', 'test')
        # With autocommit on, every INSERT is committed as its own transaction.
        con.autocommit(True)
        cur = con.cursor()
        for i in range(1000):
            cur.execute("INSERT INTO A VALUES (1)")
        con.close()
    except mdb.Error, e:
        print "Error %d: %s" % (e.args[0], e.args[1])
        sys.exit(1)
    

    With SSD, the 1000 transactions above finish in the following time:

    real	0m0.571s
    user	0m0.173s
    sys	0m0.075s
    

    With HDD, it runs much slower:

    real	1m20.121s
    user	0m0.220s
    sys	0m0.044s
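
    The two timings are easier to compare per transaction. The following Python 3 sketch converts the "real" values above to seconds and divides by the 1000 transactions. The helper name real_seconds is made up for illustration.

```python
import re


def real_seconds(text):
    """Convert a `time` value such as '1m20.121s' to seconds."""
    match = re.match(r"(\d+)m([\d.]+)s$", text.strip())
    if match is None:
        raise ValueError("unrecognized time format: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


ssd = real_seconds("0m0.571s")
hdd = real_seconds("1m20.121s")
transactions = 1000

print("SSD: %.2f ms per transaction" % (ssd / transactions * 1000))
print("HDD: %.2f ms per transaction" % (hdd / transactions * 1000))
print("HDD / SSD: %.0fx" % (hdd / ssd))
```

    That is about 0.57 ms per commit on the SSD against about 80 ms on the HDD, a factor of roughly 140.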
    
     
  • wangxinxi 23:15 on September 19, 2013 Permalink | Reply
    Tags: Linux,   

    How could my server with 64 cores run slower than a desktop with an i7 CPU?

    http://osdf.github.io/blog/numpyscipy-with-openblas-for-ubuntu-1204.html

    My results are:
    cholesky: 0.226734781265 sec
    svd: 6.15930080414 sec

     
  • wangxinxi 12:42 on August 27, 2013 Permalink | Reply
    Tags: Linux,   

    Set up a shared folder provided by VirtualBox in Ubuntu:
    http://askubuntu.com/questions/30396/error-mounting-virtualbox-shared-folders-in-an-ubuntu-guest

     
  • wangxinxi 12:10 on August 27, 2013 Permalink | Reply
    Tags: Linux,   

    Add the line number to the head of every line in VIM:

    :%s/^/\=line('.').' '
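
    Outside of Vim, the same transformation can be sketched in a few lines of Python 3. The function name number_lines is made up for illustration. It mirrors the substitution by putting the 1-based line number and a space in front of every line.

```python
def number_lines(text):
    """Prefix every line with its 1-based line number and a space."""
    return "".join(
        "%d %s" % (number, line)
        # splitlines(True) keeps the line endings, so the output ends as the input does
        for number, line in enumerate(text.splitlines(True), start=1)
    )
```

    For example, number_lines("a\nb\n") returns "1 a\n2 b\n".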

     