[CWB] CWB4 and Ziggurat

Yannick Versley yversley at gmail.com
Sat Oct 10 11:51:17 CEST 2015


(With apologies to those who just want to search linguistic corpora and
aren't interested in the low-level stuff)


> Sure, but it's unlikely that we're ever going to solve the "stochastic
> neoclassical growth model, the workhorse of modern macroeconomics" with CQP.

I'm not sure summing up numbers is a typical task either. Unfortunately
there's no R entry on the benchmarks game, but for most nontrivial
workloads, Python (even Py3) and Perl are roughly equal:
http://benchmarksgame.alioth.debian.org/u64/perl.php

As I said, the most compelling argument for Python wouldn't be that it's
faster than the others (because Python and R users mostly rely on the
speed of the underlying C modules that do all the work), but that it's
(along with Java) taught relatively widely.

But let's look at what people would do for a simple counting loop of the form

>         for x in xrange(N+1):
>                 sum = sum + x
>
if they were using numpy:
--- 8< ---
from numpy import arange

N = 20000000  # 20M; use 100000000 for the 100M case

a = arange(N + 1)
total = a.sum()  # renamed from "sum" to avoid shadowing the builtin
--- 8< ---
Run from the command line, this takes 0.08 seconds for 20M numbers and
0.21 seconds for 100M numbers.
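
In case anyone wants to reproduce the numbers, here is a minimal sketch
using the standard timeit module (the exact harness I used isn't shown
here, so treat the details as illustrative):
--- 8< ---
import timeit

# Time array creation plus summation for 20M numbers, averaged over
# 10 runs; the numpy import itself is kept out of the measurement.
t = timeit.timeit("arange(20000001).sum()",
                  setup="from numpy import arange",
                  number=10)
print(t / 10)
--- 8< ---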

--- 8< ---
from __future__ import print_function
import cython

def test_fun(N):
    # cython.inline compiles the snippet on first use (and caches the
    # result); it picks up N from the surrounding scope
    return cython.inline(
        '''
        cdef long long s  # 64-bit: the sum of 0..100M overflows a 32-bit int
        cdef long x, n
        s = 0; n = N + 1
        for x from 0 <= x < n:
            s += x
        return s
        ''')

print(test_fun(20000000))
--- 8< ---
takes 0.39 seconds, including running the Cython compiler and a call to
gcc to compile the inline code. 100M iterations (after cleaning out
Cython.inline's cache) take 0.45 seconds, or 0.235 seconds if the
compiled code is already in the cache.

Numba is even simpler:
--- 8< ---
from numba import autojit

@autojit
def test_fun(N):
    s = 0
    for x in xrange(N + 1):
        s += x
    return s

print(test_fun(20000000))
--- 8< ---
which takes 0.23 seconds right away for either 20M or 100M additions
(i.e. compilation is much faster because Numba uses LLVM instead of
calling out to gcc).

Leaving out the Numba bit and just using plain Python 2.7, I get 0.62
seconds for 20M and 2.9 seconds for 100M, so my notebook may be faster
than the computer you tried this on. Perl code using a foreach loop
takes 1.0 seconds for 20M and 4.7 seconds for 100M, so the overall
speed ratio looks similar. If I take the loop out of the function,
Python makes the variables global, which makes every access slower, and
we end up with results analogous to what you had in your list: 1.8
seconds for 20M and 8.98 seconds for 100M (see the sketch below).
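
For reference, a minimal sketch of the two pure-Python variants I timed
(the function name is mine):
--- 8< ---
# "normal" Python code: locals inside a function are fast array slots
def count_up(N):
    s = 0
    for x in xrange(N + 1):
        s += x
    return s

# "bad" Python code: the same loop at module level makes s and x
# globals, so every access becomes a dictionary lookup
N = 20000000
s = 0
for x in xrange(N + 1):
    s += x
--- 8< ---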

This gives, for 100M integers:
bad Python code:     8.98 sec.,  11 Mops/sec
Perl code:           4.70 sec.,  21 Mops/sec
normal Python code:  2.90 sec.,  34 Mops/sec
Cython inline:       0.45 sec., 222 Mops/sec
Numba autojit:       0.23 sec., 434 Mops/sec
Numpy sum:           0.21 sec., 476 Mops/sec

There's a bit of setup involved with Numba: unless you install Anaconda
(a Python distribution from Continuum.io that works on Windows, Mac and
Linux and uses its own binary packages, which makes it newbie-friendly
on Windows but slightly awkward on Linux), you have to install LLVM and
Numba yourself. But the results are definitely nice.

BTW, a bit of random googling turned up this neat C library by Daniel
Lemire for fast (de)compression of integer blocks:
https://github.com/lemire/simdcomp
He also has a C++ library for intersecting compressed integer lists:
https://github.com/lemire/SIMDCompressionAndIntersection
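
The core trick behind both (sketched here in numpy to show the
principle; this is not the libraries' actual API) is that a sorted list
of integers, e.g. corpus positions, is stored as the gaps between
neighbours, which need far fewer bits each:
--- 8< ---
import numpy as np

# Sorted positions -> small deltas -> few bits per value.
positions = np.array([3, 17, 19, 42, 44, 45, 90], dtype=np.uint32)

deltas = np.diff(positions)             # gaps between successive values
bits = int(deltas.max()).bit_length()   # 6 bits per gap here instead of 32

# Decoding is a cumulative sum over the gaps.
restored = np.concatenate(([positions[0]],
                           positions[0] + np.cumsum(deltas)))
assert (restored == positions).all()
--- 8< ---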

Cheers,
Yannick