“
One downfall of numpy’s optimization of vector operations is that it only occurs on one operation at a time. That is to say, when we are doing the operation A * B + C with numpy vectors, first the entire A * B operation completes, and the data is stored in a temporary vector; then this new vector is added with C. The in-place version of the diffusion code in Example 6-14 shows this quite explicitly. However, there are many modules that can help with this. numexpr is a module that can take an entire vector expression and compile it into very efficient code that is optimized to minimize cache misses and temporary space used. In addition, the expressions can utilize multiple CPU cores (see Chapter 9 for more information) and specialized instructions for Intel chips to maximize the speedup. It is very easy to change code to use numexpr: all that’s required is to rewrite the expressions as strings with references to local variables. The expressions are compiled behind the scenes (and cached so that calls to the same expression don’t incur the same cost of compilation) and run using optimized code. Example 6-19 shows the simplicity of changing the evolve function to use numexpr. In this case, we chose to use the out parameter of the evaluate function so that numexpr doesn’t allocate a new vector to which to return the result of the calculation.
”
”
Micha Gorelick (High Performance Python: Practical Performant Programming for Humans)