Numerical Performance

Summary

One of the goals of tafra is to provide a data structure that is as fast as possible for numerical computing. To achieve this, function returns are written as generator expressions wherever possible.
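
As an illustrative sketch (not tafra’s actual implementation), a function whose return value is a generator expression defers the work until the caller iterates it, rather than materializing an intermediate list:

>>> def iter_columns(data):
...     # a generator expression: yields (name, values) pairs lazily, so
...     # nothing is copied until the consumer (e.g. dict()) iterates it
...     return ((name, values) for name, values in data.items())

>>> dict(iter_columns({'a': [1, 2], 'b': [3, 4]}))
{'a': [1, 2], 'b': [3, 4]}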

Additionally, because the column values are numpy ndarrays, the map functions can take functions that operate on ndarrays, including numba @jit’ed functions.

pandas is essentially a standard package for anyone performing data science with Python, and it provides a wide variety of useful features. However, it’s not particularly aimed at maximizing performance. Let’s use an example of a dataframe of function arguments, and a function that maps scalar arguments into a vector result. Any function of time serves this purpose, so let’s use a hyperbolic function.
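For reference, the hyperbolic decline curve implemented below is q(t) = qi / (1 + bi * Dn * t)^(1 / bi), where the nominal decline Dn is derived from the secant-effective decline Di; in the limit bi -> 0 it reduces to the exponential q(t) = qi * exp(-Dn * t).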

First, let’s randomly generate some function arguments and construct both a Tafra and a pandas.DataFrame:

>>> from tafra import Tafra
>>> import pandas as pd
>>> import numpy as np

>>> from typing import Tuple, Union, Any

>>> tf = Tafra({
...     'wellid': np.arange(0, 100),
...     'qi': np.random.lognormal(np.log(2000.), np.log(3000. / 1000.) / (2 * 1.28), 100),
...     'Di': np.random.uniform(.5, .9, 100),
...     'bi': np.random.normal(1.0, .2, 100)
... })

>>> df = pd.DataFrame(tf.data)

>>> tf.head(5)

index  wellid       qi       Di       bi
dtype   int32  float64  float64  float64
    0       0  2665.82  0.54095  1.07538
    1       1  1245.85  0.81711  0.48448
    2       2  1306.56  0.61570  0.54587
    3       3  2950.33  0.81956  0.66440
    4       4  1963.93  0.56918  0.74165

Next, we define our hyperbolic function and the time array to evaluate:

>>> import math

>>> def tan_to_nominal(D: float) -> float:
...     return -math.log1p(-D)

>>> def sec_to_nominal(D: float, b: float) -> float:
...     if b <= 1e-4:
...         return tan_to_nominal(D)
...
...     return ((1.0 - D) ** -b - 1.0) / b

>>> def hyp(qi: float, Di: float, bi: float, t: np.ndarray) -> np.ndarray:
...     Dn = sec_to_nominal(Di, bi)
...
...     if bi <= 1e-4:
...         return qi * np.exp(-Dn * t)
...
...     return qi / (1.0 + Dn * bi * t) ** (1.0 / bi)

>>> t = 10 ** np.linspace(0, 4, 101)

And let’s build a generic mapper function to map over the named columns:

>>> def mapper(tf: Union[Tafra, pd.DataFrame]) -> Tuple[int, np.ndarray]:
...     return tf['wellid'], hyp(tf['qi'], tf['Di'], tf['bi'], t)
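
Each call maps one row’s scalar arguments onto the length-101 time vector t; for example, applying it to the first pandas row:

>>> wellid, curve = mapper(df.iloc[0])
>>> curve.shape
(101,)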

We can call this in the following style. The pandas syntax is a bit verbose, but pandas.DataFrame.from_items() is deprecated in newer versions, so this is the recommended way to construct the result. Let’s time each approach:

>>> %timeit tdcs = Tafra(tf.row_map(mapper))
3.38 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


>>> %timeit pdcs = pd.DataFrame(dict(df.apply(mapper, axis=1).to_list()))
6.86 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

We see Tafra is about twice as fast. Mapping a function this way is convenient, but there is some indirection occurring that we can do away with to obtain direct access to the data of the Tafra, and pandas also offers a faster alternative to pandas.DataFrame.apply(). Instead of constructing a new Tafra or pd.DataFrame for each row, we can return a NamedTuple, which is faster to construct. Doing so:

>>> def tuple_mapper(tf: Tuple[Any, ...]) -> Tuple[int, np.ndarray]:
...     return tf.wellid, hyp(tf.qi, tf.Di, tf.bi, t)
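
For reference, pandas.DataFrame.itertuples() yields namedtuples whose fields are the column names (plus the index), and Tafra.tuple_map() likewise passes each row to the mapper as a NamedTuple, which is why the attribute-style access above works for both:

>>> row = next(df.itertuples())
>>> row.wellid
0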

>>> %timeit Tafra(tf.tuple_map(tuple_mapper))
1.68 ms ± 84.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit pd.DataFrame(dict((tuple_mapper(row)) for row in df.itertuples()))
3.14 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And once again, Tafra is about twice as fast.

One of the upcoming features of pandas is support for numba @jit’ed functions in pandas.DataFrame.apply(). The performance improvement should be significant, especially for long-running functions, but there will still be overhead in the abstraction before the function is called. We can demonstrate this by @jit’ing our hyperbolic function and mapping it over the dataframes, to get an idea of how much improvement is possible:

>>> from numba import jit
>>> jit_kw = {'fastmath': True}

>>> @jit(**jit_kw)
... def tan_to_nominal(D: float) -> float:
...     return -math.log1p(-D)

>>> @jit(**jit_kw)
... def sec_to_nominal(D: float, b: float) -> float:
...     if b <= 1e-4:
...         return tan_to_nominal(D)
...
...     return ((1.0 - D) ** -b - 1.0) / b

>>> @jit(**jit_kw)
... def hyp(qi: float, Di: float, bi: float, t: np.ndarray) -> np.ndarray:
...     Dn = sec_to_nominal(Di, bi)
...
...     if bi <= 1e-4:
...         return qi * np.exp(-Dn * t)
...
...     return qi / (1.0 + Dn * bi * t) ** (1.0 / bi)

>>> %timeit Tafra(tf.tuple_map(tuple_mapper))
884 µs ± 41.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit pd.DataFrame(dict((tuple_mapper(row)) for row in df.itertuples()))
3.09 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Interestingly, we see that pandas does not get much benefit from this, as the limiting factor is not the performance of the functions but the performance of pandas itself. We can validate this by skipping the dataframe construction step:

>>> %timeit [tf.tuple_map(tuple_mapper)]
81.9 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit [(tuple_mapper(row)) for row in df.itertuples()]
614 µs ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Last, we might ask the question: “If pandas is incurring some performance penalty, what is the performance penalty of Tafra?” We’ll write a function that operates on the numpy.ndarrays themselves rather than using the helper Tafra.tuple_map(). We could also use numpy’s built-in apply function, numpy.apply_along_axis(), but it is considerably slower than a @jit’ed function.
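
For comparison, such an apply_along_axis call might look like the following sketch (the column stacking and lambda are illustrative only, and we do not time this path):

>>> args = np.column_stack([tf['qi'], tf['Di'], tf['bi']])
>>> curves = np.apply_along_axis(
...     lambda row: hyp(row[0], row[1], row[2], t), 1, args)
>>> curves.shape
(100, 101)

The @jit’ed function operating directly on the ndarrays, and its timing: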

>>> @jit(**jit_kw)
... def ndarray_map(qi: np.ndarray, Di: np.ndarray, bi: np.ndarray, t: np.ndarray) -> np.ndarray:
...     out = np.zeros((qi.shape[0], t.shape[0]))
...     for i in range(qi.shape[0]):
...         out[i, :] = hyp(qi[i], Di[i], bi[i], t)
...
...     return out

>>> %timeit ndarray_map(tf['qi'], tf['Di'], tf['bi'], t)
81.2 µs ± 9.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

And the difference in timing is negligible, meaning Tafra’s Tafra.tuple_map() is essentially as performant as we are able to achieve in Python.