API Reference

Summary

Tafra

Tafra(data, dtypes, validate, check_rows)

A minimalist dataframe.

Aggregations

Union()

Union two Tafra together.

GroupBy(group_by_cols, aggregation, iter_fn)

Aggregation by a set of unique values.

Transform(group_by_cols, aggregation, iter_fn)

Apply a function to each unique set of values and join to the original table.

IterateBy(group_by_cols)

A generator that yields a Tafra for each set of unique values.

InnerJoin(on, select)

An inner join.

LeftJoin(on, select)

A left join.

CrossJoin(on, select)

A cross join.

Methods

from_records(records, columns[, dtypes])

Construct a Tafra from an Iterator of records, e.g. from a SQL query.

from_dataframe(df[, dtypes])

Construct a Tafra from a pandas.DataFrame.

from_series(s[, dtype])

Construct a Tafra from a pandas.Series.

read_sql(query, cur)

Execute a SQL SELECT statement using a pyodbc.Cursor and return the results as a Tafra.

read_sql_chunks(query, cur[, chunksize])

Execute a SQL SELECT statement using a pyodbc.Cursor and return an Iterator of Tafra, each built from a chunk of chunksize records.

read_csv(csv_file[, guess_rows, missing, dtypes])

Read a CSV file with a header row, infer the types of each column, and return a Tafra containing the file's contents.

as_tafra(maybe_tafra)

Returns the unmodified Tafra if already a Tafra, else constructs a Tafra from known types or subtypes of DataFrame or dict.

to_records([columns, cast_null])

Return an Iterator of Tuple, each being a record (i.e. row) and allowing heterogeneous typing.

to_list([columns, inner])

Return a list of homogeneously typed columns (as numpy.ndarray).

to_tuple([columns, name, inner])

Return a NamedTuple or Tuple.

to_array([columns])

Return an object array.

to_pandas([columns])

Construct a pandas.DataFrame.

to_csv(filename[, columns])

Write the Tafra to a CSV.

rows

The number of rows of the first item in data.

columns

The names of the columns.

data

The Tafra data.

dtypes

The Tafra dtypes.

size

The Tafra size.

ndim

The Tafra number of dimensions.

shape

The Tafra shape.

head([n])

Display the head of the Tafra.

keys()

Return the keys of data, i.e. like dict.keys().

values()

Return the values of data, i.e. like dict.values().

items()

Return the items of data, i.e. like dict.items().

get(key[, default])

Return from the get() function of data, i.e. like dict.get().

iterrows()

Yield rows as Tafra.

itertuples([name])

Yield rows as NamedTuple, or if name is None, yield rows as tuple.

itercols()

Yield columns as Tuple[str, np.ndarray], where the str is the column name.

row_map(fn, *args, **kwargs)

Map a function over rows.

tuple_map(fn, *args, **kwargs)

Map a function over rows.

col_map(fn, *args, **kwargs)

Map a function over columns.

key_map(fn, *args, **kwargs)

Map a function over columns like col_map(), but return a Tuple of the key with the function result.

pipe(fn, *args, **kwargs)

Apply a function to the Tafra and return the resulting Tafra.

select(columns)

Use column names to slice the Tafra columns analogous to SQL SELECT.

copy([order])

Create a copy of a Tafra.

update(other)

Update the data and dtypes of this Tafra with another Tafra.

update_inplace(other)

Inplace version.

update_dtypes(dtypes)

Apply new dtypes.

update_dtypes_inplace(dtypes)

Inplace version.

parse_object_dtypes()

Parse the object dtypes using the ObjectFormatter instance.

parse_object_dtypes_inplace()

Inplace version.

rename(renames)

Rename columns in the Tafra from a dict.

rename_inplace(renames)

In-place version.

coalesce(column, fills)

Fill None values from fills.

coalesce_inplace(column, fills)

In-place version.

_coalesce_dtypes()

Update dtypes with missing keys that exist in data.

delete(columns)

Remove a column from data and dtypes.

delete_inplace(columns)

In-place version.

pprint([indent, width, depth, compact])

Pretty print.

pformat([indent, width, depth, compact])

Format for pretty printing.

to_html([n])

Construct an HTML table representation of the Tafra data.

_slice(_slice)

Use a slice to slice the Tafra.

_iindex(index)

Use an int to slice the Tafra.

_aindex(index)

Use numpy advanced indexing to slice the Tafra.

_ndindex(index)

Use numpy.ndarray indexing to slice the Tafra.

Helper Methods

union(other)

Helper function to implement tafra.group.Union.apply().

union_inplace(other)

Inplace version.

group_by(columns[, aggregation, iter_fn])

Helper function to implement tafra.group.GroupBy.apply().

transform(columns[, aggregation, iter_fn])

Helper function to implement tafra.group.Transform.apply().

iterate_by(columns)

Helper function to implement tafra.group.IterateBy.apply().

inner_join(right, on[, select])

Helper function to implement tafra.group.InnerJoin.apply().

left_join(right, on[, select])

Helper function to implement tafra.group.LeftJoin.apply().

cross_join(right[, select])

Helper function to implement tafra.group.CrossJoin.apply().

Object Formatter

ObjectFormatter

A dictionary that contains mappings for formatting objects.

Detailed Reference

Tafra

Methods

class tafra.base.Tafra(data: ~dataclasses.InitVar = <property object>, dtypes: ~dataclasses.InitVar = <property object>, validate: ~dataclasses.InitVar = True, check_rows: bool = True)[source]

A minimalist dataframe.

Constructs a Tafra from dict of data and (optionally) dtypes. Types on parameters are the types of the constructed Tafra, but attempts are made to parse anything that “looks” like the correct data structure, including Iterable, Iterator, Sequence, and Mapping and various combinations.

Parameters are given as an InitVar, defined as:

InitVar = Union[Tuple[str, Any], _Mapping, Sequence[_Element], Iterable[_Element], Iterator[_Element], enumerate]

_Mapping = Union[Mapping[str, Any], Mapping[int, Any], Mapping[float, Any], Mapping[bool, Any]]

_Element = Union[Tuple[Union[str, int, float, np.ndarray], Any], List[Any], Mapping]

Parameters
  • data (InitVar) – The data of the Tafra.

  • dtypes (InitVar) – The dtypes of the columns.

  • validate (bool = True) – Run validation checks of the data. False will improve performance, but data and dtypes will not be validated for conformance to expected data structures.

  • check_rows (bool = True) – Run row count checks. False will allow columns of differing lengths, which may break several methods.

Returns

tafra – The constructed Tafra.

Return type

Tafra

classmethod from_dataframe(df: DataFrame, dtypes: Optional[Dict[str, Any]] = None, **kwargs: Any) Tafra[source]

Construct a Tafra from a pandas.DataFrame. If dtypes are not given, take from pandas.DataFrame.dtypes.

Parameters
  • df (pandas.DataFrame) – The dataframe used to build the Tafra.

  • dtypes (Optional[Dict[str, Any]] = None) – The dtypes of the columns.

Returns

tafra – The constructed Tafra.

Return type

Tafra

classmethod from_series(s: Series, dtype: Optional[str] = None, **kwargs: Any) Tafra[source]

Construct a Tafra from a pandas.Series. If dtype is not given, take from pandas.Series.dtype.

Parameters
  • s (pandas.Series) – The series used to build the Tafra.

  • dtype (Optional[str] = None) – The dtype of the column.

Returns

tafra – The constructed Tafra.

Return type

Tafra

classmethod from_records(records: Iterable[Iterable[Any]], columns: Iterable[str], dtypes: Optional[Iterable[Any]] = None, **kwargs: Any) Tafra[source]

Construct a Tafra from an Iterator of records, e.g. from a SQL query. The records should be a nested Iterable, but can also be fed a cursor method such as cur.fetchmany() or cur.fetchall().

Parameters
  • records (Iterable[Iterable[Any]]) – The records to turn into a Tafra.

  • columns (Iterable[str]) – The column names to use.

  • dtypes (Optional[Iterable[Any]] = None) – The dtypes of the columns.

Returns

tafra – The constructed Tafra.

Return type

Tafra
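The record-to-column conversion described for from_records() can be pictured with a small plain-Python sketch. It does not use tafra or numpy; records_to_columns is a hypothetical helper that only illustrates how row-wise records and a list of column names line up into columnar data.

```python
from typing import Any, Dict, Iterable, List


def records_to_columns(
    records: Iterable[Iterable[Any]], columns: Iterable[str]
) -> Dict[str, List[Any]]:
    """Transpose row-wise records into a column-name -> values mapping."""
    names = list(columns)
    data: Dict[str, List[Any]] = {name: [] for name in names}
    for record in records:
        row = list(record)
        if len(row) != len(names):
            raise ValueError("record length does not match the column count")
        for name, value in zip(names, row):
            data[name].append(value)
    return data


# Rows shaped like the result of cur.fetchall(): one tuple per record.
rows = [(1, "a"), (2, "b"), (3, "c")]
columnar = records_to_columns(rows, ["id", "label"])
print(columnar)  # {'id': [1, 2, 3], 'label': ['a', 'b', 'c']}
```

A dict of equal-length columns like this is the kind of structure the Tafra constructor is described as accepting for data.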

classmethod read_sql(query: str, cur: Cursor) Tafra[source]

Execute a SQL SELECT statement using a pyodbc.Cursor and return the results as a Tafra.

Parameters
  • query (str) – The SQL query.

  • cur (pyodbc.Cursor) – The pyodbc cursor.

Returns

tafra – The constructed Tafra.

Return type

Tafra

classmethod read_sql_chunks(query: str, cur: Cursor, chunksize: int = 100) Iterator[Tafra][source]

Execute a SQL SELECT statement using a pyodbc.Cursor and return an Iterator of Tafra, each built from a chunk of chunksize records.

Parameters
  • query (str) – The SQL query.

  • cur (pyodbc.Cursor) – The pyodbc cursor.

  • chunksize (int = 100) – The number of records to fetch per chunk.

Returns

tafras – An Iterator of the constructed Tafra.

Return type

Iterator[Tafra]
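Chunked reading of this kind is usually built on the DB-API fetchmany() call. The sketch below shows that pattern under stated assumptions: it uses the standard-library sqlite3 cursor as a stand-in for pyodbc.Cursor, fetch_chunks is a hypothetical helper, and it yields plain (column names, rows) pairs rather than Tafra objects. It is not the library's implementation.

```python
import sqlite3
from typing import Any, Iterator, List, Tuple


def fetch_chunks(
    cur: sqlite3.Cursor, query: str, chunksize: int = 100
) -> Iterator[Tuple[List[str], List[Tuple[Any, ...]]]]:
    """Run the query, then yield (column names, rows) one chunk at a time."""
    cur.execute(query)
    names = [description[0] for description in cur.description]
    while True:
        rows = cur.fetchmany(chunksize)
        if not rows:
            break
        yield names, rows


# An in-memory table with five rows, read back two rows at a time.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER, y TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(i, str(i)) for i in range(5)])
chunks = list(fetch_chunks(cur, "SELECT x, y FROM t ORDER BY x", chunksize=2))
conn.close()

print([len(rows) for _, rows in chunks])  # [2, 2, 1]
```

Each yielded pair has the shape that from_records() takes as records and columns.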

classmethod read_csv(csv_file: Union[str, Path, TextIOWrapper, IO[str]], guess_rows: int = 5, missing: Optional[str] = '', dtypes: Optional[Dict[str, Any]] = None, **csvkw: Dict[str, Any]) Tafra[source]

Read a CSV file with a header row, infer the types of each column, and return a Tafra containing the file’s contents.

Parameters
  • csv_file (Union[str, Path, TextIOWrapper, IO[str]]) – The path to the CSV file, or an open file-like object.

  • guess_rows (int) – The number of rows to use when guessing column types.

  • dtypes (Optional[Dict[str, str]]) – dtypes by column name; by default, all dtypes will be inferred from the file contents.

  • **csvkw (Dict[str, Any]) – Additional keyword arguments passed to csv.reader.

Returns

tafra – The constructed Tafra.

Return type

Tafra
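Type inference from the first guess_rows rows can be sketched with the standard csv module. Everything here is an assumption made for illustration: read_csv_columns and guess_type are hypothetical names, the candidate types are limited to int, float and str, and missing is treated as the string that marks an empty cell. The inference rules tafra actually applies may differ.

```python
import csv
import io
from typing import Any, Callable, Dict, List


def guess_type(values: List[str], missing: str = "") -> Callable[[str], Any]:
    """Return the first of int, float, str that parses every non-missing value."""
    present = [value for value in values if value != missing]
    for caster in (int, float):
        try:
            for value in present:
                caster(value)
            return caster
        except ValueError:
            continue
    return str


def read_csv_columns(
    text: str, guess_rows: int = 5, missing: str = ""
) -> Dict[str, List[Any]]:
    """Parse CSV text with a header row into typed columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    columns: Dict[str, List[Any]] = {}
    for i, name in enumerate(header):
        raw = [row[i] for row in rows]
        # Only the first guess_rows values decide the column type.
        caster = guess_type(raw[:guess_rows], missing)
        columns[name] = [None if value == missing else caster(value) for value in raw]
    return columns


sample = "id,price,name\n1,2.5,apple\n2,,pear\n"
parsed = read_csv_columns(sample)
print(parsed)  # {'id': [1, 2], 'price': [2.5, None], 'name': ['apple', 'pear']}
```

In this sketch a small guess_rows is faster but fails if a later row does not fit the guessed type, which is why an explicit dtypes override is useful.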

classmethod as_tafra(maybe_tafra: Union[Tafra, DataFrame, Series, Dict[str, Any], Any]) Optional[Tafra][source]

Returns the unmodified Tafra if already a Tafra, else constructs a Tafra from known types or subtypes of DataFrame or dict. Structural subtypes of DataFrame or Series are also valid, as are classes that have cls.__name__ == 'DataFrame' or cls.__name__ == 'Series'.

Parameters

maybe_tafra (Union[Tafra, DataFrame, Series, Dict[str, Any], Any]) – The object to ensure is a Tafra.

Returns

tafra – The Tafra, or None if maybe_tafra is an unknown type.

Return type

Optional[Tafra]

to_records(columns: Optional[Iterable[str]] = None, cast_null: bool = True) Iterator[Tuple[Any, ...]][source]

Return an Iterator of Tuple, each being a record (i.e. row) and allowing heterogeneous typing. Useful for e.g. sending records back to a database.

Parameters
  • columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.

  • cast_null (bool) – Cast np.nan to None. Necessary for pyodbc.

Returns

records

Return type

Iterator[Tuple[Any, …]]
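The effect of cast_null can be illustrated without numpy. The sketch below is a plain-Python approximation, not the library code: iter_records is a hypothetical helper, columns are ordinary lists, and float("nan") stands in for np.nan.

```python
import math
from typing import Any, Dict, Iterator, List, Tuple


def iter_records(
    data: Dict[str, List[Any]], cast_null: bool = True
) -> Iterator[Tuple[Any, ...]]:
    """Yield one tuple per row, optionally replacing NaN with None."""

    def clean(value: Any) -> Any:
        if cast_null and isinstance(value, float) and math.isnan(value):
            return None
        return value

    # zip(*columns) walks the columns in lockstep, producing rows.
    for row in zip(*data.values()):
        yield tuple(clean(value) for value in row)


table = {"id": [1, 2], "score": [0.5, float("nan")], "tag": ["a", "b"]}
records = list(iter_records(table))
print(records)  # [(1, 0.5, 'a'), (2, None, 'b')]
```

Database drivers generally map None to SQL NULL, which is why the NaN-to-None cast matters when sending records back to a database.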

to_list(columns: Optional[Iterable[str]] = None, inner: bool = False) Union[List[ndarray], List[List[Any]]][source]

Return a list of homogeneously typed columns (as numpy.ndarray). If a generator is needed, use to_records(). If inner == True each column will be cast from numpy.ndarray to a List.

Parameters
  • columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.

  • inner (bool = False) – Cast all np.ndarray to List.

Returns

list

Return type

Union[List[np.ndarray], List[List[Any]]]

to_tuple(columns: Optional[Iterable[str]] = None, name: Optional[str] = 'Tafra', inner: bool = False) Union[Tuple[ndarray], Tuple[Tuple[Any, ...]]][source]

Return a NamedTuple or Tuple. If a generator is needed, use to_records(). If inner == True each column will be cast from np.ndarray to a Tuple. If name is None, returns a Tuple instead.

Parameters
  • columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.

  • name (Optional[str] = 'Tafra') – The name for the NamedTuple. If None, construct a Tuple instead.

  • inner (bool = False) – Cast all np.ndarray to Tuple.

Returns

tuple

Return type

Union[Tuple[np.ndarray], Tuple[Tuple[Any, …]]]

to_array(columns: Optional[Iterable[str]] = None) ndarray[source]

Return an object array.

Parameters

columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.

Returns

array

Return type

np.ndarray

to_pandas(columns: Optional[Iterable[str]] = None) DataFrame[source]

Construct a pandas.DataFrame.

Parameters

columns (Iterable[str]) – The columns to write. If None, write all columns.

Returns

dataframe

Return type

pandas.DataFrame

to_csv(filename: Union[str, Path, TextIOWrapper, IO[str]], columns: Optional[Iterable[str]] = None) None[source]

Write the Tafra to a CSV.

Parameters
  • filename (Union[str, Path, TextIOWrapper, IO[str]]) – The path of the file to write, or an open file-like object.

  • columns (Iterable[str]) – The columns to write. If None, write all columns.

rows

The number of rows of the first item in data. The len() of all items have been previously validated.

Returns

rows – The number of rows of the Tafra.

Return type

int

columns

The names of the columns. Equivalent to Tafra.keys().

Returns

columns – The column names.

Return type

Tuple[str, …]

data: InitVar

The Tafra data.

Returns

data – The data.

Return type

Dict[str, np.ndarray]

dtypes: InitVar

The Tafra dtypes.

Returns

dtypes – The dtypes.

Return type

Dict[str, str]

size

The Tafra size.

Returns

size – The size.

Return type

int

ndim

The Tafra number of dimensions.

Returns

ndim – The number of dimensions.

Return type

int

shape

The Tafra shape.

Returns

shape – The shape.

Return type

int

head(n: int = 5) Tafra[source]

Display the head of the Tafra.

Parameters

n (int = 5) – The number of rows to display.

Returns

tafra – A Tafra of the first n rows.

Return type

Tafra

keys() KeysView[str][source]

Return the keys of data, i.e. like dict.keys().

Returns

data keys – The keys of the data property.

Return type

KeysView[str]

values() ValuesView[ndarray][source]

Return the values of data, i.e. like dict.values().

Returns

data values – The values of the data property.

Return type

ValuesView[np.ndarray]

items() ItemsView[str, ndarray][source]

Return the items of data, i.e. like dict.items().

Returns

items – The data items.

Return type

ItemsView[str, np.ndarray]

get(key: str, default: Optional[Any] = None) Any[source]

Return from the get() function of data, i.e. like dict.get().

Parameters
  • key (str) – The key value in the data property.

  • default (Any) – The default to return if the key does not exist.

Returns

value – The value for the key, or the default if the key does not exist.

Return type

Any

iterrows() Iterator[Tafra][source]

Yield rows as Tafra. Use itertuples() for better performance.

Returns

tafras – An iterator of Tafra.

Return type

Iterator[Tafra]

itertuples(name: Optional[str] = 'Tafra') Iterator[Tuple[Any, ...]][source]

Yield rows as NamedTuple, or if name is None, yield rows as tuple.

Parameters

name (Optional[str] = 'Tafra') – The name for the NamedTuple. If None, construct a Tuple instead.

Returns

tuples – An iterator of NamedTuple.

Return type

Iterator[NamedTuple[Any, …]]
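The difference between the named and unnamed modes of itertuples() can be sketched with collections.namedtuple. This is an illustration over a plain dict of lists, not the library code, and iter_named_rows is a hypothetical helper. Note that namedtuple requires column names that are valid Python identifiers.

```python
from collections import namedtuple
from typing import Any, Dict, Iterator, List, Optional, Tuple


def iter_named_rows(
    data: Dict[str, List[Any]], name: Optional[str] = "Tafra"
) -> Iterator[Tuple[Any, ...]]:
    """Yield rows as namedtuples, or as plain tuples when name is None."""
    if name is None:
        yield from zip(*data.values())
        return
    Row = namedtuple(name, list(data.keys()))
    for values in zip(*data.values()):
        yield Row(*values)


data = {"x": [1, 2], "y": ["a", "b"]}
named = list(iter_named_rows(data))
plain = list(iter_named_rows(data, name=None))

print(named[0].x, named[0].y)  # 1 a
print(plain)  # [(1, 'a'), (2, 'b')]
```

Named rows allow attribute access such as row.x, at the cost of building the tuple class once per call.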

itercols() Iterator[Tuple[str, ndarray]][source]

Yield columns as Tuple[str, np.ndarray], where the str is the column name.

Returns

tuples – An iterator of Tuple[str, np.ndarray].

Return type

Iterator[Tuple[str, np.ndarray]]

row_map(fn: Callable[[...], Any], *args: Any, **kwargs: Any) Iterator[Any][source]

Map a function over rows. To apply to specific columns, use select() first. The function must operate on Tafra.

Parameters
  • fn (Callable[..., Any]) – The function to map.

  • *args (Any) – Additional positional arguments to fn.

  • **kwargs (Any) – Additional keyword arguments to fn.

Returns

iter_tf – An iterator to map the function.

Return type

Iterator[Any]

tuple_map(fn: Callable[[...], Any], *args: Any, **kwargs: Any) Iterator[Any][source]

Map a function over rows. This is faster than row_map(). To apply to specific columns, use select() first. The function must operate on NamedTuple from itertuples().

Parameters
  • fn (Callable[..., Any]) – The function to map.

  • name (Optional[str] = 'Tafra') – The name for the NamedTuple. If None, construct a Tuple instead. Must be given as a keyword argument.

  • *args (Any) – Additional positional arguments to fn.

  • **kwargs (Any) – Additional keyword arguments to fn.

Returns

iter_tf – An iterator to map the function.

Return type

Iterator[Any]

col_map(fn: Callable[[...], Any], *args: Any, **kwargs: Any) Iterator[Any][source]

Map a function over columns. To apply to specific columns, use select() first. The function must operate on Tuple[str, np.ndarray].

Parameters
  • fn (Callable[..., Any]) – The function to map.

  • *args (Any) – Additional positional arguments to fn.

  • **kwargs (Any) – Additional keyword arguments to fn.

Returns

iter_tf – An iterator to map the function.

Return type

Iterator[Any]

key_map(fn: Callable[[...], Any], *args: Any, **kwargs: Any) Iterator[Tuple[str, Any]][source]

Map a function over columns like col_map(), but return a Tuple of the key with the function result. To apply to specific columns, use select() first. The function must operate on Tuple[str, np.ndarray].

Parameters
  • fn (Callable[..., Any]) – The function to map.

  • *args (Any) – Additional positional arguments to fn.

  • **kwargs (Any) – Additional keyword arguments to fn.

Returns

iter_tf – An iterator to map the function.

Return type

Iterator[Tuple[str, Any]]
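The contrast between col_map() and key_map() is easiest to see side by side. The following sketch mimics the documented contract over a plain dict of lists: the mapped function receives a (name, values) pair, and the key_map variant pairs each result with its column name. The free functions and column_total are hypothetical stand-ins, not the Tafra methods.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Column = Tuple[str, List[Any]]


def col_map(data: Dict[str, List[Any]], fn: Callable[..., Any],
            *args: Any, **kwargs: Any) -> Iterator[Any]:
    """Apply fn to each (name, values) pair and yield the bare results."""
    for item in data.items():
        yield fn(item, *args, **kwargs)


def key_map(data: Dict[str, List[Any]], fn: Callable[..., Any],
            *args: Any, **kwargs: Any) -> Iterator[Tuple[str, Any]]:
    """Like col_map, but yield (name, result) so results stay labelled."""
    for item in data.items():
        yield item[0], fn(item, *args, **kwargs)


def column_total(column: Column, scale: int = 1) -> int:
    _name, values = column
    return scale * sum(values)


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
totals = list(col_map(data, column_total))
keyed = list(key_map(data, column_total, scale=10))

print(totals)  # [6, 15]
print(keyed)  # [('a', 60), ('b', 150)]
```

Because key_map yields (name, result) pairs, its output can be passed straight to dict() to rebuild a mapping by column name.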

pipe(fn: Callable[[Tafra, P], Tafra], *args: Any, **kwargs: Any) Tafra[source]

Apply a function to the Tafra and return the resulting Tafra. Primarily used to build a transformer pipeline.

Parameters
  • fn (Callable[[Tafra, P], Tafra]) – The function to apply.

  • *args (Any) – Additional positional arguments to fn.

  • **kwargs (Any) – Additional keyword arguments to fn.

Returns

tafra – A new Tafra result of the function.

Return type

Tafra
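A transformer pipeline built on pipe() chains functions that each take a table and return a new one. The sketch below shows the pattern with a tiny stand-in class. Table, add_total and keep are hypothetical names, and the class only mirrors the documented pipe(fn, *args, **kwargs) signature.

```python
from typing import Any, Callable, Dict, List


class Table:
    """A minimal stand-in holding a dict of equal-length columns."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self.data = dict(data)

    def pipe(self, fn: Callable[..., "Table"], *args: Any, **kwargs: Any) -> "Table":
        # The table is always passed as the first argument to fn.
        return fn(self, *args, **kwargs)


def add_total(table: Table, left: str, right: str) -> Table:
    total = [x + y for x, y in zip(table.data[left], table.data[right])]
    return Table({**table.data, "total": total})


def keep(table: Table, *names: str) -> Table:
    return Table({name: table.data[name] for name in names})


result = (
    Table({"a": [1, 2], "b": [10, 20]})
    .pipe(add_total, "a", "b")
    .pipe(keep, "total")
)
print(result.data)  # {'total': [11, 22]}
```

Each step returns a new table, so the chain reads top to bottom in the order the steps run.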

__rshift__(other: Callable[[Tafra], Tafra]) Tafra[source]
select(columns: Iterable[str]) Tafra[source]

Use column names to slice the Tafra columns analogous to SQL SELECT. This does not copy the data. Call copy() to obtain a copy of the sliced data.

Parameters

columns (Iterable[str]) – The column names to slice from the Tafra.

Returns

tafra – the Tafra with the sliced columns.

Return type

Tafra

copy(order: str = 'C') Tafra[source]

Create a copy of a Tafra.

Parameters

order (str = 'C' {‘C’, ‘F’, ‘A’, ‘K’}) – Controls the memory layout of the copy. ‘C’ means C-order, ‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible.

Returns

tafra – A copied Tafra.

Return type

Tafra

update(other: Tafra) Tafra[source]

Update the data and dtypes of this Tafra with another Tafra. Length of rows must match, while data of different dtype will overwrite.

Parameters

other (Tafra) – The other Tafra from which to update.

Returns

tafra – The updated Tafra.

Return type

Tafra

update_inplace(other: Tafra) None[source]

Inplace version.

Update the data and dtypes of this Tafra with another Tafra. Length of rows must match, while data of different dtype will overwrite.

Parameters

other (Tafra) – The other Tafra from which to update.

Returns

None

Return type

None

update_dtypes(dtypes: Dict[str, Any]) Tafra[source]

Apply new dtypes.

Parameters

dtypes (Dict[str, Any]) – The dtypes to update. If None, create from entries in data.

Returns

tafra – The updated Tafra.

Return type

Tafra

update_dtypes_inplace(dtypes: Dict[str, Any]) None[source]

Inplace version.

Apply new dtypes.

Parameters

dtypes (Dict[str, Any]) – The dtypes to update. If None, create from entries in data.

Returns

None

Return type

None

parse_object_dtypes() Tafra[source]

Parse the object dtypes using the ObjectFormatter instance.

parse_object_dtypes_inplace() None[source]

Inplace version.

Parse the object dtypes using the ObjectFormatter instance.

rename(renames: Dict[str, str]) Tafra[source]

Rename columns in the Tafra from a dict.

Parameters

renames (Dict[str, str]) – The map from current names to new names.

Returns

tafra – The Tafra with updated names.

Return type

Tafra

rename_inplace(renames: Dict[str, str]) None[source]

In-place version.

Rename columns in the Tafra from a dict.

Parameters

renames (Dict[str, str]) – The map from current names to new names.

Returns

None

Return type

None

coalesce(column: str, fills: Iterable[Iterable[Union[None, str, int, float, bool, ndarray]]]) ndarray[source]

Fill None values from fills. Analogous to SQL COALESCE or pandas.fillna().

Parameters
  • column (str) – The column to coalesce.

  • fills (Iterable[Iterable[Union[None, str, int, float, bool, np.ndarray]]]) – The values from which to fill None entries.

Returns

data – The coalesced data.

Return type

np.ndarray
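COALESCE semantics mean that each missing entry takes the first non-None candidate, tried in order. A plain-Python sketch of that rule follows, under assumptions: coalesce here is a hypothetical free function over lists, a fill is either a scalar (applied to every gap) or a list or tuple aligned with the column, and how tafra itself handles fill shapes is not specified in this reference.

```python
from typing import Any, Iterable, List


def coalesce(column: List[Any], fills: Iterable[Any]) -> List[Any]:
    """Replace None entries using each fill in turn; earlier fills win."""
    result = list(column)
    for fill in fills:
        is_sequence = isinstance(fill, (list, tuple))
        for i, value in enumerate(result):
            if value is None:
                # A sequence fill is aligned by row; a scalar fill is broadcast.
                result[i] = fill[i] if is_sequence else fill
    return result


filled = coalesce([1, None, None, 4], [[None, 20, None, 40], 0])
print(filled)  # [1, 20, 0, 4]
```

The second row is filled from the aligned list, while the third row falls through to the scalar 0 because the list also holds None there.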

coalesce_inplace(column: str, fills: Iterable[Iterable[Union[None, str, int, float, bool, ndarray]]]) None[source]

In-place version.

Fill None values from fills. Analogous to SQL COALESCE or pandas.fillna().

Parameters
  • column (str) – The column to coalesce.

  • fills (Iterable[Iterable[Union[None, str, int, float, bool, np.ndarray]]]) – The values from which to fill None entries.

Returns

None

Return type

None

_coalesce_dtypes() None[source]

Update dtypes with missing keys that exist in data.

Must be called if data or dtypes is directly modified!

Returns

None

Return type

None

delete(columns: Iterable[str]) Tafra[source]

Remove columns from data and dtypes.

Parameters

columns (Iterable[str]) – The columns to remove.

Returns

tafra – The Tafra with the deleted columns.

Return type

Tafra

delete_inplace(columns: Iterable[str]) None[source]

In-place version.

Remove columns from data and dtypes.

Parameters

columns (Iterable[str]) – The columns to remove.

Returns

None

Return type

None

pprint(indent: int = 1, width: int = 80, depth: Optional[int] = None, compact: bool = False) None[source]

Pretty print. Parameters are passed to pprint.PrettyPrinter.

Parameters
  • indent (int) – Number of spaces to indent for each level of nesting.

  • width (int) – Attempted maximum number of columns in the output.

  • depth (Optional[int]) – The maximum depth to print out nested structures.

  • compact (bool) – If true, several items will be combined in one line.

Returns

None

Return type

None

pformat(indent: int = 1, width: int = 80, depth: Optional[int] = None, compact: bool = False) str[source]

Format for pretty printing. Parameters are passed to pprint.PrettyPrinter.

Parameters
  • indent (int) – Number of spaces to indent for each level of nesting.

  • width (int) – Attempted maximum number of columns in the output.

  • depth (Optional[int]) – The maximum depth to print out nested structures.

  • compact (bool) – If true, several items will be combined in one line.

Returns

formatted string – A formatted string for pretty printing.

Return type

str

to_html(n: int = 20) str[source]

Construct an HTML table representation of the Tafra data.

Parameters

n (int = 20) – Number of items to print.

Returns

HTML – The HTML table representation.

Return type

str

_slice(_slice: slice) Tafra[source]

Use a slice to slice the Tafra.

Parameters

_slice (slice) – The slice object.

Returns

tafra – The sliced Tafra.

Return type

Tafra

_iindex(index: int) Tafra[source]

Use a :class`int` to slice the Tafra.

Parameters

index (int) –

Returns

tafra – The sliced Tafra.

Return type

Tafra

_aindex(index: Sequence[Union[int, bool]]) Tafra[source]

Use numpy advanced indexing to slice the Tafra.

Parameters

index (Sequence[Union[int, bool]]) –

Returns

tafra – The sliced Tafra.

Return type

Tafra

_ndindex(index: ndarray) Tafra[source]

Use numpy.ndarray indexing to slice the Tafra.

Parameters

index (np.ndarray) –

Returns

tafra – The sliced Tafra.

Return type

Tafra

Helper Methods

class tafra.base.Tafra[source]
union(other: Tafra) Tafra[source]

Helper function to implement tafra.group.Union.apply().

Union two Tafra together. Analogous to SQL UNION or pandas.append. All column names and dtypes must match.

Parameters

other (Tafra) – The other tafra to union.

Returns

tafra – A new tafra with the unioned data.

Return type

Tafra

union_inplace(other: Tafra) None[source]

Inplace version.

Helper function to implement tafra.group.Union.apply_inplace().

Union two Tafra together. Analogous to SQL UNION or pandas.append. All column names and dtypes must match.

Parameters

other (Tafra) – The other tafra to union.

Returns

None

Return type

None

group_by(columns: Iterable[str], aggregation: Mapping[str, Union[Callable[[ndarray], Any], Tuple[Callable[[ndarray], Any], str]]] = {}, iter_fn: Mapping[str, Callable[[ndarray], Any]] = {}) Tafra[source]

Helper function to implement tafra.group.GroupBy.apply().

Aggregation by a set of unique values.

Analogous to SQL GROUP BY, not pandas.DataFrame.groupby().

Parameters
  • columns (Iterable[str]) – The column names to group by.

  • aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.

  • iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping from new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.

Returns

tafra – The aggregated Tafra.

Return type

Tafra
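The two accepted aggregation forms, {‘column’: fn} and {‘new_column’: (fn, ‘column’)}, can be illustrated with a plain-Python sketch of SQL-style grouping. It is an approximation under stated assumptions: group_by here is a hypothetical free function over a dict of lists, groups are emitted in first-seen order (the ordering tafra uses is not stated in this reference), and iter_fn is left out.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Aggregation = Dict[str, Union[Callable[[List[Any]], Any],
                              Tuple[Callable[[List[Any]], Any], str]]]


def group_by(data: Dict[str, List[Any]], columns: List[str],
             aggregation: Aggregation) -> Dict[str, List[Any]]:
    """Return one output row per unique combination of the group columns."""
    n_rows = len(next(iter(data.values())))
    groups: Dict[Tuple[Any, ...], List[int]] = {}
    for i in range(n_rows):
        key = tuple(data[c][i] for c in columns)
        groups.setdefault(key, []).append(i)

    out: Dict[str, List[Any]] = {c: [] for c in columns}
    out.update({name: [] for name in aggregation})
    for key, indices in groups.items():
        for c, value in zip(columns, key):
            out[c].append(value)
        for name, spec in aggregation.items():
            # A bare fn aggregates the column of the same name;
            # (fn, 'column') aggregates 'column' into a new name.
            fn, source = spec if isinstance(spec, tuple) else (spec, name)
            out[name].append(fn([data[source][i] for i in indices]))
    return out


sales = {"region": ["n", "s", "n", "s"], "units": [1, 2, 3, 4]}
summary = group_by(sales, ["region"], {"units": sum, "max_units": (max, "units")})
print(summary)  # {'region': ['n', 's'], 'units': [4, 6], 'max_units': [3, 4]}
```

The result has one row per group, which is the sense in which this follows SQL GROUP BY rather than returning a lazy grouped object as pandas.DataFrame.groupby() does.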

transform(columns: Iterable[str], aggregation: Mapping[str, Union[Callable[[ndarray], Any], Tuple[Callable[[ndarray], Any], str]]] = {}, iter_fn: Dict[str, Callable[[ndarray], Any]] = {}) Tafra[source]

Helper function to implement tafra.group.Transform.apply().

Apply a function to each unique set of values and join to the original table. Analogous to pandas.DataFrame.groupby().transform(), i.e. a SQL GROUP BY and LEFT JOIN back to the original table.

Parameters
  • columns (Iterable[str]) – The column names to group by.

  • aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.

  • iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping from new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.

Returns

tafra – The transformed Tafra.

Return type

Tafra
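The "GROUP BY and LEFT JOIN back" description means the output keeps one row per input row, with each row carrying its group's aggregate. The sketch below shows this in plain Python. It is illustrative only: transform is a hypothetical free function, and the choice to return the group columns plus the aggregated columns is an assumption, since this reference does not spell out the output columns.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

Aggregation = Dict[str, Union[Callable[[List[Any]], Any],
                              Tuple[Callable[[List[Any]], Any], str]]]


def transform(data: Dict[str, List[Any]], columns: List[str],
              aggregation: Aggregation) -> Dict[str, List[Any]]:
    """Aggregate per group, then broadcast each result back to its rows."""
    n_rows = len(next(iter(data.values())))
    keys = [tuple(data[c][i] for c in columns) for i in range(n_rows)]

    out: Dict[str, List[Any]] = {c: list(data[c]) for c in columns}
    for name, spec in aggregation.items():
        fn, source = spec if isinstance(spec, tuple) else (spec, name)
        grouped: Dict[Tuple[Any, ...], List[Any]] = {}
        for key, value in zip(keys, data[source]):
            grouped.setdefault(key, []).append(value)
        per_group = {key: fn(values) for key, values in grouped.items()}
        # The "join back": look up each row's group result by its key.
        out[name] = [per_group[key] for key in keys]
    return out


sales = {"region": ["n", "s", "n", "s"], "units": [1, 2, 3, 4]}
with_totals = transform(sales, ["region"], {"region_total": (sum, "units")})
print(with_totals)
# {'region': ['n', 's', 'n', 's'], 'region_total': [4, 6, 4, 6]}
```

Unlike a group-by, the row count is unchanged, which makes the result easy to combine with the original columns, for example to compute each row's share of its group total.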

iterate_by(columns: Iterable[str]) Iterator[Tuple[Tuple[Any, ...], ndarray, Tafra]][source]

Helper function to implement tafra.group.IterateBy.apply().

A generator that yields a Tafra for each set of unique values. Analogous to pandas.DataFrame.groupby(), i.e. an Iterator of Tafra.

Yields tuples of ((unique grouping values, …), row indices array, subset tafra)

Parameters

columns (Iterable[str]) – The column names to group by.

Returns

tafras – An iterator over the grouped Tafra.

Return type

Iterator[GroupDescription]
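The yielded triple of (unique grouping values, row indices, subset) can be sketched as a plain-Python generator. This is an approximation under assumptions: iterate_by is a hypothetical free function, row indices are a list rather than a numpy array, each subset is a dict of lists rather than a Tafra, and groups appear in first-seen order.

```python
from typing import Any, Dict, Iterator, List, Tuple

Group = Tuple[Tuple[Any, ...], List[int], Dict[str, List[Any]]]


def iterate_by(data: Dict[str, List[Any]], columns: List[str]) -> Iterator[Group]:
    """Yield (group key, row indices, subset of rows) for each unique key."""
    n_rows = len(next(iter(data.values())))
    groups: Dict[Tuple[Any, ...], List[int]] = {}
    for i in range(n_rows):
        key = tuple(data[c][i] for c in columns)
        groups.setdefault(key, []).append(i)
    for key, indices in groups.items():
        subset = {name: [values[i] for i in indices] for name, values in data.items()}
        yield key, indices, subset


sales = {"region": ["n", "s", "n", "s"], "units": [1, 2, 3, 4]}
for key, indices, subset in iterate_by(sales, ["region"]):
    print(key, indices, subset["units"])
# ('n',) [0, 2] [1, 3]
# ('s',) [1, 3] [2, 4]
```

The row indices make it possible to write per-group results back into the original table at the right positions.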

inner_join(right: Tafra, on: Iterable[Tuple[str, str, str]], select: Iterable[str] = []) Tafra[source]

Helper function to implement tafra.group.InnerJoin.apply().

An inner join.

Analogous to SQL INNER JOIN, or pandas.merge(…, how=‘inner’).

Parameters
  • right (Tafra) – The right-side Tafra to join.

  • on (Iterable[Tuple[str, str, str]]) –

    The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:

    ‘==’ : equal to
    ‘!=’ : not equal to
    ‘<’ : less than
    ‘<=’ : less than or equal to
    ‘>’ : greater than
    ‘>=’ : greater than or equal to

  • select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If the column exists in both Tafra, prefers the left over the right.

Returns

tafra – The joined Tafra.

Return type

Tafra
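The (‘left column’, ‘right column’, ‘op’) triples allow joins on conditions other than equality. A nested-loop sketch in plain Python shows how such triples can be evaluated. It is a simplified illustration, not the library algorithm: inner_join is a hypothetical free function, tables are dicts of lists, and every left/right row pair is tested against every condition.

```python
import operator
from typing import Any, Dict, Iterable, List, Tuple

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def inner_join(left: Dict[str, List[Any]], right: Dict[str, List[Any]],
               on: Iterable[Tuple[str, str, str]],
               select: Iterable[str] = ()) -> Dict[str, List[Any]]:
    """Keep every left/right row pair that satisfies all join conditions."""
    conditions = list(on)
    n_left = len(next(iter(left.values())))
    n_right = len(next(iter(right.values())))
    # Default: all unique column names, preferring the left table on clashes.
    names = list(select) or list(left) + [c for c in right if c not in left]
    out: Dict[str, List[Any]] = {name: [] for name in names}
    for i in range(n_left):
        for j in range(n_right):
            if all(OPS[op](left[l][i], right[r][j]) for l, r, op in conditions):
                for name in names:
                    out[name].append(left[name][i] if name in left else right[name][j])
    return out


left = {"id": [1, 2, 3], "x": ["a", "b", "c"]}
right = {"key": [2, 3, 4], "y": [20, 30, 40]}

equal = inner_join(left, right, [("id", "key", "==")])
print(equal)  # {'id': [2, 3], 'x': ['b', 'c'], 'key': [2, 3], 'y': [20, 30]}

below = inner_join(left, right, [("id", "key", "<")], select=["id", "key"])
print(below)  # {'id': [1, 1, 1, 2, 2, 3], 'key': [2, 3, 4, 3, 4, 4]}
```

The second call is a non-equi join: each left row pairs with every right row whose key is strictly larger.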

left_join(right: Tafra, on: Iterable[Tuple[str, str, str]], select: Iterable[str] = []) Tafra[source]

Helper function to implement tafra.group.LeftJoin.apply().

A left join.

Analogous to SQL LEFT JOIN, or pandas.merge(…, how=‘left’).

Parameters
  • right (Tafra) – The right-side Tafra to join.

  • on (Iterable[Tuple[str, str, str]]) –

    The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:

    ‘==’ : equal to
    ‘!=’ : not equal to
    ‘<’ : less than
    ‘<=’ : less than or equal to
    ‘>’ : greater than
    ‘>=’ : greater than or equal to

  • select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If the column exists in both Tafra, prefers the left over the right.

Returns

tafra – The joined Tafra.

Return type

Tafra
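A left join differs from an inner join only in what happens to left rows with no match: they are kept, with the right-side columns left empty. The sketch below shows that rule in plain Python. It rests on assumptions: left_join is a hypothetical free function over dicts of lists, and unmatched right-side values are filled with None (the fill value tafra uses is not stated in this reference).

```python
import operator
from typing import Any, Dict, Iterable, List, Tuple

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def left_join(left: Dict[str, List[Any]], right: Dict[str, List[Any]],
              on: Iterable[Tuple[str, str, str]],
              select: Iterable[str] = ()) -> Dict[str, List[Any]]:
    """Keep every left row; attach matching right rows or None when absent."""
    conditions = list(on)
    n_left = len(next(iter(left.values())))
    n_right = len(next(iter(right.values())))
    names = list(select) or list(left) + [c for c in right if c not in left]
    out: Dict[str, List[Any]] = {name: [] for name in names}
    for i in range(n_left):
        matches = [
            j for j in range(n_right)
            if all(OPS[op](left[l][i], right[r][j]) for l, r, op in conditions)
        ]
        # An unmatched left row still produces one output row.
        for j in matches or [None]:
            for name in names:
                if name in left:
                    out[name].append(left[name][i])
                else:
                    out[name].append(None if j is None else right[name][j])
    return out


left = {"id": [1, 2, 3], "x": ["a", "b", "c"]}
right = {"key": [2, 3, 4], "y": [20, 30, 40]}
joined = left_join(left, right, [("id", "key", "==")])
print(joined)
# {'id': [1, 2, 3], 'x': ['a', 'b', 'c'], 'key': [None, 2, 3], 'y': [None, 20, 30]}
```

The row with id 1 has no partner in the right table, so it survives with None in the key and y columns.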

cross_join(right: Tafra, select: Iterable[str] = []) Tafra[source]

Helper function to implement tafra.group.CrossJoin.apply().

A cross join.

Analogous to SQL CROSS JOIN, or pandas.merge(…, how=‘outer’) using temporary columns of static value to intersect all rows.

Parameters
  • right (Tafra) – The right-side Tafra to join.

  • select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If the column exists in both Tafra, prefers the left over the right.

Returns

tafra – The joined Tafra.

Return type

Tafra
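A cross join pairs every left row with every right row, so the result has len(left) × len(right) rows. A plain-Python sketch using itertools.product illustrates this. cross_join here is a hypothetical free function over dicts of lists, not the Tafra method.

```python
from itertools import product
from typing import Any, Dict, Iterable, List


def cross_join(left: Dict[str, List[Any]], right: Dict[str, List[Any]],
               select: Iterable[str] = ()) -> Dict[str, List[Any]]:
    """Return the Cartesian product of the rows of both tables."""
    n_left = len(next(iter(left.values())))
    n_right = len(next(iter(right.values())))
    # Default: all unique column names, preferring the left table on clashes.
    names = list(select) or list(left) + [c for c in right if c not in left]
    out: Dict[str, List[Any]] = {name: [] for name in names}
    for i, j in product(range(n_left), range(n_right)):
        for name in names:
            out[name].append(left[name][i] if name in left else right[name][j])
    return out


colors = {"color": ["red", "blue"]}
sizes = {"size": ["S", "M", "L"]}
combos = cross_join(colors, sizes)
print(combos["color"])  # ['red', 'red', 'red', 'blue', 'blue', 'blue']
print(combos["size"])   # ['S', 'M', 'L', 'S', 'M', 'L']
```

Because the output grows multiplicatively, cross joins are usually reserved for small lookup tables or for generating parameter grids.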

Aggregations

class tafra.group.Union[source]

Union two Tafra together. Analogous to SQL UNION or pandas.append. All column names and dtypes must match.

apply(left: Tafra, right: Tafra) Tafra[source]

Apply the Union to the Tafra.

Parameters
  • left (Tafra) – The left Tafra to union.

  • right (Tafra) – The right Tafra to union.

Returns

tafra – The unioned Tafra.

Return type

Tafra

apply_inplace(left: Tafra, right: Tafra) None[source]

In-place version.

Apply the Union to the Tafra.

Parameters
  • left (Tafra) – The left Tafra to union.

  • right (Tafra) – The right Tafra to union.

Returns

None

Return type

None

class tafra.group.GroupBy(group_by_cols: Iterable[str], aggregation: InitVar, iter_fn: Mapping[str, Callable[[ndarray], Any]])[source]

Aggregation by a set of unique values.

Analogous to SQL GROUP BY, not pandas.DataFrame.groupby().

Parameters
  • group_by_cols (Iterable[str]) – The column names to group by.

  • aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.

  • iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping from new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.

apply(tafra: Tafra) Tafra[source]

Apply the GroupBy to the Tafra.

Parameters

tafra (Tafra) – The tafra to apply the operation to.

Returns

tafra – The aggregated Tafra.

Return type

Tafra

class tafra.group.Transform(group_by_cols: Iterable[str], aggregation: InitVar, iter_fn: Mapping[str, Callable[[ndarray], Any]])[source]

Apply a function to each unique set of values and join to the original table.

Analogous to pandas.DataFrame.groupby().transform(), i.e. a SQL GROUP BY and LEFT JOIN back to the original table.

Parameters
  • group_by_cols (Iterable[str]) – The column names to group by.

  • aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.

  • iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping from new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.

apply(tafra: Tafra) Tafra[source]

Apply the Transform to the Tafra.

Parameters

tafra (Tafra) – The tafra to apply the operation to.

Returns

tafra – The transformed Tafra.

Return type

Tafra

class tafra.group.IterateBy(group_by_cols: Iterable[str])[source]

A generator that yields a Tafra for each set of unique values.

Analogous to pandas.DataFrame.groupby(), i.e. an Iterator of Tafra objects. Yields tuples of ((unique grouping values, …), row indices array, subset Tafra).

Parameters

group_by_cols (Iterable[str]) – The column names to group by.

apply(tafra: Tafra) Iterator[Tuple[Tuple[Any, ...], ndarray, Tafra]][source]

Apply the IterateBy to the Tafra.

Parameters

tafra (Tafra) – The tafra to apply the operation to.

Returns

tafras – An iterator over the grouped Tafra.

Return type

Iterator[GroupDescription]

class tafra.group.InnerJoin(on: Iterable[Tuple[str, str, str]], select: Iterable[str])[source]

An inner join.

Analogous to SQL INNER JOIN, or pandas.merge(…, how=‘inner’).

Parameters
  • right (Tafra) – The right-side Tafra to join.

  • on (Iterable[Tuple[str, str, str]]) –

    The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:

    ‘==’ : equal to
    ‘!=’ : not equal to
    ‘<’ : less than
    ‘<=’ : less than or equal to
    ‘>’ : greater than
    ‘>=’ : greater than or equal to

  • select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If the column exists in both Tafra, prefers the left over the right.

apply(left_t: Tafra, right_t: Tafra) Tafra[source]

Apply the InnerJoin to the Tafra.

Parameters
  • left_t (Tafra) – The left tafra to join.

  • right_t (Tafra) – The right tafra to join.

Returns

tafra – The joined Tafra.

Return type

Tafra

class tafra.group.LeftJoin(on: Iterable[Tuple[str, str, str]], select: Iterable[str])[source]

A left join.

Analogous to SQL LEFT JOIN, or pandas.merge(…, how=‘left’).

Parameters
  • right (Tafra) – The right-side Tafra to join.

  • on (Iterable[Tuple[str, str, str]]) –

    The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:

    ‘==’ : equal to
    ‘!=’ : not equal to
    ‘<’ : less than
    ‘<=’ : less than or equal to
    ‘>’ : greater than
    ‘>=’ : greater than or equal to

  • select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If the column exists in both Tafra, prefers the left over the right.

apply(left_t: Tafra, right_t: Tafra) Tafra[source]

Apply the LeftJoin to the Tafra.

Parameters
  • left_t (Tafra) – The left tafra to join.

  • right_t (Tafra) – The right tafra to join.

Returns

tafra – The joined Tafra.

Return type

Tafra

class tafra.group.CrossJoin(on: Iterable[Tuple[str, str, str]], select: Iterable[str])[source]

A cross join.

Analogous to SQL CROSS JOIN, or pandas.merge(…, how=‘outer’) using temporary columns of static value to intersect all rows.

Parameters
  • right (Tafra) – The right-side Tafra to join.

  • select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If the column exists in both Tafra, prefers the left over the right.

apply(left_t: Tafra, right_t: Tafra) Tafra[source]

Apply the CrossJoin to the Tafra.

Parameters
  • left_t (Tafra) – The left tafra to join.

  • right_t (Tafra) – The right tafra to join.

Returns

tafra – The joined Tafra.

Return type

Tafra

Object Formatter

class tafra.formatter.ObjectFormatter[source]

A dictionary that contains mappings for formatting objects. Some numpy objects should be cast to other types, e.g. the decimal.Decimal type cannot operate with np.float. These mappings are defined in this class.

Each mapping must define a function that takes a np.ndarray and returns a np.ndarray.

The key for each mapping is the name of the type of the actual value, looked up from the first element of the np.ndarray, i.e. type(array[0]).__name__.
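The lookup rule described above, keying on type(array[0]).__name__, can be sketched with a small dict subclass. This is an illustration under assumptions: FormatterRegistry and its parse method are hypothetical names, columns are plain lists rather than np.ndarray, and a column with no registered formatter is returned unchanged.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, List


class FormatterRegistry(Dict[str, Callable[[List[Any]], List[Any]]]):
    """Map a type name to a function that converts a whole column."""

    def parse(self, values: List[Any]) -> List[Any]:
        if not values:
            return values
        # The formatter is chosen from the type of the first element only.
        formatter = self.get(type(values[0]).__name__)
        return formatter(values) if formatter else values


formatters = FormatterRegistry()
formatters["Decimal"] = lambda values: [float(value) for value in values]

prices = formatters.parse([Decimal("1.5"), Decimal("2.25")])
labels = formatters.parse(["a", "b"])

print(prices)  # [1.5, 2.25]
print(labels)  # ['a', 'b']
```

Keying on the first element keeps the lookup cheap but assumes the column is homogeneous. A column that mixes Decimal with other types would be sent to a single formatter chosen from its first value.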

__getitem__(dtype: str) Callable[[ndarray], ndarray][source]

Get the dtype formatter.

__setitem__(dtype: str, value: Callable[[ndarray], ndarray]) None[source]

Set the dtype formatter.

__delitem__(dtype: str) None[source]

Delete the dtype formatter.