API Reference¶

Summary¶

Aggregations¶

Union: Union two Tafra together. Analogous to SQL UNION or pandas.append.
GroupBy: Aggregation by a set of unique values.
Transform: Apply a function to each unique set of values and join to the original table.
IterateBy: A generator that yields a Tafra for each set of unique values.
InnerJoin: An inner join.
LeftJoin: A left join.
CrossJoin: A cross join.

Methods¶

from_dataframe: Construct a Tafra from a pandas.DataFrame.
from_series: Construct a Tafra from a pandas.Series.
from_records: Construct a Tafra from an Iterable of records.
read_sql: Execute a SQL SELECT statement using a pyodbc.Cursor.
read_sql_chunks: Execute a SQL SELECT statement using a pyodbc.Cursor, yielding chunks.
read_csv: Read a CSV file with a header row, infer the types of each column, and return a Tafra containing the file’s contents.
as_tafra: Return the unmodified Tafra if already a Tafra, else construct a Tafra from known types.
to_records: Return an Iterator of Tuple, each being a record (i.e. row).
to_list: Return a list of homogeneously typed columns (as numpy.ndarray).
to_tuple: Return a NamedTuple or Tuple.
to_array: Return an object array.
to_pandas: Construct a pandas.DataFrame.
to_csv: Write the Tafra to a CSV.
rows: The number of rows of the first item in data.
columns: The names of the columns.
data: The Tafra data.
dtypes: The Tafra dtypes.
head: Display the head of the Tafra.
keys: Return the keys of data, i.e. like dict.keys().
values: Return the values of data, i.e. like dict.values().
items: Return the items of data, i.e. like dict.items().
get: Return a value via the get() function of data, i.e. like dict.get().
iterrows: Yield rows as Tafra.
itertuples: Yield rows as NamedTuple.
itercols: Yield columns as Tuple[str, np.ndarray].
row_map: Map a function over rows.
tuple_map: Map a function over rows, faster than row_map.
col_map: Map a function over columns.
key_map: Map a function over columns like col_map, but return a Tuple of the key with the function result.
select: Use column names to slice the Tafra columns.
copy: Create a copy of a Tafra.
update: Update the data and dtypes of this Tafra with another Tafra.
update_inplace: In-place version of update.
update_dtypes: Apply new dtypes.
update_dtypes_inplace: In-place version of update_dtypes.
parse_object_dtypes: Parse the object dtypes using the ObjectFormatter instance.
parse_object_dtypes_inplace: In-place version of parse_object_dtypes.
rename: Rename columns in the Tafra from a dict.
rename_inplace: In-place version of rename.
coalesce: Fill None values from fills.
coalesce_inplace: In-place version of coalesce.
_coalesce_dtypes: Update dtypes with missing keys that exist in data.
pprint: Pretty print.
pformat: Format for pretty printing.
to_html: Construct an HTML table representation of the Tafra.
_slice: Use a slice to slice the Tafra.
_iindex: Use an int to slice the Tafra.
_aindex: Use numpy advanced indexing to slice the Tafra.

Helper Methods¶

union: Helper function to implement tafra.group.Union.apply().
union_inplace: In-place version of union.
group_by: Helper function to implement tafra.group.GroupBy.apply().
transform: Helper function to implement tafra.group.Transform.apply().
iterate_by: Helper function to implement tafra.group.IterateBy.apply().
inner_join: Helper function to implement tafra.group.InnerJoin.apply().
left_join: Helper function to implement tafra.group.LeftJoin.apply().
cross_join: Helper function to implement tafra.group.CrossJoin.apply().

Object Formatter¶

ObjectFormatter: A dictionary that contains mappings for formatting objects.
Detailed Reference¶
Tafra¶
Methods¶
-
class tafra.base.Tafra(data: dataclasses.InitVar, dtypes: dataclasses.InitVar, validate: bool = True, check_rows: bool = True)[source]¶

A minimalist dataframe.

Constructs a Tafra from a dict of data and (optionally) dtypes. Types on parameters are the types of the constructed Tafra, but attempts are made to parse anything that “looks” like the correct data structure, including Iterable, Iterator, Sequence, and Mapping, and various combinations.

Parameters are given as an InitVar, defined as:

InitVar = Union[Tuple[str, Any], _Mapping, Sequence[_Element], Iterable[_Element],
                Iterator[_Element], enumerate]
_Mapping = Union[Mapping[str, Any], Mapping[int, Any], Mapping[float, Any],
                 Mapping[bool, Any]]
_Element = Union[Tuple[Union[str, int, float, np.ndarray], Any], List[Any], Mapping]
- Parameters
data (InitVar) – The data of the Tafra.
dtypes (InitVar) – The dtypes of the columns.
validate (bool = True) – Run validation checks of the data. False will improve performance, but data and dtypes will not be validated for conformance to expected data structures.
check_rows (bool = True) – Run row count checks. False will allow columns of differing lengths, which may break several methods.
- Returns
tafra – The constructed Tafra.
- Return type
Tafra
-
classmethod
from_dataframe
(df: tafra.protocol.DataFrame, dtypes: Optional[Dict[str, Any]] = None, **kwargs: Any) → tafra.base.Tafra[source]¶

Construct a Tafra from a pandas.DataFrame. If dtypes are not given, take them from pandas.DataFrame.dtypes.
-
classmethod
from_series
(s: tafra.protocol.Series, dtype: Optional[str] = None, **kwargs: Any) → tafra.base.Tafra[source]¶

Construct a Tafra from a pandas.Series. If dtype is not given, take it from pandas.Series.dtype.
-
classmethod
from_records
(records: Iterable[Iterable[Any]], columns: Iterable[str], dtypes: Optional[Iterable[Any]] = None, **kwargs: Any) → tafra.base.Tafra[source]¶

Construct a Tafra from an Iterable of records, e.g. from a SQL query. The records should be a nested Iterable, but the method can also be fed the result of a cursor method such as cur.fetchmany() or cur.fetchall().
-
classmethod
read_sql
(query: str, cur: tafra.protocol.Cursor) → tafra.base.Tafra[source]¶

Execute a SQL SELECT statement using a pyodbc.Cursor and return the result as a Tafra.
-
classmethod
read_sql_chunks
(query: str, cur: tafra.protocol.Cursor, chunksize: int = 100) → Iterator[tafra.base.Tafra][source]¶

Execute a SQL SELECT statement using a pyodbc.Cursor and yield Tafra in chunks of chunksize rows.
-
classmethod
read_csv
(csv_file: Union[str, pathlib.Path, _io.TextIOWrapper, IO[str]], guess_rows: int = 5, missing: Optional[str] = '', dtypes: Optional[Dict[str, Any]] = None, **csvkw: Dict[str, Any]) → tafra.base.Tafra[source]¶ Read a CSV file with a header row, infer the types of each column, and return a Tafra containing the file’s contents.
- Parameters
csv_file (Union[str, TextIOWrapper]) – The path to the CSV file, or an open file-like object.
guess_rows (int) – The number of rows to use when guessing column types.
dtypes (Optional[Dict[str, str]]) – dtypes by column name; by default, all dtypes will be inferred from the file contents.
**csvkw (Dict[str, Any]) – Additional keyword arguments passed to csv.reader.
- Returns
tafra – The constructed Tafra.
- Return type
Tafra
-
classmethod
as_tafra
(maybe_tafra: Union[Tafra, tafra.protocol.DataFrame, tafra.protocol.Series, Dict[str, Any], Any]) → Optional[tafra.base.Tafra][source]¶

Return the unmodified Tafra if already a Tafra, else construct a Tafra from known types or subtypes of DataFrame or dict. Structural subtypes of DataFrame or Series are also valid, as are classes that have cls.__name__ == 'DataFrame' or cls.__name__ == 'Series'.
-
to_records
(columns: Optional[Iterable[str]] = None, cast_null: bool = True) → Iterator[Tuple[Any, …]][source]¶

Return an Iterator of Tuple, each being a record (i.e. row) and allowing heterogeneous typing. Useful for e.g. sending records back to a database.

- Parameters
columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.
cast_null (bool) – Cast np.nan to None. Necessary for pyodbc.
- Returns
records
- Return type
Iterator[Tuple[Any, ...]]
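The contract can be sketched in plain Python (a hypothetical stand-in for illustration, not tafra's implementation): each row becomes one heterogeneous tuple, and np.nan is optionally cast to None for drivers such as pyodbc:

```python
import numpy as np

def to_records_sketch(data, cast_null=True):
    # One tuple per row, zipping the columns together.
    for row in zip(*data.values()):
        if cast_null:
            # pyodbc wants None, not np.nan, for SQL NULL.
            row = tuple(None if isinstance(v, float) and np.isnan(v) else v
                        for v in row)
        yield row

records = list(to_records_sketch({'x': [1.0, np.nan], 'y': ['a', 'b']}))
# records == [(1.0, 'a'), (None, 'b')]
```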
-
to_list
(columns: Optional[Iterable[str]] = None, inner: bool = False) → Union[List[numpy.ndarray], List[List[Any]]][source]¶

Return a list of homogeneously typed columns (as numpy.ndarray). If a generator is needed, use to_records(). If inner == True, each column is cast from numpy.ndarray to a List.

- Parameters
columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.
inner (bool = False) – Cast all np.ndarray to List.
- Returns
list
- Return type
Union[List[np.ndarray], List[List[Any]]]
-
to_tuple
(columns: Optional[Iterable[str]] = None, name: Optional[str] = 'Tafra', inner: bool = False) → Union[Tuple[numpy.ndarray], Tuple[Tuple[Any, …]]][source]¶

Return a NamedTuple or Tuple. If a generator is needed, use to_records(). If inner == True, each column is cast from np.ndarray to a Tuple. If name is None, return a Tuple instead.

- Parameters
columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.
name (Optional[str] = 'Tafra') – The name for the NamedTuple. If None, construct a Tuple instead.
inner (bool = False) – Cast all np.ndarray to Tuple.
- Returns
tuple
- Return type
Union[Tuple[np.ndarray], Tuple[Tuple[Any, ...]]]
-
to_array
(columns: Optional[Iterable[str]] = None) → numpy.ndarray[source]¶ Return an object array.
- Parameters
columns (Optional[Iterable[str]] = None) – The columns to extract. If None, extract all columns.
- Returns
array
- Return type
np.ndarray
-
to_pandas
(columns: Optional[Iterable[str]] = None) → tafra.protocol.DataFrame[source]¶

Construct a pandas.DataFrame.

- Parameters
columns (Iterable[str]) – The columns to write. If None, write all columns.
- Returns
dataframe
- Return type
pandas.DataFrame
-
to_csv
(filename: Union[str, pathlib.Path, _io.TextIOWrapper, IO[str]], columns: Optional[Iterable[str]] = None) → None[source]¶

Write the Tafra to a CSV.

- Parameters
filename (Union[str, Path]) – The path of the file to write, or an open file-like object.
columns (Iterable[str]) – The columns to write. If None, write all columns.
-
rows
¶ The number of rows of the first item in data. The len() of all items has been previously validated.
- Returns
rows – The number of rows of the Tafra.
- Return type
int
-
columns
¶ The names of the columns. Equivalent to Tafra.keys().
- Returns
columns – The column names.
- Return type
Tuple[str, ...]
-
data
: dataclasses.InitVar¶ The Tafra data.
- Returns
data – The data.
- Return type
Dict[str, np.ndarray]
-
dtypes
: dataclasses.InitVar¶ The Tafra dtypes.
- Returns
dtypes – The dtypes.
- Return type
Dict[str, str]
-
head
(n: int = 5) → tafra.base.Tafra[source]¶

Display the head of the Tafra.

- Parameters
n (int = 5) – The number of rows to display.
- Returns
tafra – The first n rows of the Tafra.
- Return type
Tafra
-
keys
() → KeysView[str][source]¶ Return the keys of data, i.e. like dict.keys().
- Returns
data keys – The keys of the data property.
- Return type
KeysView[str]
-
values
() → ValuesView[numpy.ndarray][source]¶ Return the values of data, i.e. like dict.values().
- Returns
data values – The values of the data property.
- Return type
ValuesView[np.ndarray]
-
items
() → ItemsView[str, numpy.ndarray][source]¶ Return the items of data, i.e. like dict.items().
- Returns
items – The data items.
- Return type
ItemsView[str, np.ndarray]
-
get
(key: str, default: Any = None) → Any[source]¶ Return a value via the get() function of data, i.e. like dict.get().
- Parameters
key (str) – The key value in the data property.
default (Any) – The default to return if the key does not exist.
- Returns
value – The value for the key, or the default if the key does not exist.
- Return type
Any
-
iterrows
() → Iterator[tafra.base.Tafra][source]¶ Yield rows as Tafra. Use itertuples() for better performance.
-
itertuples
(name: Optional[str] = 'Tafra') → Iterator[Tuple[Any, …]][source]¶

Yield rows as NamedTuple, or if name is None, yield rows as tuple.

- Parameters
name (Optional[str] = 'Tafra') – The name for the NamedTuple. If None, construct a Tuple instead.
- Returns
tuples – An iterator of NamedTuple.
- Return type
Iterator[NamedTuple[Any, ...]]
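The yielded shape can be sketched with the standard library (a hypothetical helper for illustration, not tafra's implementation): rows become NamedTuple named 'Tafra' by default, or plain tuple when name is None:

```python
from collections import namedtuple

def itertuples_sketch(data, name='Tafra'):
    columns = list(data)
    if name is None:
        # Plain tuples when no name is given.
        yield from zip(*(data[c] for c in columns))
    else:
        Row = namedtuple(name, columns)
        for values in zip(*(data[c] for c in columns)):
            yield Row(*values)

rows = list(itertuples_sketch({'x': [1, 2], 'y': ['a', 'b']}))
# rows[0].x == 1, rows[1].y == 'b'
```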
-
itercols
() → Iterator[Tuple[str, numpy.ndarray]][source]¶

Yield columns as Tuple[str, np.ndarray], where the str is the column name.

- Returns
tuples – An iterator of Tuple[str, np.ndarray].
- Return type
Iterator[Tuple[str, np.ndarray]]
-
row_map
(fn: Callable[[…], Any], *args: Any, **kwargs: Any) → Iterator[Any][source]¶

Map a function over rows. To apply to specific columns, use select() first. The function must operate on Tafra.

- Parameters
fn (Callable[..., Any]) – The function to map.
*args (Any) – Additional positional arguments to fn.
**kwargs (Any) – Additional keyword arguments to fn.
- Returns
iter_tf – An iterator to map the function.
- Return type
Iterator[Any]
-
tuple_map
(fn: Callable[[…], Any], *args: Any, **kwargs: Any) → Iterator[Any][source]¶

Map a function over rows. This is faster than row_map(). To apply to specific columns, use select() first. The function must operate on the NamedTuple from itertuples().

- Parameters
fn (Callable[..., Any]) – The function to map.
name (Optional[str] = 'Tafra') – The name for the NamedTuple. If None, construct a Tuple instead. Must be given as a keyword argument.
*args (Any) – Additional positional arguments to fn.
**kwargs (Any) – Additional keyword arguments to fn.
- Returns
iter_tf – An iterator to map the function.
- Return type
Iterator[Any]
-
col_map
(fn: Callable[[…], Any], keys: bool = True, *args: Any, **kwargs: Any) → Iterator[Any][source]¶

Map a function over columns. To apply to specific columns, use select() first. The function must operate on Tuple[str, np.ndarray].

- Parameters
fn (Callable[..., Any]) – The function to map.
keys (bool = True) – Return a tuple of the column key with the function result.
*args (Any) – Additional positional arguments to fn.
**kwargs (Any) – Additional keyword arguments to fn.
- Returns
iter_tf – An iterator to map the function.
- Return type
Iterator[Any]
-
key_map
(fn: Callable[[…], Any], keys: bool = True, *args: Any, **kwargs: Any) → Iterator[Tuple[str, Any]][source]¶

Map a function over columns like col_map(), but return a Tuple of the key with the function result. To apply to specific columns, use select() first. The function must operate on Tuple[str, np.ndarray].

- Parameters
fn (Callable[..., Any]) – The function to map.
keys (bool = True) – Return a tuple of the column key with the function result.
*args (Any) – Additional positional arguments to fn.
**kwargs (Any) – Additional keyword arguments to fn.
- Returns
iter_tf – An iterator to map the function.
- Return type
Iterator[Tuple[str, Any]]
-
select
(columns: Iterable[str]) → tafra.base.Tafra[source]¶ Use column names to slice the Tafra columns, analogous to SQL SELECT. This does not copy the data. Call copy() to obtain a copy of the sliced data.
-
copy
(order: str = 'C') → tafra.base.Tafra[source]¶ Create a copy of a Tafra.
-
update
(other: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Update the data and dtypes of this Tafra with another Tafra. Lengths of rows must match, while data of a different dtype will overwrite.
-
update_inplace
(other: tafra.base.Tafra) → None[source]¶ In-place version of update(). Update the data and dtypes of this Tafra with another Tafra. Lengths of rows must match, while data of a different dtype will overwrite.
-
update_dtypes
(dtypes: Dict[str, Any]) → tafra.base.Tafra[source]¶ Apply new dtypes.
-
parse_object_dtypes
() → tafra.base.Tafra[source]¶ Parse the object dtypes using the ObjectFormatter instance.
-
parse_object_dtypes_inplace
() → None[source]¶ In-place version of parse_object_dtypes(). Parse the object dtypes using the ObjectFormatter instance.
-
rename
(renames: Dict[str, str]) → tafra.base.Tafra[source]¶ Rename columns in the Tafra from a dict.
-
rename_inplace
(renames: Dict[str, str]) → None[source]¶ In-place version of rename(). Rename columns in the Tafra from a dict.
-
coalesce
(column: str, fills: Iterable[Iterable[Union[None, str, int, float, bool, numpy.ndarray]]]) → numpy.ndarray[source]¶

Fill None values from fills. Analogous to SQL COALESCE or pandas.fillna().

- Parameters
column (str) – The column to coalesce.
fills (Iterable[Iterable[Union[None, str, int, float, bool, np.ndarray]]]) – The fill values, applied in order.
- Returns
data – The coalesced data.
- Return type
np.ndarray
-
coalesce_inplace
(column: str, fills: Iterable[Iterable[Union[None, str, int, float, bool, numpy.ndarray]]]) → None[source]¶

In-place version of coalesce(). Fill None values from fills. Analogous to SQL COALESCE or pandas.fillna().

- Parameters
column (str) – The column to coalesce.
fills (Iterable[Iterable[Union[None, str, int, float, bool, np.ndarray]]]) – The fill values, applied in order.
- Returns
None
- Return type
None
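The COALESCE semantics can be sketched in plain Python (a hypothetical helper for illustration, not tafra's implementation): fills are applied in order, each one replacing only the values that are still None:

```python
import numpy as np

def coalesce_sketch(column, fills):
    out = list(column)
    for fill in fills:
        # Keep existing values; take the fill only where still None.
        out = [f if v is None else v for v, f in zip(out, fill)]
    return np.array(out, dtype=object)

result = coalesce_sketch([1, None, None], [[None, 2, None], [10, 10, 10]])
# list(result) == [1, 2, 10]
```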
-
_coalesce_dtypes
() → None[source]¶ Update dtypes with missing keys that exist in data. Must be called if data is directly modified!
- Returns
None
- Return type
None
-
delete
(columns: Iterable[str]) → tafra.base.Tafra[source]¶
-
pprint
(indent: int = 1, width: int = 80, depth: Optional[int] = None, compact: bool = False) → None[source]¶ Pretty print. Parameters are passed to pprint.PrettyPrinter.
- Parameters
indent (int) – Number of spaces to indent for each level of nesting.
width (int) – Attempted maximum number of columns in the output.
depth (Optional[int]) – The maximum depth to print out nested structures.
compact (bool) – If true, several items will be combined in one line.
- Returns
None
- Return type
None
-
pformat
(indent: int = 1, width: int = 80, depth: Optional[int] = None, compact: bool = False) → str[source]¶ Format for pretty printing. Parameters are passed to pprint.PrettyPrinter.
- Parameters
indent (int) – Number of spaces to indent for each level of nesting.
width (int) – Attempted maximum number of columns in the output.
depth (Optional[int]) – The maximum depth to print out nested structures.
compact (bool) – If true, several items will be combined in one line.
- Returns
formatted string – A formatted string for pretty printing.
- Return type
str
-
to_html
(n: int = 20) → str[source]¶ Construct an HTML table representation of the Tafra data.
- Parameters
n (int = 20) – Number of items to print.
- Returns
HTML – The HTML table representation.
- Return type
str
-
_slice
(_slice: slice) → tafra.base.Tafra[source]¶ Use a slice to slice the Tafra.
-
_iindex
(index: int) → tafra.base.Tafra[source]¶ Use an int to slice the Tafra.
-
_aindex
(index: Sequence[Union[int, bool]]) → tafra.base.Tafra[source]¶ Use numpy advanced indexing to slice the Tafra.
Helper Methods¶
-
class tafra.base.Tafra[source]¶
-
union
(other: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Helper function to implement tafra.group.Union.apply(). Union two Tafra together. Analogous to SQL UNION or pandas.append. All column names and dtypes must match.
-
union_inplace
(other: tafra.base.Tafra) → None[source]¶

In-place version of union(). Helper function to implement tafra.group.Union.apply_inplace(). Union two Tafra together. Analogous to SQL UNION or pandas.append. All column names and dtypes must match.

- Parameters
other (Tafra) – The other Tafra to union.
- Returns
None
- Return type
None
-
group_by
(columns: Iterable[str], aggregation: Mapping[str, Union[Callable[[numpy.ndarray], Any], Tuple[Callable[[numpy.ndarray], Any], str]]] = {}, iter_fn: Mapping[str, Callable[[numpy.ndarray], Any]] = {}) → tafra.base.Tafra[source]¶

Helper function to implement tafra.group.GroupBy.apply(). Aggregation by a set of unique values. Analogous to SQL GROUP BY, not pandas.DataFrame.groupby().

- Parameters
columns (Iterable[str]) – The column names to group by.
aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.
iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping for new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.
- Returns
tafra – The aggregated Tafra.
- Return type
Tafra
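The SQL-style GROUP BY semantics can be sketched in plain numpy (a hypothetical stand-in for illustration, not tafra's implementation):

```python
import numpy as np

def group_by_sketch(data, columns, aggregation):
    # One key per row, built from the grouping columns.
    keys = list(zip(*(data[c] for c in columns)))
    out = {c: [] for c in columns}
    out.update({c: [] for c in aggregation})
    for unique in sorted(set(keys)):
        idx = [i for i, k in enumerate(keys) if k == unique]
        for c, v in zip(columns, unique):
            out[c].append(v)
        for c, fn in aggregation.items():
            # Aggregate the column over this group's row indices.
            out[c].append(fn(np.asarray(data[c])[idx]))
    return {c: np.array(v) for c, v in out.items()}

data = {'year': np.array([2019, 2019, 2020]), 'value': np.array([1.0, 2.0, 3.0])}
grouped = group_by_sketch(data, ['year'], {'value': np.sum})
# grouped['value'] == array([3., 3.])
```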
-
transform
(columns: Iterable[str], aggregation: Mapping[str, Union[Callable[[numpy.ndarray], Any], Tuple[Callable[[numpy.ndarray], Any], str]]] = {}, iter_fn: Dict[str, Callable[[numpy.ndarray], Any]] = {}) → tafra.base.Tafra[source]¶

Helper function to implement tafra.group.Transform.apply(). Apply a function to each unique set of values and join to the original table. Analogous to pandas.DataFrame.groupby().transform(), i.e. a SQL GROUP BY and LEFT JOIN back to the original table.

- Parameters
columns (Iterable[str]) – The column names to group by.
aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.
iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping for new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.
- Returns
tafra – The transformed Tafra.
- Return type
Tafra
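transform differs from group_by in that the aggregate is joined back, one value per original row. A pure-numpy sketch of these semantics (illustrative only, not tafra's implementation):

```python
import numpy as np

def transform_sketch(data, columns, aggregation):
    keys = list(zip(*(data[c] for c in columns)))
    out = {}
    for col, fn in aggregation.items():
        # Aggregate per group, then broadcast back to every row.
        agg = {}
        for unique in set(keys):
            idx = [i for i, k in enumerate(keys) if k == unique]
            agg[unique] = fn(np.asarray(data[col])[idx])
        out[col] = np.array([agg[k] for k in keys])
    return out

data = {'g': ['a', 'a', 'b'], 'v': [1.0, 3.0, 5.0]}
res = transform_sketch(data, ['g'], {'v': np.mean})
# res['v'] == array([2., 2., 5.])
```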
-
iterate_by
(columns: Iterable[str]) → Iterator[Tuple[Tuple[Any, …], numpy.ndarray, Tafra]][source]¶

Helper function to implement tafra.group.IterateBy.apply(). A generator that yields a Tafra for each set of unique values. Analogous to pandas.DataFrame.groupby(), i.e. an Iterator of Tafra. Yields tuples of ((unique grouping values, …), row indices array, subset Tafra).

- Parameters
columns (Iterable[str]) – The column names to group by.
- Returns
tafras – An iterator over the grouped Tafra.
- Return type
Iterator[GroupDescription]
-
inner_join
(right: tafra.base.Tafra, on: Iterable[Tuple[str, str, str]], select: Iterable[str] = []) → tafra.base.Tafra[source]¶

Helper function to implement tafra.group.InnerJoin.apply(). An inner join. Analogous to SQL INNER JOIN, or pandas.merge(…, how=’inner’).

- Parameters
right (Tafra) – The right-side Tafra to join.
on (Iterable[Tuple[str, str, str]]) – The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:
‘==’ : equal to
‘!=’ : not equal to
‘<’ : less than
‘<=’ : less than or equal to
‘>’ : greater than
‘>=’ : greater than or equal to
select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If a column exists in both Tafra, prefers the left over the right.
- Returns
tafra – The joined Tafra.
- Return type
Tafra
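The join semantics, including the operator triples, can be sketched with a nested loop (illustrative only; tafra's actual implementation may differ):

```python
import operator

OPS = {'==': operator.eq, '!=': operator.ne, '<': operator.lt,
       '<=': operator.le, '>': operator.gt, '>=': operator.ge}

def inner_join_sketch(left, right, on):
    n_left = len(next(iter(left.values())))
    n_right = len(next(iter(right.values())))
    # Keep every (left row, right row) pair where all conditions hold.
    pairs = [(i, j) for i in range(n_left) for j in range(n_right)
             if all(OPS[op](left[lc][i], right[rc][j]) for lc, rc, op in on)]
    out = {c: [left[c][i] for i, _ in pairs] for c in left}
    # Prefer the left column when a name exists in both tables.
    out.update({c: [right[c][j] for _, j in pairs]
                for c in right if c not in left})
    return out

left = {'id': [1, 2, 3], 'x': ['a', 'b', 'c']}
right = {'id': [2, 3, 4], 'y': [20.0, 30.0, 40.0]}
joined = inner_join_sketch(left, right, [('id', 'id', '==')])
# joined['id'] == [2, 3]; joined['y'] == [20.0, 30.0]
```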
-
left_join
(right: tafra.base.Tafra, on: Iterable[Tuple[str, str, str]], select: Iterable[str] = []) → tafra.base.Tafra[source]¶

Helper function to implement tafra.group.LeftJoin.apply(). A left join. Analogous to SQL LEFT JOIN, or pandas.merge(…, how=’left’).

- Parameters
right (Tafra) – The right-side Tafra to join.
on (Iterable[Tuple[str, str, str]]) – The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:
‘==’ : equal to
‘!=’ : not equal to
‘<’ : less than
‘<=’ : less than or equal to
‘>’ : greater than
‘>=’ : greater than or equal to
select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If a column exists in both Tafra, prefers the left over the right.
- Returns
tafra – The joined Tafra.
- Return type
Tafra
-
cross_join
(right: tafra.base.Tafra, select: Iterable[str] = []) → tafra.base.Tafra[source]¶

Helper function to implement tafra.group.CrossJoin.apply(). A cross join. Analogous to SQL CROSS JOIN, or pandas.merge(…, how=’outer’) using temporary columns of static value to intersect all rows.
-
Aggregations¶
-
class tafra.group.Union[source]¶

Union two Tafra together. Analogous to SQL UNION or pandas.append. All column names and dtypes must match.
-
apply
(left: tafra.base.Tafra, right: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Apply the Union to the Tafra.
-
apply_inplace
(left: tafra.base.Tafra, right: tafra.base.Tafra) → None[source]¶ In-place version. Apply the Union to the Tafra.
-
-
class tafra.group.GroupBy(group_by_cols: Iterable[str], aggregation: dataclasses.InitVar, iter_fn: Mapping[str, Callable[[numpy.ndarray], Any]])[source]¶

Aggregation by a set of unique values. Analogous to SQL GROUP BY, not pandas.DataFrame.groupby().

- Parameters
group_by_cols (Iterable[str]) – The column names to group by.
aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.
iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping for new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.
-
apply
(tafra: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Apply the GroupBy to the Tafra.
-
class tafra.group.Transform(group_by_cols: Iterable[str], aggregation: dataclasses.InitVar, iter_fn: Mapping[str, Callable[[numpy.ndarray], Any]])[source]¶

Apply a function to each unique set of values and join to the original table. Analogous to pandas.DataFrame.groupby().transform(), i.e. a SQL GROUP BY and LEFT JOIN back to the original table.

- Parameters
group_by_cols (Iterable[str]) – The column names to group by.
aggregation (Mapping[str, Union[Callable[[np.ndarray], Any], Tuple[Callable[[np.ndarray], Any], str]]]) – Optional. A mapping for columns and aggregation functions. Should be given as {‘column’: fn} or {‘new_column’: (fn, ‘column’)}.
iter_fn (Mapping[str, Callable[[np.ndarray], Any]]) – Optional. A mapping for new column names to the function to apply to the enumeration. Should be given as {‘new_column’: fn}.
-
apply
(tafra: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Apply the Transform to the Tafra.
-
class tafra.group.IterateBy(group_by_cols: Iterable[str])[source]¶

A generator that yields a Tafra for each set of unique values. Analogous to pandas.DataFrame.groupby(), i.e. a Sequence of Tafra objects. Yields tuples of ((unique grouping values, …), row indices array, subset Tafra).
- Parameters
group_by_cols (Iterable[str]) – The column names to group by.
-
apply
(tafra: tafra.base.Tafra) → Iterator[Tuple[Tuple[Any, …], numpy.ndarray, tafra.base.Tafra]][source]¶

Apply the IterateBy to the Tafra.

- Parameters
tafra (Tafra) – The Tafra to apply the operation to.
- Returns
tafras – An iterator over the grouped Tafra.
- Return type
Iterator[GroupDescription]
-
class tafra.group.InnerJoin(on: Iterable[Tuple[str, str, str]], select: Iterable[str])[source]¶

An inner join. Analogous to SQL INNER JOIN, or pandas.merge(…, how=’inner’).

- Parameters
right (Tafra) – The right-side Tafra to join.
on (Iterable[Tuple[str, str, str]]) – The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:
‘==’ : equal to
‘!=’ : not equal to
‘<’ : less than
‘<=’ : less than or equal to
‘>’ : greater than
‘>=’ : greater than or equal to
select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If a column exists in both Tafra, prefers the left over the right.
-
apply
(left_t: tafra.base.Tafra, right_t: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Apply the InnerJoin to the Tafra.
-
class tafra.group.LeftJoin(on: Iterable[Tuple[str, str, str]], select: Iterable[str])[source]¶

A left join. Analogous to SQL LEFT JOIN, or pandas.merge(…, how=’left’).

- Parameters
right (Tafra) – The right-side Tafra to join.
on (Iterable[Tuple[str, str, str]]) – The columns and operator to join on. Should be given as (‘left column’, ‘right column’, ‘op’). Valid ops are:
‘==’ : equal to
‘!=’ : not equal to
‘<’ : less than
‘<=’ : less than or equal to
‘>’ : greater than
‘>=’ : greater than or equal to
select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If a column exists in both Tafra, prefers the left over the right.
-
apply
(left_t: tafra.base.Tafra, right_t: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Apply the LeftJoin to the Tafra.
-
class tafra.group.CrossJoin(on: Iterable[Tuple[str, str, str]], select: Iterable[str])[source]¶

A cross join. Analogous to SQL CROSS JOIN, or pandas.merge(…, how=’outer’) using temporary columns of static value to intersect all rows.

- Parameters
right (Tafra) – The right-side Tafra to join.
select (Iterable[str] = []) – The columns to return. If not given, all unique column names are returned. If a column exists in both Tafra, prefers the left over the right.
-
apply
(left_t: tafra.base.Tafra, right_t: tafra.base.Tafra) → tafra.base.Tafra[source]¶ Apply the CrossJoin to the Tafra.
Object Formatter¶
-
class tafra.formatter.ObjectFormatter[source]¶

A dictionary that contains mappings for formatting objects. Some numpy objects should be cast to other types, e.g. the decimal.Decimal type cannot operate with np.float. These mappings are defined in this class.

Each mapping must define a function that takes a np.ndarray and returns a np.ndarray.

The key for each mapping is the name of the type of the actual value, looked up from the first element of the np.ndarray, i.e. type(array[0]).__name__.
__getitem__
(dtype: str) → Callable[[numpy.ndarray], numpy.ndarray][source]¶ Get the dtype formatter.
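The contract can be sketched with a plain dict (illustrative only; ObjectFormatter itself is the dict-like class documented above): the key is the value-type name, and the value is an ndarray-to-ndarray converter:

```python
import numpy as np
from decimal import Decimal

# A plain-dict sketch of the ObjectFormatter contract.
formatter = {
    'Decimal': lambda array: array.astype(float),
}

array = np.array([Decimal('1.5'), Decimal('2.5')], dtype=object)
key = type(array[0]).__name__   # 'Decimal', as described above
converted = formatter[key](array)
# converted is now a float array instead of an object array
```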
-