meteo_qc

class meteo_qc.ColumnMapping

Class for assigning the columns of a dataframe to groups. This can be done by calling:

from meteo_qc import ColumnMapping

column_mapping = ColumnMapping()
column_mapping['temperature_2m'].add_group('temperature')

This adds the column temperature_2m to the group temperature, so that all checks registered for this group are run on it.

classmethod autodetect_from_df(df)

Autodetect the groups from the column names.

import meteo_qc
import pandas as pd

df = pd.DataFrame(
    data=[[10], [20]],
    index=pd.date_range(
        start='2022-01-01 10:00',
        end='2022-01-01 10:10',
        freq='10min',
    ),
    columns=['air_temperature_2m'],
)

# autodetect_from_df is a classmethod, so no instance is needed to call it
column_mapping = meteo_qc.ColumnMapping.autodetect_from_df(df)
print(column_mapping)

This will result in the air_temperature_2m column being registered with the temperature group, in addition to the generic group:

ColumnMapping({'air_temperature_2m': GroupList(['generic', 'temperature'])})

Parameters:

df (pandas.core.frame.DataFrame) – The pandas.DataFrame whose column names are used to infer the groups

Returns:

An instance of meteo_qc.ColumnMapping() in which every column is registered whose group could be inferred from its name.

Return type:

meteo_qc.ColumnMapping()

class meteo_qc.FinalResult

Final Result dictionary of the quality control.

Parameters:
  • columns – the columns that were quality controlled, each mapping to a dictionary of results, which in turn maps each check function to its Result.

  • passed – whether the entire quality control passed (all checks)

  • data_start_date – timestamp in milliseconds of the start date of the provided input data

  • data_end_date – timestamp in milliseconds of the end date of the provided input data.

columns: dict[str, meteo_qc._main.ColumnResult]
data_end_date: int
data_start_date: int
passed: bool

class meteo_qc.Result(function: str, passed: bool, msg: str | None = None, data: list[list[float]] | None = None)

A NamedTuple storing the Results of one quality check.

Parameters:
  • function (str) – the name of the function that applied the check

  • passed (bool) – did the check pass?

  • msg (str | None) – message returned from the check e.g. a specific error/problem

  • data (list[list[float]] | None) – the data that did not pass the check

data: typing.Optional[list[list[float]]]

Alias for field number 3

function: str

Alias for field number 0

msg: typing.Optional[str]

Alias for field number 2

passed: bool

Alias for field number 1
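
Because Result is a NamedTuple, its fields can be read by name or by position. The following is a minimal sketch using a stand-in class with the same fields as the signature above, so that it runs without meteo_qc installed. The name ResultSketch is hypothetical and not part of the library.

```python
from typing import List, NamedTuple, Optional


class ResultSketch(NamedTuple):
    # Hypothetical stand-in mirroring the fields of meteo_qc.Result.
    function: str
    passed: bool
    msg: Optional[str] = None
    data: Optional[List[List[float]]] = None


res = ResultSketch(function='range_check', passed=False, msg='out of range')

# Fields are available by name and by position (field numbers 0 to 3).
print(res.function, res[0])
print(res.passed, res[1])

# Tuple unpacking follows the same field order.
function, passed, msg, data = res
```

The defaults for msg and data mean a passing check can be built from just the function name and passed=True.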

meteo_qc.apply_qc(df, column_mapping)

Apply the quality control to a pandas.DataFrame.

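Parameters:
  • df – the pandas.DataFrame to apply the quality control to

  • column_mapping – a meteo_qc.ColumnMapping() assigning the columns of df to groups

Return type: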

meteo_qc._main.FinalResult

Returns:

A result as a JSON-serializable dictionary to be rendered in an HTML template.

{
    "columns": {
        "temp": {
            "passed": False,
            "results": {
                "missing_timestamps": Result(
                    function="missing_timestamps",
                    passed=False,
                    msg="missing 1 timestamps (assumed frequency: 10min)",
                    data=None,
                ),
                "null_values": Result(
                    function="null_values",
                    passed=False,
                    msg="found 7 values that are null",
                    data=[
                        [1641034800000, None, True],
                        [1641038400000, None, True],
                        [1641042000000, None, True],
                        [1641045600000, None, True],
                        [1641049200000, None, True],
                        [1641052800000, None, True],
                        [1641056400000, None, True],
                    ],
                ),
                "persistence_check": Result(
                    function="persistence_check", passed=True, msg=None, data=None
                ),
                "range_check": Result(
                    function="range_check", passed=True, msg=None, data=None
                ),
                "spike_dip_check": Result(
                    function="spike_dip_check", passed=True, msg=None, data=None
                ),
            },
        },
        ...
    },
    "data_end_date": 1641056400000,
    "data_start_date": 1641031200000,
    "passed": False,
}
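
A result of this shape can be walked with ordinary dictionary access. The sketch below rebuilds a small part of the example by hand so that it runs without meteo_qc. Res, failed_checks and ms_to_utc are hypothetical helpers, and reading the millisecond timestamps as UTC epoch values is an assumption.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical stand-in with the same fields as meteo_qc.Result.
Res = namedtuple('Res', ['function', 'passed', 'msg', 'data'])

final_result = {
    'columns': {
        'temp': {
            'passed': False,
            'results': {
                'null_values': Res(
                    'null_values', False, 'found 7 values that are null', None,
                ),
                'range_check': Res('range_check', True, None, None),
            },
        },
    },
    'data_start_date': 1641031200000,
    'data_end_date': 1641056400000,
    'passed': False,
}


def failed_checks(result):
    """Return (column, check name, message) for every check that did not pass."""
    return [
        (column, name, res.msg)
        for column, column_result in result['columns'].items()
        for name, res in column_result['results'].items()
        if not res.passed
    ]


def ms_to_utc(ms):
    """Convert a millisecond epoch timestamp to a UTC datetime (assumed UTC)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


print(failed_checks(final_result))
print(ms_to_utc(final_result['data_start_date']))
```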

meteo_qc.get_plugin_args()

Get the (default) arguments that were registered with the check functions. For example when using this:

import meteo_qc
import pandas as pd

@meteo_qc.register('custom_group', arg=123)
def custom_check(s: pd.Series, arg: int) -> meteo_qc.Result:
    ...

print(meteo_qc.get_plugin_args())

For custom_group the argument arg was registered with 123.

{
    ...
    'generic': {'missing_timestamps': {}, 'null_values': {}},
    'custom_group': {'custom_check': {'arg': 123}},
    ...
}

This can also be used to change the default values e.g. for the group temperature:

import meteo_qc

plugin_args = meteo_qc.get_plugin_args()
plugin_args['temperature']['range_check']['lower_bound'] = -60

Return type:

dict[str, dict[str, dict[str, typing.Any]]]

Returns:

A dictionary mapping each group to a dictionary of its registered check functions, each of which maps to a dictionary of the default arguments registered for that function.

meteo_qc.infer_freq(s)

Infer the frequency of a pd.DatetimeIndex() by copying the dates, shifting the copy by one timestamp, subtracting the two from each other, and taking the minimum difference.

Parameters:

s – a pd.Series() with a pd.DatetimeIndex().

Returns:

None if the series is too short (< 3), since the frequency cannot be inferred in that case. Otherwise a freqstr, e.g. 10min.

Return type:

str | None
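
The shift-and-subtract idea described above can be illustrated in plain Python. This is a sketch of the technique only, not the library's implementation: min_step is a hypothetical helper, it returns a timedelta rather than a freqstr, and it reads "too short" as fewer than three timestamps.

```python
from datetime import datetime, timedelta


def min_step(timestamps):
    """Smallest gap between consecutive timestamps, or None if fewer than 3."""
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    # Pair every timestamp with its successor (the "shifted" copy) and subtract.
    return min(later - earlier for earlier, later in zip(ordered, ordered[1:]))


start = datetime(2022, 1, 1, 10, 0)
# 10-minute data with the 10:20 timestamp missing
index = [start, start + timedelta(minutes=10), start + timedelta(minutes=30)]

print(min_step(index))
print(min_step(index[:2]))
```

Taking the minimum rather than the most common difference means a gap in the data (here the missing 10:20 value) does not inflate the inferred step.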

meteo_qc.persistence_check(s, window, excludes=[])

A check function checking if values in the pd.Series() s are persistent for a certain amount of time (“stuck values”).

This function can be used to write your own custom persistence checks.

Parameters:
  • s – the pd.Series() to be checked

  • window – a timedelta after which the values must have changed

  • excludes – values to exclude from the check, e.g. useful for radiation, which is 0 during the night, or precipitation, which is 0 when it does not rain

Returns:

a meteo_qc.Result() object containing the outcome of the applied check.

Return type:

Result
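
To make the idea of stuck values concrete, here is a minimal pure-Python sketch that looks for runs of one unchanged value lasting at least the window. It is not the library's implementation: stuck_runs is a hypothetical helper, and treating a run that lasts exactly the window as stuck is an assumption.

```python
from datetime import datetime, timedelta
from itertools import groupby


def stuck_runs(samples, window, excludes=()):
    """Return (start, end, value) for runs of one value lasting at least window.

    samples is a time-ordered list of (timestamp, value) pairs.
    """
    runs = []
    for value, group in groupby(samples, key=lambda pair: pair[1]):
        group = list(group)
        start, end = group[0][0], group[-1][0]
        if value not in excludes and end - start >= window:
            runs.append((start, end, value))
    return runs


t0 = datetime(2022, 1, 1, 10, 0)
values = [5.0, 5.0, 5.0, 5.0, 6.0, 0.0, 0.0, 0.0, 0.0]
samples = [(t0 + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]

# Without excludes both the 5.0 run and the 0.0 run are reported.
print(stuck_runs(samples, timedelta(minutes=30)))
# Excluding 0.0 keeps only the 5.0 run.
print(stuck_runs(samples, timedelta(minutes=30), excludes=(0.0,)))
```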

meteo_qc.range_check(s, lower_bound, upper_bound)

A check function checking if values in the pd.Series() s are within a range.

This function can be used to write your own custom range checks.

Parameters:
  • s – the pd.Series() to be checked

  • lower_bound – the lower bound of the allowed values (inclusive)

  • upper_bound – the upper bound of the allowed values (inclusive)

Returns:

a meteo_qc.Result() object containing the outcome of the applied check.

Return type:

Result
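
The inclusive bounds can be illustrated with a short pure-Python sketch. It is an illustration of the comparison only, not the library's implementation: out_of_range is a hypothetical helper, and skipping missing values (None) is an assumption.

```python
def out_of_range(values, lower_bound, upper_bound):
    """Return the values outside [lower_bound, upper_bound]; None is skipped."""
    return [
        v for v in values
        if v is not None and not (lower_bound <= v <= upper_bound)
    ]


temperatures = [-45.0, 10.0, 50.0, 61.2, None]

# 50.0 equals the upper bound and is therefore still allowed.
print(out_of_range(temperatures, lower_bound=-40, upper_bound=50))
```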

meteo_qc.register(group, **kwargs)

A decorator for registering a plugin function.

import meteo_qc
import pandas as pd


# the keyword names must match the parameter names of the decorated function
@meteo_qc.register('temperature', arg1=5, pi=3.141)
def custom_check(s: pd.Series, arg1: int, pi: float) -> meteo_qc.Result:
    ...

The function custom_check will now be called for every column that is registered with the group temperature and will be part of the meteo_qc.FinalResult() returned by meteo_qc.apply_qc().

The meteo_qc.register() decorators can also be stacked to register a function for multiple groups.

import meteo_qc
import pandas as pd

# each decorator registers the same function with its own default arguments
@meteo_qc.register('relhum', arg1=10, pi=3.141)
@meteo_qc.register('temperature', arg1=5, pi=3.141)
def custom_check(s: pd.Series, arg1: int, pi: float) -> meteo_qc.Result:
    ...

The function custom_check will now be called for every column that is registered with the group temperature or the group relhum, but with the corresponding arguments. The assigned arguments can be checked using meteo_qc.get_plugin_args().

Parameters:
  • group (str) – The group the function should be registered with. This can be an existing group or a new group.

  • kwargs (typing.Any) – The keyword arguments that are associated with the function that is decorated. For an example see above.

Return type:

typing.Callable[[typing.Callable[..., meteo_qc._data.Result]], typing.Callable[..., meteo_qc._data.Result]]
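
How stacked decorators can register one function under several groups is sketched below in plain Python. This is one possible design, not the library's implementation: REGISTRY and this local register function are hypothetical and only mimic the nested shape returned by meteo_qc.get_plugin_args().

```python
from collections import defaultdict
from typing import Any, Callable, Dict

# group -> function name -> default keyword arguments (hypothetical registry)
REGISTRY: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)


def register(group: str, **kwargs: Any) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record func under group together with its default keyword arguments."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[group][func.__name__] = kwargs
        # The function is returned unchanged, which is what allows stacking.
        return func
    return decorator


@register('relhum', arg1=10, pi=3.141)
@register('temperature', arg1=5, pi=3.141)
def custom_check(values, arg1, pi):
    return arg1 * pi


print(dict(REGISTRY))
```

Because each decorator returns the original function, the outer decorator sees the same function object as the inner one and can store a different set of arguments for its own group.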

meteo_qc.spike_dip_check(s, delta)

A check function checking if values in the pd.Series() s have sudden spikes or dips.

This function can be used to write your own custom spike dip checks.

Parameters:
  • s – the pd.Series() to be checked

  • delta – maximum allowed change per minute

Returns:

a meteo_qc.Result() object containing the outcome of the applied check.

Return type:

Result
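
A limit on the change per minute can be illustrated with a pure-Python sketch that compares each sample with its predecessor. It is not the library's implementation: spikes_and_dips is a hypothetical helper, and normalising the difference by the elapsed minutes between neighbouring samples is an assumption about how delta is applied.

```python
from datetime import datetime, timedelta


def spikes_and_dips(samples, delta):
    """Return (timestamp, value) pairs that changed by more than delta per minute.

    samples is a time-ordered list of (timestamp, value) pairs.
    """
    flagged = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(samples, samples[1:]):
        minutes = (t_cur - t_prev).total_seconds() / 60
        if minutes > 0 and abs(v_cur - v_prev) / minutes > delta:
            flagged.append((t_cur, v_cur))
    return flagged


t0 = datetime(2022, 1, 1, 10, 0)
values = [10.0, 10.5, 25.0, 10.8]
samples = [(t0 + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]

# Both the jump to 25.0 and the drop back to 10.8 exceed 0.3 per minute.
print(spikes_and_dips(samples, delta=0.3))
```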