meteo_qc
- class meteo_qc.ColumnMapping
Class for adding columns of a dataframe to groups. This can be done by calling:

    from meteo_qc import ColumnMapping

    column_mapping = ColumnMapping()
    column_mapping['temperature_2m'].add_group('temperature')

which will add the column temperature_2m to the group temperature and run all checks registered for this group.
- classmethod autodetect_from_df(df)
Autodetect the groups from the column names.

    import meteo_qc
    import pandas as pd

    df = pd.DataFrame(
        data=[[10], [20]],
        index=pd.date_range(
            start='2022-01-01 10:00',
            end='2022-01-01 10:10',
            freq='10min',
        ),
        columns=['air_temperature_2m'],
    )
    column_mapping = meteo_qc.ColumnMapping().autodetect_from_df(df)
    print(column_mapping)

This will result in the air_temperature_2m column being registered with the temperature group.

    ColumnMapping({'air_temperature_2m': GroupList(['generic', 'temperature'])})

- Parameters:
df (pandas.DataFrame) – The pandas.DataFrame whose column names are used to infer the groups
- Returns:
An instance of meteo_qc.ColumnMapping() with the columns registered that could be inferred from the column name.
- Return type:
meteo_qc.ColumnMapping
- class meteo_qc.FinalResult
Final result dictionary of the quality control.
- Parameters:
columns – the columns that were quality controlled, each mapping to a dictionary whose results entry is another dictionary mapping the check function to its Result.
passed – did the entire quality control pass (all checks)
data_start_date – timestamp in milliseconds of the start date of the provided input data
data_end_date – timestamp in milliseconds of the end date of the provided input data.
- class meteo_qc.Result(function: str, passed: bool, msg: str | None = None, data: list[list[float]] | None = None)
A NamedTuple storing the results of one quality check.
- Parameters:
- meteo_qc.apply_qc(df, column_mapping)
Apply the quality control to a pandas.DataFrame.
- Parameters:
df (pandas.DataFrame) – The DataFrame the quality control should be applied to
column_mapping (meteo_qc._colum_mapping.ColumnMapping) – A column mapping (meteo_qc.ColumnMapping()) that assigns groups to columns. See meteo_qc.ColumnMapping() for more information on how to create and customize one.
- Return type:
meteo_qc.FinalResult
- Returns:
A result as a JSON-serializable dictionary to be rendered in an HTML template.

    {
        "columns": {
            "temp": {
                "passed": False,
                "results": {
                    "missing_timestamps": Result(
                        function="missing_timestamps",
                        passed=False,
                        msg="missing 1 timestamps (assumed frequency: 10min)",
                        data=None,
                    ),
                    "null_values": Result(
                        function="null_values",
                        passed=False,
                        msg="found 7 values that are null",
                        data=[
                            [1641034800000, None, True],
                            [1641038400000, None, True],
                            [1641042000000, None, True],
                            [1641045600000, None, True],
                            [1641049200000, None, True],
                            [1641052800000, None, True],
                            [1641056400000, None, True],
                        ],
                    ),
                    "persistence_check": Result(
                        function="persistence_check", passed=True, msg=None, data=None
                    ),
                    "range_check": Result(
                        function="range_check", passed=True, msg=None, data=None
                    ),
                    "spike_dip_check": Result(
                        function="spike_dip_check", passed=True, msg=None, data=None
                    ),
                },
            },
            ...
        },
        "data_end_date": 1641056400000,
        "data_start_date": 1641031200000,
        "passed": False,
    }

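A result with this layout can be walked to list the checks that failed. The following is a minimal sketch and not part of meteo_qc: the Result class is a local stand-in that mirrors the documented signature, failed_checks is a hypothetical helper, and the nesting (columns, then column name, then results) is assumed from the FinalResult description.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class Result(NamedTuple):
    # local stand-in mirroring the documented meteo_qc.Result signature
    function: str
    passed: bool
    msg: str | None = None
    data: list[list[float]] | None = None


def failed_checks(
        final_result: dict[str, Any],
) -> list[tuple[str, str, str | None]]:
    """Return (column, check name, message) for every failed check."""
    failures = []
    for column, column_result in final_result['columns'].items():
        for name, result in column_result['results'].items():
            if not result.passed:
                failures.append((column, name, result.msg))
    return failures


# hand-written example data, shaped like the dictionary shown above
final_result = {
    'columns': {
        'temp': {
            'passed': False,
            'results': {
                'null_values': Result(
                    'null_values', False, 'found 7 values that are null',
                ),
                'range_check': Result('range_check', True),
            },
        },
    },
    'passed': False,
    'data_start_date': 1641031200000,
    'data_end_date': 1641056400000,
}

for column, name, msg in failed_checks(final_result):
    # prints: temp: null_values failed (found 7 values that are null)
    print(f'{column}: {name} failed ({msg})')
```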
- meteo_qc.get_plugin_args()
Get the (default) arguments that were registered with the check functions. For example, when using this:

    import meteo_qc
    import pandas as pd

    @meteo_qc.register('custom_group', arg=123)
    def custom_check(s: pd.Series, arg: int) -> meteo_qc.Result:
        ...

    print(meteo_qc.get_plugin_args())

For custom_group, the argument arg was registered with 123.

    {
        ...
        'generic': {'missing_timestamps': {}, 'null_values': {}},
        'custom_group': {'custom_check': {'arg': 123}},
        ...
    }

This can also be used to change the default values, e.g. for the group temperature:

    import meteo_qc

    plugin_args = meteo_qc.get_plugin_args()
    plugin_args['temperature']['range_check']['lower_bound'] = -60

- meteo_qc.infer_freq(s)
Infer the frequency of a pd.DatetimeIndex() by copying the dates, shifting them by one timestamp, subtracting the dates from each other, and taking the minimum.
- Parameters:
s – a pd.Series() with a pd.DatetimeIndex().
- Returns:
None if the series is too short (< 3), since the frequency cannot be inferred. Otherwise a freqstr, e.g. 10min.
- Return type:
str | None
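The shift-and-subtract idea described for this function can be sketched without pandas. This is an illustration under assumptions, not the implementation of meteo_qc.infer_freq(): infer_step is a hypothetical name, it works on a plain list of datetime objects, it sorts them first, and it returns a timedelta rather than a frequency string.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def infer_step(timestamps: list[datetime]) -> timedelta | None:
    """Smallest gap between consecutive timestamps, None if too short."""
    if len(timestamps) < 3:
        # too few points to infer a frequency
        return None
    ordered = sorted(timestamps)  # assumption: input may be unsorted
    # pair every timestamp with its successor (the "shifted" copy)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return min(gaps)


start = datetime(2022, 1, 1, 10, 0)
# 10-minute data where the 10:20 timestamp is missing
timestamps = [start + timedelta(minutes=m) for m in (0, 10, 30, 40)]
print(infer_step(timestamps))  # 0:10:00
```

Taking the minimum rather than the most common gap means a single missing timestamp does not distort the inferred step, as the example shows.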
- meteo_qc.persistence_check(s, window, excludes=[])
A check function checking if values in the pd.Series() s are persistent for a certain amount of time (“stuck values”). This function can be used to write your own custom persistence checks.
- Parameters:
s – the pd.Series() to be checked
window – a timedelta after which the values must have changed
excludes – values to exclude from the check, e.g. useful for radiation or precipitation parameters that are 0 during the night or 0 without precipitation
- Returns:
a meteo_qc.Result() object containing the outcome of the applied check.
- Return type:
meteo_qc.Result
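The idea behind a persistence check can be sketched in plain Python. This is not how meteo_qc.persistence_check() is implemented: stuck_runs is a hypothetical helper, the data is a list of (timestamp, value) pairs instead of a pd.Series(), and treating a run that lasts exactly as long as the window as "stuck" is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def stuck_runs(
        samples: list[tuple[datetime, float]],
        window: timedelta,
        excludes: tuple[float, ...] = (),
) -> list[tuple[datetime, datetime, float]]:
    """Return (start, end, value) for runs of one value lasting >= window."""
    runs = []
    run_start = 0
    for i in range(1, len(samples) + 1):
        # a run ends at the end of the data or when the value changes
        if i == len(samples) or samples[i][1] != samples[run_start][1]:
            start_ts, value = samples[run_start]
            end_ts = samples[i - 1][0]
            if value not in excludes and end_ts - start_ts >= window:
                runs.append((start_ts, end_ts, value))
            run_start = i
    return runs


start = datetime(2022, 1, 1)
values = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 6.0]
# one sample every 10 minutes
samples = [(start + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]
window = timedelta(minutes=30)

# excluding 0.0 (e.g. no precipitation) leaves only the run of 5.0
for run in stuck_runs(samples, window, excludes=(0.0,)):
    print(run)
```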
- meteo_qc.range_check(s, lower_bound, upper_bound)
A check function checking if values in the pd.Series() s are within a range. This function can be used to write your own custom range checks.
- Parameters:
s – the pd.Series() to be checked
lower_bound – the lower bound of the allowed values (inclusive)
upper_bound – the upper bound of the allowed values (inclusive)
- Returns:
a meteo_qc.Result() object containing the outcome of the applied check.
- Return type:
meteo_qc.Result
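The inclusive bounds can be illustrated with a small plain-Python sketch. It is not the implementation of meteo_qc.range_check(): out_of_range is a hypothetical helper working on a list of floats, and skipping NaN values (leaving them to a separate null check) is an assumption.

```python
from __future__ import annotations

import math


def out_of_range(
        values: list[float],
        lower_bound: float,
        upper_bound: float,
) -> list[tuple[int, float]]:
    """Return (position, value) for values outside [lower_bound, upper_bound]."""
    flagged = []
    for i, value in enumerate(values):
        if math.isnan(value):
            # assumption: missing values are handled by a separate check
            continue
        # both bounds are inclusive, so values equal to a bound pass
        if not lower_bound <= value <= upper_bound:
            flagged.append((i, value))
    return flagged


temps = [-5.0, 12.5, 61.0, -60.0, -60.1]
print(out_of_range(temps, -60, 60))  # [(2, 61.0), (4, -60.1)]
```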
- meteo_qc.register(group, **kwargs)
A decorator for registering a plugin function.

    import meteo_qc
    import pandas as pd

    @meteo_qc.register('temperature', arg1=5, pi=3.141)
    def custom_check(s: pd.Series, arg1: int, pi: float) -> meteo_qc.Result:
        ...

The function custom_check will now be called for every column that is registered with the group temperature and will be part of the meteo_qc.FinalResult() returned by meteo_qc.apply_qc().
The meteo_qc.register() decorators can also be stacked to register a function for multiple groups.

    import meteo_qc
    import pandas as pd

    @meteo_qc.register('relhum', arg1=10, pi=3.141)
    @meteo_qc.register('temperature', arg1=5, pi=3.141)
    def custom_check(s: pd.Series, arg1: int, pi: float) -> meteo_qc.Result:
        ...

The function custom_check will now be called for every column that is registered with the group temperature or the group relhum, but with the corresponding arguments. The assigned arguments can be checked using meteo_qc.get_plugin_args().
- Parameters:
group (str) – The group the function should be registered with. This can be an existing group or a new group.
kwargs (typing.Any) – The keyword arguments that are associated with the function that is decorated. For an example see above.
- Return type:
typing.Callable[[typing.Callable[...,meteo_qc._data.Result]],typing.Callable[...,meteo_qc._data.Result]]
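How a stackable registration decorator with per-group keyword arguments can work is sketched below. This is a generic illustration of the pattern and not meteo_qc's own code: REGISTRY, register, in_range and run_group are all hypothetical names defined here, and the check returns a plain bool instead of a Result.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

Check = Callable[..., bool]

# group name -> check name -> (function, keyword arguments)
REGISTRY: dict[str, dict[str, tuple[Check, dict[str, Any]]]] = defaultdict(dict)


def register(group: str, **kwargs: Any) -> Callable[[Check], Check]:
    def decorator(func: Check) -> Check:
        REGISTRY[group][func.__name__] = (func, kwargs)
        # return the function unchanged so that decorators can be stacked
        return func
    return decorator


@register('relhum', lower_bound=0, upper_bound=100)
@register('temperature', lower_bound=-60, upper_bound=60)
def in_range(values: list[float], lower_bound: float, upper_bound: float) -> bool:
    return all(lower_bound <= v <= upper_bound for v in values)


def run_group(group: str, values: list[float]) -> dict[str, bool]:
    """Call every check of a group with the arguments registered for it."""
    return {
        name: func(values, **kwargs)
        for name, (func, kwargs) in REGISTRY[group].items()
    }


print(run_group('temperature', [12.0, 70.0]))  # {'in_range': False}
print(run_group('relhum', [12.0, 70.0]))  # {'in_range': True}
```

Because each decorator stores its own keyword arguments under its own group, the same function gives different outcomes for the same data depending on the group it is run for.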
- meteo_qc.spike_dip_check(s, delta)
A check function checking if values in the pd.Series() s have sudden spikes or dips. This function can be used to write your own custom spike dip checks.
- Parameters:
s – the pd.Series() to be checked
delta – maximum allowed change per minute
- Returns:
a meteo_qc.Result() object containing the outcome of the applied check.
- Return type:
meteo_qc.Result
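A "maximum allowed change per minute" can be illustrated with a plain-Python sketch. It is not the implementation of meteo_qc.spike_dip_check(): spikes_and_dips is a hypothetical helper, the data is a list of (timestamp, value) pairs, and comparing each value only with its direct predecessor is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def spikes_and_dips(
        samples: list[tuple[datetime, float]],
        delta: float,
) -> list[tuple[datetime, float]]:
    """Return samples whose change per minute to the previous one exceeds delta."""
    flagged = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        if minutes <= 0:
            # skip duplicated or unordered timestamps
            continue
        if abs(v1 - v0) / minutes > delta:
            flagged.append((t1, v1))
    return flagged


start = datetime(2022, 1, 1)
values = [10.0, 10.5, 25.0, 10.8]
# one sample every 10 minutes
samples = [(start + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]

# with delta=0.3 per minute, at most 3.0 change is allowed per 10-minute step
for timestamp, value in spikes_and_dips(samples, delta=0.3):
    print(timestamp, value)
```

In this sketch both the jump up to 25.0 and the drop back to 10.8 are flagged, since each exceeds the allowed rate of change.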