meteo_qc
- class meteo_qc.ColumnMapping
Class for adding columns of a DataFrame to groups. This can be done by calling:
    from meteo_qc import ColumnMapping

    column_mapping = ColumnMapping()
    column_mapping['temperature_2m'].add_group('temperature')
which will add the column temperature_2m to the group temperature, so that all checks registered for this group are run on it.
- classmethod autodetect_from_df(df)
Autodetect the groups from the column names.
    import meteo_qc
    import pandas as pd

    df = pd.DataFrame(
        data=[[10], [20]],
        index=pd.date_range(
            start='2022-01-01 10:00',
            end='2022-01-01 10:10',
            freq='10min',
        ),
        columns=['air_temperature_2m'],
    )
    column_mapping = meteo_qc.ColumnMapping().autodetect_from_df(df)
    print(column_mapping)
This will result in the air_temperature_2m column being registered with the temperature group.
    ColumnMapping({'air_temperature_2m': GroupList(['generic', 'temperature'])})
- Parameters:
df (pandas.core.frame.DataFrame) – The pandas.DataFrame whose column names are used to infer the groups
- Returns:
An instance of meteo_qc.ColumnMapping() containing every column whose group could be inferred from its name.
- Return type:
meteo_qc.ColumnMapping
- class meteo_qc.FinalResult
Final result dictionary of the quality control.
- Parameters:
columns – the columns that were quality controlled, each mapping to a dictionary whose results entry is another dictionary mapping each check function to its Result
passed – whether the entire quality control passed (all checks)
data_start_date – timestamp in milliseconds of the start date of the provided input data
data_end_date – timestamp in milliseconds of the end date of the provided input data
- class meteo_qc.Result(function: str, passed: bool, msg: str | None = None, data: list[list[float]] | None = None)
A NamedTuple storing the results of one quality check.
- Parameters:
- data: typing.Optional[list[list[float]]]
Alias for field number 3
- msg: typing.Optional[str]
Alias for field number 2
- meteo_qc.apply_qc(df, column_mapping)
Apply the quality control to a pandas.DataFrame.
- Parameters:
df (pandas.core.frame.DataFrame) – The DataFrame the quality control should be applied to
column_mapping (meteo_qc._colum_mapping.ColumnMapping) – A column mapping (meteo_qc.ColumnMapping()) that assigns groups to columns. See meteo_qc.ColumnMapping() for more information on how to create and customize one.
- Return type:
meteo_qc.FinalResult
- Returns:
A result as a JSON-serializable dictionary to be rendered in an HTML template.
    {
        "columns": {
            "temp": {
                "passed": False,
                "results": {
                    "missing_timestamps": Result(
                        function="missing_timestamps",
                        passed=False,
                        msg="missing 1 timestamps (assumed frequency: 10min)",
                        data=None,
                    ),
                    "null_values": Result(
                        function="null_values",
                        passed=False,
                        msg="found 7 values that are null",
                        data=[
                            [1641034800000, None, True],
                            [1641038400000, None, True],
                            [1641042000000, None, True],
                            [1641045600000, None, True],
                            [1641049200000, None, True],
                            [1641052800000, None, True],
                            [1641056400000, None, True],
                        ],
                    ),
                    "persistence_check": Result(
                        function="persistence_check", passed=True, msg=None, data=None
                    ),
                    "range_check": Result(
                        function="range_check", passed=True, msg=None, data=None
                    ),
                    "spike_dip_check": Result(
                        function="spike_dip_check", passed=True, msg=None, data=None
                    ),
                },
            },
            ...
        },
        "data_end_date": 1641056400000,
        "data_start_date": 1641031200000,
        "passed": False,
    }
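A dictionary of this shape can be inspected with plain dictionary access. The following is a minimal sketch, not part of meteo_qc: the local Result class is a stand-in that only mirrors the documented fields, failed_checks and ms_to_datetime are hypothetical helper names, and the sample data is hand-written. It also assumes that the millisecond timestamps count from the Unix epoch in UTC, which the documentation does not state explicitly.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, NamedTuple


class Result(NamedTuple):
    """Local stand-in mirroring the documented fields of meteo_qc.Result."""
    function: str
    passed: bool
    msg: str | None = None
    data: list[list[float]] | None = None


def ms_to_datetime(ms: int) -> datetime:
    """Convert a millisecond timestamp to a datetime (assumption: UTC epoch)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def failed_checks(final_result: dict[str, Any]) -> dict[str, list[str]]:
    """Map each column name to the names of the checks that did not pass."""
    failed = {}
    for column, column_result in final_result['columns'].items():
        names = [
            name for name, res in column_result['results'].items()
            if not res.passed
        ]
        if names:
            failed[column] = names
    return failed


# hand-written sample shaped like the dictionary shown above
final_result = {
    'columns': {
        'temp': {
            'passed': False,
            'results': {
                'missing_timestamps': Result(
                    function='missing_timestamps',
                    passed=False,
                    msg='missing 1 timestamps (assumed frequency: 10min)',
                ),
                'null_values': Result(
                    function='null_values',
                    passed=False,
                    msg='found 7 values that are null',
                ),
                'range_check': Result(function='range_check', passed=True),
            },
        },
    },
    'data_start_date': 1641031200000,
    'data_end_date': 1641056400000,
    'passed': False,
}

print(failed_checks(final_result))
print(ms_to_datetime(final_result['data_start_date']))
```

Because each entry in results is a NamedTuple, attribute access such as res.passed and positional access such as res[1] are both possible.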
- meteo_qc.get_plugin_args()
Get the (default) arguments that were registered with the check functions. For example, when using this:
    import meteo_qc
    import pandas as pd

    @meteo_qc.register('custom_group', arg=123)
    def custom_check(s: pd.Series, arg: int) -> meteo_qc.Result:
        ...

    print(meteo_qc.get_plugin_args())
For custom_group, the argument arg was registered with the value 123.
    {
        ...
        'generic': {'missing_timestamps': {}, 'null_values': {}},
        'custom_group': {'custom_check': {'arg': 123}},
        ...
    }
This can also be used to change the default values, e.g. for the group temperature:
    import meteo_qc

    plugin_args = meteo_qc.get_plugin_args()
    plugin_args['temperature']['range_check']['lower_bound'] = -60
- meteo_qc.infer_freq(s)
Infer the frequency of a pd.DatetimeIndex() by copying the dates, shifting the copy by one timestamp, subtracting the two from each other and taking the minimum difference.
- Parameters:
s – a pd.Series() with a pd.DatetimeIndex().
- Returns:
None if the series is too short (< 3), since the frequency cannot be inferred. Otherwise a freqstr, e.g. 10min.
- Return type:
str | None
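The shift-and-subtract idea described above can be illustrated without pandas. This is a minimal sketch under stated assumptions, not the implementation of meteo_qc.infer_freq: infer_step is a hypothetical helper, it expects the timestamps to be sorted, and it returns a timedelta rather than a frequency string.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def infer_step(timestamps: list[datetime]) -> timedelta | None:
    """Return the smallest gap between consecutive (sorted) timestamps."""
    if len(timestamps) < 3:
        # too short to infer a frequency, mirroring the documented behaviour
        return None
    # pair every timestamp with its successor (the "shifted copy") ...
    shifted = timestamps[1:]
    diffs = [later - earlier for earlier, later in zip(timestamps, shifted)]
    # ... and take the minimum so that missing timestamps do not inflate it
    return min(diffs)


start = datetime(2022, 1, 1, 10, 0)
# 10:00, 10:10, 10:20, then a gap, 10:40
stamps = [start + timedelta(minutes=m) for m in (0, 10, 20, 40)]
print(infer_step(stamps))  # 0:10:00
```

Taking the minimum rather than the most common difference means a single missing timestamp (the 20-minute gap in the example) does not change the inferred step.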
- meteo_qc.persistence_check(s, window, excludes=[])
A check function checking if values in the pd.Series() s are persistent for a certain amount of time (“stuck values”). This function can be used to write your own custom persistence checks.
- Parameters:
s – the pd.Series() to be checked
window – a timedelta after which the values must have changed
excludes – values to exclude from the check, e.g. useful for radiation or precipitation parameters that are 0 during the night or 0 without precipitation
- Returns:
a meteo_qc.Result() object containing the outcome of the applied check.
- Return type:
meteo_qc.Result
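To make the roles of window and excludes concrete, here is a minimal stdlib sketch of a stuck-value detection. It is not the implementation of meteo_qc.persistence_check: stuck_runs is a hypothetical helper, the series is a plain list of (timestamp, value) pairs, and treating a run that lasts exactly as long as window as stuck is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def stuck_runs(
        series: list[tuple[datetime, float]],
        window: timedelta,
        excludes: tuple[float, ...] = (),
) -> list[tuple[datetime, datetime, float]]:
    """Return (start, end, value) for every run of identical values that
    lasts at least ``window`` and whose value is not excluded."""
    runs = []
    i = 0
    while i < len(series):
        start_ts, value = series[i]
        j = i
        # extend the run while the next value is unchanged
        while j + 1 < len(series) and series[j + 1][1] == value:
            j += 1
        end_ts = series[j][0]
        # assumption: a run of exactly ``window`` already counts as stuck
        if value not in excludes and end_ts - start_ts >= window:
            runs.append((start_ts, end_ts, value))
        i = j + 1
    return runs


start = datetime(2022, 1, 1, 10, 0)
values = [5.0, 5.1, 5.1, 5.1, 5.1, 5.3, 0.0, 0.0, 0.0, 0.0]
# one value every 10 minutes
series = [(start + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]

all_runs = stuck_runs(series, window=timedelta(minutes=30))
without_zero = stuck_runs(series, window=timedelta(minutes=30), excludes=(0.0,))
print(len(all_runs), len(without_zero))  # 2 1
```

With excludes=(0.0,), the run of zeros is ignored, which matches the use case of radiation at night or dry periods for precipitation.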
- meteo_qc.range_check(s, lower_bound, upper_bound)
A check function checking if values in the pd.Series() s are within a range. This function can be used to write your own custom range checks.
- Parameters:
s – the pd.Series() to be checked
lower_bound – the lower bound of the allowed values (inclusive)
upper_bound – the upper bound of the allowed values (inclusive)
- Returns:
a meteo_qc.Result() object containing the outcome of the applied check.
- Return type:
meteo_qc.Result
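Since both bounds are inclusive, values equal to lower_bound or upper_bound pass. A minimal stdlib sketch of that comparison follows; out_of_range is a hypothetical helper operating on a plain list and is not the implementation of meteo_qc.range_check.

```python
from __future__ import annotations


def out_of_range(
        values: list[float],
        lower_bound: float,
        upper_bound: float,
) -> list[tuple[int, float]]:
    """Return (position, value) for every value outside the closed
    interval [lower_bound, upper_bound]."""
    return [
        (i, v) for i, v in enumerate(values)
        # chained comparison: both bounds are inclusive
        if not lower_bound <= v <= upper_bound
    ]


temperatures = [-40.0, -12.5, 21.3, 50.0, 50.1]
offenders = out_of_range(temperatures, lower_bound=-40, upper_bound=50)
print(offenders)  # [(4, 50.1)]
```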
- meteo_qc.register(group, **kwargs)
A decorator for registering a plugin function.
    import meteo_qc
    import pandas as pd

    @meteo_qc.register('temperature', arg1=5, pi=3.141)
    def custom_check(s: pd.Series, arg1: int, pi: float) -> meteo_qc.Result:
        ...
The function custom_check will now be called for every column that is registered with the group temperature and will be part of the meteo_qc.FinalResult() returned by meteo_qc.apply_qc().
The meteo_qc.register() decorators can also be stacked to register a function for multiple groups.
    import meteo_qc
    import pandas as pd

    @meteo_qc.register('relhum', arg1=10, pi=3.141)
    @meteo_qc.register('temperature', arg1=5, pi=3.141)
    def custom_check(s: pd.Series, arg1: int, pi: float) -> meteo_qc.Result:
        ...
The function custom_check will now be called for every column that is registered with the group temperature or the group relhum, each time with the arguments registered for that group. The assigned arguments can be checked using meteo_qc.get_plugin_args().
- Parameters:
group (str) – The group the function should be registered with. This can be an existing group or a new group.
kwargs (typing.Any) – The keyword arguments that are associated with the function that is decorated. For an example see above.
- Return type:
typing.Callable[[typing.Callable[..., meteo_qc._data.Result]], typing.Callable[..., meteo_qc._data.Result]]
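The stacking behaviour follows from the general decorator-registry pattern: each decorator records the function together with its keyword arguments and hands the function back unchanged. The sketch below illustrates that pattern with the standard library only. It is an assumption about the general technique, not meteo_qc's actual implementation; REGISTRY, run_group, and below_upper_bound are hypothetical names, and the checks return a plain bool instead of a meteo_qc.Result.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

# group name -> function name -> (function, registered keyword arguments)
REGISTRY: dict[str, dict[str, tuple[Callable[..., bool], dict[str, Any]]]]
REGISTRY = defaultdict(dict)


def register(
        group: str,
        **kwargs: Any,
) -> Callable[[Callable[..., bool]], Callable[..., bool]]:
    """Register the decorated function for ``group`` with ``kwargs``."""
    def decorator(func: Callable[..., bool]) -> Callable[..., bool]:
        REGISTRY[group][func.__name__] = (func, kwargs)
        # return the function unchanged so that decorators can be stacked
        return func
    return decorator


@register('relhum', upper_bound=100)
@register('temperature', upper_bound=50)
def below_upper_bound(values: list[float], upper_bound: float) -> bool:
    return all(v <= upper_bound for v in values)


def run_group(group: str, values: list[float]) -> dict[str, bool]:
    """Call every function registered for ``group`` with its own arguments."""
    return {
        name: func(values, **kwargs)
        for name, (func, kwargs) in REGISTRY[group].items()
    }


print(run_group('temperature', [20.5, 61.0]))  # {'below_upper_bound': False}
print(run_group('relhum', [20.5, 61.0]))  # {'below_upper_bound': True}
```

The same function is stored once per group, each time with the keyword arguments of the decorator that registered it, which is why the two calls above give different outcomes for identical values.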
- meteo_qc.spike_dip_check(s, delta)
A check function checking if values in the pd.Series() s have sudden spikes or dips. This function can be used to write your own custom spike dip checks.
- Parameters:
s – the pd.Series() to be checked
delta – maximum allowed change per minute
- Returns:
a meteo_qc.Result() object containing the outcome of the applied check.
- Return type:
meteo_qc.Result
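Because delta is expressed per minute, the difference between two consecutive values has to be normalized by the time elapsed between them. The following is a minimal stdlib sketch of that normalization, not the implementation of meteo_qc.spike_dip_check: spikes_and_dips is a hypothetical helper, the series is a plain list of (timestamp, value) pairs, and flagging the later of the two values is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def spikes_and_dips(
        series: list[tuple[datetime, float]],
        delta: float,
) -> list[tuple[datetime, float]]:
    """Return the samples whose change relative to the previous sample
    exceeds ``delta`` per minute (in either direction)."""
    flagged = []
    for (prev_ts, prev_val), (ts, val) in zip(series, series[1:]):
        minutes = (ts - prev_ts).total_seconds() / 60
        if minutes <= 0:
            # duplicate or unsorted timestamps cannot be normalized
            continue
        change_per_minute = abs(val - prev_val) / minutes
        if change_per_minute > delta:
            flagged.append((ts, val))
    return flagged


start = datetime(2022, 1, 1, 10, 0)
values = [10.0, 10.5, 25.0, 10.8, 11.0]
# one value every 10 minutes
series = [(start + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]

flagged = spikes_and_dips(series, delta=0.3)
print([ts.strftime('%H:%M') for ts, _ in flagged])  # ['10:20', '10:30']
```

In this sketch both the jump to 25.0 and the return to 10.8 are flagged, since each step changes by more than 0.3 per minute.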