datasets package
Submodules
datasets.configured module
- datasets.configured._align_data(df_y: DataFrame, dfs_x: tuple) → tuple
- datasets.configured.load_dfs_test_maize() → tuple
- datasets.configured.load_dfs_test_maize_fr() → tuple
- datasets.configured.load_dfs_test_maize_us() → tuple
- datasets.configured.load_dfs_test_softwheat_nl() → tuple
datasets.dataset module
- class datasets.dataset.Dataset(data_target: DataFrame | None = None, data_inputs: list | None = None)
Bases: object
- static _empty_df_target() → DataFrame
Helper function that creates an empty but correctly formatted dataframe for yield statistics
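The "correctly formatted" empty frame is not specified here. The sketch below assumes a two-level integer index named `loc_id` and `year` and a single float column named `yield`. Those names and the function `empty_df_target` are illustrative assumptions, not the package's code.

```python
import pandas as pd


def empty_df_target() -> pd.DataFrame:
    # Two integer index levels; the names "loc_id"/"year" are assumed for illustration
    index = pd.MultiIndex.from_arrays(
        [pd.Index([], dtype="int64"), pd.Index([], dtype="int64")],
        names=["loc_id", "year"],
    )
    # Zero rows, one float column for the yield statistic (column name assumed)
    return pd.DataFrame(index=index, columns=["yield"], dtype="float64")


df_y = empty_df_target()
print(df_y.shape)  # (0, 1)
```

An empty frame with the right index and dtypes can be concatenated with real data without the index layout changing.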
- static _filter_df_on_index(df: DataFrame, keys: list, level: int)
Helper method that filters a dataframe based on whether the values at a specified index level occur in a given list of keys
- Parameters:
df – the dataframe that should be filtered
keys – the values on which it should filter
level – the index level in which samples should be filtered
- Returns:
a filtered dataframe
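A minimal sketch of this kind of index-level filter in pandas follows. It assumes that rows whose value at `level` appears in `keys` are kept. `filter_df_on_index` is a hypothetical stand-in for the private helper, and the example data is made up.

```python
import pandas as pd


def filter_df_on_index(df: pd.DataFrame, keys: list, level: int) -> pd.DataFrame:
    # True for every row whose value at index level `level` is one of `keys`
    mask = df.index.get_level_values(level).isin(keys)
    # Keep only the matching rows; the index structure is left unchanged
    return df[mask]


index = pd.MultiIndex.from_tuples(
    [(1, 2012), (1, 2013), (2, 2012), (2, 2014)], names=["loc_id", "year"]
)
df = pd.DataFrame({"yield": [7.1, 6.8, 8.0, 7.5]}, index=index)
print(filter_df_on_index(df, keys=[2012, 2014], level=1))  # rows for 2012 and 2014 only
```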
- _get_feature_data(loc_id: int, year: int) → dict
Helper function for obtaining the feature data corresponding to a given index
- Parameters:
loc_id – location index value
year – year index value
- Returns:
a dict containing all feature data corresponding to the specified index
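The lookup could work roughly as in the sketch below. It assumes each feature dataframe is indexed by `(loc_id, year)` and that feature names are unique across dataframes. Unlike the documented method, the hypothetical `get_feature_data` takes the feature dataframes as an explicit argument, and the column names are invented for the example.

```python
import pandas as pd


def get_feature_data(dfs_x: list, loc_id: int, year: int) -> dict:
    features = {}
    for df in dfs_x:
        # Select the single row for this (loc_id, year) key as a Series
        row = df.loc[(loc_id, year), :]
        # Merge its feature-name -> value pairs into the result
        features.update(row.to_dict())
    return features


idx = pd.MultiIndex.from_tuples([(1, 2012), (1, 2013)], names=["loc_id", "year"])
weather = pd.DataFrame({"tmax": [24.1, 25.3], "precip": [310.0, 280.5]}, index=idx)
soil = pd.DataFrame({"awc": [0.15, 0.15]}, index=idx)
print(get_feature_data([weather, soil], loc_id=1, year=2013))
```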
- static _split_df_on_index(df: DataFrame, split: tuple, level: int)
- static _validate_dfs(df_y: DataFrame, dfs_x: list) → bool
Helper function that checks whether the input dataframes are correctly formatted
- Parameters:
df_y – dataframe containing yield statistics
dfs_x – list of dataframes each containing feature data
- Returns:
a bool indicating whether all checks passed
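The specific checks are not listed here. The sketch below shows one plausible set under stated assumptions:

- The target frame has a `(loc_id, year)` index and exactly one column.
- Every feature frame's index starts with those two levels.
- No feature name appears in more than one feature frame.

`INDEX_NAMES` and `validate_dfs` are hypothetical names.

```python
import pandas as pd

INDEX_NAMES = ["loc_id", "year"]  # assumed index layout, for illustration only


def validate_dfs(df_y: pd.DataFrame, dfs_x: list) -> bool:
    # Target frame: (loc_id, year) index and exactly one target column
    if list(df_y.index.names) != INDEX_NAMES or df_y.shape[1] != 1:
        return False
    seen = set()
    for df in dfs_x:
        # Feature frames must be indexed first by the same levels
        if list(df.index.names[: len(INDEX_NAMES)]) != INDEX_NAMES:
            return False
        # Feature names must be unique across all feature frames
        if seen & set(df.columns):
            return False
        seen |= set(df.columns)
    return True
```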
- property feature_names: set
Obtain a set containing all feature names
- indices() → list
- property location_ids: set
Obtain a set containing all location ids occurring in the dataset
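As a sketch, both set-valued properties can be derived directly from the stored dataframes. The example assumes the target index has a level named `loc_id` and that feature names are the column names of the feature frames. `DatasetSketch` is a hypothetical stand-in, not the `Dataset` class itself.

```python
import pandas as pd


class DatasetSketch:
    def __init__(self, data_target: pd.DataFrame, data_inputs: list):
        self._df_y = data_target
        self._dfs_x = data_inputs

    @property
    def feature_names(self) -> set:
        # Union of the column names of all feature frames
        return {name for df in self._dfs_x for name in df.columns}

    @property
    def location_ids(self) -> set:
        # Distinct values of the (assumed) "loc_id" level of the target index
        return set(self._df_y.index.get_level_values("loc_id"))


idx = pd.MultiIndex.from_tuples([(1, 2012), (2, 2012), (2, 2013)], names=["loc_id", "year"])
ds = DatasetSketch(
    pd.DataFrame({"yield": [7.1, 8.0, 7.4]}, index=idx),
    [pd.DataFrame({"tmax": [24.0, 25.0, 23.5]}, index=idx),
     pd.DataFrame({"awc": [0.15, 0.12, 0.12]}, index=idx)],
)
print(ds.location_ids, ds.feature_names)
```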
- split_on_years(years_split: tuple) → tuple
Create two new datasets based on the provided split in years
- Parameters:
years_split – a tuple of two lists of years, e.g. ([2012, 2014], [2013, 2015])
- Returns:
a tuple of two datasets
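A year-based split can be sketched by filtering every dataframe on its year index level. The sketch assumes the level is named `year`. The real method returns two `Dataset` objects, but the hypothetical `split_on_years` below returns two `(df_y, dfs_x)` pairs to stay self-contained.

```python
import pandas as pd


def split_on_years(df_y: pd.DataFrame, dfs_x: list, years_split: tuple) -> tuple:
    def keep(df: pd.DataFrame, years: list) -> pd.DataFrame:
        # Rows whose "year" index value is in the requested years
        return df[df.index.get_level_values("year").isin(years)]

    years_a, years_b = years_split
    part_a = (keep(df_y, years_a), [keep(df, years_a) for df in dfs_x])
    part_b = (keep(df_y, years_b), [keep(df, years_b) for df in dfs_x])
    return part_a, part_b


idx = pd.MultiIndex.from_product([[1], [2012, 2013, 2014, 2015]], names=["loc_id", "year"])
df_y = pd.DataFrame({"yield": [7.0, 6.5, 7.2, 6.9]}, index=idx)
df_x = pd.DataFrame({"tmax": [24.0, 25.1, 23.8, 26.0]}, index=idx)
(train_y, train_x), (test_y, test_x) = split_on_years(df_y, [df_x], ([2012, 2014], [2013, 2015]))
```

Because both parts are filtered with the same year lists, targets and features stay aligned within each part.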
- targets() → array
Obtain a NumPy array of targets or labels
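Assuming the target dataframe holds a single column of yield values, the array can be taken straight from that column, as in the sketch below. The hypothetical `targets` function receives the dataframe explicitly instead of reading it from a `Dataset` instance.

```python
import numpy as np
import pandas as pd


def targets(df_y: pd.DataFrame) -> np.ndarray:
    # Assumes a single target column; returns its values in row order as a 1-D array
    return df_y.iloc[:, 0].to_numpy()


index = pd.MultiIndex.from_tuples([(1, 2012), (2, 2012)], names=["loc_id", "year"])
df_y = pd.DataFrame({"yield": [7.1, 8.0]}, index=index)
print(targets(df_y))
```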
- property years: set
Obtain a set containing all years occurring in the dataset