datasets package

Submodules

datasets.configured module

datasets.configured._align_data(df_y: DataFrame, dfs_x: tuple) tuple
datasets.configured.load_dfs_test_maize() tuple
datasets.configured.load_dfs_test_maize_fr() tuple
datasets.configured.load_dfs_test_maize_us() tuple
datasets.configured.load_dfs_test_softwheat_nl() tuple

datasets.dataset module

class datasets.dataset.Dataset(data_target: DataFrame | None = None, data_inputs: list | None = None)

Bases: object

static _empty_df_target() DataFrame

Helper function that creates an empty (but correctly formatted) dataframe for yield statistics

static _filter_df_on_index(df: DataFrame, keys: list, level: int)

Helper method for filtering a dataframe based on the occurrence of certain values in a specified index

Parameters:
  • df – the dataframe that should be filtered

  • keys – the values on which it should filter

  • level – the index level in which samples should be filtered

Returns:

a filtered dataframe
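
The filtering described above can be sketched with a boolean mask over one index level. This is a minimal illustration, not the package's implementation: the standalone function name, the (loc_id, year) index layout, and the sample values are assumptions made for the example.

```python
import pandas as pd


def filter_df_on_index(df: pd.DataFrame, keys: list, level: int) -> pd.DataFrame:
    """Keep only the rows whose index value at `level` occurs in `keys`."""
    mask = df.index.get_level_values(level).isin(keys)
    return df[mask]


# Hypothetical yield table indexed by (loc_id, year); values are made up.
index = pd.MultiIndex.from_tuples(
    [(1, 2012), (1, 2013), (2, 2012), (2, 2014)], names=["loc_id", "year"]
)
df = pd.DataFrame({"yield": [5.1, 5.4, 6.0, 6.3]}, index=index)

# Level 1 is the year level in this assumed layout.
filtered = filter_df_on_index(df, keys=[2012, 2014], level=1)
```

Passing level=0 instead would filter on the location ids under the same assumed layout.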

_get_feature_data(loc_id: int, year: int) dict

Helper function for obtaining feature data corresponding to some index

Parameters:
  • loc_id – location index value

  • year – year index value

Returns:

a dict containing all feature data corresponding to the specified index
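
One way such a lookup could work is to collect, from every feature dataframe, the row stored at a given (loc_id, year) index. The sketch below assumes each feature dataframe has a unique (loc_id, year) MultiIndex; the function signature, dataframe names, and column names are hypothetical and not taken from the package.

```python
import pandas as pd


def get_feature_data(dfs_x: list, loc_id: int, year: int) -> dict:
    """Merge the feature values found at (loc_id, year) across all frames."""
    data = {}
    for df in dfs_x:
        # Skip frames that have no entry for this location and year.
        if (loc_id, year) in df.index:
            data.update(df.loc[(loc_id, year), :].to_dict())
    return data


# Hypothetical feature tables sharing the same index layout.
index = pd.MultiIndex.from_tuples([(1, 2012), (1, 2013)], names=["loc_id", "year"])
df_soil = pd.DataFrame({"soil_ph": [6.5, 6.4]}, index=index)
df_weather = pd.DataFrame({"rain_mm": [410.0, 385.0]}, index=index)

features = get_feature_data([df_soil, df_weather], loc_id=1, year=2013)
```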

static _split_df_on_index(df: DataFrame, split: tuple, level: int)
static _validate_dfs(df_y: DataFrame, dfs_x: list) bool

Helper function that checks whether the input dataframes are correctly formatted

Parameters:
  • df_y – dataframe containing yield statistics

  • dfs_x – list of dataframes each containing feature data

Returns:

a bool indicating whether the test has passed
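
The reference does not state which checks are performed. As a rough sketch only, a format check might compare index level names, assuming the target dataframe is indexed by (loc_id, year) and every feature dataframe starts with those same two levels. Both the rule and the level names are assumptions.

```python
import pandas as pd


def validate_dfs(df_y: pd.DataFrame, dfs_x: list) -> bool:
    """Return True when all frames follow the assumed (loc_id, year) index layout."""
    expected = ["loc_id", "year"]  # assumed index level names
    if list(df_y.index.names) != expected:
        return False
    # Feature frames may carry extra levels after the first two.
    return all(list(df.index.names[:2]) == expected for df in dfs_x)


index = pd.MultiIndex.from_tuples([(1, 2012), (1, 2013)], names=["loc_id", "year"])
df_y = pd.DataFrame({"yield": [5.1, 5.4]}, index=index)
df_x = pd.DataFrame({"rain_mm": [410.0, 385.0]}, index=index)

ok = validate_dfs(df_y, [df_x])
# Resetting the index drops the level names, so the check fails.
not_ok = validate_dfs(df_y.reset_index(), [df_x])
```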

property feature_names: set

Obtain a set containing all feature names

indices() list
static load(name: str) Dataset
property location_ids: set

Obtain a set containing all location ids occurring in the dataset

split_on_years(years_split: tuple) tuple

Create two new datasets based on the provided split in years

Parameters:

years_split – tuple of two lists of years, e.g. ([2012, 2014], [2013, 2015])

Returns:

two datasets
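
To illustrate the shape of the years_split argument, the sketch below applies the same kind of split to a plain dataframe rather than a Dataset object. It assumes the year is stored in index level 1; the helper name and the data are hypothetical.

```python
import pandas as pd


def split_df_on_years(df: pd.DataFrame, years_split: tuple, level: int = 1) -> tuple:
    """Split `df` into two frames, one per list of years in `years_split`."""
    years_a, years_b = years_split
    level_values = df.index.get_level_values(level)
    return df[level_values.isin(years_a)], df[level_values.isin(years_b)]


# Hypothetical yield table for a single location over four years.
index = pd.MultiIndex.from_tuples(
    [(1, 2012), (1, 2013), (1, 2014), (1, 2015)], names=["loc_id", "year"]
)
df = pd.DataFrame({"yield": [5.1, 5.4, 5.0, 5.8]}, index=index)

df_a, df_b = split_df_on_years(df, ([2012, 2014], [2013, 2015]))
```

Years that appear in neither list are dropped from both results in this sketch.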

targets() array

Obtain a NumPy array of targets or labels

property years: set

Obtain a set containing all years occurring in the dataset

datasets.dataset_overview module

datasets.dataset_torch module

Module contents