Builder

class bigearthnet_gdf_builder.builder.Season

A simple season class.

bigearthnet_gdf_builder.builder.add_full_ben_s1_metadata(gdf)

This is a wrapper around many functions from this library. It requires an input GeoDataFrame in S1-BigEarthNet style.

See get_gdf_from_s1_patch_dir for details on creating a new GeoDataFrame. This function adds the following columns:

  • snow: bool - Whether or not the patch contains seasonal snow

  • cloud_or_shadow: bool - Whether or not the patch contains clouds/shadows

  • original_split: train|validation|test|None - The split to which the patch was originally assigned

  • new_labels: label|None - The 19-label nomenclature, or None if no target labels exist.

  • country: str - The name of the BigEarthNet country the patch belongs to.

  • season: str - The season in which the tile was acquired.

In short, the function will add all the available metadata.

Parameters

gdf (geopandas.geodataframe.GeoDataFrame) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.add_full_ben_s2_metadata(gdf)

This is a wrapper around many functions from this library. It requires an input GeoDataFrame in S2-BigEarthNet style.

See get_gdf_from_s2_patch_dir for details on creating a new GeoDataFrame. This function adds the following columns:

  • snow: bool - Whether or not the patch contains seasonal snow

  • cloud_or_shadow: bool - Whether or not the patch contains clouds/shadows

  • original_split: train|validation|test|None - The split to which the patch was originally assigned

  • new_labels: label|None - The 19-label nomenclature, or None if no target labels exist.

  • country: str - The name of the BigEarthNet country the patch belongs to.

  • season: str - The season in which the tile was acquired.

In short, the function will add all the available metadata.

Parameters

gdf (geopandas.geodataframe.GeoDataFrame) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.assign_to_ben_country(gdf, crs='epsg:3035')

Takes a GeoDataFrame as an input and appends a country column. The country column indicates the closest BEN country.

The function calculates the centroid of each input geometry in the crs projection. These centroids are then used to assign each entry to the closest BEN country. Using centroids makes the assignment of a border-crossing patch to a country more deterministic. For the small BEN patches (1200 m x 1200 m), the error of this approximation is negligible, and the centroid is a good heuristic for assigning the patch to the country with the largest overlap.

Parameters
  • gdf (geopandas.geodataframe.GeoDataFrame) –

  • crs (str) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.ben_s1_patch_to_gdf(patch_path)

Given the filepath to a BigEarthNet json _metadata_labels file, or to the containing patch folder, the function will return a single-row GeoDataFrame.

The filepath is necessary, as only the filename contains the patch name.

The datetime will be parsed on a best-effort basis and is guaranteed to be in the format ‘YYYY-MM-DDThh-mm-ss’ (different from BEN-S2!).

The coordinates that indicate the upper-left-x/y and lower-right-x/y will be converted into a shapely.Polygon.

The coordinate reference system (CRS) will be the one given in the json file. In other words, the data is not reprojected!

Parameters

patch_path (Union[pydantic.types.FilePath, pydantic.types.DirectoryPath]) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.ben_s1_patch_to_reprojected_gdf(patch_path, target_proj='epsg:3035')

Calls ben_s1_patch_to_gdf and simply reprojects the resulting GeoDataFrame afterwards to the given target_proj.

This is a tiny wrapper to ensure that the generated BEN GeoDataFrames can be concatenated and have a common coordinate reference system.

See ben_s1_patch_to_gdf for more details.

Parameters
  • patch_path (Union[pydantic.types.FilePath, pydantic.types.DirectoryPath]) –

  • target_proj (str) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.ben_s2_patch_to_gdf(patch_path)

Given the filepath to a BigEarthNet json _metadata_labels file, or to the containing patch folder, the function will return a single-row GeoDataFrame.

The filepath is necessary, as only the filename contains the patch name.

The datetime will be parsed on a best-effort basis and is guaranteed to be in the format ‘YYYY-MM-DD hh-mm-ss’ (different from BEN-S1!).

The coordinates that indicate the upper-left-x/y and lower-right-x/y will be converted into a shapely.Polygon.

The coordinate reference system (CRS) will be the one given in the json file. In other words, the data is not reprojected!

Parameters

patch_path (Union[pydantic.types.FilePath, pydantic.types.DirectoryPath]) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.ben_s2_patch_to_reprojected_gdf(patch_path, target_proj='epsg:3035')

Calls ben_s2_patch_to_gdf and simply reprojects the resulting GeoDataFrame afterwards to the given target_proj.

This is a tiny wrapper to ensure that the generated BEN GeoDataFrames can be concatenated and have a common coordinate reference system.

See ben_s2_patch_to_gdf for more details.

Parameters
  • patch_path (Union[pydantic.types.FilePath, pydantic.types.DirectoryPath]) –

  • target_proj (str) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.box_from_ul_lr_coords(ulx, uly, lrx, lry)

Build a box (Polygon) from upper left x/y and lower right x/y coordinates.

This specification is the default BigEarthNet style.
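
As a rough illustration of the upper-left/lower-right convention, the dependency-free sketch below lists the corner ring that such a box would have. The helper name box_ring_from_ul_lr is hypothetical and not part of this library, which returns a shapely Polygon rather than a list of tuples. The coordinate values are made up.

```python
from numbers import Real
from typing import List, Tuple


def box_ring_from_ul_lr(
    ulx: Real, uly: Real, lrx: Real, lry: Real
) -> List[Tuple[Real, Real]]:
    """Return the closed corner ring of the box spanned by two corners.

    Hypothetical helper for illustration only; the library function
    returns a shapely Polygon instead.
    """
    # Walk the corners: upper-left, upper-right, lower-right, lower-left,
    # then repeat the first corner to close the ring.
    return [(ulx, uly), (lrx, uly), (lrx, lry), (ulx, lry), (ulx, uly)]


# A 1200 x 1200 box with its upper-left corner at (0, 1200).
ring = box_ring_from_ul_lr(0, 1200, 1200, 0)
```

The x-range is given by ulx and lrx and the y-range by lry and uly, so the two corners fully determine the rectangle.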

Parameters
  • ulx (numbers.Real) –

  • uly (numbers.Real) –

  • lrx (numbers.Real) –

  • lry (numbers.Real) –

Return type

shapely.geometry.polygon.Polygon

bigearthnet_gdf_builder.builder.build_gdf_from_s1_patch_paths(paths, *, n_workers=8, progress=True, target_proj='epsg:3035')

Build a single geopandas.GeoDataFrame from the BEN-S1 json files. The code will run in parallel and use n_workers processes. By default a progress-bar will be shown.

From personal experience, the ideal number of workers is 8 in most cases. For laptops with fewer cores, 2 or 4 n_workers should be set. More than 8 usually leads to only minor improvements and with n_workers > 12 the performance usually degrades.
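
One way to follow this advice is to derive n_workers from the number of available cores and cap it at 8. The helper below is a hypothetical sketch and not part of this library. Using one worker per core is an assumption of the sketch; the text above only suggests 2 or 4 workers for laptops and 8 otherwise.

```python
import os
from typing import Optional


def suggest_n_workers(cpu_count: Optional[int] = None, cap: int = 8) -> int:
    """Pick a worker count: one per core, but never more than `cap`.

    Hypothetical helper for illustration only.
    """
    if cpu_count is None:
        # os.cpu_count() returns None if the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    # Always use at least one worker.
    return max(1, min(cpu_count, cap))


n_workers = suggest_n_workers()
```

The resulting value could then be passed as the n_workers keyword argument.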

The function returns a single GDF with all patches reprojected to target_proj, which is epsg:3035 by default.

If the directory contains no S1 patch folders, a ValueError is raised.

Parameters
  • paths (List[pathlib.Path]) –

  • n_workers (pydantic.types.PositiveInt) –

  • progress (bool) –

  • target_proj (str) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.build_gdf_from_s2_patch_paths(paths, *, n_workers=8, progress=True, target_proj='epsg:3035')

Build a single geopandas.GeoDataFrame from the BEN-S2 json files. The code will run in parallel and use n_workers processes. By default a progress-bar will be shown.

From personal experience, the ideal number of workers is 8 in most cases. For laptops with fewer cores, 2 or 4 n_workers should be set. More than 8 usually leads to only minor improvements and with n_workers > 12 the performance usually degrades.

The function returns a single GDF with all patches reprojected to target_proj, which is epsg:3035 by default.

If the directory contains no S2 patch folders, a ValueError is raised.

Parameters
  • paths (List[pathlib.Path]) –

  • n_workers (pydantic.types.PositiveInt) –

  • progress (bool) –

  • target_proj (str) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.build_raw_ben_s1_parquet(ben_path, output_path=Path('raw_ben_s1_gdf.parquet'), n_workers=8, target_proj='epsg:3035', verbose=True)

Create a fresh BigEarthNet-S1-style parquet file from all the image patches in the root ben_path folder. The output will be written to output_path.

The default output is raw_ben_s1_gdf.parquet in the current directory.

The other options are only for advanced use. Returns the resolved output path.

Parameters
  • ben_path (pathlib.Path) –

  • output_path (pathlib.Path) –

  • n_workers (int) –

  • target_proj (str) –

  • verbose (bool) –

Return type

pathlib.Path

bigearthnet_gdf_builder.builder.build_raw_ben_s2_parquet(ben_path, output_path=Path('raw_ben_s2_gdf.parquet'), n_workers=8, target_proj='epsg:3035', verbose=True)

Create a fresh BigEarthNet-S2-style parquet file from all the image patches in the root ben_path folder. The output will be written to output_path.

The default output is raw_ben_s2_gdf.parquet in the current directory.

The other options are only for advanced use. Returns the resolved output path.

Parameters
  • ben_path (pathlib.Path) –

  • output_path (pathlib.Path) –

  • n_workers (int) –

  • target_proj (str) –

  • verbose (bool) –

Return type

pathlib.Path

Generate the recommended S1-GeoDataFrame and save it as a parquet file.

It will call build_raw_ben_s1_parquet under the hood and remove patches that are not recommended for deep learning (DL). If add_metadata is set, the GeoDataFrame will be enriched with extra information, such as the country and season of the patch. See add_full_ben_s1_metadata for more information.

This tool will store all intermediate results in the default USER directory. This directory will be printed to allow accessing these intermediate results if necessary. The resulting GeoDataFrame will be copied to output_path.

The other keyword arguments should usually be left untouched.

Parameters
  • ben_path (pathlib.Path) –

  • add_metadata (bool) –

  • output_path (pathlib.Path) –

  • n_workers (int) –

  • target_proj (str) –

  • verbose (bool) –

Return type

pathlib.Path

Generate the recommended S2-GeoDataFrame and save it as a parquet file.

It will call build_raw_ben_s2_parquet under the hood and remove patches that are not recommended for deep learning (DL). If add_metadata is set, the GeoDataFrame will be enriched with extra information, such as the country and season of the patch. See add_full_ben_s2_metadata for more information.

This tool will store all intermediate results in a temporary directory. This temporary directory will be printed to allow accessing these intermediate results if necessary. The resulting GeoDataFrame will be copied to output_path.

The other keyword arguments should usually be left untouched.

Parameters
  • ben_path (pathlib.Path) –

  • add_metadata (bool) –

  • output_path (pathlib.Path) –

  • n_workers (int) –

  • target_proj (str) –

  • verbose (bool) –

Return type

pathlib.Path

bigearthnet_gdf_builder.builder.extend_ben_s1_parquet(ben_parquet_path, output_name='extended_ben_s1_gdf.parquet', verbose=True)

Extend an existing BigEarthNet-S1-style parquet file.

The output will be written next to ben_parquet_path with the file name output_name. The default name is extended_ben_s1_gdf.parquet.

This function relies heavily on the structure of the parquet file. It should only be used on parquet files that were built with this library! Use the functions of this package directly for more control.

Parameters
  • ben_parquet_path (pathlib.Path) –

  • output_name (str) –

  • verbose (bool) –

Return type

pathlib.Path

bigearthnet_gdf_builder.builder.extend_ben_s2_parquet(ben_parquet_path, output_name='extended_ben_s2_gdf.parquet', verbose=True)

Extend an existing BigEarthNet-S2-style parquet file.

The output will be written next to ben_parquet_path with the file name output_name. The default name is extended_ben_s2_gdf.parquet.

This function relies heavily on the structure of the parquet file. It should only be used on parquet files that were built with this library! Use the functions of this package directly for more control.

Parameters
  • ben_parquet_path (pathlib.Path) –

  • output_name (str) –

  • verbose (bool) –

Return type

pathlib.Path

bigearthnet_gdf_builder.builder.get_ben_countries_gdf()

Return a GeoDataFrame that includes the shapes of each country from the BigEarthNet dataset.

This is a subset of the naturalearthdata 10m-admin-0-countries dataset:

https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.get_gdf_from_s1_patch_dir(dir_path, *, n_workers=8, progress=True, target_proj='epsg:3035')

Searches through dir_path to assemble a BEN-S1-style GeoDataFrame. Will only consider correctly named directories. Wraps around get_s1_patch_directory and build_gdf_from_s1_patch_paths.

Raises an error if an empty GeoDataFrame would be produced.

Parameters
  • dir_path (pydantic.types.DirectoryPath) –

  • n_workers (pydantic.types.PositiveInt) –

  • progress (bool) –

  • target_proj (str) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.get_gdf_from_s2_patch_dir(dir_path, *, n_workers=8, progress=True, target_proj='epsg:3035')

Searches through dir_path to assemble a BEN-S2-style GeoDataFrame. Will only consider correctly named directories. Wraps around get_s2_patch_directory and build_gdf_from_s2_patch_paths.

Raises an error if an empty GeoDataFrame would be produced.

Parameters
  • dir_path (pydantic.types.DirectoryPath) –

  • n_workers (pydantic.types.PositiveInt) –

  • progress (bool) –

  • target_proj (str) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.remove_bad_ben_gdf_entries(gdf)

Ensures that the returned frame only contains patches that have labels in the 19-label version.

If the GeoDataFrame doesn’t include a column named new_labels, it will be created by converting the labels column. The patches that do not contain any new_labels are dropped.

There are 57 patches that have no target labels. Patches that are covered by seasonal snow or clouds/shadows are also removed if present.

The dataframe will be reindexed.

Note: This function applies to both S1 and S2 BigEarthNet dataframes!
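
The filtering rule described above can be pictured with plain dictionaries standing in for GeoDataFrame rows. This is only a sketch of the rule, not the library's implementation: the helper name keep_recommended and the patch names are hypothetical, the conversion from labels to new_labels is not shown, and treating a missing snow or cloud_or_shadow key as "not flagged" is an assumption.

```python
from typing import Any, Dict, List


def keep_recommended(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop rows without 19-label targets or flagged as snow/cloud/shadow.

    Hypothetical helper for illustration only.
    """
    kept = []
    for row in rows:
        # None (or an empty list) counts as "no target labels".
        has_labels = bool(row.get("new_labels"))
        # Assumption: a missing flag column means the patch is not flagged.
        flagged = row.get("snow", False) or row.get("cloud_or_shadow", False)
        if has_labels and not flagged:
            kept.append(row)
    return kept


patches = [
    {"name": "patch_a", "new_labels": ["Pastures"], "snow": False, "cloud_or_shadow": False},
    {"name": "patch_b", "new_labels": None, "snow": False, "cloud_or_shadow": False},
    {"name": "patch_c", "new_labels": ["Pastures"], "snow": True, "cloud_or_shadow": False},
    {"name": "patch_d", "new_labels": ["Pastures"], "cloud_or_shadow": True},
]
kept_names = [row["name"] for row in keep_recommended(patches)]
```

Only patch_a survives: patch_b has no target labels, patch_c is flagged as snow, and patch_d is flagged as cloud or shadow.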

Parameters

gdf (geopandas.geodataframe.GeoDataFrame) –

Return type

geopandas.geodataframe.GeoDataFrame

bigearthnet_gdf_builder.builder.remove_discouraged_parquet_entries(ben_parquet_path, output_name='cleaned_ben_gdf.parquet', verbose=True)

Remove entries of an existing BigEarthNet-style (S1 or S2) parquet file.

The output will be written next to ben_parquet_path with the file output_name. The default name is cleaned_ben_gdf.parquet.

This function only requires the input parquet file to have the name column and the original 43-class nomenclature called labels.
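
To make "written next to ben_parquet_path" concrete, the sketch below shows how such a sibling path can be derived with pathlib. The helper name sibling_output_path and the data directory are hypothetical, and whether the library builds the path exactly this way is an assumption.

```python
from pathlib import Path


def sibling_output_path(ben_parquet_path: Path, output_name: str) -> Path:
    """Return a path in the same directory as the input, named `output_name`.

    Hypothetical helper for illustration only.
    """
    # with_name swaps the final path component and keeps the parent directory.
    return ben_parquet_path.with_name(output_name)


out = sibling_output_path(
    Path("data/raw_ben_s2_gdf.parquet"), "cleaned_ben_gdf.parquet"
)
```

Here out points to cleaned_ben_gdf.parquet inside the same data directory as the input file.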

Parameters
  • ben_parquet_path (pathlib.Path) –

  • output_name (str) –

  • verbose (bool) –

Return type

pathlib.Path

bigearthnet_gdf_builder.builder.tfm_month_to_season(dates)

Uses a simple mathematical formula to transform dates into season strings based on their months.

The season is calculated as the meteorological season, assuming a location in the Northern Hemisphere.

Parameters

dates (pandas.core.series.Series) –

Return type

pandas.core.series.Series
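
One common formula for meteorological seasons in the Northern Hemisphere maps the month number m to the index (m mod 12) // 3, so that December to February give 0, March to May give 1, and so on. The sketch below applies it to plain dates. It is not necessarily the formula this library uses; the helper name month_to_season and the season strings are assumptions, and the library function operates on a pandas Series instead.

```python
from datetime import date

# Assumed season names; the library's Season class may use different values.
SEASONS = ("Winter", "Spring", "Summer", "Fall")


def month_to_season(month: int) -> str:
    """Map a month (1-12) to its Northern Hemisphere meteorological season.

    Hypothetical helper for illustration only.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Dec, Jan, Feb -> 0; Mar-May -> 1; Jun-Aug -> 2; Sep-Nov -> 3
    return SEASONS[(month % 12) // 3]


dates = [date(2017, 12, 15), date(2018, 4, 1), date(2018, 7, 20), date(2017, 10, 3)]
seasons = [month_to_season(d.month) for d in dates]
```

For the four dates above, seasons is ["Winter", "Spring", "Summer", "Fall"].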