Skip to content

Latest commit

 

History

History
134 lines (111 loc) · 4.77 KB

File metadata and controls

134 lines (111 loc) · 4.77 KB

Change Log

All notable changes to this project will be documented in this file.

2.2.0 - 2026-03

Runner

  • allowing for custom table name (has priority before class name)
  • added options to add filter on dependencies and target table based on column-value pairs
  • target table can now selectively write based on secondary virtual partitions

Common

  • table reader can also filter based on given column-values pairs

2.1.4 - 2026-02

Common

  • table reader optimization

2.1.3 - 2026-02

Runner

  • Separate writer from runner, sorting schema to align to written table
  • Added a debug run option that return dataframe without writing anything

2.1.2 - 2026-01

Maker

  • Feature type normalization to Double and Long

2.1.1 - 2026-01

  • Replaces 2.1.0 release

2.1.0 - 2025-10

General

  • Updated python version to 3.12 and pyspark to 4.0
  • Migrated from poetry to UV

Runner

  • Added merge_schema manual override option

2.0.11 - 2025-08-12

Loader

  • optimized number of internal read operation when setting up PysparkFeatureLoader

2.0.10 - 2025-07-29

Runner

  • removed unnecessary count() call from logger function

Jobs

  • fixed module register with disable decorators not offloading dependencies

2.0.9 - 2025-07-24

Loader

  • PysparkFeatureLoader.feature_schema now also accepts a list of schemas
    • the features can now be split into multiple schemas (names of groups must remain unique across all schemas)
  • loader now also accepts storage table names interchangeably with feature group names
    • assumes that feature group names contain uppercase lettering

Runner

  • runner no longer checks number of generated records before attempting to write them, instead it retrieves the information from storage

2.0.8 - 2025-05-15

Runner

  • expanded config overrides logging
  • added config validation
  • added owner attribute to GroupMetadata class

2.0.7 - 2025-04-15

Runner

  • email reporting is now optional
  • refactored BookKeeper methods
    • bookkeeping table is no longer being overwritten, new records are appended instead

2.0.6 - 2025-02-26

Job Metadata

  • @jobs can now request job_metadata
  • job_metadata contains
    • name of the job (e.g. model predict)
    • information about the job package: distribution_name and version (e.g. test-model, v 0.2.1)

2.0.3 - 2025-01-27

Runner

  • added run time timestamp to record

2.0.2 - 2025-01-24

Runner

  • new bookkeeping functionality
    • save reporting information to dataframe
    • added bookkeeping config option to runner config

2.0.1 - 2024-10-02

bugfixes

  • compound keys now work properly in sequential features
  • module register now resets on reload while testing

2.0.0 - 2024-09-21

Runner

  • runner config now accepts environment variables
  • restructured runner config
    • added metadata and feature loader sections
    • target moved to pipeline
    • dependency date_col is now mandatory
    • custom extras config is available in each pipeline and will be passed as dictionary available under pipeline_config.extras
    • general section is renamed to runner
    • info_date_shift is always a list
  • transformation header changed
  • added argument to skip dependency checking
  • added overrides parameter to allow for dynamic overriding of config values
  • removed date_from and date_to from arguments, use overrides instead

Jobs

  • jobs are now the main way to create all pipelines
  • config holder removed from jobs
  • metadata_manager and feature_loader are now available arguments, depending on configuration
  • added @config decorator, similar use case to @datasource, for parsing configuration
  • reworked Resolver + Added ModuleRegister
    • datasources no longer just by importing, thus are no longer available for all jobs
    • register_dependency_callable and register_dependency_module added to register datasources
    • together, it's now possilbe to have 2 datasources with the same name, but different implementations for 2 jobs.

TableReader

  • function signatures changed
    • until -> date_until
    • info_date_from -> date_from, info_date_to -> date_to
    • date_column is now mandatory
  • removed TableReaders ability to infer schema from partitions or properties

Loader

  • removed DataLoader class, now only PysparkFeatureLoader is needed with additional parameters

1.3.0 - 2024-06-07

Added

  • passing dependencies from runner to a Transformation
    • optional dependency names in the config that could be recalled via dictionary to access paths and date columns
  • Rialto now adds rialto_date_column property to written tables

Changed

  • signature of Transformation
  • Allowed future dependencies