Draft
…es w/o content_length (#1235)
…arlier than expected (#1237)
* Kicked off webhooks domain: described webhook subscription aggregate * Added 'WebhookEvent' simple aggregate * Sketched subscription event store methods * Sketched WebhookDelivery (fact) and its repository * Couple more methods in delivery repository * Improved ID types. Introduced TaskAttemptId type * Introduced concept of `TaskAttempt` in task, and reshuffled state structure around it without changing event store * Drafted retrying logic and retry policies at task aggregate level. Not integrated or materialized yet. * Drafted retries in task snapshots * Working sqlite migration * Stabilized existing tests * Unit tests added for retry policies * Task scheduler config for retry policy + service-layer tests for retries * Implemented repository for webhook events (3 incarnations, no tests yet) * Unit tests implemented for WebhookEvent repositories. Detecting duplicate ID when creating events. * SQL for webhook subscriptions and deliveries * Drafted in-memory repo for webhook subscriptions * Sketched in-memory webhook delivery repository * Drafted Postgres/SQLite implementations for webhook delivery repository * Drafted Postgres webhook subscription event store implementation * Implemented SQLite version of webhook subscription store * Tests for webhook delivery repository * Basic coverage for webhook subscription event store * Merge corrections * Finished tests for repository layer * Test corrections * Drafted webhook event bridge service * Sketched webhooks service layer, including: - webhook logical plan in Task System, propagation through planner/runner - webhook sender: populates delivery object, generates headers, sends webhook, updates the delivery object with webhook response - webhook signer: implementation of RFC9421 * udeps cleaned * Wrote very naive webhook signing and sender tests * Removed "ref" field from webhook event + linter fixed * WebhookSender => WebhookDeliveryWorker * Separated webhook outbox bridge and event builder * Sending more universal 
DATASET.REF.UPDATED event with "blockRef" field * Separated WebhookSender from WebhookDelivery worker (to increase testability) * WebhookOutboxBridge => WebhookDeliveryScheduler * Drafted webhook secret generator * Added outbox consumer that removes subscriptions of a removed dataset * Minor correction in sending test * Sketched webhook subscriptions GQL api (no tests yet, incomplete) * Self-review question * Merge corrections * merge corrections * Removed task retries. Binding webhook delivery 1-to-1 to task * First test for subscriptions API * Extended test coverage for webhook subscription create GQL entry. Extracted creation use case * Reorganized GQL api handlers to use use cases more * Reorganized webhook subscriptions API to be more dataset bound to avoid separate security checking * Reshuffled update operations in GQL for webhook subscriptions * Idempotent subscription reactions * More subscription GQL API tests * Webhook subscription use case tests * Tests for WebhookDatasetRemovalHandler * Tests for webhook delivery scheduler service * Tests for webhook delivery worker * Merge corrections * Will lint/codegen pass with project toolchain? * try fixing lint/codegen * Removed obsolete comment * Recovered original formatting of GQL schema
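The webhook signer above is described only as an "implementation of RFC9421". As a rough illustration of what that involves, the sketch below assembles an RFC 9421 signature base: one `"component-id": value` line per covered component, followed by the `"@signature-params"` line. The names `Component` and `signature_base`, and the choice of `created`/`keyid` parameters, are assumptions made for this example and are not taken from the PR; the actual signing step (HMAC or asymmetric) is left out.

```rust
/// One covered component, e.g. ("@method", "POST") or ("content-digest", "...").
/// Hypothetical helper type, not part of the PR.
struct Component<'a> {
    id: &'a str,
    value: &'a str,
}

/// Builds the signature base string in the RFC 9421 layout:
/// each covered component on its own line, then the "@signature-params" line.
/// The resulting string is what a signer would feed into its signing algorithm.
fn signature_base(components: &[Component<'_>], created: u64, key_id: &str) -> String {
    let mut base = String::new();
    let mut covered: Vec<String> = Vec::with_capacity(components.len());

    for c in components {
        // Component identifiers are quoted; values are emitted as-is.
        base.push_str(&format!("\"{}\": {}\n", c.id, c.value));
        covered.push(format!("\"{}\"", c.id));
    }

    // Inner list of covered components plus signature parameters.
    let params = format!(
        "({});created={};keyid=\"{}\"",
        covered.join(" "),
        created,
        key_id
    );
    base.push_str(&format!("\"@signature-params\": {}", params));
    base
}
```

The same `params` string would also be sent in the `Signature-Input` header, so the receiver can rebuild an identical base before verifying the signature.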
* Custom workflow for SQLX migrations dev branch * GQL: Account Deletion API (#1242) * GQL: AccountMut::delete_account(): scaffolding * validate_password(): use .len() * DidSecretKeyRowModel: use type fullpath for DidEntityType * schema.gql: update * AccountMut::delete_account_by_name(): use DeleteAccountUseCase * AccountMut: use AccountName scalar instead of String * AccountMut: use Email scalar instead of String * AccountService::delete_account_by_name(): add * make sqlx-local-setup: fix * did_secret_keys: drop creator_id * CreateAccountUseCaseImpl::execute(): update fallback email generation * AccountService::delete_account_by_name(): add [2] * AccountRepository::delete_account_by_name(): implement * account_messages.rs -> account_lifecycle_message.rs * DeleteAccountUseCaseImpl: send AccountLifecycleMessage::deleted() * AccountRepository::delete_account_by_name(): return the removed account not only id * AccountLifecycleMessageDeleted::display_name: add * DatasetAccountDeletionHandler: introduce * DatasetRegistry::all_dataset_handles_by_owner_id(): implement * DatasetRegistry::all_dataset_handles_by_owner() -> all_dataset_handles_by_owner_name() * DatasetAccountDeletionHandler: introduce [2] * DidSecretKeyRepository::delete_did_secret_key(): add * kamu-account-services: group message producers and consumers * messaging_outbox::prelude: introduce * messaging_outbox::prelude: introduce * DidSecretService: handle DatasetLifecycleMessage * access_tokens: add ON DELETE CASCADE for account_id * accounts_passwords: add account_id * Linter fixes * CreateAccountUseCaseImpl::generate_email(): extract * PasswordHashRepository::save_password_hash(): add "account_id" * Makefile: resort vertically db crates * AccountLifecycleMessageDeleted: add "email" field * DidSecretKeyRepository::get_did_secret_key(): add * test_insert_and_locate_did_secret_keys(): update * test_create_account(): fixed * test_create_dataset_from_snapshot_creates_did_secret_key(): fixed * 
test_update_email_bad_email(): fixed * Makefile: remove db crate duplicates * sqlx: add cached queries * AccountsMut::create_account(): absorb * AccountMut::delete(): rethink * AccountMut: access checks * AccountMut: update tracings * DeleteAccountUseCaseImpl: only admins * GQL: AccountMut::modify_password(): re-think * DeleteAccountUseCaseImpl: allow self-deletion * Self-review * CHANGELOG.md: update * DeleteAccountUseCaseImpl::authenticated(): renamed from unauthenticated() * Integration fixes * SelfDeletionIsForbidden: remove unused struct * AccountRepository::delete_account_by_name(): do not return deleted account * OsoDatasetAuthorizer: move dill macros to struct declaration * DeleteAccountUseCase::execute(): take &Account as argument * DeleteAccountUseCaseImpl: use utils::AccountAuthorizationHelper * DatasetRegistry::all_dataset_handles_by_owner_id(): return odfOwnedDatasetHandleStream * Revert "DatasetRegistry::all_dataset_handles_by_owner_id(): return odfOwnedDatasetHandleStream" This reverts commit 8ac0da8. * DatasetAccountDeletionHandler::handle_account_lifecycle_deleted_message(): add a PERF note * CI: fix codegen action * CHANGELOG.md: update * test_delete_account_use_case_impl(): implement * GQL: test_accounts: add tests * CHANGELOG.md: update * Tests fixes * sqlx: update cached queries * Remove password logic from account service level (#1243) * Refactor password logic * Update changelog * GQL: Collection API, `extra_data` validation (#1246) * CollectionMut: take dataset by a ref * VersionedFileMut: take dataset by a ref * GQL: Add ExtraData scalar * GQL: use ExtraData scalar * Typo fixes * Fix tests * GQL: ExtraData scalar: add tests * CHANGELOG.md: update * Allow MIT-0 license usage * kamu-adapter-graphql: correct feature gate * Release (minor): 0.238.0 (#1248) --------- Co-authored-by: Sergei Zaychenko <szaychenko@kamu.dev> Co-authored-by: Roman Boiko <roman.bv20@gmail.com>
* Upgrade to new rustc and 2024 edition (#1254) * Search by account name (#1253) * Search filters also by account name * Update changelog * Fix review comments. Iter 1 * Replace format by to_string() * Fix fmt * Wallet based authentication: Phase 1 (#1239) * EvmWalletAuthenticationProvider: scaffolding * kamu-datasets: remove extra dep (itertools) * kamu-adapter-auth-web3: implement Web3WalletAuthenticationProvider (w/o nonce checking) * kamu --show-error-stack-trace * APIServerRunCommand: get token after HTTP server initialization * kamu-cli: activate Web3WalletAuthenticationProvider * kamu-adapter-graphql: extract auth_mut/ * kamu-adapter-auth-web3: ChecksumWalletAddress -> ChecksumEvmWalletAddress * GQL: AuthWeb3Mut::nonce(): implement * kamu-web3: introduce Web3AuthNonceRepository * kamu-web3-services: introduce Web3NonceServiceImpl * kamu-cli: register kamu-web3-services * kamu-auth-web3: update EIP_4361_EXPECTED_STATEMENT text * schema.gql: update * Web3NonceServiceImpl: impl InitOnStartup * kamu-web -> kamu-auth-web * kamu-auth-web3-inmem: implement * kamu-auth-web3-repo-tests: implement * kamu-auth-web3-inmem: tests * AuthWeb3Mut: nonce() -> eip4361_auth_nonce() * kamu-auth-web3-postgres: scaffolding * kamu-auth-web3-postgres: implement * Web3AuthenticationNonceEntity: expired_at -> expires_at * kamu-auth-web3-sqlite: implement * sqlx: update cached queries * Web3AuthenticationNonce -> Web3AuthenticationEip4361Nonce * Web3AuthenticationNonceEntity -> Web3AuthenticationEip4361NonceEntity * Web3AuthenticationEip4361NonceEntity -> Web3AuthEip4361NonceEntity * Web3AuthNonceRepository -> Web3AuthEip4361NonceRepository * Web3NonceService -> Web3AuthEip4361NonceService * Web3NonceServiceImpl -> Web3AuthEip4361NonceServiceImpl * Add auth_eip4361 to filenames * EvmWalletAddressConvertor: tests * Web3AuthEip4361NonceRepository::consume_nonce(): implement * Web3WalletAuthenticationProvider: verify nonce * PostgresWeb3AuthNonceRepository -> 
PostgresWeb3AuthEip4361NonceRepository * InMemoryWeb3AuthNonceRepository -> InMemoryWeb3AuthEip4361NonceRepository * SqliteWeb3AuthNonceRepository -> SqliteWeb3AuthEip4361NonceRepository * kamu-cli: register db repos * kamu-cli: move kamu_adapter_auth_web3::register_dependencies() to configure_server_catalog() * Changes after merging * test_login_enabled_methods(): fix * Self-review * Fix typos * clippy fixes * GQL: Account::account_type(): add * Account::prepare_account_name_for_storage(): save checksummed wallet address caseness * odf::AccountID: the type as an enum (initial migration) * AccountID: as_did() -> as_did_odf() * DidPkh: implement * odf::AccountID: tests * Web3WalletAuthenticationProvider::login(): generate did:pkh: account ID * Fixes after merging * test_read_shapefile_geom(): fix test * Web3WalletAuthenticationProvider: tests * Self-review * odf::AccountID::to_stack_string(): implement * odf::AccountID::as_stack_string(): implement * ToStackString: implement * odf::AccountID::as_id_without_did_prefix(): implement * Self-review * Self-review [2] * AccountProvider: introduce * lazy_static: remove dep * GQL: Account::account_provider(): return AccountProvider enum * GQL: Eip4361AuthNonce: scalar * Web3WalletAuthenticationProviderHarness: add signature generation notes * test_signature_verified(): add * kamu-cli: register kamu_adapter_auth_web3 with kamu_adapter_oauth * AccountService: create_account() -> create_password_account() * LoginPasswordAuthProvider::login(): update comment * Web3AuthenticationEip4361Nonce: use regex * opendatafabric-metadata: add "did-pkh" feature * kamu-accounts-repo-tests: remove unused dep * kamu-cli: fix "web-ui" build * GQL: AuthMut::login(): use AccountProvider for "login_method" argument * Unittests fixes * CHANGELOG.md: update * Release (minor): 0.240.0 (#1255) * Update `sqlx` to `0.8.6`, vol.2 (#1222) * sqlx: 0.8.5 * images: sqlx-cli@0.8.5 * CHANGELOG.md: update * sqlx: 0.8.6 * sqlx: 0.8.6 [2] * images/sqlx-cli: read 
versions from the repo * Hotfix: `web3-wallet` authorization provider: interactive login use case support (Device Flow) (#1257) * Web3WalletAuthenticationProvider: fix device flow * Release (patch): 0.240.1 * Post-merge changes --------- Co-authored-by: Sergii Mikhtoniuk <mikhtoniuk@gmail.com> Co-authored-by: Roman Boiko <roman.bv20@gmail.com>
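The wallet authentication commits above mention `consume_nonce()`, an `expires_at` field, and an in-memory repository, but not their signatures. The following is a minimal sketch of the general idea, a single-use nonce with an expiry, under stated assumptions: the type names, the error enum, and the per-wallet keying are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ConsumeNonceError {
    NotFound,
    Expired,
}

struct NonceEntry {
    nonce: String,
    expires_at: SystemTime,
}

/// In-memory store keeping at most one pending nonce per wallet address.
#[derive(Default)]
struct InMemoryNonceStore {
    by_wallet: HashMap<String, NonceEntry>,
}

impl InMemoryNonceStore {
    /// Stores (or replaces) the pending nonce for a wallet.
    fn set_nonce(&mut self, wallet: &str, nonce: &str, now: SystemTime, ttl: Duration) {
        let entry = NonceEntry {
            nonce: nonce.to_string(),
            expires_at: now + ttl,
        };
        self.by_wallet.insert(wallet.to_string(), entry);
    }

    /// Succeeds at most once per issued nonce: a matching entry is removed
    /// before the expiry check, so a replayed login attempt finds nothing.
    fn consume_nonce(
        &mut self,
        wallet: &str,
        nonce: &str,
        now: SystemTime,
    ) -> Result<(), ConsumeNonceError> {
        let matches = self
            .by_wallet
            .get(wallet)
            .map_or(false, |entry| entry.nonce == nonce);
        if !matches {
            return Err(ConsumeNonceError::NotFound);
        }
        let entry = self
            .by_wallet
            .remove(wallet)
            .expect("entry presence checked above");
        if entry.expires_at <= now {
            Err(ConsumeNonceError::Expired)
        } else {
            Ok(())
        }
    }
}
```

Passing `now` explicitly keeps the sketch deterministic in tests; a database-backed variant would typically do the lookup and removal in one atomic statement.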
When creating datasets on a user's behalf, ensure the naming scheme for the users matches the parent account name.
* Molecule API V2: scaffolding (#1454) * GQL: split Molecule into MoleculeV1 & MoleculeV2 * GQL: MoleculeMutV2 WIP * Move account_mut.rs to ./account_mut/ * GQL: Account quotas * test_search_accounts_by_name_pattern(): fix rebase collision * GQL: MoleculeDataRoomMut * GQL: MoleculeV2: announcements & whole file tweaks * GQL: MoleculeMutV2: announcements * GQL: MoleculeV2::activity() * Fixes after self review * AccountQuotasMut::set_user_level_quotas(): add doc string * Backported changes to make clippy a bit happier * Molecule API V2: reorganize module structure (#1456) * Tests: mark the current molecule tests as v1 * Modularize Molecule queries * Molecule: extract common things into common.rs * molecule/v2: create a dir * molecule/v1: create a dir * molecule_mut/: create a dir * molecule_project_v2.rs: extract * molecule_activity_event_v2.rs: extract * molecule_data_room_dataset_v2.rs: extract * molecule/: split rest entities * molecule_mut/: split rest entities * MoleculeV2::activity(): add "filters" arguments * MoleculeAnnouncementsDatasetV2::tail(): add "filters" argument * schema.gql: regenerate * chore/molecule-v2-data-room-impl (#1458) * UpdateVersionFileUseCaseHelper: extract * MoleculeDataRoomMutV2::start_upload_file(): implement * QueryService::get_changelog_projection(): implement * QueryService::get_changelog_projection(): add "hint" options * MoleculeDataRoomMutV2: polished methods expect "finish_upload_file()" * MoleculeDataRoomMutV2::finish_upload_file_new_file() * MoleculeDataRoomFinishUploadFileV2 * MoleculeDataRoomMutV2::update_file_metadata(): a try to add the second step * MoleculeDataRoomMutV2::finish_upload_file_new_file_version() * clippy fixes * schema.gql: regenerate * Fix Molecule v1 tests * Working on Molecule v2 data room api * Molecule v2 API: implement basic data room operations * Migration fix * Inlined `UpdateVersionFileUseCaseHelper` within GQL utils, as we may not use `UploadService` in the "datasets" domain. 
+ fixed MySQL migration * Try fixing ODF code generation flow * MoleculeDataRoomMutV2::finish_upload_file_new_file_version(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::finish_upload_file_new_file(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::move_entry(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::remove_entry(): use UpdateCollectionEntriesUseCase * MoleculeDataRoomMutV2::update_file_metadata(): implement * Schema regenerated * test_molecule_v2_data_room_operations(): use pretty_assertions::assert_eq * moveEntry test * removeEntry test * updateFileMetadata test * schema.gql: regenerate --------- Co-authored-by: Sergii Mikhtoniuk <mikhtoniuk@gmail.com> Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com> * Extracting Molecule service layer from GQL code: Phase 1 - Projects (#1463) * Established Molecule domain & service crates. Extracted "View Molecule Projects" use case, and plugged into v1/v2 APIs * Extracted "Find Molecule project" use case * Extracted "Create Molecule project use case". Molecule dataset snapshots, as well as generic VersionedFile/Collection dataset snapshots are now a domain, not GQL concern. * Extracted `MoleculeProjectEntity` object to use instead of untyped JSON objects at domain level. GQL objects converted to [Object], replacing [SimpleObject], and keep entity. * Improved telemetry in new Molecule service layer * Sketched MoleculeProjectMessage outbox events. For now, sending "Created" message from the corresponding use case. 
* Review notes fixed * Molecule APIv2: activity (#1467) * UpdateCollectionEntriesUseCaseImpl::build_data_batches(): return note * MoleculeDatasetSnapshots::data_room_v2(): create alias internally * MoleculeDatasetSnapshots::projects(): create alias internally * MoleculeDatasetSnapshots::announcements(): create alias internally * StageDataResult: update doc strings * PushIngestOpts: fix a typo * MoleculeDatasetSnapshots::global_data_room_activity() * MoleculeProjectService -> MoleculeDatasetService * MoleculeDatasetService::get_global_data_room_activity_dataset() * MoleculeProjectV2::ipnft_token_id(): use U256 * MoleculeAppendDataRoomActivityUseCaseImpl * Tests stabilized * MoleculeDatasetSnapshots::global_data_room_activity(): update comment re LIST<BYTE_ARRAY item (STRING)>) * MoleculeDataRoomMutV2::finish_upload_file_new_file_version(): write data room activity * MoleculeDataRoomMutV2::finish_upload_file_new_file(): add maintainer permissions to molecule * DatasetHandleLoader: add AccessCheckedDatasetRef-related load() method * DatasetHandleLoader: add AccessCheckedDatasetRef-related load() method + ResolvedDataset * MoleculeVersionedFile::latest(): use data loader * MoleculeViewDataRoomActivitiesUseCaseImpl * MoleculeV2::activity() * MoleculeV2::activity() * test_molecule_v2_data_room_operations(): global activity checks (part) * MoleculeProjectV2::get_data_room_activity_events(): fixed, unit-tested * clippy fixes * test_internal_error(): simplify * OperationType::deserialize(): simplify * access_level: update todo * Add todos * Consts for snapshot names * Add global prefix * Refactoring: extracting versioned file, collections into service layer (#1468) * Datasets domain use cases reorganized by folders * Extracted 'ViewCollectionEntriesUseCase' use case in datasets domain * Merge corrections * Extracted `FindCollectionEntryUseCase` * FindCollectionEntryUseCase => FindCollectionEntriesUseCase * More renaming cleanups * Listing structs: use EntityPageListing 
template. Introduced `CollectionPath` at datasets domain level. Domain's `ExtraDataFields` applied more systematically. * Extracted `ViewVersionedFileHistoryUseCase` * Extracted `FindVersionedFileVersionUseCase` use case * Simplified structures in `UpdateCollectionEntriesUseCase` * Cleanups in GQL adapters for collections and versioned files * Minor review * Molecule APIv2: global data room activity finalization (#1470) * Finish global data room activity * Correct project data room activities * Add disable/enable project API (#1469) * Add disable/enable project api * Fix clippy * Add comments * Fix review comments - Iter 1 * Refactor changelog entry duplication * Refactore: use GraphQLQueryRequest in tests * Add chain length asserts * Make chain search with alias parameter * Fix review comments * Fix tests * Update schema * Extracting service layer from Molecule GQL API (part 2 - data rooms) (#1472) * Molecule use cases need some folder structure too. Extracted use cases `MoleculeFindProjectDataRoomEntryUseCase` and `MoleculeViewProjectDataRoomEntriesUseCase`: those indirectly request project data rooms as a collection dataset, and map structures. 
A direct collection adapter talking to service layer of `kamu-datasets` (not to GQL!), with the extension seam for future federation (invoking collection entries from base GQL API remotly) * Got rid of manual DataFrame at GQL level when writing or updating file versions * `MoleculeDataRoomEntry`: simplified domain structure and GQL equivalent * Some intermediate cleanups after merge * First attempt to extract data room UPSERT use case * Upsert data room entry: returning new data room record * Extracted `MoleculeRemoveProjectDataRoomEntryUseCase` * Extracted `MoleculeMoveProjectDataRoomEntryUseCase` + aligned common parts with removals * Update metadata uses data-room level upsert use case * Telemetry cleanup * Naming cleanups * First sketch of data room outbox message: sending for move and remove * Split upsert data room entry on create and update UC, as they need to produce different outbox output * Propagating source event time for collection entry operations * Propagating system time from versioned file ingest properly * Got rid of extra ReBAC check at highest data room access point * Molecule APIv2: global/project announcements (#1471) * MoleculeDatasetService::get_global_announcements_dataset() * MoleculeCreateAnnouncementUseCase * MoleculeProjectMutV2::announcements() * format-utils crate * MoleculeAnnouncementsDatasetMutV2::create() * MoleculeViewGlobalDataRoomActivitiesUseCaseImpl: respect announcements * Molecule use cases: activity/ -> activities/ * Adaptation to the latest refactoring * MoleculeV2::activity() * MoleculeAnnouncements (project) * MoleculeProjectV2::activity(): update with announcements * MoleculeCreateAnnouncementUseCaseImpl: register * Tests fix * Tests fix [2] * Tests fix [3] * MoleculeProjectAnnouncementDataRecord: add a TODO * schema.gql: regenerate * Outbox message corrections (as needed in search prototype) * Molecule: announcements tests (#1474) * test_molecule_v2_announcements_operations(): checkpoint -- add 2 files * 
test_molecule_v2_announcements_operations(): checkpoint -- create empty announcement * test_molecule_v2_announcements_operations(): checkpoint -- Create an announcement with one attachment * test_molecule_v2_announcements_operations(): checkpoint -- Create an announcement with two attachments * test_molecule_v2_announcements_operations(): checkpoint -- Create an announcement with attachment DID that does not exist * test_molecule_v2_announcements_operations(): checkpoint -- Announcements are listed as expected * test_molecule_v2_announcements_operations(): finish * MoleculeAnnouncementEntry: system_time/event_time * Schema corrected * MoleculeEncryptionMetadata (#1475) * MoleculeEncryptionMetadata * schema.gql: regenerate * MoleculeDatasetSnapshots::versioned_file_v2(): remove todo * MoleculeEncryptionMetadata: extract to domain * Sanitize versioned file dataset name generation / validate `CollectionPathV2` scalar (#1480) * DatasetNameGenerator * MoleculeDataRoomMutV2::build_new_file_dataset_alias(): use DatasetNameGenerator * CollectionPathV2 (domain) * CollectionPathV2 (domain): updates * CollectionPathV2 (GQL) * CollectionPathV2 (GQL): tests * schema.gql: regenerate * kamu-datasets: remove unused dep * Test fixes after resent changes * RUSTSEC-2025-0134 * Minor dependency updates * Molecule APIv2: Basic categorical filtering (w/o unit-tests) (#1488) * ViewCollectionEntriesUseCase: support extra data filters * From<GetDataRoomCollectionEntriesFilters> for Option<kamu_datasets::ExtraDataFieldsFilter> * MoleculeDataRoomCollectionService::get_data_room_collection_entries(): add filters * MoleculeViewDataRoomEntriesUseCaseImpl: filters * MoleculeDataRoomProjection::entries(): filters * MoleculeDatasetSnapshots::global_announcements(): update SetInfo * utils::DataFrameExtraDataFieldsFilterApplier: extract * MoleculeAnnouncements::tail(): filters * GetDataRoomCollectionEntriesFilters -> GetMoleculeDataRoomCollectionEntriesFilters * A clearer separation of filter 
entities * MoleculeAnnouncements::tail(): filters * MoleculeProjectV2::get_data_room_activity_events(): filters * MoleculeProjectV2::get_data_room_activity_events(): filters [2] * MoleculeProjectV2::activity(): filters * MoleculeViewGlobalActivitiesUseCase: filters * schema.gql: regenerate * test_molecule_v2_activity(): start unlocking * test_molecule_v2_activity(): Activities are empty * Molecule phase 2: Extracting service layer (Part 3 - Versioned files) (#1489) * Versioned files: sketched and plugged create/update use cases * Specialized use case for update file metadata. Integrated read file version use case, and simplified read model. * Minor: versioned file API moved out of data room file * Minor: avoid cloning file info for serde * Minor: unifying arguments of update/upload use cases * Clarified access checking in versioned file use cases * Isolated versioned file content access behind a service * Avoiding ResolvedDataset and similar in Molecule domain interface * MoleculeVersionedFile::asOf supported. Drafted MoleculeVersionedFile::matching (not public) - takes the versioned file version that exactly matches data room entry. MoleculeVersionedFile::latest is correctly not reusing denromalized data, as it's not guaranteed the data room entry is the latest one. * Revised schema optionals * Spelling * Enabled MoleculeVersionedFile::matching endpoint. Optimized MoleculeVersionedfile::latest endpoint, when data room entry is also the latest, using denormalized data. 
* Guiding comments * Killed undesired GQL => Molecule.Services dependency * Merge corrections * [2/2] Molecule APIv2: Basic categorical filtering (w/ unit-tests) (#1492) * test_molecule_v2_activity(): Create a few versioned files * test_molecule_v2_activity(): Upload new file versions * test_molecule_v2_activity(): Link new file into the project data room -- not relevant for v2 * test_molecule_v2_activity(): Move a file (retract + append) * test_molecule_v2_activity(): Update a file (correction from-to) -- not relevant for v2 * test_molecule_v2_activity(): Create an announcement * test_molecule_v2_activity(): Upload a new file version * test_molecule_v2_activity(): Remove a file * test_molecule_v2_activity(): Check project activity events * test_molecule_v2_activity(): Create another project * test_molecule_v2_activity(): Create an announcement for the second project * test_molecule_v2_activity(): Check global activity events * test_molecule_v2_activity(): In-between activity asserts * test_gql_custom_molecule_v2: remove misleading clone() * test_molecule_v2_activity(): Filters without values * datafusion: register array functions * DataFrameExtraDataFieldsFilterApplier:: respect array columns * test_molecule_v2_activity(): Filters by tags: tag1 * test_molecule_v2_activity(): Filters by tags: [tag2] * test_molecule_v2_activity(): // Filters by tags: [tag2, tag1] * test_molecule_v2_activity(): Filters by categories: [test-category-1] * test_molecule_v2_activity(): Filters by categories: [test-category-2] * test_molecule_v2_activity(): Filters by categories: [test-category-2, test-category-1] * test_molecule_v2_activity(): Filters by access levels: [public] * test_molecule_v2_activity(): Filters by access levels: [holders] * test_molecule_v2_activity(): Filters by access levels: [public, holders] * test_molecule_v2_activity(): Filters combination: [test-tag2] AND [test-category-1] AND [holders] * test_molecule_v2_activity(): Project filters * 
test_molecule_v2_announcements_operations(): announcements filters * test_molecule_v2_data_room_operations(): announcements filters * Molecule Phase 2 - extract service layer (Part 4 - Activities and Announcements) * Extracted 'view project announcements' use case * Extracted "find project announcement" use case * Split Molecule dataset services * Moved most services from Molecule domain to services crate, broke all dependencies from API level * Minor cleanups * Sketched new approach of dataset accessor and used it to simplify announcements use cases for now * Same accessor approach applied to activities * Same accessor approach applied to projects dataset * Better reader/writer helpers for projects * Naming simplifications * Simplified announcements data model * Further model unifications: clear separation between changelog entry, changelog insertion record, payload record, and entity. Cleaned up event time / system time propagation in all Molecule write use cases. * Outbox events for announcements * Outbox event for activities * Activities: extracted view project activity use case, related structures cleanup * Minor correction * Schema migration prep * Fix error propagation in GQL data loaders (#1501) * Allow optional event_time in global announcements ingest * Allow optional event_time in global activity, collections, and file datasets * Molecule APIv2: Basic search (#1502) * kamu-molecule-domain: export utils as module * MoleculeSearchUseCaseImpl * molecule_extra_data_fields_filter() -> molecule_fields_filter() * GQL: MoleculeV2::search() * GQL: MoleculeV2::get_molecule_projects_mapping(): extract * extra_data_fields_filter.rs -> molecule_fields_filter.rs * MoleculeGlobalActivitiesService -> MoleculeGlobalDataRoomActivitiesService * schema.gql: regenerate * MoleculeSearchUseCaseImpl::get_global_data_room_activities_listing(): use projection * MoleculeSearchTypeInput: use Enum * MoleculeSemanticSearchFoundItem -> MoleculeSemanticSearchHit * 
MoleculeSearchUseCaseImpl: correct pattern & global data room projection * GQL: MoleculeV2::search(): return data room entries instead of files * test_molecule_v2_search(): Empty prompt * test_molecule_v2_search(): Prompt: "text" (files + announcement (body)) * test_molecule_v2_search(): Prompt: "tESt" (files + announcement (headline)) * test_molecule_v2_search(): Prompt: "bLaH" (only announcements (body)) * test_molecule_v2_search(): Prompt: "lain" (only files) * MoleculeSearchTypeInput: update * test_molecule_v2_search(): Filters: byIpnftUids: [PROJECT_1_UID] * test_molecule_v2_search(): Filters: byIpnftUids: [PROJECT_2_UID] * test_molecule_v2_search(): Filters: byIpnftUids: [PROJECT_2_UID, PROJECT_1_UID] * test_molecule_v2_search(): Filters: byType * test_molecule_v2_search(): Filters: byTags * test_molecule_v2_search(): Filters: byCategories * test_molecule_v2_search(): Filters: byAccessLevels * test_molecule_v2_search(): Filters combo * MoleculeSearchFilters::by_type(): represent as collection * MoleculeSearchTypeInput -> MoleculeSearchEntityKindInput * schema.gql: regenerate * Updating denormalized entry metadata is done via correction * Assert accessLevel in Molecule tests * Fix announcement record V1 compatibility * Ignore pre-migration events in project data room activity * Align search result structure with activity * Fix path validation and using V2 path in all new APIs * Fix DatasetNameGenerator * Hotfix: Molecule APIv2: announcements creation (#1519) * MoleculeDatasetSnapshots::global_announcements(): event_time should be NOT optional * MoleculeDatasetSnapshots::global_data_room_activity(): event_time should be NOT optional * MoleculeDatasetSnapshots::global_data_room_activity(): content_hash not null, content_type nullable * MoleculeCreateAnnouncementUseCase/MoleculeAppendGlobalDataRoomActivityUseCase: require event_time * Partially revert hotfix changes (#1521) * Backported common changes from ElasticSearch branch * Molecule APIv2: Data room 
operations: strict input validation (#1522) * MoleculeDataRoomMutV2::finish_upload_file_new_file(): check path * MoleculeDataRoomMutV2::move_entry(): check path * schema.gql: regenerate * MoleculeDataRoomMutV2::finish_upload_file_new_file_version(): check ref * schema.gql: regenerate * Run dataset stats indexer in parallel (#1509) (#1523) (#1524) * Run dataset stats indexer in parallel * Use maximum cores number for stats indexing * fix clippy * Add account quotas (#1481) * Add account quotas * Update schema * Fix tests * Fix review comments - Iter 1 * Fix review comments - Iter 2 * Fix imports * Use correct defaults * Fix test ingest * Reduce default quota * Fix review comments - Iter 3 * resolve account id from target dataset * Skip quota checks for single tenant mode * set account quotas only for admins * Fix review comments - Iter 4 * Fix error message propagate * Add e2e quota tests * Add get quota default fallback (#1527) * Add get quota default fallback * Update schema * Minor deps + deny list + sqlx-check fix * Remove encryption_metadata from update_file_metadata method (#1531) Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com> * Update file metadata method with expected head param (#1530) * Add expected head field to update file metadata method * Add tests * Update schema --------- Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com> * Feat/1511 add bykind activity filter (#1532) * Add activity byKind filters * Refactor activity kind --------- Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com> * `DatasetEntryServiceImpl::resolve_dataset_handles_by_refs()`: correct alias resolution in multi-tenant mode (#1529) * test_dataset_entry_service.rs -> test_dataset_entry_service_impl.rs * test_dataset_entry_service_impl.rs: use explicit pretty_assertions imports * test_utils::test_for_each_tenancy(): implement * test_dataset_entry_service_impl.rs: use test_for_each_tenancy() macro * test_resolve_dataset_handles_by_refs(): by ids * 
test_resolve_dataset_handles_by_refs(): by aliases * test_resolve_dataset_handles_by_refs(): use resolution_report * test_resolve_dataset_handles_by_refs(): by handles * test_resolve_dataset_handles_by_refs(): mixed * test_resolve_dataset_handles_by_refs(): special case for mixed aliases in multi-tenant * DatasetEntryServiceImpl::resolve_dataset_handles_by_dataset_aliases(): resolve empty account alias name * clippy fixes * Self-review --------- Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com> * Remove molecule prefixes from gql args (#1534) * Remove molecule_ prefixes from gql args * Modify domain structs * Revert dataset schema changes * Add changed by param to move/remove entry methods (#1535) * Add change by for entry operations * Fix tests * Cleanup * Implement 2steps remove entry logic * Refactor diff helper method * Hotfix: use 'array_has_any' instead of 'array_has_all' to combine filters within same logical group via OR operator * `MoleculeAnnouncementEntry::attachments()`: return `[MoleculeDataRoomEntry]` not `[DatasetID]` (#1541) * FindCollectionEntriesUseCase::execute_find_by_ref(): ref as a ref not slice * MoleculeFindDataRoomEntryUseCaseImpl: update error handling * MoleculeFindDataRoomEntryUseCase::execute_find_by_refs(): impl * MoleculeDataRoomCollectionServiceImpl::find_data_room_collection_entries_by_refs(): implement * MoleculeCreateAnnouncementUseCaseImpl::validate_attachments(): corrections * MoleculeAnnouncementPayloadRecord: fix typos * MoleculeAnnouncementEntry::attachments(): return file versioned files not refs * MoleculeAnnouncementEntry::attachments(): return dataroom entries not refs * MoleculeDataRoomCollectionServiceImpl::find_data_room_collection_entries_by_refs(): restrore refs order * test_molecule_v2_activity(): update * test_molecule_v2_announcements_operations(): update * test_molecule_v2_search(): update * Self-review * test_molecule_v2_activity_change_by_for_remove(): fix * Add molecule per project access level 
filter (#1537) * Add molecule per project access level filter * Add access level rule deduplication logic * Make ipnftUid required field for MoleculeAccessLevelRule * Reduce clones * Molecule APIv1: feature based GQL gate (#1542) * GQL: guards module breakdown * Molecule::v1 -- feature gate * FeatureEnabledGuard: tests * Self-review * GQL: Config: add "molecule_api_v1_enabled" switch * Log incoming GraphQL requests (#1543) * test_molecule_v2_dump_dataset_snapshots(): fix global dataset names (#1548) * Cover molecule as_of gql historical data (#1547) * Molecule APIv1 switch corrections (#1549) * Regen resources * test_gql_custom_molecule_v1: fix unit-tests (DI issue) * Molecule + Elasticsearch (#1462) * Renamed existing natural language service stuff ElasticSearch first steps: - followed footsteps of natural language search components - makefile: ElasticSearch start, stop, and clean actions - achieved connectivity to started ElasticSearch cluster - achieved connectivity to self-launched child container service - test GQL endpoint, routing cluster health info for now - indexer: scaffold, no real indexing yet - search service API: scaffold, no real implementation yet Full text search: - support framework for entity schemas registration (no versioning/migrations yet) - ES: first client calls for index check, registration, documents counting - feeding schemas for accounts and datasets from corresponding domains (very simplified fields now) - entire process happens via plugin style, particular domains register provider components that are accessed by template shared process ElasticSearch code structure extended: - separated low level operations in `ElasticSearchClient`: deals with engine connection building, sending queries in the right format, interpreting responses, and dealing with API errors - `ElasticSearchIndexMappings` handles creation of index mappings for the given schema + hashing its content, this will be a future place to apply complex column properties 
depending on configs - `ElasticSearchVersionedEntityIndex` manages indexes for entities and aliases, auto-registers indices, validates schema metadata, automatically detects drifts without version modification, automatically applies breaking or reindexable upgrades - main repository code stays at very high level Basic shape of full ElasticSearch index re-indexing + sketched simplest indexing procedure for Datasets Indexing owner-id in datasets index (for filters) ElasticSearch indexing added for Accounts Indexing creation time for Accounts/Datasets Indexing dataset documents similarly to natural language search: added schema fields, description, keywords, and attachments More realistic field roles: hierarchical identifiers (account name, dataset name, alias, schema field), prose (description, attachments), keywords (owner_id, keyword, dataset_kind) - with corresponding analyzers and properties for ElasticSearch Added "Title" field role, which is in between Prose and Identifier, used for accounts' display names for now. Identifier fields get inner-ngrams (3..6) and wider edge-ngrams (2..10). Account life cycle events update ElasticSearch index: - massaged events format a bit to satisfy new needs - new outbox event handler for account search index updates - reorganized account schema code to encapsulate 1 document operations, while indexer and update handler use its helpers - issuing bulk insert, update, delete operations in ES for account events Dummy implementation in e2e tests (until a better solution is found, as containerized ES takes over 20s to start for each command, and that's affected by account/dataset lifecycle events) Implemented updates to ElasticSearch for dataset-related events: lifecycle, reference update, parent account rename/delete. Fixed account deletion handler in datasets domain, no ReBAC/dangling checks should be executed during system event handling. 
Datasets schema: better incremental re-indexings for partial updates Hotfix: improved detection of invalid intervals in case of breaking changes in the dataset, when expected tail is ahead of head First sketching of a search function: - simplest querying: query_string vs match_all, depending on whether a non-empty query was received - support specifying list of indexes vs defaulting to all schemas - ES: sending search request, decoding response - trivial GQL endpoint support Naive pagination support (size/from). Requesting source fields in multiple modes: None, All, Particular, Complex (include+exclude patterns) Next ElasticSearch steps: - Search schemas constants moved and published by domains, so that GQL can reference those fields. - Support flexible sorting of search results: N criteria, by field or relevance score, configurable direction. - Each schema now provides a field that can be used for universal alphabetical sorting ("title" alias) Support basic search filters (keyword = value, keyword in {values}) and compound from those (and, or, not). Added convenience macros to specify compound filters and for sort specifications. Search highlights for textual fields: displays best fragments explaining why certain document's field matched the query On-demand "explain" option: outputs low-level ElasticSearch scoring computation formula * Sketched Molecule service crate + 4 Molecule search index schemas (projects, data room entries, announcements, activity events) * ES: support unprocessed objects field (stored, but not indexed or searched) * Schema: using UnprocessedObject for activity body JSON * Merged previous adapter into new sku/molecule/domain|services * Automatic reindexing of Molecule projects in ElasticSearch * Flaky hell on flow e2e fixed * Reacting to MoleculeProjectMessageCreated, adding new ElasticSearch document * Supporting boolean fields + added a generic banning feature for ES indices (filter is auto-attached to "read alias"). 
Fix: search should always be directed to "read alias", never to "writable index". * Added Molecule project banning reaction in search: handling outbox "project disabled" and "project reenabled" messages and setting "is_banned" attribute * Merge corrections * Introduced `KamuBackgroundCatalog`: catalog wrapper without user account, to use for background lazy processes like on-demand search indexer. Generalized Molecule reindexing template algorithm: start from `KamuBackgroundCatalog`, attach Molecule org account as subject, initiate separate transaction, then run indexing on behalf of Molecule's account. Drafted data room entries reindexing. Relaxed compatibility requirements, so that v1 data room datasets don't crash when loaded. * Forgotten fields in data rooms schema * Bulk-based indexing for projects and data room entries * Indexing speedup: loading entries from N projects in parallel * Corrections in ES client: use single bulk update operation with encoded commands, so that heterogeneous ops are possible in one bulk. Simplified search context: we don't need account for now. * Incremental indexing of data rooms. * Lock correction * Sketched global indexing of announcements * Sketched announcements incremental indexing * Sketched activity full indexing * Sketched incremental indexing for activities * Indexing code restructuring and simplification * Added "activity_type" field to activity search schema * Sketched filtering latest dataset room entries via ElasticSearch query * "contentText" from versioned file entries is automatically attached to data room entry index in ElasticSearch * Drafted listing latest view of global and project announcements via ElasticSearch * Stabilized announcements filtering via ElasticSearch: - sort objects fixed - view global activities: fetching announcements via use case, to activate ElasticSearch as well * Similar changes in projects schema * Filtering global and project activities via ElasticSearch. 
ElasticSearch: fixed bug in schema name resolution. * Merge corrections * Implemented Molecule's search use case via ElasticSearch. + Common module for all Molecule shared search schema declarations: field names and field definitions. * Search corrections * Backported ElasticSearch-focused changes from Molecule branch * Deps correction * Simplifying renames * Unified account/dataset schemas to the style in Molecule branch * Prototyped framework for integration tests with ElasticSearch involved: - EsTestContext: main facility, lazily initializes reusable ES client - a test proc-macro hiding the plumbing of the context - each test receives a dill::Catalog prefilled with ES client, ES repository impl, with unique randomly generated index prefix - a successful test automatically cleans its own indices, while a failing test keeps the indices available for inspection - on first ES client initialization, the potentially abandoned test indices from previous sessions are discarded automatically - wrote the first couple of tests for Accounts indexing * Correct spelling of "Elasticsearch" brand name * ElasticSearch test group on CI/CD * More account indexing tests * Initial test suite for datasets indexing. Hardening es_client against async waiting issues: created index must be reachable, assigned alias must be reachable. Makefile: automated cleaning of abandoned test artifacts from previous experiments. Not doing this in fixtures, as `cargo nextest run` creates races around it executing every test in separate process. * Common searching harness + aligning dataset use case harness to be more pluggable * Shared fixture for account use case tests + indexing tests. 
Accounts indexing: testing predefined indexer * Tests: predefined datasets indexing * MT version of predefined datasets indexing test * MT version of incremental indexing * Tests: renaming or deleting account affects index of its datasets * Udeps fixed * Improvements and tests for detailed dataset content indexing (schema, setInfo, attachments). Not indexing default vocabulary fields in schema, as those do not contribute to search relevance. Indexing tests run with real outbox to maximize realism: forcing sync when necessary between test steps * Added basic test suite for datasets searching: checking analyzers, filters, stemmers, ... * Elasticsearch basically plugged into v2 Molecule tests, runs indexing, writes data, but is not yet queried * Stabilized Molecule Elasticsearch filter/search tests. Problem with E2E test persists for now * Temp fix: e2e + elasticsearch * Stabilized e2e vs integration test vs full manual indexing test with regards to catalogs * Removed Postgres + Elasticsearch e2e combo * Elasticsearch: ability to setup, connect to, and test with a server using HTTPS/TLS * Elasticsearch support for access level rules * test_molecule_v2_activity_change_by_for_remove: left only SQLite version, our CI is not ready for elasticsearch+postgres combo * Search indexer config: - affects both Qdrant and Elasticsearch - support "clear_on_start" and dataset indexing filters in Elasticsearch - support disabling incremental search index updates * Feature flag: enable/disable Molecule APIs to read from projection in Elasticsearch * ODF generated file fix * More odf gen corrections * More ODF codegen fix * Schema regenerated * Hotfix: do not put empty dataset documents to Elasticsearch * Hotfix: dataset search indexing might start before HEAD is written to S3, so use particular hash from outbox message * Hotfix: round 2 dataset indexing stabilization * Hotfix: handling ES 404 error properly when requesting a non-existing document by id * Add webhook trigger recovery 
job (#1554) * Add webhook trigger recovery job * Fix tests * Add new get all subscriptions method * Add sqlx queries * Add tracing --------- Co-authored-by: Sergei Zaychenko <zaychenko.sergei@gmail.com> Co-authored-by: Sergii Mikhtoniuk <mikhtoniuk@gmail.com> Co-authored-by: Sergei Zaychenko <szaychenko@kamu.dev> Co-authored-by: Roman Boiko <roman.bv20@gmail.com>
Simplified global item filters (security filter covers Molecule account_id filters)
* Simplified common fields generation via flags (is_banned, security, embeddings). Semantic embeddings in Molecule data room and announcement indices. Support full and incremental embeddings indexing from Molecule data room entries (description, content text) and announcements (headline, body) * Implemented caching embeddings in the database and tracking stats for entity chunks and prompts. * Enabled hybrid search in Molecule use cases. Corrections: - dealing with dummy encoder not returning anything for prompts: degrading to textual search - corrected embeddings cache queries - fixed error with announcements index sorting: HEADLINE needs a nested keyword field - hybrid search also needs secondary sort criteria for textual part, passing event_time DESC for Molecule case * Telemetry improvements around hybrid search * Disabled test dependencies for search cache crates * Scoring and explanations as hidden fields in Molecule search GQL API. * Improved RRF custom explanation
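For readers unfamiliar with the hybrid search mentioned in the commit above: assuming "RRF" refers to Reciprocal Rank Fusion, the fusion step can be sketched as follows. This is a minimal, hypothetical illustration and not the code from this PR; the function name `rrf_fuse`, the document ids, and the constant `k = 60` are assumptions chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fuses several ranked result lists with
/// Reciprocal Rank Fusion, score(d) = sum over lists of 1 / (k + rank),
/// where rank is 1-based and documents absent from a list contribute 0.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry((*doc_id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id to keep output deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Assumed inputs: one ranking from the textual query, one from the vector query.
    let textual = vec!["doc-a", "doc-b", "doc-c"];
    let vector = vec!["doc-b", "doc-d", "doc-a"];
    for (id, score) in rrf_fuse(&[textual, vector], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Because only ranks enter the formula, the textual and vector scores never need to be normalized against each other, which is the usual reason for picking this kind of fusion in hybrid search.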
…ring "ipnft_uid" column
* molecule_data_room_mut_v2: helpers trivial updates * MoleculeActivityMessage::WriteRequested: added * MoleculeActivitySearchUpdater: "handle" WriteRequested message * MoleculeAppendGlobalDataRoomActivityUseCaseImpl: only request not write * MoleculeAsyncGlobalActivityWriter: initial implementation * DEVELOPER.md: typo fixes * Regen di.puml * MoleculeAppendGlobalDataRoomActivityUseCaseImpl: add "global" to the file name * MoleculeActivityMessagePublished: add "system_time" field * MoleculeAsyncGlobalActivityWriter::handle_write_requested_message(): fallback for "source_event_time" * MoleculeViewGlobalActivitiesUseCaseImpl::global_activities_from_search(): sort by "event_time" & add "total_count" fallback * MoleculeViewProjectActivitiesUseCaseImpl::global_activities_from_search(): sort by "event_time" & add "total_count" fallback * MoleculeAppendGlobalDataRoomActivityUseCaseImpl::ensure_within_quota(): add * Regen di.puml * Remove TODOs related to asynchronous writings * test_gql_custom_molecule_v2: provide es & src test variants * clippy fixes * test_molecule_v2_search: stabilize * Fix yanked crate: bytes * Fix yanked crate: ssi-core * test_competitive_writing_of_global_activities_src_multi_thread() * MoleculeAsyncGlobalActivityWriter::handle_write_requested_message(): add note re NOT concurrent writes * MoleculeDataRoomActivityPayloadRecord::roughly_estimated_size_in_bytes(): extract the method * Fix yanked crate: time * Ingest: introduce "ignore_quota_check" flag to bypass quota check * Ingest: introduce "skip_quota_check" flag to bypass quota check (renamed)
* Backport: DataFusion `to_table()` * SessionContextBuilder: register ToTableUdtf * Fix DI circular dep issues * test_to_table_udtf_loaded: added * Self-review * ToTableUdtf: use "on_resolve_dataset_callback" * ToTableUdtf: replace Fut with AsyncFn * QueryServiceImpl::session_context(): add a context note * Update Cargo.lock after rebasing * Finalization: unit-tests
…be used across different networks (#1603) * Web3WalletAuthenticationProvider::login(): guarantee of uniqueness * Add migrations
…be used across different networks -- vol.2 (#1604) * DidPkhAccountIdentity::from_did_pkh(): extract & use * Fix yanked crate: lz4_flex (0.12.1) * DidPkhAccountIdentity::from_did_pkh(): return Result<Self, InternalError> * DidPkhAccountIdentity::from_did_pkh(): add a naive unit-test * test_create_wallet_accounts(): update
…be used across different networks -- vol.3 (#1608) * Add a note re the image patch version * DidPkhAccountIdentity::from_did_pkh(): cut email user prefix if needed * DidPkhAccountIdentity::from_did_pkh(): use UUIDv5 for email generation * DidPkhAccountIdentity::from_did_pkh(): update the note * kamu-accounts: remove unused dep secrecy
…ement was created (#1616) * test_molecule_v2_activity: compare file versions * test_gql_custom_molecule_v2: use into_json_data() * test_molecule_v2_activity: correct test * test_gql_custom_molecule_v2: fix other tests * test_gql_custom_molecule_v2: sort for asserts * FindCollectionEntriesUseCaseImpl::execute_find_multi_by_refs(): add "before_event_time" arg * Update tar crate version * MoleculeAnnouncementEntry::attachments(): clarify asVersionedFile usage * Address review comments * Update deny.toml
2nd delivery includes:
molecule GQL API group