docs: add basic catalog documentation #150
Open: mspiekermann wants to merge 1 commit into eclipse-edc:main from mspiekermann:cataloc_docs
+61
−1
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,4 +4,64 @@ description: Covers how publishing and retrieving federated data catalogs works. | |
| weight: 60 | ||
| --- | ||
|
|
||
| TDB | ||
| The Federated Catalog (FC) provides a scalable solution for metadata discovery in decentralized dataspaces. The FC employs a set of crawlers that periodically scrape the dataspace, requesting the catalog from each participant in a list of participants, and consolidate the results in a local cache, which provides a unified, searchable view of available data assets across the network. | ||
|
|
||
|
|
||
|
|
||
| Instead of requiring participants to query each other directly for catalog information, the FC uses a crawling and caching mechanism. It periodically collects catalog metadata from participant connectors and stores this information locally. By maintaining a cached copy of participant catalogs, the Federated Catalog enables faster and more reliable catalog queries while reducing network load and removing the need for real-time access to all participants during query execution. In this way, the FC effectively serves as a read-optimized replica of participant catalogs within a dataspace. | ||
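The crawl-and-cache idea described above can be illustrated with a minimal sketch. It is not the EDC implementation: the names `crawl`, `query`, and `fetch`, and the modelling of a catalog as a list of dictionaries, are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the EDC API.
Catalog = List[dict]                 # a catalog modelled as a list of dataset entries
FetchFn = Callable[[str], Catalog]   # stand-in for a catalog request to one participant


def crawl(participants: List[str], fetch: FetchFn, cache: Dict[str, Catalog]) -> None:
    """Request each participant's catalog and store it in the local cache."""
    for participant in participants:
        try:
            cache[participant] = fetch(participant)
        except ConnectionError:
            # An unreachable participant keeps its previously cached catalog.
            continue


def query(cache: Dict[str, Catalog], keyword: str) -> List[dict]:
    """Answer a query from the cache alone, without contacting any participant."""
    return [
        entry
        for catalog in cache.values()
        for entry in catalog
        if keyword in entry.get("name", "")
    ]
```

In this sketch, a query still returns entries for a participant that is offline at query time, as long as an earlier crawl reached it. That is the "read-optimized replica" behaviour the paragraph describes.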
|
|
||
|
|
||
|
|
||
| Keeping a locally cached copy of every participant's catalog makes catalog queries more responsive and robust, and it can reduce network load. | ||
|
|
||
|
|
||
|
|
||
| The Federated Catalog builds on EDC components for its core functionality, specifically the connector's facilities for extension loading, runtime bootstrap, configuration, and API handling, and adds catalog-specific functionality through the EDC extensibility mechanism. | ||
|
|
||
|
|
||
|
|
||
| ## Main components | ||
|
|
||
| The Federated Catalog architecture consists of two main subsystems: | ||
|
|
||
| 1. **Federated Catalog Cache (FCC)**: responsible for crawling participant connectors and maintaining the aggregated catalog cache. | ||
|
|
||
| 2. **Federated Catalog Node (FCN)**: hosts a participant’s catalog and responds to catalog requests issued by external crawlers. | ||
|
|
||
| The main features of both subsystems are summarized in the table below. | ||
|
|
||
|
|
||
| | Federated Catalog Node (FCN) | Federated Catalog Cache (FCC) | | ||
| |:-----------------------------|:-------------------------------| | ||
| | Serves a catalog to dataspace participants | Caches foreign catalog entries and their policies | | ||
| | Supports policy-based queries | Provides a query interface | | ||
| | May support multiple catalog protocols | Entries can be sent to Connector instances for retrieval | | ||
| | Pluggable storage system | May support multiple catalog protocols | | ||
| | | Pluggable storage (schemaless) | | ||
|
|
||
|
|
||
| ## Important classes | ||
|
Member
This paragraph could be removed, it's too technical and likely outdated
||
|
|
||
| - **Crawler**: a piece of software within the FC that periodically issues update requests to other TCNs. It receives a WorkItem and executes the DSP catalog request. | ||
|
|
||
| - **WorkItem**: a unit of work (= "crawl target") for the crawler | ||
|
|
||
| - **UpdateRequest**: a DSP catalog request from the Crawler to a TCN to get that TCN's catalog | ||
|
|
||
| - **UpdateResponse**: the response to an UpdateRequest, containing one Catalog | ||
|
|
||
| - **ExecutionPlan**: defines how the crawlers should run, e.g. periodically or in response to an event. By default, the FC runs on a periodic schedule. | ||
|
|
||
| - **ExecutionManager**: this is the central component that instantiates Crawlers and schedules/distributes the work among them. | ||
|
|
||
| - **QueryService**: a service that interprets and executes a catalog query against the cache | ||
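How the classes listed above might fit together can be sketched as follows. The class names mirror the list, but every constructor, method, and field here is a hypothetical simplification and not the actual EDC signature. The DSP catalog request is replaced by a plain callable, and work is distributed sequentially in round-robin order instead of concurrently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class WorkItem:
    url: str  # the crawl target (hypothetical field)


@dataclass(frozen=True)
class UpdateResponse:
    source: str          # where the catalog came from
    catalog: List[dict]  # exactly one catalog per response


class Crawler:
    def __init__(self, request_catalog: Callable[[str], List[dict]]):
        # Stand-in for the DSP catalog request; an assumption for this sketch.
        self._request_catalog = request_catalog

    def run(self, item: WorkItem) -> UpdateResponse:
        return UpdateResponse(item.url, self._request_catalog(item.url))


class ExecutionManager:
    def __init__(self, crawlers: List[Crawler], cache: Dict[str, List[dict]]):
        self._crawlers = crawlers
        self._cache = cache

    def execute(self, items: List[WorkItem]) -> None:
        # Distribute work items round-robin and store each response in the cache.
        for index, item in enumerate(items):
            crawler = self._crawlers[index % len(self._crawlers)]
            response = crawler.run(item)
            self._cache[response.source] = response.catalog


class QueryService:
    def __init__(self, cache: Dict[str, List[dict]]):
        self._cache = cache

    def find(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Queries run against the cache only, never against participants.
        return [e for catalog in self._cache.values() for e in catalog if predicate(e)]
```

A caller would build a cache dictionary, hand it to both an `ExecutionManager` and a `QueryService`, run `execute(...)` with a list of `WorkItem`s, and then query the cache through `find(...)`.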
|
|
||
| ## Deployment considerations | ||
|
|
||
| Crawlers are designed to be ephemeral, scalable, and low-maintenance. The ultimate goal is to crawl a dataspace as quickly and efficiently as possible while remaining robust. They are relatively dumb pieces of software: when there is work, they are instantiated, run off, and crawl. | ||
|
|
||
| The number of crawlers is one variable that can greatly influence the performance of the FC. When there are many participants that rarely update their asset catalogs, one might get away with spawning a few crawlers, and updates would trickle in at a relatively moderate pace. Many participants with frequent updates to their asset catalogs might warrant the additional compute cost of spinning up many crawlers. | ||
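The trade-off above can be made concrete with a back-of-the-envelope estimate. The function below is a rough model and not a measurement: it assumes every catalog request takes the same time, each crawler handles one target at a time, and there is no contention between crawlers. The name `estimated_crawl_seconds` is hypothetical.

```python
import math


def estimated_crawl_seconds(participants: int, crawlers: int, seconds_per_request: float) -> float:
    """Rough duration of one full crawl under the simplifying assumptions above."""
    if crawlers < 1:
        raise ValueError("at least one crawler is required")
    # Each "round" lets every crawler process one participant in parallel.
    rounds = math.ceil(participants / crawlers)
    return rounds * seconds_per_request
```

Under these assumptions, 1000 participants at 2 seconds per request take about 400 seconds with 5 crawlers and about 40 seconds with 50 crawlers. Whether the faster refresh is worth the extra compute depends on how often participants actually change their catalogs.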
|
|
||
| Another way to tune performance is to adapt the ExecutionPlan to the needs of the dataspace. Out of the box, the FC supports periodic execution, but in some dataspaces there might be additional triggers such as dataspace events or even manual triggers through a web API. | ||
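The difference between a periodic plan and a manually triggered one can be sketched as below. This is a hypothetical interface and not the EDC `ExecutionPlan` API: the classes `PeriodicPlan` and `ManualPlan`, the `max_runs` bound, and the injectable `sleep` function are assumptions that keep the example small and testable.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Optional


class ExecutionPlan(ABC):
    """Decides when a crawl task runs (hypothetical interface)."""

    @abstractmethod
    def run(self, task: Callable[[], None]) -> None:
        ...


class PeriodicPlan(ExecutionPlan):
    def __init__(self, period_seconds: float, max_runs: int,
                 sleep: Callable[[float], None] = time.sleep):
        self._period_seconds = period_seconds
        self._max_runs = max_runs  # bounded so the sketch terminates
        self._sleep = sleep

    def run(self, task: Callable[[], None]) -> None:
        for i in range(self._max_runs):
            task()
            if i < self._max_runs - 1:
                self._sleep(self._period_seconds)  # wait between consecutive runs


class ManualPlan(ExecutionPlan):
    def __init__(self) -> None:
        self._task: Optional[Callable[[], None]] = None

    def run(self, task: Callable[[], None]) -> None:
        self._task = task  # only registers the task; nothing is executed yet

    def trigger(self) -> None:
        # Would be called by an external trigger, e.g. a web API handler.
        if self._task is not None:
            self._task()
```

Keeping the "when" separate from the "what" means the same crawl task can be scheduled periodically in one dataspace and triggered on demand in another.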
|
|
||
|
|
||
spacing, maybe three empty lines are too much to separate paragraphs