Improve plugin documentation (second batch) #987
Conversation
* develop: (55 commits)
  * Update @Schema annotations to not use deprecated attributes.
  * Make sure that Sources are closed, even if they fail.
  * Added Codec where it was missing.
  * Added WorkbenchConfig.dataIntegrationUrl analog to dataManagerUrl and dataPlatformUrl; renamed `dataplatformUrl` to `dataPlatformUrl`
  * Add localBaseUrl to WorkbenchConfig
  * Heading instead of emphasis.
  * Remove duplicated endpoint and replace usages in UI code
  * JSON dataset: `#arrayText` on non-existing properties now returns empty result instead of empty array string.
  * fix module import for webpack
  * refactor: update metadata type from IMetaData to IMetadata in DatasetClearButton
  * install yarn dependencies with frozen lockfile
  * Move clear dataset function into correct place
  * build frontend code first, only call tests if it succeeded
  * fetch explicitly submodules and re-order frontend/backend parts, group them together
  * move rootDir config to jest config file
  * try different approach to call command in workspace folder
  * use correct file name for config script
  * do not use TypeScript for jest config
  * add ts-node package
  * move arg to correct place
  * ...
Feedbacks

Alignment
* Goal is clear (serialize …), but how to use?
* Reference links are redundant with the introduction (exact same links in Overview)

RDF File Dataset
* 4.1 File size check: what is the value, to be specific?

In Memory Dataset
* Nice use cases, examples and explanations

Overall LGTM
That's configurable in the …
That was my writing style. At the beginning, those links are mentioned; at the end, they're just references. So yes, it's "redundant" in the usual "bibliographic" sense (those anchors are used in the text, not dangling links).
Could be useful to mention this parameter in the doc (as well as the default value).
https://jira.eccenca.com/browse/CMEM-7013
This PR adds documentation for the following Silk dataset plugins:
RdfFileDataset.md
RDF file reads RDF data from a local file (or ZIP archive) into the project as an in-memory dataset and, for supported formats, can also write RDF back to a file.
The doc starts with the intended usage window (small/medium files, snapshots for exploration/mapping/linking, simple export) and immediately flags the hard constraint: everything is loaded into memory, so very large files belong in an external store.

It then walks through the data shape and I/O story: single file vs. ZIP input (plus the regex gate for which ZIP entries are considered), dataset output as queryable graph(s), and the graph-selection rule (a named graph only where the chosen format supports it; otherwise the default graph, with the graph parameter ignored for graph-less formats). Configuration notes focus on how to think, not just what to fill in: file/ZIP behavior, format auto-detection (and the "can't detect → error" path), the write restriction (only N-Triples as output), advanced narrowing via an entity list, and ZIP entry filtering via regex.

Behavior is described as a sequence you can predict: size check → parse into an in-memory dataset (default + possibly named graphs) → select graph → serve repeated reads from memory until the underlying file's timestamp changes → reload on next access; the write path serializes as N-Triples only (see the sketch below). It ends with limitations plus "when to use" guidance and concrete examples (simple Turtle, N-Quads with an explicit graph, ZIP with multiple RDF files).
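To make that sequence concrete, here is a minimal Scala sketch of the read/write path built on Apache Jena. This is not Silk's implementation; the `RdfFileSource` class, the `maxSizeBytes` limit, and the timestamp cache are illustrative assumptions modeled on the behavior described above.

```scala
import java.io.{File, FileOutputStream}
import org.apache.jena.query.{Dataset, DatasetFactory}
import org.apache.jena.rdf.model.Model
import org.apache.jena.riot.{Lang, RDFDataMgr, RDFLanguages}

// Illustrative sketch only; class and parameter names are assumptions,
// not Silk's API. Built on Apache Jena to mirror the documented sequence.
class RdfFileSource(file: File, graphUri: Option[String],
                    maxSizeBytes: Long = 100L * 1024 * 1024) {  // assumed limit

  // Cache: (file timestamp, parsed dataset) so repeated reads come from memory.
  private var cached: Option[(Long, Dataset)] = None

  // Format auto-detection from the file name; "can't detect" is an error.
  private def detectLang(): Lang =
    Option(RDFLanguages.filenameToLang(file.getName))
      .getOrElse(sys.error(s"Cannot detect RDF format of ${file.getName}"))

  // Size check, then parse into an in-memory dataset; reload only when
  // the underlying file's timestamp has changed since the last read.
  private def dataset(): Dataset = {
    if (file.length() > maxSizeBytes)
      sys.error(s"${file.getName} exceeds the configured size limit")
    val stamp = file.lastModified()
    cached match {
      case Some((ts, ds)) if ts == stamp => ds
      case _ =>
        val ds = DatasetFactory.create()
        RDFDataMgr.read(ds, file.getAbsolutePath, detectLang())
        cached = Some((stamp, ds))
        ds
    }
  }

  // Graph selection: a named graph only if the format can carry quads;
  // for graph-less formats the graph parameter is ignored.
  def model(): Model = graphUri match {
    case Some(uri) if RDFLanguages.isQuads(detectLang()) => dataset().getNamedModel(uri)
    case _                                               => dataset().getDefaultModel
  }

  // Write path: output is restricted to N-Triples.
  def write(target: File): Unit = {
    val out = new FileOutputStream(target)
    try RDFDataMgr.write(out, model(), Lang.NTRIPLES)
    finally out.close()
  }
}
```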
InMemoryDataset.md
In-memory dataset is a small embedded RDF store that keeps all data in memory and exposes it via SPARQL as a temporary working graph inside workflows.
The doc frames it as a deliberately non-persistent scratch graph: one in-memory RDF model, all reads and writes mediated through a SPARQL endpoint, and an empty state after application restart. Within workflows it's explicitly bidirectional (usable as both source and sink), so upstream components can write entities/links/triples into it and downstream components can query it like a normal SPARQL dataset (entity retrieval, path/type discovery, sampling, etc.), with no file backing at all.

Writing is explained by sink type but unified in effect: the entity sink converts entities to triples, the link sink writes link triples, and the triple sink adds triples directly; all converge into the same single in-memory graph. The one configuration knob ("Clear graph before workflow execution", default true) is treated as the semantic switch: either a fresh, empty graph per run, or a longer-lived in-memory graph shared across runs within the same process (see the sketch below).

Limitations are stated as operational consequences (memory-bound, no persistence, best for small/medium intermediates and prototyping), and the examples reinforce the intended patterns: temporary integration graph, scratch experimentation area, small lookup store.
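A minimal Scala sketch of those semantics, again using Apache Jena as a stand-in for the embedded store (the `InMemoryDatasetSketch` class and its method names are hypothetical, not Silk's API):

```scala
import org.apache.jena.query.QueryExecutionFactory
import org.apache.jena.rdf.model.{Model, ModelFactory}

// Hypothetical sketch, not Silk's implementation: one in-memory graph,
// written to by all sinks and read back via SPARQL, gone on process exit.
class InMemoryDatasetSketch(clearBeforeExecution: Boolean = true) {

  // The single in-memory RDF model; there is no file backing at all.
  private val graph: Model = ModelFactory.createDefaultModel()

  // The one knob: fresh empty graph per run, or keep data across runs
  // within the same process.
  def beginWorkflow(): Unit =
    if (clearBeforeExecution) graph.removeAll()

  // Triple sink: adds triples directly. Entity and link sinks differ only
  // in how they derive triples; all converge into this same graph.
  def addTriple(s: String, p: String, o: String): Unit =
    graph.add(graph.createResource(s), graph.createProperty(p), graph.createResource(o))

  // Source side: serve SPARQL SELECT queries against the working graph.
  def select(sparql: String): List[String] = {
    val exec = QueryExecutionFactory.create(sparql, graph)
    try {
      val results = exec.execSelect()
      val varName = results.getResultVars.get(0)
      val buf = scala.collection.mutable.ListBuffer[String]()
      while (results.hasNext) buf += results.next().get(varName).toString
      buf.toList
    } finally exec.close()
  }
}

// Usage: a temporary integration graph for one workflow run.
object InMemoryDatasetExample extends App {
  val ds = new InMemoryDatasetSketch(clearBeforeExecution = true)
  ds.beginWorkflow()                                        // empty working graph
  ds.addTriple("urn:ex:a", "urn:ex:linkedTo", "urn:ex:b")   // sink side
  println(ds.select("SELECT ?s WHERE { ?s ?p ?o }"))        // source side: List(urn:ex:a)
}
```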
AlignmentDataset.md
Alignment is a write-only dataset that exports link results as Alignment files following the AlignAPI format specification (and the SWJ60 description).
The doc keeps scope tight from the start: it exists to serialize links between entities in a standardized alignment format, not to read entities, run transformations, or do extra processing. It motivates the shape via separation of concerns and interoperability: a focused exporter that produces files consumable by alignment-aware tooling and usable in subsequent workflows. The core mechanics are explained at the link-record level: each link becomes one `<Cell>` with an explicit source URI, target URI, an optional relation (e.g., `=`), and an optional confidence measure (0.0–1.0), and the plugin is responsible for emitting a well-formed file (structure, header/footer, UTF-8). A minimal example anchors how multiple links map to multiple `<Cell>` entries (compare the sketch below), and the references section points to the AlignAPI format spec and the SWJ60 paper for full semantics and edge-case details.
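For illustration, a minimal Scala sketch of such an exporter, hand-rolling the AlignAPI XML. The `AlignmentLink` case class and `writeAlignment` helper are hypothetical, not Silk's API; the element names (`<Alignment>`, `<map>`, `<Cell>`, `<entity1>`, `<entity2>`, `<relation>`, `<measure>`) follow the AlignAPI format.

```scala
import java.io.{File, PrintWriter}
import java.nio.charset.StandardCharsets

object AlignmentExportSketch {

  // Hypothetical link record: source URI, target URI, relation, confidence.
  case class AlignmentLink(source: String, target: String,
                           relation: String = "=", measure: Double = 1.0)

  // Illustrative sketch, not Silk's implementation: serialize links into a
  // single well-formed, UTF-8 Alignment file, one <Cell> per link.
  def writeAlignment(links: Seq[AlignmentLink], target: File): Unit = {
    val out = new PrintWriter(target, StandardCharsets.UTF_8.name())
    try {
      // Header: fixed document structure of the AlignAPI format.
      out.println("""<?xml version="1.0" encoding="utf-8"?>""")
      out.println(
        """<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
          |         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">""".stripMargin)
      out.println("  <Alignment>")
      // One <Cell> per link: entity pair, relation, and confidence measure.
      for (link <- links) {
        out.println("    <map><Cell>")
        out.println(s"""      <entity1 rdf:resource="${link.source}"/>""")
        out.println(s"""      <entity2 rdf:resource="${link.target}"/>""")
        out.println(s"      <relation>${link.relation}</relation>")
        out.println(s"""      <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">${link.measure}</measure>""")
        out.println("    </Cell></map>")
      }
      // Footer closes the document so the output is always well-formed.
      out.println("  </Alignment>")
      out.println("</rdf:RDF>")
    } finally out.close()
  }

  // Usage: two links map to two <Cell> entries.
  def main(args: Array[String]): Unit =
    writeAlignment(Seq(
      AlignmentLink("http://example.org/a1", "http://example.org/b1", measure = 0.95),
      AlignmentLink("http://example.org/a2", "http://example.org/b2")
    ), new File("links.align.rdf"))
}
```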