Rewrite VoID queries to avoid QLever memory exhaustion on large datasets #302

@ddeboer

## Problem

On datasets with hundreds of millions of triples (e.g. `dataset_bhi` with 686M triples), several VoID aggregation queries fail with QLever memory allocation errors, even with `--memory-max-size 12G`.

QLever stores its index on disk (the OS caches it into RAM as available). The `--memory-max-size` parameter is a budget for query processing and caching only. On `dataset_bhi`, available memory within this budget shrank as stages progressed: 3.4 GB remained for `subjects.rq` and `object-literals.rq`, dropping to just 787 MB by `entity-properties.rq`. The cause of this shrinkage isn't confirmed — it could be cached query results from earlier stages, memory not fully released after intermediate queries, or fragmentation.

The root cause of the failures themselves is that `FILTER` expressions (`ISBLANK`, `ISLITERAL`) force QLever to materialize the full index scan row by row, then deduplicate in memory. Without a filter, QLever can answer `COUNT(DISTINCT ...)` directly from its sorted permutation indexes with minimal memory.
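The contrast can be sketched as follows (query shapes paraphrased from the failing and succeeding patterns above, not the exact contents of the `.rq` files):

```sparql
# Fails on large datasets: the filter forces a row-by-row scan,
# then an in-memory deduplication of every distinct subject.
SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE {
  ?s ?p ?o .
  FILTER(!ISBLANK(?s))
}

# Succeeds: without the filter, QLever can read the distinct count
# straight off its sorted SPO permutation, using minimal memory.
SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE {
  ?s ?p ?o .
}
```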

See #284 for the full error analysis.

## Failing queries

| Query | Pattern | Memory needed | Available |
| --- | --- | --- | --- |
| `subjects.rq` | `COUNT(DISTINCT ?s)` with `FILTER(!ISBLANK(?s))` | 4.3 GB | 3.4 GB |
| `object-literals.rq` | `COUNT(DISTINCT ?o)` with `FILTER(ISLITERAL(?o))` | 4.3 GB | 3.4 GB |
| `entity-properties.rq` | `COUNT(DISTINCT ?s)`, `COUNT(DISTINCT ?o)` with `GROUP BY ?p` | 5.5 GB | 787 MB |

## Queries that succeed on the same dataset

| Query | Pattern | Why it works |
| --- | --- | --- |
| `properties.rq` | `COUNT(DISTINCT ?p)`, no filter | Few distinct predicates |
| `triples.rq` | `COUNT(*)`, no filter, no distinct | Just a count |
| `object-uris.rq` | `COUNT(DISTINCT ?o)` with `FILTER(ISIRI(?o))` | Fewer distinct IRI objects fit in memory |

## Proposed fixes

### 1. Investigate and tune QLever memory settings

The `@lde/sparql-qlever` `Server` class currently only exposes `--memory-max-size`. QLever also supports `--cache-max-size` (e.g. Wikidata uses `--cache-max-size 15G` alongside `--memory-max-size 20G`). Tuning the cache cap might reserve more of the budget for query execution — but first we need to understand what's consuming the available memory as stages progress.

### 2. Rewrite `subjects.rq`: drop the `ISBLANK` filter

The VoID spec defines `void:distinctSubjects` as the number of distinct subjects — it doesn't require excluding blank nodes. Removing `FILTER(!ISBLANK(?s))` lets QLever answer directly from its SPO permutation index without materialization.
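A sketch of the rewrite (the actual `subjects.rq` may differ in prefixes and variable names):

```sparql
# Current: the filter forces materialization of all non-blank subjects.
SELECT (COUNT(DISTINCT ?s) AS ?distinctSubjects) WHERE {
  ?s ?p ?o .
  FILTER(!ISBLANK(?s))
}

# Proposed: answered directly from the SPO permutation index.
SELECT (COUNT(DISTINCT ?s) AS ?distinctSubjects) WHERE {
  ?s ?p ?o .
}
```

Note that the rewritten count then includes blank-node subjects, which the VoID spec permits.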

### 3. Rewrite `object-literals.rq`: compute by subtraction

Instead of `COUNT(DISTINCT ?o)` with `FILTER(ISLITERAL(?o))`, compute:

- the total `COUNT(DISTINCT ?o)` (no filter — answered from the OPS index, minimal memory)
- minus `COUNT(DISTINCT ?o)` with `FILTER(ISIRI(?o))` (already succeeds on large datasets)
- minus blank-node objects, if needed

This avoids the `ISLITERAL` filter that forces materialization of all distinct literals.
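The subtraction approach could look like this (three separate queries; variable names are illustrative):

```sparql
# Query 1 — total distinct objects, no filter (cheap via permutation index):
SELECT (COUNT(DISTINCT ?o) AS ?totalObjects) WHERE {
  ?s ?p ?o .
}

# Query 2 — distinct IRI objects (this pattern already succeeds on large datasets):
SELECT (COUNT(DISTINCT ?o) AS ?iriObjects) WHERE {
  ?s ?p ?o .
  FILTER(ISIRI(?o))
}

# Query 3 (optional) — distinct blank-node objects, if they must be excluded:
SELECT (COUNT(DISTINCT ?o) AS ?bnodeObjects) WHERE {
  ?s ?p ?o .
  FILTER(ISBLANK(?o))
}
```

The literal count is then computed client-side as `?totalObjects - ?iriObjects - ?bnodeObjects`. The `ISBLANK` filter in query 3 still triggers materialization, but the assumption is that the set of distinct blank-node objects is far smaller than the set of distinct literals.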

### 4. Rewrite `entity-properties.rq`: split into two queries

The dual `COUNT(DISTINCT ?s)`, `COUNT(DISTINCT ?o)` with `GROUP BY ?p` requires deduplicating two columns simultaneously. Splitting it into separate queries — one for `COUNT(DISTINCT ?s) GROUP BY ?p` and one for `COUNT(DISTINCT ?o) GROUP BY ?p` — roughly halves peak memory per query.
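The split could be sketched as (variable names illustrative):

```sparql
# Query 1 — distinct subjects per predicate:
SELECT ?p (COUNT(DISTINCT ?s) AS ?distinctSubjects) WHERE {
  ?s ?p ?o .
}
GROUP BY ?p

# Query 2 — distinct objects per predicate:
SELECT ?p (COUNT(DISTINCT ?o) AS ?distinctObjects) WHERE {
  ?s ?p ?o .
}
GROUP BY ?p
```

The two result sets can then be joined on `?p` client-side when emitting the per-property VoID partitions.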

### 5. Reorder stages

Run the most memory-intensive queries first, before other stages consume available memory.
