
CAMEL-23267: Use a LinkedHashMap as inProgressRepository for file com…#22311

Open
thomas-gantenbein-tga wants to merge 1 commit into apache:main from thomas-gantenbein-tga:feature/CAMEL-23267_replace-lrucache-filecomponent

Conversation

@thomas-gantenbein-tga
Contributor

Description

To reduce CPU load when many files are processed, let's use a LinkedHashMap instead of SimpleLRUCache as the default inProgressRepository for the file component. See https://issues.apache.org/jira/browse/CAMEL-23267 and the #camel thread "file consumer does not release items in SimpleLRUCache".

Contributor

@gnodet gnodet left a comment


Claude Code on behalf of Guillaume Nodet

Overview

This PR addresses a real performance bug: SimpleLRUCache.remove() doesn't clean up its internal lastChanges deque, so after ~100K add/remove cycles the deque fills with ghost entries and compressChangesIfNeeded() iterates the entire deque on every put(), causing high CPU load. The fix replaces the default inProgressRepository backing store for the file component with a LinkedHashMap using insertion-order eviction.
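The eviction behavior the PR relies on can be sketched standalone (this is an illustrative JDK-only demo, not the actual Camel code; the file names and the small bound of 3 are made up for the example). A `LinkedHashMap` with `removeEldestEntry` overridden drops the oldest insertion once the bound is exceeded, and `remove()` leaves no residual state behind:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InsertionOrderEvictionDemo {
    public static void main(String[] args) {
        final int cacheSize = 3; // illustrative; the file component's real default is larger

        // Insertion-order eviction: once size() exceeds the bound,
        // the eldest entry is dropped on the next put().
        Map<String, Object> inProgress = new LinkedHashMap<String, Object>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > cacheSize;
            }
        };

        // Typical in-progress lifecycle: add, process, remove.
        inProgress.put("a.txt", Boolean.TRUE);
        inProgress.put("b.txt", Boolean.TRUE);
        inProgress.remove("a.txt"); // unlike SimpleLRUCache, no ghost state survives this
        inProgress.put("c.txt", Boolean.TRUE);
        inProgress.put("d.txt", Boolean.TRUE);
        inProgress.put("e.txt", Boolean.TRUE); // size would be 4 > 3, so "b.txt" is evicted

        System.out.println(inProgress.keySet()); // prints [c.txt, d.txt, e.txt]
    }
}
```

This also shows why insertion order is a reasonable semantic fit here: in-progress entries are added and removed in roughly FIFO fashion, so access-order LRU bookkeeping buys nothing.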

Verdict: Request changes

Issues to address

  1. Test assertion gap — see inline comment.

  2. getMaxCacheSize() returns 0 — The new factory method doesn't call setCacheSize(), so getMaxCacheSize() (a JMX-exposed @ManagedAttribute) will return 0 for repositories created with this method. See inline comment.

Questions / discussion points

  1. Root cause not fixed — The actual bug is in SimpleLRUCache.remove() not cleaning up lastChanges. This PR works around it for one caller. Should we fix SimpleLRUCache directly instead? That would fix all callers without needing a new factory method. Other components (AWS2S3Endpoint, IBMCOSEndpoint) use the same MemoryIdempotentRepository.memoryIdempotentRepository() pattern for their in-progress repositories and have the same issue.

  2. Thread safety note (pre-existing, non-blocking) — getCacheSize() calls cache.size() without acquiring cacheAndStoreLock, and doStop() calls cache.clear() without the lock. With SimpleLRUCache (backed by ConcurrentHashMap) this was benign, but with a plain LinkedHashMap these become data races. Not introduced by this PR, but worth being aware of.
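One possible hardening for that pre-existing race, sketched here purely as an illustration and not part of this PR, is to hand the repository a map wrapped in `Collections.synchronizedMap`, so that readers outside the repository's own lock (such as a JMX `size()` call) cannot race writers on the plain `LinkedHashMap`:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class GuardedEvictionMapSketch {
    public static void main(String[] args) {
        final int cacheSize = 2; // small bound for illustration only

        // Wrap the eviction map; every access now synchronizes on the
        // wrapper's mutex, including size() and clear().
        Map<String, Object> map = Collections.synchronizedMap(
                new LinkedHashMap<String, Object>() {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                        // called inside put(), i.e. while the wrapper's lock is held
                        return size() > cacheSize;
                    }
                });

        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3); // exceeds the bound, evicting the eldest entry "a"

        System.out.println(map.containsKey("a") + " " + map.size()); // prints false 2
    }
}
```

The trade-off is a per-operation lock acquisition even on the paths already guarded by `cacheAndStoreLock`, so whether this is worth it depends on how hot those JMX reads actually are.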

```java
}
for (int i = 0; i < cacheSize - entriesNotFittingInRepository; i++) {
    assertFalse(repository.contains(String.valueOf(i)), "Repository should not contain entry " + i);
}
```
Contributor


The loop condition i < cacheSize - entriesNotFittingInRepository equals i < 1, so this only checks that entry "0" was evicted. It should verify all 4 evicted entries (0, 1, 2, 3) are absent.

Suggested change

```java
}
for (int i = 0; i < entriesNotFittingInRepository; i++) {
```

```java
protected boolean removeEldestEntry(Entry<String, Object> eldest) {
    return size() > cacheSize;
}
};
```
Contributor


This factory method passes the map to the MemoryIdempotentRepository(Map) constructor, which does not set the cacheSize field. As a result, getMaxCacheSize() (a JMX-exposed @ManagedAttribute) will return 0.

Consider setting it:

```java
public static IdempotentRepository memoryIdempotentRepositoryInsertionOrder(int cacheSize) {
    LinkedHashMap<String, Object> map = new LinkedHashMap<>() {
        @Override
        protected boolean removeEldestEntry(Entry<String, Object> eldest) {
            return size() > cacheSize;
        }
    };
    MemoryIdempotentRepository answer = new MemoryIdempotentRepository(map);
    answer.setCacheSize(cacheSize);
    return answer;
}
```

@gnodet
Contributor

gnodet commented Mar 28, 2026

Claude Code on behalf of Guillaume Nodet

Follow-up: What about fixing SimpleLRUCache directly?

The root cause

SimpleLRUCache.remove(Object key) (line 183) does delegate.remove(key) but doesn't decrement totalChanges and doesn't remove the key's entry from lastChanges. So the deque accumulates "ghost" entries — keys that were added and then removed from the delegate but still sit in the deque.

When totalChanges exceeds 2 * maximumCacheSize (100,000 for the file component), compressChangesIfNeeded() fires on every subsequent put(). It iterates the entire deque, deduplicates by key, but ghost entries for removed keys still survive deduplication (they're unique keys). The deque never shrinks below the number of distinct ghost keys.
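The ghost-entry growth can be reproduced with a standalone simulation using plain JDK collections (this is a model of the pattern, not Camel's actual SimpleLRUCache): a change log that records every put but is never cleaned on remove retains one entry per distinct key, even after the backing map is empty, so key-based deduplication can never shrink it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class GhostEntryDemo {
    public static void main(String[] args) {
        Map<String, Integer> delegate = new HashMap<>();
        Deque<String> lastChanges = new ArrayDeque<>(); // stand-in for the change deque

        for (int i = 0; i < 1000; i++) {
            String key = "file-" + i;
            delegate.put(key, i);
            lastChanges.add(key);  // the put is recorded in the change log
            delegate.remove(key);  // ...but the remove is not, leaving a ghost entry
        }

        // The map is empty, yet the change log holds every distinct key ever seen.
        System.out.println(delegate.size() + " " + lastChanges.size()); // prints 0 1000
    }
}
```

Scaled up to the file component's default, this is exactly the state where `compressChangesIfNeeded()` walks the whole deque on every put without ever reclaiming space.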

What a fix would look like

  1. Decrement totalChanges in remove() to prevent unbounded growth:
```java
@Override
public V remove(Object key) {
    ValueHolder<V> removed = delegate.remove(key);
    if (removed != null) {
        totalChanges.decrementAndGet();
    }
    return extractValue(removed);
}
```
  2. Filter out ghost entries in compressChangesIfNeeded():
```java
private void compressChangesIfNeeded() {
    if (isQueueFull()) {
        Deque<Entry<K, ValueHolder<V>>> newChanges = new ConcurrentLinkedDeque<>();
        Deque<Entry<K, ValueHolder<V>>> currentChanges = lastChanges.getAndSet(newChanges);
        Set<K> keys = new HashSet<>();
        Entry<K, ValueHolder<V>> entry;
        while ((entry = currentChanges.pollLast()) != null) {
            // skip ghost entries for keys no longer in the cache
            if (keys.add(entry.getKey()) && delegate.containsKey(entry.getKey())) {
                newChanges.addFirst(entry);
            }
        }
        totalChanges.set(newChanges.size());
    }
}
```

Benefits vs. drawbacks

| | Fix SimpleLRUCache | This PR's LinkedHashMap approach |
| --- | --- | --- |
| Scope | Fixes all callers (file, AWS S3, IBM COS, any future use) | Only fixes the file component |
| Complexity | Small, targeted change in existing code | Adds a new factory method + new code path |
| LRU semantics | Preserves LRU eviction (access-order) | Switches to insertion-order eviction (FIFO) |
| Thread safety | Already thread-safe (ConcurrentHashMap + locks) | Relies on MemoryIdempotentRepository's lock; getCacheSize()/doStop() are unguarded |
| Risk | Higher — SimpleLRUCache is used broadly, subtle concurrency bugs possible | Lower — isolated change, only affects file component default |
| Performance | Slightly heavier (deque + ConcurrentHashMap overhead) | Lighter (plain LinkedHashMap, no deque) |

Recommendation

Ideally do both: fix SimpleLRUCache because the ghost entry bug is a real defect that will bite other callers (AWS S3 and IBM COS endpoints have the same in-progress repository pattern). But also keep the LinkedHashMap approach for the in-progress repository — it's simpler, lighter, and a better semantic fit since in-progress files don't need LRU reordering (they're added, processed, then removed). The SimpleLRUCache fix could be tracked as a separate JIRA ticket.
