Skip to content

🌱 Force Cache Refresh when Server was not found by name.#84

Merged
guettli merged 17 commits intomainfrom
tg/refresh-stale-cache
Mar 19, 2026
Merged

🌱 Force Cache Refresh when Server was not found by name.#84
guettli merged 17 commits intomainfrom
tg/refresh-stale-cache

Conversation

@guettli
Copy link
Copy Markdown

@guettli guettli commented Mar 16, 2026

Refresh the Robot server cache once when a bare-metal node lookup by name misses in the cached list.

This fixes the case where:

  1. the cache is filled,
  2. a new bare-metal server is created,
  3. InstanceExists() looks up the server by name,
  4. the cached list is stale and would otherwise return false.

Fix: Deleting node because it does not exist in the cloud provider

Without this fix new Nodes get deleted if the baremetal node is not in the cache yet.

... node_controller.go:244] "Unhandled Error" err="error syncing '...': failed to get instance metadata for node bm-...: hcloud/instancesv2.InstanceMetadata: failed to get instance metadata: no matching server found for node 'bm-...': server not found, requeuing" logger="UnhandledError"

... event.go:389] "Event occurred" object="bm-..." fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="DeletingNode" message="Deleting node bm-...because it does not exist in the cloud provider"

Changes

  • add ServerGetListForceRefresh() to the Robot client interface
  • implement forced cache reload in the cached Robot client
  • make getRobotServerByName() retry once with a forced refresh after a cache miss
  • add a regression test covering:
    • cache filled with bm-existing
    • bm-new created afterwards
    • InstanceExists(bm-new) must return true

Upstream HCloud CCM

Related PR in upstream ccm: fix robot name lookup after stale cache miss by guettli · Pull Request #1204 · hetznercloud/hcloud-cloud-controller-manager

@guettli guettli force-pushed the tg/refresh-stale-cache branch from 9a905a4 to a8f2d65 Compare March 16, 2026 15:34
@guettli guettli force-pushed the tg/refresh-stale-cache branch from 5679a23 to 02bd962 Compare March 16, 2026 15:47
@guettli guettli force-pushed the tg/refresh-stale-cache branch from 02bd962 to 1d9d7da Compare March 16, 2026 15:47
Comment thread hcloud/util.go Outdated
Comment thread hcloud/util.go Outdated
Comment thread hcloud/util.go Outdated
Comment thread hcloud/util.go Outdated
Comment thread internal/robot/client/cache/client.go Outdated
Comment thread internal/robot/client/cache/client.go
Comment thread internal/robot/client/cache/client.go Outdated
Comment thread internal/robot/client/cache/client.go Outdated
Comment thread hcloud/util.go Outdated
Comment thread hcloud/util.go Outdated
guettli added 4 commits March 17, 2026 09:42
	// Remember this node name, so that it does not trigger a cache refresh again.
	c.NodeTriggeredForcedRefresh(string(node.Name))
@guettli guettli requested a review from janiskemper March 17, 2026 09:04
Comment thread hcloud/instances_test.go Outdated
}
}

func TestInstances_InstanceExistsRobotServerRepeatedMissingNameSkipsForceRefresh(t *testing.T) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this test

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this comment:

	// If a node name is not in the cache, then only on the time the cache should be refreshed. A
	// second time (during the time of CACHE_TIMEOUT), the unknown node name should not trigger a
	// cache refresh again.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you just check twice the same thing. How does that fit to the description? I still don't understand it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janiskemper I updated the test and added coments: 4cfba46

return c.ServerGetList()
}

c.m = nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment that you do this in order to delete the cache and to call the actual API?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of 57538d0

Comment thread internal/robot/client/cache/client.go
Comment thread internal/robot/client/cache/client.go
Comment thread internal/robot/client/cache/client.go Outdated

// nodeTriggeredForcedRefresh records that nodeName already triggered a forced
// refresh.
func (c *cacheRobotClient) nodeTriggeredForcedRefresh(nodeName string) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you inline this function? I find it confusing that it is not. AFAIK it's only used in one place.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of 57538d0

@guettli guettli requested a review from janiskemper March 19, 2026 07:57
@guettli guettli merged commit 12329f5 into main Mar 19, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants