fix(security): harden hbbs/hbbr against unauthenticated abuse and state exhaustion #642
ssh4net wants to merge 3 commits into rustdesk:master
Conversation
Update Cargo.lock with multiple dependency bumps and additions, update the hbb_common submodule, and adjust source code across src/common.rs, src/database.rs, src/peer.rs, src/relay_server.rs and src/rendezvous_server.rs to match the updates. Commit also updates the local SQLite DB (db_v2.sqlite3) and adds a new integration test (tests/server_protection_process.rs) to exercise server protection behavior. These changes combine dependency maintenance with corresponding code and test updates to ensure compatibility and improved coverage. Signed-off-by: Vlad <shaamaan@gmail.com>
No actionable comments were generated in the recent review.
📝 Walkthrough

Adds per-IP connection and UDP rate limiting, trusted-proxy header handling, peer/relay capacity controls with eviction/pruning, license/key validation across rendezvous and relay flows, database peer retention controls, a protection stats API, and extensive unit and integration tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant ConnHandler as Connection<br/>Handler
    participant RateLimiter as Rate Limiter<br/>(common.rs)
    participant Cache as Peer Cache
    participant DB as Database
    Client->>ConnHandler: Incoming TCP/UDP request
    ConnHandler->>RateLimiter: allow_connection_from_ip(addr)<br/>or allow_udp_packet_from_ip(addr)
    alt Rate limit exceeded
        RateLimiter-->>ConnHandler: false (blocked)
        ConnHandler-->>Client: drop/refuse
    else Rate limit OK
        ConnHandler->>Cache: get_or_for_registration(id, ip)
        alt Pending registrations exceed per-IP limit
            Cache-->>ConnHandler: None (TOO_FREQUENT)
            ConnHandler-->>Client: error response
        else Within limits
            Cache->>Cache: prune_cache_for_insert()
            Cache->>DB: insert_peer(id, ...)
            alt Peer count at cap and retention enabled
                DB->>DB: prune_old_peer_records()
                DB->>DB: recheck peer count
            end
            alt Still at peer capacity
                DB-->>Cache: PeerLimitReached
                ConnHandler-->>Client: PEER_LIMIT_REACHED
            else Capacity available
                DB-->>Cache: Inserted(guid)
                ConnHandler-->>Client: success response
            end
        end
    end
```
```mermaid
sequenceDiagram
    actor Client
    participant ReqHandler as Relay<br/>RequestHandler
    participant RateLimiter as Rate Limiter<br/>(common.rs)
    participant RelayQueue as Pending Relay Queue
    participant DB as Database
    Client->>ReqHandler: relay_request(stream, key, ip)
    ReqHandler->>RateLimiter: validate license/server key
    alt License key mismatch
        ReqHandler->>ReqHandler: send_relay_refuse("Key mismatch")
        ReqHandler-->>Client: refusal
    else License key valid
        ReqHandler->>RelayQueue: prune_expired_pending_relays()
        RelayQueue->>RelayQueue: compute pending_count(global)<br/>and pending_count(per_ip)
        alt Global or per-IP cap exceeded
            RelayQueue-->>ReqHandler: limit reason
            ReqHandler->>ReqHandler: send_relay_refuse(reason)
            ReqHandler-->>Client: refusal
        else Capacity available
            RelayQueue->>RelayQueue: insert PendingRelay{stream, ip, created_at}
            ReqHandler->>DB: pair with waiting peer stream
            ReqHandler-->>Client: relay forwarding
        end
    end
```
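The pending-relay bookkeeping shown in the diagram — prune expired half-open entries, then enforce a global and a per-IP cap before admitting a new one — can be sketched as below. The type and function names, cap values, and TTL are illustrative, not the actual `relay_server.rs` API (the real limits are configurable, and the real entry also holds the accepted stream):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

// Illustrative caps; the real values are configurable.
const MAX_PENDING_RELAYS: usize = 4;
const MAX_PENDING_RELAYS_PER_IP: usize = 2;
const PENDING_RELAY_TTL: Duration = Duration::from_secs(30);

struct PendingRelay {
    ip: IpAddr,
    created_at: Instant,
    // the real server also keeps the accepted stream here
}

#[derive(Default)]
struct PendingRelayQueue {
    entries: HashMap<String, PendingRelay>, // keyed by relay uuid
}

impl PendingRelayQueue {
    /// Drop half-open entries older than the TTL.
    fn prune_expired(&mut self, now: Instant) {
        self.entries
            .retain(|_, e| now.duration_since(e.created_at) < PENDING_RELAY_TTL);
    }

    /// Admit a new pending relay only if both the global and per-IP caps allow it.
    fn try_insert(&mut self, uuid: String, ip: IpAddr, now: Instant) -> Result<(), &'static str> {
        self.prune_expired(now);
        if self.entries.len() >= MAX_PENDING_RELAYS {
            return Err("global pending relay limit");
        }
        let per_ip = self.entries.values().filter(|e| e.ip == ip).count();
        if per_ip >= MAX_PENDING_RELAYS_PER_IP {
            return Err("per-ip pending relay limit");
        }
        self.entries.insert(uuid, PendingRelay { ip, created_at: now });
        Ok(())
    }
}
```

Because pruning runs before every admission check, an attacker who abandons handshakes can hold at most the capped number of slots, and only for the TTL window.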
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
⚠️ Outside diff range comments (1)
src/database.rs (1)
Lines 153-178: ⚠️ Potential issue | 🟠 Major — Make the peer-cap check atomic with the insert.

The flow is `count -> maybe prune -> count -> insert` across separate statements, and `peer_count()` grabs a fresh pooled connection each time. Concurrent registrations, or another server process writing the same SQLite file, can pass both checks and still insert past `MAX_TOTAL_PEER_RECORDS`, so the new cap is bypassable under load.

Suggested direction:
```diff
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
-            if peer_retention_prune_enabled(self.peer_record_retention_days) {
-                let deleted = self.prune_old_peer_records().await?;
-                if deleted > 0 {
-                    crate::common::record_protection_event("peer_records_pruned");
-                    log::info!("pruned {} old peer records before inserting {}", deleted, id);
-                }
-            }
-        }
-        if peer_limit_reached(self.peer_count().await?, self.max_total_peers) {
+        let mut conn = self.pool.get().await?;
+        sqlx::query("BEGIN IMMEDIATE")
+            .execute(conn.deref_mut())
+            .await?;
+
+        let total = sqlx::query!("select count(*) as count from peer")
+            .fetch_one(conn.deref_mut())
+            .await?
+            .count as i64;
+
+        if peer_limit_reached(total, self.max_total_peers) {
+            if peer_retention_prune_enabled(self.peer_record_retention_days) {
+                let cutoff = peer_retention_cutoff_arg(self.peer_record_retention_days);
+                let deleted = sqlx::query("delete from peer where created_at < datetime('now', ?)")
+                    .bind(cutoff)
+                    .execute(conn.deref_mut())
+                    .await?
+                    .rows_affected();
+                if deleted > 0 {
+                    crate::common::record_protection_event("peer_records_pruned");
+                    log::info!("pruned {} old peer records before inserting {}", deleted, id);
+                }
+            }
+        }
+
+        let total = sqlx::query!("select count(*) as count from peer")
+            .fetch_one(conn.deref_mut())
+            .await?
+            .count as i64;
+        if peer_limit_reached(total, self.max_total_peers) {
             crate::common::record_protection_event("peer_limit_reached");
             log::warn!("peer record limit reached, rejecting new peer {}", id);
+            sqlx::query("ROLLBACK").execute(conn.deref_mut()).await?;
             return Ok(InsertPeerResult::PeerLimitReached);
         }
         let guid = uuid::Uuid::new_v4().as_bytes().to_vec();
         sqlx::query!(
             "insert into peer(guid, id, uuid, pk, info) values(?, ?, ?, ?, ?)",
@@
-        .execute(self.pool.get().await?.deref_mut())
+        .execute(conn.deref_mut())
         .await?;
+        sqlx::query("COMMIT").execute(conn.deref_mut()).await?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database.rs` around lines 153 - 178, Wrap the count->prune->count->insert sequence in a single DB transaction using the same pooled connection so the peer-cap check is atomic: acquire a connection once (let conn = self.pool.get().await?), begin a transaction with an immediate/serializable lock (use sqlx transaction on that conn), then call peer_count (or replace it with an inline SELECT COUNT(*)) and prune_old_peer_records using that same conn (modify prune_old_peer_records to accept &mut conn or run its SQL inline inside the transaction), re-check peer_limit_reached against self.max_total_peers, and only then perform the insert into peer and commit the transaction; if limit reached, rollback and return InsertPeerResult::PeerLimitReached. Ensure all queries use the transaction/conn rather than grabbing fresh connections.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/common.rs`:
- Around line 188-205: The map currently allocates an entry for every new source
IP (via CONNECTION_RATE_LIMITS and the key variable), which allows memory
exhaustion; before inserting a new ConnectionRateEntry, enforce a global cap
(e.g., MAX_CONNECTION_RATE_ENTRIES) by checking lock.len() and pruning stale
entries (reuse prune_connection_rate_limits) and if still at capacity evict the
oldest entry (compare last_seen_at) or simply treat the unseen IP as
rate-limited and return false instead of inserting; update the logic around
lock.entry(key).or_insert(...) to only create the entry when under the cap and
to perform eviction via last_seen_at on CONNECTION_RATE_LIMITS when needed.
- Around line 272-281: apply_trusted_proxy_addr currently rewrites trusted-proxy
peer addresses to SocketAddr::new(ip, 0), collapsing all clients behind the same
public IP into one synthetic address and causing collisions where tcp_punch
(keyed by SocketAddr in src/rendezvous_server.rs) overwrites concurrent
connections. Change apply_trusted_proxy_addr so that when a forwarded_ip is
present it preserves the original connection's port (use addr.port()) instead of
0 (i.e., construct the new SocketAddr from the parsed ip and the original addr's
port), keeping the per-connection discriminator while still using
parse_forwarded_ip and trust_proxy_headers().
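The bounded-map behavior the first comment asks for — only create a bucket while under a global cap, and evict the entry with the oldest `last_seen_at` when full — can be sketched as follows. The function name, field names, and the tiny cap are illustrative, not the actual `common.rs` API:

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const MAX_CONNECTION_RATE_ENTRIES: usize = 3; // tiny value for illustration

struct ConnectionRateEntry {
    last_seen_at: Instant,
    recent_connections: u32,
}

/// Record a connection attempt for `ip`. An existing bucket is updated in place;
/// a new bucket is created only after evicting the least recently seen IP when
/// the map is at capacity, so the map can never grow without bound.
fn touch_rate_entry(
    map: &mut HashMap<IpAddr, ConnectionRateEntry>,
    ip: IpAddr,
    now: Instant,
) -> bool {
    if let Some(entry) = map.get_mut(&ip) {
        entry.last_seen_at = now;
        entry.recent_connections += 1;
        return true;
    }
    if map.len() >= MAX_CONNECTION_RATE_ENTRIES {
        // Evict the entry with the oldest last_seen_at instead of growing.
        if let Some(oldest) = map
            .iter()
            .min_by_key(|(_, e)| e.last_seen_at)
            .map(|(ip, _)| *ip)
        {
            map.remove(&oldest);
        }
    }
    map.insert(ip, ConnectionRateEntry { last_seen_at: now, recent_connections: 1 });
    true
}
```

An alternative mentioned in the comment is to treat an unseen IP as rate-limited (return `false`) once the cap is hit; eviction keeps legitimate new clients working at the cost of slightly weaker tracking of idle IPs.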
In `@src/peer.rs`:
- Around line 214-231: The current flow in get_or_for_registration lets multiple
concurrent callers bypass MAX_PENDING_REGISTRATIONS_PER_IP because
can_cache_pending_registration() inspects state before prune_cache_for_insert()
and before acquiring the write lock; fix by performing the pending-registration
count and pruning while holding the same write lock used to insert the
placeholder: acquire the write lock (the same map.write() used later), call
prune_cache_for_insert() or inline its logic under that lock, recompute the
per-IP pending count, and only insert the new Arc<RwLock<Peer>> if the count is
below MAX_PENDING_REGISTRATIONS_PER_IP; if the cap is reached return None.
Ensure you still short-circuit and return an existing peer when w.get(id) is
present after acquiring the write lock.
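The fix described for `get_or_for_registration` — count pending registrations and insert the placeholder under the same write lock, so concurrent callers cannot all pass the check — can be sketched with `std::sync` primitives (the real code uses async locks; the struct, cap value, and function name here are illustrative):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, RwLock};

const MAX_PENDING_REGISTRATIONS_PER_IP: usize = 2; // illustrative value

struct Peer {
    ip: IpAddr,
    registered: bool,
}

type PeerMap = RwLock<HashMap<String, Arc<RwLock<Peer>>>>;

/// Check and insert under one write lock: no other caller can sneak an insert
/// in between the per-IP count and our own insert.
fn get_or_register(map: &PeerMap, id: &str, ip: IpAddr) -> Option<Arc<RwLock<Peer>>> {
    let mut w = map.write().unwrap();
    // Short-circuit on an existing peer, still under the write lock.
    if let Some(existing) = w.get(id) {
        return Some(existing.clone());
    }
    let pending = w
        .values()
        .filter(|p| {
            let p = p.read().unwrap();
            p.ip == ip && !p.registered
        })
        .count();
    if pending >= MAX_PENDING_REGISTRATIONS_PER_IP {
        return None; // caller answers TOO_FREQUENT
    }
    let peer = Arc::new(RwLock::new(Peer { ip, registered: false }));
    w.insert(id.to_owned(), peer.clone());
    Some(peer)
}
```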
In `@src/relay_server.rs`:
- Around line 396-404: The WS listener currently calls allow_connection_from_ip
for "hbbr-ws" using the raw peer addr returned by listener2.accept(), which
rate-limits the reverse proxy rather than the real client when
TRUST_PROXY_HEADERS is enabled; change the logic so that handle_connection (or
the WebSocket handshake in make_pair) performs the allow_connection_from_ip
check using the reconstructed client IP when proxy headers are trusted, rather
than using the listener peer addr. Concretely, remove or bypass the
allow_connection_from_ip call in the listener2.accept() branch and ensure
make_pair or the handshake path invokes allow_connection_from_ip("hbbr-ws",
client_ip) after extracting and validating proxy headers (respecting
TRUST_PROXY_HEADERS) so quotas are applied per real client IP.
In `@src/rendezvous_server.rs`:
- Around line 304-312: The rate-limit check currently performed right after
listener3.accept() using allow_connection_from_ip("hbbs-ws", addr) applies the
proxy IP rather than the real client IP and should be moved or adjusted so it
uses the post-proxy/forwarded client address; update the logic to perform the
allow_connection_from_ip call inside handle_listener/handle_listener_inner after
apply_trusted_proxy_addr() has resolved the real client IP (or, if
TRUST_PROXY_HEADERS is enabled, extract the forwarded address first and pass
that IP to allow_connection_from_ip) so per-client buckets are enforced
correctly instead of rate-limiting the proxy.
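The forwarded-address handling these comments revolve around — honor proxy headers only when explicitly trusted, use only the first `X-Forwarded-For` entry, and keep the original connection's port as a per-connection discriminator — can be sketched with simplified, hypothetical signatures (the real helpers are `parse_forwarded_ip` and `apply_trusted_proxy_addr` in `common.rs`):

```rust
use std::net::{IpAddr, SocketAddr};

/// Use only the first X-Forwarded-For entry; later entries are appended by
/// intermediaries and the whole header is attacker-controlled when untrusted.
fn first_forwarded_ip(header: &str) -> Option<IpAddr> {
    header.split(',').next()?.trim().parse().ok()
}

/// Rewrite the peer address to the forwarded IP, preserving the original
/// connection's port so concurrent clients behind one proxy IP do not
/// collapse into a single synthetic SocketAddr.
fn apply_forwarded_addr(addr: SocketAddr, header: Option<&str>, trust_proxy: bool) -> SocketAddr {
    if !trust_proxy {
        return addr; // headers from arbitrary clients are ignored by default
    }
    match header.and_then(first_forwarded_ip) {
        Some(ip) => SocketAddr::new(ip, addr.port()),
        None => addr,
    }
}
```

Keeping `addr.port()` instead of a fixed `0` is exactly the discriminator the `rendezvous_server.rs` comment needs: maps keyed by `SocketAddr` (such as `tcp_punch`) no longer collide for clients sharing one public IP.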
In `@tests/server_protection_process.rs`:
- Around line 52-80: reserve_hbbs_port and reserve_hbbr_port race because they
bind sockets, drop them, and return only port numbers; another process can claim
the ports before the child is spawned. Fix by changing these helpers to keep the
actual bound sockets alive until the child process is started: in
reserve_hbbs_port return the bound TcpListener(s) and UdpSocket (or a struct
holding them) instead of u16, and in reserve_hbbr_port return the bound
TcpListener(s); then pass those open descriptors (or rely on inheritable fds)
into the child spawn so the ports remain reserved until the child binds them.
Ensure functions named reserve_hbbs_port and reserve_hbbr_port are updated
accordingly and all call sites are adjusted to accept and hold the returned
listeners/sockets until after child startup.
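The suggested shape for the test helpers — return the live bound sockets rather than bare port numbers, so the OS cannot hand the ports to another process before the child starts — can be sketched as below. The struct and function names are illustrative stand-ins for the helpers in `tests/server_protection_process.rs`:

```rust
use std::net::{TcpListener, UdpSocket};

/// Holding the bound sockets keeps the ports reserved until dropped,
/// e.g. right after the child process is spawned.
struct ReservedHbbsPorts {
    tcp: TcpListener,
    udp: UdpSocket,
}

fn reserve_hbbs_ports() -> std::io::Result<ReservedHbbsPorts> {
    let mut last_err = None;
    // hbbs needs matching TCP and UDP ports; retry with a fresh ephemeral
    // TCP port when the corresponding UDP port is already taken.
    for _ in 0..16 {
        let tcp = TcpListener::bind("127.0.0.1:0")?;
        let port = tcp.local_addr()?.port();
        match UdpSocket::bind(("127.0.0.1", port)) {
            Ok(udp) => return Ok(ReservedHbbsPorts { tcp, udp }),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.unwrap())
}
```

Note the remaining narrow race: the sockets must be dropped for the child to bind the same ports, so a fully airtight version would pass the open descriptors to the child (e.g. via fd inheritance) as the comment suggests.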
⛔ Files ignored due to path filters (1): `Cargo.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (8): `db_v2.sqlite3`, `libs/hbb_common`, `src/common.rs`, `src/database.rs`, `src/peer.rs`, `src/relay_server.rs`, `src/rendezvous_server.rs`, `tests/server_protection_process.rs`
Introduce MAX_CONNECTION_RATE_ENTRIES with a default and enforce a cap on connection-rate entries: evict the oldest entry when the cap is reached, record eviction/rejection protection events, and expose the cap in protection summaries. Fix forwarded IP port handling and add websocket-forwarded rate-limit checks in relay and rendezvous servers. Refactor peer caching insertion: compute eviction candidates, remove excess peers before inserting pending registrations, and enforce per-IP pending registration limits without the previous helper. Update select_peer_ids_to_evict usage and minor formatting/await handling. Revamp test process spawn helpers: replace single-port reservation with retrying pick_*_port_candidate functions, make spawn_hbbs/spawn_hbbr return the chosen port and child, preserve temp dirs on certain errors, and add/adjust tests (including a new test to assert connection-rate entry bounding). Miscellaneous whitespace, logging, and small API-call formatting improvements across files. Signed-off-by: Vlad <shaamaan@gmail.com>
🧹 Nitpick comments (3)
src/peer.rs (3)
Lines 271-297: `prune_cache_for_insert` has a TOCTOU gap between snapshot and removal.

This method takes a read lock to build the snapshot, releases it, then acquires a write lock to remove entries. Between these operations, the map state could change (entries added/removed by concurrent calls). This is less critical here because:

- It's called from `get()`, which is a read path
- Worst case is slightly more or fewer evictions than intended

The `get_or_for_registration` method correctly holds the write lock throughout, which is the critical path for pending registration limits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/peer.rs` around lines 271 - 297, prune_cache_for_insert has a TOCTOU window: it builds a snapshot under a read lock then releases it and later takes a write lock to remove peers, allowing concurrent map changes between snapshot and removal; fix by performing eviction decision while holding the write lock (i.e., acquire self.map.write().await before computing entries and calling select_peer_ids_to_evict) or, if you must keep the read snapshot, re-check/compute entries again after acquiring the write lock and only remove IDs that still exist and meet eviction criteria; reference prune_cache_for_insert, select_peer_ids_to_evict, and self.map (and mirror the write-lock approach used in get_or_for_registration).
Lines 227-238: Potential performance concern with async read locks inside the write lock scope.

Within the write lock scope (`let mut w = self.map.write().await`), the code iterates over all peers and calls `peer.read().await` on each `LockPeer`. This holds the map write lock while waiting on potentially many individual peer read locks. Under high load with many cached peers, this could cause contention.

Consider whether extracting only the needed fields (ip, guid, last_reg_time) into a snapshot structure before eviction analysis could reduce lock hold time. However, given the security-focused nature of this PR and the existing pattern in `prune_cache_for_insert`, this may be acceptable for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/peer.rs` around lines 227 - 238, The current loop holds self.map.write().await while awaiting peer.read().await for each LockPeer, causing contention; change to first capture a snapshot of the peer handles (clone the peer_id and Arc/handle to LockPeer from self.map) while holding the write lock, then immediately drop the write lock and iterate over that snapshot performing peer.read().await to extract ip, guid and last_reg_time and build PeerEvictionEntry and pending_ids_for_ip; this mirrors the behavior in prune_cache_for_insert and ensures the write lock is not held during per-peer async reads.
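The snapshot-then-release pattern the prompt describes — clone the `Arc` handles under the map lock, drop the lock, then take the per-peer locks — can be sketched with `std::sync` primitives (the real code is async; names here are illustrative):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

struct CachedPeer {
    last_reg_time: u64,
}

/// Collect cheap Arc clones under the map lock, release it, and only then
/// take each per-peer lock, so the map lock is never held across per-peer reads.
fn snapshot_reg_times(
    map: &RwLock<HashMap<String, Arc<RwLock<CachedPeer>>>>,
) -> Vec<(String, u64)> {
    let handles: Vec<(String, Arc<RwLock<CachedPeer>>)> = {
        let r = map.read().unwrap();
        r.iter().map(|(id, p)| (id.clone(), p.clone())).collect()
    }; // map lock released here
    handles
        .into_iter()
        .map(|(id, p)| {
            let t = p.read().unwrap().last_reg_time;
            (id, t)
        })
        .collect()
}
```

The trade-off is that the snapshot can be slightly stale by the time the per-peer data is read, which is the same tolerance `prune_cache_for_insert` already accepts.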
Lines 79-89: Consider accepting `max_pending=0` as "unlimited", or documenting the behavior.

When `max_pending` is 0, `pending_registration_limit_exceeded` always returns `false` (no limit), which seems intentional. However, `env_usize_or` filters out 0 values via `.filter(|value| *value > 0)`, so an operator cannot explicitly set `MAX_PENDING_REGISTRATIONS_PER_IP=0` to disable the limit. This is fine if 0 means "use default", but may be surprising if an operator wants to disable the feature entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/peer.rs` around lines 79 - 89, The current behavior treats MAX_PENDING_REGISTRATIONS_PER_IP=0 as "use default" because env_usize_or filters out zero, while pending_registration_limit_exceeded already treats 0 as "unlimited"; either make zero explicitly settable or document it. To fix, update env_usize_or (or add env_usize_or_allow_zero) to stop filtering out zero so a parsed 0 is returned, and keep pending_registration_limit_exceeded as-is (it will then correctly treat 0 as unlimited), or alternatively add a comment on pending_registration_limit_exceeded explaining that max_pending==0 means unlimited and leave env_usize_or unchanged if you prefer default-on-missing semantics. Ensure changes reference env_usize_or and pending_registration_limit_exceeded so reviewers can find the code.
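The "allow an explicit zero" variant the prompt sketches can be written as pure functions over the raw environment value, which also keeps it testable without mutating process state. These helpers are hypothetical stand-ins for `env_usize_or` and the real `pending_registration_limit_exceeded`:

```rust
// Unlike an env_usize_or that filters out 0, this returns Some(0) for an
// explicit "0" so operators can disable the limit entirely.
fn env_usize_allow_zero(raw: Option<&str>) -> Option<usize> {
    raw?.trim().parse().ok()
}

fn pending_limit(raw: Option<&str>, default: usize) -> usize {
    env_usize_allow_zero(raw).unwrap_or(default)
}

/// max_pending == 0 means "unlimited".
fn pending_registration_limit_exceeded(pending: usize, max_pending: usize) -> bool {
    max_pending != 0 && pending >= max_pending
}
```

With this shape, `MAX_PENDING_REGISTRATIONS_PER_IP=0` disables the cap, an unset or unparsable value falls back to the default, and the limit check itself is unchanged.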
📒 Files selected for processing (5): `src/common.rs`, `src/peer.rs`, `src/relay_server.rs`, `src/rendezvous_server.rs`, `tests/server_protection_process.rs`

🚧 Files skipped from review as they are similar to previous changes (2): `src/common.rs`, `tests/server_protection_process.rs`
Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Summary

This PR hardens `rustdesk-server` against malicious clients and generic internet abuse. The main goal is to treat clients as a possible attack vector, not only the server as a trust root for clients. The changes focus on:

This PR also includes the required shared protocol update in `hbb_common` so client and server use the same keyed request fields.

Why

Before this change, several server paths were still too permissive:

- `hbbs` deployments enforced the server key for punch-hole requests, but not for all registration or presence paths
- `hbbr` could be abused to accumulate large numbers of half-open relay requests

That made the server vulnerable to:
What changed

1. Keyed auth on `hbbs` control-plane requests

Vulnerability:

- `hbbs` requests still worked without the configured server key

Fix:

- `RegisterPeer`
- `RegisterPk`
- `OnlineRequest`
- `TestNatRequest`
- `LICENSE_MISMATCH` for `RegisterPk`, where the client benefits from a clear failure reason

Implementation notes:

- Added `licence_key` to the relevant rendezvous protocol messages

2. Relay half-open abuse caps in `hbbr`

Vulnerability:

Fix:

- `Key mismatch` refusal on relay auth failure

3. Do not trust forwarded IP headers by default

Vulnerability:

- `X-Real-IP` / `X-Forwarded-For` accepted from arbitrary clients, weakening rate limits, logging, and abuse controls

Fix:

- Forwarded headers honored only when `TRUST_PROXY_HEADERS` is explicitly enabled
- `X-Forwarded-For` handling restricted to the first address only

4. Bound punch-hole request tracking
Vulnerability:
Fix:
5. Registration abuse reduction

Vulnerability:

Fix:

- Applied to `RegisterPeer` in addition to existing `RegisterPk` protections

6. In-memory peer cache growth controls
Vulnerability:
Fix:
7. Persistent peer record cap and retention pruning

Vulnerability:

Fix:

- `MAX_TOTAL_PEER_RECORDS`
- `PEER_RECORD_RETENTION_DAYS`

Additional fix:

- `RegisterPk` now returns the real backend result instead of always reporting `OK`
- `PEER_LIMIT_REACHED` is now reported correctly on the wire

8. Listener-level rate limiting
Vulnerability:
Fix:
- `hbbs` main TCP listener
- `hbbs` NAT/test listener
- `hbbs` websocket listener
- `hbbs` UDP receive loop
- `hbbr` TCP listener
- `hbbr` websocket listener

9. Operator-facing protection stats
Problem:
Fix:
- `protection-stats` / `ps` admin output

Tests
Added process-level regression coverage in `tests/server_protection_process.rs` for:

- `hbbs` admin `ps`
- `hbbr` admin `ps`
- `hbbs` `OnlineRequest` keyed auth enforcement
- `hbbs` `RegisterPk` keyed auth enforcement
- `hbbr` relay key mismatch refusal
- `hbbs` persistent peer cap behavior
- `hbbs` stale peer retention pruning

Validated with:

```
cargo test --test server_protection_process -- --test-threads=1
```

Security impact
This PR does not try to solve every operational abuse scenario, but it materially improves the default security posture for self-hosted RustDesk deployments:
Notes
This PR depends on the matching shared `hbb_common` protocol update so client and server use the same keyed rendezvous request fields.

Summary by CodeRabbit
New Features
Tests