Sync Issue 2 #476

Here's my analysis of this log. This is a **completely different failure mode** from the previous logs -- and it confirms your hypothesis.

**Timeline:**

| Time | Event | Block | Rate |
|------|-------|-------|------|
| 17:52:48 | Fresh sync starts from genesis, 1 peer (Qmbct) | #21 | - |
| 17:52 - 17:53:57 | Fast sync through small blocks | #335 | 3-8 bps |
| 17:54:02 | **Hits large tx-flood blocks** | #338 | 0.3-0.7 bps |
| 17:56:17 | Through the fat region, speeds up | #417 | 1.9-9.8 bps |
| 18:01:43 | **Stalls completely** at a huge block | #1558 | 0.0 bps |
| 18:02:02 | First `Network(Timeout)` - peer disconnected | #1558 | stuck |
| 18:03:03 | Backoff clears, new request sent | #1558 | stuck |
| 18:03:23 | Second `Network(Timeout)` (20s later) | #1558 | stuck |
| 18:04:23 | Third disconnect | #1558 | stuck |
| 18:04:26 | "Potential long-range attack" (stale mined block) | #1558 | - |
| 18:04 - 18:10 | **~6 minutes dead**: 0 peers, 0 progress | #1558 | - |
| 18:10:11 | Qmbct reconnects, `Refused` in 77ms | #1558 | - |
| 18:10:24 | QmQ4A connects, crawls at ~1 block/5 seconds | #1559 | ~0.1 bps |
| 18:11:26 | QmQ4A `Network(Timeout)` too | #1562 | stuck |
| 18:11 onwards | Endless Refused/Timeout cycle across all peers | #1562 | 0 |
| **20:26:42** | **User kills node** - still stuck at #1562, 0 peers | #1562 | dead |

The node sat stuck for **over 2 hours** and never recovered.

**Root cause -- confirmed: large block + 20s timeout:**

The proof is in the download bandwidth numbers. While stuck at #1558:

```
18:01:48 -- 0.0 bps, download 87.8 kiB/s
18:01:53 -- 0.0 bps, download 200.2 kiB/s
18:01:58 -- 0.0 bps, download 172.7 kiB/s
18:02:02 -- Network(Timeout)
```

Data IS flowing (roughly 90-200 kiB/s), but no block gets imported: the block request times out at 20 seconds before the response completes. At 200 kiB/s, at most ~4 MB can be transferred in 20 seconds, but the block is likely larger (the tx-flood blocks had 538 extrinsics and hit the 5 MiB block weight limit). The peer is also bandwidth-constrained serving its own traffic.
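The arithmetic can be checked with a short sketch. The rates are the three download figures from the log excerpt above. The 5 MiB block size is the assumed worst case from this analysis, not a measured value, and the function names are only illustrative.

```python
KIB = 1024
MIB = 1024 * KIB

REQUEST_TIMEOUT_S = 20  # the 20 s block request timeout discussed above


def max_transferable_bytes(rate_kib_s: float, timeout_s: float = REQUEST_TIMEOUT_S) -> float:
    """Upper bound on bytes received before the request times out."""
    return rate_kib_s * KIB * timeout_s


def seconds_needed(size_bytes: float, rate_kib_s: float) -> float:
    """Time to transfer size_bytes at a constant rate (ignores protocol overhead)."""
    return size_bytes / (rate_kib_s * KIB)


# Download rates observed while stuck at #1558 (kiB/s).
observed_rates = [87.8, 200.2, 172.7]
assumed_block_bytes = 5 * MIB  # assumption: a block close to the 5 MiB limit

for rate in observed_rates:
    budget_mib = max_transferable_bytes(rate) / MIB
    needed_s = seconds_needed(assumed_block_bytes, rate)
    print(f"{rate:6.1f} kiB/s: {budget_mib:.2f} MiB fits in 20 s, 5 MiB needs {needed_s:.1f} s")
```

Under these assumptions, even the best observed rate (200.2 kiB/s) moves just under 4 MiB in 20 seconds, so a 5 MiB response would need about 26 seconds and cannot complete inside the timeout.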

When QmQ4A finally manages to serve block #1559 (took ~5s), that block barely made it. But subsequent blocks are equally large, and the timeouts resume.

**This is NOT the same as the previous fork-loop problem.** This node:
- Never forked (it's syncing from genesis)
- Never mined on its own chain (the "Potential long-range attack" at 18:04:26 was a stale block proposal from before the sync stall)
- Has zero transaction propagation overhead

The sole problem is: **the sync/2 request-response protocol's 20-second timeout is too short for large blocks over a bandwidth-limited link.** One timeout triggers the disconnect-backoff cascade, and the node never recovers because the same large block blocks it every time.
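Why retries never help can be illustrated with a toy model. This is a sketch under stated assumptions, not the node's actual sync code: every retry restarts the transfer from zero, the link rate is constant, and the block is 5 MiB. The ~60 s backoff is read off the timeline (18:02:02 timeout, 18:03:03 new request). `attempt_download` and `sync_one_block` are hypothetical names.

```python
KIB = 1024
MIB = 1024 * KIB


def attempt_download(block_bytes: int, rate_bytes_s: float, timeout_s: float) -> bool:
    """Toy model: True if the whole block arrives before the timeout."""
    return block_bytes / rate_bytes_s <= timeout_s


def sync_one_block(block_bytes, rate_bytes_s, timeout_s, max_attempts=5, backoff_s=60.0):
    """Retry the same block; returns (succeeded, seconds_elapsed)."""
    elapsed = 0.0
    for _ in range(max_attempts):
        if attempt_download(block_bytes, rate_bytes_s, timeout_s):
            return True, elapsed + block_bytes / rate_bytes_s
        # Timeout: the peer is dropped, then the backoff has to expire
        # before the same request is sent again.
        elapsed += timeout_s + backoff_s
    return False, elapsed


block = 5 * MIB   # assumed block size
rate = 200 * KIB  # roughly the best rate seen in the log

print(sync_one_block(block, rate, timeout_s=20))  # fixed 20 s timeout
print(sync_one_block(block, rate, timeout_s=30))  # hypothetical longer timeout
```

With the 20 s timeout every attempt fails the same way, so the model burns 400 s across five attempts and makes no progress. With a hypothetical 30 s timeout the same block arrives in 25.6 s on the first attempt. The point is only that a deterministic failure on the same block cannot be fixed by retrying with identical parameters.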


Full log:
[log9.txt](https://github.com/user-attachments/files/26508229/log9.txt)


