Open
293 commits
bf56cb6
inherit default cluster mode in new cluster (#733)
geoffrey1330 Nov 12, 2025
0e72282
Update environment variables for Simply Block (#737)
Hamdy-khader Nov 12, 2025
25e3dd2
Main lvol sync delete (#734)
Hamdy-khader Nov 13, 2025
cd68c60
added fdb multi AZ support (#736)
geoffrey1330 Nov 13, 2025
1c38b6e
increased k8s fdb memory limit (#740)
geoffrey1330 Nov 14, 2025
5d9e0a4
Added MIT License (#742)
noctarius Nov 14, 2025
ee8d460
Update constants.py (#744)
schmidt-scaled Nov 15, 2025
2b14491
set size of lvstore cluster in constants (as ratio to distrib page size)
schmidt-scaled Nov 15, 2025
cfd14f7
Merge remote-tracking branch 'origin/main'
schmidt-scaled Nov 15, 2025
314c4cf
Update sc name (#746)
geoffrey1330 Nov 17, 2025
ce6ae0f
updated to distributed provisioning (#748)
geoffrey1330 Nov 17, 2025
5596c11
Update Dockerfile_base (#750)
geoffrey1330 Nov 17, 2025
aaa9b42
sleep after openshift core isolation until reboot (#753)
geoffrey1330 Nov 18, 2025
b60925d
added try and except to patch_prometheus_configmap func (#756)
geoffrey1330 Nov 19, 2025
bb90c60
added hostNetwork true to simplyblock controlplane services (#771)
geoffrey1330 Nov 23, 2025
5f352d9
Merge branch 'main' into main-sfam-2359
Hamdy-khader Nov 24, 2025
7655dbd
Set cluster_id optional on SNodeAPI docker version
Hamdy-khader Nov 24, 2025
880630f
fix type checker
Hamdy-khader Nov 25, 2025
91443ed
fix type checker
Hamdy-khader Nov 25, 2025
43c97a5
Set cluster_id optional on SNodeAPI docker version (#777)
Hamdy-khader Nov 25, 2025
33ee3e4
add cluster_id param for spdk_process_is_up (#779)
geoffrey1330 Nov 26, 2025
2531483
updated images for openshift preflight check (#741)
geoffrey1330 Nov 27, 2025
6ff60fd
Fix snapshot replications tickets
Hamdy-khader Nov 27, 2025
98ff764
Fix sfam-2496
Hamdy-khader Nov 27, 2025
05b6cd1
Follow up 1
Hamdy-khader Nov 27, 2025
32d6ad4
fix lvol replication_start
Hamdy-khader Nov 27, 2025
580a519
fix rep service
Hamdy-khader Nov 27, 2025
254035d
fix snapshot clone return value
Hamdy-khader Nov 27, 2025
36f45b9
added graylog env GRAYLOG_MESSAGE_JOURNAL_MAX_SIZE (#782)
geoffrey1330 Nov 27, 2025
f412121
Create partitions and alcemls on node add in parallel (#763) (#785)
Hamdy-khader Dec 1, 2025
3c60a2c
Remove stats from fdb and get it from Prometheus (#762) (#786)
Hamdy-khader Dec 1, 2025
6ddfd0b
Increase jc comp resume retry on node not online (#690)
Hamdy-khader Dec 1, 2025
8e3fe70
Adds missing services to k8s mgmt (#788)
Hamdy-khader Dec 2, 2025
a77c5e4
fix sfam-2507 (#791)
geoffrey1330 Dec 3, 2025
db2ca62
Update cluster.py (#793)
geoffrey1330 Dec 3, 2025
67455ed
Update mgmt_node_ops.py (#795)
geoffrey1330 Dec 3, 2025
20068bf
remove function get_node_name_by_ip (#794)
geoffrey1330 Dec 3, 2025
e91dbc0
Fix /cluster/create_first response (#798)
Hamdy-khader Dec 3, 2025
cd2b261
replicate snapshot back to src _1 (#790)
Hamdy-khader Dec 3, 2025
a620255
Merge branch 'main' into main-sfam-2359
Hamdy-khader Dec 3, 2025
bc79957
Remove user creation and switch (#799)
Hamdy-khader Dec 4, 2025
c09ad75
Merge branch 'main' into main-sfam-2359
Hamdy-khader Dec 4, 2025
43a4cae
Fix apiv2 pool add response to return pool dict (#800)
Hamdy-khader Dec 4, 2025
ce479eb
Update mgmt_node_ops.py (#796)
geoffrey1330 Dec 4, 2025
ec07572
Fix add-node apiv2 to remove unused param "full_page_unmap" (#801)
Hamdy-khader Dec 4, 2025
f48c839
Main fix add node apiv2 (#802)
Hamdy-khader Dec 4, 2025
f673f0a
Update cluster_ops.py (#797)
geoffrey1330 Dec 4, 2025
d2ad973
Fix node-add apiv2 response (#803)
Hamdy-khader Dec 4, 2025
1b0d7aa
Main fix node list apiv2 response (#804)
Hamdy-khader Dec 4, 2025
e0cd5ab
Update storage_deploy_spdk.yaml.j2 (#805)
geoffrey1330 Dec 4, 2025
7d8bb86
Adding quick outage case, changes to ssh utils (#806)
RaunakJalan Dec 5, 2025
7a3cf03
Merge branch 'main' into main-sfam-2359
Hamdy-khader Dec 5, 2025
df90efe
Fix sn list apiv2 response _2
Hamdy-khader Dec 5, 2025
98b6384
Fix sn list apiv2 response _3
Hamdy-khader Dec 5, 2025
24d6ced
Fix sn list apiv2 response _2 (#807)
Hamdy-khader Dec 5, 2025
d1163e3
Update cluster.py (#808)
geoffrey1330 Dec 8, 2025
7a52b77
Add stats to spdk_http_proxy_server.py
Hamdy-khader Dec 8, 2025
9526f8c
Add stats to spdk_http_proxy_server.py _2
Hamdy-khader Dec 8, 2025
d163662
Fdb health check (#809)
geoffrey1330 Dec 8, 2025
3a3e7b5
Fix 2
Hamdy-khader Dec 8, 2025
93f7c8c
Fix 2
Hamdy-khader Dec 8, 2025
db31b6e
Fix 3
Hamdy-khader Dec 8, 2025
e344c93
Fix sfam-2515
Hamdy-khader Dec 8, 2025
e18c09a
Merge pull request #811 from simplyblock/main-sfam-2515
alirezamhd2024 Dec 8, 2025
82600ba
Fix sfam-2524
Hamdy-khader Dec 9, 2025
70d3864
Fix sfam-2523
Hamdy-khader Dec 9, 2025
6a9357c
Fix sfam-2502
Hamdy-khader Dec 10, 2025
3b40dc1
Fix sfam-2527
Hamdy-khader Dec 11, 2025
bc64704
Add task API to v2 (#818)
Hamdy-khader Dec 12, 2025
1014518
Merge pull request #816 from simplyblock/main-sfam-2502
alirezamhd2024 Dec 12, 2025
196b93c
Increase snapshot replication task retry on node not online
Hamdy-khader Dec 12, 2025
d81835a
fix sfam-2516 _1
Hamdy-khader Dec 12, 2025
5c1dd94
fix sfam-2516 _2
Hamdy-khader Dec 12, 2025
0b920c3
fix sfam-2516 _3
Hamdy-khader Dec 12, 2025
a485224
fix linter
Hamdy-khader Dec 10, 2025
9d145e2
fix sfam-2516 _4
Hamdy-khader Dec 13, 2025
0db9022
wip
Hamdy-khader Dec 13, 2025
4016222
Merge branch 'main' into main-sfam-2359
Hamdy-khader Dec 13, 2025
ff05ec6
wip
Hamdy-khader Dec 13, 2025
8786169
wip 2
Hamdy-khader Dec 13, 2025
4734c2d
wip 2
Hamdy-khader Dec 13, 2025
b159951
Show number of devices on storage node response (#819)
Hamdy-khader Dec 15, 2025
8e10e29
Return capacity usage per object in API V2 (cluster, node, device, po…
Hamdy-khader Dec 15, 2025
15e0af3
Exclude src snap node id when starting replication on cloned lvol
Hamdy-khader Dec 16, 2025
8dcc73c
Adds cluster rebalancing event (#823)
Hamdy-khader Dec 18, 2025
14e3c06
Add snode port change event (#824)
Hamdy-khader Dec 19, 2025
1a7d4f4
Fix snode health check cluster logs (#825)
Hamdy-khader Dec 19, 2025
8d9d0a5
Fix prom client cluster ip in case of k8s (#826)
Hamdy-khader Dec 22, 2025
da0c057
fix snapshot replication source and target in case of replicate_to_so…
Hamdy-khader Dec 24, 2025
340d1aa
fix cluster add apiv2 (#829)
Hamdy-khader Dec 30, 2025
7649052
Fix deps version (#832)
Hamdy-khader Jan 2, 2026
3b74ce1
fix id_device_by_nqn int+str (#833)
Hamdy-khader Jan 6, 2026
21cd2da
Adds openapi.json
Hamdy-khader Jan 6, 2026
315ef01
Fix remove device response if device was removed (#839)
Hamdy-khader Jan 9, 2026
0ffe243
added simplyblock crds and custom resource (#814)
geoffrey1330 Jan 19, 2026
4fbf425
Main sfam 2359 api (#844)
geoffrey1330 Jan 19, 2026
575816b
added replication_start and stop to api v2 (#845)
geoffrey1330 Jan 19, 2026
c1e9ce1
Merge branch 'main' into main-sfam-2359
geoffrey1330 Jan 20, 2026
852a04e
Waleed Hotfix changes to main (#846)
wmousa Jan 21, 2026
dbb360e
Cherry-pick commits from R25.10-Hotfix that was made by [Hamdy Michae…
Hamdy-khader Jan 22, 2026
1b08d82
Merge branch 'main' into main-sfam-2359
Hamdy-khader Jan 27, 2026
5f96694
Enhance snapshot replication logic to support snapshot instances and …
Hamdy-khader Jan 27, 2026
105a14f
Add replication-trigger command to start replication for logical volumes
Hamdy-khader Jan 28, 2026
1bdb533
fixed UnboundLocalError: local variable 'os_endpoint' referenced befo…
geoffrey1330 Jan 28, 2026
e09135e
Adds JM device fixes to main (#856)
Hamdy-khader Jan 29, 2026
c4bee3f
added check for spdk_container is running (#857)
geoffrey1330 Jan 29, 2026
56bd031
Refactor lvol_controller.py to improve node validation during LVol cr…
Hamdy-khader Jan 29, 2026
8d6652f
Update rpc_client.py (#860)
schmidt-scaled Feb 2, 2026
0cdf8a6
Update storage_node_ops.py (#861)
schmidt-scaled Feb 2, 2026
9cb7636
Update rpc_client.py (#864)
schmidt-scaled Feb 3, 2026
3671b7b
Update lvol_controller.py (#865)
schmidt-scaled Feb 3, 2026
1d3213b
Update storage_node_ops.py (#866)
schmidt-scaled Feb 3, 2026
9db8156
Update lvol_controller.py (#867)
schmidt-scaled Feb 3, 2026
adc9e06
Update storage_node_ops.py (#868)
schmidt-scaled Feb 3, 2026
0ac5bce
Merge branch 'main' into main-sfam-2359
Hamdy-khader Feb 5, 2026
37f46af
fix 1
Hamdy-khader Feb 5, 2026
920034e
fix 1
Hamdy-khader Feb 5, 2026
f81e198
Hotfix to main (#862)
wmousa Feb 5, 2026
d78223d
Merge branch 'main' into main-sfam-2359
Hamdy-khader Feb 5, 2026
745eeb5
skip rendering prometheus configmap on helm upgrade (#869)
geoffrey1330 Feb 5, 2026
ff20e2e
Fix 1
Hamdy-khader Feb 6, 2026
1790cf8
Merge branch 'main' into main-sfam-2359
geoffrey1330 Feb 9, 2026
73f71e0
removed csi hostpath controller and node (#870)
geoffrey1330 Feb 9, 2026
9a67a24
Merge branch 'main' into main-sfam-2359
geoffrey1330 Feb 9, 2026
3d4d935
Add ptpl_file parameter to namespace in rpc_client
schmidt-scaled Feb 9, 2026
9fa8393
fix typo
Hamdy-khader Feb 9, 2026
3bffc55
fix rep status return output
Hamdy-khader Feb 9, 2026
16c4f6e
fix: handle missing replicate_as_snap_instance parameter
Hamdy-khader Feb 9, 2026
03da3e2
fix: use unique UUID for snapshot replication identifier
Hamdy-khader Feb 10, 2026
a3ce25f
fix: improve replication duration calculation logic
Hamdy-khader Feb 10, 2026
80e7e43
Update lvol_controller.py (#871)
schmidt-scaled Feb 11, 2026
0b804b6
Fix ptpl_file path construction for eui64 (#872)
schmidt-scaled Feb 11, 2026
bdca196
reverted api v2 field to id from uuid (#873)
geoffrey1330 Feb 11, 2026
d2d45e4
Merge branch 'main' into main-sfam-2359
geoffrey1330 Feb 11, 2026
ee06ee3
feat: add replicate_lvol_on_target_cluster function and API endpoint
Hamdy-khader Feb 13, 2026
f85b354
fix: change replicate_lvol endpoint from GET to POST
Hamdy-khader Feb 13, 2026
a3b4b0e
fix: set lvs_name for bdev_lvol in replication process
Hamdy-khader Feb 13, 2026
3e07070
wip
Hamdy-khader Feb 13, 2026
b3ef5cb
Ceil new_size to the nearest GiB in snapshot resizing (#877)
Hamdy-khader Feb 16, 2026
319b56d
Main clone new size (#878)
Hamdy-khader Feb 16, 2026
2f1f86f
adds lvol clone stack
Hamdy-khader Feb 16, 2026
55bb8a2
R25.10 hotfix dump tree (#879) (#880)
Hamdy-khader Feb 16, 2026
79aed3d
R25.10 hotfix client data nic (#874) (#875)
Hamdy-khader Feb 16, 2026
c4f4f44
fix: update sorting key for snapshots from creation_dt to created_at
Hamdy-khader Feb 16, 2026
2374c62
feat: add configuration for MCP and implement device status reset fun…
Hamdy-khader Feb 16, 2026
60371c9
set snapshot name when creating lvol no target cluster
Hamdy-khader Feb 17, 2026
777c980
return lvol on target if exists
Hamdy-khader Feb 17, 2026
2357bb9
fix lvol list
Hamdy-khader Feb 17, 2026
edb9a9f
updated _ReplicationParams field (#847)
geoffrey1330 Feb 18, 2026
38a7d92
return new lvol connection string on lvol connect if cluster is suspe…
Hamdy-khader Feb 18, 2026
515fbce
Hotfix 2 main (#882)
wmousa Feb 19, 2026
16854ee
Complete automation changes with stress tests (#883)
RaunakJalan Feb 22, 2026
79ac79e
use a diff variable name conv_new_size (#885)
geoffrey1330 Feb 23, 2026
940e325
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
97608f1
Stress bootstrap + stress test run (#887)
RaunakJalan Feb 23, 2026
7a3f730
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
bbe3cc6
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
23f15a2
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
1bc431e
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
1002cc7
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
d09a0c2
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
d6bee99
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
b00dc11
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
5086437
Stress bootstrap + stress test run
RaunakJalan Feb 23, 2026
ffac4a8
Changing quick outage and stress run for max variables
RaunakJalan Feb 23, 2026
281a53b
Changing quick outage and stress run for max variables
RaunakJalan Feb 23, 2026
80a7d31
Changing quick outage and stress run for max variables
RaunakJalan Feb 23, 2026
6b8b923
Changing quick outage and stress run for max variables
RaunakJalan Feb 23, 2026
a8ad9f0
Fixing cleanup and branch use
RaunakJalan Feb 24, 2026
ae3df4d
use a diff variable name conv_new_size (#885)
geoffrey1330 Feb 23, 2026
111e050
Merge branch 'main' into test-stress-final-yaml
RaunakJalan Feb 24, 2026
c6471b7
Linter error
RaunakJalan Feb 24, 2026
ba5fcca
Merge pull request #889 from simplyblock/test-stress-final-yaml
alirezamhd2024 Feb 24, 2026
e5b1455
Namespace lvols testing + Adding existing cluster stress run (#891)
RaunakJalan Feb 24, 2026
3a71852
Fixing sbcli error (#892)
RaunakJalan Feb 24, 2026
03ee0fb
Update continuous_failover_ha_namespace.py (#893)
RaunakJalan Feb 24, 2026
fc79eb1
Test fix sbcli (#894)
RaunakJalan Feb 25, 2026
77d255a
Ping to data nic from SNodeAPI (#890)
Hamdy-khader Feb 25, 2026
7e800cc
Adding log collection and upload (#895)
RaunakJalan Feb 26, 2026
8d6d71c
Add data nic param to bootstrap (#896)
RaunakJalan Feb 27, 2026
9419234
Resize clone newsize greater (#898)
geoffrey1330 Feb 27, 2026
cd3eb9c
assign variable conv_new_size value 0 (#899)
geoffrey1330 Feb 27, 2026
aed65e3
avoid updating lvol size during clone rely on resize_lvol func (#900)
geoffrey1330 Feb 27, 2026
e77ea41
feat: add endpoint to list replication tasks for a volume
Hamdy-khader Mar 2, 2026
a08a09a
Fixing namespace test lvol count and naming collision (#901)
RaunakJalan Mar 4, 2026
c440d6b
Adding e2e yaml and data nic param (#903)
RaunakJalan Mar 4, 2026
089a3e8
Test: Fixing snapshot deleting uuid column (#904)
RaunakJalan Mar 5, 2026
8203aac
Fix sfam-2635 (#906)
Hamdy-khader Mar 5, 2026
018c10e
Testing: Fixing e2e tests (#905)
RaunakJalan Mar 6, 2026
d2630ee
updated endpoint and func list_replication_tasks
geoffrey1330 Mar 6, 2026
79f156c
update endpoint list_replication_tasks to use instance_api
geoffrey1330 Mar 6, 2026
cfe790b
Update Docker image tag to latest main version (#907)
Hamdy-khader Mar 6, 2026
a8d37bb
update snapshot soft delete log msg (#908)
Hamdy-khader Mar 6, 2026
47bd5e1
Fixed slack notification and parallel lvol case (#909)
RaunakJalan Mar 6, 2026
af8a43d
updated snapshot replication crd
geoffrey1330 Mar 10, 2026
90c133f
feat: add suspend and resume commands for lvol subsystems
Hamdy-khader Mar 11, 2026
7a28adb
feat: add configuration settings and utility scripts for volume manag…
Hamdy-khader Mar 11, 2026
8961c8f
feat: add configuration settings, utility scripts, and endpoints for …
Hamdy-khader Mar 11, 2026
e28549a
wip
Hamdy-khader Mar 12, 2026
3466666
Adds replicate_lvol_on_source_cluster apiv2
Hamdy-khader Mar 12, 2026
b290387
fix 1
Hamdy-khader Mar 13, 2026
28a21ab
Adds 'from_source' attr to lvol model
Hamdy-khader Mar 13, 2026
0ffb155
fix: update suspend and resume functions to return boolean values
Hamdy-khader Mar 13, 2026
cb20278
fix: toggle 'from_source' attribute in lvol model during replication
Hamdy-khader Mar 16, 2026
3033eca
return from_source from api
geoffrey1330 Mar 16, 2026
0aec9e5
fix: update lvol UUID handling during replication process
Hamdy-khader Mar 16, 2026
bd1e38d
fix: lvol delete on target
Hamdy-khader Mar 16, 2026
547054f
feat: add configuration and utility scripts for managing storage node…
Hamdy-khader Mar 16, 2026
a21f00d
fix: update lvol attributes for cloning and set from_source flag
Hamdy-khader Mar 17, 2026
eb05502
refactor replicate_lvol_on_source_cluster
Hamdy-khader Mar 17, 2026
362935a
fix issue
Hamdy-khader Mar 18, 2026
f0bb920
Revert "fix issue"
Hamdy-khader Mar 19, 2026
863c704
Revert "refactor replicate_lvol_on_source_cluster"
Hamdy-khader Mar 19, 2026
5850525
fix issue
Hamdy-khader Mar 19, 2026
4c48a03
adds prints
Hamdy-khader Mar 19, 2026
b817874
add function to delete last snapshot if needed after replication
Hamdy-khader Mar 19, 2026
3e4bce6
fix
Hamdy-khader Mar 19, 2026
9b284fa
fix 2
Hamdy-khader Mar 19, 2026
3360bfe
fix: handle KeyError in lvol status change and enhance replication fu…
Hamdy-khader Mar 20, 2026
e52298c
avoid Runtime error if lvol not found
geoffrey1330 Mar 20, 2026
a636552
feat: initialize new lvol with unique identifiers and updated naming …
Hamdy-khader Mar 23, 2026
1436e3f
fix: remove unnecessary sleep calls in lvol creation process
Hamdy-khader Mar 23, 2026
d2081fc
fix: filter out deleted lvols in get_lvols and update lvol deletion p…
Hamdy-khader Mar 23, 2026
dcff5c2
fix: update lvol retrieval method to use get_lvols instead of get_all…
Hamdy-khader Mar 23, 2026
9115a15
fix: return None instead of False when lvol is not found
Hamdy-khader Mar 23, 2026
de87c12
fix: update replicate_lvol_on_source_cluster to include cluster ID
Hamdy-khader Mar 23, 2026
9e6b43f
feat: add configuration for KiroAgent and implement lvol replication …
Hamdy-khader Mar 23, 2026
5f2f49d
fix: enhance lvol creation process with target cluster NQN and update…
Hamdy-khader Mar 23, 2026
c8e7f5b
updated SimplyBlockSnapshotReplication crd field
geoffrey1330 Mar 23, 2026
dc48e21
fix: handle missing start_time in replication duration calculation
Hamdy-khader Mar 24, 2026
4eecf51
fix: handle forced deletion of snapshots when storage node is not found
Hamdy-khader Mar 24, 2026
4b0b666
fix: update KMS init container image to use the correct repository
Hamdy-khader Mar 24, 2026
43bd2ee
fix lvol delete response if lvol is in_deletion
Hamdy-khader Mar 24, 2026
4c07fed
fix 1
Hamdy-khader Mar 24, 2026
80801f4
Add the mount of /mnt/ramdisk to docker deployment
wmousa Nov 20, 2025
35e5bc7
fix the replicate_lvol_on_source api call params
Hamdy-khader Mar 25, 2026
1206cc7
fix the replicate_lvol_on_source api call params 2
Hamdy-khader Mar 25, 2026
6b66567
fixed init job failed to mkdir /etc/systemd/
geoffrey1330 Mar 25, 2026
49c2651
remove init copy script container
geoffrey1330 Mar 25, 2026
0a801ea
updated storagenode crd
geoffrey1330 Mar 25, 2026
90daf45
added storage cr param spdkImage
geoffrey1330 Mar 25, 2026
b6820c5
fix: update replicate_lvol_on_source_cluster to accept cluster_id and…
Hamdy-khader Mar 26, 2026
659b158
fix: update lvol deletion process to set status and write to database
Hamdy-khader Mar 26, 2026
e0a16f9
Revert "fix: update replicate_lvol_on_source_cluster to accept cluste…
Hamdy-khader Mar 26, 2026
d5d0ed5
point image to dockerub
geoffrey1330 Mar 26, 2026
b41ae75
point image to dockerub
geoffrey1330 Mar 26, 2026
7c62027
added namespace to api resource
geoffrey1330 Mar 26, 2026
5196a53
feat: add clone_lvol function to create snapshots and clones of logic…
Hamdy-khader Mar 26, 2026
dbd9c02
feat: implement clone endpoint for logical volumes with retry logic
Hamdy-khader Mar 27, 2026
a47d320
Merge branch 'main-local' into main-sfam-2359
Hamdy-khader Mar 27, 2026
1,270 changes: 1,270 additions & 0 deletions .github/workflows/e2e-bootstrap.yml

Large diffs are not rendered by default.

930 changes: 930 additions & 0 deletions .github/workflows/e2e-only.yml

Large diffs are not rendered by default.

128 changes: 128 additions & 0 deletions .github/workflows/e2e-scheduler.yml
@@ -0,0 +1,128 @@
name: E2E Scheduler

# Runs 4 e2e-bootstrap variants daily (sequentially) unless a stress run is active.
#   Run 1: R25.10-Hotfix  ndcs=1 npcs=1  storage: .201-.203
#   Run 2: R25.10-Hotfix  ndcs=2 npcs=2  storage: .201-.204
#   Run 3: main           ndcs=1 npcs=1  storage: .201-.203
#   Run 4: main           ndcs=2 npcs=2  storage: .201-.204
#
# Infrastructure defaults not listed here are defined in e2e-bootstrap.yml.

on:
  schedule:
    - cron: '30 18 * * *'  # daily at 12:00 AM IST (18:30 UTC)
  workflow_dispatch:       # allow manual trigger for testing

jobs:
  # ============================================================
  # GUARD: skip all runs if a stress workflow is in progress
  # ============================================================
  check-stress:
    name: Check if stress run is active
    runs-on: [self-hosted]
    outputs:
      stress_running: ${{ steps.check.outputs.stress_running }}
    steps:
      - name: Check for in-progress stress runs
        id: check
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        shell: bash
        run: |
          count1=$(gh run list --repo "${{ github.repository }}" \
            --workflow stress-run-bootstrap.yml --status in_progress \
            --json databaseId --jq 'length' 2>/dev/null || echo 0)
          count2=$(gh run list --repo "${{ github.repository }}" \
            --workflow stress-run-only.yml --status in_progress \
            --json databaseId --jq 'length' 2>/dev/null || echo 0)
          if [[ "$count1" -gt 0 || "$count2" -gt 0 ]]; then
            echo "stress_running=true" >> "$GITHUB_OUTPUT"
            echo "Stress run in progress — skipping all scheduled e2e runs."
          else
            echo "stress_running=false" >> "$GITHUB_OUTPUT"
            echo "No stress run active — proceeding."
          fi

  # ============================================================
  # RUN 1 — R25.10-Hotfix | ndcs=1 npcs=1 | 3 storage nodes
  # ============================================================
  e2e-run-1:
    name: "E2E R25.10-Hotfix 1+1"
    needs: check-stress
    if: needs.check-stress.outputs.stress_running != 'true'
    uses: ./.github/workflows/e2e-bootstrap.yml
    secrets: inherit
    with:
      SBCLI_BRANCH: "R25.10-Hotfix"
      STORAGE_PRIVATE_IPS: "192.168.10.201 192.168.10.202 192.168.10.203"
      MNODES: "192.168.10.210"
      API_INVOKE_URL: "http://192.168.10.210/"
      BASTION_IP: "192.168.10.210"
      GRAFANA_ENDPOINT: "http://192.168.10.210/grafana"
      BOOTSTRAP_DATA_CHUNKS: "1"
      BOOTSTRAP_PARITY_CHUNKS: "1"
      BOOTSTRAP_HA_JM_COUNT: "4"
      BOOTSTRAP_JOURNAL_PARTITION: "1"

  # ============================================================
  # RUN 2 — R25.10-Hotfix | ndcs=2 npcs=2 | 4 storage nodes
  # ============================================================
  e2e-run-2:
    name: "E2E R25.10-Hotfix 2+2"
    needs: [check-stress, e2e-run-1]
    if: always() && needs.check-stress.outputs.stress_running != 'true' && needs.e2e-run-1.result != 'skipped'
    uses: ./.github/workflows/e2e-bootstrap.yml
    secrets: inherit
    with:
      SBCLI_BRANCH: "R25.10-Hotfix"
      STORAGE_PRIVATE_IPS: "192.168.10.201 192.168.10.202 192.168.10.203 192.168.10.204"
      MNODES: "192.168.10.210"
      API_INVOKE_URL: "http://192.168.10.210/"
      BASTION_IP: "192.168.10.210"
      GRAFANA_ENDPOINT: "http://192.168.10.210/grafana"
      BOOTSTRAP_DATA_CHUNKS: "2"
      BOOTSTRAP_PARITY_CHUNKS: "2"
      BOOTSTRAP_HA_JM_COUNT: "4"
      BOOTSTRAP_JOURNAL_PARTITION: "1"

  # ============================================================
  # RUN 3 — main | ndcs=1 npcs=1 | 3 storage nodes
  # ============================================================
  e2e-run-3:
    name: "E2E main 1+1"
    needs: [check-stress, e2e-run-2]
    if: always() && needs.check-stress.outputs.stress_running != 'true' && needs.e2e-run-2.result != 'skipped'
    uses: ./.github/workflows/e2e-bootstrap.yml
    secrets: inherit
    with:
      SBCLI_BRANCH: "main"
      STORAGE_PRIVATE_IPS: "192.168.10.201 192.168.10.202 192.168.10.203"
      MNODES: "192.168.10.210"
      API_INVOKE_URL: "http://192.168.10.210/"
      BASTION_IP: "192.168.10.210"
      GRAFANA_ENDPOINT: "http://192.168.10.210/grafana"
      BOOTSTRAP_DATA_CHUNKS: "1"
      BOOTSTRAP_PARITY_CHUNKS: "1"
      BOOTSTRAP_HA_JM_COUNT: "4"
      BOOTSTRAP_JOURNAL_PARTITION: "1"

  # ============================================================
  # RUN 4 — main | ndcs=2 npcs=2 | 4 storage nodes
  # ============================================================
  e2e-run-4:
    name: "E2E main 2+2"
    needs: [check-stress, e2e-run-3]
    if: always() && needs.check-stress.outputs.stress_running != 'true' && needs.e2e-run-3.result != 'skipped'
    uses: ./.github/workflows/e2e-bootstrap.yml
    secrets: inherit
    with:
      SBCLI_BRANCH: "main"
      STORAGE_PRIVATE_IPS: "192.168.10.201 192.168.10.202 192.168.10.203 192.168.10.204"
      MNODES: "192.168.10.210"
      API_INVOKE_URL: "http://192.168.10.210/"
      BASTION_IP: "192.168.10.210"
      GRAFANA_ENDPOINT: "http://192.168.10.210/grafana"
      BOOTSTRAP_DATA_CHUNKS: "2"
      BOOTSTRAP_PARITY_CHUNKS: "2"
      BOOTSTRAP_HA_JM_COUNT: "4"
      BOOTSTRAP_JOURNAL_PARTITION: "1"
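
Reviewer note: the `if:` conditions in this scheduler are easy to misread, so here is a rough Python model of how the four jobs appear to chain. It is only a sketch, not part of the workflow. The names `job_result` and `simulate` are hypothetical. It assumes the `check-stress` job itself succeeds and that each e2e job ends with one of the usual GitHub Actions results (`success`, `failure`, `cancelled`, `skipped`). Under those assumptions, a failed run does not stop later runs (because of `always()`), while an active stress run skips all four.

```python
from typing import List, Optional

SKIPPED = "skipped"


def job_result(stress_running: bool, previous: Optional[str], outcome: str) -> str:
    """Model one e2e-run-N job.

    stress_running: the check-stress output (True when it is the string 'true').
    previous: result of the preceding e2e job, or None for e2e-run-1.
    outcome: what the job would report if it actually ran (assumed input).
    """
    # Guard shared by all four jobs: stress_running != 'true'.
    if stress_running:
        return SKIPPED
    # Jobs 2-4 use always(), so only a *skipped* predecessor stops the chain;
    # a failed or cancelled predecessor does not.
    if previous == SKIPPED:
        return SKIPPED
    return outcome


def simulate(stress_running: bool, outcomes: List[str]) -> List[str]:
    """Walk the chain e2e-run-1 .. e2e-run-N in order."""
    results: List[str] = []
    previous: Optional[str] = None  # e2e-run-1 has no e2e predecessor
    for outcome in outcomes:
        previous = job_result(stress_running, previous, outcome)
        results.append(previous)
    return results


if __name__ == "__main__":
    # Run 1 fails, yet runs 2-4 still execute.
    print(simulate(False, ["failure", "success", "success", "success"]))
    # A stress run is active, so every job is skipped.
    print(simulate(True, ["success"] * 4))
```

One consequence worth confirming with the author: because the chain only checks for `skipped`, a failing run on `R25.10-Hotfix` will not prevent the `main` runs from starting on the same hosts.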