
Commit d982da5

Merge pull request #32 from OferMania/zh2

NF-1393: Introduces ramcloud_test_cluster.py, which allows you to bri…

2 parents: 0bf4cc9 + 93cf825

4 files changed: 140 additions & 2 deletions

README.md

Lines changed: 43 additions & 0 deletions

@@ -77,6 +77,49 @@ To make changes to the RAMCloud code simply make changes to the code in the
Once RAMCloud is rebuilt, you can run the unit and standalone tests again to
run the updated code.

# Bringing up your own RAMCloud test cluster

There is a script that simplifies bringing your RAMCloud cluster up or down, or resetting it,
which is especially handy if you want to debug RAMCloud from the python3 interpreter (arguably a
really nice way to troubleshoot RAMCloud). The script is testing/ramcloud_test_cluster.py, run
from within the development environment (brought up with ./config/dev-env, as described in the
previous section). From the dev environment, you can run:

    python3 testing/ramcloud_test_cluster.py

This shows the status of your RAMCloud cluster. (Nifty, aye?) It reports whether or not a
cluster is currently up. You can bring one up, or clear out all RAMCloud tables in an existing
cluster, by running:

    python3 testing/ramcloud_test_cluster.py -a reset

The nice thing about this command is that it clears out all tables without wasting time
bringing RAMCloud down and back up. When you're done, you can bring the cluster down with:

    python3 testing/ramcloud_test_cluster.py -a stop

The -a option also supports start and status, in addition to reset and stop. start hard-resets
the cluster if one is already up (slower); if no cluster is up, it brings one up. status shows
whether a cluster is up or not (it's equivalent to omitting the -a option).

There's also the -n option, which controls the number of nodes to bring up (each node has
zk + rc-coordinator + rc-server). When -n is omitted, it defaults to 3. You should RARELY
ever need to change this from the default. 3 is arguably the minimum number of nodes needed
for "good behavior" in zk (due to the consensus algorithm needing a tie-breaker) and rc-server
(due to needing one instance for the master copy, one instance for backup, and one instance on
"probation" until it is trusted by the other rc-servers and elected rc-coordinator).

In the event you do need to change -n (let's say you want it at 4), note that you WILL need to
hard-reset the cluster (i.e., doing -a reset will NOT work). Something like this should achieve
the effect you want:

    python3 testing/ramcloud_test_cluster.py -a start -n 4

After this point, you can continue to soft-reset the cluster, and it keeps the same number of
nodes. I.e., this command should work at this point:

    python3 testing/ramcloud_test_cluster.py -a reset
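The three-node default reflects ZooKeeper's majority-quorum rule: an ensemble of n nodes needs floor(n/2)+1 votes to make progress, so 3 nodes tolerate one failure while 2 nodes tolerate none. A minimal illustrative sketch of that arithmetic (not part of the repo):

```python
# Illustrative quorum arithmetic for a ZooKeeper-style ensemble.
def quorum_size(n: int) -> int:
    """Votes needed for a majority in an n-node ensemble."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority can still be formed."""
    return n - quorum_size(n)

# A 3-node ensemble survives 1 failure; a 2-node ensemble survives none,
# which is why 3 is the practical minimum for "good behavior" in zk.
print(tolerated_failures(3))  # 1
print(tolerated_failures(2))  # 0
```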
# Obtaining the Patched Code

First, install `stgit` through your package manager, e.g. `apt-get install

config/Dockerfile.node

Lines changed: 2 additions & 2 deletions

@@ -49,8 +49,8 @@ ARG DISTRO_NAME=apache-zookeeper-3.5.8-bin

 # Download Apache Zookeeper, verify its PGP signature, untar and clean up
 RUN set -eux; \
-    wget -q "https://www.apache.org/dist/zookeeper/$SHORT_DISTRO_NAME/$DISTRO_NAME.tar.gz"; \
-    wget -q "https://www.apache.org/dist/zookeeper/$SHORT_DISTRO_NAME/$DISTRO_NAME.tar.gz.asc"; \
+    wget -q "http://archive.apache.org/dist/zookeeper/$SHORT_DISTRO_NAME/$DISTRO_NAME.tar.gz"; \
+    wget -q "http://archive.apache.org/dist/zookeeper/$SHORT_DISTRO_NAME/$DISTRO_NAME.tar.gz.asc"; \
     export GNUPGHOME="$(mktemp -d)"; \
     # Removing these checks because the GPG_KEY value above is no longer correct for the 3.5.7 ZK package
     # gpg --keyserver ha.pool.sks-keyservers.net --recv-key "$GPG_KEY" || \

testing/cluster_test_utils.py

Lines changed: 47 additions & 0 deletions

@@ -1,6 +1,7 @@
 import copy
 import docker
 import kazoo.client
+import kazoo.exceptions
 import logging
 import logging.config
 import os
@@ -102,6 +103,52 @@ def launch_node(cluster_name, hostname, zk_servers, external_storage, zkid, ip,
     logger.info('Launching node container %s with IP address %s...successful', hostname, ip)
     return docker_client.containers.get(container_id)

+def get_status():
+    docker_containers = docker_client.containers.list(all=True, filters={"name": "ramcloud-node-*"})
+    docker_network = None
+    try:
+        docker_network = docker_client.networks.get("ramcloud-net")
+    except docker.errors.NotFound:
+        pass
+    if not docker_containers:
+        logger.info('No ramcloud nodes found')
+    else:
+        logger.info('Found %s ramcloud nodes', len(docker_containers))
+    if not docker_network:
+        logger.info('ramcloud network not found')
+    else:
+        logger.info('Found ramcloud network')
+    return (docker_network, docker_containers)
+
+def destroy_network_and_containers(docker_network, docker_containers):
+    try:
+        for dc in docker_containers:
+            print("removing container:", dc.name)
+            dc.remove(force=True)
+        if docker_network:
+            print("removing network:", docker_network)
+            docker_network.remove()
+    except docker.errors.NotFound:
+        print("unable to destroy containers and/or network")
+
+def get_ensemble(num_nodes=3):
+    return {i: '10.0.1.{}'.format(i) for i in range(1, num_nodes + 1)}
+
+def get_table_names(ensemble):
+    try:
+        zkc = get_zookeeper_client(ensemble)
+        return zkc.get_children('/ramcloud/main/tables')
+    except kazoo.exceptions.NoNodeError:
+        # If the tables node in zk exists but wasn't initialized, this is
+        # thrown, so return an empty list
+        return []
+
+def drop_tables(ensemble, table_names):
+    r = ramcloud.RAMCloud()
+    external_storage = 'zk:' + external_storage_string(ensemble)
+    r.connect(external_storage, 'main')
+    for table_name in table_names:
+        r.drop_table(table_name)
+
 # ClusterTest Usage in Python interpreter:
 # >>> import cluster_test_utils as ctu
 # >>> x = ctu.ClusterTest()
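drop_tables builds a ZooKeeper connect string from the ensemble via external_storage_string, which is defined elsewhere in this file and not shown in the diff. A hedged, standalone sketch of what such a helper plausibly does, assuming the conventional comma-separated host:port connect-string format and the default ZooKeeper client port 2181:

```python
# Hypothetical sketch of external_storage_string(). The ensemble maps
# zkid -> IP (e.g. {1: '10.0.1.1', ...}), and ZooKeeper connect strings
# are conventionally comma-separated host:port pairs, so joining with
# port 2181 would yield "10.0.1.1:2181,10.0.1.2:2181,10.0.1.3:2181".
def external_storage_string(ensemble, port=2181):
    return ','.join('{}:{}'.format(ip, port) for ip in ensemble.values())

# Same shape get_ensemble() produces for the default 3 nodes.
ensemble = {i: '10.0.1.{}'.format(i) for i in range(1, 4)}
print('zk:' + external_storage_string(ensemble))
```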

testing/ramcloud_test_cluster.py

Lines changed: 48 additions & 0 deletions

@@ -0,0 +1,48 @@
+import cluster_test_utils as ctu
+import argparse
+import sys
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--action', '-a', metavar='A', type=str, default="status",
+                        help="Defines the action to take: status, reset, start, stop")
+    parser.add_argument('--nodes', '-n', type=int, default=3,
+                        help="Number of zk, rc-coordinator, and rc-server instances to bring up. Only relevant when there's no cluster up yet. Default is 3")
+
+    args = parser.parse_args()
+
+    print("action =", args.action)
+    print("nodes =", args.nodes)
+    if args.action == "start":
+        x = ctu.ClusterTest()
+        x.setUp(num_nodes=args.nodes)
+    elif args.action == "status":
+        ctu.get_status()
+    elif args.action == "stop":
+        docker_network, docker_containers = ctu.get_status()
+        ctu.destroy_network_and_containers(docker_network, docker_containers)
+    elif args.action == "reset":
+        docker_network, docker_containers = ctu.get_status()
+        if not docker_network:
+            # No network (or containers) means bring up a new cluster
+            print("Bringing up new cluster with", args.nodes, "nodes")
+            x = ctu.ClusterTest()
+            x.setUp(num_nodes=args.nodes)
+        elif not docker_containers:
+            # A network but no containers means no data, so take it down & bring it back up
+            print("Inconsistent state")
+            print("Bringing up new cluster with", args.nodes, "nodes")
+            ctu.destroy_network_and_containers(docker_network, [])
+            x = ctu.ClusterTest()
+            x.setUp(num_nodes=args.nodes)
+        else:
+            # We have a network and containers. Get the ensemble and table names, then drop all tables!
+            print("Found a cluster with", len(docker_containers), "nodes")
+            print("Identifying tables")
+            ensemble = ctu.get_ensemble(len(docker_containers))
+            table_names = ctu.get_table_names(ensemble)
+            print("Table names =", table_names)
+            print("Dropping all tables")
+            ctu.drop_tables(ensemble, table_names)
+    else:
+        parser.print_help()
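The script's argument handling can be exercised in isolation, without docker or a running cluster. This sketch rebuilds the same parser (flag names and defaults taken from the script above) and parses the hard-reset invocation from the README:

```python
import argparse

# Rebuild the CLI parser from ramcloud_test_cluster.py in isolation to
# show how -a/--action and -n/--nodes parse.
parser = argparse.ArgumentParser()
parser.add_argument('--action', '-a', metavar='A', type=str, default="status",
                    help="Defines the action to take: status, reset, start, stop")
parser.add_argument('--nodes', '-n', type=int, default=3,
                    help="Number of node instances to bring up. Default is 3")

# "-a start -n 4" is the hard-reset-to-4-nodes example from the README.
args = parser.parse_args(['-a', 'start', '-n', '4'])
print(args.action, args.nodes)        # start 4

# With no arguments, the defaults give the plain status check.
defaults = parser.parse_args([])
print(defaults.action, defaults.nodes)  # status 3
```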
