Oct 17, 2008

Testing: simulating a network failure

Figuring out how distributed software behaves in the event of a network failure or partition can be difficult. It requires testing that involves multiple machines, multiple networks and many hands!
Virtualisation helps, but for really simple testability, a single VM environment is what you need. What follows, with some context, is a simple solution that may help.

Recently I was trying to track down an issue with Apache ActiveMQ network support. The test scenario required a bunch of VMware images, dual network cards and periodic manual network disabling.
In order to understand the scenario I tried to reduce it to something more manageable. The iptables firewall in Linux meant I did not have to yank out any network cables. With iptables, and a good tutorial, it is relatively easy to simulate a network failure or temporary network outage by instructing iptables to drop network packets that originate from, or are destined for, an individual port.
For my test, I had a simple network of two embedded brokers, with a producer on one broker and a consumer on the other. Both the producer and consumer used the vm protocol, leaving the tcp connector free for the network connection between the brokers. The connector was using port 61616. To simulate a network failure, the following iptables rules drop all tcp packets to and from port 61616:
$ sudo iptables -I INPUT 1 -p tcp --sport 61616 -j DROP
$ sudo iptables -I INPUT 2 -p tcp --dport 61616 -j DROP
To enable communication again, the two rules added above need to be deleted. For simplicity I just delete the rule at position 1 twice; once the first rule is removed, the second one moves up into position 1:
$ sudo iptables -D INPUT 1
$ sudo iptables -D INPUT 1
This works fine because I have control over the Linux box and I don't typically run any iptables rules. But that will not always be the case, and the approach does not carry over to other platforms or to shared Linux workstations. In addition, it requires root access and some manual intervention, so it cannot easily be automated.

What I needed, I thought, was a simple Java socket proxy that could sit as an intermediary between the two ends of the network connection and that I could control through code: something that lets traffic pass through until it is instructed not to. A quick Google search did not turn up any obvious candidate for reuse, so I coded a simple solution that worked for me and built a test case around it. The resulting SocketProxy is used in BrokerQueueNetworkWithDisconnectTest. The usage pattern is to replace the required tcp URIs with the proxy URL:

socketProxy = new SocketProxy(remoteURI);
DiscoveryNetworkConnector connector = new DiscoveryNetworkConnector(new URI("static:(" + socketProxy.getUrl() + ")"));
The proxy takes the target URI, sets up a listener that forwards traffic to the target, and returns its own URL through getUrl(). To simulate a network failure, socketProxy.stop() is called during the test execution; socketProxy.resume() then allows the network to reconnect so that recovery can be validated. It made my life a little easier and meant I could produce a reliable and portable test case using a single JVM. I know I will use it again :-)
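To make the listener-and-forwarder idea concrete, here is a minimal sketch in plain java.net of a proxy with the same stop/resume shape. It is not the ActiveMQ SocketProxy: the class name MiniSocketProxy, a constructor taking a host and port instead of a URI, and getPort() in place of getUrl() are all hypothetical, and it listens only on the loopback interface. One thread accepts connections; each accepted client gets a fresh connection to the target plus two copy threads, one per direction.

```java
import java.io.*;
import java.net.*;
import java.util.*;

/** Hypothetical minimal TCP proxy for tests; a sketch, not ActiveMQ's SocketProxy. */
class MiniSocketProxy {
    private final String targetHost;
    private final int targetPort;
    private final List<Socket> open = new ArrayList<>(); // guarded by this
    private ServerSocket server;                         // guarded by this
    private int listenPort;                              // 0 until the first bind

    MiniSocketProxy(String targetHost, int targetPort) throws IOException {
        this.targetHost = targetHost;
        this.targetPort = targetPort;
        resume();
    }

    /** Port that clients use instead of the target's port. */
    synchronized int getPort() { return listenPort; }

    /** (Re)open the listener on the same port; called initially and after stop(). */
    synchronized void resume() throws IOException {
        ServerSocket ss = new ServerSocket();
        ss.setReuseAddress(true); // allow rebinding while dropped connections linger in TIME_WAIT
        ss.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), listenPort));
        listenPort = ss.getLocalPort();
        server = ss;
        Thread t = new Thread(() -> acceptLoop(ss), "proxy-accept");
        t.setDaemon(true);
        t.start();
    }

    /** Simulate a network failure: close the listener and every proxied connection. */
    synchronized void stop() {
        closeQuietly(server);
        for (Socket s : open) closeQuietly(s);
        open.clear();
    }

    private void acceptLoop(ServerSocket ss) {
        while (true) {
            Socket client;
            try {
                client = ss.accept();
            } catch (IOException e) {
                return; // listener closed by stop()
            }
            try {
                Socket target = new Socket(targetHost, targetPort);
                synchronized (this) {
                    if (ss.isClosed()) { // stop() raced with this accept
                        closeQuietly(client);
                        closeQuietly(target);
                        return;
                    }
                    open.add(client);
                    open.add(target);
                }
                pump(client, target);
                pump(target, client);
            } catch (IOException e) {
                closeQuietly(client); // target unreachable: drop this client
            }
        }
    }

    /** Copy bytes one way; on EOF or error in either direction, close both sockets. */
    private static void pump(Socket from, Socket to) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[8192];
            try {
                InputStream in = from.getInputStream();
                OutputStream out = to.getOutputStream();
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    out.flush();
                }
            } catch (IOException ignored) {
                // a socket was closed, e.g. by stop()
            } finally {
                closeQuietly(from);
                closeQuietly(to);
            }
        }, "proxy-pump");
        t.setDaemon(true);
        t.start();
    }

    private static void closeQuietly(Closeable c) {
        try { if (c != null) c.close(); } catch (IOException ignored) { }
    }
}
```

A test would point the client (or the network connector URI) at 127.0.0.1 and getPort() rather than the real target port, call stop() to cut every connection and refuse new ones, then resume() to rebind the same port so reconnect logic can be exercised. Rebinding relies on SO_REUSEADDR because the connections just dropped may still hold that port in TIME_WAIT. For simplicity, EOF in either direction tears down both halves of the connection.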

Note: there is also the option to pause/resume the proxy. This keeps the sockets open but does not allow any traffic to pass through. Pausing allows the simulation of a slow network, which was handy for exercising the ActiveMQ inactivity monitor.
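Pausing can be pictured as a gate in the proxy's copy loop: the loop holds each chunk it has read until the gate reopens, while both sockets stay connected. The sketch below is a hypothetical illustration in plain Java of how such a pause might work; PauseGate, awaitOpen() and copy() are names made up here, not ActiveMQ API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical gate that a proxy's copy loop consults before forwarding data. */
class PauseGate {
    private boolean paused; // guarded by this

    synchronized void pause() { paused = true; }

    synchronized void resume() {
        paused = false;
        notifyAll(); // wake copy loops blocked in awaitOpen()
    }

    synchronized void awaitOpen() throws InterruptedException {
        while (paused) wait();
    }

    /**
     * Copy in to out, holding each chunk while the gate is paused.
     * The streams stay open, so peers see a stalled link rather than a broken one.
     */
    static void copy(InputStream in, OutputStream out, PauseGate gate)
            throws IOException, InterruptedException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            gate.awaitOpen();
            out.write(buf, 0, n);
            out.flush();
        }
    }
}
```

Each direction of a proxied connection would run copy() on its own thread with a shared gate. Because the loop stops reading while paused, bytes the sender keeps writing accumulate in the kernel socket buffers until those fill up, so the sender is not blocked straight away.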

3 comments:

yurif2005 said...

This is not a 100% clean solution, because even when the proxy is paused the client can send packets to inbound connections on the proxy, and the local kernel will send ACKs back until the socket buffer in the kernel is full. Right?

Gary Tully said...

Correct, the send and receive buffers can indeed be relevant. It all depends on the use case.