Adithya Palle
The client and server components are all separate programs.
I have provided ready-to-run jar files in the out folder, or you can compile the program as specified in the "Compilation" section and run the resulting jars.
To run the jar files, use the following commands from the project root:
# server programs
# starts the acceptor and learner remote objects at the specified port on the local machine
# I don't recommend running this directly; use start_server.sh instead to run all replicas
java -jar out/StartAcceptorAndLearner.jar <registry port>
# starts a proposer at the specified port, given a list of addresses to all acceptors
# I don't recommend running this directly, use start_server.sh instead to run all replicas
java -jar out/StartProposer.jar <registry port> <[Optional] servers.txt file path>
# starts the server replicas at the specified ports
# Example invocation to run 5 replicas: bash start_server.sh 10001 10002 10003 10004 10005
# you can use cleanup_server.sh to delete leftover registries if a previous run was interrupted or crashed, but start_server.sh already calls it before startup, so this is usually not necessary
# note that this will also generate the servers.txt file with the addresses of the replicas that is needed for the client programs
bash start_server.sh <list of ports separated by spaces>
# client sample program that runs at least 5 PUTS, GETS, and DELETES and tests consistency
# pay attention to the console output to see the results of the sample GET, PUT, DELETE operations
# note that this clears the key-value store before running the sample operations
# if you don't specify the servers file path, it will look for `servers.txt` for the list of replica addresses and randomly pick one that is available
java -jar out/ClientSample.jar <[Optional] servers.txt file path>
# client program that tests concurrent operations by running multiple threads that perform PUT, GET, DELETE operations
# also tests consistency
# note that this clears the key-value store before running the sample operations
java -jar out/TestConcurrency.jar <num_threads> <[Optional] servers.txt file path>
# client program that allows the user to interact with the key-value store using a command-line interface
java -jar out/InteractiveClient.jar <[Optional] servers.txt file path>
# Failure simulation for acceptors
java -jar out/FailureSimulator.jar <[Optional] servers.txt file path>
Note that if you are on Windows, you will need to run the scripts in a bash shell such as Git Bash or WSL. On macOS, the native terminal works fine.
You can start the replicas in separate terminals, or you can run bash start_server.sh to start all the server replicas in one terminal. The replicas will still listen on the separate ports you provide.
Example invocation to run 5 replicas: bash start_server.sh 10001 10002 10003 10004 10005
The servers.txt file is generated by the start_server.sh script and contains the replica addresses
that the client uses to connect. If you'd like to start the servers manually and generate this
file yourself, please make sure it has the following structure:
<ip/hostname>,<port>
<ip/hostname>,<port>
<ip/hostname>,<port>
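To illustrate how a client could consume this format, here is a minimal sketch of a parser for servers.txt lines (the `ServerList` class and its method are hypothetical, not the project's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ServerList {
    // Parses lines of the form "<ip/hostname>,<port>" into [host, port] pairs.
    public static List<String[]> parse(List<String> lines) {
        List<String[]> replicas = new ArrayList<>();
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty()) continue;              // tolerate blank lines
            String[] parts = line.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad servers.txt line: " + line);
            }
            Integer.parseInt(parts[1].trim());         // validate that the port is numeric
            replicas.add(new String[]{parts[0].trim(), parts[1].trim()});
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<String[]> replicas = parse(Arrays.asList("localhost,10001", "localhost,10002"));
        for (String[] r : replicas) {
            System.out.println(r[0] + ":" + r[1]);
        }
    }
}
```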
If you wish to compile from the command line, follow these steps:
bash compile.sh
This will generate six jar files in the out directory. The jar files are named as follows:
StartAcceptorAndLearner.jar # Server (starts acceptors and learners for KV replicas)
StartProposer.jar # Server (starts proposers for KV replicas)
ClientSample.jar # Client (sample operations)
TestConcurrency.jar # Client that tests concurrency by running multiple threads that perform PUT, GET, DELETE operations
InteractiveClient.jar # Client that allows the user to interact with the server using a command-line interface
FailureSimulator.jar # Program that simulates failures for acceptors
If using IntelliJ IDEA, you can simply use Build to compile the code, or run the main methods of the client and server programs directly.
Since Paxos provides fault tolerance by avoiding a single point of failure, I've removed the coordinator (itself a single point of failure)
from the previous implementation; clients now reach the replicas through the auto-generated servers.txt
file, which lists the replica addresses. The client (KeyValueStoreController) randomly chooses a replica
from the list before each request (the connection is cached). This mimics how systems like Kafka, and even DNS resolution,
maintain transparency by keeping a local copy of the replica addresses. In an actual distributed deployment of
this system, the client would receive the list of remote replica addresses at deployment time.
I chose this approach over a centralized load balancer or having clients connect directly to individual replicas
because I wanted to preserve transparency and avoid a single point of failure. A load balancer is an obvious single point of failure,
while having the client connect directly to the replicas exposes the system's replication to the client, who would have to select
a host from a list of addresses. In the current config-driven design, the replication is at least hidden away in the config (servers.txt) file.
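The selection-with-caching idea can be sketched as follows (class and method names here are hypothetical, not the actual KeyValueStoreController API; the connection is a stub standing in for an RMI registry lookup):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ReplicaSelector {
    private final List<String> replicas;                       // "host:port" entries from servers.txt
    private final Map<String, Object> connectionCache = new HashMap<>();
    private final Random random = new Random();

    public ReplicaSelector(List<String> replicas) {
        this.replicas = replicas;
    }

    // Pick a random replica for the next request, reusing a cached connection
    // if we have already connected to that replica before.
    public String pick() {
        String address = replicas.get(random.nextInt(replicas.size()));
        connectionCache.computeIfAbsent(address, this::connect);
        return address;
    }

    // Stub: in the real system this would be an RMI registry lookup.
    private Object connect(String address) {
        return new Object();
    }
}
```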
Note that we define a Paxos instance per key, since consensus is only meaningful at the key level. Ensuring that each key reaches consensus guarantees that the overall key-value store remains consistent.
To support mutable values for keys, we allow for multiple rounds of consensus per key. Each round is identified by an epoch number, which is distinct from the proposal (or sequence) number. Since the epoch number is a global value for a given key (like the sequence number), it also requires consensus. To maintain consistency, we store the epoch number in the learners, and update it across all replicas after each successful round of consensus. This epoch number defines the current round of consensus for the key.
Once consensus is reached on a value for a key, the epoch number is incremented by 1 across all replicas. A transaction is only considered complete if this epoch update also achieves quorum. This ensures that any future proposals are interpreted as updates in a new round, rather than as conflicting proposals from an earlier one. This mechanism preserves the integrity of key updates.
As a result, all proposals with the same epoch number are treated as conflicting proposals that must undergo consensus, whereas proposals with higher epoch numbers are treated as updates in a new round. This mechanism prevents us from discarding newer values due to leftover accepted values from older rounds, supporting key updates in a consistent way.
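The ordering rule above can be summarized in a small sketch (names are hypothetical; the "stale" case for lower epochs is implied by the discussion of leftover values from older rounds):

```java
public class EpochRule {
    // Classifies an incoming proposal relative to the key's current epoch.
    // "CONFLICT":  same epoch  -> competes with other proposals in this round of consensus.
    // "NEW_ROUND": higher epoch -> an update in a new round; old accepted values are obsolete.
    // "STALE":     lower epoch  -> leftover from an already-completed round, rejected.
    public static String classify(long currentEpoch, long proposalEpoch) {
        if (proposalEpoch == currentEpoch) return "CONFLICT";
        if (proposalEpoch > currentEpoch) return "NEW_ROUND";
        return "STALE";
    }
}
```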
This design is effectively my own implementation of Multi-Paxos, although I was unaware of the formal protocol details when building it. I later discovered that there are many similarities between my approach and Multi-Paxos — especially the use of an epoch (or slot) number to track the round of consensus.
In this project, we switch to a more optimistic concurrency control model for syncing reads and writes, which allows for stronger consistency. Previously, we allowed dirty/stale reads because we did not want to limit read throughput while the system was handling writes. In this system, however, we require consensus/quorum to read the value at each key. If the system is in an intermediate state where quorum has not yet been reached, the read operation will NACK, and the client retries until it reaches a consistent state with quorum. This trades read throughput for consistency, but we believe consistency is more important in this context, as it is one of the main goals of Paxos.
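The quorum-read rule can be sketched like this (a simplified model with hypothetical names; learners are represented only by the values they report for the key):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class QuorumRead {
    // Returns the value if a majority (n/2 + 1) of learners report the same one;
    // otherwise returns empty, which the caller treats as a NACK and retries.
    public static Optional<String> read(List<String> reported) {
        int quorum = reported.size() / 2 + 1;
        Map<String, Integer> votes = new HashMap<>();
        for (String value : reported) {
            if (votes.merge(value, 1, Integer::sum) >= quorum) {
                return Optional.of(value);          // quorum reached on this value
            }
        }
        return Optional.empty();                    // no quorum: NACK, retry later
    }
}
```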
In our system, every replica is simultaneously a proposer (Proposer), an acceptor (Acceptor), and a learner (Learner). This means that each remote object registry stores exactly one Proposer, Acceptor, and Learner object per replica.
You can evaluate the system yourself by running the servers and clients and looking at log outputs, but this may be tedious.
In ClientSample.jar and TestConcurrency.jar, I have added tests using the Testing class to verify the consistency
of the data retrieved and modified by the clients. These tests confirm that the system remains
consistent even though it is replicated and receives concurrent requests. You can also stress-test the server's throughput
by increasing the thread count of the TestConcurrency.jar client. In my tests, it works consistently with
up to 1000 threads; failures beyond that range appear to be due to thread-spawning limits on the client side.
This shows that the backend can handle a large number of concurrent clients, even from the same machine.
To further streamline testing, I created a shell script (run_tests.sh) that compiles the code,
starts the servers, and runs all the testing programs described above against all replicas, multiple times
and with a varying number of threads (for the concurrency test). All automated tests have passed
if the console prints the line "ALL TESTS IN run_tests.sh PASSED SUCCESSFULLY!!!". Some log output
from cleanup_server.sh may appear at the end, so scroll up to find the success message.
Here are some client log examples from ClientSample.jar:
(base) adithyapalle@Adithyas-MacBook-Air Project4 % java -jar out/ClientSample.jar
[2025-03-24 15:31:12.875464 | 4921,main ] You did not provide a file for server addresses. Using default servers.txt file
[2025-03-24 15:31:12.950766 | 4921,main ] Using proposer at localhost:10004
[2025-03-24 15:31:12.991180 | 4921,main ] -----Populating the KV store with sample data-----
[2025-03-24 15:31:12.994761 | 4921,main ] Using proposer at localhost:10001
[2025-03-24 15:31:13.083709 | 4921,main ] Using proposer at localhost:10005
[2025-03-24 15:31:13.129876 | 4921,main ] Using proposer at localhost:10005
[2025-03-24 15:31:13.152303 | 4921,main ] Using proposer at localhost:10004
[2025-03-24 15:31:13.182535 | 4921,main ] Using proposer at localhost:10004
[2025-03-24 15:31:13.205163 | 4921,main ] -----Displaying all elements in the key-value store-----
[2025-03-24 15:31:13.208911 | 4921,main ] Key-Value Store:
key1 : value1
key2 : value2
key5 : value5
key3 : value3
key4 : value4
[2025-03-24 15:31:13.212313 | 4921,main ] -----Demonstrating delete a key-----
[2025-03-24 15:31:13.215429 | 4921,main ] GET key1: value1
[2025-03-24 15:31:13.218558 | 4921,main ] Using proposer at localhost:10002
[2025-03-24 15:31:13.255785 | 4921,main ] GET key1: null
[2025-03-24 15:31:13.258428 | 4921,main ] -----Demonstrating overwrite a key-----
[2025-03-24 15:31:13.261301 | 4921,main ] GET key2: value2
[2025-03-24 15:31:13.264662 | 4921,main ] Using proposer at localhost:10003
[2025-03-24 15:31:13.303054 | 4921,main ] GET key2: value2_new
[2025-03-24 15:31:13.305674 | 4921,main ] -----Demonstrating get a non-existent key-----
[2025-03-24 15:31:13.308060 | 4921,main ] GET key6: null
[2025-03-24 15:31:13.310091 | 4921,main ] -----Demonstrating delete a non-existent key-----
[2025-03-24 15:31:13.310156 | 4921,main ] Using proposer at localhost:10004
[2025-03-24 15:31:13.333814 | 4921,main ] -----Demonstrating delete everything-----
[2025-03-24 15:31:13.333911 | 4921,main ] Using proposer at localhost:10001
[2025-03-24 15:31:13.357341 | 4921,main ] Using proposer at localhost:10003
[2025-03-24 15:31:13.378337 | 4921,main ] Using proposer at localhost:10004
[2025-03-24 15:31:13.398857 | 4921,main ] Using proposer at localhost:10001
[2025-03-24 15:31:13.422500 | 4921,main ] GET key2: null
[2025-03-24 15:31:13.424806 | 4921,main ] GET key3: null
[2025-03-24 15:31:13.427337 | 4921,main ] GET key4: null
[2025-03-24 15:31:13.430189 | 4921,main ] GET key5: null
[2025-03-24 15:31:13.433687 | 4921,main ] -----Demonstrating put a numeric key-----
[2025-03-24 15:31:13.433776 | 4921,main ] Using proposer at localhost:10005
[2025-03-24 15:31:13.455988 | 4921,main ] GET 1: value1
[2025-03-24 15:31:13.458626 | 4921,main ] -----Demonstrating put a key with special characters-----
[2025-03-24 15:31:13.458713 | 4921,main ] Using proposer at localhost:10003
[2025-03-24 15:31:13.483033 | 4921,main ] GET key%#: value$^
[2025-03-24 15:31:13.486017 | 4921,main ] -----Demonstrating put a really long value-----
[2025-03-24 15:31:13.486139 | 4921,main ] Using proposer at localhost:10003
[2025-03-24 15:31:13.510845 | 4921,main ] GET many_smiles: :D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D:D
[2025-03-24 15:31:13.513685 | 4921,main ] -----Demonstrating restore a deleted key-----
[2025-03-24 15:31:13.515834 | 4921,main ] GET key1: null
[2025-03-24 15:31:13.515930 | 4921,main ] Using proposer at localhost:10005
[2025-03-24 15:31:13.539566 | 4921,main ] GET key1: value1
[2025-03-24 15:31:13.541145 | 4921,main ] ALL TESTS PASSED
[2025-03-24 15:31:13.541204 | 4921,main ] Exiting.
As you can see, requests are randomly distributed across the replicas, and the system remains consistent.
In the log output, each line is prefixed with an annotation of the form [ timestamp | process_id, thread_name ]. Since this is a client log, all requests occur on a single thread in the same process, but if we inspect a snippet of the server logs for this run, we can see that the requests are indeed hitting all the replicas (note the unique process IDs):
[2025-03-24 15:31:04.106181 | 4910,main ] Server ready
[2025-03-24 15:31:12.977654 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.979782 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.981745 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.983368 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.985849 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.987733 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.988477 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.989232 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.989911 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:12.990579 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.021418 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.027172 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.032255 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.036878 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.042080 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.045948 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key1 and value: value1
[2025-03-24 15:31:13.049300 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key1 and value: value1
[2025-03-24 15:31:13.052737 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key1 and value: value1
[2025-03-24 15:31:13.056135 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key1 and value: value1
[2025-03-24 15:31:13.061292 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key1 and value: value1
[2025-03-24 15:31:13.101439 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.103450 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.105922 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.108281 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.112978 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.116540 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key2 and value: value2
[2025-03-24 15:31:13.118884 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key2 and value: value2
[2025-03-24 15:31:13.120785 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key2 and value: value2
[2025-03-24 15:31:13.122337 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key2 and value: value2
[2025-03-24 15:31:13.123726 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key2 and value: value2
[2025-03-24 15:31:13.136190 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key3
[2025-03-24 15:31:13.137991 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key3
[2025-03-24 15:31:13.139360 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key3
[2025-03-24 15:31:13.140558 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key3
[2025-03-24 15:31:13.141717 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key3
[2025-03-24 15:31:13.142800 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key3 and value: value3
[2025-03-24 15:31:13.143969 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key3 and value: value3
[2025-03-24 15:31:13.145144 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key3 and value: value3
[2025-03-24 15:31:13.146281 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key3 and value: value3
[2025-03-24 15:31:13.147307 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key3 and value: value3
[2025-03-24 15:31:13.164304 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key4
[2025-03-24 15:31:13.166028 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key4
[2025-03-24 15:31:13.167246 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key4
[2025-03-24 15:31:13.168345 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key4
[2025-03-24 15:31:13.169630 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key4
[2025-03-24 15:31:13.171213 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key4 and value: value4
[2025-03-24 15:31:13.172612 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key4 and value: value4
[2025-03-24 15:31:13.173896 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key4 and value: value4
[2025-03-24 15:31:13.175161 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key4 and value: value4
[2025-03-24 15:31:13.176441 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key4 and value: value4
[2025-03-24 15:31:13.189077 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key5
[2025-03-24 15:31:13.190334 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key5
[2025-03-24 15:31:13.191500 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key5
[2025-03-24 15:31:13.192615 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key5
[2025-03-24 15:31:13.193836 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key5
[2025-03-24 15:31:13.194918 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key5 and value: value5
[2025-03-24 15:31:13.196062 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key5 and value: value5
[2025-03-24 15:31:13.197340 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key5 and value: value5
[2025-03-24 15:31:13.198550 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key5 and value: value5
[2025-03-24 15:31:13.199852 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key5 and value: value5
[2025-03-24 15:31:13.205764 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.206481 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.207119 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.207823 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.208484 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.209239 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.209836 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.210478 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.211200 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.211850 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-24 15:31:13.212832 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.213394 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.213933 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.214491 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.215056 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.235994 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.238045 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.239362 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.240697 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.241967 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.243778 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key1 and value: null
[2025-03-24 15:31:13.244945 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key1 and value: null
[2025-03-24 15:31:13.246041 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key1 and value: null
[2025-03-24 15:31:13.247084 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key1 and value: null
[2025-03-24 15:31:13.248105 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key1 and value: null
[2025-03-24 15:31:13.253490 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.254035 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.254520 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.255014 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.255515 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.256151 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.256619 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.257203 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.257730 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.258207 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key1
[2025-03-24 15:31:13.258809 | 4904,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.259360 | 4905,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.259903 | 4906,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.260447 | 4907,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.261001 | 4908,RMI TCP Connection(12)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.281363 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.284053 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.285306 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.286557 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.287758 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received GET request for key: key2
[2025-03-24 15:31:13.289968 | 4904,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key2 and value: value2_new
[2025-03-24 15:31:13.291167 | 4905,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key2 and value: value2_new
[2025-03-24 15:31:13.292340 | 4906,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key2 and value: value2_new
[2025-03-24 15:31:13.293413 | 4907,RMI TCP Connection(14)-127.0.0.1 ] Received PUT request for key: key2 and value: value2_new
[2025-03-24 15:31:13.294528 | 4908,RMI TCP Connection(13)-127.0.0.1 ] Received PUT request for key: key2 and value: value2_new
I've implemented a failure simulator (FailureSimulator.jar) that periodically kills and restarts acceptors at random intervals.
This has been added at the end of start_server.sh, so it automatically runs when you start the servers.
The client is resilient to acceptor failures thanks to the Paxos algorithm: it retries requests whenever it cannot reach a quorum for either updates or reads. At least ⌊n/2⌋ + 1 agreements are needed for quorum, which technically means Paxos only tolerates ⌊n/2⌋ failures. In practice, we can often ride out more failures than this, because the client performs up to 3 exponentially spaced retries, and one of those attempts is likely to find a quorum. Success is still not guaranteed, since the random failure simulator can produce unlucky stretches of downtime, but you will find that the tests pass almost every time.
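The retry policy can be sketched as follows (the base delay of 100 ms is an assumed value, not the project's actual setting, and the names are hypothetical):

```java
import java.util.concurrent.Callable;

public class Retry {
    // Runs op up to 3 times, sleeping an exponentially growing delay between attempts.
    public static <T> T withBackoff(Callable<T> op) throws Exception {
        long delayMs = 100;                        // assumed base delay
        Exception last = null;
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                return op.call();                  // success: return immediately
            } catch (Exception e) {                // e.g. quorum not reached (NACK)
                last = e;
                if (attempt < 3) {
                    Thread.sleep(delayMs);         // wait before the next attempt
                    delayMs *= 2;                  // exponential spacing: 100 ms, then 200 ms
                }
            }
        }
        throw last;                                // all 3 attempts failed
    }
}
```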
Here are some sample logs to demonstrate the resilience of the system to failures:
In the below logs, we demonstrate a simple PUT request, followed by a GET request, and then a DELETE request on the same key. Client Logs:
(base) adithyapalle@Adithyas-MacBook-Air Project4 % java -jar out/InteractiveClient.jar
[2025-03-25 23:53:55.656218 | 45488,main ] You did not provide a file for server addresses. Using default servers.txt file
[2025-03-25 23:53:55.679211 | 45488,main ] Specify an operation:
1. GET <key>
2. PUT <key> <value>
3. DELETE <key>
4. CLEAR
5. DISPLAY
6. QUIT
2
[2025-03-25 23:54:01.102097 | 45488,main ] Enter the key to put:
a
[2025-03-25 23:54:01.890215 | 45488,main ] Enter the value to put:
b
[2025-03-25 23:54:04.947508 | 45488,main ] PUT request sent to server
[2025-03-25 23:54:04.947705 | 45488,main ] Specify an operation:
1. GET <key>
2. PUT <key> <value>
3. DELETE <key>
4. CLEAR
5. DISPLAY
6. QUIT
1
[2025-03-25 23:54:06.866245 | 45488,main ] Enter the key to get:
a
[2025-03-25 23:54:07.697812 | 45488,main ] GET a request sent to server
[2025-03-25 23:54:07.736932 | 45488,main ] Server Response:
b
[2025-03-25 23:54:07.737144 | 45488,main ] Specify an operation:
1. GET <key>
2. PUT <key> <value>
3. DELETE <key>
4. CLEAR
5. DISPLAY
6. QUIT
3
[2025-03-25 23:54:11.519701 | 45488,main ] Enter the key to delete:
a
[2025-03-25 23:54:12.324408 | 45488,main ] DELETE request sent to server
[2025-03-25 23:54:16.498513 | 45488,main ] Specify an operation:
1. GET <key>
2. PUT <key> <value>
3. DELETE <key>
4. CLEAR
5. DISPLAY
6. QUIT
5
[2025-03-25 23:54:19.108229 | 45488,main ] DISPLAY request sent to server
[2025-03-25 23:54:19.131497 | 45488,main ] Server Response:
Key-Value Store:
[2025-03-25 23:54:19.131762 | 45488,main ] Specify an operation:
1. GET <key>
2. PUT <key> <value>
3. DELETE <key>
4. CLEAR
5. DISPLAY
6. QUIT
6
Server Logs:
Starting server on port 10001...
Starting server on port 10002...
Starting server on port 10003...
Starting server on port 10004...
Starting server on port 10005...
[2025-03-25 23:53:42.835472 | 45469,main ] RMI Registry started on port: 10001
[2025-03-25 23:53:42.846276 | 45469,main ] Starting Learner ...
[2025-03-25 23:53:42.839456 | 45470,main ] RMI Registry started on port: 10002
[2025-03-25 23:53:42.850920 | 45470,main ] Starting Learner ...
[2025-03-25 23:53:42.824549 | 45471,main ] RMI Registry started on port: 10003
[2025-03-25 23:53:42.851052 | 45471,main ] Starting Learner ...
[2025-03-25 23:53:42.838790 | 45472,main ] RMI Registry started on port: 10004
[2025-03-25 23:53:42.855591 | 45472,main ] Starting Learner ...
[2025-03-25 23:53:42.857110 | 45469,main ] Starting Acceptor ...
[2025-03-25 23:53:42.840932 | 45473,main ] RMI Registry started on port: 10005
[2025-03-25 23:53:42.858784 | 45473,main ] Starting Learner ...
[2025-03-25 23:53:42.861353 | 45471,main ] Starting Acceptor ...
[2025-03-25 23:53:42.861685 | 45469,main ] Acceptor and Learner ready
[2025-03-25 23:53:42.862567 | 45470,main ] Starting Acceptor ...
[2025-03-25 23:53:42.863466 | 45471,main ] Acceptor and Learner ready
[2025-03-25 23:53:42.865203 | 45470,main ] Acceptor and Learner ready
[2025-03-25 23:53:42.866348 | 45473,main ] Starting Acceptor ...
[2025-03-25 23:53:42.866802 | 45472,main ] Starting Acceptor ...
[2025-03-25 23:53:42.868250 | 45473,main ] Acceptor and Learner ready
[2025-03-25 23:53:42.868797 | 45472,main ] Acceptor and Learner ready
Starting server on port 10001...
Starting server on port 10002...
Starting server on port 10003...
Starting server on port 10004...
Starting server on port 10005...
Started failure simulator
[2025-03-25 23:53:45.825037 | 45483,main ] You did not provide the path to the servers.txt file, just using the default.
[2025-03-25 23:53:45.907150 | 45483,main ] Attempting to connect to replica at localhost:10001
[2025-03-25 23:53:45.909255 | 45479,main ] Registry already exists. Connecting to existing one...
[2025-03-25 23:53:45.912889 | 45478,main ] Registry already exists. Connecting to existing one...
[2025-03-25 23:53:45.919577 | 45482,main ] Registry already exists. Connecting to existing one...
[2025-03-25 23:53:45.937176 | 45481,main ] Registry already exists. Connecting to existing one...
[2025-03-25 23:53:45.933653 | 45480,main ] Registry already exists. Connecting to existing one...
[2025-03-25 23:53:45.999578 | 45478,main ] Proposer ready
[2025-03-25 23:53:46.002906 | 45479,main ] Proposer ready
[2025-03-25 23:53:46.008890 | 45483,main ] Attempting to connect to replica at localhost:10002
[2025-03-25 23:53:46.019242 | 45483,main ] Attempting to connect to replica at localhost:10003
[2025-03-25 23:53:46.019841 | 45482,main ] Proposer ready
[2025-03-25 23:53:46.029151 | 45481,main ] Proposer ready
[2025-03-25 23:53:46.034820 | 45483,main ] Attempting to connect to replica at localhost:10004
[2025-03-25 23:53:46.039616 | 45483,main ] Attempting to connect to replica at localhost:10005
[2025-03-25 23:53:46.045203 | 45483,main ] Starting failure simulator for acceptor 0
[2025-03-25 23:53:46.045992 | 45483,main ] Starting failure simulator for acceptor 1
[2025-03-25 23:53:46.046122 | 45483,main ] Starting failure simulator for acceptor 2
[2025-03-25 23:53:46.046318 | 45483,main ] Starting failure simulator for acceptor 3
[2025-03-25 23:53:46.046581 | 45483,main ] Starting failure simulator for acceptor 4
[2025-03-25 23:53:46.048543 | 45480,main ] Proposer ready
[2025-03-25 23:53:51.390462 | 45469,RMI TCP Connection(3)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:53:51.518823 | 45473,RMI TCP Connection(3)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:53:51.601884 | 45471,RMI TCP Connection(3)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:53:53.270072 | 45472,RMI TCP Connection(3)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:53:53.634865 | 45483,Thread-2 ] Starting Acceptor ...
[2025-03-25 23:53:53.658373 | 45483,Thread-2 ] Acceptor 2 restarted
[2025-03-25 23:53:53.711298 | 45483,Thread-0 ] Starting Acceptor ...
[2025-03-25 23:53:53.717474 | 45483,Thread-0 ] Acceptor 0 restarted
[2025-03-25 23:53:53.923770 | 45470,RMI TCP Connection(3)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:53:54.342712 | 45483,Thread-4 ] Starting Acceptor ...
[2025-03-25 23:53:54.353293 | 45483,Thread-4 ] Acceptor 4 restarted
[2025-03-25 23:53:55.290420 | 45483,Thread-3 ] Starting Acceptor ...
[2025-03-25 23:53:55.300524 | 45483,Thread-3 ] Acceptor 3 restarted
[2025-03-25 23:53:56.636820 | 45483,Thread-1 ] Starting Acceptor ...
[2025-03-25 23:53:56.646124 | 45483,Thread-1 ] Acceptor 1 restarted
[2025-03-25 23:53:59.847790 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:01.208168 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:01.284814 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:01.638559 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:02.224933 | 45483,Thread-2 ] Starting Acceptor ...
[2025-03-25 23:54:02.232984 | 45483,Thread-2 ] Acceptor 2 restarted
[2025-03-25 23:54:02.677942 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:02.801693 | 45482,RMI TCP Connection(2)-127.0.0.1 ] Waiting 1 seconds before retrying
[2025-03-25 23:54:03.485966 | 45483,Thread-0 ] Starting Acceptor ...
[2025-03-25 23:54:03.493509 | 45483,Thread-0 ] Acceptor 0 restarted
[2025-03-25 23:54:03.645560 | 45483,Thread-4 ] Starting Acceptor ...
[2025-03-25 23:54:03.653598 | 45483,Thread-4 ] Acceptor 4 restarted
[2025-03-25 23:54:03.835979 | 45469,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:03.845127 | 45482,RMI TCP Connection(2)-127.0.0.1 ] Waiting 1 seconds before retrying
[2025-03-25 23:54:04.499102 | 45483,Thread-1 ] Starting Acceptor ...
[2025-03-25 23:54:04.507153 | 45483,Thread-1 ] Acceptor 1 restarted
[2025-03-25 23:54:04.669713 | 45483,Thread-3 ] Starting Acceptor ...
[2025-03-25 23:54:04.674769 | 45483,Thread-3 ] Acceptor 3 restarted
[2025-03-25 23:54:04.866479 | 45470,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:04.877812 | 45471,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:04.886680 | 45472,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:04.894450 | 45473,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:04.902257 | 45469,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: b
[2025-03-25 23:54:04.906886 | 45470,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: b
[2025-03-25 23:54:04.910921 | 45471,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: b
[2025-03-25 23:54:04.914958 | 45472,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: b
[2025-03-25 23:54:04.920147 | 45473,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: b
[2025-03-25 23:54:07.711906 | 45469,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:07.718846 | 45470,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:07.725407 | 45471,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:07.731251 | 45472,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:07.736188 | 45473,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:11.639716 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:12.386689 | 45469,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:12.390965 | 45470,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:12.397660 | 45478,RMI TCP Connection(2)-127.0.0.1 ] Waiting 1 seconds before retrying
[2025-03-25 23:54:12.435249 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:12.696550 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:13.237376 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:13.261207 | 45483,RMI TCP Connection(6)-127.0.0.1 ] Shutting down acceptor
[2025-03-25 23:54:13.408620 | 45478,RMI TCP Connection(2)-127.0.0.1 ] Waiting 2 seconds before retrying
[2025-03-25 23:54:14.380400 | 45483,Thread-2 ] Starting Acceptor ...
[2025-03-25 23:54:14.387545 | 45483,Thread-2 ] Acceptor 2 restarted
[2025-03-25 23:54:14.476040 | 45483,Thread-3 ] Starting Acceptor ...
[2025-03-25 23:54:14.482603 | 45483,Thread-3 ] Acceptor 3 restarted
[2025-03-25 23:54:15.239236 | 45483,Thread-1 ] Starting Acceptor ...
[2025-03-25 23:54:15.246279 | 45483,Thread-1 ] Acceptor 1 restarted
[2025-03-25 23:54:15.426854 | 45471,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:15.434651 | 45472,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:15.443606 | 45478,RMI TCP Connection(2)-127.0.0.1 ] Waiting 1 seconds before retrying
[2025-03-25 23:54:16.238793 | 45483,Thread-4 ] Starting Acceptor ...
[2025-03-25 23:54:16.245998 | 45483,Thread-4 ] Acceptor 4 restarted
[2025-03-25 23:54:16.323466 | 45483,Thread-0 ] Starting Acceptor ...
[2025-03-25 23:54:16.329837 | 45483,Thread-0 ] Acceptor 0 restarted
[2025-03-25 23:54:16.459684 | 45473,RMI TCP Connection(3)-127.0.0.1 ] Received GET request for key: a
[2025-03-25 23:54:16.476212 | 45469,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: null
[2025-03-25 23:54:16.483078 | 45470,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: null
[2025-03-25 23:54:16.485378 | 45471,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: null
[2025-03-25 23:54:16.487429 | 45472,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: null
[2025-03-25 23:54:16.489412 | 45473,RMI TCP Connection(3)-127.0.0.1 ] Received PUT request for key: a and value: null
[2025-03-25 23:54:19.112081 | 45469,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-25 23:54:19.119758 | 45470,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-25 23:54:19.123017 | 45471,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-25 23:54:19.126156 | 45472,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for all key-value pairs
[2025-03-25 23:54:19.129850 | 45473,RMI TCP Connection(7)-127.0.0.1 ] Received GET request for all key-value pairs
Note that DELETE requests from the client arrive at the server as PUT(key, null) requests. As the log above shows, the server retries requests when acceptors fail in the middle of client operations, so the client can carry on with business as usual, with only occasional minor delays from retry attempts. This demonstrates Paxos's ability to handle failures gracefully while preserving availability and consistency in realistic conditions.
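The DELETE-as-PUT(key, null) mapping and the retry behavior from the log ("Waiting N seconds before retrying") can be sketched as below. This is a simplified, hypothetical illustration, not the actual server code: the class name, `withRetries` helper, and in-memory map are all stand-ins for the real RMI-backed replicas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: DELETE expressed as PUT(key, null), plus a linear-backoff
// retry loop like the "Waiting N seconds before retrying" lines in the log.
public class DeleteAsPutSketch {
    private final Map<String, String> store = new HashMap<>();

    // A PUT with a null value removes the key, mirroring how the server
    // treats a client DELETE as PUT(key, null).
    public void put(String key, String value) {
        if (value == null) {
            store.remove(key);
        } else {
            store.put(key, value);
        }
    }

    public void delete(String key) {
        put(key, null); // DELETE is just PUT(key, null) on the wire
    }

    public Optional<String> get(String key) {
        return Optional.ofNullable(store.get(key));
    }

    // Retry an operation with linearly increasing waits, as the proposer does
    // when acceptors are temporarily down (hypothetical helper, not real code).
    public static <T> T withRetries(Supplier<T> op, int maxAttempts)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                System.out.println("Waiting " + attempt + " seconds before retrying");
                Thread.sleep(attempt * 1000L);
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        DeleteAsPutSketch kv = new DeleteAsPutSketch();
        kv.put("a", "b");
        System.out.println(kv.get("a").orElse("null"));
        kv.delete("a"); // sent to replicas as PUT("a", null)
        System.out.println(kv.get("a").orElse("null"));
    }
}
```

Treating DELETE as a PUT of null keeps the Paxos value type uniform: every consensus round agrees on a (key, value) pair, and removal is just a special value.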
You can also run the run_tests.sh script while the failure simulator is running to see the system keep working in spite of failing acceptors.
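The "Starting failure simulator for acceptor N" / "Shutting down acceptor" / "Acceptor N restarted" lines in the log suggest one simulator thread per acceptor that repeatedly kills and revives it. A minimal sketch of that loop, with a dummy acceptor in place of the real remote object (all names here are assumptions, not the actual implementation):

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a per-acceptor failure simulator: let the acceptor
// run for a random interval, shut it down, then restart it after a pause.
public class FailureSimulatorSketch {
    interface Acceptor {
        void shutdown();
        void restart();
    }

    // The real simulator presumably loops forever; this version is bounded
    // by `rounds` so it terminates, and uses short sleeps for illustration.
    static Thread simulateFailures(int acceptorId, Acceptor acceptor, int rounds) {
        Thread t = new Thread(() -> {
            Random rng = new Random();
            for (int i = 0; i < rounds; i++) {
                try {
                    Thread.sleep(50 + rng.nextInt(100)); // acceptor runs normally
                    acceptor.shutdown();
                    System.out.println("Shutting down acceptor");
                    Thread.sleep(50 + rng.nextInt(100)); // downtime window
                    acceptor.restart();
                    System.out.println("Acceptor " + acceptorId + " restarted");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean up = new AtomicBoolean(true);
        Acceptor dummy = new Acceptor() {
            public void shutdown() { up.set(false); }
            public void restart() { up.set(true); }
        };
        Thread sim = simulateFailures(0, dummy, 2);
        sim.join();
        System.out.println("acceptor up after simulation: " + up.get());
    }
}
```

Because each restart always follows its shutdown in the same loop, every acceptor eventually comes back, which is why the proposer's bounded retries in the log above always succeed.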