Introduction
Speed is a key selling point for every FinTech crypto project that seeks to reinvent a corner of the industry. To a bystander, blockchain technology may seem inherently faster than traditional methods, since almost every project advertises its service as faster, better and cheaper. This is not true. Blockchains run at different speeds, and some are not fast enough for the FinTech environment. Recently, Scandiweb went on a quest, on behalf of an external project, to find a solution to the low transactions-per-second (TPS) rate of the Ethereum blockchain. In our search for a fast, reliable and cost-effective blockchain, we came across JPMorgan’s Quorum, which is an Ethereum fork. On paper, it looked like just what we were looking for to meet our partner’s challenging goals:
- Free transactions
- Instant transactions
- Large throughput
- Managed privacy of transactions
In this article, you will learn about the extensive performance testing we went through to determine whether Quorum was the right solution for us. We will take a look at our methodology, then proceed to the results and hurdles encountered and, in conclusion, share the valuable knowledge gained.
Methodology & Set-up
First, we needed a methodology for running these tests and getting the results we needed. Here is what we decided:
- We will have 2 types of nodes — blockchain nodes and spam nodes. Spam nodes represent people who connect to blockchain nodes to spam transactions.
- We will set up and manage an infrastructure on AWS that resembles the real world on a small scale — blockchain and spam nodes spread throughout the whole world (in the regions that AWS allows us to do so).
- Each spam node will blast transactions to its assigned blockchain node.
- The transaction spamming will happen using a web3.js library and several Node.js scripts written in-house to facilitate the transaction spamming and later analysis of results.
- On each spam node, transactions will be sent from one account to an unlimited number of random accounts.
- Once transaction spamming is done, we will read the blockchain to determine the effective TPS rate, and read each spam node’s saved local data to determine the latency between sending a transaction and its inclusion in the blockchain.
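The last step of this methodology can be sketched in a few lines. This is a minimal illustration in Python, not the in-house Node.js scripts: the `Block` structure, the function names and the choice to measure TPS between the first and last non-empty block are all assumptions made for the example. It also assumes that blocks are sorted by time and that timestamps are already expressed in seconds.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Block:
    timestamp: float  # block time in seconds (assumed already normalized)
    tx_hashes: list   # hashes of the transactions included in this block


def effective_tps(blocks):
    """Transactions per second between the first and last non-empty block.

    Assumes `blocks` is sorted by timestamp.
    """
    busy = [b for b in blocks if b.tx_hashes]
    if len(busy) < 2:
        return 0.0
    span = busy[-1].timestamp - busy[0].timestamp
    total = sum(len(b.tx_hashes) for b in busy)
    return total / span if span > 0 else 0.0


def average_latency(sent_at, blocks):
    """Mean delay between sending a transaction and the time of its block.

    `sent_at` maps a transaction hash to the local send time in seconds,
    as a spam node might have recorded it. Transactions that never made
    it into a block are ignored.
    """
    included_at = {h: b.timestamp for b in blocks for h in b.tx_hashes}
    delays = [included_at[h] - t for h, t in sent_at.items() if h in included_at]
    return mean(delays) if delays else None
```

Measuring latency this way relies on the spam node's clock and the block timestamps being reasonably synchronized, which is worth keeping in mind when nodes are spread across regions.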
AWS node infrastructure was managed using Terraform, which allowed us to quickly spawn as many nodes as we wanted. The software side was managed through user data, which gave us the same setup on every spawned node.
A sample from our Terraform code — a Quorum node.
Onboarding with Quorum was fairly straightforward since we have extensive experience with Ethereum. Initially, there were issues with permissioned nodes not connecting to each other, which were solved by using a fixed static-nodes.json permissions file (instead of dynamically updating it with every new node added). This setup was stable and we proceeded to the testing phase!
The Tests
We performed our first tests using a conservative rate just to find out if everything works out of the box. We sent 10 transactions per second from 10 different clients. The results were what we expected — the blockchain could process around 100 TPS with ease. This, of course, is a very low number, so we gradually went higher until we hit a wall at around 100 transactions being sent per second from a single client.
Error #1
At this point, transactions began failing with this error:
*Error: Invalid JSON RPC response: “Error: read ECONNRESET\n at _errnoException (util.js:1022:11)\n at TCP.onread (net.js:615:25)”*
A quick investigation turned up this message in the server logs:
*TCP: request_sock_TCP: Possible SYN flooding on port 22000.*
We followed up with a more extensive investigation into why transactions were not going through. We discovered that, with the Web3.js HTTP provider, every transaction sent opened a new TCP connection on the server, which quickly exhausted the allowed file descriptor limit.
We tried to increase this limit. However, it turned out that Quorum uses an older version of Geth, which basically ignores the file descriptor limit that you set.
Solution #1
The best solution we found was to use WebSockets instead of HTTP. A WebSocket provider keeps one persistent connection open instead of opening a new one for every transaction, so we could spam transactions at any rate we wished.
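The difference between the two transports can be illustrated with a toy model. This sketch does no real networking: `CountingServer` is a hypothetical stand-in that only counts open sockets, and it assumes that connections opened during a burst are not released before the burst ends.

```python
class CountingServer:
    """Toy stand-in for a node's RPC endpoint that only counts open sockets."""

    def __init__(self, fd_limit):
        self.fd_limit = fd_limit
        self.open_sockets = 0

    def connect(self):
        # Refuse new connections once the descriptor limit is reached.
        if self.open_sockets >= self.fd_limit:
            raise ConnectionResetError("file descriptor limit reached")
        self.open_sockets += 1


def send_per_request(server, n_tx):
    """HTTP-style: every transaction opens its own connection.

    Returns how many transactions got through before the limit was hit.
    """
    for sent in range(n_tx):
        try:
            server.connect()
        except ConnectionResetError:
            return sent
    return n_tx


def send_persistent(server, n_tx):
    """WebSocket-style: one connection is opened once and reused for every send."""
    server.connect()
    return n_tx
```

With a limit of 100 descriptors, `send_per_request` stalls after 100 transactions, while `send_persistent` uses a single socket regardless of how many transactions are sent.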
Error #2
The fun didn’t end there. Next, the following error was returned while creating transactions:
*Error: Number can only safely store up to 53 bits*
What made it particularly difficult to understand was that it did not appear while using the HTTP provider, but suddenly popped up when connecting via WebSockets.
A deeper investigation of this error revealed an issue with web3.js caused by the nature of Quorum. Inside the web3-core-helpers dependency, there’s some formatter code, which contains this line:
*utils.hexToNumber(block.timestamp)*
It’s very straightforward: it just converts a hexadecimal string into a JavaScript number. The issue is that Ethereum returns the block timestamp in seconds, but Quorum returns it in nanoseconds (since block times can be significantly shorter), and that value does not fit into the 53 bits that a JavaScript number can store safely.
Solution #2
For the purposes of this performance testing, we simply divided the timestamp by 10³, which made it just small enough to fit into 53 bits without losing much precision.
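The arithmetic behind this workaround is easy to check. The sketch below uses Python integers, which do not overflow, to compare timestamps against the largest integer a JavaScript number stores exactly (2^53 − 1). The sample block time is a hypothetical 2018-era Unix timestamp picked only for illustration.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript number stores exactly


def fits_in_js_number(value):
    """True if `value` is within JavaScript's safe integer range."""
    return 0 <= value <= MAX_SAFE_INTEGER


# Hypothetical block time in seconds, chosen only for illustration.
timestamp_s = 1_527_811_200
timestamp_ns = timestamp_s * 10**9    # nanosecond timestamp, as described for Quorum
timestamp_us = timestamp_ns // 10**3  # after dividing by 10^3

print(fits_in_js_number(timestamp_s))   # True
print(fits_in_js_number(timestamp_ns))  # False
print(fits_in_js_number(timestamp_us))  # True
```

Dividing by 10³ turns nanoseconds into microseconds, roughly 1.5 × 10^15 for this sample, which sits below the limit of about 9 × 10^15.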
Significant problem
Now that we could finally spam transactions, everything seemed to go smoothly, until it did not. We ramped up our tests to heavier spamming of around 150 transactions per second from a single client (1,500 TPS in total). At first, everything seemed to be great: transactions were going through and nothing was being dropped. After sending around 30 thousand transactions, we stopped the scripts and waited for the transactions to be mined. Even after 20 minutes of waiting, only some 3–4 thousand had been.
Attempted solution #1
Our first thought was that the transaction nonce (the per-account counter that protects against double spending) was not being updated fast enough, so multiple transactions might be going out with the same nonce, even though we did not see an error indicating that. To rule this out, we rewrote the transaction spamming scripts to update the nonce manually. This guaranteed that the nonce was correct for every transaction, but it did not resolve the issue.
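Manual nonce updating can be sketched as a local counter that hands out consecutive values instead of asking the node before every send. This is an illustration of the idea in Python, not the actual Node.js scripts. `NonceManager` and `build_transactions` are hypothetical names, and the starting nonce is assumed to have been fetched from the node once, before the burst.

```python
import threading


class NonceManager:
    """Hands out consecutive nonces locally for a single sending account."""

    def __init__(self, starting_nonce):
        self._next = starting_nonce
        self._lock = threading.Lock()

    def next_nonce(self):
        # The lock keeps nonces unique even if several threads send at once.
        with self._lock:
            nonce = self._next
            self._next += 1
            return nonce


def build_transactions(manager, recipients, value=1):
    """Build unsigned transaction dicts, each with its own nonce."""
    return [
        {"nonce": manager.next_nonce(), "to": to, "value": value}
        for to in recipients
    ]
```

Because every transaction draws its nonce from the same counter, duplicates and gaps are ruled out on the sending side, which is what allows nonce handling to be excluded as the cause.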
Not our fault
We then dug even deeper, again expecting some issue on our end, and found that Quorum has issues with transaction pooling. The transaction pool would freeze: there could be 20k transactions waiting forever to get mined. After extensive research, we stumbled upon a Chinese article explaining the issue, which said that there is no good solution other than limiting the transaction sending rate, checking the transaction pool before sending transactions, or increasing the pool limit.
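Of the workarounds listed, checking the pool before sending is the easiest to sketch. The function below is a minimal illustration and not code from our test scripts. `send` and `pending_count` are hypothetical callables supplied by the caller (the latter could, for instance, wrap a pool status query such as Geth's `txpool_status` RPC method, assuming the node exposes it), and the threshold of 1,000 is an arbitrary example value.

```python
import time


def send_with_backpressure(transactions, send, pending_count,
                           max_pending=1000, poll_interval=0.1,
                           sleep=time.sleep):
    """Send transactions, pausing while the pool holds max_pending or more.

    `send(tx)` submits one transaction and `pending_count()` returns the
    current number of pooled transactions. Both are supplied by the caller.
    """
    sent = 0
    for tx in transactions:
        # Wait for the node to drain its pool before adding more load.
        while pending_count() >= max_pending:
            sleep(poll_interval)
        send(tx)
        sent += 1
    return sent
```

This trades peak sending rate for a bounded pool. Note that the loop waits indefinitely if the pool never drains.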
Attempted solution #2
For the purposes of this TPS testing, we tried increasing the pool limit, and it seemed to work. However, this did not improve the TPS; it actually decreased the average TPS. If there are, say, 10,000 transactions pooled, the nodes can take a significant time to process the pool, or the transactions can even get stuck forever, as explained in the article mentioned above.
For example, in our tests, processing 15k transactions in the pool took around 10 minutes.
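To put that figure in perspective, the drain rate it implies can be worked out directly. The helper name below is made up for this example, and the inputs are the approximate numbers quoted above.

```python
def drain_rate_tps(pooled_transactions, drain_seconds):
    """Average number of transactions processed per second while draining a pool."""
    return pooled_transactions / drain_seconds


# About 15k pooled transactions processed in about 10 minutes.
print(drain_rate_tps(15_000, 10 * 60))  # 25.0
```

A drain rate of roughly 25 TPS is far below the rate at which the transactions were sent, which is why a large backlog drags the average TPS down.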
Further Tests
Long-term, high-load transaction spamming will not yield high TPS results, since the pool fills up and lowers the average TPS. To simulate burst activity and find the effective transaction rate, we moved on to small-scale tests, in which transactions are spammed for short periods of time.
We focused on C5 instances, as they are compute-optimized. The i3 (high I/O) instances provided no significant improvement over these results in our testing.
Here are our selected best results on 3 different instance sizes using small-scale testing:
What’s notable in these results is the massive difference in latency. The average latency on c5.2xlarge was 5.2 seconds, whereas on the largest instance, c5.18xlarge, it was only 0.34 seconds. As expected, the maximum and average transaction processing rates also increase with the performance of the instances.
Conclusions
We gained a lot of interesting insights and results. Some of the main conclusions are:
- Proper Quorum blockchain setup is not trivial
- Quorum has issues with transaction pool processing, which can be painfully slow or never finish
- The effective small-scale TPS rate on Quorum is on average 2.3 thousand transactions per second, using a c5.18xlarge instance from AWS.
What does this mean for the project? Despite the flaws mentioned, the benefits of Quorum far outweigh the negatives, and it is a significant improvement over the Ethereum blockchain! Currently, we see Quorum as a good fit for our project, and we plan to continue working with it to further improve its performance and make it ideal for our needs.