I have a system, a microservice architecture platform, built on Node.js. It can run on a single computer or distributed across several. It is a fairly small system, but it is critical that it works correctly.
Under what circumstances would the system fail to work correctly? How much load can it handle? How does it behave under too heavy load?
Stress testing is difficult and expensive. Ideally you have plenty of test clients simulating realistic usage. It can be done, but often not easily. A simple and cheap alternative is to run the system on fewer resources.
My system used to run perfectly on a Raspberry Pi. The tests work fine. I have also kept the integration tests working (although there have been timing issues). However, the other day I tried to restore production data to the Raspberry Pi, and it failed to run properly. The problems were:
- High latency and timeouts
- Heavy swapping
- Escalating retries, making the situation worse
The last point is particularly interesting. Error handling is designed for stability and recovery, but it risks increasing the total load, making the system even more unstable.
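One common way to keep retries from amplifying load is to cap the number of attempts and wait longer before each new attempt, with some randomness so that clients do not all retry at the same moment. The sketch below illustrates that idea only; it is not how my system is implemented, and the names `backoffDelay` and `retryWithBackoff`, as well as the default values, are made up for the example.

```javascript
// Hypothetical helpers for illustration; names and defaults are assumptions.

// Delay before retry number `attempt` (0-based): grows exponentially, is
// capped at `maxMs`, and is scaled by a random factor ("full jitter") so
// that many callers do not retry in lockstep.
function backoffDelay(attempt, { baseMs = 100, maxMs = 5000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls the async function `fn` until it succeeds or `maxAttempts` calls
// have been made, then rethrows the last error instead of retrying forever.
async function retryWithBackoff(fn, { maxAttempts = 4, ...delayOptions } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will actually follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt, delayOptions));
      }
    }
  }
  throw lastError;
}
```

A call would look something like `await retryWithBackoff(() => callService(request), { maxAttempts: 3 })`, where `callService` stands in for whatever request is being retried. The important property is that a struggling service sees a bounded and thinning stream of retries rather than a growing one.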
I did make the system work on the Raspberry Pi again, and in doing so I learned about real problems and fixed them. It is an interesting exercise in finding problems in systems that don’t work properly, and it is a practical way to “measure first, optimize second”.
Does your system work, with a reasonable amount of production data, on a Raspberry Pi?