When programming, thinking small often allows for a quick start, but after a while your project grows and the pain sets in, slowing you down. Out of every such painful experience have come countless good practices for thinking big. However, thinking big is difficult and comes with overhead, and if you think too big there is a risk that you will only think big, and not think very much about your problem and your actual code.
I will give a number of examples of smaller and bigger thinking (don’t get stuck reading the list).
Memory usage
Small thinking: everything fits in RAM
– smaller: everything fits in CPU cache, or CPU registers
Big thinking: using external storage, streaming, compression
– bigger: scaling out
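To make the contrast concrete, here is a minimal Python sketch (the file path and the line-counting task are just hypothetical examples): the small version loads everything into RAM, the bigger version streams the file so memory use stays constant no matter how large the file grows.

    # Small thinking: assume the whole file fits in RAM.
    def count_lines_in_ram(path):
        with open(path) as f:
            return len(f.readlines())   # loads every line at once

    # Bigger thinking: stream the file line by line, constant memory use.
    def count_lines_streaming(path):
        count = 0
        with open(path) as f:
            for _ in f:                 # the file object yields one line at a time
                count += 1
        return count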
Parallelism
Small thinking: single process and thread
Big thinking: multithreading, sending work to other processes
– bigger: scaling out
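As a rough Python sketch of the same idea (the URLs are placeholders, and a thread pool is only one of several big options): the small version does the work in a single thread, the big version hands it to a pool of worker threads.

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    urls = ["https://example.com/"] * 4          # placeholder work items

    def fetch_size(url):
        return len(urlopen(url).read())

    # Small thinking: one process, one thread, one request at a time.
    sizes = [fetch_size(u) for u in urls]

    # Big thinking: a pool of threads works on the requests in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        sizes = list(pool.map(fetch_size, urls))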
Portability
Thinking small: portable by standard compliance
– smaller: single platform
Thinking big: target specific tweaks, build and configuration options
– bigger: target specific dependencies (MySQL for Linux, MS SQL for Windows)
Source management
Small thinking: versioned tarballs
– smaller: just a single local file
Big thinking: a git/svn repository
– bigger: several repositories, bug tracker, access rights
Building
Small thinking: single standard compile command
– smaller: no building required in the first place
Big thinking: make
– bigger: autoconf, tools and configuration required (Babel)
– even bigger: build a build-and-config-system (like menuconfig for Linux kernel)
Testing
Small thinking: assert()
Big thinking: unit tests
– bigger: test driven development, test coverage analysis
– even bigger: continuous integration
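A small Python illustration (the median function is just a made-up example): the assert is quick and dirty, while the unit test can be discovered by a test runner and later wired into coverage analysis or CI.

    import unittest

    def median(values):
        ordered = sorted(values)
        return ordered[len(ordered) // 2]

    # Small thinking: a bare assert, run ad hoc.
    assert median([3, 1, 2]) == 2

    # Big thinking: a unit test that a runner (unittest, pytest, CI) can pick up.
    class MedianTest(unittest.TestCase):
        def test_odd_length(self):
            self.assertEqual(median([3, 1, 2]), 2)

    if __name__ == "__main__":
        unittest.main()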
Configuration
Small thinking: command line options
– smaller: hard coded
Big thinking: configuration file
– bigger: configuration (G)UI
– even bigger: download configuration, find out configuration itself, selection of different configurations (like XML-file, JSON-file or database)
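A hedged Python sketch of the first three levels (the port setting and the file name config.json are hypothetical): hard coded, then a command line option, then a configuration file that overrides the default.

    import argparse
    import json

    DEFAULT_PORT = 8080                      # smaller: hard coded

    # Small thinking: a command line option.
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=DEFAULT_PORT)
    args = parser.parse_args()
    port = args.port

    # Big thinking: a configuration file, overriding the option above.
    try:
        with open("config.json") as f:
            port = json.load(f).get("port", port)
    except FileNotFoundError:
        pass                                 # no file: keep the small setting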
Error handling
Small thinking: crash with error message
Big thinking: log file(s), verbose levels
– bigger: error recovery, using system logs (like Windows event log)
– even bigger: monitoring, choice of different external log systems
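A minimal Python sketch of the two ends (the file names and messages are made up): the small version crashes with an error message, the big version writes to a log file with a verbosity level and carries on.

    import logging
    import sys

    # Small thinking: crash with an error message.
    def read_settings_small(path):
        try:
            return open(path).read()
        except OSError as e:
            sys.exit(f"fatal: cannot read {path}: {e}")

    # Big thinking: log to a file, choose a verbosity level, recover.
    logging.basicConfig(filename="app.log", level=logging.INFO)

    def read_settings_big(path):
        try:
            return open(path).read()
        except OSError:
            logging.warning("cannot read %s, using defaults", path)
            return ""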
UI
Thinking small: single CLI or GUI
Thinking big: build a backend library or server that allows for different UIs
Dependencies (code)
Thinking small: only standard library
Thinking big: require libraries (external and own code broken out to libraries)
– bigger: optional dependencies, supporting different libraries that do the same thing
– even bigger: dependencies can be loaded dynamically during run time
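A Python sketch of the "even bigger" end (ujson and simplejson are examples of drop-in JSON libraries; whether you actually want them is the whole point of the discussion): the code loads whichever serializer happens to be installed at run time and falls back to the standard library.

    import importlib

    # Thinking small: only the standard library.
    import json as serializer

    # Even bigger: optional, dynamically loaded alternatives with the same API.
    for name in ("ujson", "simplejson", "json"):
        try:
            serializer = importlib.import_module(name)
            break
        except ImportError:
            continue

    serializer.dumps({"ok": True})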
Dependencies (databases, services)
Thinking small: no dependencies
Thinking big: external storage
– bigger: allow multiple clients against common storage
– even bigger: distributed, scaled-out storage
State
Small thinking: functions and data
– smaller: rely on global data
Big thinking: encapsulation (OO-style), immutable data (FP-style)
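A compact Python sketch of the three styles (the account/balance example is made up): global data, OO-style encapsulation, and FP-style immutable data.

    from dataclasses import dataclass

    # Smaller: rely on global data.
    balance = 0

    def deposit(amount):
        global balance
        balance += amount

    # Big thinking, OO-style: encapsulate the state behind methods.
    class Account:
        def __init__(self):
            self._balance = 0

        def deposit(self, amount):
            self._balance += amount

    # Big thinking, FP-style: immutable data, return a new value instead.
    @dataclass(frozen=True)
    class Balance:
        amount: int = 0

        def deposit(self, amount):
            return Balance(self.amount + amount)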
Generics
Small thinking: functions are specific for data
Big thinking: generic functions by templates, interfaces, generators, iterators
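In Python the generic route usually means relying on the iterator protocol and duck typing rather than templates; a small sketch (the summing example is made up):

    # Small thinking: the function is written for one specific kind of data.
    def sum_int_list(numbers):
        total = 0
        for n in numbers:
            total += n
        return total

    # Big thinking: a generic function over any iterable of any type supporting +.
    def total(items, start):
        result = start
        for item in items:
            result = result + item
        return result

    total([1, 2, 3], 0)                    # 6
    total(["a", "b"], "")                  # "ab"
    total((x * x for x in range(4)), 0)    # works on a generator as well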
Performance
Small thinking: code is fast enough
Big thinking: architecture allows scaling out for more performance as required
Deployment
Small thinking: manual copy-replace is good enough
Big thinking: testing, continuous integration, rollback, zero-downtime
Automation
Thinking small: automation not needed since all tasks are simple enough
Thinking big: automation makes complex tasks fast and easy
One size does not fit all
The important thing to understand is that there is no silver bullet. Each program, problem or project has its own requirements and its own sweet spot of small-vs-big. But this may change over time; even if you need to be a bit big later on, it may not help in the beginning.
Perhaps this is obvious: it is not meaningful to say that a full CI environment is better than assert() (and you may argue that they are entirely different things). Having global data is not (for all problems) worse than having completely immutable state. And so on.
Your need for big varies within a project: you may need a very big and configurable build process to build something that has very small requirements when it comes to scaling out.
There are no safe default choices
You need to make qualified small-vs-big choices. If you are an experienced programmer you often don’t need to think much about it. If you work within an environment where you already master the tools that are available (perhaps even mandatory) you can use them with little overhead. However, if you take your environment (perhaps just an IDE, or more) for granted and rely on it, it may not be so easy for someone else to pick up where you left off.
Just as you can fall behind others who use better tools, you can grow fat and fall behind those who use fewer tools (and stay smaller).
If in doubt: start small
Going small in every aspect is often not good enough (except for very isolated problems). But it can be a good start and if (or when) it fails you will learn from your mistake. You will understand how to architect your software better and you will understand why some (big) tools and practices really exist. That is good wisdom!
Going big in every aspect is most definitely going to make you fail. You may need to do it for building systems like SAP or Windows (but such large projects often do fail). If you fail with something far too big it is hard to learn from it. Chances are you never really got down to the requirements, and chances are much energy was spent just getting tools and frameworks integrated into a development and operations environment that worked at all.
Small sometimes goes a long way
There are often theoretical discussions about small-vs-big. Big often looks attractive and powerful. However, some problems are just hard regardless of how you solve them, and a small solution is often more right on target.
There was a macro-kernel vs micro-kernel discussion. A micro kernel is a big solution: more encapsulation, more isolation, less global data, more dynamic loading and so on. Linux is obviously more successful than HURD (the GNU micro kernel), mostly because it actually works.
Agile and Refactoring
Agile and refactoring are about encouraging you to start small, make things that are good enough for now, and fix them later on (if ever needed). Often the problem down the road is not what you expected when you started.
Architecture, Microservices, UNIX
The UNIX principle is that everything is a program that does one thing well.
Microservices are much the same thing, except the idea spans several networked services.
This works because, most of the time and for most purposes, the developers (of UNIX and of microservices) can think small. Most programs in a UNIX system, like most services in a microservice architecture, are for most practical purposes small programs or small services.
UNIX: some programs need to be highly secure, some need an interactive UI, some need to log, some have high performance requirements, some have dynamic dependencies and some are better not written in C. This is why you should build a microservice architecture (not a monolith), and this is how you should build it (unless you are as good as Torvalds and can land a monolith in C – but that works thanks to very good architecture and practices – and Linux is still just the kernel in a much bigger system).
Limited time
All software projects have limited time available. Time is spent on:
1. Understanding requirements
2. Producing code that correctly and efficiently matches the requirements
3. Test and deployment
4. Solution architecture
5. Tools and frameworks: understanding and integration
#1 delivers value even on its own (sometimes a technical solution is not even required).
#4 and #5 only deliver value if they save total time (by lowering #2 and #3).
#2 is sometimes just not possible without #5; in that case, please go ahead with #5.
But if #2 takes one week when you use Notepad to code a single index.html file containing HTML+CSS+JavaScript (and this solves the requirements), then there had better be a very good case for spending time on #4 and #5 (going big) instead of just solving the problem (staying small).
#4 and #5 produce what I call invented problems: problems that you did not have in the first place, that are not related to your requirements but come with your tools. The most obvious example is licensing issues. If you go multithreading and/or use an external database you suddenly have deadlocks, race conditions, transactions and semaphores to worry about: is that price worth it for what you get from the database or multithreading? Deployment (and server configuration) is absolutely necessary, often rather complicated, and delivers no value to the customer whatsoever.
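For example, here is a minimal Python sketch of such an invented problem (the counter is a made-up workload): the lock exists only because we chose to go multithreaded; in a single-threaded program the problem does not exist at all.

    import threading

    counter = 0
    lock = threading.Lock()

    def work():
        global counter
        for _ in range(100_000):
            with lock:           # the invented problem: a race condition to guard
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)               # 400000 – but only because of the lock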
Always ask yourself: how hard would it be to solve this problem using the smallest reasonable set of tools?
Maintain vs Replace
Many big practices are about producing maintainable code. Often this never pays off:
- There is no need for the code anymore
- The code does what it needs to do, and no change is required
- Even though the code itself is maintainable, no one understands the problem and the solution well enough to actually improve (or even change) it
When (or if) the moment of change actually comes, a fresh start is often the best solution anyway. If programs are made small and do one thing well (so that it is quite easy to test that another program can replace one of them), replacing them is not a big deal.
This means that ugliness (global variables, lack of encapsulation, hard coded limitations, lack of proper test coverage, inability to scale, and so on) often is not a problem. On the other hand, a (big) program that is not fit for purpose (not correct and efficient) never produces much value in the first place.
Performance (and Scaling)
Golden rule of optimization:
1. Don’t
2. Experts only: see (1)
This is not entirely true but most of your code is not performance critical. In computing, there are two ways you can get faster:
- Go small: find ways to make your code require less resources
- Go big: assign more resources to run your code
The truth is that modern hardware is extremely powerful. Even a Raspberry Pi V1 (with 700MHz CPU and 512MB RAM) can serve enormous amounts of network requests or crunch amazingly many numbers. If a Raspberry Pi is not enough for you, you either have
1. very many users
2. a very complicated/large/heavy problem
3. coded a solution that mostly wastes resources
If you know that #1 (only) is your case, go ahead and scale out big. Be sure to know your bottlenecks and seriously consider your storage model.
If #2 is your case you need to sit down and think.
If #3 is your case, you should have stayed small from the beginning. It is probably cheaper to rewrite significant parts of your solution in C (or another language that uses minimal resources) and keep all data in RAM, than it is to scale your code out.
Availability (and Redundancy)
You may need high availability: downtime, unexpected or not, is expensive.
The big solution is to go for redundancy: if one goes down the other takes over. This can be the right thing to do – when everything else has already been tried. Sometimes the cure is worse than the disease.
The small solution is to keep your program simple and, when something unexpected happens, let it crash. This way you will quite soon (before production) have nailed down the critical errors. And if you cannot make it stable, no redundancy or fault-tolerant environment will really save you.
Conclusion
When going big, understand the cost. The road to hell is paved with good intentions. Beware of grand architectures.