I have worked for 20 years creating business value and utility using software. A common discussion is: should we build this piece of software ourselves, or should we search for something that already solves our problem and download or buy that?
I will share my experiences. These are just anecdotes, of course. But they are also about learning from mistakes that I have seen and experienced myself.
BizTalk 2002
My first job was about implementing BizTalk 2002 as a message broker between several ERP systems. In hindsight, the actual requirements were:
- Moving XML files from one network drive to another
- Simple XML file mapping / transformation (in the actual case, this could probably have been avoided altogether with a proper architecture in the first place)
- Error handling for bad messages – being able to see what was wrong with a particular message
- Different queues, so that many messages that are not time critical do not delay an urgent message
- Being able to flush queues in case large amounts of messages are received in error (this happened several times)
I think a competent programmer could have built this (in 2002) using Python, IIS (for GUI) and the filesystem on Windows in a few weeks.
We were using BizTalk 2002 and MSMQ (Microsoft Message Queueing) instead of file shares and we ran into problems like:
- GUI-based BizTalk configuration stored in a SQL database made automated configuration and deployment a nightmare.
- MSMQ was limited to (somewhat less than) 2GB of total message storage, and viewing and deleting messages in MSMQ was quite cumbersome when there were thousands or tens of thousands of messages
- BizTalk throughput was quite horrible, since messages passing through the platform were read from and written to multiple SQL tables along the way
- Errors were logged in the Windows Event Log (all in the same log). All failed messages, regardless of type or integration, appeared in a single view, identified only by a GUID, and only cross-searching the event log would give details about a particular message.
- All messages were being processed in a single queue, with no way to give anything priority.
- Deleting messages that had been received in error took literally hours of manual work in the GUI
At the time, the license cost alone for a cluster with two BizTalk 2002-servers and two SQL Servers amounted to approximately 12-24 months of salary for a programmer.
The SQL Server cluster is its own story. We had dozens or hundreds of SQL Servers in our organisation, but this was the only application considered critical enough for a cluster. So nobody knew how the cluster really worked, and we had many problems with it. After a few years we simply turned off one of the nodes, and the remaining node did fine until the plug was eventually pulled on the entire system.
BizTalk 2002 probably lived in production for 10-12 years; for half of that time neither BizTalk, Windows 2000, nor SQL Server 2000 was supported by Microsoft.
Upgrading to BizTalk 2004 was impossible, as it was a completely different product, and it was in many ways even worse than BizTalk 2002. We did evaluate it thoroughly.
webMethods
After the disaster with BizTalk 2002 we decided to get a new message broker. By now I had learnt the real requirements, and I could have built what we needed in Python on IIS (or in any reasonable language on any reasonable web server). However, relying on our own code was too scary for management, so after a long evaluation we purchased webMethods (for a license cost of more than a yearly salary for a senior programmer). webMethods was not particularly suitable out of the box, but it was a decent Java-based development and server environment, with good support for working with XML. So I built the integration platform I knew we needed using webMethods instead of Python. This was a great success for many years. webMethods was eventually purchased by Software AG, and the direction of the product changed, but the core features we had limited ourselves to were still in place (had we relied on many of the advanced features of webMethods, our architecture and investment would have broken down much sooner). Finally, a conflict over licensing, not technical issues, made my employer invest many man-hours in a 1-to-1 migration from webMethods to another (proprietary) platform, but by that time I was busy doing other things. If, on the other hand, we had built the message broker we needed in Python, it could still have been running 20 years later with little need for maintenance.
SQL Server Integration Services
Data was to be exported from an Oracle database to text files, and then loaded into a MS SQL Server database. We are talking about 1 GB of data every day, each load completely replacing the old data. However, the two databases had different purposes and different structures, so it was not entirely trivial.
Of course, some genius project manager decided that some genius SQL Server consultants were going to load data into SQL Server using SSIS (a new version of it, with which none of the consultants had any real proven experience). After spending about a coffee break on modelling the SQL database and months of work with SSIS, the SSIS package was a catastrophe. No data validation, no error handling, unacceptable performance. The data exported from Oracle was not 100% good, that was obvious, but the SSIS package could not output any sensible errors that could be passed back to the Oracle people (remember, same story as with BizTalk: all happy path and no reasonable error handling is what you get when you pay Microsoft large amounts of money for advanced software).
At this point I said: I will help by writing a Python script that validates the Oracle data. How do you validate that the data can be properly loaded? I wrote a script that loaded the data into a SQL database. It took two weeks to program in Python, and we had fully working error handling, validation and good performance. Two consultants and many months of SSIS work left the project.
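A hypothetical sketch of what the validate-then-load idea looks like: check every exported row before it goes anywhere near the database, and collect per-line errors that can be reported back to the source system. The field layout (three semicolon-separated columns) is invented for illustration:

```python
# Sketch of row-by-row validation with per-line error reporting.
# The column layout (article;quantity;price) is an invented example.
def validate_rows(lines):
    """Yield (line_no, row, error) for each input line.
    error is None for good rows; row is None for bad ones."""
    for no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(";")
        if len(fields) != 3:
            yield no, None, f"expected 3 fields, got {len(fields)}"
            continue
        article, qty, price = fields
        if not qty.isdigit():
            yield no, None, f"quantity not an integer: {qty!r}"
            continue
        yield no, (article, int(qty), price), None
```

The good rows go on to the database load; the bad rows become a report for the people on the exporting side, which is exactly what the SSIS package could not produce.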
SQL Server
The same data from above was now stored in a SQL Server that was the backend for a customer-facing statistics application. The data was structured in hierarchies (locations, customers, articles), and writing SQL queries for arbitrarily deep recursive hierarchies is not that nice. It did not perform well either: in 2010 a typical report could take 10 seconds to generate on a very decent Windows server. This can be done properly with star schemas and denormalised data, but that was not how the consultants thought while they struggled to get any data imported with SSIS.
I was recovering from illness and was free to explore some concepts freely. The first Raspberry Pi was popular at the time (700 MHz single-core ARMv6, 512 MB RAM). I took the data from the SQL database above and tried to squeeze 2 GB of SQL-exported data into a file that would fit in the RAM of the Raspberry Pi. It was easy. I basically just made an array of C structs that contained denormalised records. All strings I put in a separate memory space, removing duplicates, and referenced them from the C structs. It all fit in a 400 MB file on the Raspberry Pi SD card, which gave me 100 MB for Linux, the web server and the web application.
I wrote a CGI binary in C, and that binary mmap-ed the 400 MB file, did a “full table scan” and returned a few basic reports. That took less than a second, much faster than the Windows server with SQL Server on vastly better hardware. This was just a proof of concept, but it was surprisingly simple and straightforward to write in C, and traversing recursive hierarchies is a pleasure in C compared to SQL.
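As a rough illustration of the layout, here in Python rather than C: fixed-size packed records holding numbers plus offsets into a shared, deduplicated string pool. The field names and record layout are invented; the original used C structs and mmap:

```python
# Sketch of the packed-records-plus-string-pool layout. Each record is
# 12 bytes: two offsets into a deduplicated string pool, and a quantity.
import struct

RECORD = struct.Struct("<iii")  # customer offset, article offset, quantity

def build_blob(rows):
    """rows: iterable of (customer, article, qty).
    Returns (packed records, string pool)."""
    pool, offsets = bytearray(), {}
    def intern(s):
        if s not in offsets:           # store each distinct string once
            offsets[s] = len(pool)
            pool.extend(s.encode() + b"\x00")  # NUL-terminated, as in C
        return offsets[s]
    records = b"".join(RECORD.pack(intern(c), intern(a), q)
                       for c, a, q in rows)
    return records, bytes(pool)

def scan(records, pool, customer):
    """A 'full table scan': sum quantities for one customer."""
    want = pool.find(customer.encode() + b"\x00")
    return sum(q for c, a, q in RECORD.iter_unpack(records) if c == want)
```

Because every record has a fixed size and the strings are interned, the whole dataset becomes one flat byte blob that a C program can mmap and scan linearly, which is essentially what the proof of concept did.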
I am quite sure a properly designed SQL database would have performed just as well. But the thing is that SQL is used because it is supposed to help, when in fact it is not particularly suitable for full-table-scan style reports and hierarchical data. Even though competent people can make fantastic things with SQL, it is not easy, and in many cases not easier than just writing real code instead.
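To illustrate why hierarchies are nicer in ordinary code: rolling values up an arbitrarily deep tree is a few lines of recursion, where SQL would need a recursive CTE. The tree below is an invented example:

```python
# Roll up leaf values over an arbitrarily deep hierarchy.
# tree maps a node to its children; values maps leaf nodes to numbers.
def rollup(tree, values, root):
    """Return the total for root, including all of its descendants."""
    total = values.get(root, 0)
    for child in tree.get(root, []):
        total += rollup(tree, values, child)
    return total
```

The depth of the hierarchy never appears in the code, which is exactly what is awkward to express in plain SQL.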
Optimization
In the company we had an old resource optimisation module (travelling salesman type problem), implemented directly in an ERP system. That system was to be replaced and it was obvious that the optimisation code could not be migrated in any way. The original developer did not want to make a new implementation in a new language. Many options were evaluated and no good solutions were found.
Finally I said: give me two weeks and I will write a proof-of-concept optimiser, and if you are happy with it we can add the extra features and make it production-worthy.
Inspired by my C CGI project above, I wrote this thing in C, and it was faster and better than the old module, which had taken several minutes per run and occasionally even blocked parts of that ERP system. My CGI program typically ran in less than a second. It was so fast that most other operations, internal to the new ERP system, were slower than calling my external C CGI code, which was solving an NP-hard problem.
Obviously people were sceptical of my choice of C, and even more sceptical when they learnt that I used C89 ANSI C with zero dependencies (no libraries for CGI, HTTP, SOAP, etc.). But I had little need for them.
Now, 10 years later, a new project was again replacing the ERP system. This time everybody was very happy to see that it took very little effort to put the C CGI program into a Docker container and run it in the cloud. It was also no problem adding a few features and improvements to the 10-year-old code. Imagine if this had been written with some 2013 .NET web technology and a lot of dependencies; it would have been much harder to move on.
My role in this
I understand if the reader at this point wonders why I let the projects fail miserably all the time when there was an easy solution. The truth is that as a junior developer it is hard to see problems in the big picture, and even harder to change people's minds. Even as a senior developer I had my role and responsibilities (in other projects or teams), and often the only thing I could do was speak my mind and then watch the train crash. I also suffered burnout from trying to take responsibility for things I was not responsible for.
Management often wants big vendors that they can negotiate with and complain to, rather than putting their trust in their own developers.
I now had enough experience, and there were enough failed projects, for my words to carry a bit more weight with management.
Learning from this
I was privileged in the sense that I stayed for many years in the organisation and saw what became of the systems and code long after the projects were closed. Many developers are sent into projects with limited background information and limited (often exaggerated) experience of the project tools, and then they leave the project around the time the system goes live.
What happens a month, a year, or five years after go-live, they never know. The entire industry of software consulting misses out on this long-term feedback. Instead it is obsessed with ever-new knowledge of new, unproven versions of relatively short-lived tools.
I stayed in the organisation because I wanted to learn.
I realised that I need to work with people who share my passion for delivering long-term stable solutions, not jump to new projects and technologies every 6-12 months.
A new way of doing things
I formulated some principles for new projects and a new platform:
- Everything should be made as simple as possible, but not simpler (Einstein)
- Text files over binary files, for data and programs, whenever possible
- Use local files over any other form of storage, whenever possible
- Minimize work related to dependency lifecycles (upgrades, retirement of components)
- Often it is better to rewrite something from scratch, so it is more important to build small, replaceable pieces of software that do one thing and do it well (the Unix principle) than to implement things perfectly or even document the internals
- Performance matters: slow code itself causes instability, bugs, extra work and problems
- For a developer, real learning is about methods, algorithms, concepts, analysis of requirements, design, testing – ideas that are valid decade after decade. Learning tools with short life cycles is a waste of time and effort
- Most time should be spent learning about the (business/customer) problem (domain) and writing value-adding code.
- Working software is often replaced not because of new requirements but because the underlying technology is obsolete or unsupported. This is waste. Often, progress is not possible because we are only dealing with yesterday's problems. Thus, software should be based on standards and stable products. As a developer I am intrigued by solving new problems with new code, not by replacing old working systems while creating no real business value.
- Linux is superior to Windows. People who only work with Windows, or who appreciate Windows, have slowly over the years been trained into bad taste when it comes to software design choices.
Apart from these, defensive programming (a focus on error handling) and the Agile Manifesto are good principles.
On Complexity
Another critical skill, when facing any software problem, is being able to ask yourself how long it would take to implement a solution in a suitable general-purpose language of your choice (C, Python, JavaScript), using just the standard library. If you know your problem can be solved in two days using C, with 15 lines of JavaScript and its standard library, or in two weeks using Python, you may not need a dependency (which will cost you time to learn and configure, require you to keep it updated, may be upgraded or abandoned, and will add complexity and possibly bugs to your code). More often than not, the solution is already available as a standard command in your Linux shell, or in the standard library – if you just look for it and know what you have.
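A small illustration of the point: a per-customer total from a CSV export needs no third-party reporting library; the standard library alone is enough. The column names are invented:

```python
# Summarise a CSV export using only the standard library:
# no ORM, no reporting framework, no dependency lifecycle to manage.
import csv
import io
from collections import defaultdict

def totals_by_customer(csv_text):
    """Return {customer: total amount} from CSV text with a header row."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["customer"]] += float(row["amount"])
    return dict(totals)
```

Five lines of logic, and nothing to upgrade, patch or replace next year.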
I have found that many productivity tools (like XML mappers) simplify the easy cases and complicate the difficult cases. If you have a lot of simple cases and a lot of stupid programmers, perhaps that makes sense. But if your work is to solve problems, you are not helped by tools that assume your problem is easy. When you have a difficult problem, you do not want to solve the problem while also working around the tool. This is somewhat similar to the happy path versus error handling. Many tools and technologies (like Promises) seem to simplify the happy path, but when error handling is the priority (which it always should be) the tools often don't help much.
TypeScript is supposed to be superior to JavaScript because it checks parameter types. I learnt programming in Ada, so I am not impressed. If you are serious about type checking or validation, you need to check that numbers are within valid ranges, that strings represent valid things, that arguments are coherent with each other, and that objects make sense in the real world. TypeScript catches some stupid, simple errors, but it does not help you define and validate real business objects in a stable way.
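What that kind of validation looks like in practice, beyond type checking: ranges, formats, and coherence between fields. The order rules here are invented examples, not from any real system:

```python
# Validating a business object means ranges, formats and cross-field
# coherence -- things no static type system checks for you.
import datetime

def validate_order(order):
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    if not (1 <= order.get("quantity", 0) <= 10000):
        errors.append("quantity must be between 1 and 10000")
    if order.get("unit_price", -1) < 0:
        errors.append("unit_price must not be negative")
    if order.get("currency") not in ("EUR", "SEK", "USD"):
        errors.append("unknown currency")
    try:
        ship = datetime.date.fromisoformat(order.get("ship_date", ""))
        if ship < datetime.date.today():
            errors.append("ship_date is in the past")
    except ValueError:
        errors.append("ship_date is not a valid ISO date")
    return errors
```

Every one of these rules would pass a TypeScript compile: the quantity is a number, the currency is a string. Whether they make sense is a different question, and that is the one that matters.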
How is it going?
For almost 10 years now I have been running a software platform with multiple applications based on Node.js and Vue.js and the principles mentioned above. Stability and performance are good, new code is deployed to production almost daily, and test coverage is decent. Even better, the business has confidence in what I do, so I have more or less complete freedom as developer/architect, and I spend no time on project plans, budgets, time reporting or bureaucratic processes.
I will describe a few relevant choices.
Angular.JS
In 2014 Angular.js (Angular v1) seemed to be a good choice and it was quite trendy. Since then the people behind Angular.js have abandoned it, Vue and React have emerged, and current versions of Angular are mostly not compatible at all with Angular.js.
I adopted Angular.js completely. The good thing is that only a small subset of all the features of Angular.js was used, and no Angular.js extensions/plugins (such as a router) were used, so it has been quite easy to simplify the old Angular controllers and migrate them to Vue2 and later Vue3. This migration is almost complete by now.
According to my own principles I should not have used Angular.js, but relied on “vanilla” JS and direct DOM manipulation instead. I do not know, in hindsight, if that would have been better. But I did not have, and still do not have, the knowledge of direct DOM manipulation needed to build big, complex SPAs without using Angular or Vue.
But it is interesting to note that Angular.js was the only major dependency the platform relied on; it aged badly, and replacing it has been rather costly.
I can note that Web Components (the standard) were not a realistic option in 2015, and they were not a realistic option years later when I evaluated them either. So Vue3 is possibly a good choice today.
Argon2
Password “encryption” is done using Argon2. I wrote a separate post about it. I am obviously not implementing the encryption code myself, I would never do anything like that.
HTML Canvas / Graphs / Maps
There has been some need to generate graphs and maps. This has been done with standard HTML/JavaScript Canvas. None of the graphics generated are particularly standard-looking, and a simple graph/map library would not have met the requirements.
Other dependencies
There are other dependencies for things that make no sense to implement directly in JavaScript.
- wkhtmltopdf – used to create PDFs, comes with Debian
- libxlswriter – used to create Excel files (via a small C-program)
- trumbowyg – used for end user editing HTML capability
- nginx – https reverse proxy
- NPM packages (dev only)
- mocha – a test runner – a quite unnecessary dependency, especially now that Node.js has its own test capability
- eslint – to validate JS code
- htmllint – to validate HTML
- c8 – to get test coverage and other statistics
Conclusions
Working in software projects loaded with bureaucracy, unrealistic expectations, and new and unproven tools that nobody masters is like driving a convertible on the highway with the roof off: it is tiring, and you cannot go on like that for long.
Working in a maintenance situation where all you do is replace working software because the dependencies are no longer supported is the same thing.
However, when you stop relying on tools and dependencies that run out of fashion and support, and when you focus on understanding and solving the real problem – not stupid invented technical problems – you start to deliver real value. That way you build trust and can get rid of the bureaucratic overhead. It is like getting the roof back on: just cruising comfortably, making fast progress.