Minimalistic Services and Applications

Question: There is plenty of documentation, and there are many patterns, architectures and practices, for scaling up your cloud services and applications. But how do I scale them down?

In 2015 I set up a minimalistic architecture for delivering services and web applications. It was based on 15 years of experience (not only positive) of constructing and operating applications, services, servers and integrations. Now, in 2018, I can say that my architecture is doing very well: it has continuously delivered business value for three years. I will share some principles and technical details.

Limitations and reservations
Not all solutions are good for everyone, and neither is mine. If you know that you want worldwide internet scalability, my architecture is not for you. But often you are building applications that are internal to your organisation, or you have a local/regional physical business that you need to support with services and applications. Then you know that there is a practical upper limit on the number of users, and that it is not very high.

While modern cloud services are supposed to scale more or less without limit, this does not come for free. It comes with complexity and drawbacks that you may not want to pay for, since you are not aiming for those gigantic volumes of users and data anyway.

The architecture I am presenting is designed both to perform and to scale. But within limits. Know your limits.

Microservices
Microservices are about many things. I have made my own practical interpretation.

My delivery platform consists of microservices that communicate with each other. They should have limited responsibilities and they should not grow too big. Each service should store its own data. Two different services should share data via their (public) APIs, never by using shared storage.

I ended up with a separate Authentication Service (knowing about users and credentials) and Roles Service (knowing about roles/privileges granted to a user). In hindsight perhaps this could, or should, have been just one service. On the other hand, if I want to store something like personal Settings/Preferences for each user, perhaps it is good that it does not all go into a single common User service that grows more complex than necessary.

As you may know, there is another microservice principle: each service should be able to run in multiple instances, and (via Event Sourcing and CQRS) state is not immediately consistent, but eventually consistent. I normally break this principle outright: a single service has a single instance holding the single truth. I feel fine doing this because I know that each service is not too big and can be optimized or rewritten if needed. I also feel fine doing this because I save a lot of complexity, and my approach opens up for some nice optimizations (see below).

It is all about HTTP APIs
My microservices talk to each other over HTTP in the simplest possible way. The important thing is that your web applications, native (mobile) applications, external partners and IoT devices all use the same APIs.

I want it to be trivial to connect to a service using wget/curl, an Arduino, or whatever left-behind environment my clients may be using. I also want any server platform to be capable of exposing APIs in a conforming way.

What I basically allow is:

http://host:port/ServiceName/Target/Action?token={token}&...your own parameters

Your service needs to have a name and it is in the URL. Target is something like Order or Customer. Action is something like Update or Cancel. token is something you need to obtain from the Authentication Service before making any calls. You can have extra parameters, but for more data it is preferable to POST a JSON object.

I don't want any extra headers (for authentication, cookies or whatever), but I respect Content-Type and it should be correct. Absolutely no non-standard or proprietary headers.

I only use GET and POST. It just doesn’t get clear and obvious enough if you try to be smart with PUT and DELETE.

For things like encryption (HTTPS) and compression (gz) I rely on nginx.

Reference Implementation
The above principles constitute the architecture of a number of services that together make up a virtual application and service platform. As you can see, you can build this with almost any technology stack you want. That is the entire point!

  • You may want to make API calls from known and unknown devices and systems in the future
  • You may want some legacy system to be part of this virtual delivery platform
  • You may want to build some specific service with some very specific technology (like a .NET service talking to your Active Directory)
  • You may find a better technology choice in the future and migrate some of your services from current technology

But for most purposes you can build most services and applications using a few, simple, free and powerful tools. More important than the tools themselves are established standards (HTTP, HTML, JavaScript and CSS) and principles about simplicity and minimalism.

JSON and JavaScript
After working for years with integrations, web services (SOAP), XML, SQL databases and .NET, I can say that the following type of technology stack is common:

  1. Web application is written in JavaScript, works with JSON
  2. Web application communicates with server using XML
  3. Server processes data using .NET/C#
  4. Data is persisted using SQL queries and a relational database

This means that a single business object (such as an Order) has data representations in SQL, C#, XML and JSON, and that you have several mappings or transitions in both directions. Nor can you reuse business logic written in SQL, C# or JavaScript in another layer of your application.

With Node.js you have the opportunity to do:

  1. Web application is written in JavaScript, works with JSON
  2. Web application communicates with server using JSON
  3. Server processes data using JavaScript and JSON
  4. Data is persisted in JSON format (either in files or a database like MongoDB)

This is simply superior. A lot of problems just disappear. You can argue about Java vs C#, HTTP/1 vs HTTP/2, Angular vs React and things like that. But you just can't argue about this: a pure JavaScript stack has a fundamental advantage, not because JavaScript is a superior language but because it is the de facto language of the web.

So my reference platform is based on Node.js and I store my data in JSON.

Binary formats
Binary formats have their advantages: power and efficiency. But the differences are rarely significant. Base64 encoding is 33% more expensive than the original binary. Compiled languages are somewhat faster and use less memory. But humans can't read binary. Compilation (or transpilation) into binary (or machine-generated code) is not only an extra step requiring extra tools; it also creates a longer distance between the programmer and the source code on one hand, and the execution and its error messages on the other. Source maps are a remedy for a problem that can be avoided altogether.

I was once responsible for a .NET solution (with Reporting Services) that was so hard to change and deploy that we eventually refused to even try. I realised that if the system had been coded in the worst imaginable PHP I could have made a copy of the source (in production), modified the system (in production) and restored the system if my changes were not good.

Similar problems can appear with databases. Yes, you can make a backup and restore a database. But how confident do you feel that you can just restore the database and the system will be happy? What is IN the database backup/restore, and what is configuration outside the database that might not be trivial or obvious to restore (access rights, collation settings, indices and stored procedures, id counters, logging settings and so on)?

So my reference platform minimises the use of binary formats, build steps and databases. I code plain JavaScript and I preferably store data in regular files. Obviously I use native file formats for things like images and fonts.

Storage
Some applications have more live data than others. I have rarely come across very large amounts of transaction or record data. I have very often come across applications with little data (less than 100MB) and a truly complex relational database. I have also seen relational databases with not too much data (as in 1GB) with severe performance problems.

So, before architecting your solution for 10-100GB+ of data, ask yourself if that will ever happen. And if it eventually does happen, perhaps it is better to deal with it then?

Before constructing a relational data model with SQL, ask yourself if it is really worth it.

Since we are using a microservice strategy, and since services share data via their APIs, two things happen:

  1. Most services might get away with very little (or no) data at all (while some have much data)
  2. A service that later turns out to need to deal with more data than it was first built for can be refactored/rebuilt without affecting the other services

So I suggest, if in doubt, start small. What I do (somewhat simplified, and sketched in code after the list) is:

  1. Start up Node.js service
  2. Load data from local files into RAM
  3. All RO-access is RAM only
  4. When data is updated, I write back to file within 10s (typically all of it every time, but I keep different kinds of data in different files).
  5. Flush data before shutting down
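
A minimal sketch of steps 1-5 (the file name, flush interval and data shape are just illustrative assumptions):

const fs = require('fs');

const DATAFILE = 'dev.data/Orders.localstorage/orders.json';   // hypothetical path

let orders = [];        // all data lives in RAM
let dirty  = false;

// 2. Load data from local files into RAM at startup
if (fs.existsSync(DATAFILE)) {
  orders = JSON.parse(fs.readFileSync(DATAFILE, 'utf8'));
}

// 3. All RO-access works directly on the orders array in RAM

// 4. When data is updated, mark it dirty and write back within ~10 seconds
function addOrder(order) {
  orders.push(order);
  dirty = true;
}

setInterval(function() {
  if (dirty) {
    fs.writeFileSync(DATAFILE, JSON.stringify(orders));
    dirty = false;
  }
}, 10000);

// 5. Flush data before shutting down
process.on('SIGINT', function() {
  if (dirty) fs.writeFileSync(DATAFILE, JSON.stringify(orders));
  process.exit(0);
});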

This has an advantage which is not obvious. JavaScript is single-threaded (although Node.js itself uses more threads), so a single request is guaranteed to finish completely before the next request starts (unless you make some async callback for waiting or I/O). This means that you have no transaction issues to deal with – for free – which significantly simplifies a lot of your request handling and error handling code.

Another advantage is that RAM is extremely fast. It will often be faster and cheaper to “just access all the data in RAM” than to fetch a subset of the data from a database and process it.

This may sound like "reinventing the wheel". But the truth is that steps 1-5 above amount to very few lines of quite simple code. You can use functions like map(), reduce() and filter() directly on your data without fetching it (asynchronously) first. That saves you lines of code.

Again, this may not work for all your services for all future, but it is surprisingly easy and efficient.

Code, Storage, Configuration and Installation
When I check out my (single) git repository I get something like:

packages/                         -- all my source code and dependencies
tools/                            -- scripts to control my platform,
                                     and a few other things

I then copy the environment template file and run install (to make node_modules from packages):

$ cp tools/env-template.json dev.json
$ ./tools/install.sh
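
The template itself is not shown here, but as a purely hypothetical illustration an environment file could be a plain JSON object along these lines (all field names are assumptions):

{
  "host"      : "localhost",
  "proxyPort" : 8080,
  "services"  : {
    "Authentication" : { "port": 12345, "runHere": true },
    "Order"          : { "port": 12346, "runHere": true },
    "Log"            : { "host": "10.0.0.2", "port": 12347, "runHere": false }
  }
}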

This config file can be edited to replace "localhost" with something better, and to decide which services should run on this machine (here, in this directory) and where other services run if I use different machines. Now I start the system, after which I have:

$ node tools/run dev.json ALL     -- use dev.json, start ALL services

dev.data/                         -- all data/state
dev.json                          -- all environment configuration
node_modules/
packages/                         -- all code
tools/

I can now browse the services on localhost:8080, but before I can log in I need to create an Admin user using a script in tools/ (which just calls an API function).

Notice how easy it is to start a new environment. There are no dependencies outside packages/. You may create a dev-2.json, which will then live in dev-2.data/ side by side with dev. To back up your state you can simply back up dev.data/ and move it to any other machine.

Let's have a look at dev.data/ (here, the files for one service):

Authentication.localstorage/     -- all data for one service
Authentication.log/              -- a log file for one service (kept short)

In packages you find:

common/                          -- JavaScript packages that can be used
                                    on Node as well as web
node/                            -- Node-only-packages
services/                        -- Node-packages containing services
web/                             -- JavaScript packages that can be used
                                    on the web only

You should include tests on different levels (unit, integration) in a way that suits you. The above is somewhat simplified; on the other hand, in hindsight I would have preferred some things to be simpler than I actually implemented them.

Notice that there are no build scripts and no packaging required. All node code is executed in place and web applications load and execute files directly from packages/.

Serving files, input validation, proxy and nginx
Node.js is very capable of serving files (and APIs) just as it is. I have written a custom Node.js package that services use to handle HTTP requests. It does:

  • Validation of URL (that URLs conform to my standards)
  • Authentication/authorization
  • Decide whether the request is a file or an API call
  • Files: serve index.js from common/ and web/, and www/ with all contents from all packages
  • APIs: validate target, action (so it exists), validate all URL-parameters (dates, numbers, mandatory input, and so on)

This may seem odd, but there are a few good reasons for doing exactly this (a simplified code sketch follows the list):

  1. Service APIs and policies are metadata-driven
  2. Consistent good logging and error messages
  3. Consistent authorization and 401 for everything questionable (both for files and APIs)
  4. The same service serves both the API and the www files, which eliminates any need to deal with cross-site issues (about the least value-adding activity imaginable)
  5. Consistent input validation (if there is anything I don’t trust people get right every time they write a new service this is it)
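
A much simplified sketch of such a request handler, using only the standard http and url modules (the Shop service, its targets/actions and the helpers fail, serveFile, isValidToken, listOrders and updateOrder are all made up for illustration):

const http = require('http');
const url  = require('url');

// Metadata describing the API of a hypothetical Shop service
const api = {
  Order: {
    List:   { params: [],            handler: listOrders  },
    Update: { params: ['id', 'qty'], handler: updateOrder }
  }
};

http.createServer(function(req, res) {
  const parsed = url.parse(req.url, true);     // true = also parse the query string
  const parts  = parsed.pathname.split('/');   // e.g. ['', 'Shop', 'Order', 'Update']

  if ('Shop' !== parts[1])               return fail(res, 404, 'Unknown service');
  if (!isValidToken(parsed.query.token)) return fail(res, 401, 'Unauthorized');

  const target = api[parts[2]];
  const action = target && target[parts[3]];
  if (!action) return serveFile(res, parsed.pathname);   // not an API call: try to serve a file

  // Consistent, metadata-driven input validation
  for (const p of action.params) {
    if (undefined === parsed.query[p]) return fail(res, 400, 'Missing parameter: ' + p);
  }
  action.handler(parsed.query, res);
}).listen(12345, 'localhost');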

You can probably do this on top of Express instead, if you prefer not to use the standard Node.js functionality directly.

At this point each service listens on localhost:12345 (a different port per service), so you need a proxy (nginx) that listens on port 80 and forwards to each service (remember, the service name is always in the URL).

I prefer each service to handle all its own API calls. Quite often it just forwards them to another service to do the actual job (say a user action in the Order service should create an entry in the Log service: the Order web UI calls Order/log/logline, which in turn calls the Log service). This is very easily achieved: after authentication/authorization you just send the request through (standard Node.js does this easily, as sketched below).
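
A rough sketch of such a pass-through with the standard http module (the Log service port and the URL rewrite are assumptions for the example):

const http = require('http');

// Forward an incoming request to the Log service and pipe the answer back
function forwardToLog(req, res) {
  const options = {
    host   : 'localhost',
    port   : 12347,                                    // assumed port of the Log service
    path   : req.url.replace('/Order/log/', '/Log/'), // rewrite the URL for the Log service
    method : req.method,
    headers: { 'Content-Type': req.headers['content-type'] || 'application/json' }
  };
  const proxied = http.request(options, function(answer) {
    res.writeHead(answer.statusCode, answer.headers);
    answer.pipe(res);
  });
  req.pipe(proxied);
}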

Dependencies
The web has more npm packages than anyone can possibly want. Use them when you need (if you want, read Generic vs Specific Code, Lodash and Underscore Sucks, …).

My biggest fear (really) is to one day check out the source code on a new machine and not be able to install dependencies, build it, test it, run it and deploy it. So I think you should get rid of dependencies and build steps, and instead focus on testing, running and deployment.

I think that when you include a dependency, you should place it in packages/ and push it to your repository. Then you are in control of updating the dependency when it suits you. New dev/test/prod machines will get your proven and tested versions from packages/, regardless of what the author did to the package.

This approach has both advantages and disadvantages. It is more predictable than the alternatives and I like that more than anything else.

Error handling
I take error handling seriously. Things can get strange in JavaScript. You should take the differences between numbers and strings, and between objects and arrays, seriously (that's why you should not use Lodash/Underscore). There are no enums to use safely with switch statements. I often add throw new Error(…) to code paths that should not happen or where data is not what I expect.

On the (Node.js) server I don't have a big try-catch around everything to make sure the server does not crash, and I don't restart services automatically when they fail. I write out a stack trace and let the server exit. This way I always work with a consistent, correct state. Critical errors need to be fixed, not ignored. This is the Toyota way – everyone has a red button to stop production if they see anything fishy. In effect my production system is among the most stable systems I have ever operated.
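
In Node.js terms that can be as simple as this sketch:

// Log the stack trace and stop: critical errors need to be fixed, not ignored
process.on('uncaughtException', function(err) {
  console.error(err.stack);
  process.exit(1);
});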

Validation, models and objects
Data validation is important. Mostly, the server needs to validate all data sent to it, but a good UX requires continuous validation of input as well.

I put effort into defining models (basically what a class is in an OO language). But since my data objects are regularly sent over the network or fetched from disk, I don't want to rely on prototypes and member functions. I call each object type a model, and early on I write a quite ambitious validation function for each model.
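
A sketch of what such a validation function could look like for a hypothetical Order model (the fields are just examples):

// Throws if the object is not a valid Order
function Order_validate(order) {
  if ('object' !== typeof order || null === order || Array.isArray(order)) {
    throw new Error('Order_validate: not an object');
  }
  if ('string' !== typeof order.id || 0 === order.id.length) {
    throw new Error('Order_validate: invalid id');
  }
  if (!Array.isArray(order.lines)) {
    throw new Error('Order_validate: lines must be an array');
  }
  order.lines.forEach(function(line) {
    if ('number' !== typeof line.qty || line.qty !== Math.floor(line.qty)) {
      throw new Error('Order_validate: qty must be an integer');
    }
  });
}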

Sharing code between Node.js, web (and AngularJS)
I want my code (when relevant) to be usable both in Node.js and on the web. The web used to mean AngularJS, but I have started moving away from it.

This is what I do:

 /*
  * myPackage : does something
  *
  * depends on myUtil.
  */
(function() {
  'use strict';

  function myFactory(myUtil) {

    function doSomething(str) {
      ...
    }

    return {
      doSomething : doSomething
    };
  }

  if ('undefined' !== typeof angular) { // angular
    angular.module('mainApplication').factory('myPackage', ['myUtil',
      function(myUtil) {
        return myFactory(myUtil);
      }
    ]);
  } else if ( 'undefined' !== typeof MYORG ) { // general web
    MYORG.myPackage = myFactory(MYORG.util);
  } else if ( 'undefined' === typeof window ) { // nodejs (probably)
    module.exports = myFactory( require('common/util') );
  } else {
    throw new Error('Neither angular, node or general web');
  }
})();

This way exactly the same source code can be used both on the web and in Node.js. It requires no build step. The “general web” approach relies on a global object (call it what you want) and you may prefer to do something else. You just need to make sure you can serve common/util/index.js and common/mypackage/index.js to the web.

Scaling and cloud technology
For a simple development system, perhaps for a test system or even for a production system, everything can live in a single folder. If you need more power or separation you can put each service in a Docker container. You can also run different (groups of) services as different users on different machines.

So, the minimalistic architecture easily scales to one service per machine. In practice you can run a heavy service on a single machine with 16GB RAM (or more), which allows for quite a lot of RW data. 16GB or more of RAM is quite cheap compared to everything else.

Scaling and more data storage
There are many other possible strategies for a service that needs more storage than easily fits in RAM (or can be justified in RAM).

Some services (like a log) are almost exclusively in write mode. You can keep just the last day (or hour) in RAM and simply add a new file for every day. It is still quite easy and fast to query several days of logs when needed.
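
A sketch of how simple that can be (the directory and file naming are just examples):

const fs = require('fs');

// Append a log line to today's file; older days stay untouched on disk
function logLine(line) {
  const today = new Date().toISOString().slice(0, 10);    // e.g. 2018-06-01
  fs.appendFileSync('dev.data/Log.localstorage/' + today + '.log',
                    JSON.stringify(line) + '\n');
}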

Some services (like a customer statistics portal) have mostly RO data that is not regularly accessed and that lives in "islands". Then you can have (load from other systems) a JSON file per customer. When the customer logs in you load that file into memory, and later you can simply reclaim that memory. Such a service can also be divided into several services: one main RW service, one RO (A-L) and one RO (M-Z).

Some services will do expensive processing or perhaps expensive communication/integration with other systems. Such processing or integration can be outsourced to a dedicated service, freeing up resources in the main service. If you for example generate a PDF, make sure you do it in a process outside Node.js.
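
For example, something along these lines (wkhtmltopdf is just a stand-in for whatever tool you use):

const execFile = require('child_process').execFile;

// Generate the PDF in a separate process so the Node.js event loop is not blocked
function generatePdf(htmlFile, pdfFile, callback) {
  execFile('wkhtmltopdf', [htmlFile, pdfFile], function(err, stdout, stderr) {
    callback(err, pdfFile);
  });
}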

In the same way a service can offload storage to another service (which could possibly be a MongoDB).

Web files (HTML, CSS, images, JS) can be cached by nginx (if you accept serving them without authentication) and served virtually for free, even though your service is in full control.

Things like logging can also be outsourced to dedicated, enterprise-class logging software. Nevertheless, it is good to have a simple reference Node.js logging service that can be used locally for development purposes.

Finally, GDPR indicates that you should throw away data. You can also move data from a live system to a BI-system or some Big Data tool. Perhaps your architecture does not need to support data growth for 10+ years – perhaps it is better it does not.

Scaling – conclusion
These scaling strategies may not sound too convincing. But the truth is that building your entire system as a single very powerful monolith is probably going to be less scalable. And building everything to be super scalable from the beginning is neither easy nor cheap (but if that is what you really need to do, go ahead).

Integration testing
Notice how integration testing can be achieved locally, automated, with virtually no side effects:

  1. Generate an integration-env.json
  2. Start up the services (as usual)
  3. Run tests that inject data into the services (through the standard APIs)
  4. Run tests that read and query data
  5. Shut down the services
  6. Remove integration-env.json and integration-env.data/

Source control and repositories
For now, I have all code in a single git repository. It would be easy to use multiple repositories if that simplifies things (when developing multiple independent services at the same time). Linux is in a single git repository so I think my services and applications can be too.

Tooling
All developers prefer different tools and I think this should be respected. I also think coding style does not need to be completely consistent across services (although single files should be kept consistent).

But just as developers should be allowed their own tools, the artifacts of those tools should not make the repository dirty. And the next developer should not need to use the same tools as the previous to be able to keep working on the code.

Web frameworks
If I mastered direct DOM manipulation I would probably suggest that you do too (and not use any web framework). However, I have been productive using AngularJS (v1) for years. Since AngularJS is inevitably getting old I have started using Vue.js instead (which I actually think is a better choice than Angular; however, check my post about loading Vue templates).

React is also a fine framework, but it requires a build process. For my minimalistic approach that is a very high and unnecessary addition of complexity. I see no indications that React is fundamentally more productive or capable than Vue.js, so I think you are fine with Vue.js (or jQuery or Vanilla.js if you prefer).

Performance
I have, to be honest, not had the opportunity to put very many simultaneous users on my system. On the other hand, I have used it for rather mission-critical services for three years with very few issues. So this architecture has served me well – it may or may not serve you well.

My production environment consists of a single VPS with 1 core, 2GB RAM and 20GB storage. Performance is excellent and system load minimal.

Missing Details
Obviously there are a lot of details left out of this post. You don't have to do things exactly the way I did. I just want to outline an architecture based on minimalistic principles. The details of users, authentication, logging and naming conventions are of course up to you to decide.

Feel free to ask though! I am open to discuss.

Conclusion and final words
I wrote this post quickly and I will probably add more content in the future (and correct/clarify things that could be improved).

 

  1. Well written article and obviously grounded in experience. Some questions:

    “I dont want any extra headers (for authentication, cookies or whatever)” <– Why prefer authentication in body and not header? Some technical reason or personal preference?

    Since you're not using a database with ACID compliance, how do you implement eventual consistency? How do you handle dependencies between storage files? What if one file can be written and the second can't? Admittedly this happens very seldom, but would it be a catastrophe if it happened? I can certainly see scenarios where corrupt data would be a disaster. Is this where you crash and stop the service?

    Don't you have a need for analytics and querying the data? How do you query json data for analytics? Like answering the question "What is the average number of items on an order for a particular country?" This is where SQL and RDBMSs really shine.

    Since everything is running on one machine, do you deploy all services at once when there are changes, even if there are no changes to the majority of services? Or do you just deploy the services that need to be updated?

  2. Excellent questions Johan. I will perhaps update the article a little.

    Authentication
    When it comes to authentication there are two steps.
    1) Send credentials (user+pass) to Authentication Service to obtain a token
    2) When calling a service or requesting a file, supply token in URL.

    To visit the Blog-service, go to:
    http://localhost/Blog/index.html?token=1234567890abcde

    If token is missing you will be forwarded to the Login-service.

    So I want the token in the URL rather than in the body or a header. The reason for this is that it makes it easier to call services from the command line or directly via the web browser. Supplying token= is almost always trivial, while using a header (or a POST) is rarely trivial.

    I don't care much how you send user+pass to authenticate and obtain a token in the first place.

    ACID
    I will admit that if
    1) I have two “tables”: Orders and Customers
    2) a single request creates a customer and an order
    3) the order is written to disk a few seconds later
    4) the customer is not successfully written after the order

    there will be an inconsistent state. Can I justify that?

    First, as you guessed, I think the server should crash to avoid making the corruption worse. Writing to local disk must simply not fail. Hopefully I can get something to a log.

    Second, in theory, a database engine can deal with this perfectly. But in real life, for a small team with limited resources, databases are complex products that can be tricky to configure and administer. Also, writing correct transactions is not trivial. So the security and comfort you feel with your ACID-compliant database may come at the cost of other worries or problems.

    If there is a power outage (or an OS crash) there is always a risk for lost data and/or corruption.

    Note that I don't overwrite the current data file. I write a new one next to it, then delete the old one and rename the new one. So I should always have a complete file (although perhaps not the latest).

    This is not perfect. But I sleep well. It has never caused problems. Backup/restore is trivial (I do it daily from Prod to Dev). And if something goes wrong I can understand what happened.

    Data Query
    In SQL you would do:
    > select * from orders where OrderId=1
    I would do
    > orders.filter( function(o) { return o.id === 1; } );

    There are cases where SQL shines! But sometimes you have recursive hierarchies/trees of data, and that is a real pain in SQL but very easy to deal with in real code. You can also store more complex JSON objects (compared to SQL; this is not an advantage over MongoDB), limiting the need to JOIN tables in the first place.

    I have found that the worst cases for a SQL database are when you have a lot of data, rather complex filters, and you want to do multiple non-trivial calculations (like averages). I really prefer to loop over an array and calculate intermediate results as I go. Remember, by the way, that I can reuse code across server/client, so perhaps I have (pure, testable) functions like Order_calcTotal(order) that I can use for these purposes, and also use elsewhere in the stack.
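
    For the example question above (average number of items on an order for a particular country) it could be something like this (assuming each order has a country code and an array of lines):

    > const swedish = orders.filter(function(o) { return 'SE' === o.country; });
    > const items   = swedish.reduce(function(sum, o) { return sum + o.lines.length; }, 0);
    > const average = items / swedish.length;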

    Single or Multiple Machines
    In practice, I am lazy, and when I upgrade something I Ctrl-C everything, update and start everything again. But it is perfectly fine to just restart the affected services. And some services are upgraded often, others rarely, so it would make sense to separate them. I just have not bothered, but I can do it with no effort if I choose to.

    Hope that answered your questions. Please get back if you have more thoughts or if anything is unclear!

    The argument I've heard against using a token in the URL is that requests and their URLs are often logged, and logs can leak. That is perhaps more of an issue if you're using third-party logging services like Application Insights etc. Someone with access to the logs could then impersonate another identity by hijacking the token. But of course, there are a million factors that play into this decision. Simplified development for you is certainly one of them.

    I agree that writing correct transactions is a hard problem. But as I see it there's no way around it: either you do it correctly or you end up with inconsistent data. My own preference is to always have consistent data. One question though: have you actually looked for broken records? I.e. child records that don't have a corresponding parent record, parent records with no child records, etc. I've found that if you have a system that silently (well, yours is not silent since it crashes) accepts broken data, there will be broken data. But you'll be blissfully unaware of it. Especially in larger and older systems. If you later do analytics on broken data you'll draw the wrong conclusions.

    By storing everything in an RDBMS with properly designed foreign keys and other constraints you just dodge all these problems. The database won't accept broken data. The transactions will cause an error if you do them incorrectly. So it's very easy to catch errors. This is where there are a lot of different opinions in the whole SQL vs NoSQL argument, but I feel that using something lightweight like SQLite you can get all the pros of foreign keys and still store the actual data as JSON. So you won't be burdened with a strict schema. I personally don't see a strict schema as something negative, since it means I can always count on all the data being correct. But I can understand that others reach a different conclusion.

    Regarding SQL. The WITH clause can do recursive queries and while it has a little weird syntax you outsource the actual implementation details of executing the query to the SQL engine. In many cases when one has non trivial queries it is very hard to write efficient code to loop over everything yourself. With indexes you can speed up execution times by a factor 1000 without changing any code. The WITH clause can be used to make tricky queries more readable as well.

    There are certainly things that can be difficult to do in sql. Averages are easy using the AVG() function though. You can even do moving averages by adding the OVER() clause. The OVER() clause with its accompanying PARTITION BY/ORDER BY stuff can be used with many other (window) functions and are a very powerful tool.

    That’s my take as someone who enjoys working with databases. 🙂

  4. Yes… I don’t put the token in my logs. I use Node.js to handle requests and I log a lot, but not the token or the pure/raw URL.

    I am not going to argue with you about databases. I did suggest a minimal approach.
    If you master them and they are there… use them!

    I have seen a lot of situations where programmers have had too little knowledge and experience with SQL and database configuration/administration, and the result is complex and bad. So my solution may be much better than not doing SQL properly.

    I came to this conclusion when people started using Entity Framework + Code First, and the relational model ended up being generated from C# objects. That's another topic, but I was not impressed.

    BTW: you can apply most other things from my minimal architecture and still use relational databases, no problem. You don't have to go all-in on my minimal approach.

  5. I can absolutely see the value of a minimal approach. Complexity and more moving parts comes at a cost.

    Yes, using an RDBMS is not a guarantee that it will be used well. The skills and preferences of the team is certainly a factor.

    I agree with the aversion to the Code First approach. I take data too seriously to let that be handled by an abstraction layer.

    I think I didn't answer about inconsistency… I am not aware of any inconsistencies and I don't worry too much about it (although it would be catastrophic).

    In SQL an Order spans several tables, usually at least Order and OrderLine. This means that orphaned OrderLines are a possibility.

    When you store the data as JSON you can store one Order object with children: OrderLines, Shipment, Invoice, Payment (you may want to draw a line somewhere; Payment should perhaps go elsewhere).

    So there are fewer possibilities for orphaned OrderLines, or for OrderLines coupled to the wrong Order, in the first place.

    Further, I validate data integrity more carefully than I would be able to in SQL. A (hypothetical) example: I can require that the Shipment date is not before the Order date (and only validate it if a Shipment exists).

    In conclusion, it would be ridiculous of me to call my approach better than a real DB. But I can say that in my environment it is more successful and effective for the (relatively) small systems and data sets I work with.

    A properly used RDBMS will never have orphaned OrderLine rows. In SQL you can define a foreign key in the OrderLine table that points to the primary key of the Order table. The referential integrity of the RDBMS will not allow orphaned OrderLine rows to be committed. You will be notified of such an error immediately, the first time it happens – most likely during development. This is true no matter how deep your structure is. E.g. you'll probably have a foreign key to the Customer table and something pointing to a Campaign table as well.

    Your example with a shipment date not earlier than the order date can be implemented with "before insert" and "before update" triggers. You can do all kinds of validations using triggers. This has the advantage that you can trust all data, even historical data, to comply with all the rules you have defined.

    JSON blobs don't give you such guarantees. Your data will comply with the rules that were current at the time it was written/updated. But if you had different rules during the application's life cycle, you could end up with historical data that doesn't comply with the current set of rules. This is not obvious at all, not easy to always take into consideration, and could potentially skew analytics down the line. An RDBMS forces you to deal with these things up front. How big an issue this is depends of course on the purpose the application should serve. In some cases it would be absolutely critical; in others it wouldn't matter.

    I realize that the article is about scaling down but the reasoning I follow above is regarding a solution that will scale up. If you actually validate all records, even historical data, on every write then it’s a different thing.

    Even though _I_ will choose SQL every time I’m not trying to convert you to anything. I just wanted to point out some things that I didn’t agree with. 🙂

    Of course, with foreign keys you get it right!

    I just mean that by keeping my object as one object (not split across tables), the integrity problem that foreign keys solve does not arise in the first place.

    When it comes to data complying with the format it used to have… it can be both a good and a bad thing.
