data modeling – What is the normal form of JSON?


This is going to sound like a trivial question, but I like to think it’s actually a deep one. The simple quesiton is, “What is the normal form of a typical JSON object?” For reference, I put an example below, but consider any typcial JSON object you’ve dealt with, same question applies.

I ask this theoretical question for a practical reason. In practice, we often need to convert JSON objects to some set of tables. Once they are tables, the tables have measurable normal forms based on all the usual rules of normal forms.

But getting to those tables with their normal form takes work. Now, what else “takes work”. Answer: going from lower normal forms to higher normal forms. What doesn’t “take work”, is going down the normal forms. Or at least just a trivial amount of work. That is, if I have 6NF, I can rather quickly manipulate my way down to any lower normal form. If I have, say 2NF, and I need to work my way to at least 5NF for some practical reason, I have much work to do.

Well…since it is rather hard to get JSON to any decent normal form, then intuitively it seems it must be in a very low normal form. I’m hoping someone here can quantify that normal form of the JSON. Much apprecaited.

But I still haven’t given the most critical rationale. It is not uncommon for non-technical leaders to ask for miracles. I’m not criticizing, we all know it happens. And the miracle is something of the form, “just write some code to automatically make JSON into tables”.

But wait! If my theory is correct, and JSON is basically 0NF or so, then you can’t automate you way out of it. You can’t go from the very low NF of JSON to anything decent, such as 3NF+, in an automated fashing because that “takes work”. That is, it takes smart humans understanding the domain.

Now, I know some trivial JSON can become some trivial tables. I know there are a few tools that handle the simple cases. But I believe a general purpose JSON-to-Table converter is theoretically not possible because JSON is so low on the normalization information (in the rigorous Claude Shannon sense), that you can’t automate it away.

So, what is the normal form of a typical JSON object? And is there some theory I didn’t find that already proves you can’t automate your way out of this.

Thanks!

{
  "data": {
    "cust1": {
      "name": "Jane",
      "age": 33,
      "address": "Main Street",
      "favorites": {
        "colors": ("blue", "green")
      }
    },
    "cust2": {
      "name": "Joe",
      "age": 44,
      "address": "West Road",
      "favorites": {
        "colors": ("red", "yellow")
      }
    }
  }
}