Seven Simple Rules

Rule #1: Data first, structure later. Maybe.

Explanation

I recommend not worrying about a declared data structure in an ERD sense. Initially.

Learn to love nt:unstructured (& friends) in development.

I think Stefano pretty much sums this one up.

My bottom line: Structure is expensive, and in many cases it is entirely unnecessary to explicitly declare structure to the underlying storage.

There is an implicit contract about structure that your application inherently uses. Let’s say I store the modification date of a blog post in a lastModified property. My app will automatically know to read the modification date from that same property again, so there is really no need to declare that explicitly.

Further data constraints, such as mandatory properties or type and value constraints, should only be applied where they are required for data integrity.

Example

The above example of using a lastModified Date property on, say, a “blog post” node really does not mean that there is a need for a special nodetype. I would definitely use nt:unstructured for my blog post nodes, at least initially. Since all my blogging application is going to do is display the lastModified date anyway (and possibly “order by” it), I barely care whether it is a Date at all. Since I implicitly trust my blog-writing application to put a “date” there anyway, there really is no need to declare the presence of a lastModified date in the form of a nodetype.
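The implicit contract can be sketched in a few lines of Python. This is only an illustration: a plain dict stands in for an nt:unstructured node, and write_post, render_post and newest_first are hypothetical helper names, not part of any repository API. Storing the date as an ISO 8601 string is likewise an assumption made for the sketch.

```python
from datetime import datetime, timezone

def write_post(title, body, modified=None):
    """Writer side: build a schemaless 'node' (a dict standing in for nt:unstructured)."""
    modified = modified or datetime.now(timezone.utc)
    return {
        "title": title,
        "body": body,
        # Nothing declares this property anywhere; writer and reader just agree on the name.
        "lastModified": modified.isoformat(timespec="seconds"),
    }

def render_post(node):
    """Reader side: relies on the same property name the writer used."""
    return f"{node['title']} (last modified {node['lastModified']})"

def newest_first(nodes):
    """'Order by' lastModified; uniformly formatted UTC ISO strings sort chronologically."""
    return sorted(nodes, key=lambda n: n["lastModified"], reverse=True)
```

The only "schema" here is the property name shared by the writer and the reader, which is the point of the rule.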

Rule #2: Drive the content hierarchy, don’t let it happen.

Explanation

The content hierarchy is a very valuable asset. So don’t just let it happen, design it. If you don’t have a “good”, human-readable name for a node, that’s probably something that you should reconsider. Arbitrary numbers are hardly ever a “good name”.

While it may be extremely easy to quickly put an existing relational model into a hierarchical model, one should put some thought into that process.

In my experience, access control and containment are usually good drivers for the content hierarchy. Think of it as if it were your file system. Maybe even use files and folders to model it on your local disk.

Personally I prefer hierarchy conventions over the nodetyping system in a lot of cases initially, and introduce the typing later.

CAUTION
The way a content repository is structured can impact performance as well. For best performance, the number of child nodes attached to any individual node in a content repository should generally not exceed 1,000.
See How much data can CRX handle? for more information.
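A quick way to check a planned hierarchy against this guideline is to count direct children per parent. The following is a minimal sketch, assuming the content model is available as a flat list of absolute node paths that includes every node; child_counts and oversized_parents are hypothetical helper names, and the sketch does not query a real repository.

```python
from collections import Counter

MAX_CHILDREN = 1000  # guideline figure from the caution above

def child_counts(paths):
    """Count direct children per parent, given absolute node paths."""
    counts = Counter()
    for path in set(paths):  # ignore duplicate entries
        parent, _, name = path.rstrip("/").rpartition("/")
        if name:  # skips the root path "/"
            counts[parent or "/"] += 1
    return counts

def oversized_parents(paths, limit=MAX_CHILDREN):
    """Return the parents whose number of direct children exceeds the limit."""
    return sorted(p for p, n in child_counts(paths).items() if n > limit)
```

A parent that shows up in the result is a hint to introduce another level of hierarchy, ideally one with meaningful names.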

Example

I would model a simple blogging system as follows. Please note that at this point I don’t even care about the respective nodetypes that I use.

/content/myblog
/content/myblog/posts
/content/myblog/posts/what_i_learned_today
/content/myblog/posts/iphone_shipping

/content/myblog/comments/iphone_shipping/i_like_it_too
/content/myblog/comments/iphone_shipping/i_like_it_too/i_hate_it

I think one of the things that become apparent is that we all understand the structure of the content based on the example without any further explanations.

What may be unexpected initially is that I don’t store the “comments” with the “post”. The reason is access control, which I would like to apply in a reasonably hierarchical way.

Using the above content model I can easily allow the “anonymous” user to “create” comments, but keep the anonymous user on a read-only basis for the rest of the workspace.
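To illustrate why the separate comments subtree makes this easy, here is a minimal sketch of a path-based permission check. It is not how a repository implements access control; the allowed function, the action names and the treatment of non-anonymous users are all assumptions made for the example.

```python
COMMENTS_ROOT = "/content/myblog/comments"

def is_under(path, root):
    """True if path is the root itself or lies somewhere below it."""
    return path == root or path.startswith(root + "/")

def allowed(user, action, path):
    """Hypothetical check: one rule on the comments subtree covers all comments."""
    if user != "anonymous":
        return True  # assumption: other users are unrestricted in this sketch
    if action == "read":
        return True  # anonymous may read the whole workspace
    if action == "create":
        return is_under(path, COMMENTS_ROOT)  # anonymous may only create comments
    return False
```

Because every comment lives below /content/myblog/comments, a single rule on that subtree is enough. Had the comments been stored under each post, the rule would have to be repeated or pattern-matched for every post.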