Recently I was in a conversation about data modeling standards. I confess that I'm not really the standards type. I understand the value of standards and especially how important it is to follow them so others can interpret and use work products. It is just that I prefer to focus on understanding of the principles behind the standards. In general, it seems to me that following standards is trivial for someone who understand the principles, but impossible for someone who doesn't. But there doesn't seem to be general understanding of data modeling principles.

By coincidence I've spoken with a number of DBAs from different shops over past weeks, and they uniformly tell me that they often receive urgent requests to deploy a database accompanied by poorly conceived Entity-Relationship Diagrams that leave open many significant business requirements questions. On behalf of those database administrators here's my first draft, which I'll call "best practices" instead of standards. Please note that this is only a first draft of selected best practices for operational data modeling at the project level. Many of these practices apply generally, but the list excludes some concerns specific to other data modeling efforts like enterprise, data warehouse, datamart, or other non-operational models. Also, the focus of these notes is on logical modeling. Physical modeling is covered only in reference to the topics addressed in the points on logical modeling. Finally, I'm using the term "Logical Data Model" consistently with the logical and detailed logical model types that Hoberman, Burbank, and Bradley describe in Data Modeling for the Business.

Logical Data Modeling

Modeling Process and Model Accessibility

  • All project participants should be included either as participants or reviewers in the modeling process, including business stakeholders and subject matter experts, requirements analysts, architects, designers, developers, testers, and of course data modelers and DBAs. Level of involvement may vary, but the minimum core participants are data modelers, SMEs, requirements analysts, and architects.
  • The LDM is entirely consistent with and at the same level of detail as other project requirements documents.
  • The LDM must be directly translatable to English easily understood by business project participants.
  • The data model must be understandable to a business audience, whether in graphical/metadata format or expressed as text business rules. If one of these best practices makes the model less understandable to business participants, then the team will maintain a separate business presentation version along with the definitive model.

Model Content and Format

  • The LDM must be normalized and free of implementation considerations (efficiency optimizations, surrogate keys, audit columns, etc.)
  • LDM object names:
    • Entity: singular business noun
    • Attribute: singular business noun
    • Relationship: two verb clauses, one for each relationship direction
    • All names should apply the same naming convention
    • All abbreviations should be used consistently
  • Metadata supporting the graphical LDM must include, at a minimum:
    • Model scope, objectives, and assumptions
    • Definition of the level of data atomicity
    • Entity definitions and exclusions
    • Attribute definitions:
      • Text description
      • Logical format (e.g. integer, float, character, long text, etc.)
      • Security requirements (i.e. what users are allowed access to this attribute in what conditions)
    • Relationship definitions:
      • A text definition is optional but strongly encouraged for cases where it is non-trivial
      • Any business rules that are not expressed by relationship cardinality symbols shown on the data model (e.g. mutual exclusivity among relationships, specific limits on "many" cardinality, etc.)

Model Quality

  • All attributes in the LDM are primitives (requirements for and processes to generate derived and summary data should be defined on the process side. Derived and summary data is added as part of denormalization during database design.
  • Each attribute represents a distinct business concept. The LDM contains no embedded data.
  • Every entity has a unique, natural, and minimal primary key.
  • There should be no Many to Many relationships in the LDM.
  • Sub/Supertypes should be used sparingly.
  • The team should consistently use one data modeling convention and tool.
  • The LDM should not include a "code translation" entity. Code attributes are permitted (although this is an implementation-specific consideration) and translations describe the entity to which they belong.

Logical To Physical Data Model Transition

  • Physical data modeling applies design objectives/constraints and planned access patterns to the logical model in order to generate a database design.
  • Any denormalization applied during physical modeling can be reversed with a SQL statement.
  • Every database table will include a system-generated integer primary key and system-generated audit columns, at a minimum: create date/time, update date/time, create userid, update userid.
    • A unique index on the logical key is highly recommended
    • Additional audit data may be needed to meet requirements
    • If data in scope is loaded from another database, then source data anomalies are either
      • Corrected if project constraints permit, or
      • Documented and accepted as exceptions by project stakeholders
  • Source data anomalies typically include multiple attributes embedded in columns and inappropriately denormalized tables.