Canonical Data Animation
Sometimes a picture can bring more clarity to a concept. For Canonical Data, an animation is what is called for. I found this animation of canonical data and its implementation. I think the first minute and a half paints a very good picture of how canonical data is implemented and how it can be leveraged. Later in the animation they start to describe a global vision of implementation. Unfortunately, I must disagree with this vision. I don’t think that a single global canonical form of data will ever truly be a solution that works.
Take a look at this and tell me what you think.
The illustration of data passed between two applications, with gaps and overflow relative to what a third-party application needs, is all too real. It is a good representation of what can be done using a Canonical format to handle the needs of all of the parties.
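To make the gaps-and-overflow idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the CanonicalCustomer record, the billing and shipping field names, and the adapter functions are hypothetical and do not come from the animation. The canonical record holds the superset of fields, and each application gets its own small adapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CanonicalCustomer:
    """Hypothetical canonical record: the superset of fields any party needs."""
    customer_id: str
    full_name: str
    email: Optional[str] = None
    phone: Optional[str] = None


def from_billing(record: dict) -> CanonicalCustomer:
    """Adapter in: the (hypothetical) billing app has no phone field,
    so that gap simply stays None in the canonical record."""
    return CanonicalCustomer(
        customer_id=str(record["acct_no"]),
        full_name=f'{record["first"]} {record["last"]}',
        email=record.get("email"),
    )


def to_shipping(customer: CanonicalCustomer) -> dict:
    """Adapter out: the (hypothetical) shipping app only wants a name and a
    phone number, so the overflow fields are left behind."""
    return {
        "recipient": customer.full_name,
        "contact_phone": customer.phone or "",
    }
```

The point of the design is that adding a third or fourth application means writing one more adapter against the canonical record, rather than one more translation for every pair of applications.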
There are two options with different benefits that I have used. One is to build an actual file format that holds your canonical data. Another is to use a database to act as your canonical. Both of these have trade-offs.
File based Canonical:
To make a canonical file format you will need to pick the type of file that you want all of your applications to receive, either directly or through an adapter of some kind. Building a flat file or XML format from scratch is the most flexible option, but it requires a good deal of planning up front. And once it is done, the format must be maintained.
Using a file based canonical gives you a fairly easy way to find the state of a failed step, as you can look at the file and identify what is wrong. You can also correct the data there and allow the process to continue. And you can make copies of these files for your monitoring, so that tracking your data and transactions becomes easy and your performance metrics become rich with data.
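As a rough sketch of that workflow, assume the canonical file is JSON (a flat file or XML would work the same way) and that each step of a transaction writes its own file. The file naming scheme, the deliver step, and the directory layout below are all hypothetical, chosen only to show the inspect, correct, and continue cycle along with the monitoring copy.

```python
import json
import shutil
import tempfile
from pathlib import Path


def write_canonical(work_dir: Path, txn_id: str, step: str, payload: dict) -> Path:
    """Write the canonical payload for one step to its own file."""
    path = work_dir / f"{txn_id}.{step}.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def read_canonical(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def archive_copy(path: Path, archive_dir: Path) -> Path:
    """Keep a copy of the file for monitoring and metrics."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(path, archive_dir / path.name))


def deliver(payload: dict) -> str:
    """Hypothetical downstream step that fails on incomplete data."""
    if not payload.get("email"):
        raise ValueError("missing email")
    return f"delivered to {payload['email']}"


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        work.mkdir()
        path = write_canonical(work, "txn-001", "deliver",
                               {"customer_id": "42", "email": ""})
        archive_copy(path, Path(tmp) / "archive")
        try:
            return deliver(read_canonical(path))
        except ValueError:
            # In real life an operator would open the file and fix it by hand;
            # here the correction is simulated before the step is retried.
            fixed = read_canonical(path)
            fixed["email"] = "ada@example.com"
            path.write_text(json.dumps(fixed, indent=2), encoding="utf-8")
            return deliver(read_canonical(path))
```

Because the failed state is just a file on disk, the retry needs nothing more than reading the same path again after the correction.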
Database based Canonical:
Sometimes people, DBAs especially, get excited when we talk about doing this. They are visualizing one massive Canonical Database that holds all of the transactions and is accessed by all of the applications. And there are many products that work this way internally. But this is not the only approach.
The one-database-to-rule-them-all canonical, or as I like to call it “Lord of the Databases” (LOTDB), requires a DBA to pay attention to optimizing, backing up, and all of the other care and feeding tasks that go along with having a database that you maintain for the long haul. This is efficient in that you can get all of your performance data from one place, and monitoring needs only one connection. However, sometimes applications have limitations in how they talk to a database that is not theirs, and this can make implementation complex.
Another way to use a Database based Canonical model is to use disposable databases. In the Disposable Database implementation, you create a database that is small, contains only the structures and tables for the one transaction, and gets destroyed at the end of the transaction life cycle. With disposable databases, you never have to optimize them, back them up, or perform any of the other care and feeding tasks that are part of the LOTDB implementation.
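Here is a minimal sketch of the disposable idea, assuming SQLite through Python's standard sqlite3 module and an in-memory database. The order tables and the process_order function are hypothetical; any small engine that can be created and dropped cheaply would serve the same purpose.

```python
import sqlite3
from contextlib import contextmanager

# Hypothetical schema: only the tables this one transaction needs.
SCHEMA = """
CREATE TABLE order_header (order_id TEXT PRIMARY KEY, customer TEXT NOT NULL);
CREATE TABLE order_line (order_id TEXT NOT NULL, sku TEXT NOT NULL, qty INTEGER NOT NULL);
"""


@contextmanager
def disposable_db():
    """Create a small database for one transaction and destroy it afterwards."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        yield conn
    finally:
        # Closing an in-memory SQLite database discards it entirely,
        # so there is nothing left to optimize or back up.
        conn.close()


def process_order(order: dict) -> int:
    """Load one order into its own database and return the total quantity."""
    with disposable_db() as db:
        db.execute(
            "INSERT INTO order_header VALUES (?, ?)",
            (order["id"], order["customer"]),
        )
        db.executemany(
            "INSERT INTO order_line VALUES (?, ?, ?)",
            [(order["id"], sku, qty) for sku, qty in order["lines"]],
        )
        (total,) = db.execute(
            "SELECT SUM(qty) FROM order_line WHERE order_id = ?",
            (order["id"],),
        ).fetchone()
        return total
```

Each call gets a fresh, empty database, so one transaction can never see or be slowed down by the leftovers of another.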
Comparative implementations:
I want to examine and compare the File vs DB canonical implementations in more detail in another article. If you have another Canonical implementation that I haven’t seen, please let me know. I would love to examine that as well.