Thoughts on the Canonical Messaging Pattern
Many BizTalk solutions that I have implemented and worked on have followed the canonical messaging pattern. It’s one of the first things I consider when building a new solution, and a concept I come across often. Given its benefits (which I outline in this post), I would consider implementing this pattern “best practice”.
As they say, “a picture paints a thousand words”, so here is a graphical view of this pattern compared with a solution that does not implement it (i.e. a peer-to-peer (P2P) solution):
As you can see, with the canonical pattern (in green/with a tick), documents that are logically equivalent map to a standard, application-specific format (the canonical format). Let’s unpack this statement a little.
The term logically equivalent is specific to our application. For example, the external purchase order formats indicated in the diagram above are equivalent in the context of the solution, so they map to a single standard format internally. Within the application, these external purchase order formats are the same and will be processed in the same way; to stores and suppliers, however, they are quite distinct.
Canonical format describes how documents will be represented internally in our solution. In BizTalk, this has to be XML (since BizTalk uses XML internally to represent messages), so the canonical format is defined by an XML schema.
The next question is: how do we build our canonical document so that it can represent documents that are logically equivalent but formatted quite differently? Actually, this framing is not quite right. The canonical document should be designed first, independently of any external representation (e.g. to capture the essence of what a purchase order is); it is then a case of deciding how each external representation maps to the canonical one. In BizTalk, this typically involves creating maps (which BizTalk compiles to XSLT) or writing custom XSLT to convert the various formats to or from the canonical format.
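To make the idea concrete, here is a minimal sketch in Python, standing in for BizTalk maps (which would normally be XSLT). It assumes two hypothetical store formats (`StoreA`, `StoreB`), one hypothetical supplier format and a hypothetical canonical `PurchaseOrder` document; every element name and function is illustrative rather than taken from a real solution. Each store has a single inbound map to the canonical format and the supplier has a single outbound map from it, so every message is transformed twice.

```python
import xml.etree.ElementTree as ET

# Hypothetical canonical purchase order, designed independently of any
# external format to capture the essence of a purchase order.
def canonical_po(order_id: str, sku: str, quantity: int) -> ET.Element:
    po = ET.Element("PurchaseOrder")
    ET.SubElement(po, "OrderId").text = order_id
    line = ET.SubElement(po, "Line")
    ET.SubElement(line, "Sku").text = sku
    ET.SubElement(line, "Quantity").text = str(quantity)
    return po

# Inbound map for hypothetical Store A: <Order id=".."><Item code=".." qty=".."/></Order>
def map_store_a(doc: ET.Element) -> ET.Element:
    item = doc.find("Item")
    return canonical_po(doc.get("id"), item.get("code"), int(item.get("qty")))

# Inbound map for hypothetical Store B:
# <PO><Number>..</Number><ProductCode>..</ProductCode><Units>..</Units></PO>
def map_store_b(doc: ET.Element) -> ET.Element:
    return canonical_po(doc.findtext("Number"), doc.findtext("ProductCode"),
                        int(doc.findtext("Units")))

# Outbound map: canonical format -> hypothetical Supplier X format.
def map_to_supplier_x(po: ET.Element) -> ET.Element:
    out = ET.Element("SupplierOrder", ref=po.findtext("OrderId"))
    ET.SubElement(out, "Product", sku=po.findtext("Line/Sku"),
                  amount=po.findtext("Line/Quantity"))
    return out

# One inbound map per external format; nothing downstream knows about them.
INBOUND = {"StoreA": map_store_a, "StoreB": map_store_b}

def route(source: str, raw_xml: str) -> str:
    # Two transformations per message: external -> canonical -> external.
    canonical = INBOUND[source](ET.fromstring(raw_xml))
    return ET.tostring(map_to_supplier_x(canonical), encoding="unicode")

a = route("StoreA", '<Order id="42"><Item code="ABC" qty="3"/></Order>')
b = route("StoreB", "<PO><Number>42</Number><ProductCode>ABC</ProductCode><Units>3</Units></PO>")
assert a == b  # logically equivalent inputs yield the same supplier document
```

Adding a third store format in this sketch means writing one more inbound function and one dictionary entry; the supplier map and the routing logic are untouched.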
I have to admit that when I first started building BizTalk solutions, I didn’t immediately grasp the benefits of having canonical representations of messages. That quickly changed, however. There is an obvious performance cost, since every message is transformed twice (from the source format to the canonical format, then from the canonical format to the destination format), but I think this overhead is well justified by the benefits below (listed roughly in order of importance):
- Impact of schema change is minimised – since all messages map to or from the canonical document, if (following our example) a store or supplier decides to change their schema, only one map needs to change. Compare this with the P2P solution: if a store changed their schema, 4 maps would need to be changed, and each supplier would need to be contacted and regression testing arranged with each of them. By using a canonical document type, we protect parties from the impact of each other’s schema changes.
- Impact of change is minimised (2) – since orchestrations, for example, work on the canonical schema, changes to external schemas do not require orchestration changes or redeployment.
- Additional document formats can be added with relative ease – only one new map is required, to or from the canonical format. It is also only necessary to deal with one integration partner, and knowledge of every downstream message format is not required; detailed knowledge of the new message format and the canonical format is enough.
- Reduced solution complexity – the canonical solution requires 7 maps to be maintained (one per store and one per supplier), whereas the P2P solution requires 12 (one per store/supplier pair).
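The map counts above follow from simple arithmetic. The sketch below assumes, as the figures in the list imply, 3 stores and 4 suppliers (12 P2P maps, and 4 maps affected by a single store’s schema change); the function names are hypothetical.

```python
def p2p_maps(stores: int, suppliers: int) -> int:
    # Every store format needs a direct map to every supplier format.
    return stores * suppliers

def canonical_maps(stores: int, suppliers: int) -> int:
    # Each external format needs exactly one map to or from the canonical format.
    return stores + suppliers

def maps_changed_by_store_schema_change(suppliers: int, canonical: bool) -> int:
    # Canonical: only the store's own inbound map changes.
    # P2P: every map from that store to a supplier changes.
    return 1 if canonical else suppliers

STORES, SUPPLIERS = 3, 4  # assumed from the figures in the text

print(p2p_maps(STORES, SUPPLIERS))        # 12
print(canonical_maps(STORES, SUPPLIERS))  # 7
print(maps_changed_by_store_schema_change(SUPPLIERS, canonical=False))  # 4
print(maps_changed_by_store_schema_change(SUPPLIERS, canonical=True))   # 1

# New maps needed when one more supplier format is added:
print(p2p_maps(STORES, SUPPLIERS + 1) - p2p_maps(STORES, SUPPLIERS))              # 3
print(canonical_maps(STORES, SUPPLIERS + 1) - canonical_maps(STORES, SUPPLIERS))  # 1
```

The gap widens as partners are added: the P2P map count grows with the product of the number of store and supplier formats, while the canonical count grows with their sum.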
Here are a couple of caveats I have come across with respect to this pattern:
- There can be only one canonical representation for each logical message type! I recently worked on a solution where Xsd.exe had been used to create classes for the canonical schemas, and these classes were then used in the solution’s orchestrations… As the canonical schemas changed, the classes were not regenerated. This can introduce subtle bugs: for example, if in your orchestration you assign a message based on the current canonical schema to a message based on the stale generated class, any data not defined in the class will be lost. So it is definitely best practice to ensure that only one canonical representation is available in your solution.
- It is harder to implement this pattern retrospectively, once the solution is in Production. So even if your solution is simple, do yourself a favour and future-proof it by baking in a canonical schema from the start.
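The first caveat (a stale generated class silently losing data) can be simulated in a few lines of Python. This is not BizTalk or .NET code: the dataclass stands in for a class generated by Xsd.exe from an older version of the canonical schema, and `DeliveryDate` is a hypothetical element added to the schema afterwards. Round-tripping a current canonical message through the stale class drops the new element without any error.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

# Stand-in for a class generated from an OLD version of the canonical
# schema and never regenerated (all names here are hypothetical).
@dataclass
class PurchaseOrderV1:
    OrderId: str = ""
    Sku: str = ""

def from_xml(doc: ET.Element) -> PurchaseOrderV1:
    # Copy only the elements the class knows about; anything added to the
    # schema later is silently ignored.
    known = {f.name for f in fields(PurchaseOrderV1)}
    return PurchaseOrderV1(**{c.tag: c.text for c in doc if c.tag in known})

def to_xml(po: PurchaseOrderV1) -> ET.Element:
    root = ET.Element("PurchaseOrder")
    for f in fields(po):
        ET.SubElement(root, f.name).text = getattr(po, f.name)
    return root

# A message conforming to the CURRENT canonical schema, which has since
# gained a hypothetical DeliveryDate element.
current = ET.fromstring(
    "<PurchaseOrder><OrderId>42</OrderId><Sku>ABC</Sku>"
    "<DeliveryDate>2024-01-31</DeliveryDate></PurchaseOrder>"
)
round_tripped = to_xml(from_xml(current))
print(ET.tostring(round_tripped, encoding="unicode"))
# <PurchaseOrder><OrderId>42</OrderId><Sku>ABC</Sku></PurchaseOrder>
```

Keeping a single canonical representation (here, working directly with the schema-shaped XML rather than a second, hand-maintained type) removes this class of bug.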
I hope this post demonstrates the benefits of the canonical messaging pattern and why solutions should implement it.