In this article, one of Dufrain’s Data Architects, Jonny Lang, brings data mesh to life and drills down on the key areas to understand and take forward. He also shares his recommendations for further learning.
What is modern data platform architecture?
First and foremost, to give credit where credit is due: the term and the Data Mesh architectural pattern were first coined by Zhamak Dehghani and her team at Thoughtworks in 2018, and she has since written and contributed to several books on the subject.
Thanks to Zhamak and her team, we now have a better way of organising businesses’ data platforms and products through a decentralised, domain-driven design (DDD), whilst following an organisation-wide Federated Computational Governance strategy!
Rather than simply throwing out these very tech-heavy terms and phrases and leaving you to it, this article will go over all aspects of Data Mesh in more detail, to help you understand exactly what Data Mesh is and why you should consider it for your organisation’s data platform strategy.
The what and why
In short, Data Mesh is an architectural pattern that includes:
- Federated Computational Governance
- Decentralised Data Driven Domains
- Data Products
- Polyglot Persisted Data Store
You should consider Data Mesh as an option if your organisation’s business areas would like to keep control of their own data, so that they don’t have to rely on a single team for the entire company to perform queries or create reports.
With the above, here is a logical diagram that illustrates how it all fits together:

Ref: What is a Data Mesh? Architecture & Best Practices (qlik.com)
Now that you’ve seen the answer in a single low-quality image, would you like to understand what it means? Or, as the Federal Network’s promotional videos phrase it: “Would you like to know more?”
Federated computational governance
When I first started researching this topic, I found many different articles and videos all primarily focusing on the governance side of data, as rightly they should! Just like when designing any architectural platform, you must (and I can’t stress this enough) always design with security and governance in mind. The way I like to think of it is like a car:
It’s all well and good having a nice shiny car with the giant spoiler on the back and the “go-faster” stripes painted down the middle, sparing no expense…but if you don’t take care of it, keeping it clean, well-oiled and serviced regularly…then it’s going to mould and rust, the engine will overheat and all manner of warning lights are going to flash up on your dash. Eventually it will require a complete overhaul to get it to run like it should.
Before building the data platforms following the Data Mesh architectural patterns, your organisation must first agree to a data strategy that best conforms to the business needs. For example:
- Policies need to be put in place to define wider controls
- Roles & responsibilities need to be assigned to the right users
- Data needs to be organised and easily understood throughout the business.
The platform also needs to be fast and easily scalable, whilst being cost effective. It needs to be reliable with multiple environments, and it MUST be secure to ensure that only the right people can see the right data, with no margin for error.
Once these are defined and agreed throughout the entire business, then and only then can you begin designing and building out your Data Mesh patterned platforms.
Aside from governance, data mesh’s secondary focus is the decentralisation of data and data ownership. For example:
There is no one single team that manages and maintains the entire company’s data store located on a single data platform. Instead, each business area oversees creating and maintaining their own platform, all of which follow the same defined principles, strategies, and architectural framework.
In the data mesh world, these “business areas” are called “Domains”. And as a whole, this process is known as Federated Computational Governance.
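To make the “computational” part of Federated Computational Governance concrete, here is a minimal sketch of governance policies expressed as code, which every Domain could run against its own datasets. The required fields and classification labels are purely illustrative assumptions, not a real organisation’s policy set.

```python
# A hedged sketch of policies-as-code: organisation-wide rules that each
# Domain applies to its own data. Field names and labels are invented.

REQUIRED_METADATA = {"owner", "domain", "classification"}
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}

def validate_dataset(metadata: dict) -> list[str]:
    """Return a list of policy violations for a dataset's metadata."""
    violations = []
    missing = REQUIRED_METADATA - metadata.keys()
    if missing:
        violations.append(f"missing metadata fields: {sorted(missing)}")
    cls = metadata.get("classification")
    if cls is not None and cls not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {cls!r}")
    return violations

# Each Domain runs the same shared checks inside its own pipelines:
ok = validate_dataset({"owner": "finance-team", "domain": "Finance",
                       "classification": "confidential"})
bad = validate_dataset({"domain": "Sales"})
```

The point is that the policy definition lives in one federated place, while its enforcement is decentralised into every Domain’s tooling.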
Decentralised domains
As previously stated, Domains are individual business areas, which are all in charge of their own data ingestion and transformation within their own data platforms. For example, some organisations may have the following Domains:
- Finance
- Sales
- Marketing
- HR
- Customer Service
- Innovation
- Operations
- Etc…
All Domains are operating independently, keeping with the defined Federated Computational Governance strategy, whilst also assisting and working together with other Domains to help the business become more data mature. It all comes down to that old proverb:
There is no “I” in “Team”, but there are ten “u”s in “Unfortunately the data platform failed because we couldn’t work together, costing us thousands upon thousands of dollars, pounds, pesos, euros and yen”.
However, before giving a Domain the ability to manage and maintain its data from end to end, it’s best practice to perform a Functional Business Decomposition to better define any problem areas that your organisation may be experiencing. On completion of this exercise, the logical boundaries and responsibilities between these identified problem areas become clearer. Further information about defining this can be found here.
Data products

Once the Domains have been defined (no small feat) and are all keeping with the previously defined Federated Computational Governance strategies, principles, and architectural framework, each Domain can then begin to ingest, manage, and organise its data within its own cloud space, transforming the data into information as best suits its own and the company’s needs.
Finally, the Domain presents its insights in some form of self-service analytical business intelligence reporting. All of this combined forms what is known as a Data Product.
“Each Data Product involves data, code assets, metadata and related policies. They can be delivered as API’s, reports, tables or full datasets in a lake” – Microsoft Learn
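The quoted definition could be sketched as a minimal contract object. Everything below (the class shape, field names, and the example values) is a hypothetical illustration of “data, code assets, metadata and related policies”, not an API from any real platform.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a Data Product contract: data (via an output
# port), code assets, metadata, and policies bundled as one unit.

@dataclass
class DataProduct:
    name: str
    domain: str
    output_port: str              # e.g. an API endpoint, table, or lake path
    code_assets: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    policies: list[str] = field(default_factory=list)

billed_fees = DataProduct(
    name="billed_fees",
    domain="Finance",
    output_port="sql://finance/billed_fees",
    code_assets=["pipelines/ingest_fees.py"],
    metadata={"refresh": "daily", "classification": "confidential"},
    policies=["mask_client_names_for_non_finance_readers"],
)
```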
With all Domains having the option to request read-level access to all other Domains’ Data Products, they can not only self-serve all of the company’s data, but also combine other Data Products into a whole new Data Product. For example:
A law firm may want to report on the difference between the fees an individual person or timekeeper has billed out to their clients and the monetary target set for that individual, ensuring that the person is operating efficiently and meeting their mark. To do this, the firm will need to pull data from Data Products located within the Finance and HR Domains.
The Finance Domain hosts the amounts billed to clients within a SQL-based Practice / Case Management System, while the HR Domain holds the budget and target data per individual within a data lake, fed from a Salesforce API.
Each Domain already has visibility of the other’s Data Products, allowing the requesting Domain to create a whole new Data Product that combines the two data sets above into an overarching Data Product for this specific scenario.
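The law-firm scenario above could be sketched as a simple join between two read-only Data Products. The field names and figures here are invented for the example; in practice each side would be an SQL query or a lake read rather than an in-memory list.

```python
# Illustrative only: deriving a new Data Product from two others
# (billed fees from Finance, targets from HR). All values are invented.

finance_fees = [  # read-only view of the Finance Domain's product
    {"timekeeper": "A. Smith", "billed": 120_000},
    {"timekeeper": "B. Jones", "billed": 80_000},
]
hr_targets = {  # read-only view of the HR Domain's product
    "A. Smith": 100_000,
    "B. Jones": 95_000,
}

# The new, combined Data Product: billed vs target per timekeeper.
performance = [
    {
        "timekeeper": row["timekeeper"],
        "billed": row["billed"],
        "target": hr_targets[row["timekeeper"]],
        "on_target": row["billed"] >= hr_targets[row["timekeeper"]],
    }
    for row in finance_fees
]
```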
This type of modelling architecture does not have to conform to the same technologies within a single cloud provider; instead, each Domain can use whichever technology (or technologies) best meets its requirements. This is known as Polyglot Persistence, or a Polyglot Data Store.
Polyglot data store
The Polyglot Persistence data storage philosophy shows us that there may not always be a “one size fits all” solution when it comes to data storage. Instead, it recognises that a hybrid solution may be more optimal. Just like how “Polyglot Programming” utilises a number of different languages to achieve a single task, such as:
The progression of .parquet / Delta files through the various medallion lake layers utilising PySpark and Spark SQL, following the Data Vault methodology and finishing with a 2nd Normal Form Star schema (Kimball methodology) analytical Gold layer.
The Polyglot Persisted Data Store is a way of utilising and combining more than one architectural platform to achieve a single desired outcome. For example:
Some Domains may require streaming services for live / near-real-time data, where others may only require large incremental batch upserts. And some may require the use of storing and querying petabytes of unstructured data, whereas others may require a simple SQL database.
All of this is achieved whilst following the Federated Computational Governance strategy and allowing visibility into each other’s Data Products, giving each Domain the capability to create new, combined information for the betterment of the company.
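As a toy illustration of polyglot persistence, here is a sketch in which one Domain’s structured fees sit in a SQL store while another Domain’s semi-structured events sit in a document-style store, each chosen for its workload. The stores, table, and event shapes are all assumptions made up for the example; a real platform would use managed services rather than in-memory stand-ins.

```python
import json
import sqlite3

# Hedged sketch of polyglot persistence: two different store types behind
# one set of query functions. Names and data are illustrative only.

# Finance: structured, relational workload -> SQL database
sql = sqlite3.connect(":memory:")
sql.execute("CREATE TABLE fees (client TEXT, amount REAL)")
sql.execute("INSERT INTO fees VALUES ('Acme Ltd', 1200.0)")

# Marketing: semi-structured event workload -> document-style store
events = [json.dumps({"type": "click", "campaign": "spring"})]

def total_fees() -> float:
    """Relational aggregate over the Finance store."""
    (total,) = sql.execute("SELECT SUM(amount) FROM fees").fetchone()
    return total

def click_events() -> list[dict]:
    """Document filter over the Marketing store."""
    return [e for e in map(json.loads, events) if e["type"] == "click"]
```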
Architectural best practices: hub and spoke bounded context
As described within the Federated Computational Governance section of this article, when it comes to creating the Data Mesh we need to ensure that we are following Microsoft’s “Well-Architected Framework”, meaning the platform must be:
- Not only fast, but optimal
- Easily scalable
- Cost effective
- Reliable
- Available across multiple environments
- And lastly, but certainly not least, it MUST be secure.
To achieve this, I’ve found that the Microsoft-approved open-source “Cloud-Scale Analytics” framework follows a pattern which allows direct communication between Domains, whilst staying in control through central cataloguing and classification for the protection of data (such as with Azure Purview), so that the data is searchable no matter which Domain you belong to. Keeping with this example, Purview can also automatically control access to the underlying data within these Data Products via Purview Workflows.
This technique is known as bounded context.

Ref: A financial institution scenario for data mesh – Cloud Adoption Framework | Microsoft Learn
Bounded context
As we can see from the image above, there is a “hub” in the centre that hosts all governance capabilities (i.e. the Data Management Landing Zone), such as Purview, Key Vault, and cost monitoring. The “spokes” demonstrate that, should a Domain (or Data Landing Zone) wish to use the Data Product of another Domain, it first goes through the hub to search for and request access to the desired Data Product. On being granted this access, the requesting Domain is then able to view and interact with the data as required.
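The hub-and-spoke flow described above can be sketched as a tiny hub object holding the catalogue and access grants. The class, method names, and the automatic approval are invented simplifications; a real hub (e.g. Purview) would route requests through an approval workflow.

```python
# Illustrative sketch of hub-mediated access: a spoke Domain discovers
# another Domain's Data Product via the central hub, then requests
# access through it. All names here are hypothetical.

class Hub:
    def __init__(self):
        self.catalogue = {}   # product name -> owning domain
        self.grants = set()   # (requesting domain, product name)

    def register(self, product: str, domain: str) -> None:
        self.catalogue[product] = domain

    def search(self, term: str) -> list[str]:
        return [p for p in self.catalogue if term in p]

    def request_access(self, domain: str, product: str) -> bool:
        # Simplification: a real hub would run an approval workflow here.
        if product in self.catalogue:
            self.grants.add((domain, product))
            return True
        return False

hub = Hub()
hub.register("billed_fees", "Finance")
granted = hub.request_access("HR", "billed_fees")
```

Note that the two Domains never talk to each other directly for discovery or authorisation; only the data access itself happens spoke-to-spoke once granted.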
Polyglot persisted data store vs data federation

Ref: Federated Data Warehouse Architecture (zentut.com)
Something worth mentioning here, as I noticed there may be some confusion between these two architectural patterns: the Polyglot Persisted Data Store does bear some resemblance to “Federated Data” or “Data Federation”. The key difference is that Data Federation converges all data into a single Common Data Model, providing a single data source for all front-end BI applications outside of any individual Domain.
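To make the contrast concrete, here is a sketch of what Data Federation does: mapping several source shapes into one Common Data Model for every consumer, rather than letting each Domain keep its own shape. The source shapes and the common schema are invented for illustration.

```python
# Contrast sketch: Data Federation converges heterogeneous sources into
# a single Common Data Model. Field names and values are illustrative.

finance_rows = [{"ClientId": 1, "Billed": 1200.0}]
hr_rows = [{"EmpId": 7, "Target": 1000.0}]

def to_common_model() -> list[dict]:
    """Map each source into one shared schema for front-end BI tools."""
    common = []
    for r in finance_rows:
        common.append({"entity": "client", "id": r["ClientId"],
                       "value": r["Billed"]})
    for r in hr_rows:
        common.append({"entity": "employee", "id": r["EmpId"],
                       "value": r["Target"]})
    return common
```

In a mesh, by contrast, each Domain would expose its own shape as a Data Product and consumers would combine them on demand.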
Closing statement
So, there you have it. You now know that the Data Mesh architectural pattern combines Federated Computational Governance with decentralised, Domain-owned Data Products, persisted in a Polyglot Data Store, which can be created, maintained, and shared throughout all business areas (or Domains) within the entire organisation.
Should you wish to learn more about this subject at an even greater level, talk to us here at Dufrain and we can share our knowledge or answer your questions.
Book recommendations and references
There are also several books you can purchase; my recommendation is Zhamak’s book. Otherwise, the articles referenced below also helped me understand this concept.
I’ll now leave you with an altered quote from Uncle Ben that I and others in my field strive to follow: “with great knowledge comes great responsibility.”
We recommend that you start by speaking to organisations that have implemented this data architecture to understand how the process went. By kickstarting your journey with a consultancy like Dufrain, your organisation can get it right from the beginning.
References
- Data Mesh: an Architectural Deep Dive (infoq.com)
- Polyglot persistence: what is it and why does it matter? | Packt Hub (packtpub.com)
- Data Mesh Principles: 4 Core Pillars & Logical Architecture (atlan.com)
- Azure Well-Architected Framework – Microsoft Azure Well-Architected Framework | Microsoft Learn
- Cloud-scale analytics – Microsoft Cloud Adoption Framework for Azure – Cloud Adoption Framework | Microsoft Learn
- Bounded Context (martinfowler.com)
- What is a data mesh? – Cloud Adoption Framework | Microsoft Learn
- Data domains – Cloud Adoption Framework | Microsoft Learn
- What is a data product? – Cloud Adoption Framework | Microsoft Learn
- Cloud-scale analytics data products in Azure – Cloud Adoption Framework | Microsoft Learn
- A financial institution scenario for data mesh – Cloud Adoption Framework | Microsoft Learn
