What's In It For Me? (WIIFM)
Data & analytics platforms are complex and touch many different stakeholders within the enterprise. To get as many stakeholders actively engaged, a clear vision that resonates with all of them is needed. Explaining what is in it for them (WIIFM) is key, and best communicated by examples:
- Increasing trust in data
- Reducing downstream work and inconsistencies in reports
- Improving and speeding up access to data to drive business decisions
- Optimizing the cost of operating and maintaining the system
- Aligning better with regulatory requirements
Start With A Modern Reference Architecture
Whether you are starting from a relatively immature state or building on top of an established foundation, it is important to refer to a proven reference architecture and example roadmap for your current or next data & analytics platform project.
For immature or fledgling environments, these reference architectures can help you plan a foundation that will allow you to scale over time by incrementally adding complexity. For an already established and complex environment, reviewing ideal target architecture examples can help to confirm technical debt and legacy systems that require migration, as well as identify types of integration that may now be sub-optimal due to recent innovations and product enhancements.
Key Concepts for the Fledgling Organization
If your organization feels that its data sources are fragmented and disorganized, the lack of a medallion architecture might be the cause.
Medallion architecture ("bronze", "silver", "gold") is one of the most fundamental data design patterns to pursue as part of an initial investment in a data & analytics platform.
With this pattern, the bronze layer is the landing area for raw data. In the silver layer, we start to introduce schemas and other structures. Finally, the gold layer makes data available that is ready for business consumption.
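To make the three layers concrete, here is a minimal sketch in plain Python rather than on an actual Databricks cluster. The sample records, the field names (`order_id`, `region`, `amount`) and the helper functions `to_silver` and `to_gold` are all hypothetical and exist only for illustration; a real platform would typically implement the same steps as tables and pipelines.

```python
from collections import defaultdict

# Bronze: raw records landed exactly as received (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": "east", "amount": "100.50"},
    {"order_id": "2", "region": "WEST", "amount": "20"},
    {"order_id": "2", "region": "WEST", "amount": "20"},    # duplicate delivery
    {"order_id": "3", "region": "east", "amount": "oops"},  # malformed amount
]


def to_silver(rows):
    """Apply a schema: cast types, normalize values, drop duplicates and bad rows."""
    seen, silver = set(), []
    for row in rows:
        try:
            record = {
                "order_id": int(row["order_id"]),
                "region": row["region"].lower(),
                "amount": float(row["amount"]),
            }
        except (KeyError, ValueError):
            continue  # assumption: bad rows are skipped; a real pipeline might quarantine them
        if record["order_id"] in seen:
            continue
        seen.add(record["order_id"])
        silver.append(record)
    return silver


def to_gold(rows):
    """Aggregate into a business-ready view: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

The point of the sketch is the separation of concerns: bronze is never modified, silver carries the typed and deduplicated records, and gold holds only what a report needs.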
When your data is organized within this pattern along with related best practices, solutions like Databricks are able to demonstrate their full capabilities. For example, data can be loaded incrementally and automatically, which reduces maintenance and processing time and costs. The pattern also allows for "time travel" and the recreation of tables for new purposes, which helps provide high degrees of data trust and reproducibility.
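The ideas of incremental loading and "time travel" can be illustrated with a toy example in plain Python. This is a sketch under stated assumptions, not how Databricks or Delta Lake implement these features: the `VersionedTable` class, the `incremental_load` function and the `id`-based watermark are hypothetical names invented here, and the table simply stores a full snapshot per commit to keep the code short.

```python
class VersionedTable:
    """Toy append-only table that keeps one snapshot per commit (illustrative only)."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    def append(self, rows):
        """Commit new rows and return the new version number."""
        self._versions.append(self._versions[-1] + list(rows))
        return len(self._versions) - 1

    def read(self, version=None):
        """Read the latest snapshot, or an earlier one ("time travel")."""
        snapshot = self._versions[-1 if version is None else version]
        return list(snapshot)


def incremental_load(source, target, watermark):
    """Copy only source rows with an id above the watermark; return the new watermark."""
    new_rows = [row for row in source if row["id"] > watermark]
    if new_rows:
        target.append(new_rows)
        watermark = max(row["id"] for row in new_rows)
    return watermark


source = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
table = VersionedTable()

wm = incremental_load(source, table, watermark=0)  # first run loads ids 1 and 2
source.append({"id": 3, "value": "c"})
wm = incremental_load(source, table, wm)           # second run loads id 3 only

print(table.read())           # latest state of the table
print(table.read(version=1))  # the table as it was after the first load
```

Because each run processes only the rows above the watermark, the second load touches one row instead of three, and the earlier version remains readable for audits or for rebuilding downstream tables.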
Key Recent Changes for the Mature Organization
As a more mature organization, you likely already have an architecture diagram that looks similar to the above example. You have already established a medallion architecture, and have likely already invested heavily in ETL (extract, transform, and load) code.
In a mature data platform environment, reviewing a reference architecture can help you identify earlier decisions that were driven by solution limitations or environmental constraints that are no longer present. This is also an opportunity to review newly emerged capabilities, which you may wish to adopt to reduce the operating costs of your existing platform or to enable new business outcomes.
Some examples of relatively new capabilities on the Databricks platform include:
- Unity Catalog - a unified governance layer for most assets inside Databricks, with the option to manage external sources as well
- Delta Live Tables & Workflows - the next-generation way of building and managing ETL pipelines, with a new take on automated streaming, change data capture and cluster management
- Managed MLflow & GenAI Tools - rapidly evolving tools for managing end-to-end data science processes ("MLOps"), including optimized deployment of generative AI models
- Delta Sharing - an open-source approach to securely sharing data across platforms, clouds, regions and third parties