Embrace Implementation Methodologies
Agile methodologies are a great choice for software development and for some technical operations teams. For a data platform implementation or migration, the traditional waterfall-based approach is often the preferred choice.
Vendors like Databricks have adapted the traditional waterfall implementation methodology to reflect the contemporary realities of adopting cloud-based data platforms.
However, a traditional waterfall migration approach is not synonymous with a "big bang" project. The organizations T1A has helped that took an incremental "waves" approach experienced the highest levels of success and predictability.
Databricks Migration Methodology
The Databricks Migration Methodology initially focuses on understanding the legacy environment. This is where profilers and scanners are leveraged to help automate the process of discovery.
Phase 2 takes a deeper dive into the existing jobs and code. This is where the viability of using automated code conversion tools is assessed. Estimates of the migration cost and the total cost of ownership are available at this stage.
In Phase 3, the technology mapping takes shape, which leads to an implementation strategy: for example, which workloads will be converted first and how features will change in the target architecture.
Phase 4 is where the testing approach comes to life, typically with a production pilot. Most often, multiple iterations of code conversion, testing and retesting are required. In parallel, training and enablement plans are developed.
Finally, Phase 5 concludes with a change management plan for users and other stakeholders. A roll-out plan typically includes final validation and rollback plans.
Data platforms are complex, and they become intertwined with multiple technical functional areas such as BI, analytics and data science. Adding to this complexity, the underlying data never stops growing or changing, independently of the code on the platform. For these reasons, Databricks recommends reviewing the migration execution pillars at every phase of the project.
"Big Bang" vs. "Waves"
T1A's clients experienced the best results when avoiding "big bang" migrations. As per the Databricks Migration Methodology, the planning of MVPs or "waves" of jobs was the preferred approach.
For a successful wave-based approach, the project team needs a strong understanding of existing workloads; this puts the team in the best possible position to identify the best MVPs and the order in which to migrate them.
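One way to turn an understanding of existing workloads into a wave sequence is to group jobs by their dependencies, so that every job migrates after the jobs it reads from. The following is a minimal sketch of that idea, not part of the Databricks Migration Methodology itself. The job names and the plan_waves helper are hypothetical, and a real plan would also weigh business priority, risk and data volumes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_waves(dependencies):
    """Group jobs into waves so each job's upstream jobs land in an earlier wave.

    `dependencies` maps a job name to the set of jobs it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        # Every job returned here has all of its upstream jobs already placed.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical job inventory (illustrative names only).
jobs = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "build_sales_mart": {"ingest_orders", "ingest_customers"},
    "sales_dashboard": {"build_sales_mart"},
}

for number, wave in enumerate(plan_waves(jobs), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

In this sketch the two ingestion jobs form the first wave, the mart the second and the dashboard the third. A dependency-first ordering is only one possible sequencing rule; it is used here because it keeps each wave testable against data that has already been migrated.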
Secondly, the team requires a standardized method of testing and validating jobs. Testing and validation are not one-time activities: live environments can often have thousands of jobs, which change at the speed of the business. An extended code freeze for the duration of a migration is almost never a viable option, so the best recourse is to improve the team's testing and validation velocity.
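As an illustration of what a standardized, repeatable validation check might look like, the sketch below compares the output of a legacy job with the output of its migrated counterpart using a row count and an order-independent checksum. It is an assumption-laden example rather than a prescribed method: the table_fingerprint and validate_job helpers are hypothetical, rows are assumed to be available as in-memory tuples, and it assumes both platforms render values identically (in practice, types, float precision and timestamps usually need normalizing first).

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of rows, ignoring row order."""
    count = 0
    checksum = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Summing per-row hashes modulo 2**64 keeps the result independent of
        # row order while still counting duplicate rows.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % (1 << 64)
        count += 1
    return count, checksum


def validate_job(legacy_rows, migrated_rows):
    """Compare legacy and migrated outputs and report which checks passed."""
    legacy_count, legacy_sum = table_fingerprint(legacy_rows)
    migrated_count, migrated_sum = table_fingerprint(migrated_rows)
    return {
        "row_count_match": legacy_count == migrated_count,
        "checksum_match": legacy_sum == migrated_sum,
        "passed": (legacy_count, legacy_sum) == (migrated_count, migrated_sum),
    }


# Hypothetical sample outputs: same rows in a different order, then a drifted value.
legacy = [(1, "a", 10.0), (2, "b", 20.5)]
migrated = [(2, "b", 20.5), (1, "a", 10.0)]
drifted = [(1, "a", 10.0), (2, "b", 20.6)]

print(validate_job(legacy, migrated))
print(validate_job(legacy, drifted))
```

Because the check is a plain function with a fixed result shape, it can be rerun after every conversion iteration and across many jobs, which is the kind of validation velocity the paragraph above calls for.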