[3] IMPLEMENTATION ACCELERATORS: Scanners & Crawlers

updated on 21 February 2024

Start with Understanding Your Environment

For any data & analytics platform implementation that is transforming or migrating from an existing platform, it's important to understand which assets are already in place. A full inventory is a necessary first step before the optimal implementation and migration strategies can be identified.

Some past insights from running scanners, crawlers and analysis solutions on existing environments include:

  • A community services government organization was able to exclude 90% of its data & analytics code from an upcoming migration by classifying unused and duplicate code
  • An insurance company was able to identify which legacy project code was a good candidate for automated code conversion: ETL-focused workloads were selected for automation, while jobs that were more statistics focused were marked for manual conversion
  • A financial services organization was able to quickly determine which legacy data and compute sources were compatible and ready for conversion to the latest data catalog technologies
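As an illustration of the first insight, the following is a minimal sketch of how duplicate code might be flagged during an inventory. It is not taken from any of the tools discussed here: the function names, the whitespace-only normalization, and the in-memory `files` mapping are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def normalize(source: str) -> str:
    """Collapse runs of whitespace so reformatted copies compare equal.

    Assumption: whitespace-only differences are not meaningful.
    """
    return " ".join(source.split())


def find_duplicates(files: dict[str, str]) -> list[list[str]]:
    """Group file names whose normalized contents are identical.

    `files` is a hypothetical mapping of file name to source text.
    """
    groups = defaultdict(list)
    for name, source in files.items():
        digest = hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

A real scanner would typically go further, for example by ignoring comments or detecting near-duplicates, but exact matching after normalization is often a useful first pass.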

Unity Catalog Crawler & Assessment ("UCX")


Databricks provides an automated crawler, assessment and conversion tool to assist with migration to its data catalog technology, "Unity Catalog".

UCX is available at no cost on GitHub. Running the assessment produces a report that summarizes metrics on the compatibility and readiness of tables, mount points, clusters and other objects:

An excerpt from the UCX assessment dashboard.

Assessment widgets provide additional compatibility detail at the storage and schema level. These lower-level assessments offer a finer grain of detail and include error messages to help guide corrective action.


Profiler for SAS

SAS is much more than a language or base SAS code; for some customers, it is a platform that includes many specialized products and applications. For this reason, it is important to secure a profiler or scanner that supports SAS nuances.

T1A's Alchemist Analyzer is a purpose-built profiler for understanding SAS environments and helping to prepare migration strategies.

The "Alchemist" analyzer by T1A is a purpose-built profiler and visual analysis tool for SAS.

The profiling and analysis of SAS jobs typically seeks to answer the following questions:

  • Which SAS jobs are actively executed vs. legacy code, which may be discarded?
  • Which jobs are ad hoc vs. scheduled through SAS applications and schedulers?
  • Which active jobs are statistics and modeling focused (and therefore more challenging to convert automatically to Python)?
  • Which active jobs are using unusual libraries, or network storage locations?
  • For applicable SAS applications, where can I find segments of code that share a common, isolated data lineage?
  • Where can I find isolated segments of code, which would be good initial migration candidates? 
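To make the statistics-versus-ETL question concrete, here is a rough sketch of how a job could be triaged by the SAS procedures it invokes. This is an illustrative assumption, not how the Alchemist analyzer works: the `STATS_PROCS` list is hypothetical and non-exhaustive, the labels are made up for the example, and the regular expression does not skip comments or macro-generated code.

```python
import re

# Hypothetical, non-exhaustive set of statistics and modeling procedures.
STATS_PROCS = {"reg", "logistic", "glm", "mixed", "genmod", "phreg", "arima"}

# Matches "proc <name>" anywhere in the source, case-insensitively.
PROC_PATTERN = re.compile(r"\bproc\s+(\w+)", re.IGNORECASE)


def classify_job(source: str) -> str:
    """Label a SAS job by whether it calls any statistics procedure."""
    procs = {name.lower() for name in PROC_PATTERN.findall(source)}
    if procs & STATS_PROCS:
        return "manual review"
    return "automated candidate"
```

For example, a job containing only `proc sql` and DATA steps would be labeled an automated candidate, while one calling `PROC LOGISTIC` would be flagged for manual review. A dedicated profiler would combine this kind of signal with execution history and scheduling metadata.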
