Major US Telecom Provider Migrates 4,200 SAS Programs and 3.8M Lines of Code to Databricks in 18 Months

MigryX Case Study • April 2026 • Telecommunications

Executive Summary

A major telecommunications provider in the United States — supporting tens of millions of wireless, broadband, and enterprise subscribers — had accumulated one of the largest SAS estates in the industry. Spanning 4,200 SAS programs, 3.8 million lines of code, deep macro libraries, and a combined SAS Enterprise Guide and SAS Data Integration Studio deployment, the environment consumed over $6 million annually in SAS licensing and supporting infrastructure. Following a board-level decision to standardize on Databricks as the enterprise analytics platform, the organization selected MigryX to execute the full SAS-to-Databricks migration. Over 18 months, MigryX's AST-based parser resolved the entire macro inheritance hierarchy, converted all PROC calls and DATA step logic to optimized PySpark, and deployed the migrated estate to Databricks with Unity Catalog governance. The program produced over 2 million lines of PySpark, performance improvements of 3–6X on subscriber analytics and network telemetry pipelines, and a projected $8.5 million in three-year savings.

Client Overview

The client is a facilities-based telecommunications carrier offering wireless, fiber broadband, and enterprise networking services nationwide. Its analytics function is critical to subscriber lifecycle management — from acquisition propensity modeling and network capacity planning to churn prediction, billing dispute resolution, and regulatory reporting to the FCC. The SAS estate had been built incrementally over 18 years, starting with a small team of biostatistics-trained analysts and growing through four major acquisitions, each of which added its own SAS macro libraries and PROC-based reporting frameworks. By the time of the migration assessment, the estate encompassed work from hundreds of analysts and engineers across marketing, finance, network operations, and regulatory affairs — many of whom were no longer with the organization.

The strategic rationale for migration was clear: SAS licensing costs had grown substantially through successive SAS Institute contract renewals; the SAS Enterprise Guide interface was increasingly at odds with the organization's data science talent pipeline, since new hires expected Python and cloud-native tooling; and the organization's Databricks investment, already substantial for new-build ML and streaming analytics use cases, carried per-workspace compute commitments that were significantly underutilized.

Business Challenge

The SAS-to-Databricks migration presented a distinctive set of challenges compared to other ETL platform migrations, driven by SAS's unique combination of a proprietary programming language, a rich procedural library, and deeply nested macro capabilities.

The MigryX Approach

MigryX deployed its SAS AST parser as the foundation of the conversion pipeline. Unlike regex-based or pattern-matching approaches, the AST parser fully tokenizes and parses each SAS program into a language-aware syntax tree that correctly understands scope rules, macro variable resolution order, conditional compilation (%IF/%THEN/%ELSE), and the interaction between macro and DATA step boundaries. This allows MigryX to resolve the full effective code of each program — with all macro calls expanded in context — before generating the target PySpark representation.
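The rescan-driven resolution order of macro variable references is exactly the behavior regex-based converters tend to miss. The sketch below illustrates it in plain Python with a hypothetical symbol table; real SAS resolution is tokenizer-driven and scope-aware, so this only models the && rescan mechanics described above:

```python
import re

def resolve_macro_vars(text: str, symtab: dict) -> str:
    """Resolve SAS-style macro variable references (&name, &&name)
    against a symbol table, rescanning until no references remain.
    Simplified illustration: && collapses to & and the result is
    rescanned, which is how indirect references like &&seg&n work."""
    pattern = re.compile(r"&&|&(\w+)\.?")

    def sub(m: re.Match) -> str:
        if m.group(0) == "&&":
            return "&"              # && -> &, picked up on the next pass
        return symtab[m.group(1)]   # direct reference lookup

    while "&" in text:
        new = pattern.sub(sub, text)
        if new == text:             # nothing left to resolve
            break
        text = new
    return text

# Indirect reference resolves in two passes: &&seg&n -> &seg2 -> broadband
symtab = {"n": "2", "seg2": "broadband"}
print(resolve_macro_vars("&&seg&n", symtab))  # broadband
```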

The macro resolution phase was particularly critical for this client. The 1,400-macro library included several deeply interdependent macro families developed for subscriber segmentation, network KPI calculation, and regulatory reporting. MigryX's resolver built a complete macro call graph, identified circular reference risks (which were flagged and resolved with client SMEs), and generated expanded representations that were then optimized to eliminate redundant intermediate datasets before PySpark generation. The resulting PySpark code for macro-heavy programs was, in most cases, more concise than the macro-expanded SAS representation.
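The call-graph construction and circular-reference detection described above can be sketched with Python's standard graphlib; the macro names here are invented for illustration, not drawn from the client's library:

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical macro call graph: macro -> set of macros it invokes.
call_graph = {
    "%subscriber_segment": {"%load_subs", "%score_propensity"},
    "%score_propensity":   {"%load_subs"},
    "%network_kpi":        {"%load_telemetry"},
    "%load_subs":          set(),
    "%load_telemetry":     set(),
}

def expansion_order(graph: dict) -> list:
    """Return a safe macro expansion order (callees before callers).
    Raises CycleError for circular references, which in the program
    described above were escalated to client SMEs for resolution."""
    return list(TopologicalSorter(graph).static_order())

order = expansion_order(call_graph)
# Every macro's callees appear before the macro itself.
assert order.index("%load_subs") < order.index("%score_propensity")
```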

DATA step conversions employed MigryX's semantic transformation library, which maps SAS DATA step constructs to their structural PySpark equivalents. The MERGE statement with multiple IN= options was converted to join chains with explicit null-checking and row-source tagging. RETAIN-based running calculations were converted to PySpark window functions with appropriate frame specifications. Hash object lookups were converted to broadcast-joined reference DataFrames. Array processing with DO OVER loops was converted to PySpark explode/pivot patterns or vectorized column operations as appropriate for the array's cardinality and use pattern.
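The RETAIN conversion, for example, must preserve per-BY-group accumulator semantics. A plain-Python model of those semantics follows; the generated PySpark uses a sum over a partitioned, ordered window, and the column and key names here are hypothetical:

```python
from itertools import groupby
from operator import itemgetter

def retain_running_total(rows, by_key, order_key, value_key):
    """Model of a SAS RETAIN running total: within each BY group,
    accumulate value_key in order_key order. The equivalent PySpark is
    sum(value).over(Window.partitionBy(by).orderBy(order)
                          .rowsBetween(Window.unboundedPreceding, 0))."""
    rows = sorted(rows, key=itemgetter(by_key, order_key))
    out = []
    for _, grp in groupby(rows, key=itemgetter(by_key)):
        total = 0  # the RETAINed accumulator resets at each BY group
        for r in grp:
            total += r[value_key]
            out.append({**r, "running_total": total})
    return out

usage = [
    {"sub_id": "A", "month": 1, "gb": 10},
    {"sub_id": "A", "month": 2, "gb": 5},
    {"sub_id": "B", "month": 1, "gb": 7},
]
# Running totals: A -> 10, 15; B -> 7 (resets on the new BY group)
result = retain_running_total(usage, "sub_id", "month", "gb")
```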

SAS DI Studio job flows were extracted via the DI Studio metadata export API, parsed to recover the job dependency graph, and converted to Databricks Workflow task definitions. The resulting Workflows maintained the original scheduling cadence, dependency semantics, and error handling behaviors while adding Databricks-native monitoring, alerting, and retry policies. Unity Catalog was configured as the governance layer for all migrated datasets, with column-level access controls aligned to the client's existing data classification taxonomy — a requirement driven by FCC CPNI (Customer Proprietary Network Information) regulations and the client's privacy program under state-level consumer protection laws.
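A recovered job dependency graph maps naturally onto the depends_on field of Databricks Jobs API task definitions. A minimal sketch, assuming a simple job-to-upstream-jobs graph and eliding the task payloads (notebook paths, clusters, retry policies); the job names are illustrative:

```python
def to_workflow_tasks(dep_graph: dict) -> list:
    """Convert a dependency graph (job -> set of upstream jobs) into
    Databricks Jobs API 2.1-style task definitions. Sketch only:
    real task definitions also carry the compute and code references."""
    return [
        {
            "task_key": job,
            "depends_on": [{"task_key": up} for up in sorted(upstream)],
        }
        for job, upstream in dep_graph.items()
    ]

dep_graph = {
    "load_cdrs": set(),
    "rate_usage": {"load_cdrs"},
    "bill_cycle_rollup": {"rate_usage", "load_cdrs"},
}
tasks = to_workflow_tasks(dep_graph)
```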

SAS PROC Mapping Reference

SAS PROC | PySpark / Databricks Equivalent | Conversion Notes
--- | --- | ---
PROC SORT | DataFrame.orderBy() | NODUPKEY option mapped to .dropDuplicates() on BY keys
PROC SQL | spark.sql() / DataFrame API | SAS SQL extensions (CALCULATED, OUTOBS) resolved to standard SQL
PROC TRANSPOSE | DataFrame.groupBy().pivot() | ID, VAR, and BY variable mappings preserved exactly
PROC MEANS / SUMMARY | groupBy().agg(mean, std, min, max) | CLASS and VAR variable structure mapped to groupBy keys and agg columns
PROC FREQ | groupBy().count() + cross-tab pivot | TABLES statement cross-tabulation converted to pivot with count aggregation
PROC LOGISTIC | Spark MLlib LogisticRegression | Parameter options mapped; model outputs stored in Delta Lake
PROC REG | Spark MLlib LinearRegression | MODEL statement variables mapped to feature/label columns
PROC MIXED | Custom PySpark + statsmodels UDF | Mixed-model fitting requires Python statsmodels via a Pandas UDF
PROC REPORT | Databricks SQL + Delta table output | DEFINE variable options mapped to SQL aggregations and column aliases
PROC TABULATE | groupBy().pivot().agg() chain | CLASS hierarchy and TABLE statement structure fully preserved
PROC IMPORT / EXPORT | spark.read / DataFrame.write | DBMS= options mapped to Spark format readers (CSV natively; Excel via a third-party connector)
DATA Step (MERGE) | PySpark join chain with IN= tracking columns | Complex BY-variable merges require semantic analysis
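As one concrete example of the mappings above, a two-way PROC FREQ cross-tabulation can be modeled in plain Python; the generated PySpark is a groupBy/pivot/count chain, and the sample data here is invented for illustration:

```python
from collections import Counter

def proc_freq_crosstab(rows, row_var, col_var):
    """Model of PROC FREQ 'TABLES row_var * col_var': a two-way
    frequency table. The PySpark equivalent generated in practice is
    df.groupBy(row_var).pivot(col_var).count()."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    table = {}
    for (rv, cv), n in counts.items():
        table.setdefault(rv, {})[cv] = n
    return table

subs = [
    {"plan": "5G", "region": "East"},
    {"plan": "5G", "region": "East"},
    {"plan": "5G", "region": "West"},
    {"plan": "Fiber", "region": "East"},
]
print(proc_freq_crosstab(subs, "plan", "region"))
# {'5G': {'East': 2, 'West': 1}, 'Fiber': {'East': 1}}
```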

Security & Compliance

As a regulated telecommunications carrier, the client operates under FCC CPNI rules, state-level Consumer Privacy Acts (including California CPRA and New York SHIELD Act), PCI DSS for billing systems, and SOX financial reporting controls. The migration program was scoped to maintain compliance posture throughout the transition, with no degradation of existing controls permitted.

Results & Business Impact

The 18-month program concluded with all 4,200 SAS programs successfully migrated, validated, and operating in production on Databricks. The SAS Enterprise Guide environment was decommissioned three months ahead of the original target date, and the SAS DI Studio metadata server was shut down upon completion of the final job flow cutover. The program's timeline and methodology served as the blueprint for two subsequent migration programs covering acquired-company SAS estates.

- SAS programs migrated: 4,200
- Lines of PySpark generated: 2M+
- Average pipeline performance gain: 3–6X
- Projected savings over 3 years: $8.5M
- End-to-end migration duration: 18 months
- Fully automated conversion rate: 85%

The $8.5 million three-year savings projection is anchored by $5.8 million in eliminated SAS Institute licensing fees (Enterprise Guide, Base SAS, SAS/STAT, SAS/ETS, and DI Studio components), $1.7 million in reduced SAS Metadata Server and compute infrastructure operating costs, and $1.0 million in reduced SAS administration and developer support labor. These savings materially contribute to the organization's stated goal of reducing legacy analytics platform spend by 70% within five years to fund AI/ML capability buildout on Databricks.

"Our SAS estate was the result of 18 years of organic growth through acquisitions and analyst-driven development. We had programs that nobody fully understood, macro libraries that were treated as black boxes, and a licensing bill that went up every year regardless of what we negotiated. The AST-based approach gave us the first complete picture of what our SAS estate was doing, at a level of detail we'd never had before. The licensing savings in the first year substantially offset the program investment."

— Vice President of Data Engineering, Major US Telecommunications Provider

Ready to Modernize Your SAS Estate?

See how MigryX can accelerate your migration to Databricks with AST-driven automation. Full macro resolution. Comprehensive PROC coverage. Automated validation.

Explore Databricks Migration →