Inheriting a Data Disaster: Building an Enterprise Data Platform (Part 1)

I’ve recently taken the opportunity at my organization to move from Lead DBA to Data Architect / Data Platform Owner. It’s sort of convoluted, but in essence we had people leaving — the Lead DBA who was replacing me already, the Data Warehouse Lead, and the Data Governance Officer.

I knew the Data Warehouse Lead was leaving and had volunteered to take that role. Yes, I knew what I was getting into before saying anything. Our “data warehouse” was anything but. At best, I’d call it a data repository.

There weren’t any fact tables, no dimension tables, no declared grain, no star schemas, and no semantic or cube layer. There were no slowly changing dimensions, no conformed dimensions, and no real modeling strategy at all. What we had instead was a collection of source system tables pulled in wholesale, with views thrown on top.

Not a terrible proof-of-concept idea or starting point — but this was legitimately where reporting and operational decisions were being made.

We had one view in particular with over 600 columns, across more than 30 joins, and zero indexes. Some tables didn’t even have primary keys or unique constraints.

Every day, as the Lead DBA, I was getting complaints about the performance of the data warehouse. Why was the Lead DBA getting these complaints? Because the “data warehouse” was sitting directly in my production OLTP cluster. The ADF pipelines failed constantly. There was no source control, no change management, and — in true terrible fashion — while most developers followed a dev/test/prod promotion process, our actual data warehouse developer made changes directly in production using SSMS. No dev. No test.

Yeah — it wasn’t even a promotion to take this on. 😀
Just me saying, I can fix that.

So let’s start where I started: identifying the pain points while stabilizing and standardizing practices. This series will probably hit one or two items per post — usually one technical pain point and one business pain point. Identifying these helped me architect a plan, a platform, and ultimately change how the organization thought about data.

Pain Point #1 – Wild West Code Change & Deployment Practices (Internal Pain)

The first thing I did was convince a super talented .NET developer to become my Senior Data Engineer. Before that officially happened, he helped me introduce and implement GitLab, code reviews, real use of Dev and Test, and — most importantly — enforce standard practices.

Why GitLab? Because the developers were already using it. I’m very anti–multiple systems, and we already had a contract, licensing, institutional knowledge, and the platform was in place. Zero friction to get started.

We set up issues as lightweight Standard Operating Procedures for any repeatable task. We established a code review and promotion process.

The organization still hadn’t automated CI/CD, so that quickly became my next coaching-up theme. I started pushing hard that we needed to spend time and energy getting deployment automation in place. We’ll get to that later.

Paint Point #2 – “Data Warehouse” performance (Business Pain)

I didn’t even really need to ask about this — but I did conduct interviews with operations, data science, report developers, and analysts. There were two problems here, and one very obvious reason we were in this situation.

No one had actually designed a data warehouse. As I mentioned earlier, what existed was a repository: tables and views. There were no fact tables, no dimensions, no defined grain, no handling of slowly changing data, and no semantic layer. There were barely indexes — we were literally reporting off heaps.

My background is heavily DBA-focused, with some practical exposure to data warehousing from an unsuccessful consulting bid earlier in my career. Because of that, I always assume the system I’m inheriting was built by someone doing the best they could with the time, money, and knowledge they had at the time. There’s no reason to blame or shame — it is what it is, and hopefully we can make it better.

Given the tooling and timeframe, it was reasonable to assume a traditional dimensional warehouse was the intended outcome. But what we had wasn’t functioning as one.

I knew early on that this “data warehouse” — and yes, I’ll probably keep putting that in quotes — was going to need to become an actual data warehouse. My first plan was to request a new server, stand up SQL Server Analysis Services, and start building proper dimensional models. Let’s call that Plan A.

Potential Plot Twist – “Snapshot Repository”

While learning more about the existing warehouse from the retiring DW Lead, I learned we also had something called a Snapshot Repository.

At some point, someone had implemented a pattern that looked roughly like:

SELECT * FROM Y INTO snapshot_repository.dbo.Y_<date>

This happened every pay period, for multiple tables, because someone wanted to see what the data looked like at points in the past.

That immediately triggered something in the back of my brain. I couldn’t quite place it yet — but it felt important.

Wrapping this post

Ok, that’s it for today. In the next post we’ll talk about 2 more pain points, and another plot twist. maybe?

Inheriting a Data Disaster: Building an Enterprise Data Platform (Part 1)

Pain Point #1 – Wild West Code Change & Deployment Practices (Internal Pain)

Paint Point #2 – “Data Warehouse” performance (Business Pain)

Potential Plot Twist – “Snapshot Repository”

Wrapping this post

About The Author

Ryan

Leave a reply Cancel reply