Cláudio Diniz Data Engineer

Tech Focus Tue 7th December, 2021

From software engineering to data engineering with… Scott Cutts

After a few conversations about what a data engineer is these days, I’ve found that there isn’t a shared understanding of the role. I also noticed the majority of the data engineers I spoke to were experienced software engineers.

Based on this, I decided to create a new blog post series that consists of interviews with our data engineers. I believe this will help to demystify data engineering and it might encourage more software engineers to become data engineers.

This week the data engineer I’m interviewing is Scott Cutts.

What is data engineering for you and how does it overlap with software engineering?

I think data and software engineering are two sides of the same coin, and that coin is the business domain. Whereas software is about direct user and system requests and using domain-driven design to service these needs, data engineering is about providing insights, models and data in the same domain-driven way, to service user and system data needs. The key separation is that data engineering is always after the fact, working on existing datasets that are used to create other datasets, insights or models. You also often don’t have complete control over how your users interact with your product, as they can query your datasets how they want, and will often use them in unexpected ways!

How did you get involved in data engineering?

It was an accident! I was on a fintech project as a software engineer, which required us to work closely with the client’s data scientists, provide real-time insights from external sources, and produce aggregated counts that fed into their predictive models. This was all Scala/REST/Kafka work so I thought I was doing software engineering – really this was already data engineering, a good example of the blurred line between the two.

A group of us were asked if we could help improve the ETLs the data scientists were using to train their models, and spike out Spark as an alternative to their existing Hadoop platform. We used Behaviour-Driven Development (BDD) and Test-Driven Development (TDD) to drive out our ETLs, just like software, and we got to work directly with data across the organisation. This was the best bit – data transformation feels like doing the best domain parts of software, but even richer and with more understanding of the business.
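To illustrate what driving an ETL step with tests can look like, here is a minimal sketch. It uses plain Python rather than the Scala and Spark stack mentioned above, and the function name `count_events_by_account` and the record fields are hypothetical, chosen only for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_events_by_account(events: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Transform step: count events per account, skipping records without an account id."""
    counts = Counter()
    for event in events:
        account = event.get("account_id")
        if account:  # drop malformed records rather than failing the whole batch
            counts[account] += 1
    return dict(counts)


def test_counts_events_per_account_and_skips_malformed_records() -> None:
    # Given a small, hand-written input dataset
    events = [
        {"account_id": "a1", "type": "payment"},
        {"account_id": "a2", "type": "payment"},
        {"account_id": "a1", "type": "refund"},
        {"type": "payment"},  # malformed: no account id
    ]
    # When the transform runs
    result = count_events_by_account(events)
    # Then the output has one aggregated count per account
    assert result == {"a1": 2, "a2": 1}


# Run the test directly; a test runner such as pytest would discover it by name.
test_counts_events_per_account_and_skips_malformed_records()
```

In a test-first workflow the test would be written before the transform, stating the expected output dataset for a known input, and the transform would then be filled in until the test passes.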

What are the skills a data engineer must have that a software engineer usually doesn’t have?

I’d say 95% of skills are transferable, but some that you would need to improve are:

  • DevOps and infrastructure as code. Most software engineers have experience in that, but with data you can expect more, both in the data platform infrastructure itself, plus sourcing/serving data from/to a variety of interfaces.
  • Multitool awareness and adaptability – there are a LOT of ways to solve data engineering problems out in the world. Not only are there different cloud providers with their own stacks, but every organisation is likely using them in very different ways to solve its problems with different architectures. Data engineering is still a while away from the kind of standardisation seen in software engineering tech stacks.
  • SQL – you will need to get better at it, even if it’s just for verifying your own data or doing analysis/debugging and not necessarily coding in it.
  • Optimisation – data is big, and will usually be slow and require several iterations of optimisation to work faster and cheaper.
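As a small illustration of the SQL point, here is a sketch of verifying a dataset with a couple of queries. It uses Python’s built-in `sqlite3` module with an in-memory database; the `payments` table, its columns and the `check_payments` helper are hypothetical, invented only for this example.

```python
import sqlite3


def check_payments(conn: sqlite3.Connection) -> dict:
    """Return simple data-quality metrics for the hypothetical payments table."""
    # Rows where a value we expect to be present is missing
    null_amounts = conn.execute(
        "SELECT COUNT(*) FROM payments WHERE amount IS NULL"
    ).fetchone()[0]
    # Ids that appear more than once, which would break a uniqueness assumption
    duplicate_ids = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT id FROM payments GROUP BY id HAVING COUNT(*) > 1"
        ")"
    ).fetchone()[0]
    return {"null_amounts": null_amounts, "duplicate_ids": duplicate_ids}


# Build a tiny in-memory dataset with one missing amount and one repeated id
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, account_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(1, "a1", 10.0), (2, "a2", None), (2, "a2", 5.0)],
)

report = check_payments(conn)
print(report)  # {'null_amounts': 1, 'duplicate_ids': 1}
```

Checks like these are the kind of everyday analysis and debugging queries meant above, even when the pipeline itself is written in another language.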

What data trends are you keeping an eye on?

The Data Mesh architecture is the big one at the moment, as it looks to spell the end of the data monoliths (warehouses & lakes), and a move towards the digital platform and domain-driven solutions that work so well in software. The tooling is developing fast to match this, with abstracted, TDD-compliant frameworks like dbt that let data teams focus on the solution, and less on the underlying infrastructure.

I find MLOps very interesting as well: providing an automated, dependable way to deploy models at the organisation level that are trusted by users is a great challenge. This part of the industry is moving really fast and best practices are only now being codified.

Do you have any recommendations for software engineers who want to be data engineers?

Most “data projects” have a mix of software and data engineering. Find one of those projects, join it as a software engineer and start pairing with the data engineers. Your software skills will be useful immediately, and you’ll soon learn the rest.