Increasingly over the past few years, we have heard the industry talking about the roles of data engineers and data scientists. We have seen analogies such as data scientists being the “racing car drivers” and data engineers the “racing car mechanics”, or data scientists being the “rock stars” and data engineers “the band”. The best one has to be “data scientists are Chris Martin and data engineers are the others”. But are these roles anything new?
For this article, we are going to set aside the role of the data scientist. However, if you are a data scientist and you find yourself cleansing data, you should read our post, “Are you a data scientist or a data janitor?”
The History of a Data Engineer
First of all, this role has been around for a long time, and people operating in it have gone by many names over the years: software engineers, data integration specialists, ETL developers, data developers, big data developers, dashboard developers, and the list could go on for some time. Anyone who has worked in IT for many years will at some point have gone by one or more of these titles and is a data engineer by every right. The concepts of bringing data together, cleansing it and making it fit for purpose go back to the 1970s and the birth of relational databases.
What Disciplines Fall Within a Data Engineer’s Role?
A data engineer’s responsibilities can encompass any, or indeed all, of the following disciplines:
Metadata Management / Data Lineage – there are three main areas of metadata management: business, process and technical metadata. Ownership of these three is spread across the organisation. A data engineer would chiefly be concerned with technical metadata and potentially process metadata, ensuring that tables, views and source files have been profiled and that ETL jobs, pipelines, etc. have appropriate and transparent data lineage. An engineer could also begin logging process metadata, such as how long jobs run for, how many rows were loaded and, if you are still on-prem, whether jobs have started to slow down due to resource limitations.
Data Quality – every organisation needs a data quality strategy. If you don’t have one, it needs to be higher on your priority list than hiring that data scientist. A data engineer can take a lead role in executing a data quality strategy by implementing quality controls at numerous steps in the data cycle, whether in batch processes, on storage in a data warehouse or by guiding business users as they wrangle their data in a self-service environment. Data engineers have the right balance of technical know-how and data best practice to guide upstream systems on how to obtain good data quality at source.
Data Integration / ETL / ELT – this is the bread and butter of every business that uses data, and it is most likely where data engineers spend most of their time. As discussed in the disciplines above, building these pipelines and batch jobs is not as simple as picking up data and popping it somewhere else or, as some see it, techies throwing some code together. Data can be integrated or transformed when and if required, but this also needs to be a controlled and governed process. In these days of GDPR compliance, transparency of data is pivotal all the way from extraction at source to its appearance in your report at the end. It’s your data engineers who are driving this, and this is where they can provide you with the most value and insight.
Data Preparation – data wrangling, data munging or whatever synonym you prefer, the concept is the same: preparing data for consumption by the business or by data scientists. In a self-service environment, data preparation can be done by business users, who use GUI-based data preparation tools to shape, discover and analyse the data. They have the advantage of understanding the data in a business context, although this comes with its own risks. Data engineers can take on this task by supplying cleansed and consistent data once the business discoveries have been identified.
Data Presentation (BI/MI) – probably the oddest discipline that the industry classes as a data engineer’s responsibility. Where data science involves projecting financials, future trends, risks and opportunities, BI and MI reports and dashboards help us understand what has already happened in our business. This enables us to meet our compliance and regulatory commitments, as well as adjust our day-to-day business decisions based on what happened yesterday. Previously, we might have classed this discipline as the work of data analysts, as they have the business context behind the data and understand how to present it so it can be used to its fullest. However, many of these analysts now classify themselves as data engineers with a data presentation and insight focus. Therefore, not only do you need to identify whether you have a data engineer, you also need to recognise the type of data engineer you have. To hear about some of our data presentation tool assessments, you can read our case study.
Cloud Data Engineering Solutions – you can’t have a blog post without mentioning cloud data engineering, mainly because it’s fundamentally important in the modern-day data landscape. It’s important to note that the disciplines previously mentioned are not only technology agnostic but also agnostic to on-prem or cloud. You can implement them in either. In a world that’s moving to cloud, it’s beneficial for data engineers to familiarise themselves with the services available and the benefits cloud brings. The good news is that a lot of the skills are transferable, so they do not need to retrain completely to apply their existing discipline knowledge in a cloud environment.
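To make a couple of these disciplines a little more concrete, here is a minimal Python sketch of a batch load that applies a quality control to each row and records process metadata (run duration, rows loaded, rows rejected) for the run. Everything in it is an assumption for illustration only: the names `JobRun`, `is_valid` and `run_job`, the `customer_id` and `amount` fields, and the validation rule itself are hypothetical and not taken from any particular tool or platform.

```python
import time
from dataclasses import dataclass


@dataclass
class JobRun:
    """Process metadata captured for one run of a job (hypothetical structure)."""
    job_name: str
    rows_read: int
    rows_loaded: int
    rows_rejected: int
    duration_seconds: float


def is_valid(row: dict) -> bool:
    """Illustrative quality rule: customer_id is present and amount is a non-negative number."""
    amount = row.get("amount")
    return bool(row.get("customer_id")) and isinstance(amount, (int, float)) and amount >= 0


def run_job(job_name: str, source_rows, load) -> JobRun:
    """Load valid rows via `load`, count rejects, and time the whole run."""
    started = time.perf_counter()
    loaded = rejected = 0
    for row in source_rows:
        if is_valid(row):
            load(row)
            loaded += 1
        else:
            rejected += 1
    duration = time.perf_counter() - started
    return JobRun(job_name, loaded + rejected, loaded, rejected, duration)


# Example run: an in-memory list stands in for the target table.
target = []
rows = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "", "amount": 35.5},      # rejected: missing customer_id
    {"customer_id": "C003", "amount": 10},
]
run = run_job("daily_sales_load", rows, target.append)
print(run)
```

Storing a record like this for every run is what lets you compare durations and row counts over time and spot the jobs that are starting to slow down, while the rejected count gives the business a simple, visible measure of data quality at that step.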
So, do you already have a Data Engineer?
We at Dufrain see the data engineer role as nothing more than a new label for someone who implements time-tested data management principles, whether that’s creating new jobs to extract and load data, implementing new data quality checkpoints, wrangling (or grappling) with data for reporting, or actually producing the end dashboards and reports. These disciplines are important, and no single role or person can implement all of them. A data engineering team needs to be a collaborative group of people who bring their knowledge and experience in each discipline to achieve a complete, high-quality data landscape. That is the best way to drive your business towards being data scientist ready.
If someone came to mind when reading through any of the points above, then yes, you already have a data engineer. You do not need to search high and low for someone with data engineering skills or find “Data Engineer” referenced on LinkedIn; you already have them in your organisation. Recognise the type of data engineer you have, invest in them and give them time to train and upskill. Ultimately, your business is going to benefit from it.
