The Essential Guide to Interviewing Data Engineering Developers
Data is quickly becoming the most valuable resource in the world, and data engineering is the process of extracting that value from raw data. For this reason, a great team of data engineers can help a company make the most of its data and gain a huge competitive edge. On the other hand, failing to do so can be disastrous.
Hiring a great data engineer is difficult and competitive. Data engineering covers a very broad range of skills, and it can be hard to determine which ones you need. You’ll also be competing with other companies aggressively trying to hire the same candidates. But, with the right knowledge and careful planning, you’ll be able to identify and win the right candidates for your business. Let’s get into it.
- Who needs to hire top data engineers?
- Data science and engineering: extracting value from data
- The difficulties with finding data engineers for hire
- How this interviewing guide will help
- What separates expert data engineers from the rest?
- What separates expert data engineering teams from the rest?
- Interview questions and answers to identify top-level developers
- Interviewing teams
- Working with a team vs individual freelancer
Who Needs to Hire Top Data Engineers?
Any company that wants to build custom software to extract insights from their unique data needs someone to handle that process. Maybe you want insights into industry trends, customer buying patterns, how to cut marketing costs, inefficiencies in your business processes, weaknesses in your sales process, how to improve your products, and even ways to keep your organization safer.
Hiring teams of data engineers and mining data in this way used to be a luxury just for the biggest companies. Now, things have changed.
This guide is for:
- Startup founders and entrepreneurs looking for new ways to leverage data for a competitive edge
- Enterprise executives wanting insights from their data to help drive business decisions
- Project and product managers that want to streamline processes and improve products using data-driven techniques
Data Science and Engineering: Extracting Value from Data
Turning raw data into useful insights and knowledge is a relatively new field. It can be hard to know if you need a data engineer, data scientist, data analyst, statistician, deep learning expert, or a business analyst. To figure this out, you need to know what kind of data problem you have.
First, you have to know if you’re dealing with ‘small’ or ‘big’ data. Basically, if you can fit your data into a document or spreadsheet, it’s small data. If you can’t, and just storing the amount of data you have is a problem in itself, you have big data. Analyzing a small data set requires different techniques than analyzing a much larger one.
Next, you need to know what kind of data you have, structured or unstructured. These will also require different techniques.
- Small data – structured: Requires using business intelligence techniques to find patterns in spreadsheets
- Small data – unstructured: Requires techniques that find patterns in unstructured data such as text or video, for example Natural Language Processing
- Big data – structured: Requires sophisticated machine learning algorithms to get through all of the data
- Big data – unstructured: Requires large amounts of resources to run deep learning algorithms
Next, you need to know who will be consuming the insights you’ve generated – a human or a computer. Once your data engineers have analyzed your data, are they going to present the findings to you and other business decision makers, or are they going to feed the results to a computer as input into some other program?
The first will need skills in graphing, charting, and general ‘storytelling’, while the second will need razor-sharp programming and algorithmic knowledge.
The Difficulties with Finding Data Engineering Developers For Hire
It seems crazy, but data science really can deliver this much value to your business in this many ways. That’s why data engineers are in such high demand around the world and pull in top salaries, especially in the USA. If you want great data engineers to work for you, you’re going to have to compete with other companies for their skills.
The hype around the field of data science is also a problem. The job demands a combination of technical prowess and understanding of business strategy that few teams or individuals have. But, the prestige and big paychecks have attracted many less qualified developers to start applying for these jobs. If you don’t know what you’re looking for, you could end up with a dud.
To make matters even more difficult, any great candidate you do find will almost certainly be getting offers from other companies. If your interview process is too boring or takes too long, great candidates will just go somewhere else.
Traditional interview techniques aren’t designed for this kind of hiring environment, and hiring your usual way won’t work for finding a data engineer. Specialized methods are needed to successfully secure the services of expert data engineering developers.
How this Interviewing Guide Will Help
In this guide, we’re going to go through the key traits of a great data engineer, and what separates them from the rest. We’ll go through the interview process you need to identify the developers and teams that will get the most out of your data, and get them to work on your project.
The objectives are to:
- Weed out the unskilled as quickly as possible
- Uncover real breadth and depth of understanding – not just knowledge
- Avoid boring good developers with a tedious interview process. Rather, we want to attract talent by presenting your project as something great to work on
What Separates Expert Data Engineers from the Rest?
With any type of software development, it’s tempting to go with the developers that ask for the lowest rate. However, cheaper developers are cheap for a reason: they usually have less experience and fewer hard skills. Hiring them comes with all sorts of risks. They may:
- Have a limited skill set
- Take far too long to do the required tasks
- Fail to understand what you really mean
- Struggle to communicate effectively
- Try to cover up mistakes
- Make expensive mistakes that only reveal themselves later
- Struggle to solve problems unique to your business
- Not have the experience or insight to make helpful suggestions
- Be unable to work effectively in a team
- Be left stranded if the top developer in their team leaves
- Lack communication skills
- Take on too many projects at once due to lack of experience, and neglect your project as a result
With data engineering developers, this is even more important. It’s easy to tell if your web developers have built a great website or not, but it is much more difficult to tell if predictive models your data science team has put together are accurate and if you should use them to make business decisions. Experts won’t have these problems.
Great data engineering developers and teams are special. They have an intuition about how your business operates, what valuable information might be hidden within your data, how to uncover that information, and how to present it in a way that’s most helpful to you.
To find them, here are the key indicators you need to look for:
1. Math and statistics background
Math and statistics are at the core of data science. Strong math and statistical knowledge is an absolute must for any data developer.
Required skills include:
- Algebra and calculus
- Regression, linear regression
- Set theory
- Interval notation and algebra with inequalities
- Uses for summation and Sigma notation
- Exponents and logarithms
- Numerical analysis, Bayes' Law, and Central Limit Theorem
- Predictive modelling
These techniques are used to find patterns in data, and to extend those patterns to form predictive models that you can use. However, your data engineers will be doing a lot more than math. In fact, most of the real work your developers will end up doing will be writing software to store, clean, and analyze data, which brings us to the next point.
2. Coding, software, and hacking skills
Coding and software engineering skills are the bread and butter of data engineers. At the end of the day, the end product will be code and software that actually solves your problem. Data engineers also use a lot of software tools. Much of the heavy statistical lifting in data projects is done with libraries and tools. Top data engineering programmers need to know which ones to use and how to use them.
The key software skills needed are:
- R, MATLAB, SAS – Programming languages and software environments for statistical analysis, data visualization, and predictive modeling
- Python, PHP, C++, Java, Perl
- Amazon Web Services, Salesforce, Heroku
- Data modeling tools (ERWin, Enterprise Architect and Visio)
- SQL (PostgreSQL and MySQL)
- NoSQL technologies (Cassandra and MongoDB)
- Hadoop – An open-source framework for distributed computing
- Hadoop-based technologies (MapReduce, Hive, and Pig)
- Tableau or other data exploration tools
- Building and using APIs
- NLP and text analysis
- Machine learning techniques
It’s a massive list, and each one requires years of experience to master. That’s why data scientists and engineers almost always work in teams to complete a project. The breadth and depth of required knowledge is just too much for an individual.
Hacking skills here refers to having the resourcefulness to find a way to get the job done. Data engineers frequently come across problems that have no simple solution. The best ones can roll up their sleeves and find a way to get the job done, even if the method is a little unconventional.
3. Business knowledge
This is what really separates data engineers from less qualified database programmers. Your data engineers will be building the pipelines, storage facilities, and analytical engines for your data. They will need to quickly get a feel for how your business operates, and build a data architecture that matches your business needs.
What Separates Expert Data Engineering Teams from the Rest
Data engineering is best done in teams. There are simply too many different skills for an individual to do well.
The best teams will have multiple specialists working together to get things done. Each of the individual members should have the necessary developer qualities we discussed above. However, teams of developers need to have some special qualities, too. Here are some of the things to look out for:
1. They work seamlessly as a unit
Just as with a great sports team, functioning as a coherent unit and following a strong game plan is much more important than having a team of superstars. Everyone needs to know their place in the team. Strong leadership and team structure will ensure that top quality work is delivered consistently. Good teams may have one or two junior developers, but their limitations are known and the team has mechanisms in place to ensure code quality.
The developers will also hopefully have complementary skill sets and know each other's strengths and weaknesses. These things together will ensure that code is resilient, reliable, extensible, and easy to understand.
2. They have great internal and external communication
Good internal communication means that developers talk to each other, and always know who is working on what. External communication means talking to you and your team. You should always be in the loop. This includes things like:
- Who’s working on what
- Who’s responsible
- What’s been completed
- Current progress and deadlines
- Budget management
The best data engineering teams never make themselves indispensable. With each new completed feature or project, they should show you how it works, and how to use things without them.
3. They have a positive team culture
Big egos can really make a team of otherwise competent developers crash and burn. Blaming each other for mistakes, passing the buck, competitiveness, and arguing are a waste of everyone’s time and often lead to failed projects.
A team’s culture should be based on honest communication, helping each other out, and working collaboratively towards success.
Interview Questions and Answers to Identify Top-Level Developers
Your interview process should be done in stages. You don’t have time to spend hours interviewing every candidate and should try to weed out unqualified or unsuitable candidates as quickly as possible. A good process for interviewing data engineers will look something like this:
- Basic screening stage – Finding the candidates who meet the minimum experience and qualification requirements
- Phone screening interview – More in-depth test of technical skills including programming
- Final interview – In-house or video call interview to dig deeper into how the candidates might approach your unique project
Step 1: Basic screening stage
The first stage is about removing unqualified candidates quickly so you don’t waste your time. The questions should be to do with logistics such as minimum experience, location, rates, and language requirements.
- How well do you speak English (or whatever language you need)? 1 – 10
- How well do you write in English? 1 – 10
- How many years of experience do you have with data engineering?
- How many data engineering projects have you worked on?
- Are you familiar with (specific technology, e.g. AWS, Salesforce)?
- Where are you based?
Questions for Teams:
- Who will I be communicating with mainly?
- What project management tool do you use? What access will I have?
- Do you use agile?
- How well does your contact person speak English?
- How well do your developers chat in English?
- How many people on the team have more than 3 years of data science experience?
- How many people on the team have more than 3 years of database management experience?
Each question should have a clear pass or fail answer. For the first few questions, you need to decide what counts as a fail, for example a minimum English proficiency or a minimum number of years of experience.
The questions shouldn’t take good developers much time, so don’t make the quiz too long. You don’t want to bore your best candidates. Anyone who fails a compulsory question or fails to reach an overall minimum score is removed from your shortlist.
If you’re doing a lot of interviewing you can also try to automate this stage with an automatically marked online form.
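As a rough sketch of what automatic marking could look like, the Python snippet below scores a candidate's form answers against minimum requirements. The field names, thresholds, and overall pass mark are all hypothetical placeholders, not recommended values; you would substitute your own.

```python
# Hypothetical screening form: field names and thresholds are illustrative assumptions.
MINIMUMS = {
    "spoken_english": 7,      # self-rated, 1 to 10
    "written_english": 7,     # self-rated, 1 to 10
    "years_experience": 3,
    "projects_completed": 2,
}
MIN_TOTAL_SCORE = 20  # assumed overall pass mark


def screen(answers: dict) -> bool:
    """Return True if the candidate meets every compulsory minimum and the overall score."""
    for field, minimum in MINIMUMS.items():
        if answers.get(field, 0) < minimum:
            return False  # failed a compulsory question
    total = sum(answers.get(field, 0) for field in MINIMUMS)
    return total >= MIN_TOTAL_SCORE


# Made-up candidates, used only to show the function in action.
candidates = {
    "A": {"spoken_english": 9, "written_english": 8,
          "years_experience": 5, "projects_completed": 6},
    "B": {"spoken_english": 6, "written_english": 9,
          "years_experience": 10, "projects_completed": 12},
}
shortlist = [name for name, answers in candidates.items() if screen(answers)]
print(shortlist)
```

Candidate B is dropped despite strong experience because one compulsory minimum is missed, which is the behavior described above. Adding the raw answers together is a deliberately crude overall score; a real form would probably weight the questions.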
Step 2: Phone Screening Interview / Take-home Test
This is where you are going to find out which of the candidates left on your list are actually talented data engineers and hopefully have some fun too. That last part is important. A quick search on Glassdoor shows that data engineering candidates loved the interview processes at companies like Facebook because the process was fun and inclusive, rather than harsh and difficult.
This stage of the interview can be done over the phone, ideally with some screen sharing technology so you can see the candidate solve some problems and do some live coding. Or, if you don’t want to do it this way, you can give the candidates a take-home test to complete.
When interviewing, you want to start off with the most straightforward questions. You can check the answers quickly and waste less time on unqualified candidates. With data engineers, it’s best to start off with some coding questions.
Example SQL Questions:
Question: Write an SQL query to get the second highest salary among all Employees, given the table:
ID, Salary
10, 6000
11, 5000
12, 8000
SELECT MAX(Salary)
FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee);
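If you want to check a candidate's answer yourself, one option is to run it against the sample rows in an in-memory SQLite database using Python's built-in sqlite3 module. This is a minimal sketch that assumes the table is called Employee and that both columns are integers.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(10, 6000), (11, 5000), (12, 8000)],
)

query = """
SELECT MAX(Salary)
FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)
"""
second_highest = conn.execute(query).fetchone()[0]
print(second_highest)  # 6000 for the sample rows
```

A useful follow-up is to ask what the query returns when the table has only one distinct salary (it returns NULL), since that shows whether the candidate thinks about edge cases.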
Question: Write an SQL query to find employees that have the same name and email from the table:
ID, NAME, EMAIL
10, John, jfitz
20, George, gsmith
30, James, jsmith
SELECT name, email, COUNT(*)
FROM Employee
GROUP BY name, email
HAVING COUNT(*) > 1;
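The sample table contains no duplicates, so the query returns nothing when run against it as given. The sketch below, again using Python's sqlite3 module, adds one extra row (ID 40, an assumption made purely for illustration) so that the grouping has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER PRIMARY KEY, NAME TEXT, EMAIL TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        (10, "John", "jfitz"),
        (20, "George", "gsmith"),
        (30, "James", "jsmith"),
        (40, "John", "jfitz"),  # extra row, not in the original sample
    ],
)

query = """
SELECT name, email, COUNT(*)
FROM Employee
GROUP BY name, email
HAVING COUNT(*) > 1
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)  # [('John', 'jfitz', 2)]
```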
Question: Find the max salary and department name from each department from these two tables:
Employee:
ID, Salary, DeptID
10, 1000, 2
20, 5000, 3
30, 3000, 2
Department:
ID, DeptName
1, Marketing
2, IT
3, Finance
First, the candidate should clarify if there can be a department without any employees. The answer is yes, so the query should look like this:
SELECT d.DeptName, MAX(e.Salary)
FROM Department d
LEFT OUTER JOIN Employee e ON e.DeptID = d.ID
GROUP BY d.DeptName;
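The join can be checked the same way with Python's sqlite3 module. This sketch assumes the tables are named Employee and Department, gives Employee the alias e that the ON clause refers to, and adds an ORDER BY (not part of the answer itself) so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (ID INTEGER PRIMARY KEY, DeptName TEXT)")
conn.execute(
    "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Salary INTEGER, DeptID INTEGER)"
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(1, "Marketing"), (2, "IT"), (3, "Finance")],
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(10, 1000, 2), (20, 5000, 3), (30, 3000, 2)],
)

query = """
SELECT d.DeptName, MAX(e.Salary)
FROM Department d
LEFT OUTER JOIN Employee e ON e.DeptID = d.ID
GROUP BY d.DeptName
ORDER BY d.DeptName
"""
results = conn.execute(query).fetchall()
for dept_name, max_salary in results:
    print(dept_name, max_salary)
```

Marketing has no employees in the sample data, so the LEFT OUTER JOIN keeps it in the result with a NULL salary (None in Python). An inner join would silently drop that department, which is why the clarifying question matters.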
Example Coding Questions
FizzBuzz questions are great for coding interviews. They are clear, don’t require domain knowledge, require problem-solving without being a ‘trick’ question, and there are plenty of resources online for you to compare answers with. Here’s an example:
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”
E.g. 1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, 16, …
The reason FizzBuzz questions are great is that they don’t have one perfect answer. Rather, they have lots of possible approaches, and how your candidates solve the problem will reveal things about the style of coder they are. Do they just jump in head first with the simplest solution? Or, do they plan for the future and take care to make an efficient solution?
# Solution 1
for num in range(1, 101):
    string = ""
    if num % 3 == 0:
        string = string + "Fizz"
    if num % 5 == 0:
        string = string + "Buzz"
    if num % 5 != 0 and num % 3 != 0:
        string = string + str(num)
    print(string)
# Solution 2
for num in range(1, 101):
    if num % 3 == 0 and num % 5 == 0:
        print('FizzBuzz')
    elif num % 3 == 0:
        print('Fizz')
    elif num % 5 == 0:
        print('Buzz')
    else:
        print(num)
Solutions to FizzBuzz questions can get very intricate. Search online to find more questions and answers.
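As one illustration of the ‘plan for the future’ style mentioned above, a candidate might move the divisors and words into a table of rules, so that adding a new rule (say, 7 and “Bazz”) needs no new branches. This is only one possible approach; the function name and the rules parameter are made up for this sketch.

```python
def fizzbuzz(n: int, rules=((3, "Fizz"), (5, "Buzz"))) -> str:
    """Return the FizzBuzz word for n, or n as a string if no rule matches."""
    # Join the labels of every rule whose divisor divides n, in rule order.
    word = "".join(label for divisor, label in rules if n % divisor == 0)
    return word or str(n)


output = [fizzbuzz(n) for n in range(1, 101)]
print(", ".join(output[:16]))
```

The trade-off is worth discussing with the candidate: the rule table is easier to extend, but the plain if/elif version is arguably easier to read for a problem this small.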
Step 3: Final Interview
You’ve cut down your list to the best two or three candidates. All of them meet your minimum requirements, and all of them have the technical ability to get your project done. Now it’s time to find out who is the best.
This part of the interview should ideally be done in-house or over video chat. The questions will be extremely open-ended and the interaction should be very conversational. Remember, keep things relaxed, inclusive, and fun.
You don’t want to ask difficult, generic, college-style toy problems. These are a waste of time and often won’t help you distinguish between experienced developers and those who have just finished studying and have the information fresh in their minds. The best way to tell them apart is with open-ended questions about complex problems – preferably ones related to your specific project.
Give a complex scenario and ask them how they would best approach it. The less time you give them to prepare answers, the better. Use a variety of questions, including ones on the development processes, timeframes, and technologies they’ll need for the project.
- We have a search function in our product. Describe how you might implement a string segmentation function
- Why would MapReduce be a useful tool in this project?
- Our data is quite unstructured. How might you process it to make it more manageable?
- How long would it take you to get an MVP for this project? Give me a breakdown of the components
- Describe the technologies you would use to complete this project. What are the trade-offs involved in using them?
- What value do you think my customers will get out of this project?
There are no definitive answers to these questions. The idea is to expose how these developers approach problems and the depth of their knowledge. When you get answers like this from multiple developers, the top-level developers identify themselves quickly. Their answers will make the most sense.
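To help you follow the conversation on the string segmentation question, here is one approach a candidate might outline: splitting a run of characters into known words using dynamic programming. It is a sketch for orientation only, not a model answer; the function name and the tiny dictionary are assumptions made for the example.

```python
from typing import List, Optional, Set


def segment(text: str, dictionary: Set[str]) -> Optional[List[str]]:
    """Split text into dictionary words; return None if no split exists."""
    # best[i] holds one valid segmentation of text[:i], or None if there is none.
    best: List[Optional[List[str]]] = [None] * (len(text) + 1)
    best[0] = []  # the empty prefix is trivially segmented
    for end in range(1, len(text) + 1):
        for start in range(end):
            prefix = best[start]
            word = text[start:end]
            if prefix is not None and word in dictionary:
                best[end] = prefix + [word]
                break  # one valid segmentation per prefix is enough here
    return best[len(text)]


words = {"data", "engineer", "engineering", "team", "teams"}
print(segment("dataengineeringteam", words))  # ['data', 'engineering', 'team']
print(segment("dataxyz", words))              # None
```

Strong candidates will often go further and talk about ambiguous splits, ranking alternatives by word frequency, or how the dictionary would be built from your product's search logs.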
Tips on interviewing:
- No Gotchas. Don’t use interview questions that require a specific insight to solve and rely on the candidate to “get it”
- Don’t make the candidates code on a whiteboard or on paper, let them use a computer
- You can give partial credits for problems by helping them in the right direction
- Don’t test niche skills that can’t be learned online
- Make the phone interviews difficult enough. If someone makes it past this stage, there should be a good chance that you’ll hire them
The process will be slightly different for teams. Once a team has passed your minimum requirements test and technical skills, you want to communicate with them in a way that mimics how you will work with them on a project.
For example, having the team on a group chat or call for the interview will give you an insight into how they operate. If you ask a question about a specific topic, the team member that has the best knowledge in that area should lead the response - with other team members chipping in where necessary.
You are looking for the qualities we talked about in the dev teams section – working as a unit, good communication with you and each other, and a positive team culture.
Working with a Team vs an Individual Freelancer
Working with an individual developer isn’t advisable for data engineering projects. It can be risky, especially for larger or enterprise projects, because you are putting the fate of your entire project in the hands of one person. A team is more likely to have the depth of knowledge to make your project successful.
A data engineering team could have statistical modeling, coding, AWS, Salesforce, and business domain specialists – making it much more likely your project will deliver a great return on investment.
Another major aspect is development speed. Just because you hire a team, it doesn’t necessarily mean you are hiring them full time. A team of three developers can do the work of one in a third of the time or less. That means they bill you for the same number of hours, but things get done three times as fast. This is particularly important when scaling. A team can prepare to spend more time on your project at key times. A single developer can become a bottleneck.
Summing it All Up
Data engineers are a new type of developer that can launch a business way ahead of the competition. However, companies are figuring this out fast, and finding and hiring data engineering developers is no easy task.
With the knowledge and interview structure discussed above, you’ll be able to hire data engineering developers for your project quickly and with total confidence.