8 Red Flags To Watch Out For When Hiring Data Scientists

October 6, 2023 • Data Science

By Ken Swearengen

“Analytics is 50% math and 50% communication. If a person cannot express their ideas in written or presentation format, it doesn’t matter if they can do the math.”
Mia Umanos, CEO of Clickvoyant

A CV filled with impressive credentials can capture attention, but it’s the subtleties during an interview that reveal the most about a candidate. Unlike many other fields, data science requires a unique blend of technical expertise, business acumen, and interpersonal skills.

Every hiring manager knows the gravity of a wrong hire, especially in a domain as critical as data science.

A misfit can not only hinder project progress but can also disrupt team dynamics, making the interview process all the more crucial.

It’s not just about assessing the candidate’s knowledge of algorithms or programming languages, but also understanding their problem-solving approach, communication style, and adaptability.

In this landscape, knowing what to look for during interviews becomes paramount. Red flags can sometimes be subtle, easily masked by a candidate’s confidence or eloquence. However, a keen eye can spot these signs, which often hint at deeper underlying issues.

While no single interview technique guarantees a perfect hire, being aware of potential pitfalls can significantly enhance the hiring process’s effectiveness.

This post delves into eight red flags that job candidates might display during data science interviews, helping you make informed decisions and secure the best talent for your team.

1. They build models without business context

Many technical projects for data science interviews involve having the candidate work with real or simulated data to help solve an actual business problem that the hiring company may face.

This is a great way to see how a candidate would work on your team by seeing what actual insights they can drive, given some information about your business.

However, some candidates will ignore the business problem and focus instead on showing off their modeling skills, hoping to demonstrate what kind of insights they can deliver with the little bit of information you gave them.

The problem with this is precisely that they’re only working with a little bit of information.

Unless you’ve spent a few hours with them going over your business model, all the various pieces of data you collect, the nuances of the business, and all the relevant business contexts of the data, then their model is going to be useless at best or drive harmful business decisions at worst.

When candidates create predictive models without knowing the business, they display a lack of humility and an inclination to jump to conclusions based on possibly faulty assumptions.

This careless behavior can waste a lot of resources for your team and your business.

2. They show a lack of curiosity about stakeholders

Curiosity about stakeholders is a requirement for every data role – a data scientist who doesn’t understand internal stakeholders and customers will fail to produce valuable data insights.

The logic behind this is similar to the first red flag. Without learning about how the business operates and who the primary users are, the candidate is forced to rely upon assumptions about the context of the data within the company.

A four-panel comic of a man next to a whiteboard. In the first three panels, he looks confident. In the first, the whiteboard says "I'll create amazing dashboards for your stakeholders." The next says "They'll use advanced predictive modeling techniques." The third says "All without stakeholder input." In the last panel, the man looks unsure as he looks at the whiteboard, which again says "All without stakeholder input."

Without input from the people actually utilizing the data, this candidate would be working in a black box with zero feedback from others. That’s a recipe for disaster and will undoubtedly lead to useless data insights.

3. They seem unwilling to learn and grow

Data science is a healthy balance of programming, stakeholder communication, good judgment, and some applied statistics.

No matter how senior, the candidate should show a willingness to improve those skills.

You can usually gauge this in an interview by asking them what they’re currently learning about or about a lesson they recently learned based on a mistake they made. If they are unwilling to learn or can’t tell you a story about improving on their mistakes, that is a noticeable red flag.

4. They are unwilling to receive feedback

Often, a data scientist is a black box to stakeholders. “I don’t know how they do it, but they make these models that predict the future, and it’s basically magic to me, but it works” is a sentiment a data scientist has likely heard at least once.

Data scientists, then, have to accept that stakeholders will regularly ask them to explain their output and conclusions in an easy-to-understand way – this is especially true when they deliver insights that go against common business intuition.

They will need to be able to field these kinds of curious questions as well as handle constructive feedback from others. If they can’t handle these questions and this feedback – if they shut down or react with defensive anger – it shows that they are unwilling either to defend their ideas or to admit, with humility, that they might be wrong.

You also probably won’t see these candidates interested in working with other teams or seeking feedback about their work if they were to join your team. Be careful if you choose to hire them.

5. They are unable to communicate with non-technical stakeholders

Non-technical stakeholders will always be involved in data decisions in some way – whether as consumers of the data insights or as the people responsible for sharing the context behind a new data source.

The ability to break down very technical information into a format that won’t overload people is crucial.

Some stakeholders won’t have the knowledge base to understand (or care to understand) the statistical methods behind your conclusion.

Frequently, they’re busy enough that they just want to know what insights your candidates will be able to provide them to make their lives or the finances of the company better.

Data science candidates should be willing and able to break down complex information for teams outside their own – whether for accounts payable, sales, marketing, or any other department that needs to utilize the information.

If candidates can’t do that, it’s a massive red flag because it means they likely won’t be able to hold on to stakeholder trust for long. If stakeholders don’t trust the source of their new insights, they won’t be willing to act on them, and you have a big problem.

6. They cannot justify their technical decisions

Just like being unable to communicate with non-technical stakeholders, if a candidate can’t describe and reasonably defend their choices at a technical level, then they will not be able to hold on to technical stakeholder trust.

A data scientist should be able to describe steps taken to clean, transform, and operate over data at a reasonably technical level. Some examples candidates could use:

With the help of a developer or data engineer, they used SQL queries to clean up the data they wanted to use.
They noticed gaps in the labeled data for model training, so they filled in the median wherever a value was missing (fewer than roughly 15% of all rows). They explain that doing this mitigates bias in the final model predictions.
They rescaled the metric they want to predict to a range between 0 and 1 so that the prediction output is easier to interpret.
They coded all categorical columns into a sparse dataset of 0s and 1s to include non-numerical predictors in the model, some of which help raise prediction accuracy.
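
To make these examples concrete, here is a minimal sketch of three of the steps a candidate might walk you through: filling missing values with the median, rescaling the target to a 0-to-1 range, and encoding a categorical column as 0s and 1s. The rows, column names, and helper functions are all hypothetical and exist only for illustration; in a real interview the candidate would more likely use a dataframe library, but plain Python keeps the logic visible.

```python
from statistics import median

# Hypothetical toy rows; the column names are illustrative only.
rows = [
    {"plan": "basic", "monthly_spend": 20.0, "churn_score": 10.0},
    {"plan": "pro", "monthly_spend": None, "churn_score": 40.0},
    {"plan": "basic", "monthly_spend": 30.0, "churn_score": 25.0},
    {"plan": "enterprise", "monthly_spend": 100.0, "churn_score": 70.0},
]


def impute_median(rows, column):
    """Replace missing (None) values in `column` with the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def min_max_scale(rows, column):
    """Rescale `column` linearly so its minimum maps to 0 and its maximum to 1."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread, so map every value to 0.0 instead of dividing by zero.
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]


def one_hot(rows, column):
    """Replace a categorical `column` with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


prepared = one_hot(
    min_max_scale(impute_median(rows, "monthly_spend"), "churn_score"),
    "plan",
)

for row in prepared:
    print(row)
```

A good follow-up question is why the candidate chose each step, for example why the median rather than the mean, or what happens to the scaling when new data falls outside the original range.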

Fortunately, this red flag is pretty easy to detect in an interview – you set up your question in something like a Jupyter Notebook, hand it off to the candidate, and then have them walk you through the logic behind their models or algorithms. You can ask them to explain things that don’t make sense to you or that you would have done differently, and if they can’t explain it, you may want to move on to the next candidate.

7. They lack proficiency in SQL and don’t understand databases

This goes along with the previous point: anyone working in data should understand how to query that data and how it is collected and stored.

Panel comic. On the top a man is speaking to an audience and says "who wants to be a data scientist?"; everyone in the crowd has their hand raised. On the bottom the speaker now says "who wants to learn sql?", and no one in the crowd has their hand raised.

For junior candidates, you may want to include a few SQL questions in the coding portion of the interview. For more senior candidates, a few verbal questions about database design or query structure should suffice – they may feel insulted by a technical question that is too simple, and it wastes their time.
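
As one possible shape for a junior-level SQL question, the sketch below builds a tiny in-memory SQLite database and asks for the order count and total spend per customer, including customers with no orders. The schema, names, and data are hypothetical and not taken from any real interview; the point is only that a short question can still probe joins, aggregation, and NULL handling.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders (customer_id, amount) VALUES (1, 50.0), (1, 25.0), (2, 100.0);
    """
)

# Question: list every customer with their order count and total spend,
# highest spenders first. Customers without orders must still appear.
query = """
SELECT c.name,
       COUNT(o.id) AS order_count,
       COALESCE(SUM(o.amount), 0) AS total_spent
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total_spent DESC, c.name ASC;
"""

results = conn.execute(query).fetchall()
conn.close()

for name, order_count, total_spent in results:
    print(name, order_count, total_spent)
```

A candidate who reaches for an inner join here will silently drop the customer with no orders, which makes for a useful conversation about how the data is stored and what the business question actually asks.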

8. They have shiny object syndrome

Part of the appeal of getting into data science these days is all the new technologies and tools you work with.

That’s fine. In fact, that curiosity can be a boon to your team.

However, data scientist candidates should also be willing to show how they’ve done the tedious but essential grind work that often comes with algorithm and model development.

If they’re always worried about learning the newest technologies at the expense of doing necessary work (i.e., shiny object syndrome), you may want to pass on adding them to your team.

Conclusion

Detecting these red flags in an interview is easier than you think.

You can easily test a candidate’s basic programming skills and their ability to understand and communicate data by using a tool like Jupyter Notebooks in your interviews.

CoderPad has an integration that allows you to do just that – check out the pad below for an example question you can use in your own data science interviews.

Some parts of this blog post were written with the assistance of ChatGPT.