I don’t have a graduate degree (Masters or PhD) and honestly will likely never get one. Sometimes this can feel like a limiting factor in a technical data scientist role. However, there are ways to fill in some of those gaps to limit this imposter syndrome. Really the main thing is to prevent learned helplessness around a subject where it all seems too complicated (i.e., “Avoidance is the root of anxiety”).
No one has complete knowledge of any topic, BUT some folks get anxious about breaking down a dauntingly complex problem and going deeper – THAT is a bigger issue than lacking confidence in every subject. The goal is not to have a computer science degree, but to be able to think like a computer scientist.
I like to think of the four common data science “academic” gaps I’m going to go into as similar to experience in several adjacent roles:
- Product Manager – MBA: Business acumen for executive alignment, business presentations/soft skills, organization structure, and strategic decision making.
- Machine Learning Engineer – Statistics Masters/PhD: Fundamentals around data, uncertainty, and modeling.
- Data Engineer – CS Masters: Computer science for coding, algorithms, distributed computing, and pipeline infrastructure.
- Economist – Economics Masters/PhD: Causal inference for experimentation and econometrics causality methods (e.g., where experiments are not feasible).
With these four superpowers I believe I’d be able to add value in almost any data science arena, and, conversely, without knowledge in these subjects something will come up where I feel totally lost. Let’s dig in:
Masters of Business Administration (MBA) or Product Manager (PM) Experience
A mentor of mine said a PM’s role, in a word, is “influence.” That is, power to not only make decisions, but make sure they are carried out correctly and the team is clear on the direction the ship is heading. This involves strategic thinking, communicating effectively, and a deep understanding of different people, roles, and incentives. Here is a great resource of books I’ve read that scratched my lack of MBA itch (along with working with PMs making major product decisions for a few years).
Masters of Statistics (MS) or Machine Learning Engineer/Modeler (MLE/M) Experience
Here I am thinking specifically of the statistical components of machine learning (ML) and understanding statistical modeling. There are a lot of resources online (StatQuest, etc.) and I highly recommend ChatGPT for reviewing any concept in depth (e.g., asking ChatGPT “Give me the code for a basic example”). This is often the core technical work of what people consider to be “data science” – that of creating production machine learning models.
Masters of Computer Science or Data Engineering (DE)
There’s no real replacement for experience with coding (SQL, Python, R, etc.). My favorite way to learn these things is to go through exercises (e.g., pgexercises.com for SQL) and technical screens (e.g., LeetCode) – although the best confidence builder is interview experience (similar to studying vs. taking a test in school). Once the coding/algorithms start to click, the complexity of engineering work and reading code starts to make a bit more sense (e.g., refactoring functions, debugging, git, automation, etc.). My go-to wisdom is to try to break down bigger problems into smaller problems of inputs, outputs, and efficiency (i.e., “First make it work, then make it work better”).
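To make that last point concrete, here is a minimal Python sketch of the “make it work, then make it work better” habit. The problem (does any pair of numbers add up to a target?) and the function names are hypothetical, chosen only for illustration; they are the kind of thing that shows up in technical screens, not anything taken from a specific one.

```python
def has_pair_with_sum_slow(values, target):
    """First make it work: compare every pair of values (roughly n^2 checks)."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] + values[j] == target:
                return True
    return False


def has_pair_with_sum_fast(values, target):
    """Then make it work better: one pass, remembering values already seen."""
    seen = set()
    for value in values:
        # If the complement was seen earlier, the two values sum to target.
        if target - value in seen:
            return True
        seen.add(value)
    return False
```

Both functions take the same inputs and return the same output; only the efficiency changes. Having the slow version around is also handy as a check that the faster one still gives the same answers.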
Economics
I recently added this one to my list mainly because of econometrics (often used interchangeably with “causal inference”). Now, the gold standard of causal inference is experimentation (A/B testing), but there are a lot of alternative methods (instrumental variables, regression discontinuity, etc.) that add a lot of value in determining causality. This has a LOT of overlap with a statistics degree, often with some slightly different terminology that takes a little getting used to (e.g., thinking of linear regression as “ML”).
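For a feel of what the experimentation side looks like in code, here is a minimal sketch of one common way to read an A/B test: a two-proportion z-test on conversion counts, using only Python’s standard library. The function name and the idea of comparing conversion rates are assumptions made for illustration, and a real analysis would also need to think about sample size, randomization, and multiple comparisons.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and users in the control group (A).
    conv_b / n_b: conversions and users in the treatment group (B).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF written with the error function.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value
```

Calling `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% control rate with a 15% treatment rate; a small p-value suggests the difference is unlikely to be chance alone, which is the piece that randomization lets you read causally.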
Recap
So – if you could have the ultimate data science background, that might include a PhD in statistics and economics, a computer science masters (or PhD in AI), and an MBA while also founding a successful startup. The thing is, no one has a perfect background. And even if they did, they would still need to apply the right solution/skill to the right problem, which can sometimes be a gamble. They would also have to work with a team, where others might have more specific domain specialties.
Ultimately what matters is the ability to learn and adapt; to maintain neuroplasticity and flexibility; to collaborate and work with others on the problems at hand, going in-depth if need be. Even if you had all that knowledge, things change over time, as new technologies and methodologies come out. Data science requires collaboration and soft skills (i.e., “If you want to go fast, go alone. If you want to go far, go together”). Granted, knowledge of the fundamentals makes adoption easier – in the same way that learning a second language is easier than learning a first, and learning a third is easier than learning a second, etc. Having a strong academic background can help you stay sharp in a particular area – but there are always things we can learn. The important thing is to lean into the discomfort of not understanding why/how something happens and to follow that curiosity.
Daeus Jorento ’13 is a product data scientist at Square (Block, Inc.) living in Oakland, Calif. He has worked in data analytics for 11 years (two at the Federal Reserve in Washington, D.C., two at SoFi pre-IPO, and seven on three different product teams at Square). He writes about career progression in his blog – where the original, more in-depth version of this article resides – and sometimes on LinkedIn.