How To Break Into Data Science

We’re revisiting the How to Break Into X theme from our earlier edition on How to Break into Product Management to clarify another nebulous field with buzzword sex appeal but a lot of misconceptions.

Data science is a broad interdisciplinary web of statistics, programming, data analysis and mathematics, the goal of which is to extract insights from structured and unstructured data.

There are a ton of distinct roles and specialties within this field, including data scientists, data analysts, data engineers, machine learning engineers, etc., and it can get hella confusing trying to work out what you need to learn in order to break into this space.

Here are some simple definitions extracted from Chandra Reddy’s article on data careers to help break down the buzzwords:

  • Data Analyst: Engages in data inspection, cleaning, transformation and modelling, and communicates the result of the analysed data with their team.

  • Data Scientist: Analyses data using machine learning algorithms to gain future insights that could propel a company.

  • Data Engineer: Builds and optimises a platform that ensures accurate data for data scientists and analysts to work with.

  • Machine Learning Engineer: Trains an existing system so that it learns and can predict trends or outputs when given a dataset.

To help clarify how to land these sorts of roles, we spoke to 3 young guns in the data science space for their top tips on which skills and experiences to build in order to land a data-aligned role, from what skills to prioritise to what side projects to focus on:

Tristan Frizza - Associate Data Scientist @ Atlassian

  1. Code lots. SWE skills usually aren't prioritised in analytics/DS roles the way they are for an actual developer, but in my opinion coding is a ubiquitous skill, and it really helps to write clean, modular code.

  2. Do side projects that you actually care about. If you're just doing it for the resume stack, then you won't put that much effort in. It really stands out in interviews when candidates demo something unique that's not just some basic Titanic dataset model from Kaggle (that is a huge red flag, but maybe less of a red flag for intern roles because they're no doubt inexperienced).

  3. Be an independent learner. Take courses on e.g. Coursera, watch videos on YouTube, read papers, do Kaggle comps etc. (not all of the above, but at least 1 or 2). Since it's a new field, there are so many new things to learn, and methods and techniques change pretty quickly. There also isn't as established a curriculum for it in uni yet; I pretty much self-taught everything DS related, and continue to do it daily on the job.

Tristan's also working on a dope machine learning side project. You can check it out on GitHub here.

Alex Bunn - Senior Data Scientist @ Westpac

  1. Expand your connections on LinkedIn and talk to some people in the industry. You learn about techniques and technology from the people you meet, and then you can teach yourself those techniques and methods.

  2. Sign up to some article publishers like Medium or Towards Data Science to keep on top of new models and techniques that are being used. This is important to build a breadth of knowledge.

  3. Start some projects of your own using new tools and techniques. Don't rely on the same method for everything; try something new.

  4. Collaborate on open source projects and make some contributions.

  5. Build a capable skill set. Cover data engineering, analysis and data science. An employable data scientist is one who can ingest, analyse, model and build visualisations. Software to be familiar with: scikit-learn (sklearn), Tableau, Power BI, TensorFlow/PyTorch, SQL, PySpark, pandas.
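
To make the "ingest, analyse, model and build visualisations" loop from Alex's last tip a bit more concrete, here's a rough sketch of what a tiny end-to-end project could look like. It's only an illustration: the ad spend data is made up, the function names are hypothetical, and it sticks to Python's standard library so it runs anywhere. In a real project you'd more likely reach for pandas, scikit-learn and a proper plotting or BI tool.

```python
import csv
import io
import statistics

# Made-up toy data for illustration only: weekly ad spend vs. sign-ups.
RAW_CSV = """ad_spend,signups
100,12
200,19
300,31
400,42
500,48
"""


def ingest(text):
    """Ingest: parse CSV text into a list of (ad_spend, signups) float pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["ad_spend"]), float(row["signups"])) for row in reader]


def analyse(rows):
    """Analyse: compute the mean of each column."""
    xs, ys = zip(*rows)
    return {"mean_spend": statistics.mean(xs), "mean_signups": statistics.mean(ys)}


def fit_line(rows):
    """Model: ordinary least-squares fit of signups = slope * ad_spend + intercept."""
    xs, ys = zip(*rows)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def visualise(rows, width=40):
    """Visualise: a plain-text bar chart, one bar per row, scaled to the largest value."""
    max_y = max(y for _, y in rows)
    return "\n".join(
        f"{x:>6.0f} | " + "#" * round(width * y / max_y) for x, y in rows
    )


if __name__ == "__main__":
    rows = ingest(RAW_CSV)
    print(analyse(rows))
    slope, intercept = fit_line(rows)
    print(f"signups ~ {slope:.3f} * ad_spend + {intercept:.2f}")
    print(visualise(rows))
```

Each function maps to one of the four verbs, so swapping in a real data source or a fancier model only touches one step at a time.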


Jacky Wong - Founding Scientist/Engineer @ Vector AI + Founder @ GALAT AI

Here are the skills you need to get a DS job (in order of importance and difficulty):

  1. LEARN PYTHON. No one in industry uses R. If they do, you’re more likely looking at not-so-up-to-date data science firms or actuarial firms. It’s also important to understand that Python is far more versatile than R for projects. Project-based learning works well, and a project is usually good enough to talk about in interviews. Bonus if it’s a tangible product that you have metrics for (downloads/views/stats).

  2. Open Source projects. You should look to contribute to open source GitHub projects where possible (our project at https://github.com/vector-ai/vectorhub is open source and welcoming PRs! It’s also easy to contribute!) VectorHub is our project for maintaining state-of-the-art vectors, which form the base for AI search/AI recommendations. Feel free to check it out at hub.vctr.ai to see what this looks like!

  3. Winning data science competitions. If you are interested in a competition, definitely participate in it and try your best to win. Kaggle has a constant stream of them, but the technical bar is high (don’t be afraid though - tonnes of resources to help out newbies). I recommend winning or ranking in university or national ones first. Doing well greatly demonstrates interest and intelligence; otherwise, you simply learn something new to help you win the next competition. For university students, the annual EY one/UNSW DataSoc one should be an easier game.