It’s a big, scary world out there: where do I start with data science?

Michael Mahoney
Apr 15, 2021

So you’re new to data science. Maybe you’ve worked in the tech industry and want more insight into the machine learning world. Maybe you’re outside the industry entirely and desperately trying to transition in (I feel your pain). The question is where to start. Well, congratulations: you’ve stumbled on one of the questions that have been plaguing humanity at large since the invention of farming.

I’m not going to give you a step-by-step set of foolproof instructions that’s sure to land you where you want to go. If you’re bothering to read this, I’m sure you’ve already concluded that the yellow brick road was cheating. Sorry to burst your bubble, but the machine learning world doesn’t have such a road either. Perhaps even more nefarious, there are a hundred thousand roads of different colors and levels of “shiny,” all claiming to lead to Oz. Seems fishy, huh?

Where does this leave you?

The best answer to that question depends on who you are (don’t groan until I finish the explanation). The internet is a lovely, if horribly organized, magical world. All the resources to get you from “What is machine learning?” to competition-winning neural network engineer exist and are free on the internet. Yes, FREE. Before you ask, I have heard the saying “there’s no free lunch.” Whoever said that died before the internet was invented.

“So there’s a no-cost way of ascending to the heavenly heights of cutting-edge technology?”

Technically, yes. For the vast majority of people, no. I’m not doubting your fundamental ability to understand the material; I’m doubting your ability to navigate a technical industry full of jargon and scattered source material, all while raising a family, working, and living your life.

Can it be done? Of course it can. If you’re the type of person who can dive into a subject and quickly explore, gather, organize, and understand it, then close this blog, pull out your magic wand (we call it Google), and search “MACHINE LEARNING.” Good luck, have fun.

For everyone else, hold on for another minute.

Examples are often useful for conveying complex information intuitively without segueing into a series of definitions that quickly lose people’s attention. That said, an example is only a small slice and in no circumstance conveys a complete understanding of a subject. With that in mind, here’s my example.

I’m good at math. What does that mean? My undergraduate degree says “Mathematics” on it. For the better part of a year, I contemplated jumping into grad school and going for the whole PhD thing people keep talking about. I even went as far as taking 15 credits’ worth of graduate-level courses before completing my degree.

I knew how to code at a basic level. Before my data science career began, I was proficient in both Python and JavaScript (not a professional developer, but capable of solving most coding issues with some trial, error, and time).

In 2019, I spent 14,000 dollars on a data science certification course at Flatiron School.

Why did I do that?

Because I wasn’t going to get there on my own. Believe me, it doesn’t feel good to actually write it out.

I’m a reasonably driven guy. For months and months, I studied coding and general computer science on my own. I did make progress, but not enough. Tech is moving quickly, folks. It’s naïve to think that a couple of hours of hobby time every week will make you competitive in an industry that undergoes revolutions every year. Despite my technical math background and general familiarity with computers, I genuinely needed help funneling the massive amount of information in the machine learning world into a stream I could consume and grow from. I paid for that help, quite literally.

Does this mean you won’t be able to do this without lots of math, programming, and money? Not exactly.

Data science does involve a lot of math, but in the same way that driving a car does. Sure, there’s a bunch of fancy engineering going on under the hood (lol), but driving a car doesn’t require a fundamental understanding of how an internal combustion engine (or electric motor!) works. It’s much more about examples and learning in a structured way than anything else.

As a math person, I will never tell you that knowing some math is a bad thing when learning machine learning; quite the opposite. Continuing the car analogy, I would describe a data scientist as the mechanic. Like the driver, the mechanic doesn’t need to know all the engineering behind the car, but they are generally aware of the various parts of the engine, what they do, and, more importantly, how to fix them when they break. A data scientist doesn’t have to know every bit of math behind machine learning algorithms; however, they do need to know what the algorithms are doing so they can manipulate, fix, and implement them.

What about programming skills? This one is a little more touchy. As it stands today, there are no good interfaces for creating tailored machine learning models that don’t require some basic level of programming. The good news is that Python is the language of choice for data science. I won’t detour too long, but Python is an open-source, object-oriented programming language with lots of free examples and tutorials. If you’re serious about learning data science, your first assignment is to get basic familiarity with Python. Look up a YouTube video on installing Python on your local computer and write a couple of quick programs to get a feel for the language’s syntax. I personally recommend installing Python through Anaconda (Google it and read a quick explanation for more info). Good luck and happy hacking.
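If you want something to type once Python is installed, here is a minimal sketch of the kind of quick first program suggested above. It uses only plain Python (nothing Anaconda-specific), and the function names `average` and `describe` and the sample numbers are hypothetical, made up purely for practice. It touches a few basics you’ll use constantly: functions, lists, built-ins like `sum`, `min`, and `max`, and f-strings.

```python
# A tiny first Python program (hypothetical helpers, made-up practice data).

def average(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not numbers:
        raise ValueError("need at least one number")
    return sum(numbers) / len(numbers)


def describe(numbers):
    """Build a one-line summary using built-in functions and an f-string."""
    return (
        f"count={len(numbers)}, min={min(numbers)}, "
        f"max={max(numbers)}, mean={average(numbers):.1f}"
    )


if __name__ == "__main__":
    hours_studied = [2, 3, 1, 4, 5]  # hypothetical hours of study per week
    print(describe(hours_studied))  # prints: count=5, min=1, max=5, mean=3.0
```

Save it as a `.py` file and run it, then try changing the list or adding your own function; small tweaks like that are a quick way to get comfortable with the syntax.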

Back to the money issue…

14,000 dollars is a scary number. It was scary for me. Financing is available that makes the hurdle much easier to clear, at the cost of some interest over time. I won’t pretend to understand your specific situation. A 450-dollar-a-month payment for a certificate is what it is, and only you know whether it could work out. For me personally, it appeared to be the best way to transition into the industry quickly. Yes, the cost was high, but it was worth it in my case, given the industry’s median entry-level salary of around 75,000 dollars (in the Denver area).

You don’t have to go the certificate route. Going it alone likely won’t be the fastest way to learn, but it will certainly be cheaper. What should you do then? Look for online businesses or communities that specialize in machine learning education. Some charge a subscription, some are pay-per-course, and some are free! Udemy is the one that comes to mind: a large network of computer science classes and communities focused on specific parts of the tech world. There are many good data science courses for all levels of knowledge. These courses are online and reasonably affordable, so always wait for the sales! Every now and again there are 90%-off sales; buy everything you want to learn during those.

Regardless of the exact platform, be aware that the support on this route will be less comprehensive than in a structured certificate program. If you want to fill the gaps, or need an alternative approach to some concepts, YOU will need to be the one seeking out the answers. Google is always your friend (we’ll ignore the Big Brother implications for this blog). Chances are, someone else has been confused by the same thing or asked the same question. It will take some searching, but the answer is out there 99.99% of the time. Most communities also have forums for technical discussions. Lesson one when using the forums: DON’T BE AFRAID TO LOOK LIKE AN IDIOT. There are no dumb questions, but people will likely make you feel like there are. Ignore rude people. I know, it’s easier said than done, but at the end of the day, your goal is to fill the gaps in your understanding so you can be a successful data scientist, not to rage at every fool who takes time out of their day to make others feel small.

I’m going to wrap this post up with some final pieces of advice. You will get stuck; that’s OK. You are not dumb; computers are dumb. Break problems down into small, manageable steps. Take breaks if you’re getting nowhere, because burnout is a real danger if you’re taking the more unstructured route. Go on some detours if you find something interesting; machine learning will always be here when you get back. Besides, what you learn while wandering might help in the grand scheme of things. USE THE INTERNET.

Good luck out there, my friends.


Michael Mahoney

I love life, family, math, and the internet. I’ve done everything from academic research to digging holes. I can be stubborn, but I always try to keep an open mind.