Learning Data Science For Free

Thanks to community-, university-, and corporate-supported efforts, learning about data science has never been easier or cheaper. In fact, as long as you’re not looking to put a degree or certificate on your resume, you can pick up the foundational knowledge you need for just the cost of your time. Learning about data science isn’t just for data scientists either! As data becomes more integrated into modern business, there are benefits to knowing the basics no matter what your role. Let’s explore the landscape.

Books

For those who like to learn from a good old-fashioned book, there are a number of great eBooks available for free. Here are a few that I consider essential.

The Python Data Science Handbook by Jake VanderPlas may be the best free book for building a foundation in data science concepts with Python. It covers essential techniques like linear regression, k-means clustering, decision trees, and random forests. It also introduces the most popular Python data science and machine learning libraries, including Pandas, NumPy, Scikit-Learn, and Matplotlib. If you’re just starting out or want a reference to bookmark, this is the one.
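To give a flavor of the first technique on that list, here is a minimal sketch of simple linear regression in plain Python. It is not taken from the book: the data points are made up for illustration and the function name `fit_line` is hypothetical. In practice you would likely reach for a library such as Scikit-Learn or NumPy instead.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the sample points sit exactly on a line, the fit should recover a slope of 2 and an intercept of 1. Real data is noisier, which is where the book's coverage of model evaluation comes in.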

Think Stats book cover
Image Credit: Green Tea Press

Probability and statistics are important for data scientists to master. Let’s face it, even those of us who have been exposed to stats often need a refresher. Think Stats by Allen Downey is a great guide, and one I find to be quite accessible. It’s written for programmers, which is a nice break from the traditional math textbooks you might be accustomed to.

Some more advanced options include:

Courses

The number of free courses is staggering! The time commitment for each varies from a few hours to several months. Though there are many paid courses available, you can save some money by auditing a course (meaning you don’t get a certificate or instructor feedback) or taking advantage of courses sponsored by a corporation.

Coursera offers a number of great data science-related courses, and if you want to save the enrollment fee, you can choose to audit. The Deep Learning Specialization is especially popular, and for good reason.

MIT offers an extensive collection of deep learning courses and lectures online, all for free. This is one example of a corporate/university sponsored effort. TensorFlow is a partner, so expect to see it highlighted at times.

One of MIT’s many Deep Learning lectures

Kaggle offers a number of (mini) courses on Python, SQL, and various data science fundamentals. These are not as in-depth as a full course, but if you want to get through a concept in a few hours, they’re a great option.

If you’re looking to learn about data products and tools on a particular platform, you’ll find free courses provided by the associated vendor. This is true for AWS, Google Cloud and others.

YouTube Videos

YouTube is an excellent resource for learning. In addition to the lectures that entire courses host here (see the MIT Deep Learning page above), you’ll find conference presentations, tutorials, and vendor-created content. It’s difficult to cover an extensive list in this post, but I’ll share a few examples to give you an idea of what’s out there.

If you want to understand a concept without lots of math and technical detail, YouTube is full of videos like this one from Simplilearn.

You’ll find numerous university course lectures on YouTube. This is one of many by MIT OpenCourseWare. These videos usually link back to the full course online, so if you like what you see you can go all in!

Going to conferences is expensive. Thankfully, O’Reilly and other companies that run conferences often post a subset of talks to YouTube. You can find everything from keynotes to technical talks.

GitHub Repos

If you’re like me, you like to mix in some hands-on learning with books, videos and courses. I find myself starring and bookmarking GitHub repos that I can clone and start experimenting with. Here are a few that come highly recommended in the data science community.

  • Machine Learning From Scratch : Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. A wonderful repo for beginners!
  • PyTorch-Transformers : A library of state-of-the-art pre-trained models for Natural Language Processing (NLP)
  • bsuite : A collection of carefully designed experiments that investigate core capabilities of a reinforcement learning (RL) agent

You can also view popular repos on GitHub by topic. I suggest browsing machine-learning, data-science, and deep-learning.

Competitions and Hackathons

Once you have the basics down, it can be fun to enter a competition or take part in a hackathon.

Kaggle is probably the best-known site for competitions, but there are others such as DrivenData.

As for hackathons, I like to search Meetup for ones nearby. Hackathons are great not only for learning, but also for networking, job hunting and of course free food and beverages. There are also sites like Analytics Vidhya that run hackathon-like contests online.

Twitter

Twitter is home to numerous experts in the field, as well as accounts dedicated to sharing resources. I run one such account called Data Science Reads (@DSReads) that shares resources like the ones above along with the latest news from the industry. I find I get the highest engagement from my followers when I share content that appeals to those just learning the craft.

Some other accounts to consider following:

  • Towards Data Science – A stream of posts authored by independent writers from around the world. There are a lot of technical tutorials here.
  • Open Data Science – You may know them from the conferences they run, but they also share great content on Twitter.
  • Data Science Renee – Renee is a Director of Data Science, hosts the “Becoming a Data Scientist” podcast, and shares resources and wisdom with aspiring data scientists.

When to Spend, When to Save

Though this post is focused on free resources, your options expand significantly if you have money to spend. I’ve spent a few hundred dollars on books over the past few years and they’ve served me well. I also like to support authors who make content available online for free by buying a copy of their book. Look for a post on some of those books in the future.

As for courses and degree programs, consider taking the leap if you want the credential on your resume, the structured support of learning and job placement, or you just learn better in such a setting. Don’t expect the content to be better than what you can find for free, however, as evidenced by the free books and courses offered by MIT and others above.

At the end of the day, free and low-cost resources are opening up opportunities for everyone. Whether you want to become a full-time data scientist or are just interested in gaining exposure to the concepts, there’s no shortage of places to start.