If you’re hiring data scientists, or just follow the field, you’re probably aware that Python has taken the top spot as the language of choice. Though there’s endless debate in the industry, I’d say 2017 was the year that Python really pulled ahead of R, the language that a lot of data scientists had traditionally learned in college and grad programs. A (controversial) KDNuggets poll from 2017 agrees.
What about R?
Before I state my case for why Python has overtaken R, let me say that there’s nothing inherently wrong with R. R has been popular for a long time for a reason. Data scientists, and others in the mathematics and scientific communities, find it easier to learn than more general programming languages. It’s designed for a lot of the problems that data scientists have been solving over the last 5+ years, and because it’s used in the classroom, many entry-level data scientists can hit the ground running with it when they join a company. There’s also a wonderful community of R users, along with a plethora of packages and code examples to support data scientists.
That said, data science, and machine learning in particular, is maturing rapidly. A few years ago, it was common for data scientists to explore data and build models in R and then hand off the results and model to a team of software engineers to implement in production. That approach lives on, but I’ve seen a shift to data scientists working more closely with application engineers or machine learning engineers (who are a sort of hybrid of application engineer and data scientist).
As that transition has taken place, the wall between building models in R and handing them off to engineers to get into production has become a pain point. Because Python is a general-purpose programming language as well as a data analysis tool, it helps bridge the gap between model development and production implementation.
How? Even if your production application isn’t written in Python, it’s quite common to turn a data science or machine learning model into a service with a REST API on top of it. If your data scientists are already coding in Python, the handoff to engineering becomes a much easier process and a more collaborative exercise. Now your data scientists and engineers are literally speaking the same language.
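To make that concrete, here is a minimal sketch of what "a model behind a REST API" can look like, using only Python’s standard library. Everything specific in it is an assumption for illustration: the fixed linear scorer stands in for a real trained model, and the `/predict` path, the `features` field, and the function names are hypothetical. A real team would more likely load a saved model and use a web framework of its choice.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a trained model: a fixed linear scorer.
# In practice this would be a model loaded from disk.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Score one observation with the placeholder linear model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def handle_predict(body):
    """Turn a raw JSON request body into (HTTP status, response dict)."""
    try:
        features = json.loads(body)["features"]
        return 200, {"prediction": predict(features)}
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing "features" key, or bad feature values.
        return 400, {"error": str(exc)}


class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict (an assumed route) and replies with JSON."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port=8000):
    """Build (but do not start) a local server for the sketch above."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server().serve_forever()` starts the service locally, and any application, in any language, can then POST a body like `{"features": [1.0, 2.0]}` to `/predict`. The point of the sketch is the split: the data scientist owns `predict`, the engineer owns the HTTP layer, and both can read the whole file.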
Of course, data scientists coding in Python in the first place wouldn’t be possible without a strong and growing community of data-minded folks. That community has built Python packages aimed at data scientists and continues to innovate. Pandas, scikit-learn, and others are nearly universally used amongst data science teams today. In addition, distributed frameworks for processing large volumes of data, such as Spark, enjoy Python support.
Python vs. R When Hiring
Every data science team has its own preferences, but data scientists with experience in Python are in demand more than ever. That doesn’t mean that a data scientist needs to be able to ship production code in Python. A strong partnership between data scientists, machine learning engineers, and application engineers is key to powering your application with machine learning. Having data scientists work closely with the engineers who put models in front of your customers is a massive competitive advantage.