Introduction
Artificial Intelligence (AI) is one of the fastest-growing areas of computer science, and more and more computer systems are adopting AI and machine learning. Machine learning is a branch of AI that studies algorithms and mathematical models that learn from data sets to make decisions, without hand-written rules for every case. In other words, machine learning is writing code that lets machines make decisions by applying pre-defined algorithms to the provided datasets.
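To make the idea of "decisions based on pre-defined algorithms on provided datasets" concrete, here is a minimal sketch in plain Python. It uses a one-nearest-neighbor rule as the algorithm; the training data, the labels, and the predict function name are all made up for illustration and do not come from any library.

```python
import math

# Hypothetical training data: (height_cm, weight_kg) -> label.
TRAINING_DATA = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]


def predict(sample, training_data=TRAINING_DATA):
    """Return the label of the training point closest to `sample`."""
    _, nearest_label = min(
        training_data,
        key=lambda pair: math.dist(pair[0], sample),  # Euclidean distance
    )
    return nearest_label


# The code contains no rule such as "if height > 170"; the decision
# comes from the algorithm applied to the data.
print(predict((151.0, 52.0)))
print(predict((184.0, 89.0)))
```

Changing the training data changes the decisions without touching the code, which is the point of the definition above.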
So, what is the most popular programming language for machine learning? Almost any programming language can be used to write ML-based applications. However, writing every algorithm from scratch is a time-consuming process. The best-suited programming language is one that comes with pre-built libraries and has advanced support for data science and data models.
GitHub announced (https://github.blog/2019-01-24-the-state-of-the-octoverse-machine-learning/) its list of top machine learning languages based on GitHub contributions. The top 10 machine learning languages in the list are Python, C++, JavaScript, Java, C#, Julia, Shell, R, TypeScript, and Scala.
I did some more digging and searching of various papers and online forums on the Internet. I also looked at Google Trends and search keywords in various SEO tools and websites. Here is my list of the most popular programming languages for machine learning.
- Python
- C++
- Java
- JavaScript
- C#
- R
- Julia
- Go
- TypeScript
- Scala
1. Python
Python is one of the most popular programming languages of recent times. Python, created by Guido van Rossum in 1991, is an open-source, high-level, general-purpose programming language. Python is a dynamic programming language that supports object-oriented, imperative, functional, and procedural development paradigms. Python is very popular in machine learning programming.
Python was one of the first programming languages to gain machine learning support through a variety of libraries and tools.
scikit-learn and TensorFlow are two popular machine learning libraries available to Python developers.
2. C++
C++ is one of the oldest and most popular programming languages. Most machine learning platforms, including TensorFlow, support C++.
TensorFlow's C++ API provides mechanisms for constructing and executing a data flow graph. The API is designed to be simple and concise: graph operations are clearly expressed using a "functional" construction style, including easy specification of names, device placement, etc., and the resulting graph can be efficiently run and the desired outputs fetched in a few lines of code.
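The "construct a graph, then run it and fetch the outputs" idea can be sketched in a few lines of plain Python. This is a toy illustration only: Node, constant, add, multiply, and run are hypothetical names invented for this example and are not part of TensorFlow's C++ or Python API.

```python
class Node:
    """One operation in a toy data flow graph (not TensorFlow's API)."""

    def __init__(self, name, func, inputs=()):
        self.name = name          # used as the cache key, so names must be unique
        self.func = func          # computes this node's value from its inputs
        self.inputs = tuple(inputs)


def constant(name, value):
    return Node(name, lambda: value)


def add(name, a, b):
    return Node(name, lambda x, y: x + y, (a, b))


def multiply(name, a, b):
    return Node(name, lambda x, y: x * y, (a, b))


def run(fetch, cache=None):
    """Evaluate `fetch` by first evaluating the nodes it depends on."""
    if cache is None:
        cache = {}
    if fetch.name not in cache:
        args = [run(node, cache) for node in fetch.inputs]
        cache[fetch.name] = fetch.func(*args)
    return cache[fetch.name]


# Construction phase: this only describes the computation.
a = constant("a", 2.0)
b = constant("b", 3.0)
total = add("total", a, b)
scaled = multiply("scaled", total, a)

# Execution phase: run the graph and fetch the desired output.
result = run(scaled)
print(result)  # 10.0, i.e. (2.0 + 3.0) * 2.0
```

Separating construction from execution is what lets a real framework decide where each operation runs before any computation happens.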
3. C#
The C# language was created by Anders Hejlsberg at Microsoft and launched in 2000. C# is a simple, modern, flexible, object-oriented, safe, and open-source programming language. C# is one of the most versatile programming languages in the world. It allows developers to build all kinds of applications, including Windows clients, console apps, web apps, mobile apps, and backend systems.
C# can be used for machine learning applications via a .NET Core machine learning platform, ML.NET. ML.NET is a cross-platform open-source machine learning framework that makes machine learning accessible to .NET developers.
ML.NET allows .NET developers to develop their own models and infuse custom machine learning into their applications, using .NET, even without prior expertise in developing or tuning machine learning models.
Along with these ML capabilities, the first release of ML.NET also brought a first draft of the .NET APIs for training models and using models for predictions, as well as the core components of the framework, such as learning algorithms, transforms, and ML data structures.
TensorFlowSharp is an open-source library that provides an API for working with the TensorFlow library from C#, F#, and other .NET languages.
4. R
R is a dynamic, array-based, object-oriented, imperative, functional, procedural, and reflective programming language. The language first appeared in 1993 but has become popular in the past few years among data scientists and machine learning developers for its functional and statistical features.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. R is open-source and available on r-project.org and GitHub. Currently, R is managed and developed under the R Foundation and the R Development Core Team. At the time of writing, the current version of R is 3.5.2, which was released on Dec 20, 2018.
R is one of the most popular programming languages among data scientists and statistical engineers. R supports the Linux, OS X, and Windows operating systems. Many R packages are publicly available for download from the R project's CRAN website: https://cran.r-project.org/
The R interface to TensorFlow lets you work productively using the high-level Keras and Estimator APIs, and when you need more control provides full access to the core TensorFlow API: https://tensorflow.rstudio.com/
5. Java and JavaScript
Java is one of the most popular programming languages in the world. Java was developed by James Gosling at Sun Microsystems, which was later acquired by Oracle. There are an estimated 9 million Java developers in the world.
JavaScript is the most popular web scripting language. There are several machine learning libraries and frameworks that support Java and JavaScript.
6. Julia, Go, Shell, Prolog, Lisp, Ada, TypeScript, and Scala
Several other languages that provide machine learning support and usage include Julia, Go, Shell, Prolog, Lisp, Ada, TypeScript, and Scala.
The usefulness of a language for machine learning depends on the frameworks and libraries available to its developers. Two of the most popular machine learning frameworks are TensorFlow and scikit-learn. GitHub found that the following packages are the ones most often imported by machine learning projects.
GitHub reports:
We pulled data from the dependency graph to calculate the percentage of projects with machine learning or data science topics that import popular Python packages. The list above shows the top ten packages imported by these projects. Here’s what we found:
- Numpy—a package with support for mathematical operations on multidimensional data—was the most imported package, used in nearly three-quarters of machine learning and data science projects.
- Scipy, a package for scientific computation, pandas, a package for managing datasets, and matplotlib, a visualization library, are all used in over 40% of machine learning and data science projects.
- Scikit-learn is a popular machine learning package, containing implementations of a large number of machine learning algorithms—it’s used by nearly 40% of projects.
- Tensorflow, a package for working with neural nets, is used in nearly a quarter of packages.
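The kind of calculation GitHub describes, the percentage of projects that import each package, can be sketched as follows. The project names and import sets below are hypothetical sample data, not GitHub's dependency graph, and import_percentages is a helper name chosen for this example.

```python
from collections import Counter

# Hypothetical sample: each project maps to the set of packages it imports.
PROJECT_IMPORTS = {
    "project-a": {"numpy", "pandas", "matplotlib"},
    "project-b": {"numpy", "scikit-learn"},
    "project-c": {"numpy", "tensorflow"},
    "project-d": {"scipy", "pandas"},
}


def import_percentages(project_imports):
    """Return {package: percent of projects that import it}."""
    counts = Counter()
    for packages in project_imports.values():
        counts.update(packages)  # each project counts a package at most once
    total = len(project_imports)
    return {pkg: 100.0 * n / total for pkg, n in counts.items()}


percentages = import_percentages(PROJECT_IMPORTS)

# Print the packages from most to least imported.
for package, pct in sorted(percentages.items(), key=lambda item: (-item[1], item[0])):
    print(f"{package}: {pct:.0f}%")
```

With this sample data, numpy appears in three of four projects (75%), which mirrors the "nearly three-quarters" shape of the real figures only by construction.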
What is TensorFlow?
TensorFlow is an open-source machine learning framework for everyone. TensorFlow was originally developed by Google’s research team and lets developers build numerical computations that run on CPUs, GPUs, and TPUs, on anything from laptops and desktops to cloud servers.
TensorFlow supports the Python, Java, C++, JavaScript, Go, and Swift programming languages. There are also community open-source projects available for C#, Haskell, Julia, Ruby, Rust, and Scala.
What is scikit-learn?
scikit-learn is an open-source machine learning library for the Python language. It is written in Python, Cython, C, and C++, and was created by David Cournapeau in 2007.
Conclusion
Machine learning is a growing area of computer science, and several programming languages support ML frameworks and libraries. Among all of the programming languages, Python is the most popular choice, followed by C++, Java, JavaScript, and C#.