Basic Setup for Data Science
January 17, 2021
The path to becoming a data scientist begins with setting up your computer with the appropriate tools. Most data science ecosystems are built around Python and R.
In this article, we will create a basic Python setup using the Anaconda Package Manager. This guide will guide you on installing popular machine learning packages.
Table of contents
- Table of contents
- Installing Anaconda Package Manager
- Installing machine learning libraries
- Creating and accessing virtual environments using conda
- Installing commonly used packages
- Advantage of using conda
Installing Anaconda Package Manager
Go to the Anaconda Package Manager website and install the individual edition. It is available at this link.
These are the set of steps required to install Anaconda Package Manager. Choose the installer based on your OS, and download it. Once downloaded, run the file, and you should see a dialog box pop up.
Step 3: Selecting the first option results in the Anaconda Package Manager getting added to the environment variables. This allows one to call the command
conda from the terminal.
Click on the install button to complete the installation.
Installing machine learning libraries
A few of the popular machine learning libraries are given below:
As new features are added and introduced, the libraries are updated continuously. Sometimes these updates result in incompatible changes, sometimes incompatibility issues.
For example, if scikit-learn depends on Numpy 2.X and numpy updates to the 3.X version, then scikit-learn cannot work due to compatibility issues. To avoid this, one should always refer to the compatibility requirements in the requirements.txt file.
Therefore, to avoid possible compatibility issues in the future, we use virtual environments that keep the installation of libraries local to the project. If another version of a library is needed, we can create a new virtual environment. Let us create a new virtual environment using conda.
Creating and accessing virtual environments using conda
Creating a virtual environment is easy using Anaconda Package Manager. This is accomplished using the
conda create command.
The structure of the command goes as follows:
conda create --n branch-name python=version_number
Let us look at an example:
conda create --n new-environment python=3.6
The above example creates an environment with Python version 3.6.
To activate the environment, we use the following command:
conda activate new-environment
Deactivating is done using the command
Installing commonly used packages
Let us install the libraries scikit-learn, TensorFlow, and PyTorch using conda. One resource of prime importance to us is the Anaconda search bar.
We could search for it in the search bar on Anaconda’s website to install it. Another option is to search for conda and the package names using any search engine of choice. This way, the process of going to the main website is bypassed.
The anaconda website for scikit-learn should look like this:
You will find a few varying commands. In most cases, the first command should work. However, in a rare scenario, if version conflicts occur, you may try the other commands and try it with other library versions.
Deep learning packages make use of the hardware available locally. Hence, a significant speedup is observed. However, hardware-specific software needs to be installed to support such hardware-based acceleration.
Installing these was a cumbersome process, and a small mistake would result in two hours of effort getting wasted. Using
conda, all this becomes very simple. Consider this command to install the tensorflow-GPU version. In the case of Tensorflow 2.X, there is no distinction between the CPU and GPU versions.
conda install tensorflow-GPU
This idea was inspired by this article published around two years ago:
PyTorch is relatively simple to install. Go to the Pytorch website. Under the Get Started tab, select the appropriate options.
Once selected, you should get your command below. Once that is done, just run the command in the virtual environment to avoid future compatibility issues.
In this case, we get the following command:
conda install PyTorch torchvision torchaudio cudatoolkit=10.2 -c PyTorch
Advantage of using conda
- The installation that we just finished using conda would have taken around 2-3 hours to completed using the standard approach. It could take a beginner days to figure out the installation instructions and might end up giving up. This approach ensures we get the same results with minimal effort.
- Moreover, the creation of virtual environments allows one to isolate compatibility issues.
Note: Creating virtual environments eats up a lot of space in the hard disk. Therefore, one must be thoughtful of the environments being created.
To check for all the environments present, use the following command:
conda env list
In this article, we have covered the installation of Anaconda Package Manager and the creation of virtual environments. For all the future articles, we will refer to this article. I hope you enjoyed reading the article.
Peer Review Contributions by: Saiharsha Balasubramaniam
About the authorLalithnarayan C
Lalithnaryan C is an ambitious and creative engineer pursuing his Masters in Artificial Intelligence at Defense Institute of Advanced Technology, DRDO, Pune. He is passionate about building tech products that inspire and make space for human creativity to flourish. He is on a quest to understand the infinite intelligence through technology, philosophy, and meditation.