How to use Python package manager for beginners (Ft. Conda with Cheatsheet)
Overview
By the end of this tutorial, you should be able to answer the following questions in about 10 to 15 minutes.
- What is the core problem in open-source development?
- What is a package manager and why do we need one?
- How do we use a package manager in practice?
Motivation
Let’s first discuss why we need a package manager. You might have used tools
such as numpy
for mathematical functions as shown below.
# my-fancy-code.py
from numpy import pi
print(pi) # 3.14...
NumPy is a Python collection of tools with functions and variables. However, we
must import this library or also known as “dependency”. If a program uses
numpy
, it “depends” on the functions provided by numpy
. One can install
numpy
via a command-line interface such as terminal as shown below.
# install numpy (will download 1.26.4 as of Feb 2024)
pip install numpy
Imagine you are required to run bob-software.py
, which relies on NumPy version
1.26.4 and Python version 3.11 or higher as shown below.
# bob-software.py (Python v3.11 or higher and NumPy v1.26.4)
import numpy as np
import sys
# Determine version compatibility
if np.__version__ == "1.26.4" and sys.version_info >= (3, 11):
print("Good! Your NumPy version is 1.26.4 and your Python version is 3.11 or higher!")
else:
print("Failed to compile. Check your Python and NumPy versions")
But what if you also have another software that you run daily,
alice-software.py
, which requires a different NumPy version?
# alice-software.py (Python v3.11 or higher and NumPy v1.26.3)
import numpy as np
import sys
# Determine version compatibility
if np.__version__ == "1.26.3" and sys.version_info >= (3, 11):
print("Good! Your NumPy version is 1.26.3 and your Python version is 3.11 or higher!")
else:
print("Failed to compile. Check your Python and NumPy versions")
Here, we see a simple difference. However, in reality, there can be dozens of dependencies with varying requirements. When collaborating as a team, your script may work on your local machine because you’re using specific versions. Yet, when you share your code with colleagues, it might not work due to their different software versions.
We want to avoid situations like this:
“Dear Lee, Please install these 12 libraries with versions a, b, and c”.
This message can easily be lost in our email chain, leaving the collaborator unable to run the code.
Introducing package manager
The core issue is the difficulty of manually tracking each dependency requirement. Understanding this issue helps us appreciate what a package manager resolves.
In just four lines of code, we can graecfully both handle executing
alice-software.py
and bob-software.py
in just a few lines of commands. The
package manager will
- create a folder (environment) with a specific Python version
- activate the folder and installs Python libraries with specific versions within it
- use the environment (Python and libraries inside the folder) to run
bob-software.py
Python package manager - Conda
Conda is one of the widely used package managers for Python. It is free, well-maintained, and favored by the scientific community.
Prerequisite: download & install
To install, visit https://docs.anaconda.com/free/miniconda/ and follow the listed steps to download and install. The process is straightforward—just copy and paste each line into your command line. It’s always recommended to download the latest version directly from the official website.
After successful installation, confirm by typing conda --version
.
conda --version
# conda 23.11.0
In case of any difficulties during installation, feel free to reach out via email.
Run bob-software.py with Conda
Now, we will run bob-software.py
with Conda.
Step 1. create environment
For simplicity, let’s create an environment called “bob”, as in our hypothetical example. Note that we’re installing Python version 3.12.
conda create -n bob python=3.12
Step 2. activate environment
By activating the environment, you’re instructing your machine to use the Python version and packages installed under “bob”. You can name it as you wish.
conda activate bob
Step 3. download Python libraries
Next, you’ll download the exact version of NumPy required by bob-software.py
.
pip install numpy==1.26.4
Test bob-software.py
Now, let’s check this out!
python bob-software.py
"Good! Your Python version is 3.11 or higher and NumPy version is 1.26.4"
Great, you’ve mastered it. Now, if you want to deactivate or exit from using the
bob
environment, simply type
conda deactivate bob
Test alice-software.py
But now, imagine you have alice-software.py
we have encountered previously.
Alice is your collaborator and has a specific requirement for her code. This
would not work with the current bob
environment since our numpy version is
different in that bob
environment.
python alice-software.py
# FAIL to compile. Check your Python and NumPy versions
It fails as expected if you use the bob
environment since the numpy version is
1.26.4 but Alice requires we use 1.26.3
instead. Now, it’s best to simply
create another environment shown below. Copy the line step by step except the
comment
you can create another environment called alice
with the following lines of
code as we have seen
# exit from the "bob' environment
conda deactivate
# install new env called "allice"
conda create -n alice python=3.12
conda activate alice
# install numpy 1.26.3, recall bob required 1.26.4
pip install numpy==1.26.3
Now, finally run
python alice-software.py
# Good! Your NumPy version is 1.26.3 and your Python version is 3.11 or higher!")
Congratulations! Now, you’ve learned how to create Conda environments and switch between them. If you are interested, keep going! We will be using a real life example in the following.
Bonus: relationship between Conda and pip
What is the relationship between Conda
and pip
? An analogy can clarify this.
Consider Conda
as an island creator and pip
as a ship that brings supplies
(Python packages) from the outside world (the internet) to the island. For
instance, pip
delivers wonderful packages like numpy to the specific Python
v3.11 island.
Real-life example with CIF Cleaner
Here, we will be using one of my Python packages that filters a list of CIF (Crystallographic Information File). A CIF is simply a file that stores information about a crystal. The details about the program are not important. The essence is that this package requires specific versions. Copy each line to your command-line and see if that works!
# clone the software known as "cif-cleaner"
git clone https://github.com/bobleesj/cif-cleaner.git
# go to the folder
cd cif-cleaner
# create an environment called "cif" with the specific Python
conda create -n cif python=3.12
# activate the environment
conda activate cif
# install all the packages (see below)
pip install -r requirements.txt
# run the code
python main.py
When you run python main.py
you should be able to see the following welcome
message
Welcome! Please choose an option to proceed:
[1] Move files based on unsupported CIF format
[2] Move files based on unreasonable distance
[3] Move files based on tags
[4] Copy files based on atomic occupancy and mixing
[5] Get file info in the folder
[6] Check CIF folder content against Excel file
Enter your choice (1-6):
One moment, do you see the requirements.txt
file as shown below?
Well, when you run pip install -r requirements.txt
instead of manually listing
dozens of Python libraries with specific versions, the developer, “Bob” in this
case, has simply laid out what to install. This makes everyone’s life simple and
easy.
Final remarks
How are you? I find it helpful to test my understanding by formulating questions and answering them mentally. I encourage you to gauge your learning with the following questions:
- What is the main challenge in open-source development?
- Can you define a package manager and explain its necessity?
- How can a package manager be used in practice?
- Bonus: Can you describe the step-by-step process of using Conda?
- Extra Bonus: Can you explain the relationship between
Conda
andpip
by using an analogy?
If you have found this tutorial useful for your scientific endeavor, please stay tuned for more by following me on GitHub or simply reaching out for any questions. Let’s keep learning!
Cheatsheet
Here is a list of commands I use frequently.
# create an environment called "bob"
conda create -n bob python=3.12
# activate the "bob" environment
conda activate bob
# install a package with a specific version
pip install numpy==1.26.4
# install a package with a version compatible with Python version
pip install numpy
# install packages in requirements.txt (only if the txt file is available)
pip install -r requirements.txt
# update your Python version in the activated environment
conda install python=3.xx
# list all packages associated with "bob"
conda list -n bob
# list all environments installed on your computer
conda env list
# remove the environment called "bob"
conda env remove --name bob