This is part of the AI Engineering: Zero to Master series on the DAIS YouTube channel. Subscribe to follow every Saturday.
Getting your development environment right from the start is one of the most valuable investments you can make as an AI engineer. A poorly configured environment leads to dependency conflicts, mysterious errors, and hours of wasted debugging time. This first episode walks you through building a solid, professional-grade AI development environment from scratch, covering everything from Python installation to verifying that your GPU is ready for deep learning workloads.
Installing Python 3.11
Python 3.11 is the recommended version for AI and machine learning work in 2024. It offers significant performance improvements over earlier versions and has excellent compatibility with the major AI libraries. Avoid Python 3.12 for now, as some libraries have not fully caught up with it.
Windows
Download the official Python 3.11 installer from python.org. During installation, tick the box that says Add Python to PATH before clicking Install Now. This single step saves considerable frustration later. After installation, open PowerShell and verify the setup:
python --version
# Expected output: Python 3.11.x
pip --version
# Expected output: pip 23.x.x from ... (python 3.11)
If Windows opens the Microsoft Store instead of running Python, go to Settings, search for App execution aliases, and disable the Python aliases there.
macOS
Use Homebrew rather than the python.org installer on macOS. Homebrew manages dependencies cleanly and integrates well with the rest of your toolchain. If you do not have Homebrew installed, run the following in your terminal first:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then install Python 3.11 directly:
brew install python@3.11
python3.11 --version
Linux (Ubuntu/Debian)
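Most Linux distributions ship with Python, but it may not be version 3.11. On Ubuntu, use the deadsnakes PPA for a clean installation. Debian 12 already packages Python 3.11, so the add-apt-repository line can be skipped there: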
sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.11 python3.11-venv python3.11-dev
python3.11 --version
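The version check can also be done from inside Python, which is handy at the top of a script or notebook. The following is a minimal sketch; the helper name is_expected_python and the EXPECTED constant are illustrative and not part of any library.

```python
import sys

# Assumption: the series targets the 3.11 line, as recommended above.
EXPECTED = (3, 11)


def is_expected_python(version_info=sys.version_info, expected=EXPECTED):
    """Return True when the interpreter's major.minor matches `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


found = f"{sys.version_info.major}.{sys.version_info.minor}"
if is_expected_python():
    print(f"OK: running Python {found}")
else:
    print(f"Warning: running Python {found}, expected {EXPECTED[0]}.{EXPECTED[1]}")
```

Comparing only the major and minor numbers means any 3.11.x patch release passes the check.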
Setting Up VS Code for AI Development
Visual Studio Code is the preferred editor for AI engineering work. It is lightweight, highly extensible, and has outstanding Python and Jupyter support. Download it from code.visualstudio.com and install the following extensions immediately after opening it for the first time.
- Python (Microsoft): provides IntelliSense, linting, and debugging
- Pylance: fast, type-aware language server that catches errors before you run your code
- Jupyter (Microsoft): run Jupyter notebooks directly inside VS Code
- GitLens: makes Git history and blame information visible inline
- Rainbow CSV: colour-codes CSV files, genuinely useful when inspecting datasets
- autoDocstring: generates Python docstring templates automatically
Once extensions are installed, open the Command Palette with Ctrl+Shift+P (or Cmd+Shift+P on macOS) and search for Python: Select Interpreter. This links VS Code to your Python installation and will become important when you create virtual environments.
Virtual Environments with venv and conda
Never install AI libraries into your system Python. Always work inside a virtual environment. This keeps project dependencies isolated and prevents the version conflicts that break AI projects at the worst possible moments.
Using venv
The built-in venv module is sufficient for most projects. Create and activate a new environment like this:
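# Create the environment (macOS and Linux)
python3.11 -m venv ai-env
# Create the environment (Windows, using the py launcher that ships with the python.org installer)
py -3.11 -m venv ai-env
# Activate on Windows (PowerShell)
ai-env\Scripts\Activate.ps1
# Activate on macOS and Linux
source ai-env/bin/activate
# Your prompt should now show (ai-env)
# Verify you are using the right Python
which python # macOS/Linux
where.exe python # Windows (in PowerShell, plain "where" is an alias for Where-Object)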
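If you would rather confirm the activation from Python itself, the interpreter exposes enough information to tell. This is a small sketch; in_virtual_env is a hypothetical helper name. It relies on the documented behaviour that sys.prefix and sys.base_prefix differ inside a venv.

```python
import sys


def in_virtual_env(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter is running inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the Python installation that created it.
    """
    return prefix != base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtual_env()}")
```

Note that this test is specific to venv-style environments. A conda environment is a complete Python installation of its own, so the two prefixes are equal there.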
Using conda
Conda is particularly useful for AI work because it handles non-Python dependencies such as CUDA libraries. Install Miniconda from conda.io rather than the full Anaconda distribution, which preinstalls a large set of packages you may never use. Once installed:
# Create a new conda environment with Python 3.11
conda create -n ai-master python=3.11
# Activate it
conda activate ai-master
# Verify
python --version
conda info --envs
Use conda environments when working with GPU libraries or when you need precise control over CUDA versions. Use venv for lighter projects where system-level dependencies are not a concern.
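To check from Python which conda environment is active, you can read the CONDA_DEFAULT_ENV variable that conda activate sets in the shell. The sketch below assumes that variable is present; active_conda_env is an illustrative name.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None."""
    # `conda activate` exports CONDA_DEFAULT_ENV; it is absent otherwise.
    return environ.get("CONDA_DEFAULT_ENV") or None


name = active_conda_env()
if name is None:
    print("No conda environment is active in this shell.")
else:
    print(f"Active conda environment: {name}")
```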
Jupyter Notebooks and JupyterLab
Jupyter notebooks are the standard format for exploratory AI work, data analysis, and sharing reproducible experiments. Install JupyterLab, which is the modern interface, rather than the older classic notebook server:
pip install jupyterlab ipykernel
# Register your virtual environment as a Jupyter kernel
python -m ipykernel install --user --name=ai-master --display-name "Python (AI Master)"
# Launch JupyterLab
jupyter lab
JupyterLab will open in your browser. You should see your Python (AI Master) kernel available when creating a new notebook. This ensures the notebook uses the libraries from your virtual environment rather than any system packages.
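A quick way to confirm that a notebook really runs on the intended kernel is to inspect the interpreter path in the first cell. The following is a sketch rather than an official Jupyter check; kernel_uses_env is a hypothetical helper, and it assumes the environment directory is named after the environment (ai-master for conda, ai-env for venv).

```python
import sys
from pathlib import Path


def kernel_uses_env(env_name, executable=sys.executable):
    """Return True when `env_name` is one of the directories in the interpreter path."""
    return env_name in Path(executable).parts


print(f"Kernel interpreter: {sys.executable}")
print(f"Uses ai-master: {kernel_uses_env('ai-master')}")
```

If the printed path points at a system location such as /usr/bin, the notebook is not using your environment and the kernel selection should be revisited.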
A practical tip: keep your notebooks short and focused. Each notebook should address a single question or experiment. Long notebooks with hundreds of cells become unmaintainable and are difficult to version control meaningfully.
Git and GitHub for AI Projects
Version control is non-negotiable in professional AI engineering. Models, experiments, and datasets evolve constantly, and Git gives you the ability to track changes, collaborate, and recover from mistakes.
Initial Git Setup
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
git config --global init.defaultBranch main
Creating an AI Project Repository
mkdir my-ai-project
cd my-ai-project
git init
# Create a .gitignore immediately
touch .gitignore
Your .gitignore file for AI projects should always include the following:
# Write these lines into your .gitignore file
__pycache__/
*.pyc
.ipynb_checkpoints/
ai-env/
.env
*.h5
*.pkl
*.pt
*.pth
data/raw/
models/weights/
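If you set up projects often, the same entries can be written by a short script instead of by hand. This is a minimal sketch: ensure_gitignore and AI_IGNORE_ENTRIES are illustrative names, and the demo runs in a temporary directory so it does not touch a real project.

```python
import tempfile
from pathlib import Path

# Entries copied from the list above.
AI_IGNORE_ENTRIES = [
    "__pycache__/", "*.pyc", ".ipynb_checkpoints/", "ai-env/", ".env",
    "*.h5", "*.pkl", "*.pt", "*.pth", "data/raw/", "models/weights/",
]


def ensure_gitignore(project_dir, entries=AI_IGNORE_ENTRIES):
    """Append any missing entries to <project_dir>/.gitignore and return them."""
    path = Path(project_dir) / ".gitignore"
    existing = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in existing}
    missing = [entry for entry in entries if entry not in present]
    if missing:
        # Keep existing lines and make sure the file ends with a newline.
        path.write_text("\n".join(existing + missing) + "\n")
    return missing


# Demo in a throwaway directory: the second call finds nothing left to add.
with tempfile.TemporaryDirectory() as demo_dir:
    first_run = ensure_gitignore(demo_dir)
    second_run = ensure_gitignore(demo_dir)
    written = (Path(demo_dir) / ".gitignore").read_text().splitlines()

print(f"Added on first run: {len(first_run)} entries")
print(f"Added on second run: {len(second_run)} entries")
```

Because the function only appends what is missing, it is safe to run on a project that already has a .gitignore.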
Never commit large model weights or raw datasets to Git. Use DVC (Data Version Control) for those, which we will cover in a later episode. For now, connect your local repository to GitHub by creating a new repository on github.com, then running:
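# Git needs at least one commit before there is anything to push
git add .gitignore
git commit -m "Initial commit"
git remote add origin https://github.com/yourusername/my-ai-project.git
git branch -M main
git push -u origin main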
GPU Drivers and CUDA Setup
If you have an NVIDIA GPU, configuring CUDA correctly unlocks dramatically faster training times. This section applies to Windows and Linux. macOS uses Metal Performance Shaders via PyTorch's MPS backend, which does not require CUDA.
Step 1: Install NVIDIA Drivers
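On Windows, go to nvidia.com/drivers, select your GPU model, and install the latest Game Ready or Studio driver. On Linux, installing the NVIDIA driver through your distribution's package manager is usually the simpler route. After installation, verify it is working: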
nvidia-smi
# This should display your GPU name, the driver version, and the highest CUDA version that driver supports
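The same check can be scripted, which is useful on remote machines or in setup scripts. The sketch below calls nvidia-smi through subprocess and returns an empty list when the tool is not on PATH; query_nvidia_gpus is a hypothetical helper name.

```python
import shutil
import subprocess


def query_nvidia_gpus():
    """Return a list of 'name, driver_version' strings, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # the tool exists but could not talk to the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = query_nvidia_gpus()
if gpus:
    for gpu in gpus:
        print(f"Found GPU: {gpu}")
else:
    print("nvidia-smi not available or no NVIDIA GPU detected.")
```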
Step 2: Install CUDA Toolkit
PyTorch bundles its own CUDA runtime, so you often do not need to install the full CUDA toolkit separately. However, if you plan to compile custom CUDA extensions, install CUDA 11.8 or 12.1 from developer.nvidia.com/cuda-toolkit. Match the version to what PyTorch supports at the time of installation.
Installing Essential Python Libraries
With your environment active, install the core AI and data science stack. These commands install everything you need for the exercises throughout this series. Run only one of the two PyTorch commands: if the CPU build is already installed, pip treats the requirement as satisfied and will not swap in the GPU build.
pip install numpy pandas matplotlib scikit-learn jupyter ipykernel
# Option A: PyTorch, CPU-only build
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Option B: PyTorch with NVIDIA GPU support (CUDA 11.8)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
After installation, freeze your dependencies to a requirements file. This allows teammates and your future self to reproduce the environment exactly:
pip freeze > requirements.txt
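To see whether an environment still matches a frozen requirements file, the pinned versions can be compared with what is installed. This is a rough sketch that only understands simple name==version lines, as pip freeze usually writes them; parse_pins and find_mismatches are illustrative names.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text):
    """Map package name -> pinned version for simple 'name==version' lines."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and entries that are not pinned
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for pins that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


req_file = Path("requirements.txt")
if req_file.exists():
    for name, (pinned, installed) in find_mismatches(parse_pins(req_file.read_text())).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
else:
    print("No requirements.txt in the current directory.")
```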
Verifying the Environment End to End
Run the following script to confirm that every component of your environment is working correctly. Create a new file called verify_env.py and paste this code in:
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import torch
print("=== Environment Verification ===")
print(f"Python version: {sys.version}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"Scikit-learn version: {sklearn.__version__}")
print(f"PyTorch version: {torch.__version__}")
# Check GPU availability
if torch.cuda.is_available():
    print(f"GPU available: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU (MPS) available")
else:
    print("No GPU detected. Running on CPU.")
# Quick NumPy sanity check
arr = np.array([1, 2, 3, 4, 5], dtype=np.float32)
print(f"\nNumPy array mean: {arr.mean()}")
# Quick Pandas sanity check
df = pd.DataFrame({
    "feature_a": np.random.randn(100),
    "feature_b": np.random.randn(100),
    "label": np.random.randint(0, 2, 100)
})
print(f"Pandas DataFrame shape: {df.shape}")
print(df.head(3))
# Quick PyTorch sanity check
tensor = torch.randn(3, 3)
print(f"\nRandom PyTorch tensor:\n{tensor}")
print(f"Tensor device: {tensor.device}")
# Quick Matplotlib sanity check
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(np.linspace(0, 10, 100), np.sin(np.linspace(0, 10, 100)))
ax.set_title("Environment Verification Plot")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
plt.tight_layout()
plt.savefig("verify_plot.png", dpi=100)
print("\nMatplotlib plot saved as verify_plot.png")
print("\n=== All checks passed. Environment is ready. ===")
Run it with python verify_env.py. If you see the success message and a saved PNG file, your environment is fully operational. If any import fails, check that your virtual environment is active and re-run the pip install commands above.
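When an import fails, it helps to know exactly which packages are missing before reinstalling anything. The sketch below uses importlib.util.find_spec to look for each module without importing it; missing_modules and REQUIRED_MODULES are illustrative names.

```python
import importlib.util

# Import names, not pip names: scikit-learn is imported as sklearn.
REQUIRED_MODULES = ["numpy", "pandas", "matplotlib", "sklearn", "torch"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All required modules are installed.")
```

Because find_spec only locates the module, this runs quickly even for heavy packages such as torch.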
Practical Exercise: Your First AI Data Pipeline
Before moving to Episode 2, complete this short exercise to confirm you can work end to end in your new environment. Open a Jupyter notebook and build a minimal data pipeline using the Iris dataset:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Load the dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target
print("Dataset shape:", X.shape)
print("\nFirst 5 rows:")
print(X.head())
print("\nClass distribution:")
print(pd.Series(y).value_counts())
# Split and scale
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train a simple model
model = LogisticRegression(max_iter=200, random_state=42)
model.fit(X_train_scaled, y_train)
# Evaluate
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"\nTest accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
# Visualise two features
fig, ax = plt.subplots(figsize=(8, 5))
scatter = ax.scatter(
    X_test.iloc[:, 0],
    X_test.iloc[:, 1],
    c=y_pred,
    cmap="viridis",
    edgecolors="black",
    linewidths=0.5,
    s=80
)
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_title("Logistic Regression Predictions on Iris Test Set")
plt.colorbar(scatter, ax=ax, label="Predicted Class")
plt.tight_layout()
plt.savefig("iris_predictions.png", dpi=100)
plt.show()
print("Plot saved.")
If this runs without errors and produces a scatter plot showing the predicted classes, you have successfully built and validated a complete AI development environment. This is the foundation on which every subsequent episode in this series will build.
Key Takeaways
- Always use Python 3.11 for AI work until the ecosystem has fully adopted 3.12. Use Homebrew on macOS, deadsnakes PPA on Linux, and the official installer with PATH enabled on Windows.
- Virtual environments are mandatory. Use venv for standard projects and conda when CUDA or complex binary dependencies are involved. Never install libraries into system Python.
- VS Code with the Python and Jupyter extensions provides a professional, integrated workflow that scales from small scripts to large research projects.
- Register your virtual environment as a Jupyter kernel using ipykernel to ensure notebooks use the correct Python interpreter and installed libraries.
- Set up Git immediately on every project and commit a meaningful .gitignore before writing a single line of code. Keep large files like model weights and datasets out of version control.
- Verify your GPU setup by checking torch.cuda.is_available(). On Apple Silicon, use torch.backends.mps.is_available() instead.
- Run the verification script at the end of any environment setup to confirm all libraries import correctly before starting real work.