Frequently Asked Questions
General Questions
What is deep-ml?
deep-ml is a high-level PyTorch training framework that simplifies deep learning workflows for computer vision tasks. It provides ready-to-use trainers, task implementations, and experiment tracking.
Why use deep-ml instead of pure PyTorch?
Less boilerplate: No need to write training loops, device management, or distributed training setup
Best practices: Gradient accumulation, clipping, mixed precision built-in
Experiment tracking: Seamless TensorBoard, MLflow, and wandb integration
Reproducibility: Consistent checkpoint management and state restoration
How does deep-ml compare to PyTorch Lightning?
deep-ml is simpler and more focused:
Smaller API surface
Focused on computer vision
Multiple backend options (Fabric, Accelerate)
Less opinionated about code structure
PyTorch Lightning is more comprehensive but has a steeper learning curve.
Installation & Setup
What Python version is required?
Python 3.8 or higher is required.
Do I need CUDA for deep-ml?
No, deep-ml works on CPU, but training will be slower. For GPU training, install PyTorch with CUDA support before installing deep-ml.
Can I use deep-ml with Apple Silicon (M1/M2)?
Yes! Use device='mps' or accelerator='mps' to leverage Apple Silicon GPU:
trainer = FabricTrainer(
    ...,
    accelerator='mps'
)
Training
How do I resume training from a checkpoint?
trainer.fit(
    ...,
    resume_from_checkpoint='./checkpoints/epoch_50_model.pt',
    load_optimizer_state=True,
    load_scheduler_state=True
)
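If checkpoints follow the `epoch_<N>_model.pt` naming seen in the path above (an assumption; check the files your run actually writes), a small helper can pick the most recent one to pass as `resume_from_checkpoint`. `latest_checkpoint` is a hypothetical helper written for illustration, not part of deep-ml.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, inferred from 'epoch_50_model.pt'; adjust if yours differs.
CHECKPOINT_PATTERN = re.compile(r"epoch_(\d+)_model\.pt$")


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if none match."""
    best_epoch, best_path = -1, None
    for path in Path(checkpoint_dir).glob("*.pt"):
        match = CHECKPOINT_PATTERN.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

Comparing the parsed epoch number rather than sorting file names avoids treating `epoch_9` as newer than `epoch_50`.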
How do I use mixed precision training?
# FabricTrainer
trainer = FabricTrainer(
    ...,
    precision='16-mixed'  # or 'bf16-mixed'
)
# AcceleratorTrainer
trainer = AcceleratorTrainer(
    ...,
    accelerator_config={'mixed_precision': 'fp16'}
)
How do I implement gradient accumulation?
trainer.fit(
    ...,
    gradient_accumulation_steps=4  # Effective batch = 4 * batch_size
)
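The arithmetic behind the comment above can be worked through in plain Python. This is a sketch, not deep-ml code: `accumulation_summary` is a hypothetical helper, and it assumes that a final, partial accumulation window still triggers an optimizer step, which may not match what your trainer does.

```python
import math


def accumulation_summary(num_samples: int, batch_size: int, accumulation_steps: int) -> dict:
    """Work out effective batch size and optimizer steps for one epoch."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return {
        "effective_batch_size": batch_size * accumulation_steps,
        "batches_per_epoch": batches_per_epoch,
        # Assumption: a trailing partial window still counts as one optimizer step
        "optimizer_steps_per_epoch": math.ceil(batches_per_epoch / accumulation_steps),
    }


print(accumulation_summary(num_samples=10_000, batch_size=32, accumulation_steps=4))
# {'effective_batch_size': 128, 'batches_per_epoch': 313, 'optimizer_steps_per_epoch': 79}
```

Note that accumulation changes how often weights are updated, not how many batches are processed, so memory use per step stays at the level of `batch_size`.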
My training is slow. How can I speed it up?
Use mixed precision: precision='16-mixed'
Increase batch size: Larger batches = fewer iterations
Use multiple workers: num_workers=4 in DataLoader
Enable pin_memory: pin_memory=True in DataLoader
Use gradient accumulation: Instead of increasing batch size
Profile your code: Identify bottlenecks
# Fast DataLoader configuration
loader = DataLoader(
    dataset,
    batch_size=64,  # As large as GPU memory allows
    shuffle=True,
    num_workers=4,  # Parallel data loading
    pin_memory=True,  # Faster GPU transfer
    persistent_workers=True  # Keep workers alive
)
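Before tuning any of these settings, it helps to know whether time goes to loading data or to the training step. The following is a minimal sketch of such a measurement; `time_loader_vs_step` and `step_fn` are hypothetical names, and the function works with any iterable, not only a DataLoader.

```python
import time
from typing import Callable, Iterable


def time_loader_vs_step(loader: Iterable, step_fn: Callable, max_batches: int = 50) -> dict:
    """Split wall-clock time into 'waiting for data' and 'running the step'."""
    data_time = step_time = 0.0
    batches = 0
    iterator = iter(loader)
    while batches < max_batches:
        start = time.perf_counter()
        try:
            batch = next(iterator)  # time spent waiting on the data pipeline
        except StopIteration:
            break
        fetched = time.perf_counter()
        step_fn(batch)  # forward/backward/optimizer step, or any stand-in
        done = time.perf_counter()
        data_time += fetched - start
        step_time += done - fetched
        batches += 1
    return {"batches": batches, "data_seconds": data_time, "step_seconds": step_time}
```

If `data_seconds` dominates, more workers or a faster data format is likely to help; if `step_seconds` dominates, look at mixed precision or the model itself. On a GPU, CUDA work is asynchronous, so `step_fn` should end with `torch.cuda.synchronize()` for the step timing to be meaningful.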
How do I handle class imbalance?
# Option 1: Weighted loss
from deepml.losses import WeightedBCEWithLogitsLoss
criterion = WeightedBCEWithLogitsLoss(w_p=10.0, w_n=1.0)
# Option 2: Class weights
class_weights = torch.tensor([1.0, 10.0])
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
# Option 3: Weighted sampling
from torch.utils.data import WeightedRandomSampler
samples_weight = compute_sample_weights(train_dataset)
sampler = WeightedRandomSampler(samples_weight, len(samples_weight))
train_loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
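`compute_sample_weights` in Option 3 is not defined in the snippet above. One possible implementation, shown here as an assumption rather than a deep-ml function, weights each sample by the inverse frequency of its class. It assumes the dataset yields `(image, label)` pairs with integer labels.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def compute_sample_weights(dataset: Iterable[Tuple[object, int]]) -> List[float]:
    """Give each sample a weight of 1 / (number of samples in its class)."""
    labels = [int(label) for _, label in dataset]  # iterates the whole dataset once
    class_counts = Counter(labels)
    return [1.0 / class_counts[label] for label in labels]


# Toy dataset: three samples of class 0, one of class 1
toy = [("img_a", 0), ("img_b", 0), ("img_c", 0), ("img_d", 1)]
weights = compute_sample_weights(toy)
print(weights)  # class 0 samples get 1/3 each, the class 1 sample gets 1.0
```

With these weights every class has the same total sampling probability. Iterating the dataset loads every image, so if the labels are available separately (for example a `targets` attribute on some torchvision datasets), building the weights from that list is much cheaper.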
Distributed Training
How do I train on multiple GPUs?
trainer = FabricTrainer(
    ...,
    accelerator='gpu',
    strategy='ddp',  # DistributedDataParallel
    devices='auto'  # Use all available GPUs
)
Can I train across multiple machines?
Yes, with FabricTrainer:
trainer = FabricTrainer(
    ...,
    accelerator='gpu',
    strategy='ddp',
    devices=4,  # GPUs per node
    num_nodes=2  # Number of machines
)
Run on each node:
# Node 0 (master)
fabric run --node-rank=0 --num-nodes=2 --main-address=192.168.1.1 train.py
# Node 1
fabric run --node-rank=1 --num-nodes=2 --main-address=192.168.1.1 train.py
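For the configuration above (4 GPUs per node, 2 nodes), the number of processes and the rank of each one follow from simple arithmetic. The sketch below uses hypothetical helper names and assumes every node has the same number of GPUs and that each process loads its own batch, as is usual with DDP.

```python
def world_size(devices_per_node: int, num_nodes: int) -> int:
    """Total number of DDP processes (one per GPU)."""
    return devices_per_node * num_nodes


def global_rank(node_rank: int, local_rank: int, devices_per_node: int) -> int:
    """Rank of a process across all nodes, assuming equal GPU counts per node."""
    return node_rank * devices_per_node + local_rank


total = world_size(devices_per_node=4, num_nodes=2)
print(total)  # 8 processes in total
print(global_rank(node_rank=1, local_rank=0, devices_per_node=4))  # 4: first GPU on node 1
print(32 * total)  # 256 samples per optimizer step if each process loads batch_size=32
```

Because the global batch grows with the number of processes, the learning rate may need retuning when moving from one GPU to many.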
What’s the difference between DP and DDP?
DP (DataParallel): Single process, thread-based, slower, easier to debug
DDP (DistributedDataParallel): Multi-process, faster, recommended
Use DDP for training, DP for quick debugging.
Data & Datasets
How do I use custom datasets?
Inherit from torch.utils.data.Dataset:
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, ...):
        # Load data
        pass

    def __len__(self):
        return num_samples

    def __getitem__(self, idx):
        # Load and return (image, label)
        return image, label

dataset = MyDataset(...)
loader = DataLoader(dataset, batch_size=32)
How do I handle large datasets?
Use streaming: Load data on the fly instead of loading everything into memory
Use multiple workers: Parallel data loading
Use pin_memory: Faster GPU transfers
Consider data format: Use efficient formats (LMDB, HDF5, WebDataset)
from torch.utils.data import IterableDataset
class StreamingDataset(IterableDataset):
    def __iter__(self):
        # Stream data from disk/network
        for sample in data_source:
            yield preprocess(sample)
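One caveat when combining a streaming dataset with multiple workers: each DataLoader worker receives its own copy of an `IterableDataset`, so every worker would yield the full stream unless `__iter__` splits it. A minimal sketch of such a split, using a hypothetical helper `shard_for_worker`, looks like this:

```python
from itertools import islice
from typing import Iterable, Iterator


def shard_for_worker(samples: Iterable, worker_id: int, num_workers: int) -> Iterator:
    """Yield every num_workers-th sample, starting at offset worker_id."""
    return islice(samples, worker_id, None, num_workers)


# Two workers split six samples without overlap
print(list(shard_for_worker(range(6), worker_id=0, num_workers=2)))  # [0, 2, 4]
print(list(shard_for_worker(range(6), worker_id=1, num_workers=2)))  # [1, 3, 5]
```

Inside `__iter__`, the worker id and worker count can be read from `torch.utils.data.get_worker_info()`, which returns `None` in the main process and an object with `id` and `num_workers` attributes inside a worker.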
How do I apply data augmentation?
Use torchvision transforms or Albumentations:
# torchvision (for classification)
from torchvision import transforms
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()
])
# Albumentations (for segmentation)
import albumentations as A
from albumentations.pytorch import ToTensorV2
transform = A.Compose([
    A.Resize(512, 512),
    A.HorizontalFlip(p=0.5),
    A.Normalize(),
    ToTensorV2()
])
Models
Can I use any PyTorch model?
Yes! deep-ml works with any torch.nn.Module:
# torchvision models
from torchvision.models import resnet50
# timm models
import timm
model = timm.create_model('efficientnet_b0', pretrained=True)
# Custom models
class MyModel(torch.nn.Module):
    ...
# All work with deep-ml
task = ImageClassification(model=model, ...)
How do I use a pre-trained model?
from torchvision.models import resnet50, ResNet50_Weights
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
# Replace classifier for your number of classes
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
How do I freeze layers?
# Freeze all layers
for param in model.parameters():
    param.requires_grad = False
# Unfreeze specific layers
for param in model.fc.parameters():
    param.requires_grad = True
# Create optimizer only for trainable parameters
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-3
)
Errors & Debugging
CUDA out of memory error
Solutions:
Reduce batch size
Use gradient accumulation:
trainer.fit(..., gradient_accumulation_steps=4)
Use mixed precision:
trainer = FabricTrainer(..., precision='16-mixed')
Clear cache:
torch.cuda.empty_cache()
Check for memory leaks: Don’t accumulate tensors in lists during training
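The last point is a common source of slowly growing memory: appending the loss tensor itself to a list keeps the tensor, and any autograd graph attached to it, alive for the whole epoch. A sketch of the safer pattern is below; `mean_loss` is a hypothetical helper that works with any object exposing `.item()`, as PyTorch scalar tensors do.

```python
from typing import Iterable


def mean_loss(step_losses: Iterable) -> float:
    """Average per-step losses while keeping only Python floats."""
    total, count = 0.0, 0
    for loss in step_losses:
        # .item() copies the scalar value out, so nothing holds on to the
        # tensor (or the autograd graph attached to it) after this step.
        total += loss.item()
        count += 1
    return total / count if count else 0.0
```

The same idea applies to logging: store `loss.item()` or a detached copy, never the live tensor.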
Validation loss is NaN
Possible causes:
Learning rate too high: Reduce it
Gradient explosion: Use gradient clipping
Numerical instability: Use BCEWithLogitsLoss instead of BCELoss
Invalid inputs: Check for NaN/Inf in data
# Add gradient clipping
trainer.fit(
    ...,
    gradient_clip_max_norm=1.0
)
# Check data (first batch only; remove the break to scan every batch)
for x, y in train_loader:
    assert not torch.isnan(x).any()
    assert not torch.isinf(x).any()
    break
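To see why computing the loss from logits is more stable than applying a sigmoid first, the comparison below uses plain Python floats as a stand-in for tensors. It is an illustration of the numerics, not PyTorch's implementation; PyTorch's `BCELoss` guards against this exact failure by clamping its log outputs, but precision is still lost once the sigmoid saturates.

```python
import math


def naive_bce(probability: float, target: float) -> float:
    """BCE on a probability, as computed after an explicit sigmoid."""
    return -(target * math.log(probability) + (1 - target) * math.log(1 - probability))


def bce_with_logits(logit: float, target: float) -> float:
    """Same loss computed from the raw logit in a numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


logit, target = 100.0, 0.0
probability = 1.0 / (1.0 + math.exp(-logit))  # rounds to exactly 1.0 in float64

print(bce_with_logits(logit, target))  # 100.0
try:
    naive_bce(probability, target)
except ValueError as error:
    print("naive version failed:", error)  # log(0) is undefined
```

The stable form never evaluates `log(0)`: the exponential only ever sees a non-positive argument, so it cannot overflow either.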
Model not learning (loss not decreasing)
Check:
Learning rate: Try different values (1e-4 to 1e-2)
Model frozen: Ensure layers are trainable
Loss function: Correct for your task?
Data preprocessing: Normalized correctly?
Batch size: Not too small?
# Debug: check gradient flow (run after loss.backward())
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: {param.grad.abs().mean()}")
Performance
How many epochs should I train?
Depends on:
Dataset size: Smaller → more epochs needed
Model complexity: Larger → more epochs
Learning rate: Lower → more epochs
General guidelines:
Small datasets (<10K images): 100-500 epochs
Medium datasets (10K-100K): 50-100 epochs
Large datasets (>100K): 20-50 epochs
Use early stopping:
patience = 10
best_loss = float('inf')
epochs_without_improvement = 0
for epoch in range(max_epochs):
    trainer.fit(..., epochs=1)
    if trainer.best_val_loss < best_loss:
        best_loss = trainer.best_val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= patience:
        print("Early stopping!")
        break
What learning rate should I use?
Use learning rate finder:
from torch_lr_finder import LRFinder
lr_finder = LRFinder(model, optimizer, criterion)
lr_finder.range_test(train_loader, end_lr=1, num_iter=100)
lr_finder.plot()
lr_finder.reset()
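A range test of this kind trains for a short while as the learning rate climbs from a small value to `end_lr`, then plots loss against learning rate. The sketch below shows an exponential sweep, the usual way to space the rates; it illustrates the idea and is not torch_lr_finder's code. `exponential_lr_sweep` and `start_lr` are hypothetical names.

```python
def exponential_lr_sweep(start_lr: float, end_lr: float, num_iter: int) -> list:
    """Learning rates spaced evenly on a log scale from start_lr to end_lr."""
    if num_iter < 2:
        raise ValueError("num_iter must be at least 2")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iter - 1)) for i in range(num_iter)]


# Six steps from 1e-5 to 1: each step multiplies the rate by about 10
rates = exponential_lr_sweep(start_lr=1e-5, end_lr=1.0, num_iter=6)
print([f"{lr:.0e}" for lr in rates])
```

A reasonable choice from the resulting plot is a rate where the loss is falling steeply, well below the point where it starts to diverge.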
General guidelines:
Adam/AdamW: 1e-3 to 1e-4
SGD: 1e-1 to 1e-2
Fine-tuning: 1e-4 to 1e-5
Compatibility
What PyTorch version is required?
PyTorch 1.12 or higher is required. The latest version is recommended.
Does deep-ml work with torch.compile?
Yes! Available in PyTorch 2.0+:
model = torch.compile(model)
task = ImageClassification(model=model, ...)
trainer.fit(...)
Can I use deep-ml with other libraries?
Yes:
timm: import timm; model = timm.create_model(...)
transformers: Works with Vision Transformers
segmentation-models-pytorch: Pre-built segmentation models
albumentations: For data augmentation
torchmetrics: Additional metrics
Getting Help
Where can I get help?
Check this documentation
Review examples in the repository
Open an issue on GitHub
Check existing issues for solutions
How do I report a bug?
Open an issue on GitHub with:
Minimal reproducible example
Error message and stack trace
Environment info (Python, PyTorch, deep-ml versions)
Expected vs actual behavior
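The environment details can be collected with a short script and pasted into the issue. This is a sketch using only the standard library; the distribution names in `PACKAGES` are assumptions (in particular, the name under which deep-ml is installed), so adjust them to match `pip list` on your machine.

```python
import platform
import sys
from importlib import metadata

# Distribution names are assumptions; adjust them to match `pip list`.
PACKAGES = ("torch", "torchvision", "deepml")


def environment_report(packages=PACKAGES) -> str:
    """Return a short, paste-ready summary of the Python and package versions."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


print(environment_report())
```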
How do I request a feature?
Open a feature request on GitHub describing:
Use case
Proposed API
Why it’s useful
Are you willing to contribute?
Contributing
See Contributing for guidelines on contributing to deep-ml.