Optimization Techniques Summary¶
Quick Reference: Where do different techniques apply in CF-Ensemble?
The Three Techniques¶
1. Label-Aware Confidence ⚙️¶
- Purpose: Approximate supervision in closed-form ALS
- Method: Modulates confidence matrix C based on label agreement
- Parameters: X, Y (latent factors)
- Applies to: ALS only
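The exact modulation rule is not given in this summary, so the following is only a minimal sketch of the idea under stated assumptions: `R` holds classifier probabilities (classifiers × instances), a prediction "agrees" with a label when it falls on the correct side of 0.5, and `alpha` is a hypothetical strength parameter. The function name `label_aware_confidence` is illustrative and not part of the CF-Ensemble API.

```python
def label_aware_confidence(R, C, labels, alpha=0.5):
    """Return a copy of C scaled up where predictions agree with labels, down otherwise.

    R: m x n nested list of probabilities (classifiers x instances).
    C: m x n nested list of base confidences.
    labels: length-n list of 0, 1, or None (None = unlabeled instance).
    alpha: hypothetical modulation strength in [0, 1).
    """
    C_new = [row[:] for row in C]  # do not mutate the caller's matrix
    for u, row in enumerate(R):
        for i, p in enumerate(row):
            y = labels[i]
            if y is None:
                continue  # unlabeled instances keep their base confidence
            agrees = (p >= 0.5) == (y == 1)
            C_new[u][i] *= (1 + alpha) if agrees else (1 - alpha)
    return C_new


R = [[0.9, 0.2, 0.6],
     [0.4, 0.1, 0.7]]
C = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]
labels = [1, 0, None]  # third instance is unlabeled
C_aware = label_aware_confidence(R, C, labels)
```

The modulated matrix would then take the place of `C` in the closed-form updates for X and Y, which is how supervision reaches the latent factors without any gradient step.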
2. Class-Weighted Gradients 📊¶
- Purpose: Balance class contributions in gradient descent
- Method: Weight instances by inverse class frequency
- Parameters: w, b (aggregator) in ALS; all parameters in PyTorch
- Applies to: Wherever we use gradient descent
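As an illustration of inverse class frequency weighting, here is a small sketch. It assumes the common "balanced" normalization, `n_samples / (n_classes * class_count)`; the normalization CF-Ensemble actually uses may differ, and `inverse_frequency_weights` is a hypothetical helper name.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per instance, inversely proportional to its class frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    class_weight = {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}
    return [class_weight[y] for y in labels]


labels = [1] + [0] * 9          # 10% positive
weights = inverse_frequency_weights(labels)
positive_weight = weights[0]    # weight of the single positive instance
negative_weight = weights[1]    # weight of each negative instance
```

With this normalization the weights sum to the number of instances, so the overall loss scale stays comparable to the unweighted case while the minority class contributes as much in total as the majority class.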
3. Focal Loss 🎯¶
- Purpose: Focus on hard examples, down-weight easy ones
- Method: Weight instances by \((1-p_t)^\gamma\)
- Parameters: w, b (aggregator) in ALS; all parameters in PyTorch
- Applies to: Wherever we use gradient descent
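The focal weight \((1-p_t)^\gamma\) can be sketched in a few lines, where \(p_t\) is the predicted probability of the true class. The helper name `focal_weights` is illustrative only.

```python
def focal_weights(probs, labels, gamma=2.0):
    """Return (1 - p_t) ** gamma for each instance.

    probs: predicted probability of the positive class.
    labels: true labels (0 or 1).
    """
    weights = []
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        weights.append((1.0 - p_t) ** gamma)
    return weights


probs = [0.95, 0.6, 0.1]   # easy, medium, and hard positive examples
labels = [1, 1, 1]
w_focal = focal_weights(probs, labels, gamma=2.0)
w_plain = focal_weights(probs, labels, gamma=0.0)
```

Setting `gamma=0.0` yields a weight of 1 for every instance, which is why `focal_gamma=0.0` is equivalent to switching focal loss off.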
Quick Decision Guide¶
"Which parameters should I set?"¶
Using ALS Trainer?
├─ Is your data imbalanced? (e.g., 10% positive)
│   ├─ YES → use_label_aware_confidence=True ✅ (for X, Y)
│   │         use_class_weights=True ✅ (for w, b)
│   └─ NO → Can use defaults (both are safe to enable)
│
└─ Do you have easy/hard example variance? (high disagreement)
    ├─ YES → focal_gamma=2.0 ✅ (for w, b)
    └─ NO → focal_gamma=0.0 (default)
Using PyTorch Trainer?
├─ Is your data imbalanced?
│   ├─ YES → use_class_weights=True ✅ (for all parameters)
│   └─ NO → Can use default (safe to enable)
│
└─ Do you have easy/hard example variance?
    ├─ YES → focal_gamma=2.0 ✅ (for all parameters)
    └─ NO → focal_gamma=0.0 (default)
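The two decision trees can also be written as a small helper that returns keyword arguments. This is a sketch, not part of the library: `suggest_config` is a hypothetical function, and it only sets the options the guide marks with ✅, leaving everything else at the trainer's defaults.

```python
def suggest_config(trainer, imbalanced, high_disagreement):
    """Map the decision guide to trainer keyword arguments.

    trainer: "als" or "pytorch".
    imbalanced: True if one class is rare (e.g., 10% positive).
    high_disagreement: True if examples vary widely in difficulty.
    """
    if trainer not in ("als", "pytorch"):
        raise ValueError(f"unknown trainer: {trainer!r}")

    config = {}
    if imbalanced:
        config["use_class_weights"] = True
        if trainer == "als":
            # Label-aware confidence exists only on the ALS path (for X, Y).
            config["use_label_aware_confidence"] = True
    if high_disagreement:
        config["focal_gamma"] = 2.0
    return config


als_config = suggest_config("als", imbalanced=True, high_disagreement=True)
torch_config = suggest_config("pytorch", imbalanced=True, high_disagreement=False)
```

The resulting dictionary could then be unpacked into the constructor, for example `CFEnsembleTrainer(n_classifiers=10, latent_dim=20, **als_config)`.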
Optimization Method Comparison¶
| Component | ALS Method | PyTorch Method |
|---|---|---|
| Latent factors (X, Y) | Closed-form ALS (fast, approximate) | Gradient descent (slow, exact) |
| Aggregator (w, b) | Gradient descent (iterative) | Gradient descent (iterative) |
| Supervision for X, Y | Label-aware confidence (approximate) | Direct gradients (exact) |
| Supervision for w, b | Direct BCE loss (exact) | Direct BCE loss (exact) |
Where Each Technique Applies¶
Quick Reference Table¶
| Technique | ALS: X, Y | ALS: w, b | PyTorch: All |
|---|---|---|---|
| Label-aware confidence | ✅ Yes | ❌ No | ❌ No |
| Class-weighted gradients | ❌ No | ✅ Yes | ✅ Yes |
| Focal loss | ❌ No | ✅ Yes | ✅ Yes |
Why This Pattern?¶
# ALS: Hybrid approach
for iteration in range(max_iter):
    X = closed_form_solution(Y, R, C_label_aware, λ)  # Uses label-aware conf
    Y = closed_form_solution(X, R, C_label_aware, λ)  # Uses label-aware conf
    w, b = gradient_descent(X, Y, labels, class_weights, focal)  # Uses class + focal

# PyTorch: Unified approach
for epoch in range(max_epochs):
    loss = reconstruction + supervised_with_class_weights_and_focal
    loss.backward()   # Gradients flow to ALL parameters
    optimizer.step()  # Updates X, Y, w, b together
Recommended Configurations¶
For Imbalanced Data (e.g., 10% positive)¶
ALS (recommended for speed):
trainer = CFEnsembleTrainer(
    n_classifiers=10,
    latent_dim=20,
    rho=0.5,
    use_label_aware_confidence=True,  # Handle imbalance in X, Y
    use_class_weights=True,           # Handle imbalance in w, b
    focal_gamma=0.0                   # Optional: add if needed
)
PyTorch (recommended for accuracy):
trainer = CFEnsemblePyTorchTrainer(
    n_classifiers=10,
    latent_dim=20,
    rho=0.5,
    use_class_weights=True,  # Handle imbalance in all parameters
    focal_gamma=0.0          # Optional: add if needed
)
For Imbalanced + High Disagreement¶
ALS:
trainer = CFEnsembleTrainer(
    use_label_aware_confidence=True,  # Imbalance in X, Y
    use_class_weights=True,           # Imbalance in w, b
    focal_gamma=2.0                   # Hard examples in w, b
)
PyTorch:
trainer = CFEnsemblePyTorchTrainer(
    use_class_weights=True,  # Imbalance everywhere
    focal_gamma=2.0          # Hard examples everywhere
)
Common Misconceptions¶
❌ "Class weighting applies to all parameters in ALS"¶
Wrong! Class-weighted gradients only apply to the aggregator (w, b) in ALS. The latent factors (X, Y) use closed-form solutions (no gradients).
Correct: ALS uses label-aware confidence for X, Y and class weighting for w, b.
❌ "Label-aware confidence applies to PyTorch"¶
Wrong! Label-aware confidence is an ALS-specific approximation trick. PyTorch has exact gradients and doesn't need it.
Correct: PyTorch uses class-weighted loss for all parameters, no approximation needed.
❌ "Focal loss applies to latent factors in ALS"¶
Wrong! Focal loss requires gradients. ALS updates X, Y with closed-form solutions (no gradients).
Correct: Focal loss only applies to the aggregator (w, b) in ALS, or all parameters in PyTorch.
Technical Deep Dive¶
Why Can't We Apply Class Weighting to ALS Updates?¶
ALS uses closed-form solutions that directly compute the optimal X, Y. With Y held fixed, the update for X is:
\[X = (YC^TY^T + \lambda I)^{-1}YC^TR^T\]
There are no gradients here! It's a direct matrix equation. We can't "weight" the solution because it's already optimal for the given C matrix.
Instead: We modulate C itself (via label-aware weighting) to incorporate supervision.
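One common way to read the matrix expression above is row by row, as in standard confidence-weighted ALS. The following worked form is a sketch under assumptions not stated in this summary: R is classifiers × instances with row \(r_u\) for classifier \(u\), the columns \(y_i\) of Y are instance factors, and \(c_{ui}\) are the entries of C.

```latex
% Per-classifier update with Y fixed (assumed layout: R is m x n, Y is d x n)
\[
x_u = \left( Y C_u Y^{T} + \lambda I \right)^{-1} Y C_u r_u,
\qquad C_u = \operatorname{diag}(c_{u1}, \dots, c_{un})
\]
% It is the exact minimizer of the confidence-weighted ridge objective
\[
\min_{x_u} \; \sum_{i=1}^{n} c_{ui} \left( r_{ui} - x_u^{T} y_i \right)^2 + \lambda \lVert x_u \rVert^2
\]
% Setting the gradient to zero gives the normal equations
\[
\Big( \sum_{i=1}^{n} c_{ui}\, y_i y_i^{T} + \lambda I \Big) x_u = \sum_{i=1}^{n} c_{ui}\, r_{ui}\, y_i
\]
```

Written this way, it is visible that each \(c_{ui}\) scales both the left-hand matrix and the right-hand vector, so raising the confidence of label-consistent entries pulls \(x_u\) toward reproducing those entries. That is the only place supervision can enter a closed-form update.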
Why Does PyTorch Apply Weighting Everywhere?¶
PyTorch uses gradient descent for all parameters:
\[\theta \leftarrow \theta - \eta \, \nabla_\theta L\]
where \(\theta = \{X, Y, w, b\}\) are all the parameters and \(\eta\) is the learning rate.
The loss function:
\[L = \rho \cdot L_{\text{recon}} + (1-\rho) \cdot L_{\text{sup}}^{\text{weighted}}\]
When we apply class weighting to \(L_{\text{sup}}\), the gradients flow back to all parameters via backpropagation:
- \(\nabla_X L_{\text{sup}}\) ← affected by class weighting
- \(\nabla_Y L_{\text{sup}}\) ← affected by class weighting
- \(\nabla_w L_{\text{sup}}\) ← affected by class weighting
- \(\nabla_b L_{\text{sup}}\) ← affected by class weighting
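To make the arithmetic of the combined loss concrete, here is a small sketch in plain Python. It assumes the supervised term is a mean of per-instance binary cross-entropy values scaled by a class weight and a focal weight; the actual reduction and the name `combined_loss` are assumptions, not the library's implementation.

```python
import math


def combined_loss(recon_loss, probs, labels, rho=0.5, class_weights=None, gamma=0.0):
    """Compute rho * L_recon + (1 - rho) * weighted supervised loss.

    recon_loss: precomputed reconstruction loss (a float).
    probs: predicted positive-class probabilities, strictly inside (0, 1).
    labels: true labels (0 or 1).
    class_weights: optional dict mapping class -> weight (default: all 1.0).
    gamma: focal exponent; 0.0 disables focal weighting.
    """
    class_weights = class_weights or {0: 1.0, 1: 1.0}
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p       # probability of the true class
        bce = -math.log(p_t)                 # per-instance cross-entropy
        total += class_weights[y] * (1.0 - p_t) ** gamma * bce
    sup_loss = total / len(labels)
    return rho * recon_loss + (1.0 - rho) * sup_loss


probs, labels = [0.5, 0.5], [1, 0]
plain = combined_loss(0.2, probs, labels, rho=0.5)
weighted = combined_loss(0.2, probs, labels, rho=0.5, class_weights={0: 1.0, 1: 5.0})
```

Because X, Y, w, and b all feed into the predicted probabilities, any change to the per-instance weights changes the gradient of this single scalar with respect to every one of them.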
See Also¶
- Class-Weighted Gradients - Full documentation
- Focal Loss - Full documentation
- ALS Mathematical Derivation - Label-aware confidence
- ALS vs PyTorch - Detailed comparison
Last Updated: 2026-01-25
Status: Reference document for technique applicability