MATLAB to Python Migration Guide
This guide helps MATLAB users transition from the original MATLAB SUBMARIT implementation to the Python version.
Note
The original MATLAB implementation was developed by Stephen France (Mississippi State University) and other contributors. This Python implementation maintains compatibility while offering modern improvements and performance optimizations.
Function Mapping Table
Core Functions
| MATLAB Function | Python Equivalent | Notes |
|---|---|---|
| | | Identical interface |
| | | Object-oriented API |
| | | Returns dict instead of struct |
| | | Parameter name change |
| | | Class-based approach |
Data Structure Conversions
| MATLAB | Python | Conversion Example |
|---|---|---|
| | | |
| | | |
| | | |
| | | |
| | | |
Common Operations
| MATLAB | Python | Notes |
|---|---|---|
| | | Returns tuple |
| | | Use |
| | | |
| | | |
| | | 0-based indexing |
| | | |
| | | Specify axis |
| | | MATLAB normalizes by N-1 (ddof=1 in NumPy; the NumPy default is ddof=0) |
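The ddof note in the last row is easy to overlook, so here is a minimal sketch of the difference. The sample values are made up for illustration; only np.std and its ddof parameter are standard NumPy.

```python
import numpy as np

# Illustrative values only (not taken from any SUBMARIT run)
scores = np.array([0.71, 0.68, 0.74, 0.70, 0.69])

# NumPy default: population standard deviation (divide by N)
std_population = np.std(scores)

# MATLAB's std(scores) default: sample standard deviation (divide by N-1)
std_sample = np.std(scores, ddof=1)

print(f"ddof=0: {std_population:.4f}, ddof=1: {std_sample:.4f}")
```

When comparing summary statistics against MATLAB output, passing ddof=1 explicitly keeps the two numbers comparable.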
Code Conversion Examples
Example 1: Basic Substitution Matrix Creation
MATLAB:
% Load data
data = readtable('products.csv');
X = table2array(data(:, 2:end));
% Create substitution matrix
S = create_substitution_matrix(X, 'metric', 'euclidean');
% Cluster
options.maxiter = 100;
options.nrestarts = 10;
[clusters, obj] = local_search(S, 5, options);
Python:
# Load data
import pandas as pd
import numpy as np
from submarit.core import create_substitution_matrix
from submarit.algorithms import LocalSearch
data = pd.read_csv('products.csv')
X = data.iloc[:, 1:].values # All columns except first
# Create substitution matrix
S = create_substitution_matrix(X, metric='euclidean')
# Cluster
ls = LocalSearch(n_clusters=5, max_iter=100, n_restarts=10)
clusters = ls.fit_predict(S)
obj = ls.objective_
Example 2: Evaluation and Visualization
MATLAB:
% Evaluate clustering
metrics = evaluate_clusters(S, clusters);
fprintf('Silhouette: %.3f\n', metrics.silhouette);
% Visualize
figure;
imagesc(S);
colorbar;
title('Substitution Matrix');
% Plot sorted matrix
[sorted_S, idx] = sort_matrix_by_clusters(S, clusters);
figure;
imagesc(sorted_S);
Python:
# Evaluate clustering
from submarit.evaluation import ClusterEvaluator
evaluator = ClusterEvaluator()
metrics = evaluator.evaluate(S, clusters)
print(f"Silhouette: {metrics['silhouette']:.3f}")
# Visualize
import matplotlib.pyplot as plt
from submarit.evaluation.visualization import plot_substitution_matrix
plt.figure(figsize=(10, 8))
plt.imshow(S, cmap='viridis')
plt.colorbar()
plt.title('Substitution Matrix')
plt.show()
# Plot sorted matrix
fig, ax = plt.subplots(figsize=(10, 8))
plot_substitution_matrix(S, clusters, ax=ax)
plt.show()
Example 3: Cross-Validation
MATLAB:
% K-fold cross-validation
nfolds = 5;
scores = zeros(nfolds, 1);
for i = 1:nfolds
    [train_idx, test_idx] = get_fold_indices(size(X, 1), nfolds, i);
    X_train = X(train_idx, :);
    X_test = X(test_idx, :);
    % Train and evaluate
    S_train = create_substitution_matrix(X_train);
    clusters_train = local_search(S_train, 5);
    score = evaluate_fold(X_test, clusters_train);
    scores(i) = score;
end
fprintf('CV Score: %.3f ± %.3f\n', mean(scores), std(scores));
Python:
# K-fold cross-validation
import numpy as np
from sklearn.model_selection import KFold
from submarit.validation import KFoldValidator
# Method 1: Using SUBMARIT's validator
validator = KFoldValidator(n_splits=5)
scores = validator.validate(X, n_clusters=5)
# ddof=1 matches MATLAB's std(), which normalizes by N-1
print(f"CV Score: {np.mean(scores):.3f} ± {np.std(scores, ddof=1):.3f}")
# Method 2: Manual implementation (similar to MATLAB)
# evaluate_fold is not imported here; it must be defined or imported separately
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in kf.split(X):
    X_train = X[train_idx]
    X_test = X[test_idx]
    # Train and evaluate
    S_train = create_substitution_matrix(X_train)
    ls = LocalSearch(n_clusters=5)
    clusters_train = ls.fit_predict(S_train)
    score = evaluate_fold(X_test, clusters_train)
    scores.append(score)
print(f"CV Score: {np.mean(scores):.3f} ± {np.std(scores, ddof=1):.3f}")
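If scikit-learn is not available, the fold bookkeeping can be done with NumPy alone. The sketch below is a hypothetical stand-in for the MATLAB helper get_fold_indices used above; its signature, the seed argument, and the 0-based fold number are assumptions for illustration, not the original implementation.

```python
import numpy as np

def get_fold_indices(n_samples, n_folds, fold, seed=0):
    """Return (train_idx, test_idx) for one fold; `fold` is 0-based.

    Hypothetical stand-in for the MATLAB helper of the same name.
    """
    rng = np.random.default_rng(seed)        # same seed -> same permutation for every fold
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)   # fold sizes differ by at most one
    test_idx = np.sort(folds[fold])
    train_idx = np.sort(
        np.concatenate([f for i, f in enumerate(folds) if i != fold])
    )
    return train_idx, test_idx

train_idx, test_idx = get_fold_indices(n_samples=10, n_folds=5, fold=0)
print(train_idx, test_idx)
```

Because the permutation depends only on the seed, calling the function once per fold yields test sets that together cover every sample exactly once.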
Common Pitfalls and Solutions
1. Indexing Differences
MATLAB (1-based):
X(1, 1) % First element
X(end, :) % Last row
X(2:5, :) % Rows 2-5
Python (0-based):
X[0, 0] # First element
X[-1, :] # Last row
X[1:5, :] # Rows 2-5 (exclusive end)
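Off-by-one errors are the most common source of bugs when porting index expressions. The following sketch wraps the conversion rule in a small helper; matlab_range_to_slice is a name invented here for illustration and is not part of SUBMARIT.

```python
import numpy as np

def matlab_range_to_slice(start, stop):
    """Convert a 1-based inclusive MATLAB range start:stop to a Python slice."""
    return slice(start - 1, stop)

X = np.arange(1, 13).reshape(6, 2)   # 6 rows, 2 columns, values 1..12

rows_2_to_5 = X[matlab_range_to_slice(2, 5), :]   # MATLAB: X(2:5, :)
last_row = X[-1, :]                               # MATLAB: X(end, :)

print(rows_2_to_5.shape)
```

The start shifts down by one while the stop stays the same, because Python's exclusive end cancels the 1-based offset.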
2. Broadcasting Behavior
MATLAB:
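A = [1; 2; 3]; % Column vector (3x1)
B = [4, 5, 6]; % Row vector (1x3)
C = A + B; % 3x3 matrix since R2016b (implicit expansion); error in earlier releases
Python:
A = np.array([[1], [2], [3]]) # Column vector, shape (3, 1)
B = np.array([4, 5, 6]) # 1-D array, shape (3,), broadcasts like a row vector
C = A + B # Broadcasting gives a (3, 3) array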
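A related pitfall is that NumPy has true 1-D arrays, which MATLAB does not. The sketch below contrasts the broadcast sum from the example with what happens once the column vector is flattened; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1], [2], [3]])   # shape (3, 1)
B = np.array([4, 5, 6])         # shape (3,)

C = A + B                       # shapes (3, 1) and (3,) broadcast to (3, 3)

# Pitfall: two 1-D arrays add element-wise, with no outer sum
a_flat = A.ravel()              # shape (3,)
elementwise = a_flat + B        # shape (3,)

# Making the intended axes explicit avoids surprises
outer_sum = a_flat[:, np.newaxis] + B[np.newaxis, :]   # shape (3, 3)

print(C.shape, elementwise.shape, outer_sum.shape)
```

Checking .shape after each step is a cheap way to catch an unintended broadcast early.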
3. Function Return Values
MATLAB:
[U, S, V] = svd(X); % Multiple outputs
[~, idx] = max(x); % Ignore first output
Python:
U, s, Vh = np.linalg.svd(X) # Multiple outputs; s is a 1-D array of singular values, Vh is V transposed
idx = np.argmax(x) # Direct function for index (0-based)
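The SVD return values differ in shape and orientation between the two environments. A minimal sketch, using random data for illustration, of how to recover MATLAB-style S and V from NumPy's output:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# full_matrices=False corresponds to MATLAB's svd(X, 'econ')
U, s, Vh = np.linalg.svd(X, full_matrices=False)

S = np.diag(s)   # MATLAB-style diagonal matrix of singular values
V = Vh.T         # MATLAB-style V (real input; use Vh.conj().T for complex input)

X_rebuilt = U @ S @ V.T
print(np.allclose(X, X_rebuilt))
```

Code ported from MATLAB that multiplies by V' can usually use Vh directly and skip the transpose.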
4. Default Random State
MATLAB:
rng(42); % Set random seed
x = rand(100, 1);
Python:
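np.random.seed(42) # Set global random seed
x = np.random.rand(100, 1)
# Better: use a local RandomState rather than the global seed
rng = np.random.RandomState(42)
x = rng.rand(100, 1)
# New code can use np.random.default_rng(42); its stream differs from RandomState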
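The benefit of a local generator is that unrelated code cannot disturb its stream. The sketch below illustrates this with np.random.default_rng, NumPy's current generator interface; it is one possible pattern, not something SUBMARIT is known to require.

```python
import numpy as np

# Two independent generators seeded identically produce the same stream
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
x_a = rng_a.random((100, 1))
x_b = rng_b.random((100, 1))

# A local generator is unaffected by calls to the global np.random functions
rng_c = np.random.default_rng(42)
_ = np.random.rand(10)            # touches only the global state
x_c = rng_c.random((100, 1))

print(np.array_equal(x_a, x_b), np.array_equal(x_a, x_c))
```

Passing the generator object into functions that need randomness keeps each experiment reproducible on its own.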
Numerical Differences
Precision and Tolerance
# MATLAB and Python may have different default tolerances
# Be explicit about tolerances
# MATLAB: eps
# Python equivalent:
eps = np.finfo(float).eps
# For algorithms
ls = LocalSearch(n_clusters=5, tol=1e-6) # Specify tolerance
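When checking ported results, exact equality is rarely the right test. A small sketch of comparing floating-point values with explicit tolerances, using only standard NumPy functions:

```python
import numpy as np

eps = np.finfo(float).eps   # same value as MATLAB's eps for doubles

a = 0.1 + 0.2
b = 0.3

exact_equal = (a == b)                                   # rounding error makes this False
close_default = np.isclose(a, b)                         # defaults: rtol=1e-05, atol=1e-08
close_tight = np.isclose(a, b, rtol=4 * eps, atol=0.0)   # explicit, much tighter tolerance

print(exact_equal, close_default, close_tight)
```

Stating rtol and atol explicitly documents how much disagreement with MATLAB is considered acceptable.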
Linear Algebra Differences
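# MATLAB and NumPy/SciPy both call LAPACK/BLAS, but often different builds (e.g. MKL vs OpenBLAS)
# Results may differ slightly in the last digits, and eigenvector signs may flip
# To make the LAPACK routine explicit
import scipy.linalg
# driver='ev' selects the LAPACK *syev routine; eigh returns (eigenvalues, eigenvectors)
eigenvalues, eigenvectors = scipy.linalg.eigh(S, driver='ev')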
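Eigenvectors are only defined up to sign, so two correct backends can return columns that differ by a factor of -1. The sketch below shows one common convention for making them comparable; normalize_signs is a helper invented here, and the small matrix is illustrative.

```python
import numpy as np

def normalize_signs(vectors):
    """Flip each column so its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(vectors), axis=0)
    signs = np.sign(vectors[idx, np.arange(vectors.shape[1])])
    return vectors * signs

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

w, v = np.linalg.eigh(S)
v_flipped = -v   # an equally valid set of eigenvectors

same_after_normalizing = np.allclose(normalize_signs(v), normalize_signs(v_flipped))
print(same_after_normalizing)
```

Applying the same normalization to the MATLAB output before comparison removes the sign ambiguity, though repeated eigenvalues would still need separate handling.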
MATLAB Integration
Using MATLAB Engine
import matlab.engine
import numpy as np
# Start MATLAB engine
eng = matlab.engine.start_matlab()
# Convert the NumPy array X to a MATLAB double before passing it in
data = matlab.double(X.tolist())
# Call MATLAB functions from Python
matlab_result = eng.your_matlab_function(data)
# Convert to Python
python_result = np.array(matlab_result)
# Stop engine
eng.quit()
Loading MATLAB Files
from scipy.io import loadmat, savemat
# Load .mat file
mat_data = loadmat('data.mat')
X = mat_data['X']
clusters = mat_data['clusters'].squeeze() # Remove singleton dimensions
# Save to .mat file
savemat('results.mat', {
    'clusters': clusters,
    'metrics': metrics,
    'S': S
})
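The squeeze() call above matters because every MATLAB array has at least two dimensions. A minimal round-trip sketch, writing to a temporary file with made-up labels, shows the shape change and the usual 1-based to 0-based shift:

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

clusters = np.array([1, 1, 2, 3, 2])   # MATLAB-style labels starting at 1

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.mat")
    savemat(path, {"clusters": clusters})
    loaded = loadmat(path)

raw = loaded["clusters"]         # 2-D, as every MATLAB array is
labels = raw.squeeze()           # back to 1-D
labels_zero_based = labels - 1   # only needed if labels are used as Python indices

print(raw.shape, labels.shape)
```

Whether to subtract one depends on how the labels are used; as plain group identifiers they can stay 1-based.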
Performance Comparison
| Operation | MATLAB | Python (NumPy) |
|---|---|---|
| Matrix multiplication | Very fast (MKL) | Fast (OpenBLAS/MKL) |
| For loops | Slow | Very slow (use vectorization) |
| Memory usage | Copy-on-write | Views when possible |
| Parallel computing | Parallel Computing Toolbox | multiprocessing/joblib |
| GPU support | Parallel Computing Toolbox (gpuArray) | CuPy/PyTorch/TensorFlow |
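To illustrate the "use vectorization" advice in the table, here is a sketch that computes pairwise squared Euclidean distances twice, once with nested loops and once with broadcasting. The function names and random data are illustrative, and no timings are claimed.

```python
import numpy as np

def pairwise_sq_dists_loop(X):
    """Nested-loop version, a direct port of typical MATLAB loop code."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = X[i] - X[j]
            D[i, j] = diff @ diff
    return D

def pairwise_sq_dists_vectorized(X):
    """Broadcast version: builds all differences at once."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]   # shape (n, n, d)
    return np.sum(diff ** 2, axis=-1)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
D_loop = pairwise_sq_dists_loop(X)
D_vec = pairwise_sq_dists_vectorized(X)

print(np.allclose(D_loop, D_vec))
```

The broadcast version trades memory for speed, since it materializes an (n, n, d) intermediate array, so very large inputs may need chunking.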
Best Practices for Migration
1. Start with small examples - Verify numerical equivalence
2. Use MATLAB compatibility layer during transition:
    from submarit.utils.matlab_compat import matlab_style_api
    # Use MATLAB-like interface
    S = matlab_style_api.create_substitution_matrix(X)
3. Validate results against MATLAB output:
    # Load MATLAB results
    matlab_results = loadmat('matlab_results.mat')
    # Compare
    np.testing.assert_allclose(
        python_clusters,
        matlab_results['clusters'].squeeze(),
        rtol=1e-5
    )
4. Profile both implementations to ensure performance parity
5. Document any numerical differences for your team
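One caveat when validating cluster assignments: a direct element-wise comparison fails if the two implementations number the same clusters differently, for example 1-based labels in MATLAB or a different order after random restarts. The sketch below compares partitions up to relabeling; canonical_labels is a helper invented for illustration, and the label vectors are made up.

```python
import numpy as np

def canonical_labels(labels):
    """Relabel clusters 0, 1, 2, ... in order of first appearance."""
    labels = np.asarray(labels).ravel()
    _, first_idx, inverse = np.unique(labels, return_index=True, return_inverse=True)
    order = np.argsort(first_idx)            # unique labels sorted by first appearance
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)      # rank[k] = new label for k-th unique value
    return rank[inverse]

python_clusters = np.array([0, 0, 1, 2, 1])
matlab_clusters = np.array([3, 3, 1, 2, 1])   # same grouping, different label numbers
other_clusters = np.array([0, 1, 1, 2, 1])    # a genuinely different grouping

same_partition = np.array_equal(
    canonical_labels(python_clusters), canonical_labels(matlab_clusters)
)
different_partition = np.array_equal(
    canonical_labels(python_clusters), canonical_labels(other_clusters)
)
print(same_partition, different_partition)
```

This checks for identical partitions only; when small disagreements are expected, a similarity score such as the adjusted Rand index may be more informative.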