MATLAB to Python Migration Guide

This guide helps MATLAB users transition from the original MATLAB SUBMARIT implementation to the Python version.

Note: The original MATLAB implementation was developed by Stephen France (Mississippi State University) and other contributors. This Python implementation maintains compatibility while offering modern improvements and performance optimizations.

Function Mapping Table

Core Functions

Core Function Mappings

MATLAB Function                 Python Equivalent                                    Notes
------------------------------  ---------------------------------------------------  ------------------------------
create_substitution_matrix(X)   submarit.core.create_substitution_matrix(X)          Identical interface
local_search(S, k, options)     LocalSearch(n_clusters=k, **options).fit_predict(S)  Object-oriented API
evaluate_clusters(S, clusters)  ClusterEvaluator().evaluate(S, clusters)             Returns dict instead of struct
gap_statistic(S, k, B)          gap_statistic(S, k, n_bootstrap=B)                   Parameter name change
kfold_validation(X, k, nfolds)  KFoldValidator(n_splits=nfolds).validate(X, k)       Class-based approach

Data Structure Conversions

Data Structure Mappings

MATLAB         Python                              Conversion Example
-------------  ----------------------------------  -------------------------------------
matrix         numpy.ndarray                       X = np.array(matlab_matrix)
cell array     list or object-dtype numpy.ndarray  names = list(cell_array)
struct         dict or namedtuple                  params = dict(matlab_struct)
table          pandas.DataFrame                    df = pd.DataFrame(table_data)
sparse matrix  scipy.sparse                        S = scipy.sparse.csr_matrix(S_matlab)
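Sparse matrices need one extra step when they are exported as index/value triplets (for example from MATLAB's [i, j, v] = find(S)), because the indices arrive 1-based. The sketch below assumes such triplets; the variable names and values are made up for illustration.

```python
import numpy as np
from scipy import sparse

# Triplets as MATLAB's [i, j, v] = find(S) would export them: 1-based indices
i_matlab = np.array([1, 2, 3, 3])
j_matlab = np.array([1, 3, 1, 3])
v = np.array([0.5, 0.2, 0.9, 0.1])

# Shift to 0-based indices before building the SciPy matrix
S = sparse.csr_matrix((v, (i_matlab - 1, j_matlab - 1)), shape=(3, 3))

dense = S.toarray()  # dense copy, only sensible for small matrices
```

Passing shape explicitly avoids silently dropping trailing all-zero rows or columns.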

Common Operations

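Operation Mappings

MATLAB       Python                                 Notes
-----------  -------------------------------------  ------------------------------------------------------
size(X)      X.shape                                Returns tuple
length(x)    len(x) or x.size                       Use len for 1D; for N-D, MATLAB length is max(X.shape)
X'           X.T or X.transpose()                   Use X.conj().T for complex input
X(:)         X.flatten() or X.ravel()               Pass order='F' to match MATLAB's column-major order
X(1:10, :)   X[0:10, :] or X[:10, :]                0-based indexing
find(X > 0)  np.where(X > 0) or np.argwhere(X > 0)  Returns 0-based subscripts, not linear indices
mean(X)      np.mean(X, axis=0)                     Specify axis
std(X)       np.std(X, axis=0, ddof=1)              MATLAB uses ddof=1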
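Several of these mappings hide a layout difference: MATLAB stores and flattens arrays column by column, while NumPy defaults to row by row. The following sketch, using a small made-up matrix, shows how to reproduce MATLAB's results for length, X(:), find and std.

```python
import numpy as np

# Same values as MATLAB's [1 -2 3; -4 5 -6]
X = np.array([[1.0, -2.0, 3.0],
              [-4.0, 5.0, -6.0]])

# size(X) -> X.shape (a tuple)
n_rows, n_cols = X.shape

# length(X) is the largest dimension; len(X) is only the first one
matlab_length = max(X.shape)

# X(:) stacks columns, so request column-major (Fortran) order
flat_matlab_order = X.flatten(order="F")
flat_numpy_order = X.flatten()  # row-major default

# find(X > 0) returns 1-based linear indices in column-major order
linear_idx = np.flatnonzero(X.flatten(order="F") > 0) + 1

# std(X) normalises by N-1 and works down each column
col_std = np.std(X, axis=0, ddof=1)
```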

Code Conversion Examples

Example 1: Basic Substitution Matrix Creation

MATLAB:

% Load data
data = readtable('products.csv');
X = table2array(data(:, 2:end));

% Create substitution matrix
S = create_substitution_matrix(X, 'metric', 'euclidean');

% Cluster
options.maxiter = 100;
options.nrestarts = 10;
[clusters, obj] = local_search(S, 5, options);

Python:

# Load data
import pandas as pd
import numpy as np
from submarit.core import create_substitution_matrix
from submarit.algorithms import LocalSearch

data = pd.read_csv('products.csv')
X = data.iloc[:, 1:].values  # All columns except first

# Create substitution matrix
S = create_substitution_matrix(X, metric='euclidean')

# Cluster
ls = LocalSearch(n_clusters=5, max_iter=100, n_restarts=10)
clusters = ls.fit_predict(S)
obj = ls.objective_

Example 2: Evaluation and Visualization

MATLAB:

% Evaluate clustering
metrics = evaluate_clusters(S, clusters);
fprintf('Silhouette: %.3f\n', metrics.silhouette);

% Visualize
figure;
imagesc(S);
colorbar;
title('Substitution Matrix');

% Plot sorted matrix
[sorted_S, idx] = sort_matrix_by_clusters(S, clusters);
figure;
imagesc(sorted_S);

Python:

# Evaluate clustering
from submarit.evaluation import ClusterEvaluator

evaluator = ClusterEvaluator()
metrics = evaluator.evaluate(S, clusters)
print(f"Silhouette: {metrics['silhouette']:.3f}")

# Visualize
import matplotlib.pyplot as plt
from submarit.evaluation.visualization import plot_substitution_matrix

plt.figure(figsize=(10, 8))
plt.imshow(S, cmap='viridis')
plt.colorbar()
plt.title('Substitution Matrix')
plt.show()

# Plot sorted matrix
fig, ax = plt.subplots(figsize=(10, 8))
plot_substitution_matrix(S, clusters, ax=ax)
plt.show()

Example 3: Cross-Validation

MATLAB:

% K-fold cross-validation
nfolds = 5;
scores = zeros(nfolds, 1);

for i = 1:nfolds
    [train_idx, test_idx] = get_fold_indices(size(X, 1), nfolds, i);
    X_train = X(train_idx, :);
    X_test = X(test_idx, :);

    % Train and evaluate
    S_train = create_substitution_matrix(X_train);
    clusters_train = local_search(S_train, 5);

    score = evaluate_fold(X_test, clusters_train);
    scores(i) = score;
end

fprintf('CV Score: %.3f ± %.3f\n', mean(scores), std(scores));

Python:

# K-fold cross-validation
import numpy as np
from sklearn.model_selection import KFold
from submarit.core import create_substitution_matrix
from submarit.algorithms import LocalSearch
from submarit.validation import KFoldValidator

# Method 1: Using SUBMARIT's validator
validator = KFoldValidator(n_splits=5)
scores = validator.validate(X, n_clusters=5)
# ddof=1 matches MATLAB's std, which normalises by N-1
print(f"CV Score: {np.mean(scores):.3f} ± {np.std(scores, ddof=1):.3f}")

# Method 2: Manual implementation (similar to MATLAB)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for train_idx, test_idx in kf.split(X):
    X_train = X[train_idx]
    X_test = X[test_idx]

    # Train and evaluate
    S_train = create_substitution_matrix(X_train)
    ls = LocalSearch(n_clusters=5)
    clusters_train = ls.fit_predict(S_train)

    # evaluate_fold is a user-supplied scoring function, as in the MATLAB version
    score = evaluate_fold(X_test, clusters_train)
    scores.append(score)

print(f"CV Score: {np.mean(scores):.3f} ± {np.std(scores, ddof=1):.3f}")
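If scikit-learn is not available, the role of the MATLAB helper get_fold_indices can be filled with plain NumPy. The function below is a hypothetical stand-in written for this guide, not part of SUBMARIT; it takes a 0-based fold number and reuses the same shuffle for every fold so the test sets do not overlap.

```python
import numpy as np

def get_fold_indices(n_samples, n_folds, fold, seed=0):
    """Return (train_idx, test_idx) for one fold; `fold` is 0-based."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)        # identical shuffle for every fold
    folds = np.array_split(order, n_folds)    # nearly equal-sized chunks
    test_idx = np.sort(folds[fold])
    train_idx = np.sort(
        np.concatenate([f for k, f in enumerate(folds) if k != fold])
    )
    return train_idx, test_idx

# Usage: iterate over folds with range(n_folds) instead of MATLAB's 1:nfolds
fold_sizes = []
for fold in range(5):
    train_idx, test_idx = get_fold_indices(23, 5, fold)
    fold_sizes.append(len(test_idx))
```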

Common Pitfalls and Solutions

1. Indexing Differences

MATLAB (1-based):

X(1, 1)      % First element
X(end, :)    % Last row
X(2:5, :)    % Rows 2-5

Python (0-based):

X[0, 0]      # First element
X[-1, :]     # Last row
X[1:5, :]    # Rows 2-5 (exclusive end)
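When translating many ranges by hand, off-by-one mistakes are easy to make. One option is a small helper that converts an inclusive, 1-based MATLAB range into a Python slice. The helper name matlab_range is an assumption for illustration, not an existing SUBMARIT function.

```python
import numpy as np

def matlab_range(start, stop):
    """Translate the inclusive, 1-based MATLAB range start:stop to a slice."""
    return slice(start - 1, stop)

X = np.arange(1, 13).reshape(6, 2)  # rows are [1 2], [3 4], ..., [11 12]

rows_2_to_5 = X[matlab_range(2, 5), :]  # MATLAB: X(2:5, :)
first_element = X[0, 0]                 # MATLAB: X(1, 1)
last_row = X[-1, :]                     # MATLAB: X(end, :)
```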

2. Broadcasting Behavior

MATLAB:

A = [1; 2; 3];  % Column vector
B = [4, 5, 6];  % Row vector
C = A + B;      % 3x3 matrix via implicit expansion (R2016b and later); error in older releases

Python:

A = np.array([[1], [2], [3]])  # Column vector
B = np.array([4, 5, 6])         # Row vector
C = A + B                       # Broadcasting works!
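Because broadcasting never raises an error for compatible shapes, a column vector plus a 1-D vector silently yields a matrix. The sketch below shows the resulting shapes and one way to get an element-wise sum instead.

```python
import numpy as np

A = np.array([[1], [2], [3]])  # shape (3, 1)
B = np.array([4, 5, 6])        # shape (3,), broadcast as (1, 3)

C = A + B                      # shape (3, 3): C[i, j] = A[i, 0] + B[j]

# For an element-wise sum, bring both operands to the same shape first
elementwise = A.ravel() + B    # shape (3,)
```

Checking C.shape right after such an operation is a cheap guard when porting code that relied on MATLAB raising an error.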

3. Function Return Values

MATLAB:

[U, S, V] = svd(X);  % Multiple outputs
[~, idx] = max(x);   % Ignore first output

Python:

U, s, Vh = np.linalg.svd(X)  # s is a 1-D array of singular values; Vh is V transposed
idx = np.argmax(x)           # Direct function for index (0-based)
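NumPy's SVD does not return the same three objects as MATLAB's: the singular values come back as a 1-D array and the third output is already transposed. A short sketch with made-up data shows how to recover MATLAB-style S and V.

```python
import numpy as np

X = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# full_matrices=False corresponds to MATLAB's svd(X, 'econ')
U, s, Vh = np.linalg.svd(X, full_matrices=False)

S = np.diag(s)  # MATLAB's diagonal matrix S
V = Vh.T        # MATLAB's V (use Vh.conj().T for complex input)

X_rebuilt = U @ S @ V.T  # same as U @ S @ Vh
```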

4. Default Random State

MATLAB:

rng(42);  % Set random seed
x = rand(100, 1);

Python:

np.random.seed(42)  # Set the global (legacy) random seed
x = np.random.rand(100, 1)

# Better: use a local Generator instead of global state
rng = np.random.default_rng(42)
x = rng.random((100, 1))
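A local random generator is most useful when it is passed into the functions that need randomness, so that each experiment can be replayed on its own. The function noisy_copy below is a hypothetical example, not part of SUBMARIT.

```python
import numpy as np

def noisy_copy(X, scale, rng):
    """Add Gaussian noise using the generator supplied by the caller."""
    return X + rng.normal(0.0, scale, size=X.shape)

X = np.zeros((4, 2))

# Same seed, same result; a different seed gives a different draw
first = noisy_copy(X, 0.1, np.random.default_rng(42))
second = noisy_copy(X, 0.1, np.random.default_rng(42))
other = noisy_copy(X, 0.1, np.random.default_rng(7))
```

Note that np.random.default_rng uses a different bit generator (PCG64) than MATLAB's default Mersenne Twister, so the same seed does not reproduce numbers drawn in MATLAB. For a side-by-side comparison, it is safer to generate the random inputs once and share them through a .mat file.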

Numerical Differences

Precision and Tolerance

# MATLAB and Python may have different default tolerances
# Be explicit about tolerances

# MATLAB: eps
# Python equivalent:
eps = np.finfo(float).eps

# For algorithms
ls = LocalSearch(n_clusters=5, tol=1e-6)  # Specify tolerance
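Being explicit about tolerances also applies when comparing numbers produced by the two implementations. The sketch below contrasts exact comparison with a relative tolerance and with a bound expressed in multiples of machine epsilon; the factor of 4 is an arbitrary choice for illustration.

```python
import numpy as np

eps = np.finfo(float).eps  # same value as MATLAB's eps for doubles

a = 0.1 + 0.2
b = 0.3

exact_equal = (a == b)  # False: the two doubles differ in the last bit

# Relative comparison with an explicit tolerance
close = bool(np.isclose(a, b, rtol=1e-9, atol=0.0))

# Bound expressed in units of machine epsilon
within_few_eps = abs(a - b) <= 4 * eps * max(abs(a), abs(b))
```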

Linear Algebra Differences

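# MATLAB and NumPy/SciPy both call LAPACK/BLAS, but the builds (MKL, OpenBLAS, ...)
# and the LAPACK drivers they pick can differ, so results may differ in the
# last few digits, and eigenvectors may differ in sign

import scipy.linalg

# Pin the LAPACK driver so the Python side is consistent across runs and machines
# (eigh returns both eigenvalues and eigenvectors)
eigenvalues, eigenvectors = scipy.linalg.eigh(S, driver='ev')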
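Even when eigenvalues agree to many digits, eigenvectors are only defined up to sign, so a direct comparison with MATLAB output can fail for a harmless reason. One common workaround is to fix a sign convention on both sides before comparing. The helper fix_signs below is an illustrative assumption, not a SUBMARIT function.

```python
import numpy as np

def fix_signs(vectors):
    """Flip each column so that its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(vectors), axis=0)
    signs = np.sign(vectors[idx, np.arange(vectors.shape[1])])
    signs[signs == 0] = 1.0
    return vectors * signs

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

w, V = np.linalg.eigh(S)

V_fixed = fix_signs(V)
V_flipped_fixed = fix_signs(-V)  # a sign-flipped copy normalises to the same result
```

The convention can still be ambiguous when two entries of an eigenvector have nearly equal magnitude, and repeated eigenvalues leave the whole subspace free to rotate, so those cases need a subspace comparison instead.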

MATLAB Integration

Using MATLAB Engine

import numpy as np
import matlab.engine

# Start MATLAB engine
eng = matlab.engine.start_matlab()

# Convert the NumPy array X to a MATLAB array, then call MATLAB from Python
data = matlab.double(X.tolist())
matlab_result = eng.your_matlab_function(data)

# Convert to Python
python_result = np.array(matlab_result)

# Stop engine
eng.quit()

Loading MATLAB Files

from scipy.io import loadmat, savemat

# Load .mat file
mat_data = loadmat('data.mat')
X = mat_data['X']
clusters = mat_data['clusters'].squeeze()  # Remove singleton dimensions

# Save to .mat file
savemat('results.mat', {
    'clusters': clusters,
    'metrics': metrics,
    'S': S
})
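Two details tend to surprise when round-tripping through .mat files: every array comes back at least 2-D, and cluster labels written by MATLAB are 1-based. The sketch below writes a small made-up label vector to a temporary file and reads it back both ways.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

clusters = np.array([1, 2, 2, 1, 3])  # 1-based labels, as MATLAB would store them

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.mat")
    savemat(path, {"clusters": clusters})
    raw = loadmat(path)                        # default: arrays are at least 2-D
    squeezed = loadmat(path, squeeze_me=True)  # drop singleton dimensions on load

raw_shape = raw["clusters"].shape              # row vector, shape (1, 5)
labels = squeezed["clusters"].astype(int) - 1  # 0-based labels for Python indexing
```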

Performance Comparison

Performance Characteristics

Operation              MATLAB                                 Python (NumPy)
---------------------  -------------------------------------  -----------------------------
Matrix multiplication  Very fast (MKL)                        Fast (OpenBLAS/MKL)
For loops              Slow                                   Very slow (use vectorization)
Memory usage           Copy-on-write                          Views when possible
Parallel computing     Parallel Computing Toolbox             multiprocessing/joblib
GPU support            Parallel Computing Toolbox (gpuArray)  CuPy/PyTorch/TensorFlow
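The advice to vectorize Python loops can be illustrated with pairwise squared Euclidean distances, a typical building block for distance-based matrices. Both functions below are illustrative sketches, not SUBMARIT code; they compute the same result, but the second one avoids Python-level loops.

```python
import numpy as np

def pairwise_sq_dists_loop(X):
    """Direct translation of a nested MATLAB for loop."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = X[i] - X[j]
            D[i, j] = diff @ diff
    return D

def pairwise_sq_dists_vectorized(X):
    """Same computation using broadcasting instead of loops."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]  # shape (n, n, d)
    return np.einsum("ijk,ijk->ij", diff, diff)

rng = np.random.default_rng(0)
X = rng.random((50, 4))

D_loop = pairwise_sq_dists_loop(X)
D_vec = pairwise_sq_dists_vectorized(X)
```

The vectorized version builds an intermediate array of shape (n, n, d), so for large inputs it trades memory for speed.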

Best Practices for Migration

  1. Start with small examples - Verify numerical equivalence

  2. Use MATLAB compatibility layer during transition:

    from submarit.utils.matlab_compat import matlab_style_api
    
    # Use MATLAB-like interface
    S = matlab_style_api.create_substitution_matrix(X)
    
  3. Validate results against MATLAB output:

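    # Load MATLAB results
    matlab_results = loadmat('matlab_results.mat')
    
    # Compare (MATLAB labels are 1-based, and cluster numbering may be
    # permuted between runs, so align the labels before comparing)
    np.testing.assert_allclose(
        python_clusters,
        matlab_results['clusters'].squeeze(),
        rtol=1e-5
    )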
    
  4. Profile both implementations to ensure performance parity

  5. Document any numerical differences for your team
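For step 3, note that two clusterings can be identical even when their label values differ, because cluster numbering is arbitrary and MATLAB labels start at 1. A minimal sketch of a permutation-invariant check follows; same_partition is a hypothetical helper, and the label vectors are made up.

```python
import numpy as np

def same_partition(labels_a, labels_b):
    """True if both label vectors group the items identically, up to renaming."""
    labels_a = np.asarray(labels_a).ravel()
    labels_b = np.asarray(labels_b).ravel()
    if labels_a.shape != labels_b.shape:
        return False
    pairs = set(zip(labels_a.tolist(), labels_b.tolist()))
    n_a = len(set(labels_a.tolist()))
    n_b = len(set(labels_b.tolist()))
    # A one-to-one renaming means every label occurs in exactly one pair
    return len(pairs) == n_a and len(pairs) == n_b

python_clusters = [0, 0, 1, 2, 1]  # 0-based labels
matlab_clusters = [3, 3, 1, 2, 1]  # 1-based labels, numbered differently

match = same_partition(python_clusters, matlab_clusters)
mismatch = same_partition(python_clusters, [3, 1, 1, 2, 1])
```

When the two partitions are expected to be similar rather than identical, a similarity score such as the adjusted Rand index is a more forgiving comparison.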

Additional Resources