Contributing Guide
Thank you for your interest in contributing to SUBMARIT! This guide will help you get started.
Note
This is the Sphinx documentation version of our contributing guide. For the most up-to-date version, please see the CONTRIBUTING.md file in the repository.
Getting Started
Development Setup
Fork the repository on GitHub, then clone it (use your fork's URL in place of the one below if you cloned your fork):
```bash
git clone https://github.com/m-marinucci/SUBMARIT.git
cd SUBMARIT
```
Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
Install in development mode:
```bash
pip install -e ".[dev]"
```
Install pre-commit hooks:
```bash
pre-commit install
```
Development Dependencies
The [dev] extras include:
Testing: pytest, pytest-cov, pytest-benchmark
Linting: flake8, black, isort, mypy
Documentation: sphinx, sphinx-rtd-theme
Profiling: memory-profiler, line-profiler
Code Style
We follow PEP 8 with these specifics:
Line length: 88 characters (Black default)
Use type hints for function signatures
Write docstrings for all public functions
Example:
```python
from typing import Tuple

import numpy as np


def calculate_metrics(
    data: np.ndarray,
    clusters: np.ndarray,
    metric: str = "euclidean",
) -> Tuple[float, float]:
    """Calculate clustering metrics.

    Parameters
    ----------
    data : np.ndarray
        Input data matrix of shape (n_samples, n_features).
    clusters : np.ndarray
        Cluster assignments of shape (n_samples,).
    metric : str, optional
        Distance metric to use, by default "euclidean".

    Returns
    -------
    Tuple[float, float]
        Silhouette score and Davies-Bouldin index.

    Examples
    --------
    >>> X = np.random.rand(100, 10)
    >>> clusters = np.random.randint(0, 5, 100)
    >>> sil, db = calculate_metrics(X, clusters)
    """
    # Implementation here
    pass
```
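The body above is left as a placeholder. As one possible way to fill it in while following the same conventions (type hints, NumPy-style docstring), here is a minimal NumPy-only sketch. It assumes Euclidean distance only (any other `metric` raises `ValueError`), at least two clusters, and the standard textbook definitions of the silhouette score and the Davies-Bouldin index; it is an illustration, not SUBMARIT's own implementation.

```python
from typing import Tuple

import numpy as np


def calculate_metrics(
    data: np.ndarray,
    clusters: np.ndarray,
    metric: str = "euclidean",
) -> Tuple[float, float]:
    """Calculate the silhouette score and Davies-Bouldin index.

    Parameters
    ----------
    data : np.ndarray
        Input data matrix of shape (n_samples, n_features).
    clusters : np.ndarray
        Cluster assignments of shape (n_samples,).
    metric : str, optional
        Distance metric; only "euclidean" is supported in this sketch.

    Returns
    -------
    Tuple[float, float]
        Silhouette score and Davies-Bouldin index.
    """
    if metric != "euclidean":
        raise ValueError(f"Unsupported metric: {metric!r}")
    data = np.asarray(data, dtype=float)
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    if labels.size < 2:
        raise ValueError("At least two clusters are required.")

    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Silhouette: a = mean distance to the rest of the own cluster,
    # b = smallest mean distance to any other cluster.
    n = data.shape[0]
    sil = np.zeros(n)
    for i in range(n):
        own = clusters == clusters[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters get a silhouette of 0 by convention
        a = dist[i, own].sum() / (n_own - 1)  # dist[i, i] is 0
        b = min(dist[i, clusters == k].mean() for k in labels if k != clusters[i])
        denom = max(a, b)
        sil[i] = (b - a) / denom if denom > 0 else 0.0

    # Davies-Bouldin: for each cluster, the worst ratio of summed scatter
    # to centroid separation, averaged over clusters.
    centroids = np.array([data[clusters == k].mean(axis=0) for k in labels])
    scatter = np.array(
        [np.linalg.norm(data[clusters == k] - c, axis=1).mean()
         for k, c in zip(labels, centroids)]
    )
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    ratios = (scatter[:, None] + scatter[None, :]) / np.where(sep == 0, np.inf, sep)
    np.fill_diagonal(ratios, -np.inf)  # ignore each cluster's ratio with itself
    db = ratios.max(axis=1).mean()

    return float(sil.mean()), float(db)
```

For example, the points (0, 0), (0, 1), (10, 0), (10, 1) with clusters [0, 0, 1, 1] give a silhouette of about 0.90 and a Davies-Bouldin index of 0.1. The broadcast distance matrix needs O(n² · d) memory, which is fine for docstring-sized inputs but worth revisiting for large datasets.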
Testing
Writing Tests
All new features must include tests:
```python
# tests/test_new_feature.py
import pytest

from submarit.new_module import new_function  # placeholder module and function


class TestNewFeature:
    def test_basic_functionality(self):
        """Test basic use case."""
        result = new_function([1, 2, 3])
        expected_value = 6  # e.g. if new_function sums its input; use the known correct output
        assert result == expected_value

    def test_edge_cases(self):
        """Test edge cases."""
        with pytest.raises(ValueError):
            new_function([])

    @pytest.mark.parametrize(
        "values, expected",
        [
            ([1, 2], 3),
            ([0, 0], 0),
            ([-1, 1], 0),
        ],
    )
    def test_various_inputs(self, values, expected):
        """Test various input combinations."""
        assert new_function(values) == expected
```
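Numerical features usually return floats or arrays, where the exact `==` comparisons in the template above are too strict. A minimal sketch of testing such code, assuming a hypothetical `standardize` function: a seeded fixture keeps random inputs reproducible across runs, and `np.testing.assert_allclose` and `pytest.approx` compare with a tolerance.

```python
# tests/test_numeric_example.py  (hypothetical example)
import numpy as np
import pytest


def standardize(X):
    """Hypothetical function under test: z-score each column."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


@pytest.fixture
def data():
    """Reproducible random input: a fixed seed gives the same data every run."""
    rng = np.random.default_rng(seed=42)
    return rng.normal(loc=5.0, scale=2.0, size=(100, 3))


def test_standardize_moments(data):
    Z = standardize(data)
    # Compare floats with a tolerance, never with ==.
    np.testing.assert_allclose(Z.mean(axis=0), 0.0, atol=1e-12)
    np.testing.assert_allclose(Z.std(axis=0), 1.0, rtol=1e-12)


def test_standardize_shape(data):
    assert standardize(data).shape == data.shape


def test_standardize_known_values():
    # A tiny hand-checkable case: mean 2, population std 1.
    assert standardize([[1.0], [3.0]])[:, 0].tolist() == pytest.approx([-1.0, 1.0])
```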
Running Tests
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=submarit --cov-report=html

# Run specific test file
pytest tests/test_algorithms.py

# Run benchmarks
pytest benchmarks/ --benchmark-only
```
Documentation
Writing Documentation
Docstrings: Use NumPy style docstrings
User Guide: Update relevant .rst files in docs/source/
Examples: Add them to the docstring's Examples section or create a notebook
Building Documentation
```bash
cd docs
make clean
make html
# View at docs/build/html/index.html
```
Adding Examples
Create Jupyter notebooks in examples/:
```python
# examples/new_feature_demo.ipynb

# --- Markdown cell ---
# # New Feature Demo
# This notebook demonstrates the new feature.

# --- Code cell ---
import numpy as np
from submarit import new_feature  # placeholder name

# Step-by-step demonstration
# ...
```
Pull Request Process
1. Create a Feature Branch
```bash
git checkout -b feature/your-feature-name
```
2. Make Your Changes
Write code following style guidelines
Add tests for new functionality
Update documentation
Add entry to CHANGELOG.md
3. Commit Your Changes
Use clear, descriptive commit messages:
```bash
git add .
git commit -m "Add new clustering metric

- Implement Dunn index calculation
- Add tests for edge cases
- Update documentation with examples"
```
4. Run Quality Checks
```bash
# Format code
black submarit tests
isort submarit tests

# Run linters
flake8 submarit tests
mypy submarit

# Run tests
pytest

# Check documentation
cd docs && make doctest
```
5. Push and Create PR
```bash
git push origin feature/your-feature-name
```
Then create a pull request on GitHub with:
Clear description of changes
Link to related issue (if any)
Screenshots (for visualizations)
Performance comparisons (if relevant)
Development Guidelines
Adding New Algorithms
Inherit from BaseClusterer:
```python
from submarit.core.base import BaseClusterer


class MyAlgorithm(BaseClusterer):
    def __init__(self, n_clusters, **kwargs):
        super().__init__(n_clusters=n_clusters)
        # Initialize algorithm-specific parameters from kwargs here

    def fit(self, X):
        # Implement fitting logic
        self.labels_ = ...  # cluster assignment for each row of X
        return self

    def predict(self, X):
        # Optional: assign new samples to the fitted clusters.
        # Returning self.labels_ would be wrong here: those are the
        # training labels, not labels for X.
        raise NotImplementedError
```
Add tests in tests/test_algorithms.py
Add documentation in docs/source/api/algorithms.rst
Add an example in the docstring or a notebook
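To see the `fit`/`predict` contract end to end without the package, here is a self-contained sketch. Because the real `BaseClusterer` interface is not shown in this guide, the sketch defines a minimal hypothetical stand-in for it. The algorithm is plain k-means (Lloyd's iterations) with a deterministic farthest-first initialization, picked only as a simple example, not as one of SUBMARIT's algorithms.

```python
import numpy as np


class BaseClusterer:
    """Hypothetical stand-in for submarit.core.base.BaseClusterer."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters


class SimpleKMeans(BaseClusterer):
    """Toy k-means illustrating the fit/predict pattern."""

    def __init__(self, n_clusters, max_iter=100, **kwargs):
        super().__init__(n_clusters=n_clusters)
        self.max_iter = max_iter  # extra kwargs are accepted but ignored here

    def _assign(self, X):
        # Index of the nearest center (squared Euclidean) for each row of X.
        d = ((X[:, None, :] - self.cluster_centers_[None, :, :]) ** 2).sum(axis=-1)
        return d.argmin(axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Farthest-first initialization: deterministic and spreads centers out.
        centers = [X[0]]
        for _ in range(1, self.n_clusters):
            d = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=-1)
            centers.append(X[d.min(axis=1).argmax()])
        self.cluster_centers_ = np.array(centers)

        # Lloyd's iterations: reassign points, then move centers to the means.
        for _ in range(self.max_iter):
            labels = self._assign(X)
            new_centers = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k)
                else self.cluster_centers_[k]  # keep an empty cluster's center
                for k in range(self.n_clusters)
            ])
            if np.allclose(new_centers, self.cluster_centers_):
                break
            self.cluster_centers_ = new_centers
        self.labels_ = self._assign(X)
        return self

    def predict(self, X):
        # New samples go to the nearest fitted center.
        return self._assign(np.asarray(X, dtype=float))
```

Here `predict` genuinely handles new data, which is why the template's `predict` should not simply return `self.labels_`.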
Adding New Metrics
Create a function in the appropriate module:
```python
def new_metric(S, clusters):
    """Calculate new metric.

    Parameters
    ----------
    S : array-like
        Substitution matrix.
    clusters : array-like
        Cluster assignments.

    Returns
    -------
    float
        Metric value.
    """
    # Implementation
```
Add it to ClusterEvaluator if appropriate
Add tests with known results
Document the mathematical formula
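As an illustration of those steps (implementation, documented formula, known-result test), here is a sketch of a hypothetical metric: the share of off-diagonal substitution that falls within clusters. It assumes `S` is a square, non-negative matrix whose entry `S[i, j]` measures substitution between items `i` and `j`; the metric is invented for this example and is not part of SUBMARIT.

```python
import numpy as np
import pytest


def within_cluster_share(S, clusters):
    """Share of off-diagonal substitution that falls within clusters.

    The value is

        W = sum_{i != j, c_i == c_j} S[i, j] / sum_{i != j} S[i, j]

    so W = 1 when all substitution stays inside clusters and W = 0 when
    none does.

    Parameters
    ----------
    S : array-like
        Square, non-negative substitution matrix of shape (n_items, n_items).
    clusters : array-like
        Cluster assignments of shape (n_items,).

    Returns
    -------
    float
        Metric value in [0, 1].
    """
    S = np.asarray(S, dtype=float)
    clusters = np.asarray(clusters)
    off_diag = ~np.eye(S.shape[0], dtype=bool)
    same = clusters[:, None] == clusters[None, :]
    total = S[off_diag].sum()
    if total == 0:
        raise ValueError("S has no off-diagonal substitution.")
    return float(S[same & off_diag].sum() / total)


def test_within_cluster_share_known_value():
    # Hand-computed: within-cluster entries 3 + 3 + 2 + 2 = 10,
    # all off-diagonal entries sum to 14, so W = 10 / 14 = 5 / 7.
    S = np.array([
        [0, 3, 1, 0],
        [3, 0, 0, 1],
        [1, 0, 0, 2],
        [0, 1, 2, 0],
    ])
    assert within_cluster_share(S, [0, 0, 1, 1]) == pytest.approx(5 / 7)
```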
Performance Considerations
Use NumPy vectorization over loops
Consider memory usage for large datasets
Add benchmarks for performance-critical code
Profile before optimizing
Release Process
Version Numbering
We follow Semantic Versioning (MAJOR.MINOR.PATCH):
MAJOR: Incompatible API changes
MINOR: New functionality, backwards compatible
PATCH: Bug fixes
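These rules imply that bumping one component resets every component to its right. A small sketch; the helper name `bump_version` is hypothetical, and it handles plain MAJOR.MINOR.PATCH strings only, without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given change type."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # new, backward-compatible functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor' or 'patch', not {part!r}")


# Usage
print(bump_version("1.4.2", "minor"))  # 1.5.0
```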
Release Checklist
Update version in submarit/__init__.py
Update CHANGELOG.md
Run full test suite
Build and test documentation
Create release branch
Tag release
Build and upload to PyPI
Community
Code of Conduct
We follow the Contributor Covenant Code of Conduct. Be respectful and inclusive.
Getting Help
Questions: Use GitHub Discussions
Bugs: Open an issue with a reproducible example
Features: Discuss in an issue before implementing
Recognition
Contributors are recognized in:
AUTHORS.md file
Release notes
Documentation credits
Thank you for contributing to SUBMARIT!