Back to Skills

Ml Model Training

Train ML models with scikit-learn, PyTorch, TensorFlow. Use for classification/regression, neural networks, hyperparameter tuning, or encountering overfitting, underfitting, convergence issues.

ai
By secondsky
17928Updated 1 day agoTypeScriptMIT

Skill Content

# ML Model Training

Train machine learning models with proper data handling and evaluation.

## Training Workflow

1. Data Preparation → 2. Feature Engineering → 3. Model Selection → 4. Training → 5. Evaluation

## Data Preparation

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Load and clean data
df = pd.read_csv('data.csv')
df = df.dropna()

# Encode categorical variables
le = LabelEncoder()
df['category'] = le.fit_transform(df['category'])

# Split data (70/15/15)
X = df.drop('target', axis=1)
y = df['target']
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
```

## Scikit-learn Training

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_val)
print(classification_report(y_val, y_pred))
```

## PyTorch Training

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layers(x)

model = Model(X_train.shape[1])
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    output = model(X_train_tensor)
    loss = criterion(output, y_train_tensor)
    loss.backward()
    optimizer.step()
```

## Evaluation Metrics

| Task | Metrics |
|------|---------|
| Classification | Accuracy, Precision, Recall, F1, AUC-ROC |
| Regression | MSE, RMSE, MAE, R² |

## Complete Framework Examples

- **PyTorch**: See [references/pytorch-training.md](references/pytorch-training.md) for complete training with:
  - Custom model classes with BatchNorm and Dropout
  - Training/validation loops with early stopping
  - Learning rate scheduling
  - Model checkpointing
  - Full evaluation with classification report

- **TensorFlow/Keras**: See [references/tensorflow-keras.md](references/tensorflow-keras.md) for:
  - Sequential model architecture
  - Callbacks (EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, TensorBoard)
  - Training history visualization
  - TFLite conversion for mobile deployment
  - Custom training loops

## Best Practices

**Do:**
- Use cross-validation for robust evaluation
- Track experiments with MLflow
- Save model checkpoints regularly
- Monitor for overfitting
- Document hyperparameters
- Use 70/15/15 train/val/test split

**Don't:**
- Train without a validation set
- Ignore class imbalance
- Skip feature scaling
- Use test set for hyperparameter tuning
- Forget to set random seeds

## Known Issues Prevention

### 1. Data Leakage
**Problem**: Scaling or transforming data before splitting leads to test set information leaking into training.

**Solution**: Always split data first, then fit transformers only on training data:
```python
# ✅ Correct: Fit on train, transform train/val/test
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)  # Only transform
X_test = scaler.transform(X_test)  # Only transform

# ❌ Wrong: Fitting on all data
X_all = scaler.fit_transform(X)  # Leaks test info!
```

### 2. Class Imbalance Ignored
**Problem**: Training on imbalanced datasets (e.g., 95% class A, 5% class B) leads to models that predict only the majority class.

**Solution**: Use class weights or resampling:
```python
from sklearn.utils.class_weight import compute_class_weight

# Compute class weights
class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
model = RandomForestClassifier(class_weight='balanced')

# Or use SMOTE for oversampling minority class
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
```

### 3. Overfitting Due to No Regularization
**Problem**: Complex models memorize training data, perform poorly on validation/test sets.

**Solution**: Add regularization techniques:
```python
# Dropout in PyTorch
nn.Dropout(0.3)

# L2 regularization in scikit-learn
RandomForestClassifier(max_depth=10, min_samples_split=20)

# Early stopping in Keras
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=[early_stop])
```

### 4. Not Setting Random Seeds
**Problem**: Results are not reproducible across runs, making debugging and comparison impossible.

**Solution**: Set all random seeds:
```python
import random
import numpy as np
import torch

random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)
```

### 5. Using Test Set for Hyperparameter Tuning
**Problem**: Optimizing hyperparameters on test set leads to overfitting to test data.

**Solution**: Use validation set for tuning, test set only for final evaluation:
```python
from sklearn.model_selection import GridSearchCV

# ✅ Correct: Tune on train+val, evaluate on test
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, 15]}
grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)  # Cross-validation on training set
best_model = grid_search.best_estimator_

# Final evaluation on held-out test set
final_score = best_model.score(X_test, y_test)
```

## When to Load References

Load reference files when you need:
- **PyTorch implementation details**: Load `references/pytorch-training.md` for complete training loops with early stopping, learning rate scheduling, and checkpointing
- **TensorFlow/Keras patterns**: Load `references/tensorflow-keras.md` for callback usage, custom training loops, and mobile deployment with TFLite

How to use

  1. Copy the skill content above
  2. Create a .claude/skills directory in your project
  3. Save as .claude/skills/claude-skills-ml-model-training.md
  4. Use /claude-skills-ml-model-training in Claude Code to invoke this skill

Claude Code Skills Collection

170 production-ready skills for Claude Code CLI

Version 3.3.1 | Last Updated: 2026-05-14

<div align="center">

🔌 Platform Support

This repository uses Claude Plugin Patterns — natively supported by:

PlatformStatusNotes
Claude CodeNativeFull marketplace support
Factory DroidNativeFull marketplace support
</div> **For all other Platforms like opencode, codex and others, you can use https://github.com/enulus/OpenPackage **

A curated collection of battle-tested skills for building modern web applications with Cloudflare, AI integrations, React, Tailwind, and more.

PS: if skills.sh warns about any skill: Their scan process is a outdated LLM which flags newest versions pins (like in ZOD) as non existent and by that potentially malicous.


Quick Start

Marketplace Installation (Recommended)

# Add the marketplace
/plugin marketplace add https://github.com/secondsky/claude-skills

# Install individual skills as needed
/plugin install cloudflare-d1@claude-skills
/plugin install tailwind-v4-shadcn@claude-skills
/plugin install ai-sdk-core@claude-skills

See MARKETPLACE.md for complete catalog of all 170 skills.

Bulk Installation (Contributors)

# Clone the repository
git clone https://github.com/secondsky/claude-skills.git
cd claude-skills

# Install all 170 skills at once
./scripts/install-all.sh

# Or install individual skills
./scripts/install-skill.sh cloudflare-d1

Repository Structure

This repository contains 170 production-tested skills for Claude Code, each focused on a specific technology or capability.

Individual Skills: Each skill is a standalone unit with:

  • SKILL.md - Core knowledge and guidance
  • Templates - Working code examples
  • References - Extended documentation
  • Scripts - Helper utilities

Installation Options:

  1. Individual - Install only the skills you need via marketplace
  2. Bulk - Install all 170 skills using ./scripts/install-all.sh

Available Skills (170 Individual Skills)

Each skill is individually installable. Install only the skills you need.

Full Catalog: See MARKETPLACE.md for detailed listings.

Categories

CategorySkillsExamples
tooling29turborepo, plan-interview, code-review
frontend26nuxt-v4, nuxt-v5, tailwind-v4-shadcn, tanstack-query, nuxt-studio, maz-ui, threejs
cloudflare21cloudflare-d1, cloudflare-workers-ai, cloudflare-agents
ai20openai-agents, claude-api, ai-sdk-core
api16api-design-principles, graphql-implementation
web10hono-routing, firecrawl-scraper, web-performance
mobile7swift-best-practices, react-native-app, react-native-skills
database6drizzle-orm-d1, neon-vercel-postgres, supabase-postgres-best-practices
security6csrf-protection, access-control-rbac
auth4better-auth
testing4vitest-testing, playwright-testing
design4design-review, design-system-creation
woocommerce4woocommerce-backend-dev
cms4hugo, sveltia-cms, wordpress-plugin-core
architecture3microservices-patterns, architecture-patterns
data3sql-query-optimization, recommendation-engine
seo2seo-optimizer, seo-keyword-cluster-builder
documentation1technical-specification

How It Works

Auto-Discovery

Claude Code automatically checks ~/.claude/skills/ for relevant skills before planning tasks:

User: "Set up a Cloudflare Worker with D1 database"
           ↓
Claude: [Checks skills automatically]
           ↓
Claude: "Found cloudflare-d1 skills.
         These prevent 12 documented errors. Use them?"
           ↓
User: "Yes"
           ↓
Result: Production-ready setup, zero errors, ~65% token savings

Note: Due to token limits, not all skills may be visible at once. See ⚠️ Important: Token Limits below.

Skill Structure

Each skill includes:

skills/[skill-name]/
├── SKILL.md              # Complete documentation
├── .claude-plugin/
│   └── plugin.json       # Plugin metadata
├── templates/            # Ready-to-copy templates
├── scripts/              # Automation scripts
└── references/           # Extended documentation

Recent Additions

May 2026

Supply Chain Security (cross-cutting):

  • dependency-upgrade expanded with Socket CLI integration — proactive malicious package detection, typosquatting alerts, and CI/CD security gates. New 418-line reference guide, 2 GitHub Actions templates, and expanded supply chain security comparison (3 tools)
  • 31 skills now include "Secure Installation" guidance — contextually-tailored security sections across all high-risk skill categories (scaffolding, MCP/agent SDKs, multi-provider installs, Docker, CI/CD). Covers 8 Bun skills, 5 Nuxt skills, 6 Cloudflare skills, 4 AI/agent skills, and 8 frontend/tooling skills
  • Supply chain security is now a first-class cross-cutting concern woven into the skill collection — not a standalone topic

February - April 2026

Full-Stack Frameworks:

  • nuxt-v5 (v1.0.0) - Full Nuxt 5 support with 4 skills (core, data, server, production), 3 diagnostic agents, and interactive setup wizard
  • supabase-postgres-best-practices - 30 Postgres optimization rules from Supabase across 8 categories
  • threejs (v1.0.0) - 3D web graphics: scenes, geometries, shaders, animations, post-processing

Infrastructure:

  • JSON schema validation - Automated plugin.json validation with CI support
  • GitHub issue templates - Skill-specific issue templates for bug reports, feature requests, and submissions

Plugin Enhancements:

  • mutation-testing - Added Bun native runner support
  • dependency-upgrade - Added supply chain security content

December 2025 - January 2026

Frontend Expansion:

  • nuxt-studio (v1.0.0) - Visual CMS for Nuxt Content with live preview, OAuth auth, and R2 storage integration
  • maz-ui (v1.0.0) - 50+ Vue/Nuxt components with theming, i18n, form generation, and 14 composables

Developer Workflow:

  • plan-interview (v2.0.0) - Adaptive interview-driven spec generation with autonomous quality review
  • turborepo (v2.8.0) - Updated to official Vercel skill with enhanced monorepo build optimization

Mobile Development:

  • react-native-skills (v1.0.0) - React Native & Expo best practices with performance optimization patterns

Enhanced Authentication:

  • better-auth (v2.2.0) - Expanded to 18 framework integrations with 30+ authentication plugins

⚠️ Important: Token Limits

Skill Visibility Constraint

Claude Code has a 15,000 character limit for the total size of skill descriptions in the system prompt. This limit also applies to commands and agents.

What this means:

  • Not all 170 skills may be visible in Claude's context at once
  • Skills are loaded based on relevance and available token budget
  • You can verify how many skills Claude currently sees by asking: "How many skills do you see in your system prompt?"

Checking Visible Skills

To verify which skills are currently loaded:

# Ask Claude Code directly
"Check what skills/plugins you see in your system prompt"

Claude will report something like: "85 of 170 skills visible due to token limits"

Workaround: Increase Token Budget

You can double the headroom for skill descriptions by setting an environment variable:

# Increase limit to 30,000 characters
export SLASH_COMMAND_TOOL_CHAR_BUDGET=30000

# Then launch Claude Code
claude

This gives you approximately 2x more skill visibility in the system prompt.

Note: This is a temporary workaround. The Claude Code team is working on better solutions for skill discovery and loading.


Token Efficiency

MetricManual SetupWith SkillsSavings
Average Tokens12,000-15,0004,000-5,000~65%
Typical Errors2-4 per service0 (prevented)100%
Setup Time2-4 hours15-45 minutes~80%

Across all 170 skills: 400+ documented errors prevented.


Contributing

Prerequisites for Contributors

Install the official plugin development toolkit:

/plugin install plugin-dev@claude-code-marketplace

This provides:

  • /plugin-dev:create-plugin command (8-phase guided workflow)
  • 7 comprehensive skills (hooks, MCP, structure, agents, commands, skills)
  • 2 specialized agents (agent-creator, plugin-validator)

Quick Steps

  1. Create skill directory in plugins/
  2. Add SKILL.md with YAML frontmatter
  3. Run ./scripts/sync-plugins.sh
  4. Submit pull request

See CONTRIBUTING.md and PLUGIN_DEV_BEST_PRACTICES.md for detailed guidelines.


Documentation

DocumentPurpose
START_HERE.mdStart here! Quick navigation guide
PLUGIN_DEV_BEST_PRACTICES.mdRepository-specific best practices (marketplace, budget, quality)
MARKETPLACE.mdFull skill catalog and installation guide
MARKETPLACE_MANAGEMENT.mdTechnical infrastructure (plugin.json, scripts, validation)
CLAUDE.mdProject context and development standards
CONTRIBUTING.mdContribution guidelines

Links


Built with ❤️ by Claude Skills Maintainers

View source on GitHub