Recommendation System

Name: Recommendation System
Author: secondsky

Deploy production recommendation systems with feature stores, caching, and A/B testing. Use for personalization APIs and low-latency serving, or when encountering cache invalidation, experiment tracking, or quality monitoring issues.

Tags: testing, monitoring, api

179 · 28 · Updated 1 day ago · TypeScript · MIT

Skill Content

# Recommendation System

Production-ready architecture for scalable recommendation systems with feature stores, multi-tier caching, A/B testing, and comprehensive monitoring.

## When to Use This Skill

Load this skill when:
- **Building Recommendation APIs**: Serving personalized recommendations at scale
- **Implementing Caching**: Multi-tier caching for low-latency serving
- **Running A/B Tests**: Experimenting with recommendation algorithms
- **Monitoring Quality**: Tracking CTR, conversion, diversity, coverage
- **Optimizing Performance**: Reducing latency, increasing throughput
- **Feature Engineering**: Managing user/item features with feature stores

## Quick Start: Recommendation API in 5 Steps

```bash
# 1. Install dependencies
pip install fastapi==0.109.0 uvicorn redis==5.0.0 prometheus-client==0.19.0

# 2. Start Redis (for caching and feature store)
docker run -d -p 6379:6379 redis:alpine

# 3. Create recommendation service: app.py
cat > app.py << 'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List
import redis
import json

app = FastAPI()
cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

class RecommendationResponse(BaseModel):
    user_id: str
    items: List[str]
    cached: bool

@app.post("/recommendations", response_model=RecommendationResponse)
async def get_recommendations(user_id: str, n: int = 10):
    # Check cache
    cache_key = f"recs:{user_id}:{n}"
    cached = cache.get(cache_key)

    if cached:
        return RecommendationResponse(
            user_id=user_id,
            items=json.loads(cached),
            cached=True
        )

    # Generate recommendations (simplified)
    items = [f"item_{i}" for i in range(n)]

    # Cache for 5 minutes
    cache.setex(cache_key, 300, json.dumps(items))

    return RecommendationResponse(
        user_id=user_id,
        items=items,
        cached=False
    )

@app.get("/health")
async def health():
    return {"status": "healthy"}
EOF

# 4. Run API
uvicorn app:app --host 0.0.0.0 --port 8000

# 5. Test
curl -X POST "http://localhost:8000/recommendations?user_id=user_123&n=10"
```

**Result**: Working recommendation API with caching in under 5 minutes.

## System Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ User Events │────▶│ Feature     │────▶│ Model       │
│ (clicks,    │     │ Store       │     │ Serving     │
│  purchases) │     │ (Redis)     │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
                           │                    │
                           ▼                    ▼
                    ┌─────────────┐     ┌─────────────┐
                    │ Training    │     │ API         │
                    │ Pipeline    │     │ (FastAPI)   │
                    └─────────────┘     └─────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │ Monitoring  │
                                        │ (Prometheus)│
                                        └─────────────┘
```

## Core Components

### 1. Feature Store

Centralized storage for user and item features:

```python
import redis
import json

class FeatureStore:
    """Fast feature access with Redis caching."""

    def __init__(self, redis_client):
        self.redis = redis_client
        self.ttl = 3600  # 1 hour

    def get_user_features(self, user_id: str) -> dict:
        cache_key = f"user_features:{user_id}"
        cached = self.redis.get(cache_key)

        if cached:
            return json.loads(cached)

        # Fetch from the database on a cache miss.
        # fetch_from_db is a placeholder for your own data-access function.
        features = fetch_from_db(user_id)

        # Cache
        self.redis.setex(cache_key, self.ttl, json.dumps(features))
        return features
```

### 2. Model Serving

Serve multiple models for A/B testing:

```python
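from typing import Optional


class ModelServing:
    """Serve multiple recommendation models."""

    def __init__(self):
        self.models = {}
        self.default_model = None  # set via register_model(..., is_default=True)

    def register_model(self, name: str, model, is_default: bool = False):
        self.models[name] = model
        if is_default:
            self.default_model = name

    def predict(self, user_features: dict, item_features: list, model_name: Optional[str] = None):
        # Fall back to the default model when no name is given
        name = model_name or self.default_model
        if name not in self.models:
            raise KeyError(f"Unknown model: {name!r}")
        return self.models[name].predict(user_features, item_features)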
```

### 3. Caching Layer

Multi-tier caching for low latency:

```python
import json


class TieredCache:
    """L1 (memory) -> L2 (Redis) -> L3 (database)."""

    def __init__(self, redis_client):
        self.l1_cache = {}  # In-memory
        self.redis = redis_client  # L2

    def get(self, key: str):
        # L1: In-memory (fastest)
        if key in self.l1_cache:
            return self.l1_cache[key]

        # L2: Redis
        cached = self.redis.get(key)
        if cached:
            value = json.loads(cached)
            self.l1_cache[key] = value  # Promote to L1
            return value

        # L3: Miss (fetch from database)
        return None
```
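
The class above only covers reads, while the `RecommendationService` pattern later in this skill calls `cache.set(key, value, ttl=...)`. A minimal sketch of a matching write path follows. The class name `TieredCacheWithSet`, the `l1_ttl` parameter, and the `invalidate` method are assumptions made for illustration, not part of the original skill:

```python
import json
import time


class TieredCacheWithSet:
    """Hypothetical variant of the tiered cache with a write path and L1 expiry."""

    def __init__(self, redis_client, l1_ttl: float = 30.0):
        self.redis = redis_client   # L2
        self.l1_ttl = l1_ttl        # assumed: short L1 lifetime to bound staleness
        self.l1_cache = {}          # key -> (expires_at, value)

    def get(self, key: str):
        # L1: in-memory, honoring the per-entry expiry
        entry = self.l1_cache.get(key)
        if entry is not None:
            expires_at, value = entry
            if time.monotonic() < expires_at:
                return value
            del self.l1_cache[key]

        # L2: Redis
        cached = self.redis.get(key)
        if cached is not None:
            value = json.loads(cached)
            self._store_l1(key, value)  # promote to L1
            return value

        return None  # miss: the caller falls back to the database

    def set(self, key: str, value, ttl: int = 300):
        # Write through to Redis, then refresh the local copy
        self.redis.setex(key, ttl, json.dumps(value))
        self._store_l1(key, value)

    def invalidate(self, key: str):
        self.l1_cache.pop(key, None)
        self.redis.delete(key)

    def _store_l1(self, key: str, value):
        self.l1_cache[key] = (time.monotonic() + self.l1_ttl, value)
```

Note that `invalidate` only clears the L1 copy in the current process. With several API workers, other processes keep their L1 entry until `l1_ttl` elapses, which is why the sketch keeps that lifetime short.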

## Key Metrics

| Metric | Description | Target |
|--------|-------------|--------|
| **CTR** | Click-through rate | >5% |
| **Conversion Rate** | Purchases from recs | >2% |
| **P95 Latency** | 95th percentile response time | <200ms |
| **Cache Hit Rate** | % served from cache | >80% |
| **Coverage** | % of catalog recommended | >50% |
| **Diversity** | Variety in recommendations | >0.7 |
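
The table does not define how coverage or diversity are computed. The sketch below shows one possible set of definitions: coverage as the share of the catalog that appears in at least one recommendation list, and diversity as the share of item pairs within a list that fall in different categories. The function names and the pairwise-category definition are assumptions; the skill's own monitoring reference may use different formulas.

```python
from itertools import combinations


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0


def catalog_coverage(recommendation_lists, catalog_size: int) -> float:
    """Share of the catalog that appears in at least one recommendation list."""
    recommended = {item for recs in recommendation_lists for item in recs}
    return len(recommended) / catalog_size if catalog_size else 0.0


def category_diversity(categories) -> float:
    """Share of item pairs in one list that belong to different categories."""
    pairs = list(combinations(categories, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


lists = [["a", "b", "c"], ["b", "c", "d"]]
print(ctr(clicks=6, impressions=100))           # 0.06
print(catalog_coverage(lists, catalog_size=8))  # 0.5
print(category_diversity(["shoes", "shoes", "hats", "bags"]))
```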

## Known Issues Prevention

### 1. Cold Start for New Users
**Problem**: No recommendations for users without history, poor initial experience.

**Solution**: Use popularity-based fallback:
```python
def get_recommendations(user_id: str, n: int = 10):
    user_features = feature_store.get_user_features(user_id)

    # Check if new user (no purchase history)
    if user_features.get('total_purchases', 0) == 0:
        # Fallback to popular items
        return get_popular_items(n)

    # Personalized recommendations
    return generate_personalized_recs(user_id, n)
```
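
The fallback above relies on `get_popular_items`, which the snippet leaves undefined. One way it could be backed is sketched below with an in-process counter. The `PopularityTracker` class and its methods are hypothetical; in a multi-worker deployment the counts would more likely live in a shared store.

```python
from collections import Counter
from itertools import islice


class PopularityTracker:
    """Hypothetical in-memory popularity counter for cold-start fallbacks."""

    def __init__(self):
        self.counts = Counter()

    def record(self, item_id: str, weight: int = 1):
        # Call on each interaction, e.g. with a higher weight for purchases
        self.counts[item_id] += weight

    def top(self, n: int = 10, exclude=()):
        # Most popular items first, skipping anything in `exclude`
        excluded = set(exclude)
        ranked = (item for item, _ in self.counts.most_common() if item not in excluded)
        return list(islice(ranked, n))


tracker = PopularityTracker()
for item in ["item_a", "item_a", "item_a", "item_b", "item_b", "item_c"]:
    tracker.record(item)

print(tracker.top(2))  # ['item_a', 'item_b']
```

With this in place, `get_popular_items(n)` could simply return `tracker.top(n)`.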

### 2. Cache Invalidation on User Actions
**Problem**: User makes purchase, cache still shows purchased item in recommendations.

**Solution**: Invalidate cache on relevant actions:
```python
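import logging

logger = logging.getLogger(__name__)
INVALIDATING_ACTIONS = {'purchase', 'rating', 'add_to_cart'}

def on_user_action(user_id: str, action: str):
    if action in INVALIDATING_ACTIONS:
        # DEL does not expand glob patterns, so find the matching keys with SCAN first
        keys = list(redis_client.scan_iter(match=f"recs:{user_id}:*"))
        if keys:
            redis_client.delete(*keys)
        logger.info(f"Invalidated cache for {user_id} due to {action}")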
```

### 3. Thundering Herd on Cache Expiry
**Problem**: Many users' caches expire simultaneously, overload database/model.

**Solution**: Add random jitter to TTL:
```python
import json
import random

def set_cache(key: str, value: dict, base_ttl: int = 300):
    # Add ±10% jitter
    jitter = random.uniform(-0.1, 0.1) * base_ttl
    ttl = int(base_ttl + jitter)
    redis_client.setex(key, ttl, json.dumps(value))
```

### 4. Poor Diversity = Filter Bubble
**Problem**: Recommendations too similar, users only see same category.

**Solution**: Implement diversity constraint:
```python
def rank_with_diversity(items: list, scores: list, n: int = 10):
    selected = []
    category_counts = {}

    for item, score in sorted(zip(items, scores), key=lambda x: -x[1]):
        category = item['category']

        # Limit 3 items per category
        if category_counts.get(category, 0) >= 3:
            continue

        selected.append(item)
        category_counts[category] = category_counts.get(category, 0) + 1

        if len(selected) >= n:
            break

    return selected
```

### 5. No Monitoring = Silent Degradation
**Problem**: Recommendation quality drops, nobody notices until users complain.

**Solution**: Continuous monitoring with alerts:
```python
import time

from prometheus_client import Counter, Histogram

# prometheus_client requires a help string as the second argument
recommendation_clicks = Counter(
    'recommendation_clicks_total', 'Clicks on recommended items'
)
recommendation_latency = Histogram(
    'recommendation_latency_seconds', 'Time spent generating recommendations'
)

@app.post("/recommendations")
async def get_recommendations(user_id: str):
    start = time.time()

    recs = generate_recs(user_id)

    latency = time.time() - start
    recommendation_latency.observe(latency)

    return recs

@app.post("/track/click")
async def track_click(user_id: str, item_id: str):
    recommendation_clicks.inc()
    # Alert if CTR drops below 3%
    return {"status": "ok"}
```
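
The "Alert if CTR drops below 3%" comment is left unimplemented in the snippet above. The sketch below shows one way such a check could look in application code, assuming each impression is logged together with its eventual click outcome. The `RollingCTR` class, its window size, and the `min_samples` guard are hypothetical; an alert rule evaluated by the monitoring system would be an equally valid place for this logic.

```python
from collections import deque


class RollingCTR:
    """Tracks CTR over the most recent `window` impressions (hypothetical helper)."""

    def __init__(self, window: int = 1000, threshold: float = 0.03, min_samples: int = 100):
        self.events = deque(maxlen=window)  # 1 = clicked, 0 = not clicked
        self.threshold = threshold
        self.min_samples = min_samples

    def record_impression(self, clicked: bool):
        self.events.append(1 if clicked else 0)

    def ctr(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def should_alert(self) -> bool:
        # Require a minimum number of impressions so a quiet period does not trigger alerts
        return len(self.events) >= self.min_samples and self.ctr() < self.threshold


monitor = RollingCTR(window=200, threshold=0.03, min_samples=100)
for i in range(200):
    monitor.record_impression(clicked=(i % 50 == 0))  # 4 clicks in 200 impressions

print(monitor.ctr())           # 0.02
print(monitor.should_alert())  # True
```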

### 6. Stale Features = Outdated Recommendations
**Problem**: User preferences change but features don't update, recommendations irrelevant.

**Solution**: Set appropriate TTLs and update triggers:
```python
class FeatureStore:
    def __init__(self, redis_client):
        self.redis = redis_client
        # Shorter TTL for frequently changing features
        self.user_ttl = 300  # 5 minutes
        self.item_ttl = 3600  # 1 hour

    def update_on_event(self, user_id: str, event: str):
        # Invalidate on important events
        if event in ['purchase', 'rating']:
            self.redis.delete(f"user_features:{user_id}")
            # The next read repopulates the cache from the database
            logger.info(f"Invalidated cached features for {user_id}")
```

### 7. A/B Test Sample Size Too Small
**Problem**: Declare winner too early, results not statistically significant.

**Solution**: Calculate required sample size first:
```python
def calculate_sample_size(
    baseline_rate: float,
    min_detectable_effect: float,
    alpha: float = 0.05,
    power: float = 0.8
) -> int:
    """Calculate required sample size per variant."""
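    import math
    from scipy import stats

    z_alpha = stats.norm.ppf(1 - alpha/2)  # two-sided significance threshold
    z_beta = stats.norm.ppf(power)

    p1 = baseline_rate
    # min_detectable_effect is relative: 0.10 means a 10% lift over baseline
    p2 = baseline_rate * (1 + min_detectable_effect)
    p_avg = (p1 + p2) / 2

    n = (
        (z_alpha + z_beta)**2 * 2 * p_avg * (1 - p_avg) /
        (p2 - p1)**2
    )

    return math.ceil(n)  # round up so the test is not underpowered

# Example: detect a 10% relative lift with baseline CTR=5%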
n_required = calculate_sample_size(
    baseline_rate=0.05,
    min_detectable_effect=0.10
)
print(f"Required sample size: {n_required} per variant")
# Wait until both variants reach this size before concluding
```
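
Once both variants have reached the required size, the results still need a significance check. The sketch below uses a pooled two-proportion z-test built on the standard library only. The function name and the example counts are illustrative assumptions, and the skill's A/B testing reference may use a different test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two click rates."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b

    # Pooled rate under the null hypothesis that both variants perform the same
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so nothing to detect

    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts: control 500/10000 clicks, treatment 600/10000 clicks
z, p_value = two_proportion_z_test(500, 10_000, 600, 10_000)
print(f"z={z:.2f}, p={p_value:.4f}")
print("significant at alpha=0.05" if p_value < 0.05 else "not significant")
```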

## When to Load References

Load reference files for detailed production implementations:

- **Production Architecture**: Load `references/production-architecture.md` for complete FeatureStore, ModelServing, and RecommendationService implementations with batch fetching, caching integration, and FastAPI deployment patterns.

- **Caching Strategies**: Load `references/caching-strategies.md` when implementing multi-tier caching (L1/L2/L3), cache warming, invalidation strategies, probabilistic refresh, or thundering herd prevention.

- **A/B Testing Framework**: Load `references/ab-testing-framework.md` for deterministic variant assignment, Thompson sampling (multi-armed bandits), Bayesian and frequentist significance testing, and experiment tracking.

- **Monitoring & Alerting**: Load `references/monitoring-alerting.md` for Prometheus metrics integration, dashboard endpoints, alert rules, and quality monitoring (diversity, coverage).

## Best Practices

1. **Feature Precomputation**: Compute features offline, serve from cache
2. **Batch Fetching**: Use Redis MGET for multiple users/items
3. **Cache Aggressively**: 5-15 minute TTL for user recommendations
4. **Fail Gracefully**: Return popular items if personalization fails
5. **Monitor Everything**: Track CTR, latency, diversity, coverage
6. **A/B Test Continuously**: Always be experimenting with new algorithms
7. **Diversity Constraint**: Ensure varied recommendations
8. **Explain Recommendations**: Provide reasons ("Highly rated", "Popular")
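
As an illustration of practice 2, the sketch below fetches features for several users in a single Redis round trip with `MGET`. The function name and the `user_features:{user_id}` key layout (borrowed from the feature store example) are assumptions; the caller is expected to load the returned misses from the database.

```python
import json


def get_user_features_batch(redis_client, user_ids):
    """Return (features_by_user, missing_user_ids) using one MGET call."""
    if not user_ids:
        return {}, []  # MGET needs at least one key

    keys = [f"user_features:{user_id}" for user_id in user_ids]
    raw_values = redis_client.mget(keys)  # one value per key, None where a key is absent

    features, missing = {}, []
    for user_id, raw in zip(user_ids, raw_values):
        if raw is None:
            missing.append(user_id)
        else:
            features[user_id] = json.loads(raw)
    return features, missing
```

Compared with calling `get` once per user, this replaces N network round trips with one, which matters most when scoring many candidates per request.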

## Common Patterns

### Recommendation Service
```python
class RecommendationService:
    def __init__(self, feature_store, model_serving, cache):
        self.feature_store = feature_store
        self.model_serving = model_serving
        self.cache = cache

    def get_recommendations(self, user_id: str, n: int = 10):
        # 1. Check cache
        cached = self.cache.get(f"recs:{user_id}:{n}")
        if cached:
            return cached

        # 2. Get features
        user_features = self.feature_store.get_user_features(user_id)
        candidates = self.get_candidates(user_id)

        # 3. Score candidates
        scores = self.model_serving.predict(user_features, candidates)

        # 4. Rank with diversity
        recommendations = self.rank_with_diversity(candidates, scores, n)

        # 5. Cache
        self.cache.set(f"recs:{user_id}:{n}", recommendations, ttl=300)

        return recommendations
```

### A/B Testing
```python
def assign_variant(user_id: str, experiment_id: str) -> str:
    """Deterministic assignment - same user always gets same variant."""
    import hashlib

    hash_input = f"{user_id}:{experiment_id}"
    hash_value = int(hashlib.md5(hash_input.encode()).hexdigest(), 16)

    # 50/50 split
    return 'control' if hash_value % 2 == 0 else 'treatment'

# Usage
user_id = 'user_123'
variant = assign_variant(user_id, 'rec_algo_v2')
model_name = 'main' if variant == 'control' else 'experimental'
recs = get_recommendations(user_id, model_name=model_name)
```
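
The 50/50 split above can be generalized to uneven rollouts, for example exposing only 10% of users to a new model. The sketch below keeps the same deterministic hashing idea but maps the hash to a point on a weighted range. The function name and the `weights` parameter are hypothetical, and SHA-256 is swapped in for MD5 here purely as a choice for the sketch.

```python
import hashlib


def assign_weighted_variant(user_id: str, experiment_id: str, weights: dict) -> str:
    """Deterministically map a user to a variant in proportion to `weights`."""
    total = sum(weights.values()) if weights else 0
    if total <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty, non-negative, and sum to a positive value")

    digest = hashlib.sha256(f"{user_id}:{experiment_id}".encode()).hexdigest()
    # First 32 bits of the hash mapped to [0, 1), then scaled to the total weight
    point = int(digest[:8], 16) / 2**32 * total

    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return variant
    return list(weights)[-1]  # only reached through floating-point rounding


weights = {"control": 0.9, "treatment": 0.1}
print(assign_weighted_variant("user_123", "rec_algo_v2", weights))
```

Because the experiment ID is part of the hash input, the same user can land in different variants across experiments, which avoids correlating assignments between tests.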

### Monitoring
```python
from prometheus_client import Counter, Histogram

# Arguments are (name, help text, label names); the help text is required
requests_total = Counter(
    'recommendation_requests_total',
    'Recommendation requests by outcome',
    ['status']
)
latency_seconds = Histogram(
    'recommendation_latency_seconds',
    'Recommendation request latency in seconds'
)

@app.post("/recommendations")
async def get_recommendations(user_id: str):
    with latency_seconds.time():
        try:
            recs = generate_recs(user_id)
            requests_total.labels(status='success').inc()
            return recs
        except Exception:
            requests_total.labels(status='error').inc()
            raise
```

How to use

1. Copy the skill content above
2. Create a .claude/skills directory in your project
3. Save as .claude/skills/claude-skills-recommendation-system.md
4. Use /claude-skills-recommendation-system in Claude Code to invoke this skill

README

Claude Code Skills Collection

170 production-ready skills for Claude Code CLI

Version 3.3.1 | Last Updated: 2026-05-14

🔌 Platform Support

This repository uses Claude Plugin Patterns — natively supported by:

| Platform | Status | Notes |
|----------|--------|-------|
| Claude Code | ✅ Native | Full marketplace support |
| Factory Droid | ✅ Native | Full marketplace support |

**For all other platforms, such as opencode and codex, you can use https://github.com/enulus/OpenPackage**

A curated collection of battle-tested skills for building modern web applications with Cloudflare, AI integrations, React, Tailwind, and more.

PS: If skills.sh warns about any skill, note that its scan process uses an outdated LLM that flags the newest version pins (such as Zod's) as nonexistent and therefore potentially malicious.

Quick Start

Marketplace Installation (Recommended)

# Add the marketplace
/plugin marketplace add https://github.com/secondsky/claude-skills

# Install individual skills as needed
/plugin install cloudflare-d1@claude-skills
/plugin install tailwind-v4-shadcn@claude-skills
/plugin install ai-sdk-core@claude-skills

See MARKETPLACE.md for complete catalog of all 170 skills.

Bulk Installation (Contributors)

# Clone the repository
git clone https://github.com/secondsky/claude-skills.git
cd claude-skills

# Install all 170 skills at once
./scripts/install-all.sh

# Or install individual skills
./scripts/install-skill.sh cloudflare-d1

Repository Structure

This repository contains 170 production-tested skills for Claude Code, each focused on a specific technology or capability.

Individual Skills: Each skill is a standalone unit with:

SKILL.md - Core knowledge and guidance
Templates - Working code examples
References - Extended documentation
Scripts - Helper utilities

Installation Options:

Individual - Install only the skills you need via marketplace
Bulk - Install all 170 skills using ./scripts/install-all.sh

Available Skills (170 Individual Skills)

Each skill is individually installable. Install only the skills you need.

Full Catalog: See MARKETPLACE.md for detailed listings.

How It Works

Auto-Discovery

Claude Code automatically checks ~/.claude/skills/ for relevant skills before planning tasks:

User: "Set up a Cloudflare Worker with D1 database"
           ↓
Claude: [Checks skills automatically]
           ↓
Claude: "Found cloudflare-d1 skills.
         These prevent 12 documented errors. Use them?"
           ↓
User: "Yes"
           ↓
Result: Production-ready setup, zero errors, ~65% token savings

Note: Due to token limits, not all skills may be visible at once. See ⚠️ Important: Token Limits below.

Skill Structure

Each skill includes:

skills/[skill-name]/
├── SKILL.md              # Complete documentation
├── .claude-plugin/
│   └── plugin.json       # Plugin metadata
├── templates/            # Ready-to-copy templates
├── scripts/              # Automation scripts
└── references/           # Extended documentation

Recent Additions

May 2026

Supply Chain Security (cross-cutting):

dependency-upgrade expanded with Socket CLI integration — proactive malicious package detection, typosquatting alerts, and CI/CD security gates. New 418-line reference guide, 2 GitHub Actions templates, and expanded supply chain security comparison (3 tools)
31 skills now include "Secure Installation" guidance — contextually-tailored security sections across all high-risk skill categories (scaffolding, MCP/agent SDKs, multi-provider installs, Docker, CI/CD). Covers 8 Bun skills, 5 Nuxt skills, 6 Cloudflare skills, 4 AI/agent skills, and 8 frontend/tooling skills
Supply chain security is now a first-class cross-cutting concern woven into the skill collection — not a standalone topic

February - April 2026

Full-Stack Frameworks:

nuxt-v5 (v1.0.0) - Full Nuxt 5 support with 4 skills (core, data, server, production), 3 diagnostic agents, and interactive setup wizard
supabase-postgres-best-practices - 30 Postgres optimization rules from Supabase across 8 categories
threejs (v1.0.0) - 3D web graphics: scenes, geometries, shaders, animations, post-processing

Infrastructure:

JSON schema validation - Automated plugin.json validation with CI support
GitHub issue templates - Skill-specific issue templates for bug reports, feature requests, and submissions

Plugin Enhancements:

mutation-testing - Added Bun native runner support
dependency-upgrade - Added supply chain security content

December 2025 - January 2026

Frontend Expansion:

nuxt-studio (v1.0.0) - Visual CMS for Nuxt Content with live preview, OAuth auth, and R2 storage integration
maz-ui (v1.0.0) - 50+ Vue/Nuxt components with theming, i18n, form generation, and 14 composables

Developer Workflow:

plan-interview (v2.0.0) - Adaptive interview-driven spec generation with autonomous quality review
turborepo (v2.8.0) - Updated to official Vercel skill with enhanced monorepo build optimization

Mobile Development:

react-native-skills (v1.0.0) - React Native & Expo best practices with performance optimization patterns

Enhanced Authentication:

better-auth (v2.2.0) - Expanded to 18 framework integrations with 30+ authentication plugins

⚠️ Important: Token Limits

Skill Visibility Constraint

Claude Code has a 15,000 character limit for the total size of skill descriptions in the system prompt. This limit also applies to commands and agents.

What this means:

Not all 170 skills may be visible in Claude's context at once
Skills are loaded based on relevance and available token budget
You can verify how many skills Claude currently sees by asking: "How many skills do you see in your system prompt?"

Checking Visible Skills

To verify which skills are currently loaded:

# Ask Claude Code directly
"Check what skills/plugins you see in your system prompt"

Claude will report something like: "85 of 170 skills visible due to token limits"

Workaround: Increase Token Budget

You can double the headroom for skill descriptions by setting an environment variable:

# Increase limit to 30,000 characters
export SLASH_COMMAND_TOOL_CHAR_BUDGET=30000

# Then launch Claude Code
claude

This gives you approximately 2x more skill visibility in the system prompt.

Note: This is a temporary workaround. The Claude Code team is working on better solutions for skill discovery and loading.

Token Efficiency

| Metric | Manual Setup | With Skills | Savings |
|--------|--------------|-------------|---------|
| Average Tokens | 12,000-15,000 | 4,000-5,000 | ~65% |
| Typical Errors | 2-4 per service | 0 (prevented) | 100% |
| Setup Time | 2-4 hours | 15-45 minutes | ~80% |

Across all 170 skills: 400+ documented errors prevented.

Contributing

Prerequisites for Contributors

Install the official plugin development toolkit:

/plugin install plugin-dev@claude-code-marketplace

This provides:

/plugin-dev:create-plugin command (8-phase guided workflow)
7 comprehensive skills (hooks, MCP, structure, agents, commands, skills)
2 specialized agents (agent-creator, plugin-validator)

Quick Steps

Create skill directory in plugins/
Add SKILL.md with YAML frontmatter
Run ./scripts/sync-plugins.sh
Submit pull request

See CONTRIBUTING.md and PLUGIN_DEV_BEST_PRACTICES.md for detailed guidelines.

Documentation

| Document | Purpose |
|----------|---------|
| START_HERE.md | Start here! Quick navigation guide |
| PLUGIN_DEV_BEST_PRACTICES.md | Repository-specific best practices (marketplace, budget, quality) |
| MARKETPLACE.md | Full skill catalog and installation guide |
| MARKETPLACE_MANAGEMENT.md | Technical infrastructure (plugin.json, scripts, validation) |
| CLAUDE.md | Project context and development standards |
| CONTRIBUTING.md | Contribution guidelines |

| Category | Skills | Examples |
|----------|--------|----------|
| tooling | 29 | turborepo, plan-interview, code-review |
| frontend | 26 | nuxt-v4, nuxt-v5, tailwind-v4-shadcn, tanstack-query, nuxt-studio, maz-ui, threejs |
| cloudflare | 21 | cloudflare-d1, cloudflare-workers-ai, cloudflare-agents |
| ai | 20 | openai-agents, claude-api, ai-sdk-core |
| api | 16 | api-design-principles, graphql-implementation |
| web | 10 | hono-routing, firecrawl-scraper, web-performance |
| mobile | 7 | swift-best-practices, react-native-app, react-native-skills |
| database | 6 | drizzle-orm-d1, neon-vercel-postgres, supabase-postgres-best-practices |
| security | 6 | csrf-protection, access-control-rbac |
| auth | 4 | better-auth |
| testing | 4 | vitest-testing, playwright-testing |
| design | 4 | design-review, design-system-creation |
| woocommerce | 4 | woocommerce-backend-dev |
| cms | 4 | hugo, sveltia-cms, wordpress-plugin-core |
| architecture | 3 | microservices-patterns, architecture-patterns |
| data | 3 | sql-query-optimization, recommendation-engine |
| seo | 2 | seo-optimizer, seo-keyword-cluster-builder |
| documentation | 1 | technical-specification |