Template Discovery & HuggingFace Integration
Syaala Platform provides seamless integration with HuggingFace Hub, giving you access to 500,000+ AI models with intelligent recommendations and auto-configuration.
Phase 18 Feature: Template-based Inference-as-a-Service (IaaS) architecture with personalized recommendations based on your use case and budget.
Overview
The Template Discovery system provides:
- Personalized Recommendations: AI-powered suggestions based on your use case and budget
- HuggingFace Integration: Search and deploy from 500K+ models on HuggingFace Hub
- Auto-Configuration: Automatic runtime, GPU, and Docker setup
- Cost Estimation: Real-time pricing for GPU types and deployment configurations
- Community Sharing: Publish and discover templates from the Syaala community
Personalized Recommendations
Get AI-powered template recommendations tailored to your specific needs.
Via Dashboard
1. Navigate to Dashboard → Templates
2. Complete the quick survey (optional):
   - Use Case: Text generation, image processing, etc.
   - Monthly Budget: Low ($100), Medium ($500), High ($5000+)
3. View personalized recommendations with:
   - Auto-configured GPU type
   - Estimated monthly cost
   - One-click deployment
Via CLI
# Get recommendations based on use case
syaala models recommend --use-case text-generation --budget medium
# Interactive mode with prompts
syaala models recommend --interactive
Example Output:
✓ Fetching personalized recommendations...
Recommended Templates for text-generation (Medium Budget: $500/month)
┌────────────────────────────────────────────────────────────────────┐
│ 1. Llama 3.3 70B Instruct │
│ Runtime: vllm │
│ GPU: NVIDIA-A40 (1x) │
│ Cost: ~$456/month (95% of budget) │
│ Tags: llm, instruction-following, chat │
│ │
│ ► syaala deployments create --template tpl_llama33_70b │
└────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────┐
│ 2. Mistral 7B Instruct v0.3 │
│ Runtime: vllm │
│ GPU: NVIDIA-RTX-4090 (1x) │
│ Cost: ~$234/month (47% of budget) │
│ Tags: llm, efficient, general-purpose │
│ │
│ ► syaala deployments create --template tpl_mistral_7b │
└────────────────────────────────────────────────────────────────────┘
HuggingFace Model Discovery
Search and explore models from HuggingFace Hub directly through Syaala.
Search Models
Via CLI
# Search by keyword
syaala models discover "llama"
# Filter by task
syaala models discover "image classification" --task image-classification
# Sort by downloads or likes
syaala models discover "stable diffusion" --sort downloads
Example Output:
✓ Found 234 models matching "llama"
┌────────────────────────────────────────────────────────────────────┐
│ meta-llama/Llama-3.3-70B-Instruct │
│ Downloads: 2.4M Likes: 15.2K Task: text-generation │
│ │
│ Suggested GPU: NVIDIA-A40 │
│ Estimated Cost: ~$456/month │
│ │
│ ► syaala templates create-from-hf \ │
│ meta-llama/Llama-3.3-70B-Instruct \ │
│ --name "Llama 3.3 70B" --category llm │
└────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────┐
│ meta-llama/Llama-3.1-8B-Instruct │
│ Downloads: 5.7M Likes: 23.1K Task: text-generation │
│ │
│ Suggested GPU: NVIDIA-RTX-4090 │
│ Estimated Cost: ~$234/month │
│ │
│ ► syaala templates create-from-hf \ │
│ meta-llama/Llama-3.1-8B-Instruct \ │
│ --name "Llama 3.1 8B" --category llm │
└────────────────────────────────────────────────────────────────────┘
Via API
curl "https://platform.syaala.com/api/huggingface/search?query=llama&task=text-generation" \
  -H "Authorization: Bearer sk_live_..."
Response:
{
"models": [
{
"id": "meta-llama/Llama-3.3-70B-Instruct",
"downloads": 2400000,
"likes": 15200,
"task": "text-generation",
"license": "llama3",
"suggestedGpu": "NVIDIA-A40",
"estimatedCost": 456,
"autoConfigured": {
"runtime": "vllm",
"gpuType": "NVIDIA-A40",
"gpuCount": 1,
"dockerImage": "runpod/pytorch:2.1.0-py3.10-cuda12.1.0-devel"
}
}
],
"total": 234
}
Creating Templates from HuggingFace
Convert any HuggingFace model into a deployable Syaala template.
Quick Start
Search for Model
syaala models discover "mistral instruct"
Create Template
syaala templates create-from-hf \
mistralai/Mistral-7B-Instruct-v0.3 \
--name "Mistral 7B Instruct" \
--category llm \
--tags "instruct,chat,efficient" \
--visibility public
Deploy Template
syaala deployments create --template tpl_mistral_7b_xyz
Command Reference
syaala templates create-from-hf <model-id> [options]
Options:
| Flag | Required | Description |
|---|---|---|
| --name <name> | Yes | Display name for the template |
| --category <category> | Yes | Template category: llm, vision, multimodal, audio, embedding |
| --description <desc> | No | Custom description (defaults to HuggingFace description) |
| --tags <tags> | No | Comma-separated tags (max 10) |
| --visibility <vis> | No | public or private (default: public) |
Example:
syaala templates create-from-hf \
meta-llama/Llama-3.3-70B-Instruct \
--name "Llama 3.3 70B Instruct" \
--category llm \
--tags "chat,instruction-following,large-context" \
--description "Meta's latest 70B parameter model with extended context" \
--visibility public
Output:
✓ Template created successfully!
Template Details:
ID: tpl_abc123xyz
Name: Llama 3.3 70B Instruct
HuggingFace Model: meta-llama/Llama-3.3-70B-Instruct
Category: llm
Visibility: public
Auto-Configuration:
Runtime: vllm
GPU Type: NVIDIA-A40
GPU Count: 1
Docker Image: runpod/pytorch:2.1.0-py3.10-cuda12.1.0-devel
Estimated Cost: ~$456/month
Deploy this template:
syaala deployments create --template tpl_abc123xyz
Auto-Configuration Details
Syaala automatically configures optimal settings for HuggingFace models:
Runtime Selection
- vLLM: For decoder-only LLMs (Llama, Mistral, GPT-style)
- Triton: For encoder models and multi-modal inference
- FastAPI: For custom models or special requirements
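The mapping above can be sketched as a simple lookup. Note that the architecture names and the `selectRuntime` helper below are illustrative assumptions, not the platform's actual selection logic:

```typescript
// Illustrative sketch of runtime selection. The architecture keywords and
// this mapping are assumptions for demonstration, not Syaala's real code.
type Runtime = "vllm" | "triton" | "fastapi";

function selectRuntime(architecture: string): Runtime {
  const arch = architecture.toLowerCase();
  // Decoder-only LLM families are served with vLLM.
  const decoderOnly = ["llama", "mistral", "gpt", "qwen"];
  if (decoderOnly.some((name) => arch.includes(name))) {
    return "vllm";
  }
  // Encoder and multi-modal architectures go to Triton.
  const encoderStyle = ["bert", "clip", "vit", "whisper"];
  if (encoderStyle.some((name) => arch.includes(name))) {
    return "triton";
  }
  // Everything else falls back to a custom FastAPI server.
  return "fastapi";
}

console.log(selectRuntime("LlamaForCausalLM")); // vllm
console.log(selectRuntime("BertForSequenceClassification")); // triton
```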
GPU Selection
Based on model size and task:
| Model Size | Recommended GPU | Monthly Cost |
|---|---|---|
| < 7B params | NVIDIA-RTX-4090 | ~$234 |
| 7B - 13B | NVIDIA-L40S | ~$312 |
| 13B - 70B | NVIDIA-A40 | ~$456 |
| 70B+ | NVIDIA-A100 (80GB) | ~$912 |
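The sizing table translates directly into a lookup by parameter count. The `suggestGpu` helper below is a sketch that mirrors the table's thresholds and costs; it is not Syaala's actual sizing code:

```typescript
// GPU sizing by parameter count, mirroring the table above.
// Thresholds and monthly costs come from the table; the function itself
// is an illustrative sketch.
interface GpuChoice {
  gpuType: string;
  monthlyCost: number; // approximate USD/month
}

function suggestGpu(paramsBillions: number): GpuChoice {
  if (paramsBillions < 7) return { gpuType: "NVIDIA-RTX-4090", monthlyCost: 234 };
  if (paramsBillions <= 13) return { gpuType: "NVIDIA-L40S", monthlyCost: 312 };
  if (paramsBillions <= 70) return { gpuType: "NVIDIA-A40", monthlyCost: 456 };
  return { gpuType: "NVIDIA-A100", monthlyCost: 912 }; // 80GB variant
}

console.log(suggestGpu(8).gpuType); // NVIDIA-L40S
console.log(suggestGpu(70).gpuType); // NVIDIA-A40 (matches the Llama 3.3 70B example)
```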
Docker Image
Pre-configured images with:
- PyTorch 2.1+
- CUDA 12.1+
- Runtime-specific dependencies (vLLM, Triton, etc.)
- Model-specific optimizations
Community Template Sharing
Share your templates with the Syaala community.
Publishing Templates
# Create template (defaults to public visibility)
syaala templates create-from-hf \
mistralai/Mixtral-8x7B-Instruct-v0.1 \
--name "Mixtral 8x7B Instruct" \
--category llm \
--visibility public
Discovering Community Templates
# List all public templates
syaala templates list --visibility public
# Filter by category
syaala templates list --category llm
# Search by tags
syaala templates list --tags chat,instruction-following
Template Engagement Tracking
Syaala tracks template usage to surface popular configurations:
- Deployment Count: How many times the template has been deployed
- Success Rate: Percentage of successful deployments
- Average Cost: Real-world cost data from deployments
- User Ratings: Community feedback and ratings
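As a rough illustration, metrics like these could be aggregated from per-deployment records. The record shape and field names below are hypothetical, not the platform's schema:

```typescript
// Hypothetical aggregation of engagement metrics from deployment records.
// The DeploymentRecord shape is an assumption for illustration only.
interface DeploymentRecord {
  succeeded: boolean;
  monthlyCost: number;
}

function summarize(records: DeploymentRecord[]) {
  const deploymentCount = records.length;
  const successes = records.filter((r) => r.succeeded).length;
  const successRate = deploymentCount ? successes / deploymentCount : 0;
  const averageCost = deploymentCount
    ? records.reduce((sum, r) => sum + r.monthlyCost, 0) / deploymentCount
    : 0;
  return { deploymentCount, successRate, averageCost };
}

const stats = summarize([
  { succeeded: true, monthlyCost: 456 },
  { succeeded: true, monthlyCost: 420 },
  { succeeded: false, monthlyCost: 0 },
]);
console.log(stats); // deploymentCount: 3, successRate ≈ 0.667, averageCost: 292
```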
API Integration
Integrate template discovery into your applications.
Get Recommendations
const response = await fetch(
"https://platform.syaala.com/api/templates/recommendations?useCase=text-generation&maxBudget=500",
{
headers: {
Authorization: `Bearer ${apiKey}`,
},
},
);
const { recommendations } = await response.json();
Search HuggingFace
const response = await fetch(
"https://platform.syaala.com/api/huggingface/search?query=llama&task=text-generation",
{
headers: {
Authorization: `Bearer ${apiKey}`,
},
},
);
const { models, total } = await response.json();
Create Template from HuggingFace
const response = await fetch(
"https://platform.syaala.com/api/templates/save-from-hf",
{
method: "POST",
headers: {
Authorization: `Bearer ${apiKey}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
modelId: "meta-llama/Llama-3.3-70B-Instruct",
name: "Llama 3.3 70B Instruct",
category: "llm",
tags: ["chat", "instruction-following"],
visibility: "public",
}),
},
);
const { template } = await response.json();
Best Practices
Model Selection
- Start Small: Begin with 7B models (RTX 4090) before scaling to 70B+ models
- Check Licenses: Verify commercial use permissions for production deployments
- Test Throughput: Use recommended GPU types for optimal performance
- Monitor Costs: Track actual usage vs. estimates for budget optimization
Template Creation
- Meaningful Names: Use descriptive names that indicate model size and purpose
- Accurate Tags: Add relevant tags for discoverability (max 10)
- Public First: Share successful templates publicly to help the community
- Update Descriptions: Customize descriptions with use case specifics
Deployment
- Start with Recommendations: Use AI-recommended configurations first
- Validate Endpoints: Test inference endpoints before production traffic
- Set Budget Limits: Configure spending alerts for cost control
- Monitor Performance: Track latency, throughput, and error rates
Troubleshooting
Model Not Found
Error: HuggingFace model 'invalid/model-id' not found
Solution: Verify model ID exists on HuggingFace Hub:
- Visit https://huggingface.co/{model-id}
- Check for typos in organization/model name
- Ensure model is public or you have access
GPU Unavailable
Error: Requested GPU type 'NVIDIA-H100' is not available
Solution: Use an alternative GPU or request access:
- Check available GPUs: syaala gpu-types list
- Use the recommended GPU from discovery results
- Contact support for enterprise GPU access
Budget Exceeded
Warning: Estimated cost ($912/month) exceeds budget ($500/month)
Solution: Adjust configuration or budget:
- Use smaller model variant (70B → 7B)
- Reduce GPU count or type (A100 → A40)
- Increase budget limit in settings
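The warning above follows from a simple monthly-cost estimate (roughly hourly rate × GPU count × ~730 hours per month). The sketch below uses an assumed hourly rate; actual platform pricing may differ:

```typescript
// Back-of-the-envelope budget check. The hourly rate used in the example
// is an assumption, not a published Syaala price.
function estimateMonthlyCost(hourlyRate: number, gpuCount: number): number {
  const HOURS_PER_MONTH = 730; // average hours in a month
  return Math.round(hourlyRate * gpuCount * HOURS_PER_MONTH);
}

function checkBudget(estimated: number, budget: number): string {
  return estimated > budget
    ? `Warning: Estimated cost ($${estimated}/month) exceeds budget ($${budget}/month)`
    : "Within budget";
}

// ~$1.25/hr for an A100-class GPU (assumed rate) comes to roughly $913/month.
const estimated = estimateMonthlyCost(1.25, 1);
console.log(checkBudget(estimated, 500));
```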
Related Documentation
- CLI Commands Reference - Complete CLI documentation
- API Reference - REST API endpoints
- Deployment Guide - Creating and managing deployments
- Cost Optimization - Reducing GPU costs
Support
Need help with template discovery?
- Discord Community - Ask questions and share templates
- GitHub Discussions - Feature requests
- Support Email - Enterprise assistance