Template Discovery & HuggingFace Integration

Syaala Platform provides seamless integration with HuggingFace Hub, giving you access to 500,000+ AI models with intelligent recommendations and auto-configuration.

Phase 18 Feature: Template-based Inference-as-a-Service (IaaS) architecture with personalized recommendations based on your use case and budget.


Overview

The Template Discovery system provides:

  • Personalized Recommendations: AI-powered suggestions based on your use case and budget
  • HuggingFace Integration: Search and deploy from 500K+ models on HuggingFace Hub
  • Auto-Configuration: Automatic runtime, GPU, and Docker setup
  • Cost Estimation: Real-time pricing for GPU types and deployment configurations
  • Community Sharing: Publish and discover templates from the Syaala community

Personalized Recommendations

Get AI-powered template recommendations tailored to your specific needs.

Via Dashboard

  1. Navigate to Dashboard → Templates
  2. Complete the quick survey (optional):
    • Use Case: Text generation, image processing, etc.
    • Monthly Budget: Low ($100), Medium ($500), High ($5000+)
  3. View personalized recommendations with:
    • Auto-configured GPU type
    • Estimated monthly cost
    • One-click deployment

Via CLI

# Get recommendations based on use case
syaala models recommend --use-case text-generation --budget medium
 
# Interactive mode with prompts
syaala models recommend --interactive

Example Output:

✓ Fetching personalized recommendations...

Recommended Templates for text-generation (Medium Budget: $500/month)

┌────────────────────────────────────────────────────────────────────┐
│ 1. Llama 3.3 70B Instruct                                          │
│    Runtime: vllm                                                   │
│    GPU: NVIDIA-A40 (1x)                                            │
│    Cost: ~$456/month (91% of budget)                               │
│    Tags: llm, instruction-following, chat                          │
│                                                                     │
│    ► syaala deployments create --template tpl_llama33_70b          │
└────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────┐
│ 2. Mistral 7B Instruct v0.3                                        │
│    Runtime: vllm                                                   │
│    GPU: NVIDIA-RTX-4090 (1x)                                       │
│    Cost: ~$234/month (47% of budget)                               │
│    Tags: llm, efficient, general-purpose                           │
│                                                                     │
│    ► syaala deployments create --template tpl_mistral_7b           │
└────────────────────────────────────────────────────────────────────┘

HuggingFace Model Discovery

Search and explore models from HuggingFace Hub directly through Syaala.

Search Models

Via CLI

# Search by keyword
syaala models discover "llama"
 
# Filter by task
syaala models discover "image classification" --task image-classification
 
# Sort by downloads or likes
syaala models discover "stable diffusion" --sort downloads

Example Output:

✓ Found 234 models matching "llama"

┌────────────────────────────────────────────────────────────────────┐
│ meta-llama/Llama-3.3-70B-Instruct                                  │
│ Downloads: 2.4M  Likes: 15.2K  Task: text-generation               │
│                                                                     │
│ Suggested GPU: NVIDIA-A40                                          │
│ Estimated Cost: ~$456/month                                        │
│                                                                     │
│ ► syaala templates create-from-hf \                                │
│     meta-llama/Llama-3.3-70B-Instruct \                            │
│     --name "Llama 3.3 70B" --category llm                          │
└────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────┐
│ meta-llama/Llama-3.1-8B-Instruct                                   │
│ Downloads: 5.7M  Likes: 23.1K  Task: text-generation               │
│                                                                     │
│ Suggested GPU: NVIDIA-RTX-4090                                     │
│ Estimated Cost: ~$234/month                                        │
│                                                                     │
│ ► syaala templates create-from-hf \                                │
│     meta-llama/Llama-3.1-8B-Instruct \                             │
│     --name "Llama 3.1 8B" --category llm                           │
└────────────────────────────────────────────────────────────────────┘

Via API

curl "https://platform.syaala.com/api/huggingface/search?query=llama&task=text-generation" \
  -H "Authorization: Bearer sk_live_..."

Response:

{
  "models": [
    {
      "id": "meta-llama/Llama-3.3-70B-Instruct",
      "downloads": 2400000,
      "likes": 15200,
      "task": "text-generation",
      "license": "llama3",
      "suggestedGpu": "NVIDIA-A40",
      "estimatedCost": 456,
      "autoConfigured": {
        "runtime": "vllm",
        "gpuType": "NVIDIA-A40",
        "gpuCount": 1,
        "dockerImage": "runpod/pytorch:2.1.0-py3.10-cuda12.1.0-devel"
      }
    }
  ],
  "total": 234
}
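
The response fields above are enough to do simple client-side filtering. As a minimal sketch (assuming the response shape shown above, with a placeholder apiKey and budget), the following keeps only models whose estimated monthly cost fits a budget:

// Minimal sketch: search HuggingFace through Syaala and keep models within a budget.
// Field names follow the response shape documented above; apiKey and budget are placeholders.
interface HfSearchModel {
  id: string;
  downloads: number;
  likes: number;
  task: string;
  license: string;
  suggestedGpu: string;
  estimatedCost: number; // USD per month
}

async function searchWithinBudget(apiKey: string, query: string, budget: number) {
  const url = new URL("https://platform.syaala.com/api/huggingface/search");
  url.searchParams.set("query", query);
  url.searchParams.set("task", "text-generation");

  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  const { models, total } = (await response.json()) as {
    models: HfSearchModel[];
    total: number;
  };

  // Keep only models whose estimated monthly cost fits the budget.
  const affordable = models.filter((m) => m.estimatedCost <= budget);
  console.log(`${affordable.length} of ${total} models fit $${budget}/month`);
  return affordable;
}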

Creating Templates from HuggingFace

Convert any HuggingFace model into a deployable Syaala template.

Quick Start

Search for Model

syaala models discover "mistral instruct"

Create Template

syaala templates create-from-hf \
  mistralai/Mistral-7B-Instruct-v0.3 \
  --name "Mistral 7B Instruct" \
  --category llm \
  --tags "instruct,chat,efficient" \
  --visibility public

Deploy Template

syaala deployments create --template tpl_mistral_7b_xyz

Command Reference

syaala templates create-from-hf <model-id> [options]

Options:

Flag                     Required   Description
--name <name>            Yes        Display name for the template
--category <category>    Yes        Template category: llm, vision, multimodal, audio, embedding
--description <desc>     No         Custom description (defaults to HuggingFace description)
--tags <tags>            No         Comma-separated tags (max 10)
--visibility <vis>       No         public or private (default: public)

Example:

syaala templates create-from-hf \
  meta-llama/Llama-3.3-70B-Instruct \
  --name "Llama 3.3 70B Instruct" \
  --category llm \
  --tags "chat,instruction-following,large-context" \
  --description "Meta's latest 70B parameter model with extended context" \
  --visibility public

Output:

✓ Template created successfully!

Template Details:
  ID: tpl_abc123xyz
  Name: Llama 3.3 70B Instruct
  HuggingFace Model: meta-llama/Llama-3.3-70B-Instruct
  Category: llm
  Visibility: public

Auto-Configuration:
  Runtime: vllm
  GPU Type: NVIDIA-A40
  GPU Count: 1
  Docker Image: runpod/pytorch:2.1.0-py3.10-cuda12.1.0-devel

Estimated Cost: ~$456/month

Deploy this template:
  syaala deployments create --template tpl_abc123xyz

Auto-Configuration Details

Syaala automatically configures optimal settings for HuggingFace models:

Runtime Selection

  • vLLM: For decoder-only LLMs (Llama, Mistral, GPT-style)
  • Triton: For encoder models and multi-modal inference
  • FastAPI: For custom models or special requirements

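The choice above can be pictured as a simple heuristic over the model's task and architecture. The sketch below is illustrative only; the function and matching rules are assumptions, not Syaala's internal selection logic:

// Illustrative heuristic only -- not the platform's actual selection code.
type Runtime = "vllm" | "triton" | "fastapi";

function pickRuntime(task: string, architecture: string): Runtime {
  // Decoder-only LLMs (Llama, Mistral, GPT-style) are served with vLLM.
  const decoderOnly = /llama|mistral|gpt/i.test(architecture);
  if (task === "text-generation" && decoderOnly) return "vllm";

  // Encoder models and multi-modal inference go through Triton.
  if (["feature-extraction", "image-classification", "image-to-text"].includes(task)) {
    return "triton";
  }

  // Custom models or special requirements fall back to FastAPI.
  return "fastapi";
}
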
GPU Selection

Based on model size and task:

Model Size     Recommended GPU       Monthly Cost
< 7B params    NVIDIA-RTX-4090       ~$234
7B - 13B       NVIDIA-L40S           ~$312
13B - 70B      NVIDIA-A40            ~$456
70B+           NVIDIA-A100 (80GB)    ~$912

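In code, the table above amounts to a threshold lookup on parameter count. The sketch below mirrors those tiers for illustration; it is an approximation, not the platform's exact selection logic:

// Rough encoding of the size-to-GPU tiers above; boundaries and prices are approximate.
interface GpuTier {
  maxParamsB: number;     // upper bound, in billions of parameters
  gpuType: string;
  monthlyCostUsd: number; // approximate
}

const GPU_TIERS: GpuTier[] = [
  { maxParamsB: 7, gpuType: "NVIDIA-RTX-4090", monthlyCostUsd: 234 },
  { maxParamsB: 13, gpuType: "NVIDIA-L40S", monthlyCostUsd: 312 },
  { maxParamsB: 70, gpuType: "NVIDIA-A40", monthlyCostUsd: 456 },
  { maxParamsB: Infinity, gpuType: "NVIDIA-A100 (80GB)", monthlyCostUsd: 912 },
];

function suggestGpu(paramsBillions: number): GpuTier {
  // Return the first tier whose upper bound covers the model size.
  return GPU_TIERS.find((tier) => paramsBillions < tier.maxParamsB)!;
}

// suggestGpu(30) -> { gpuType: "NVIDIA-A40", monthlyCostUsd: 456, ... }
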
Docker Image

Pre-configured images with:

  • PyTorch 2.1+
  • CUDA 12.1+
  • Runtime-specific dependencies (vLLM, Triton, etc.)
  • Model-specific optimizations

Community Template Sharing

Share your templates with the Syaala community.

Publishing Templates

# Create template (defaults to public visibility)
syaala templates create-from-hf \
  mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --name "Mixtral 8x7B Instruct" \
  --category llm \
  --visibility public

Discovering Community Templates

# List all public templates
syaala templates list --visibility public
 
# Filter by category
syaala templates list --category llm
 
# Search by tags
syaala templates list --tags chat,instruction-following

Template Engagement Tracking

Syaala tracks template usage to surface popular configurations:

  • Deployment Count: How many times the template has been deployed
  • Success Rate: Percentage of successful deployments
  • Average Cost: Real-world cost data from deployments
  • User Ratings: Community feedback and ratings

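As a hypothetical illustration of what these metrics could look like in code, the shape below uses assumed field names; they are not part of the documented API:

// Hypothetical shape for illustration only; these field names are assumptions,
// not a documented Syaala API response.
interface TemplateEngagement {
  deploymentCount: number; // how many times the template has been deployed
  successRate: number;     // fraction of deployments that succeeded (0-1)
  averageCostUsd: number;  // real-world monthly cost observed across deployments
  averageRating: number;   // community rating
}
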
API Integration

Integrate template discovery into your applications.

Get Recommendations

const response = await fetch(
  "https://platform.syaala.com/api/templates/recommendations?useCase=text-generation&maxBudget=500",
  {
    headers: {
      Authorization: `Bearer ${apiKey}`,
    },
  },
);
 
const { recommendations } = await response.json();

Search HuggingFace

const response = await fetch(
  "https://platform.syaala.com/api/huggingface/search?query=llama&task=text-generation",
  {
    headers: {
      Authorization: `Bearer ${apiKey}`,
    },
  },
);
 
const { models, total } = await response.json();

Create Template from HuggingFace

const response = await fetch(
  "https://platform.syaala.com/api/templates/save-from-hf",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      modelId: "meta-llama/Llama-3.3-70B-Instruct",
      name: "Llama 3.3 70B Instruct",
      category: "llm",
      tags: ["chat", "instruction-following"],
      visibility: "public",
    }),
  },
);
 
const { template } = await response.json();
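
Putting the calls above together, here is a minimal end-to-end sketch that searches HuggingFace and saves the top result as a template. importTopResult is an illustrative helper, and the category, tags, and visibility values are example choices:

// End-to-end sketch using the endpoints shown above; error handling is omitted
// and the category/tags/visibility values are just example choices.
async function importTopResult(apiKey: string, query: string) {
  const headers = { Authorization: `Bearer ${apiKey}` };

  // 1. Search HuggingFace Hub through Syaala.
  const searchResponse = await fetch(
    `https://platform.syaala.com/api/huggingface/search?query=${encodeURIComponent(query)}&task=text-generation`,
    { headers },
  );
  const { models } = await searchResponse.json();
  if (!models.length) throw new Error(`No models found for "${query}"`);

  // 2. Save the top result as a deployable template.
  const saveResponse = await fetch(
    "https://platform.syaala.com/api/templates/save-from-hf",
    {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({
        modelId: models[0].id,
        name: models[0].id.split("/").pop(),
        category: "llm",
        tags: ["imported"],
        visibility: "private",
      }),
    },
  );

  // 3. Deploy via the CLI: syaala deployments create --template <template ID>
  const { template } = await saveResponse.json();
  return template;
}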

Best Practices

Model Selection

  1. Start Small: Begin with 7B models (RTX 4090) before scaling to 70B+ models
  2. Check Licenses: Verify commercial use permissions for production deployments
  3. Test Throughput: Use recommended GPU types for optimal performance
  4. Monitor Costs: Track actual usage vs. estimates for budget optimization

Template Creation

  1. Meaningful Names: Use descriptive names that indicate model size and purpose
  2. Accurate Tags: Add relevant tags for discoverability (max 10)
  3. Public First: Share successful templates publicly to help the community
  4. Update Descriptions: Customize descriptions with use case specifics

Deployment

  1. Start with Recommendations: Use AI-recommended configurations first
  2. Validate Endpoints: Test inference endpoints before production traffic
  3. Set Budget Limits: Configure spending alerts for cost control
  4. Monitor Performance: Track latency, throughput, and error rates

Troubleshooting

Model Not Found

Error: HuggingFace model 'invalid/model-id' not found

Solution: Verify model ID exists on HuggingFace Hub:

  • Visit https://huggingface.co/{model-id}
  • Check for typos in organization/model name
  • Ensure model is public or you have access

GPU Unavailable

Error: Requested GPU type 'NVIDIA-H100' is not available

Solution: Use alternative GPU or request access:

  • Check available GPUs: syaala gpu-types list
  • Use recommended GPU from discovery results
  • Contact support for enterprise GPU access

Budget Exceeded

Warning: Estimated cost ($912/month) exceeds budget ($500/month)

Solution: Adjust configuration or budget:

  • Use smaller model variant (70B → 7B)
  • Reduce GPU count or type (A100 → A40)
  • Increase budget limit in settings


Support

Need help with template discovery?