
Quickstart: Deploy Your First Model in 10 Minutes

Get started with Syaala Platform by deploying a production-ready AI model on GPU infrastructure.

Prerequisites: a Syaala account with an API key, and Node.js with npm installed.

Install the Syaala CLI

npm install -g @syaala/cli

Verify installation:

syaala --version

Expected output: @syaala/cli v0.1.0

Authenticate with Your API Key

Retrieve your API key from the Dashboard Settings page.

syaala auth login --api-key YOUR_API_KEY

Expected output:

Authentication successful!
Your credentials are securely stored in ~/.syaala/config.json

Alternative: Interactive Login

syaala auth login

The CLI will prompt you for:

  • Email address
  • Password
  • Organization selection (if you belong to multiple)

Deploy a Model

Deploy Meta’s Llama 3.1 8B Instruct model on an NVIDIA RTX 4090 GPU:

syaala deployments create \
  --name "my-first-deployment" \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --runtime vllm \
  --gpu "NVIDIA-RTX-4090" \
  --min-replicas 1 \
  --max-replicas 3

Expected output:

✓ Validating model access on HuggingFace...
✓ Calculating GPU memory requirements...
✓ Provisioning NVIDIA RTX 4090 infrastructure...
✓ Deploying vLLM runtime container...
✓ Loading model weights (8B parameters)...

Deployment created successfully!

┌─────────────────────────────────────────────────┐
│ Deployment Details                              │
├─────────────────────────────────────────────────┤
│ ID:        dep_a1b2c3d4e5f6                     │
│ Name:      my-first-deployment                  │
│ Status:    deploying → running (ETA: 2-3 min)   │
│ Endpoint:  https://dep-a1b2c3d4.syaala.run      │
│ Runtime:   vLLM 0.5.4                           │
│ GPU:       NVIDIA RTX 4090 (24GB)               │
│ Replicas:  1/3 (min/max)                        │
└─────────────────────────────────────────────────┘

Run 'syaala deployments get dep_a1b2c3d4e5f6' to check status
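The "Calculating GPU memory requirements" step above can be sanity-checked with back-of-the-envelope arithmetic. This is a rough sketch, not the platform's exact formula: it counts weight memory only, assuming half-precision (2 bytes per parameter), and ignores KV cache and activation overhead.

```python
# Rough estimate of the GPU memory needed for model weights alone.
# Assumes FP16/BF16 weights (2 bytes per parameter); KV cache and
# activation memory come on top of this figure.
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * 1e9 * bytes_per_param / 1024**3

llama_8b = weight_memory_gb(8)
print(f"Llama 3.1 8B weights: ~{llama_8b:.1f} GiB (RTX 4090 has 24 GB)")
```

At ~14.9 GiB for weights, the 8B model fits on a single 24 GB RTX 4090 with room left for the KV cache, which is why this quickstart pairs them.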

Check Deployment Status

Monitor your deployment:

syaala deployments get dep_a1b2c3d4e5f6

Wait until status shows running (usually 2-3 minutes).
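If you want to script the wait instead of re-running the command by hand, you can poll the CLI until the status reports running. This is a sketch built only on the documented `syaala deployments get` command; the `is_running` check is a heuristic that assumes the status word appears on a plain-text "Status" line, as in the example output above.

```python
import subprocess
import time

def is_running(cli_output: str) -> bool:
    """Heuristic: a status line that says 'running' without an arrow
    (the arrow marks a transition like 'deploying → running (ETA: ...)')."""
    for line in cli_output.lower().splitlines():
        if "status" in line and "running" in line and "→" not in line:
            return True
    return False

def wait_until_running(deployment_id: str, poll_seconds: int = 15) -> None:
    # Poll `syaala deployments get` until the deployment reports running.
    while True:
        result = subprocess.run(
            ["syaala", "deployments", "get", deployment_id],
            capture_output=True, text=True,
        )
        if is_running(result.stdout):
            print(f"{deployment_id} is running")
            return
        time.sleep(poll_seconds)
```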

Test Your Endpoint

Once running, send your first inference request:

curl -X POST https://dep-a1b2c3d4.syaala.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Explain what a GPU is in one sentence."}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

Expected response:

{
  "id": "chatcmpl-9a7b8c6d5e4f3",
  "object": "chat.completion",
  "created": 1728000000,
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A GPU (Graphics Processing Unit) is a specialized electronic circuit designed to rapidly process and render graphics, but is now widely used for parallel computing tasks like AI model training and inference due to its ability to handle thousands of simultaneous calculations."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 48,
    "total_tokens": 63
  }
}
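Because the endpoint is OpenAI-compatible, you can call it from code as well as curl. A minimal sketch using only Python's standard library; the endpoint URL is the placeholder from the example deployment above, and `YOUR_API_KEY` is the same placeholder as in the curl command.

```python
import json
import urllib.request

ENDPOINT = "https://dep-a1b2c3d4.syaala.run/v1/chat/completions"  # from your deployment
API_KEY = "YOUR_API_KEY"  # placeholder, as in the curl example

def build_request(prompt: str, max_tokens: int = 100, temperature: float = 0.7) -> dict:
    # Same payload shape as the curl example above.
    return {
        "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def extract_reply(response: dict) -> str:
    # Pull the assistant message out of a chat.completion response.
    return response["choices"][0]["message"]["content"]

def chat(prompt: str) -> str:
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

The same payload also works with any OpenAI-compatible client library pointed at your deployment's base URL.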

Monitor Metrics

View real-time metrics for your deployment:

syaala deployments metrics dep_a1b2c3d4e5f6

Output shows:

  • GPU utilization (%)
  • Memory usage (GB)
  • Requests per second
  • Average latency (ms)
  • Active replicas

What You Just Deployed

You’ve successfully:

  • ✅ Installed and authenticated the Syaala CLI
  • ✅ Deployed Llama 3.1 8B on NVIDIA RTX 4090 GPU
  • ✅ Configured auto-scaling (1-3 replicas)
  • ✅ Received a production OpenAI-compatible API endpoint
  • ✅ Sent your first inference request
  • ✅ Monitored GPU metrics in real-time

Troubleshooting

Authentication Fails

Error: Invalid API key

Solution: Ensure you’re using a valid API key from Dashboard Settings. API keys start with sk_live_ for production or sk_test_ for development.

Deployment Stuck in “Provisioning”

Error: Deployment status remains provisioning for >10 minutes

Solution:

  1. Check System Status for GPU availability
  2. Try a different GPU type: --gpu "NVIDIA-A100-40GB"
  3. Contact Support if issue persists

Rate Limit Exceeded

Error: 429 Too Many Requests

Solution: You’ve exceeded the rate limit for your plan tier. Upgrade at Billing or wait for the rate limit window to reset (60 seconds).
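A client-side retry with exponential backoff, capped at the 60-second reset window mentioned above, usually rides out transient 429s without manual intervention. The retry policy itself is illustrative, not a platform requirement:

```python
import time
import urllib.error

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0) -> list:
    """Exponential backoff delays, capped at the 60 s rate-limit window."""
    return [min(base * 2**i, cap) for i in range(max_retries)]

def call_with_retry(send, max_retries: int = 5):
    # `send` is any zero-argument callable that performs the request.
    for delay in backoff_delays(max_retries):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only retry rate-limit errors
            time.sleep(delay)
    return send()  # final attempt; a persistent 429 propagates to the caller
```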

Cost Estimation

This quickstart deployment costs approximately:

  • GPU Time: $0.50/hour for NVIDIA RTX 4090
  • Idle Time: $0.05/hour when scaled to 0 replicas
  • API Requests: Free for first 1M tokens/month

Estimated cost for testing: ~$0.50 for 1 hour of experimentation.
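Using the rates above, you can estimate a bill for your own usage pattern. A simple sketch that assumes token usage stays inside the free 1M/month tier, so only GPU and idle time are charged:

```python
GPU_RATE = 0.50   # $/hour while replicas are active (NVIDIA RTX 4090)
IDLE_RATE = 0.05  # $/hour when scaled to 0 replicas

def estimated_cost(active_hours: float, idle_hours: float) -> float:
    """Estimated deployment cost in dollars, ignoring token charges."""
    return active_hours * GPU_RATE + idle_hours * IDLE_RATE

# e.g. 1 hour of testing, then idle for the rest of the day:
print(f"${estimated_cost(1, 23):.2f}")  # 1*0.50 + 23*0.05 = $1.65
```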

💡 Pro Tip: Use syaala deployments delete dep_a1b2c3d4e5f6 when you’re done testing to avoid idle charges.