Quickstart: Deploy Your First Model in 10 Minutes
Get started with Syaala Platform by deploying a production-ready AI model on GPU infrastructure.
Prerequisites:
- Node.js 20+ installed
- Active Syaala account (Sign up)
- API key from Dashboard Settings
Install the Syaala CLI
```bash
npm install -g @syaala/cli
```

Verify installation:

```bash
syaala --version
```

Expected output: @syaala/cli v0.1.0
Authenticate with Your API Key
Retrieve your API key from the Dashboard Settings page.
```bash
syaala auth login --api-key YOUR_API_KEY
```

```
✅ Authentication successful!
Your credentials are securely stored in ~/.syaala/config.json
```
Alternative: Interactive Login
```bash
syaala auth login
```

The CLI will prompt you for:
- Email address
- Password
- Organization selection (if you belong to multiple)
Deploy a Model
Deploy Meta’s Llama 3.1 8B Instruct model on NVIDIA RTX 4090 GPU:
```bash
syaala deployments create \
  --name "my-first-deployment" \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --runtime vllm \
  --gpu "NVIDIA-RTX-4090" \
  --min-replicas 1 \
  --max-replicas 3
```

Expected output:
```
✓ Validating model access on HuggingFace...
✓ Calculating GPU memory requirements...
✓ Provisioning NVIDIA RTX 4090 infrastructure...
✓ Deploying vLLM runtime container...
✓ Loading model weights (8B parameters)...

Deployment created successfully!

┌─────────────────────────────────────────────────┐
│ Deployment Details                              │
├─────────────────────────────────────────────────┤
│ ID:       dep_a1b2c3d4e5f6                      │
│ Name:     my-first-deployment                   │
│ Status:   deploying → running (ETA: 2-3 min)    │
│ Endpoint: https://dep-a1b2c3d4.syaala.run       │
│ Runtime:  vLLM 0.5.4                            │
│ GPU:      NVIDIA RTX 4090 (24GB)                │
│ Replicas: 1/3 (min/max)                         │
└─────────────────────────────────────────────────┘
```
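The "Calculating GPU memory requirements" step can be sanity-checked by hand: an 8B-parameter model in 16-bit precision needs roughly 8 × 2 = 16 GB just for weights, which leaves headroom on a 24 GB RTX 4090 for the KV cache and activations. A minimal sketch of that rule of thumb (the function name is illustrative; the exact formula the platform uses is not documented here):

```typescript
// Rough rule of thumb: weight memory ≈ parameter count × bytes per parameter.
// This mirrors the spirit of the "Calculating GPU memory requirements" step,
// not Syaala's actual internal calculation.
function estimateWeightMemoryGB(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9; // decimal GB for simplicity
}

const weightsGB = estimateWeightMemoryGB(8e9, 2); // 8B params in fp16/bf16
console.log(weightsGB);       // 16 GB of weights
console.log(weightsGB <= 24); // true: fits on a 24 GB RTX 4090
```

The same arithmetic shows why larger models need bigger cards: 8B parameters at 4 bytes (fp32) would already need ~32 GB.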
Run 'syaala deployments get dep_a1b2c3d4e5f6' to check status

Check Deployment Status
Monitor your deployment:
```bash
syaala deployments get dep_a1b2c3d4e5f6
```

Wait until the status shows running (usually 2-3 minutes).
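Rather than re-running the command by hand, you can poll it from a small script. This sketch assumes the CLI output contains a line like `Status:   running` once the deployment is ready (matching the table shown earlier); the helper names and the 15-second interval are illustrative:

```typescript
import { execSync } from "node:child_process";

// Returns true once the output reports the deployment as running.
// Assumption: the CLI prints a line of the form "Status:   running";
// while deploying it prints "Status:   deploying → running (ETA: ...)",
// which this regex deliberately does not match.
function isRunning(cliOutput: string): boolean {
  return /Status:\s+running\b/.test(cliOutput);
}

// Poll every 15 seconds until the deployment is up (illustrative interval).
async function waitForRunning(deploymentId: string): Promise<void> {
  for (;;) {
    const out = execSync(`syaala deployments get ${deploymentId}`).toString();
    if (isRunning(out)) return;
    await new Promise((resolve) => setTimeout(resolve, 15_000));
  }
}
```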
Test Your Endpoint
Once running, send your first inference request:
```bash
curl -X POST https://dep-a1b2c3d4.syaala.run/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
      {"role": "user", "content": "Explain what a GPU is in one sentence."}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'
```

Expected response:
```json
{
  "id": "chatcmpl-9a7b8c6d5e4f3",
  "object": "chat.completion",
  "created": 1728000000,
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A GPU (Graphics Processing Unit) is a specialized electronic circuit designed to rapidly process and render graphics, but is now widely used for parallel computing tasks like AI model training and inference due to its ability to handle thousands of simultaneous calculations."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 48,
    "total_tokens": 63
  }
}
```

Monitor Metrics
View real-time metrics for your deployment:
```bash
syaala deployments metrics dep_a1b2c3d4e5f6
```

Output shows:
- GPU utilization (%)
- Memory usage (GB)
- Requests per second
- Average latency (ms)
- Active replicas
What You Just Deployed
You’ve successfully:
- ✅ Installed and authenticated the Syaala CLI
- ✅ Deployed Llama 3.1 8B on NVIDIA RTX 4090 GPU
- ✅ Configured auto-scaling (1-3 replicas)
- ✅ Received a production OpenAI-compatible API endpoint
- ✅ Sent your first inference request
- ✅ Monitored GPU metrics in real-time
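Because the endpoint speaks the OpenAI-compatible chat completions API, the curl request from the Test Your Endpoint step translates directly to Node.js 20+, which ships with a global fetch. A sketch (the endpoint URL is the example deployment's; substitute your own and export SYAALA_API_KEY in your environment):

```typescript
// Same request as the curl example, using Node 20's built-in fetch.
// Replace the endpoint with your deployment's URL.
const endpoint = "https://dep-a1b2c3d4.syaala.run/v1/chat/completions";

// Build the OpenAI-compatible request body (same fields as the curl -d payload).
function buildBody(prompt: string) {
  return {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages: [{ role: "user" as const, content: prompt }],
    max_tokens: 100,
    temperature: 0.7,
  };
}

async function chat(prompt: string): Promise<string> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.SYAALA_API_KEY}`,
    },
    body: JSON.stringify(buildBody(prompt)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The response shape is the standard chat completion object shown above, so the assistant's text lives at `choices[0].message.content`.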
Next Steps
🔐 Authentication
Learn about API keys, JWT tokens, and security best practices
🚀 First Deployment
Deep dive into deployment configuration, GPU selection, and optimization
📦 TypeScript SDK
Integrate Syaala into your Node.js/TypeScript applications
📚 API Reference
Complete REST API documentation for programmatic access
Troubleshooting
Authentication Fails
Error: Invalid API key
Solution: Ensure you’re using a valid API key from Dashboard Settings. API keys start with sk_live_ for production or sk_test_ for development.
Deployment Stuck in “Provisioning”
Error: Deployment status remains provisioning for >10 minutes
Solution:
- Check System Status for GPU availability
- Try a different GPU type: --gpu "NVIDIA-A100-40GB"
- Contact Support if the issue persists
Rate Limit Exceeded
Error: 429 Too Many Requests
Solution: You’ve exceeded the rate limit for your plan tier. Upgrade at Billing or wait for the rate limit window to reset (60 seconds).
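A common way to handle transient 429s in client code is to retry after the window resets. A minimal sketch (the 60-second window comes from the text above; the helper name and attempt limit are illustrative):

```typescript
// Retry a request when the server answers 429 Too Many Requests,
// waiting out the rate-limit window (60 s per the note above) between tries.
async function withRetry(
  send: () => Promise<Response>,
  maxAttempts = 3,
  waitMs = 60_000,
): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

In production you would also honor a Retry-After header if the server sends one, rather than assuming a fixed window.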
Cost Estimation
This quickstart deployment costs approximately:
- GPU Time: $0.50/hour for NVIDIA RTX 4090
- Idle Time: $0.05/hour when scaled to 0 replicas
- API Requests: Free for first 1M tokens/month
Estimated cost for testing: ~$0.50 for 1 hour of experimentation.
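Using the rates above, a quick back-of-the-envelope check (rates are hard-coded from this page; actual billing may differ):

```typescript
// Cost sketch using the rates quoted above:
// $0.50/hour while active, $0.05/hour while scaled to 0 replicas.
function estimateCostUSD(activeHours: number, idleHours: number): number {
  const ACTIVE_RATE = 0.5; // NVIDIA RTX 4090, $/hour
  const IDLE_RATE = 0.05;  // idle (0 replicas), $/hour
  return activeHours * ACTIVE_RATE + idleHours * IDLE_RATE;
}

console.log(estimateCostUSD(1, 0)); // 0.5 — matches the ~$0.50/hour estimate
```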
💡 Pro Tip: Use syaala deployments delete dep_a1b2c3d4e5f6 when you’re done testing to avoid idle charges.