TraceMyPods: Application Workflow (Dev Focused)
TraceMyPods is an AI chatbot with Ollama AI integration, enabling users to interact with AI models effortlessly. The platform is designed for developers and DevOps engineers, offering a comprehensive reference for building, deploying, and managing a microservice architecture.
Tech Stack:
Kafka | Redis | MongoDB | Ollama (AI) | Node.js | Python | Go | VectorDB (Qdrant) | Microservices | Payments (Razorpay) | Invoice/Reports | Postman Collection | Load Testing | OpenTelemetry | Gemini AI | S3 (File Browser) | Image Generation API
Landing Page
AI Assistant (Chat Box)
Workflow Overview
Workflow Architecture
This document outlines the architecture and workflow of the TraceMyPods application, detailing how various components interact to provide a seamless AI chatbot experience.
- App-level communication
- How Kafka is used
- Why vector embeddings and OTel
- Why Razorpay for payments
- Why MongoDB and Redis
- Why Ollama and a free model
Routes and APIs
- /api/ask : Handles AI queries and routes them to the appropriate model.
- /api/redis-data : Provides Redis analytics and active-user data.
- /api/db-data : Displays MongoDB analytics and user data.
- /api/s3-data : Fetches S3 analytics and file-browser data.
- /api/s3-page : Displays the S3 file browser with pagination.
- /api/s3-analytics : Provides S3 analytics and file statistics.
- /api/s3-presigned : Generates presigned URLs for S3 files.
- /api/s3-folders : Lists S3 folders for file organization.
- /api/s3-search : Implements advanced search for S3 files.
- /deliver : Handles invoice delivery via email.
- /order : Manages order processing and payment verification.
- /send-otp : Sends an OTP for email verification during order processing.
- /verify-otp : Verifies the OTP for email confirmation.
- /create-order : Creates an order after payment verification.
- /verify-payment : Verifies payment status with Razorpay.
- /api/token : Manages token generation and validation.
- /api/validate-premium-token : Validates premium tokens for model access.
- /api/services : Lists Jaeger services and their traces.
- /api/heatmap : Provides a heatmap of user interactions and application performance from Jaeger.
- /api/traces : Displays trace information for requests and their paths.
- /api/chat : Manages chat sessions and user interactions in oteldashapi.
- /api/optimize : Provides optimization suggestions for application performance.
- /otel (rewritten to /) : Integrates OpenTelemetry for observability.
- / (default/fallback route) : Serves the main application interface, including the chat box and admin dashboard.

Public APIs used:
- Cloudflare Workers AI for image generation
- Gemini AI assistant for Jaeger trace explanations

Internal routes:
- /api/embedding : Handles embedding generation for user queries.
- /api/vector : Manages vector storage and retrieval using Qdrant.
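The /api/ask route splits traffic between a free and a premium model path. The selection logic can be sketched roughly as follows; the model names and the token-record shape here are illustrative assumptions, not the actual askapi implementation:

```javascript
// Hypothetical sketch of askapi's model selection: free tokens map to a
// lightweight model, premium tokens unlock a larger one.
// Model names and the token-record shape are assumptions for illustration.
function chooseModel(tokenRecord) {
  if (!tokenRecord || !tokenRecord.valid) {
    throw new Error("invalid or missing token");
  }
  // Premium tokens (validated against MongoDB) get the larger model;
  // free tokens (Redis, 1-hour TTL) get the lightweight default.
  return tokenRecord.premium ? "mistral" : "tinyllama";
}

console.log(chooseModel({ valid: true, premium: false })); // tinyllama
```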
Components Overview (ARCH)
1. Frontend Pod: ai-frontend
Simple browser UI serving as the landing page, built in pure HTML
- Tech: HTML, CSS, JavaScript
- /chat.html : Frontend chat box with prompt & token input
- /image.html : Image Generation UI with text input
- /admin.html : Admin dashboard for analytics and data management
  - S3 file browser
  - Active users
  - Redis and MongoDB analytics
  - Bucket invoices
  - S3 search
  - Total orders and revenue
- /otel : OpenTelemetry dashboard for observability, integrated with Jaeger and Gemini AI
- /pay.html : Payment page to buy premium model access
2. adminapi microservice
- Tech: Node.js, Express, MongoDB, Redis, S3
- Handles admin operations, analytics, and data management.
- Provides endpoints for viewing Redis, MongoDB, and S3 analytics.
- Shows active users, total orders, and revenue.
- Integrates with S3 for invoice preview with presigned URLs.
- Provides advanced search capabilities for S3 files.
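The paginated S3 file browser behind /api/s3-page can be illustrated with a small sketch. The function name and response shape are assumptions; the real adminapi would page through S3 ListObjectsV2 continuation tokens rather than an in-memory array:

```javascript
// Illustrative pagination helper: slices a listing of S3 keys into pages.
// In the real adminapi this would wrap S3's ListObjectsV2 continuation
// tokens; an in-memory array keeps the sketch self-contained.
function paginate(keys, page, pageSize) {
  const totalPages = Math.max(1, Math.ceil(keys.length / pageSize));
  const current = Math.min(Math.max(1, page), totalPages); // clamp to range
  const start = (current - 1) * pageSize;
  return {
    page: current,
    totalPages,
    items: keys.slice(start, start + pageSize),
  };
}
```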
3. askapi microservice
- Tech: Node.js, Express, Redis, MongoDB, Ollama
- Handles AI queries and validates tokens against MongoDB and Redis.
- Routes requests to the appropriate AI model.
- Integrates with vectorapi to create embeddings, store them in the vector DB, and perform vector search for enhanced query handling.
- Uses OpenTelemetry for tracing; traces are sent to Jaeger and later visualized in the OTel dashboard.
4. deliverapi microservice
- Tech: Node.js, Express, S3, Kafka
- Manages invoice generation and email delivery to users.
- Integrates with S3 for invoice storage
- Uses Kafka for event-driven processing of order events.
5. tokenapi microservice
- Tech: Node.js, Express, MongoDB, Redis
- Manages token generation and validation.
- Issues free tokens for basic model access and stores them in Redis with a 1-hour TTL.
- Generates and validates premium tokens against MongoDB for the paid model, caching them in Redis.
- Integrates with orderapi for premium token issuance after payment.
6. orderapi microservice
- Tech: Node.js, Express, MongoDB, Redis, Kafka
- Handles order processing, email verification via OTP, and payment verification.
- Integrates with paymentapi, which is backed by Razorpay, for payment processing.
- Integrates with tokenapi to generate premium tokens upon successful payment, storing them in Redis and MongoDB.
- Sends created orders and tokens to a Kafka topic for further processing (e.g. by deliverapi).
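The order events published to Kafka for deliverapi might look like the following. The topic name and payload fields are assumptions, and the kafkajs producer call is shown as a comment so the sketch runs without a broker:

```javascript
// Hypothetical order-event payload for the Kafka topic consumed by
// deliverapi. Field names and the topic name are illustrative.
function buildOrderEvent(order) {
  return {
    topic: "order-events", // assumed topic name
    message: JSON.stringify({
      orderId: order.id,
      email: order.email,
      amount: order.amount,
      premiumToken: order.premiumToken,
      createdAt: new Date().toISOString(),
    }),
  };
}

// With kafkajs, orderapi would publish this roughly as:
// await producer.send({ topic: ev.topic, messages: [{ value: ev.message }] });
```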
7. paymentapi microservice
- Tech: Node.js, Express, Razorpay
- Manages payment operations using Razorpay (test keys).
- Handles OTP verification and payment completion.
- Integrates with orderapi to create premium tokens after payment.
8. oteldash microservice
- Tech: React, Gemini AI
- Provides observability dashboards for monitoring application traces and performance.
- Integrates with Jaeger for distributed tracing visualization.
- Displays metrics and traces from various microservices.
- Dashboard AI features include:
- Trace visualization with detailed spans and logs.
- Integration with Gemini AI for enhanced trace explanations.
- Get Optimized Recommendations based on trace data.
- Trace Explanation using Gemini AI for better understanding of complex traces.
9. otelapi microservice
- Tech: Go, OpenTelemetry, Jaeger
- Backend for OpenTelemetry metrics and traces.
- Fetches traces from Jaeger and visualizes them in oteldash.
- Provides APIs for querying and visualizing metrics.
- Handles AI explanation of traces using Gemini AI.
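The /api/heatmap aggregation boils down to bucketing trace durations per service. otelapi itself is written in Go; the sketch below uses JavaScript for consistency with the other examples, and the bucket boundaries and field names are assumptions:

```javascript
// Illustrative heatmap aggregation: counts traces per (service, latency
// bucket) cell, the kind of grid /api/heatmap could return.
// Bucket boundaries (in ms) are an assumption for this sketch.
const BUCKETS = [10, 50, 100, 500, Infinity];

function buildHeatmap(traces) {
  const grid = {};
  for (const t of traces) {
    let row = grid[t.service];
    if (!row) {
      row = grid[t.service] = new Array(BUCKETS.length).fill(0);
    }
    // Place the trace in the first bucket whose upper bound exceeds it.
    row[BUCKETS.findIndex((b) => t.durationMs < b)] += 1;
  }
  return grid;
}
```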
10. vectorapi microservice
- Tech: Python, Qdrant, Embeddings
- Manages vector embeddings and similarity search.
- Integrates with Qdrant for efficient vector storage and retrieval.
- Integrates with embeddingapi to create embeddings for user queries and store them in Qdrant.
- Integrates with askapi to check the vector cache before querying AI models.
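The vector-cache check above amounts to a similarity search: if a stored query embedding is close enough to the new one, the cached answer can be reused. A minimal cosine-similarity sketch (in JavaScript for consistency with the other examples, though the service is Python; the 0.9 threshold is an assumption):

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A cached answer is reused when similarity clears a threshold;
// 0.9 here is an illustrative value, not the service's actual setting.
function cacheHit(queryVec, cachedVec, threshold = 0.9) {
  return cosineSimilarity(queryVec, cachedVec) >= threshold;
}
```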
11. embeddingapi microservice
- Tech: Python, OpenAI Embeddings, Qdrant
- Generates embeddings for user queries using OpenAI's embedding models.
- Handles embedding generation for text queries.
- Integrates with vectorapi for storing and querying embeddings.
12. Other Services
ollamapods
- Hosts various AI models using Ollama.
- Provides endpoints for querying AI models.
- Supports Custom Model Hosting and management.
Redis
- Caching layer for token storage and user sessions (1-hour TTL)
- Used for quick lookups and reducing database load
MongoDB
- Primary database for user data, orders details and premium tokens
- Stores persistent data with high availability
Qdrant (VectorDB)
- Specialized database for storing and querying vector embeddings
- Supports efficient similarity search and retrieval
Kafka (Event Streaming)
- Used to handle order and delivery events
- Ensures decoupled communication between services
S3 (AWS)
- Object storage for invoices and other files
- Provides scalable storage with presigned URLs for secure access
Cloudflare AI Workers
- Provides AI capabilities for text/image generation
- Integrates with the application for enhanced AI features
13. Load Testing
- Load testing is performed using k6 to ensure the application can handle high traffic.
- Simulates user interactions and measures performance metrics.
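A minimal k6 script along these lines could drive the /api/ask endpoint. The URL, payload, and thresholds are assumptions for this sketch; the script runs under the k6 runtime (`k6 run script.js`), not Node:

```javascript
import http from "k6/http";
import { check, sleep } from "k6";

// Illustrative k6 load test: 20 virtual users for 30 seconds against
// /api/ask. Endpoint, payload, and thresholds are assumptions.
export const options = {
  vus: 20,
  duration: "30s",
  thresholds: { http_req_duration: ["p(95)<2000"] },
};

export default function () {
  const res = http.post(
    "http://localhost:8080/api/ask",
    JSON.stringify({ prompt: "hello", token: "free-token" }),
    { headers: { "Content-Type": "application/json" } }
  );
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```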
AI Models Overview
Top Models
| Model | Size (Quantized) | RAM (Min) | GPU (Optional) | Notes |
|---|---|---|---|---|
| TinyLlama | ~1.1 GB | 4 GB | None / 2 GB+ | Lightweight |
| Mistral-7B | ~4.2 GB | 8–16 GB | 8 GB+ VRAM | Powerful general-purpose |
| CodeLlama | 4.5–10 GB | 16–24 GB | 8–16 GB+ VRAM | Code-optimized |
| LLaMA 2 | 4.5–40 GB | 16–80 GB | 8–64 GB+ VRAM | Versatile but resource-heavy |
| Phi-2 | ~1.7 GB | 6–8 GB | 4 GB+ VRAM | Efficient and compact |
Mini Models
| Model Name | Approx. Size | RAM Required | Description |
|---|---|---|---|
| TinyLlama (1.1B) | ~1.1 GB | 2–3 GB | Extremely lightweight; suitable for simple QA/chat |
| Phi-1.5 / Phi-2 | ~1.5–1.7 GB | 3–4 GB | Compact model from Microsoft optimized for reasoning |
| Gemma-2B (Google) | ~2.1 GB (quantized) | ~4 GB | Lightweight open-source model focused on chat |
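The sizes above roughly follow a rule of thumb: model size ≈ parameter count × bits per weight / 8. A back-of-the-envelope sketch (real quantized files add metadata and overhead, so treat it as a lower bound):

```javascript
// Back-of-the-envelope model size: parameters (in billions) times bytes
// per weight. Real quantized files add overhead, so this is a lower bound.
function estimateSizeGB(paramsBillions, bitsPerWeight) {
  return (paramsBillions * bitsPerWeight) / 8;
}

// e.g. a 7B model at 4-bit quantization comes out to 3.5 GB, close to
// Mistral-7B's ~4.2 GB file once overhead is included.
console.log(estimateSizeGB(7, 4)); // 3.5
```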
Author
Ahmad Raza
Sr. DevOps Engineer | Cloud Infra Specialist
ahmadraza.in
linkedin.com/in/ahmad-raza-devops
For more, visit ahmadraza.in.
Detailed commands, manifests, and guides are available on my blog.