TraceMyPods: DevOps-First Microservice AI Platform
TraceMyPods is a Kubernetes-native, DevOps-focused, token-gated platform for chatting with AI models served by Ollama. It is built from multiple microservices connected by a secure Istio service mesh, uses GPU acceleration, includes a robust monitoring stack, and is fully packaged with Terraform and Helm for deployment on AWS EKS. It provides a seamless user experience for AI interactions, token management, and advanced analytics.
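The source does not describe how tokens are checked. The sketch below shows one way a token gate could work. It assumes that each token unlocks one model, expires at a fixed time, and carries a request quota. TokenStore, TokenRecord, and the model names are hypothetical, and an in-memory dict stands in for the Redis/MongoDB backends listed in the stack.

```python
import time
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenRecord:
    # Hypothetical fields: the model a token unlocks, its expiry (Unix time),
    # and how many chat requests it still allows.
    model: str
    expires_at: float
    remaining: int


class TokenStore:
    """In-memory stand-in for the real token backend (hypothetical)."""

    def __init__(self) -> None:
        self._tokens: Dict[str, TokenRecord] = {}

    def issue(self, token: str, model: str, ttl_seconds: float, quota: int) -> None:
        """Register a token that expires after ttl_seconds and allows `quota` requests."""
        self._tokens[token] = TokenRecord(model, time.time() + ttl_seconds, quota)

    def authorize(self, token: str, model: str) -> bool:
        """Return True and consume one request if `token` may call `model`."""
        record = self._tokens.get(token)
        if record is None or record.model != model:
            return False
        if time.time() >= record.expires_at or record.remaining <= 0:
            return False
        record.remaining -= 1
        return True


store = TokenStore()
store.issue("tok-demo", model="llama3", ttl_seconds=3600, quota=2)
print(store.authorize("tok-demo", "llama3"))   # True
print(store.authorize("tok-demo", "llama3"))   # True
print(store.authorize("tok-demo", "llama3"))   # False: quota used up
print(store.authorize("tok-demo", "mistral"))  # False: token is for another model
```

Checking the model and expiry before decrementing means a rejected request never consumes quota.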
Visit Site: https://tracemypods.ahmadraza.in
Demo Video:
InfraStack
Istio Service Mesh | Jaeger | AWS EKS | GPU Nodes | Grafana | Loki | Prometheus | Helm | Terraform | Trivy | GitHub Actions | Falco (Security) | Kyverno (Pod Security Policies) | Kube-Hunter | Kube-Bench (CIS Benchmark)
Kubernetes Stack
IstioGateway/VirtualService | Ingress | HPA | VPA | Pod Disruption Budget | Network Policy | Resource Quotas | SecurityContext | PV, Storage Class, PVC, EBS | PodSecurityPolicies | RBAC | ConfigMaps / Secrets | Taints & Tolerations | Readiness & Liveness Probes | Deployments / CronJobs | Affinity / Anti-Affinity | Kiali | Kube-Hunter | Minikube (local)
Application Stack
Kafka | Redis | MongoDB | Ollama (AI) | Node.js | Python | Go | VectorDB (Qdrant) | Microservices | Payments (Razorpay) | Invoice/Reports | Email | Postman Collection | Load Testing | OpenTelemetry | Gemini AI | S3 (File Browser) | Image Generation API
UI Previews
Landing Page
- Landing page where users can explore the platform features, chat with AI models, and access various services.
Chat with Premium Models (Chat Box)
- Chat interface for interacting with premium AI models served by Ollama.
Free AI Assistant (Chat Box)
- Chat interface for interacting with free AI models.
Image Generator API
- Image generation API that creates images from text prompts, integrated with Cloudflare Workers AI.
Platform Features
- Overview of the platform's capabilities, including AI interactions, token management, and advanced analytics.
Purchase API (Model Selection)
- Model purchase interface where users select a model and proceed to payment.
Payment Success
- Confirmation page shown after a successful transaction and model activation.
Razorpay Checkout (Card Details)
- Page for securely entering card details.
Razorpay Checkout (Confirm Payment)
- Page for reviewing and confirming payment details.
Razorpay Checkout (Payment Successful)
- Page confirming that the transaction is complete.
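The screens above end with Razorpay reporting success to the browser. For Standard Checkout, Razorpay's documentation describes a server-side check: compute HMAC-SHA256 over "order_id|razorpay_payment_id", keyed with the API key secret, and compare the result with the razorpay_signature returned by checkout. The stdlib sketch below shows that check. The order ID, payment ID, and secret are placeholders, and it is not the project's own handler.

```python
import hashlib
import hmac


def verify_razorpay_signature(order_id: str, payment_id: str,
                              signature: str, key_secret: str) -> bool:
    """Check the signature Razorpay Checkout returns after a payment."""
    message = f"{order_id}|{payment_id}".encode()
    expected = hmac.new(key_secret.encode(), message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


# Placeholder values; a real handler would read these from the checkout callback,
# and the secret from a secret store rather than source code.
secret = "test_key_secret"
good_sig = hmac.new(secret.encode(), b"order_123|pay_456", hashlib.sha256).hexdigest()
print(verify_razorpay_signature("order_123", "pay_456", good_sig, secret))  # True
print(verify_razorpay_signature("order_123", "pay_999", good_sig, secret))  # False
```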
Purchase Invoice (S3 Bucket)
- Preview of the invoice generated after a successful purchase. The invoice is emailed to the user and stored in an S3 bucket.
Purchase Successful Email
- Preview of the email sent to users after a successful purchase, containing the invoice, token details, and confirmation.
Admin/Analytics Dashboard
Admin dashboard for viewing purchase history, managing free and paid tokens, tracking earnings, and exporting business purchase data.
- View and search invoices stored in the S3 bucket.
- View order history and earnings, and export business purchase data, including active and free tokens.
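The source does not describe the export format. The hypothetical sketch below exports purchase records to CSV and totals earnings from paid tokens. The Purchase fields are invented for illustration and may not match the platform's real records.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class Purchase:
    # Hypothetical schema; the platform's real purchase records may differ.
    invoice_id: str
    email: str
    model: str
    amount: float
    token_type: str  # "paid" or "free"
    active: bool


def export_purchases_csv(purchases: List[Purchase]) -> str:
    """Serialize purchase records to CSV text, one row per purchase."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Purchase)])
    writer.writeheader()
    for purchase in purchases:
        writer.writerow(asdict(purchase))
    return buffer.getvalue()


def total_earnings(purchases: List[Purchase]) -> float:
    """Sum revenue from paid tokens; free tokens contribute nothing."""
    return sum(p.amount for p in purchases if p.token_type == "paid")


orders = [
    Purchase("INV-001", "alice@example.com", "llama3", 499.0, "paid", True),
    Purchase("INV-002", "bob@example.com", "llama3", 0.0, "free", False),
]
print(export_purchases_csv(orders))
print(total_earnings(orders))
```

csv.DictWriter keeps the column order tied to the dataclass definition, so adding a field changes the header and every row together.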
OtelAPI and Otel Dashboard
- OtelAPI for code-level tracing and debugging of AI integrations, providing insight into application performance.
Kiali Observability
- Visualizes the service mesh topology, traffic flow, and health status of the microservices.
Prometheus Metrics in Grafana
- Visualizes metrics collected from the Kubernetes cluster and applications.
Logs from Loki
- View and search logs from the microservices using Loki.
Jaeger Tracing (OTel + GenAI)
- Distributed tracing for the microservices using Jaeger.
Trivy Vulnerability Scanning
- Scans container images and Kubernetes clusters for vulnerabilities using Trivy.
Falco Runtime Security Monitoring
- Real-time detection of security threats in running containers using Falco.
Kube-Hunter Vulnerability Scanning
- Active reconnaissance tool that probes Kubernetes clusters for potential security issues.
Kyverno Pod Security Policy Enforcement
- Enforces security policies for Kubernetes pods using Kyverno.
ArgoCD
- GitOps deployment of TraceMyPods and Kafka to Kubernetes.
CloudWatch GPU Monitoring
- Monitors GPU utilization and performance metrics using CloudWatch.
HashiCorp Vault
- Secrets management and data protection for sensitive information.
Infra Architecture
EKS App Architecture
This architecture diagram shows how TraceMyPods is deployed on AWS EKS, including the Istio service mesh, the GPU nodes, and the communication and workflows between the microservices. It highlights Istio for secure service-to-service communication, Jaeger for distributed tracing, and the overall infrastructure, including the monitoring and security components.
Key Components:
- EKS Cluster: The Kubernetes cluster where TraceMyPods is deployed, configured with GPU nodes for AI workloads.
- Istio Gateway: Manages ingress traffic, routes requests to the different APIs, and load-balances them. | IstioGateway.yaml
- Istio Service Mesh: Enables secure, observable communication between microservices, with mutual TLS, traffic management, and telemetry. | Istio
- Kiali: Visualizes and manages the service mesh.
- Jaeger: Displays distributed traces collected through the OtelAPI, helping identify performance bottlenecks and follow requests across services, including AI integrations. | Jaeger
- Kubecost: Real-time cost monitoring and optimization (in progress).
- ALB Ingress Controller: Routes external traffic through an AWS Application Load Balancer. | Ingress.tf
- HPA & VPA: Horizontal and Vertical Pod Autoscalers that scale workloads dynamically based on resource usage.
- PodDisruptionBudget: Maintains application availability during node updates and other voluntary disruptions.
- Resource Quotas: Enforce resource limits per namespace.
- SecurityContext: Applies security settings at the pod and container level.
- ConfigMaps & Secrets: Manage configuration and sensitive data.
- Taints & Tolerations: Control which pods can be scheduled on specific nodes.
- Readiness & Liveness Probes: Ensure pod health and availability.
- NetworkPolicy: Restricts and controls pod-to-pod communication.
- Affinity & Anti-Affinity: Optimize pod placement for reliability and performance.
- Persistent Storage: Uses a StorageClass to dynamically provision EBS volumes for PVCs.
- IRSA: IAM Roles for Service Accounts, giving pods secure access to AWS services (e.g., S3).
- GPU Nodes: Run AI workloads on NVIDIA T4 GPUs (g4dn.xlarge). GPU workloads use this toleration to schedule onto the GPU nodes:
      tolerations:
        - key: "gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
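The GPU toleration above only matters if the GPU nodes carry a matching taint. The sketch assumes that taint is gpu=true:NoSchedule, since the taint itself is not shown. It is a simplified version of the scheduler's matching rules: a toleration matches when the effects agree (an empty effect matches any) and either the operator is Exists or the key and value are equal. It ignores tolerationSeconds and NoExecute eviction.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes default when operator is omitted
    value: str = ""
    effect: str = ""  # empty effect tolerates every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes toleration-to-taint matching."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Exists ignores the value; an empty key with Exists tolerates every taint.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod may land on a node only if every hard taint is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


gpu_taint = Taint(key="gpu", value="true", effect="NoSchedule")  # assumed node taint
gpu_toleration = Toleration(key="gpu", operator="Equal", value="true", effect="NoSchedule")
print(schedulable([gpu_toleration], [gpu_taint]))  # True
print(schedulable([], [gpu_taint]))                # False
```

A toleration only permits scheduling onto a tainted node. It does not attract pods there, so GPU workloads typically also need a nodeSelector or node affinity to land on the GPU nodes.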
AWS Architecture
This architecture diagram shows the AWS infrastructure for TraceMyPods, including EKS, a VPC with public and private subnets, a NAT Gateway, S3, SES, worker nodes, and other AWS services. It highlights the use of Terraform for Infrastructure as Code (IaC) to provision and manage these resources.
IaC Stack | TF
Terraform provisions the AWS resources for TraceMyPods, including EKS, the VPC, IAM roles, and S3 buckets, using reusable modules for each infrastructure component.
Terraform modules include:
- EKS Cluster with GPU nodes
- VPC with Public and Private Subnets
- IAM roles for EKS and IRSA
- S3 bucket for file storage
- NAT Gateway for internet access in private subnets
- Cloudflare for DNS and CDN
- Security Groups for network access control
- ALB Ingress Controller for traffic routing
- Global Accelerator for improved application availability and performance
- Helm installation of Istio, Kiali, Prometheus, Loki, Grafana, Falco, and Kafka
Monitoring Stack
This stack includes Prometheus for metrics collection, Loki for log aggregation, and Grafana for visualization, providing comprehensive monitoring and alerting for the TraceMyPods application. | Monitoring
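Prometheus scrapes each service over HTTP, typically at a /metrics path, and expects plain text in the Prometheus exposition format. In practice a client library such as prometheus_client (for Python) produces this output. The dependency-free sketch below only shows what the scraped text looks like for one counter. The metric name and labels are hypothetical.

```python
import math
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]


def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_value(value: float) -> str:
    """Render a sample value without losing precision on large counters."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "+Inf" if value > 0 else "-Inf"
    return str(int(value)) if float(value).is_integer() else repr(float(value))


def render_counter(name: str, help_text: str, samples: Dict[LabelSet, float]) -> str:
    """Render one counter metric family as Prometheus exposition text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            label_str = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels)
            lines.append(f"{name}{{{label_str}}} {format_value(value)}")
        else:
            lines.append(f"{name} {format_value(value)}")
    return "\n".join(lines) + "\n"


print(render_counter(
    "tracemypods_chat_requests_total",  # hypothetical metric name
    "Chat requests handled, by model and outcome.",
    {(("model", "llama3"), ("status", "ok")): 42, (("model", "llama3"), ("status", "error")): 1},
))
```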
Security Stack
This stack includes Kube-Hunter for vulnerability scanning, Falco for runtime security monitoring, Kyverno for pod security policy enforcement, and Trivy for container image vulnerability scanning and code scanning. Together they help keep the TraceMyPods application secure and aligned with best practices.
Tools used:
- Kube-Hunter for vulnerability scanning | docs
- Falco for runtime security monitoring | docs
- Kyverno for PodSecurityPolicy enforcement | docs
- GitHub Actions CI that runs Trivy for container image vulnerability scanning and code scanning | docs | CI | Code Scanning
- Kube-Bench for CIS Benchmark compliance | docs
- Vault and Kubernetes Secrets for sensitive data management | docs
- Network Policies for pod communication control | NP
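Trivy can write its findings as JSON (for example with trivy image --format json --output report.json). The report lists scan targets under "Results", each with an optional "Vulnerabilities" array whose entries carry a "Severity". As a sketch, a CI step could count findings per severity and decide whether to fail the build. The report below is a made-up sample, not real scan output.

```python
import json
from collections import Counter
from typing import Iterable


def count_severities(report_json: str) -> Counter:
    """Count vulnerabilities per severity across all scan targets in a Trivy JSON report."""
    report = json.loads(report_json)
    counts: Counter = Counter()
    # "Results" and "Vulnerabilities" can be missing or null when nothing is found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_block(counts: Counter, blocking: Iterable[str] = ("CRITICAL",)) -> bool:
    """Return True if any finding has a severity the pipeline treats as blocking."""
    return any(counts[level] > 0 for level in blocking)


sample = json.dumps({  # made-up sample report
    "Results": [
        {"Target": "app (debian 12)", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
            {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"},
        ]},
        {"Target": "package-lock.json", "Vulnerabilities": None},
    ]
})
counts = count_severities(sample)
print(dict(counts))
print(should_block(counts))
```

Trivy can also enforce this directly with --exit-code 1 --severity CRITICAL, which is usually simpler in a GitHub Actions step. A script like this helps when you also want the counts for a report.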
Deployment Strategy
This stack includes ArgoCD for GitOps, Helm for package management, and Terraform for Infrastructure as Code (IaC). It supports canary, rolling, and blue-green deployments, giving the TraceMyPods application a robust deployment strategy.
Tools used:
- ArgoCD for GitOps | docs
- Helm for package management | HELM
- IAC with Terraform | TF
- Canary, Rolling, and Blue-Green deployments in YAML files
- PriorityClass for pod priority scheduling | PriorityClass
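For rolling deployments, the two settings that bound a rollout are maxSurge and maxUnavailable. The Kubernetes documentation states that a percentage maxSurge is rounded up and a percentage maxUnavailable is rounded down. The sketch below reproduces that arithmetic. The fallback that allows one unavailable pod when both resolve to zero mirrors the Deployment controller source as we understand it.

```python
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def scale(value: IntOrPercent, desired: int, round_up: bool) -> int:
    """Resolve an absolute count or a percentage string like "25%" against `desired`."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * desired
        # Integer arithmetic avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def resolve_fenceposts(max_surge: IntOrPercent, max_unavailable: IntOrPercent,
                       desired: int) -> Tuple[int, int]:
    """Return (surge, unavailable) as a RollingUpdate Deployment resolves them."""
    surge = scale(max_surge, desired, round_up=True)
    unavailable = scale(max_unavailable, desired, round_up=False)
    if surge == 0 and unavailable == 0:
        # Rounding can zero out both; the controller then allows one unavailable pod.
        unavailable = 1
    return surge, unavailable


replicas = 10
surge, unavailable = resolve_fenceposts("25%", "25%", replicas)  # Kubernetes defaults
print(f"at most {replicas + surge} pods, at least {replicas - unavailable} available")
```

With the default 25% values and 10 replicas, the rollout may run up to 13 pods while keeping at least 8 available.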
DevOps Stack
This stack covers the core tools used to run and observe TraceMyPods, such as Istio for the service mesh, Jaeger for distributed tracing, Prometheus for metrics collection, Loki for log aggregation, and Grafana for visualization.
Istio Service Mesh & Istio Gateway
- Service mesh for secure, observable microservices | docs
- Istio Gateway for traffic routing | IstioConfig.yaml
- Kiali for service mesh observability
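With Istio, a canary release is usually expressed as weighted routes in a VirtualService. For example, 90% of traffic could go to a stable subset and 10% to the canary; the subset names v1 and v2 here are hypothetical. In the mesh, the Envoy sidecars make the per-request choice. The sketch below only simulates that weighted split to show how the percentages turn into routing decisions, and it requires the weights to add up to 100.

```python
import bisect
import itertools
import random
from collections import Counter
from typing import Callable, Dict, Optional


def make_weighted_router(weights: Dict[str, int],
                         seed: Optional[int] = None) -> Callable[[], str]:
    """Return a function that picks a destination subset per request, by weight."""
    if sum(weights.values()) != 100 or any(w < 0 for w in weights.values()):
        raise ValueError("route weights must be non-negative and add up to 100")
    subsets = list(weights)
    cumulative = list(itertools.accumulate(weights[s] for s in subsets))
    rng = random.Random(seed)

    def route() -> str:
        # Draw 1..100 and take the first subset whose cumulative weight reaches it.
        ticket = rng.randint(1, 100)
        return subsets[bisect.bisect_left(cumulative, ticket)]

    return route


# Hypothetical canary: 90% of requests to the stable "v1" subset, 10% to "v2".
route = make_weighted_router({"v1": 90, "v2": 10}, seed=7)
print(Counter(route() for _ in range(10_000)))
```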
Jaeger
- Distributed tracing for microservices | Jaeger.yaml
- OtelAPI for code-level tracing and AI integration debugging | otelapi | oteldashboard
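Jaeger can stitch spans from different services into one trace because each hop forwards trace context. OpenTelemetry SDKs use the W3C Trace Context traceparent header by default, and they handle propagation automatically. The stdlib sketch below only shows the header's shape (version, trace-id, parent span id, flags) and how a downstream call keeps the trace-id while minting a new span id. It is not the OtelAPI's actual code, and it accepts only version 00.

```python
import re
import secrets

# version 00 - 16-byte trace-id - 8-byte parent (span) id - 1-byte flags, lowercase hex
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def _random_id(num_bytes: int) -> str:
    """Random lowercase hex ID; all-zero IDs are invalid in W3C Trace Context."""
    while True:
        value = secrets.token_hex(num_bytes)
        if value.strip("0"):
            return value


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace, e.g. in the first service behind the gateway."""
    return f"00-{_random_id(16)}-{_random_id(8)}-{'01' if sampled else '00'}"


def child_traceparent(incoming: str) -> str:
    """Keep the trace-id and flags, but mint a new span id for the outgoing call."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, parent_span_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    return f"00-{trace_id}-{_random_id(8)}-{flags}"


root = new_traceparent()
print(root)
print(child_traceparent(root))
```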
Kube-Hunter
- Cluster security scanning for vulnerabilities | text
Falco
- Runtime security monitoring | text
Kyverno
Trivy
- Container image vulnerability scanning | text
- Cluster security checks | text
- GitHub Actions CI/CD for Trivy code scanning | text
GitHub Actions
- CI pipeline that builds Docker images for x86 and ARM and pushes them to hub.docker.com | CI
Helm
- Kubernetes package management for deployments | Helm
Postman Collection
- Postman collection for API testing and interaction with the TraceMyPods application | LoadTest/postman_collection.js
Load Testing with Grafana k6
- Load testing the application with Grafana k6 to verify performance and scalability | LoadTest/k6_API-test.js
In-Progress Features
- Kubecost for cost monitoring
- Spot Instances for cost optimization
- AWS API Gateway for API management
- EKS RBAC for fine-grained access control
- Litmus for chaos engineering
- EFK stack (Elasticsearch, Fluent Bit, Kibana) or AWS OpenSearch, replacing Loki for logs
Learning Resources
Author
Ahmad Raza
Sr. DevOps Engineer | Cloud Infra Specialist
ahmadraza.in
linkedin.com/in/ahmad-raza-devops
For more, visit ahmadraza.in
Detailed commands, manifests, and guides are available on my blog.