Leveraging the Scalable Power of 3DiVi Omni Platform on Amazon AWS

Release and Its Implications

In August, we unveiled the Omni Platform with support for external AWS base modules and Amazon EKS deployment. This development has significant implications for our customers. As the SaaS model gains traction among IT businesses, the burden and cost of maintaining in-house infrastructure become apparent: substantial hardware investment, dedicated staff, and close control over network bandwidth, fault tolerance, and security. Renting cloud servers from providers such as Google and Amazon can ease these challenges, particularly in the early stages of an IT business's growth.

Challenges with Pilots and Customer Testing

During pilot phases and customer tests, load can be erratic, swinging between complete idleness and peak load several times a day. Predicting these fluctuations and managing hardware capacity and service maintenance by hand is daunting, and renting excess capacity is cost-prohibitive.

Goal

Our aim was to minimize the costs and staff involvement in maintaining a cloud-based facial recognition service, with a focus on hardware load and service upkeep.

Solution

We devised a solution built on the Omni Platform and Image API, both of which support automatic scaling in the Amazon cloud. The system adjusts server capacity as load changes, scaling up during high demand and scaling down when usage drops.
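
To make the scale-up and scale-down behaviour concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the current replica count is multiplied by the ratio of the observed metric to its target and rounded up. The function name, the default bounds, and the sample numbers are illustrative assumptions, not values from our deployment, and the sketch leaves out the tolerance band and stabilization windows that the real autoscaler applies.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Return ceil(current_replicas * current_metric / target_metric),
    clamped to the range [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical readings: average CPU utilization in percent, target 60%.
print(desired_replicas(2, 90, 60))    # load above target: scale up to 3
print(desired_replicas(4, 15, 60))    # load far below target: scale down to 1
print(desired_replicas(5, 200, 50))   # 20 replicas wanted, capped at 10
```

The clamp matters for cost control: the upper bound limits how many pods, and therefore how many nodes, a load spike can bring up.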

Implementation Details

Cluster Installation and Configuration: We isolated the cluster installation and configuration into the Self-Managed Cluster (SMC) module. This separation reduces dependencies and streamlines the configuration process for administrators.
Support for Horizontal Pod Autoscaler: This feature was integrated into all image-api deployments, as well as the platform’s processing and quality deployments. It enables automatic scaling and efficient utilization of cluster resources.
NodeSelector Feature: This allows the deployment of load-intensive services on selected nodes, reducing the need for costly performance nodes.
Ingress Proxy: This enables the use of Image API services within the OMNI Platform distribution.
Key Functionality: We added support for deploying solutions in Amazon EKS and a module for customizing the AWS environment for platform deployment.
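
The Horizontal Pod Autoscaler and NodeSelector items above can be illustrated with a short sketch that builds the two Kubernetes objects as plain Python dictionaries and prints them as JSON. The field names follow the public apps/v1 and autoscaling/v2 schemas. Everything else is a hypothetical placeholder rather than a value from the Omni Platform distribution: the deployment name, image, node label, resource requests, replica bounds, and CPU target.

```python
import json


def build_deployment(name: str, image: str, node_labels: dict) -> dict:
    """Deployment whose pods are scheduled only on nodes carrying node_labels."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # nodeSelector pins the pods to nodes with matching labels.
                    "nodeSelector": node_labels,
                    "containers": [{
                        "name": name,
                        "image": image,
                        # A CPU request is required for utilization-based scaling.
                        "resources": {"requests": {"cpu": "500m", "memory": "512Mi"}},
                    }],
                },
            },
        },
    }


def build_hpa(target: str, min_replicas: int, max_replicas: int, cpu_percent: int) -> dict:
    """HorizontalPodAutoscaler that scales the named Deployment on average CPU use."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{target}-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": target},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    # All names and numbers below are hypothetical.
    deployment = build_deployment("example-image-api", "example/image-api:latest",
                                  {"workload": "heavy"})
    hpa = build_hpa("example-image-api", min_replicas=1, max_replicas=5, cpu_percent=60)
    print(json.dumps(deployment, indent=2))
    print(json.dumps(hpa, indent=2))
```

Keeping the node label on the deployment and the scaling bounds on the autoscaler separates two decisions: where a load-intensive service may run, and how many copies of it may exist.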

First Trial and Error Correction

In our initial trial, client feedback revealed that the monthly cost of virtual machine maintenance had risen from $280 to $1138. Investigating, we found that various services had been wrongly grouped as resource-intensive, and we adjusted aws.eks.cluster.yaml accordingly. With limited direct support from AWS, we relied on forums and documentation to choose suitable parameters, such as instance type and resource limits.

Results

After optimization, the total cost came to $480 per month, compared with $280 for conventional virtualization. For that price we also gained easier scaling under load and high system availability (including backup, fault tolerance, and recovery). CloudWatch, a built-in AWS tool, added logging and monitoring. Unlike single-platform virtualization, AWS allows multiple platform deployments on the same hardware.
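
The three monthly figures quoted in this article ($280, $1138, and $480) can be put side by side with a few lines of arithmetic. The sketch below only restates those numbers as percentage changes. The constant and function names are illustrative.

```python
BASELINE_USD = 280      # conventional virtualization, per month
FIRST_TRIAL_USD = 1138  # first AWS trial, before tuning
OPTIMIZED_USD = 480     # AWS after tuning the cluster configuration


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100


print(f"First trial vs. baseline: {pct_change(BASELINE_USD, FIRST_TRIAL_USD):+.1f}%")
print(f"Optimized vs. baseline:   {pct_change(BASELINE_USD, OPTIMIZED_USD):+.1f}%")
print(f"Optimized vs. first trial: {pct_change(FIRST_TRIAL_USD, OPTIMIZED_USD):+.1f}%")
```

On these figures, tuning cut the first-trial bill by a little under 58%, leaving the optimized setup about 71% above the conventional baseline.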

Moving Forward

The insights gained from our first real-world application have been integrated into our system scripts and documentation. We are now exploring how to carry this experience over to Google Compute Engine (GCE), and we have deployed our own cloud on AWS to investigate cost-reduction strategies and system flexibility, including the ability to stop and restart systems as needed.
