Pixis: An AI-driven Data Analytics Platform
Category: Advertising and marketing
Services: DevOps, Migration, Cloud Architecture Design and Review, Managed Engineering Teams.
- 25% reduction in ETL infrastructure costs
- 30% improvement in data processing efficiency
- 40% reduction in data latency for real-time processing
About Pixis
Pixis.ai helps brands scale marketing through data infrastructure and modeling, enabling data-driven decision-making in the face of complex consumer behavior. It offers a codeless AI platform that leverages machine learning and optimization algorithms to analyze audiences, media platforms, budget allocation, and other marketing data.
Problem statement
- The platform ingested data from multiple sources, which made it challenging to make all of the information available in centralized storage.
- Pixis was also dealing with slow development cycles and lacked a standardized process for tracking infrastructure costs.
- High costs were affecting the company’s profitability and revenue figures.
- The data infrastructure and background tasks had scalability issues, and several services still needed to be migrated to new infrastructure.
Proposed solution
- Our team leveraged AWS MSK to consolidate data from different systems into a centralized streaming data pipeline.
- MSK also helps capture data in real time from different sources and load it into a data lake in Amazon S3, reducing data silos.
- Used Amazon Managed Service for Apache Flink, consuming streams from Amazon MSK, to perform real-time and batch processing. This enabled Pixis to transform and analyze data effectively before loading it into the data warehouse.
- Our team of experts used Amazon GuardDuty to secure the entire data lake infrastructure by continuously monitoring for unauthorized data access.
- Used Amazon EKS to containerize the data analytics platform and simplify its deployment, management, and scaling.
- Leveraged Amazon S3 for storing processed data in the data lake. This cost-effective storage solution provided scalability and flexibility for handling large volumes of data generated in the ETL process.
- Set up an observability stack using Grafana, Prometheus, and CloudWatch for monitoring, alerting, and logging.
- We used Amazon RDS to store data such as campaign data, ad accounts, cross-platform engagement scores, ML datasets, and tenant data.
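Real-time processing of the kind described in the Flink step usually groups events into time windows before aggregating them. The following framework-free Python sketch shows a one-minute tumbling window that counts events per campaign. It is not Pixis's Flink job and does not use the Flink API: the event shape (a campaign ID plus an epoch-second timestamp) and the function name `tumbling_window_counts` are hypothetical, and watermarks and late events, which Flink handles, are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (campaign_id, event_time_in_epoch_seconds)
Event = Tuple[str, int]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, campaign_id) in fixed, non-overlapping windows.

    Each event belongs to exactly one window, the one starting at
    ts - (ts % window_seconds). Watermarks and late data are not modeled.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for campaign_id, ts in events:
        window_start = ts - (ts % window_seconds)
        counts[(window_start, campaign_id)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample: List[Event] = [
        ("camp-a", 0),
        ("camp-a", 59),
        ("camp-b", 30),
        ("camp-a", 60),  # falls into the next one-minute window
    ]
    for (start, campaign), n in sorted(tumbling_window_counts(sample).items()):
        print(start, campaign, n)
```

Because a tumbling window assigns every event to exactly one window, each window's results form a discrete batch that is straightforward to write out downstream, for example as objects in the S3 data lake.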
Outcome
- Optimizing data storage and resource allocation led to a 25% reduction in infrastructure costs associated with ETL operations.
- Containerizing the ETL infrastructure on Amazon EKS enabled Pixis to scale its data processing capacity dynamically and accommodate fluctuating workloads.
- Streamlining the ETL process using AWS services resulted in a 30% improvement in data processing efficiency.
- Leveraging AWS MSK for real-time data ingestion and processing reduced data latency by 40%, enabling Pixis to make faster, data-driven marketing decisions.
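Dynamic scaling of containerized workloads on EKS is commonly handled by the Kubernetes Horizontal Pod Autoscaler (HPA). The case study does not say which autoscaling mechanism Pixis used, so treat this as an assumption. The HPA's documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with scaling skipped when the ratio is within a tolerance (0.1 by default) of 1.0. The Python sketch below reproduces that rule plus min/max clamping; the function name `desired_replicas` and its defaults are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the Kubernetes HPA core scaling rule for a single metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped to
    [min_replicas, max_replicas]. Stabilization windows and pod-readiness
    handling in the real HPA are omitted.
    """
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("current_replicas must be >= 1 and target_metric > 0")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling action
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target: scale out to 6
    print(desired_replicas(4, 90, 60))
```

The tolerance band keeps replica counts stable under small metric fluctuations, while the proportional rule lets capacity follow larger swings in workload in either direction.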
Architecture Diagram
AWS Services
- AWS MSK – We used AWS MSK to consolidate real-time data from multiple sources into a centralized streaming pipeline and store it in an S3 data lake, reducing siloed data concerns.
- Amazon RDS – We leveraged Amazon RDS as our primary storage solution for various data types, including campaign data, ad accounts, cross-platform engagement scores, ML datasets, tenant data, and more.
- Amazon S3 – Our experts used Amazon S3 buckets as secure storage for various data in the system, including configuration files and customer data files.
- Amazon EKS – We leveraged Amazon EKS to manage and scale microservices and to run background jobs on containerized infrastructure.
- Amazon MQ – We utilized Amazon MQ to facilitate data ingestion from various sources into a centralized repository.
- AWS Trusted Advisor – Our team used AWS Trusted Advisor to identify overprovisioned resources and improve our security posture.
- AWS CloudTrail – We used AWS CloudTrail to track user activity across the platform, enabling auditing, security monitoring, and operational troubleshooting.
- Amazon ECR – We used Amazon ECR to securely store and scan the Docker images for our microservices.
- AWS Lambda – We used AWS Lambda to trigger and run machine learning pipelines and alerts.
- Redis – We leveraged Redis to store user sessions, which allowed us to retrieve session data easily and provide a better user experience.
- AWS Security Hub – We utilized Security Hub to get a comprehensive view of our security state in AWS and to ensure our environment adhered to industry security standards and best practices.
- AWS NLB – Our team used an AWS Network Load Balancer (NLB) to distribute incoming traffic across multiple targets in different Availability Zones.
- Amazon CloudWatch – We used CloudWatch to monitor AWS services such as RDS and MQ. Custom metrics, dashboards, and alarms helped us respond to issues quickly.
- AWS Secrets Manager – Our team used AWS Secrets Manager to securely store and manage sensitive information for our microservices, including API keys and database credentials.
- Amazon GuardDuty – Our experts used Amazon GuardDuty to monitor and secure the entire data lake infrastructure against unauthorized access.
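Microservices typically read credentials from Secrets Manager at runtime instead of baking them into container images. Below is a minimal sketch, assuming each secret is stored as a JSON string and that a boto3 Secrets Manager client is passed in. The helper `load_secret_json` and the secret name used later are hypothetical; `get_secret_value(SecretId=...)` and its `SecretString` response field are part of the boto3 Secrets Manager API. The in-process cache is an illustrative design choice to avoid an API call on every request.

```python
import json
from typing import Any, Dict, Protocol


class SecretsClient(Protocol):
    """The subset of a boto3 Secrets Manager client this sketch relies on."""

    def get_secret_value(self, *, SecretId: str) -> Dict[str, Any]: ...


# Hypothetical per-process cache: secret_id -> parsed JSON payload
_cache: Dict[str, Dict[str, Any]] = {}


def load_secret_json(client: SecretsClient, secret_id: str) -> Dict[str, Any]:
    """Fetch a JSON secret once per process and return the cached parsed value."""
    if secret_id not in _cache:
        response = client.get_secret_value(SecretId=secret_id)
        if "SecretString" not in response:
            # Binary secrets (SecretBinary) are out of scope for this sketch.
            raise ValueError(f"secret {secret_id!r} has no SecretString")
        _cache[secret_id] = json.loads(response["SecretString"])
    return _cache[secret_id]
```

With AWS credentials available, the client would be created with `boto3.client("secretsmanager")` and used as `load_secret_json(client, "example/db-credentials")`; in tests, any object with a matching `get_secret_value` method works. Note that this cache never expires, so a service using it would not pick up rotated credentials until it restarts; a time-based expiry would address that.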