Summary of How Shopify runs their biggest business event of the year with GKE
The video discusses how Shopify manages its largest business event, Black Friday Cyber Monday (BFCM), using Google Kubernetes Engine (GKE). The presenters detail the strategies and technologies employed to ensure performance, scalability, and availability during peak traffic times.
Main Strategies and Technologies:
- Preparation for High Traffic Events: Shopify prepares months in advance for BFCM by conducting large-scale tests to stress-test their infrastructure and ensure it can handle expected traffic levels.
- Use of Real-Time Data Visualization: Shopify uses a 3D visualization tool called the BFCM map to track sales data in real time, which supports both engagement and operational oversight.
- Infrastructure Scalability: Shopify employs GKE to dynamically manage compute resources across multiple regions, optimizing for latency and cost-effectiveness.
- Custom Compute Classes: The introduction of custom compute classes allows Shopify to manage workloads flexibly, automatically reallocating resources based on availability and performance needs.
- Autoscaling and Resource Management: Autoscaling matches resource allocation to demand, so Shopify can avoid overprovisioning and keep costs under control.
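The fallback behavior described for custom compute classes can be illustrated with a small model. This is a minimal sketch of priority-ordered selection, not GKE's API: `CapacityOption`, `pick_capacity`, the option names, and the node counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapacityOption:
    # Hypothetical label for a machine type / zone combination.
    name: str
    # Nodes currently obtainable from this option (made-up numbers below).
    available_nodes: int


def pick_capacity(
    priorities: list[CapacityOption], nodes_needed: int
) -> Optional[CapacityOption]:
    """Return the first option, in priority order, that can satisfy the request."""
    for option in priorities:
        if option.available_nodes >= nodes_needed:
            return option
    # No option in the list can satisfy the request.
    return None


# Ordered from most preferred to least preferred.
priorities = [
    CapacityOption("preferred-machine-primary-zone", available_nodes=0),
    CapacityOption("preferred-machine-alternate-zone", available_nodes=12),
    CapacityOption("fallback-machine-any-zone", available_nodes=50),
]

choice = pick_capacity(priorities, nodes_needed=10)
print(choice.name if choice else "no capacity")
# prints "preferred-machine-alternate-zone"
```

The ordering of the list carries the policy: the workload lands on the most preferred option that currently has room and spills over to later entries only when earlier ones are exhausted.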
Methodology/Step-by-Step Guide:
- Preparation Steps for BFCM:
  - Conduct large-scale stress tests on infrastructure months in advance.
  - Monitor real-time sales data through the BFCM map.
- Infrastructure Management:
  - Use GKE to manage compute resources across different geographical regions.
  - Implement custom compute classes to allow workloads to spill over into alternate zones when primary resources are unavailable.
- Resource Allocation:
  - Create multiple node pools with different machine types to optimize for performance and cost.
  - Automate the migration of applications between clusters to manage resource availability dynamically.
- Utilizing New GKE Features:
  - Leverage the new compute class capabilities to define fallback priorities for workloads.
  - Enable auto-reconciliation for workloads to automatically revert to higher-priority resources as they become available.
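The auto-reconciliation step can be sketched as a loop that moves a workload back up its priority list whenever a more preferred option regains capacity. This is a simplified model under stated assumptions, not how GKE implements the feature: the `reconcile` function, the rank numbering, and the availability snapshots are hypothetical.

```python
def reconcile(current_rank: int, availability: list[bool]) -> int:
    """Return the rank the workload should occupy after one reconciliation pass.

    availability[i] is True when the option at priority rank i has capacity,
    where rank 0 is the most preferred. The workload only ever moves to a
    strictly more preferred rank; otherwise it stays where it is.
    """
    for rank in range(current_rank):
        if availability[rank]:
            return rank
    return current_rank


# Hypothetical capacity snapshots over time for three ranked options.
snapshots = [
    [False, False, True],  # only the last-resort option has capacity
    [False, True, True],   # the second choice frees up
    [True, True, True],    # the first choice frees up
]

rank = 2  # the workload starts on the last-resort option
history = []
for availability in snapshots:
    rank = reconcile(rank, availability)
    history.append(rank)

print(history)  # [2, 1, 0]
```

In this model the workload never moves to a less preferred option during reconciliation; falling back is handled separately, at scheduling time.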
Presenters:
- Justin (Engineer, Infrastructure Group at Shopify)
- Jeremy (Product Manager, GKE Infrastructure Autoscaling)
- Victor Salv (Product Manager, GKE)
- Roman (Product Manager, GKE Team)
Category
Business and Finance