Director of Platform Engineering

Posted 2 Days Ago
Austin, TX
Senior level
Artificial Intelligence • eCommerce • Retail

About Upshop

Upshop is the market leader in Total Store Operation solutions for the Grocery and C-Store markets. We offer an AI-powered SaaS platform connecting Fresh, Center, eCommerce, and DSD department operations to deliver a simplified, smarter, more connected store experience. Customers running Upshop realize significant improvements in sales, shrink, food safety, and sustainability across the entire store. More than 450 retail chain accounts trust our software in over 50,000 stores across 35 countries and 3 continents.

Overview of the role

We are seeking an experienced and strategic Director of Platform Engineering to lead the team responsible for the infrastructure, tools, and processes that power our mission-critical platform. This system is at the heart of food retail operations and plays a vital role in keeping the global food supply chain running smoothly, especially in the US. As Director of Platform Engineering, you will own the platform's operational excellence and keep it scalable, reliable, and cost-efficient, while enabling an exceptional developer experience across the organization.

Requirements 

Technical Expertise: 

  • Extensive experience with cloud hosting platforms (Azure and GCP preferred).
  • Proven expertise in building and managing CI/CD pipelines and developer experience tools.
  • Strong background in observability, monitoring, tracing, and alerting technologies.
  • Deep understanding of SRE practices and deployment strategies (multi-region, rolling, and canary).
  • Proficiency in infrastructure as code (e.g., Terraform, Ansible) and high-availability systems.
  • Experience with cost management and optimization in cloud environments.
  • Experience monitoring and configuring Azure App Service plans, Azure Functions, Cosmos DB, Service Bus, Event Grid, SignalR, Azure Table Storage, and other Azure technologies.
  • Strong experience in networking, network security, and security operations. 

Leadership Skills: 

  • Proven ability to build and lead high-performing technical teams.
  • Exceptional communication and collaboration skills across technical and non-technical audiences.
  • Experience managing incident response processes and fostering a culture of operational excellence. 

Strategic Thinking: 

  • Visionary leadership with the ability to balance long-term strategy with day-to-day operations.
  • Commitment to proactively detecting and resolving system issues before they impact customers.
  • Passion for building robust systems that are critical to societal infrastructure, such as the global food supply chain, especially in the US. 

Preferred Qualifications: 

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Minimum of 10 years of experience in platform engineering, site reliability engineering, or related technical roles, with at least 5 years in leadership positions.
  • Strong experience with Microsoft Azure, serverless computing, containers, event-driven systems (Kafka, Service Bus), and relational and document data stores.
  • Experience in mission-critical systems, especially in industries like retail, logistics, or supply chain.
  • Familiarity with synthetic testing and application performance monitoring tools (e.g., Datadog, New Relic, Prometheus).
  • Track record of successful disaster recovery planning and execution. 

Key Responsibilities 

Infrastructure & Deployment: 

  • Lead the design, implementation, and optimization of tools and processes that accelerate development and deployment while enhancing the developer experience.
  • Drive multi-region deployments, rolling updates, and canary deployments, ensuring minimal disruption to production systems. 

Observability & Monitoring: 

  • Establish and maintain comprehensive observability and monitoring systems to ensure complete visibility into platform health.
  • Develop a “one pane of glass” solution for system-wide health monitoring, enabling proactive issue detection and resolution. 

Site Reliability Engineering: 

  • Champion best practices in SRE to deliver high availability and optimal performance of mission-critical systems. 

Incident Management: 

  • Develop and oversee incident management and escalation processes, coordinating across Platform Engineering and other teams to swiftly address and resolve critical issues. 

Cloud Hosting & Cost Management: 

  • Manage cloud hosting platforms (primarily Azure, with some GCP), foster vendor relationships, and optimize cloud usage and costs.
  • Monitor and manage cloud expenditures, aligning costs with organizational goals while maintaining platform performance and scalability. 

Continuous Monitoring: 

  • Collaborate with QA and Engineering to implement synthetic tests and monitor application metrics for production deployments, ensuring consistent reliability. 

Team Leadership & Security: 

  • Build, mentor, and retain a high-performing Platform Engineering team, fostering a culture of collaboration and continuous improvement.
  • Partner with Security to address code dependency vulnerabilities and with Engineering to manage end-of-life dependencies effectively. 

Disaster Recovery & Best Practices: 

  • Own and continuously improve disaster recovery (DR) plans and processes, ensuring regular testing and validation.
  • Ensure adherence to best practices such as infrastructure as code, high availability configurations, and operational resilience. 

Architecture & Scaling: 

  • Work closely with Engineering teams to design platform architectures that are scalable, reliable, and efficient, meeting current and future needs. 

What We Offer 

  • Competitive salary and benefits package.
  • Opportunity to lead a team responsible for a platform critical to the global food supply chain, especially in the US.
  • A collaborative, innovative environment where your impact will be both strategic and hands-on.
  • The ability to shape and influence the technical direction of a mission-critical system. 

Join Us 

If you are ready to lead a world-class Platform Engineering team and take ownership of a system critical to the global food supply chain, we encourage you to apply and help us drive the future of food supply technology. 


Top Skills

Cloud

The Company
HQ: Tampa, Florida
95 Employees
On-site Workplace
Year Founded: 1989

What We Do

Upshop is the first total store operations platform synchronizing Fresh, Center, eCommerce, and DSD solutions to make retail operations simplified, smart, and more connected.

Upshop has been pioneering total store operations technology for over 30 years, delivering SaaS-based solutions that offer retail store associates a simplified, smarter, more connected experience. The business combined the technology of its leading products, FreshIQ®, ShopperKit, Date Check Pro, and Itasca Retail's Magic Inventory Intelligence, into one synchronized platform that gives retailers the visibility needed to increase sales, cut waste, and streamline labor. More than 150 retail chain accounts trust our software in over 30,000 stores across 9 countries and 3 continents.
