
Google Cloud Platform AI Platform Architect - Remote - CG
Role: Google Cloud Platform AI Platform Architect
Locations: 100% Remote
Duration: 12+ Months Contract
Job Description:
- We are building a new team of platform specialists to support and enhance high-performance AI services.
- These are highly technical, hands-on roles focused on customer, application, and platform support of AI-focused workloads.
- As AI Platform Specialists, team members will provide application and GPU support.
- The team will deliver Tier 1 and Tier 2 support to developers and engineers while collaborating closely with Tier 3 and 4 platform teams and vendors for issue resolution.
- The roles require user knowledge of Kubernetes, virtualization, and cloud-native technologies as well as operator knowledge of GPUs and other AI supporting services.
- Each specialist should have a focus on customer service along with goals of reliability, scalability, and performance.
Key Responsibilities:
Platform Support & Incident Response:
- Provide Tier 1 & Tier 2 support for AI-driven applications and workloads.
- Troubleshoot and resolve issues related to Kubernetes deployments, GPU utilization, and service performance.
- Collaborate with Tier 3+ teams, including Kubernetes engineers and external vendors, to escalate and resolve complex issues.
Kubernetes & Cloud-Native Operations:
- Full adoption, creation, and integration into automated services using Helm, Ansible, Terraform, etc.
- Deploy, manage, and support containerized AI workloads on Google Anthos-powered Kubernetes clusters.
- Ensure adherence to pod security policies, automated rollouts/rollbacks, and best practices for scalable and secure Kubernetes environments.
GPU Infrastructure & AI Services Management:
- Optimize and support GPU-enabled workloads including CUDA and other AI acceleration frameworks.
- Assist in the installation, configuration, and support of AI coding assistants (e.g., Codeium).
Observability & Documentation:
- Maintain detailed operational documentation, runbooks, and troubleshooting guides.
- Use monitoring/logging tools like New Relic, BigPanda, Prometheus, Grafana, and other observability frameworks.
Process Improvement & Collaboration:
- Work cross-functionally with developers, IT teams, and vendors to ensure seamless deployment and support of AI services.
- Contribute to CI/CD pipelines, automation, service, and security best practices.
- Track and communicate work through task management platforms (ServiceNow and Jira).
Required Skills & Experience:
- Hybrid Cloud: In-depth knowledge of private (on-premises) and public (Google Cloud Platform & AWS) cloud architectures and services.
- AI/ML Software: Developer experience with DevOps practices (Git, Jenkins, etc.) as well as working with AI/ML engineers and data scientists.
- AI/ML Hardware: Experience deploying, supporting, and optimizing on-premises and cloud GPU-enabled (NVIDIA & AMD) infrastructure (VMs & containers).
- Kubernetes Expertise: Hands-on experience with deploying and managing containerized workloads in Kubernetes.
- Technical Support & Troubleshooting: Proven ability to diagnose and resolve customer and platform issues in production environments.
- Strong Communication & Documentation: Ability to clearly document procedures, write knowledge base articles, and collaborate with customers and teams.
- Time Management & Accountability: Ability to work independently, prioritize tasks, and manage workload effectively.
Preferred Qualifications:
- Experience with GPU orchestration tools like Run:AI, NVIDIA AI Enterprise, VMware Private AI Foundation, etc.
- Exposure to AI coding assistants like Codeium, Copilot, or Tabnine.
- Proficient in development tools like Python, PyTorch, TensorFlow, Jupyter Notebooks, etc.
About the Team & Reporting Structure:
- These positions will report to the Senior AI Architect and work as peers within a specialized AI support team.
- Collaboration with internal VM and container support teams as well as NVIDIA, Codeium, and other vendor specialists will be essential for supporting customers, troubleshooting, and optimizing AI workloads.