REMOTE - API Production Support Specialist
Location: REMOTE
Position Type: Multiyear Contract
Requirements
MUST be available to work in a 24x7, Level 2 API support and incident response service team
ON CALL Required
Expertise in MuleSoft API troubleshooting and support
Experience using monitoring tools for API management like Azure Monitor, Splunk and Dynatrace
Familiarity with ServiceNow tools for incident tracking and documentation
Ability to use enterprise runbooks and wiki documentation for issue resolution
Ability to collaborate with multiple internal and external stakeholders, including the Tier 3 team and Support Lead
Java background preferred, to read stack traces and logs and pinpoint root cause
Experience with SOAP/REST APIs with Spring Boot and Java microservices
Experience with MuleSoft Anypoint Platform, including Exchange and monitoring
Use Azure, Splunk and Dynatrace-based dashboards for monitoring and resolution
Conduct root cause analysis, escalate issues to internal Tier 3 team as necessary, and engage multiple vendors for resolution when required
Use enterprise runbooks and wiki documentation, and collaborate with the Tier 3 team or Support Lead
Provide 24x7 on-call support as a primary or secondary contact (rotation basis)
Serve as API support on at least one major incident call per day, averaging 2 hours
Handle API-related incidents through ServiceNow, based on Moogsoft tickets
Troubleshoot and resolve issues within L2 incident criteria
Ensure timely response and resolution of API-related incidents per agreed SLAs
Perform initial triage, log analysis, and impact assessment
Ensure monitoring and alerts are accurate, current, and functional
Utilize enterprise runbooks and wiki documentation for troubleshooting and resolution
Participate in Problem and Knowledge Management processes as requested
Provide observability support for incident management to proactively identify, diagnose, and resolve issues
Conduct detailed RCA (Root Cause Analysis) for recurring or high-impact incidents
Provide RCA reports with contributing factors, corrective actions, and long-term recommendations
Work with internal teams to implement preventative measures
Collaborate with the Tier 3 team or Support Lead when necessary to resolve complex issues
Maintain documentation of escalations, including logs, timestamps and resolution progress
After RCA, determine and contact relevant vendors required for issue resolution
Provide necessary logs, issue descriptions, and troubleshooting details to vendors
Track vendor resolution progress, coordinate efforts, and update stakeholders