Astreya

Business Analyst IV - Alert Management Lead

3d

US · Full-time · $98,040 – $154,800

About this role

The Business Analyst IV serves as the Alert Management and Observability Standards Lead. The position supports business outcomes through alert governance: it defines alerting standards and ensures that alerts align with service reliability goals and operational coverage models.

Day-to-day responsibilities center on rationalizing alerts by business criticality and actionability. The lead conducts regular reviews of new and existing alerts to improve the signal-to-noise ratio, and decides whether each alert is routed to the 24x7 Eyes-on-Glass team, to on-call engineering, or to ticket creation.

This role operates at the intersection of the IT Operations Command Center, engineering teams, platform owners, and service owners. The lead collaborates with these groups so that alerts remain actionable, with clear ownership and escalation paths. Standards are embedded into monitoring tooling through templates and validation rules.

The position builds a scalable knowledge system for consistent incident response. Runbooks are versioned and reviewed on a defined cadence so that responders can act quickly and correctly. Continuous improvement efforts minimize alert fatigue while preserving detection of true incidents.

Requirements

  • Experience working with an IT Operations Command Center and 24x7 monitoring teams
  • Knowledge of observability platforms and alert management tooling
  • Understanding of service reliability goals and operational coverage models
  • Ability to define severity thresholds and routing rules for incident response
  • Familiarity with runbook and playbook development for consistent remediation actions
  • Skill in alert rationalization to reduce noise while preserving true incident detection

Responsibilities

  • Establish and maintain a department-wide alert rationalization framework that evaluates alerts for business criticality and actionability
  • Define and enforce alerting standards including severity definitions, metadata requirements, and naming conventions
  • Act as gatekeeper for alert routing, deciding whether each alert goes to 24x7 Eyes-on-Glass, on-call engineering, or ticket creation
  • Establish a consistent approach to cataloging response instructions covering symptoms, triage steps, and escalation triggers
  • Perform regular alert reviews to ensure quality, correct routing, and alignment with operational coverage
  • Create a standardized Alert Design Checklist and approval workflow for alert onboarding
  • Partner with tool and platform owners to embed standards in monitoring tooling through templates and validation
  • Own the runbook template, ensuring runbooks are versioned, maintained, and reviewed on a defined cadence