IND Lead Engineer, Infrastructure

Job Details

Location:
Hyderabad, Telangāna, IN
Category:
Information Technology
Employment Type:
Full time
Job Ref:
R2624993-333

IND Lead Engineer, Infrastructure - GCC035

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.

The Service Recovery Lead (Major Incident Lead) is responsible for overseeing end-to-end incident management processes within large-scale enterprise environments. This role ensures swift response, coordination, and resolution of major IT service disruptions, leveraging ITIL-based methodologies and AI-driven tools to minimize impact and restore services efficiently. The position demands strong leadership, strategic thinking, and advanced communication skills to drive continuous improvement, maintain high standards, and support business continuity. 

Key Requirements 

  • Extensive experience in ITIL-based incident management within enterprise environments 

  • Proven leadership and advanced communication capabilities 

  • Proficiency in AI-driven incident management and monitoring tools 

  • Ability to coordinate across cross-functional teams and stakeholders 

  • Track record of process improvement and post-incident analysis 

  • Commitment to compliance, on-call support, and business continuity planning 

 

Role Overview: Service Recovery Lead (Major Incident Lead) 

The Service Recovery Lead acts as the central authority for major incident response, driving incident coordination and service restoration efforts. The role requires proactive monitoring, rapid decision-making, and the ability to leverage cutting-edge technologies to streamline incident management. This position is pivotal in fostering a culture of continuous improvement, ensuring adherence to regulatory requirements, and safeguarding organizational resilience. 

 

Key Responsibilities 

  • Incident Management and Coordination: Lead the identification, assessment, and resolution of major incidents, ensuring timely restoration of services and minimizing business disruption. 

  • Communication: Serve as the primary point of contact during incidents, providing clear, concise updates to stakeholders, executives, and affected users. 

  • Process Ownership: Maintain and enhance ITIL-based incident management processes, driving adherence and continuous optimization. 

  • AI-Driven Tools Utilization: Implement and leverage AI-powered platforms for real-time monitoring, automated incident detection, and data-driven decision-making. 

  • Post-Incident Activities: Facilitate post-incident reviews, root cause analysis, and documentation of lessons learned to inform future improvements. 

  • Collaboration: Work closely with technical teams, business units, and third-party vendors to resolve incidents and enhance service reliability. 

  • Service Monitoring: Oversee continuous monitoring of critical services, proactively identifying risks and vulnerabilities. 

  • Compliance: Ensure incident management practices comply with organizational policies, industry standards, and regulatory requirements. 

  • 24/7/365 On-Call Support: Participate in on-call rotations, providing expert guidance and support during off-hours and high-impact incidents. 

  • Business Continuity: Contribute to business continuity planning and execution, ensuring preparedness for major disruptions and rapid recovery. 

Qualifications 

  • Education: Bachelor’s degree in Information Technology, Computer Science, or a related discipline (Master’s preferred). 

  • Experience: Minimum 6 years of experience in a Command Center or similar environment, with at least 3 years of experience handling highly visible, high-impact major incidents. 

  • Technical Skills: Deep understanding of Command Center monitoring tools and platforms (e.g., Splunk, Dynatrace, ThousandEyes, Moogsoft), including AI-driven monitoring tools. 

  • Strong knowledge of three primary areas of IT infrastructure (computing platforms, networking and communications, and data storage) in both cloud and on-premises environments. 

  • Communication: Exceptional verbal and written communication skills, adept at conveying complex information to diverse audiences. 

  • Personal Attributes: Strategic thinker with strong emotional intelligence, resilience, and a collaborative approach. 

Employment Conditions 

Employment in this role is contingent upon successfully passing a comprehensive background check. 

 

A Glimpse Inside
The Hartford

About Us

We believe every day is a day to do right.

And that belief has guided us for over 200 years. Showing up for people isn’t just what we do, it’s who we are. We’re devoted to finding innovative ways to serve our customers, communities and employees – continually asking ourselves what more we can do.

And while how we contribute looks different for each of us, it’s these values that drive all of us to do more and to do better every day.