Principal Site Reliability Engineer - Incident Response Commander

Full time at DocuSign, Inc. in Online
Posted on May 30, 2024

Job details

Company Overview

Docusign brings agreements to life. Over 1.5 million customers and more than a billion people in over 180 countries use Docusign solutions to accelerate the process of doing business and simplify people’s lives. With intelligent agreement management, Docusign unleashes business-critical data that is trapped inside of documents. Until now, these were disconnected from business systems of record, costing businesses time, money, and opportunity. Using Docusign’s Intelligent Agreement Management platform, companies can create, commit, and manage agreements with solutions created by the #1 company in e-signature and contract lifecycle management (CLM).

What you'll do

The Principal Incident Response Commander at Docusign is an expert in identifying, leading, facilitating, and resolving critical incidents across technology, products, and security. The role involves strategic project management, effective communication with stakeholders including executive leadership, and handling difficult incidents independently. They play a pivotal role in developing Docusign’s overall service excellence practice by creating standard operating procedures and training material, and by mentoring team members. The role also requires daily incident management support across Docusign infrastructures globally to maintain service levels. It includes on-call responsibilities outside business hours and on weekends, daily reporting, ticket administration, and general production assurance duties. The ideal candidate is self-motivated, responsible, and able to prioritize and operate under time constraints. Adherence to established procedures and detailed documentation of incidents and resolution steps is critical. This position is an individual contributor role reporting to the Director, SRE Incident Response (Software Engineering).

Responsibility
  • Serve as a subject matter expert for Docusign’s incident management
  • Lead and handle complex and sensitive critical incidents to conclusion, identifying and resolving challenges to ensure timely resolution
  • Lead strategic projects to meet incident management objectives and participate in post-incident analysis and documentation to identify root causes and preventive actions
  • Monitor, evaluate and report on incident management programs, processes and statistics to assure continuous improvement
  • Create and maintain process documentation, including standard operating procedures, demonstrating excellent judgment when addressing new challenges
  • Interact regularly with senior leaders to facilitate effective incident handling or project delivery, producing suitable communications
  • Generate communications for multiple audience types, both customer-facing and internal
  • Provide oversight, mentorship, and guidance to a team responsible for assessing critical issues and making immediate decisions towards incident resolution, fostering a supportive and collaborative environment
  • Facilitate resolution for all major incidents, and handle communications via bridge calls and emails
  • Prioritize incidents based on impact and importance, and classify them based on customer and operational impact
  • Engage resources to resolve major incidents and minimize customer and business impact
  • Serve as an escalation point within the Incident Management process, initiating Crisis Incident response processes and applying the escalation process when required
  • Monitor and evaluate high-level incident-based data for anomalies and correlations to provide valuable insights into trends
  • Run calls to drive incidents forward and inform management
  • Drive incidents to resolution 
  • Coordinate with multiple internal teams to ensure containment and remediation strategies are implemented and carried out
  • Participate in a rotational shift 24 x 7 x 365

Job Designation

Hybrid: Employee divides their time between in-office and remote work. Access to an office location is required. (Frequency: Minimum 2 days per week; may vary by team, but there will be a weekly in-office expectation.)

Positions at Docusign are assigned a job designation of either In Office, Hybrid or Remote and are specific to the role/job. Preferred job designations are not guaranteed when changing positions within Docusign. Docusign reserves the right to change a position's job designation depending on business needs and as permitted by local law.

What you bring

Basic 
  • 12+ years of experience in Incident Management
  • Experience with Incident Management tools
  • Working knowledge of monitoring platforms and applications like Prometheus, Grafana, Azure Data Explorer, PagerDuty
  • Proficiency in troubleshooting techniques and problem-solving in a 24x7x365 environment
Preferred 
  • Demonstrated ability to lead under pressure, make decisions in ambiguous situations, and drive cross-functional collaboration swiftly
  • Able to lead incident calls confidently and independently to a successful resolution
  • Ability to understand and work within complex, large enterprise business environments
  • Capability to lead process analysis and provide recommendations for process improvements
  • Experience handling complex security and privacy investigations
  • Familiarity with Google SRE, Agile/Scrum, and DevOps principles
  • Excellent oral and written communication skills, including the ability to build and present effective business presentations
  • Advanced problem identification, judgment, resolution, and decision-making skills
  • Ability to work well interpersonally across various levels and subject areas, as well as influence and lead without direct authority
  • Ability to prioritize effectively and handle multiple projects
  • Incident management experience within product, SaaS, Azure, AWS, and with SQL, Networking, and DevOps teams is highly desired
  • Familiarity with basic troubleshooting in public and private cloud environments

Life at Docusign

Working here
Docusign is committed to building trust and making the world more agreeable for our employees, customers and the communities in which we live and work. You can count on us to listen, be honest, and try our best to do what’s right, every day. At Docusign, everything is equal. We each have a responsibility to ensure every team member has an equal opportunity to succeed, to be heard, to exchange ideas openly, to build lasting relationships, and to do the work of their life. Best of all, you will be able to feel deep pride in the work you do, because your contribution helps us make the world better than we found it. And for that, you’ll be loved by us, our customers, and the world in which we live.

Accommodation
Docusign is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures. If you need such an accommodation, or a religious accommodation, during the application process, please contact us at accommodations@docusign.com. If you experience any issues, concerns, or technical difficulties during the application process, please get in touch with our Talent organization at taops@docusign.com for assistance.

