Standard Operating Procedure Template

August 7, 2017 | Author: souri_a | Category: Incident Management, Business Process, Itil, Information Technology Management, Business

Standard Operating Procedure
[Title]
[Version]
[Company Name]
[Street Address]
[City, State Zip Code]
[Creation Date]

Notes:
- The following template is provided for writing a Standard Operating Procedure (SOP) document.
- [Inside each SOP section, text in green font between brackets is included to provide guidance to the author and should be deleted before publishing the final document.]
- Inside each section, text in black font is included to provide a realistic example in which a Standard Operating Procedure is written for the first-line support in an Incident Management process.
- You are free to edit and use this Standard Operating Procedure template and its contents within your organization; however, we do ask that you don't distribute this template on the web without explicit permission from us.

Copyrights: ITIL® is a Registered Trade Mark of the Office of Government Commerce in the United Kingdom and other countries.

Document Control

Preparation
Action | Name | Date
Prepared by: | |

Release
Version | Date Released | Change Notice | Pages Affected | Remarks
1.0 | | N/A | All | First Release

Distribution List
Name | Organization | Title

Title

Table of Contents

1. INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 Responsibilities
1.4 Summary

2. PROCEDURE
2.1 Receive Call and Create Incident Ticket
2.2 Validate Incident
2.3 Gather Information
2.4 Identify Configuration Items (CI) Affected
2.5 Categorize Incident
2.6 Look for Duplicate Incidents
2.7 Determine Impact
2.8 Determine Urgency
2.9 Calculate Priority
2.10 Process Major Incident
2.11 Perform First Diagnosis
2.12 Escalations
2.13 Get a Workaround or Resolution
2.14 Create a Resolution Plan
2.15 Apply the Resolution Plan
2.16 Check Restoration of Normal Service
2.17 Initiate a Problem Management Process for Recurring Incidents
2.18 Get User Satisfaction
2.19 Close the Incident

3. HANDLING OF EXCEPTIONS
3.1 Major Incidents
3.2 Functional Escalation
3.3 Hierarchical Escalation

4. ANNEX
4.1 Glossary
4.2 List of tables
4.3 Bibliography

1. Introduction [ITIL Standard Operating Procedures (SOPs) are the documented procedures for routine work, exception response and making changes for every device, system or procedure. SOPs are used by IT Operations Management as part of ITIL Service Operations. This section provides overall information about the document. The example provided in this template is for Incident Management - First Line Support.]

1.1 Purpose [Specify what the intention of the whole Standard Operating Procedure (SOP) document is.] The purpose of this document is to describe the procedures that the first-line support must perform as part of the Incident Management process.

1.2 Scope [Define here the scope and limits on which the procedures are applied.] This document encompasses all of the activities that the first-line support must perform in handling incidents originating in the applications and IT infrastructure within the organization. It does not include the handling that the team must perform for other types of requests that get to them, like service requests.

1.3 Responsibilities [Describe the role or roles performed by the team, department or group targeted by this document. Also list their responsibilities.] The first-line support performs the roles of Incident Owner and Incident Analyst. Each member of the team is responsible for the handling of assigned incidents. Their responsibilities are:
- Oversee the handling of the incident from start to closure.
- Find the Configuration Items (CI) affected.
- Perform initial diagnosis.
- Escalate the incident to the corresponding skilled team when needed.
- Apply workarounds and permanent solutions when possible and permitted under their knowledge and authorization.
- Ensure the incident information is updated.
- Close the incident.

The first-line support works under the supervision of the Incident Manager.

1.4 Summary [Describe here the structure of the rest of the Standard Operating Procedure document.] The main activities are described in Section 2, Procedure. Section 3, Handling of Exceptions, describes how to perform special activities such as the handling of major incidents and escalations.

2. Procedure [The Standard Operating Procedure (SOP) describes the routine work that needs to be done for every device, system or procedure. It also outlines the procedures to be followed if an exception is detected or if a change is required. List here the activities that should be performed.]

2.1 Receive Call and Create Incident Ticket To increase the reliability and effectiveness of the Incident Management process, users are encouraged to report incidents and create the corresponding ticket from the web-based console. Efforts are also made to detect incidents and, when possible, self-heal them through the automated Event Management tools. In both cases the incident ticket is created from sources other than the first-line support. In all other cases, such as when the incident is reported only by a call, the incident ticket must be created manually. Do so by using the Incident Management module of the Service Management Automated System. Fill in all the required information (marked with a “*”) and any optional information deemed helpful to the case.
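The required-field check for a manually created ticket can be sketched as follows. This is only an illustration: the field names in `REQUIRED_FIELDS` are hypothetical, since the actual required fields are whichever ones the Service Management Automated System marks with a “*”.

```python
# Hypothetical required fields; the real ones are those marked "*" in the tool.
REQUIRED_FIELDS = ("reporter", "summary", "affected_service")


def missing_required_fields(ticket: dict) -> list:
    """Return the required fields that are absent or blank in the ticket."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(ticket.get(name) or "").strip()
    ]


ticket = {"reporter": "jdoe", "summary": "Cannot log in", "affected_service": ""}
print(missing_required_fields(ticket))  # ['affected_service']
```

A ticket would be saved only once this list is empty; optional fields are deliberately left out of the check.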

2.2 Validate Incident If the purported incident is actually a service request, reclassify it and direct it to the Request Fulfillment module. Check the incident data. Modify or complete the information wherever needed. Classify the incident into one of the pre-defined types. If a new type is needed, classify the incident as “Generic” and coordinate with the Incident Manager to initiate a Change process.
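The validation rules above amount to a small routing decision, sketched below. The type names in `PREDEFINED_TYPES` and the `Ticket` fields are assumptions made for the example; the real list of types is the one configured in the tool.

```python
from dataclasses import dataclass

# Hypothetical examples of pre-defined incident types.
PREDEFINED_TYPES = {"Hardware", "Software", "Network"}


@dataclass
class Ticket:
    summary: str
    is_service_request: bool
    proposed_type: str


def validate(ticket: Ticket) -> tuple:
    """Return (module, incident_type) following the rules in section 2.2."""
    if ticket.is_service_request:
        # Not an incident: hand it over to Request Fulfillment.
        return ("Request Fulfillment", "")
    if ticket.proposed_type in PREDEFINED_TYPES:
        return ("Incident Management", ticket.proposed_type)
    # No matching type: use "Generic" and raise it with the Incident Manager.
    return ("Incident Management", "Generic")


print(validate(Ticket("VPN down", False, "Network")))
print(validate(Ticket("Badge reader offline", False, "Facilities")))
```

The second call falls back to “Generic”, which is the case where the analyst would also ask the Incident Manager to start a Change process for a new type.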

2.3 Gather Information Gather as much information as required to understand the causes and solve the incident. You can also import any file deemed relevant into the incident record workspace.

2.4 Identify Configuration Items (CI) Affected Identify the CIs affected. This includes CIs that are failing, degrading, or at imminent risk of failing or degrading as a result of the incident. Record them in the incident data. Check for dependencies in the appropriate tab. Alert Configuration Management if there is any discrepancy in the Configuration Management System (CMS).
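Checking dependencies is essentially a traversal of the CI relationship graph. The sketch below assumes a hypothetical `DEPENDENTS` map in which each CI lists the CIs that depend on it; in practice this information comes from the CMS, not from a hard-coded dictionary.

```python
from collections import deque

# Hypothetical dependency map: each CI lists the CIs that depend on it.
DEPENDENTS = {
    "db-server": ["billing-app", "crm-app"],
    "billing-app": ["invoice-portal"],
    "crm-app": [],
    "invoice-portal": [],
}


def affected_cis(failing: str, dependents: dict) -> list:
    """Return the failing CI plus every CI reachable through dependencies."""
    seen = [failing]
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


print(affected_cis("db-server", DEPENDENTS))
# ['db-server', 'billing-app', 'crm-app', 'invoice-portal']
```

The breadth-first order lists directly dependent CIs before indirect ones, which matches the order in which an analyst would usually review them.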

2.5 Categorize Incident Categorize according to what the incident appears to be. Categorization determines the initial handling of the incident and could possibly be changed later.

2.6 Look for Duplicate Incidents Use the option “Search duplicated incidents” to find previous incidents that are similar or related. Then use the corresponding option to concatenate the incident with similar incidents or to link it to related ones.

2.7 Determine Impact Assign a value for impact. By default, the system calculates impact according to the CIs affected.

2.8 Determine Urgency Assign a value for urgency. By default, urgency equals the value set for the type of incident. Only the Incident Manager role may change this value.

2.9 Calculate Priority Priority is automatically calculated by the system, combining the impact and urgency according to a pre-defined set of rules. To change the rules for calculating priority, a Change process must be initiated by the Incident Manager. Only the Incident Manager may override the priority calculated by the system.
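As an illustration of how impact and urgency might combine, the sketch below uses a simple additive rule on three-level scales. Both the scales and the rule are assumptions for the example; the actual rule set is the one pre-defined in the system and can only be changed through a Change process.

```python
from typing import Optional

# Hypothetical scales: 1 is the highest level for both impact and urgency.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def calculate_priority(impact: str, urgency: str) -> int:
    """Combine impact and urgency into a priority from 1 (highest) to 5 (lowest)."""
    return LEVELS[impact] + LEVELS[urgency] - 1


def effective_priority(calculated: int, override: Optional[int], role: str) -> int:
    """Apply an override only when it comes from the Incident Manager."""
    if override is None:
        return calculated
    if role != "Incident Manager":
        raise PermissionError("Only the Incident Manager may override priority")
    return override


print(calculate_priority("high", "high"))  # 1
print(calculate_priority("high", "low"))   # 3
print(calculate_priority("low", "low"))    # 5
```

The `effective_priority` helper mirrors the restriction in this section: any override attempted under another role is rejected rather than silently ignored.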

2.10 Process Major Incident For incidents of the highest impact, the system advises treating them as major incidents. The Incident Manager can select the option “Treat as Major Incident” for any other incident. See section 3.1 Major Incidents.

2.11 Perform First Diagnosis Investigate the incident to find its causes, effects and means of solving it. Look for solutions from the following sources:
- Document “Common incidents and troubleshooting”.
- Web-based knowledge base.
- List of known errors for each application.
- Experiences from related incidents.
- Common sense.

2.12 Escalations If you cannot solve the incident within the stipulated times for the first-line support, or if the investigation and solution require specialized knowledge, you should perform a functional escalation to the appropriate team at the second or third level. See section 3.2 Functional Escalation. A hierarchical escalation is also needed when the incident should be treated as a major incident, or when the solution requires authorization from the appropriate level of decision. Most hierarchical escalations go first to the Incident Manager. See section 3.3 Hierarchical Escalation.
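The two escalation triggers can be expressed as independent checks, as in the sketch below. The `IncidentState` fields, including the time limit in minutes, are hypothetical names chosen for the example; the stipulated times are whatever the organization has agreed for the first-line support.

```python
from dataclasses import dataclass


@dataclass
class IncidentState:
    minutes_open: int
    first_line_limit_minutes: int  # hypothetical stipulated time for first line
    needs_specialist: bool
    is_major: bool
    needs_authorization: bool


def required_escalations(state: IncidentState) -> list:
    """Return the escalation types called for by section 2.12."""
    escalations = []
    if state.minutes_open > state.first_line_limit_minutes or state.needs_specialist:
        escalations.append("functional")
    if state.is_major or state.needs_authorization:
        escalations.append("hierarchical")
    return escalations


state = IncidentState(90, 60, False, True, False)
print(required_escalations(state))  # ['functional', 'hierarchical']
```

Both escalations can apply to the same incident, which is why the function returns a list instead of a single value.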

2.13 Get a Workaround or Resolution Remember that your goal is to restore service as soon as possible. If a permanent solution can be implemented within the agreed response times, apply it. If not, try an effective workaround and recommend an analysis by Problem Management at the end of the process.
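The choice between a permanent solution and a workaround can be sketched as a comparison against the time remaining. The parameter names and the use of minutes are assumptions for illustration; the agreed response times come from the organization's own service levels.

```python
from typing import Optional


def choose_action(permanent_fix_minutes: Optional[int], minutes_remaining: int) -> str:
    """Pick the action for section 2.13 given the time left in the agreed window.

    permanent_fix_minutes is None when no permanent solution is known yet.
    """
    if permanent_fix_minutes is not None and permanent_fix_minutes <= minutes_remaining:
        return "apply permanent solution"
    return "apply workaround and recommend Problem Management analysis"


print(choose_action(30, 45))
print(choose_action(120, 45))
print(choose_action(None, 45))
```

Only the first call fits the permanent solution inside the window; the other two fall back to a workaround plus a recommendation for Problem Management.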

2.14 Create a Resolution Plan Write into the system the details on how the incident shall be solved. Whenever reasonable, include pre-testing, post-testing and backup options.

2.15 Apply the Resolution Plan Execute the steps in the Resolution Plan. Document any update needed during the implementation. In case the solution is going to be applied by the user, check its effectiveness. Send a change request and monitor the change when the solution requires a non-standard change.

2.16 Check Restoration of Normal Service Check that the solution succeeded as intended. If not, repeat the process from diagnosis (section 2.11).

2.17 Initiate a Problem Management Process for Recurring Incidents Activate a request for Problem Management if the incident is recurring or is likely to recur.

2.18 Get User Satisfaction Ask the user to fill in the optional survey. Document any other feedback from the process.

2.19 Close the Incident Update and close the incident record along with any concatenated incident records.
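Closing an incident together with its concatenated records can be sketched as below. The record layout (a dictionary keyed by incident ID with `status` and `concatenated` entries) is a hypothetical stand-in for the incident records kept by the tool.

```python
def close_incident(records: dict, incident_id: str) -> list:
    """Close the incident and every record concatenated to it; return the closed IDs."""
    to_close = [incident_id] + records[incident_id].get("concatenated", [])
    for record_id in to_close:
        records[record_id]["status"] = "Closed"
    return to_close


records = {
    "INC-1": {"status": "Resolved", "concatenated": ["INC-2"]},
    "INC-2": {"status": "Open"},
}
print(close_incident(records, "INC-1"))  # ['INC-1', 'INC-2']
```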

3. Handling of Exceptions [You may devote a section of the Standard Operating Procedures to detail how to handle deviations from the normal flow.]

3.1 Major Incidents Incidents with the highest impact on the business are treated as major incidents. A special team is convened under the direct supervision of the Incident Manager to handle the incident faster than usual. Documenting findings and conclusions is mandatory at the end of the process. Once you escalate a major incident, stay in contact with both the handling team and the user, providing any support you are asked for. See section 3.3 Hierarchical Escalation.

3.2 Functional Escalation If you cannot solve the incident within the stipulated times for the first level, or if the investigation and solution require specialized knowledge, you should perform a functional escalation to the appropriate team at the second or third tier.
a) Identify first the appropriate expert or team to escalate the incident to.
b) Escalate the incident.
c) Provide the information required by the expert or team.
d) Provide updates to the user.
e) If the incident is re-routed to another functional area, ensure that the incident is re-classified.
f) Continue normal flow from step 2.17.

3.3 Hierarchical Escalation A hierarchical escalation is needed when the incident should be treated as a major incident, or when the solution requires authorization from the appropriate level of decision. At the point where a hierarchical escalation is needed, insert the following steps:
a) Identify that the incident should be hierarchically escalated.
b) Escalate to the appropriate authority, usually to the Incident Manager and, in some cases, to the specific authority supervising the affected area.
c) Provide the information needed for the authority to make a decision.
d) Continue with the regular process.
Major incidents are usually handled by a dedicated team (see section 3.1 Major Incidents).

4. Annex [Insert here anything you may like to attach to support the Standard Operating Procedure (SOP) document.]

4.1 Glossary [This section of the Standard Operating Procedures provides the definitions of terms, acronyms, and abbreviations required to understand this document.]

Term | Definition
Change | The addition, modification or removal of anything that could have an effect on IT services.
Change Management | The process responsible for controlling the lifecycle of all changes.
Configuration Item (CI) | Any component or other service asset that needs to be managed in order to deliver an IT service.
Configuration Management System (CMS) | A set of tools, data and information that is used to support service asset and configuration management.
Diagnosis | A stage in the incident and problem lifecycles aimed at identifying a workaround for an incident or the root cause of a problem.
Escalation | An activity that obtains additional resources when these are needed to meet service level targets or customer expectations.
Event Management | The process responsible for managing events throughout their lifecycle.
First-line support | The first level in a hierarchy of support groups involved in the resolution of incidents.
Functional escalation | Transferring an incident, problem or change to a technical team with a higher level of expertise to assist in an escalation.
Hierarchic escalation | Informing or involving more senior levels of management to assist in an escalation.
Impact | A measure of the effect of an incident, problem or change on business processes.
Incident | An unplanned interruption to an IT service or reduction in the quality of an IT service.
Incident Management | The process responsible for managing the lifecycle of all incidents.
Incident record | A record containing the details of an incident.
Known errors | A problem that has a documented root cause and a workaround.
Major Incident | The highest category of impact for an incident.
Priority | A category used to identify the relative importance of an incident, problem or change.
Problem | A cause of one or more incidents.
Problem Management | The process responsible for managing the lifecycle of all problems.
Resolution | Action taken to repair the root cause of an incident or problem, or to implement a workaround.
Restore | Taking action to return an IT service to the users after repair and recovery from an incident.
Root cause | The underlying or original cause of an incident or problem.
Second-line support | The second level in a hierarchy of support groups involved in the resolution of incidents and investigation of problems.
Standard Change | A pre-authorized change that is low risk, relatively common and follows a procedure or work instruction.
Standard Operating Procedure (SOP) | Procedures used by IT operations management.
Third-line support | The third level in a hierarchy of support groups involved in the resolution of incidents and investigation of problems.
Urgency | A measure of how long it will be until an incident, problem or change has a significant impact on the business.
Workaround | Reducing or eliminating the impact of an incident or problem for which a full resolution is not yet available.

Table 1. Glossary.

4.2 List of tables [This section of the Standard Operating Procedure includes a list of all of the tables in the document.]
Table 1. Glossary

4.3 Bibliography
(n.d.). Common incidents and troubleshooting.
Noname Software Company. (2012). Service Management Automated System's User Guide.

Find many more free ITIL templates at www.FastITILtemplates.com
