PTS Alarms Reference Guide R7.40 C07

March 22, 2023 | Author: Anonymous | Category: N/A

Sandvine Policy Traffic Switch Alarms Reference Guide, Release 7.40

05-00262 C07 2016-11-26

 

Notices

The most current version of this document is available on the Sandvine Customer Support website at https://support.sandvine.com.

This document and the products described within are subject to copyright. Under copyright laws, neither this document nor the product may be reproduced, translated, or reduced to any electronic medium or machine readable or other form without prior written authorization from Sandvine.

Copyright 2016, Sandvine Incorporated ULC. All rights reserved. Sandvine™ is a trademark of Sandvine Incorporated ULC. All other product names mentioned herein are trademarks of their respective owners.

Sandvine is committed to ensuring the accuracy of our documentation and to continuous improvement. If you encounter errors or omissions in this user guide, or have comments, questions, or ideas, we welcome your feedback. Please send your comments to Sandvine via email at https://support.sandvine.com.

Contacting Sandvine

To view the latest Sandvine documentation or to contact Sandvine Customer Support, register for an account at https://support.sandvine.com. See http://www.sandvine.com/about_us/contact.asp for a list of Sandvine Sales and Support offices.


Sandvine Policy Traffic Switch Alarms Reference Guide, Release 7.40 05-00262 C07

 

Related Documentation

Related documentation is available from Sandvine’s Customer Support web site. All documents are in PDF format and can be opened and read or printed using Adobe® Acrobat® Reader®. You can obtain a free copy of this software from the Adobe® web site.

Document                                         Part Number
Getting Started with Sandvine                    05-00011
PTS Administration Guide                         05-00192
PTS Alarms Reference Guide                       05-00262
PTS CLI Reference Guide                          05-00263
PTS Hardware Installation Guide                  05-00185
PTS Software Installation and Upgrade Guide      05-00245
PTS SandScript Guide                             05-00217
PTS Virtual Platform User Guide                  05-00269
Sandvine API User Guide                          05-00330
Subscriber Mapping User Guide                    05-00209
Network Protection User Guide                    05-00301
Web Content Intelligence User Guide              05-00325


Contents

1 Notifications
1.1 Notifications
1.1.1 Sandvine Notifications
2 Key Performance Indicators
2.1 General Resources
2.1.1 Memory Resources
2.1.2 PTS Per Module General Resources
2.2 Inspection Performance Monitoring
2.2.1 CPU Resource
2.2.2 Memory Resource
2.2.3 Inspection Engine
2.2.4 Flow Management
2.3 Interfaces
2.3.1 Bitrate Capacity
2.4 Subscriber Monitoring
2.4.1 PTS Subscribers Count
3 PTS Alarms
3.1 Alarm Models
3.2 Alarm Model 1: Faulted Hardware
3.2.1 Faulted Hardware
3.2.2 Impact and Suggested Resolution, Alarm Model 1
3.3 Alarm Model 2: Faulted Disk
3.3.1 Degraded Disk Notification
3.3.2 Faulted Disk Notification
3.3.3 Faulted Disk Cleared
3.3.4 Impact and Suggested Resolution, Alarm Model 2
3.4 Alarm Model 3: High Temperature
3.4.1 High Temperature - Notification
3.4.2 High Temperature Cleared
3.4.3 Impact and Suggested Resolution, Alarm Model 3
3.5 Alarm Model 4: Faulted Fan
3.5.1 Faulted Fan: Major Notification
3.5.2 Faulted Fan: Clear Notification
3.5.3 Impact and Suggested Resolution, Alarm Model 4
3.6 Alarm Model 5: Faulted Power Supply
3.6.1 Faulted Power Supply - Major Notification


3.6.2 Faulted Power Supply - Notification
3.6.3 Faulted Power Supply Cleared
3.6.4 Impact and Suggested Resolution, Alarm Model 5
3.6.5 Suggested Resolutions for All Platforms, Alarm Model 5
3.6.6 Suggested Resolutions for PTS 24000 and PTS 32000 Series Platforms
3.7 Alarm Model 6: High Power Usage
3.7.1 High Power Usage - Notification
3.7.2 High Power Usage Cleared
3.7.3 Power supply in non-redundant state - alarms
3.7.4 Impact and Suggested Resolution, Alarm Model 6
3.8 Alarm Model 7: High Resource Usage
3.8.1 Major Notification: High Resource Usage
3.8.2 Minor Notification: High Resource Usage
3.8.3 Warning Notification: High Resource Usage
3.8.4 Clear Notification: High Resource Usage
3.8.5 Impact and Suggested Resolution, Alarm Model 7
3.9 Alarm Model 8: Overloaded Processor
3.9.1 Overloaded Processor - Notification
3.9.2 Overloaded Processor Cleared
3.9.3 Impact and Suggested Resolution, Alarm Model 8
3.10 Alarm Model 9: Unavailable Processing Module
3.10.1 Unavailable Processing Module - Notification
3.10.2 Unavailable Processing Module Cleared
3.10.3 Impact and Suggested Resolution, Alarm Model 9
3.11 Alarm Model 10: Unavailable Service Component
3.11.1 Unavailable Service Component: Major
3.11.2 Unavailable Service Component: Clear
3.11.3 Background Service Processes
3.11.4 Impact and Suggested Resolution: Alarm Model 10
3.12 Alarm Model 11: Unavailable Bypass Group
3.12.1 Bypassing Traffic - Notification
3.12.2 Bypassing Traffic Cleared
3.12.3 Unavailable Bypass Group - Critical Alarms
3.13 Alarm Model 12: Network Interface Errors
3.13.1 Network Interface Errors - Major and Minor Notifications
3.13.2 Network Interface Errors - Clear
3.13.3 PTS - Impact and Suggested Resolution, Alarm Model 12
3.13.4 SPB - Impact and Suggested Resolution, Alarm Model 12
3.14 Alarm Model 13: Discarded Packets


3.14.1 Discarded Packets - Notifications
3.14.2 Discarded Packets - Clear
3.14.3 PTS - Impact and Suggested Resolution, Alarm Model 13
3.14.4 SPB - Impact and Suggested Resolution, Alarm Model 13
3.15 Alarm Model 14: Network Interface Down
3.15.1 Network Interface Down - Notification
3.15.2 Network Interface Down - Clear
3.15.3 Network Interface Down - Major Alarms
3.15.4 Impact and Suggested Resolutions for Alarm Model 14
3.16 Alarm Model 15: Unavailable Processing Module
3.16.1 Load Balancer Down - Notification
3.16.2 Load Balancer Down - Clear
3.16.3 Load balancer down - minor alarms
3.16.4 Load balancer down - major alarms
3.16.5 Load balancer down - warning alarms
3.17 Alarm Model 17: Degraded Cluster
3.17.1 Degraded Cluster - Warning
3.17.2 Degraded Cluster - Clear
3.17.3 Impact and Suggested Resolution, Alarm Model 17
3.18 Alarm Model 18: Disconnected SPB
3.18.1 Disconnected SPB - Major
3.18.2 Disabled SPB - Minor
3.18.3 Disconnected SPB - Clear
3.18.4 Impact and Suggested Resolution: Alarm Model 18
3.19 Alarm Model 19: Invalid Software License
3.19.1 Invalid Software License - Critical
3.19.2 Expiring Software License - Major
3.19.3 Expiring Software License - Minor
3.19.4 Expiring Software License - Warning
3.19.5 Invalid Software License - Clear
3.19.6 Impact and Suggested Resolution, Alarm Model 19
3.20 Alarm Model 20: Overloaded Cluster
3.20.1 Overloaded Cluster - Major
3.20.2 Overloaded Cluster - Minor
3.20.3 Overloaded Cluster - Clear
3.20.4 Impact and Suggested Resolution, Alarm Model 20
3.21 Alarm Model 21: Overloaded Subcluster
3.21.1 Overloaded Subcluster - Major
3.21.2 Overloaded Subcluster - Clear


3.21.3 Impact and Suggested Resolution, Alarm Model 21
3.22 Alarm Model 22: Misconfigured Network Awareness
3.22.1 Misconfigured Network Awareness Alarm
3.22.2 Misconfigured Network Awareness - Minor
3.22.3 Misconfigured Network Awareness - Clear
3.22.4 Impact and Suggested Resolution, Alarm Model 22
3.23 Alarm Model 23: Runtime SandScript Errors
3.23.1 Runtime SandScript Errors
3.23.2 Runtime SandScript Errors - Major
3.23.3 Runtime SandScript Errors - Minor
3.23.4 Runtime SandScript Errors - Clear
3.23.5 Runtime SandScript Errors - Possible Instances
3.23.6 SandScript Errors
3.23.7 Impact and Suggested Resolution, Alarm Model 23
3.24 Alarm Model 24: High Network Interface Rx Rate
3.24.1 High Network Interface Rx Rate - Major
3.24.2 High Network Interface Rx Rate Cleared
3.24.3 Impact and Suggested Resolution, Alarm Model 24
3.25 Alarm Model 25: High Network Interface Tx Rate
3.25.1 High Network Interface Tx Rate - Major
3.25.2 High Network Interface Tx Rate Cleared
3.25.3 Impact and Suggested Resolution, Alarm Model 25
3.26 Alarm Model 26: Unavailable Disk
3.26.1 Unavailable Disk
3.26.2 Unavailable Disk Cleared
3.26.3 Impact and Suggested Resolution, Alarm Model 26
3.27 Alarm Model 27: Faulted Hardware
3.27.1 Hardware fault
3.27.2 Hardware no longer faulted
3.27.3 Impact and Suggested Resolution, Alarm Model 27
3.28 Alarm Model 28: Discarded Subscriber State
3.28.1 Subscriber Mappings Cleared - Notification
3.28.2 Subscriber Mappings Cleared - Clear
3.28.3 Subscriber Mappings on SPB and PTS/SDE Cleared - Minor Alarm
3.28.4 Impact and Suggested Resolution, Alarm Model 28
3.29 Alarm Model 29: Disabled Subscriber Lookups
3.29.1 Disabled Subscriber Lookups
3.29.2 Disabled Subscriber Lookups Cleared
3.29.3 Subscriber Lookups Disabled - Minor


3.29.4 Impact and Suggested Resolution, Alarm Model 29
3.30 Alarm Model 30: Delayed Subscriber Mapping
3.30.1 Delayed Subscriber Mapping
3.30.2 Delayed Subscriber Mapping Cleared
3.30.3 Impact and Suggested Resolution, Alarm Model 30
3.31 Alarm Model 32: Disconnected Diameter Peer
3.31.1 Disconnected Diameter Peer
3.31.2 Disconnected Diameter Peer Cleared
3.31.3 Impact and Suggested Resolution, Alarm Model 32
3.32 Alarm Model 33: Failed Power On Self Test
3.32.1 Failed Power On Self Test - Major
3.32.2 Failed Power On Self Test - Clear
3.32.3 Impact and Suggested Resolution, Alarm Model 33
3.33 Alarm Model 34: High Traffic Discrepancy
3.33.1 High Traffic Discrepancy - Major
3.33.2 High Traffic Discrepancy - Clear
3.33.3 Impact and Suggested Resolution, Alarm Model 34
3.34 Alarm Model 35: Exhausted Resource
3.34.1 Exhausted Resource
3.34.2 Exhausted Resource Cleared
3.34.3 Exhausted Resource - Minor
3.34.4 Impact and Suggested Resolution, Alarm Model 35
3.35 Alarm Model 36: Faulted Form-factor Pluggable Module
3.35.1 Faulted Form-Factor Pluggable Module - Major
3.35.2 Faulted Form-Factor Pluggable Module - Clear
3.35.3 Impact and Suggested Resolution, Alarm Model 36
3.36 Alarm Model 37: Faulted Blade
3.36.1 Faulted Blade - Major
3.36.2 Inactive Blade - Minor
3.36.3 Faulted Blade - Clear
3.36.4 Impact and Suggested Resolution, Alarm Model 37
3.37 Alarm Model 38: Diameter Error
3.37.1 Unknown Diameter Session ID Error - Raise
3.37.2 Diameter Error - Clear
3.37.3 Impact and Suggested Resolution, Alarm Model 38
3.38 Alarm Model 39: Diameter Server Outgoing Message Age Exceeded Maximum Threshold
3.38.1 Diameter Server Outgoing Message Age Exceeded Maximum Threshold - Raise
3.38.2 Diameter Server Outgoing Message Age Exceeded Maximum Threshold - Clear
3.38.3 Impact and Suggested Resolution, Alarm Model 39


3.39 Alarm Model 40: Diameter Peer Failed Back Over
3.39.1 Diameter Peer Failed Back Over - Raise
3.39.2 Diameter Peer Failed Back Over - Clear
3.39.3 Impact and Suggested Resolution, Alarm Model 40
3.40 Alarm Model 41: Diameter Server Connection with Client Peer Lost
3.40.1 Diameter Server Connection with Client Peer Lost - Raise
3.40.2 Diameter Server Connection with Client Peer Lost - Clear
3.40.3 Impact and Suggested Resolution, Alarm Model 41
3.41 Alarm Model 42: Diameter Client Outgoing Message Age Reached Early Threshold
3.41.1 Diameter Client Outgoing Message Age Reached Early Threshold - Raise
3.41.2 Diameter Client Outgoing Message Age Reached Early Threshold - Clear
3.41.3 Impact and Suggested Resolution, Alarm Model 42
3.42 Alarm Model 43: Diameter Client Outgoing Message Age Exceeded Maximum Threshold
3.42.1 Diameter Client Outgoing Message Age Exceeded Maximum Threshold - Raise
3.42.2 Diameter Client Outgoing Message Age Exceeded Maximum Threshold - Clear
3.42.3 Impact and Suggested Resolution, Alarm Model 43
3.43 Alarm Model 44: Diameter Server Outgoing Message Age Reached Early Threshold
3.43.1 Diameter Server Outgoing Message Age Reached Early Threshold - Warning
3.43.2 Diameter Server Outgoing Message Age Reached Early Threshold - Clear
3.43.3 Impact and Suggested Resolution, Alarm Model 44
3.44 Alarm Model 50: Unknown Diameter Session-ID
3.44.1 Unknown Diameter Session ID Error - Minor
3.44.2 Impact and Suggested Resolution, Alarm Model 50
3.45 Alarm Model 51: Diameter Interface Error
3.45.1 Raise and clear notifications
3.45.2 Diameter Interface Error - Major
3.45.3 Impact and Suggested Resolutions, Alarm Model 51
3.46 Alarm Model 52: Diameter Missing Subscriber Information
3.46.1 Missing subscriber information - Minor
3.46.2 Missing subscriber information - Clear
3.46.3 Impact and Suggested Resolutions, Alarm Model 52
3.47 Alarm Model 53: Unknown Diameter Service
3.47.1 Unknown Diameter Service - Major
3.47.2 Unknown Diameter Service - Clear
3.47.3 Impact and Suggested Resolutions, Alarm Model 53
3.48 Alarm Model 59: Unavailable BGP Master
3.48.1 Unavailable BGP Master - Major
3.48.2 Unavailable BGP Master - Clear
3.48.3 Impact and Suggested Resolution, Alarm Model 59


3.49 Alarm Model 60: Disconnected BGP Peer
3.49.1 Disconnected BGP Peer - Minor
3.49.2 Disconnected BGP Peer - Major
3.49.3 Disconnected BGP Peer - Clear
3.49.4 Impact and Suggested Resolutions, Alarm Model 60
3.50 Alarm Model 61: Analyzer parse errors - DEPRECATED
3.50.1 svAnalyzerParseErrorRateNotification - Minor notification
3.50.2 svAnalyzerNoParseErrorRateNotification - Clear notification
3.50.3 Impact and Suggested Resolutions, Alarm Model 61
3.51 Alarm Model 62: Too Many Concurrent Analyzer Flows - DEPRECATED
3.51.1 Impact and Suggested Resolutions, Alarm Model 62
3.52 Alarm Model 63: Shunting Traffic Inspection
3.52.1 Shunting Traffic Inspection - Major
3.52.2 Shunting Traffic Inspection - Clear
3.52.3 Impact and Suggested Resolutions, Alarm Model 63
3.53 Alarm Model 77: Misconfigured Network Interface
3.53.1 Misconfigured Network Interface - Minor
3.53.2 Misconfigured Network Interface - Cleared
3.53.3 Impact and Suggested Resolutions, Alarm Model 77
3.54 Alarm Model 79: Failed Reload
3.54.1 svSysLastReloadFailedNotification: Failed Reload
3.54.2 svSysLastReloadSucceededNotification: Failed Reload - Clear
3.54.3 Failed Reload - All Alarms
3.54.4 Impact and Suggested Resolution, Alarm Model 79
3.55 Alarm Model 82: Shunting Abusive IPs
3.55.1 svLBHighUsageIPShuntedNotification: Shunting Abusive IPs
3.55.2 svLBHighUsageIPClearNotification: Shunting Abusive IPs Cleared
3.55.3 Impact and Suggested Resolutions, Alarm Model 82
3.56 Alarm Model 87: Failed Health Check
3.56.1 svServerHealthCheckErrorNotification: Failed Health Check
3.56.2 svServerHealthCheckErrorClearNotification: Failed Health Check Cleared
3.56.3 Impact and Suggested Resolutions, Alarm Model 87
3.57 Alarm Model 88: Failed Health Check
3.57.1 svInlineHealthCheckErrorNotification: Failed Health Check - Minor
3.57.2 svInlineHealthCheckErrorClearNotification: Failed Health Check - Clear
3.57.3 Impact and Suggested Resolutions, Alarm Model 88
3.58 Alarm Model 110: Subscriber Mapping timestamps are in the past
3.58.1 Subscriber Mapping timestamps are in the past - Warning
3.58.2 Subscriber Mapping timestamps are in the past - Clear


3.58.3 Impact and Suggested Resolution, Alarm Model 110
3.59 Alarm Model 111: Subscriber Mapping timestamps are in the future
3.59.1 Alarm Model 111: Subscriber Mapping timestamps are in the future
3.59.2 Subscriber Mapping timestamps are in the Future - Warning
3.59.3 Subscriber Mapping timestamps are in the Future - Clear
3.59.4 Impact and Suggested Resolution, Alarm Model 111
3.59.5 Debugging Mapping Failure on the PTS
3.60 Alarm Model 114: Subscriber Mapping Overloaded
3.60.1 ProvisionOverloadMajEvt: Provision Subsystem is Dropping Events
3.60.2 ProvisionOverloadClearEvt: Provision Subsystem is Not Overloaded
3.60.3 Impact and Suggested Resolution, Alarm Model 114
3.61 Alarm Model 115: Invalid Subscriber Provisioning Parameters
3.61.1 Invalid Subscriber Provisioning Parameters - Raise
3.61.2 Invalid Subscriber Provisioning Parameters - Clear
3.61.3 Impact and Suggested Resolution, Alarm Model 115
3.62 Alarm Model 116: Discarded Subscriber Provisioning Update
3.62.1 Discarded Subscriber Provisioning Update - Raise
3.62.2 Discarded Subscriber Provisioning Update - Clear
3.62.3 Impact and Suggested Resolution, Alarm Model 116
3.63 Alarm Model 117: Delayed Mapping
3.63.1 High Subscriber Provisioning Latency
3.63.2 Medium Subscriber Provisioning Latency
3.63.3 Moderate Subscriber Provisioning Latency
3.63.4 Optimal Subscriber Provisioning Latency
3.63.5 Impact and Suggested Resolution, Alarm Model 117
3.64 Alarm Model 125: Subscriber Mappings have Stalled or Halted
3.64.1 Number of Mapping Requests is Stalling
3.64.2 Number of Mapping Requests is Not Stalling - Clear
3.64.3 Impact and Suggested Resolution, Alarm Model 125
3.65 Alarm Model 129: Disconnected Tee Destination
3.65.1 svTeeDestinationDownNotification: Disconnected Tee Destination - Minor
3.65.2 svTeeDestinationUpNotification: Disconnected Tee Destination - Clear
3.65.3 Impact and Suggested Resolution, Alarm Model 129
3.66 Alarm Model 130: Disconnected Divert Destination
3.66.1 svDivertDestinationDownNotification: Disconnected Divert Destination - Minor
3.66.2 svDivertDestinationUpNotification: Disconnected Divert Destination - Clear
3.66.3 Impact and Suggested Resolution, Alarm Model 130
3.67 Alarm Model 131: Disconnected Divert Sequence Destination
3.67.1 svDivertSeqDestinationDownNotification: Disconnected Divert Sequence Destination


3.67.2 svDivertDestinationSeqUpNotification: Disconnected Divert Sequence Destination - Clear
3.67.3 Impact and Suggested Resolution, Alarm Model 131
3.68 Alarm Model 132: Worn Solid State Drive
3.68.1 svEnvStorageSSDLifespanNotification: Worn Solid State Drive
3.68.2 svEnvStorageSSDLifespanClearNotification: Worn Solid State Drive Cleared
3.68.3 Impact and Suggested Resolution, Alarm Model 132
3.69 Alarm Model 133: Misconfigured Cluster Name
3.69.1 svClusterInvalidNameErrorNotification: Misconfigured Cluster Name - Major
3.69.2 svClusterInvalidNameClearNotification: Misconfigured Cluster Name - Clear
3.69.3 Impact and Suggested Resolution, Alarm Model 133
3.70 Alarm Model 136: Misconfigured Load Balancer
3.70.1 svLBCConfigErrorNotification: Misconfigured Load Balancer - Major
3.70.2 svLBCConfigClearNotification: Misconfigured Load Balancer Cleared
3.70.3 Impact and Suggested Resolution, Alarm Model 136
3.71 Alarm Model 146: Incompatible Blade
3.71.1 Incompatible Blade - Critical
3.71.2 Impact and Suggested Resolution, Alarm Model 146
3.72 Alarm Model 154: Invalid Subscriber Operation Name
3.72.1 Invalid Subscriber Operation Name - Major
3.72.2 Invalid Subscriber Operation Name - Minor
3.72.3 Invalid Subscriber Operation Name - Clear
3.72.4 Impact and Suggested Resolution, Alarm Model 154
3.73 Alarm Model 157: Unavailable Accounting Server
3.73.1 svSystemAccountingQueueErrorNotification - Raise
3.73.2 svSystemAccountingQueueClearNotification - Clear
3.73.3 Impact and Suggested Resolution, Alarm Model 157
3.74 Alarm Model 167: Delayed Distribution Event
3.74.1 svDelayedShapingDistributionNotification - Major
3.74.2 svDelayedShapingDistributionNotification - Minor
3.74.3 svDelayedShapingDistributionNotification - Clear
3.74.4 Impact and Suggested Resolution, Alarm Model 167
3.75 Alarm Model 168: High Login Failures
3.75.1 High Login Failures - Minor
3.75.2 High Login Failures - Clear
3.75.3 Impact and Suggested Resolution, Alarm Model 168
3.76 Alarm Model 175: License Service Unavailable
3.76.1 License Service Unavailable - Major
3.76.2 License Service Unavailable - Clear
3.76.3 Impact and Suggested Resolution, Alarm Model 175


3.77 Alarm Model 187: SandScript Alarm
3.77.1 SandScript Alarm Severity
3.77.2 Impact and Suggested Resolution, Alarm Model 187
3.78 Alarm Model 191: Overloaded Diameter Aggregator
3.78.1 Overloaded Diameter Aggregator - Major
3.78.2 Overloaded Diameter Aggregator - Clear
3.78.3 Impact and Suggested Resolution, Alarm Model 191
4 Miscellaneous Traps
4.1 Miscellaneous Traps
4.1.1 coldStart Trap
4.1.2 warmStart Trap
4.1.3 Administratively Enabled Interface
4.1.4 Administratively Disabled Interface
4.1.5 SNMP Agent Started
4.1.6 SNMP Agent Shutting Down


1 Notifications

• Notifications


1.1 Notifications
Notifications are the communication tool for alarms. They follow the structure derived from:

• RFC 3413 SNMP-NOTIFICATION-MIB
• RFC 3014 NOTIFICATION-LOG-MIB
• RFC 3413 SNMP-TARGET-MIB
• RFC 3877 Alarm MIB

Managers can review alarm models in the Alarm MIB to determine if a notification is of interest for alarm management. If there are no entries in the alarmModelTable that match a particular notification, that notification is not relevant to the alarm models defined. Information in the alarm model, such as the Notification ID or the description, specifies which error or warning condition the alarm indicates. If the ITU-ALARM-MIB is also supported, additional information is provided via the probable cause.

Note: Unless indicated to the contrary, all varbinds referenced in this document with names beginning with sv derive from SANDVINE-MIB.
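The alarmModelTable check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the manager has already retrieved the table rows by some SNMP client. The AlarmModel record, the find_alarm_model helper, and the OIDs in the sample rows are hypothetical placeholders, not values from any MIB.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AlarmModel:
    """One alarmModelTable row, reduced to the two fields used here (hypothetical shape)."""
    notification_id: str  # OID of the notification this model matches
    description: str      # text describing the error or warning condition


def find_alarm_model(notification_oid: str,
                     models: Iterable[AlarmModel]) -> Optional[AlarmModel]:
    """Return the alarm model matching a received notification, or None.

    None means no alarmModelTable entry matches, so the notification is
    not relevant to the alarm models defined.
    """
    for model in models:
        if model.notification_id == notification_oid:
            return model
    return None


# Placeholder rows; a real manager would read these from the agent.
models = [
    AlarmModel("1.3.6.1.4.1.99999.1", "example raise condition"),
    AlarmModel("1.3.6.1.4.1.99999.2", "example clear condition"),
]

match = find_alarm_model("1.3.6.1.4.1.99999.1", models)
print(match.description if match else "not relevant to alarm management")
```

A notification whose OID is absent from the retrieved rows yields None, which corresponds to the "not relevant" case in the text.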

1.1.1 Sandvine Notifications
Sandvine notifications are raised and cleared in the Sandvine alarms suite. All MIB references in notifications are from the SANDVINE-RAIDMON-MIB and SNMPv2-MIB.

1.1.1.1 Bad Logical Drive
This provides a notification on the health of logical drives. Good and bad logical drive notifications share these MIB references:

• sysName
• svSeverity
• svRaidMonDataLogicalDriveDeviceName
• svRaidMonDataLogicalDriveRaidLevel
• svRaidMonDataPhysicalDriveRaidState

A bad logical drive notification is sent when the state of the logical drive is anything other than optimal.

Severity    Description
Warning     The drive is rebuilding.
Major       The drive is degraded.
Critical    The drive is either failed or offline.

Profile     Description
Trap Name   svRaidMonLogicalDriveBadStateNotification
Varbinds    svClusterConfigName
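The severity table above amounts to a lookup from logical drive state to severity. The following Python sketch shows one way a monitoring script might encode it. The state strings and the "unknown" fallback for unlisted states are assumptions made for illustration; this section does not list the exact values the RAID monitor reports.

```python
from typing import Optional

# Severity per logical drive state, as given in the table above.
# The lowercase state strings are illustrative, not MIB enumerations.
LOGICAL_DRIVE_SEVERITY = {
    "rebuilding": "warning",
    "degraded": "major",
    "failed": "critical",
    "offline": "critical",
}


def logical_drive_severity(state: str) -> Optional[str]:
    """Return the severity for a logical drive state, or None when optimal."""
    normalized = state.strip().lower()
    if normalized == "optimal":
        # Optimal is the good state, so no bad-state notification applies.
        return None
    # Assumption: states missing from the table are reported as "unknown".
    return LOGICAL_DRIVE_SEVERITY.get(normalized, "unknown")


print(logical_drive_severity("degraded"))
```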

1.1.1.2 Good Logical Drive
A notification is sent when the logical drive returns to an optimal state.

Profile     Description
Trap Name   svRaidMonLogicalDriveGoodStateNotification
Varbinds    svClusterConfigName

1.1.1.3 Faulted Physical Drive
This provides a notification on the physical drive. Faulted/not faulted physical drive notifications share these MIB references:

• sysName
• svSeverity
• svRaidMonDataPhysicalDriveChannelNumber
• svRaidMonDataPhysicalDriveDeviceNumber
• svRaidMonDataPhysicalDriveRaidState

The faulted physical drive notification is sent when a physical device enters a state other than online or hot spare.

Severity    Description
Warning     Drive is ready.
Minor       Drive is rebuilding.
Major       Drive is failed.

Profile         Description
MIB reference   SANDVINE-RAIDMON-MIB and SNMPv2-MIB
                • sysName
                • svSeverity
                • svRaidMonDataPhysicalDriveChannelNumber
                • svRaidMonDataPhysicalDriveDeviceNumber
                • svRaidMonDataPhysicalDriveRaidState
Trap Name       svRaidMonPhysicalDeviceFaultedNotification
Varbinds        svClusterConfigName
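The rule above (faulted when the state is anything other than online or hot spare) can be sketched as a small trap selector. This Python example is illustrative only. The state strings and the select_physical_drive_trap helper are hypothetical, and the severity is left as None for states the table does not list.

```python
from typing import Optional, Tuple

# States in which the drive is considered not faulted, per the text above.
HEALTHY_STATES = {"online", "hot spare"}

# Severity per faulted state, as given in the table above.
FAULT_SEVERITY = {"ready": "warning", "rebuilding": "minor", "failed": "major"}


def select_physical_drive_trap(state: str) -> Tuple[str, Optional[str]]:
    """Return (trap name, severity) for a physical drive state.

    Severity is None for the not-faulted trap and for faulted states
    that the severity table does not cover.
    """
    normalized = state.strip().lower()
    if normalized in HEALTHY_STATES:
        return "svRaidMonPhysicalDeviceNotFaultedNotification", None
    return "svRaidMonPhysicalDeviceFaultedNotification", FAULT_SEVERITY.get(normalized)


print(select_physical_drive_trap("rebuilding"))
```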


1.1.1.4 Physical Drive Not Faulted
This provides a notification on the physical drive. The physical drive not faulted notification is sent when a physical device enters either the online or hot spare state.

Profile         Description
MIB reference   SANDVINE-RAIDMON-MIB and SNMPv2-MIB
                • sysName
                • svSeverity
                • svRaidMonDataPhysicalDriveChannelNumber
                • svRaidMonDataPhysicalDriveDeviceNumber
                • svRaidMonDataPhysicalDriveRaidState
Trap Name       svRaidMonPhysicalDeviceNotFaultedNotification
Varbinds        svClusterConfigName


2 Key Performance Indicators

• General Resources
• Inspection Performance Monitoring
• Interfaces
• Subscriber Monitoring


2.1 General Resources
These are the general resources that are shared by different processes. These resources are listed as part of the host resources storage table described in HOST-RESOURCES-MIB: HOST-RESOURCES-MIB::hrStorageTable (.1.3.6.1.2.1.25.2.3).

2.1.1 Memory Resources
These resources are described in the host resource storage table with type hrStorageRam (1.3.6.1.2.1.25.2.1.2).

2.1.2 PTS Per Module General Resources
These resources are shared by different processes on a per-module basis and are described in the host resource storage table with type hrStorageByModuleTable (1.3.6.1.4.1.11610.435.15747.1.25.2.4). The table index is based on these values:

• The resource, such as Real Memory (2), Shaping memory (35), CPU Resource (67), PTS Subscriber Count (21), as indicated in the output of the show system resources CLI command.
• The module ID: 0 for the controller and 1-10 for modules.
• The instance ID, which is the resource instance on the given module.
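The three index values above can be modeled as a small record, which is handy when labeling polled rows. The Python sketch below is an illustration under stated assumptions: the StorageIndex type, its field names, and the describe helper are hypothetical, and the sketch deliberately does not assume the order in which these values appear in the OID.

```python
from typing import NamedTuple

# Resource codes quoted in the list above.
RESOURCE_NAMES = {
    2: "Real Memory",
    35: "Shaping memory",
    67: "CPU Resource",
    21: "PTS Subscriber Count",
}


class StorageIndex(NamedTuple):
    """Index of one hrStorageByModuleTable row (field names are hypothetical)."""
    resource: int   # resource code, for example 2 for Real Memory
    module_id: int  # 0 = controller, 1-10 = modules
    instance: int   # resource instance on that module


def describe(index: StorageIndex) -> str:
    """Return a readable label for a table index, validating the module ID."""
    if not 0 <= index.module_id <= 10:
        raise ValueError(f"module ID out of range: {index.module_id}")
    location = "controller" if index.module_id == 0 else f"module {index.module_id}"
    name = RESOURCE_NAMES.get(index.resource, f"resource {index.resource}")
    return f"{name} on {location}, instance {index.instance}"


print(describe(StorageIndex(resource=67, module_id=3, instance=0)))
```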

Memory Resources

Per module real memory is listed in the host resources storage table under hrStorageIndex.2. These parameters provide usage information:
• SANDVINE-MIB::hrStorageAllocationUnits.2 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.6.2)
• SANDVINE-MIB::hrStorageSize.2 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.7.2)
• SANDVINE-MIB::hrStorageUsed.2 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.8.2)
• SANDVINE-MIB::hrStorageAllocationFailures.2 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.9.2)

Per module shaping memory is listed in the host resources storage table under hrStorageIndex.35. These parameters provide usage information:
• SANDVINE-MIB::hrStorageAllocationUnits.35 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.6.35)
• SANDVINE-MIB::hrStorageSize.35 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.7.35)
• SANDVINE-MIB::hrStorageUsed.35 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.8.35)
• SANDVINE-MIB::hrStorageAllocationFailures.35 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.9.35)

Inspection Performance Monitoring

Per module CPU resource is listed in the host resources storage table under hrStorageIndex.67. These parameters provide usage information:
• SANDVINE-MIB::hrStorageAllocationUnits.67 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.6.67)
• SANDVINE-MIB::hrStorageSize.67 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.7.67)
• SANDVINE-MIB::hrStorageUsed.67 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.8.67)
• SANDVINE-MIB::hrStorageAllocationFailures.67 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.9.67)

Per module memory resource is listed in the host resources storage table under hrStorageIndex.65. These parameters provide usage information:
• SANDVINE-MIB::hrStorageAllocationUnits.65 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.6.65)
• SANDVINE-MIB::hrStorageSize.65 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.7.65)
• SANDVINE-MIB::hrStorageUsed.65 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.8.65)
• SANDVINE-MIB::hrStorageAllocationFailures.65 (1.3.6.1.4.1.11610.435.15747.1.25.2.4.1.9.65)
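The usage parameters listed above are typically combined into a utilization figure. The sketch below assumes these per-module columns carry the same meaning as their HOST-RESOURCES-MIB namesakes, where hrStorageSize and hrStorageUsed are counted in units of hrStorageAllocationUnits. That equivalence is an assumption, and it may not hold for every resource (CPU Resource, for example). The sample values are made up.

```python
from typing import Tuple


def storage_utilization(allocation_units: int, size: int, used: int) -> Tuple[int, int, float]:
    """Return (used_bytes, total_bytes, percent_used) for one storage row.

    size and used are counts of allocation units, so multiplying by
    allocation_units converts them to bytes.
    """
    if allocation_units <= 0 or size <= 0:
        raise ValueError("allocation_units and size must be positive")
    used_bytes = used * allocation_units
    total_bytes = size * allocation_units
    percent_used = 100.0 * used / size
    return used_bytes, total_bytes, percent_used


# Hypothetical polled values for one module's real memory row.
used_bytes, total_bytes, percent = storage_utilization(
    allocation_units=4096, size=1_000_000, used=250_000
)
print(f"{used_bytes} of {total_bytes} bytes used ({percent:.1f}%)")
```

hrStorageAllocationFailures is a separate counter and is best tracked as a change between polls rather than folded into the percentage.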

2.2 Inspection Performance Monitoring
Inspection performance monitoring provides information on resources.

2.2.1 CPU Resource
These variables are used to monitor the CPU usage the inspection process consumes.

• SANDVINE-MIB::svPtsResourcesStatsCpu (1.3.6.1.4.1.11610.435.8374.1.7723.3.1). The current maximum percentage utilization of the CPU by the process across all processing modules in the cluster.
• SANDVINE-MIB::svPtsResourcesStatsPeakcpu (1.3.6.1.4.1.11610.435.8374.1.7723.3.2). The peak CPU percentage that the process has used since last reset.

2.2.1.1 Threshold
When SANDVINE-MIB::svPtsResourcesStatsCpu exceeds an average of 90%, it is best to contact Sandvine Customer Support or its authorized partner to have them examine the system and determine its stability. Generally, this number is higher during peak load (often during peak hours). Instantaneous peaks may occur under network failure conditions (for example, when a massive number of new subscribers re-attach to the network), so the intent is to trend this value over time.
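The averaging described above can be sketched as follows. This is an illustration only, not part of the PTS software: the function name and the sample values are hypothetical, and the samples are assumed to be percentages polled from SANDVINE-MIB::svPtsResourcesStatsCpu at a regular interval.

```python
from statistics import mean

CPU_THRESHOLD_PERCENT = 90.0  # average threshold stated in this guide


def cpu_average_exceeds_threshold(samples, threshold=CPU_THRESHOLD_PERCENT):
    """Return True when the mean of the polled CPU percentages is above the threshold."""
    if not samples:
        raise ValueError("at least one sample is required")
    return mean(samples) > threshold


# Hypothetical polled values (percent), for illustration only.
peak_hour = [88, 93, 95, 91, 92]    # sustained high load
with_spike = [40, 42, 99, 41, 43]   # one instantaneous peak

print(cpu_average_exceeds_threshold(peak_hour))
print(cpu_average_exceeds_threshold(with_spike))
```

Averaging over a window means that a single instantaneous peak does not trip the check, while sustained load does.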

2.2.2 Memory Resource
These variables are used to monitor how much memory the inspection process consumes.
• SANDVINE-MIB::svPtsResourcesStatsMemory (1.3.6.1.4.1.11610.435.8374.1.7723.3.3). The percentage of memory that the process is using.
• SANDVINE-MIB::svPtsResourcesStatsPeakmemory (1.3.6.1.4.1.11610.435.8374.1.7723.3.4). The peak memory percentage that the process has used since last reset.

2.2.2.1 Threshold
When SANDVINE-MIB::svPtsResourcesStatsMemory exceeds an average of 90%, it is best to contact Sandvine Customer Support to have them examine the system. Generally, this number is highest during peak load (often during peak hours).


2.2.3 Inspection Engine
• SANDVINE-MIB::svPtsInspectionInspectTimeout (1.3.6.1.4.1.11610.435.8157.1.3.3.1) This counter counts the flows that timed out while the PTS was still inspecting them. Possible causes for timeout are extremely short flows, such as unrecognised connection attempts from P2P sessions over either TCP or UDP, TCP flows with no data after handshakes, and port scan attempts.
• SANDVINE-MIB::svPtsInspectionNotInspected (1.3.6.1.4.1.11610.435.8157.1.3.3.2) This counter shows how many packets the switch was not able to send for inspection because the inspection engine was too busy and the packet queue was full.
• SANDVINE-MIB::svPtsInspectionNoInspectionEngine (1.3.6.1.4.1.11610.435.8157.1.3.3.3) This counter shows how many packets the switch was not able to send to the inspection engine because the daemon did not exist at that time.
• SANDVINE-MIB::svPtsInspectionEarlyDiscard (1.3.6.1.4.1.11610.435.8157.1.3.3.4) This counter shows the number of packets that were not sent from the switch to the inspection engine due to a limitation in the free portion of the message queue.

2.2.3.1 Threshold
When any of the above values exceeds 0, it is best to contact Sandvine Customer Support or its authorized partner to have them examine the SandScript, as well as the other performance metrics, to understand whether additional capacity is needed.

2.2.4 Flow Management
• SANDVINE-MIB::svPtsFlowsTotal (1.3.6.1.4.1.11610.435.8157.1.3.1.1) This is the maximum number of flow states that the switch can support, that is, the maximum number of flows that the policy traffic switch can manage at any given time. This number is constant.
• SANDVINE-MIB::svPtsFlowsAvailable (1.3.6.1.4.1.11610.435.8157.1.3.1.2) This counter shows how many more flows the switch can manage before running out of memory (flow record space).
• SANDVINE-MIB::svPtsFlowsMaxExceeded (1.3.6.1.4.1.11610.435.8157.1.3.1.4) This counter shows how many flows the switch did not process due to running out of flow record space.
• SANDVINE-MIB::svPtsFlowsNew (1.3.6.1.4.1.11610.435.8157.1.3.1.3) This counter shows how many flows the box has seen since it started. You can use this to calculate the rate of new flows per second.

2.2.4.1 Threshold
When SANDVINE-MIB::svPtsFlowsAvailable divided by SANDVINE-MIB::svPtsFlowsTotal exceeds 70% or if SANDVINE-MIB::svPtsFlowsMaxExceeded is greater than 0, contact Sandvine Customer Support or its authorized partner to discuss an upgrade strategy.

Instantaneous peaks may occur in the number of flows due to malicious behavior (SYN flood, flow flood) or handsets reattaching. It is best to look at trends in these values over time. Polling the SANDVINE-MIB::svPtsFlowsNew value to calculate a rate of new flows per second helps you understand the performance of the network. As the rate approaches and exceeds 100,000 new flows per second, it is likely that you will see similar behavior in CPU usage. These two variables are correlated.
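The new-flows rate can be derived from two successive polls of SANDVINE-MIB::svPtsFlowsNew. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the two counter values are assumed to have been polled already, and a decreasing counter is treated as a restart, so that sample is discarded.

```python
def new_flows_per_second(prev_count, curr_count, interval_seconds):
    """Rate of new flows between two polls of svPtsFlowsNew.

    Returns None when the counter went backwards (assumed here to mean
    the element restarted), so the caller can skip that sample.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if curr_count < prev_count:
        return None
    return (curr_count - prev_count) / interval_seconds


# Hypothetical values: 6,000,000 new flows seen over a 60 second poll interval.
rate = new_flows_per_second(1_000_000, 7_000_000, 60)
print(rate)
```

Trending this rate alongside svPtsResourcesStatsCpu makes the correlation described above visible.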

2.3 Interfaces
The interfaces table is part of the standard interfaces MIB that is used to retrieve statistical information such as ifInOctets and ifSpeed. This table is located in IF-MIB::ifTable (1.3.6.1.2.1.2.2).

2.3.1 Bitrate Capacity
Aggregate throughput of the system is often a good indicator in wireline networks of the capacity of the system. In mobile networks, however, we have found that looking at trends compared to CPU, subscriber count, memory, and new flows per second is also required. The interfaces table is part of the standard interfaces MIB and can be used to retrieve statistical information such as ifInOctets and ifSpeed. This table can be found in IF-MIB::ifTable (1.3.6.1.2.1.2.2). To retrieve the list of interfaces, their descriptions, and their ifIndex values, run:

    snmpwalk -c public -v 1 localhost IF-MIB::ifDescr

You can then use the ifIndex to query the specific interface. In the interfaces table, the value ifInOctets can be sampled over a specific amount of time to calculate the receive rate of each interface. The line rate can be compared to the maximum line rate provided by ifSpeed. The MIBs to use are:

MIB: IF-MIB::ifOutOctets (1.3.6.1.2.1.2.2.1.16)
Description: The total number of octets transmitted on the interface, including framing characters.

MIB: IF-MIB::ifInOctets (1.3.6.1.2.1.2.2.1.10)
Description: The total number of octets received on the interface, including framing characters.

MIB: IF-MIB::ifSpeed (1.3.6.1.2.1.2.2.1.5)
Description: An estimate of the interface's current bandwidth in bits per second. For interfaces which do not vary in bandwidth or for those where no accurate estimation can be made, this object should contain the nominal bandwidth.

MIB: IF-MIB::ifHCInOctets (1.3.6.1.2.1.31.1.1.1.6)
Description: The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets and should be used for interfaces that are faster than 1 Gbps.

MIB: IF-MIB::ifHCOutOctets (1.3.6.1.2.1.31.1.1.1.10)
Description: The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets and should be used for interfaces that are faster than 1 Gbps.

MIB: SANDVINE-MIB::svPortTopology
Description: To understand the aggregate throughput of the element, the bridge group information is required. Conversion from Bridge Group ID into ifIndex is available.

The delta of ifInOctets + ifOutOctets over time for all subscriber (or internet) facing ports gives the aggregate bitrate.
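The bitrate calculation can be sketched as below. This is an illustration under stated assumptions rather than a supported tool: the function names are hypothetical, the counter values are assumed to have been polled already from ifHCInOctets and ifHCOutOctets (64-bit counters), and a counter that wraps at most once between polls is handled with modular arithmetic. Octets are multiplied by 8 to obtain bits.

```python
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets / ifHCOutOctets are 64-bit counters


def counter_delta(prev, curr, modulus=COUNTER64_MODULUS):
    """Difference between two counter samples, tolerating a single wrap."""
    return (curr - prev) % modulus


def interface_bitrate(prev_in, curr_in, prev_out, curr_out, interval_seconds):
    """Combined receive plus transmit rate of one interface, in bits per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    octets = counter_delta(prev_in, curr_in) + counter_delta(prev_out, curr_out)
    return octets * 8 / interval_seconds


def aggregate_bitrate(port_samples, interval_seconds):
    """Sum the bitrate over all subscriber (or internet) facing ports.

    port_samples is a list of (prev_in, curr_in, prev_out, curr_out) tuples,
    one tuple per port.
    """
    return sum(interface_bitrate(*sample, interval_seconds) for sample in port_samples)


# Hypothetical samples for two ports, polled 10 seconds apart.
samples = [
    (0, 1_000_000, 0, 500_000),
    (2_000, 502_000, 1_000, 1_001_000),
]
print(aggregate_bitrate(samples, 10))
```

For the 32-bit ifInOctets and ifOutOctets counters, pass `modulus=2 ** 32` to `counter_delta`; on fast interfaces those counters can wrap more than once between polls, which is why the 64-bit objects are preferred.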


2.3.1.1 Threshold
You should consider planning to upgrade when the aggregate bandwidth reaches 7 Gbps. A variety of factors play into the solution sizing: 7 Gbps with few subscribers and few flows is likely not near capacity, but 7 Gbps with many subscribers and flows could be. Contact Sandvine Customer Support or its authorized partner to discuss an upgrade strategy.

2.4 Subscriber Monitoring
This section contains information useful for monitoring the current capacity of the subscriber handling feature.

2.4.1 PTS Subscribers Count Resource
PTS Subscriber Mappings, in the host resources storage table (HOST-RESOURCES-MIB::hrStorageTable (1.3.6.1.2.1.25.2.3)), are used to calculate the current subscriber handling capacity. This resource is identified in this host resources table with hrStorageIndex 21. This information presents the current and total sizes of memory used by subscriber handling:
• HOST-RESOURCES-MIB::hrStorageAllocationUnits.21 (1.3.6.1.2.1.25.2.3.1.4.21)
• HOST-RESOURCES-MIB::hrStorageSize.21 (1.3.6.1.2.1.25.2.3.1.5.21)
• HOST-RESOURCES-MIB::hrStorageUsed.21 (1.3.6.1.2.1.25.2.3.1.6.21)
• HOST-RESOURCES-MIB::hrStorageAllocationFailures.21 (1.3.6.1.2.1.25.2.3.1.7.21)

2.4.1.1 Threshold
When HOST-RESOURCES-MIB::hrStorageUsed.21 divided by HOST-RESOURCES-MIB::hrStorageSize.21 exceeds 70% or if HOST-RESOURCES-MIB::hrStorageAllocationFailures.21 is greater than 0, please contact Sandvine Customer Support or its authorized partner to discuss an upgrade strategy.
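The threshold above reduces to a ratio test plus a failure check. A minimal sketch, assuming the three values have already been polled for hrStorageIndex 21 (the function and parameter names are hypothetical):

```python
def subscriber_capacity_alert(used, size, allocation_failures, ratio_threshold=0.70):
    """Return True when the subscriber mapping resource is near capacity.

    used and size correspond to hrStorageUsed.21 and hrStorageSize.21;
    both are expressed in the same allocation units, so the ratio is unitless.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return (used / size) > ratio_threshold or allocation_failures > 0


# Hypothetical polled values, for illustration only.
print(subscriber_capacity_alert(used=650_000, size=1_000_000, allocation_failures=0))
print(subscriber_capacity_alert(used=720_000, size=1_000_000, allocation_failures=0))
```

Because hrStorageUsed and hrStorageSize share the same hrStorageAllocationUnits, the allocation unit is not needed for the ratio; it is only needed to convert either value to bytes.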


3 PTS Alarms
• Alarm Models
• Alarm Model 1: Faulted Hardware
• Alarm Model 2: Faulted Disk
• Alarm Model 3: High Temperature
• Alarm Model 4: Faulted Fan
• Alarm Model 5: Faulted Power Supply
• Alarm Model 6: High Power Usage
• Alarm Model 7: High Resource Usage
• Alarm Model 8: Overloaded Processor
• Alarm Model 9: Unavailable Processing Module
• Alarm Model 10: Unavailable Service Component
• Alarm Model 11: Unavailable Bypass Group
• Alarm Model 12: Network Interface Errors
• Alarm Model 13: Discarded Packets
• Alarm Model 14: Network Interface Down
• Alarm Model 15: Unavailable Processing Module
• Alarm Model 17: Degraded Cluster
• Alarm Model 18: Disconnected SPB
• Alarm Model 19: Invalid Software License
• Alarm Model 20: Overloaded Cluster
• Alarm Model 21: Overloaded Subcluster
• Alarm Model 22: Misconfigured Network Awareness
• Alarm Model 23: Runtime SandScript Errors
• Alarm Model 24: High Network Interface Rx Rate
• Alarm Model 25: High Network Interface Tx Rate
• Alarm Model 26: Unavailable Disk
• Alarm Model 27: Faulted Hardware
• Alarm Model 28: Discarded Subscriber State
• Alarm Model 29: Disabled Subscriber Lookups
• Alarm Model 30: Delayed Subscriber Mapping
• Alarm Model 32: Disconnected Diameter Peer
• Alarm Model 33: Failed Power On Self Test
• Alarm Model 34: High Traffic Discrepancy
• Alarm Model 35: Exhausted Resource
• Alarm Model 36: Faulted Form-factor Pluggable Module
• Alarm Model 37: Faulted Blade
• Alarm Model 38: Diameter Error
• Alarm Model 39: Diameter Server Outgoing Message Age Exceeded Maximum Threshold
• Alarm Model 40: Diameter Peer Failed Back Over
• Alarm Model 41: Diameter Server Connection with Client Peer Lost
• Alarm Model 42: Diameter Client Outgoing Message Age Reached Early Threshold
• Alarm Model 43: Diameter Client Outgoing Message Age Exceeded Maximum Threshold
• Alarm Model 44: Diameter Server Outgoing Message Age Reached Early Threshold
• Alarm Model 50: Unknown Diameter Session-ID
• Alarm Model 51: Diameter Interface Error
• Alarm Model 52: Diameter Missing Subscriber Information
• Alarm Model 53: Unknown Diameter Service
• Alarm Model 59: Unavailable BGP Master
• Alarm Model 60: Disconnected BGP Peer
• Alarm Model 61: Analyzer parse errors (DEPRECATED)
• Alarm Model 62: Too Many Concurrent Analyzer Flows (DEPRECATED)
• Alarm Model 63: Shunting Traffic Inspection
• Alarm Model 77: Misconfigured Network Interface
• Alarm Model 79: Failed Reload
• Alarm Model 82: Shunting Abusive IPs
• Alarm Model 87: Failed Health Check
• Alarm Model 88: Failed Health Check
• Alarm Model 110: Subscriber Mapping timestamps are in the past
• Alarm Model 111: Subscriber Mapping timestamps are in the future
• Alarm Model 114: Subscriber Mapping Overloaded
• Alarm Model 115: Invalid Subscriber Provisioning Parameters
• Alarm Model 116: Discarded Subscriber Provisioning Update
• Alarm Model 117: Delayed Mapping
• Alarm Model 125: Subscriber Mappings have Stalled or Halted
• Alarm Model 129: Disconnected Tee Destination
• Alarm Model 130: Disconnected Divert Destination
• Alarm Model 131: Disconnected Divert Sequence Destination
• Alarm Model 132: Worn Solid State Drive
• Alarm Model 133: Misconfigured Cluster Name
• Alarm Model 136: Misconfigured Load Balancer
• Alarm Model 146: Incompatible Blade
• Alarm Model 154: Invalid Subscriber Operation Name
• Alarm Model 157: Unavailable Accounting Server
• Alarm Model 167: Delayed Distribution Event
• Alarm Model 168: High Login Failures
• Alarm Model 175: License Service Unavailable
• Alarm Model 187: SandScript Alarm
• Alarm Model 191: Overloaded Diameter Aggregator


3.1 Alarm Models
Sandvine alarm models follow the structural guidelines of RFC 3877 Alarm Management Information Base (MIB). Each model is a group of alarms with different severities and their respective notifications. The severities include:

Severity Level   Severity
1                Clear
2                Indeterminate
3                Critical
4                Major
5                Minor
6                Warning

Note: The Unique Instance Identifier listed for many alarms identifies (or specifies) the particular device type involved. Each independently monitored device type is a separate alarm 'instance'.

3.2 Alarm Model 1: Faulted Hardware
This alarm is raised when a system first detects machine-check errors. These errors are usually uncorrectable or fatal, and can have a severe impact on system operations, depending on the device on which they occurred. All device types are monitored independently, and failures in multiple devices result in multiple notifications. Uncorrectable machine-check errors generate major notifications, and fatal errors generate critical notifications. Correctable machine-check errors do not generate any notifications.

Note: Alarm Model 1 (Faulted Hardware) is not supported on the PTS Linux platform.

Profile                      Description
Severities                   Major, Critical
Raise Notification           svEnvMachineCheckErrorNotification
Clear Notification           svEnvMachineCheckErrorClearNotification
Triggers                     MachineCheckFatal, MachineCheckUncorrectable, MachineCheckNoError
Unique Instance Identifier   svMachineCheckDeviceType
Related CLI                  show system hardware

3.2.1 Faulted Hardware

This notification is sent if the number of machine-check uncorrectable errors exceeds zero, and the number of machine-check fatal errors is equal to zero.

MIB Reference   Description
MIB             SANDVINE-MIB
Trap Name       svEnvMachineCheckErrorNotification
Trap OID        1.3.6.1.4.1.11610.6799.3.1.0.1

Varbind Name                        Varbind OID
svClusterConfigName                 1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                  1.3.6.1.2.1.1.5
svSeverity                          1.3.6.1.4.1.11610.6799.1.10
svMachineCheckDeviceDescription     1.3.6.1.4.1.11610.435.11250.1.10.1.6
svMachineCheckDeviceType            1.3.6.1.4.1.11610.435.11250.1.10.1.2
svMachineCheckDeviceCorrectable     1.3.6.1.4.1.11610.435.11250.1.10.1.3
svMachineCheckDeviceUnCorrectable   1.3.6.1.4.1.11610.435.11250.1.10.1.4
svMachineCheckDeviceFatal           1.3.6.1.4.1.11610.435.11250.1.10.1.5

3.2.1.1 Degraded Hardware
This major notification indicates there is an uncorrectable machine-check error. There are non-zero entries in front of the specific device/devices that raised the alarm. This notification is sent if machine-check uncorrectable errors exceed 0 and machine-check fatal errors are equal to 0.

Profile     Description
Frequency   8 seconds
Severity    Major
Condition   (SANDVINE-MIB::svMachineCheckDeviceUnCorrectable > 0) && (SANDVINE-MIB::svMachineCheckDeviceFatal == 0)

3.2.1.2 Faulted Hardware
This critical notification indicates a fatal machine-check error has occurred. There are non-zero entries in front of the specific device/devices that raised the alarm. This notification is sent if machine-check fatal errors exceed 0.

Profile     Description
Frequency   8 seconds
Severity    Critical
Condition   SANDVINE-MIB::svMachineCheckDeviceFatal > 0


3.2.1.3 Faulted Hardware Cleared
This notification is sent if machine-check errors for a particular device have stopped. This notification is sent only if there was a previous svEnvMachineCheckErrorNotification.

MIB Reference   Description
MIB             SANDVINE-MIB
Trap Name       svEnvMachineCheckErrorClearNotification
Trap OID        1.3.6.1.4.1.11610.6799.3.1.0.2

Varbind Name                        Varbind OID
svClusterConfigName                 1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                  1.3.6.1.2.1.1.5
svSeverity                          1.3.6.1.4.1.11610.6799.1.10
svMachineCheckDeviceDescription     1.3.6.1.4.1.11610.435.11250.1.10.1.6
svMachineCheckDeviceType            1.3.6.1.4.1.11610.435.11250.1.10.1.2
svMachineCheckDeviceCorrectable     1.3.6.1.4.1.11610.435.11250.1.10.1.3
svMachineCheckDeviceUnCorrectable   1.3.6.1.4.1.11610.435.11250.1.10.1.4
svMachineCheckDeviceFatal           1.3.6.1.4.1.11610.435.11250.1.10.1.5

3.2.1.4 Machine-check Clear Errors
This alarm is cleared when machine-check uncorrectable errors are equal to 0 and machine-check fatal errors are equal to 0.

Profile     Description
Frequency   8 seconds
Severity    Cleared
Condition   (SANDVINE-MIB::svMachineCheckDeviceUnCorrectable == 0) && (SANDVINE-MIB::svMachineCheckDeviceFatal == 0)
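The three conditions for Alarm Model 1 (major, critical, and cleared) partition the possible counter values. The sketch below restates them as a single function so the precedence is explicit. It is an illustration only: the function name is hypothetical, and its inputs are assumed to be the polled values of svMachineCheckDeviceUnCorrectable and svMachineCheckDeviceFatal for one device type.

```python
def machine_check_severity(uncorrectable, fatal):
    """Map machine-check counters for one device type to an alarm severity."""
    if fatal > 0:
        return "Critical"   # fatal errors take precedence
    if uncorrectable > 0:
        return "Major"      # uncorrectable errors with no fatal errors
    return "Cleared"        # both counters are 0


# Hypothetical counter values, for illustration only.
for counters in [(0, 0), (3, 0), (3, 1), (0, 2)]:
    print(counters, machine_check_severity(*counters))
```

Correctable errors are deliberately not an input, because they do not generate notifications in this alarm model.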

3.2.2 Impact and Suggested Resolution, Alarm Model 1

1. If there is an uncorrectable error, data has been corrupted and the hardware cannot fix it, but the failing interface is aware that an error has occurred. Normal operation may be affected depending on where the error occurred. It is recommended that the unit be removed, since it is no longer functioning properly.
   Note: A small number of correctable errors are normal and do not necessarily indicate a hardware problem. Performance is not affected and corrective action is not required.
2. If there is a fatal error, data has been corrupted and the hardware cannot fix it, and the failing interface is not operating properly. Normal operation will likely be affected depending on where the error occurred. It is recommended that the unit be removed, since it is no longer functioning properly.


3.3 Alarm Model 2: Faulted Disk
This alarm is raised when one or more severe errors have occurred on a disk. It is normal for a disk to develop growth errors. Even uncorrectable read and write errors are normal, and the RAID controller typically corrects them. Different hardware platforms have different numbers of disks installed. Each disk can raise an instance of this alarm. Run the show system storage disk CLI command to identify the faulted disk.

Note: Alarm Model 2 (Faulted Disk) is not supported on the PTS Linux platform.

Profile                      Description
Severities                   Major, Clear
Raise Notification           svEnvStorageDiskErrorNotification
Clear Notification           svEnvStorageDiskNoErrorNotification
Triggers                     diskErrorsTrigger
Unique Instance Identifier   svStorageDiskTableSlot

3.3.1 Degraded Disk Notification
This notification is sent if the disk has read or write errors.

MIB Reference   Description
MIB             SANDVINE-MIB
Trap Name       svEnvStorageDiskErrorNotification
Trap OID        1.3.6.1.4.1.11610.6799.3.1.0.3

Varbind Name                                 Varbind OID
svClusterConfigName                          1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                           1.3.6.1.2.1.1.5
svSeverity                                   1.3.6.1.4.1.11610.6799.1.10
svStorageDiskTableSlot                       1.3.6.1.4.1.11610.435.11249.1.12.1.9
svStorageDiskTableDescription                1.3.6.1.4.1.11610.435.11249.1.12.1.6
svStorageDiskTableGrowthDefects              1.3.6.1.4.1.11610.435.11249.1.12.1.14
svStorageDiskTableUncorrectableReadErrors    1.3.6.1.4.1.11610.435.11249.1.12.1.15
svStorageDiskTableUncorrectableWriteErrors   1.3.6.1.4.1.11610.435.11249.1.12.1.16


3.3.2 Faulted Disk Notification
This notification is sent if the sum of the total accumulated read and write disk errors reaches or exceeds 2000.

Profile     Description
Frequency   12 hours
Severity    Major
Condition   ((SANDVINE-MIB::svStorageDiskTableUncorrectableReadErrors + SANDVINE-MIB::svStorageDiskTableUncorrectableWriteErrors) > 2000)

3.3.3 Faulted Disk Cleared
This notification is sent if a disk has fewer than 2000 errors.

MIB Reference   Description
MIB             SANDVINE-MIB
Trap Name       svEnvStorageDiskNoErrorNotification
Trap OID        1.3.6.1.4.1.11610.6799.3.1.0.4

Varbind Name                                 Varbind OID
svClusterConfigName                          1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                           1.3.6.1.2.1.1.5
svSeverity                                   1.3.6.1.4.1.11610.6799.1.10
svStorageDiskTableSlot                       1.3.6.1.4.1.11610.435.11249.1.12.1.9
svStorageDiskTableDescription                1.3.6.1.4.1.11610.435.11249.1.12.1.6
svStorageDiskTableGrowthDefects              1.3.6.1.4.1.11610.435.11249.1.12.1.14
svStorageDiskTableUncorrectableReadErrors    1.3.6.1.4.1.11610.435.11249.1.12.1.15
svStorageDiskTableUncorrectableWriteErrors   1.3.6.1.4.1.11610.435.11249.1.12.1.16

3.3.4 Impact and Suggested Resolution, Alarm Model 2
Service continues with decreased confidence in redundancy. The RAID controller corrects isolated read or write errors. Multiple errors may signal that the disk in question will soon become unreliable. Errors can happen if the temperature of the system has become dangerously high, or if the drive has worn out faster than normal. Replacement of the drive is recommended. See the PTS Hardware Installation Guide or the SPB Installation Guide for additional information.

Note: The solid state drives (SSD) in the PTS 22000 and PTS 32000 can only be written a fixed number of times before the disk begins to fail.


3.4 Alarm Model 3: High Temperature
This alarm is raised if a component's internal operating temperature is too high. The alarm is triggered when the temperature is above 55 °C. If the temperature increases beyond 80 °C, the PPU modules shut themselves down, no further notification is sent out, and a PTS crash is likely.

Note: Alarm Model 3 (High Temperature) is not supported on the PTS Linux platform.

Profile                      Description
Severities                   Major
Raise Notification           svEnvTemperatureHighNotification
Clear Notification           svEnvTemperatureOkNotification
Triggers                     CPUthermalwarningmoduleXBad, CPUthermalwarningmoduleXGood, where X is the processing module index.
                             TemperaturediskNBad, TemperaturediskNGood, where N is the disk number.
Unique Instance Identifier   ENTITY-SENSOR-MIB:entPhySensorValue

3.4.1 High Temperature - Notification
This notification is sent if a CPU module overheats.

MIB Reference   Description
MIB             SANDVINE-MIB
Trap Name       svEnvTemperatureHighNotification
Trap OID        1.3.6.1.4.1.11610.6799.3.1.0.5

Varbind Name                               Varbind OID
svClusterConfigName                        1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                         1.3.6.1.2.1.1.5
svSeverity                                 1.3.6.1.4.1.11610.6799.1.10
ENTITY-MIB:entPhysicalDescr                1.3.6.1.2.1.47.1.1.1.1.2
ENTITY-MIB:entPhysicalIsFRU                1.3.6.1.2.1.47.1.1.1.1.16
ENTITY-MIB:entPhysicalParentRelPos         1.3.6.1.2.1.47.1.1.1.1.6
ENTITY-SENSOR-MIB:entPhySensorValue        1.3.6.1.2.1.99.1.1.1.4
ENTITY-SENSOR-MIB:entPhySensorOperStatus   1.3.6.1.2.1.99.1.1.1.5


3.4.1.1 Thermal Overheat
This notification is sent if the CPU module temperature is greater than 55 °C.

Profile     Description
Frequency   3600 seconds
Severity    Major
Condition   ENTITY-SENSOR-MIB::entPhySensorValue > 1

Platform           CPU module or disk sensor number
All PTS elements   1NN02, where NN is the CPU# and ranges from 00 to 10. For example, module 5 temperature high would be sensor number 10502.
PTS 24000          12 (Disk 1), 13 (Disk 2)
PTS 22000          9
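The CPU thermal sensor number can be derived from the processing module index. The sketch below assumes the numbering pattern implied by the example in this section (module 5 maps to 10502): the digit 1, then the two-digit module index, then 02. The function name is hypothetical, and the pattern should be confirmed against the sensors that the element actually reports.

```python
def cpu_thermal_sensor_number(module_index):
    """Build the CPU thermal warning sensor number for a processing module.

    Assumed pattern: "1" + two-digit module index (00 to 10) + "02".
    """
    if not 0 <= module_index <= 10:
        raise ValueError("module_index must be between 0 and 10")
    return int(f"1{module_index:02d}02")


print(cpu_thermal_sensor_number(5))
```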

3.4.1.2 Hard Disk Temperature Error
This notification is sent if the temperature of any of the hard disks (1 and 2) is greater than 55 degrees.

Profile     Description
Frequency   0 seconds (immediate)
Severity    Major
Condition   ENTITY-SENSOR-MIB::entPhySensorValue > 55

3.4.2 High Temperature Cleared
This notification indicates that the temperature of a component has returned to normal operating range.

MIB Reference   Description
MIB             SANDVINE-MIB
Trap Name       svEnvTemperatureOkNotification
Trap OID        1.3.6.1.4.1.11610.6799.3.1.0.6

Varbind Name                               Varbind OID
svClusterConfigName                        1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                         1.3.6.1.2.1.1.5
svSeverity                                 1.3.6.1.4.1.11610.6799.1.10
ENTITY-MIB:entPhysicalDescr                1.3.6.1.2.1.47.1.1.1.1.2
ENTITY-MIB:entPhysicalIsFRU                1.3.6.1.2.1.47.1.1.1.1.16
ENTITY-MIB:entPhysicalParentRelPos         1.3.6.1.2.1.47.1.1.1.1.6
ENTITY-SENSOR-MIB:entPhySensorValue        1.3.6.1.2.1.99.1.1.1.4
ENTITY-SENSOR-MIB:entPhySensorOperStatus   1.3.6.1.2.1.99.1.1.1.5

3.4.2.1 Thermal Overheat: Clear
This alarm is cleared when the count for all of the CPU thermal warning counter sensors is 0 over a 1 hour (3600 seconds) period.

Profile     Description
Frequency   3600 seconds
Severity    Cleared
Condition   DELTA(ENTITY-SENSOR-MIB::entPhySensorValue) == 0

3.4.2.2 Hard Disk Temperature Error: Clear
This alarm is cleared when the hard disk temperature for any of the hard disks (1 and 2) is less than 55 degrees.

Profile     Description
Frequency   0 seconds (immediate)
Severity    Cleared
Condition   ENTITY-SENSOR-MIB::entPhySensorValue < 55
            The PTS 24000 conditions are: ENTITY-SENSOR-MIB::entPhySensorValue < 55

3.4.3 Impact and Suggested Resolution, Alarm Model 3
Hard drives: Overheated disks are more likely to have errors or fail prematurely. Different hardware platforms have different levels of redundancy in their disks, so the impact varies by platform. See Alarm Model 2: Faulted Disk and Alarm Model 26: Unavailable Disk for additional information.

CPUs/modules: When the ambient temperature gets too high, a CPU reduces its clock frequency to maintain its temperature at a safe maximum. When the number of warnings increments, it indicates that the CPU temporarily reduced its clock frequency at some time during the polling interval. This has an impact on performance, but the impact is only significant if the number of warnings is constantly increasing.
• Check that the ambient temperature in the equipment rack where the unit is mounted is in accordance with the environmental specifications for that unit. See the PTS Hardware Installation Guide for additional information. Additional cooling or increased airflow may be required in the rack to maintain the proper ambient temperature.
• Check to see if there are any alarm notifications for the fans. If there are alarms, see Alarm Model 4: Faulted Fan.
• Check all chassis fans and ensure they are operational. See the PTS Hardware Installation Guide and PTS Administration Guide for additional information.
• If all checks are good, the hard drives or CPUs/modules may be faulty. If performance is being impacted, contact Sandvine Support.

Note: For hard drives in a RAID configuration, action may not be required until the drive actually fails. See Alarm Model 26: Unavailable Disk.


3.5 Alarm Model 4: Faulted Fan
This alarm is raised if one or more fans are running below the minimum required speed, even momentarily (as in the case of the PTS 32000), or have stopped completely. This results in a high temperature condition in the system. (The speed is defined in revolutions per minute or RPM.)

Note: Alarm Model 4 (Faulted Fan) is not supported on the PTS Linux platform.

Profile                      Description
Severities                   Major
Raise Notification           svEnvFanFailureNotification
Clear Notification           svEnvFanOkNotification
Unique Instance Identifier   ENTITY-SENSOR-MIB:entPhySensorValue

Note: On the PTS 24000 platform, the internal software periodically checks the fan speed (in RPM) and raises the alarm. On the PTS 32000 platform, the internal hardware checks the fan speed (in RPM) and raises a flag; the internal software then periodically checks the flag and raises the alarm.

3.5.1 Faulted Fan: Major Notification
This notification is sent if the operational status is faulted for a chassis fan, or if the operational status is okay but the sensor value for a chassis fan's RPM is lower than the minimum value.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    PTS 22000 and PTS 24000 platforms:
             (ENTITY-SENSOR-MIB::entPhySensorOperStatus != [Ok]) || ((ENTITY-SENSOR-MIB::entPhySensorOperStatus == [Ok]) && (ENTITY-SENSOR-MIB::entPhySensorValue < minimum RPM))
             PTS 32000 platform:
             (ENTITY-SENSOR-MIB::entPhySensorOperStatus == [Ok]) && (ENTITY-SENSOR-MIB::entPhySensorValue == 2)
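The raise condition can be read as a small predicate. The following Python sketch is illustrative only: the function name, the string used for the Ok status, and the argument layout are assumptions, not part of any Sandvine interface. It mirrors the two platform-specific conditions stated for this notification.

```python
# Hypothetical helper; none of these names come from the PTS software.
OK = "ok"

def fan_faulted(platform, oper_status, sensor_value, minimum_rpm=0):
    """Return True when the documented raise condition holds for one fan sensor."""
    if platform == "PTS 32000":
        # The hardware raises a flag; a sensor value of 2 marks the fault.
        return oper_status == OK and sensor_value == 2
    # PTS 22000 and PTS 24000: status is not Ok, or the RPM is below the minimum.
    return oper_status != OK or sensor_value < minimum_rpm
```

For example, a PTS 24000 fan reporting 4000 RPM against the 4600 RPM minimum listed in section 3.5.3 would satisfy the condition.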

When viewed from the back of the unit, fan 5 is on the left and fan 1 is on the right. The chassis fan sensor numbers are:

Platform     Sensor Number
PTS 32000    37 to 40 (fans 1 to 4)
PTS 24000    58 to 62 (fans 1 to 5)
PTS 22000    53 to 55 (fans 1 to 3)
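As a rough illustration, the sensor ranges above can be turned into a lookup that translates a sensor number into a fan position. This sketch assumes that sensors map to fans in ascending order (for example, sensor 58 is fan 1 on the PTS 24000); the names are hypothetical.

```python
# Sensor ranges copied from the chassis fan sensor table; names are illustrative.
FAN_SENSOR_RANGES = {
    "PTS 32000": range(37, 41),  # sensors 37 to 40
    "PTS 24000": range(58, 63),  # sensors 58 to 62
    "PTS 22000": range(53, 56),  # sensors 53 to 55
}

def fan_number(platform, sensor):
    """Return the 1-based fan number for a sensor, or None if it is not a chassis fan."""
    sensors = FAN_SENSOR_RANGES[platform]
    if sensor not in sensors:
        return None
    return sensor - sensors.start + 1
```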


3.5.2 Faulted Fan: Clear Notification
This clear notification is sent if a failed fan has been replaced and is working correctly. When this notification is sent, the operational status for a chassis fan is restored to okay, and/or the sensor value for a chassis fan goes above the minimum RPM.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    PTS 22000 and PTS 24000 platforms:
             (ENTITY-SENSOR-MIB::entPhySensorOperStatus == [Ok]) && (ENTITY-SENSOR-MIB::entPhySensorValue > minimum RPM)
             PTS 32000 platform:
             (ENTITY-SENSOR-MIB::entPhySensorOperStatus == [Ok]) && (ENTITY-SENSOR-MIB::entPhySensorValue == 1)

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svEnvFanOkNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.1.0.8

Varbind Name                                Varbind OID
svClusterConfigName                         1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                          1.3.6.1.2.1.1.5
svSeverity                                  1.3.6.1.4.1.11610.6799.1.10
ENTITY-MIB:entPhysicalDescr                 1.3.6.1.2.1.47.1.1.1.1.2
ENTITY-MIB:entPhysicalIsFRU                 1.3.6.1.2.1.47.1.1.1.1.16
ENTITY-MIB:entPhysicalParentRelPos          1.3.6.1.2.1.47.1.1.1.1.6
ENTITY-SENSOR-MIB:entPhySensorValue         1.3.6.1.2.1.99.1.1.1.4
ENTITY-SENSOR-MIB:entPhySensorOperStatus    1.3.6.1.2.1.99.1.1.1.5
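A trap receiver typically has to translate an incoming trap OID into a notification name. The sketch below is one possible way to do that, using only trap OIDs that this guide lists for the fan and power supply alarm models; the dictionary and function names are hypothetical.

```python
# Trap OIDs as listed in the MIB Reference tables of this guide.
TRAP_NAMES = {
    "1.3.6.1.4.1.11610.6799.3.1.0.8": "svEnvFanOkNotification",
    "1.3.6.1.4.1.11610.6799.3.1.0.9": "svEnvPowerFailureNotification",
    "1.3.6.1.4.1.11610.6799.3.1.0.10": "svEnvPowerOkNotification",
    "1.3.6.1.4.1.11610.6799.3.1.0.11": "svEnvPowerLoadHighNotification",
}

def trap_name(oid):
    """Map a trap OID to its notification name; a leading dot is tolerated."""
    return TRAP_NAMES.get(oid.lstrip("."), "unknown")
```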

3.5.3 Impact and Suggested Resolution, Alarm Model 4
Loss of any cooling capacity translates to higher than normal operating temperature for some components. Long term operation with a faulty fan is not recommended; replace faulty fans as soon as possible. Fan blade obstructions or a total fan failure are common causes of this alarm. Inspect the unit for obstructions and replace the fan if none are found.
This table provides the minimum RPM values and details of fans for different PTS hardware models, SPC 1000, and SRP 3000A/B/lite platforms.


Platform: PTS 32000
Description: Minimum RPM: 4000
This platform has 4 fans installed at the back of the chassis. The fans are 3+1 redundant, meaning failure of a single fan will not affect the performance of the system.
Note: When there is a fan fault, the PTS 32000 operates in a non-redundant fan operation mode, where the system attempts to boost the speed of all of the chassis fans in order to compensate for a faulted fan. The fan fault on the PTS is latched and does not get cleared until the PTS is rebooted. Moreover, the fault may be re-triggered upon the next bootup if there is still a fault with the fan.

Platform: PTS 24000
Description: Minimum RPM: 4600
This platform has 5 fans installed at the back of the chassis. The fans are 4+1 redundant, meaning failure of a single fan will not affect the performance of the system.

Platform: PTS 22000
Description: Minimum RPM: 6000
This platform has 2 fans installed at the back of the chassis. The fans are 1+1 redundant, meaning failure of a single fan will not affect the performance of the system.

Platform: SPC 1000
Description: Minimum RPM: 4000
This platform has 2 fans installed internal to the chassis. The fans are non-redundant and failure of either of these fans will result in overheating and performance degradation.

Platform: SRP 3000A/B/lite
Description: Minimum RPM: 4000
These platforms have 6 fans installed internal to the chassis. The fans are redundant and the system can tolerate a single-fan failure.

3.5.3.1 Replacing PTS or SRP 3000A/B/C/D/lite Fans
Fans for PTS 32000, 24000, and SRP 3000A/B/C/D/lite are field-replaceable:
1. Identify the faulty fan and its location.
2. Replace the fan. See the related PTS Hardware Installation Guide for specific procedures.
3. Run the show system environmental fans CLI command to verify that the new fan is operational.
If the command output indicates that the new fan is not operational, there may be a problem with the fan slot and you may have to replace the system.

3.5.3.2 Suggested Resolution for SPC 1000
Fans in this platform are not field replaceable; therefore, you have to replace the system.

3.6 Alarm Model 5: Faulted Power Supply
This alarm is raised to indicate that there is an issue with the power supply. Warning and minor notifications are generated for temperature-related issues, while major notifications are generated for a failed power supply fan or loss of input power or output power. An input or output power notification typically indicates that one of the power cords is not plugged in, or connected, and some of the power supplies are not active. Upon power restoration from a full power loss, the element reboots without intervention.
Note: Alarm Model 5 (Faulted Power Supply) is not supported on the PTS Linux platform.


Profile                       Description
Severities                    Warning, Major, Minor, and Clear
Raise Notification            svEnvPowerFailureNotification
Clear Notification            svEnvPowerOkNotification
Unique Instance Identifier    ENTITY-SENSOR-MIB:entPhySensorValue

Triggers for PTS 22000
• PSXcurrentbelowmaximumcriticBad
• PSXcurrentbelowmaximumcriticGood
• PSXcurrentbelowmaximumwarninBad
• PSXcurrentbelowmaximumwarninGood
• PSXfanstatusBad
• PSXfanstatusGood
• PSXinputpowerstatusBad
• PSXinputpowerstatusGood
• PSXoutputpowerstatusBad
• PSXoutputpowerstatusGood
• PSXpresenceBad
• PSXpresenceGood
• PSXtemperaturewithincriticalBad
• PSXtemperaturewithincriticalGood
• PSXtemperaturewithinwarningtBad
• PSXtemperaturewithinwarningtGood
• PSXvoltageaboveminimumcriticBad
• PSXvoltageaboveminimumcriticGood
• PSXvoltageaboveminimumwarninBad
• PSXvoltageaboveminimumwarninGood
• PSXvoltagebelowmaximumcriticBad
• PSXvoltagebelowmaximumcriticGood
• PSXvoltagebelowmaximumwarninBad
• PSXvoltagebelowmaximumwarninGood
Where X is the power supply number, either 1 or 2.

Triggers for PTS 24000
• PSXfanNBad
• PSXfanNGood
• PSXoutputpowerstatusBad
• PSXoutputpowerstatusGood
• PSXpresenceBad
• PSXpresenceGood
• PSXtemperaturewithincriticalBad
• PSXtemperaturewithincriticalGood
• PSXtemperaturewithinwarningtBad
• PSXtemperaturewithinwarningtGood
Where X is the power supply number from 1 to 4 and N is the fan number from 1 to 3.

Triggers for SRP 3000
• PSXfanBad
• PSXfanGood
• PSXpresenceBad
• PSXpresenceGood
• PSstatusBad
• PSstatusGood
Where X is a power supply number from 1 to 3.
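The trigger names follow a regular pattern: PS, the supply number X, the name of the check, an optional fan number N, and a Bad or Good suffix. A minimal Python sketch of a parser for that pattern is shown here; the regular expression and the returned field names are assumptions made for illustration and are not part of the product.

```python
import re

# PS<supply><check>[<fan>]<Bad|Good>, for example PS2fan3Bad or PS1presenceGood.
TRIGGER_RE = re.compile(
    r"^PS(?P<supply>\d)(?P<check>[A-Za-z]+?)(?P<fan>\d)?(?P<state>Bad|Good)$"
)

def parse_trigger(name):
    """Split a power supply trigger name into its parts, or return None."""
    match = TRIGGER_RE.match(name)
    if match is None:
        return None
    return {
        "supply": int(match["supply"]),
        "check": match["check"],
        "fan": int(match["fan"]) if match["fan"] else None,
        "raised": match["state"] == "Bad",  # Bad raises, Good clears
    }
```

Names without a supply number, such as the SRP 3000 PSstatusBad trigger, do not match this pattern and return None.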

The sensor numbers are:

Platform                          Supply Number   Power Supply Present   Power Supply Fan Failure   Power Supply Output Power   Power Supply Temperature High Warning   Power Supply Temperature Critical
PTS 32000 (x supplies maximum)    1               285                    298 (1 fan)                290                         296                                     297
                                  2               286                    319 (1 fan)                309                         317                                     318
PTS 24000 (4 supplies maximum)    1               14                     25 to 27 (Fan 1 to 3)      19                          23                                      24
                                  2               15                     35 to 37 (Fan 1 to 3)      29                          33                                      34
                                  3               16                     45 to 47 (Fan 1 to 3)      39                          43                                      44
                                  4               17                     55 to 57 (Fan 1 to 3)      49                          53                                      54
PTS 22000 (2 supplies maximum)    1               11                     24 (Fan 1)                 16                          22                                      23
                                  2               11                     45 (Fan 1)                 35                          43                                      44

3.6.1 Faulted Power Supply - Major Notification
This notification indicates that a power supply is not installed or is faulty. It may also indicate a power outage. Because each platform may have a different number of power supplies, their level of redundancy varies. Run the show system environmental power CLI command to identify a faulty power supply.

Platform: SRP 3000 A/B/lite - AC
Description and impact: These platforms have 3 power supplies with a 2+1 redundancy. The unit is still operational with no performance degradation if 1 power supply fails.

Platform: SRP 3000 A/B/lite - DC
Description and impact: These platforms have 2 power supplies with a 1+1 redundancy. The unit is still operational with no performance degradation if 1 power supply fails.

Platform: PTS 24100, 24101
Description and impact: These platforms only require 2 power supplies with a 1+1 redundancy. The unit is still operational with no performance degradation if 1 power supply fails. These supplies are installed in positions 1 and 3.

Platform: PTS 24300, 24500, 24700
Description and impact: These platforms are AC input power models that require 4 power supplies with a 2+2 redundancy. The unit is still operational with no performance degradation if up to 2 of the power supplies fail.

Platform: PTS 24301, 24501, 24701
Description and impact: These are DC input power models that require 4 power supplies with a 2+2 redundancy. Each inlet supplies 2 DC input power connections (input A powers supplies 1 and 2, input B powers supplies 3 and 4). The unit is still operational with no performance degradation if up to 2 of the power supplies fail.

Platform: PTS 22000
Description and impact: This platform has 2 power supplies with a 1+1 redundancy. The unit is still operational with no performance degradation if 1 power supply fails.

Platform: PTS 32000
Description and impact: This platform has 2 power supplies maximum, with a 1+1 redundancy.
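The N+M redundancy figures above can be used to estimate what a given number of failed supplies means for a unit. The following sketch is an illustration only: the table keys, the function name, and the three result strings are assumptions, and the guide itself does not describe behavior once more supplies fail than the redundancy covers.

```python
# platform: (supplies required, redundant supplies), from the N+M figures above.
PSU_REDUNDANCY = {
    "SRP 3000 A/B/lite - AC": (2, 1),
    "SRP 3000 A/B/lite - DC": (1, 1),
    "PTS 24100, 24101": (1, 1),
    "PTS 24300, 24500, 24700": (2, 2),
    "PTS 24301, 24501, 24701": (2, 2),
    "PTS 22000": (1, 1),
    "PTS 32000": (1, 1),
}

def supply_state(platform, failed):
    """Classify a unit by the number of failed power supplies."""
    _required, redundant = PSU_REDUNDANCY[platform]
    if failed == 0:
        return "fully redundant"
    if failed <= redundant:
        # Still operational with no performance degradation, per the table.
        return "operational, redundancy reduced"
    return "insufficient power"  # assumed outcome, not stated in the guide
```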

3.6.2 Faulted Power Supply - Notification
This notification is sent for individual power supplies if the entPhySensorValue for that power supply is set to bad (2).

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svEnvPowerFailureNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.1.0.9

Varbind Name                                Varbind OID
svClusterConfigName                         1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                          1.3.6.1.2.1.1.5
svSeverity                                  1.3.6.1.4.1.11610.6799.1.10
ENTITY-MIB:entPhysicalDescr                 1.3.6.1.2.1.47.1.1.1.1.2
ENTITY-MIB:entPhysicalIsFRU                 1.3.6.1.2.1.47.1.1.1.1.16
ENTITY-MIB:entPhysicalParentRelPos          1.3.6.1.2.1.47.1.1.1.1.6
ENTITY-SENSOR-MIB:entPhySensorValue         1.3.6.1.2.1.99.1.1.1.4
ENTITY-SENSOR-MIB:entPhySensorOperStatus    1.3.6.1.2.1.99.1.1.1.5

3.6.2.1 Faulted Power Supply: Not present
This notification is sent if entPhySensorValue for a power supply is set to not present (value=2). When set to present (value=1), the power supply is present.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)


3.6.2.2 Faulted Power Supply: Fan Bad
This notification is sent if entPhySensorValue for a power supply fan is set to bad (value=2).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Minor
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.3 Faulted Power Supply: Output Power
This notification is sent if entPhySensorValue for output power for a power supply is set to bad (value=2), even though the power supply is physically present.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2

3.6.2.4 Degraded Power Supply: Temperature high warning
This notification is sent if entPhySensorValue for temperature high warning for a power supply is set to bad (value=2).

Profile        Description
Frequency      0 seconds (immediate)
Severity       Warning
Condition      ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)
Description    This alarm is raised for individual power supplies.

3.6.2.5 Faulted Power Supply: Temperature critical
This notification is sent if entPhySensorValue for temperature critical for a power supply is set to bad (value=2).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Minor
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.6 Faulted Power Supply: Input power bad
This notification is sent if entPhySensorValue for the status of the power supply input power stage is set to bad (value=2).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)


3.6.2.7 Faulted Power Supply: Over current warning
This notification is sent if entPhySensorValue for the status of over current warning is set to bad (value=2). This happens if the DC output current is close to the limit for activating over current protection.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.8 Faulted Power Supply: Over current critical
This notification is sent if entPhySensorValue for the status of over current critical is set to bad (value=2). This happens if the DC output has been latched off due to an internal over current protection circuit. To restore power supply module output, input power must momentarily be disconnected.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.9 Faulted Power Supply: Over voltage warning
This notification is sent if entPhySensorValue for the status of over voltage warning is set to bad (value=2). This happens if the DC output voltage is close to the limit for activating over voltage protection. To restore power supply module output, input power must momentarily be disconnected.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.10 Faulted Power Supply: Over voltage critical
This notification is sent if entPhySensorValue for the status of over voltage critical is set to bad (value=2). This happens when the DC output has been latched off due to an internal over voltage protection circuit.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.11 Faulted Power Supply: Under voltage warning
This notification is sent if entPhySensorValue for the status of under voltage warning is set to bad (value=2). This happens if the DC output voltage is close to the minimum for activating under voltage protection.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.12 Faulted Power Supply: Under voltage critical
This notification is sent if entPhySensorValue for the status of under voltage critical is set to bad (value=2). This happens when the DC output has been latched off due to an internal under voltage protection circuit.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Major
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 2 (bad)

3.6.2.13 Faulted Power Supply Cleared: Input Power clear
This notification is cleared if entPhySensorValue for the status of the power supply input power stage is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.2.14 Faulted Power Supply Cleared: Over current warning clear
This notification is cleared if entPhySensorValue for the status of over current warning is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.2.15 Faulted Power Supply Cleared: Over current critical clear
This notification is cleared if entPhySensorValue for the status of over current critical is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.2.16 Faulted Power Supply Cleared: Over voltage warning clear
This notification is sent if entPhySensorValue for the status of over voltage warning is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.2.17 Faulted Power Supply Cleared: Over voltage critical clear
This notification is sent if entPhySensorValue for the status of over voltage critical is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.2.18 Faulted Power Supply Cleared: Under voltage warning clear
This notification is sent if entPhySensorValue for the status of under voltage warning is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.2.19 Faulted Power Supply Cleared: Under voltage critical clear
This notification is sent if entPhySensorValue for the status of under voltage critical is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.3 Faulted Power Supply Cleared
This notification is sent when a power supply or one of its components has returned to normal operation.

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svEnvPowerOkNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.1.0.10

Varbind Name                                Varbind OID
svClusterConfigName                         1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                          1.3.6.1.2.1.1.5
svSeverity                                  1.3.6.1.4.1.11610.6799.1.10
ENTITY-MIB:entPhysicalDescr                 1.3.6.1.2.1.47.1.1.1.1.2
ENTITY-MIB:entPhysicalIsFRU                 1.3.6.1.2.1.47.1.1.1.1.16
ENTITY-MIB:entPhysicalParentRelPos          1.3.6.1.2.1.47.1.1.1.1.6
ENTITY-SENSOR-MIB:entPhySensorValue         1.3.6.1.2.1.99.1.1.1.4
ENTITY-SENSOR-MIB:entPhySensorOperStatus    1.3.6.1.2.1.99.1.1.1.5

3.6.3.1 Faulted Power Supply Cleared: Present
This alarm is cleared if entPhySensorValue for a power supply is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.3.2 Faulted Power Supply Cleared: Good
This notification is sent if entPhySensorValue for a power supply fan is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.3.3 Faulted Power Supply Cleared: Output Power
This notification is sent if entPhySensorValue for output power for a power supply is set to good (value=1), or the power supply is physically not present (value=2).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good), ENTITY-SENSOR-MIB::entPhySensorValue == 2 (not present)

3.6.3.4 Faulted Power Supply Cleared: Temperature high warning clear
This notification is sent if entPhySensorValue for temperature high warning for a power supply is set to good (value=1), or the power supply is physically not present (value=2). There is no performance impact and the unit is still fully functional.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ((ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)) || (ENTITY-SENSOR-MIB::entPhySensorValue == 2 (not present)))

3.6.3.5 Faulted Power Supply Cleared: Temperature critical clear
This notification is sent if entPhySensorValue for temperature critical for a power supply is set to good (value=1).

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    ENTITY-SENSOR-MIB::entPhySensorValue == 1 (good)

3.6.4 Impact and Suggested Resolution, Alarm Model 5
The impacts include the unit overheating, or a power outage to the unit.

3.6.4.1 Suggested Resolution for Alarm Model 5
1. Make sure all power supply fans are running and that there are no fan alarms. See Alarm Model 4: Faulted Fan on page 35 for additional information.
2. Make sure the ambient temperature of the equipment rack is in accordance with the environmental specification of that unit.
3. Verify that the air intake/exhaust ports are not blocked.

3.6.4.2 Suggested Resolution - High temperature on a power supply
This alarm indicates a high temperature on a power supply. This means the power supply has failed and the system is using redundant power supplies:
• Make sure that all power supply fans are operational.
• Make sure the ambient temperature of the equipment rack is in accordance with the environmental specification of that unit.
• Verify that the air intake/exhaust ports are not blocked.
• Make sure that all chassis fans are operational.

3.6.5 Suggested Resolutions for All Platforms, Alarm Model 5
1. Make sure all the power supplies are installed and fully inserted in their slots.
2. Make sure input cables are properly connected and input power is present.
3. Verify that input power feed voltage is in the acceptable range for the type of facility power used.


3.6.6 Suggested Resolutions for PTS 24000 and PTS 32000 Series Platforms
If two supplies fail at the same time (supplies 1&2 or supplies 3&4), it is likely that something happened to the corresponding power input wiring:
1. Check the power input wiring.
2. If the problem is not resolved, replace the power supply.
3. Install new supplies and see if the problem still exists.
Chassis replacement is required if the problem is not resolved.

3.7 Alarm Model 6: High Power Usage
This alarm is raised to indicate that power supplies are at 90% of redundant capacity and the element no longer has power supply redundancy.
Note: Alarm Model 6 (High Power Usage) is not supported on the PTS Linux platform.

Profile               Description
Severities            Minor and Cleared
Raise Notification    svEnvPowerLoadHighNotification
Clear Notification    svEnvPowerLoadNormalNotification
Triggers              • TotalCurrentMinor
                      • TotalCurrentClear

3.7.1 High Power Usage - Notification
This notification is sent if power supply load is high.

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svEnvPowerLoadHighNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.1.0.11

Where n is the row number for a given power supply:

Varbind Name                                                  Varbind OID
svClusterConfigName                                           1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                                            1.3.6.1.2.1.1.5
svSeverity                                                    1.3.6.1.4.1.11610.6799.1.10
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 1)     1.3.6.1.2.1.99.1.1.1.4.n
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 2)     1.3.6.1.2.1.99.1.1.1.4.n
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 3)     1.3.6.1.2.1.99.1.1.1.4.n
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 4)     1.3.6.1.2.1.99.1.1.1.4.n

3.7.1.1 Current Draw 130A (14k - 9700) limit: Raise
This notification is sent for a power supply (number 1 to 4) if the entPhySensorValue total for all power supplies exceeds 130000.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Minor
Condition    (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 1)) + (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 2)) + (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 3)) + (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 4)) > 130000
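The raise threshold here (130000) and the clear threshold in section 3.7.2.1 (120000) differ, which gives the alarm hysteresis: once raised, it stays raised until the total drops to the lower figure. A minimal sketch of that behavior follows; the function and constant names are hypothetical, and the sensor unit is not stated in this guide (the "130A" in the heading suggests milliamps, but that is an assumption).

```python
RAISE_ABOVE = 130000        # raise when the summed sensor values exceed this
CLEAR_AT_OR_BELOW = 120000  # clear threshold given in section 3.7.2.1

def update_high_power_alarm(active, supply_values):
    """Return the new alarm state given the per-supply sensor values."""
    total = sum(supply_values)
    if not active and total > RAISE_ABOVE:
        return True
    if active and total <= CLEAR_AT_OR_BELOW:
        return False
    return active  # between the thresholds the state is unchanged
```

With four supplies each reporting 31000, the total of 124000 neither raises a new alarm nor clears an existing one.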

3.7.2 High Power Usage Cleared
This notification is sent if the total load on the power supplies returns to the normal range and the power supplies are fully redundant.

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svEnvPowerLoadNormalNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.1.0.12

Varbind Name                                                  Varbind OID
svClusterConfigName                                           1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                                            1.3.6.1.2.1.1.5
svSeverity                                                    1.3.6.1.4.1.11610.6799.1.10
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 1)     1.3.6.1.2.1.99.1.1.1.4.n
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 2)     1.3.6.1.2.1.99.1.1.1.4.n
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 3)     1.3.6.1.2.1.99.1.1.1.4.n
ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 4)     1.3.6.1.2.1.99.1.1.1.4.n


3.7.2.1 Current Draw: Clear
This notification is cleared for a power supply (number 1 to 4) if the entPhySensorValue total for all power supplies is less than or equal to 120000.

Profile      Description
Frequency    0 seconds (immediate)
Severity     Cleared
Condition    (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 1)) + (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 2)) + (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 3)) + (ENTITY-SENSOR-MIB::entPhySensorValue (for power supply 4)) <= 120000

c. If one of the power supplies [...] 20%) than the others, replace it. Otherwise, it is likely there is an internal hardware problem in the system and you should replace it.

3.8 Alarm Model 7: High Resource Usage
This alarm is raised when there is an over usage of system resources due to issues such as system over-provisioning or software that is not properly tuned. It can also indicate that the resource has allocation failures. This alarm indicates that a system resource in the host resources table (hrStorageTable) is at 90% capacity or exceeds 90% capacity. If a resource is at 90% capacity, a warning notification is sent. If a resource has allocation failures, a minor notification is sent.

Profile                       Description
Severities                    • Major
                              • Minor
                              • Warning
                              • Clear
Raise Notification            svSystemResourceLowNotification
Clear Notification            svSystemResourceOkNotification
Triggers                      hrStorageResourceTriggerN, where N is the resource ID.
Unique Instance Identifier    • HOST-RESOURCES-MIB:hrStorageIndex
                              • SANDVINE-RESOURCES-MIB.txt

Resource Capacity Thresholds and Severities
Each resource is monitored for capacity and allocation failures. For example, if capacity is greater than 90%, an alarm is raised. If resource allocation experiences failures then Alarm Model 35 is raised.
There is a trigger for each resource in the host resources table. The trigger is represented as hrStorageResourceTriggerN, where N is the resource ID. For example, hrStorageResourceTrigger2 is real memory.
The show system resources CLI command lists resources to which this alarm applies. Listed resources depend on the platform (PTS, SDE/SPB), release, or configuration. This table lists these resources, capacity thresholds, and allocation failure severities:

ID    Description             Capacity Thresholds (Rising and Falling)    Capacity Severity    Allocation Failure Severity
2     Real memory             90,80                                       Warning              Minor
3     Swap space              70-80, 80-90                                Warning, Minor       Major
4     Mbuf clusters           90,80                                       Warning              Minor
6     Filesystem /            (80,70)(95,80)                              Minor, Major         Major
7     Filesystem /d2          (80,70)(95,80)                              Minor, Major         Major
8     File descriptors        90,80                                       Warning              Minor
18    Kernel address space    90,80                                       Warning              Minor
19    Kernel memory           90,80                                       Warning              Minor
80    PTS Map entries         (90,80) (10,0)                              Warning              Minor
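To illustrate how a rising and falling threshold pair might be applied, here is a short Python sketch covering the resources that have a single 90,80 pair. It assumes the first number raises the alarm when exceeded and the second clears it once usage falls below it; the exact comparison operators and all names are assumptions for illustration, not the documented implementation.

```python
# resource ID: (description, rising %, falling %), from the single-pair rows.
CAPACITY_THRESHOLDS = {
    2: ("Real memory", 90, 80),
    4: ("Mbuf clusters", 90, 80),
    8: ("File descriptors", 90, 80),
    18: ("Kernel address space", 90, 80),
    19: ("Kernel memory", 90, 80),
}

def trigger_name(resource_id):
    """Build the trigger name for a resource ID."""
    return f"hrStorageResourceTrigger{resource_id}"

def next_state(resource_id, raised, percent_used):
    """Apply the rising/falling pair to decide whether the alarm is raised."""
    _description, rising, falling = CAPACITY_THRESHOLDS[resource_id]
    if not raised and percent_used > rising:
        return True
    if raised and percent_used < falling:
        return False
    return raised
```

Using two thresholds avoids repeated raise and clear notifications when usage hovers around a single limit.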

3.8.1 Major Notificatio Notification: n: High Resource Usage  A major notification notif ication is sent when a system resource (64 bit counter in the t he  hrStorageTable) is at capacity or exceeds the capacity mentioned in  Alarm Model 7 : High Resource Usage  on page 49 . MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSystemResourceLowNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.4.0.1


Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

HOST-RESOURCES-MIB:hrStorageIndex

1.3.6.1.2.1.25.2.3.1.1

HOST-RESOURCES-MIB:hrStorageDescr 

1.3.6.1.2.1.25.2.3.1.3

HOST-RESOURCES-MIB:hrStorageSize

1.3.6.1.2.1.25.2.3.1.5

HOST-RESOURCES-MIB:hrStorageUsed

1.3.6.1.2.1.25.2.3.1.6

HOST-RESOURCES-MIB:hrStorageAllocationFailures

1.3.6.1.2.1.25.2.3.1.7
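A trap receiver can use the varbind table above to label incoming values. The following is a minimal Python sketch under stated assumptions: the function name `label_varbinds` and the sample values are hypothetical, and received OIDs are assumed to carry an instance suffix after the base OID listed in the table (for example, a trailing `.2` for storage index 2).

```python
from typing import Any, Dict, Iterable, Tuple

# Base OIDs copied from the varbind table above.
VARBIND_NAMES: Dict[str, str] = {
    "1.3.6.1.4.1.11610.435.5213.1.2.1": "svClusterConfigName",
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.4.1.11610.6799.1.10": "svSeverity",
    "1.3.6.1.2.1.25.2.3.1.1": "hrStorageIndex",
    "1.3.6.1.2.1.25.2.3.1.3": "hrStorageDescr",
    "1.3.6.1.2.1.25.2.3.1.5": "hrStorageSize",
    "1.3.6.1.2.1.25.2.3.1.6": "hrStorageUsed",
    "1.3.6.1.2.1.25.2.3.1.7": "hrStorageAllocationFailures",
}


def label_varbinds(varbinds: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Map (oid, value) pairs to names from the table; unknown OIDs are skipped."""
    labelled: Dict[str, Any] = {}
    for oid, value in varbinds:
        for base, name in VARBIND_NAMES.items():
            # Match the base OID itself or the base followed by an instance suffix.
            if oid == base or oid.startswith(base + "."):
                labelled[name] = value
                break
    return labelled


# Hypothetical trap payload for storage index 2.
sample = [
    ("1.3.6.1.2.1.1.5.0", "pts-1"),
    ("1.3.6.1.2.1.25.2.3.1.5.2", 1000),
    ("1.3.6.1.2.1.25.2.3.1.6.2", 950),
]
print(label_varbinds(sample))
```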

3.8.2 Minor Notification: High Resource Usage

A minor notification is sent when a system resource (64-bit counter in the hrStorageTable) is at capacity or exceeds the capacity mentioned in Alarm Model 7: High Resource Usage on page 49.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSystemResourceLowNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.4.0.1

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

HOST-RESOURCES-MIB:hrStorageIndex

1.3.6.1.2.1.25.2.3.1.1

HOST-RESOURCES-MIB:hrStorageDescr 

1.3.6.1.2.1.25.2.3.1.3

HOST-RESOURCES-MIB:hrStorageSize

1.3.6.1.2.1.25.2.3.1.5

HOST-RESOURCES-MIB:hrStorageUsed

1.3.6.1.2.1.25.2.3.1.6

HOST-RESOURCES-MIB:hrStorageAllocationFailures

1.3.6.1.2.1.25.2.3.1.7

3.8.3 Warning Notification: High Resource Usage

This notification is sent when a resource (64-bit counter) is equal to 90% capacity (value=90) in an interval of 8 seconds.

Profile

Description

Frequency

8 seconds

Severity

Warning

Condition

((HOST-RESOURCES-MIB::hrStorageUsed * 100) / HOST-RESOURCES-MIB::hrStorageSize) > 90


3.8.4 Clear Notification: High Resource Usage

This alarm is cleared if a resource (64-bit counter) falls below 80% capacity in an interval of 8 seconds. The clear notification is sent if a system resource that was over 90% capacity returns to below 80% capacity.

Profile

Description

Frequency

8 seconds

Severity

Cleared

Condition

((HOST-RESOURCES-MIB::hrStorageUsed * 100) / HOST-RESOURCES-MIB::hrStorageSize) < 80
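Taken together, the Warning condition (> 90) and the Clear condition (< 80) form a hysteresis band: usage between 80% and 90% changes nothing. The following is a minimal Python sketch of that behavior, assuming one evaluation per 8-second interval; the class `ResourceAlarm` and its method names are hypothetical and do not describe the PTS implementation.

```python
from typing import Optional

RAISE_ABOVE = 90  # percent, from the Warning condition
CLEAR_BELOW = 80  # percent, from the Clear condition


def percent_used(hr_storage_used: int, hr_storage_size: int) -> float:
    """(hrStorageUsed * 100) / hrStorageSize; size is assumed to be non-zero."""
    return (hr_storage_used * 100) / hr_storage_size


class ResourceAlarm:
    """Tracks one resource and reports state changes only."""

    def __init__(self) -> None:
        self.raised = False

    def poll(self, used: int, size: int) -> Optional[str]:
        """Evaluate one sample; return 'Warning', 'Cleared', or None."""
        pct = percent_used(used, size)
        if not self.raised and pct > RAISE_ABOVE:
            self.raised = True
            return "Warning"
        if self.raised and pct < CLEAR_BELOW:
            self.raised = False
            return "Cleared"
        return None


alarm = ResourceAlarm()
for used in (85, 91, 85, 79):
    print(used, alarm.poll(used, 100))
```

In this sketch the sample at 85% after the alarm is raised produces no notification, which is the point of using separate rising and falling thresholds.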

MIB Reference

Description

MIB

• SANDVINE-MIB
• SANDVINE-RESOURCES-MIB

Trap Name

svSystemResourceOkNotification

Trap OID

• 1.3.6.1.4.1.11610.6799.3.4.0.2
• 1.3.6.1.4.1.11610.6799.3.25.1.2

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

HOST-RESOURCES-MIB:hrStorageIndex

1.3.6.1.2.1.25.2.3.1.1

HOST-RESOURCES-MIB:hrStorageDescr 

1.3.6.1.2.1.25.2.3.1.3

HOST-RESOURCES-MIB:hrStorageSize

1.3.6.1.2.1.25.2.3.1.5

HOST-RESOURCES-MIB:hrStorageUsed

1.3.6.1.2.1.25.2.3.1.6

HOST-RESOURCES-MIB:hrStorageAllocationFailures

1.3.6.1.2.1.25.2.3.1.7

3.8.5 Impact and Suggested Resolution, Alarm Model 7

This alarm is tied to the aggregate sum of all modules in the system. As a result, the alarm might not occur even though a single module's resource has exceeded 90% of its capacity. The alarm is cleared when resource usage settles below 80% of its capacity.
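A small worked example shows why an aggregate can hide a single busy module. This Python sketch assumes, as an interpretation only, that the aggregate is total used over total size across modules; the function name `aggregate_percent`, the module names, and the figures are all hypothetical.

```python
from typing import Dict, Tuple


def aggregate_percent(modules: Dict[str, Tuple[int, int]]) -> float:
    """Percent used across all modules, given (used, size) per module."""
    total_used = sum(used for used, _ in modules.values())
    total_size = sum(size for _, size in modules.values())
    return total_used * 100 / total_size


# Hypothetical three-module system: module-1 alone is at 95%.
modules = {
    "module-1": (95, 100),
    "module-2": (40, 100),
    "module-3": (45, 100),
}

for name, (used, size) in modules.items():
    print(name, used * 100 / size)
print("aggregate", aggregate_percent(modules))
```

Here module-1 is above 90%, but the aggregate is 60%, so under this interpretation no alarm would be raised.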

3.8.5.1 Impact of High Resource Usage Alarm

The impact of this alarm varies depending on the resource affected. This table provides the resource ID, description, and impact details.

Note: Resource IDs identified with an asterisk (*) apply to both the PTS and Virtual PTS platforms. Those without an asterisk apply to the PTS platform only. In the PTS Linux platform, each resource in this table has the ID incremented by 1000.

For example: ID has the value 1002 for Real memory; ID has the value 1004 for Mbuf clusters, and so on.
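The offset described in the note can be expressed as a pair of helper functions. This is a minimal Python sketch; the function names are hypothetical, and the reverse mapping assumes that every base ID in the table is below 1000, which holds for the IDs listed in this section.

```python
LINUX_ID_OFFSET = 1000  # from the note: IDs are incremented by 1000 on PTS Linux


def linux_resource_id(base_id: int) -> int:
    """Return the resource ID as reported on the PTS Linux platform."""
    return base_id + LINUX_ID_OFFSET


def base_resource_id(reported_id: int) -> int:
    """Map a reported ID back to the table ID (assumes base IDs are below 1000)."""
    if reported_id >= LINUX_ID_OFFSET:
        return reported_id - LINUX_ID_OFFSET
    return reported_id


print(linux_resource_id(2), linux_resource_id(4))
```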


ID

Description

Impact

2

Real memory

Possible application failure/exit if usage continues to increase.

4

Mbuf clusters

Packets may begin to drop if usage continues to increase.

5*

Packet memory

Packets may start to drop if usage continues to increase.

6

Filesystem /

Loss of log files and core files if usage continues to increase.

7

Filesystem /d2

Loss of log files and core files if usage continues to increase.

8

File descriptors

 Applications may fail if usage continues to increase.

9

Filesystem /d3

Report data is lost if usage continues to increase.

18

Kernel address space

Possible system failure if usage continues to increase.

19

Kernel memory

Possible system failure if usage continues to increase.

20 *

PTS Flows

Packet inspection may cease if usage continues to increase.

21 *

PTS Subscribers

Packet inspection may cease if usage continues to increase.

22

DNS Users

Packet inspection may cease if usage continues to increase.

23

PTS Subscriber stats counters

User bandwidth detection may miss some abusers.

24

NPU MacVlan table space

Inspection of packets with new MAC or VLAN ID may cease if usage continues to increase.

31

WDTM Detection Session

Packet inspection may cease if usage continues to increase.

32

WDTM Detection Session (on CND)

Packet inspection may cease if usage continues to increase.

33

WDTM Attack Object

Packet inspection may cease if usage continues to increase.

34 *

Attribute strings

String attributes are not set (have NULL values) and SandScript may not function as expected.

35 *

PTS shaping memory

Packets may drop out if usage continues to increase.

36 *

PTS level distribution instances

Shaping and Session Management may not function correctly; the PTS could possibly shape too much and/or limit too many flows (resulting in dropped packets).

37 *

PTS tee header entries

Teeing may not function correctly (destinations may not see all expected packets).

38 *

PTS or SDE measurement instances

Measured statistics may be inaccurate or not collected altogether for certain instances.

40 *

Ipusermap webservices queue

If usage continues to increase, RADIUS and/or DHCP packets (sub-ip mapping) may drop and cannot reach the SPB.

41 *

Ipusermap RADIUS queue

If usage continues to increase, RADIUS and/or DHCP packets (sub-ip mapping) may drop and cannot reach the SPB.

42 *

PTS demographic stats hosts

Host reports in NDS are inaccurate.

43 *

PTS or SDE classifier instances

Classifiers are not assigned to the correct value, and measurements/limiters/shapers unique by classifiers will not have all the correct instances.

44 *

PTS or SDE policy table row memory

SandScript logic that uses SandScript tables will not function as expected.

45 *

TCP Reassembly Buffering

Analysis of TCP flows for the purposes of populating Flow.Stream fields in SandScript will cease, leaving those fields unpopulated.

46 *

Stream Analysis Buffering

Analysis of any analyzed flows for the purposes of populating Flow.Stream fields in SandScript will cease, leaving those fields unpopulated.

47

NPU Diverted Source table space

Packets between a new pair of MACs are not inspected.


48

NPU Layer 2 Rewrite table space

Packets between a new pair of MACs are not inspected.

49

NPU Layer 3 Hairpin table space

New layer 3 hairpin destinations are not configured.

50 *

Total Memory

Possible application failure/exit if usage continues to increase.

51 *

PTS or SDE Map instances

All maps defined in the SandScript are limited to 400MB of memory. This is the default configuration and it is recommended not to override this limit. There is a possibility of application failure/exit if overridden.

52 *

PTS Streaming Flows

The TCP stream analyzer will not analyze some flows. Stream SandScript fields for those flows will be null.

53 *

PTS Primed Flow Classification instances

New primes are rejected. Misclassification of new flows may result in SandScript not applied.

54 *

PTS Largest Flow Classification Container

New primes for certain source or destination IPs are rejected. Misclassification of new flows may result in SandScript not applied.

55 *

PTS IP Fragmentation Records

Exhausting fragmentation records could impact recognition of large fragmented UDP packets.

56 *

External MAC address table

There is no impact from this resource.

58

WDTM detection user

The WDTM policy does not affect additional users.

59 *

Statistic Records

Statistics records sent to the SPB are discarded and the statistics are lost.

61

Reassembly Buffers (Small)

All PTS releases starting with PTS 6.00.

62

Reassembly Buffers (Medium)

All PTS releases starting with PTS 6.00.

63

Reassembly Buffers (Large)

All PTS releases starting with PTS 6.00.

64 *

RTMP Streaming Flows

RTMP stream analyzer will not analyze some flows. Stream PAL fields for those flows are null.

68 *

HTTP Streaming Flows

HTTP stream analyzer will not analyze some flows. Stream PAL fields for those flows are null.

70 *

PTS Policy Controller Server memory

New unique by values passed to the policy controller will no longer create new control systems. This will result in a shaper rate of the configured maximum for this unique by instance.

71 *

PTS Policy Controller Client memory

New unique by values passed to the policy controller on the current PTSD no longer create new control systems. If there is traffic for these unique by instances on other PTSDs, then there is little or no impact. If, on the other hand, all traffic for the given unique by instance is seen at this module, then this results in a shaper rate of the configured maximum for this unique by instance.

72 *

BGPD subnets

Some of the BGP subnet information may be dropped. Data used for traffic reports may not be completely accurate.

73 *

BGPD RIB memory

Some of the BGP subnet information may be dropped. Data used for traffic reports may not be completely accurate.

75

PTS policy table row memory

Indicates that the SandScript defined policy tables require more memory than the system currently lets them use. As a result, a row was not created in the table. When a row is not created in a table, any values that should be stored in it, and accessed in subsequent policy runs, are not available. Any business logic deployed in SandScript that depends on the presence of a table row will fail, and the SandScript is not applied as expected.

78

Dynamic shunted subnets

Indicates that the database used to store subnets shunted for the IP Overload Management features is nearly full. If the PTS cannot insert entries into this database, additional abusive¹ IP addresses that the PTS detects are not shunted. This has the potential to impact inspection and the subscriber experience.

79

Subscriber NAT mappings

Indicates that the system has reached the maximum supported number of subscriber network-address-translation (NAT) mappings. As a result of this condition, subscriber-aware policies cannot associate traffic to the transmitting subscriber and, consequently, will under-count usage and otherwise treat traffic as though its subscriber is unknown.

80

PTS map entries

Reports the number of policy map entries that SandScript has loaded. When the alarm is raised, it indicates that an attempt to reload a policy map has failed because the number of map entries that the policy engine needs to load the map is larger than the number of entries that the policy engine can use without affecting other portions of the system. As a result, SandScript continues to run with the contents of the map that were loaded prior to the attempt to reload the map.

81

Central LB IPv4 table rows

Indicates that the table used to load-balance IPv4 addresses is nearly full. If the PTS cannot find an entry in this table for a subscriber IP, it shunts packets with that subscriber IP. Shunted packets are not inspected, meaning that the PTS will not apply policy to them, nor will it count them beyond shunt statistics.

82

Subscriber Cache

Tracks the number of subscribers allocated in the Subscriber Management subsystem on the SDE.

84*

PTS packet usage by all policy actions

Tracks utilization of memory buffers available for queuing and delaying packets. This is a subset of the total packet buffers in the system, but only a subset of these are queued or delayed.
Note:
• Resource 39, 84 and 85 refer to the same packet pool.
• Resource 39 and 85 are displayed by the show system resource CLI command but do not raise any alarm.
• Only Resource 84 raises an alarm for events.hrStorage.conf.

89

PTS detector memory

Tracks the amount of memory that the Network Protection Detector Subsystem uses on the PTS.

90

Mitigation Rules

Tracks the number of mitigation rules issued to the Mitigation Rule Engine.

91

Central LB IPv6 table rows

Indicates that the table used to load-balance IPv6 addresses with default prefix is nearly full. If the PTS cannot find an entry in this table for a subscriber IP, it shunts packets with that subscriber IP. Shunted packets are not inspected, meaning that the PTS will not apply policy to them, nor will it count them beyond shunt statistics. This alarm appears with these severities:
• Warning: The PTS raises a warning alarm when the number of entries in the Central LB table exceeds 90% of the table's capacity.
• Minor: The PTS raises a minor alarm when the number of entries in the Central LB IPv6 table exceeds 95% of the table's capacity.
The PTS clears the warning and minor alarms once the number of Central LB IPv6 table rows drops below 80% of the table's capacity.

92

Central LB IPv6 LPM table rows

Indicates that the tree used to load-balance IPv6 addresses with non-default prefix is nearly full. If the PTS cannot find an entry in this table for a subscriber IP, it shunts packets with that subscriber IP. Shunted packets are not inspected, meaning that the PTS will not apply policy to them, nor will it count them beyond shunt statistics. This alarm appears with these severities:
• Warning: The PTS raises a warning alarm when the number of entries in the Central LB table exceeds 90% of the table's capacity.
• Minor: The PTS raises a minor alarm when the number of entries in the Central LB IPv6 table exceeds 95% of the table's capacity.
The PTS clears the warning and minor alarms once the number of Central LB IPv6 table rows drops below 80% of the table's capacity.

93

PTS flow extension records

TCP divert ceases to work for new flows. A few SandScript fields that expose flow statistics are not set and have NULL values.

¹ The PTS module (PTSM) samples packets and generates per-user stats. Users that cross the threshold for new flow rate, packet rate, or bit rate are reported as 'abusive'. When a drop occurs, the load balancer shunts a number of abusive IPs on the associated module.

3.8.5.2 Suggested Resolutions for High Resource Usage Alarm

Suggested resolutions depend on which system resource has encountered the problem. On multi-module hardware, determine whether any one module is more at fault than the others when the PTS is deployed inline.

Between...

Then...

Layer 2 Switches

See the  PTS Administration Guide  to ensure that the configuration is correct for Layer 2.

Layer 3 Routers

Review the external device configuration to reduce the number of Ethernet MAC addresses passing through the PTS cluster.

Run the show system resources CLI command to review the breakdown of the resource. If the source of the problem cannot be determined, run the techsupport CLI command to capture diagnostic information for the system. Contact Sandvine Customer Support or its authorized partner for further assistance. This table provides the suggested resolution for the different resources:

ID

Description

Suggested Resolutions

2

Real memory

Check the configuration for an over-provisioning condition.

4

Mbuf clusters

Check the configuration for an over-provisioning condition.

5

Packet memory

Check the configuration for an over-provisioning condition.

6

Filesystem /

Run the  du  command to search the file system for large files and remove the unnecessary files.

7

Filesystem /d2

Run the  du  command to search the file system for large files and remove tee to file captures, log files, or other unnecessary files.

8

File descriptors

Run the fstat command to determine if any of the processes have an unusually large number of files open.

18

Kernel address space

Check the configuration for an over-provisioning condition.

19

Kernel memory

Check the configuration for an over-provisioning condition.

20

PTS Flows

Check the configuration for an over-provisioning condition.

21

PTS Subscribers

Check the configuration for an over-provisioning condition.

22

DNS Users

Check the configuration for an over-provisioning condition.

23

PTS Subscriber stats counters

Check the configuration for an over-provisioning condition.


24

NPU MacVlan table space

Check the deployment model and then:
• Reduce the number of VLAN IDs used in the network, or
• Only use the 2 port bridge-group deployment mode

31

WDTM Detection Session

Check the configuration for an over-provisioning condition.

32

WDTM Detection Session (on CND)

Check the configuration for an over-provisioning condition.

33

WDTM Attack Object

Check the configuration for an over-provisioning condition.

34

 Attribute strings

Consider simplifying your SandScript to use fewer string attributes.

35

PTS shaping memory

Check the configuration for an over-provisioning condition.

36

PTS level distribution instances

Do not use level distribution when shaping, or limiting unique by something that has a large number of instances cluster wide, such as subscribers. Consider  load balancing the shaping or limiting unique by activity. You can also disable level distribution.

37

PTS tee header entries

Verify if there is an overlap of three or more tee actions and resolve in SandScript.

38

PTS or SDE measurement instances

Consider the size of the sets used in SandScript's measurement unique-by specifications. If the sets are large (for example, measurement unique by subscriber), consider removing some of these measurements from SandScript and/or using measurements with fewer unique instances.

40

Ipusermap webservices queue

Verify the tee/ipusermap policy.

41

Ipusermap RADIUS queue

Verify the tee/ipusermap policy.

42

PTS demographic stats hosts

Verify that network classes are properly configured. The problem is the result of including too many IP addresses in an 'internal' network class. Ensure that counting of external hosts is disabled.

43

PTS or SDE classifier instances

Consider the number of different values that you can assign to the classifiers in SandScript. For classifiers that can take on a broad range of values, such as the client IP address, find a way to reduce the number of distinct values the classifier can take on.

44

PTS or SDE policy table row memory

Consider the size of the sets used in SandScript's table unique-by specs, and how often you are writing to a table row. Reduce the number of rows you need to persist to a SandScript table.

45

TCP Reassembly Buffering

Check the configuration for an over-provisioning condition. Check configuration for a scenario that can cause a large number of packets of every flow to route somewhere other than through the PTS.

46

Stream Analysis Buffering

Check the configuration for an over-provisioning condition. Check configuration for a scenario that could cause a large number of packets of every flow to route somewhere other than through the PTS.

47

NPU Diverted Source table space

Check the deployment model. Try to have traffic flowing between fewer distinct layer 2 destinations.

48

NPU Layer 2 Rewrite table space

Check the deployment model. Try to have traffic flowing between fewer distinct layer 2 destinations.

49

NPU Layer 3 Hairpin table space

Configure fewer layer 3 hairpin destinations.


50

Total Memory

Check that the element is not over-provisioned. Also check to see if any application on the modules in question has used an exceedingly large amount of memory. This indicates that a misconfiguration, over-provisioning, or software error exists. Run the show system processes * CLI command to identify how much memory each process is using, on each module.

51

PTS or SDE map instances

Check the SandScript map entries configured and remove entries from them to reduce the memory usage to below 40Mb.

52

PTS Streaming Flows

Verify that the PTS is not overloaded.

53

PTS Primed Flow Classification instances

Check the configuration for an over-provisioning condition.

54

PTS Largest Flow Classification Container 

Check the configuration for an over-provisioning condition.

55

PTS IP Fragmentation Records

Check the configuration for an over-provisioning condition. You need to either  disable fragmentation recognition or expand the deployment.

56

External MAC address table

Switch to the Layer 2 (L2) mode. Review external switch configuration to reduce the number of Ethernet MAC addresses passing through the PTS cluster.

58

WDTM detection user

Check the configuration for an over-provisioning condition and adjust the WDTM policy as necessary.

59

Statistics Records

Verify that the PTS is connected to the SPB and that the SPB is properly configured to accept messages from the PTS.

64

RTMP Streaming Flows

Check the configuration for an over-provisioning condition.

68

HTTP Streaming Flows

Check the configuration for an over-provisioning condition.

70

PTS Policy Controller Server memory

Decrease the total memory footprint of the policy controller subsystem. To do this, decrease the number of:
• Controllers defined through SandScript.
• SandScript metrics.
• Histogram bins for the output_histogram parameter.
• Histogram bins for a metric data parameter.
Run the show policy controller CLI command to check that the total number of unique by instances is not high. This displays the total number of instances on the current PTS, and compares that against the stated platform limits. If this is the case, ensure all of the unique by instances are actually active. Perform a CND restart, to reset the total memory, if there are several unused unique by instances.

71

PTS Policy Controller Client memory

Decrease the total memory footprint of the policy controller subsystem. To do this, decrease the number of:
• Controllers defined through SandScript.
• SandScript metrics.
• Histogram bins for the output_histogram parameter.
• Histogram bins for a metric data parameter.


72

BGPD subnets

Increase the BGP subnet limit using the set config service bgp subnet-limit CLI command. If the maximum limit is reached, then apply filters on the peer routers to filter the BGP subnet information that is sent to the element.

73

BGPD RIB memory

Increase the BGP subnet limit using the set config service bgp rib-memory limit CLI command. If the maximum limit is reached, then apply filters on the peer routers to filter the BGP subnet information that is sent to the element.

74

BGPD client RIB memory

Increase the BGP subnet limit using the set config service bgp rib-memory limit CLI command. If the maximum limit is reached, then consider applying filters on the peer routers to filter the BGP subnet information that is sent to the element.

81

Central LB IPv4 table rows

To identify the cause and resolve this alarm:
1. Check if the PTS has raised Alarm Model 22: Misconfigured Network Awareness, for external addresses on subscriber ports exceeds threshold, and, if so, resolve it.
2. Ensure that the subscriber database is not sending spurious login notifications to the cluster. Such logins cause the PTS to load-balance the associated IP addresses, and if the cluster never sees those addresses they end up wasting resources. Use jms Capture of the messages from the SPB to view logins. The PTS processes login notifications for IP addresses in an internal subnet. Remove those subnets with IP addresses that the cluster does not see (such as when multiple clusters share the same SPB) from internal subnets to reduce the number of entries created in the table.
3. Split subscriber traffic over multiple PTS clusters. A PTS cluster can handle a maximum of 16 million IPs.
4. Run the set config service load-balancer mode CLI command and, provided policy-based load-balancing is not required, set the mode to either static or ip-hash.
   Both static and ip-hash modes do not track individual IPs. They can hash the entire space of IPv4 addresses, and therefore do not have a limit on the number of IPs they can load-balance.
   Note: Run the restart service scfd CLI command to change the load-balancing mode.
5. Run the clear service load-balancer CLI command to clear the Central LB IPv4 table of stale entries if:
   • The set of subscriber IP addresses that the PTS inspects has changed significantly, and
   • The PTS does not receive subscriber logouts from the subscriber database.
   Note: Running this command can cause the traffic to shunt until all IP addresses are re-learned.
6. Contact Sandvine Customer Support, or its authorized partner, if this does not resolve this issue.


84

PTS packet usage by all policy actions

Check the configuration for an over-provisioning condition.
Note:
• Resource 39, 84 and 85 refer to the same packet pool.
• Resource 39 and 85 are displayed by the show system resource CLI command but do not raise any alarm.
• Only Resource 84 raises an alarm for events.hrStorage.conf.

91

Central LB IPv6 table rows

To identify the cause and resolve this alarm:
1. Check if the PTS has raised Alarm Model 22: Misconfigured Network Awareness, for external addresses on subscriber ports exceeds threshold, and, if so, resolve it.
2. Ensure that the subscriber database is not sending spurious login notifications to the cluster. Such logins cause the PTS to load-balance the associated IP addresses, and if the cluster never sees those addresses they end up wasting resources. Use jms Capture of the messages from the SPB to view logins. The PTS processes login notifications for IP addresses in an internal subnet. Remove those subnets with IP addresses that the cluster does not see (such as when multiple clusters share the same SPB) from internal subnets to reduce the number of entries created in the table.
3. Split subscriber traffic over multiple PTS clusters. A PTS cluster can handle a maximum of 16 million IPs.
4. Run the set config service load-balancer mode CLI command and, provided policy-based load-balancing is not required, set the mode to either static or ip-hash.
   Both static and ip-hash modes do not track individual IPs. They can hash the entire space of IPv6 addresses, and therefore do not have a limit on the number of IPs they can load-balance.
   Note: Run the restart service scfd CLI command to change the load-balancing mode.
5. Run the clear service load-balancer CLI command to clear the Central LB IPv4 table of stale entries if:
   • The set of subscriber IP addresses that the PTS inspects has changed significantly, and
   • The PTS does not receive subscriber logouts from the subscriber database.
   Note: Running this command can cause the traffic to shunt until all IP addresses are re-learned.
6. Contact Sandvine Customer Support, or its authorized partner, if this does not resolve this issue.

92

Central LB IPv6 LPM table rows

To identify the cause and resolve this alarm:
1.   Ensure that the subscriber database is not sending spurious login notifications to the cluster. Such logins cause the PTS to load-balance the associated IP addresses, and if the cluster never sees those addresses they end up wasting resources.


Ensure that the subscriber database is not sending spurious login notifications with non-default IPv6 prefix length to the cluster. Logins cause the PTS to load-balance the associated IP addresses and, if the cluster never sees those addresses, they waste resources.
2.   Split subscriber traffic over multiple PTS clusters. A PTS cluster can handle a maximum of 16 million IPs.
3.   If policy-based load-balancing is not required, run the set config service load-balancer mode CLI command to set the mode to either static or ip-hash. Neither static nor ip-hash mode tracks individual IPs. They can hash the entire space of IPv6 addresses, and therefore have no limit on the number of IPs they can load-balance.
     Note: Run the restart service scfd CLI command to change the load-balancing mode.
4.   Run the clear service load-balancer CLI command to clear the Central LB IPv4 table of stale entries if:
     •   The set of subscriber IP addresses that the PTS inspects has changed significantly, and
     •   The PTS does not receive subscriber logouts from the subscriber database.
     Note: Running this command can cause the traffic to shunt until all IP addresses are re-learned.
5.   Contact Sandvine Customer Support, or its authorized partner, if this does not resolve this issue.

3.9 Alarm Model 8: Overloaded Processor

This alarm is raised if processors are operating at an overloaded rate over a 2.5 minute (150 second) interval. This is typically due to over-provisioning of the system.

A warning, minor, or major notification is sent for these overload percentages over a 2.5 minute interval:
•   warning: 80% overloaded
•   minor: 90% overloaded and failed inspection of 1000 packets
•   major: 95% overloaded and failed inspection of 2000 packets

It is cleared when the CPU utilization of all overloaded modules drops below 75% over 2.5 minutes.

Profile               Description
Severities            •   Warning
                      •   Minor
                      •   Major
Raise Notification    svSysProcessorOverLoadNotification
Clear Notification    svSysProcessorLoadOkNotification
Triggers              •   ptsdResStatsWarningCpu
                      •   ptsdResStatsMinorCpu
                      •   ptsdResStatsMajorCpu
                      •   ptsdResStatsCpu
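The thresholds above can be sketched as a small classifier, for example when replaying collected CPU figures offline. This is only an illustration: the function and constant names are hypothetical, and whether the PTS treats each threshold as inclusive or exclusive is an assumption here.

```python
from typing import List, Optional

# Thresholds taken from the alarm description. All names are hypothetical
# and are not part of any Sandvine API.
WARNING_PCT = 80.0
MINOR_PCT = 90.0
MAJOR_PCT = 95.0
CLEAR_PCT = 75.0
MINOR_NOT_INSPECTED = 1000
MAJOR_NOT_INSPECTED = 2000


def classify_overload(avg_cpu_pct: float, not_inspected: int) -> Optional[str]:
    """Return the severity for one 150-second window, or None if no alarm applies.

    Assumption: CPU thresholds are exclusive ("exceeds") and the packet
    counts are inclusive.
    """
    if avg_cpu_pct > MAJOR_PCT and not_inspected >= MAJOR_NOT_INSPECTED:
        return "major"
    if avg_cpu_pct > MINOR_PCT and not_inspected >= MINOR_NOT_INSPECTED:
        return "minor"
    if avg_cpu_pct > WARNING_PCT:
        return "warning"
    return None


def is_cleared(overloaded_module_cpu_pcts: List[float]) -> bool:
    """The alarm clears only when every overloaded module is below 75%."""
    return all(pct < CLEAR_PCT for pct in overloaded_module_cpu_pcts)
```

Note the gap between the 80% raise threshold and the 75% clear threshold: a module hovering around 78% keeps an existing alarm raised but would not raise a new one.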

3.9.1 Overloaded Processor – Notification

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svSysProcessorOverLoadNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.4.0.3

Varbind Name                         Varbind OID
svClusterConfigName                  1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                   1.3.6.1.2.1.1.5
svSeverity                           1.3.6.1.4.1.11610.6799.1.10
svPtsResourcesStatsProcessorLoad     1.3.6.1.4.1.11610.435.8374.1.7723.3.13
svPtsResourcesStatsDeltaTooBusy      1.3.6.1.4.1.11610.435.8374.1.7723.3.14
svPtsResourcesStatsLatestCpu         1.3.6.1.4.1.11610.435.8374.1.7723.3.15
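When handling this trap in a custom receiver, the varbind OIDs listed above can be translated back to their names with a simple lookup. The sketch below is a hypothetical helper, not a Sandvine tool. It assumes received OIDs may carry a leading dot and an instance suffix.

```python
# Varbind names and OIDs copied from the table above. The helper itself is
# a hypothetical illustration.
OVERLOAD_VARBINDS = {
    "1.3.6.1.4.1.11610.435.5213.1.2.1": "svClusterConfigName",
    "1.3.6.1.2.1.1.5": "SNMPv2-MIB:sysName",
    "1.3.6.1.4.1.11610.6799.1.10": "svSeverity",
    "1.3.6.1.4.1.11610.435.8374.1.7723.3.13": "svPtsResourcesStatsProcessorLoad",
    "1.3.6.1.4.1.11610.435.8374.1.7723.3.14": "svPtsResourcesStatsDeltaTooBusy",
    "1.3.6.1.4.1.11610.435.8374.1.7723.3.15": "svPtsResourcesStatsLatestCpu",
}


def varbind_name(oid: str) -> str:
    """Map a received OID to its varbind name, or "unknown" if not listed."""
    oid = oid.lstrip(".")
    # Try longer base OIDs first so a short base can never shadow a longer one.
    for base in sorted(OVERLOAD_VARBINDS, key=len, reverse=True):
        # Match the base itself or the base followed by an instance suffix.
        if oid == base or oid.startswith(base + "."):
            return OVERLOAD_VARBINDS[base]
    return "unknown"
```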

3.9.1.1 PTS Daemon CPU busy: Warning

This notification is sent if the CPU utilization of one or more modules exceeds 80% over 2.5 minutes (150 seconds). It is cleared when the CPU utilization of all overloaded modules drops below 75% over 2.5 minutes (150 seconds).

Profile      Description
Frequency    8 seconds
Severity     Warning
Condition    SANDVINE-MIB::svPtsResourcesStatsProcessorLoad = processorLoadHigh(2)

3.9.1.2 PTS Daemon CPU busy: Minor

This notification is sent if the CPU utilization of one or more modules exceeds 90% over 2.5 minutes (150 seconds), and the PTS has failed inspection of 1000 packets in that time.

Profile      Description
Frequency    3600 seconds
Severity     Minor
Condition    SANDVINE-MIB::svPtsResourcesStatsProcessorLoad = processorLoadSevere(3)


3.9.1.3 PTS Daemon CPU busy: Major

This notification is sent if the CPU utilization of one or more modules exceeds 95% over 2.5 minutes (150 seconds), and the PTS has failed to inspect 2000 packets in that time.

Profile      Description
Frequency    8 seconds
Severity     Major
Condition    SANDVINE-MIB::svPtsResourcesStatsProcessorLoad = processorLoadCritical(4)

3.9.2 Overloaded Processor Cleared

This notification is sent when the CPU utilization of all overloaded modules drops below 75% over 2.5 minutes (150 seconds).

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svSysProcessorLoadOkNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.4.0.4

Varbind Name                         Varbind OID
svClusterConfigName                  1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                   1.3.6.1.2.1.1.5
svSeverity                           1.3.6.1.4.1.11610.6799.1.10
svPtsResourcesStatsProcessorLoad     1.3.6.1.4.1.11610.435.8374.1.7723.3.13
svPtsResourcesStatsDeltaTooBusy      1.3.6.1.4.1.11610.435.8374.1.7723.3.14
svPtsResourcesStatsLatestCpu         1.3.6.1.4.1.11610.435.8374.1.7723.3.15

3.9.3 Impact and Suggested Resolution, Alarm Model 8

Modules that are experiencing periods of prolonged CPU utilization in excess of 95% have degraded packet inspection performance. As a result, some new and existing flows may be bypassed for inspection to avoid congestion, latency, and possible packet loss. The integrity of the traffic is thus maintained at the cost of not performing full packet inspection. If packets are reported as “not inspected”, even a small number, the alarm severity is upgraded to minor. If you receive this alarm, it may indicate a variety of underlying problems. For suggested resolutions, see Minor alarms and suggested resolutions for this alarm.

3.9.3.1 Overloaded Processor Alarms

Modules that are experiencing periods of prolonged CPU utilization in excess of 95% have degraded packet inspection performance. As a result, some new and existing flows may be bypassed for inspection to avoid congestion, latency, and possible packet loss. The integrity of the traffic is thus maintained at the cost of not performing full packet inspection.

Since packet inspection is impacted, this can result in the following side effects. The severity of these side effects depends on the SandScript that is deployed on the PTS, how long the overload condition exists, and how many modules are impacted:
•   NDS reports may show lower than expected bandwidth in all bandwidth-based reports other than Bandwidth by Interface, for the periods where the alarm was raised.
•   Flow-specific SandScript is not applied to all new or existing flows, which impacts shaping, protocol tracking, detection (malicious and otherwise), and similar situations.
•   Subscriber mapping may stall when all new flows are not inspected.

3.9.3.2 Identifying Overloaded Modules

1.   Run the show service load-balancer master CLI command to locate the cluster's master load-balancer.
2.   Run the show service load-balancer modules detail CLI command on the master load-balancer to identify the overloaded modules.
3.   Run the show policy inspection CLI command and examine the PacketNotInspected field for the number of packets that are not inspected.

3.9.3.3 Alarm Time-line

For this type of alarm, it is important to identify when it first occurred, and when notifications were sent and/or cleared. To view this information, use the show alarms history [Id] CLI command. Once a time-line is established, augment it with information related to:
•   Recent PTS software upgrades.
•   Recent topology changes in the network.
•   Recent changes to SandScript and subnets on the PTS.
•   Recent features enabled/disabled on the PTS.

If there is a strong correlation between any of these recent changes and the alarm time-line, focus the root cause investigation there. For example, if SandScript was recently modified, investigate the impact and intent of those changes more thoroughly.

3.9.3.4 Alarm Frequency

Along with the alarm time-line, alarm frequency information can also provide insight into root cause:
•   If the alarm occurs, but is cleared at a fairly consistent frequency, this is a strong indicator that the alarm is associated with time-of-day traffic, and possibly that there is SandScript that is utilized more frequently at those times. For example, the alarm occurs (perhaps multiple times) during peak hours only. To verify, correlate the alarm time-line with the NDS reports that span that time-line. Note the bandwidth consumption of protocols from the NDS reports and determine if there is any specific SandScript that is in action on the higher bandwidth consuming protocols.
•   If the alarm occurs, but is cleared sporadically (independent of peak hours), it may indicate the overload is related to malicious traffic in the network. Check NDS reports for unusual spikes. Also, if WDTM is enabled, be sure to check NDS reports for malicious activity.
•   If the alarm recurs for a long period of time with no clearing, it may indicate that the PTS is under-provisioned. You can check bandwidth utilization in real time using the show interface rate command.

3.9.3.4.1 Traffic Captures

It is often useful to take traffic captures on the affected modules before the alarm is active and while it is active. It is even better if you can catch the transition point. This can be difficult if the alarm is sporadic and the overload condition moves between modules. Traffic captures can often provide key insights into whether or not the overload alarm is related to specific traffic.

3.9.3.5 Load Balancing

The PTS load-balancing modes include:
•   Static load-balancing, using the 8-bit mode.
•   Dynamic load-balancing, using centralized load-balancing.

In both modes it is acceptable for some inspection modules to have elevated loads while others do not. However, if using static load-balancing, and some modules are consistently running at elevated levels, the inspection rates could be approaching the limits of the cluster. If using central load-balancing, running the clear service load-balancer CLI command provides an immediate re-distribution of bundle assignments. Bundle assignments are automatically redistributed when a module's load reaches a critical threshold; below that threshold, traffic is inspected as expected.

In the event of a module failure, possibly caused by a software or hardware failure, traffic to that module is immediately shunted to prevent interrupting the subscribers' traffic. If the module does not come back into service after 5 minutes, the IP addresses which it was inspecting are redistributed to the remaining modules.

3.9.3.5.1 Module Overloaded

When centralized load balancing detects that a module is overloaded (see Alarm Model 8: Overloaded Processor), it adaptively rebalances bundles off of that module and redistributes them across the elements until the overload condition is resolved. So it is not abnormal to see the occasional overload alarm when centralized load balancing is being used. However, depending on the duration and frequency of the overload alarm, this can indicate a few possible issues.

If the elements are under-provisioned for the volume of traffic, removing load from one module can lead to overloading another module. It should also be evident from Alarm Time-line and Alarm Frequency that the overload bounces between modules or elements. You may need another element to dilute the per-module load.

If centralized load balancing bundling is not defined by IP (that is, it is defined by subscriber attribute, cost class, and the related definitions), then the bundling may be too coarse. This means that there may be too many IPs per bundle. Perform a quick estimate of the IP to bundle ratio with show service load-balancer stats. In this case, if a module overload occurs and the centralized load balancer begins rebalancing bundles from that module across the elements, it would be moving traffic for 1639 IPs for every bundle it moved. Use Alarm Time-line and Alarm Frequency to validate this scenario: the overload alarm bounces between modules or elements. To resolve this, the IP to bundle ratio needs to be reduced in order to reduce the per-bundle load. For example, if bundles are defined by subscriber attribute, then more attribute values are needed to reduce the ratio.
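The IP to bundle ratio estimate is simple arithmetic, sketched below. The function names are hypothetical, and the input counts are invented for illustration (chosen only so that the ratio matches the 1639 figure quoted above). The real counts would be read manually from the show service load-balancer stats output, whose format is not assumed here.

```python
def ips_per_bundle(total_ips: int, total_bundles: int) -> float:
    """Average number of IPs carried by one bundle."""
    if total_bundles <= 0:
        raise ValueError("total_bundles must be positive")
    return total_ips / total_bundles


def ips_moved(total_ips: int, total_bundles: int, bundles_moved: int) -> int:
    """Approximate number of IPs relocated when bundles are rebalanced."""
    return round(ips_per_bundle(total_ips, total_bundles) * bundles_moved)


# Hypothetical counts: 1,639,000 IPs spread over 1000 bundles.
ratio = ips_per_bundle(1_639_000, 1000)
moved = ips_moved(1_639_000, 1000, 3)
```

With these illustrative inputs, each rebalanced bundle shifts the traffic of roughly 1639 IPs at once, which is why a coarse bundling can push the overload onto the receiving module.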

3.9.3.5.2 Module Offline

When a module's status transitions from “up” to any other status, the central load balancer arms a timer. If the status of the associated module does not return to up before the timer expires, all bundles are removed from that module and redistributed throughout the cluster. The default timeout for the timer is five minutes. If all of the other modules are already at high load, this could cause overload on those modules. You can check for these types of events using the CLI command show service load-balancer modules detail.
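The timer behaviour described above can be modelled in a few lines. This is a hypothetical model for reasoning about the time-line of events, not the load balancer's actual implementation. The class and method names are assumptions, and only the five-minute default comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

REDISTRIBUTE_TIMEOUT_S = 300.0  # default five-minute timer from the text


@dataclass
class ModuleWatch:
    """Hypothetical model of the timer armed when a module leaves "up"."""

    down_since: Optional[float] = None

    def on_status(self, status: str, now: float) -> None:
        if status == "up":
            # Module recovered before expiry: disarm the timer.
            self.down_since = None
        elif self.down_since is None:
            # First non-up status arms the timer; later ones do not re-arm it.
            self.down_since = now

    def should_redistribute(self, now: float) -> bool:
        """True once the module has been out of "up" for the full timeout."""
        return (
            self.down_since is not None
            and now - self.down_since >= REDISTRIBUTE_TIMEOUT_S
        )
```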

3.10 Alarm Model 9: Unavailable Processing Module

This alarm is raised if one of the modules in the system has gone down. If this occurs during system boot-up, the most likely cause is a hardware problem with the module that prevents it from booting. If it occurs after a module has been up and running for a period of time, it is more likely caused by a fatal operating system error that caused the module to restart itself. In this case, the module should come back up in a few minutes and the svEnvModuleUpNotification is sent out, which includes the module reboot cause in the trap information (varbinds).

Note: Alarm Model 9 (Unavailable Processing Module) is not supported on the PTS Linux platform.

Profile                       Description
Severities                    •   Major
                              •   Clear
Raise Notification            svSysModuleDownNotification
Clear Notification            svSysModuleUpNotification
Triggers                      •   ppumgrOperStatusDown
                              •   ppumgrOperStatusUp
Unique Instance Identifier    svModuleControllerModuleDescription

3.10.1 Unavailable Processing Module - Notification

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svSysModuleDownNotification
Trap OID         .1.3.6.1.4.1.11610.6799.3.4.0.7

Varbind Name                            Varbind OID
svClusterConfigName                     1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                      .1.3.6.1.2.1.1.5
svSeverity                              .1.3.6.1.4.1.11610.6799.1.10
svModuleControllerModuleDescription     .1.3.6.1.4.1.11610.435.10084.1.10.1.8
svModuleControllerModuleAdminStatus     .1.3.6.1.4.1.11610.435.10084.1.10.1.3
svModuleControllerModuleOperStatus      .1.3.6.1.4.1.11610.435.10084.1.10.1.4

3.10.1.1 Unavailable Processing Module

This notification is sent if the admin status of a module controller is up (value=1) but the operational status of that module is either initializing (value=1) or faulted (value=3).

Profile      Description
Frequency    0 seconds (Immediate)
Severity     Major
Condition    (SANDVINE-MIB::svModuleControllerModuleAdminStatus == 1 (up)) && ((SANDVINE-MIB::svModuleControllerModuleOperStatus == 2 (Initializing)) || (SANDVINE-MIB::svModuleControllerModuleOperStatus == 3 (faulted)))
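The Condition row can be restated as a small predicate, for example to evaluate polled status values in a monitoring script. The sketch follows the numeric values in the Condition row (initializing = 2, faulted = 3), which differ from the value quoted for initializing in the sentence above. The constant and function names are hypothetical.

```python
# Values as they appear in the Condition row; names are hypothetical.
ADMIN_UP = 1
OPER_INITIALIZING = 2
OPER_FAULTED = 3


def module_down(admin_status: int, oper_status: int) -> bool:
    """True when the raise condition for Alarm Model 9 holds.

    The module is administratively up but is not operational because it
    is either initializing or faulted.
    """
    return admin_status == ADMIN_UP and oper_status in (
        OPER_INITIALIZING,
        OPER_FAULTED,
    )
```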

3.10.2 Unavailable Processing Module Cleared

This notification is sent if a module was showing a “down” status but is now operational. It also includes the module reboot cause in the trap information (varbinds).

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svSysModuleUpNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.4.0.8

Varbind Name                            Varbind OID
svClusterConfigName                     1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                      1.3.6.1.2.1.1.5
svSeverity                              1.3.6.1.4.1.11610.6799.1.10
svModuleControllerModuleDescription     1.3.6.1.4.1.11610.435.10084.1.10.1.8
svModuleControllerModuleAdminStatus     1.3.6.1.4.1.11610.435.10084.1.10.1.3
svModuleControllerModuleOperStatus      1.3.6.1.4.1.11610.435.10084.1.10.1.4
svModuleControllerModuleRebootCause     1.3.6.1.4.1.11610.435.10084.1.10.1.7

3.10.2.1 Unavailable Processing Module Cleared

This notification is sent when the admin status of a module controller is up and the operational status of that module is also up (value=1).

Profile      Description
Frequency    0 seconds (Immediate)
Severity     Cleared
Condition    (SANDVINE-MIB::svModuleControllerModuleAdminStatus == 1 (up)) && (SANDVINE-MIB::svModuleControllerModuleOperStatus == 4 (up))

3.10.3 Impact and Suggested Resolution, Alarm Model 9

Depending on the load balancing method that is being used, the module may not be used for any traffic processing. If the module continues to transition between up and down, it should be administratively disabled until Sandvine Customer Support or its authorized partner can assist with the issue.

3.10.3.1 Unavailable Processing Module Alarm

Depending on the load balancing method used, the module may not be processing any traffic. If a module continues to transition between up and down, disable it administratively until Customer Support or its authorized partner can assist with the issue. In general, this alarm may indicate a kernel crash on the module that went down, or a hardware issue on that module. Contact Customer Support for assistance.

3.10.3.2 Load Balancing

The PTS load-balancing modes include:
•   Static load-balancing, using the 8-bit mode.
•   Dynamic load-balancing, using centralized load-balancing.

In both modes it is acceptable for some inspection modules to have elevated loads while others do not. However, if using static load-balancing, and some modules are consistently running at elevated levels, the inspection rates could be approaching the limits of the cluster. If using central load-balancing, running the clear service load-balancer CLI command provides an immediate re-distribution of bundle assignments. Bundle assignments are automatically redistributed when a module's load reaches a critical threshold; below that threshold, traffic is inspected as expected.

In the event of a module failure, possibly caused by a software or hardware failure, traffic to that module is immediately shunted to prevent interrupting the subscribers' traffic. If the module does not come back into service after 5 minutes, the IP addresses which it was inspecting are redistributed to the remaining modules.

3.10.3.2.1 8-Bit Load Balancing

When a module is unavailable, any traffic destined for that module is rebalanced among the remaining modules in the cluster. This increases the traffic processing load of the other modules. Traffic is rebalanced again when the module comes back online. In the case of static load balancing, all traffic on all modules is rebalanced.

3.10.3.2.2 Centralized Load Balancing

When a module is unavailable, any traffic destined for that module is shunted for a period of up to five minutes and no new traffic is assigned to the module. If the module comes back online before the five minute period expires, traffic is again sent to that module. If the module does not come back online within the five minute period, all traffic assigned to the module is rebalanced among the remaining modules in the cluster.

3.11 Alarm Model 10: Unavailable Service Component

This alarm is raised if any of the service components fails due to administrative reasons or a fatal error. Unless this is resolved administratively, all service components restart automatically.

Note: Alarm Model 10 (Unavailable Service Component) is not supported on the PTS Linux platform.

Profile                       Description
Severities                    •   Major
                              •   Clear
Major Notification            svSysServiceComponentOfflineNotification
Clear Notification            svSysServiceComponentOnlineNotification
Triggers                      •   ServiceComponentDownTrigger
                              •   ServiceComponentUpTrigger
Unique Instance Identifier    svServiceComponentName

These tables explain the MIB Reference names and Varbind Names for this alarm.

MIB Reference    Description
MIB              SANDVINE-MIB
Trap Name        svSysServiceComponentOfflineNotification
Trap OID         1.3.6.1.4.1.11610.6799.3.4.0.5

Varbind Name                      Varbind OID
svClusterConfigName               1.3.6.1.4.1.11610.435.5213.1.2.1
SNMPv2-MIB:sysName                1.3.6.1.2.1.1.5
svSeverity                        1.3.6.1.4.1.11610.6799.1.10
svServiceComponentName            1.3.6.1.4.1.11610.435.11281.1.11.1.3
svServiceComponentAdminStatus     1.3.6.1.4.1.11610.435.11281.1.11.1.5
svServiceComponentOperStatus      1.3.6.1.4.1.11610.435.11281.1.11.1.6
svServiceComponentFaults          1.3.6.1.4.1.11610.435.11281.1.11.1.12

3.11.1 Unavailable Service Component: Major

The major notification is sent if the administrative status of a service component is up (value=1) but its operational status is:
•   stopped (value=1)
•   unlicensed (value=3)
•   faulted (value=4)
•   degraded (value=8)

This table explains the different statuses.

Operational status    Description of service
stopped               The service is not running.
disabled              The configuration has disabled the service.
unlicensed            A service's license is invalid (PTS only).
faulted               Service has failed and is not providing some or all of its functionality to the system. Restart the system.
initializing          The service is initializing itself.
starting              The service is starting itself.
reloading             The service is reloading itself.
degraded              The service is online on some, but not on all, modules.
online                The service is online.

This table describes the trigger information for the alarm's raise (major) notification.

Profile      Description
Frequency    0 seconds (Immediate)
Severity     Major
Condition    (SANDVINE-MIB::svServiceComponentAdminStatus == 1 (up)) && ((SANDVINE-MIB::svServiceComponentOperStatus == 1 (stopped)) || (SANDVINE-MIB::svServiceComponentOperStatus == 3 (unlicensed)) || (SANDVINE-MIB::svServiceComponentOperStatus == 4 (faulted)) || (SANDVINE-MIB::svServiceComponentOperStatus == 8 (degraded)))
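The raise condition can be sketched as a lookup that returns which operational status triggered the alarm. The status values are taken from the list of alarming statuses in this section. The function and constant names are hypothetical, and the sketch deliberately avoids guessing numeric values for statuses the guide does not number.

```python
from typing import Optional

ADMIN_UP = 1

# Operational statuses that raise the major notification, as listed above.
ALARMING_OPER_STATUSES = {
    1: "stopped",
    3: "unlicensed",
    4: "faulted",
    8: "degraded",
}


def service_component_alarm(admin_status: int, oper_status: int) -> Optional[str]:
    """Return the alarming status name, or None if the condition is not met."""
    if admin_status != ADMIN_UP:
        # A component that is not administratively up never raises this alarm.
        return None
    return ALARMING_OPER_STATUSES.get(oper_status)
```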


3.11.2 Unavailable Service Component: Clear

This notification is sent when the administrative status of a service component returns to up (value=1) and its operational status is online (value=9).

Profile      Description
Severity     Cleared
Condition    (SANDVINE-MIB::svServiceComponentAdminStatus == 1 (up)) && (SANDVINE-MIB::svServiceComponentOperStatus == 9 (up))

3.11.3 Background Service Processes

The background service processes monitored on the PTS are:

Name: SFCD
Full Name: Switching Fabric Control Daemon
Description: This daemon handles the configuration and management of internal layer 2 switching for the PTS. The SFCD handles configuration and management of the switch fabric and interface modules (for example, SFP+). In addition, the SFCD also handles the discovery and distribution of traffic to all local and cluster-wide modules.
Applicability: All releases

Name: PTSD
Full Name: Policy Traffic Switch Daemon
Description: This daemon inspects packets, identifies protocols, and applies SandScript policy on new flows for PTSM to enforce. The PTSD application is responsible for many functions including L3-L7 protocol recognition, host and URL filtering, and HTTP redirection.
Applicability: All releases

Name: PTSM
Full Name: Policy Traffic Switch Kernel Module
Description: This module provides bridging of all flows within the kernel, and functions such as traffic shaping.
Applicability: All releases

Name: CND
Full Name: Central Node Daemon
Description: This daemon connects and sends statistics (which include published expressions and other data available in NDS reports) to the reporting platform. This daemon is also involved in mapping of subscribers and dynamically adjusting shaping/session management rates, and can be used in load-balancing.
Applicability: All releases

Name: SCDPD
Full Name: Sandvine Cluster Discovery Protocol Daemon
Description: This daemon coordinates communication among all applications on a single box and within a cluster; also acts as the SNMP daemon.
Applicability: All releases

Name: SVBGPD
Full Name: Sandvine Border Gateway Protocol Daemon
Description: This daemon learns routes from routers through the BGP protocol and communicates those routes to other PTS processes such as PTSD and CND.
Applicability: All releases

Name: MSD
Full Name: Management Server Daemon
Description: This daemon provides services for CLI and Control Center.
Applicability: Release 6.0 and up


3.11.4 Impact and Suggested Resolution: Alarm Model 10 Typically, any service that exits unexpectedly restarts automatically, so this alarm can appear and then clear itself. If that does not happen, restart the affected service manually to resolve the issue. Also if a service is stopped for administrative reasons, restart the service manually. •

Run the   restart restart servi service ce  command to restart an active process.

• The impact of this alarm is dependen dependentt on the specific servic service e process. This table provides the impact of failur failure e for the service processes. Name

Impact of Service Failure

CND

The impact when the CND services stops: •

The sys system tem cann cannot ot uplo upload ad stati statistics stics to the rep reporting orting platform platform (PTS on only). ly).

• •

The CND st stops ops mapp mapping ing subs subscriber cribers, s, but, exis existing ting sub subscribe scriberr mappin mappings gs remai remain n active. The CND stops The stops dyn dynami amical cally ly adj adjust usting ing sha shapin ping g and ses sessio sion n manage managemen mentt rates. rates. Som Some, e, or all all,, dyn dynami amic c shapers shaper s in the cluster are reset to the configure configured d maximum rate. If CND is the loadload-balanc balancer er master, master, the load-ba load-balanci lancing ng state is clea cleared. red. The loadbala loadbalancing ncing element element retains retain s the last rate(s) from before the cnd was stopped.



MSD

Existi Existing ng CLI sessio sessions ns areed. discon disconnected nected fromisthe newrCLI sessions are element unable unable tountil connect connec until the servic service e is restor restored. Contro Control l Center notserver able toand monito monitor or deploy deplo y to the the t service is restored.

PTSD

The PTSD element stops inspecting traffic and shunts traffic until the service restarts. During the restart, there may be short periods where packets are dropped.

PTSM

Stops the PTS from inspecting traffic and can cause it to drop packets.

SCDPD

Prevents clustering from working and in the case of a PTS can result in preventing traffic inspection. The CLI and SNMP interfaces will also fail.

SVBGPD

If enabled, this service stops the subnets.txt file from updating and leads to no SandScript application to some subscribers (PTS only).

ECD

Unable to establish the connection between SandScript and shell (command-line).

SFCD

For information, refer to the Impact of Restarting SFCD Services section in the PTS Alarms Reference Guide.

Run these CLI commands to diagnose the problem further:
• show system services
• show service load-balancer modules
Run the show system log CLI command to diagnose the reason when services exit unexpectedly or fail to start. Alternatively, you can search /var/log/svlog for the reason a service exits unexpectedly or fails to start.
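The log search described above can be sketched in Python. This is a minimal illustration only: it assumes the log is readable as plain text, and the sample lines are hypothetical because this guide does not document the log entry format. Only the /var/log/svlog path and the service names come from this guide.

```python
from typing import Iterable, List

SVLOG_PATH = "/var/log/svlog"  # path named in this guide


def find_service_lines(lines: Iterable[str], service: str) -> List[str]:
    """Return the log lines that mention the service name (case-insensitive)."""
    needle = service.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


if __name__ == "__main__":
    # Hypothetical sample lines; real log entries may look different.
    sample = [
        "ptsd: exited unexpectedly\n",
        "cnd: started\n",
        "PTSD: restart requested\n",
    ]
    for hit in find_service_lines(sample, "ptsd"):
        print(hit)
```

To scan a real log, pass an open file object instead of the sample list, for example `find_service_lines(open(SVLOG_PATH), "ptsd")`, assuming the path is a readable text file on the element.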

3.11.4.1 Impact of Restarting SFCD Services
Considerable system impact can result from restarting SFCD services. Possible impact includes:
• The SFCD process is stopped.
• Traffic in buffers within the switch fabric is lost.
• The switch fabric cannot receive additional traffic.
• The NPUs shunt traffic.
• If there is a bypass card, traffic is now in bypass for this unit and no inspection is done.
• While traffic is in bypass, no mappings take place as RADIUS traffic is passed through the system.
• Mappings during the entire duration of the restart, and for this specific unit, are ignored.
• Potential impact can affect the reporting interval in demographic statistics. This is dependent on how long it takes for the system to complete the restart.
• The SFCD process comes back.
• The NPU immediately starts sending traffic to the switch fabric.
• Traffic is now inspected on the modules.
• RADIUS traffic is teed to the SPB so that mapping takes place.

3.11.4.2 Impact of Restarting CND
Considerable system impact can result from restarting CND services. Possible impact includes:
• The CND process stops.
• Traffic is unaffected.
• Any statistics queued in the CND and on their way to the SPB are lost.
• Any subscriber lookups queued in the CND and on their way to the SPB are lost.
• If the CND is the master central load balancer, a new master is elected. The new master maintains the original balancing.
• The CND is restarted.

3.12 Alarm Model 11: Unavailable Bypass Group
This alarm is raised when a set of interfaces that are in a bypass group have gone into the bypass state. This could happen because the group was administratively put into bypass mode, or because software detected a critical processing error.
Note: This alarm applies to both the internal bypass on the PTS 24000 and the external bypass chassis configuration. As a result, there are different triggers and MIB references for each configuration.

Profile

Description

Severity

• Critical
• Major
• Clear

Raise Notification

svIfBypassGroupInBypassNotification

Clear Notification

svIfBypassGroupActiveNotification

Triggers

Internal bypass:
• portTopologyBypassMode
• portTopologyActiveMode
External bypass:
• extActiveMode
• extBypassMode
• hbFaultMode

Unique instance identifier

svBypassGroupGroupTableOperStatus

3.12.1 Bypassing Traffic – Notification MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svIfBypassGroupInBypassNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.1

Internal bypass

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svBypassGroupGroupTableDescription

1.3.6.1.4.1.11610.435.10470.1.30.10.1.7

svBypassGroupGroupTableOperStatus

1.3.6.1.4.1.11610.435.10470.1.30.10.1.3

svBypassGroupGroupTableAdminStatus

1.3.6.1.4.1.11610.435.10470.1.30.10.1.2

External bypass

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyBypassChassisDescription

1.3.6.1.4.1.11610.435.10470.1.30.6

svPortTopologyBypassChassisOperStatus

1.3.6.1.4.1.11610.435.10470.1.30.2

svPortTopologyBypassChassisAdminStatus

1.3.6.1.4.1.11610.435.10470.1.30.1

3.12.1.1 Bypassing Traffic - Internal Bypass
This notification is sent if the bypass group table operational status is set to bypass mode (value=0).

Profile

Description

Frequency

30 seconds

Severity

Major 

Condition

SANDVINE-MIB::svBypassGroupGroupTableOperStatus == 0 (bypass)

3.12.1.2 Bypassing Traffic - External Bypass
This notification is sent when the bypass group table operational status is set to bypass mode (value=0).

Profile

Description

Frequency

30 seconds

Severity

Major 

Condition

SANDVINE-MIB::svPortTopologyBypassChassisOperStatus == 0 (bypass)

3.12.1.3 Unavailable Bypass Group - External Bypass
This notification is sent when contact with the bypass chassis has been lost and proper operation of the bypass chassis is no longer guaranteed.

Profile

Description

Frequency

30 seconds

Severity

Critical

Condition

SANDVINE-MIB::svBypassGroupGroupTableOperStatus == 4 (hb_fault)

3.12.2 Bypassing Traffic Cleared
This notification is sent when a set of interfaces that were previously in the bypass state are now active. It is an Interfaces Active alarm.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svIfBypassGroupActiveNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.2

Internal bypass

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svBypassGroupGroupTableDescription

1.3.6.1.4.1.11610.435.10470.1.30.10.1.7

svBypassGroupGroupTableOperStatus

1.3.6.1.4.1.11610.435.10470.1.30.10.1.3

svBypassGroupGroupTableAdminStatus

1.3.6.1.4.1.11610.435.10470.1.30.10.1.2

External bypass

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyBypassChassisDescription

1.3.6.1.4.1.11610.435.10470.1.34.6

svPortTopologyBypassChassisOperStatus

1.3.6.1.4.1.11610.435.10470.1.34.2


svPortTopologyBypassChassisAdminStatus

1.3.6.1.4.1.11610.435.10470.1.34.1

3.12.2.1 Bypassing Traffic Cleared - Internal Bypass
This notification is sent when the bypass group table operational status is set to active (value=1).

Profile

Description

Frequency

30 seconds

Severity

Cleared

Condition

SANDVINE-MIB::svBypassGroupGroupTableOperStatus == 1 (active)

3.12.2.2 Bypassing Traffic Cleared - External Bypass
This notification is sent when the bypass group table operational status is set to active (value=1).

Profile

Description

Frequency

30 seconds

Severity

Cleared

Condition

SANDVINE-MIB::svPortTopologyBypassChassisOperStatus == 1 (active)
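The raise and clear conditions in section 3.12 reduce to a lookup from operational status value to alarm severity. The following Python sketch shows that lookup for anyone writing a trap handler or monitoring script. The function and dictionary names are illustrative and not part of any Sandvine API; the status values and severities are taken from the Condition and Severity rows in this section.

```python
from typing import Optional

# Operational status value -> (state name, alarm severity), as listed in the
# Condition and Severity rows of section 3.12.
OPER_STATUS_SEVERITY = {
    0: ("bypass", "Major"),
    1: ("active", "Cleared"),
    4: ("hb_fault", "Critical"),
}


def bypass_severity(oper_status: int) -> Optional[str]:
    """Return the alarm severity for an OperStatus value, or None if this
    section does not define a condition for that value."""
    entry = OPER_STATUS_SEVERITY.get(oper_status)
    return entry[1] if entry else None


if __name__ == "__main__":
    for value in (0, 1, 4):
        print(value, bypass_severity(value))
```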

3.12.3 Unavailable Bypass Group - Critical Alarms
This alarm indicates that the bypass chassis has stopped sending heartbeats. The bypass chassis stops sending heartbeats when it has lost power, the serial cable has become disconnected, or it has detected a hardware fault on the chassis or blade.

3.12.3.1 Impact and Suggested Resolution, Alarm Model 11
Traffic is not inspected. Suggested resolutions:
1. Use the show interface bypass-chassis CLI command to inspect the status of the bypass chassis.
2. If the OperStatus shows as hb_fault, inspect the serial cables and ensure that they are securely fastened to both the PTS 24000 and the bypass chassis.
3. Check the LEDs on the bypass chassis for failure mode.

3.12.3.1.1 Bypassing Traffic
This alarm indicates that the bypass functionality of a bypass group is either enabled administratively or through software after detecting a critical processing error.

Bypass Operation State

Alarm

Comment

active

Clear 

Packets are inspected.

bypass

Major 

Either a software fault was detected warranting a bypass event or the user forced bypass via the admin status. In this state, packets are not inspected.

hb_fault

Critical

Loss of communication with bypass chassis. In this state, packets are not inspected.

Bypass Administration State

Comment

software

Default mode (internal bypass blade) - allows the software to decide when to transition into bypass and active state. Bypass mode is triggered in the event that scdpd, sfcd, or lbcd daemons are stopped. Active mode is triggered when all daemons are operating normally.

bypass

Remains in bypass mode regardless of software state. In this state, packets will not be inspected.

active

Remains in active mode regardless of software state. In this state, packets may be dropped if the PTS is rebooted or power cycled.

down

Default mode (external bypass chassis) - no bypass chassis is connected.

Run the show interface bypass CLI command to view the bypass group state.

3.12.3.2 Impact and Suggested Resolution, Alarm Model 11
In the event of an alarm:
• Packet inspection is bypassed.
• No statistics are gathered while the alarm is raised.

1. Allow the element to complete the boot up sequence.
2. Run the show config interface bypass CLI command to check that BypassAdminMode is not configured (for internal bypass blades).
3. Verify, for an external bypass chassis, that the BypassAdmin mode is configured to software.
   a. Run the show config interface bypass external CLI command.
4. Check that all services are running. The bypass alarm is triggered if any of these services are not running: SFCD, SCDPD, LBCD. Run the show system services CLI command to check services.
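The service check in step 4 can be expressed as a set difference. The Python sketch below assumes you have already collected the names of the running services by some means, for example by reading the show system services output manually. It does not parse CLI output, because that format is not described here. The function name is illustrative.

```python
from typing import Iterable, List

# Services whose absence triggers the bypass alarm, as listed in step 4.
REQUIRED_SERVICES = frozenset({"sfcd", "scdpd", "lbcd"})


def missing_bypass_services(running: Iterable[str]) -> List[str]:
    """Return the required services that are not in the running set."""
    running_lower = {name.lower() for name in running}
    return sorted(REQUIRED_SERVICES - running_lower)


if __name__ == "__main__":
    # Hypothetical list of running services for illustration.
    print(missing_bypass_services(["SFCD", "PTSD", "LBCD"]))
```

An empty result means all three services are running, so the cause of the bypass state lies elsewhere, such as the administrative bypass setting.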

3.13 Alarm Model 12: Network Interface Errors
This alarm is raised when physical errors are detected on an interface (ifInErrors or ifOutErrors). Typical errors are FCS and alignment errors. Cabling issues are the usual causes of these types of errors.

Profile

Description

Severities

Minor and Major 

Raise Notification

svIfErrorNotification

Clear Notification

svIfNoErrorNotification

Triggers

ifNetworkErrorsX
Where X is the interface index (iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifIndex).
Note that this model is only valid for interfaces with type (iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifType) PropMultiplexor or EthernetCsmacd. So for an interface with the index 21106692, the trigger name would be ifNetworkErrors21106692.

Unique Instance Identifier

IF-MIB:ifIndex
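The trigger naming rule above is a simple concatenation of the fixed prefix and the interface index. A minimal Python sketch, with an illustrative function name, for scripts that need to match trigger names against interfaces:

```python
def network_errors_trigger(if_index: int) -> str:
    """Build the Alarm Model 12 trigger name for an interface index."""
    if if_index < 0:
        raise ValueError("ifIndex must be non-negative")
    return f"ifNetworkErrors{if_index}"


if __name__ == "__main__":
    # Index taken from the example in this section.
    print(network_errors_trigger(21106692))
```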

3.13.1 Network Interface Errors - Major and Minor Notifications

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svIfErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.3

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

IF-MIB:ifIndex

1.3.6.1.2.1.2.2.1.1

IF-MIB:ifDescr 

1.3.6.1.2.1.2.2.1.2

IF-MIB:ifInErrors

1.3.6.1.2.1.2.2.1.14

IF-MIB:ifOutErrors

1.3.6.1.2.1.2.2.1.20

svIfDeltaStatsIfDeltaTableIfInErrors

1.3.6.1.4.1.11610.435.10470.1.3.4.10.1.3

svIfDeltaStatsIfDeltaTableIfOutErrors

1.3.6.1.4.1.11610.435.10470.1.3.4.10.1.5

svThresholdDelta

1.3.6.1.4.1.11610.6799.1.15

3.13.1.1 Network Interface Errors - Major
This notification is sent if the sum of ifInErrors and ifOutErrors exceeds 1000 within an interval of 1 hour (3600 seconds).

Profile

Description

Frequency

3600 seconds

Severity

Major 

Condition

PTS: DELTA (IF-MIB::ifInErrors + IF-MIB::ifOutErrors) > 1000
When the major alarm occurs on a PTS 32000 model, the condition is DELTA (IF-MIB::ifInErrors + IF-MIB::ifOutErrors) > 1000
SPB: DELTA (IF-MIB::ifInErrors + IF-MIB::ifOutErrors) >= 1000

3.13.1.2 Network Interface Errors - Minor
This notification is sent if the sum of ifInErrors and ifOutErrors falls below 20, or rises above 10, within an interval of 1 hour (3600 seconds).

Profile

Description

Frequency

3600 seconds

Severity

Minor 

Condition

PTS: If the Major alarm was raised: DELTA (IF-MIB::ifInErrors + IF-MIB::ifOutErrors) 10
When the alarm occurs on a PTS 32000 model, the condition is: DELTA (IF-MIB::ifInErrors + IF-MIB::ifOutErrors) 10
SPB: When the alarm occurs on an SRP, the condition is DELTA (IF-MIB::ifInErrors + IF-MIB::ifOutErrors) >= 10 but < 1000

3.13.2 Network Interface Errors - Clear
This notification is sent when no physical errors have been observed on an interface for at least an hour.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svIfNoErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.4

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

IF-MIB:ifIndex

1.3.6.1.2.1.2.2.1.1

IF-MIB:ifDescr 

1.3.6.1.2.1.2.2.1.2

IF-MIB:ifInErrors

1.3.6.1.2.1.2.2.1.14

IF-MIB:ifOutErrors

1.3.6.1.2.1.2.2.1.20

svIfDeltaStatsIfDeltaTableIfInErrors

1.3.6.1.4.1.11610.435.10470.1.3.4.10.1.3

svIfDeltaStatsIfDeltaTableIfOutErrors

1.3.6.1.4.1.11610.435.10470.1.3.4.10.1.5

svThresholdDelta

1.3.6.1.4.1.11610.6799.1.15

Specifically, the clear notification is sent if the sum of ifInErrors and ifOutErrors equals 0 within an interval of 1 hour (3600 seconds).

Profile

Description

Frequency

3600 seconds

Severity

Cleared

Condition

DELTA (IF-MIB::ifInErrors + IF-MIB::ifOutErrors) == 0
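The SPB conditions for Alarm Model 12 can be read as a threshold classification on the hourly error delta. The Python sketch below encodes only the SPB thresholds that are stated explicitly in this section: >= 1000 for Major, >= 10 but < 1000 for Minor, and == 0 for Cleared. It does not model the PTS conditions or any hysteresis, and the function name is illustrative.

```python
from typing import Optional


def classify_spb_error_delta(delta: int) -> Optional[str]:
    """Classify the hourly DELTA(ifInErrors + ifOutErrors) for an SPB.

    Returns "Major", "Minor", "Cleared", or None when the delta falls in a
    range (1 to 9) for which this section states no condition.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if delta >= 1000:
        return "Major"
    if delta >= 10:
        return "Minor"
    if delta == 0:
        return "Cleared"
    return None


if __name__ == "__main__":
    for sample in (0, 5, 10, 999, 1000):
        print(sample, classify_spb_error_delta(sample))
```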


3.13.3 PTS - Impact and Suggested Resolution, Alarm Model 12
Discarded packets can adversely affect subscriber traffic. Common causes include:
• Faulty fiber/copper patch cable.
• Faulty interface modules.
• Port speed and duplex mismatch.

Perform these resolution steps for Alarm Model 12:
1. Run the show interface counters CLI command to confirm which interfaces are experiencing interface errors.
2. Run the show interface configuration CLI command to confirm that the port speed and duplex match.

   PTS> show interface configuration
   Port AdminStatus OperStatus MTU    Medium     IfAlias Function     LagPort Shunt
   ---- ----------- ---------- ------ ---------- ------- ------------ ------- -------
   2-11 [up]        [down]     15,796 10GBase-LR         [subscriber] none    [false]

3. If the errors are seen on external ports:
   a. Replace the fiber/copper patch cables with known good fiber/copper cables.
   b. Replace the interface modules with known good modules.

Another possible cause is a link failure due to receiving link-faults from the opposing connected port. Link-fault messages occur when a neighbor device connected to the PTS is no longer receiving a signal, which may indicate a failure of the optical interface (SFP+) or a break in the fiber optic cable, and the neighbor device is sending remote faults to the PTS. When the PTS receives a link-fault signal:
• The OperStatus of the port is brought down.
• This alarm is raised.
• Run the show interface modules CLI command for the down port to identify normal Rx and Tx levels.

In this situation, confirm the receive power levels on the neighbor device and correct any link-path issues between the two devices.

   PTS> show interface modules 2-11
   Port               : 2-11
   ModuleType         : [SFP+]
   AdminStatus        : [up]
   ModuleStatus       :
   SerialNumber       :
   VendorName         :
   VendorRevision     :
   DataCode           :
   Medium             :
   Temperature        :
   TxPower            :
   RxPower            :
   Connector          :
   SupportedInterfaces:
   SerialEncoding     :
   NormalBitRate      :
   UpperBitRate       :
   LowerBitRate       :
   Options            :

10 and is < 1000

3.14.2 Discarded Packets - Clear
This notification is sent when an interface that was previously discarding packets has not discarded any packets for at least fifteen (15) minutes.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svIfNoDropNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.6

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

IF-MIB:ifIndex

1.3.6.1.2.1.2.2.1.1

IF-MIB:ifDescr 

1.3.6.1.2.1.2.2.1.2

IF-MIB:ifInDiscards

1.3.6.1.2.1.2.2.1.13

IF-MIB:ifOutDiscards

1.3.6.1.2.1.2.2.1.19

svIfDeltaStatsIfDeltaTableIfInDiscards

1.3.6.1.4.1.11610.435.10470.1.3.4.10.1.2

svIfDeltaStatsIfDeltaTableIfOutDiscards

1.3.6.1.4.1.11610.435.10470.1.3.4.10.1.4

svThresholdDelta

1.3.6.1.4.1.11610.6799.1.15

Profile

Description

Frequency

900 seconds

Severity

Cleared

Condition

DELTA (IF-MIB::ifInDiscards + IF-MIB::ifOutDiscards) == 0


3.14.3 PTS - Impact and Suggested Resolution, Alarm Model 13
The type of traffic that the PTS discards may adversely affect the subscriber traffic. Packet discards can also occur for a variety of reasons on the different port locations/types within the PTS.

Types of Traffic causing Packet Discards
• Data
• Cluster
• Service
• Control
• Management

PTS Port Locations for Packet Discards
• External
• Network Processing Unit (Load Balancing)
• Internal Fabric (Workfarm, Switch, Fabric)
• Packet Processing Modules
• Policy

Alarm 13 Resolution Steps
1. Identify the type of traffic that the PTS discards and the location of packet discards. Note that the possible locations are specific to the PTS platform and the installed blades under consideration.

These commands provide more information about the type of traffic and location of packet discards:
   a. Run the show alarms CLI command to see the traffic type and location of packet discards.
   b. Run the show interface configuration CLI command to correlate the function of an external port with its configuration.
   c. Run the show interface drops CLI command to gather additional information on interface packet discards.

When using the show interface drops CLI command, consider narrowing your search with the non-zero filter: show interface drops|non-zero. The output from this command validates the location of packet discards within the system.

2. Once you have identified the type of traffic and location, investigate the packet discards on egress links.

Packet discards on egress links are most often attributed to system under-provisioning or bursty traffic conditions. These are some examples of packet discards on egress links:

Packet Discards on External Interfaces

Traffic Type: Data
Reason for Packet Discard: System under-provisioning for data intersect.
CLI command for Diagnosis/Resolution:
• show system blades
• show interface rate data

Traffic Type: Data
Reason for Packet Discard: An excess of flooded packets may be due to improper cabling for fan-in/fan-out (multiplex), that is, when there are more than two interfaces in a single bridge group. For example, 10 G sending to 1 G.
CLI command for Diagnosis/Resolution:
• show interface bridge-group
If the interface is in a multi-port bridge-group, verify that there is no excess of flooded packets.

Traffic Type: Cluster
Reason for Packet Discard: System under-provisioning for cluster element interconnectivity or poor cluster topology.
CLI command for Diagnosis/Resolution: Revisit guidelines for cluster deployment recommendations.

Traffic Type: Cluster
Reason for Packet Discard: Improper trunk distribution configuration.
CLI command for Diagnosis/Resolution:
• show config interface trunk-distribution
• show interface rate link-aggregation-group

Traffic Type: Cluster
Reason for Packet Discard: Improper load balancing configuration.
CLI command for Diagnosis/Resolution: Use these CLI commands to see if the PTS is inspecting an unbalanced amount of traffic:
• show config service load-balancer
• show interface rate link-aggregation-group

Traffic Type: Service
Reason for Packet Discard: Under-provisioning for divert/tee functionality and throughput:
• Persistent drops: under-provisioning
• Periodic drops: bursts to destination
CLI command for Diagnosis/Resolution:
• show interface rate cluster
• show policy destination

Packet Discards on Internal Fabric

Interface drops on internal aggregate fabric data may occur on the switching fabric chip connected to the data interfaces of the processing module. Possible causes include:
• Uneven distribution of subscriber IPs across elements in the cluster.
• Over-provisioned cluster.
• Bursty traffic to the modules.

Interface drops on internal aggregate fabric core may occur when the links are saturated. Possible causes include:
• Cluster is not a full mesh.
• Uneven distribution of traffic between elements in the cluster.
• On the PTS 22000, the number of cluster links on the blade exceeds the number of cluster links on the chassis. See section 6.4.2 PTS 22000: Connecting Cluster Interfaces in the PTS Hardware Installation Guide for more information.

Contact Sandvine Technical Support for resolution of issues due to packet discards on internal fabric.

Packet Discards on Network Processing Units (NPU)

The packet discards on NPUs are heavily dependent on the installed blade. There are two classes: link-side and switch-side.

Type of NPU Packet Discard: Link side
Reason for Packet Discard: For PTS 14520, BLD 24040, and BLD 24080, the reason for packet discard is the same as that for the Data Traffic Type in the table Packet Discards on External Interfaces. Improper NPU assignment or NPU link assignment. This can be due to oversubscription of an NPU or an NPU link.

Type of NPU Packet Discard: Switch-Side
Reason for Packet Discard: System under-provisioning can lead to processing module egress bursts.
CLI command for Diagnosis: Validate NPU and NPU link assignments via the show interface npu assignment CLI command.

Packet Discards on Packet Processing Modules

This may be due to:
• Improper load balancing configuration.
• Improper tunneling configuration (that is, IP-in-IP).
• High usage subscriber exceeding the limitations of a single module.

3. Investigate the packet discards on ingress links.

Ingress packet discards on a cluster or service port may be due to an unknown destination flood. Example: A change in the network path to a tee destination host. This may also be indicative of a topology loop for service/cluster ports. Sometimes packet discards may be due to severe errors in the packets, due to physical link layer reasons. Interface drops on internal aggregate module ptsm_data1/ptsm_data2 may occur if the ingress packet rate to the processing module exceeds the packet processing/inspection rate. Normally, these discards occur as a result of high CPU utilization on the processing module. Possible causes include:
• Cables are not seated properly.
• Under-provisioned cluster.
• Sharp increase in new sessions.
• Large number of subscribers.
• Uneven distribution of subscriber IPs across elements in the cluster.
• Attack traffic: syn-flood attacks/address-scan attacks.
• Overly complex SandScript.

4. Check whether the cables are firmly seated.
5. Check the traffic rates to the modules and see if the traffic rates are distributed evenly across the modules.
   Run the show service load-balancer modules CLI command for more information.
6. Determine whether the load-balancing algorithm used is appropriate for the deployment.
   Consider using centralized load-balancing by locality, if the network is experiencing extreme asymmetric traffic.
7. Remove the SandScript policy and those policies that cause the packet discards.

3.14.4 SPB - Impact and Suggested Resolution, Alarm Model 13
Discarded packets will affect performance and might delay subscriber mapping. Identify the port on which the traffic was discarded.

These commands provide more information about the type of traffic and location of packet discards:
   a. Run the show alarms CLI command to see the location of packet discards.
   b. Run the show interface drops CLI command to gather additional information on interface packet discards.

If the SRP needs to discard outgoing packets, it is due to a sustained load exceeding the link speed. It may be necessary to split the cluster and allocate an SRP to each part of the resulting pair of clusters. It is possible that the policy on the attached PTS is calling for too much information from the SRP during traffic inspection. Revise the policy to cache some of the information or to not use such complex table structures.

3.15 Alarm Model 14: Network Interface Down
This alarm is raised to indicate that the SNMP entity, acting in an agent role, has detected that the ifOperStatus object for one of its communication links is about to enter the down state from some other state (but not from the notPresent state). This other state is indicated by the included value of ifOperStatus.

Profile

Description

Severities

Major 

Raise Notification

linkDown

Clear Notification

linkUp

Triggers

• linkDownTrigger
• linkUpTrigger
Note: This model is only valid for interfaces with type (iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifType) EthernetCsmacd.

Unique Instance Identifier 

IF-MIB:ifIndex

3.15.1 Network Interface Down - Notification
This notification is sent if the administrative status of the interface is up but the operational status changes from Up to Down. It is also sent if, with both administrative and operational status Down, the administrative status changes from Down to Up.

MIB Reference

Description

MIB

IF-MIB

Trap Name

linkDown

Trap OID

1.3.6.1.6.3.1.1.5.3

Varbind Name

Varbind OID

ifIndex

1.3.6.1.2.1.2.2.1.1

ifAdminStatus

1.3.6.1.2.1.2.2.1.7

ifOperStatus

1.3.6.1.2.1.2.2.1.8

ifDescr 

1.3.6.1.2.1.2.2.1.2


svInterfacesIfOperStatusUpAndStable

1.3.6.1.4.1.11610.435.15747.1.2.2.1.33

Profile

Description

Frequency

8 seconds

Severity

Major 

Condition

(svOperStatusUpAndStable == 2(false)) && (IF-MIB::ifAdminStatus == 1(up))

If a physical interface is down, for example, because a cable is removed from an interface, the alarm displays the name of the physical interface from which the cable was removed.

3.15.2 Network Interface Down - Clear  The clear notification is sent when the interface's ifOperStatus changes from Down to Up, while ifAdminStatus is Up, or when ifAdminStatus changes from Up to Down while ifOperStatus is Down. MIB Reference

Description

MIB

IF-MIB

Trap Name

linkUp

Trap OID

1.3.6.1.6.3.1.1.5.4

Varbind Name

Varbind OID

ifIndex

1.3.6.1.2.1.2.2.1.1

ifAdminStatus

1.3.6.1.2.1.2.2.1.7

ifOperStatus

1.3.6.1.2.1.2.2.1.8

ifDescr 

1.3.6.1.2.1.2.2.1.2

svInterfacesIfOperStatusUpAndStable

1.3.6.1.4.1.11610.435.15747.1.2.2.1.33

Profile

Description

Frequency

8 seconds

Severity

Cleared

Condition

(svOperStatusUpAndStable == 1(true)) || (IF-MIB::ifAdminStatus == 2(down))
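The raise and clear conditions for Alarm Model 14 can be sketched as a small decision function. This is an illustration only: the numeric encodings (1 = true/up, 2 = false/down) are taken from the conditions above, while the function name and the "unchanged" outcome for inputs that match neither condition are assumptions.

```python
# Value encodings as written in the raise and clear conditions.
TRUE, FALSE = 1, 2            # svOperStatusUpAndStable
ADMIN_UP, ADMIN_DOWN = 1, 2   # IF-MIB::ifAdminStatus


def interface_down_state(up_and_stable: int, admin_status: int) -> str:
    """Evaluate the Alarm Model 14 conditions for one polled interface."""
    # Raise: (svOperStatusUpAndStable == 2(false)) && (ifAdminStatus == 1(up))
    if up_and_stable == FALSE and admin_status == ADMIN_UP:
        return "major"
    # Clear: (svOperStatusUpAndStable == 1(true)) || (ifAdminStatus == 2(down))
    if up_and_stable == TRUE or admin_status == ADMIN_DOWN:
        return "cleared"
    # Neither condition matched (for example, ifAdminStatus == 3(testing)).
    return "unchanged"
```

An administratively disabled port therefore never holds the alarm, which matches the suggestion to disable ports that are not in use.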

3.15.3 Network Interface Down - Major Alarms
This alarm indicates that the administrative status of the interface is up but the operational status is down.

3.15.4 Impact and Suggested Resolutions for Alarm Model 14
This alarm can impact subscriber traffic if the affected interfaces are used for the intersection of traffic or clustering elements:
1.   Run the show interface configuration CLI command to verify that the interface is administratively enabled, but Down.
2.   Run the set config interface enabled CLI command to disable the port if the interface is not for use under normal operation.
3.   If the interface is a form-factor pluggable module, such as SFP+ or XFP, run the show interface modules CLI command to inspect the status of the module and the link.
4.   Ensure that the cables are properly seated on the element and on the switch/router.
5.   Run the show log command to inspect the system logs for possible failures, errors, or warnings against the specific interface which is showing as "down".
6.   If Alarm Model 16 is also raised, see the associated Impact and Suggested Resolutions section for further analysis.

3.16 Alarm Model 15: Unavailable Processing Module This alarm is raised to indicate a problem with the operational status of the packet processing Load Balancer. Note:  Alarm Model 15 (Unavailable Processing Module) is not supported on the PTS Linux platform. Profile

Description

Severities

Warning, Minor, Major 

Raise Notification

svLBOperStatusDownNotification

Clear Notification

svLBOperStatusUpNotification

Triggers

•   lbcStatusOperationDegraded
•   lbcStatusTrigger

Unique Instance Identifier

svLoadBalancerStatsOperStatus

3.16.1 Load Balancer Down - Notification This notification is sent whenever a load balancer OperStatus is down. This typically indicates that one of the modules to which it is forwarding data is no longer available. MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svLBOperStatusDownNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.6.0.1

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svLoadBalancerStatsOperStatus

1.3.6.1.4.1.11610.435.8377.1.3.3

3.16.1.1 Load Balancer is Degraded
This notification is sent if the load balancer operational status is degraded (value=3).

Profile

Description

Frequency

0 seconds (Immediate)

Severity

Minor 

Condition

SANDVINE-MIB::svLoadBalancerStatsOperStatus == 3 (degraded)

3.16.1.2 Load Balancer is Down
This notification is sent if the load balancer operational status is down (value=2).

Profile

Description

Frequency

0 seconds (Immediate)

Severity

Major 

Condition

SANDVINE-MIB::svLoadBalancerStatsOperStatus == 2 (down)

3.16.2 Load Balancer Down - Clear
This notification is sent if the load balancer OperStatus is Up. This occurs if all of the processing modules, to which it is forwarding data, are available.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svLBOperStatusUpNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.6.0.2

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svLoadBalancerStatsOperStatus

1.3.6.1.4.1.11610.435.8377.1.3.3

3.16.2.1 Load Balancer is Operational
This notification is sent when the load balancer operational status is up (value=1).

Profile

Description

Frequency

0 seconds (Immediate)

Severity

Cleared

Condition

SANDVINE-MIB::svLoadBalancerStatsOperStatus == 1 (up)
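Taken together, the three Alarm Model 15 conditions map each svLoadBalancerStatsOperStatus value to one severity. A minimal sketch of that mapping follows; the dictionary and function names are illustrative assumptions, and only the three values documented above are handled.

```python
# svLoadBalancerStatsOperStatus value -> severity, from sections 3.16.1.1, 3.16.1.2, and 3.16.2.1.
OPER_STATUS_SEVERITY = {
    1: "Cleared",  # up
    2: "Major",    # down
    3: "Minor",    # degraded
}


def load_balancer_severity(oper_status: int) -> str:
    """Return the alarm severity for a polled load balancer operational status."""
    try:
        return OPER_STATUS_SEVERITY[oper_status]
    except KeyError:
        # Values other than 1, 2, and 3 are not described in this section.
        raise ValueError(f"undocumented svLoadBalancerStatsOperStatus value: {oper_status}")
```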

3.16.3 Load balancer down - minor alarms
This notification indicates one or more modules in a PTS cluster are not up. When a module exits the up state, all traffic assigned to the module is shunted. If the module does not reenter the up state after five minutes, all of the traffic assigned to the modules is rebalanced to other modules. Run the show service load-balancer modules CLI command to view the status of the modules. Run the show interface rate modules CLI command to see if a module is processing traffic.

3.16.3.1 Impact and Suggested Resolution, Alarm Model 15
During the shunting interval, traffic assigned to the failed module is not inspected. In general, this alarm indicates:
•   Certain processes are not online.
•   There are communication issues in the PTS cluster.
•   Modules are overloaded due to traffic inspection.
To narrow down the root causes of the problem:
a.   Run the show system services CLI command to verify that the services are online.
b.   Run the show system modules and show system resources CLI commands to verify that the PTSD, SFCD, and SCDPD processes are online and resources are available.
c.   Run the show interface configuration [interface] CLI command to verify that there are no interface down alarms for cluster ports.
d.   Run the CLI> ping > show interface counters data CLI command, on the respective PTS, to check any dropped packets on the connected interfaces.
f.   Run the show interface spanning-tree vlans CLI command to verify that the same set of VLANs and MSTP instances are used in the PTS cluster.
g.   Run the show service load-balancer master CLI command to determine the master load balancer. Then run the show service load-balancer modules detail command on the master load balancer to verify that no modules are overloaded.
See the PTS Administration Guide for additional information.

3.16.4 Load balancer down - major alarms
This notification indicates that there are no modules available to inspect traffic. Previously discovered modules that are not up are not used to inspect traffic. To view module states in a cluster, use the show service load-balancer modules CLI command.

3.16.4.1 Impact and Suggested Resolution, Alarm Model 15
If this alarm occurs, all traffic that is intersecting with the PTS is not inspected. In general, this alarm may indicate:
•   Certain processes are not online.
•   There are communication issues in the PTS cluster.
•   Modules are overloaded due to traffic inspection.
To narrow down the root causes of the problem:
a.   Verify that modules have not been administratively brought down. Use the show system modules CLI command.
b.   Verify that the PTSD, SFCD, and SCDPD processes are online and resources are available:
show system modules
show system resources
c.   Verify that no interfaces connecting the cluster (interfaces with function [cluster]) are down. Use the show interface configuration [interface] CLI command.
d.   Verify communication between SERVICE ports in a PTS cluster: PTS> ping 60 (days)) && (SANDVINE-MIB::svLicenseState == 3 (valid))

3.19.5 Invalid Software License—Clear  This notification is sent if a module license that was previously invalid or about to expire is now valid. MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSysLicenseValidNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.4.0.10

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svLicenseName

1.3.6.1.4.1.11610.435.7534.1.10.1.2

svLicenseStartDate

1.3.6.1.4.1.11610.435.7534.1.10.1.6

svLicenseExpirationDate

1.3.6.1.4.1.11610.435.7534.1.10.1.7

svLicenseDaysToExpiry

1.3.6.1.4.1.11610.435.7534.1.10.1.8

This alarm is cleared when days to expiry for that module's license is greater than 90 days and the license state is valid (3). Profile

Description

Frequency

30 seconds

Severity

Cleared

Condition

(SANDVINE-MIB::svLicenseDaysToExpiry > 90 (days)) && (SANDVINE-MIB::svLicenseState == 3 (valid))
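As a quick illustration of the clear condition, the check below returns true only when both parts hold. The function name is a hypothetical helper; the threshold of 90 days and the valid state value of 3 come from the condition above.

```python
LICENSE_STATE_VALID = 3  # svLicenseState value for "valid"


def license_alarm_cleared(days_to_expiry: int, license_state: int) -> bool:
    """True when days to expiry is greater than 90 and the license state is valid."""
    return days_to_expiry > 90 and license_state == LICENSE_STATE_VALID
```

Note that exactly 90 days remaining does not clear the alarm, because the comparison is strict.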

3.19.6 Impact and Suggested Resolution, Alarm Model 19
An alarm is raised if the licenses are invalid, about to expire, already expired, or if the element uses a cached copy of a license. Note that:
•   When the license expires, the element continues to function normally. It does not take any action until it is restarted or reloaded. For example, the PTS daemon continues to inspect packets and function normally until it is restarted or reloaded. At that time it detects that the license is invalid and stops inspecting packets.
•   If a new set of licenses is available for the license server, the element detects it only when it is restarted or reloaded.

The suggested resolutions for alarm model 19 are:
•   If the alarm is not critical, run the show system licenses CLI command to identify the licenses that will expire, and contact the Sandvine Account Team to renew the licenses.
•   If the alarm is critical and you do not use a license server, and the license is either expired or missing on the element, then run the show system licenses CLI command to identify the license that has the Valid column set to false. Contact Sandvine Customer Support or its authorized partner for further assistance.
•   If the license is of type [network] with the Valid column set to false, then the license was lost, most likely due to a communication error with the license server. In this case, confirm that you have not started more elements than the network license allows. If you have, shut down one to let the element receive the license.
•   If the license is of type [unknown], it means that during a reload or restart of the element, the license is lost. Confirm that the license server is operational and that the element can connect to the license server. If it does, confirm that you have not started more elements than the network license allows. If you have, shut down one to let the element receive the license or consider acquiring additional licenses by contacting your Sandvine Account Team.
•   If the license type is [cache], then run the show system licenses CLI command to identify the license that has the Valid column set to false. Contact Sandvine Customer Support or its authorized partner for further assistance.

Run the show config service license-server CLI command to determine the connected license server. An output similar to this appears displaying the connected license servers based on your configuration:
primary
  host: licenseserver1
  port: 6200
first-redundant
  host: licenseserver2
  port: 6200
second-redundant
  host: licenseserver3
  port: 6200

The alarms, triggers, and events associated with licenses are defined in these files:
/usr/local/sandvine/etc/alarms/license.alarm.conf
/usr/local/sandvine/etc/events/events.license.conf
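When checking several elements, it can help to turn the output of show config service license-server into structured data. The sketch below assumes the output consists of a role name followed by host: and port: fields, as in the sample above; the exact spacing and line layout of real output may differ, so the pattern is deliberately whitespace-tolerant. The function name and returned dictionary keys are illustrative assumptions.

```python
import re

# A role token, then "host: <name>", then "port: <number>", separated by any whitespace.
ENTRY = re.compile(r"(\S+)\s+host:\s*(\S+)\s+port\s*:\s*(\d+)")


def parse_license_servers(text: str):
    """Return a list of {"role", "host", "port"} dicts found in the command output."""
    return [
        {"role": role, "host": host, "port": int(port)}
        for role, host, port in ENTRY.findall(text)
    ]
```

The resulting list preserves the order in which the servers are reported, so the first entry is the primary server in the sample output.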

3.20 Alarm Model 20: Overloaded Cluster
This alarm is raised if all modules on a PTS cluster have exceeded their designed maximum target load threshold or the cost space is insufficient.
Note: Cost Space is the maximum cost limit that is assigned to a PTSD instance when load balancer is configured to balance by "cost". This value is specified in the load-balancing policy or set by the respective policy package.

A single, multi-module PTS is considered a cluster, and if all the modules on the PTS element exceed the target load threshold, this can trigger the major alarm. The default value of the target load threshold is 90%. If cost space is insufficient for all the modules, this can trigger the minor alarm. The alarm is cleared when one module is below the target load threshold and one module has sufficient cost space.
Note: Alarm Model 20 (Overloaded Cluster) is not supported on the PTS Linux platform.

Profile

Description

Severities

Major, Minor, Clear 

Raise Notification

svLBClusterOverloadNotification

Clear Notification

svLBClusterOverloadClearNotification

Triggers

clusterOverloadTrigger 

Unique Instance Identifier 

none – applies to the entire PTS

3.20.1 Overloaded Cluster - Major  MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svLBClusterOverloadNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.6.0.3

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

3.20.2 Overloaded Cluster - Minor  MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svLBClusterOverloadNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.6.0.3

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

3.20.3 Overloaded Cluster - Clear
This notification is sent when one module is below the target load threshold and one module has sufficient cost space, which implies that the cluster is no longer overloaded.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svLBClusterOverloadClearNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.6.0.4

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

Profile

Description

Frequency

8 seconds

Severity

Clear 

Condition

At least one module is below the target load threshold and one module has sufficient cost space.
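The major, minor, and clear rules for Alarm Model 20 can be sketched as a function over per-module load and cost-space state. This is a simplified model, not the load balancer's implementation: the Module type, the precedence of major over minor, and the "unchanged" result when no rule matches are assumptions. The 90% default is the documented target load threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    load_pct: float        # current module load, in percent
    has_cost_space: bool   # True if the module still has sufficient cost space


def cluster_overload_state(modules: List[Module], target_load_threshold: float = 90.0) -> str:
    """Classify the cluster as "major", "minor", "clear", or "unchanged"."""
    if not modules:
        raise ValueError("at least one module is required")
    # Major: all modules exceed the target load threshold.
    if all(m.load_pct > target_load_threshold for m in modules):
        return "major"
    # Minor: cost space is insufficient for all modules.
    if not any(m.has_cost_space for m in modules):
        return "minor"
    # Clear: one module is below the threshold and one module has sufficient cost space.
    if any(m.load_pct < target_load_threshold for m in modules):
        return "clear"
    return "unchanged"
```

For instance, a two-module cluster at 95% and 92% load is classified as "major", while bringing either module below 90% (with cost space available on some module) yields "clear".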

3.20.4 Impact and Suggested Resolution, Alarm Model 20
If this alarm occurs, all the modules have a significant load in trying to inspect the traffic delivered to them. The size of the cluster is underrated for the traffic that is inspected. While this alarm is active, new IP traffic discovered by the load balancer is shunted (not inspected).
Run the show service load-balancer modules detail CLI command to identify the maximum load threshold. If the load of a module exceeds a maximum load threshold (default setting is 95%), some of the traffic on the module is rebalanced to bring its load below the target load threshold. While this alarm is active, the rebalanced traffic is shunted. Note that in both cases the shunting is temporary and IPs are discovered after a timeout period. If the alarm persists then the load balancer will perpetually be discovering and shunting the same set of IPs.
If the amount of traffic that the cluster is receiving can be reduced, then it would be the best option. Review the load of the modules using show service load-balancer modules. If more PTS elements are added to the cluster, you can force a rebalance across the cluster using clear service load-balancer.

This alarm may also be triggered if there are communications problems on the cluster links. If this alarm is seen in combination with high Rx, Tx cluster link, or cluster link interface drop alarms, this indicates that there is insufficient cluster network capacity for the given network load. If the drops become excessive the system starts placing modules into temporary shunt to help alleviate the network congestion. In this case, the resolution is to reduce the data intercept traffic for the cluster or to increase the number of cluster links.

Troubleshooting Cost Space Configuration
When configured to load-balance by cost, run the show service load-balancer modules detail CLI command from the CLI operational mode for troubleshooting cost space configuration. The AssignedCost % column shows the cost assigned per module. If the AssignedCost % is greater than 90% for any module, a minor alarm is triggered.

3.21 Alarm Model 21: Overloaded Subcluster
This alarm indicates that load balancer discovered new traffic, but could not assign it to a module in the subcluster where the traffic was discovered. It is sent when all modules in a subcluster are down, or have exceeded their target load threshold.
All modules in a subcluster have module loads that exceed a target load threshold. The default value for the target load threshold setting is 90%. If there is another subcluster with modules below the target load threshold, then the traffic is assigned to one of its subclusters.
Alarm Model 20 (Overloaded Cluster) is triggered whenever a subcluster is unavailable.
Note: Alarm Model 21 (Overloaded Subcluster) is not supported on the PTS Linux platform.

Profile

Description

Severities

Major, Clear 

Raise Notification

svLBSubClusterOverloadNotification

Clear Notification

svLBSubClusterOverloadClearNotification

Triggers

•   SubclusterModuleOverload •   SubclusterOverloadClear 

Unique Instance Identifier 

svLoadBalancerAssignmentSubClusterOverloaded

3.21.1 Overloaded Subcluster - Major  This notification is sent if the load balancer indicates that the locality of a subcluster is violated, which implies that the subcluster  is overloaded. MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svLBSubClusterOverloadNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.6.0.5

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svLoadBalancerAssignmentSubClusterOverloaded

1.3.6.1.4.1.11610.435.8377.1.3.14.14

Profile

Description

Frequency

8 seconds

Severity

Major 

Condition

SANDVINE-MIB::svLoadBalancerAssignmentLocalityViolated == 1 (true)

3.21.2 Overloaded Subcluster - Clear
This notification is sent when one module is below the target load threshold and one module has sufficient cost space, which implies that the cluster is no longer overloaded.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svLBClusterOverloadClearNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.6.0.4

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

Profile

Description

Frequency

8 seconds

Severity

Clear 

Condition

At least one module is below the target load threshold and one module has sufficient cost space.

3.21.3 Impact and Suggested Resolution, Alarm Model 21
The system assigns IP bundles to another subcluster in order to compensate. Traffic that is re-balanced while the box is in this state is not identified. To view all the modules in a subcluster, run these CLI commands:
show service load-balancer modules
show service load-balancer stats

3.22 Alarm Model 22: Misconfigured Network Awareness This alarm is raised if the number of external-netclass addresses seen on a subscriber interface within 5 minutes, has exceeded the rising threshold. See  Alarm Model 77: Misconfigured Network Interface  on page 189 for additional information. Profile

Description

Severities

Minor, Clear 

Raise Notification

svIfExternalAddressesOnSubscriberInterfaceNotification

Clear Notification

svIfExternalAddressesOnSubscriberInterfaceClearedNotification

Triggers

extrenalAddrsOnInternalPorts

Unique Instance Identifier 

N/A

3.22.1 Misconfigured Network Awareness Alarm
This alarm is raised if the PTS detects more than 15% of the total packets in either of these conditions:
•   A packet arrives on a subscriber-facing interface containing an external source IP address, or
•   Both client and server have non-zero packet count.

A non-zero count in this field can be due to these reasons:
•   All the internal subnets are not defined in the subnets.txt file.
•   Bridge-group configuration has changed, which affects where the data intersect cables are connected.

3.22.2 Misconfigured Network Awareness - Minor
This notification is sent if the sampled percentage of external addresses seen on internal ports during a 5-minute interval has increased to 15% or more.

Profile

Description

Frequency

300 seconds

Severity

Minor 

Condition

delta(svPortTopologyStatsExternalAddressesOnInternalPorts) * 100 / delta(svPortTopologyStatsCheckedExternalAddressesOnInternalPorts) >= 15

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svIfExternalAddressesOnSubscriberInterfaceNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.9

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyStatsExternalAddressesOnInternalPorts

1.3.6.1.4.1.11610.435.10470.1.3.1

svPortTopologyStatsCheckedExternalAddressesOnInternalPorts

1.3.6.1.4.1.11610.435.10470.1.3.201

svPortTopologyDebugProcessedMissingSubnetList

1.3.6.1.4.1.11610.435.10470.1.9.3

3.22.3 Misconfigured Network Awareness - Clear  This notification is sent when the sampled percentage of external addresses seen on internal ports during a 5-minute interval has fallen below 7%. MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svIfExternalAddressesOnSubscriberInterfaceClearedNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.10

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyStatsExternalAddressesOnInternalPorts

1.3.6.1.4.1.11610.435.10470.1.3.1

svPortTopologyStatsCheckedExternalAddressesOnInternalPorts

1.3.6.1.4.1.11610.435.10470.1.3.201

Profile

Description

Frequency

300 seconds

Severity

Cleared

Condition

delta(svPortTopologyStatsExternalAddressesOnInternalPorts) * 100 / delta(svPortTopologyStatsCheckedExternalAddressesOnInternalPorts) < 7
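The raise (>= 15) and clear (< 7) conditions form a hysteresis band: between 7% and 15% the alarm keeps its previous state. The sketch below works through that arithmetic on two successive counter samples. The function names, the (external, checked) tuple shape, and the handling of an interval with no checked packets are assumptions made for illustration.

```python
from typing import Optional, Tuple

Sample = Tuple[int, int]  # (ExternalAddressesOnInternalPorts, CheckedExternalAddressesOnInternalPorts)


def external_address_pct(prev: Sample, curr: Sample) -> Optional[float]:
    """delta(external) * 100 / delta(checked) over one polling interval."""
    d_external = curr[0] - prev[0]
    d_checked = curr[1] - prev[1]
    if d_checked <= 0:
        return None  # nothing was sampled in this interval; ratio is undefined
    return d_external * 100 / d_checked


def next_alarm_state(raised: bool, pct: Optional[float],
                     raise_at: float = 15.0, clear_below: float = 7.0) -> bool:
    """Apply the minor (>= 15) and clear (< 7) conditions with hysteresis."""
    if pct is None:
        return raised
    if pct >= raise_at:
        return True
    if pct < clear_below:
        return False
    return raised  # between the two thresholds: keep the previous state
```

With samples (100, 1000) and (250, 2000), the deltas are 150 and 1000, giving 150 * 100 / 1000 = 15%, which is enough to raise the minor alarm.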

3.22.4 Impact and Suggested Resolution, Alarm Model 22
Interface functions may not be operating correctly, as the traffic may not be correctly classified. Any subscriber-specific policy rules may not be properly applied and reporting (especially subnet-based and subscriber-based) may not be accurate. Traffic in the external subnets seen on subscriber interfaces may not be managed as expected, or at all, by the configured policy.
1.   Ensure that all PTS ports are wired correctly.
2.   Inspect the missing subnets list using the show interface ip-address-tracking CLI command.
•   If this accurately describes a subscriber subnet, then this value should be added to subnets.txt under the internal cost class. See the SandScript Configuration Guide for more information on configuring the subnets.txt file.

•   If the missing subnets list is either too specific, or not specific enough, it is likely that for your network, the network size used for aggregation needs adjustment. This value represents the subnet prefix that should be used while populating IPs into a CIDR list.

3.   Verify that the interface functions have been correctly defined using the show interface configuration CLI command. An Internet-facing port falsely configured as a subscriber-facing port could trigger this alarm.
4.   If the subnets.txt has been modified, perform an svreload.

3.22.4.1 Adjusting Network Aggregation
You may need to adjust the number of top-level bits used for subnets.
1.   Set the network size to the number of top-level bits used for subnets on your subscriber networks.

For example, for IPv6, set the network mask width to 48 with the command:
PTS# set config interface address-tracking network-mask width ipv6 48

Or, for IPv4, set the network mask width to 20 with the command:
PTS# set config interface address-tracking network-mask width ipv4 20

2.   Perform an svreload.
3.   To inspect the missing subnets list, run the show interface ip-address-tracking CLI command. Check that it now accurately describes a subscriber subnet.
4.   If the subscriber subnet is now accurately described, add to subnets.txt.
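To see what a given network mask width does to the missing subnets list, it can help to aggregate a few sample addresses offline. The sketch below uses Python's standard ipaddress module to keep only the top-level bits of each address. It illustrates the idea of prefix aggregation only; it does not reproduce how the PTS builds its list, and the function name and default widths (taken from the examples above) are assumptions.

```python
import ipaddress


def aggregate(ip: str, ipv4_width: int = 20, ipv6_width: int = 48):
    """Return the subnet obtained by keeping the top `width` bits of `ip`."""
    addr = ipaddress.ip_address(ip)
    width = ipv4_width if addr.version == 4 else ipv6_width
    # strict=False masks off the host bits instead of raising an error.
    return ipaddress.ip_network(f"{addr}/{width}", strict=False)


def aggregate_all(ips, **widths):
    """Collapse many addresses into the sorted set of subnets that contain them."""
    return sorted({str(aggregate(ip, **widths)) for ip in ips})
```

With a width of 20, addresses 10.1.37.5 and 10.1.40.9 both fall in 10.1.32.0/20, so they appear as a single candidate subnet. A larger width produces more specific entries; a smaller width produces broader ones.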

3.22.4.2 Additional Information
These files define the alarms, triggers, and events for this alarm:
•   /usr/local/sandvine/etc/alarms/interfaces.alarm.conf
•   /usr/local/sandvine/etc/events/events.ptsm.conf

3.23 Alarm Model 23: Runtime SandScript Errors This alarm is raised when SandScript errors are detected as shown in the SandScript error table. Profile

Description

Severities

•   Major
•   Minor
•   Warning
•   Clear

Raise Notification

svPolicyRuntimeErrorNotification

Clear Notification

svPolicyNoRuntimeErrorNotification

Triggers

•   policyErrorMajor
•   policyErrorMinor

Unique Instance Identifier 

svPolicyErrorsPolicyErrorCount

3.23.1 Runtime SandScript Errors
An alarm is raised for one of the SandScript policy error instances, if the counter increments. The alarm is cleared only if the counter is not incremented for 1 hour or more. The actual time to clear the alarm ranges between 1 and 2 hours. The condition whereby each alarm counter is raised is different for each instance.

Instance Name

Possible Cause

 Analyze "dpm" action issued on ReadOnly flow

The analyze "dpm" action is not issued on every SandScript call on a flow, causing the flow to drop to read-only mode.

Demographic stats flow skipped interval

 A flow was not visited in a stats-publishing interval, so the flow's demographic statistics for that interval is reported in the subsequent interval.

Enumerated classifier assigned an invalid value

 An enumerated classifier is assigned with an invalid value.

Expression stats max columns exceeded

 A published expression stat exceeds a field, column, or  classification limit.

Http_response actions issued on ReadOnly flow

 A read-only flow is actioned with  http_response  in the SandScript.

Http_response action issued with a null payload expression

 A  http_response action is issued with a payload string that evaluated to NULL in the SandScript.

Max Measurements Per Flow Exceeded

Flows are measured up to four connections with four bitrate measurements at a time. When SandScript exceeds this limit, an alarm is triggered.

Maximum shapers per flow has been exceeded

Up to si six x shap shaper ers s any any give given n time time can can shap shape e flow flows, s, and and an alar alarm m is triggered when SandScript exceeds this limit.

Maximum Maximu m ports per SMTP host have been exceed exceeded ed

Each Simple Mail Transfer Protocol (SMTP) host is monitored to detect spammers and this detection is done on a per-port basis.. The maximum basis maximum number of ports monitor monitored ed per host is configurable using the  rc.conf  variable  variable spam_max_ports_per_smtp_host. An alarm is triggered when this limit is exceeded.

Max tee destinations per flow exceeded

One or more flows are not tee-d to all desired destinations destinations because the maximum number of tee destinations was exceeded.

Measurement memory allocation failure

 A measurement with a new unique-by key was not allocated due to lack of available memory.

Overloaded, IP shunting

 A burst of traffic that exceeds the packet processing and inspection rate; the PTS applies SandScript.

Overloaded, UDP shunting

 A burst of UDP traffic that exceeds the t he rate that UDP packet processing and inspection rate; the PTS applies SandScript.

Stat was not written to the SPB due to an error 

During statistics integration, a condition prevents adding a statistic to the database. Counters in PDB provide further 

Sandvine Policy Traffic Switch Alarms Reference Guide, Release 7.40 05-00262 C07

109

 

PTS Alarms  Alarm Model 23: Runtime SandScript Er Errors rors

Instance Name

Possible Cause

information inform ation as to why this alarm is raised. raised. They are under  devices/statsIntegration/1/stats . Currently only histogramDefinitionNotPresent raises this alarm. This means that the bin definition for a histogram stat did not arrive. Infinite loop detected

The current policy includes a foreach that can end up lo looping oping infinitely. An Inifinite loop detection triggers when used with a Range, when the Range's: • • •

start  >  ends with a positive positive step. end  >  starts with negative step. St Step ep is zero zero..

DNS Modification of read only flow

 A DNS packet modification is issued on a read-only flow.

DNS modification of fragmented packets

 A DNS modification is issued on a fragmented DNS packet. pack et.

DNS modification of TCP stream

 A DNS modification was issued on a DNS flow that uses TCP.

3.23.2 Runtime SandScript Errors - Major

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svPolicyRuntimeErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.7.0.1

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB::sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPolicyErrorsPolicyErrorName

1.3.6.1.4.1.11610.435.8083.1.3.2.10.1.2

svPolicyErrorsPolicyErrorCount

1.3.6.1.4.1.11610.435.8083.1.3.2.10.1.3

This notification is sent if the SandScript error count exceeds 1 within an interval of 1 hour (3600 seconds).

Profile

Description

Frequency

3600 seconds

Severity

Major

Condition

DELTA(SANDVINE-MIB::svPolicyErrorsPolicyErrorCount) && SANDVINE-MIB::svPolicyErrorsPolicyErrorSeverity == 4

3.23.3 Runtime SandScript Errors - Minor

This notification is sent if the runtime SandScript error count exceeds 1 within an interval of 3600 seconds.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svPolicyRuntimeErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.7.0.1

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB::sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPolicyErrorsPolicyErrorName

1.3.6.1.4.1.11610.435.8083.1.3.2.10.1.2

svPolicyErrorsPolicyErrorCount

1.3.6.1.4.1.11610.435.8083.1.3.2.10.1.3

Profile

Description

Frequency

3600 seconds

Severity

Minor 

Condition

DELTA(SANDVINE-MIB::svPolicyErrorsPolicyErrorCount) && SANDVINE-MIB::svPolicyErrorsPolicyErrorSeverity == 5

3.23.4 Runtime SandScript Errors - Clear

This notification is sent to indicate that a previously detected SandScript error has cleared.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svPolicyNoRuntimeErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.7.0.2

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB::sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPolicyErrorsPolicyErrorName

1.3.6.1.4.1.11610.435.8083.1.3.2.10.1.2

svPolicyErrorsPolicyErrorCount

1.3.6.1.4.1.11610.435.8083.1.3.2.10.1.3

This alarm is cleared when the change in the SandScript error count equals 0 within an interval of 1 hour (3600 seconds).

Profile

Description

Frequency

3600 seconds

Severity

Cleared

Condition

DELTA (SANDVINE-MIB::svPolicyErrorsPolicyErrorCount) == 0
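The raise and clear conditions above can be read as a periodic check on the change in the error counter. The following is a minimal sketch of that reading, not the PTS implementation: the class name, method names, and return strings are hypothetical, and the severity codes 4 and 5 are taken only from the Condition rows above. The helper seconds_until_clear assumes that an error occurring between two polls is seen at the next poll and cleared at the one after, which is one way to account for the stated 1 to 2 hour clear time.

```python
from dataclasses import dataclass
from typing import Optional

POLL_INTERVAL_S = 3600  # "Frequency" row of the profile tables

# Severity codes as they appear in the Condition rows (4 = Major, 5 = Minor).
SEVERITY_NAMES = {4: "Major", 5: "Minor"}


@dataclass
class ErrorAlarm:
    """Hypothetical model of one SandScript error instance."""
    severity_code: int
    last_count: int = 0
    raised: bool = False

    def poll(self, current_count: int) -> Optional[str]:
        """Evaluate one polling cycle; return a notification or None."""
        delta = current_count - self.last_count
        self.last_count = current_count
        if delta > 0:
            if not self.raised:
                self.raised = True
                return "raise " + SEVERITY_NAMES[self.severity_code]
            return None  # already raised, nothing new to send
        if self.raised:
            self.raised = False
            return "clear"  # DELTA == 0 for a full interval
        return None


def seconds_until_clear(offset_after_poll_s: int) -> int:
    """Time from the last error to the clearing poll.

    Assumes the last error happens offset_after_poll_s seconds after a
    poll; the next poll sees a non-zero delta, the one after sees zero.
    """
    if not 0 < offset_after_poll_s <= POLL_INTERVAL_S:
        raise ValueError("offset must fall within one polling interval")
    return 2 * POLL_INTERVAL_S - offset_after_poll_s
```

Under these assumptions the time to clear is at least 3600 seconds and less than 7200 seconds, depending on where in the polling interval the last error occurred.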


3.23.5 Runtime SandScript Errors - Possible Instances

The possible SandScript errors are:

Name

Applicability

Max Measurements Per Flow Exceeded

PTS 5.20.08 and up on 5.2 stream, PTS 5.40.03 and up on newer streams.

Maximum shapers per flow has been exceeded

PTS 5.20.08 and up on 5.2 stream, PTS 5.40.03 and up on newer streams.

Maximum ports per SMTP host have been exceeded

PTS 5.20.08 and up on 5.2 stream, PTS 5.40.03 and up on newer streams.

Overloaded, IP shunting

PTS 5.40.03 and up.

Overloaded, UDP shunting

PTS 5.40.03 and up.

Measurement memory allocation failure

PTS 5.51 and up.

Demographic stats flow skipped interval

PTS 5.51 and up.

Enumerated classifier assigned an invalid value

PTS 5.51 and up.

Expression stats max columns exceeded

PTS 5.51 and up.

Max tee destinations per flow exceeded

PTS 5.51 and up.

Http_response actions issued on ReadOnly flow

PTS 5.51 and up.

Stat was not written to the SPB due to an error 

PTS 5.51.09 and up.

Http_response action issued with a null payload expression

PTS 5.51.12 and up.

 Analyze "dpm" action issued on ReadOnly flow

PTS 5.60.06 and up.

Policy Controller overloaded on ppu

PTS 6.10 and up.

Policy Controller degraded on ppu

PTS 6.10 and up.

Policy Controller degraded on control processor 

PTS 6.10 and up.

Infinite loop detected

PTS 7.35 and up.

DNS modification of read only flow

PTS 7.35 and up.

DNS modification of fragmented packets

PTS 7.35 and up.

DNS modification of TCP stream

PTS 7.35 and up.

Run the show policy errors CLI command to display errors.

3.23.6 SandScript Errors

These are the possible types of SandScript errors:

3.23.6.1 http_response Action Issued on ReadOnly Flow

A flow can be actioned with http_response only if it is writeable. A flow is writeable if each SandScript run for that flow results in a set Flow.Stream.ReadOnly = false action.

If any SandScript run for a flow does not result in this action, the flow irreversibly becomes read-only and the http_response action on the flow results in this SandScript error. To resolve this alarm, edit SandScript such that any flows for which you want the http_response action are always previously actioned with set Flow.Stream.ReadOnly = false.

3.23.6.2 Max Measurements per Flow Exceeded

Up to four connections with four bitrate measurements can measure flows. An alarm occurs when SandScript exceeds this limit. For example:

measurement "A" connections where true
measurement "B" connections where true
measurement "C" connections where true
measurement "D" connections where true
measurement "E" connections where true

The first four measurements measure flows correctly, while the fifth measurement triggers an alarm.
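The behavior in this example can be pictured as a per-flow cap. The sketch below is illustrative only: the function name, the error counter, and the assumption that measurements are applied in declaration order are not taken from the product, and the limit of four comes from the example above.

```python
MAX_MEASUREMENTS_PER_FLOW = 4  # limit used in the example above


def apply_measurements(names, limit=MAX_MEASUREMENTS_PER_FLOW):
    """Split requested measurements into applied ones and error count.

    Hypothetical helper: measurements beyond the limit are not applied,
    and each one increments the error counter that drives the alarm.
    """
    applied = list(names[:limit])
    error_count = max(0, len(names) - limit)
    return applied, error_count
```

With the five measurements "A" through "E" from the example, this sketch applies "A" through "D" and counts one error for "E".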

3.23.6.3 Max Shapers per Flow Exceeded

Up to four shapers can simultaneously shape a flow, and an alarm is triggered if SandScript exceeds this limit. For example:

shaper "A" 100Mbps
shaper "B" 100Mbps
shaper "C" 100Mbps
shaper "D" 100Mbps
shaper "E" 100Mbps
if true then \
    shape to client shaper "A" \
    shape to client shaper "B" \
    shape to client shaper "C" \
    shape to client shaper "D" \
    shape to client shaper "E"

The first four shapers shape flows correctly, while the fifth triggers an alarm.

3.23.6.4 Max Ports per SMTP Host Exceeded

Each SMTP host is monitored to detect spammers, and this detection is done on a per-port basis. The maximum number of ports that are monitored per host is configurable using the rc.conf variable spam_max_ports_per_smtp_host. Spammers are detected on any port on which SMTP traffic is detected before this maximum limit is reached. If SMTP traffic is detected on any port after this limit is reached, that traffic is not inspected to determine if it is spam.
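The per-host port limit can be sketched as a small tracker. This is an illustration under stated assumptions, not the PTS implementation: the class and attribute names are hypothetical, and the default of 5 ports is the default limit quoted in the suggested resolutions for this alarm model.

```python
class SmtpPortTracker:
    """Hypothetical model of per-host SMTP port monitoring."""

    def __init__(self, max_ports_per_host=5):
        self.max_ports = max_ports_per_host
        self.ports = {}                # host -> set of monitored ports
        self.limit_exceeded_count = 0  # stands in for the error counter

    def observe(self, host, port):
        """Return True if SMTP traffic on (host, port) is inspected."""
        monitored = self.ports.setdefault(host, set())
        if port in monitored:
            return True
        if len(monitored) < self.max_ports:
            monitored.add(port)
            return True
        # Limit reached: this port is not inspected for spam.
        self.limit_exceeded_count += 1
        return False
```

Ports seen before the limit is reached stay monitored; only traffic on additional ports for the same host goes uninspected.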


3.23.6.5 PTSM IP Shunting (PTS Only)

If the incoming packet rate exceeds the rate at which the PTSM processes packets, the PTSM goes into IP shunting mode, in which all packets are bridged rather than processed and inspected. For example, 64-byte UDP packets at 50% of the line rate of an interface with the Rx queue trigger shunting.

3.23.6.6 PTSM UDP Shunting (PTS Only)

If the incoming UDP packet rate exceeds the rate at which the PTSM inspects packets, the PTSM goes into UDP shunting mode, in which all UDP packets are bridged rather than inspected. For example, 64-byte UDP packets at 25% of the line rate of an interface with the Rx queue trigger shunting.
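To put the two examples in terms of packets per second, the arithmetic below converts a line-rate percentage into a packet rate. It is a worked calculation under assumptions that are not in this guide: a 10 Gbps Ethernet interface, "64-byte" taken to mean the Ethernet frame size, and the standard 20 bytes of per-frame overhead on the wire (8-byte preamble and start delimiter plus 12-byte inter-frame gap).

```python
ETHERNET_OVERHEAD_BYTES = 20  # preamble/SFD (8) + inter-frame gap (12)


def packets_per_second(link_bps, frame_bytes, utilization):
    """Packet rate for fixed-size frames at a fraction of line rate."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps * utilization / bits_on_wire


# Assumed 10 Gbps interface; substitute the real interface speed.
LINK_BPS = 10_000_000_000
ip_example_pps = packets_per_second(LINK_BPS, 64, 0.50)
udp_example_pps = packets_per_second(LINK_BPS, 64, 0.25)
```

On the assumed 10 Gbps interface, the 50% example corresponds to roughly 7.4 million packets per second and the 25% example to roughly 3.7 million.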

3.23.7 Impact and Suggested Resolution, Alarm Model 23

The impact is different for each instance of this alarm.

Instance name

Impact

Max Measurements Per Flow Exceeded

Measurements may not work as expected.

Maximum shapers per flow has been exceeded

Shapers may not work as expected.

Maximum ports per SMTP host have been exceeded

If SMTP traffic is detected on any ports after the limit is reached, that traffic is not inspected to determine spam.

Overloaded, IP shunting

The PTS goes into IP shunting mode, where all packets are bridged instead of processed. This happens at a packet level, so it does not matter if the packet belongs to a flow where SandScript is applied. The determination is made that the latency becomes excessive and the probability of dropping packets goes up if further processing is done. Thus the packet is not shaped, nor does it have any other SandScript applied to it. It is not counted in the statistics for the subscriber, any protocol, or classification. For statistical purposes, it is counted with the protocol set to 'shunted'. When the burst of traffic has subsided, the system automatically detects when to return to normal processing.

Overloaded, UDP shunting

The PTS goes into UDP shunting mode, where all UDP packets are bridged instead of processed. This happens at a packet level, so it does not matter if the packet belongs to a flow that had SandScript applied to it. The determination is made that the latency becomes excessive and the probability of dropping packets goes up if further processing is done. Thus the packet is not shaped, nor does it have any other SandScript applied to it. It is not counted in the statistics for the subscriber, any protocol, or classification. For statistical purposes, it is counted with the protocol set to 'shunted'. When the burst of traffic has subsided, the system automatically detects when to return to normal processing.

Measurement memory allocation failure

Measurements may not work as expected.

Demographic stats flow skipped interval

The demographic statistics of the flow are reported in the incorrect interval.

Enumerated classifier assigned an invalid value

Classifiers may not work as expected. Some flows may not get classified correctly.

Expression stats max columns exceeded

Not all subscriber measurements that are expected to be reported persist. As a result, these measurements are not available in the reports.

Max tee destinations per flow exceeded

Flows are not teed to all destinations.

Http_response actions issued on ReadOnly flow

A flow that SandScript attempts to action with an HTTP response is not actioned.

Stat was not written to the SPB due to an error

Some statistics data is missing from the SPB.

Http_response action issued with a null payload expression

A flow that SandScript attempts to action with an HTTP response is not actioned.

Analyze "dpm" action issued on ReadOnly flow

A flow in the SandScript is not analyzed.

Policy Controller overloaded on module

Messages between the server and client are incomplete due to message size limitations. This means that the control system is missing information for some unique 'by instances' because of one of these reasons:
• Having more unique 'by instances' than recommended in the sizing guidelines.
• Using too many metrics or having too many histogram bins for a metric data definition.

Policy Controller degraded on module

The SandScript controllers on a module do not process all related messages. Depending on how many messages are not processed, this degrades controller functionality and results in incomplete information for controller processing. This can occur if the module is under very heavy load. If this alarm instance is raised, monitor it for some time and consider it an issue only if it persists or repeats.

Policy Controller degraded on control processor

The SandScript controller server is unable to process all related messages from the clients. This results in degraded behavior, where the control system actions incomplete information. This can occur when the controller is under heavy load.

Infinite loop detected

The infinite foreach statement is not run.

DNS modification of read only flow

The DNS modification is not applied to the flow.

DNS modification of fragmented packets

The DNS modification is not applied to the packet.

DNS modification of TCP stream

The DNS modification is not applied to the flow.

The suggested resolutions for each instance of the alarm are:

Instance name

Suggested resolutions

Max Measurements Per Flow Exceeded

Adjust the measurements configuration in the SandScript so that the documented maximums are not exceeded.

Maximum shapers per flow has been exceeded

Adjust shaper policies so that the number of shapers in a SandScript does not exceed the documented maximum.

Maximum ports per SMTP host have been exceeded

There is a default limit of 5 ports on which a mail server can receive SMTP traffic. You can run the set config network-protection max-ports-per-smtp-host CLI command to change the default value. Before doing this, Sandvine recommends that you verify that the ports are valid for this email server and are not used for spam or an email redirection server. If SMTP traffic is detected on any ports after the limit is reached, the traffic is not inspected for spam.

Overloaded, IP shunting

This alarm resolves itself within 1 to 2 hours after the burst subsides. To clear the alarm manually, run the clear alarms counters CLI command, which clears all the counter-based alarms. If the burst is sustained or recurs, the alarm is raised again after it is cleared. Systems that continually encounter this alarm are likely under-provisioned, apply expensive SandScript, or are subject to some sustained change in the process of heavy traffic. Increase the cluster capacity to resolve this issue.

Overloaded, UDP shunting

This alarm resolves itself within 1 to 2 hours after the burst subsides. To clear the alarm manually, run the clear alarms counters CLI command, which clears all the counter-based alarms. If the burst is sustained or recurs, the alarm is raised again after it is cleared. Systems that continually encounter this alarm are likely under-provisioned, apply expensive SandScript, or are subject to some sustained change in the process of heavy traffic. Increase the cluster capacity to resolve this issue.

Measurement memory allocation failure

Consider the cardinality of the unique-by keys used in the measurements and adjust SandScript to reduce the number of measurement instances.

Demographic stats flow skipped interval

The system is busy and cannot evaluate SandScript on all flows. Reduce the amount of traffic that the PTS is processing or simplify SandScript.

Enumerated classifier assigned an invalid value

Check SandScript to ensure that enumerated classifiers are set to the correct enumerated classifier values.

Expression stats max columns exceeded

Check published subscriber measurements that are unique-by (subscriber, classifier) and ensure that the set is small.

Max tee destinations per flow exceeded

Adjust the tee configuration in the SandScript so that the documented maximums are not exceeded.

Http_response actions issued on ReadOnly flow

To action a flow with http_response, provide a set Flow.Stream.ReadOnly = false action on the flow each time SandScript is run on that flow. This keeps the flow out of the read-only state.

Stat was not written to the SPB due to an error

histogramDefinitionNotPresent: The CND or PTSD / CMT buffer is probably overloaded, causing histogram instances to arrive after the statistic is written to the SPB. Check that an excessive number of instances (>10000) for the histogram measurements is not getting published.

Http_response action issued with a null payload expression

Adjust the SandScript such that it is not possible to issue an http_response action with a NULL payload string.

Analyze "dpm" action issued on ReadOnly flow

Analyze "dpm" must be issued on every DNS SandScript call leading up to and including the current one for the flow to remain writable.

Policy Controller overloaded on module

Decrease the total memory footprint of each unique by instance. To do this, decrease one of:
• The number of policy metrics.
• The number of histogram bins for the output_histogram parameter.
• The number of histogram bins for a metric data parameter.

Policy Controller degraded on module

Ensure that each new flow is sampled no more than once for QoE. If the PTS is regularly under very heavy load, then it is advisable to only sample a subset of the total number of flows (for example, every second flow). Systems that continually encounter this alarm are likely under-provisioned, apply expensive SandScript, or are subject to some sustained change in process heavy traffic. Increase the cluster capacity to resolve this issue.

Policy Controller degraded on control processor

Decrease the number of unique by instances that the cluster supports. Systems continually encountering this alarm are likely under-provisioned, apply expensive SandScript, or are subject to some sustained change in process heavy traffic. Increase the cluster capacity to resolve this issue.

Infinite loop detected

Identify the foreach statement that has parameters that result in an infinite loop. Infinite loop detection is triggered when a Range's:
• start > end with a positive step.
• end > start with a negative step.
• step is zero.
Fix the foreach range parameters so that they do not satisfy any of the conditions above.

DNS modification of read only flow

Provide a set Flow.Stream.ReadOnly = false action on the flow every time SandScript is run on that flow. This keeps the flow out of the read-only state.

DNS modification of fragmented packets

Use Flow.Application.DNS.IsModifiable to determine if a modification of a particular DNS packet is supported. Avoid performing any DNS modifications when the value of this field is false.

DNS modification of TCP stream

Use Flow.Application.DNS.IsModifiable to determine if a modification of a particular DNS packet is supported. Avoid performing any DNS modifications when the value of this field is false.
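The three Range conditions listed for the "Infinite loop detected" instance can be checked before deploying a policy. The sketch below expresses them as a predicate in Python; it is an illustration of the documented conditions, not SandScript and not the detection code in the PTS, and the function name is hypothetical.

```python
def range_loops_forever(start, end, step):
    """Return True if a Range matches a documented infinite-loop case."""
    if step == 0:
        return True   # step is zero
    if step > 0 and start > end:
        return True   # start > end with a positive step
    if step < 0 and end > start:
        return True   # end > start with a negative step
    return False
```

A foreach whose Range parameters make this predicate return True is not run, so the parameters should be corrected until it returns False.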

3.24 Alarm Model 24: High Network Interface Rx Rate

This alarm is raised when an interface port is receiving data at a rate that reaches or exceeds 90% of its capacity. It includes the current rate, max rate, utilization percentage, and description of the associated interface port. If the data rate drops to 80% or less, a clear notification is generated. Note that this notification is only sent for cluster interface ports.

Profile

Description

Severities

Major 

Raise Notification

svSysIfPortRxRateHighNotification

Clear Notification

svSysIfPortRxRateNormalNotification

Triggers

ifPortRxRateTrigger 

Unique Instance Identifier 

svPortTopologyPortRateDescription

3.24.1 High Network Interface Rx Rate - Major

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSysIfPortRxRateHighNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.11

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB::sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyPortRateRxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.1

svPortTopologyPortRateMaxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.3

svPortTopologyPortRateRxUtilization

1.3.6.1.4.1.11610.435.10470.1.40.1.4

svPortTopologyPortRateDescription

1.3.6.1.4.1.11610.435.10470.1.40.1.6

svPortTopologyPortRateRxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.9

svPortTopologyPortRateMaxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.11

This notification is sent if the utilization for a particular port reaches or exceeds 90% of the maximum supported Rx rate. The 32-bit varbinds svPortTopologyPortRateRxRate and svPortTopologyPortRateMaxRate are deprecated in favour of these new 64-bit varbinds:
• svPortTopologyPortRateRxRate64
• svPortTopologyPortRateMaxRate64

Profile

Description

Frequency

8 seconds

Severity

Major 

Condition

SANDVINE-MIB::svPortTopologyPortRateRxUtilization >= 90%

3.24.2 High Network Interface Rx Rate Cleared

This notification is sent if an interface port that previously sent out a rate high notification is now receiving data at a rate that is less than or equal to 80% of its capacity. It includes the current rate, max rate, utilization percentage, and description of the associated interface port. Note that this notification is only sent for cluster interface ports.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSysIfPortRxRateNormalNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.12

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB::sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyPortRateRxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.1


svPortTopologyPortRateMaxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.3

svPortTopologyPortRateRxUtilization

1.3.6.1.4.1.11610.435.10470.1.40.1.4

svPortTopologyPortRateDescription

1.3.6.1.4.1.11610.435.10470.1.40.1.6

svPortTopologyPortRateRxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.9

svPortTopologyPortRateMaxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.11

This alarm is cleared when the utilization for a particular port drops to 80% or less of the maximum supported Rx rate. The 32-bit varbinds svPortTopologyPortRateRxRate and svPortTopologyPortRateMaxRate are deprecated in favour of these new 64-bit varbinds:
• svPortTopologyPortRateRxRate64
• svPortTopologyPortRateMaxRate64

Profile

Description

Frequency

8 seconds

Severity

Cleared

Condition

SANDVINE-MIB::svPortTopologyPortRateRxUtilization <= 80%
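The Rx rate alarm raises at 90% utilization and clears at 80%, so readings between the two thresholds do not change the alarm state. The following is a minimal sketch of that hysteresis, assuming one utilization sample per 8-second evaluation; the function name and the event tuples are hypothetical and only illustrate the thresholds described in this section.

```python
RAISE_THRESHOLD_PCT = 90.0  # raise when utilization >= 90%
CLEAR_THRESHOLD_PCT = 80.0  # clear when utilization <= 80%


def evaluate_rx_utilization(samples):
    """Return (sample_index, event) pairs for a series of readings."""
    raised = False
    events = []
    for index, utilization in enumerate(samples):
        if not raised and utilization >= RAISE_THRESHOLD_PCT:
            raised = True
            events.append((index, "raise"))
        elif raised and utilization <= CLEAR_THRESHOLD_PCT:
            raised = False
            events.append((index, "clear"))
    return events
```

For example, readings of 50, 91, 85, 89, 80, 95 would raise at the second sample, stay raised through 85 and 89, clear at 80, and raise again at 95.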

3.25 Alarm Model 25: High Network Interface Tx Rate

MIB Reference

Trap Name

MIB

SANDVINE-MIB

Trap Name

svSysIfPortTxRateHighNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.13

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB::sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyPortRateTxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.2

svPortTopologyPortRateMaxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.3

svPortTopologyPortRateTxUtilization

1.3.6.1.4.1.11610.435.10470.1.40.1.5

svPortTopologyPortRateDescription

1.3.6.1.4.1.11610.435.10470.1.40.1.6

svPortTopologyPortRateTxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.10

svPortTopologyPortRateMaxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.11


The 32-bit varbinds svPortTopologyPortRateTxRate and svPortTopologyPortRateMaxRate are deprecated in favour of these new 64-bit varbinds:
• svPortTopologyPortRateTxRate64
• svPortTopologyPortRateMaxRate64

3.25.2 High Network Interface Tx Rate Cleared

This notification is sent if an interface port that previously sent out a rate high notification is now transmitting data at a rate that is less than or equal to 80% of its capacity. It includes the current rate, max rate, utilization percentage, and description of the associated interface port.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSysIfPortTxRateNormalNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.2.0.14

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB::sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svPortTopologyPortRateTxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.2

svPortTopologyPortRateMaxRate

1.3.6.1.4.1.11610.435.10470.1.40.1.3

svPortTopologyPortRateTxUtilization

1.3.6.1.4.1.11610.435.10470.1.40.1.5

svPortTopologyPortRateDescription

1.3.6.1.4.1.11610.435.10470.1.40.1.6

svPortTopologyPortRateTxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.10

svPortTopologyPortRateMaxRate64

1.3.6.1.4.1.11610.435.10470.1.40.1.11

This alarm is cleared when the utilization for a particular port drops to 80% or less of the maximum supported Tx rate. The 32-bit varbinds svPortTopologyPortRateTxRate and svPortTopologyPortRateMaxRate are deprecated in favour of these new 64-bit varbinds:
• svPortTopologyPortRateTxRate64
• svPortTopologyPortRateMaxRate64

Profile

Description

Frequency

8 seconds

Severity

Cleared

Condition

SANDVINE-MIB::svPortTopologyPortRateTxUtilization <= 80%

Description

A piece of hardware has been detected as faulty (consistent failure).


3.27.2 Hardware no longer faulted

A piece of faulty hardware was replaced and is no longer exhibiting failures. This notification indicates that the hardware is not reporting faults.

Profile

Description

MIB

SANDVINE-MIB

Trap Name

svEnvHardwareFaultNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.1.0.16

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

Hardware fault: Clear error

Profile

Description

Frequency

0 seconds (immediate)

Severity

Minor 

Condition (marginal)

SANDVINE-MIB::svDiagnosticsFailureNumFailures == 0

Description

A piece of faulty hardware was replaced and is no longer exhibiting failures.

3.27.3 Impact and Suggested Resolution, Alarm Model 27

For either alarm severity, you may experience unexpected behavior such as loss of link or dropped packets. Read the failure comment and look for other alarms such as interface module errors. If there is nothing obviously wrong with the unit, there is most likely a fault that needs to be addressed. Contact Sandvine Customer Support or its authorized partner immediately.

3.28 Alarm Model 28: Discarded Subscriber State

This alarm is raised if either the PTS or the SDE receives an SPB request to clear all current subscriber mappings.

Profile

Description

Severities

Minor 

Raise Notification

svSubMappingFlushedNotification

Clear Notification

svSubMappingFlushClearNotification

Triggers

subMappingFlushed

Unique Instance identifier 

svSubscriberMapManagementSubscriberClearEvents


3.28.1 Subscriber Mappings Cleared - Notification

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSubMappingFlushedNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.8.0.3

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svSubscriberMapManagementSubscriberClearEvents

1.3.6.1.4.1.11610.435.7742.1.3.1000.3

3.28.1.1 Subscriber Mappings Flushed

This notification is sent if the SPB sends a 'clear subscriber mappings' message to the PTS or SDE.

Profile

Description

Frequency

8 seconds

Severity

Minor 

Condition

DELTA (SANDVINE-MIB::svSubscriberMapManagementSubscriberClearEvents) > 1

3.28.2 Subscriber Mappings Cleared - Clear

This notification is sent 30 minutes after the most recent IP mapping flush. It indicates that subscribers are mapped on the SPB and PTS/SDE as normal.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSubMappingFlushClearNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.8.0.4

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svSubscriberMapManagementSubscriberClearEvents

1.3.6.1.4.1.11610.435.7742.1.3.1000.3

3.28.2.1 Clear Trap for Subscriber Mappings Flushed

This alarm is cleared when the SPB has not sent a “clear subscriber mappings” message to the PTS/SDE for 1800 seconds (30 minutes).


Profile

Description

Frequency

1800 seconds

Severity

Cleared

Condition

DELTA (SANDVINE-MIB::svSubscriberMapManagementSubscriberClearEvents) == 0

3.28.3 Subscriber Mappings on SPB and PTS/SDE Cleared - Minor Alarm

This notification is sent if the SPB sends a “clear subscriber mappings” message to the PTS/SDE. This causes the counter SANDVINE-MIB::svSubscriberMapManagementSubscriberClearEvents to increment by 1 and clears all subscriber mappings from the PTS/SDE.
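The raise and clear conditions for Alarm Model 28 are both expressed as a DELTA on svSubscriberMapManagementSubscriberClearEvents. A minimal Python sketch of that logic follows, assuming a monitoring script that samples the counter itself. The class and field names are hypothetical, the thresholds are copied from the conditions printed in this guide (DELTA > 1 to raise, no change for 1800 seconds to clear), and counter wrap is ignored.

```python
from dataclasses import dataclass
from typing import Optional

RAISE_DELTA = 1             # raise condition as printed in this guide: DELTA(...) > 1
CLEAR_QUIET_SECONDS = 1800  # clear condition: DELTA(...) == 0 over 1800 seconds


@dataclass
class FlushAlarm:
    """Hypothetical tracker for raised/cleared state from counter samples."""
    last_count: Optional[int] = None
    last_increase_at: Optional[float] = None
    raised: bool = False

    def sample(self, count: int, now: float) -> bool:
        """Feed one counter reading taken at time `now` (in seconds)."""
        if self.last_count is not None:
            delta = count - self.last_count
            if delta > 0:
                # Remember when the counter last moved.
                self.last_increase_at = now
            if delta > RAISE_DELTA:
                self.raised = True
        self.last_count = count
        quiet_long_enough = (
            self.last_increase_at is not None
            and now - self.last_increase_at >= CLEAR_QUIET_SECONDS
        )
        if self.raised and quiet_long_enough:
            self.raised = False
        return self.raised
```

Feeding it readings taken every 8 seconds mirrors the polling frequency listed for the raise condition. The sketch only models the arithmetic and is not the PTS implementation.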

3.28.4 Impact and Suggested Resolution, Alarm Model 28

Alarm Model 28 is typically the result of a mass calling event taking place on the SPB. Until subscribers are remapped, these are the impacts of alarm 28:

• There is no subscriber-aware SandScript managing subscriber traffic.
• Subscriber-based reporting shows under counting.
• Subscriber attributes are not available to the PTS or SDE.

Check the SPB IP Mapper monitor logs to verify the alarm condition. Log in to the SPB and check for error messages in these files:

• /var/log/sonicmq.log
• /var/log/jboss-server.log
• /var/log/svlog
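The log check above can be scripted. The following Python sketch assumes it runs on the SPB with read access to the three files named in this section. The function name and the match pattern (error, exception, fatal) are assumptions, since this guide does not describe the log message format.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Log files named in this guide.
SPB_LOGS = ["/var/log/sonicmq.log", "/var/log/jboss-server.log", "/var/log/svlog"]

# Assumption: lines of interest contain one of these words.
ERROR_PATTERN = re.compile(r"\b(error|exception|fatal)\b", re.IGNORECASE)


def find_error_lines(paths: Iterable[str], limit: int = 20) -> Dict[str, List[str]]:
    """Return up to `limit` of the most recent matching lines per readable file."""
    hits: Dict[str, List[str]] = {}
    for name in paths:
        path = Path(name)
        if not path.is_file():
            continue  # skip files that do not exist on this host
        with path.open(errors="replace") as handle:
            matches = [line.rstrip("\n") for line in handle if ERROR_PATTERN.search(line)]
        hits[name] = matches[-limit:]
    return hits


if __name__ == "__main__":
    for log_name, lines in find_error_lines(SPB_LOGS).items():
        print(f"{log_name}: {len(lines)} matching line(s)")
        for line in lines:
            print("   ", line)
```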

3.29 Alarm Model 29: Disabled Subscriber Lookups

This alarm is raised when either the PTS or SDE receives a disable-lookups notification from the SPB. A mass calling event on the SPB is typically the cause of Alarm Model 29.

Profile

Description

Severities

Minor 

Raise Notification

svSubLookupsDisabledNotification

Clear Notification

svSubLookupsEnabledNotification

Triggers

• subLookupsDisabled
• subLookupsEnabled

Unique Instance identifier

N/A


3.29.1 Disabled Subscriber Lookups

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSubLookupsDisabledNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.8.0.5

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

3.29.1.1 Disabled Subscriber Lookups

This notification is sent when the SPB disables subscriber lookups on the PTS and the SDE.

Profile

Description

Frequency

8 seconds

Severity

Minor 

Condition

SANDVINE-MIB::svSubscriberMapControlPerformLookups == false

3.29.2 Disabled Subscriber Lookups Cleared

This notification is sent when the PTS or SDE receives an enable-lookups notification from the SPB.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSubLookupsEnabledNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.8.0.6

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

3.29.2.1 Clear Trap for Subscriber Lookups Disabled

This notification is sent when the SPB enables subscriber lookups on the PTS and the SDE.

Profile

Description

Frequency

8 seconds

Severity

Minor 

Condition

SANDVINE-MIB::svSubscriberMapControlPerformLookups == true


3.29.3 Subscriber Lookups Disabled - Minor

This notification is sent when the SPB sends a cannot process lookups message to either the PTS or SDE. This causes the counter SANDVINE-MIB::svSubscriberMapManagementIpLookupsDisabledEvents to increment by 1 and prevents the PTS from performing any IP-based subscriber lookups. The PTS/SDE resumes lookups when the SPB sends a lookup processing enabled message, which causes SANDVINE-MIB::svSubscriberMapManagementIpLookupsEnabledEvents to increment and lowers the alarm.

3.29.4 Impact and Suggested Resolution, Alarm Model 29

When this alarm occurs, until the SPB sends a lookup processing enabled message, unmapped IP addresses in subscriber classes are not looked up from the SPB. For these addresses, this can cause:

• Subscriber-aware SandScript is not managing subscriber traffic.
• Subscriber-based reporting shows under counting.
• Subscriber attributes are not available to the PTS or the SDE.

A Mass Calling Event on the SPB is typically the cause of this alarm. Check the SPB IP Mapper Monitor logs: log on to the SPB and check these logs for error messages:

• /var/log/sonicmq.log
• /var/log/jboss-server.log
• /var/log/svlog

3.30 Alarm Model 30: Delayed Subscriber Mapping

This alarm is raised when there are significant delays in IP mapping on the PTS or SDE.

Profile

Description

Severities

Minor

Raise Notification

svSubMappingLateNotification

Clear Notification

svSubMappingOnTimeNotification

Triggers

subMappingLate

Unique Instance identifier 

svSubscriberMapStatsIpAddressesMappedLate

3.30.1 Delayed Subscriber Mapping

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSubMappingLateNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.8.0.1


Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svSubscriberMapConfigLateMappingTimeout

1.3.6.1.4.1.11610.435.7742.1.2.100

DISMAN-EXPRESSION-MIB::expValueCounter64Val

1.3.6.1.2.1.90.1.3.1.1.9."Sandvine"."subMappingOnTimeDeltaExp".0.0.0

DISMAN-EXPRESSION-MIB::expValueCounter64Val

1.3.6.1.2.1.90.1.3.1.1.9."Sandvine"."subMappingLateDeltaExp".0.0.0

3.30.1.1 Delayed Subscriber Mapping on the PTS/SDE

This notification is sent if, over the past 15-minute interval (900 seconds), 5% or more of the IP addresses mapped to subscribers took longer than 30 seconds to be mapped after the arrival of the first data packet for that IP address.

Profile

Description

Frequency

0 seconds (Immediate)

Severity

Minor

Condition

(expValueCounter64Val."Sandvine"."subMappingLateDeltaExp".0.0.0 * 100) / (expValueCounter64Val."Sandvine"."subMappingOnTimeDeltaExp".0.0.0 + expValueCounter64Val."Sandvine"."subMappingLateDeltaExp".0.0.0) >= 5
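The condition above is a percentage: late mappings times 100, divided by the sum of on-time and late mappings for the interval. A small Python sketch of the arithmetic follows. The function names are hypothetical. The 5 comes from the raise condition and the 1 from the clear text in section 3.30.2.1 (less than 1% late). Treating an interval with no mappings as 0% late is an assumption.

```python
RAISE_PERCENT = 5  # raise condition: late share >= 5
CLEAR_PERCENT = 1  # clear text in 3.30.2.1: late share below 1%


def late_percent(late: int, on_time: int) -> float:
    """Share of mappings that were late in the interval, in percent."""
    total = late + on_time
    if total == 0:
        return 0.0  # assumption: an idle interval counts as 0% late
    return late * 100 / total


def next_state(raised: bool, late: int, on_time: int) -> bool:
    """Apply the raise and clear thresholds to one interval's deltas."""
    pct = late_percent(late, on_time)
    if not raised and pct >= RAISE_PERCENT:
        return True
    if raised and pct < CLEAR_PERCENT:
        return False
    return raised  # between 1% and 5% the previous state is kept
```

For example, 5 late and 95 on-time mappings give exactly 5%, which raises the alarm. A later interval at 3% keeps it raised, because clearing requires the share to fall below 1%.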

3.30.2 Delayed Subscriber Mapping Cleared

This notification is sent if IP mapping on the PTS/SDE has been operating within reasonable delays within an interval of 900 seconds (15 minutes).

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svSubMappingOnTimeNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.8.0.2

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svSubscriberMapConfigLateMappingTimeout

1.3.6.1.4.1.11610.435.7742.1.2.100

DISMAN-EXPRESSION-MIB::expValueCounter64Val

1.3.6.1.2.1.90.1.3.1.1.9."Sandvine"."subMappingOnTimeDeltaExp".0.0.0

DISMAN-EXPRESSION-MIB::expValueCounter64Val

1.3.6.1.2.1.90.1.3.1.1.9."Sandvine"."subMappingLateDeltaExp".0.0.0


3.30.2.1 Clear Trap for Delayed Subscriber Mapping on PTS/SDE

This notification is sent when less than 1% of all subscriber mappings, within an interval of 900 seconds (15 minutes), occur late.

Profile

Description

Frequency

0 seconds (Immediate)

Severity

Cleared

Condition

(expValueCounter64Val."Sandvine"."subMappingLateDeltaExp".0.0.0 * 100) / (expValueCounter64Val."Sandvine"."subMappingOnTimeDeltaExp".0.0.0 + expValueCounter64Val."Sandvine"."subMappingLateDeltaExp".0.0.0)

snmptable -v2c -cpublic localhost SANDVINE-MIB::svDiameterStatsDiameterAlarmsTable
SNMP table: SANDVINE-MIB::svDiameterStatsDiameterAlarmsTable
svDiameterStatsDiameterAlarmsName svDiameterStatsDiameterAlarmsCount
Early age threshold reached for incoming message 0
Incoming message dropped 0
Age threshold exceeded for incoming message 0
Outgoing message dropped 0
Outgoing rate too high 0
Incoming rate too high 0
Incoming rate reached threshold 0
Outgoing rate reached threshold 0
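The snmptable listing above pairs each Diameter alarm instance name with its counter. A minimal Python sketch for turning that text into a name-to-count mapping follows. It assumes the two-column layout shown in this guide, with the count as the last whitespace-separated field. The function name is hypothetical, and output from other snmptable options may need different parsing.

```python
from typing import Dict

SAMPLE = """\
SNMP table: SANDVINE-MIB::svDiameterStatsDiameterAlarmsTable
svDiameterStatsDiameterAlarmsName svDiameterStatsDiameterAlarmsCount
Early age threshold reached for incoming message 0
Incoming message dropped 0
Age threshold exceeded for incoming message 0
Outgoing message dropped 0
Outgoing rate too high 0
Incoming rate too high 0
Incoming rate reached threshold 0
Outgoing rate reached threshold 0
"""


def parse_alarm_counts(text: str) -> Dict[str, int]:
    """Map each alarm instance name to its counter value."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        # Split off the last field; header lines do not end in a number.
        name, _, value = line.strip().rpartition(" ")
        if name and value.isdigit():
            counts[name.strip()] = int(value)
    return counts


if __name__ == "__main__":
    for name, count in parse_alarm_counts(SAMPLE).items():
        if count > 0:
            print(f"{name}: {count}")
```

Instances with a count above zero are the ones to look up in the impact and resolution tables for Alarm Model 38.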

Profile

Description

Severities

• Major
• Clear

Raise Notification

svDiameterErrorNotification

Clear Notification

svDiameterNoErrorNotification

Trigger

diameterErrorTrigger

3.37.1 Unknown Diameter Session ID Error - Raise

This notification is sent for one of the Diameter error instances, if the counter has increased at all.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterUnknownSessionNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.3


Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsDiameterAlarmsName

1.3.6.1.4.1.11610.435.15450.1.3.115.100.1.2

svDiameterStatsDiameterAlarmsCount

1.3.6.1.4.1.11610.435.15450.1.3.115.100.1.3

3.37.2 Diameter Error - Clear

This notification is sent when the counter stops increasing for an hour or more.

Note: It may take 1-2 hours for the alarm to clear.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterNoErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.4

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsDiameterAlarmsName

1.3.6.1.4.1.11610.435.15450.1.3.115.100.1.2

svDiameterStatsDiameterAlarmsCount

1.3.6.1.4.1.11610.435.15450.1.3.115.100.1.3

3.37.3 Impact and Suggested Resolution, Alarm Model 38

The impact is different for each alarm instance.

Condition Name

Possible Cause

Outgoing rate too high

Dropped outgoing Diameter messages.

Incoming rate too high

The incoming message receive rate is throttled resulting in dropped Diameter  messages.

Outgoing message dropped

Dropped outgoing Diameter messages.

Incoming message dropped

Dropped incoming Diameter messages.

Early age threshold reached for incoming message

Age of the incoming messages reached the configured early threshold. Possible cause is that the local Diameter peer is busy.

Age threshold exceeded for incoming message

Incoming messages are dropped.


Incoming rate reached threshold

Dropped incoming Diameter messages.

Outgoing rate reached threshold

Dropped outgoing Diameter messages.

The suggested resolutions for each instance of the alarm are:

Condition Name

Resolution

Outgoing rate too high

Adjust either outgoing Diameter message rate or configured limit for outgoing message rate.

Incoming rate too high

Adjust either incoming Diameter message rate or configured limit for incoming message rate.

Outgoing message dropped

An increase in OutgoingDroppedCreationFailures is responsible for the Outgoing message dropped alarm. Run the show service diameter messages detail CLI command. Note: Ensure the diameter max-message-size is configured with the recommended value.

Incoming message dropped

Run the show service diameter messages detail CLI command.

Early age threshold reached for incoming message

Reduce the load on the local Diameter node or adjust the configured value of early age threshold for incoming messages.

Age threshold exceeded for incoming message

Reduce the load on the local Diameter node or adjust the configured value of maximum age threshold for incoming messages.

Incoming rate reached threshold

Try to adjust either incoming Diameter message rate or configured threshold limit for incoming message rate.

Outgoing rate reached threshold

Try to adjust either outgoing Diameter message rate or configured threshold limit for outgoing message rate.

3.38 Alarm Model 39: Diameter Server Outgoing Message Age Exceeded Maximum Threshold

This alarm is raised when the outgoing message age on a Diameter server exceeds the maximum threshold. It can occur on any of the Diameter peers listed using the show service diameter peer server command.

Profile

Description

Severities

• Major
• Clear

Raise Notification

svDiameterServerMsgAgeNotification

Clear Notification

svDiameterServerMsgAgeNoErrorNotification


3.38.1 Diameter Server Outgoing Message Age Exceeded Maximum Threshold - Raise

This notification is sent when the age of an outgoing message reaches the maximum threshold.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterServerMsgAgeNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.15

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsServerPeerLocalIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.2

svDiameterStatsServerPeerOutgoingExceededMaxAgeThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.29

3.38.2 Diameter Server Outgoing Message Age Exceeded Maximum Threshold - Clear

This notification is sent when no outgoing message reaches the maximum threshold within one minute.

Note: It can take several minutes to clear this alarm.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterServerMsgAgeNoErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.16

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsServerPeerLocalIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.2

svDiameterStatsServerPeerOutgoingExceededMaxAgeThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.29

3.38.3 Impact and Suggested Resolution, Alarm Model 39

Outgoing messages are dropped if this alarm condition persists.

1. Run the show service diameter messages detail CLI command to check the Diameter message drops.
2. Determine if the local Diameter peer is overloaded.
3. Adjust the configured server's outgoing messages' maximum age threshold value.

3.39 Alarm Model 40: Diameter Peer Failed Back Over

This alarm is raised when a Diameter peer fails over to the secondary peer. Run the show service diameter peer client CLI command to identify the Diameter peers on which the alarm can occur.

Profile

Description

Severities

• Warning
• Clear

Raise Notification

svDiameterPeerFailedOverNotification

Clear Notification

svDiameterPeerFailedBackNotification

3.39.1 Diameter Peer Failed Back Over - Raise

This notification is sent when the Diameter peer fails over to the secondary peer.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterPeerFailedOverNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.5

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsClientPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.7

svDiameterStatsClientPeerFailedOver 

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.14

3.39.2 Diameter Peer Failed Back Over - Clear

This notification is raised when the secondary peer fails back to the primary peer.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterPeerFailedBackNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.6

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsClientPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.7

svDiameterStatsClientPeerFailedOver 

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.14

3.39.3 Impact and Suggested Resolution, Alarm Model 40

If left to persist, messages destined to the primary peer are sent to its secondary Diameter peer.

1. Run these CLI commands to check for potential errors and discards:
   show service diameter messages detail
   show service diameter messages rate
2. Check:
   a. That the remote primary Diameter peer is operational and reachable.
   b. That the hostname of the remote primary Diameter peer is correct in diam_peer_config.xml.
   c. Whether the Diameter local peer has sent any invalid message to the remote Diameter peer.
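Step 2a above asks whether the remote primary peer is reachable. One rough way to test this from a management host is a plain TCP connection attempt, sketched below in Python. The function name is hypothetical. Port 3868 is the IANA-registered Diameter port, but the port your peers use, and whether they use TCP rather than SCTP, are assumptions to confirm against your own configuration. A successful connect shows only that something is listening, not that the Diameter peer is healthy.

```python
import socket

DIAMETER_PORT = 3868  # IANA-registered Diameter port; the deployed port may differ


def peer_reachable(host: str, port: int = DIAMETER_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `peer_reachable("peer.example.net")` with the hostname taken from your peer configuration (the name here is a placeholder) also exercises name resolution, which helps when checking step 2b.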

3.40 Alarm Model 41: Diameter Server Connection with Client Peer Lost

This alarm is raised when a server loses a connection with a client peer. It can occur on any of the configured Diameter peers that appear when you run the show service diameter peer server CLI command.

Profile

Description

Severities

• Warning
• Clear

Raise Notification

svDiameterServerErrorNotification

Clear Notification

svDiameterServerNoErrorNotification


3.40.1 Diameter Server Connection with Client Peer Lost - Raise

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterServerErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.7

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsServerPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.27

svDiameterStatsServerPeerChildrenDestroyed

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.8

3.40.2 Diameter Server Connection with Client Peer Lost - Clear

This notification is sent when a server has not lost a connection with a client peer for an hour or more.

Note: The actual time to clear may range between one and two hours.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterServerNoErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.8

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsServerPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.27

svDiameterStatsServerPeerChildrenDestroyed

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.8

3.40.3 Impact and Suggested Resolution, Alarm Model 41

If left to persist, messages destined to the remote Diameter client peer may be dropped.

1. Run the show service diameter peer server CLI command to review the Diameter peer server details.
2. Check if the:
   a. Remote Diameter peer is operational and reachable.
   b. Diameter local peer has sent any invalid message to the remote Diameter peer.

3.41 Alarm Model 42: Diameter Client Outgoing Message Age Reached Early Threshold

This alarm is raised when the age of a client's outgoing message reaches the early threshold. Run the show service diameter peer client CLI command to see the list of Diameter peers that this alarm can occur on.

Profile

Description

Severities

• Minor
• Clear

Raise Notification

svDiameterClientEarlyAgeNotification

Clear Notification

svDiameterClientEarlyAgeNoErrorNotification

3.41.1 Diameter Client Outgoing Message Age Reached Early Threshold - Raise

This notification is sent when the age of a client's outgoing message reaches the early threshold.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterClientEarlyAgeNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.9

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsClientPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.7

svDiameterStatsClientPeerOutgoingExceededMaxAgeEarlyThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.44

3.41.2 Diameter Client Outgoing Message Age Reached Early Threshold - Clear

This notification is sent when an outgoing message does not reach the early threshold within one minute.

Note: The actual time to clear may range between one and two minutes.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterClientEarlyAgeNoErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.10

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsClientPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.7

svDiameterStatsClientPeerOutgoingExceededMaxAgeEarlyThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.44

3.41.3 Impact and Suggested Resolution, Alarm Model 42

Outgoing messages are dropped if this alarm is left to persist.

1. Run the show service diameter messages detail CLI command to view details.
2. Check whether:
   a. The local Diameter peer is overloaded.
   b. You need to adjust the configured outgoing message early age threshold value.

3.42 Alarm Model 43: Diameter Client Outgoing Message Age Exceeded Maximum Threshold

This alarm is raised when the age of a client’s outgoing message reaches the maximum threshold. This alarm can occur on any Diameter peer listed after running the show service diameter peer client CLI command.

Profile

Description

Severities

• Major
• Clear

Raise Notification

svDiameterClientMsgAgeNotification

Clear Notification

svDiameterClientMsgAgeNoErrorNotification


3.42.1 Diameter Client Outgoing Message Age Exceeded Maximum Threshold - Raise

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterClientMsgAgeNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.11

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsClientPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.7

svDiameterStatsClientPeerOutgoingExceededMaxAgeThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.45

3.42.2 Diameter Client Outgoing Message Age Exceeded Maximum Threshold - Clear

This notification is sent when outgoing messages do not reach the maximum threshold within 1 minute.

Note: The actual time to clear may range between one and two minutes.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterClientMsgAgeNoErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.12

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsClientPeerRemoteIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.7

svDiameterStatsClientPeerOutgoingExceededMaxAgeThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.100.1.45

3.42.3 Impact and Suggested Resolution, Alarm Model 43

Outgoing messages are dropped if this alarm is left to persist.

1. Run the show service diameter messages detail CLI command to view details.
2. Check whether:
   a. The local Diameter peer is overloaded.
   b. You need to adjust the configured client's outgoing messages maximum age threshold value.

3.43 Alarm Model 44: Diameter Server Outgoing Message Age Reached Early Threshold

This alarm is raised when the age of a server's outgoing message reaches the early threshold. The Diameter server experiences a lag in attempting to send messages to a client. This indicates that the client is experiencing issues or taking too long to process messages. Run the show service diameter peer server CLI command to identify the Diameter peers that this alarm occurs on.

Profile

Description

Severities

• Warning
• Clear

Raise Notification

svDiameterServerEarlyAgeNotification

Clear Notification

svDiameterServerEarlyAgeNoErrorNotification

3.43.1 Diameter Server Outgoing Message Age Reached Early Threshold - Warning

This notification is sent when the age of a server's outgoing message reaches the early threshold.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterServerEarlyAgeNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.13

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsServerPeerLocalIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.2

svDiameterStatsServerPeerOutgoingExceededMaxAgeEarlyThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.28


3.43.2 Diameter Server Outgoing Message Age Reached Early Threshold - Clear

This notification is sent when none of the outgoing messages reach the early threshold within one minute.

Note: The actual time to clear may range between one and two minutes.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterServerEarlyAgeNoErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.14

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsServerPeerLocalIdentity

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.2

svDiameterStatsServerPeerOutgoingExceededMaxAgeEarlyThreshold

1.3.6.1.4.1.11610.435.15450.1.3.109.101.1.28
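Alarm Models 39 through 44 use consecutive trap OIDs under 1.3.6.1.4.1.11610.6799.3.9.0, with each raise notification paired to a clear notification. A trap receiver could classify incoming Diameter traps with a lookup table like the Python sketch below. The trap names, OIDs, and alarm model numbers are copied from this guide. The `TrapInfo` type and `classify` function are hypothetical helpers, not part of any Sandvine tool.

```python
from typing import NamedTuple, Optional

TRAP_PREFIX = "1.3.6.1.4.1.11610.6799.3.9.0."


class TrapInfo(NamedTuple):
    name: str
    alarm_model: int
    action: str  # "raise" or "clear"


# Keyed by the last sub-identifier of the trap OID.
DIAMETER_TRAPS = {
    5: TrapInfo("svDiameterPeerFailedOverNotification", 40, "raise"),
    6: TrapInfo("svDiameterPeerFailedBackNotification", 40, "clear"),
    7: TrapInfo("svDiameterServerErrorNotification", 41, "raise"),
    8: TrapInfo("svDiameterServerNoErrorNotification", 41, "clear"),
    9: TrapInfo("svDiameterClientEarlyAgeNotification", 42, "raise"),
    10: TrapInfo("svDiameterClientEarlyAgeNoErrorNotification", 42, "clear"),
    11: TrapInfo("svDiameterClientMsgAgeNotification", 43, "raise"),
    12: TrapInfo("svDiameterClientMsgAgeNoErrorNotification", 43, "clear"),
    13: TrapInfo("svDiameterServerEarlyAgeNotification", 44, "raise"),
    14: TrapInfo("svDiameterServerEarlyAgeNoErrorNotification", 44, "clear"),
    15: TrapInfo("svDiameterServerMsgAgeNotification", 39, "raise"),
    16: TrapInfo("svDiameterServerMsgAgeNoErrorNotification", 39, "clear"),
}


def classify(trap_oid: str) -> Optional[TrapInfo]:
    """Look up a Diameter trap by its numeric OID; None if not in the table."""
    oid = trap_oid.lstrip(".")
    if not oid.startswith(TRAP_PREFIX):
        return None
    suffix = oid[len(TRAP_PREFIX):]
    if not suffix.isdigit():
        return None
    return DIAMETER_TRAPS.get(int(suffix))
```

For example, `classify("1.3.6.1.4.1.11610.6799.3.9.0.15")` returns the entry for svDiameterServerMsgAgeNotification, the raise notification for Alarm Model 39.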

3.43.3 Impact and Suggested Resolution, Alarm Model 44

Outgoing messages are dropped if this alarm is left to persist.

1. Run the show service diameter messages detail CLI command to check for potential errors and discards.
2. Check whether:
   a. The local Diameter peer is overloaded.
   b. You need to adjust the configured outgoing message early age threshold value.

3.44 Alarm Model 50: Unknown Diameter Session-ID

This alarm is raised when there is an unsolicited request from the OCS in the Gy interface or from the PCRF in the Gx interface to the PTS, with a Diameter session ID that is unknown to the PTS.

Profile

Description

Severities

Minor, Clear 

Raise Notification

svDiameterUnknownSessionNotification

Clear Notification

svDiameterNoUnknownSessionNotification

Trigger 

GxUnknownSessionsTrigger and GyUnknownSessionsTrigger 

This alarm has these instances:
• Gy: when the varbind argument svMeasurementsIndex is 17899.
• Gx: when the varbind argument svMeasurementsIndex is 17640.

3.44.1 Unknown Diameter Session ID Error - Minor

This notification is sent for one of the Diameter error instances if the counter has increased at all. This notification indicates that an unsolicited request was sent from the OCS in the Gy interface, or PCRF in the Gx interface, to the PTS, with a session ID that is unknown to the receiving PTS. These request types are available in either interface:
• RAR: Re-Authorization Requests
• ASR: Abort Session Requests

Since unknown session IDs occur during regular operation, the alarm occurs only when more than 10,000 invalid IDs are detected within a 10 minute interval. The alarm is cleared when fewer than 2,000 invalid IDs are detected within 10 minutes.

Invalid session IDs occur at these times:
• RAR/ASR: just before logout. These requests should not affect operations.
• After PTS module rebalancing. The new module re-initiates the session, so any updates the OCS/PCRF sent to the PTS are sent to the new one when the new session is established.

If there is a message routing issue or a severe problem with the OCS/PCRF, it is expected that all unsolicited requests will have invalid session IDs.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterUnknownSessionNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.0.3

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svDiameterStatsDiameterAlarmsName

1.3.6.1.4.1.11610.435.15450.1.3.115.100.1.2

svDiameterStatsDiameterAlarmsCount

1.3.6.1.4.1.11610.435.15450.1.3.115.100.1.3
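The raise and clear thresholds described in section 3.44.1 (more than 10,000 invalid IDs in a 10 minute interval to raise, fewer than 2,000 to clear) form a hysteresis band. The following Python sketch illustrates that behavior only. The class and function names are hypothetical and do not correspond to any PTS component; the sketch assumes one invalid-ID count is available per 10 minute window.

```python
from dataclasses import dataclass

# Thresholds taken from section 3.44.1 of this guide.
RAISE_THRESHOLD = 10_000  # raise when MORE than this many invalid IDs are seen
CLEAR_THRESHOLD = 2_000   # clear when FEWER than this many invalid IDs are seen


@dataclass
class UnknownSessionAlarm:
    """Hypothetical model of the Alarm Model 50 raise/clear hysteresis."""

    raised: bool = False

    def evaluate(self, invalid_ids_in_window: int) -> str:
        """Return 'raise', 'clear', or 'no-change' for one 10 minute window."""
        if not self.raised and invalid_ids_in_window > RAISE_THRESHOLD:
            self.raised = True
            return "raise"
        if self.raised and invalid_ids_in_window < CLEAR_THRESHOLD:
            self.raised = False
            return "clear"
        return "no-change"


alarm = UnknownSessionAlarm()
# Four consecutive windows: quiet, burst, partial recovery, full recovery.
results = [alarm.evaluate(count) for count in (500, 12_000, 5_000, 1_500)]
print(results)
```

Between the two thresholds the alarm keeps its previous state, which is why the 5,000 window in the example neither raises nor clears it.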

3.44.2 Impact and Suggested Resolution, Alarm Model 50

If left to persist, unsolicited requests from the OCS/PCRF are not deployed on the PTS. Service continues but incorrect policy could deploy to subscribers:
1. Verify the configuration of routing Diameter messages on the OCS/PCRF/Diameter Proxy/Network.
2. Correlate with alarms that can indicate information loss on the PTS. If so, take measures to verify re-initiation of sessions over the Diameter interface with OCS/PCRF.

3.45 Alarm Model 51: Diameter Interface Error

This alarm is raised when an error condition is detected in the PCRF on the Gx interface or the OCS on the Gy interface.

Profile

Description

Severities

Major, Clear 

Raise Notification

svDiameterInterfaceErrorNotification

Clear Notification

svDiameterNoInterfaceErrorNotification

Triggers

GxInterfaceErrorTrigger, GyInterfaceErrorTrigger, GyEventChargingInterfaceErrorTrigger

Unique Instance Identifier

• Gx: 18075
• Gy: 17900
• Gy Event Charging: 22710

Applicability

Usage Management 3.00, Usage Management 4.20

This alarm has these instances:
• Gx: when the varbind argument svMeasurementsIndex is 18075.
• Gy: when the varbind argument svMeasurementsIndex is 17900.

3.45.1 Raise and clear notifications

Raising MIB Reference

Description

MIB

SANDVINE-DIAMETER-IF-MIB

Trap Name

svDiameterInterfaceErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.1.3

Clearing MIB Reference

Description

MIB

SANDVINE-DIAMETER-IF-MIB

Trap Name

svDiameterNoInterfaceErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.1.4

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svMeasurementsIndex

1.3.6.1.4.1.11610.435.12757.1.10.1.100

svMeasurementsValue

1.3.6.1.4.1.11610.435.12757.1.10.1.3

Varbind values if the alarm is raised from the Policy Enforcement package:

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.4

svMeasurementsIndex

1.3.6.1.4.1.11610.435.12757.1.10.1.1.18075

svMeasurementsValue

1.3.6.1.4.1.11610.435.12757.1.10.1.3.18075

Varbind values if the alarm is raised from the Online Charging package:

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.4

svMeasurementsIndex

1.3.6.1.4.1.11610.435.12757.1.10.1.1.17900

svMeasurementsValue

1.3.6.1.4.1.11610.435.12757.1.10.1.3.17900
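The Policy Enforcement and Online Charging varbind tables above append the instance number to the svMeasurementsIndex and svMeasurementsValue OIDs. A minimal Python sketch, assuming a trap receiver hands you each varbind OID as a dotted string, shows how that trailing number could be mapped back to the interface instance. The function name and lookup table are illustrative and not part of any Sandvine tool; the instance numbers are the ones listed under Unique Instance Identifier for Alarm Model 51.

```python
from typing import Optional

# OID prefixes as they appear in the varbind tables of section 3.45.1.
MEASUREMENTS_INDEX_PREFIX = "1.3.6.1.4.1.11610.435.12757.1.10.1.1."
MEASUREMENTS_VALUE_PREFIX = "1.3.6.1.4.1.11610.435.12757.1.10.1.3."

# Unique Instance Identifiers listed for Alarm Model 51.
ALARM_MODEL_51_INSTANCES = {
    18075: "Gx",
    17900: "Gy",
    22710: "Gy Event Charging",
}


def instance_from_varbind_oid(oid: str) -> Optional[str]:
    """Return the interface instance named by a measurements varbind OID.

    Returns None when the OID is not a measurements varbind or when the
    trailing instance number is not one of the Alarm Model 51 instances.
    """
    oid = oid.lstrip(".")  # some SNMP tools print a leading dot
    for prefix in (MEASUREMENTS_INDEX_PREFIX, MEASUREMENTS_VALUE_PREFIX):
        if oid.startswith(prefix):
            suffix = oid[len(prefix):]
            if suffix.isdigit():
                return ALARM_MODEL_51_INSTANCES.get(int(suffix))
    return None


print(instance_from_varbind_oid("1.3.6.1.4.1.11610.435.12757.1.10.1.1.18075"))
print(instance_from_varbind_oid("1.3.6.1.4.1.11610.435.12757.1.10.1.3.17900"))
```

The same approach would apply to other alarm models that use svMeasurementsIndex, with the lookup table replaced by that model's instance numbers.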

3.45.2 Diameter Interface Error - Major

This notification is sent when an error condition is detected in the PCRF on the Gx interface or the OCS on the Gy interface. It is cleared when no problems occur for 10 minutes.

Gx: possible causes

• Mismatched session IDs between requests and answers.
• PCRF trying to set an unknown trigger.
• PCRF trying to set conflicting triggers.
• Sessions are deleted without terminate messages due to a severe protocol issue.
• Conflicting add/remove AVPs while installing rules.
• Conflicting rule activation and deactivation times.
• Missing rule names.
• Missing revalidation times.
• Sent the same rule name in both the Charging-Rule-Install and the Charging-Rule-Remove AVPs in CCA.
• Installed a rule where the deactivation time is less than the activation time in CCA-I/U.
• Sent the wrong trigger name in CCA-I/U.
• Sent the Charging Rule Install AVP without a rule name or a definition in CCA-I/U.
• Sent result code UNRECOVERABLE_ERRORS (for example: 3001, 3003, 3005, and so on) in the result code AVP in CCA-I/U.
• Sent the GX_TRIGGER_NO_EVENT_TRIGGERS with other triggers in CCA-I/U.
• Sent the revalidation timeout trigger without a revalidation time AVP in the CCA-I/U.
• Installed the session level key in CCA-I/U and then installed a new session level key without disabling the previous one in CCA-U.
• Sent a session level key with the wrong grant type value. This means that the Tx grant, Rx grant, and Total grant do not have a null value.
• Sent a different GSU type without disabling the existing session level monitoring key in the CCA-U/RAR.
• Sent the rule level key in CCA-I/U and installed the session level key with the same name (rule level key name) in CCA-U.
• Configured two primary PCRF connections with the PTS, installed more than 8 monitoring keys along with a traffic classifier, and sent traffic for all of them. This can also indicate that the PCRF sent a CCA message with an incorrect session ID.
• Disabled a session level key which does not exist through CCA-U.
• Sent more than 30 monitoring keys in CCA-I/U or more than 30 PCC rules in CCA-I/U.
• More than one session-level key install was received in a single Usage Monitoring Information, CCA, or RAR.
• Received a Usage Monitoring support disable with other grant AVPs in the Usage Monitoring Information of CCA or RAR.
• The Usage Monitoring Report Required was received but the Usage report trigger was not received for the session.
• The Usage Monitoring Report Required was received along with other grant AVPs.
• The Usage Monitoring Information was received in CCA and RAR but the monitoring key was absent in the Usage Monitoring Information AVP.
• The new monitoring key was received but no granting was present in the Usage Monitoring Information.
• Usage monitoring report is required but no monitoring key is installed.

Gy: possible causes

• Mismatched session IDs between requests and answers.
• Mismatched RG/Service IDs between requests and answers.
• OCS is trying to set an unknown trigger.
• Sessions that are deleted without terminate messages due to a severe protocol issue.
• Quota grants that do not match the configuration of the service.

3.45.3 Impact and Suggested Resolutions, Alarm Model 51

If left to persist, the applications may not function. This alarm indicates a severe problem in the interface between the PTS and OCS/PCRF:
1. Create traffic captures of Diameter traffic and analyze them.
2. Analyze the output of the CLI command show usage-management policy-enforcement error-events. If any of the counters has a non-zero value, this may indicate the source of the error. To get more detailed information, such as the time at which the error occurred or the subscriber IP, run the CLI command show usage-management policy-enforcement error-events log.

3.46 Alarm Model 52: Diameter Missing Subscriber Information

This alarm is raised when a session is not established, for a subscriber with PCRF for Gx or OCS for Gy, due to missing mandatory IP session or subscriber information. Some subscriber information required to initiate a Diameter session is missing. This indicates a configuration mismatch between the SPB, PTS, and RADIUS server.

Profile

Description

Severities

• Major
• Clear

Raise Notification

svDiameterMissingSubscriberInfoNotification

Clear Notification

svDiameterNoMissingSubscriberInfoNotification

Triggers

• GxMissingSubscriberInfoTrigger
• GyMissingSubscriberInfoTrigger

Unique Instance Identifier

svMeasurementsIndex

3.46.1 Missing subscriber information - Minor

This notification is sent when a session is not established, for a subscriber with PCRF for Gx or OCS for Gy, due to missing mandatory IP session or subscriber information.

MIB Reference

Description

MIB

SANDVINE-DIAMETER-IF-MIB

Trap Name

svDiameterMissingSubscriberInfoNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.1.5

Varbind values if the alarm is raised from the Policy Enforcement package:

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svMeasurementsIndex

1.3.6.1.4.1.11610.435.12757.1.10.1.1.17638

svMeasurementsValue

1.3.6.1.4.1.11610.435.12757.1.10.1.3.17638

Varbind values if the alarm is raised from the Online Charging package:

Varbind Name

Varbind OID

svClusterConfigName

1.3.6.1.4.1.11610.435.5213.1.2.1

SNMPv2-MIB:sysName

1.3.6.1.2.1.1.5

svSeverity

1.3.6.1.4.1.11610.6799.1.10

svMeasurementsIndex

1.3.6.1.4.1.11610.435.12757.1.10.1.1.17903

svMeasurementsValue

1.3.6.1.4.1.11610.435.12757.1.10.1.3.17903

3.46.2 Missing subscriber information - Clear

This notification is sent when 10 minutes pass without detecting problems with subscriber information.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svDiameterNoMissingSubscriberInfoNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.1.6

3.46.3 Impact and Suggested Resolutions, Alarm Model 52

Sessions may not start if the source of the alarm is not addressed.
1. Create traffic captures of RADIUS (or any other mapping technology used in the deployment) traffic and analyze them.
2. Verify that these package parameters are configured with information that exists in those traffic captures:
   • Gx: subscription_id_type, subscription_id_data, ip_can_type
   • Gy: subscription_id_type, subscription_id_data

3.47 Alarm Model 53: Unknown Diameter Service

This alarm is raised when an unknown service is detected in the rating group, service ID or PCC rule name for PCRF in the Gx interface or OCS in the Gy interface. Alarm Model 53 is also raised if Record Generator, or usage server, receives an unknown Service ID in the Rf interface.
• Gx: PCRF tries to install or remove a Policy and Charging Control (PCC) rule, or a base rule, that was not pre-provisioned on the PTS.
• GxGy: PCRF tries to install or remove a PCC rule containing a service (Rating Group and/or Service ID) that was not pre-provisioned on the PTS.
• Gy: OCS sends services (Rating Group and/or Service ID) that are not pre-provisioned on the PTS.

Profile

Description

Severities

• Major
• Clear

Raise Notification

svDiameterUnknownServiceNotification

Clear Notification

svDiameterNoUnknownServiceNotification

Triggers

• GxUnknownRuleNameTrigger
• GyUnknownServiceTrigger
• GxGyUnknownServiceTrigger

Unique Instance Identifier

• Gx: 17639
• Gy: 17904
• GxGy: 17637

Applicability

• Usage Management 3.00 and 4.20+: Gx, Gy, GxGy, GyEventCharging
• Usage Management 4.40 and onward: Rf

Affected Platforms:

PTS: GX, GY, and GxGy

3.47.1 Unknown Diameter Service - Major

MIB Reference

Description

MIB

SANDVINE-DIAMETER-IF-MIB

Trap Name

svDiameterUnknownServiceNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.1.7

Varbind Name

Varbind OID

svMeasurementsIndex

1.3.6.1.4.1.11610.435.12757.1.10.1.100

svMeasurementsValue

1.3.6.1.4.1.11610.435.12757.1.10.1.3

Profile

Description

Frequency

600 seconds

Severity

Major 

Condition

 At least one invalid service is raised within 10 minutes.

Description

This notification is received when there is any invalid service within a 10 minute period.

GxUnknownRuleNameTrigger: Major

Profile

Description

Frequency

600 seconds

Severity

Major 

Condition

DELTA(svMeasurementsValue.17639) > 0

Description

An unknown PCC Rule Name was received.

GyUnknownServiceTrigger: Major

Profile

Description

Frequency

600 seconds

Severity

Major 

Condition

DELTA(svMeasurementsValue.17904) > 0

Description

An unknown service (Rating-Group and/or Service-Id) was received.

GxGyUnknownServiceTrigger: Major

Profile

Description

Frequency

600 seconds

Severity

Major 

Condition

DELTA(svMeasurementsValue.17637) > 0

Description

An unknown service (Rating-Group and/or Service-Id) was received.

3.47.2 Unknown Diameter Service - Clear

MIB Reference

Description

MIB

SANDVINE-DIAMETER-IF-MIB

Trap Name

svDiameterNoUnknownServiceNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.9.1.8

Varbind Name

Varbind OID

svMeasurementsIndex

1.3.6.1.4.1.11610.435.12757.1.10.1.100

svMeasurementsValue

1.3.6.1.4.1.11610.435.12757.1.10.1.3

Profile

Description

Frequency

600 seconds

Severity

Clear 

Condition

There is no invalid service within 10 minutes.

Description

A clear notification is sent when there is no invalid service within 10 minutes.

GxUnknownRuleNameTrigger: Clear

Profile

Description

Frequency

600 seconds

Severity

Clear 

Condition

DELTA(svMeasurementsValue.17639) = 0

Description

No unknown PCC Rule Name was received over the sampling period.

GyUnknownServiceTrigger: Clear

Profile

Description

Frequency

600 seconds

Severity

Clear 

Condition

DELTA(svMeasurementsValue.17904) = 0

Description

No unknown service (Rating-Group and/or Service-Id) was received over the sampling period.

GxGyUnknownServiceTrigger: Clear

Profile

Description

Frequency

600 seconds

Severity

Clear 

Condition

DELTA(svMeasurementsValue.17637) = 0

Description

No unknown service (Rating-Group and/or Service-Id) was received over the sampling period.
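The trigger conditions in sections 3.47.1 and 3.47.2 compare the change in a counter (DELTA) over the 600 second sampling period against zero. The following Python sketch shows one way to read those conditions. It assumes DELTA means the current sample minus the previous sample, which this guide does not state explicitly, and the function name is hypothetical.

```python
from typing import List, Optional

SAMPLING_PERIOD_SECONDS = 600  # "Frequency" value from the trigger profiles


def evaluate_unknown_service_samples(samples: List[int]) -> List[Optional[str]]:
    """Evaluate consecutive counter samples taken one sampling period apart.

    For each pair of neighbouring samples, returns:
      "Major" when DELTA > 0 (at least one unknown service was received),
      "Clear" when DELTA = 0 (none was received during the period),
      None    when the counter went backwards, for example after a restart;
              the guide defines no condition for that case.
    """
    severities: List[Optional[str]] = []
    for previous, current in zip(samples, samples[1:]):
        delta = current - previous  # assumed meaning of DELTA(...)
        if delta > 0:
            severities.append("Major")
        elif delta == 0:
            severities.append("Clear")
        else:
            severities.append(None)
    return severities


# Counter values for svMeasurementsValue.17639, one reading every 600 seconds.
readings = [40, 40, 43, 43, 43]
print(evaluate_unknown_service_samples(readings))
```

Because the clear condition is evaluated once per sampling period, a clear can follow the last unknown service by up to one full period.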

3.47.3 Impact and Suggested Resolutions, Alarm Model 53

If the measurement index is for Gx, Gy, GxGy or GyEventCharging, verify that the OCS/PCRF and PTS service configurations match. If the measurement index is for RfSde or RfPts, verify that the PTS and SDE service configurations match.

These files define alarms, triggers, and events for GxGy Unknown Service Received:
/usr/local/sandvine/etc/alarms/diameter_interface.alarm.conf
/usr/local/sandvine/etc/events/events.diameter_interface.conf
/usr/local/share/snmp/mibs/SANDVINE-DIAMETER-IF-MIB.txt

Impact

Alarm Model 53 indicates that a configuration error exists between PTS and OCS/PCRF. If the alarm's source is not addressed, the application will not function properly. For example:

Configuration errors in...

Can result in...

Gx

Incorrect Traffic flow enforcement.

Gy

Over/Under charging of customers.

Resolving Alarm Model 53

Modify the service configuration in both the PTS and SDE, to make them consistent for both platforms. See the Quota Manager Product Guide for details of how to adjust the service configuration on the SDE.

Verifying Resolution

This alarm clears 600 seconds after the configuration error is resolved for each Rf instance. Run CLI commands in both the PTS and SDE to verify that the configured services match.

3.48 Alarm Model 59: Unavailable BGP Master

This alarm is raised when BGP is enabled, but there is no active BGP master in a cluster. This can occur if no PTS elements have peer routers configured, or if all master-eligible BGP daemons in a cluster fail.

Profile

Description

Severities

• Major
• Clear

Raise Notification

svBgpdMasterFailErrorNotification

Clear Notification

svBgpdMasterFailClearNotification

Triggers

• bgpdNoMasterError
• bgpdMasterActive

3.48.1 Unavailable BGP Master - Major

This notification is sent if BGP is enabled, but there is no active BGP master in the cluster.

Note: This alarm can flap when there is a single PTS with a local-id configured and one or more peer routers. When the element cannot establish a connection with the peer router, this alarm might appear repeatedly. To prevent this, either remove or reconfigure the peer router.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svBgpdMasterFailErrorNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.11.0.1

Varbind Name

Varbind OID

svServiceComponentAdminStatus

1.3.6.1.4.1.11610.435.11281.1.11.1.5.11326.7315

svSvbgpdStatsBgpdMasterStatus

1.3.6.1.4.1.11610.435.7272.1.3.10

Profile

Description

Frequency

Immediate

Severity

Major 

Condition

svSvbgpdStatsBgpdMasterStatus >= 2

Where:
• svServiceComponentAdminStatus: Indicates the status of the component. Valid values are:
   • up: Launches the component across the entire system.
   • down: Stops the component across the entire system.
• svSvbgpdStatsBgpdMasterStatus: Indicates whether an active master BGP daemon is present. Valid values are:
   • active(1): Indicates that a master BGP daemon process is operating normally.
   • down(2): Indicates an error condition where no BGP daemon process is acting as the master.
   • initial(3): Indicates a system startup condition where no master BGP daemon process has yet been set as the master.

3.48.2 Unavailable BGP Master - Clear

This notification is sent if there was a BGP master failure, and a BGP master becomes active, or when SVBGPD is administratively disabled.

MIB Reference

Description

MIB

SANDVINE-MIB

Trap Name

svBgpdMasterFailClearNotification

Trap OID

1.3.6.1.4.1.11610.6799.3.11.0.2

Varbind Name

Varbind OID

svServiceComponentAdminStatus

1.3.6.1.4.1.11610.435.11281.1.11.1.5.11326.7315

svSvbgpdStatsBgpdMasterStatus

1.3.6.1.4.1.11610.435.7272.1.3.10

Profile

Description

Frequency

Immediate

Severity

Clear 

Condition

svSvbgpdStatsBgpdMasterStatus < 2
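The raise condition in section 3.48.1 (svSvbgpdStatsBgpdMasterStatus >= 2) and the clear condition in section 3.48.2 (svSvbgpdStatsBgpdMasterStatus < 2) can be read against the three status values active(1), down(2), and initial(3). The Python sketch below is an illustration only: the enum and function names are hypothetical, and it assumes the status has already been retrieved as an integer.

```python
from enum import IntEnum


class BgpdMasterStatus(IntEnum):
    """Values of svSvbgpdStatsBgpdMasterStatus as listed in section 3.48.1."""

    ACTIVE = 1   # a master BGP daemon process is operating normally
    DOWN = 2     # no BGP daemon process is acting as the master
    INITIAL = 3  # startup: no master BGP daemon has been set yet


def alarm_model_59_severity(status: int) -> str:
    """Apply the documented conditions: >= 2 raises Major, < 2 clears."""
    return "Major" if status >= 2 else "Clear"


for status in BgpdMasterStatus:
    print(status.name, alarm_model_59_severity(status))
```

Note that both down(2) and initial(3) satisfy the raise condition, so the alarm can appear during system startup before a master has been elected.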

3.48.3 Impact and Suggested Resolution, Alarm Model 59

If BGP is enabled, the element expects up-to-date subnet information from configured peer routers that are connected to the element running the BGP master. If an active BGP master is not present, then the element cannot receive the subnet information. This can result in incorrect packet processing.

Note: A BGP daemon withdraws its master eligibility if it loses, or cannot establish, connection with the peer router.

Complete these steps to resolve the alarm:
1. Check the /var/log/svlog file for indications regarding why the BGP master failed.
2. Run the show service bgp peer CLI command and check these output fields to ensure that BGP is enabled:
   • IPAddress: Confirm that at least one element has one or more peer routers configured.
   • ConnectionState: Ensure that the ConnectionState output for the configured peer is ESTABLISHED.
3. Check the connection between the peer router and the element. If the BGP service is unable to establish a connection with the peer router within 180 seconds, Alarm Model 60 is raised. See Alarm Model 60: Disconnected BGP Peer for instructions to resolve the alarm.
4. If required, run the restart service svbgpd CLI command to restart the SVBGPD service.
