KPI Workbench and Custom SQL Queries Training
Nemo Analyze Training Custom KPI Workbench & Custom SQL Queries
Custom Queries and KPIs – SQL Queries and KPI Workbench
[Architecture diagram: the Nemo Analyze database is accessed through the ODBC interface (SQL queries); on top of that sit the KPI Workbench and any 3rd-party query tool, with results visualised in Nemo Analyze (map, graph, etc.).]
• Two ways of creating a custom KPI:
– SQL query
– KPI Workbench
• SQL queries can be used as custom KPIs as such in Nemo Analyze, or from any 3rd-party tool that has an ODBC interface
• KPI Workbench can be used to further process the output of SQL queries
SQL Queries vs. KPI Workbench
• SQL pros:
– Efficient for power users
– Standard language
– Scalability of the queries is good
– Excellent tool for grouping and filtering of data
• SQL cons:
– Requires SQL and database schema knowledge
– Temporal relations between tables are difficult or impossible to build
– Difficult or impossible to track sequences of events, or to do other more advanced logic
• KPI Workbench pros:
– Graphical UI, no need for SQL knowledge
– Possibility to create temporal correlations between tables
– Possibility to create state machines and other advanced sequences
• KPI Workbench cons:
– Currently data can be correlated only per file, e.g. MMS end-to-end delivery time cannot be calculated from two separate files
– Scalability is not as good as with SQL; may run into a Java heap space error with large datasets
• Use SQL if you need to add a complex filter or formatting to one of the existing queries in the parameter tree
• Use KPI Workbench if you cannot do what you want with SQL
KPI Workbench
General
• KPI Workbench is a graphical scripting environment for creating user-defined custom parameters and key performance indicators
• KPI Workbench is an additional data processing layer on top of the database SQL interface
• Predefined SQL queries (parameter tree) or custom SQL queries are used as input for KPI Workbench
• Results of a custom KPI can be visualised in maps, charts, and grids in the Nemo Analyze user interface
• Why use KPI Workbench instead of custom SQL queries?
– Easier to use: no need for SQL knowledge, or knowledge of the Nemo Analyze database schema
– Many advanced queries are not possible to do with SQL: tracking certain signaling message sequences, call drop root cause analysis, etc.
Input and Output of the Custom KPI
• A custom KPI can have one or multiple input parameters
• Input parameters can be selected from the parameter tree:
– Predefined parameters
– Custom SQL queries
– Existing custom KPIs
• An input parameter is a dataset in tabular format that has a column containing the parameter values, and typically other related columns
– E.g. "Ec/No best active set" contains columns for Ec/No, scrambling code, channel number, time, latitude, and longitude
• Output is always one dataset
Correlations – Merging Input Datasets Together
• Input can have multiple datasets but output always has only one; therefore the input datasets need to be merged, i.e. correlated, into one table
• Mathematical and logical operation nodes take only one input dataset; therefore, if input data is needed from multiple datasets (parameters), they need to be combined into one dataset with Correlations
• Time-based correlations:
– Previous value
– Current value
– Next value
– All values within time range
• SQL-type correlations:
– Inner join
– Left outer join
– Union
– Cartesian product
Correlation Using Time Intervals
• The underlying data recording principle of all Nemo measurement tools is that data is written to the log file only when values change, i.e. values are not written periodically
• More precisely, if any of the information elements of a given event changes, the event is written to the log file
Correlations – Previous, Current, and Next Value
• Each row of the leftmost input dataset is taken directly to the output
• From the other input datasets, the previous/current/next row relative to the timestamp of each leftmost-dataset row is taken into the output
• Useful when taking "snapshots" of the network
– E.g. the value of each radio parameter at the event of a dropped call
Correlations – All Values Within Time Range
• Useful when combining non-event-type data (RSCP, RxLevel, Tx power, etc.) into one table
– Scatter plots (Ec/N0 vs. RSCP, Throughput vs. RxLevel, etc.)
– Custom events with multiple conditions: filter rows where Ec/N0 < x AND RSCP > y, and so on
Correlations – All Values Within Time Range
• Correlates two or more input datasets based on time
• Validity time intervals of the samples are taken into account in the correlation so that no data is lost
• An output row is written for each timestamp where any of the input datasets has a row
• Output is also written if the validity time interval of a non-first input dataset ends; in that case the columns of that dataset are empty in the output row
• The leftmost input dataset works as a filter: output rows are only written if the leftmost input dataset has a valid row for the given timestamp
• Output contains all columns from all input tables
[Example correlation diagram: Table A (input, rows A1–A6) and Table B (input, rows B1–B7) are merged along the time axis into output Table A+B with rows A1+B1, A1+B2, A2+B2, A2+B3, A2+B4, A3+B5, A4+B5, A5+B5, A5+B6, A6+B6. Legend: dot = timestamp of a row in a table; bar = validity time interval of a row. Where the input datasets have no new row, an output row is still written when the validity of B4 ends, with the Table B columns left empty. No output is written from B7 because Table A (the leftmost input) has no valid row at the time of B7.]
Correlations – Union
• Each row of each input dataset is put to the output
• Union does not merge rows together from the input tables
• Rows are not in time order
• Union + order by time is used as state machine input
[Example correlation diagram: Table A rows A1–A6 and Table B rows B1–B7 are combined by Union into output Table A+B containing all thirteen rows, not in time order. Legend: dot = a row in the table.]
Aggregate Functions
• Average, Minimum, Maximum, Sum, and Count
• Input: dataset of N rows
• Output: one row with the aggregate of the selected input column
• Definitions needed for the node:
– Column: selects the column from the input dataset over which the aggregate will be calculated
– Weight by: each sample can be weighted (multiplied) by another column of the input data; typical usage is to weight an average by time or by distance
– Group by: data can be grouped, e.g. average RSCP per scrambling code
State Machine
• Powerful tool for various applications:
– Calculating the delay of any signaling procedure, e.g. call setup delay, handover delay
– Tracking a certain signaling message sequence and creating an event at its occurrence
– Tracking call or packet data transfer state, and calculating statistics binned per call/data transfer, e.g. average throughput per data transfer
• Input: one dataset with the needed input data in time order (Union + order by time)
• Output: a row per each transition arrow in the state machine, if output is set for the transition, with columns for the transition times, a running index number, and user-definable text:
– start_time: timestamp of the transition
– end_time: timestamp of the next transition
– time_interval: time difference in milliseconds between start_time and end_time, that is, the time spent in the target state of the transition
[Example state machine diagram for call tracking: states Idle, Call attempt, Connecting call, Alerting, Call connected, Call disconnected, Call dropped, and Call failure.]
Configuring State Machine
• Required states are added from the properties of the state machine
• Transitions are defined in the properties of the source state
• For each transition, the conditions must be defined based on the values of the input data row
• If text is defined in the State transition Output field, an output row will be written for every occurrence of the transition
GROUP BY Node
• The GROUP BY node can be used to group a table of data based on any column or columns of the table, and to calculate aggregates (min, max, avg, etc.) per group for the other columns
• E.g. a statistics-per-cell table, with average RSCP, average Ec/N0, number of dropped calls, and number of failed calls (an SQL sketch of the same grouping idea follows below)
Creating Per Session Aggregation
• It is often necessary to aggregate data per session (call, attach, PDP context, data transfer, etc.)
• For this purpose, session parameters are available for every session type in the parameter tree
• The session parameters return one row per session, with session-specific information such as status (fail/success), cause code, call setup time, data connection setup time, data transfer protocol, etc.
• The session parameters have a timestamp at the beginning of the session, with a time range/time interval extending to the end of the session
– This makes it possible to correlate any metrics with the session by using the session as the master input in All Values Within Time Range (see next slide)
– By adding Group By after that, the metrics can be aggregated per session
All Values Within Time Range with Session as Master Input
[Example correlation diagram: a Call session dataset (master input, Call 1 and Call 2) and a BLER dataset (input, rows B1–B7) are merged along the time axis into output rows Call1+B1, Call1+B2, Call1+B3, Call1+B4, Call2+B5, and Call2+B6. Legend: dot = timestamp of a row in a table; bar = validity time interval of a row. Where the input datasets have no new row, an output row is still written when the validity of B4 ends, with the BLER columns left empty. No output is written from B7 because the Call session has no valid row at the time of B7 (that is, the call had ended before).]
KPI Workbench Improvements (Professional Edition)
• Possibility to execute a KPI simultaneously on all measurement files
– Needed when correlating data across files, e.g. comparing scanner and mobile data
• Time-triggered state transition in the state machine
– Enables event creation when a given message was not received within a defined time period
• State transition triggered by a changed value in the state machine
– Enables event creation when e.g. the serving cell changes
• New "Session" parameters in the parameter tree
– Aggregation of data per call, data transfer, PDP context, or Attach, without having to create a state machine to track the session
Time Shift Node
• The Time shift node can be used to modify the timestamp and the time range of any input data
• One of the most relevant use cases of this node is when one wants to automatically capture details before and/or after a particular event for custom root cause analysis purposes
• [Illustration: capturing RF details 4 s before and after each dropped call using the Time shift node.]
Resample Node
• The Resample node can be used to resample any input data containing the time column
• The sampling period is user-definable in milliseconds
• Nemo logging tools write data to the log file when the value changes, not periodically
• The data is handled properly in Nemo Analyze, but if the data is to be exported to a 3rd-party tool, it is usually better to export it with a constant sample period
• Together with the Time shift and Group by nodes, Resample can also be used to bin data over a longer period to reduce the data amount
Running KPIs per File/Measurement/All
• A KPI can be run in three modes:
– Per file: input queries are run per file
– Per measurement: input queries are run over all files of a measurement at a time
– All: input queries are run over the whole data set in one shot
• Per file should always be used when possible
– Input data is processed one logfile at a time, even when running the KPI over multiple logfiles
– Lower memory consumption
– Can always be used when data is not correlated across multiple logfiles
• Execute per measurement when comparing data across two or more logfiles of the same measurement
• Execute per all when comparing data across arbitrary sets of data, e.g. two measurements collected with different drive test kits at the same time, or when comparing imported OSS data to drive test data
– Inputs must be sorted by time
Execute per File, Example
• In this example, Ec/No, RSCP, and Tx power per logfile are correlated into the same table
Execute per Measurement, Example
• For example, comparing scanner RSRP and mobile RSRP of the same drive
• Note that all the input queries are run over all the files of a measurement; when comparing e.g. the RSCPs of two terminals, the RSCP queries must be filtered by device number
SQL Queries
To get started:
• Read this document
• Get an SQL editor. Queries can be written with the Nemo Analyze database browser, but better SQL freeware editors are available, such as http://gpoulose.home.att.net/Tools/QTODBC61.msi
• Login/password for the database: administrator/password
• Get the following reference documents from the Nemo User Club, "User manuals and user documents downloads":
– Nemo Analyze database schema (describes the table structure of the database)
– Nemo file format specification (describes the events in the Nemo measurement file format)
– OpenAccess SQL reference (describes the supported SQL syntax)
– Nemo Analyze user manual (describes the special, Nemo-specific SQL scalar functions and stored procedures supported by the Nemo Analyze SQL driver)
Nemo Analyze Database Solution
• Database solution from ObjectStore
• ODBC interface from OpenAccess + Nemo-added scalar functions and stored procedures
• The solution is optimized for fast and convenient use as a standalone tool:
– No high system requirements – standalone Analyze runs on a standard PC
– Queries over a single measurement file are always fast regardless of the number of files in the database
– Maintenance-free, no DB administrator needed
Tables
• Data in a relational DB is stored in relations, which are perceived by the user as tables
• A table represents an object or an event (e.g. employee, sales order, etc.)
• Each row of the table represents a unique instance of the object or event
• Each column represents a different piece of information regarding the object or event (e.g. first name, last name, salary, customer name, order charge, etc.)
• When referring to a table in a query, the syntax is {schema name}.{table name}
• E.g. "Nemo.UMTS"."BLER" (a minimal query sketch follows below):
– Schema: Nemo.UMTS
– Table: BLER
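For instance, a minimal query using that syntax (the column list is left as * since the actual columns are listed in the database schema document):

-- Minimal example of the {schema name}.{table name} syntax.
SELECT *
FROM "Nemo.UMTS"."BLER"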
Relations Between Tables
• If rows in a given table can be associated with rows in another table, the tables are said to have a relationship between them
• One-to-one, one-to-many, and many-to-many relations are possible (the first two exist in the Nemo Analyze DB)
– E.g. one-to-one: in a given table, each row has a relation with a single row of another table
• Every table has a primary key (red arrow in the slide diagram)
– The primary key is a column that uniquely identifies the row within a table
– Using the scalar OID_SHORT("oid") will convert "oid" columns into a 9-digit integer
• The foreign key (yellow arrow in the slide diagram) is the "oid" of the related row in another table
– E.g. BLER and Event: "the_event" of each row in the BLER table equals "oid" of one row in the Event table (a join sketch follows below)
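A hedged sketch of a join over that relation (the "the_event" = "oid" condition and the OID_SHORT scalar are from the slides; the Event table's schema qualifier and its selected columns are assumptions to be checked against the schema document, and in practice the + views described below make this join unnecessary):

-- Sketch: joining BLER to Event over the foreign key "the_event".
-- The "Nemo"."Event" qualifier is an assumption; verify in the schema doc.
SELECT e."time", e."gps_latitude", e."gps_longitude",
       OID_SHORT(b."oid") AS "Row id"
FROM "Nemo.UMTS"."BLER" b, "Nemo"."Event" e
WHERE b."the_event" = e."oid"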
Overview of the Database Schema
• The database schema is based on the Nemo logfile format
• The Nemo file format is event based
– A new event is recorded when the recorded value changes, not periodically
• Data is grouped into different events based on relations between the data, e.g. the SHO event records all data that is related to soft handover
• Every event of the Nemo file format has a corresponding table in the database
• See the Nemo file format specification and database schema for more information about the different events and the mapping of events to tables
• The oid (Object ID) column is present in every table of the database
– oid is the unique identifier of a row in the table
• The Measurement table has one row per measurement; measurement in Nemo language means a logfile, or a set of logfiles collected as a multi-device measurement
• The Device table has one row per logfile
• The Event table contains the time and GPS information of all data
[Schema diagram: Measurement (oid, title, ...) is referenced by Device (oid, device_extension, the_measurement, ...), which is referenced by Event (oid, time, gps_latitude, gps_longitude, the_device, ...), which in turn is referenced via the_event by all event tables such as ECNO, SHO, and CAA.]
Nemo Analyze Database Schema – Mapping of Fixed Size Events to DB Tables
• If an event always has a fixed number of parameters, it is mapped straight to the corresponding table
• For example, mapping of the SHO event to the DB:
– For each SHO event, there is one row in the SHO table
– For each measurement, there is one row in the Measurement table
– SHO and Event tables have a one-to-one relation
– Device and Event have a one-to-many relation
– Measurement and Device have a one-to-many relation
Nemo Analyze Database Schema – Mapping of Variable Size Events to DB Tables
• If an event has a variable number of parameters, it is mapped to multiple tables in the DB
• The mapping of each event is described in the database schema documentation
• Example: ECNO event. The number of parameters in an ECNO event depends on the number of cells recorded
– For each ECNO event, there is one row in the ECNO table
– For each carrier measured in one ECNO event, there is a row in the Channel table
– For each cell (active, monitored, etc.) measured in the ECNO event, there is a row in the Cell table
– Event, Device, and Measurement relations are as with fixed size events
+ Views
• There is a + view for every table that has a relation with the Event table
• Views are displayed as tables, where the table name has + at the end (e.g. BLER → BLER+)
• In the view, the corresponding table has been joined with the Event, Device, and Measurement tables
– Time, GPS data, and the measurement file name are associated with the actual parameter
– For example, the BLER+ table also contains the time and coordinates, joined from the Event table
• + views are faster to query, and easier to use, compared to a self-made JOIN
• + views should always be used in queries if time, or latitude & longitude, is needed – this is the right way to do it (a sketch follows below)
Timestamp in the Database
• Time is recorded to the database in two formats:
– Timestamp format (column: "sql_time")
– Native binary format (column: "time")
• Data is recorded event based, not periodically, in the Nemo file format
• In order to get correct statistics, and to plot correct graphs, two things are needed:
– Timestamp: the point of time when the event occurred
– Interval: the elapsed time for which the recorded event has been/will be valid
• Both the timestamp and the interval are embedded in the binary timestamp:
– The timestamp can be fetched with the scalar function T_(native binary timestamp)
– The interval in milliseconds can be fetched with the scalar function TI_INTERVAL(native binary timestamp)
• When custom queries are made for the Analyze UI, the binary time should be used (a sketch of the scalars follows below)
– The Analyze UI fetches both the timestamp and the time interval from the native binary timestamp automatically
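A hedged sketch showing both scalar functions (the table and value column are borrowed from the Tx power example later in this training; treat them as assumptions and check exact output formats in the Nemo Analyze user manual):

-- Sketch: extracting the readable timestamp and the validity interval
-- (in milliseconds) from the native binary "time" column.
SELECT T_("time")          AS "Timestamp",
       TI_INTERVAL("time") AS "Interval (ms)",
       "tx_power_umts"
FROM "Nemo.UMTS"."TXPC+"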
Filtering Results of Certain Measurement File(s)
• The fastest and easiest way to do it is to use a HINT
• The HINT is added at the end of the query, in the following format:
/* MEAS({measurement_file_name_1}:{file_extension}|{measurement_file_name_2}:{file_extension}|....) */
• When using a HINT, queries over certain file(s) are always fast, regardless of the number of other files in the DB (a sketch follows below)
• The JOIN can also be done with regular SQL syntax
– This is needed only when data from the Device or Measurement table needs to be returned
– Even in that case it is recommended to use the HINT to perform the actual filtering, because it makes the query faster
• When custom queries are made for the Analyze UI, measurement filtering MUST NOT be done, unless the query is wanted to be statically limited to certain measurement file(s)
– The HINT is added automatically to the query at runtime – this is the right way to do it
Value Enumeration
• Lots of data is recorded in number format, where the meaning of each value is enumerated in the file format
• The numbers are written to the DB
• In the DB there is the ValueEnum table, which has the enumeration of all the number parameters
• The real meaning of a number value can be fetched with the following scalar function: VAL_TO_STRING({parameter_name_in_ValueEnum}, {value}) (a sketch follows below)
• When custom queries are made for the Analyze UI, VAL_TO_STRING is not needed!
– Number values are displayed automatically in 'decoded' format
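A hedged sketch of the scalar (the parameter name 'cell_type' is a guess based on the cell_type column used in the Tx power example later in this training; real parameter names can be browsed from the ValueEnum table):

-- Sketch: decoding an enumerated number column into its text meaning.
SELECT VAL_TO_STRING('cell_type', "cell_type") AS "Cell type", "rscp"
FROM "Nemo.UMTS.ECNO"."Cell+"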
Connections
• Certain tables, like DAS, have a relation to the Connection table
• Every DAS event belongs to some connection
• DAS can belong to: Attach, PdpContextActivation, Data, and DataTransfer connection(s)
– Attach is the parent connection of PdpContextActivation, PdpContextActivation is the parent connection of Data, etc.
• Two scalars are available related to connections:
– CONN_IS_SHARED(conn1."oid", conn2."oid"): checks if the connections are the same, or if one is the parent of the other
– CONN_IS_TYPE(conn."oid", numeric_exp): checks if the connection is of the given type
• Connection types: 0="Unknown", 1="Voice", 2="Handover", 3="Attach", 4="PDP context activation", 5="Data", 6="RRC", 7="Data transfer", 8="MMS", 9="SMS", 10="POC", 11="LAU", 12="RAU", 13="Ping"
• Examples (a sketch of the first follows below):
1. Get only those throughput samples that were recorded during a data transfer
2. Get all throughput samples recorded when a PDP context has been activated and the access point has been 'Internet'
• If the table does not have a relation to a connection (the column the_connection does not exist), these things have to be done with time correlation (described later)
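A hedged sketch of example 1 (the throughput table and column names are hypothetical placeholders; the scalar usage and the type code 7 = "Data transfer" follow the slide, and the_connection is assumed to hold the oid of the related Connection row):

-- Sketch of example 1: keep only throughput samples recorded during
-- a data transfer (connection type 7). Table/column names are
-- placeholders; check the database schema document.
SELECT "time", "throughput"
FROM "Nemo.Data"."THR+"
WHERE CONN_IS_TYPE("the_connection", 7)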
Correlating Tables Without Relation 1/3
• Most tables do not have any relation between each other, but you may still want to join data between such tables
• Two types of JOIN can be made based on time:
1. Sample-based correlation
– Each row of table x is joined with a row from table y WHERE the timestamp of the x row is within the time range of the y row
– E.g. Tx power vs. RSCP scatter
2. Time range condition
– Each row of table x WHERE the timestamp is in a certain range, defined by some other tables
– E.g. Tx power samples between Attach request and Attach accept L3 messages
Correlating Tables Without Relation 2/3 – Sample Based Correlation
1. List the tables to be joined in the FROM clause: FROM table x, table y
2. Make an x."time" = y."time" condition in the WHERE clause
– When the x table is the first one in the FROM clause: each y sample is taken and checked to see if there is an x sample that has the same timestamp as the y sample, OR if the x timestamp falls within the validity time interval of the y sample
– When the y table is the first one in the FROM clause, the comparison is vice versa
– The comparison is not bi-directional! If there is a big difference in the sample periods of x and y, the one that has the smaller sample period should be selected as y in order to get the highest possible resolution
3. Make an x.the_device = y.the_device condition in the WHERE clause (this limits the time correlation within each measurement file and improves performance)
4. Add the /* OPTIONS(USE_TIME_SCOPE) */ hint at the end of the SQL query. This enables the time scope join.
For example: Get Tx power vs. RSCP, when RSCP < -95 dBm
– Tx power is in the "Nemo.UMTS"."TXPC" table, RSCP is in the "Nemo.UMTS.ECNO"."Cell" table; there is no relation between the tables
– All Tx power samples are taken and checked to see if they fall within the validity time interval of an RSCP sample that is < -95

SELECT a.sql_time AS "Time", "tx_power_umts", "rscp"
FROM "Nemo.UMTS.ECNO"."Cell+" b, "Nemo.UMTS"."TXPC+" a
WHERE b.order = 1
  AND b.cell_type = 0
  AND a.time = b.time
  AND a.the_device = b.the_device
  AND rscp < -95
/* OPTIONS(USE_TIME_SCOPE) */
Correlating Tables Without Relation 3/3 – Time Range Correlation
• The "sql_time" column is the timestamp in datetime format
• The "sql_time" column always has to be used in conditions that test whether a timestamp is in the range set by two other timestamps (a sketch follows below)
• Time indexing does not work if T_("time") is used, or if sql_time is used inside a scalar function; this will degrade the performance of such queries
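A hedged sketch of the example from slide 1/3 (Tx power samples between Attach request and Attach accept). The L3 signaling table name "Nemo.GSM"."L3SM+" and its "message" column are hypothetical placeholders; the point is only that plain sql_time is compared, with no scalar functions around it, so the time index can be used:

-- Sketch only: signaling table/column names are placeholders.
SELECT a.sql_time AS "Time", "tx_power_umts"
FROM "Nemo.UMTS"."TXPC+" a, "Nemo.GSM"."L3SM+" req, "Nemo.GSM"."L3SM+" acc
WHERE req."message" = 'Attach request'
  AND acc."message" = 'Attach accept'
  AND a.sql_time BETWEEN req.sql_time AND acc.sql_time
  AND a.the_device = req.the_device
  AND a.the_device = acc.the_device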
Guidelines for Custom Queries in Analyze UI
• The binary timestamp ("time") can (and should) be returned in the SELECT row; the UI converts it automatically to a readable timestamp
• The binary timestamp MUST be used if the query is to display correctly in line and bar graphs
– This is because the validity time interval of the samples is also needed to plot correct graphs
• The measurement file filter HINT must not be used in the query, unless it is wanted to be statically limited to certain measurement file(s) – the UI filters the query results at runtime
• The value enumeration scalar (VAL_TO_STRING) is not needed; the UI applies it automatically
• Results have to be ordered by time (ORDER BY "sql_time") in order to see the results correctly in graphs and on the map (a sketch follows below)
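Putting the guidelines together, a hedged sketch of a UI-ready query (table and value column follow the earlier Tx power example):

-- Sketch: returns the binary timestamp for correct graphs, omits the
-- MEAS hint and VAL_TO_STRING, and orders by sql_time, as per the
-- guidelines above.
SELECT "time", "tx_power_umts"
FROM "Nemo.UMTS"."TXPC+"
ORDER BY "sql_time"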
Adding Custom Queries to Analyze UI – Manual Query
• Own queries can be added from the Query Manager:
1. Query Manager --> Add manual query
2. Give a name to the query
3. Select the "Edit SQL manually" checkbox
4. Type the query in
5. Set aliases for the different graphs, i.e. which column will be on the x-axis, which will be on the y-axis, etc.
6. When finished, the query is available in the "User" branch of the parameter tree in the Analyze workspace
Adding Custom Queries to Analyze UI – Query Wizard
• Simple queries can be made with the Query Wizard
• Joins are made automatically between tables that have relationships
• Time-based correlation is not done
– If two selected tables don't have a relation, the result set will be the Cartesian product of the tables!
• Step-by-step procedure:
1. Select the tables from which data is fetched
2. Select the columns to be displayed from the table(s)
3. Select a column for ordering the results, if time-based ordering is not sufficient
4. Set filtering conditions for the results
5. Define aliases between controls and result columns (which columns will be x and which will be y in a graph, etc.)