Netezza Fundamentals
Introduction to Netezza for Application Developers
Biju Nair
04/29/2014
Version: Draft 1.5
Document provided for information purposes only.
Preface

As with any subject, one may ask why we need to write a new document on this subject when there is so much information available online. The question is all the more pronounced in a case like this, where the product vendor publishes detailed documentation. I agree, and for that matter this document in no way replaces the documentation and knowledge already available on the Netezza appliance. The primary objectives of this document are

- To be a starting guide for anyone who is looking to understand the appliance, so that they can become productive in a short time
- To be a transition guide for professionals who are familiar with other database management systems and would like to, or need to, start using the appliance
- To be a quick reference on the fundamentals for professionals who have some experience with the appliance

With these simple objectives in focus, the book covers the Netezza appliance broadly so that the reader can quickly become productive in using it. References to other documents have been provided for readers interested in gaining more thorough knowledge. Joining the Netezza developer community on the web, which is very active, is also highly recommended. If you find any errors or need to provide feedback, please notify bnair at asquareb dot com with the subject "Netezza"; thank you in advance for your feedback.
© asquareb llc
Table of Contents

1. Netezza Architecture
2. Netezza Objects
3. Netezza Security
4. Netezza Storage
5. Statistics and Query Performance
6. Netezza Query Plan Analysis
7. Netezza Transactions
8. Loading Data, Database Back-up and Restores
9. Netezza SQL
10. Stored Procedures
11. Workload Management
12. Best Practices
13. Version 7 Key Features
14. Further Reading
1. Netezza Architecture

A good foundation makes it possible to build better things, and similarly, understanding the Netezza architecture, which is the foundation of the appliance, helps in developing applications that use it efficiently. This section describes the architecture at a level that satisfies that objective. Netezza uses a proprietary architecture called Asymmetric Massively Parallel Processing (AMPP), which combines the large-data-processing efficiency of Massively Parallel Processing (MPP), where nothing (CPU, memory, storage) is shared, with symmetric multiprocessing to coordinate the parallel processing. The MPP is achieved through an array of S-Blades, each a self-contained server running its own operating system and connected to disks. While there may be other products which follow a similar architecture, one hardware component unique to Netezza is the Database Accelerator card attached to each S-Blade. These accelerator cards can perform some of the query processing stages while data is being read from disk, instead of all processing being done in the CPU. Moving large amounts of data from disk to CPU and performing all stages of query processing in the CPU is one of the major bottlenecks in many of the database management systems used for data warehousing and analytics use cases. The main hardware components of the Netezza appliance are a host, which is a Linux server, communicating with an array of S-Blades, each of which has 8 processor cores and 16 GB of RAM and runs the Linux operating system. Each processor in an S-Blade is connected to disks in a disk array through a Database Accelerator card that uses FPGA technology. The host is also responsible for all client interactions with the appliance, such as handling database queries and sessions, along with managing the metadata about the objects (databases, tables, etc.) stored in the appliance.
The S-Blades communicate among themselves and with the host through a custom-built, IP-based, high-performance network. The following diagram provides a high-level logical schematic that will help visualize the various components of the appliance.
The S-Blades are also referred to as the Snippet Processing Array, or SPA for short, and each CPU in the S-Blades combined with the Database Accelerator card attached to it is referred to as a Snippet Processor. Let us use a simple, concrete example to understand the architecture. Assume a data warehouse for a large retail firm, in which one of the tables stores the details of all of its 10 million customers. Also assume that there are 25 columns in the table and that the total length of each row is 250 bytes. In Netezza, the 10 million customer records will be stored, in compressed form, fairly equally across all the disks in the disk arrays connected to the snippet processors in the S-Blades. When a user queries the appliance for, say, the Customer Id, Name and State of customers who joined the organization in a particular period, sorted by state and name, the following are the high-level steps in the processing:
- The host receives the query, parses and verifies it, creates the code to be executed by the snippet processors in the S-Blades, and passes the code to the S-Blades.
- The snippet processors execute the code. As part of the execution, the data blocks that store the data required to satisfy the query are read, in compressed form, from the disk attached to each snippet processor into memory. The Database Accelerator card in the snippet processor uncompresses the data, which includes all the columns of the table; removes the unwanted columns from the data (in this case 22 columns, i.e. 220 bytes out of the 250); applies the where clause, which removes the unwanted rows; and passes the small amount of remaining data to the CPU in the snippet processor. In traditional databases, all of these steps are performed by the CPU.
- The CPU in the snippet processor performs tasks like aggregation, sum and sort on the data from the database accelerator card and passes the result to the host through the network.
- The host consolidates the results from all the S-Blades and performs additional steps like sorting or aggregation on the data before communicating the final result back to the client.
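The data-volume arithmetic in the steps above can be checked with a short sketch. This is plain Python for illustration only, not Netezza code; the 30-byte width assumed for the three retained columns is an assumption, since the example only fixes the full row at 250 bytes.

```python
# Illustrative arithmetic: models how much data the Database Accelerator
# card keeps from reaching the snippet processor's CPU. Figures follow
# the worked example: 10M rows of 250 bytes and 25 columns, with only
# 3 columns (assumed ~30 bytes total) surviving projection.

N_ROWS = 10_000_000
ROW_BYTES = 250        # full row width from the example
KEPT_BYTES = 30        # assumed width of Customer Id, Name, State

read_from_disk = N_ROWS * ROW_BYTES       # what the FPGA scans
passed_to_cpu = N_ROWS * KEPT_BYTES       # after dropping 22 columns
saved_fraction = 1 - passed_to_cpu / read_from_disk

print(read_from_disk, passed_to_cpu, round(saved_fraction, 2))
```

The where clause would shrink the CPU-bound volume further by dropping rows; this sketch counts only the column projection, which alone removes 88% of the bytes.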
The key takeaways are
- Netezza has the ability to process large volumes of data in parallel, and the key is to make sure that the data is distributed appropriately to leverage the massively parallel processing.
- Implement designs so that most of the processing happens in the snippet processors; minimize communication between snippet processors, and keep data communication to the host minimal.
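The first takeaway can be illustrated with a small sketch. This is plain Python, not Netezza internals: crc32 merely stands in for the appliance's (unspecified here) distribution hash, and the 8-processor count is an assumption. A high-cardinality distribution key spreads rows evenly across snippet processors, while a low-cardinality key concentrates all the work on a few of them.

```python
from collections import Counter
import zlib

N_SPUS = 8  # assumed number of snippet processors, for illustration

def spu_for(key):
    # crc32 is a stand-in for the appliance's real distribution hash
    return zlib.crc32(str(key).encode()) % N_SPUS

# Distributing on a high-cardinality key: rows spread across all SPUs.
even = Counter(spu_for(cust_id) for cust_id in range(100_000))

# Distributing on a two-valued column: at most two SPUs do all the work
# while the rest sit idle -- the skew the takeaway warns about.
skewed = Counter(spu_for(flag) for flag in ["Y", "N"] * 50_000)

print(len(even), len(skewed))
```

With the hashed customer id, each of the 8 processors holds roughly an eighth of the rows; with the flag column, the same 100,000 rows land on at most two processors.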
Building on this simple example, which illustrates the fundamental components of the appliance and how they work together, we will see how more complex query scenarios are handled in the relevant sections.
Terms and Terminology

The following are some of the key terms used in the context of the Netezza appliance.

Host: A Linux server used by clients to interact with the appliance, either natively or through remote clients via ODBC, JDBC, OLE-DB, etc. The host stores the catalog of all the databases in the appliance, along with the metadata of all the objects in those databases. It also parses and verifies the queries from clients, generates executable snippets, communicates the snippets to the S-Blades, coordinates and consolidates the snippet execution results, and communicates the final result back to the client.

Snippet Processing Array: An SPA is an array of S-Blades, each with 8 processor cores and 16 GB of memory, running the Linux operating system. Each S-Blade is paired with a Database Accelerator card, which has 8 FPGA cores and is connected to disk storage.

Snippet Processor: Each CPU-FPGA pair in a Snippet Processing Array is called a snippet processor; it can run a snippet, which is the smallest unit of code generated by the host for query execution.
Netezza Objects

The following are the major object groups in Netezza. We will see the details of these objects in the following chapters.

- Users
- Groups
- Tables
- Views
- Materialized Views
- Synonyms
- Databases
- Procedures and User-Defined Functions

Anyone familiar with other relational database management systems will notice that there are no indexes, bufferpools or tablespaces to deal with in Netezza.
Netezza Failover

As an appliance, Netezza includes the failover components necessary to function seamlessly in the event of hardware issues, so that its availability is more than 99.99%. All Netezza appliances have two hosts in a cluster, so that if one fails the other can take over. Netezza uses Linux-HA (High Availability) and Distributed Replicated Block Device (DRBD) for host cluster management and for mirroring data between the hosts. As far as data storage is concerned, one third of every disk in the disk array stores a primary copy of user data, another third stores a mirror of the primary data from another disk, and the remaining third is used for temporary storage. In the event of a disk failure, the mirror copy will be used, and the SPU to which the failed disk was attached will be updated to use the disk holding the mirror copy. In the event of an error on a disk track, the track will be marked as invalid and valid data will be copied from the mirror copy onto a new track.
If there is an issue with one of the S-Blades, the other S-Blades will be assigned its workload. All failures are notified based on the event monitors defined and enabled. Similar to the dual hosts for high availability, the appliance also has dual power systems, and all the connections between components, such as host to SPA and SPA to disk array, also have a secondary path. Any issues with the hardware components can be viewed through the NzAdmin GUI tool.
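The primary/mirror disk layout described above can be modeled very roughly as follows. This is a conceptual Python sketch under the simplifying assumption that disk i's mirror lives on disk i+1; the appliance's actual placement logic is not documented here.

```python
class DiskArraySketch:
    """Toy model of primary data slices with a mirror on a neighbor disk."""

    def __init__(self, n_disks):
        self.n = n_disks
        self.failed = set()

    def fail(self, disk):
        self.failed.add(disk)

    def serving_disk(self, slice_id):
        """Return the disk that serves reads for a primary data slice."""
        if slice_id not in self.failed:
            return slice_id                  # healthy: read the primary
        mirror = (slice_id + 1) % self.n     # assumed mirror placement
        if mirror in self.failed:
            raise RuntimeError("both copies lost")
        return mirror                        # failover: read the mirror

arr = DiskArraySketch(4)
arr.fail(2)
print(arr.serving_disk(1), arr.serving_disk(2))
```

After disk 2 fails, reads for its primary slice are transparently served from the disk holding the mirror copy, which is what allows the appliance to keep running through a single-disk failure.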
Netezza Tools

There are many tools available to perform various functions against Netezza. Here we will look at the tools and utilities used to connect to Netezza; other tools will be detailed in the relevant sections. For administrators, one of the primary tools for connecting to Netezza is NzAdmin. It is a GUI tool which can be installed on a Windows desktop and connected to the Netezza appliance. The tool has a system view which provides a visual snapshot of the state of the appliance, including issues with any hardware components. The second view the tool provides is the database view, which lists all the databases including the objects in them, the users and groups currently defined, active sessions, query history and any backup history. The database view also provides options to perform database administration tasks, like creating and managing databases and database objects, users and groups. The following is a screenshot of the system view from the NzAdmin tool.
The following is the screen shot of the database view from the NzAdmin tool.
The second tool, often used by anyone who has access to the appliance host, is the nzsql command. It is the primary tool used by administrators to create, schedule and execute scripts that perform administration tasks against the appliance. The nzsql command invokes the SQL command interpreter, through which all Netezza-supported SQL statements can be executed. The command also has some built-in options which can be used for quick lookups, such as listing databases, users, etc. In addition, the command has an option to open an operating system shell, through which the user can perform OS tasks before exiting back into the nzsql session. As with all the Netezza commands, nzsql requires a database name, user name and password to connect to a database. For example,

nzsql -d testdb -u testuser -p password
will connect and create an nzsql session with the database testdb as the user testuser, after which the user can execute SQL statements against the database. Also, as with all the Netezza commands, nzsql has a "-h" help option which displays details about the usage of the command. Once in an nzsql session, the following are some of the options a user can invoke in addition to executing Netezza SQL statements.

\c dbname user passwd   Connect to a new database
\d tablename            Describe a table, view, etc.
\d{t|v|i|s|e|x}         List tables/views/indexes/synonyms/temp tables/external tables
\h command              Help on a particular command
\i file                 Read and execute queries from file
\l                      List all databases
\!                      Escape to an OS shell
\q                      Quit the nzsql session
\time                   Print the time taken by queries; switch it off by issuing \time again

One third-party tool which deserves mention is Aginity Workbench for Netezza from Aginity LLC. It is a GUI tool which runs on Windows and uses the Netezza ODBC driver to connect to the databases in the appliance. It is a user-friendly tool for development work and ad-hoc queries, and it also provides GUI options to perform database management tasks. It is highly recommended for users who do not have access to the appliance host (which will be most users) but need to perform development work.
2. Netezza Objects

The Netezza appliance comes out of the box loaded with some objects, which are referred to as system objects; users can also create objects when developing applications, and these are referred to as user objects. In this section we will look at the basic Netezza objects that every user of the appliance needs to be aware of.
System Objects: Users
The appliance comes preconfigured with the following three user ids, which cannot be modified or deleted from the system. They are used to perform all the administration tasks and hence should be used by a restricted number of users.

User id   Description
root      The super user for the host system on the appliance; has all the access of a super user on any Linux system.
nz        The Netezza system administrator Linux account, used to run the host software on Linux.
admin     The default Netezza SQL database administrator user, which has access to perform all database-related tasks against all the databases in the appliance.
Groups
By default Netezza comes with a database group called public. All database users created in the system are automatically added as members of this group and cannot be removed from it. The admin database user owns the public group, and this ownership cannot be changed. Permissions can be granted to the public group so that all users added to the system get those permissions by default.

Databases
Netezza comes with two databases, the system database and a model database, both owned by the admin user. The system database consists of objects like tables, views, synonyms, functions and procedures. It is primarily used to catalog all user database and user object details, which the host uses when parsing and validating queries and when generating their execution code.
User Objects: Database
Users with the required permission, or an admin user, can create databases using the create database SQL statement. The following is a sample statement that creates a database called testdb; it can be executed in an nzsql session or another query execution tool.

create database testdb;
Table
The database owner or a user with the create table privilege can create tables in a database. The following is a sample table creation statement which can be executed in an nzsql session or another query execution tool.

create table employee (
  emp_id     integer     not null,
  first_name varchar(25) not null,
  last_name  varchar(25) not null,
  sex        char(1),
  dept_id    integer     not null,
  created_dt timestamp   not null,
  created_by char(8)     not null,
  updated_dt timestamp   not null,
  updated_by char(8)     not null,
  constraint pk_employee primary key(emp_id),
  constraint fk_employee foreign key (dept_id) references department(dept_id)
    on update restrict on delete restrict
) distribute on random;
To anyone familiar with other DBMS systems, the statement will look familiar, except for the "distribute on" clause, the details of which we will see in a later section. Also, there are no storage-related details, like the tablespace on which the table needs to be created or any bufferpool details; these are handled by the Netezza appliance. The following is the list of data types supported by the Netezza appliance which can be used in the column definitions of tables.

Data Type              Description/Value                                          Storage
byteint (int1)         -128 to 127                                                1 byte
smallint (int2)        -32,768 to 32,767                                          2 bytes
integer (int or int4)  -2,147,483,648 to 2,147,483,647                            4 bytes
bigint (int8)          -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807   8 bytes
numeric(p,s)           Precision p can range from 1 to 38, scale from 0 to p      p <= 9: 4 bytes; 10 <= p <= 18: 8 bytes; 19 <= p <= 38: 16 bytes

The following is a sample IF-THEN-ELSE control statement

IF v_sal > 10000 THEN
  v_count := v_count + 1;
ELSE
  v_count_min := v_count_min + 1;
END IF;
The following is a sample IF-THEN-ELSE-IF control statement

IF v_sal > 10000 THEN
  v_count := v_count + 1;
ELSE
  IF v_sal > 5000 THEN
    v_count_min := v_count_min + 1;
  END IF;
END IF;
The following is a sample IF-THEN-ELSIF-ELSE control statement

IF location.category = 'H' THEN
  notification := 'Hazardous';
ELSIF location.category = 'R' THEN
  notification := 'Restricted';
ELSE
  notification := 'Standard';
END IF;
Note that ELSIF can also be spelled ELSEIF, which produces the same result.

Iterative Control
NZPLSQL supports LOOP-END LOOP, WHILE loop and FOR loop iterative control statements. Users can terminate a loop using the EXIT statement. The following is an example of a LOOP-END LOOP statement

<<loop1>>
LOOP
  -- Perform tasks
  IF v_count > 1000 THEN
    EXIT loop1;
  END IF;
  -- Perform tasks
END LOOP;
In the sample above, an IF control statement is used to determine when to exit the loop. The same can be accomplished using an EXIT WHEN statement, as in the following sample. Note that when a label is used in the EXIT statement, the label should belong to the current loop to which the EXIT statement belongs, or to one of the outer loops in which the current loop is nested.

<<loop1>>
LOOP
  -- Perform tasks
  <<loop2>>
  LOOP
    -- Perform tasks
    EXIT loop1 WHEN v_count > 1000;  -- exits both loop2 and loop1
    -- Perform tasks
  END LOOP;  -- this ends loop2
END LOOP;  -- this ends loop1
The following is an example of a WHILE loop statement

WHILE state = 'MA' AND year = '2000' LOOP
  -- perform tasks
END LOOP;
The following is an example of a FOR loop statement

FOR count IN 1..100 LOOP  -- the REVERSE keyword can be used to count backwards
  -- Perform tasks
END LOOP;
Note that, if required, both WHILE and FOR loops can be exited early using an EXIT statement in combination with an IF or WHEN control statement.

Working with table rows
While discussing variable types, we saw the %ROWTYPE data type, which defines a variable to be of the type of a specific table row. Instead of being tied to a specific table, a variable can also be defined to be of type RECORD, which can be used to store data from any table. For example, a variable r_EMP defined as EMP%ROWTYPE can store data from the EMP table when used in a SELECT statement like

SELECT * INTO r_EMP FROM EMP WHERE ID = 100;

but using the same variable to select data from a DEPT table will result in an error. If a variable r_REC is instead defined as RECORD, then it can be used to store data from both the EMP table and the DEPT table with no issues. Another way table row data can be retrieved is by defining multiple variables with the same data type definitions as the table columns and then using them as a comma-separated list in the select statement, as in the following example
SELECT * INTO v_ID, v_NAME, v_DEPT FROM EMP WHERE ID = 1000;
So far we have seen how to retrieve a single record from a table. As mentioned earlier, the FOUND or ROW_COUNT variables can be used to verify whether any records were returned by the SELECT queries executed. But if the query returns a set of rows, then a FOR loop is required to traverse the result set and act on it. The following is sample code to handle result sets in NZPLSQL

FOR r_REC IN SELECT * FROM EMP LOOP
  -- perform actions
END LOOP;

r_REC can be defined as a RECORD or as a ROWTYPE of the EMP table type. The column values from the table row can be retrieved using the "." operator and the table column name, as in r_REC.EMP_ID.

Error Handling and Messages

The EXCEPTION statement can be used to handle errors during the execution of NZPLSQL code. When an error occurs, SQLERRM stores the text of the error message. The following is an example

BEGIN
  -- Perform tasks
EXCEPTION
  WHEN OTHERS THEN
    -- Perform tasks
END;
The exception section of the code gets executed only when there is an error, and it is placed at the end of the procedure code. The RAISE statement, which can be used to emit a message, complements the EXCEPTION statement by providing detailed error messages to the user. The following is an example

BEGIN
  -- Perform tasks
EXCEPTION
  WHEN OTHERS THEN
    RAISE NOTICE 'Exception details %', SQLERRM;
END;
RAISE can take different levels, one of which is NOTICE; the others are DEBUG and EXCEPTION. While RAISE EXCEPTION aborts the transaction, the DEBUG and NOTICE levels are only used to send messages to logs. An exception which occurs in a procedure propagates up through the call chain until it reaches a point where it is handled by an EXCEPTION statement, or until it reaches the main procedure.

Result set as return value
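The propagation rule is analogous to exception handling in general-purpose languages. A rough Python analogy (not NZPLSQL) of an error bubbling up the call chain until a handler catches it:

```python
def inner():
    # comparable to RAISE EXCEPTION inside a nested procedure
    raise ValueError("lookup failed")

def middle():
    # no handler here, so the exception propagates to the caller
    return inner()

def outer():
    try:
        return middle()
    except ValueError as exc:     # comparable to an EXCEPTION WHEN block
        return f"handled: {exc}"  # exc plays the role of SQLERRM

print(outer())
```

Just as here the error skips middle() and is caught in outer(), an unhandled NZPLSQL exception passes through intermediate procedures until some caller's EXCEPTION section handles it.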
We have seen how procedures can return single values; procedures can also return a result set. In order for a procedure to return a result set, it should be defined with a return type of REFTABLE(table_name). The following is an example

CREATE OR REPLACE PROCEDURE SP_SELECT_EMP() RETURNS REFTABLE(EMP) LANGUAGE PLSQL AS
The table referred to in the REFTABLE clause should exist in the database at the time the procedure is created, even if it contains no data. In order to return the data in the table, the RETURN REFTABLE statement should be used, as in the following example, where all the data in the EMP table will be returned to the caller of the procedure, who needs to handle the result set.

CREATE OR REPLACE PROCEDURE SP_SELECT_EMP() RETURNS REFTABLE(EMP) LANGUAGE PLSQL AS
BEGIN_PROC
BEGIN
  -- Perform tasks
  RETURN REFTABLE;
EXCEPTION
  WHEN OTHERS THEN
    -- Perform tasks
END;
END_PROC;
A table referred to in the REFTABLE clause of a currently defined procedure cannot be dropped until the procedure is dropped, or until the procedure is altered to return another table's data as its result set.
11. Workload Management

The Netezza appliance comes with components to configure system resource usage so that the system can be utilized efficiently by its various user groups. In order to configure the system for optimal usage, the system usage needs to be monitored and clearly understood. In this section we will look at how to monitor system usage and how the appliance can be configured for optimal resource usage.

Query History Collection and Reporting
Understanding the queries being executed and the resources they use provides a good picture of how the appliance is being utilized by the users of the installation. Netezza can automatically collect and store details about the queries being executed; this is done by creating a query history database and enabling a query history configuration. The query history provides data like

- Queries executed, their start and end times, and the total execution time
- Queries executed by users and user groups
- Tables and the columns in the tables accessed by the queries, and the operations performed
Apart from using the historical query statistics to define workload management, the data will also help review, and if required redefine, the distribution and organization of table data. Enabling query history data collection involves creating a query history database, creating a query history configuration which defines the type of data to be collected, and enabling that configuration so that the system starts populating the query history database. The query history database should be secured, as with any other user-defined database, so that the required privileges are granted only to the required users. The query history database can be created using the nzhistcreatedb command-line utility. The utility creates the history database with the required tables to store the data, and views to run queries against them. It is recommended to run queries against the views rather than the underlying tables, for future-compatibility purposes. The following is an example that creates the prodhistdb query history database

nzhistcreatedb -d prodhistdb -t query -u huser -o hadmin -p hadminpass -v 1
The user hadmin will be the owner of the new query history database, and the user huser will be used to load query statistics data into the tables of the history database. Both user ids should exist in the system before the nzhistcreatedb command is executed. In order to start collecting query statistics and storing them in the history database, query history configurations need to be defined to collect the required level of query data, and one configuration needs to be enabled. Multiple configurations can be defined, for various levels of data, but only one configuration can be active at any time. When a configuration is enabled, it collects by default data about login failures, session starts and ends, and query history process startup. In addition to this default data, users can define a configuration to collect data about queries, plans, tables and columns. The following is an example of creating a query history configuration and enabling it
-- To create a history configuration which collects all data regarding queries --
CREATE HISTORY CONFIGURATION prod_hist HISTTYPE QUERY
  DATABASE prodhistdb USER huser PASSWORD 'huserpass'
  COLLECT PLAN,COLUMN LOADINTERVAL 10 LOADMINTHRESHOLD 4
  LOADMAXTHRESHOLD 20 STORAGELIMIT 25 LOADRETRY 1 VERSION 1;

-- To set a new history configuration to take effect the next time the appliance is started --
SET HISTORY CONFIGURATION prod_hist;

-- Stop and start the appliance so that the newly set history configuration can take effect --
nzstop
nzstart
The sample history configuration uses the history database created previously and requires the user id and password of the history database user specified with the -u option when the query history database was created. Once enabled, this sample configuration will collect data for all the areas: specifying PLAN in the COLLECT option implicitly collects QUERY data as well, and specifying COLUMN implicitly collects TABLES data. As you may have noticed in the sequence of steps to enable a history configuration, the system requires a stop and a start; setting a configuration does not by itself bring it into effect. Once a configuration is enabled, the system will start collecting data and staging it in a directory. The staged data gets loaded into the history database at the regular interval specified in the LOADINTERVAL option of the configuration, or when LOADMAXTHRESHOLD is reached. To stop query history data collection, a history configuration with a histtype of NONE can be created and enabled, so that no query history data will be collected or loaded into the query history database. The following is an example
-- To create a history configuration which collects no history data --
CREATE HISTORY CONFIGURATION disable_history HISTTYPE NONE;

-- To set the new history configuration so that it takes effect the next time the appliance starts --
SET HISTORY CONFIGURATION disable_history;
Over time the history database grows as data is loaded, so data needs to be purged based on the period for which it is required, keeping the size under control. The command line utility nzhistcleanupdb can be used to delete data from the history database up to a given date and time. Along with minimizing storage use, purging unwanted data also helps improve the performance of queries against the history database. The following is an example that purges records created before 2012 Jan 31 00:00:00 hrs from the prodhistdb database:

nzhistcleanupdb -d prodhistdb -u hadmin -pw hadminpass -t "2012-01-31"
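As an illustrative sketch only, the cleanup could be automated with a cron entry on the host. The schedule, the /nz/kit/bin path and the 13-month retention window are assumptions for the example, not recommendations from this text:

```shell
# Hypothetical cron entry: on the 1st of every month at 02:00, purge history
# records older than roughly 13 months. Percent signs must be escaped in
# crontab entries; 'date -d' assumes GNU date on the Linux host.
0 2 1 * * /nz/kit/bin/nzhistcleanupdb -d prodhistdb -u hadmin -pw hadminpass -t "$(date -d '13 months ago' +\%Y-\%m-\%d)"
```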
The history database can also be dropped like any other user database using the DROP DATABASE command. It is important that no active history configuration is using the database before it is dropped, so that the query history data collection process is not disrupted. For anyone interested in understanding the queries executed and the usage pattern of the system, the following are some of the key views in the query history database.
Name                     Description
$v_hist_queries          Completed queries
$v_incomplete_queries    Queries for which data was not captured completely due to a system reset or an incomplete load of query statistics
$v_successful_queries    Same as $v_hist_queries but only shows data for successful queries
$v_unsuccessful_queries  Same as $v_hist_queries but only shows data for unsuccessful queries
$v_hist_log_events       Data about all the events which happened in the system
$v_table_access_stats    Cumulative stats on all accesses to each table in the system
$v_column_access_stats   Cumulative stats on all accesses to each table column
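As a simple illustration, the history views can be queried like ordinary tables once connected to the history database (prodhistdb in the earlier examples). The double quotes are needed because the view names begin with '$'; specific column names should be read from the view definition on your appliance rather than assumed:

```sql
-- Sample the completed-query view to see what was captured.
SELECT *
FROM "$v_hist_queries"
LIMIT 10;
```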
Workload Management
Once the pattern of system usage is understood, the workload on the system can be managed so that system resources are utilized efficiently. Netezza provides the following features to manage workload:

Short Query Bias (SQB)
This reserves system resources for queries which are estimated to complete in less than 2 seconds. The query time limit of 2 seconds is configurable, along with the resources allocated for SQB. For the configuration changes to take effect, the system requires a pause and resume. The resources which can be configured are the number of slots reserved for short queries in the GRA scheduler and snippet scheduler queues, and the memory reserved for short queries on the snippet processors and the host.

Guaranteed Resource Access (GRA)
User groups called resource sharing groups (RSGs) can be created, to which a minimum and maximum percentage of system resources can be assigned. This makes sure that any job or query executed by a user attached to the group is guaranteed the minimum percentage of system resources. By default the admin user is defined to get 50% of the system resources, hence it is advisable to use the admin user sparingly. Also, for administrative tasks like backups it is good practice to define a resource sharing group with the required privileges and attach users to it so that they can perform those tasks.

Priority Query Execution (PQE)
A user, group or session can be assigned a priority of critical, high, normal or low, and the appliance will prioritize the allocation of resources and schedule the queries or jobs in order of priority. Critical and high priority jobs and queries get more resources than normal or low priority ones. If multiple jobs/queries with differing priorities are executed within the same RSG, each gets a proportion of the resources assigned to the RSG based on its priority. There are also two hidden priorities used by the system: "System Critical", the highest priority, for system operations, and "System Background" for low priority system jobs.

Gate Keeper
Unlike the other three features, the gate keeper is not enabled by default and requires a configuration change. The gate keeper can be used to throttle the number of queries in various categories which can execute at the same time in the system. By default, configuration parameters set the maximum number of jobs/queries which can execute concurrently for the four job priorities of critical, high, normal and low. The number of jobs in
each priority which can be executed concurrently can be modified using the configuration parameters. Once the number of jobs/queries for a particular priority reaches the configured number, the gate keeper starts queuing additional jobs in internal queues for that category. If the gate keeper is used along with GRA, jobs are scheduled based on both the resource availability in the RSG to which they belong and their priority. The gate keeper can also be configured with additional queues to throttle queries based on their estimated execution time. The following are some examples of workload management configuration using GRA, PQE and the gate keeper:
-- To create a RSG with a minimum resource allocation of 15% and a maximum allocation of 30% --
CREATE GROUP reporting WITH RESOURCE MINIMUM 15 RESOURCE MAXIMUM 30;

-- To create a user group with a default priority of high; all users attached to it get high priority --
CREATE GROUP tableau WITH DEFPRIORITY HIGH;

-- To set the priority of an active session to critical --
ALTER SESSION 501028 SET PRIORITY TO CRITICAL;

-- To alter the default priority and the maximum priority of a user --
ALTER USER mike WITH DEFPRIORITY LOW MAXPRIORITY HIGH;
12. Best Practices

- Define all constraints and relationships between objects. Even though Netezza does not enforce them (other than the not null constraint), the query optimizer still uses these details to come up with an efficient query execution plan.
- When defining columns, use data types for which Netezza can create zone maps. One easy target is avoiding numeric(x,0) columns where an integer type would do.
- If the data for a column is known to have a fixed length, use char(x) instead of varchar(x). Varchar(x) uses additional storage, which becomes significant when dealing with terabytes of data, and also impacts query processing since additional data needs to be pulled from disk.
- Use NOT NULL wherever the data permits. This improves performance by sparing the appliance null-condition checks and reduces storage usage.
- Use the same data type for columns used in joins so that query execution is efficient, which in turn helps queries run faster. Use the same data type and length for columns with the same name in all the tables in the database.
- Distribute on columns of high cardinality which are often used in joins. It is best to distribute fact and dimension tables on the same column; this reduces data redistribution during queries, improving performance. Even if the fact and dimension tables cannot be distributed on the same key, make an effort to avoid redistribution of the fact table by choosing the right distribution column. Distribute on one column whenever possible, and do not create keys just for the sake of distribution. Use random distribution as a last resort; it is fine to use random distribution on a small table since it may get broadcast.
- Define a clustered base table when data in a fact table is often looked at through multiple dimensions. Use the columns through which the data will be viewed when organizing data in a clustered base table.
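As a hedged sketch of the distribution and clustered base table advice above (the table and column names are invented for illustration):

```sql
-- Fact and dimension distributed on the same join column (customer_id),
-- avoiding redistribution when the two tables are joined.
CREATE TABLE customer (
    customer_id INTEGER NOT NULL,
    name        VARCHAR(100)
) DISTRIBUTE ON (customer_id);

CREATE TABLE sales (
    sale_id     BIGINT  NOT NULL,
    customer_id INTEGER NOT NULL,
    store_id    INTEGER NOT NULL,
    sale_date   DATE    NOT NULL,
    amount      NUMERIC(12,2)
) DISTRIBUTE ON (customer_id)
-- Clustered base table: organize on the columns through which the fact
-- table is commonly sliced.
ORGANIZE ON (sale_date, store_id);
```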
- Create materialized views on a small set of columns from a large table that are often used by user queries. Create sorted materialized views with the most restrictive column in the ORDER BY clause of the view so that it can act like an index. Do not drop and recreate a materialized view unnecessarily, since its OID will change, which may impact other dependent objects.
- Schedule grooms after major changes like updates, deletes and alters on tables, so that space utilization is optimized and query performance improves.
- Schedule regular "generate statistics" jobs, particularly on larger tables and tables with heavy activity, so that the optimizer can generate optimal execution plans, which improves query performance.
- Schedule regular backups, which should also include the host backup, so that the data backups and the catalog stay in sync.
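The materialized view, groom and statistics advice above can be sketched as follows; the object names are hypothetical, and the statements assume the sales table from earlier examples in this section:

```sql
-- Sorted materialized view over a thin slice of a large table; the ORDER BY
-- column acts like an index for queries that restrict on it.
CREATE MATERIALIZED VIEW mv_recent_sales AS
    SELECT sale_date, customer_id, amount
    FROM sales
    ORDER BY sale_date;

-- After heavy update/delete activity, reclaim space occupied by deleted rows.
GROOM TABLE sales RECORDS ALL;

-- Refresh optimizer statistics so execution plans stay accurate.
GENERATE STATISTICS ON sales;
```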
- Do not store large quantities of data on the host, since it will affect performance. Use network mounts or third party vendor products to take and store backups.
- Monitor the system workload at regular intervals and adjust the workload management configuration accordingly.
- Use the admin user sparingly. Define a separate group with proper resource allocation and the required privileges to perform admin related tasks, and add users to it to perform those tasks.
- Prefer joins over correlated subqueries.
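To illustrate the last point, the same question can often be asked either way; the table and column names below are hypothetical:

```sql
-- Correlated subquery: the inner query is logically evaluated per outer row.
SELECT o.order_id
FROM orders o
WHERE o.amount > (SELECT AVG(i.amount)
                  FROM order_items i
                  WHERE i.order_id = o.order_id);

-- Equivalent join formulation, usually easier for the optimizer to plan well.
SELECT o.order_id
FROM orders o
JOIN (SELECT order_id, AVG(amount) AS avg_amount
      FROM order_items
      GROUP BY order_id) i
  ON i.order_id = o.order_id
WHERE o.amount > i.avg_amount;
```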
13. Version 7 Key Features

- Page level zone maps instead of extent level.
  o Until version 6.0, zone maps were created at the extent level, which is 3 MB. Starting with version 7.0, zone maps are created at the page level, which is 128 KB. This means less data is brought into the SPU, since the system knows what data is stored at a much more granular level and can eliminate unwanted pages from being read.
- Parallelism in query snippet processing.
  o Currently snippets in a query plan are processed in sequence; with version 7.0, where possible, snippets in a query will be executed in parallel, which will improve query performance significantly.
- Restricted distribution of snippets to SPUs.
  o Currently snippets in query plans are scheduled and executed on all the SPUs. Since the appliance knows which data is stored on which disk, which in turn is attached to which SPU, query snippets will only be distributed to the SPUs which hold the data relevant to the query being processed.