Seminar Report - Teradata is a Relational Database Managemen
Short Description
seminar for btech...
Description
1. INTRODUCTION Teradata is a relational database management system (RDBMS) that drives a company's data warehouse. Teradata provides the foundation to give a company the power to grow, to compete in today's dynamic marketplace, to achieve the goal of "Transforming Transactions into Relationships" and to evolve the business by getting answers to a new generation of questions. Teradata's scalability allows the system to grow as the business grows, from gigabytes to terabytes and beyond. Teradata's unique technology has been proven at customer sites across industries and around the world. Teradata is an open system, compliant with ANSI standards. It is currently available on UNIX MP-RAS and Windows 2000 operating systems. Teradata is a large database server that accommodates multiple client applications making inquiries against it concurrently. Various client platforms access the database through a TCP-IP connection across an IBM mainframe channel connection. The ability to manage large amounts of data is accomplished using the concept of parallelism, wherein many individual processors perform smaller tasks concurrently to accomplish an operation against a huge repository of data. To date, only parallel architectures can handle databases of this size.
2. What is Teradata ? Teradata is a relational database management system initially created by the firm with the same name, founded in 1979. Teradata is part of the NCR Corporation which acquired the Teradata company on February 28, 1991. It is a massively parallel processing system running a shared nothing architecture. The main point with the Teradata DBMS is that it's linearly and predictably scalable in all dimensions of a database system workload (data volume, breadth, number of users, complexity of queries), explaining its popularity for enterprise data warehousing applications. Teradata is offered on Intel servers interconnected by the BYNET messaging fabric. Teradata systems are offered with either Engenio or EMC disk arrays for database storage.
1
Teradata offers a choice of several operating systems NCR UNIX SVR4 MP-RAS, a variant of System V UNIX from AT&T Microsoft Windows 2000 and Windows Server 2003 SUSE Linux on 64-bit Intel servers has been pre-announced for 2006. Teradata Enterprise Data Warehouses are often accessed via ODBC or JDBC by applications running on operating system such as Microsoft Windows or flavors of UNIX. The warehouse typically sources data from operational systems via a combination of batch and trickle loads. The largest and most prominent customer of this DBMS is Wal-Mart, which runs its central inventory and other financial systems on Teradata. Wal-Mart's Teradata Data Warehouse is generally regarded by the DBS industry as being the largest data warehouse in the world. Other Teradata customers include companies like AT&T (formerly SBC), Dell, Continental Airlines, National Australia Bank, FedEx, Vodafone, Gap Inc, Safeway Inc, eBay and Kaiser Permanente. Teradata's main competitors are other high-end solutions such as Oracle and IBM's DB2.
3. Why Teradata? Teradata is the world's leading Enterprise Data Warehousing solutions provider . Today, more than 60% of the world's most admired global companies use Teradata technology, including: •
90% of the Top Global Telecommunications Companies
•
50% of the Top Global Retailers
•
70% of the Top Global Airlines
•
60% of the Top Global Transportation Logistics Companies
•
40% of the Top Global Commercial and Savings Banks Along with our proven, time-tested leadership in data warehousing, Teradata
offers a wide variety of solutions for Customer Relationship Management, Supply and Demand Chain Management, Financial Services, Enterprise Risk Management and much more. Add accolades and awards from Gartner, Intelligent Enterprise, DM Review and many other industry experts, and Teradata is clearly the best choice.
2
4. Advantages & Disadvantages Advantages 1) Teradata database is Linearly scalable - We can expand the database capacity by just adding more nodes to the existing database. If the data volume grows we can add more hardware and expand the database capacity. 2) Extensive parallel processing - Teradata has a extensive parallel processing capacity, it can handle multiple adhoc requests and many concurrent users. 3) Shared nothing architecture - Teradata database has shared nothing architecture, it has high fault tolerance and data protection. Single Version of Truth, Parallel aware optimizer, reduces DBA activities and Good warehouse incase of huge data. Maintain as DW is not so easy in terms of cost.
Disadvantages •
Teradata development and DBA resources are harder to come by and therefore more expensive.
•
Teradata is not as open as Oracle in Tech.
•
Maintain as DW is not so easy in terms of cost.
•
Many key tech is still under the control of Teradata PS
•
Teradata is for enterprise application, not as widely used as Oracle or Sybase
•
It is not suitable for small transaction OLTP databases It's not really a consideration for enterprise software. Teradata is designed
for very high data volumes. If you tried that on Oracle with a $20 Oracle DBA or someone just out of a training course they would be completely at sea. They just wouldn't know how to optimise it for high data volume loads and intense user query. For a database of this size the experienced DBAs for Oracle or DB2 may be as expensive as the DBAs for Teradata although probably more numerous. Companies are choosing Teradata because they perceive it to be better at high volume data warehouse work than the RDBMS products such as Oracle and DB2. Teradata have designed a DW database for fast loading of huge data volumes and fast SQL querying. It is more specialised for DW than Oracle or DB2.
3
5. Scalability
Figure - 1 Scalability "Linear scalability" means that as you add components to the system, the performance increase is linear. Adding components allows the system to accommodate increased workload without decreased throughput. Teradata was the first commercial database system to scale to and support a trillion bytes of data. The origin of the name Teradata is "tera-," which is derived from Greek and means "trillion." The chart below lists the meaning of the prefixes: Prefix
Exponent 3
Meaning
kilo-
10
1,000 (thousand)
mega-
106
1,000,000 (million)
giga-
9
1,000,000,000 (billion)
12
10
tera-
10
1,000,000,000,000 (trillion)
peta-
1015
1,000,000,000,000,000 (quadrillion)
18
exa-
10
1,000,000,000,000,000,000 (quintillion) Table : 1
Teradata can scale from 100 gigabytes to over 100 terabytes of data on a single system without losing any performance capability. Teradata's scalability provides investment protection for customer's growth and application development. Teradata is the only database that is truly scalable, and this extends to data loading with the use of parallel loading utilities. Teradata is scalable in multiple ways, including hardware, complexity, and concurrent users. Hardware
4
Growth is a fundamental goal of business. A Teradata MPP system easily accommodates that growth whenever it happens. The Teradata Database runs on highly optimized NCR servers in the following configurations: •
SMP - Symmetric multiprocessing platforms manage gigabytes of data to support an entry-level data warehousing system.
•
MPP - Massively parallel processing systems can manage hundreds of terabytes of data. You can start small with a couple of nodes, and later expand the system as your business grows.
With Teradata, you can increase the size of your system without replacing: •
Databases - When you expand your system, the data is automatically redistributed through the reconfiguration process, without manual interventions such as sorting, unloading and reloading, or partitioning.
•
Platforms - Teradata's modular structure allows you to add components to your existing system.
•
Data model - The physical and logical data models remain the same regardless of data volume.
Applications Applications you develop for Teradata configurations will continue to work as the system grows, protecting your investment in application development Complexity Teradata is adept at complex data models that satisfy the information needs throughout an enterprise. Teradata efficiently processes increasingly sophisticated business questions as users realize the value of the answers they are getting. It has the ability to perform large aggregations during query run time and can perform up to 64 joins in a single query.
Concurrent Users
5
As is proven in every benchmark Teradata performs, Teradata can handle the most concurrent users, who are often running multiple, complex queries. Teradata has the proven ability to handle from hundreds to thousands of users on the system simultaneously. Adding many concurrent users typically reduces system performance. However, adding more components can enable the system to accommodate the new users with equal or even better performance.
6. Teradata Manageability
6
One of the key benefits of Teradata is its manageability. The list of tasks that Teradata Database Administrators do not have to do is long, and illustrates why the Teradata system is so easy to manage and maintain compared to other databases. Things Teradata Database Administrators Never Have to Do Teradata DBAs never have to do the following tasks: •
Reorganize data or index space.
•
Pre-allocate table/index space and format partitioning. While it is possible to have partitioned indexes in Teradata, they are not required.
•
Pre-prepare data for loading (convert, sort, split, etc.).
•
Unload/reload data spaces due to expansion. With Teradata, the data can be redistributed on the larger configuration with no offloading and reloading required.
•
Write or run programs to split input source files into partitions for loading. With Teradata, the workload for creating a table of 100 rows is the same as
creating a table with 1,000,000,000 rows. Teradata DBAs know that if data doubles, the system can expand easily to accommodate it. Teradata provides huge cost advantages, especially when it comes to staffing Database Administrators. Customers tell us that their DBA staff requirements for administering non-Teradata databases are three to 10 times higher. How Other Databases Store Rows and Manage Data Even data distribution is not easy for most databases to do. Many databases use range distribution, which creates intensive maintenance tasks for the DBA. Others may use indexes as a way to select a small amount of data to return the answer to a query. They use them to avoid accessing the underlying tables if possible. The assumption is that the index will be smaller than the tables so they will take less time to read. Because they scan indexes and use only part of the data in the index to search for answers to a query, they can carry extra data in the indexes, duplicating data in the tables. This way they do not have to read the table at all in some cases. As you will see, this is not nearly as efficient as Teradata's method of data storage and access. Other DBAs have to ask themselves questions like: •
How should I partition the data? 7
•
How large should I make the partitions?
•
Where do I have data contention?
•
How are the users accessing the data? Many other databases require the DBAs to manually partition the data.
They might place an entire table in a single partition. The disadvantage of this approach is it creates a bottleneck for all queries against that data. It is not the most efficient way to either store or access data rows. With other databases, adding, updating and deleting data affects manual data distribution schemes thereby reducing query performance and requiring reorganization. A Teradata system provides high performance because it distributes the data evenly across the AMPs for parallel processing. No partitioning or data re-organizations are needed. With Teradata, your DBA can spend more time with users developing strategic applications to beat your competition
8
7. Unconditional Parallelism
Figure -2 Unconditional Parallelism
Teradata provides exceptional performance using parallelism to achieve a single answer faster than a non-parallel system. Parallelism uses multiple processors working together to accomplish a task quickly. An example of parallelism can be seen at an amusement park, as guests stand in line for an attraction such as a roller coaster. As the line approaches the boarding platform, it typically will split into multiple, parallel lines. That way, groups of people can step into their seats simultaneously. The line moves faster than if the guests step onto the attraction one at a time. At the biggest amusement parks, the parallel loading of the rides becomes essential to their successful operation. Parallelism is evident throughout a Teradata system, from the architecture to data loading to complex request processing. Teradata processes requests in parallel without mandatory query tuning. Teradata's parallelism does not depend on limited data quantity, column range constraints, or specialized data models -- Teradata has "unconditional parallelism."
8. Ability to Model the Business
9
Figure - 3 A data warehouse built on a business model contains information from across the enterprise. Individual departments can use their own assumptions and views of the data for analysis, yet these varying perspectives have a common basis for a "single version of the truth." With Teradata's centrally located, logical architecture, companies can get a cohesive view of their operations across functional areas to: •
Find out which divisions share customers.
•
Track products throughout the supply chain, from initial manufacture, to inventory, to sale, to delivery, to maintenance, to customer satisfaction.
•
Analyze relationships between results of different departments.
•
Determine if a customer on the phone has used the company's website.
•
Vary levels of service based on a customer's profitability. You get consistent answers from the different viewpoints above using a
single business model, not functional models for different departments. In a functional model, data is organized according to what is done with it. But what happens if users later want to do some analysis that has never been done before? When a system is optimized for one department's function, the other departments' needs (and future needs) may not be met. A Teradata system allows the data to represent a business model, with data organized according to what it represents, not how it is accessed, so it is easy to understand. The data model should be designed with regard to usage and be the same regardless of data volume. With Teradata as the enterprise data warehouse, users can ask new questions of the
10
data that were never anticipated, throughout the business cycle and even through changes in the business environment. A key Teradata strength is its ability to model the customer's business. Teradata's business modelsare truly normalized, avoiding the costly star schema and snowflake implementations that many other database vendors use. Teradata can do Star Schema and other types of relational modeling, but Third Normal Form is the methodology Teradata recommends to customers. Teradata's competitors typically implement Star Schema or Snowflake models either because they are implementing a set of known queries in a transaction processing environment, or because their architecture limits them to that type of model. Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model. Teradata supports normalized logical models because Teradata is able to perform 64 table joins and large aggregations during queries. Mature, Parallel-Aware Optimizer Teradata's Optimizer is the most robust in the industry, able to handle: •
Multiple complex queries
•
Joins per query
•
Unlimited ad-hoc processing The Optimizer is parallel-aware, meaning that it has knowledge of system
components (how many nodes, vprocs, etc.). It determines the least expensive plan (timewise) to process queries fast and in parallel. Evolution to Active Data Warehousing: Data Warehouse Usage Evolution : There is an information evolution happening in the data warehouse environment today. Changing business requirements have placed demands on data warehousing technology to do more things faster. Data warehouses have moved from back room strategic decision support systems to operational, business-critical components of the enterprise. As your company evolves in its use of the data warehouse, what you need from the data warehouse evolves, too.
11
Figure - 4 Stage 1 Reporting: The initial stage typically focuses on reporting from a single source of truth to drive decision-making across functional and/or product boundaries. Questions are usually known in advance, such as a weekly sales report. Stage 2 Analyzing: Focus on why something happened, such as why sales went down or discovering patterns in customer buying habits. Users perform ad-hoc analysis, slicing and dicing the data at a detail level, and questions are not known in advance. Stage 3 Predicting: Sophisticated analysts heavily utilize the system to leverage information to predict what will happen next in the business to proactively manage the organization's strategy. This stage requires data mining tools and building predictive models using historical detail. As an example, users can model customer demographics for target marketing. Stage 4 Operationalizing: Providing access to information for immediate decision-making in the field enters the realm of active data warehousing. Stages 1 to 3 focus on strategic decision-making within an organization. Stage 4 focuses on tactical decision support.. Tactical decision support is not focused on developing corporate strategy, but rather on supporting the people in the field who execute it. Examples: 1) Inventory management with just-in-time replenishment, 2) Scheduling and routing for package delivery. 3) Altering a campaign based on current results.
12
Stage 5 Active Warehousing: The larger the role an ADW plays in the operational aspects of decision support, the more incentive the business has to automate the decision processes. You can automate decision-making when a customer interacts with a web site. Interactive customer relationship management (CRM) on a web site or at an ATM is about making decisions to optimize the customer relationship through individualized product offers, pricing, content delivery and so on. As technology evolves, more and more decisions become executed with event-driven triggers to initiate fully automated decision processes. Example: determine the best offer for a specific customer based on a real-time event, such as a significant ATM deposit Active Data Warehouse Data warehouses are beginning to take on mission-critical roles supporting CRM, one-to-one marketing, and minute-to-minute decision-making. Data warehousing requirements have evolved to demand a decision capability that is not just oriented toward corporate staff and upper management, but actionable on a day-to-day basis. Decisions such as when to replenish Barbie dolls at a particular retail outlet may not be strategic at the level of customer segmentation or long-term pricing strategies, but when executed properly, they make a big difference to the bottom line. We refer to this capability as "tactical" decision support. Tactical decisions are the drivers for day-to-day management of the business. Businesses today want more than just strategic insight from their data warehouse implementations-they want better execution in running the business through more effective use of information for the decisions that get made thousands of times per day. The origin of the active data warehouse is the timely, integrated store of detail data available for analytic business decision-making. It is only from that source that the additional traits needed by the active data warehouse can evolve. These new "active" traits are supplemental to data warehouse functionality. For example, the work mix in the database still includes complex decision support queries, but expands to take on short, tactical queries, background data feeds, and possibly event-driven updates all at the same time. Data volumes and user concurrency levels may explode upward beyond expectation. Restraints may need to be placed on the longer, analytical queries in order to guarantee tactical work throughput. While accessing the detail data directly remains an important opportunity for analytical work, tactical work may thrive on shortcuts and summaries, such 13
as operational data store (ODS) level information. And for both strategic and tactical decisions to be useful to the business, today's data, this hour's data, even this minute's data has to be at hand. Teradata is positioned exceptionally well for stepping up to the challenges related to high availability, large multi-user workloads, and handling complex queries that are required for an active data warehouse implementation. Teradata technology supports the evolving business requirements by providing high performance and scalability for: •
Mixed workloads (both tactical and strategic queries) for mission critical applications
•
Large amounts of detail data
•
Concurrent users Teradata provides 7x24 availability and reliability, as well as continuous
updating of information so data is always fresh and accurate
14
9. Teradata Architecture
Figure: 5 Architecture of Teradata
Single Data Store
Figure: 6 Single Data Store
15
Teradata acts as a single data store, with multiple client applications making inquiries against it concurrently. Instead of replicating a database for different purposes, with Teradata you store the data once and use it for all clients. Teradata provides the same connectivity for an entry-level system as it does for a massive enterprise data warehouse.
10. Teradata System A Teradata system contains one or more nodes. A node is a term for a processing unit under the control of a single operating system. The node is where the processing occurs for the Teradata Database. There are two types of Teradata systems: •
Symmetric multiprocessing (SMP) - An SMP Teradata system has a single node that contains multiple CPUs sharing a memory pool.
•
Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a larger, MPP implementation of Teradata. The nodes are connected using the BYNET, which allows multiple virtual processors on multiple nodes to communicate with each other.
Figure - 7 Teardata System To manage a Teradata system, you use: •
SMP system: System Console (keyboard and monitor) attached directly to the SMP node
•
MPP system: Administration Workstation (AWS)
16
To access a Teradata system, a user typically logs on through one of multiple client platforms (channel-attached mainframes or network-attached workstations). Client access is discussed in the next module. Node Components A node is a basic building block of a Teradata system, and contains a large number of hardware and software components. A conceptual diagram of a node and its major components is shown below. Hardware components are shown on the left side of the node and software components are shown on the right side.
Figure – 8 Node Component
Shared Nothing Architecture The Teradata vprocs (which are the PEs and AMPs) share the components of the nodes (memory and cpu). The main component of the "shared-nothing" architecture is that each AMP manages its own dedicated portion of the system's disk space (called the vdisk) and this space is not shared with other AMPs. Each AMP uses system resources independently of the other AMPs so they can all work in parallel for high system performance overall.
17
Using the BYNET The BYNET (pronounced, "bye-net") is a high-speed interconnect (network) that enables multiple nodes in the system to communicate. It has several unique features: •
Scalable: As you add more nodes to the system, the overall network bandwidth scales linearly. This linear scalability means you can increase system size without performance penalty -- and sometimes even increase performance.
•
High performance: An MPP system typically has two BYNET networks (BYNET 0 and BYNET 1). Because both networks in a system are active, the system benefits from having full use of the aggregate bandwidth of both the networks.
•
Fault tolerant: Each network has multiple connection paths. If the BYNET detects an unusable path in either network, it will automatically reconfigure that network so all messages avoid the unusable path. Additionally, in the rare case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled and messages are rerouted to BYNET 1.
•
Load balanced: Traffic is automatically and dynamically distributed between both BYNETs.
18
11. Teradata Software Parsing Engine A Parsing Engine is a vproc that manages the dialogue between a client application and the Teradata Database, once a valid session has been established. Each PE can support a maximum of 120 sessions. AMP The AMP is a vproc that controls its portion of the data on the system. The AMPs work in parallel, each AMP managing the data rows stored on its vdisk. AMPs are involved in data distribution and data access in different ways. Channel Driver Channel Driver software is the means of communication between an application and the PEs assigned to channel-attached clients. There is one Channel Driver per node. Teradata Gateway Teradata Gateway software is the means of communication between an application and the PEs assigned to network-attached clients. There is one Teradata Gateway per node.
19
12. Teradata Warehouse Builder Teradata Warehouse Builder (TWB) is a data warehouse loading tool that enables data extraction, transformation and loading processes common to all data warehouses. Using built-in operators, Teradata Warehouse Builder combines the functionality of the Teradata utilities (FastLoad, MultiLoad, FastExport, and TPump) in a single parallel environment. Its extensible environment supports FastLoad INMODs, FastExport OUTMODs, and Access Modules to provide access to all the data sources you use today. There is a set of open APIs (Application Progammer Interface) to add third party or custom data transformation to Teradata Warehouse Builder scripts. Using multiple, parallel tasks, a single Teradata Warehouse Builder script can load data from disparate sources into the Teradata Database in the same job. Teradata Warehouse Builder is scalable and enables end-to-end parallelism. The previous versions of utilities (like FastLoad) allow you to load data into Teradata in parallel, but with a single input stream. Teradata Warehouse Builder allows you to run multiple instances of the extract, optional transformation, and load operators. You can have as many loads as you have sources in the same job. With multiple sources of data coming from multiple platforms integration is important in a parallel environment. Teradata Warehouse Builder eliminates the need for persistent storage. It stores data into data buffers so you no longer need to write data into a flat file. Since you don't need flat files, there is no longer a 2GB file limit. Teradata Warehouse Builder provides a single, SQL-like scripting language, as well as a GUI to make scripting faster and easier. You can do the extract, some transformation, and loads all in one SQL-like scripting language. Once the dynamics of the language are learned, you can perform multiple tasks with a single script. You can use script converters to convert scripts on existing systems for utilities (FastLoad, MultiLoad, FastExport, and TPump) to Teradata Warehouse Builder scripts.
20
Figure – 9 Teradata Warehouse
21
13. What Is SQL?
Figure – 10 Accessing of Teradata
Teradata is accessed using SQL (Structured Query Language). SQL is the industry standard access language for communicating with a relational database. SQL is a set-oriented language for relational database management. A user or application can use SQL statements to perform operations on the data and define how an answer set should be returned from an RDBMS. Teradata supports two types of SQL: •
ANSI SQL: Teradata SQL is compliant with ANSI standards (an industry standard).
•
Teradata SQL Extensions: NCR has added Teradata SQL extensions above and beyond standard SQL capabilities, including one-step SQL statements for complex administrative operations.
Teradata SQL Benefits Teradata SQL is the set of SQL commands used with the Teradata Database. Some benefits of Teradata SQL are: •
Parallel Execution - The Optimizer breaks up an SQL statement into tasks that can be executed in parallel to minimize resource contention. The design of the Teradata
22
Database, along with its automatic data distribution, balances the workload and reduces bottlenecks. •
ANSI Compliant - Teradata SQL is compliant with ANSI standards. If you have programs already written with ANSI-compliant SQL for a different relational database, you can run them with Teradata, as well.
•
High-Performance Extensions - NCR has added Teradata SQL extensions that are above and beyond the standard SQL capabilities, including one-step SQL statements for complex administrative operations.
User Assistance Statements and Modifiers SQL user assistance statements (and modifiers) vary widely from database vendor to database vendor. Teradata's user assistance statements are commonly called Teradata extensions. These Teradata extensions are additions to the DDL, DML, and DCL statements in standard SQL, and make some operations less time consuming. This page discusses the following Teradata SQL user assistance commands: •
HELP
•
HELP SESSION
•
SHOW
This page also discusses the statement modifier: •
EXPLAIN
The HELP Statement The HELP statement is used to display information about database objects. You can get help on the following: HELP DATABASE HELP USER HELP TABLE HELP VIEW HELP MACRO HELP TRIGGER
23
HELP PROCEDURE HELP COLUMN HELP INDEX HELP STATISTICS . . . and much more! Example: HELP DATABASE databasename Displays all the objects in the specified database. The HELP SESSION Statement Use the HELP SESSION statement to see specific information about your SQL session. Example: HELP SESSION; Displays the user name with which you logged in, the log-on date and time, your default database, and other information related to your current session. The SHOW Statement Use the SHOW statement to display the data definition language (DDL) associated with database objects (tables, views, macros, triggers, or stored procedures). You can show the DDL for the following: SHOW TABLE SHOW VIEW SHOW MACRO SHOW TRIGGER SHOW PROCEDURE SHOW JOIN INDEX
24
Example: SHOW TABLE tablename Displays the CREATE TABLE statement that was used to create the specified table. The EXPLAIN Modifier The EXPLAIN modifier allows you to preview how Teradata will execute an SQL request. It is a good way to see what database resources will be used in processing the request. Use the EXPLAIN modifier preceding any SQL statement to see a plan with: •
English text describing a plan for how the statement will be processed.
•
An estimate of the number of rows involved.
•
A relative cost of the request. The relative cost is shown in units of time, and should not be used to predict
actual response time for an SQL request. This time estimate can be used to compare the duration of request processing relative to other plans. When you execute a request preceded by the EXPLAIN modifier, the request is not executed. Instead, the system: •
Fully parses the request.
•
Optimizes the request.
•
Reports the complete plan for executing the request in readable English.
Example: EXPLAIN SELECT * FROM tablename; Displays the steps involved in processing the request, SELECT * FROM the specified table.
14. A Teradata Database 25
In Teradata systems, the words "database" and "user" have specific definitions. Database: The Teradata Definition In Teradata, a "database" provides a logical grouping of information. A Teradata Database also provides a key role in space allocation and access control. A Teradata Database is a defined, logical repository that can contain objects, including: •
Databases: A defined object that may contain a collection of Teradata objects.
•
Users: Databases that each have a logon ID and password for logging on to Teradata.
•
Tables: Two-dimensional structures of columns and rows of data stored on the disk drives.
•
Views: A virtual "window" to subsets of one or more tables or other views, predefined using a single SELECT statement.
•
Macros: Definitions of one or more Teradata SQL and report formatting commands.
•
Triggers: One or more Teradata SQL statements associated with a table and executed when specified conditions are met.
•
Stored Procedures: Combinations of procedural and non-procedural statements run using a single CALL statement
Note: A Database with no Perm Space can have views, macros, and triggers, but no tables or stored procedures. These Teradata objects are created, maintained, and deleted using SQL, and are described in further detail in this section. User: A Special Kind of Database A User can be thought of as a colllection of tables, views, macros, triggers, and stored procedures. A User is a specific type of Database, and has attributes in addition to the ones listed above: •
Logon ID
•
Password
26
So, a User is the same as a Database except that a User can actually log on to the RDBMS. To log on to Teradata, you need to specify a User to log on to (which is simply a Database with a password). You cannot log on to a Database because it has no password.
Figure :11 Creating Databases and Users In Teradata, Databases (including that special category of Databases called Users) have attributes assigned to them: •
Access Rights: Privileges that allow a User to perform operations (such as CREATE, DROP, and SELECT) against database objects. A User must have the correct access rights to a database object in order to access it.
•
Perm Space: The maximum amount of Permanent Space assigned and available to a User or Database to store tables. Unlike some other relational databases, Teradata does not physically pre-allocate Perm Space for Databases and Users when they are defined during object definition time. Only the Permanent Space limit is defined, then the space is consumed dynamically as needed. All Databases have a defined upper limit of Permanent Space.
•
Spool Space: The amount of space assigned and available to a User or Database to gather answer sets. For example, when executing a conditional query, qualifying rows are temporarily stored using Spool Space. Depending on how the system is set
27
up, a single query could temporarily use all available System Space to store its result in spool. Permanent Space not being used for tables is available for Spool Space. •
Temp Space: The amount of space used for global temporary tables, and these results remain available to the user until the session is terminated. Tables created in Temp Space will survive a restart. Permanent Space not being used for tables is available for Temp Space as well as Spool Space.
15. Protecting Data Several types of data protection are available with Teradata. Some of them are •
Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level.
•
Fallback is a Teradata feature that protects data against AMP failure. As shown later in this module, Fallback uses groups of AMPs that provide for data availability and consistency if an AMP is unavailable.
•
Locking prevents multiple users who are trying to access or change the same data simultaneously from violating data integrity. This concurrency control is implemented by locking the target data. Locks are automatically acquired during the processing of a request and released when the request is terminated. Locks may be applied at three levels: Database Locks: Apply to all tables and views in the database. Table Locks: Apply to all rows in the table. Row Hash Locks: Apply to a group of one or more rows in a table
•
Temporary locks can be placed on data to prevent multiple users from simultaneously changing it: •
Exclusive Lock
• Write
Lock
• Read
Lock
• Access
Lock
28
•
Journals can be used for specific types of data or process recovery: •
Permanent Journals
•
Recovery Journals
•
Transient Journal
Permanent Journals are an optional feature of Teradata to provide an additional level of data protection. You specify the use of Permanent Journals at the table level. It provides full-table recovery to a specific point in time. It also can reduce the need for costly and time-consuming full-table backups. Permanent Journals are tables stored on disk arrays like user data is, so they take up additional disk space on the system. The Teradata DBA (Database Administrator) maintains the Permanent Journal entries (deleting, archiving, and so on.) The Teradata Database uses Recovery Journals to automatically maintain data integrity in the case of: •
An interrupted transaction (Transient Journal)
•
An AMP failure (Down-AMP Recovery Journal) Recovery Journals are created, maintained, and purged by the system
automatically, so no DBA intervention is required. Recovery Journals are tables stored on disk arrays like user data is, so they take up additional disk space on the system. A Transient Journal maintains data integrity when in-flight transactions are interrupted (due to aborted transactions, system restarts, and so on). Data is returned to its original state after transaction failure. A Transient Journal is used during normal system operation to keep "before images" of changed rows so the data can be restored to its previous state if the transaction is not completed. This happens on each AMP as changes occur. When a transaction is started, the system automatically stores a copy of all the rows affected by the transaction in the Transient Journal until the transaction is committed (completed). Once the transaction is complete, the "before images" are purged. In the event of a transaction failure, the "before images" are reapplied to the affected tables and deleted from the journal, and the "rollback" operation is completed.
29
16. Conclusion Teradata is the future of the Data Mining. In future everyone we start using Teradata Database. Now it is costly, works are going on to reduce its cost. So, it will reach to small business people also.
Reference: 1.
www.teradata.com
2.
www.google.com
3.
www.wikepedia.com
4.
www.DatumResources.com
5.
www.headstrong.com
30
View more...
Comments