Tuesday 11 November 2014

REGEXP_REPLACE VS OTRANSLATE functions in Teradata

Like Oracle, Teradata has implemented regular expressions from Teradata 14. Since these expressions are much better than traditional functions. We can try to replace them, say OTRANSLATE can be replaced with REGEXP_REPLACE.

Below example is for your understanding.

n  Table
 CREATE   VOLATILE  TABLE Employee_Target, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
employee_no INTEGER,
dept_no NUMBER,
first_name VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
last_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
salary DECIMAL(10,2))
UNIQUE PRIMARY INDEX(employee_no)
            ON       COMMIT PRESERVE ROWS;

n  Data
INSERT INTO Employee_Target
VALUES           (1, 999, 'S22pot  ', 'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (2, 999 ,'Arfy123' ,'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (3, 50, 'M!ke', 'Larkins', 245098.00);
INSERT INTO Employee_Target
VALUES           (4, '40', 'S@ra','Wilson', 97450.75);
INSERT INTO Employee_Target
VALUES           (5, '40', 'Marsha', 'Lewis', 98453.88);

OTRANSLATE EXAMPLE

  SELECT
             employee_no
             ,CASE WHEN TRANSLATE_CHK(first_name USING LATIN_TO_UNICODE) <> 0 THEN 'UNK'
             WHEN OTRANSLATE( first_name,'' || OTRANSLATE( first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '')  = '' THEN 'UNK'
              WHEN OTRANSLATE( first_name,'' || OTRANSLATE( first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '') IS NOT NULL
              THEN OTRANSLATE( first_name,'' || OTRANSLATE( first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '')
               ELSE '' END AS first_name
  FROM Employee_Target ORDER BY 1;

REGEXP_REPLACE EXAMPLE

     SELECT
             employee_no
             ,CASE WHEN TRANSLATE_CHK(first_name USING LATIN_TO_UNICODE) <> 0 THEN 'UNK'
             WHEN REGEXP_REPLACE(first_name,'[^a-zA-Z0-9]+','',1,0,'i') ='' THEN 'UNK'
             WHEN REGEXP_REPLACE(first_name,'[^a-zA-Z0-9]+','',1,0,'i') IS NOT NULL
             THEN REGEXP_REPLACE(first_name,'[^a-zA-Z0-9]+','',1,0,'i')
              ELSE '' END AS first_name
   FROM Employee_Target ORDER BY 1;

Note:
I have added TRANSLATE_CHK function to avoid “The string contains an untranslatable character” error.

Saturday 3 May 2014

OREPLACE VS OTRANSLATE function in Teradata


OTRANSLATE
·         Returns source_string with every occurrence of each character in from_string replaced with the corresponding character in to_string.
·         OTRANSLATE is an embedded services system function. It can be activated by invoking embedded services functions
·         Expressions passed to this function must have one of the following data types: CHAR or VARCHAR

SELECT OTRANSLATE(' Current Year 2013 ',  '3', '4');

The occurrence in source_string of the character in from_string ('3') is replaced by the character in to_string ('4').

Otranslate also answers below queries:

ü  How to strip Special Characters in Teradata
ü  How strip only characters data from VARCHAR field in Teradata
ü  How to extract numbers from strings in Teradata

Let me walkthrough with some examples by having some data in volatile table

CREATE           VOLATILE  TABLE Employee_Target, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
employee_no INTEGER,
dept_no NUMBER,
first_name VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
last_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
salary DECIMAL(10,2))
UNIQUE PRIMARY INDEX(employee_no)
            ON       COMMIT PRESERVE ROWS;


INSERT INTO Employee_Target
VALUES           (1, 999, 'S22pot  ', 'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (2, 999 ,'Arfy123' ,'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (3, 50, 'M!ke', 'Larkins', 245098.00);
INSERT INTO Employee_Target
VALUES           (4, '40', 'S@ra','Wilson', 97450.75);
INSERT INTO Employee_Target
VALUES           (5, '40', 'Marsha', 'Lewis', 98453.88);


Example 1:
For stripping all special characters from the column first_name

SEL
employee_no
,first_name
 ,OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,'') AS O_first_name
 ,OTRANSLATE(first_name,'' || OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '') AS OO_first_name
 FROM    Employee_Target;                         


Example 2:
For stripping only characters by removing special characters and numeric values in first_name

SEL
employee_no
,first_name
 ,OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' ,'') AS O_first_name
 ,OTRANSLATE(first_name,'' || OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' ,''), '') AS OO_first_name
 FROM    Employee_Target;                             

Example 3:
For stripping only numeric by removing special characters and characters values in first_name

SEL
employee_no
,first_name
,OTRANSLATE(first_name,'0123456789' ,'') AS O_first_name
,OTRANSLATE(first_name,'' || OTRANSLATE(first_name,'0123456789' ,''), '') AS OO_first_name
 FROM    Employee_Target; 


OREPLACE
·         Replaces every occurrence of search_string in the source_string with the replace_string. You use this function either to replace or remove portions of a string.
·         OTRANSLATE is an embedded services system function. It can be activated by invoking embedded services functions
·         Expressions passed to this function must have one of the following data types: CHAR, VARCHAR, or CLOB
·         OREPLACE provides a superset of the functionality provided by the TRANSLATE function. TRANSLATE provides single character, 1-to-1 substitution while OREPLACE allows you to substitute 1 string for another, as well as to remove character strings.

SELECT OREPLACE('Current Year 2013 ', '2013', '2014');
The string '2013' in the source string was replaced by the string '2014'.


Some more examples

SELECT OREPLACE('This water is a pure water', 'water', 'juice');

Output is This juice is a pure juice, which means multiple words can also be replaced

Now let us see how Null values can be handled via OREPLACE


SELECT OREPLACE('Can I replace this word pass', 'pass', NULL);

SELECT OREPLACE('Can I replace this word pass', 'pass', '');


Above two queries would return as Can I replace this word, Which means you cannot substitute a NULL


Can we Replace Null like a Coalesce Function?

SELECT OREPLACE(NULL, NULL, 'UNKNOWN');

The Answer would be No, the result of above query would be Null only.

Sunday 20 October 2013

Teradata 13 VS Teradata 14

Teradata 14 is fully loaded with fantabulous features and enhancements, particularly I like the way they have compete directly with his peer by bringing the features like DENSE_RANK function, NUMBER data type etc. and other features like AUTOMATED STATISTICS MANAGEMENT,ROW-LEVEL SECUIRITY,EQUITY JOIN FOR SKEW TABLES and much more.  
Handpicked the features of Teradata 13/13.10 and Teradata 14/14.10 on developer perspective from release summaries. Hope this serves us a quick reference. 


Teradata 13
Teradata 14
1.    Collect Statistics Optimization
      a)      Improved Sampled Statistics
      i)        improves sampled statistics 
       for the number of unique values for partitioning columns
     b) Restrictions Removed on Collecting 
      Statistics
     i)        Multicolumn statistics on join and
       hash indexes
     ii)      System-derived PARTITION 
     statistics on partitioned join indexes
     iii)    Single-column, multicolumn and PARTITION statistics on volatile tables
     iv)    Statistics on unhashed tables
     v)      Sampled statistics on the partitioning columns of PPI

1. Collect Statistics Enhancements
     a)      Adding the SUMMARY option
to collect table-level statistics
     b)      Adding a number of internal
enhancements to the Optimizer
with respect to histogram structure
and use, including:
     c)       Maintaining statistics history.
     d)      Enhancing extrapolation methods
for stale statistics.
     e)      Enhancing sampling options.
     f)       Storing statistics data in their
native Teradata data types
without losing precision.

2.       Count(*) Optimization
      a)      The count function now reads 
          the cylinder index rather than 
           performing a full table scan
2. Automated Statistics 
Management
      a)      Teradata Database can manage 
       statistics collection for you, 
       simplifying database administration 
       by ensuring that the needed statistics 
       are collected at the right time.

3. Dynamic partition elimination 
   (DPE)
      a)      Enhanced Performance of 
          Unspooled PPI Merge Joins
3.       Active Fallback, Phase II
      a)      Phase II of this feature repairs
 the primary copy of a data block
from fallback when an
unrecoverableb it error occurs.
Phase I allowed the data block to be 
read from the fallback copy.

4. Group By and Distinct 
  Performance Equivalence
      a)   While using  DISTINCT, the 
      query optimization process rewrite
       it to GROUP BY at the background. 
       So both are same now
4.       Expansion by Business Days
      a)      Adds support for the following
 new business anchors: 
WEEK_BEGIN, WEEK_END,
QUARTER_BEGIN,
QUARTER_END, 
YEAR_BEGIN and YEAR_END.
5. Handling Redundant DISTINCT Detection and Removal
      a)      Internal enhancement for Distinct
5.       Encryption Enhancements
      a)      The Blowfish encryption
algorithm.
6. Implement Smart Local 
   Aggregation Decisions
      a)  This feature improves the performance 
     of many aggregate functions, such as 
     SUM, COUNT, MIN, and MAX.
7.       Hash Join Enhancements
      a)      Enhanced performance of
outer joins, inclusion and
exclusion semi joins, joins
with partition elimination,
and joins with cross terms.
7.    Increased Join/Subquery Limit
      a)      Enables More Complex Queries 
      With Larger Numbers of Joins 
      i.e. from 64 to 128.
7. Increased Maximum Number
of Vprocs
      a)      The maximum number of
 virtual processors (vprocs)
supported per Teradata Database
system has been increased from
16,384 to 30,720.
8.   Inner and Outer Join Elimination Enhancements
     a)      Enables several new inner and 
     outer join elimination performance 
     enhancements

8.  Increased Partition Limits
      a)      This feature increases
limits related to partitioned
 primary indexes (PPIs) and
their partitioning expressions.
The new limits for partitioning
expressions also apply to
column-partitioned NoPI tables 
and column-partitioned NoPI
join indexes.
9.       JI/AJI Enhancements
     a)      Produce better join plans by 
     using join indexes with Partial Group 
     By optimizations
9.       Indexes on UDT Columns
      a)      You can now create primary
and secondary indexes on most
 types of user-defined type
(UDT) columns.
10.  Non-Key Access Paths 
      Enhancements
      a)      Better use of access paths to 
        base tables and join indexes 
        improves query performance
10.   FastLoad Support for
Temporal Tables
       a)      FastLoad can now inserts
into temporal tables.
11.   Top N Enhancements
      a)      This feature extends the functionality 
     of the TOP n operator and incorporates 
     several new processing optimizations
     for both TOP n and “any n” requests.
11.   Multiple WITH/WITH 
  RECURSIVE Clauses
       a)      The number of WITH or WITH
RECURSIVE request modifiers
that can be specified with a DML
request increases from one
to any number.
12.   Implicit DateTime
      a)      Default DateTime values for the CREATE/ALTER TABLE statement
12.   NUMBER Data Type
      a)      Increased compatibility with
other databases, which include
 a similar NUMBER data type.

13.   New SHOW Access Right
       a)      The SHOW access right allows a 
       grantee to access a table definition 
      without accessing any of its data.
13.   Row-Level Security
       a)      Teradata row-level security  
       (RLC) allows you to restrict data
access by row.
Hierarchical category:
Security Classification
i)        Labels: Top Secret = 4,
Secret = 3, Classified = 2,
Unclassified = 1
Non-Hierarchical category: Country
ii)      Labels: USA = 4, UK = 3,
Germany = 2, France = 1
14.Simplified Query Capture 
     Database DBQL XML 
    Query Plan Logging
      a)   Capturing Query Capture 
       Database (QCD)-like information 
       is now less complex, enabling 
      better performance of logging query
      and workload information.
14.   Secure Password Storage
and Retrieva
a)  Passwords and other logon
string data in coded file locations
that the Teradata Wallet application
manages tdpid/username,$tdwallet
(password_alias)
or
tdpid/username,$tdwallet
15.    No Primary Index Tables
       a)      The syntax for the CREATE 
      TABLE statement has been changed 
      to permit user data tables to be created
      without a primary index. Such tables
      are referred to as NoPI
      (No Primary Index) tables.
      i)        Enhanced performance for 
      FastLoad bulk data loads into
     staging tables.
      ii)      Enhanced performance for TPump 
       Array INSERT minibatch loads into 
      staging tables

15.  SQL ARRAY/VARRAY 
      Data Type
a) ARRAY, a user-defined
 multidimensional data type 
with  up to 5 dimensions and
a user-defined maximum number
of elements, all of the same
specific data type.
b)VARRAY, an Oracle-
compatible form  of the
ARRAY data type. Unlike the
Oracle VARRAY data type, 
the Teradata VARRAY type can
be defined in multiple dimensions.
16.   Period Data Types
      a)      a Period data type value, indicates 
      when a particular event starts and ends.
16.   Temperature-Based Block-
       Level Compression
a)Teradata Database now provides
temperature-based block-level
compression (TBBLC), which
automatically compresses cold
 (infrequently accessed) data
and automatically decompresses
data when it becomes warm
(more frequently accessed).

17.   Transfer Statistics
       a)      CREATE TABLE AS statement 
       that uses the WITH DATA AND STATISTICS option.
17.   Teradata Columnar
a) Permits efficient access to
selected data, which reduces
query I/O. Adds flexibility in
defining a partitioned table or
join index.Such flexibility provides opportunities to improve
workload performance.
Enables the optimizer to exclude
unneeded column partitions,
significantly enhancing
query performance.
18.   New Teradata Reserved Words
      a.      CONNECTAR CTCONTROL                      CURRENT_ROLEAR
           CURRENT_USERAR
      b.      EXPAND EXPANDING
      c.     GETAR
     d.      RESIGNALAR
     e.      SIGNALAR
      f.      UNTIL_CHANGED
      g.    VARIANT_TYPE
      h.     XMLPLAN
18.New Teradata Reserved 
     Words
       a.       NUMBER
       b.      TD_ANYTYPE
19. New Teradata Nonreserved 
 Words
      a.  APPLNAME
      b. CLASS_ORIGINAN COMMAND_FUNCTIONAN COMMAND_FUNCTION_
      CODEAN
      c. CONDITIONAR CONDITION_IDENTIFIERAN CONDITION_NUMBERAN 
      CREATOR
     d.      DIAGNOSTICSAN DOWN              
     e.      EXCEPTION
      f.        GLOP
      g.       LDIFF+
      h.      MEETS+ MEMBERAR MESSAGE_LENGTHAN MESSAGE_TEXTAN MOREAN
      i.  NODDLTEXT NUMBERAN
      j.  OLD_NEW_TABLE OWNER
      k.  P_INTERSECT+ P_NORMALIZE+ PERIOD PRECEDES+ PRIORAN 
     l.  RDIFF+ RESET RETURNED_SQLSTATEAN ROW_COUNTAN RULES 
     RULESET
     m. SUBCLASS_ORIGINAN  
        SUCCEEDS+
     n.   THROUGH TRANSACTION_
    ACTIVEAN
     o.      XML
19.   New Teradata Nonreserved 
       Words
       a.       ANCHOR_HOUR
       b.      ANCHOR_MILLISECOND
       c.       ANCHOR_MINUTE
       d.      ANCHOR_SECOND
       e.      ARRAY
       f.        AUTO
       g.       AUTOTEMP
       h.      BLOCKCOMPRESSION
       i.        CALENDAR
       j.        EXPORTWIDTH
       k.       IPARTITION
       l.        MANUAL
       m.    MAXINTERVALS
       n.      MAXVALUELENGTH
      o.      NEVER
      p.      ORDINALITY
     q.      PARTITION#L16
      r.        QUARTER_BEGIN
      s.       QUARTER_END
       t.        RANGE#L16
       u.      ROWIDGEN
       v.       ROWIDGEN2
       w.     STATSUSAGE
       x.       VARRAY
       y.       WEEK_BEGIN
       z.       WEEK_END
       aa.   YEAR_BEGIN
       bb.  YEAR_END
20.   Obsolete Teradata Tools and 
     Utilities Products
Obsolete 
Software 
Product
Replacement Product
Teradata 
Multitool
Teradata 
Database 
command line utilities
PMON
Teradata 
Viewpoint – 
Various system management 
portlets
Teradata 
Dynamic 
Workload 
Manager
Teradata 
Viewpoint – Workload 
Designer portlet
Teradata 
Manager
Teradata 
Viewpoint – 
Various system management 
portlets
Teradata SQL Assistant Web Edition
Teradata SQL Assistant
20.   Equality Joins on Skewed 
        Tables
Teradata Database uses a new
hybrid join method for joins with
equality join conditions where one
or both sources are skewed.
The hybrid join method is called
Partial Redistribution and Partial
Duplication (PRPD).
Teradata Database splits the
sources of the join into separate
spools, performs several regular
 joins between those separate spools
and combines the join results. The
number of split joins is either 2
(if only one source is skewed) or
3 (if both sources are skewed on
different values).
PRPD can be used even if the
join columns on both tables have
more than one skewed value.
The Optimizer chooses the PRPD
method only if it is less costly
than other join methods.

21.    
21.   Algorithmic and Multi-Value Compression Enhancements

Multi-Value Compression 
(MVC)
now compresses table columns 
defined  with the following
data types:
      a.       TIME and TIME WITH 
      TIME ZONE
      b.      TIMESTAMP and TIMESTAMP
WITH TIME ZONE
Algorithmic Compression 
(ALC)
now allows you to define 
your own algorithms to compress
 and decompress columns with the following data types:
      a.       TIME and TIME WITH 
      TIME ZONE
      b.      TIMESTAMP and TIMESTAMP
WITH TIME ZONE
      c.       BLOB/CLOB
      d.      Distinct LOB-type UDTs,
with some restrictions
22.    
22.   Block Level Compression Enhancements
       a)      Finer control over the types of
tables that are compressed.
       b)      Single-command compression of
all qualifying tables in a database.
Improved performance from 
compressing only the fallback
data subtables.
23.    
23.   Derived Periods
You can now define a derived 
period column using two
DateTime columns.

24.    
24.   1 MB Data Blocks
      /1 MB Spool Rows
Teradata Database creates and
stores data blocks up to 1 MB 
for most systems. The prior
maximum data block size was
128 KB. 

The maximum size of an internal
spool row is 1 MB. The prior 
maximum size was 64 KB.

25.    
25.   Importing from an 
      Oracle Database
Provides dynamic access to
Oracle data Allows customers
to continue linking to related
Oracle databases for data after
migrating their data warehouse
to Teradata
26.    
26.   New Ordered Analytical and Aggregate Functions
CUME_DIST calculates the 
cumulative distribution of a
value in a group of values.
DENSE_RANK returns the 
rank of a  row in an ordered
group of rows according to the
value of one or more columns,
assigning the next rank value
to the next unique value in the
 face of ties. Rank values are
not skipped in the event of ties.
FIRST_VALUE returns the first 
row from a partition.
LAST_VALUE returns the last 
value from a partition.

Teradata Database 14.10 
supports the following new 
aggregate functions:
PERCENTILE_CONT takes
a percentile value and a sort
specification and returns an
interpolated value that would fall
into that percentile value
(an inverse distribution function).
PERCENTILE_DISC takes a
percentile value and a sort
specification and returns an
element from the set
(an inverse distribution function).
MEDIAN is a specific case of
 PERCENTILE_CONT where 
the percentile value is 0.5.
27.    
27.   Sort Ordering of NULL 
       Values
you can specify whether NULL 
values sort first or last in:
You have more control and 
flexibility with respect to sort
ordering of  NULL values.
28.    
28.   Update-Delete-Insert 
       Counts
Tracks the number of insert, 
delete, and update operations
performed on tables in Teradata
Database to better enable the
Optimizer to estimate cardinalities.
29.    
29.   Teradata Intelligent Memory
The permanent table data used
most often now remains in-memory
for faster access. The data stays in
VERYHOT cache until other data
 is used more often and replaces
it in the cache.
Teradata Virtual Storage (TVS)
automatically determines the hottest
data.

Blog is open for your comments and suggestions!!

Tuesday 11 November 2014

REGEXP_REPLACE VS OTRANSLATE functions in Teradata

Like Oracle, Teradata has implemented regular expressions from Teradata 14. Since these expressions are much better than traditional functions. We can try to replace them, say OTRANSLATE can be replaced with REGEXP_REPLACE.

Below example is for your understanding.

n  Table
 CREATE   VOLATILE  TABLE Employee_Target, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
employee_no INTEGER,
dept_no NUMBER,
first_name VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
last_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
salary DECIMAL(10,2))
UNIQUE PRIMARY INDEX(employee_no)
            ON       COMMIT PRESERVE ROWS;

n  Data
INSERT INTO Employee_Target
VALUES           (1, 999, 'S22pot  ', 'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (2, 999 ,'Arfy123' ,'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (3, 50, 'M!ke', 'Larkins', 245098.00);
INSERT INTO Employee_Target
VALUES           (4, '40', 'S@ra','Wilson', 97450.75);
INSERT INTO Employee_Target
VALUES           (5, '40', 'Marsha', 'Lewis', 98453.88);

OTRANSLATE EXAMPLE

  SELECT
             employee_no
             ,CASE WHEN TRANSLATE_CHK(first_name USING LATIN_TO_UNICODE) <> 0 THEN 'UNK'
             WHEN OTRANSLATE( first_name,'' || OTRANSLATE( first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '')  = '' THEN 'UNK'
              WHEN OTRANSLATE( first_name,'' || OTRANSLATE( first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '') IS NOT NULL
              THEN OTRANSLATE( first_name,'' || OTRANSLATE( first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '')
               ELSE '' END AS first_name
  FROM Employee_Target ORDER BY 1;

REGEXP_REPLACE EXAMPLE

     SELECT
             employee_no
             ,CASE WHEN TRANSLATE_CHK(first_name USING LATIN_TO_UNICODE) <> 0 THEN 'UNK'
             WHEN REGEXP_REPLACE(first_name,'[^a-zA-Z0-9]+','',1,0,'i') ='' THEN 'UNK'
             WHEN REGEXP_REPLACE(first_name,'[^a-zA-Z0-9]+','',1,0,'i') IS NOT NULL
             THEN REGEXP_REPLACE(first_name,'[^a-zA-Z0-9]+','',1,0,'i')
              ELSE '' END AS first_name
   FROM Employee_Target ORDER BY 1;

Note:
I have added TRANSLATE_CHK function to avoid “The string contains an untranslatable character” error.

Saturday 3 May 2014

OREPLACE VS OTRANSLATE function in Teradata


OTRANSLATE
·         Returns source_string with every occurrence of each character in from_string replaced with the corresponding character in to_string.
·         OTRANSLATE is an embedded services system function. It can be activated by invoking embedded services functions
·         Expressions passed to this function must have one of the following data types: CHAR or VARCHAR

SELECT OTRANSLATE(' Current Year 2013 ',  '3', '4');

The occurrence in source_string of the character in from_string ('3') is replaced by the character in to_string ('4').

Otranslate also answers below queries:

ü  How to strip Special Characters in Teradata
ü  How strip only characters data from VARCHAR field in Teradata
ü  How to extract numbers from strings in Teradata

Let me walkthrough with some examples by having some data in volatile table

CREATE           VOLATILE  TABLE Employee_Target, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
employee_no INTEGER,
dept_no NUMBER,
first_name VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
last_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC,
salary DECIMAL(10,2))
UNIQUE PRIMARY INDEX(employee_no)
            ON       COMMIT PRESERVE ROWS;


INSERT INTO Employee_Target
VALUES           (1, 999, 'S22pot  ', 'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (2, 999 ,'Arfy123' ,'Paul', NULL);
INSERT INTO Employee_Target
VALUES           (3, 50, 'M!ke', 'Larkins', 245098.00);
INSERT INTO Employee_Target
VALUES           (4, '40', 'S@ra','Wilson', 97450.75);
INSERT INTO Employee_Target
VALUES           (5, '40', 'Marsha', 'Lewis', 98453.88);


Example 1:
For stripping all special characters from the column first_name

SEL
employee_no
,first_name
 ,OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,'') AS O_first_name
 ,OTRANSLATE(first_name,'' || OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ,''), '') AS OO_first_name
 FROM    Employee_Target;                         


Example 2:
For stripping only characters by removing special characters and numeric values in first_name

SEL
employee_no
,first_name
 ,OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' ,'') AS O_first_name
 ,OTRANSLATE(first_name,'' || OTRANSLATE(first_name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' ,''), '') AS OO_first_name
 FROM    Employee_Target;                             

Example 3:
For stripping only numeric by removing special characters and characters values in first_name

SEL
employee_no
,first_name
,OTRANSLATE(first_name,'0123456789' ,'') AS O_first_name
,OTRANSLATE(first_name,'' || OTRANSLATE(first_name,'0123456789' ,''), '') AS OO_first_name
 FROM    Employee_Target; 


OREPLACE
·         Replaces every occurrence of search_string in the source_string with the replace_string. You use this function either to replace or remove portions of a string.
·         OTRANSLATE is an embedded services system function. It can be activated by invoking embedded services functions
·         Expressions passed to this function must have one of the following data types: CHAR, VARCHAR, or CLOB
·         OREPLACE provides a superset of the functionality provided by the TRANSLATE function. TRANSLATE provides single character, 1-to-1 substitution while OREPLACE allows you to substitute 1 string for another, as well as to remove character strings.

SELECT OREPLACE('Current Year 2013 ', '2013', '2014');
The string '2013' in the source string was replaced by the string '2014'.


Some more examples

SELECT OREPLACE('This water is a pure water', 'water', 'juice');

Output is This juice is a pure juice, which means multiple words can also be replaced

Now let us see how Null values can be handled via OREPLACE


SELECT OREPLACE('Can I replace this word pass', 'pass', NULL);

SELECT OREPLACE('Can I replace this word pass', 'pass', '');


Above two queries would return as Can I replace this word, Which means you cannot substitute a NULL


Can we Replace Null like a Coalesce Function?

SELECT OREPLACE(NULL, NULL, 'UNKNOWN');

The Answer would be No, the result of above query would be Null only.

Sunday 20 October 2013

Teradata 13 VS Teradata 14

Teradata 14 is fully loaded with fantabulous features and enhancements, particularly I like the way they have compete directly with his peer by bringing the features like DENSE_RANK function, NUMBER data type etc. and other features like AUTOMATED STATISTICS MANAGEMENT,ROW-LEVEL SECUIRITY,EQUITY JOIN FOR SKEW TABLES and much more.  
Handpicked the features of Teradata 13/13.10 and Teradata 14/14.10 on developer perspective from release summaries. Hope this serves us a quick reference. 


Teradata 13
Teradata 14
1.    Collect Statistics Optimization
      a)      Improved Sampled Statistics
      i)        improves sampled statistics 
       for the number of unique values for partitioning columns
     b) Restrictions Removed on Collecting 
      Statistics
     i)        Multicolumn statistics on join and
       hash indexes
     ii)      System-derived PARTITION 
     statistics on partitioned join indexes
     iii)    Single-column, multicolumn and PARTITION statistics on volatile tables
     iv)    Statistics on unhashed tables
     v)      Sampled statistics on the partitioning columns of PPI

1. Collect Statistics Enhancements
     a)      Adding the SUMMARY option
to collect table-level statistics
     b)      Adding a number of internal
enhancements to the Optimizer
with respect to histogram structure
and use, including:
     c)       Maintaining statistics history.
     d)      Enhancing extrapolation methods
for stale statistics.
     e)      Enhancing sampling options.
     f)       Storing statistics data in their
native Teradata data types
without losing precision.

2.       Count(*) Optimization
      a)      The count function now reads 
          the cylinder index rather than 
           performing a full table scan
2. Automated Statistics 
Management
      a)      Teradata Database can manage 
       statistics collection for you, 
       simplifying database administration 
       by ensuring that the needed statistics 
       are collected at the right time.

3. Dynamic partition elimination 
   (DPE)
      a)      Enhanced Performance of 
          Unspooled PPI Merge Joins
3.       Active Fallback, Phase II
      a)      Phase II of this feature repairs
 the primary copy of a data block
from fallback when an
unrecoverableb it error occurs.
Phase I allowed the data block to be 
read from the fallback copy.

4. Group By and Distinct 
  Performance Equivalence
      a)   While using  DISTINCT, the 
      query optimization process rewrite
       it to GROUP BY at the background. 
       So both are same now
4.       Expansion by Business Days
      a)      Adds support for the following
 new business anchors: 
WEEK_BEGIN, WEEK_END,
QUARTER_BEGIN,
QUARTER_END, 
YEAR_BEGIN and YEAR_END.
5. Handling Redundant DISTINCT Detection and Removal
      a)      Internal enhancement for Distinct
5.       Encryption Enhancements
      a)      The Blowfish encryption
algorithm.
6. Implement Smart Local 
   Aggregation Decisions
      a)  This feature improves the performance 
     of many aggregate functions, such as 
     SUM, COUNT, MIN, and MAX.
7.       Hash Join Enhancements
      a)      Enhanced performance of
outer joins, inclusion and
exclusion semi joins, joins
with partition elimination,
and joins with cross terms.
7.    Increased Join/Subquery Limit
      a)      Enables More Complex Queries 
      With Larger Numbers of Joins 
      i.e. from 64 to 128.
7. Increased Maximum Number
of Vprocs
      a)      The maximum number of
 virtual processors (vprocs)
supported per Teradata Database
system has been increased from
16,384 to 30,720.
8.   Inner and Outer Join Elimination Enhancements
     a)      Enables several new inner and 
     outer join elimination performance 
     enhancements

8.  Increased Partition Limits
      a)      This feature increases
limits related to partitioned
 primary indexes (PPIs) and
their partitioning expressions.
The new limits for partitioning
expressions also apply to
column-partitioned NoPI tables 
and column-partitioned NoPI
join indexes.
9.       JI/AJI Enhancements
     a)      Produce better join plans by 
     using join indexes with Partial Group 
     By optimizations
9.       Indexes on UDT Columns
      a)      You can now create primary
and secondary indexes on most
 types of user-defined type
(UDT) columns.
10.  Non-Key Access Paths 
      Enhancements
      a)      Better use of access paths to 
        base tables and join indexes 
        improves query performance
10.   FastLoad Support for
Temporal Tables
       a)      FastLoad can now inserts
into temporal tables.
11.   Top N Enhancements
      a)      This feature extends the functionality 
     of the TOP n operator and incorporates 
     several new processing optimizations
     for both TOP n and “any n” requests.
11.   Multiple WITH/WITH 
  RECURSIVE Clauses
       a)      The number of WITH or WITH
RECURSIVE request modifiers
that can be specified with a DML
request increases from one
to any number.
12.   Implicit DateTime
      a)      Default DateTime values for the CREATE/ALTER TABLE statement
12.   NUMBER Data Type
      a)      Increased compatibility with
other databases, which include
 a similar NUMBER data type.

13.   New SHOW Access Right
       a)      The SHOW access right allows a 
       grantee to access a table definition 
      without accessing any of its data.
13.   Row-Level Security
       a)      Teradata row-level security  
       (RLC) allows you to restrict data
access by row.
Hierarchical category:
Security Classification
i)        Labels: Top Secret = 4,
Secret = 3, Classified = 2,
Unclassified = 1
Non-Hierarchical category: Country
ii)      Labels: USA = 4, UK = 3,
Germany = 2, France = 1
14.Simplified Query Capture 
     Database DBQL XML 
    Query Plan Logging
      a)   Capturing Query Capture 
       Database (QCD)-like information 
       is now less complex, enabling 
      better performance of logging query
      and workload information.
14.   Secure Password Storage
and Retrieva
a)  Passwords and other logon
string data in coded file locations
that the Teradata Wallet application
manages tdpid/username,$tdwallet
(password_alias)
or
tdpid/username,$tdwallet
15.    No Primary Index Tables
       a)      The syntax for the CREATE 
      TABLE statement has been changed 
      to permit user data tables to be created
      without a primary index. Such tables
      are referred to as NoPI
      (No Primary Index) tables.
      i)        Enhanced performance for 
      FastLoad bulk data loads into
     staging tables.
      ii)      Enhanced performance for TPump 
       Array INSERT minibatch loads into 
      staging tables

15.  SQL ARRAY/VARRAY 
      Data Type
a) ARRAY, a user-defined
 multidimensional data type 
with  up to 5 dimensions and
a user-defined maximum number
of elements, all of the same
specific data type.
b)VARRAY, an Oracle-
compatible form  of the
ARRAY data type. Unlike the
Oracle VARRAY data type, 
the Teradata VARRAY type can
be defined in multiple dimensions.
16.   Period Data Types
      a)      a Period data type value, indicates 
      when a particular event starts and ends.
16.   Temperature-Based Block-
       Level Compression
a)Teradata Database now provides
temperature-based block-level
compression (TBBLC), which
automatically compresses cold
 (infrequently accessed) data
and automatically decompresses
data when it becomes warm
(more frequently accessed).

17.   Transfer Statistics
       a)      CREATE TABLE AS statement 
       that uses the WITH DATA AND STATISTICS option.
17.   Teradata Columnar
a) Permits efficient access to
selected data, which reduces
query I/O. Adds flexibility in
defining a partitioned table or
join index.Such flexibility provides opportunities to improve
workload performance.
Enables the optimizer to exclude
unneeded column partitions,
significantly enhancing
query performance.
18.   New Teradata Reserved Words
      a.      CONNECTAR CTCONTROL                      CURRENT_ROLEAR
           CURRENT_USERAR
      b.      EXPAND EXPANDING
      c.     GETAR
     d.      RESIGNALAR
     e.      SIGNALAR
      f.      UNTIL_CHANGED
      g.    VARIANT_TYPE
      h.     XMLPLAN
18.New Teradata Reserved 
     Words
       a.       NUMBER
       b.      TD_ANYTYPE
19. New Teradata Nonreserved 
 Words
      a.  APPLNAME
      b. CLASS_ORIGINAN COMMAND_FUNCTIONAN COMMAND_FUNCTION_
      CODEAN
      c. CONDITIONAR CONDITION_IDENTIFIERAN CONDITION_NUMBERAN 
      CREATOR
     d.      DIAGNOSTICSAN DOWN              
     e.      EXCEPTION
      f.        GLOP
      g.       LDIFF+
      h.      MEETS+ MEMBERAR MESSAGE_LENGTHAN MESSAGE_TEXTAN MOREAN
      i.  NODDLTEXT NUMBERAN
      j.  OLD_NEW_TABLE OWNER
      k.  P_INTERSECT+ P_NORMALIZE+ PERIOD PRECEDES+ PRIORAN 
     l.  RDIFF+ RESET RETURNED_SQLSTATEAN ROW_COUNTAN RULES 
     RULESET
     m. SUBCLASS_ORIGINAN  
        SUCCEEDS+
     n.   THROUGH TRANSACTION_
    ACTIVEAN
     o.      XML
19.   New Teradata Nonreserved 
       Words
       a.       ANCHOR_HOUR
       b.      ANCHOR_MILLISECOND
       c.       ANCHOR_MINUTE
       d.      ANCHOR_SECOND
       e.      ARRAY
       f.        AUTO
       g.       AUTOTEMP
       h.      BLOCKCOMPRESSION
       i.        CALENDAR
       j.        EXPORTWIDTH
       k.       IPARTITION
       l.        MANUAL
       m.    MAXINTERVALS
       n.      MAXVALUELENGTH
      o.      NEVER
      p.      ORDINALITY
     q.      PARTITION#L16
      r.        QUARTER_BEGIN
      s.       QUARTER_END
       t.        RANGE#L16
       u.      ROWIDGEN
       v.       ROWIDGEN2
       w.     STATSUSAGE
       x.       VARRAY
       y.       WEEK_BEGIN
       z.       WEEK_END
       aa.   YEAR_BEGIN
       bb.  YEAR_END
20.   Obsolete Teradata Tools and 
     Utilities Products
Obsolete 
Software 
Product
Replacement Product
Teradata 
Multitool
Teradata 
Database 
command line utilities
PMON
Teradata 
Viewpoint – 
Various system management 
portlets
Teradata 
Dynamic 
Workload 
Manager
Teradata 
Viewpoint – Workload 
Designer portlet
Teradata 
Manager
Teradata 
Viewpoint – 
Various system management 
portlets
Teradata SQL Assistant Web Edition
Teradata SQL Assistant
20.   Equality Joins on Skewed 
        Tables
Teradata Database uses a new
hybrid join method for joins with
equality join conditions where one
or both sources are skewed.
The hybrid join method is called
Partial Redistribution and Partial
Duplication (PRPD).
Teradata Database splits the
sources of the join into separate
spools, performs several regular
 joins between those separate spools
and combines the join results. The
number of split joins is either 2
(if only one source is skewed) or
3 (if both sources are skewed on
different values).
PRPD can be used even if the
join columns on both tables have
more than one skewed value.
The Optimizer chooses the PRPD
method only if it is less costly
than other join methods.

21.    
21.   Algorithmic and Multi-Value Compression Enhancements

Multi-Value Compression 
(MVC)
now compresses table columns 
defined  with the following
data types:
      a.       TIME and TIME WITH 
      TIME ZONE
      b.      TIMESTAMP and TIMESTAMP
WITH TIME ZONE
Algorithmic Compression 
(ALC)
now allows you to define 
your own algorithms to compress
 and decompress columns with the following data types:
      a.       TIME and TIME WITH 
      TIME ZONE
      b.      TIMESTAMP and TIMESTAMP
WITH TIME ZONE
      c.       BLOB/CLOB
      d.      Distinct LOB-type UDTs,
with some restrictions
22.    
22.   Block Level Compression Enhancements
       a)      Finer control over the types of
tables that are compressed.
       b)      Single-command compression of
all qualifying tables in a database.
Improved performance from 
compressing only the fallback
data subtables.
23.    
23.   Derived Periods
You can now define a derived 
period column using two
DateTime columns.

24.    
24.   1 MB Data Blocks
      /1 MB Spool Rows
Teradata Database creates and
stores data blocks up to 1 MB 
for most systems. The prior
maximum data block size was
128 KB. 

The maximum size of an internal
spool row is 1 MB. The prior 
maximum size was 64 KB.

25.    
25.   Importing from an 
      Oracle Database
Provides dynamic access to
Oracle data Allows customers
to continue linking to related
Oracle databases for data after
migrating their data warehouse
to Teradata
26.    
26.   New Ordered Analytical and Aggregate Functions
CUME_DIST calculates the 
cumulative distribution of a
value in a group of values.
DENSE_RANK returns the 
rank of a  row in an ordered
group of rows according to the
value of one or more columns,
assigning the next rank value
to the next unique value in the
 face of ties. Rank values are
not skipped in the event of ties.
FIRST_VALUE returns the first 
row from a partition.
LAST_VALUE returns the last 
value from a partition.

Teradata Database 14.10 
supports the following new 
aggregate functions:
PERCENTILE_CONT takes
a percentile value and a sort
specification and returns an
interpolated value that would fall
into that percentile value
(an inverse distribution function).
PERCENTILE_DISC takes a
percentile value and a sort
specification and returns an
element from the set
(an inverse distribution function).
MEDIAN is a specific case of
 PERCENTILE_CONT where 
the percentile value is 0.5.
27.    
27.   Sort Ordering of NULL 
       Values
you can specify whether NULL 
values sort first or last in:
You have more control and 
flexibility with respect to sort
ordering of  NULL values.
28.    
28.   Update-Delete-Insert 
       Counts
Tracks the number of insert, 
delete, and update operations
performed on tables in Teradata
Database to better enable the
Optimizer to estimate cardinalities.
29.    
29.   Teradata Intelligent Memory
The permanent table data used
most often now remains in-memory
for faster access. The data stays in
VERYHOT cache until other data
 is used more often and replaces
it in the cache.
Teradata Virtual Storage (TVS)
automatically determines the hottest
data.

Blog is open for your comments and suggestions!!