The data-hashing function

Before the data masking functions were supported, the built-in hash functions, available since Hive v1.3.0, served as an alternative. A hash function reads an input string and produces a fixed-size hexadecimal output string. Because the output almost always maps uniquely to the input (the chance of a collision is very small), hashed values are often used to secure columns that serve as unique identifiers for joining or comparing data. Built-in functions such as md5(...), sha1(...), and sha2(...) can be used for data hashing in HQL:

> SELECT
>   name,
>   md5(name) as md5_name,        -- 128 bit
>   sha1(name) as sha1_name,      -- 160 bit
>   sha2(name, 256) as sha2_name  -- 256 bit
> FROM employee;
+---------+----------------------------------+
| name    | md5_name ...
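The two properties described above, a fixed-size output and a deterministic mapping that keeps hashed identifiers joinable, can be checked outside Hive. The following is a minimal sketch using Python's standard hashlib module rather than Hive itself; it assumes UTF-8 encoded strings, and for the same input its hex digests are expected to correspond to md5(...), sha1(...), and sha2(..., 256). The helper hash_column and the sample tables are hypothetical and only illustrate the idea.

```python
import hashlib


def hash_column(value: str) -> dict:
    """Hypothetical helper: hex digests analogous to Hive's md5, sha1, and sha2(.., 256)."""
    data = value.encode("utf-8")  # assumption: strings are hashed as UTF-8 bytes
    return {
        "md5": hashlib.md5(data).hexdigest(),          # 128 bit -> 32 hex characters
        "sha1": hashlib.sha1(data).hexdigest(),        # 160 bit -> 40 hex characters
        "sha2_256": hashlib.sha256(data).hexdigest(),  # 256 bit -> 64 hex characters
    }


def sha256_key(key: str) -> str:
    """Replace a raw identifier with its SHA-256 hex digest."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Hypothetical tables keyed by the same identifier (not the book's employee table).
departments = {"alice": "sales", "bob": "engineering"}
salaries = {"alice": 5000, "carol": 7000}

# Hash the identifiers on both sides before the data is shared.
departments_hashed = {sha256_key(k): v for k, v in departments.items()}
salaries_hashed = {sha256_key(k): v for k, v in salaries.items()}

# Hashing is deterministic, so equal identifiers still match after hashing.
joined = {
    h: (departments_hashed[h], salaries_hashed[h])
    for h in departments_hashed.keys() & salaries_hashed.keys()
}

if __name__ == "__main__":
    for algorithm, digest in hash_column("alice").items():
        print(algorithm, len(digest), digest)
    print(joined)
```

The output length depends only on the algorithm, never on the input, and the join still finds the one shared identifier even though neither side exposes the raw value.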
