Tuesday, May 12, 2015

UUID and its implementation

An UUID is a Universally Unique ID meaning that it will never appear anywhere else in the world. For the requirement  of  the IDs to be universally unique, UUIDs may be our only choice. We only need this if we're sharing our data publicly.

While we use UUIDs, we  store them as a number and not as a string. We don't mean inbinary format. We mean as a 128 bit number, rather than a 288 bit string. For instance, the word 'hello' in ASCII is 68 65 6C 6C 6F, which is the number 448,378,203,247. Storing the string '68656C6C6F' requires 10 bytes. The number 448,378,203,247 requires only 5. All in all, unless you really need the first U in UUID, you can't do much better than auto_increment If you have millions of records, then the saving in storage space will improve the performance . If our IDs do not need to be universally unique, then we can use auto_increment, which guarantees that IDs will be unique within a table (since the value will increment each time).

in Java to generate UUID we can follow the below snippet:

import java.util.UUID;

public class GenerateUUID {
  
  public static final void main(String... aArgs){
    //generate random UUIDs
    UUID idOne = UUID.randomUUID();
    UUID idTwo = UUID.randomUUID();
    log("UUID One: " + idOne);
    log("UUID Two: " + idTwo);
  }
  
  private static void log(Object aObject){
    System.out.println( String.valueOf(aObject) );
  }



Example run:
>java -cp . GenerateUUID
UUID One: 067e6162-3b6f-4ae2-a171-2470b63dff00 
UUID Two: 54947df8-0e9e-4471-a2f9-9af509fb5889


Since UUID is 128 bits and is written as hexadecimal, it's very easy to speed up and store the UUID.For that first, we need to use our programming language for removing the dashes

From 067e6162-3b6f-4ae2-a171-2470b63dff00 to 067e61623b6f4ae2a1712470b63dff00.
Now it's 32 chars (like an MD5 hash, which this also works with).
Since the binary format requires 1 bit to store what hexadecimal does in 4 bits, we will create a BINARY(16) field.

In mySql we can insert using:
INSERT INTO Table (FieldBin) VALUES (UNHEX("067e61623b6f4ae2a1712470b63dff00"))
and query using:
SELECT HEX(FieldBin) AS FieldBin FROM Table

Now in our programming language, we need to re-insert the dashes at the positions 9, 14, 19 and 24 to match your original UUID. If the positions are always different you could store that info in a second field.
Full example :

CREATE TABLE  `test_table` (
    `field_binary` BINARY( 16 ) NULL ,
    PRIMARY KEY (  `field_binary` )
) ENGINE = INNODB ;

INSERT INTO  `test_table` (
    `field_binary`
)
VALUES (
    UNHEX(  '067e61623b6f4ae2a1712470b63dff00' )
);

SELECT HEX(field_binary) AS field_binary FROM `test_table`

If you want to use this technique with any hex string, always do length / 2 for the field length. So for a sha512, the field would be BINARY (64) since a sha512 encoding is 128 characters long.

No comments: