# Base Convert

Base conversion is a very basic skill in programming.

I was taught to convert between **decimal** (*base-10*) and **binary** (*base-2*), **decimal** and **octal** (*base-8*), **decimal** and **hexadecimal** (*base-16*), and of course directly between power-of-two bases (*pow(2,n)*).

The conversion is quite simple: repeated **division** with remainder is all we need to write a number in a new base, and multiplication plus addition is all we need to read it back.

If we want to convert a *base-m* number to a *base-n* number by hand, the common way is:

1. Convert the *base-m* number **A** to a *base-10* number **B**;

2. Convert the *base-10* number **B** to a *base-n* number **C**.

We go through base-10 only because it is the base most familiar to humans.
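The two steps above can be sketched in Python. This is a minimal illustration, not a library: the names `to_int`, `from_int` and `convert` are made up for this post, it only handles non-negative numbers, and it assumes the lowercase digit set `0-9a-z` (so bases up to 36).

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_int(text: str, base: int) -> int:
    """Step 1: read a base-m string into a plain integer (multiply and add)."""
    value = 0
    for ch in text.lower():
        digit = DIGITS.index(ch)  # raises ValueError for unknown characters
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + digit
    return value

def from_int(value: int, base: int) -> str:
    """Step 2: write an integer in base n by repeated division with remainder."""
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, remainder = divmod(value, base)
        out.append(DIGITS[remainder])  # remainders come out least significant first
    return "".join(reversed(out))

def convert(text: str, src_base: int, dst_base: int) -> str:
    return from_int(to_int(text, src_base), dst_base)

print(convert("ff", 16, 2))           # 11111111
print(convert("4294967295", 10, 36))  # 1z141z3
```

The intermediate value here is a machine integer rather than a decimal string, which plays the same role as the base-10 number **B** in the hand method.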

If you’ve learnt HBase, which is a database system built on Apache Hadoop, you must know that RowKey design is more than an art. You may have noticed this advice in the HBase documentation:

> 6.3.2.3. Rowkey Length
>
> Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan). A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs when designing rowkeys.

For example, suppose I have to use a uid, which is an unsigned 32-bit int, in my rowkey. Written as a decimal string, the value [0, 4294967295] is anywhere from 1 to 10 bytes long. So if I need uids to sort numerically, I have to zero-pad them to a fixed width:

```
sprintf("%010d", uid)
```

The *base-10* **4294967295** is **1z141z3** in *base-36*, so we can *compress* the key with a base-10 to base-36 conversion. After converting, I just need to zero-pad the base-36 string to 7 characters:

```
sprintf("%07s", uid_base36)  // uid_base36: uid already converted to a base-36 string
```
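Here is a rough Python sketch of the same idea, to check that the shorter key still sorts correctly. The helper names `base36` and `uid_key` are made up for this example, and it assumes lowercase digits `0-9a-z`, whose byte order matches their numeric order.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def base36(value: int) -> str:
    """Write a non-negative integer in base 36 by repeated division."""
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, remainder = divmod(value, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

def uid_key(uid: int) -> str:
    """Fixed-width, zero-padded base-36 key for an unsigned 32-bit uid."""
    if not 0 <= uid <= 0xFFFFFFFF:
        raise ValueError("uid must fit in an unsigned 32-bit int")
    return base36(uid).rjust(7, "0")

uids = [4294967295, 0, 36, 35, 1000000]
keys = [uid_key(u) for u in uids]

print(uid_key(4294967295))  # 1z141z3
print(uid_key(35))          # 000000z

# Byte-wise string order matches numeric order because '0'-'9' sort before 'a'-'z'.
assert sorted(keys) == [uid_key(u) for u in sorted(uids)]
```

The key shrinks from 10 bytes to 7. The padding still matters: without a fixed width, `z` (35) would sort after `10` (36).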

To understand base conversion, we just need to treat the data as a **number**, with the leftmost digit as the most significant (**MSB**) and the rightmost digit as the least significant (**LSB**).

# Base Encode

Base64 is a very common scheme for encoding binary data as printable, human-readable characters.

VMware uses a Base32-like encoding (with reversed bit order) in its license algorithm.

And Base16 is another name of **Hexadecimal**.

These three can be found in RFC 4648.

In the standard **Base16**/**Base32**/**Base64**, a string like ‘*Man*’ is ‘*01001101 01100001 01101110*’ in binary, with each byte written MSB first:

for **Base64**, the bits are grouped as ‘*010011 010110 000101 101110*’;

for **Base32**, the bits are grouped as ‘*01001 10101 10000 10110 1110*’; the last group has only 4 bits, which is less than 5, so it is padded with a 0 at the end to ‘*11100*’;

for **Base16**, the bits are grouped as ‘*0100 1101 0110 0001 0110 1110*’;

each group is then used as an index into the predefined char table, and the looked-up characters form the output string.
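The grouping above can be reproduced with a small Python sketch. The function name `encode_msb_first` is made up for this post, the char tables are the standard ones from RFC 4648, and the sketch deliberately leaves out the trailing `=` padding characters so that only the bit grouping is visible.

```python
import base64

BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE16 = "0123456789ABCDEF"

def encode_msb_first(data: bytes, alphabet: str) -> str:
    """Group bits MSB-first and map each group to a character (no '=' padding)."""
    width = len(alphabet).bit_length() - 1  # 6 for Base64, 5 for Base32, 4 for Base16
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad a short final group with zero bits on the right.
    if len(bits) % width:
        bits += "0" * (width - len(bits) % width)
    return "".join(
        alphabet[int(bits[i:i + width], 2)] for i in range(0, len(bits), width)
    )

print(encode_msb_first(b"Man", BASE64))  # TWFu
print(encode_msb_first(b"Man", BASE32))  # JVQW4
print(encode_msb_first(b"Man", BASE16))  # 4D616E

# Cross-check against the standard library, ignoring its '=' padding.
assert encode_msb_first(b"Man", BASE32) == base64.b32encode(b"Man").decode().rstrip("=")
```

Building a string of `0`/`1` characters is slow but keeps the code close to the hand-written groups; a real encoder would use shifts and masks instead.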

RFC 4648 provides a standard char table and a ‘URL and Filename Safe’ char table for Base64 and also two char tables for Base32.

But as I wrote above, VMware uses a similar but different Base32 which, taking ‘*Man*’ as the example again, works like this:

‘*Man*’: ‘*01001101 01100001 01101110*’;

the bits are consumed from the least significant end, starting with the low 5 bits of the first byte, which gives the groups ‘01101 01010 11000 11100 0110’; the last group has only 4 bits, so a 0 is padded in front of it, making it ‘01101 01010 11000 11100 00110’.

The char table used for this Base32 variant is ‘0123456789ACDEFGHJKLMNPQRTUVWXYZ’ (no B, I, O or S), which should be familiar to captcha developers.
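The reversed grouping can be sketched in Python as well. This only reproduces the ‘Man’ example and the char table given above; it is my reading of the bit order, not a confirmed copy of VMware's algorithm. The name `encode_lsb_first` is made up, and no padding characters are emitted.

```python
ALPHABET = "0123456789ACDEFGHJKLMNPQRTUVWXYZ"

def encode_lsb_first(data: bytes, alphabet: str = ALPHABET) -> str:
    """Take 5 bits at a time from the low end of a little-endian bit stream."""
    value = int.from_bytes(data, "little")  # first byte holds the lowest bits
    groups = (len(data) * 8 + 4) // 5       # ceil(total_bits / 5)
    out = []
    for _ in range(groups):
        out.append(alphabet[value & 0b11111])  # low 5 bits pick the character
        value >>= 5                            # missing high bits read as 0 (the padding)
    return "".join(out)

# Groups for 'Man': 01101 01010 11000 11100 00110 -> indexes 13, 10, 24, 28, 6
print(encode_lsb_first(b"Man"))  # EARW6
```

Because the shift fills with zeros from the top, the short last group gets its leading 0 for free, which matches the padding rule described above.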

I’ve created a repo on GitHub, https://github.com/sskaje/BaseEncode, which supports Base2, Base4, Base8, Base16, Base32, Base64 and Base128 in both bit orders. You can also set your own char table and padding rule/char.