Base Convert and Base Encode

Base Convert

Base conversion is a very basic skill in programming.
I was taught to convert between decimal (base-10) and binary (base-2), decimal and octal (base-8), decimal and hexadecimal (base-16), and of course among the power-of-two bases (base 2^n).

The conversion is quite simple: repeated division with remainder produces the digits in the target base, and multiplication plus addition takes you back the other way.
If we want to convert a base-m number to a base-n number by hand, the common way is:
1 Convert base-m number A to base-10 number B;
2 Convert base-10 number B to base-n number C.
We go through base-10 only because it is the base most familiar to humans.
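The two steps above can be sketched in Python as follows. This is only an illustration, not code from the original post: the names DIGITS, to_int, from_int and convert are made up for the example, and a plain Python integer stands in for the "base-10 number B".

```python
# Digit characters for bases up to 36 (an assumption of this sketch).
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_int(text: str, base: int) -> int:
    """Step 1: base-m string -> integer, by multiply-and-add."""
    value = 0
    for ch in text.lower():
        digit = DIGITS.index(ch)  # raises ValueError for unknown characters
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + digit
    return value

def from_int(value: int, base: int) -> str:
    """Step 2: non-negative integer -> base-n string, by repeated division."""
    if value == 0:
        return "0"
    out = []
    while value:
        value, remainder = divmod(value, base)
        out.append(DIGITS[remainder])  # remainders come out LSB first
    return "".join(reversed(out))

def convert(text: str, m: int, n: int) -> str:
    """Convert a base-m string to a base-n string via an integer."""
    return from_int(to_int(text, m), n)

print(convert("ff", 16, 2))
print(convert("4294967295", 10, 36))
```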

If you've learnt HBase, a database system built on Apache Hadoop, you know that rowkey design is more than an art. You may have noticed this advice on rowkey length:
Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan). A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs when designing rowkeys.

For example, say I have to use a uid, which is an unsigned 32-bit int, in my rowkey. As a decimal string the value can be anywhere from 1 to 10 bytes long, since the range is [0, 4294967295]. HBase sorts rowkeys as bytes, so if I need the uids to sort numerically I have to zero-pad them to a fixed 10 characters.
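A small Python sketch of why the fixed width matters. The sample uids are made up for illustration, and sorting Python strings stands in for the byte-wise ordering of rowkeys.

```python
uids = [9, 10, 4294967295, 123]

# Plain decimal strings sort lexicographically, not numerically.
plain = sorted(str(u) for u in uids)

# Zero-padding to 10 digits makes string order match numeric order.
padded = sorted(f"{u:010d}" for u in uids)

print(plain)   # ['10', '123', '4294967295', '9']
print(padded)  # ['0000000009', '0000000010', '0000000123', '4294967295']
```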

The base-10 4294967295 is 1z141z3 in base-36, only 7 characters, so we can try to *compress* the key with a base-10 to base-36 conversion; then I just need to zero-pad the uid to 7 characters instead of 10.
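One possible way to build such a key in Python is shown below. The helper name uid_to_key and the lowercase digit table are assumptions of this sketch. Digits before lowercase letters keeps the byte order of the keys equal to the numeric order of the uids.

```python
# '0'-'9' sort before 'a'-'z' in ASCII, so key order follows numeric order.
BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"

def uid_to_key(uid: int, width: int = 7) -> str:
    """Encode an unsigned 32-bit uid as a zero-padded base-36 string."""
    if not 0 <= uid <= 0xFFFFFFFF:
        raise ValueError("uid must fit in an unsigned 32-bit int")
    out = []
    while uid:
        uid, remainder = divmod(uid, 36)
        out.append(BASE36[remainder])
    return "".join(reversed(out)).rjust(width, "0")

print(uid_to_key(4294967295))  # 1z141z3
print(uid_to_key(35))          # 000000z
```

The key shrinks from 10 bytes to 7 while keeping the same sort order.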

To understand base conversion, we just need to treat the data as a number, with the leftmost digit as the most significant (MSB) and the rightmost as the least significant (LSB).

Base Encode

Base64 is a very common algorithm for encoding arbitrary data as printable characters.
VMware uses a Base32-like encoding (with reversed bit order) in its license algorithm.
And Base16 is another name for hexadecimal.
These three can be found in RFC 4648.

As RFC 4648 describes, in the standard Base16/Base32/Base64 a string like 'Man' is '01001101 01100001 01101110' in binary, with each byte's MSB in the first position:
to Base64, the binary is grouped as '010011 010110 000101 101110';
to Base32, the binary is grouped as '01001 10101 10000 10110 1110'; the last group has only 4 bits, which is less than 5, so a 0 is appended to make it '11100';
to Base16, the binary is grouped as '0100 1101 0110 0001 0110 1110';
each group is then used as an index into the predefined char table, and the looked-up characters form the output string.
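The grouping above can be sketched as a single Python function. This is an illustration only: encode_msb_first and the table names are made up for the example, and the '=' padding characters that RFC 4648 appends to the output are left out so that only the bit grouping is shown.

```python
import base64

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
B16 = "0123456789ABCDEF"

def encode_msb_first(data: bytes, bits: int, table: str) -> str:
    """Split data into `bits`-wide groups, MSB first, and look each one up."""
    total_bits = len(data) * 8
    pad = (-total_bits) % bits              # zero bits appended to the last group
    value = int.from_bytes(data, "big") << pad
    count = (total_bits + pad) // bits
    mask = (1 << bits) - 1
    return "".join(
        table[(value >> (bits * (count - 1 - i))) & mask] for i in range(count)
    )

print(encode_msb_first(b"Man", 6, B64))  # TWFu
print(encode_msb_first(b"Man", 5, B32))  # JVQW4
print(encode_msb_first(b"Man", 4, B16))  # 4D616E

# Cross-check against the standard library (which adds '=' padding).
assert base64.b32encode(b"Man") == b"JVQW4==="
```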

RFC 4648 provides a standard char table and a ‘URL and Filename Safe’ char table for Base64 and also two char tables for Base32.
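To see the difference between the two Base64 char tables, here is a short Python check using the standard library's base64 module. The two input bytes are chosen only because they hit the last two entries of the table. The Base32 tables are not covered here.

```python
import base64

data = bytes([0xFB, 0xFF])  # groups to indices 62, 63, 60

standard = base64.b64encode(data).decode()
url_safe = base64.urlsafe_b64encode(data).decode()

print(standard)  # +/8=
print(url_safe)  # -_8=
```

Only indices 62 and 63 differ: '+' and '/' in the standard table become '-' and '_' in the URL-safe one.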

But as I wrote above, VMware uses a similar but different Base32 which, again taking 'Man' as the example, works like this:
'Man': '01001101 01100001 01101110';
the bits are taken 5 at a time starting from the LSB of the first byte, which groups them as '01101 01010 11000 11100 0110'; the last group is then padded with a leading 0, giving '01101 01010 11000 11100 00110'.
The char table used for this Base32 encoding is '0123456789ACDEFGHJKLMNPQRTUVWXYZ' (it leaves out B, I, O and S, which are easily confused with 8, 1, 0 and 5), which should be familiar to captcha developers.
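The reversed-order grouping can be sketched in Python like this. It is a reconstruction from the 'Man' example in this post only, not VMware's actual code, and the function name encode_lsb_first is made up for the illustration.

```python
TABLE = "0123456789ACDEFGHJKLMNPQRTUVWXYZ"

def encode_lsb_first(data: bytes, bits: int = 5, table: str = TABLE) -> str:
    """Take `bits` at a time starting from the LSB of the first byte."""
    value = int.from_bytes(data, "little")  # first byte holds the lowest bits
    count = -(-len(data) * 8 // bits)       # ceiling division
    mask = (1 << bits) - 1
    out = []
    for _ in range(count):
        out.append(table[value & mask])     # a short last group gets leading zeros
        value >>= bits
    return "".join(out)

print(encode_lsb_first(b"Man"))  # EARW6
```

The five groups 01101, 01010, 11000, 11100 and 00110 are the indices 13, 10, 24, 28 and 6, which map to 'E', 'A', 'R', 'W' and '6' in the char table.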

I've created a repo on GitHub that supports Base2, Base4, Base8, Base16, Base32, Base64 and Base128 in both bit orders; you can also set your own char table and padding rule/char.


Base Convert and Base Encode by @sskaje:
