Input/Output Data Formats
Java I/O
Big-endian and little-endian
Big-endian data stores the bytes that make up an integer starting with the most significant byte and counting down to the least-significant byte.
Little-endian architectures store the least-significant byte first with the most-significant byte coming last.
Ordinary decimal notation is big-endian: the number 12 means (1 * 10) + (2 * 1). In a little-endian decimal system, the same digits 12 would mean (1 * 1) + (2 * 10), which is twenty-one.
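Big-endian format is used by the Java Virtual Machine, by Java's DataInputStream and DataOutputStream classes, and by Internet protocols, where it is called network byte order. Among CPUs, big-endian order has been used by architectures such as SPARC and the Motorola 68000 family, while Intel's x86 family is little-endian. Several architectures, including ARM and PowerPC, can operate in either mode.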
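The difference in byte order can be seen by serializing the same int both ways. The following is a minimal sketch; the class name EndianDemo and its helper methods are made up for illustration, while ByteBuffer, ByteOrder, and DataOutputStream are standard library classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Serialize an int into four bytes using the requested byte order.
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Serialize an int the way Java's data streams do (always big-endian).
    static byte[] viaDataOutput(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Render bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println(hex(toBytes(value, ByteOrder.BIG_ENDIAN)));    // 12 34 56 78
        System.out.println(hex(toBytes(value, ByteOrder.LITTLE_ENDIAN))); // 78 56 34 12
        System.out.println(hex(viaDataOutput(value)));                    // 12 34 56 78
        System.out.println(ByteOrder.nativeOrder());                      // depends on the host CPU
    }
}
```

When reading binary files produced by little-endian programs, one option is to wrap the data in a ByteBuffer and set its order to ByteOrder.LITTLE_ENDIAN before calling getInt.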
IEEE-754
IEEE-754 is an international standard for the representation of single- and double-precision floating-point numbers. It is used by many modern computer architectures, especially ones aimed at engineering and scientific markets.
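The Intel x86 family also implements IEEE-754, but its x87 FPU computes intermediate results in an 80-bit extended-precision format by default rather than in the 32-bit and 64-bit formats Java specifies. Microsoft has complained about the requirement of strict IEEE-754 math in Java: it makes Java slower on x86 Windows than it could be, because intermediate results must be rounded to float or double precision with extra operations rather than left in the FPU's extended-precision registers.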
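To make the single-precision layout concrete, the sketch below splits a float into its sign bit, 8-bit biased exponent, and 23-bit fraction. The class name Ieee754Demo and its methods are illustrative only; Float.floatToIntBits is the standard library call that exposes the raw bits.

```java
class Ieee754Demo {
    // Sign bit: 0 for positive, 1 for negative.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Biased exponent: the stored 8-bit value; the true exponent is this minus 127.
    static int biasedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Fraction: the 23 stored bits of the significand (the leading 1 is implicit).
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    // Format the 32 bits as "sign exponent fraction".
    static String layout(float f) {
        String bits = Integer.toBinaryString(Float.floatToIntBits(f));
        bits = String.format("%32s", bits).replace(' ', '0');
        return bits.substring(0, 1) + " " + bits.substring(1, 9) + " " + bits.substring(9);
    }

    public static void main(String[] args) {
        // -6.25 = -1.1001 (binary) * 2^2, so the biased exponent is 127 + 2 = 129.
        float f = -6.25f;
        System.out.println(layout(f));         // 1 10000001 10010000000000000000000
        System.out.println(sign(f));           // 1
        System.out.println(biasedExponent(f)); // 129
        System.out.println(Integer.toHexString(fraction(f))); // 480000
    }
}
```

The same approach works for double with Double.doubleToLongBits, where the exponent is 11 bits (bias 1023) and the fraction is 52 bits.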
UTF-8
UTF-8 is a variable-length encoding of Unicode that uses only one byte for the ASCII characters, two bytes for the most common non-ASCII Unicode characters (U+0080 through U+07FF), three bytes for the remaining characters of the Basic Multilingual Plane, and four bytes for characters beyond it.
In practice, UTF-8 is much more space-efficient than a fixed-width two-byte or four-byte encoding of Unicode, especially when working with English text.
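The byte counts can be checked directly by encoding sample strings. This is a small sketch; Utf8Demo and its helper methods are hypothetical names, and the sample characters are written as Unicode escapes so the source file's own encoding does not matter. Note that code points outside the Basic Multilingual Plane take four bytes in standard UTF-8.

```java
import java.nio.charset.StandardCharsets;

class Utf8Demo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies as UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1: ASCII letter
        System.out.println(utf8Length("\u00E9"));       // 2: e with acute accent, U+00E9
        System.out.println(utf8Length("\u20AC"));       // 3: euro sign, U+20AC
        System.out.println(utf8Length("\uD83D\uDE00")); // 4: U+1F600, outside the BMP

        // English text is half the size in UTF-8 compared with UTF-16.
        System.out.println(utf8Length("Hello"));  // 5
        System.out.println(utf16Length("Hello")); // 10
    }
}
```

Be aware that DataOutputStream.writeUTF and DataInputStream.readUTF use a modified form of UTF-8 that differs from the standard encoding for the null character and for characters outside the Basic Multilingual Plane, so their output is not always interchangeable with getBytes(StandardCharsets.UTF_8).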