
Input/Output Data Formats

The DataInputStream and DataOutputStream classes read and write primitive Java data types and Strings in a machine-independent way.
There are three main data formats:
  1. Big-endian for integer types
  2. IEEE-754 for floats and doubles
  3. UTF-8 (in a slightly modified form) for Unicode strings
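As a rough illustration of the three formats working together, the following sketch writes an int, a double, and a String to an in-memory buffer and reads them back. The class name DataFormatRoundTrip and its helper methods are made up for this example; only the java.io classes are part of the standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the Java library.
public class DataFormatRoundTrip {

    // Writes an int, a double, and a String, and returns the raw bytes.
    public static byte[] write(int i, double d, String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(i);      // 4 bytes, big-endian
            out.writeDouble(d);   // 8 bytes, IEEE-754 bit pattern, big-endian
            out.writeUTF(s);      // 2-byte length prefix + modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the three values back in the same order they were written.
    public static String readBack(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            double d = in.readDouble();
            String s = in.readUTF();
            return i + " " + d + " " + s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(42, 3.5, "hello");
        System.out.println(data.length + " bytes"); // 4 + 8 + (2 + 5) = 19 bytes
        System.out.println(readBack(data));         // 42 3.5 hello
    }
}
```

The values must be read in the order they were written, because the stream carries no type information of its own.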

Big-endian and little-endian

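Big-endian data stores the bytes that make up an integer starting with the most-significant byte and counting down to the least-significant byte.
Little-endian architectures store the least-significant byte first, with the most-significant byte coming last.
An analogy with decimal notation may help. Ordinary decimal numbers are written big-endian: the numeral 12 means (1 * 10) + (2 * 1). If decimal numbers were written little-endian, the same two digits would mean (1 * 1) + (2 * 10), that is, twenty-one. Big-endian order is used by the Java class file format, by DataInputStream and DataOutputStream, and by most Internet protocols, where it is known as network byte order. Most of today's mainstream CPUs, including Intel's x86 family and ARM processors in their usual configuration, are little-endian, so on those machines the Java runtime reorders bytes as needed when it reads or writes data.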
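To make the byte order concrete, here is a small sketch that prints the bytes of the same int in both orders. EndianDemo and its methods are hypothetical names for this example. The big-endian bytes come from DataOutputStream.writeInt(); the little-endian bytes are produced with java.nio.ByteBuffer purely for comparison.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of the Java library.
public class EndianDemo {

    // The four bytes DataOutputStream emits for an int (always big-endian).
    public static byte[] bigEndian(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // The same int laid out least-significant byte first.
    public static byte[] littleEndian(int value) {
        return ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    // Formats bytes as space-separated two-digit hex values.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println(hex(bigEndian(value)));    // 12 34 56 78
        System.out.println(hex(littleEndian(value))); // 78 56 34 12
    }
}
```

Data written with DataOutputStream can therefore be read by DataInputStream on any platform, whatever the byte order of the underlying CPU.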


IEEE-754

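IEEE-754 is an international standard for the representation of single- and double-precision floating-point numbers. It is implemented in hardware by virtually all modern computer architectures, including the Intel x86 family. Java's float and double types use the 32-bit and 64-bit IEEE-754 formats, and DataOutputStream writes their bit patterns in big-endian order. Historically there was one point of friction on x86: the original x87 FPU computes in 80-bit extended precision by default, so producing exactly the 32-bit and 64-bit results that Java first required took extra work and made floating-point code slower on x86 Windows than it could have been. Microsoft, among others, complained about this requirement. Java 1.2 relaxed the default rules and added the strictfp modifier for code that needs strict results; since Java 17, strict IEEE-754 semantics are again the default, because newer x86 instructions support them without the penalty.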
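The layout of a single-precision value can be inspected with Float.floatToIntBits(), which is also the conversion that DataOutputStream.writeFloat() is documented to apply before writing the four bytes. The sketch below splits a float into its sign, exponent, and fraction fields. Ieee754Demo and its methods are hypothetical names, and the exponent helper assumes a normal (not zero, subnormal, infinite, or NaN) value.

```java
// Hypothetical demo class, not part of the Java library.
public class Ieee754Demo {

    // Bit 31: 0 for positive values, 1 for negative values.
    public static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Bits 30..23 hold the exponent with a bias of 127.
    // The result is only meaningful for normal values.
    public static int exponent(float f) {
        return ((Float.floatToIntBits(f) >>> 23) & 0xFF) - 127;
    }

    // Bits 22..0 hold the fraction; normal values have an implicit leading 1.
    public static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = -6.25f; // -1.5625 * 2^2
        System.out.println(Integer.toHexString(Float.floatToIntBits(f))); // c0c80000
        System.out.println(sign(f));                          // 1
        System.out.println(exponent(f));                      // 2
        System.out.println(Integer.toHexString(fraction(f))); // 480000
    }
}
```

The fraction 0x480000 is 0.5625 * 2^23, so the value decodes as -(1 + 0.5625) * 2^2 = -6.25.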

UTF-8

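UTF-8 is a variable-length encoding of Unicode that uses only one byte for the ASCII characters, two bytes for the most common non-ASCII characters (U+0080 through U+07FF), three bytes for the rest of the Basic Multilingual Plane, and four bytes for supplementary characters such as emoji. The writeUTF() and readUTF() methods use a slightly modified form of UTF-8 in which each supplementary character is written as two three-byte surrogates and the null character takes two bytes.
In practice, UTF-8 is much more space-efficient than UTF-16, which needs at least two bytes per character, especially when working with English text.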
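The byte counts can be checked directly. The sketch below compares standard UTF-8, UTF-16, and the bytes that writeUTF() produces. Utf8Demo and its methods are hypothetical names for this example; the string literals use Unicode escapes so the source file encoding does not matter.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the Java library.
public class Utf8Demo {

    // Number of bytes in the standard UTF-8 encoding of s.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian, no byte order mark).
    public static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Number of payload bytes writeUTF() emits, excluding its 2-byte length prefix.
    public static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));             // 1 (ASCII)
        System.out.println(utf8Length("\u00E9"));        // 2 (e with acute accent)
        System.out.println(utf8Length("\u20AC"));        // 3 (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00"));  // 4 (emoji, U+1F600)
        System.out.println(modifiedUtf8Length("\uD83D\uDE00")); // 6 (two surrogates)
        System.out.println(utf8Length("hello") + " vs " + utf16Length("hello")); // 5 vs 10
    }
}
```

For plain English text the UTF-8 form is half the size of UTF-16; the difference shrinks, and can reverse, for text made up mostly of three-byte characters.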