Lesson 2 | It all starts with tokens |
Objective | Explain the fundamental coding elements in Java programs. |
Fundamental Coding Elements
When a program is processed by the Java compiler, it is first broken down into tokens. A token is the smallest code element in a program that is meaningful to the compiler.
The following line of Java code contains five tokens:
boolean busy = true;
The tokens in this example are boolean, busy, =, true, and ;.
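The split described above can be approximated with the standard java.io.StreamTokenizer class. This is only a rough sketch: StreamTokenizer is not the Java compiler's lexer and does not follow its rules in general (for example, it treats quoted text and numbers in its own way), but for this simple statement it yields the same five pieces. The class name TokenSketch and the method tokenize are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenSketch {
    // Splits a simple statement into token strings.
    // Approximation only: this is not how javac tokenizes source code.
    static List<String> tokenize(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add(st.sval);                        // words such as boolean, busy, true
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    tokens.add(String.valueOf(st.nval));        // numeric values
                } else {
                    tokens.add(String.valueOf((char) st.ttype)); // single characters such as = and ;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("boolean busy = true;")); // prints [boolean, busy, =, true, ;]
    }
}
```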
Understanding tokens is critical, because tokens describe the fundamental structure of the Java programming language. Java tokens can be divided into five categories:
- Identifiers: Tokens that represent names
- Keywords: Special identifiers set aside as programming constructs
- Literals: Program data elements that are constant
- Operators: Programming constructs used to specify an evaluation or computation
- Separators: Symbols to inform the Java compiler of how code elements are grouped
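To make the five categories concrete, the short sketch below labels the tokens of a small method in its comments. The class TokenCategories and the method area are hypothetical, invented only for this illustration.

```java
class TokenCategories {
    // Keywords: static, int, return
    // Identifiers: area, width, height, scale
    static int area(int width, int height) {  // Separators: ( , ) {
        int scale = 2;                         // Literal: 2   Operator: =
        return width * height * scale;         // Operator: *  Separator: ;
    }

    public static void main(String[] args) {
        System.out.println(area(3, 4)); // prints 24
    }
}
```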
Unicode Character Set
Java programs are written using Unicode.
You can use Unicode characters anywhere in a Java program, including comments and identifiers such as variable names. Unlike the 7-bit ASCII character set, which is useful only for English, and the 8-bit ISO Latin-1 character set, which is useful only for major Western European languages, the Unicode character set can represent virtually every written language in common use on the planet.
If you do not use a Unicode-enabled text editor, or if you do not want to force other programmers who view or edit your code to use one, you can embed Unicode characters into your Java programs using the special Unicode escape sequence \uxxxx: a backslash and a lowercase u, followed by four hexadecimal digits. For example, \u0020 is the space character, and \u03c0 is the character π (the lowercase Greek letter pi).
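The following minimal sketch shows Unicode escapes used both in an identifier and in character literals. The class UnicodeEscapes and the method describe are hypothetical names. Only escapes are used, so the file compiles regardless of the editor's encoding. Whether the Greek letter displays correctly when printed depends on the console's encoding.

```java
class UnicodeEscapes {
    // The identifier below is the single Greek letter pi, written as an escape.
    static final double \u03c0 = Math.PI;

    // Builds a short string from characters written as Unicode escapes.
    static String describe() {
        char space = '\u0020'; // the space character
        char pi = '\u03c0';    // Greek small letter pi
        return "pi" + space + "=" + space + pi;
    }

    public static void main(String[] args) {
        System.out.println(describe());
        System.out.println(\u03c0 == Math.PI); // prints true
    }
}
```

Because the compiler translates Unicode escapes before it splits the source into tokens, an escape is accepted anywhere the character itself would be, including inside identifiers and comments.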