According to the URL specification (RFC 1738), any character that is not alphanumeric or one of the special characters $-_.+!*'(), must be encoded.
In practice, the only characters that are not commonly encoded are alphanumerics and three others: - (hyphen), _ (underscore), and . (period).
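As a rough illustration, the two character sets can be written as membership tests. The names `ALPHANUMERICS`, `RFC_1738_SPECIALS`, `SAFE_IN_PRACTICE` and `needs_encoding` are hypothetical, and "alphanumeric" is assumed to mean ASCII letters and digits only. The set difference shows which characters RFC 1738 permits unencoded but that are usually encoded anyway.

```python
import string

# Hypothetical names. Assumption: "alphanumeric" means ASCII letters and digits only.
ALPHANUMERICS = frozenset(string.ascii_letters + string.digits)

# Special characters RFC 1738 lets appear unencoded alongside alphanumerics.
RFC_1738_SPECIALS = frozenset("$-_.+!*'(),")

# Characters that, in practice, are commonly left unencoded.
SAFE_IN_PRACTICE = ALPHANUMERICS | frozenset("-_.")


def needs_encoding(ch: str) -> bool:
    """Return True if ch falls outside the commonly unencoded set."""
    return ch not in SAFE_IN_PRACTICE


# Characters the RFC permits but that are usually encoded anyway.
print(sorted(RFC_1738_SPECIALS - SAFE_IN_PRACTICE))
# ['!', '$', "'", '(', ')', '*', '+', ',']

# Which characters of a sample string would be encoded in practice.
print([c for c in "a-b_c.d/e f" if needs_encoding(c)])
# ['/', ' ']
```

Being conservative here costs nothing: encoding a character the RFC would have allowed still produces a valid URL, just a slightly longer one.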
Each unsafe character is encoded as a percent sign (%) followed by the two-digit hexadecimal value of the character. Because the percent sign introduces these escape sequences, a literal percent sign must itself be encoded, as %25 (25 being its hexadecimal value), whenever it is not being used for this purpose.
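A minimal sketch of the percent-encoding step, assuming the "in practice" safe set described above. The function name `percent_encode` is hypothetical, and converting text to UTF-8 bytes before encoding is an assumption (RFC 1738 itself speaks only of octets). The space-to-plus substitution is deliberately left out, so a space comes out as %20 here.

```python
import string

# Characters left as-is: ASCII letters, digits, and "-", "_", ".".
UNENCODED = frozenset(string.ascii_letters + string.digits + "-_.")


def percent_encode(text: str) -> str:
    """Replace every other character with %XX, the hex value of each byte.

    Assumption: text is first converted to UTF-8 bytes, so a non-ASCII
    character becomes several %XX sequences. Spaces are not turned into "+".
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNENCODED:
            out.append(ch)               # safe character, copied unchanged
        else:
            out.append(f"%{byte:02X}")   # percent sign + two uppercase hex digits
    return "".join(out)


print(percent_encode("50% off"))  # 50%25%20off
```

Note how the literal percent sign becomes %25, so a decoder can always treat a % as the start of an escape sequence.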
Additionally, any spaces in the original string are replaced with the + (plus) sign. What does this look like? Here are a few strings and their encoded equivalents: