A Name-Encoding for URIs
Copyright © 2012 GlobalMentor, Inc. This specification may be freely used but only in unmodified form.
- Author: Garret Wilson
- Version: 2012-02-12
Overview
Name-encoding is a transformation of a URI so that it includes only name-token characters, that is, characters normally used as "names" in other specifications such as XML, namely the characters '0'-'9', 'a'-'z', 'A'-'Z', '-', and '_'. This transformation results in a single normalized representation of equivalent URIs. It also guarantees that a round-trip transformation will result in a URI equivalent to the original URI, although the result may not be identical to the original when the two are compared as strings.
Name-encoding is useful for representing a full URI reference in some specification that does not allow URIs. For example, a name-encoded URI can be used as the name of a Subversion property or as a key in a Java properties file.
Rules
A name-encoded URI is a transformation of a normal URI that follows these rules:
1. Every existing percent-encoded value in the URI (using the '%' character as an escape) is normalized to use the lowercase hexadecimal form.
2. The URI scheme separator ':' is replaced by the hyphen character '-' (U+002D).
3. Every path separator '/' in the URI scheme-specific part is replaced by the hyphen character '-' (U+002D).
4. Every remaining character that is not one of '0'-'9', 'a'-'z', 'A'-'Z', '-', or '_' is encoded by escaping each byte of the UTF-8 encoding of the character, using the '_' character (U+005F) as an escape followed by the two-character hexadecimal representation of the byte value in lowercase.
The character encoding in step 4 is identical to the percent-encoding that URIs use for reserved characters, except that the '_' character is used in place of the '%' character and the hexadecimal representation must be in lowercase.
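The rules above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a normative implementation: the names `name_encode` and `_escape` are hypothetical. One assumption deserves particular attention. The examples in this document leave '.' unescaped and escape literal '-' and '_' characters (as `_2d` and `_5f`), so the sketch follows that behavior. It also assumes the input is an absolute URI and replaces every '/' that appears after the scheme separator.

```python
import re

# Characters copied through unchanged.
# Assumption: this set follows the examples in this document, which keep '.'
# and escape literal '-' and '_' so that they cannot be confused with the
# hyphens and underscores produced by the encoding itself.
_SAFE = frozenset(
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "."
)


def _escape(text: str, slash_to_hyphen: bool) -> str:
    """Escape every character outside _SAFE as '_' + lowercase hex per UTF-8 byte."""
    out = []
    for ch in text:
        if ch in _SAFE:
            out.append(ch)
        elif ch == "/" and slash_to_hyphen:
            # Path separator becomes a hyphen.
            out.append("-")
        else:
            out.extend(f"_{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


def name_encode(uri: str) -> str:
    """Name-encode an absolute URI (hypothetical helper, see assumptions above)."""
    # Normalize existing percent-escapes to lowercase hexadecimal.
    uri = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), uri)
    # Split at the scheme separator, which becomes a hyphen.
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("expected an absolute URI with a scheme")
    return _escape(scheme, False) + "-" + _escape(rest, True)
```

Because '%' itself is not a name-token character, a normalized escape such as `%2a` is emitted as `_252a`: the `_25` encodes the percent sign and the two hexadecimal digits pass through unchanged.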
Examples
| URI | Name-Encoding |
|---|---|
| http://www.example.com/foo/bar | http---www.example.com-foo-bar |
| x-foo.bar://www.example.com/foo/bar | x_2dfoo.bar---www.example.com-foo-bar |
| http://www.example.com/foo-bar | http---www.example.com-foo_2dbar |
| http://www.example.com/foo_bar | http---www.example.com-foo_5fbar |
| http://www.example.com/foo/bar#fooBar | http---www.example.com-foo-bar_23fooBar |
| http://www.example.com/foo!bar | http---www.example.com-foo_21bar |
| http://www.example.com/foo%2Abar | http---www.example.com-foo_252abar |
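
Reading the table from right to left suggests how decoding might work. The following Python sketch is an assumption, since this document does not spell out a decoding procedure: the hypothetical function `name_decode` treats the first '-' as the scheme separator, every later '-' as a path separator, and each run of `_hh` escapes as the bytes of a UTF-8 sequence.

```python
import re

# One or more consecutive '_hh' escapes (lowercase hex), decoded together so
# that multi-byte UTF-8 sequences are reassembled correctly.
_ESCAPE_RUN = re.compile(r"(?:_[0-9a-f]{2})+")


def _unescape(text: str) -> str:
    """Replace each run of '_hh' escapes with the characters it encodes."""
    return _ESCAPE_RUN.sub(
        lambda m: bytes.fromhex(m.group(0).replace("_", "")).decode("utf-8"),
        text,
    )


def name_decode(name: str) -> str:
    """Decode a name-encoded URI (hypothetical helper, assumed inverse mapping)."""
    # The first hyphen stands for the scheme separator ':'.
    scheme, sep, rest = name.partition("-")
    if not sep:
        raise ValueError("expected a name-encoded absolute URI")
    # Every remaining hyphen stands for a path separator '/'. Literal hyphens
    # in the original URI appear as '_2d', so they are not affected here.
    return _unescape(scheme) + ":" + _unescape(rest.replace("-", "/"))
```

Under these assumptions, `name_decode("http---www.example.com-foo_252abar")` returns `http://www.example.com/foo%2abar`. The result is equivalent to the original `http://www.example.com/foo%2Abar` but not identical to it as a string, which is the round-trip behavior described in the Overview.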
References
- IETF RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)