A Name-Encoding for URIs
Copyright © 2012 GlobalMentor, Inc. This specification may be freely used but only in unmodified form.
- Author: Garret Wilson
- Version: 2012-02-12
Overview
Name-encoding is a transformation of a URI so that it includes only name-token characters, that is, characters normally used as "names" in other specifications such as XML, namely the characters '0'-'9', 'a'-'z', 'A'-'Z', '-', and '_'. This transformation results in a single normalized representation of equivalent URIs. It guarantees that a round-trip transformation will result in a URI equivalent to the original URI, although the result may not be identical to the original URI when the two are compared as strings.
Name-encoding is useful for representing a full URI reference in some specification that does not allow URIs. For example, a name-encoded URI can be used as the name of a Subversion property or as a key in a Java properties file.
Rules
A name-encoded URI is a transformation of a normal URI that follows these rules:
- Every existing percent-encoded value in the URI (using the '%' character as an escape) is normalized to use the lowercase hexadecimal form.
- The URI scheme separator ':' is replaced by the hyphen character '-' (U+002D).
- Every path separator '/' in the URI scheme-specific part is replaced by the hyphen character '-' (U+002D).
- Every remaining character that is not one of '0'-'9', 'a'-'z', 'A'-'Z', '-', or '_' is encoded by escaping each byte of the UTF-8 encoding of the character, using the '_' character (U+005F) as an escape followed by the two-character hexadecimal representation of the byte value in lowercase.
The character-encoding in the last rule above is identical to URI-encoding of reserved characters, except that the '_' character is used in place of the '%' character and the hex representation must be in lowercase.
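The rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation, and name_encode and its helpers are hypothetical names. Where the rule text and the Examples table differ, the sketch follows the table: '.' is copied through unchanged, while a literal '-' or '_' in the source URI is escaped as '_2d' or '_5f', which keeps it distinct from the hyphens and escapes that the encoding itself introduces. The sketch also assumes an absolute URI and treats everything after the first ':' as the scheme-specific part.

```python
import re

# Characters copied through unchanged.
# ASSUMPTION (taken from the Examples table, not the rule text): '.' is kept,
# and literal '-' and '_' are escaped like any other character.
_SAFE = frozenset(
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "."
)

_PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _escape(text, separator=None):
    """Escape `text`, mapping `separator` (if given) to a hyphen."""
    out = []
    for ch in text:
        if ch in _SAFE:
            out.append(ch)
        elif ch == separator:
            out.append("-")
        else:
            # '_' followed by two lowercase hex digits per UTF-8 byte.
            out.extend("_%02x" % b for b in ch.encode("utf-8"))
    return "".join(out)


def name_encode(uri):
    """Hypothetical helper: name-encode an absolute URI given as a string."""
    # Rule 1: lowercase the hex digits of existing percent-escapes.
    uri = _PERCENT_ESCAPE.sub(lambda m: "%" + m.group(1).lower(), uri)
    scheme, colon, rest = uri.partition(":")
    if not colon:
        raise ValueError("expected a URI with a scheme: %r" % uri)
    # Rule 2: the scheme separator becomes '-'.
    # Rule 3: every '/' after the scheme becomes '-'.
    # Rule 4: everything else outside the safe set is '_'-escaped.
    return _escape(scheme) + "-" + _escape(rest, separator="/")
```

For instance, name_encode("http://www.example.com/foo%2Abar") gives http---www.example.com-foo_252abar: the percent-escape is first lowercased to %2a, and the '%' itself is then escaped as '_25'.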
Examples
URI | Name-Encoding |
---|---|
http://www.example.com/foo/bar | http---www.example.com-foo-bar |
x-foo.bar://www.example.com/foo/bar | x_2dfoo.bar---www.example.com-foo-bar |
http://www.example.com/foo-bar | http---www.example.com-foo_2dbar |
http://www.example.com/foo_bar | http---www.example.com-foo_5fbar |
http://www.example.com/foo/bar#fooBar | http---www.example.com-foo-bar_23fooBar |
http://www.example.com/foo!bar | http---www.example.com-foo_21bar |
http://www.example.com/foo%2Abar | http---www.example.com-foo_252abar |
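
Reading the table from right to left illustrates the round-trip guarantee. The following is a minimal decoding sketch in Python, with name_decode as a hypothetical name. It assumes well-formed input that follows the conventions visible in the table: the first '-' stands for the scheme separator ':', every later '-' stands for '/', and '_' introduces one byte written as two lowercase hexadecimal digits.

```python
import re

# One or more consecutive '_xx' escapes; grouped so that multi-byte
# UTF-8 sequences are decoded together.
_ESCAPE_RUN = re.compile(r"(?:_[0-9a-f]{2})+")


def _unescape(text):
    """Replace each run of '_xx' escapes with the characters it encodes."""
    return _ESCAPE_RUN.sub(
        lambda m: bytes.fromhex(m.group(0).replace("_", "")).decode("utf-8"),
        text,
    )


def name_decode(name):
    """Hypothetical helper: turn a name-encoded string back into a URI."""
    # ASSUMPTION: literal hyphens in the original URI were escaped as '_2d',
    # so every '-' left in `name` is a separator added by the encoding.
    scheme, dash, rest = name.partition("-")
    if not dash:
        raise ValueError("expected a name-encoded URI: %r" % name)
    # Restore '/' before unescaping so that a decoded '_2d' is not mistaken
    # for a separator.
    return _unescape(scheme) + ":" + _unescape(rest.replace("-", "/"))
```

Decoding the last row of the table yields http://www.example.com/foo%2abar, which is equivalent to the original URI but differs from it as a string because the percent-escape was normalized to lowercase.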
References
- IETF RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)