encodeURI vs encodeURIComponent

encodeURIComponent encodes an individual URI "component" such as:

  • Query strings
  • Protocol
  • Hostname
  • etc.

ASCII letters (A-Z, a-z), digits (0-9), and the following characters will not be encoded: -_.~!*()'
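As a rough illustration, the sketch below uses TypeScript and the ECMAScript built-in encodeURIComponent, from which the JS++ function takes its name; it assumes the JS++ version treats the same characters as safe. The sample strings are made up for this example.

```typescript
// Assumption: ECMAScript's encodeURIComponent stands in for the JS++ function.
const unreservedMarks: string = "-_.~!*()'";
const alphanumerics: string = "ABCxyz019";
const reserved: string = ";/?:@&=+$,#";

const marksEncoded: string = encodeURIComponent(unreservedMarks);
const alnumEncoded: string = encodeURIComponent(alphanumerics);
const reservedEncoded: string = encodeURIComponent(reserved);

console.log(marksEncoded);    // "-_.~!*()'" (unchanged)
console.log(alnumEncoded);    // "ABCxyz019" (unchanged)
console.log(reservedEncoded); // "%3B%2F%3F%3A%40%26%3D%2B%24%2C%23"
```

Every character outside the safe set, including URI delimiters such as / and ?, is replaced by an escape sequence.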

The encoding works by converting each character to its UTF-8 representation. Each resulting octet is written as an escape sequence of the form "%xx", where xx is the octet's hexadecimal value. Because UTF-8 uses one to four octets per code point, a single character can be encoded as one, two, three, or four escape sequences.
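The relationship between code points and escape sequences can be sketched as follows. This is TypeScript using the ECMAScript built-in encodeURIComponent, on the assumption that the JS++ function produces the same output; countEscapes is a hypothetical helper written only for this example.

```typescript
// Hypothetical helper: counts "%xx" escape sequences in an encoded string.
function countEscapes(encoded: string): number {
  const matches = encoded.match(/%[0-9A-F]{2}/g);
  return matches === null ? 0 : matches.length;
}

// One sample per UTF-8 length: space, U+00E9, U+20AC, U+1F600 (as a surrogate pair).
const samples: string[] = [" ", "\u00E9", "\u20AC", "\uD83D\uDE00"];
const encodedSamples: string[] = samples.map((s) => encodeURIComponent(s));

for (const encoded of encodedSamples) {
  console.log(encoded, countEscapes(encoded));
}
// %20 1
// %C3%A9 2
// %E2%82%AC 3
// %F0%9F%98%80 4
```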

If the input URI component is invalid, `System.Exceptions.InvalidURIException` will be thrown. An example of invalid input is a UTF-16 high surrogate without a corresponding low surrogate.
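The failure case can be reproduced with the ECMAScript built-in, shown here in TypeScript. Note the difference: ECMAScript throws URIError, whereas JS++ is documented above as throwing `System.Exceptions.InvalidURIException`. tryEncode is a hypothetical wrapper used only for this sketch.

```typescript
// Hypothetical wrapper: returns null instead of throwing on malformed input.
function tryEncode(component: string): string | null {
  try {
    return encodeURIComponent(component);
  } catch (e) {
    if (e instanceof URIError) {
      return null; // lone surrogate or other malformed input
    }
    throw e; // anything else is unexpected, so rethrow
  }
}

const loneHighSurrogate: string = "\uD83D";    // high surrogate with no low surrogate
const validPair: string = "\uD83D\uDE00";      // complete surrogate pair (U+1F600)

console.log(tryEncode(loneHighSurrogate)); // null
console.log(tryEncode(validPair));         // "%F0%9F%98%80"
```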

encodeURI vs encodeURIComponent

System.Encoding.URI.encodeURI is used for encoding a full URI; meanwhile, System.Encoding.URI.encodeURIComponent is used only to encode individual components of a larger URI (such as a query string). This is best understood by example:

import System;
import System.Encoding.URI;
 
string fullURI = "https://www.onux.com/search?q=&&";
Console.log(encodeURIComponent(fullURI)); // "https%3A%2F%2Fwww.onux.com%2Fsearch%3Fq%3D%26%26"
Console.log(encodeURI(fullURI));          // "https://www.onux.com/search?q=&&"

The code above constructs a URI that searches the JS++ website for the && (AND) operator.

Notice that encodeURIComponent, when applied to a full URI, encodes characters such as : and / in https://, so the result is no longer usable as a URI. Meanwhile, encodeURI does not encode these characters because it is concerned with full URIs. Most importantly, encodeURI leaves the && in the query string unescaped, even though & separates query parameters. In short, encodeURI escapes too little of the query value, while encodeURIComponent escapes too much of the full URI.
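To see why the unescaped && matters, the TypeScript sketch below parses both results with a deliberately naive query-string parser. parseQuery is a hypothetical helper written for this illustration, and the ECMAScript built-ins are assumed to match their JS++ counterparts.

```typescript
// Hypothetical, naive parser: splits on "&", then on the first "=" of each pair.
function parseQuery(query: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const pair of query.split("&")) {
    if (pair === "") continue; // skip empty segments
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    result[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return result;
}

const term: string = "&&";
const viaEncodeURI: string = encodeURI("q=" + term);          // "q=&&"
const viaComponent: string = "q=" + encodeURIComponent(term); // "q=%26%26"

console.log(parseQuery(viaEncodeURI)["q"]); // "" (the search term is lost)
console.log(parseQuery(viaComponent)["q"]); // "&&"
```

With encodeURI, the receiving side sees an empty q parameter followed by empty segments; only the component-encoded value round-trips.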

Best Practice: Constructing Escaped URIs from "Components"

The confusing naming scheme was inherited directly from ECMAScript for compatibility reasons. In most practical use cases, it is best to break a URI into individual components and encode each component individually using only encodeURIComponent. As an example, we can construct a URL from its individual pieces like this:

import System;
import System.Encoding.URI;
 
string protocol = "https";
string name = "onux";
string tld = "com";
string page = "user1+page";
 
string encoded = encodeURIComponent(protocol) +
                 "://" +
                 encodeURIComponent(name) +
                 "." +
                 encodeURIComponent(tld) +
                 "/" +
                 "page.html?p=" +
                 encodeURIComponent(page);
 

The example above is contrived to illustrate breaking URIs into individual components. In most real-world use cases, we are targeting pages of a specific domain, usually with query strings that need to be escaped. A more useful example might be:

import System;
import System.Encoding.URI;
 
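string searchQuery = "JS++ Programming Language";
// Base URL up to and including "q=" (assumed value, matching the search URI shown earlier).
string root = "https://www.onux.com/search?q=";
 
string encoded = root + encodeURIComponent(searchQuery);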
 

In this more practical example, we do not encode every component; we encode only the component that needs it (the query string value, searchQuery).
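The same practice extends to several query parameters. The following TypeScript sketch assumes the ECMAScript built-in behaves like the JS++ function; buildUrl and the lang parameter are hypothetical names introduced only for illustration.

```typescript
// Hypothetical helper: appends an encoded query string to an already-valid base URL.
function buildUrl(base: string, params: Array<[string, string]>): string {
  const query = params
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return query === "" ? base : base + "?" + query;
}

const searchUrl: string = buildUrl("https://www.onux.com/search", [
  ["q", "JS++ Programming Language"],
  ["lang", "en"], // hypothetical extra parameter
]);

console.log(searchUrl);
// "https://www.onux.com/search?q=JS%2B%2B%20Programming%20Language&lang=en"
```

Only keys and values pass through encodeURIComponent; the base URL and the ?, =, and & delimiters are written literally, so they keep their structural meaning.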
