Java URL encoding of query string parameters





Say I have a URL

http://example.com/query?q=

and I have a query entered by the user such as:

random word £500 bank $

I want the result to be a properly encoded URL:

http://example.com/query?q=random%20word%20%A3500%20bank%20%24

What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.


Answers

The Apache HttpComponents library provides a neat option for building and encoding query parameters:

With HttpComponents 4.x, use URLEncodedUtils.

For HttpClient 3.x, use EncodingUtil.


On Android I would use this code:

Uri myUI = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word £500 bank $").build();

Here Uri is android.net.Uri. Pass the raw, unencoded value: appendQueryParameter encodes it for you.


1. Split the URL into its structural parts. Use java.net.URL for this.

2. Encode each structural part properly.

3. Use IDN.toASCII(putDomainNameHere) to Punycode-encode the host name.

4. Use java.net.URI.toASCIIString() to percent-encode the result. It normalizes non-ASCII characters to NFC before encoding them as UTF-8 (NFKC would be better). For more info see: How to encode properly this URL

URL url= new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL=uri.toASCIIString(); 
System.out.println(correctEncodedURL);

Prints

http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
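The example above uses an ASCII host, so step 3 has no visible effect. The following is a minimal sketch of the same approach with a non-ASCII host name; the class name, the helper toAsciiUrl and the host bücher.example are assumptions made for illustration only.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

final class IdnHostExample {
    // Hypothetical helper: builds an ASCII-only URL from already separated parts.
    static String toAsciiUrl(String scheme, String host, String path, String query) {
        try {
            // IDN.toASCII converts a Unicode host name to its Punycode ("xn--") form.
            String asciiHost = IDN.toASCII(host);
            // The multi-argument constructor quotes illegal characters per component;
            // toASCIIString() then percent-encodes the remaining non-ASCII characters.
            URI uri = new URI(scheme, null, asciiHost, -1, path, query, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // \u00FC is "ü" and \u00A3 is "£"; escapes avoid source-encoding problems.
        System.out.println(toAsciiUrl("http", "b\u00FCcher.example", "/query",
                "q=random word \u00A3500 bank $"));
        // prints http://xn--bcher-kva.example/query?q=random%20word%20%C2%A3500%20bank%20$
    }
}
```

Note that the $ is left alone because it is a legal character in a query component.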

Here's a method you can use in your code to convert a url string and map of parameters to a valid encoded url string containing the query parameters.

String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
    if (parameters == null) {
        return url;
    }

    for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {

        final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
        final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");

        if (!url.contains("?")) {
            url += "?" + encodedKey + "=" + encodedValue;
        } else {
            url += "&" + encodedKey + "=" + encodedValue;
        }
    }

    return url;
}

URLEncoder is the way to go. Just remember to encode only the individual query string parameter names and values, not the entire URL, and certainly not the parameter separator character & or the name-value separator character =.

String q = "random word £500 bank $";
String url = "http://example.com/query?q=" + URLEncoder.encode(q, "UTF-8");

Note that URLEncoder represents spaces in query parameters as +, not %20, which is perfectly valid. %20 is normally used for spaces in the URI itself (the part before the query string separator character ?), not in the query string (the part after ?).
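If the receiving side insists on %20 in the query string, as in the URL the question asks for, one common workaround is to replace the + signs after encoding. This is only a sketch; the class name and the helper encodeValue are assumptions, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SpaceAsPercent20 {
    // Hypothetical helper: form-encodes a value, then rewrites "+" as "%20".
    static String encodeValue(String value) {
        try {
            // A literal "+" in the input has already become "%2B" at this point,
            // so every remaining "+" stands for a space and can be replaced safely.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00A3 is the pound sign; the escape avoids source-encoding problems.
        String q = "random word \u00A3500 bank $";
        System.out.println("http://example.com/query?q=" + encodeValue(q));
        // prints http://example.com/query?q=random%20word%20%C2%A3500%20bank%20%24
    }
}
```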

Also note that there are two encode() methods: one without a charset argument and one with. The one without a charset argument is deprecated. Never use it; always specify the charset. The javadoc even explicitly recommends using UTF-8, as mandated by RFC 3986 and W3C:

All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
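To see why the charset matters, here is a small sketch that encodes the pound sign with two different charsets. The class and helper names are assumptions. The %A3 in the URL the question expects matches the ISO-8859-1 result, whereas UTF-8 produces two bytes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CharsetMatters {
    // Hypothetical helper: wraps the checked exception for brevity.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign
        System.out.println(encode(pound, "UTF-8"));      // prints %C2%A3
        System.out.println(encode(pound, "ISO-8859-1")); // prints %A3
    }
}
```

The server has to decode with the same charset the client encoded with, which is one more reason to settle on UTF-8 on both sides.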



  1. Use this: URLEncoder.encode(query, StandardCharsets.UTF_8.name()); or this: URLEncoder.encode(query, "UTF-8");
  2. You can use the following code.

    String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // Spring's UriUtils: leaves query delimiters such as & and = unchanged
    String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // encodes the delimiters as well
    String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.name()); // same result as encodedUrl2
    
    System.out.println("url1=" + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);
    


You need to first create a URI like:

    String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf";
    URL url= new URL(urlStr);
    URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());

Then convert that URI to an ASCII string:

    urlStr = uri.toASCIIString();

Now your URL string is completely encoded: the URI constructor first quotes illegal characters such as spaces, and toASCIIString() then percent-encodes any remaining characters outside US-ASCII. This is essentially what browsers do.
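One caveat with this approach: the multi-argument URI constructor only quotes characters that are illegal in a component, so a & or = inside a query value is left as is and will be read as a delimiter. The sketch below contrasts it with URLEncoder; the class name, method names and sample value are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class QueryDelimiterCaveat {
    // Encodes the whole query component at once via the URI constructor.
    static String viaUriConstructor(String value) {
        try {
            return new URI("http", null, "example.com", -1, "/query", "q=" + value, null)
                    .toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Encodes only the parameter value via URLEncoder.
    static String viaUrlEncoder(String value) {
        try {
            return "http://example.com/query?q=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "Materials & Accessories";
        System.out.println(viaUriConstructor(value));
        // prints http://example.com/query?q=Materials%20&%20Accessories
        System.out.println(viaUrlEncoder(value));
        // prints http://example.com/query?q=Materials+%26+Accessories
    }
}
```

In the first result a server would see q=Materials followed by a second, unnamed parameter, so user-supplied values are safer to encode individually.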


Here's an extended version of Andy E's linked "Handle array-style query strings"-version. Fixed a bug (?key=1&key[]=2&key[]=3; 1 is lost and replaced with [2,3]), made a few minor performance improvements (re-decoding of values, recalculating "[" position, etc.) and added a number of improvements (functionalized, support for ?key=1&key=2, support for ; delimiters). I left the variables annoyingly short, but added comments galore to make them readable (oh, and I reused v within the local functions, sorry if that is confusing ;).

It will handle the following querystring...

?test=Hello&person=neek&person[]=jeff&person[]=jim&person[extra]=john&test3&nocache=1398914891264

...making it into an object that looks like...

{
    "test": "Hello",
    "person": {
        "0": "neek",
        "1": "jeff",
        "2": "jim",
        "length": 3,
        "extra": "john"
    },
    "test3": "",
    "nocache": "1398914891264"
}

As you can see above, this version handles some measure of "malformed" arrays, i.e. - person=neek&person[]=jeff&person[]=jim or person=neek&person=jeff&person=jim as the key is identifiable and valid (at least in dotNet's NameValueCollection.Add):

If the specified key already exists in the target NameValueCollection instance, the specified value is added to the existing comma-separated list of values in the form "value1,value2,value3".

The jury seems to be out on repeated keys, as there is no spec. In this version, values for a repeated key are stored as a (fake) array. Note, however, that I do not split comma-separated values into arrays.
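For readers on the Java side of this thread, the repeated-key behaviour can be sketched roughly as follows. This is not a port of the JavaScript below: it only collects repeated keys into lists and does not handle bracket-style keys such as person[]. The class name and the parse method are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RepeatedKeyParser {
    // Parses "a=1&a=2;b=3" style strings; every key maps to a list of its values.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        if (query == null || query.isEmpty()) {
            return result;
        }
        // Accept both & and ; as pair delimiters.
        for (String pair : query.split("[&;]")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            // A key without "=" (such as "test3") gets an empty value.
            String key = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = decode(eq < 0 ? "" : pair.substring(eq + 1));
            List<String> values = result.get(key);
            if (values == null) {
                values = new ArrayList<String>();
                result.put(key, values);
            }
            values.add(value);
        }
        return result;
    }

    // URLDecoder also turns "+" into a space, matching form encoding.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("test=Hello&person=neek&person=jeff&person=jim&test3"));
    }
}
```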

The code:

getQueryStringKey = function(key) {
    return getQueryStringAsObject()[key];
};


getQueryStringAsObject = function() {
    var b, cv, e, k, ma, sk, v, r = {},
        d = function (v) { return decodeURIComponent(v.replace(/\+/g, " ")); }, //# d(ecode) the v(alue); "+" becomes a space before decoding so an encoded %2B survives as "+"
        q = window.location.search.substring(1), //# suggested: q = decodeURIComponent(window.location.search.substring(1)),
        s = /([^&;=]+)=?([^&;]*)/g //# original regex that does not allow for ; as a delimiter:   /([^&=]+)=?([^&]*)/g
    ;

    //# ma(make array) out of the v(alue)
    ma = function(v) {
        //# If the passed v(alue) hasn't been setup as an object
        if (typeof v != "object") {
            //# Grab the cv(current value) then setup the v(alue) as an object
            cv = v;
            v = {};
            v.length = 0;

            //# If there was a cv(current value), .push it into the new v(alue)'s array
            //#     NOTE: This may or may not be 100% logical to do... but it's better than losing the original value
            if (cv) { Array.prototype.push.call(v, cv); }
        }
        return v;
    };

    //# While we still have key-value e(ntries) from the q(uerystring) via the s(earch regex)...
    while (e = s.exec(q)) { //# while((e = s.exec(q)) !== null) {
        //# Collect the open b(racket) location (if any) then set the d(ecoded) v(alue) from the above split key-value e(ntry) 
        b = e[1].indexOf("[");
        v = d(e[2]);

        //# As long as this is NOT a hash[]-style key-value e(ntry)
        if (b < 0) { //# b == "-1"
            //# d(ecode) the simple k(ey)
            k = d(e[1]);

            //# If the k(ey) already exists
            if (r[k]) {
                //# ma(make array) out of the k(ey) then .push the v(alue) into the k(ey)'s array in the r(eturn value)
                r[k] = ma(r[k]);
                Array.prototype.push.call(r[k], v);
            }
            //# Else this is a new k(ey), so just add the k(ey)/v(alue) into the r(eturn value)
            else {
                r[k] = v;
            }
        }
        //# Else we've got ourselves a hash[]-style key-value e(ntry) 
        else {
            //# Collect the d(ecoded) k(ey) and the d(ecoded) sk(sub-key) based on the b(racket) locations
            k = d(e[1].slice(0, b));
            sk = d(e[1].slice(b + 1, e[1].indexOf("]", b)));

            //# ma(make array) out of the k(ey) 
            r[k] = ma(r[k]);

            //# If we have a sk(sub-key), plug the v(alue) into it
            if (sk) { r[k][sk] = v; }
            //# Else .push the v(alue) into the k(ey)'s array
            else { Array.prototype.push.call(r[k], v); }
        }
    }

    //# Return the r(eturn value)
    return r;
};



