example urlencodedformentity Android Java UTF-8 HttpClient Problem



2 Answers

@Arhimed's answer is the solution. But I cannot see anything obviously wrong with your convertStreamToString code.

My guesses are:

  1. The server is putting a UTF Byte Order Mark (BOM) at the start of the stream. The standard Java UTF-8 character decoder does not remove the BOM, so the chances are that it would end up in the resulting String. (However, the code for EntityUtils doesn't seem to do anything with BOMs either.)
  2. Your convertStreamToString is reading the character stream a line at a time, and reassembling it using a hard-wired '\n' as the end-of-line marker. If you are going to write that to an external file or application, you should probably should be using a platform specific end-of-line marker.
when is clientprotocolexception thrown

I am having weird character encoding issues with a JSON array that is grabbed from a web page. The server is sending back this header:

Content-Type text/javascript; charset=UTF-8

Also I can look at the JSON output in Firefox or any browser and Unicode characters display properly. The response will sometimes contain words from another language with accent symbols and such. However I am getting those weird question marks when I pull it down and put it to a string in Java. Here is my code:

HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, "utf-8");
params.setBooleanParameter("http.protocol.expect-continue", false);

HttpClient httpclient = new DefaultHttpClient(params);

HttpGet httpget = new HttpGet("http://www.example.com/json_array.php");
HttpResponse response;
    try {
        response = httpclient.execute(httpget);

        if(response.getStatusLine().getStatusCode() == 200){
            // Connection was established. Get the content. 

            HttpEntity entity = response.getEntity();
            // If the response does not enclose an entity, there is no need
            // to worry about connection release

            if (entity != null) {
                // A Simple JSON Response Read
                InputStream instream = entity.getContent();
                String jsonText = convertStreamToString(instream);

                Toast.makeText(getApplicationContext(), "Response: "+jsonText, Toast.LENGTH_LONG).show();

            }

        }


    } catch (MalformedURLException e) {
        Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (IOException e) {
        Toast.makeText(getApplicationContext(), "ERROR: IO Exception - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    } catch (JSONException e) {
        Toast.makeText(getApplicationContext(), "ERROR: JSON - "+e.getMessage(), Toast.LENGTH_LONG).show();
        e.printStackTrace();
    }

private static String convertStreamToString(InputStream is) {
    /*
     * To convert the InputStream to String we use the BufferedReader.readLine()
     * method. We iterate until the BufferedReader return null which means
     * there's no more data to read. Each line will appended to a StringBuilder
     * and returned as String.
     */
    BufferedReader reader;
    try {
        reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    StringBuilder sb = new StringBuilder();

    String line;
    try {
        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}

As you can see, I am specifying UTF-8 on the InputStreamReader but every time I view the returned JSON text via Toast it has strange question marks. I am thinking that I need to send the InputStream to a byte[] instead?

Thanks in advance for any help.




Archimed's answer is correct. However, that can be done simply by providing an additional header in the HTTP request:

Accept-charset: utf-8

No need to remove anything or use any other library.

For example,

GET / HTTP/1.1
Host: www.website.com
Connection: close
Accept: text/html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: utf-8

Most probably your request doesn't have any Accept-Charset header.




Related