Why doesn't Java support unsigned ints?




Why doesn't Java include support for unsigned integers?

It seems to me to be an odd omission, given that unsigned ints allow one to write code that is less likely to produce overflows on unexpectedly large input.

Furthermore, using unsigned integers can be a form of self-documentation, since they indicate that the value which the unsigned int was intended to hold is never supposed to be negative.

Lastly, in some cases, unsigned integers can be more efficient for certain operations, such as division.

What's the downside to including these?


I've heard stories that they were nearly included close to the original Java release. Oak was the precursor to Java, and some of its spec documents mentioned unsigned values. Unfortunately these never made it into the Java language. As far as anyone has been able to figure out, they just didn't get implemented, likely due to a time constraint.


I think Java is fine as it is; adding unsigned types would complicate it without much gain. Even with the simplified integer model, most Java programmers don't know how the basic numeric types behave - just read the book Java Puzzlers to see what misconceptions you might hold.

As for practical advice:

  • If your values are of somewhat arbitrary size and don't fit into int, use long. If they don't fit into long, use BigInteger.

  • Use the smaller types only for arrays when you need to save space.

  • If you need exactly 64/32/16/8 bits, use long/int/short/byte and stop worrying about the sign bit, except for division, comparison, right shift, and casting (see the sketch after this list).
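
For illustration, a minimal sketch of what that last point looks like in practice (the masking idiom and the Java 8 helper methods shown are standard; the variable names are made up):

byte b = (byte) 0xF0;                        // bit pattern 1111_0000, i.e. -16 as signed
int unsignedB = b & 0xFF;                    // widen without sign extension: 240
int alsoUnsignedB = Byte.toUnsignedInt(b);   // same thing, since Java 8

int big = 0x80000000;                        // -2147483648 as signed, 2^31 as unsigned
int logical = big >>> 1;                     // logical right shift: 0x40000000
int cmp = Integer.compareUnsigned(big, 1);   // positive: 2^31 is "bigger" unsigned
int quot = Integer.divideUnsigned(big, 2);   // 0x40000000, not -1073741824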

See also this answer about "porting a random number generator from C to Java".


As soon as signed and unsigned ints are mixed in an expression things start to get messy and you probably will lose information. Restricting Java to signed ints only really clears things up. I’m glad I don’t have to worry about the whole signed/unsigned business, though I sometimes do miss the 8th bit in a byte.


I once took a C++ course with someone on the C++ standards committee, who implied that Java made the right decision to avoid unsigned integers, for two reasons: (1) most programs that use unsigned integers can do just as well with signed integers, and signed is more natural in terms of how people think; and (2) unsigned integers invite lots of easy-to-create but difficult-to-debug issues, such as integer arithmetic overflow and losing significant bits when converting between signed and unsigned types. If you mistakenly subtract 1 from 0 using signed integers, your program often crashes quickly, making the bug easier to find than if the value silently wraps around to 2^32 - 1; and compilers, static analysis tools, and runtime checks have to assume you know what you're doing, since you chose to use unsigned arithmetic. Also, a negative number like -1 can often represent something useful, such as a field being ignored/defaulted/unset, while with unsigned you'd have to reserve a special value like 2^32 - 1 or something similar.

Long ago, when memory was limited and processors did not automatically operate on 64 bits at once, every bit counted a lot more, so having signed vs unsigned bytes or shorts actually mattered a lot more often and was obviously the right design decision. Today just using a signed int is more than sufficient in almost all regular programming cases, and if your program really needs values bigger than 2^31 - 1, you usually just want a long anyway. Once you're into the territory of using longs, it's even harder to come up with a reason why you really can't get by with 2^63 - 1 positive integers. If we ever move to 128-bit processors, it'll be even less of an issue.


I know this post is old; however, for your interest: in Java 8 and later, you can use the int data type to represent an unsigned 32-bit integer, which has a minimum value of 0 and a maximum value of 2^32 - 1. Static methods like compareUnsigned(), divideUnsigned() etc. have been added to the Integer class to support the arithmetic operations for ints treated as unsigned.
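
A quick sketch of those additions (the values here are arbitrary, but all the methods shown are real Integer methods from Java 8):

int max = Integer.parseUnsignedInt("4294967295");       // 2^32 - 1, stored as -1
System.out.println(max);                                // -1 (the signed view)
System.out.println(Integer.toUnsignedString(max));      // 4294967295
System.out.println(Integer.divideUnsigned(max, 2));     // 2147483647
System.out.println(Integer.remainderUnsigned(max, 2));  // 1
long widened = Integer.toUnsignedLong(max);             // 4294967295L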


The reason, IMHO, is that they were/are too lazy to implement/correct that mistake. Suggesting that C/C++ programmers do not understand unsigned, structures, unions, bit flags... is just preposterous.

Either you were talking with a basic/bash/Java programmer on the verge of beginning to program a la C, without any real knowledge of the language, or you are just talking out of your own mind. ;)

When you deal every day with formats, whether from files or hardware, you begin to question what on earth they were thinking.

A good example here would be trying to use an unsigned byte as a self-wrapping loop counter. For those of you who do not understand that last sentence: how on earth do you call yourself a programmer?
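
For readers who haven't met the idiom: with an unsigned char in C, a loop counter wraps from 255 back to 0 automatically. In Java you have to emulate it; a sketch of the usual masking workaround:

// Keep the counter in an int and mask it back into the 0..255 range.
int counter = 250;
for (int step = 0; step < 10; ++step) {
    System.out.println(counter);    // 250 .. 255, then wraps to 0 .. 3
    counter = (counter + 1) & 0xFF; // the mask does the "self-rotation"
}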

DC


Because unsigned types are pure evil.

The fact that in C unsigned - int produces unsigned is even more evil.

Here is a snippet of the problem that burned me more than once:

// We have an odd, positive number of rays,
// consecutive ones at angle delta from each other.
assert( rays.size() > 0 && rays.size() % 2 == 1 );

// Walk over the set of rays, spaced at angle delta from each other.
for( size_t n = 0; n < rays.size(); ++n )
{
    // Compute the angle between the nth ray and the middle one.
    // The index of the middle one is (rays.size() - 1) / 2,
    // the rays are evenly spaced at angle delta, therefore
    // the magnitude of the angle between the nth ray and the
    // middle one is:
    double angle = delta * fabs( n - (rays.size() - 1) / 2 );

    // Do something else ...
}

Have you noticed the bug yet? I confess I only saw it after stepping in with the debugger.

Because n is of the unsigned type size_t, the entire expression n - (rays.size() - 1) / 2 evaluates as unsigned. That expression is intended to be the signed position of the nth ray relative to the middle one: the 1st ray to the left of the middle one would have position -1, the 1st one to the right would have position +1, etc. After taking the absolute value and multiplying by the delta angle I would get the angle between the nth ray and the middle one.

Unfortunately for me the above expression contained the evil unsigned and instead of evaluating to, say, -1, it evaluated to 2^32-1. The subsequent conversion to double sealed the bug.

After a bug or two caused by misuse of unsigned arithmetic, one has to start wondering whether the extra bit is worth the extra trouble. I am trying, as much as feasible, to avoid any use of unsigned types in arithmetic, although I still use them for non-arithmetic operations such as binary masks.
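
For contrast, the same expression in Java stays signed, so the intended arithmetic just works (a sketch; raysSize stands in for rays.size()):

int raysSize = 5;
double delta = 0.1;
for (int n = 0; n < raysSize; ++n) {
    // n - (raysSize - 1) / 2 is an ordinary signed int: -2, -1, 0, 1, 2
    double angle = delta * Math.abs(n - (raysSize - 1) / 2);
    System.out.println(angle);
}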


Java does have unsigned types, or at least one: char is effectively an unsigned 16-bit integer. So whatever excuse Gosling throws up, it's really just his ignorance why there are no other unsigned types.
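
char really does behave as an unsigned 16-bit integer, which a tiny demonstration makes visible:

char c = 0xFFFF;                // maximum char value; the constant fits, so this is legal
System.out.println((int) c);    // 65535 - char widens without sign extension
short s = (short) 0xFFFF;       // the same bit pattern in a short
System.out.println((int) s);    // -1 - short is signed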

Also on short types: shorts are used all the time for multimedia. The reason is that you can fit 2 samples in a single 32-bit word and vectorize many operations. Same thing with 8-bit data and unsigned bytes: you can fit 4 or 8 samples in a register for vectorizing.
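
Roughly what that packing looks like, translated to Java (a sketch; the sample values are made up):

int hi = 0xABCD, lo = 0x1234;            // two unsigned 16-bit samples
int packed = (hi << 16) | (lo & 0xFFFF); // both now live in one 32-bit int
int hiOut = packed >>> 16;               // 0xABCD - >>> avoids sign extension
int loOut = packed & 0xFFFF;             // 0x1234 - the mask does the same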


With JDK8 it does have some support for them.

We may yet see full support of unsigned types in Java despite Gosling's concerns.


This is from an interview with Gosling and others, about simplicity:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.


Your question is "Why doesn't Java support unsigned ints"?

And my answer to your question is that Java wants all of its primitive types (byte, char/short, int and long) to be treated as byte, word, dword and qword respectively, exactly like in assembly. The Java operators perform signed operations on all of these types except char; char alone is unsigned, and 16-bit only.

So static methods are supposed to provide the unsigned operations, for both 32 and 64 bits.

You need a final class whose static methods can be called for the unsigned operations.

You can create this final class, call it whatever name you want, and implement its static methods.

If you have no idea how to implement the static methods, then this link may help you.

In my opinion, Java is not similar to C++ at all if it supports neither unsigned types nor operator overloading, so I think Java should be treated as a completely different language from both C++ and C.

The languages are also completely different in name, by the way.

So I don't recommend writing C-like code in Java, and I don't recommend writing C++-like code at all, because then in Java you won't be able to do what you would do next in C++; the code won't continue to be C++-like at all, and for me it is bad to code like that, changing style in the middle.

I recommend writing and using static methods for the signed operations as well, so you don't see a mixture of operators and static methods for signed and unsigned operations in the code, unless you need only signed operations, in which case it's okay to use the operators alone.

I also recommend avoiding the short, int and long primitive types, using word, dword and qword respectively instead, and calling the static methods for unsigned and/or signed operations instead of using operators.

If you are going to do signed operations only and use only the operators in the code, then it is okay to use the primitive types short, int and long.

Actually word, dword and qword don't exist in the language, but you can create a new class for each, and the implementation of each should be very easy:

The word class holds only the primitive type short, the dword class holds only the primitive type int, and the qword class holds only the primitive type long. Then implement all the unsigned and signed methods in each class, static or not as you choose: all the 16-bit operations, both unsigned and signed, with meaningful names on the word class; all the 32-bit operations on the dword class; and all the 64-bit operations on the qword class. A sketch follows below.
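
A minimal sketch of what such a word class might look like (word, signedValue, unsignedValue and divideUnsigned are all hypothetical names from this answer, not anything in the JDK):

// Hypothetical 16-bit "word" wrapper, as described above.
final class word {
    private final short bits;   // the underlying 16 bits

    word(short bits) { this.bits = bits; }
    word(word other) { this.bits = other.bits; }   // copy constructor, see below

    short signedValue()  { return bits; }          // the signed view
    int unsignedValue()  { return bits & 0xFFFF; } // the unsigned view, 0..65535

    // One example of an unsigned operation with a meaningful name.
    static word divideUnsigned(word a, word b) {
        return new word((short) ((a.bits & 0xFFFF) / (b.bits & 0xFFFF)));
    }
}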

If you don't like giving too many different names to each method, you can always use overloading in Java; good to read that Java didn't remove that too!

If you want methods rather than operators for 8-bit signed operations, and methods for 8-bit unsigned operations that have no operators at all, then you can create a Byte class (note that the first letter 'B' is capital, so this is not the primitive type byte) and implement the methods in it.

About passing by value and passing by reference:

If I am not wrong, as in C#, primitives are naturally passed by value, while for objects the reference is what gets passed by value, so objects of type Byte, word, dword and qword will effectively be shared between caller and callee rather than copied. I wish Java had struct types as C# has, so that Byte, word, dword and qword could be implemented as structs and would be copied on each call by default, like the primitive types are. But because Java is worse than C# in this respect and we have to deal with that, there are only classes and interfaces, whose instances are shared rather than copied by default. So if you want to pass Byte, word, dword and qword objects by value and not by reference, like any other class object in Java and also in C#, you will simply have to use the copy constructor, and that's it.

That's the only solution I can think of. I just wish I could typedef the primitive types to word, dword and qword, but Java supports neither typedef nor using, unlike C#, whose using is equivalent to C's typedef.

About output:

For the same sequence of bits, you can print it in many ways: as binary, as unsigned decimal (like the meaning of %u in C's printf), as octal (like %o), as hexadecimal (like %x) and as signed decimal (like %d).

Note that C's printf doesn't know the types of the variables passed as parameters to the function; printf learns the type of each variable only from the char* object passed to its first parameter.

So in each of the classes Byte, word, dword and qword, you can implement a print method and get the functionality of printf. Even though the primitive type underlying the class is signed, you can still print it as unsigned by following an algorithm of logical and shift operations that extracts the digits to print.

Unfortunately the link I gave you doesn't show how to implement these print methods, but I am sure you can google for the algorithms you need to implement them.
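
For what it's worth, the JDK's Integer class already ships exactly these kinds of helpers for ints, so a dword-style class could simply delegate to them (a small illustration):

int x = -1;                                       // bit pattern 0xFFFFFFFF
System.out.println(Integer.toBinaryString(x));    // 11111111111111111111111111111111
System.out.println(Integer.toOctalString(x));     // 37777777777  (like %o)
System.out.println(Integer.toHexString(x));       // ffffffff     (like %x)
System.out.println(Integer.toUnsignedString(x));  // 4294967295   (like %u, Java 8+)
System.out.println(x);                            // -1           (like %d)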

That's all I can answer and suggest regarding your question.


I can think of one unfortunate side effect. In Java embedded databases, the number of ids you can have with a 32-bit id field is 2^31, not 2^32 (~2 billion, not ~4 billion).




