# java javascript - Fastest way to determine if an integer's square root is an integer

I figured out a method that works ~35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don't know how the Java factor will play out.

My approach is threefold:

- First, filter out obvious answers. This includes negative numbers and looking at the last 4 bits. (I found looking at the last six didn't help.) I also answer yes for 0. (In reading the code below, note that my input is
`int64 x`

.)`if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) ) return false; if( x == 0 ) return true;`

- Next, check if it's a square modulo 255 = 3 * 5 * 17. Because that's a product of three distinct primes, only about 1/8 of the residues mod 255 are squares. However, in my experience, calling the modulo operator (%) costs more than the benefit one gets, so I use bit tricks involving 255 = 2^8-1 to compute the residue. (For better or worse, I am not using the trick of reading individual bytes out of a word, only bitwise-and and shifts.)

To actually check if the residue is a square, I look up the answer in a precomputed table.`int64 y = x; y = (y & 4294967295LL) + (y >> 32); y = (y & 65535) + (y >> 16); y = (y & 255) + ((y >> 8) & 255) + (y >> 16); // At this point, y is between 0 and 511. More code can reduce it farther.`

`if( bad255[y] ) return false; // However, I just use a table of size 512`

- Finally, try to compute the square root using a method similar to Hensel's lemma. (I don't think it's applicable directly, but it works with some modifications.) Before doing that, I divide out all powers of 2 with a binary search:

At this point, for our number to be a square, it must be 1 mod 8.`if((x & 4294967295LL) == 0) x >>= 32; if((x & 65535) == 0) x >>= 16; if((x & 255) == 0) x >>= 8; if((x & 15) == 0) x >>= 4; if((x & 3) == 0) x >>= 2;`

The basic structure of Hensel's lemma is the following. (Note: untested code; if it doesn't work, try t=2 or 8.)`if((x & 7) != 1) return false;`

The idea is that at each iteration, you add one bit onto r, the "current" square root of x; each square root is accurate modulo a larger and larger power of 2, namely t/2. At the end, r and t/2-r will be square roots of x modulo t/2. (Note that if r is a square root of x, then so is -r. This is true even modulo numbers, but beware, modulo some numbers, things can have even more than 2 square roots; notably, this includes powers of 2.) Because our actual square root is less than 2^32, at that point we can actually just check if r or t/2-r are real square roots. In my actual code, I use the following modified loop:`int64 t = 4, r = 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; t <<= 1; r += ((x - r * r) & t) >> 1; // Repeat until t is 2^33 or so. Use a loop if you want.`

The speedup here is obtained in three ways: precomputed start value (equivalent to ~10 iterations of the loop), earlier exit of the loop, and skipping some t values. For the last part, I look at`int64 r, t, z; r = start[(x >> 3) & 1023]; do { z = x - r * r; if( z == 0 ) return true; if( z < 0 ) return false; t = z & (-z); r += (z & t) >> 1; if( r > (t >> 1) ) r = t - r; } while( t <= (1LL << 33) );`

`z = r - x * x`

, and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn't have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.

Even if this code doesn't work faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.

```
typedef signed long long int int64;
int start[1024] =
{1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181};
bool bad255[512] =
{0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
0,0};
inline bool square( int64 x ) {
// Quickfail
if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
return false;
if( x == 0 )
return true;
// Check mod 255 = 3 * 5 * 17, for fun
int64 y = x;
y = (y & 4294967295LL) + (y >> 32);
y = (y & 65535) + (y >> 16);
y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
if( bad255[y] )
return false;
// Divide out powers of 4 using binary search
if((x & 4294967295LL) == 0)
x >>= 32;
if((x & 65535) == 0)
x >>= 16;
if((x & 255) == 0)
x >>= 8;
if((x & 15) == 0)
x >>= 4;
if((x & 3) == 0)
x >>= 2;
if((x & 7) != 1)
return false;
// Compute sqrt using something like Hensel's lemma
int64 r, t, z;
r = start[(x >> 3) & 1023];
do {
z = x - r * r;
if( z == 0 )
return true;
if( z < 0 )
return false;
t = z & (-z);
r += (z & t) >> 1;
if( r > (t >> 1) )
r = t - r;
} while( t <= (1LL << 33) );
return false;
}
```

I'm looking for the fastest way to determine if a `long`

value is a perfect square (i.e. its square root is another integer):

- I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to integer-only domain.
- Maintaining a lookup table is impratical (since there are about
2
^{31.5}integers whose square is less than 2^{63}).

Here is the very simple and straightforward way I'm doing it now:

```
public final static boolean isPerfectSquare(long n)
{
if (n < 0)
return false;
long tst = (long)(Math.sqrt(n) + 0.5);
return tst*tst == n;
}
```

*Notes: I'm using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.*

A new solution posted by A. Rex has proven to be even faster. In a run over the first 1 billion integers, the solution only required 34% of the time that the original solution used. While the John Carmack hack is a little better for small values of *n*, the benefit compared to this solution is pretty small.

Here is the A. Rex solution, converted to Java:

```
private final static boolean isPerfectSquare(long n)
{
// Quickfail
if( n < 0 || ((n&2) != 0) || ((n & 7) == 5) || ((n & 11) == 8) )
return false;
if( n == 0 )
return true;
// Check mod 255 = 3 * 5 * 17, for fun
long y = n;
y = (y & 0xffffffffL) + (y >> 32);
y = (y & 0xffffL) + (y >> 16);
y = (y & 0xffL) + ((y >> 8) & 0xffL) + (y >> 16);
if( bad255[(int)y] )
return false;
// Divide out powers of 4 using binary search
if((n & 0xffffffffL) == 0)
n >>= 32;
if((n & 0xffffL) == 0)
n >>= 16;
if((n & 0xffL) == 0)
n >>= 8;
if((n & 0xfL) == 0)
n >>= 4;
if((n & 0x3L) == 0)
n >>= 2;
if((n & 0x7L) != 1)
return false;
// Compute sqrt using something like Hensel's lemma
long r, t, z;
r = start[(int)((n >> 3) & 0x3ffL)];
do {
z = n - r * r;
if( z == 0 )
return true;
if( z < 0 )
return false;
t = z & (-z);
r += (z & t) >> 1;
if( r > (t >> 1) )
r = t - r;
} while( t <= (1L << 33) );
return false;
}
private static boolean[] bad255 =
{
false,false,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,
true ,true ,false,false,true ,true ,false,true ,false,true ,true ,true ,false,
true ,true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true ,
true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,false,
true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false,
true ,false,true ,true ,false,false,true ,true ,true ,true ,true ,false,true ,
true ,true ,true ,false,true ,true ,false,false,true ,true ,true ,true ,true ,
true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true ,true ,
true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false,true ,
true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true ,true ,
true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,
true ,false,false,true ,true ,true ,true ,true ,false,true ,true ,false,true ,
true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,true ,
false,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true ,true ,
true ,true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true ,
false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,
true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,false,
true ,true ,true ,true ,false,true ,true ,true ,false,true ,true ,true ,true ,
false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,
true ,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,false,
true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,false,true ,
true ,false,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false,
true ,true ,true ,false,true ,false,true ,true ,true ,true ,true ,true ,true ,
true ,true ,true ,true ,true ,false,true ,false,true ,true ,true ,false,true ,
true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true ,false,
false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,false,true ,
true ,false,false,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,
true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true ,
true ,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false,false,
true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,
false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,
true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true ,
true ,true ,true ,true ,true ,false,true ,true ,false,true ,false,true ,true ,
false,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,
true ,true ,false,true ,true ,true ,true ,true ,false,false,true ,true ,true ,
true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true ,
true ,true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,false,
true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,true ,
true ,false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true ,
true ,true ,true ,false,false
};
private static int[] start =
{
1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181
};
```

I've tried the different solutions presented below.

- After exhaustive testing, I found that adding
`0.5`

to the result of Math.sqrt() is not necessary, at least not on my machine. - The John Carmack hack was faster, but it gave incorrect results starting at n=410881. However, as suggested by BobbyShaftoe, we can use the Carmack hack for n < 410881.
- Newton's method was a good bit slower than
`Math.sqrt()`

. This is probably because`Math.sqrt()`

uses something similar to Newton's Method, but implemented in the hardware so it's much faster than in Java. Also, Newton's Method still required use of doubles. - A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than
`Math.sqrt()`

. - Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.

The one suggestion which did show improvements was made by John D. Cook. You can observe that the last hex digit (i.e. the last 4 bits) of a perfect square must be 0, 1, 4, or 9. This means that 75% of numbers can be immediately eliminated as possible squares. Implementing this solution resulted in about a 50% reduction in runtime.

Working from John's suggestion, I investigated properties of the last *n* bits of a perfect square. By analyzing the last 6 bits, I found that only 12 out of 64 values are possible for the last 6 bits. This means 81% of values can be eliminated without using any math. Implementing this solution gave an additional 8% reduction in runtime (compared to my original algorithm). Analyzing more than 6 bits results in a list of possible ending bits which is too large to be practical.

Here is the code that I have used, which runs in 42% of the time required by the original algorithm (based on a run over the first 100 million integers). For values of *n* less than 410881, it runs in only 29% of the time required by the original algorithm.

```
private final static boolean isPerfectSquare(long n)
{
if (n < 0)
return false;
switch((int)(n & 0x3F))
{
case 0x00: case 0x01: case 0x04: case 0x09: case 0x10: case 0x11:
case 0x19: case 0x21: case 0x24: case 0x29: case 0x31: case 0x39:
long sqrt;
if(n < 410881L)
{
//John Carmack hack, converted to Java.
// See: http://www.codemaestro.com/reviews/9
int i;
float x2, y;
x2 = n * 0.5F;
y = n;
i = Float.floatToRawIntBits(y);
i = 0x5f3759df - ( i >> 1 );
y = Float.intBitsToFloat(i);
y = y * ( 1.5F - ( x2 * y * y ) );
sqrt = (long)(1.0F/y);
}
else
{
//Carmack hack gives incorrect answer for n >= 410881.
sqrt = (long)Math.sqrt(n);
}
return sqrt*sqrt == n;
default:
return false;
}
}
```

**Notes**:

- According to John's tests, using
`or`

statements is faster in C++ than using a`switch`

, but in Java and C# there appears to be no difference between`or`

and`switch`

. - I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either switch or
`or`

statement, I would just say`if(lookup[(int)(n&0x3F)]) { test } else return false;`

. To my surprise, this was (just slightly) slower.~~I'm not sure why.~~This is because array bounds are checked in Java.

You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and." Perfect squares can only end in 0, 1, 4, or 9 in base 16, So for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

```
public final static boolean isPerfectSquare(long n)
{
if (n < 0)
return false;
switch((int)(n & 0xF))
{
case 0: case 1: case 4: case 9:
long tst = (long)Math.sqrt(n);
return tst*tst == n;
default:
return false;
}
}
```

When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again make the code twice as fast.

```
int isPerfectSquare(int n)
{
int h = n & 0xF; // h is the last hex "digit"
if (h > 9)
return 0;
// Use lazy evaluation to jump out of the if statement as soon as possible
if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
{
int t = (int) floor( sqrt((double) n) + 0.5 );
return t*t == n;
}
return 0;
}
```

Eliminating the switch statement had little effect on the C# code.

I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root, algorithm to solve the square root faster. You could probably easily test this for all possible 32 bit integers, and validate that you actually got correct results, as it's only an appoximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.

I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see those old results in the edit history of this answer, but they're not accurate, as I made a mistake, and wasted time analyzing several algorithms which aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently than everyone else:

```
// This is faster because a number is divisible by 2^4 or more only 6% of the time
// and more than that a vanishingly small percentage.
while((x & 0x3) == 0) x >>= 2;
// This is effectively the same as the switch-case statement used in the original
// answer.
if((x & 0x7) != 1) return false;
```

However, this simple line, which most of the time adds one or two very fast instructions, greatly simplifies the `switch-case`

statement into one if statement. However, it can add to the runtime if many of the tested numbers have significant power-of-two factors.

The algorithms below are as follows:

**Internet**- Kip's posted answer**Durron**- My modified answer using the one-pass answer as a base**DurronTwo**- My modified answer using the two-pass answer (by @JohnnyHeggheim), with some other slight modifications.

Here is a sample runtime if the numbers are generated using `Math.abs(java.util.Random.nextLong())`

```
0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; ?=378.78 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; ?=478.86 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; ?=734.10 ns @ 10 trials
benchmark us linear runtime
Internet 39.7 ==============================
Durron 37.8 ============================
DurronTwo 36.0 ===========================
vm: java
trial: 0
```

And here is a sample runtime if it's run on the first million longs only:

```
0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; ?=56939.84 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; ?=50537.62 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; ?=10766.22 ns @ 3 trials
benchmark ms linear runtime
Internet 2.93 ===========================
Durron 2.24 =====================
DurronTwo 3.16 ==============================
vm: java
trial: 0
```

As you can see, `DurronTwo`

does better for large inputs, because it gets to use the magic trick very very often, but gets clobbered compared to the first algorithm and `Math.sqrt`

because the numbers are so much smaller. Meanwhile, the simpler `Durron`

is a huge winner because it never has to divide by 4 many many times in the first million numbers.

Here's `Durron`

:

```
public final static boolean isPerfectSquareDurron(long n) {
if(n < 0) return false;
if(n == 0) return true;
long x = n;
// This is faster because a number is divisible by 16 only 6% of the time
// and more than that a vanishingly small percentage.
while((x & 0x3) == 0) x >>= 2;
// This is effectively the same as the switch-case statement used in the original
// answer.
if((x & 0x7) == 1) {
long sqrt;
if(x < 410881L)
{
int i;
float x2, y;
x2 = x * 0.5F;
y = x;
i = Float.floatToRawIntBits(y);
i = 0x5f3759df - ( i >> 1 );
y = Float.intBitsToFloat(i);
y = y * ( 1.5F - ( x2 * y * y ) );
sqrt = (long)(1.0F/y);
} else {
sqrt = (long) Math.sqrt(x);
}
return sqrt*sqrt == x;
}
return false;
}
```

And `DurronTwo`

```
public final static boolean isPerfectSquareDurronTwo(long n) {
if(n < 0) return false;
// Needed to prevent infinite loop
if(n == 0) return true;
long x = n;
while((x & 0x3) == 0) x >>= 2;
if((x & 0x7) == 1) {
long sqrt;
if (x < 41529141369L) {
int i;
float x2, y;
x2 = x * 0.5F;
y = x;
i = Float.floatToRawIntBits(y);
//using the magic number from
//http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
//since it more accurate
i = 0x5f375a86 - (i >> 1);
y = Float.intBitsToFloat(i);
y = y * (1.5F - (x2 * y * y));
y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate
sqrt = (long) ((1.0F/y) + 0.2);
} else {
//Carmack hack gives incorrect answer for n >= 41529141369.
sqrt = (long) Math.sqrt(x);
}
return sqrt*sqrt == x;
}
return false;
}
```

And my benchmark harness: (Requires Google caliper 0.1-rc5)

```
public class SquareRootBenchmark {
public static class Benchmark1 extends SimpleBenchmark {
private static final int ARRAY_SIZE = 10000;
long[] trials = new long[ARRAY_SIZE];
@Override
protected void setUp() throws Exception {
Random r = new Random();
for (int i = 0; i < ARRAY_SIZE; i++) {
trials[i] = Math.abs(r.nextLong());
}
}
public int timeInternet(int reps) {
int trues = 0;
for(int i = 0; i < reps; i++) {
for(int j = 0; j < ARRAY_SIZE; j++) {
if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++;
}
}
return trues;
}
public int timeDurron(int reps) {
int trues = 0;
for(int i = 0; i < reps; i++) {
for(int j = 0; j < ARRAY_SIZE; j++) {
if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++;
}
}
return trues;
}
public int timeDurronTwo(int reps) {
int trues = 0;
for(int i = 0; i < reps; i++) {
for(int j = 0; j < ARRAY_SIZE; j++) {
if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++;
}
}
return trues;
}
}
public static void main(String... args) {
Runner.main(Benchmark1.class, args);
}
}
```

**UPDATE:** I've made a new algorithm that is faster in some scenarios, slower in others, I've gotten different benchmarks based on different inputs. If we calculate modulo `0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241`

, we can eliminate 97.82% of numbers that cannot be squares. This can be (sort of) done in one line, with 5 bitwise operations:

```
if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
```

The resulting index is either 1) the residue, 2) the residue `+ 0xFFFFFF`

, or 3) the residue `+ 0x1FFFFFE`

. Of course, we need to have a lookup table for residues modulo `0xFFFFFF`

, which is about a 3mb file (in this case stored as ascii text decimal numbers, not optimal but clearly improvable with a `ByteBuffer`

and so forth. But since that is precalculation it doesn't matter so much. You can find the file here (or generate it yourself):

```
public final static boolean isPerfectSquareDurronThree(long n) {
if(n < 0) return false;
if(n == 0) return true;
long x = n;
while((x & 0x3) == 0) x >>= 2;
if((x & 0x7) == 1) {
if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
long sqrt;
if(x < 410881L)
{
int i;
float x2, y;
x2 = x * 0.5F;
y = x;
i = Float.floatToRawIntBits(y);
i = 0x5f3759df - ( i >> 1 );
y = Float.intBitsToFloat(i);
y = y * ( 1.5F - ( x2 * y * y ) );
sqrt = (long)(1.0F/y);
} else {
sqrt = (long) Math.sqrt(x);
}
return sqrt*sqrt == x;
}
return false;
}
```

I load it into a `boolean`

array like this:

```
private static boolean[] goodLookupSquares = null;
public static void initGoodLookupSquares() throws Exception {
Scanner s = new Scanner(new File("24residues_squares.txt"));
goodLookupSquares = new boolean[0x1FFFFFE];
while(s.hasNextLine()) {
int residue = Integer.valueOf(s.nextLine());
goodLookupSquares[residue] = true;
goodLookupSquares[residue + 0xFFFFFF] = true;
goodLookupSquares[residue + 0x1FFFFFE] = true;
}
s.close();
}
```

Example runtime. It beat `Durron`

(version one) in every trial I ran.

```
0% Scenario{vm=java, trial=0, benchmark=Internet} 40665.77 ns; ?=566.71 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 38397.60 ns; ?=784.30 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronThree} 36171.46 ns; ?=693.02 ns @ 10 trials
benchmark us linear runtime
Internet 40.7 ==============================
Durron 38.4 ============================
DurronThree 36.2 ==========================
vm: java
trial: 0
```

I want this function to work with all positive 64-bit signed integers

`Math.sqrt()`

works with doubles as input parameters, so you won't get accurate results for integers bigger than **2^53**.

An integer problem deserves an integer solution. Thus

Do binary search on the (non-negative) integers to find the greatest integer t such that `t**2 <= n`

. Then test whether `r**2 = n`

exactly. This takes time O(log n).

If you don't know how to binary search the positive integers because the set is unbounded, it's easy. You starting by computing your increasing function f (above `f(t) = t**2 - n`

) on powers of two. When you see it turn positive, you've found an upper bound. Then you can do standard binary search.

For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.

The following simplification of maaartinus's solution appears to shave a few percentage points off the runtime, but I'm not good enough at benchmarking to produce a benchmark I can trust:

```
long goodMask; // 0xC840C04048404040 computed below
{
for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
}
public boolean isSquare(long x) {
// This tests if the 6 least significant bits are right.
// Moving the to be tested bit to the highest position saves us masking.
if (goodMask << x >= 0) return false;
// Remove an even number of trailing zeros, leaving at most one.
x >>= (Long.numberOfTrailingZeros(x) & (-2);
// Repeat the test on the 6 least significant remaining bits.
if (goodMask << x >= 0 | x <= 0) return x == 0;
// Do it in the classical way.
// The correctness is not trivial as the conversion from long to double is lossy!
final long tst = (long) Math.sqrt(x);
return tst * tst == x;
}
```

It would be worth checking how omitting the first test,

```
if (goodMask << x >= 0) return false;
```

would affect performance.

This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:

```
def isexactsqrt(v)
value = v.abs
residue = value
root = 0
onebit = 1
onebit <<= 8 while (onebit < residue)
onebit >>= 2 while (onebit > residue)
while (onebit > 0)
x = root + onebit
if (residue >= x) then
residue -= x
root = x + onebit
end
root >>= 1
onebit >>= 2
end
return (residue == 0)
end
```

Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O - it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:

```
#include <iostream>
using namespace std;
typedef unsigned long long int llint;
class ISqrt { // Integer Square Root
llint value; // Integer whose square root is required
llint root; // Result: floor(sqrt(value))
llint residue; // Result: value-root*root
llint onebit, x; // Working bit, working value
public:
ISqrt(llint v = 2) { // Constructor
Root(v); // Take the root
};
llint Root(llint r) { // Resets and calculates new square root
value = r; // Store input
residue = value; // Initialise for subtracting down
root = 0; // Clear root accumulator
onebit = 1; // Calculate start value of counter
onebit <<= (8*sizeof(llint)-2); // Set up counter bit as greatest odd power of 2
while (onebit > residue) {onebit >>= 2; }; // Shift down until just < value
while (onebit > 0) {
x = root ^ onebit; // Will check root+1bit (root bit corresponding to onebit is always zero)
if (residue >= x) { // Room to subtract?
residue -= x; // Yes - deduct from residue
root = x + onebit; // and step root
};
root >>= 1;
onebit >>= 2;
};
return root;
};
llint Residue() { // Returns residue from last calculation
return residue;
};
};
int main() {
llint big, i, q, r, v, delta;
big = 0; big = (big-1); // Kludge for "big number"
ISqrt b; // Make q sqrt generator
for ( i = big; i > 0 ; i /= 7 ) { // for several numbers
q = b.Root(i); // Get the square root
r = b.Residue(); // Get the residue
v = q*q+r; // Recalc original value
delta = v-i; // And diff, hopefully 0
cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n";
};
return 0;
};
```

Project Euler is mentioned in the tags and many of the problems in it require checking numbers >> 2^64. Most of the optimizations mentioned above don't work easily when you are working with an 80 byte buffer.

I used java BigInteger and a slightly modified version of Newton's method, one that works better with integers. The problem was that exact squares n^2 converged to (n-1) instead of n because n^2-1 = (n-1)(n+1) and the final error was just one step below the final divisor and the algorithm terminated. It was easy to fix by adding one to the original argument before computing the error. (Add two for cube roots, etc.)

One nice attribute of this algorithm is that you can immediately tell if the number is a perfect square - the final error (not correction) in Newton's method will be zero. A simple modification also lets you quickly calculate floor(sqrt(x)) instead of the closest integer. This is handy with several Euler problems.

I checked all of the possible results when the last n bits of a square is observed. By successively examining more bits, up to 5/6th of inputs can be eliminated. I actually designed this to implement Fermat's Factorization algorithm, and it is very fast there.

```
public static boolean isSquare(final long val) {
if ((val & 2) == 2 || (val & 7) == 5) {
return false;
}
if ((val & 11) == 8 || (val & 31) == 20) {
return false;
}
if ((val & 47) == 32 || (val & 127) == 80) {
return false;
}
if ((val & 191) == 128 || (val & 511) == 320) {
return false;
}
// if((val & a == b) || (val & c == d){
// return false;
// }
if (!modSq[(int) (val % modSq.length)]) {
return false;
}
final long root = (long) Math.sqrt(val);
return root * root == val;
}
```

The last bit of pseudocode can be used to extend the tests to eliminate more values. The tests above are for k = 0, 1, 2, 3

It first tests whether it has a square residual with moduli of power of two, then it tests based on a final modulus, then it uses the Math.sqrt to do a final test. I came up with the idea from the top post, and attempted to extend upon it. I appreciate any comments or suggestions.

**Update:** Using the test by a modulus, (modSq) and a modulus base of 44352, my test runs in 96% of the time of the one in the OP's update for numbers up to 1,000,000,000.

If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?

Here is the simplest and most concise way, although I do not know how it compares in terms of CPU cycles. This works great if you only wish to know if the root is a whole number. If you really care if it is an integer, you can also figure that out. Here is a simple (and pure) function:

```
public static boolean isRootWhole(double number) {
return Math.sqrt(number) % 1 == 0;
}
```

If you do not need micro-optimization, this answer is better in terms of simplicity and maintainability. If you will be getting negative numbers, perhaps you will want to use Math.abs() on the number argument as the Math.sqrt() argument.

On my 3.6Ghz Intel i7-4790 CPU, a run of this algorithm on 0 - 10,000,000 took an average of 35 - 37 nanoseconds per calculation. I did 10 sequential runs, printing the average time spent on each of the ten million sqrt calculations. Each total run took just a little over 600 ms to complete.

If you are performing a lesser number of calculations, the earlier calculations take a bit longer.

Regarding the Carmac method, it seems like it would be quite easy just to iterate once more, which should double the number of digits of accuracy. It is, after all, an extremely truncated iterative method -- Newton's, with a very good first guess.

Regarding your current best, I see two micro-optimizations:

- move the check vs. 0 after the check using mod255
- rearrange the dividing out powers of four to skip all the checks for the usual (75%) case.

I.e:

```
// Divide out powers of 4 using binary search
if((n & 0x3L) == 0) {
n >>=2;
if((n & 0xffffffffL) == 0)
n >>= 32;
if((n & 0xffffL) == 0)
n >>= 16;
if((n & 0xffL) == 0)
n >>= 8;
if((n & 0xfL) == 0)
n >>= 4;
if((n & 0x3L) == 0)
n >>= 2;
}
```

Even better might be a simple

```
while ((n & 0x03L) == 0) n >>= 2;
```

Obviously, it would be interesting to know how many numbers get culled at each checkpoint -- I rather doubt the checks are truly independent, which makes things tricky.

# Newton's Method with integer arithmetic

If you wish to avoid non-integer operations you could use the method below. It basically uses Newton's Method modified for integer arithmetic.

```
/**
* Test if the given number is a perfect square.
* @param n Must be greater than 0 and less
* than Long.MAX_VALUE.
* @return <code>true</code> if n is a perfect
* square, or <code>false</code> otherwise.
*/
public static boolean isSquare(long n)
{
long x1 = n;
long x2 = 1L;
while (x1 > x2)
{
x1 = (x1 + x2) / 2L;
x2 = n / x1;
}
return x1 == x2 && n % x1 == 0L;
}
```

This implementation can not compete with solutions that use `Math.sqrt`

. However, its performance can be improved by using the filtering mechanisms described in some of the other posts.