[java] Der schnellste Weg, um festzustellen, ob die Wurzel einer ganzen Zahl eine ganze Zahl ist



Answers

Ich bin ziemlich spät zur Party, aber ich hoffe, eine bessere Antwort zu geben; kürzer und (unter der Annahme, dass meine benchmark korrekt ist) auch viel faster .

long goodMask; // 0xC840C04048404040 computed below
{
    for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
}

public boolean isSquare(long x) {
    // This tests if the 6 least significant bits are right.
    // Moving the to be tested bit to the highest position saves us masking.
    if (goodMask << x >= 0) return false;
    final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x);
    // Each square ends with an even number of zeros.
    if ((numberOfTrailingZeros & 1) != 0) return false;
    x >>= numberOfTrailingZeros;
    // Now x is either 0 or odd.
    // In binary each odd square ends with 001.
    // Postpone the sign test until now; handle zero in the branch.
    if ((x&7) != 1 | x <= 0) return x == 0;
    // Do it in the classical way.
    // The correctness is not trivial as the conversion from long to double is lossy!
    final long tst = (long) Math.sqrt(x);
    return tst * tst == x;
}

Der erste Test fängt die meisten Nicht-Quadrate schnell auf. Es verwendet eine 64-Item-Tabelle, die lange gepackt ist, so dass es keine Kosten für den Array-Zugriff gibt (Indirection- und Grenzen-Checks). Für eine gleichmäßig zufällige long gibt es eine Wahrscheinlichkeit von 81,25%, hier zu enden.

Der zweite Test erfasst alle Zahlen mit einer ungeraden Anzahl von Zweien in ihrer Faktorisierung. Die Methode Long.numberOfTrailingZeros ist sehr schnell, da sie in eine einzige i86-Anweisung JIT-ed wird.

Nach dem Ablegen der abschließenden Nullen behandelt der dritte Test Zahlen, die mit 011, 101 oder 111 enden, binär, die keine perfekten Quadrate sind. Es kümmert sich auch um negative Zahlen und behandelt auch 0.

Der letzte Test fällt auf die double . Da double nur eine Mantisse von 53 Bits hat, beinhaltet die Umwandlung von long zu double das Runden für große Werte. Trotzdem ist der Test korrekt (außer der proof ist falsch).

Der Versuch, die Idee von mod255 zu integrieren, war nicht erfolgreich.

Question

Ich suche nach dem schnellsten Weg, um zu bestimmen, ob ein long Wert ein perfektes Quadrat ist (dh seine Quadratwurzel ist eine andere ganze Zahl). Ich habe es einfach gemacht, indem ich die eingebaute Math.sqrt () - Funktion benutzt habe, aber ich frage mich, ob es eine Möglichkeit gibt, es schneller zu machen, indem man sich auf die Integer-Domain beschränkt. Die Pflege einer Nachschlagetabelle ist unpraktisch (da es etwa 2 31.5 Ganzzahlen gibt, deren Quadrat kleiner als 2 63 ist ).

Hier ist die sehr einfache und direkte Art, wie ich es jetzt mache:

public final static boolean isPerfectSquare(long n)
{
  if (n < 0)
    return false;

  long tst = (long)(Math.sqrt(n) + 0.5);
  return tst*tst == n;
}

Hinweise: Ich verwende diese Funktion in vielen Project Euler- Problemen. Also wird niemand sonst diesen Code beibehalten müssen. Und diese Art der Mikrooptimierung könnte tatsächlich einen Unterschied machen, da ein Teil der Herausforderung darin besteht, jeden Algorithmus in weniger als einer Minute durchzuführen, und diese Funktion muss bei einigen Problemen millionenfach aufgerufen werden.

Update 2 : Eine neue Lösung von A. Rex hat sich als noch schneller erwiesen. Bei einem Durchlauf über die ersten 1 Milliarde Ganzzahlen benötigte die Lösung nur 34% der Zeit, die die ursprüngliche Lösung verwendete. Während der John Carmack Hack ein wenig besser für kleine Werte von n ist , ist der Vorteil im Vergleich zu dieser Lösung ziemlich klein.

Hier ist die A. Rex-Lösung, die in Java konvertiert wurde:

private final static boolean isPerfectSquare(long n)
{
  // Quickfail
  if( n < 0 || ((n&2) != 0) || ((n & 7) == 5) || ((n & 11) == 8) )
    return false;
  if( n == 0 )
    return true;

  // Check mod 255 = 3 * 5 * 17, for fun
  long y = n;
  y = (y & 0xffffffffL) + (y >> 32);
  y = (y & 0xffffL) + (y >> 16);
  y = (y & 0xffL) + ((y >> 8) & 0xffL) + (y >> 16);
  if( bad255[(int)y] )
      return false;

  // Divide out powers of 4 using binary search
  if((n & 0xffffffffL) == 0)
      n >>= 32;
  if((n & 0xffffL) == 0)
      n >>= 16;
  if((n & 0xffL) == 0)
      n >>= 8;
  if((n & 0xfL) == 0)
      n >>= 4;
  if((n & 0x3L) == 0)
      n >>= 2;

  if((n & 0x7L) != 1)
      return false;

  // Compute sqrt using something like Hensel's lemma
  long r, t, z;
  r = start[(int)((n >> 3) & 0x3ffL)];
  do {
    z = n - r * r;
    if( z == 0 )
      return true;
    if( z < 0 )
      return false;
    t = z & (-z);
    r += (z & t) >> 1;
    if( r > (t  >> 1) )
    r = t - r;
  } while( t <= (1L << 33) );
  return false;
}

private static boolean[] bad255 =
{
   false,false,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,
   true ,true ,false,false,true ,true ,false,true ,false,true ,true ,true ,false,
   true ,true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true ,
   true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,false,
   true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false,
   true ,false,true ,true ,false,false,true ,true ,true ,true ,true ,false,true ,
   true ,true ,true ,false,true ,true ,false,false,true ,true ,true ,true ,true ,
   true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true ,true ,
   true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,false,true ,
   true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true ,true ,
   true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,
   true ,false,false,true ,true ,true ,true ,true ,false,true ,true ,false,true ,
   true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,true ,
   false,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true ,true ,
   true ,true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true ,
   false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,
   true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,false,
   true ,true ,true ,true ,false,true ,true ,true ,false,true ,true ,true ,true ,
   false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,
   true ,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,false,
   true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,false,true ,
   true ,false,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false,
   true ,true ,true ,false,true ,false,true ,true ,true ,true ,true ,true ,true ,
   true ,true ,true ,true ,true ,false,true ,false,true ,true ,true ,false,true ,
   true ,true ,true ,false,true ,true ,true ,false,true ,false,true ,true ,false,
   false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,false,true ,
   true ,false,false,true ,true ,true ,true ,true ,true ,true ,true ,false,true ,
   true ,true ,true ,true ,false,true ,true ,true ,true ,true ,false,true ,true ,
   true ,true ,false,true ,true ,true ,false,true ,true ,true ,true ,false,false,
   true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,
   false,false,true ,true ,true ,true ,true ,true ,true ,false,false,true ,true ,
   true ,true ,true ,false,true ,true ,false,true ,true ,true ,true ,true ,true ,
   true ,true ,true ,true ,true ,false,true ,true ,false,true ,false,true ,true ,
   false,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,true ,false,
   true ,true ,false,true ,true ,true ,true ,true ,false,false,true ,true ,true ,
   true ,true ,true ,true ,false,false,true ,true ,true ,true ,true ,true ,true ,
   true ,true ,true ,true ,true ,true ,false,false,true ,true ,true ,true ,false,
   true ,true ,true ,false,true ,true ,true ,true ,false,true ,true ,true ,true ,
   true ,false,true ,true ,true ,true ,true ,false,true ,true ,true ,true ,true ,
   true ,true ,true ,false,false
};

private static int[] start =
{
  1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
  1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
  129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
  1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
  257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
  1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
  385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
  1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
  513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
  1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
  641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
  1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
  769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
  1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
  897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
  1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
  1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
  959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
  1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
  831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
  1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
  703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
  1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
  575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
  1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
  447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
  1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
  319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
  1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
  191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
  1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
  63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
  2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
  65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
  1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
  193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
  1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
  321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
  1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
  449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
  1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
  577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
  1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
  705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
  1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
  833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
  1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
  961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
  1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
  1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
  895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
  1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
  767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
  1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
  639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
  1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
  511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
  1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
  383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
  1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
  255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
  1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
  127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
  1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181
};

Update : Ich habe die verschiedenen unten aufgeführten Lösungen ausprobiert.

  • Nach gründlichen Tests habe ich festgestellt, dass das Hinzufügen von 0.5 zum Ergebnis von Math.sqrt () nicht notwendig ist, zumindest nicht auf meiner Maschine.
  • Der John-Carmack-Hack war schneller, gab jedoch falsche Ergebnisse ab n = 410881. Wie von BobbyShaftoe , können wir jedoch den Carmack-Hack für n <410881 verwenden.
  • Newtons Methode war ein gutes Stück langsamer als Math.sqrt() . Dies liegt wahrscheinlich daran, dass Math.sqrt() etwas ähnlich der Newton-Methode verwendet, aber in der Hardware implementiert ist, so dass es viel schneller ist als in Java. Auch Newtons Methode erforderte immer noch die Verwendung von Doppelpunkten.
  • Eine modifizierte Newton-Methode, die einige Tricks verwendete, so dass nur Integer-Mathe beteiligt war, erforderte einige Hacks, um einen Überlauf zu vermeiden (ich möchte, dass diese Funktion mit allen positiven 64-Bit-Ganzzahlen mit Math.sqrt() ), und sie war immer noch langsamer als Math.sqrt() .
  • Binary Chop war noch langsamer. Dies ist sinnvoll, da das binäre Chop im Durchschnitt 16 Durchläufe benötigt, um die Quadratwurzel einer 64-Bit-Zahl zu finden.

Der einzige Vorschlag, der Verbesserungen zeigte, wurde von John D. Cook gemacht . Sie können beobachten, dass die letzte hexadezimale Zahl (dh die letzten 4 Bits) eines perfekten Quadrats 0, 1, 4 oder 9 sein muss. Dies bedeutet, dass 75% der Zahlen sofort als mögliche Quadrate eliminiert werden können. Die Implementierung dieser Lösung führte zu einer Reduzierung der Laufzeit um etwa 50%.

Ausgehend von Johns Vorschlag untersuchte ich Eigenschaften der letzten n Bits eines perfekten Quadrats. Durch Analyse der letzten 6 Bits habe ich festgestellt, dass nur 12 von 64 Werten für die letzten 6 Bits möglich sind. Dies bedeutet, dass 81% der Werte eliminiert werden können, ohne irgendeine Mathematik zu verwenden. Durch die Implementierung dieser Lösung wurde die Laufzeit um weitere 8% reduziert (im Vergleich zu meinem ursprünglichen Algorithmus). Das Analysieren von mehr als 6 Bits führt zu einer Liste möglicher Endbits, die zu groß ist, um praktisch zu sein.

Hier ist der Code, den ich verwendet habe, der in 42% der Zeit läuft, die vom ursprünglichen Algorithmus benötigt wird (basierend auf einem Durchlauf über die ersten 100 Millionen ganzen Zahlen). Für Werte von n weniger als 410881 läuft es in nur 29% der Zeit, die vom ursprünglichen Algorithmus benötigt wird.

private final static boolean isPerfectSquare(long n)
{
  if (n < 0)
    return false;

  switch((int)(n & 0x3F))
  {
  case 0x00: case 0x01: case 0x04: case 0x09: case 0x10: case 0x11:
  case 0x19: case 0x21: case 0x24: case 0x29: case 0x31: case 0x39:
    long sqrt;
    if(n < 410881L)
    {
      //John Carmack hack, converted to Java.
      // See: http://www.codemaestro.com/reviews/9
      int i;
      float x2, y;

      x2 = n * 0.5F;
      y  = n;
      i  = Float.floatToRawIntBits(y);
      i  = 0x5f3759df - ( i >> 1 );
      y  = Float.intBitsToFloat(i);
      y  = y * ( 1.5F - ( x2 * y * y ) );

      sqrt = (long)(1.0F/y);
    }
    else
    {
      //Carmack hack gives incorrect answer for n >= 410881.
      sqrt = (long)Math.sqrt(n);
    }
    return sqrt*sqrt == n;

  default:
    return false;
  }
}

Anmerkungen :

  • Laut Johns Tests ist die Verwendung von Anweisungen in C ++ schneller als die Verwendung eines switch , aber in Java und C # scheint es keinen Unterschied zwischen und zu geben.
  • Ich habe auch versucht, eine Nachschlagetabelle zu erstellen (als privates statisches Array mit 64 booleschen Werten). Dann würde ich anstelle von entweder switch oder or statement einfach if(lookup[(int)(n&0x3F)]) { test } else return false; sagen if(lookup[(int)(n&0x3F)]) { test } else return false; . Zu meiner Überraschung war dies (nur geringfügig) langsamer. Ich bin mir nicht sicher warum. Dies liegt daran, dass Array-Grenzen in Java überprüft werden .



If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (eg into categories by largest bit set), then checking the value against an array of perfect squares within that range.




This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.

  • Mod-256 test
  • Inexact mod-3465 test (avoids integer division at the cost of some false positives)
  • Floating-point square root, round and compare with input value

I also experimented with these modifications but they did not help performance:

  • Additional mod-255 test
  • Dividing the input value by powers of 4
  • Fast Inverse Square Root (to work for high values of N it needs 3 iterations, enough to make it slower than the hardware square root function.)

public class SquareTester {

    public static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        } else {
            switch ((byte) n) {
            case -128: case -127: case -124: case -119: case -112:
            case -111: case -103: case  -95: case  -92: case  -87:
            case  -79: case  -71: case  -64: case  -63: case  -60:
            case  -55: case  -47: case  -39: case  -31: case  -28:
            case  -23: case  -15: case   -7: case    0: case    1:
            case    4: case    9: case   16: case   17: case   25:
            case   33: case   36: case   41: case   49: case   57:
            case   64: case   65: case   68: case   73: case   81:
            case   89: case   97: case  100: case  105: case  113:
            case  121:
                long i = (n * INV3465) >>> 52;
                if (! good3465[(int) i]) {
                    return false;
                } else {
                    long r = round(Math.sqrt(n));
                    return r*r == n; 
                }
            default:
                return false;
            }
        }
    }

    private static int round(double x) {
        return (int) Double.doubleToRawLongBits(x + (double) (1L << 52));
    }

    /** 3465<sup>-1</sup> modulo 2<sup>64</sup> */
    private static final long INV3465 = 0x8ffed161732e78b9L;

    private static final boolean[] good3465 =
        new boolean[0x1000];

    static {
        for (int r = 0; r < 3465; ++ r) {
            int i = (int) ((r * r * INV3465) >>> 52);
            good3465[i] = good3465[i+1] = true;
        }
    }

}



You should get rid of the 2-power part of N right from the start.

2nd Edit The magical expression for m below should be

m = N - (N & (N-1));

and not as written

End of 2nd edit

m = N & (N-1); // the lawest bit of N
N /= m;
byte = N & 0x0F;
if ((m % 2) || (byte !=1 && byte !=9))
  return false;

1st Edit:

Minor improvement:

m = N & (N-1); // the lawest bit of N
N /= m;
if ((m % 2) || (N & 0x07 != 1))
  return false;

End of 1st edit

Now continue as usual. This way, by the time you get to the floating point part, you already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of whats left. Ie you run the floating point part on 6% of the numbers.




I want this function to work with all positive 64-bit signed integers

Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53 .




Considering for general bit length (though I have used specific type here), I tried to design simplistic algo as below. Simple and obvious check for 0,1,2 or <0 is required initially. Following is simple in sense that it doesn't try to use any existing maths functions. Most of the operator can be replaced with bit-wise operators. I haven't tested with any bench mark data though. I'm neither expert at maths or computer algorithm design in particular, I would love to see you pointing out problem. I know there is lots of improvement chances there.

int main()
{
    unsigned int c1=0 ,c2 = 0;  
    unsigned int x = 0;  
    unsigned int p = 0;  
    int k1 = 0;  
    scanf("%d",&p);  
    if(p % 2 == 0) {  
        x = p/2; 
    }  
    else {  
        x = (p/2) +1;  
    }  
    while(x) 
    {
        if((x*x) > p) {  
            c1 = x;  
            x = x/2; 
        }else {  
            c2 = x;  
            break;  
        }  
    }  
    if((p%2) != 0)  
        c2++;

    while(c2 < c1) 
    {  
        if((c2 * c2 ) == p) {  
            k1 = 1;  
            break;  
        }  
        c2++; 
    }  
    if(k1)  
        printf("\n Perfect square for %d", c2);  
    else  
        printf("\n Not perfect but nearest to :%d :", c2);  
    return 0;  
}  



I was thinking about the horrible times I've spent in Numerical Analysis course.

And then I remember, there was this function circling around the 'net from the Quake Source code:

float Q_rsqrt( float number )
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;  // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 ); // wtf?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

  #ifndef Q3_VM
  #ifdef __linux__
    assert( !isnan(y) ); // bk010122 - FPE?
  #endif
  #endif
  return y;
}

Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).

It should be usable and might even be faster, it's from one of the phenomenal id software's game!

It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:

I originally found it at: http://www.codemaestro.com/reviews/9

Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

  • the * (long*) &y is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
  • the 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
  • the * (float*) &i converts the value back to floating point.
  • the y = y * ( threehalfs - ( x2 * y * y ) ) line bascially iterates the value over the function again.

The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you... then you could add as much iteration as you need.

This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replace it with a few fixed number of multiplication operations instead.




It ought to be possible to pack the 'cannot be a perfect square if the last X digits are N' much more efficiently than that! I'll use java 32 bit ints, and produce enough data to check the last 16 bits of the number - that's 2048 hexadecimal int values.

...

Ok. Either I have run into some number theory that is a little beyond me, or there is a bug in my code. In any case, here is the code:

public static void main(String[] args) {
    final int BITS = 16;

    BitSet foo = new BitSet();

    for(int i = 0; i< (1<<BITS); i++) {
        int sq = (i*i);
        sq = sq & ((1<<BITS)-1);
        foo.set(sq);
    }

    System.out.println("int[] mayBeASquare = {");

    for(int i = 0; i< 1<<(BITS-5); i++) {
        int kk = 0;
        for(int j = 0; j<32; j++) {
            if(foo.get((i << 5) | j)) {
                kk |= 1<<j;
            }
        }
        System.out.print("0x" + Integer.toHexString(kk) + ", ");
        if(i%8 == 7) System.out.println();
    }
    System.out.println("};");
}

und hier sind die Ergebnisse:

(ed: elided for poor performance in prettify.js; view revision history to see.)




I like the idea to use an almost correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.

Just replace your:

if(n < 410881L){...}

code with this one:

if (n < 11043908100L) {
    //John Carmack hack, converted to Java.
    // See: http://www.codemaestro.com/reviews/9
    int i;
    float x2, y;

    x2 = n * 0.5F;
    y = n;
    i = Float.floatToRawIntBits(y);
    //using the magic number from 
    //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
    //since it more accurate
    i = 0x5f375a86 - (i >> 1);
    y = Float.intBitsToFloat(i);
    y = y * (1.5F - (x2 * y * y));
    y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate

    sqrt = Math.round(1.0F / y);
} else {
    //Carmack hack gives incorrect answer for n >= 11043908100.
    sqrt = (long) Math.sqrt(n);
}



It should be much faster to use http://en.wikipedia.org/wiki/Newton%27s_method to calculate the Integer Square Root , then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

Another optimization that you can try: If the Digital Root of a number doesn't end in 1, 4, 7, or 9 the number is not a perfect square. This can be used as a quick way to eliminate 60% of your inputs before applying the slower square root algorithm.




An integer problem deserves an integer solution. Thus

Do binary search on the (non-negative) integers to find the greatest integer t such that t**2 <= n . Then test whether r**2 = n exactly. This takes time O(log n).

If you don't know how to binary search the positive integers because the set is unbounded, it's easy. You starting by computing your increasing function f (above f(t) = t**2 - n ) on powers of two. When you see it turn positive, you've found an upper bound. Then you can do standard binary search.




The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly language instructions for a sqrt is tiny. Intel has a hardware instruction, which isn't used by Java I believe because it doesn't conform to IEEE.

So why is it slow? Because Java is actually calling a C routine through JNI, and it's actually slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, ie building in floating point library calls if necessary. Oh well.

In C++, I suspect all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find usefull, is a simple hack, an extension of the special case testing suggested by A. Rex. Use a single long value as a bit array, which isn't bounds checked. That way, you have 64 bit boolean lookup.

typedef unsigned long long UVLONG
UVLONG pp1,pp2;

void init2() {
  for (int i = 0; i < 64; i++) {
    for (int j = 0; j < 64; j++)
      if (isPerfectSquare(i * 64 + j)) {
    pp1 |= (1 << j);
    pp2 |= (1 << i);
    break;
      }
   }
   cout << "pp1=" << pp1 << "," << pp2 << "\n";  
}


inline bool isPerfectSquare5(UVLONG x) {
  return pp1 & (1 << (x & 0x3F)) ? isPerfectSquare(x) : false;
}

The routine isPerfectSquare5 runs in about 1/3 the time on my core2 duo machine. I suspect that further tweaks along the same lines could reduce the time further on average, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther on that road.

Certainly, rather than having a separate test for negative, you could check the high 6 bits the same way.

Note that all I'm doing is eliminating possible squares, but when I have a potential case I have to call the original, inlined isPerfectSquare.

The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my implementation in C++, I'm using unsigned long long, so since you're signed, you'd have to use the >>> operator.

There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.




Regarding the Carmac method, it seems like it would be quite easy just to iterate once more, which should double the number of digits of accuracy. It is, after all, an extremely truncated iterative method -- Newton's, with a very good first guess.

Regarding your current best, I see two micro-optimizations:

  • move the check vs. 0 after the check using mod255
  • rearrange the dividing out powers of four to skip all the checks for the usual (75%) case.

Ie:

// Divide out powers of 4 using binary search

if((n & 0x3L) == 0) {
  n >>=2;

  if((n & 0xffffffffL) == 0)
    n >>= 32;
  if((n & 0xffffL) == 0)
      n >>= 16;
  if((n & 0xffL) == 0)
      n >>= 8;
  if((n & 0xfL) == 0)
      n >>= 4;
  if((n & 0x3L) == 0)
      n >>= 2;
}

Even better might be a simple

while ((n & 0x03L) == 0) n >>= 2;

Obviously, it would be interesting to know how many numbers get culled at each checkpoint -- I rather doubt the checks are truly independent, which makes things tricky.




If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

(n+1)^2 = n^2 + 2n + 1
(n-1)^2 = n^2 - 2n + 1

So having calculated n^2 , the options are:

  • n^2 = target : done, return true
  • n^2 + 2n + 1 > target > n^2 : you're close, but it's not perfect: return false
  • n^2 - 2n + 1 < target < n^2 : ditto
  • target < n^2 - 2n + 1 : binary chop on a lower n
  • target > n^2 + 2n + 1 : binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologise for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.

EDIT: The binary chop doesn't have to take in the whole range of integers, either (2^x)^2 = 2^(2x) , so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.






Links