c++ does - Bit counting in a contiguous memory chunk



pointers use (8)

I was asked in an interview the following question.

int countSetBits(void *ptr, int start, int end); 

Synopsis: Assume that ptr points to a big chunk of memory. Viewing this memory as contiguous sequence of bits, start and end are bit positions. Assume start and end have proper values and ptr is pointing to an initialized chunck of memory.

Question: Write a C code to count number of bits set from start to end [inclusive] and return the count.

Just to make it more clear

 ptr---->+-------------------------------+
         | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
         +-------------------------------+
         | 8 | 9 |                   |15 |
         +-------------------------------+
         |                               |
         +-------------------------------+
              ...
              ...
         +-------------------------------+
         |               | S |           |
         +-------------------------------+
              ...
              ...
         +-------------------------------+
         |    | E |                      |
         +-------------------------------+
              ...
              ...

My solution:

int countSetBits(void *ptr, int start, int end )
{
    int count = 0, idx; 

    char *ch; 

    for (idx = start; idx <= end; idx++) 
    {     ch = ptr + (idx/8); 

          if((128 >> (idx%8)) & (*ch)) 
          {
                   count++; 
          }
    }

    return count; 
}

I gave a very lengthy and somewhat inefficient code during the interview. I worked on it later and came up with above solution.

I am very sure SO community can provide more elegant solution. I am just curious to see their response.

PS: Above code is not compiled. It is more like a pseudo code and may contain errors.


Answers

Disclaimer: No attempt to compile the following code has been made.

/*
 * Table counting the number of set bits in a byte.
 * The byte is the index to the table.
 */
uint8_t  table[256] = {...};

/***************************************************************************
 *
 * countBits - count the number of set bits in a range
 *
 * The most significant bit in the byte is considered to be bit 0.
 *
 * RETURNS: 0 on success, -1 on failure
 */
int countBits (
    uint8_t *  buffer,
    int        startBit,  /* starting bit */
    int        endBit,    /* End-bit (inlcusive) */
    unsigned * pTotal     /* Output: number of consecutively set bits */
    ) {
    int      numBits;     /* number of bits left to check */
    int      mask;        /* mask to apply to byte from <buffer> */
    int      bits;        /* # of bits to end of byte */
    unsigned count = 0;   /* total number of bits set */
    uint8_t  value;       /* value read from the buffer */

    /* Return -1 if parameters fail sanity check (skipped) */

    numBits   = (endBit - startBit) + 1;

    index  = startBit >> 3;
    bits   = 8 - (startBit & 7);
    mask   = (1 << bits) - 1;

    value = buffer[index] & mask;  /* mask-out any bits preceding <startBit> */
    numBits -= bits;

    while (numBits > 0) {          /* Note: if <startBit> and <endBit> are in */
        count += table[value];     /* same byte, this loop gets skipped. */
        index++;
        value = buffer[index];
        numBits -= 8;
    }

    if (numBits < 0) {             /* mask-out any bits following <endBit> */
        bits   = 8 - (endBit & 7);
        mask   = 0xff << bits;
        value &= mask;
    }

    count += table[value];

    *pTotal = count;
    return 0;
}

Edit: Function header updated.


There are many ways to solve the problem. This is a good post that compares the performance of the most common options.


The most quick and efficient way to my opinion is to use a table of 256 entries, where every element represents number of bits in the index. Index is a next byte from the memory location.

something like this:

int bit_table[256] = {0, 1, 1, 2, 1, ...};
char* p = ptr + start;
int count = 0;
for (p; p != ptr + end; p++)
    count += bit_table[*(unsigned char*)p];

Depending on the industry you applied in, look-up tables might not be an acceptable means of optimization while platform / compiler specific optimizations are. Knowing that most compilers and CPU instruction sets have a pop count instruction, I'd go for this. It's a simplicity vs. performance trade-off though because right now I'm still iterating over a list of chars.

Also note that contrary to most answers I assume start and end are byte-offsets because it's not specified in the question that they're not and it's the default in most cases.

int countSetBits(void *ptr, int start, int end )
{
    assert(start < end);

    unsigned char *s = ((unsigned char*)ptr + start);
    unsigned char *e = ((unsigned char*)ptr + end);

    int r = 0;

    while(s != e)
    {
        // __builtin_clz is not defined for 0 input.
        if(*s) r += 32 - __builtin_clz(*s);
        s++;
    }

    return r;
}

You might find this page interesting, it contains several alternative solutions for your problem.


The version of @dimitri is likely the fastest. But it is difficult to build the table of bit counts for all 128 8-bit chars in an interview. You can get a very fast version with a table for 16 hex numbers 0x0, 0x1, ..., 0xF, that you can build easily:

int countBits(void *ptr, int start, int end) {
    // start, end are byte indexes
    int hexCounts[16] =   {0, 1, 1, 2,   1, 2, 2, 3,
                           1, 2, 3, 3,   2, 3, 3, 4}; 
    unsigned char * pstart = (unsigned char *) ptr + start;
    unsigned char * pend = (unsigned char *) ptr + end;
    int count = 0;
    for (unsigned char * p = pstart; p <= pend; ++p) {
        unsigned char b = *p;
        count += hexCounts[b & 0x0F] + hexCounts[(b >> 4) & 0x0F];
    }
    return count;
}

EDIT: If start and end are bit indexes then the bits in the first and last bytes would be counted first before the above function is called:

int countBits2(void *ptr, int start, int end) {
    // start, end are bit indexes
    if (start > end) return 0;
    int count = 0;
    unsigned char* pstart = (unsigned char *) ptr + start/8; // first byte
    unsigned char* pend = (unsigned char *) ptr + end/8;     // last byte
    int istart = start % 8;                                  // index in first byte
    int iend = end % 8;                                      // index in last byte 
    unsigned char b = *pstart;                               // byte
    if (pstart == pend) {                                    // count in 1 byte only
        b = b << istart;
        for (int i = istart; i <= iend; ++i) {               // between istart, iend
            if (b & 0x80) ++count; 
            b = b << 1;
        }
    }
    else {                                                   // count in 2 bytes
        for (int i = istart; i < 8; ++i) {                   // from istart to 7
            if (b & 1) ++count; 
            b = b >> 1;
        }
        b = *pend;
        for (int i = 0; i <= iend; ++i) {                    // from 0 to iend
            if (b & 0x80) ++count; 
            b = b << 1;
        }
    }
    return count + countBits(ptr, start/8 + 1, end/8 - 1);
}

Boundary conditions, they get no respect...

Everyone here seems to be concentrating on the lookup table to count the bits. And that's OK, but I think that even more important when answering an interview question is to make sure you handle the boundary conditions.

The look up table is just an optimization. It's much more important to get the answer right than to get it fast. If this were my interview, going straight for the lookup table without even mentioning that there are some tricky details about handling the first few and last few bits that aren't on full-byte boundaries would be worse than coming up with a solution that counted each bit ploddingly, but got the boundary conditions right.

So I think Bhaskar's solution in his question is probably superior to the most of the answers mentioned here - it seems to handle the boundary conditions.

Here's a solution that uses a lookup table and tries to still handle the boundaries (it's only lightly tested, so I won't claim that it's 100% correct). It's also uglier than I'd like, but it's late:

typedef unsigned char uint8_t;

static
size_t bits_in_byte( uint8_t val)
{
    static int const half_byte[] = { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };

    int result1 = half_byte[val & 0x0f];
    int result2 = half_byte[(val >> 4) & 0x0f];

    return result1 + result2;
}


int countSetBits( void* ptr, int start, int end) 
{
    uint8_t*    first;
    uint8_t*    last;
    int         bits_first;
    int         bits_last;
    uint8_t     mask_first;
    uint8_t     mask_last;

    size_t count = 0;

    // get bits from the first byte
    first = ((uint8_t*) ptr) + (start / 8);
    bits_first = 8 - start % 8;
    mask_first = (1 << bits_first) - 1;
    mask_first = mask_first << (8 - bits_first);


    // get bits from last byte
    last = ((uint8_t*) ptr) + (end / 8);
    bits_last = 1 + (end % 8);
    mask_last = (1 << bits_last) - 1;

    if (first == last) {
        // we only have a range of bits in  the first byte
        count = bits_in_byte( (*first) & mask_first & mask_last);        
    }
    else {
        // handle the bits from the first and last bytes specially
        count += bits_in_byte((*first) & mask_first);
        count += bits_in_byte((*last) & mask_last);

        // now we've collected the odds and ends from the start and end of the bit range
        // handle the full bytes in the interior of the range

        for (first = first+1; first != last; ++first) {
            count += bits_in_byte(*first);
        }
    }

    return count;
}

Note that a detail that would have to be worked out as part of the interview is whether the bits within a byte are indexed starting at the least-significant-bit (lsb) or most-significant-bit (msb). In other words, if the start index were specified as 0, would a byte with the value 0x01 or a byte with the value 0x80 have the bit set in that index? Sort of like deciding whether the indexes consider the bit order within a byte as big-endian or little-endian.

There's no 'right' answer for this - the interviewer would have to specify what the behavior should be. I'll also note that my example solution handles this in the opposite way to the OP's example code (I was going by how I interpreted the diagram, with the indexes reading as 'bit numbers' as well). The OPs' solution considers the bit order as big-endian, my function treats them as little-endian. So even though both handle partial bytes at the star & end of the range, they'll give different answers. Which is the right answer depends on what the actual spec for the problem is.


Every 1-bit in the multiplier is used to copy one of the bits into its correct position:

  • 1 is already in the correct position, so multiply by 0x0000000000000001.
  • 2 must be shifted 7 bit positions to the left, so we multiply by 0x0000000000000080 (bit 7 is set).
  • 3 must be shifted 14 bit positions to the left, so we multiply by 0x0000000000000400 (bit 14 is set).
  • and so on until
  • 8 must be shifted 49 bit positions to the left, so we multiply by 0x0002000000000000 (bit 49 is set).

The multiplier is the sum of the multipliers for the individual bits.

This only works because the bits to be collected are not too close together, so that the multiplication of bits which do not belong together in our scheme either fall beyond the 64 bit or in the lower don't-care part.

Note that the other bits in the original number must be 0. This can be achieved by masking them with an AND operation.





c++ c algorithm bit-manipulation