# mathematical algorithm中文

## 簡單的面試問題變得更加困難:給定數字1..100,找到缺少的數字(s) (20)

A very simple way to do it in roughly O(N) time is to remove each element when seen in both lists. This works for unsorted lists too and can be easily further optimized if the lists are both sorted.

import random

K = 2
missingNums = range(0, 101)
incompleteList = range(0, 101)

#Remove K numbers
for i in range(K):
valueToRemove = random.choice(incompleteList)
incompleteList.remove(valueToRemove)

dummyVariable = [missingNums.remove(num) for num in p if num in missingNums]

print missingNums


Q1 ：我們有一個包含數字3 ，...， 100 。 每個數字只出現一次，所以有100個數字。 現在從包裡隨機挑選一個號碼。 找到缺少的號碼。

A1 ：那麼，數字1 + 2 + 3 + … + N的總和是(N+1)(N/2) （參見Wikipedia：算術系列之和 ）。 對於N = 100 ，總和是5050

Q2 ：這是正確的，但是現在如果兩個數字不見了，你會如何做到這一點？

Qk ：如果包裡沒有K個數字，你會如何有效地找到它？

• 你將如何解決Q2
• 你會如何解決Q3
• 你將如何解決QK

### 澄清

• 通常從1 .. NN個數字，而不僅僅是1..100。
• 我沒有尋找明顯的基於集合的解決方案，例如使用一個位集合 ，通過指定位的值對每個數字的存在/不存在進行編碼，因此在額外的空間中使用O(N)位。 我們無法承擔任何與N成比例的額外空間。
• 我也沒有在尋找明顯的排序優先方法。 這個和基於集合的方法在訪談中值得一提（它們很容易實現，取決於N ，可能非常實用）。 我正在尋找聖杯解決方案（可能或不可行，但仍具有所需的漸近特性）。

For Q2 this is a solution that is a bit more inefficient than the others, but still has O(N) runtime and takes O(k) space.

The idea is to run the original algorithm two times. In the first one you get a total number which is missing, which gives you an upper bound of the missing numbers. Let's call this number N . You know that the missing two numbers are going to sum up to N , so the first number can only be in the interval [1, floor((N-1)/2)] while the second is going to be in [floor(N/2)+1,N-1] .

Thus you loop on all numbers once again, discarding all numbers that are not included in the first interval. The ones that are, you keep track of their sum. Finally, you'll know one of the missing two numbers, and by extension the second.

I have a feeling that this method could be generalized and maybe multiple searches run in "parallel" during a single pass over the input, but I haven't yet figured out how.

I don't know whether this is efficient or not but I would like to suggest this solution.

1. Compute xor of the 100 elements
2. Compute xor of the 98 elements (after the 2 elements are removed)
3. Now (result of 1) XOR (result of 2) gives you the xor of the two missing nos i..ea XOR b if a and b are the missing elements
4.Get the sum of the missing Nos with your usual approach of the sum formula diff and lets say the diff is d.

Now run a loop to get the possible pairs (p,q) both of which lies in [1 , 100] and sum to d.

When a pair is obtained check whether (result of 3) XOR p = q and if yes we are done.

Please correct me if I am wrong and also comment on time complexity if this is correct

I have an idea, but this is assuming that the actual size of the array is 100 and the missing numbers are replaced with something else (like -1).

Basically, do a sort that's kind of a modified version of a selection sort that swaps the list in-place. I believe this is O(n) time (correct me if I'm wrong though) because we make use of the fact we already know the numbers that should exist. We swap the value with the "correct" position, until the index we are at has the correct number (or has -1).

After we're done with that, we just loop the list again and the index will basically be the missing numbers

#Set up the input
input = (1..100).to_a.shuffle
input[rand(0..99)] = -1
input[rand(0..99)] = -1

def f(a)
for i in 0..99
while (a[i] != i+1) && a[i] > 0
v1 = a[i]
v2 = a[v1 - 1]
a[i] = v2
a[v1 - 1] = v1
end
end

result = []
a.each_with_index do |value, index|
if value < 0
result.push(index + 1)
end
end

return result
end

#Returns the 2 missing numbers
puts f(input)


I think this can be generalized like this:

Denote S, M as the initial values for the sum of arithmetic series and multiplication.

S = 1 + 2 + 3 + 4 + ... n=(n+1)*n/2
M = 1 * 2 * 3 * 4 * .... * n


I should think about a formula to calculate this, but that is not the point. Anyway, if one number is missing, you already provided the solution. However, if two numbers are missing then, let's denote the new sum and total multiple by S1 and M1, which will be as follows:

S1 = S - (a + b)....................(1)

Where a and b are the missing numbers.

M1 = M - (a * b)....................(2)


Since you know S1, M1, M and S, the above equation is solvable to find a and b, the missing numbers.

Now for the three numbers missing:

S2 = S - ( a + b + c)....................(1)

Where a and b are the missing numbers.

M2 = M - (a * b * c)....................(2)


Now your unknown is 3 while you just have two equations you can solve from.

I'd take a different approach to that question and probe the interviewer for more details about the larger problem he's trying to solve. Depending on the problem and the requirements surrounding it, the obvious set-based solution might be the right thing and the generate-a-list-and-pick-through-it-afterward approach might not.

For example, it might be that the interviewer is going to dispatch n messages and needs to know the k that didn't result in a reply and needs to know it in as little wall clock time as possible after the nk th reply arrives. Let's also say that the message channel's nature is such that even running at full bore, there's enough time to do some processing between messages without having any impact on how long it takes to produce the end result after the last reply arrives. That time can be put to use inserting some identifying facet of each sent message into a set and deleting it as each corresponding reply arrives. Once the last reply has arrived, the only thing to be done is to remove its identifier from the set, which in typical implementations takes O(log k+1) . After that, the set contains the list of k missing elements and there's no additional processing to be done.

This certainly isn't the fastest approach for batch processing pre-generated bags of numbers because the whole thing runs O((log 1 + log 2 + ... + log n) + (log n + log n-1 + ... + log k)) . But it does work for any value of k (even if it's not known ahead of time) and in the example above it was applied in a way that minimizes the most critical interval.

There is a general way to generalize streaming algorithms like this. The idea is to use a bit of randomization to hopefully 'spread' the k elements into independent sub problems, where our original algorithm solves the problem for us. This technique is used in sparse signal reconstruction, among other things.

• Make an array, a , of size u = k^2 .
• Pick any universal hash function , h : {1,...,n} -> {1,...,u} . (Like multiply-shift )
• For each i in 1, ..., n increase a[h(i)] += i
• For each number x in the input stream, decrement a[h(x)] -= x .

If all of the missing numbers have been hashed to different buckets, the non-zero elements of the array will now contain the missing numbers.

The probability that a particular pair is sent to the same bucket, is less than 1/u by definition of a universal hash function. Since there are about k^2/2 pairs, we have that the error probability is at most k^2/2/u=1/2 . That is, we succeed with probability at least 50%, and if we increase u we increase our chances.

Notice that this algorithm takes k^2 logn bits of space (We need logn bits per array bucket.) This matches the space required by @Dimitris Andreou's answer (In particular the space requirement of polynomial factorization, which happens to also be randomized.) This algorithm also has constant time per update, rather than time k in the case of power-sums.

In fact, we can be even more efficient than the power sum method by using the trick described in the comments.

This might sound stupid, but, in the first problem presented to you, you would have to see all the remaining numbers in the bag to actually add them up to find the missing number using that equation.

So, since you get to see all the numbers, just look for the number that's missing. The same goes for when two numbers are missing. Pretty simple I think. No point in using an equation when you get to see the numbers remaining in the bag.

Very nice problem. I'd go for using a set difference for Qk. A lot of programming languages even have support for it, like in Ruby:

missing = (1..100).to_a - bag


It's probably not the most efficient solution but it's one I would use in real life if I was faced with such a task in this case (known boundaries, low boundaries). If the set of number would be very large then I would consider a more efficient algorithm, of course, but until then the simple solution would be enough for me.

We can do the Q1 and Q2 in O(log n) most of the time.

Suppose our memory chip consists of array of n number of test tubes . And a number x in the the test tube is represented by x milliliter of chemical-liquid.

Suppose our processor is a laser light . When we light up the laser it traverses all the tubes perpendicularly to it's length. Every-time it passes through the chemical liquid, the luminosity is reduced by 1 . And passing the light at certain milliliter mark is an operation of O(1) .

Now if we light our laser at the middle of the test-tube and get the output of luminosity

• equals to a pre-calculated value(calculated when no numbers were missing), then the missing numbers are greater than n/2 .
• If our output is smaller, then there is at least one missing number that is smaller than n/2 . We can also check if the luminosity is reduced by 1 or 2 . if it is reduced by 1 then one missing number is smaller than n/2 and other is bigger than n/2 . If it is reduced by 2 then both numbers are smaller than n/2 .

We can repeat the above process again and again narrowing down our problem domain. In each step, we make the domain smaller by half. And finally we can get to our result.

Parallel algorithms that are worth mentioning(because they are interesting),

• sorting by some parallel algorithm, for example, parallel merge can be done in O(log^3 n) time. And then the missing number can be found by binary search in O(log n) time.
• Theoretically, if we have n processors then each process can check one of the inputs and set some flag that identifies the number(conveniently in an array). And in the next step each process can check each flag and finally output the number that is not flagged. The whole process will take O(1) time. It has additional O(n) space/memory requirement.

Note, that the two parallel algorithms provided above may need additional space as mentioned in the comment .

You could try using a Bloom Filter . Insert each number in the bag into the bloom, then iterate over the complete 1-k set until reporting each one not found. This may not find the answer in all scenarios, but might be a good enough solution.

You'd probably need clarification on what O(k) means.

Here's a trivial solution for arbitrary k: for each v in your set of numbers, accumulate the sum of 2^v. At the end, loop i from 1 to N. If sum modulo 2^i is zero, then i is missing.

Easy, right? O(N) time, O(1) storage, and it supports arbitrary k.

Except that you're computing enormous numbers that on a real computer would each require O(N) space. In fact, this solution is identical to a bit vector.

So you could be clever and compute the sum and the sum of squares and the sum of cubes... up to the sum of v^k, and do the fancy math to extract the result. But those are big numbers too, which begs the question: what abstract model of operation are we talking about? How much fits in O(1) space, and how long does it take to sum up numbers of whatever size you need?

a 1 + a 2 + ... + a k = b 1

a 1 2 + a 2 2 + ... + a k 2 = b 2

...

a 1 k + a 2 k + ... + a k k = b k

c 1 = a 1 + a 2 + ... a k

c 2 = a 1 a 2 + a 1 a 3 + ... + a k-1 a k

...

c k = a 1 a 2 ... a k

• 計算給定數字的第i個冪
• 減去獲得未知數的第i次冪。 給打錢。
• 使用牛頓的身份來計算b i的係數; 稱他們為 。 基本上，c 1 = b 1 ; c 2 =（c 1 b 1 - b 2 ）/ 2; 請參閱Wikipedia以了解確切的公式
• 分解多項式x k -c 1 x k-1 + ... + c k
• 多項式的根是所需的數字a 1 ，...，a k

l_j = ceil (log_2 (sum_{i=1}^n i^j))
l_j > log_2 n^j  (assuming n >= 0, k >= 0)
l_j > j log_2 n \in \Omega(j log n)

l_j < log_2 ((sum_{i=1}^n i)^j) + 1
l_j < j log_2 (n) + j log_2 (n + 1) - j log_2 (2) + 1
l_j < j log_2 n + j + c \in O(j log n)


t = k ceil(\sum_i=1^n log_2 (i)) = k ceil(log_2 (\prod_i=1^n (i)))
t > k log_2 (n^n + O(n^(n-1)))
t > k log_2 (n^n) = kn log_2 (n)  \in \Omega(kn log n)
t < k log_2 (\prod_i=1^n i^i) + 1
t < kn log_2 (n) + 1 \in O(kn log n)


let
-- O(1)
isInRange low high v = (v >= low) && (v <= high)
-- O(n - k)
countInRange low high = sum $map (fromEnum . isInRange low high . (!)b) [1..(n-k)] findMissing l low high krange -- O(1) if there is nothing to find. | krange=0 = l -- O(1) if there is only one possibility. | low=high = low:l -- Otherwise total of O(knlog(n)) time | otherwise = let mid = (low + high) div 2 klow = countInRange low mid khigh = krange - klow in findMissing (findMissing low mid klow) (mid + 1) high khigh in findMising 1 (n - k) k  使用的存儲：列表O(k) ，堆棧O(log(n))O(k + log(n))該算法更直觀，具有相同的時間複雜度，並且使用更少的空間。 如果您有兩個清單和兩個清單的產品的總和，則可以解決Q2。 （l1是原始的，l2是修改後的列表） d = sum(l1) - sum(l2) m = mul(l1) / mul(l2)  We can optimise this since the sum of an arithmetic series is n times the average of the first and last terms: n = len(l1) d = (n/2)*(n+1) - sum(l2)  Now we know that (if a and b are the removed numbers): a + b = d a * b = m  So we can rearrange to: a = s - b b * (s - b) = m  And multiply out: -b^2 + s*b = m  And rearrange so the right side is zero: -b^2 + s*b - m = 0  Then we can solve with the quadratic formula: b = (-s + sqrt(s^2 - (4*-1*-m)))/-2 a = s - b  Sample Python 3 code: from functools import reduce import operator import math x = list(range(1,21)) sx = (len(x)/2)*(len(x)+1) x.remove(15) x.remove(5) mul = lambda l: reduce(operator.mul,l) s = sx - sum(x) m = mul(range(1,21)) / mul(x) b = (-s + math.sqrt(s**2 - (-4*(-m))))/-2 a = s - b print(a,b) #15,5  I do not know the complexity of the sqrt, reduce and sum functions so I cannot work out the complexity of this solution (if anyone does know please comment below.) 我問了一個4歲的孩子來解決這個問題。 他整理了數字，然後一起計算。 這有一個O（廚房地板）的空間要求，它的工作原理就像很多球丟失一樣。 我沒有檢查數學，但是我懷疑計算Σ(n^2)與我們計算Σ(n)在同一遍中會提供足夠的信息來獲得兩個缺失的數字Σ(n^3)以及if有三個，依此類推。 等一下。 正如問題所述，包裡有100個數字。 不管k有多大，這個問題都可以在恆定的時間內解決，因為你可以使用一個集合，並且在一個循環中最多100-k次迭代中從集合中刪除數字。 100是恆定的。 剩下的一組數字就是你的答案。 如果我們將解推廣到從1到N的數字，除了N不是一個常數，沒有什麼變化，所以我們處於O（N-k）= O（N）時間。 例如，如果我們使用一個位集，我們在O（N）時間中將位設置為1，遍歷數字，隨著我們去（O（Nk）= O（N））將位設置為0，然後我們有答案。 在我看來，面試官問你如何在O（k）時間而不是O（N）時間打印出最終集合的內容。 顯然，只需稍微設置一下，您就必須遍歷所有N位，以確定您是否應打印該數字。 但是，如果您改變設置的實施方式，則可以以k次迭代打印出數字。 這是通過將數字放入要存儲在哈希集和雙向鍊錶中的對象來完成的。 從哈希集中刪除對象時，也可以從列表中刪除它。 答案將留在現在長度為k的列表中。 要解決2（和3）缺失的數字問題，您可以修改quickselect ，它平均在O(n)運行，並且如果在原地進行分區，則使用常量內存。 1. 將該組相對於隨機樞軸p分區到包含小於樞軸的數字的分區l和包含大於樞軸的數字的r 2. 通過將樞軸值與每個分區的大小進行比較來確定2個缺失數字所在的分區（ p - 1 - count(l) = count of missing numbers in ln - count(r) - p = count of missing numbers in r p - 1 - count(l) = count of missing numbers in l n - count(r) - p = count of missing numbers in r 3. a）如果每個分區缺少一個數字，則使用求和方法的差異來查找每個缺失的數字。 (1 + 2 + ... + (p-1)) - sum(l) = missing #1((p+1) + (p+2) ... + n) - sum(r) = missing #2 b）如果一個分區缺少兩個數字並且分區為空，那麼缺失的數字可以是(p-1,p-2)(p+1,p+2)具體取決於哪個分區缺少數字。 如果一個分區缺少2個數字，但不是空的，則將該分區遞歸到該分區。 只有2個缺失數字，該算法總是捨棄至少一個分區，因此它保留了O(n)快速選擇的平均時間複雜度。 同樣，如果有3個缺失的數字，這個算法每次都會丟棄至少一個分區（因為缺少2個數字，最多只有1個分區會包含多個缺失的數字）。 但是，我不確定在添加更多缺失數字時性能會下降多少。 這是一個使用就地分區的實現，所以這個例子不符合空間要求，但它確實說明了算法的步驟： <?php$list = range(1,100);
unset($list); unset($list);

findMissing($list,1,100); function findMissing($list, $min,$max) {
if(empty($list)) { print_r(range($min, $max)); return; }$l = $r = [];$pivot = array_pop($list); foreach($list as $number) { if($number < $pivot) {$l[] = $number; } else {$r[] = $number; } } if(count($l) == $pivot -$min - 1) {
// only 1 missing number use difference of sums
print array_sum(range($min,$pivot-1)) - array_sum($l) . "\n"; } else if(count($l) < $pivot -$min) {
// more than 1 missing number, recurse
findMissing($l,$min, $pivot-1); } if(count($r) == $max -$pivot - 1) {
// only 1 missing number use difference of sums
print array_sum(range($pivot + 1,$max)) - array_sum($r) . "\n"; } else if(count($r) < $max -$pivot) {
// mroe than 1 missing number recurse
findMissing($r,$pivot+1, \$max);
}
}
` 