For array of integer numbers, Checksum is some derived value calculated using all the members of an array in some whimsical way. Its main property is that if two arrays are equal, their checksums are equal too. (however, if checksums are equal, it does not necessarily mean they are calculated over the same arrays)
The straightforward use of checksums in programming is for verification of large amounts of data. After receiving some file (from email for example, or unpacking some archive) we are usually interested whether it was corrupted or not. Storing and transferring checksums along with the files in such cases are good idea (archivers do this automatically).
For the practical example we have calculations from the task Weighted sum of digits,
where checksum is calculated (regarding
initial numer as an array of digits) as the sum of digits multiplied by their position in the number's decimal
representation. Such checksum will obviously yield the same values for numbers 1
, 10
, 100
etc. Though anyway
this checksum function is far better than simple sum of digits since it covers greater range of values for the
same range of initial numbers.
Of course the mentioned property - the range covered by values of checksum (for any possible sets of initial data not exceeding some determined size) and uniformity of distribution of checksums in this range - these are the most important metrics of whether the method is good or not.
We will often use the other simple algorithm:
seed
- let us use 113
;limit
for checksum - let it be 10000007
;result
value to 0;index
to the start of array;index
to result
;result
by seed
;result
by modulo limit
(i.e. take the remainder of dividing result
by limit
if necessary);index
to point to next element of array;Let us have an example - here follows the the calculation of checksum for array a = [3, 1, 4, 1, 5, 9]
:
result = 0
result = (result + a[0]) * seed = (0 + 3) * 113 = 339
result = (result + a[1]) * seed = (339 + 1) * 113 = 38420
result = (result + a[2]) * seed = (38420 + 4) * 113 = 4341912
result = (result + a[3]) * seed = (4341912 + 1) * 113 = 490636169
result = result % limit = 490636169 % 10000007 = 635826
result = (result + a[4]) * seed = (635826 + 5) * 113 = 71848903
result = result % limit = 71851163 % 10000007 = 1848854
result = (result + a[5]) * seed = (1848854 + 9) * 113 = 208921519
result = result % limit = 209176899 % 10000007 = 8921379
so result is 8921379
. Note that on first few steps we skipped taking the remainder (since while result is less than
limit, its remainder after division is just the same).
Task Array Checksum provides an exercise for exactly this algorithm.