One may be mystified as to why averaging helps so much, but there is a simple reason for the effectiveness of averaging.
Suppose that two classifiers have an error rate of 70%. Then, when they agree they are right. But when they disagree, one of them is often right, so now the average prediction will place much more weight on the correct answer.
The effect will be especially strong whenever the network is confident when it’s right and unconfident when it’s wrong.