Earlier I outlined an account of knowledge (here and here) in which I argued that knowledge is best defined as statements justified correctly, and confirmed to a greater degree than competing hypotheses. Moreover, I argued that it is essential to being justified in the right way that the hypothesis be probable given the evidence, which implies that Bayes’ theorem is part of the justification process. There are two classic problems, however, facing a definition of knowledge that would rely on Bayes’ theorem. One is that Bayes’ theorem requires “priors” in order to proceed. Priors are estimates of the probability of a hypothesis in the absence of evidence. Often priors are taken to represent a guess as to how likely the hypothesis is, or our degree of belief in it. Both of these interpretations, however, are extremely subjective, and might seem to discredit Bayes’ theorem as an objective measure of knowledge. A second problem is that Bayes’ theorem confirms to an equal degree hypotheses such as “all emeralds are green” and “all emeralds are grue”, where “grue” means green before the year 3000 and blue afterwards. Although these problems may seem unrelated, there is a single solution that resolves both of them; moreover, it also solves Wittgenstein’s problem of “following a rule” and (partly) solves Hume’s problem of induction.

The solution is that priors shouldn’t be taken as representing our degree of belief, but how similar a hypothesis is to other hypotheses. [1] Thus, if the hypothesis in question is very similar to other well-confirmed hypotheses, its prior probability will be high, and if it is similar to hypotheses that are unlikely, given the evidence we have, its prior probability will be low. This means that the hypothesis “all emeralds are grue” will have a much lower prior probability than “all emeralds are green”, because there are many hypotheses like “all emeralds are grue” that have an extremely low probability (for example, “all emeralds are green before 2000 and blue afterwards”). This, however, does not mean that unusual (but true) hypotheses will never be deemed knowledge. Given enough evidence, a hypothesis with a low prior probability will still be judged more likely than one with a higher prior probability under Bayes’ theorem, provided it fits the evidence better. And this is just what we should expect: hypotheses such as “aliens built the pyramids” need more evidence to support them than hypotheses such as “slaves built the pyramids”.
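The point about low-prior hypotheses eventually winning can be sketched numerically. All the numbers below are illustrative assumptions, not measurements: the first hypothesis starts with a much higher prior, the second fits each observation better, and repeated Bayesian updating eventually reverses the ranking.

```python
# A minimal sketch of Bayes' theorem with unequal priors. The priors and
# per-observation likelihoods are hypothetical numbers chosen for illustration.

def posterior(priors, likelihoods, n_observations):
    """Update each hypothesis: prior * likelihood^n, then normalize."""
    weights = [p * (l ** n_observations) for p, l in zip(priors, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]

priors = [0.99, 0.01]      # the familiar hypothesis vs the unusual one
likelihoods = [0.5, 0.9]   # but the unusual one fits each observation better

print(posterior(priors, likelihoods, 1))   # the high prior still dominates
print(posterior(priors, likelihoods, 20))  # enough evidence flips the verdict
```

After one observation the familiar hypothesis still leads, but after twenty the better-fitting one has overtaken it, which is just the “more evidence needed for unusual hypotheses” behavior described above.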

Wittgenstein’s problem about following a rule can be resolved in a similar manner. The problem itself is as follows: given two ways of following the rule “add 1 to the previous number”, namely “add 1 to the previous number for all numbers” and “add 1 to the previous number up until 5000, add 2 afterwards”, which one should we adopt? As Wittgenstein pointed out, we can only show someone else a finite number of examples of the rule, so how can we claim they are wrong if they pick the second version (assuming it doesn’t conflict with any example given)? If we reason as we did above, the answer should be clear: switching behavior at some number is unlikely to be correct, since many similar interpretations, which involve switching at some lower number, have been ruled out by the examples provided, and thus we should follow the rule “add 1 to the previous number for all numbers”, as it is more likely. [2]
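The elimination argument above can be made concrete with a toy enumeration. The specifics are assumptions for illustration: the teacher demonstrates the sequence 1 through 100, and we check every “switch to adding 2 at N” interpretation against those examples.

```python
# A sketch (toy numbers) of the elimination argument: every "switch at N"
# rule with N below the largest demonstrated number is refuted by the
# examples, leaving "always add 1" far ahead of its untested rivals.

def follows_rule(sequence, switch_at=None):
    """True if each step adds 1, or adds 2 once prev reaches switch_at."""
    for prev, cur in zip(sequence, sequence[1:]):
        step = 2 if (switch_at is not None and prev >= switch_at) else 1
        if cur != prev + step:
            return False
    return True

examples = list(range(1, 101))  # the teacher demonstrates 1..100
refuted = [n for n in range(1, 100) if not follows_rule(examples, switch_at=n)]
print(len(refuted))  # -> 99: every switch point below 100 is ruled out
```

Since all 99 lower switch points are refuted while “always add 1” survives, the similarity-based prior assigns any remaining switch-at-5000 interpretation a very low probability.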

We can address Hume’s problem of induction in a similar way. Hume says that we can’t conclude from a large number of cases, say 500, that the 501st case will turn out the same. Again we have two competing hypotheses: that it always works the same way, and that it works the same way only for the first 500 cases. But, just as in the case of “following the rule”, we should judge the second hypothesis as less likely because the similar hypotheses, i.e. that it works the same way only for the first 499 cases, only for the first 498, etc., are false. A follower of Hume could, however, argue that I have shown that a hypothesis should be judged by its similarity to existing hypotheses only for past cases. How do we know that our rule will hold for future cases? Obviously we could apply the rule to itself, which would validate it, but this would be circular reasoning. We might simply accept the circularity, but since it seems somewhat distasteful to me, I call this only a partial solution.

Finally, I would like to address an unrelated concern that a theory of knowledge such as this one might raise. The concern is that if knowledge is to be defined as statements justified in such a rigorous and formal way, then how can we say that we have knowledge in our everyday lives? Certainly we don’t apply Bayes’ theorem to know that it is raining outside when we look out the window and see raindrops. My response is that a belief is knowledge if the formal process would label it as knowledge given the evidence, and the actual process responsible for generating the belief is usually in agreement with the formal process. In essence this means that there are two definitions of knowledge: one in the formal, rigid, and as-close-to-the-truth-as-possible sense, and one in the reliable sense (the kind usually investigated by naturalized epistemology). This shouldn’t be a shocker, since I already introduced the idea here.

Notes:

1: Mathematically calculating the degree of similarity is slightly complicated, and I don’t have a complete formula for it yet. The outline, however, is as follows: each other hypothesis has a degree of similarity to the hypothesis being tested, and the sum of all these degrees of similarity should be 1. The prior probability for the hypothesis should then be calculated as the sum, over the other hypotheses, of each one’s probability times its degree of similarity. The real problem is calculating the degrees of similarity themselves, but in any case the exact method is not necessary for the argument here.
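The outline in this note can be sketched directly. The similarity weights and probabilities below are hypothetical placeholders (the note itself says no complete formula exists yet); the sketch only shows the weighted-sum structure: the prior of a hypothesis is the similarity-weighted average of the other hypotheses’ probabilities, with the weights summing to 1.

```python
# A sketch of the similarity-weighted prior described in note 1.
# All weights and probabilities are hypothetical illustrations.

def similarity_prior(others):
    """others: list of (probability, similarity) pairs; similarities must sum to 1."""
    assert abs(sum(s for _, s in others) - 1.0) < 1e-9
    return sum(p * s for p, s in others)

# "All emeralds are green" is most similar to well-confirmed uniform-color
# hypotheses, so its prior comes out high...
green_prior = similarity_prior([(0.9, 0.8), (0.7, 0.2)])
# ...while "grue" is most similar to the many refuted switch-at-a-date
# hypotheses, so its prior comes out low.
grue_prior = similarity_prior([(0.05, 0.9), (0.9, 0.1)])
print(green_prior, grue_prior)
```

Since the weights form a convex combination, the prior always lands between the smallest and largest probability among the neighboring hypotheses, which matches the intuition in the post.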

2: We might call this process second order theorizing, theorizing about theories themselves.

One other way to modify Bayes’ theorem might be to have a preference for hypotheses that take less information (in the computer science sense) to encode. Always being green requires fewer bits to express than being green and then changing color later. Similarly, the switching rule in Wittgenstein’s problem takes more bits to express than “always add 1”. This basically works out to a modified Ockham’s razor.

Comment by Carl — September 1, 2006 @ 1:30 am

You mentioned that a long time ago too, but it does suffer from the problem of being hard to formalize.

Comment by Peter — September 1, 2006 @ 1:39 am

No more so than the rest of the crazy CS math that I don’t understand. ;-D You should specify that the decompression function for the theory should be as small as possible, though. So, if someone says, “Well, in my encoding scheme, grue is encoded as 1, but green is encoded as 101010101010101111110,” then you can just ask to see their decoder, and you’ll find that, sure enough, it’s much bigger than it needs to be, especially compared to a decoder that requires the year 3000 to be written out in the number instead of assuming grue as a special case that deserves a smaller way of writing it.

Comment by Carl — September 1, 2006 @ 2:00 am