For obvious reasons I am concerned with the method of philosophy: I want to arrive at some acceptable method that we can recognize as well-suited to producing philosophical theories. The reason this matters, and the reason it isn’t a trivial quest, is the “acceptable” part. Of course everyone producing philosophical theories is using *some* method, since even constructing a theory by randomly rearranging letters until they make words is a method, just not a good one. Given that, there are two reasons this seems pressing to me. One is that there doesn’t seem to be any consensus about what the method of philosophy is, which is rather absurd. It means, effectively, that two philosophers could disagree philosophically because of differences in their methods and have no way to resolve their disagreement, even in principle. Accepting this situation is tantamount to accepting an anything-goes attitude towards philosophy, because any imaginable theory can be produced by some method. So if we simply accept differences in method without criticism we are essentially accepting every theory without criticism. The other reason is that many of the commonly used methods, such as conceptual analysis, seem obviously absurd. They are absurd in the sense that there is a disconnect between the method and the kinds of philosophical theories we would like to arrive at, given that conceptual analysis can only legitimately produce claims about our concepts and never about anything with an objective existence. And so repairs seem to be in order.

In many ways, however, it is easier to investigate method in general rather than philosophical method specifically, and this has the added benefit of resulting in claims that may be useful in other disciplines where methods play a critical role. To really analyze methods we need to formalize them to some extent, so that we can look at the specifics of how a method works rather than speaking in vague generalities. Let me begin, then, by dividing any method into three distinct parts: the input, the state, and the results. The input represents new pieces of information that the investigator, the person using the method, receives. In the physical sciences, for example, the input represents the individual observations made. The results are simply the current theory endorsed by the method. The state represents information used in the method but not reflected in the theories produced. The state might, for example, contain a record of past theories, past observations, and so on, all of which have an effect on what theory the method endorses in response to some new information. Obviously this state seems like an artificial construct, and in many ways it is, because it is simply one way to formalize methods in general: we could consider each method as resulting in a theory given all the information received to date, or we could consider it as resulting in a theory given new information plus some record of what we have already done. If this way of working with methods in the abstract doesn’t pan out, and the problems seem to stem from this choice, we can always try the other way.

Of course, to get beyond this vague outline and formalize methods that work on actual theories would require some way to formally represent the content of a theory, the information we receive as input, and operations on theories, and some way to determine how similar two theories are (for reasons that will become apparent momentarily). Naturally I don’t have any of that apparatus at my fingertips. However, we can consider methods that operate purely on numbers and make some headway in analyzing methods in general by working with these more limited examples. Every method can then be defined by three functions. The first is the input function i(k), which yields the k-th input. Next we have the state function, f_{s}, defined as f_{s}(k) = g(f_{s}(k-1), i(k)), f_{s}(0) = s_{0}, where the function g and the value s_{0} will obviously be method dependent. And finally we have the results function, f_{y}, defined as f_{y}(k) = h(f_{s}(k-1), i(k)). So by defining the functions i, g, and h and the value s_{0} we define our method and can examine how it behaves as more and more information is received (as k increases).
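The three-function definition can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the name `run_method` is made up for the sketch, and the input function i is represented as an ordinary sequence of values rather than as a function of k.

```python
def run_method(inputs, g, h, s0):
    """Return the results f_y(1), f_y(2), ... for the inputs i(1), i(2), ...

    g updates the state, h produces the result, and s0 is the initial state.
    """
    state = s0                        # f_s(0) = s_0
    results = []
    for x in inputs:                  # x plays the role of i(k)
        results.append(h(state, x))   # f_y(k) = h(f_s(k-1), i(k))
        state = g(state, x)           # f_s(k) = g(f_s(k-1), i(k))
    return results


# A method whose g and h are both max, starting from s_0 = 0.
print(run_method([3, 1, 4, 1, 5], max, max, 0))  # [3, 3, 4, 4, 5]
```

Note that the result for step k is computed from the state before that step, exactly as in the definition of f_{y}, and only then is the state advanced.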

As discussed previously, one standard we would like our methods to meet is universality: they must yield the same results for everyone. Obviously this depends in part on the input function; even if we don’t receive the same input in the same order, we must all be able to have the same information in the ideally long run. But that is an epistemological issue that can’t be settled by examining the method by itself. However, we can determine whether the method necessarily converges to a single value in the long run (as k → ∞) for certain classes of input functions (for example, the same input function but with the values rearranged). If it doesn’t, then it is a flawed method, because it could lead different people using that method to different conclusions even if they had essentially the same source of information, and that violates the assumption of universality.

Let’s consider some examples. We will consider two methods, both of which have g(x,y) = max(x,y), h(x,y) = max(x,y), and s_{0} = 0. However, for our first method we will let the input function i(x) be defined by selecting a value randomly for each x value, with probability .5 of value 1, .25 of value 2, .125 of value 3, and so on. The class of input functions we are considering then is naturally every possible function generated by that method. And for our second method we will let the input function i(x) be defined by again selecting a value randomly for each x value, but this time we pick a value from the reals in the range (0,1), with each real having an equal probability of being picked (for clarification, the values 0 and 1 themselves are not in this range).
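To get a feel for these two methods, here is a rough simulation. The function names, the fixed seed, and the number of inputs are all arbitrary choices of mine, and floating-point numbers stand in for the reals, so this is a sketch of the two input functions rather than an exact model of them.

```python
import random


def geometric_input(rng):
    """Value 1 with probability .5, 2 with .25, 3 with .125, and so on."""
    value = 1
    while rng.random() >= 0.5:   # continue with probability .5 each time
        value += 1
    return value


def open_unit_input(rng):
    """A float drawn uniformly from the open interval (0, 1)."""
    x = rng.random()             # random() returns a float in [0.0, 1.0)
    while x == 0.0:              # reject the excluded endpoint 0
        x = rng.random()
    return x


def running_max(inputs, s0=0):
    """Results f_y(1), f_y(2), ... for a method with g = h = max."""
    state, results = s0, []
    for x in inputs:
        state = max(state, x)    # g and h coincide for this method
        results.append(state)
    return results


rng = random.Random(0)           # fixed seed, chosen only for repeatability
first = running_max(geometric_input(rng) for _ in range(10_000))
second = running_max(open_unit_input(rng) for _ in range(10_000))

# Results after 10, 100, 1000, and 10000 inputs for each method.
print([first[n - 1] for n in (10, 100, 1000, 10_000)])
print([second[n - 1] for n in (10, 100, 1000, 10_000)])
```

A single run proves nothing, of course, but it shows the behavior the two methods are meant to illustrate: the first keeps climbing without a ceiling, while the second creeps up towards 1.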

It should be intuitively obvious that the first method does not converge to a single value, because as we consider more and more inputs we will always find larger and larger values, although they will, on average, be spaced farther and farther apart. On the other hand, our second method does converge towards 1; it too constantly yields larger and larger values, but they never exceed 1. But how can we *prove* this? Clearly in interesting cases it may not be obvious whether our method converges. We can steal a relatively standard formula here and assert that the method converges to some value k (here k names the limit, not an input index) if and only if:

∀ε>0∀γ<1∃x∀y>x P(k-ε < f_{y}(y) < k+ε) > γ

This asserts that we can pick an arbitrarily small ε and an arbitrarily high probability γ (although not a probability of one) and find some x such that every result yielded by our method after x has a probability greater than γ of being within ε of k.

This allows us to show that our first method does not converge, simply by observing that for any k we might pick there is always a nonzero probability of coming across a larger value, and the chance of having done so only grows as more inputs arrive. Thus γ cannot be arbitrarily “tightened”, and so the method doesn’t converge to any value. Our second method can be just as easily proven to converge to 1, because for any ε we might pick there is always some probability that the input will yield a value larger than 1-ε. The probability that we will have come across such a value approaches 1 arbitrarily closely as the number of inputs grows, and so no matter what γ we pick we can always find a suitable x, although it may be very large.
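Both arguments can be checked numerically. The sketch below relies on two standard facts rather than anything stated in the text: the largest of y independent uniform draws from (0,1) exceeds 1-ε with probability 1-(1-ε)^y, and a draw from the first input function is at most m with probability 1-.5^m. The function names are my own.

```python
import math


def prob_second_within(y, eps):
    """P(1 - eps < f_y(y)) for the second method, assuming 0 < eps < 1.

    f_y(y) is the largest of y uniform draws and never reaches 1, so this
    is also the probability that f_y(y) lies within eps of 1.
    """
    return 1.0 - (1.0 - eps) ** y


def sufficient_x(eps, gamma):
    """Smallest x >= 0 such that the probability exceeds gamma for all y > x.

    Assumes 0 < eps < 1 and 0 <= gamma < 1.
    """
    # Closed-form starting guess, then adjust for floating-point rounding.
    x = max(0, math.floor(math.log(1.0 - gamma) / math.log(1.0 - eps)))
    while prob_second_within(x + 1, eps) <= gamma:
        x += 1
    while x > 0 and prob_second_within(x, eps) > gamma:
        x -= 1
    return x


def prob_first_at_most(m, y):
    """P(f_y(y) <= m) for the first method: all y draws are at most m."""
    return (1.0 - 0.5 ** m) ** y


# Second method: a suitable x exists for every eps and gamma tried here.
for eps, gamma in [(0.1, 0.9), (0.01, 0.99), (0.001, 0.999)]:
    print(eps, gamma, sufficient_x(eps, gamma))

# First method: the chance of staying at or below any fixed m shrinks to 0.
for y in (10, 100, 1000, 10_000):
    print(y, prob_first_at_most(5, y))
```

Because the probability for the second method only increases with y, finding the first y that clears γ is enough to cover every later y as well.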

But those were relatively easy cases (and designed as such). Let us now consider something slightly harder. First, however, I must define the encoding and decoding functions. The encoding function <x,y> encodes two values into a single number, and the decoding function (x)_{y} extracts those values, such that (<a,b>)_{0} = a and (<a,b>)_{1} = b. For our method we will define g(x,y) = <((x)_{0}*(x)_{1}+y)/((x)_{1}+1), (x)_{1}+1>, h(x,y) = (g(x,y))_{0}, and s_{0} = <0,0>. Since what this does might not be obvious I’ll explain in words. The state of the method is a pair of numbers, the first of which is the average input so far, and the second of which is the number of inputs processed so far. Upon receiving a new input the method yields the average input including that one. And for this method we will define the input function i(x) by randomly selecting a value from the integers between one and ten, inclusive, each with equal probability.
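Here is a small sketch of the averaging method. As a simplifying assumption, a Python tuple stands in for the encoding <x,y> (so nothing is actually packed into a single number), and exact fractions are used so that the averages carry no rounding error. The seed and the number of inputs are arbitrary.

```python
import random
from fractions import Fraction

S0 = (Fraction(0), 0)    # s_0 = <0,0>: average so far, inputs seen so far


def g(state, y):
    """New state: the average including y, and the updated count."""
    average, count = state
    return ((average * count + y) / (count + 1), count + 1)


def h(state, y):
    """Result: the first component of the new state, (g(x,y))_0."""
    return g(state, y)[0]


def run(inputs):
    """Results f_y(1), f_y(2), ... of the averaging method."""
    state, results = S0, []
    for y in inputs:
        results.append(h(state, y))
        state = g(state, y)
    return results


print(run([1, 10, 4]))   # [Fraction(1, 1), Fraction(11, 2), Fraction(5, 1)]

rng = random.Random(0)   # fixed seed, chosen only for repeatability
results = run(rng.randint(1, 10) for _ in range(10_000))
print(float(results[-1]))
```

`random.Random.randint(1, 10)` includes both endpoints, which matches the "between one and ten, inclusive" input function.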

Again, it is intuitively obvious that this method converges, to 5.5 specifically, but it is much harder to prove that it does so. First of all, working with it as is would involve some fancy footwork computing the probability of any given sum, and so on. To simplify our task we will pretend that the input consists of only two values: those over 5.5, the highs, which are of value 7.75, and those under 5.5, the lows, which are of value 3.25. Obviously this only approximates the actual input, and a complete proof would need to show that the approximation is legitimate. Now, consider an arbitrary ε and this method run over g inputs, also arbitrary (here g is a number of inputs, not the state function). Assume that as a baseline the highs and lows are equal in number, and thus that the average is exactly 5.5. How many additional highs would be required to make the average greater than 5.5+ε? That is something we can calculate. Let k be the number of additional highs.

(5.5*(g-k) + 7.75*k)/g > 5.5+ε

5.5*(g-k) + 7.75*k > 5.5*g+ε*g

7.75*k > 5.5*k+ε*g

2.25*k > ε*g

k > ε*g/2.25

Thus we need more than ε*g/2.25 additional highs for the average to be more than ε above 5.5. The next step is to calculate how probable finding a particular ratio of highs to lows is for an arbitrary g. Fortunately for us this is relatively easy: the probability is .5^{h}*.5^{l}, where h is the number of highs and l is the number of lows. Now what we need to do is calculate the total probability of finding at least ε*g/2.25 more highs than lows. For any particular surplus or deficit x, the probability is .5^{g/2 - x/2}*.5^{g/2 + x/2}. To get the probability of finding a surplus of ε*g/2.25 or more we must integrate.

Fortunately this is relatively easy to do, and yields:

.5^{g}*(g-(ε*g)/2.25)

This is the probability that the average will exceed 5.5+ε after g inputs.

Thus the probability that the average will be within ε of 5.5 after g inputs is 1 – 2*.5^{g}*(g-(ε*g)/2.25).

Fortunately this obviously approaches 1 arbitrarily closely as g increases, thus proving that the method converges to 5.5. That was a lot of work for such a simple method, which reveals that if we are going to get anywhere substantial with this kind of analysis we will need general rules stating that all methods with certain features converge. Otherwise we will face the extremely difficult task of proving convergence in ever more complicated cases.
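As a numerical cross-check on the conclusion, the probability for the two-valued approximation can also be tallied exactly. The sketch below does not use the closed form above. Instead it counts the sequences with each number of highs using binomial coefficients, and it takes the lows to be 3.25 so that equal numbers of highs and lows average exactly 5.5. Both of those choices are assumptions of this sketch, as is the function name.

```python
from math import comb

HIGH, LOW, MEAN = 7.75, 3.25, 5.5   # LOW = 3.25 is an assumption, see above


def prob_within(g, eps):
    """P(|average - 5.5| < eps) after g inputs, each HIGH or LOW with
    probability .5. Here g is the number of inputs, as in the text."""
    favorable = 0
    for highs in range(g + 1):
        average = (HIGH * highs + LOW * (g - highs)) / g
        if abs(average - MEAN) < eps:
            favorable += comb(g, highs)   # sequences with this many highs
    return favorable / 2 ** g             # each sequence has probability .5^g


for g in (10, 100, 1000):
    print(g, prob_within(g, 0.5))
```

The printed probabilities climb towards 1 as g grows, which agrees with the claim that the method converges to 5.5 even though the tally is done differently.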

Another interesting fact that this investigation reveals is that whether certain methods converge may depend on the input being fed to them (as was the case with our first two examples). Obviously the input is something we can’t control, and given our epistemic situation we can’t even say what general constraints the input obeys, since there is always the chance that what we have been given is simply a very unlikely sequence. This would imply that for some methods in some cases we simply couldn’t know whether they converge. To overcome this problem we might build a “sanity requirement” into our methods. Generally it will be possible, by examining the method, to determine what kinds of inputs it will converge for. We can thus use the state of the method to record what inputs have been seen so far and, at every step, determine what kinds of converging inputs are possible and how likely it is that the values being yielded would vary as they do if we were actually dealing with such an input. If the results are varying in ways that seem extremely unlikely given an input that would produce convergence, we might have the method produce a result indicating that the input is invalid, so that it converges on this “invalid” result. For example, given the averaging method discussed above we might build in a sanity requirement that yields this invalid answer if the average value shifts too much after a large number of inputs. Obviously such shifts aren’t impossible; it might be that the run so far has been highly improbable, or that we are encountering an extremely improbable sequence somewhat earlier than might be expected. And so guaranteeing that the method always converges when the nature of the input is unknown may sacrifice accuracy, as there are rare situations where the method will now converge to the invalid result when previously it would have converged on an actual result. But, since we can make these situations arbitrarily improbable, this is an acceptable tradeoff for being able to handle input sequences where convergence is impossible.
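A very crude version of such a sanity requirement for the averaging method might look like the following. Everything specific here is hypothetical: the `INVALID` marker, the `warmup` and `max_shift` thresholds, and the decision to use a fixed threshold in place of the probability calculation described above.

```python
INVALID = "invalid"   # hypothetical marker for the "invalid" result


def averaging_with_sanity_check(inputs, warmup=1000, max_shift=1.0):
    """Yield the running average, or INVALID once the average has moved
    more than max_shift away from its value after `warmup` inputs.

    The thresholds are illustrative only; a fuller version would derive
    them from how improbable a given shift is for a converging input.
    """
    total, count = 0.0, 0
    reference = None      # average recorded once `warmup` inputs are seen
    invalid = False
    for y in inputs:
        total += y
        count += 1
        average = total / count
        if count == warmup:
            reference = average
        elif reference is not None and abs(average - reference) > max_shift:
            invalid = True   # once tripped, every later result is INVALID
        yield INVALID if invalid else average


steady = list(averaging_with_sanity_check([5] * 2000))
drifting = list(averaging_with_sanity_check([5] * 1000 + [100] * 1000))
print(steady[-1], drifting[-1])
```

Because the flag is never cleared, the method converges on the invalid result for any input that trips the check, which is the tradeoff described above: a legitimate but very unlucky input would be rejected too.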