inference from the totality of deductive consequences one can draw from it.

An answer to this question is that a semantical definition of depth information is still possible. And one can “almost” implement it in an IF first-order language. For in such a language, there exists a complete disproof procedure, and hence a complete deductive method of weeding out inconsistent constituents. Hence, everything that has been said remains applicable, except that the resulting notion of information is the one that goes naturally with no-counterexample information.

But how can we, in general, assign measures of depth information and depth probability to first-order propositions? Here, the groundwork has been done for us by Carnap and other inductive logicians. Admittedly, they have considered mostly only special cases, especially monadic first-order languages. But their results are informative enough to show what the overall situation is.

In his early work on inductive logic, Carnap tried to develop a purely logical measure of probability for monadic first-order languages. The general situation in such languages is that we have a contingency table of N cells into which we can classify observed individuals. The logical probabilities are the prior probabilities of different distributions of individuals into the cells. Observations of individuals falling into the different cells yield the information we have to base our probability judgment on.
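The setup can be made concrete with a toy sketch (entirely my own illustration, not from the text): individuals classified by two monadic predicates F and G fall into N = 4 cells, and the tallied relative frequencies are the evidence on which probability judgments rest.

```python
from collections import Counter

def cell_of(individual):
    """Map an individual (here, a dict of two predicate values) to one of N = 4 cells."""
    return (individual["F"], individual["G"])

# Hypothetical observed individuals.
observations = [
    {"F": True, "G": True},
    {"F": True, "G": False},
    {"F": True, "G": True},
    {"F": False, "G": True},
]

counts = Counter(cell_of(x) for x in observations)
n = len(observations)
# Observed relative frequencies per cell of the contingency table.
frequencies = {cell: c / n for cell, c in counts.items()}
print(frequencies)
```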

One of the simple but basic insights into this situation is that the prior probabilities determine, and are determined by, how rapidly an inductive logician allows evidence to influence his or her (or its, if we are doing artificial intelligence) probabilities on evidence. This idea can be expressed explicitly by considering what is known as the “characteristic function” of the probability measure. It is the function that expresses the probability that, given a specified body of evidence, the next randomly chosen individual belongs to the cell No. i (1 ≤ i ≤ N). The requirement of randomness is implemented by stipulating that our probability distribution is exchangeable.

This requirement is a kind of symmetry assumption. Carnap initially made use of other strong symmetry assumptions, too. He assumed that the characteristic function f depends only on the number n_i of observed individuals in the cell No. i, on the total number of observed individuals n, and on the number N of the cells. He established the remarkable result that even these strong assumptions do not determine uniquely the characteristic function, nor a fortiori the prior probabilities. These are determined only up to a freely chosen parameter λ. More explicitly, the characteristic function has the form

$$f(n_i, n, N) = \frac{n_i + \lambda/N}{n + \lambda} \qquad (3)$$

This is a fascinating expression. It can be considered as a kind of weighted average between the observed relative frequency n_i/n and the purely symmetrical factor 1/N. The greater λ is, the more heavily the second factor is weighted. What this means is that the greater λ is, the more slowly an inductive reasoner is letting experience affect his or her posterior probabilities; that is to say, letting it move them closer to the observed relative frequencies and away from the a priori factor 1/N. In other words, λ is an index of caution.
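The weighted-average reading of (3) can be checked numerically. A minimal sketch (the function name and the sample numbers are mine):

```python
def characteristic(n_i, n, N, lam):
    """Carnap's lambda-continuum predictive probability, formula (3):
    the probability that the next individual falls into cell i, given
    n_i of the n observed individuals in that cell and N cells in all."""
    return (n_i + lam / N) / (n + lam)

# With lam = 0 the rule simply follows the observed relative frequency n_i/n;
# as lam grows, the estimate is pulled toward the symmetric prior 1/N.
print(characteristic(8, 10, 2, 0.0))    # pure relative frequency: 0.8
print(characteristic(8, 10, 2, 2.0))    # weighted average: (8 + 1)/12 = 0.75
print(characteristic(8, 10, 2, 1e9))    # extremely cautious reasoner: ~1/N = 0.5
```

A reasoner with a large λ barely moves from 1/N no matter what the sample shows, which is exactly why λ reads as an index of caution.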

Now, the optimal choice of such an index of caution cannot be made on logical principles alone. The optimal choice of λ depends on how orderly one's universe of discourse is. In fact, it has been shown that the optimal choice of λ is a monotonic function of the amount of order in the universe as measured by its entropy. (See Kurt Walk's paper in Hintikka and Suppes 1966.)
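The entropy in question is the standard Shannon entropy of the distribution of individuals over the cells; the monotone connection to the optimal λ is Walk's result, which the following sketch does not reproduce, it only computes the orderliness measure itself:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a distribution over the N cells:
    maximal for a uniform distribution (maximal disorder), zero when
    all individuals fall into a single cell (maximal order)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # highly disorderly universe
skewed  = [0.97, 0.01, 0.01, 0.01]   # highly orderly universe
print(entropy(uniform))  # 2.0 bits, the maximum for N = 4
print(entropy(skewed))   # much lower
```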

These innocent-looking technical results and others like them have extremely important philosophical implications. Even if we make the strong symmetry assumptions that Carnap uses in his sample case, our measures of probability and information depend on factual assumptions concerning the world, roughly speaking on the regularity (orderliness) of the world. In the simplest case, this dependence is mediated by the index of caution λ. The uneliminability of this index means that all use of the concepts of probability and information rests on tacit assumptions concerning the orderliness of the world.

This is a remarkable result. It has something of a déjà vu ring about it, in that any historian of philosophy knows that assumptions concerning the regularity of the world have played a major role as a proposed foundation of inductive inference. Perhaps we can also understand how such assumptions steal into one's approach so easily. Many philosophers have not realized that the mere use of probabilistic concepts in dealing with induction presupposes an assumption concerning the degree of orderliness of the universe. (It is an indication of the critical acumen of David Hume that he did not fall into this trap. He saw that the use of probabilities in induction does not solve the philosophical problem.)

But the situation is in fact even more complicated. (For more on the following discussion, see the articles in Hintikka and Suppes 1966 and 1970.) The notion of orderliness is systematically ambiguous. There are different kinds of order and disorder in the world. For instance, even in the simple situation studied by Carnap, one kind of orderliness is manifested by some of the cells' being totally empty. That means that certain general laws hold in the world. Carnap's symmetry assumptions do not allow us to cope with that kind of eventuality in realistic terms. However, if we simply allow the characteristic function to depend also on the number of cells left empty by the evidence, we obtain a more flexible treatment that enables us to deal with inductive generalizations. (See Hintikka and Niiniluoto 1980.)
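The difficulty with Carnap's own rule can be made concrete. Under formula (3), the probability that the next individual avoids a so-far-empty cell is 1 − (λ/N)/(n + λ), and the probability that the next k individuals all avoid it is a product of such factors. That product tends to 0 as k grows, so the rule can never assign positive probability to the generalization "cell j is empty". A sketch (parameter values are mine):

```python
def p_next_avoids(n, N, lam):
    """Probability, under Carnap's rule (3), that the next individual avoids
    a cell that is still empty after n observations."""
    return 1.0 - (lam / N) / (n + lam)

def p_cell_stays_empty(n, N, lam, k):
    """Probability that the next k individuals ALL avoid the empty cell."""
    p = 1.0
    for m in range(n, n + k):
        p *= p_next_avoids(m, N, lam)
    return p

# However long the cell has been empty, the probability that it stays empty
# keeps falling toward 0 as we look further ahead, so the universal law
# "cell j is empty" receives probability 0 in the limit.
for k in (10, 1000, 100000):
    print(k, p_cell_stays_empty(n=20, N=4, lam=2.0, k=k))
```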

As one might have expected, it turns out that the characteristic function will then depend on other parameters, which can be interpreted as indices of caution of different sorts and whose optimal choice depends on the different kinds of order in our universe of discourse. When relations are admitted, even further kinds of orderliness and further kinds of correlated indices of caution make their appearance.

The upshot of the work of Carnap and others is that any measure of a

priori probability, and hence any measure of information that we might want

to use, will tacitly embody multiple assumptions concerning different kinds

of orderliness in the world. As we might put it, there is no purely logical

quantitative notion of information, even though there exists a comparative

notion of information for the typical languages (first-order languages) with

which we have been dealing.

This result is diametrically opposite to what Carnap undoubtedly hoped to establish. It is of interest to note that already during Carnap's sojourn at the IAS in 1952–54, John von Neumann objected to Carnap's idea and claimed that all information is at bottom physical in nature. (See Köhler 2001.) A further examination would be needed to understand von Neumann's reasons for his essentially correct claim. One thing we can be certain about, however: John von Neumann's claim was not, and should not be, backed up by a rejection of the analytic–synthetic distinction, as has been claimed. This distinction concerns the case of zero information, which was seen to be unaffected by the vagaries of assigning numerical measures of probability and information to propositions.

These results concern both depth information and surface information. Since any measure of the order in the world obviously depends on the language used, this impossibility of a purely logical notion of information can be considered the price we must pay for being able to use any reasonable notion of information. These results also show in what sense the notion of information is and is not objective. The choice of one's measure of information is not determined purely logically. Such a choice amounts to a guess concerning objective reality, and it is in principle affected by what one knows about the reality.

Our results also have implications for the philosophical appraisal of the Bayesian approach to scientific inference. What they show is that there is no presuppositionless Bayesian inference. Such an inference relies on a prior probability distribution that inevitably embodies assumptions concerning the world. The dependence does not invalidate practical applications of Bayesian methods. Indeed, statisticians such as L. J. Savage have turned the dependence into a resource in that they propose to use it on purpose to codify background information. (Savage 1962 and 1972.) However, this kind of use of Bayesian methods is not what philosophers typically have in mind. In order to reach a theoretically satisfactory overview, straightforward Bayesian inference ought to be complemented by a theory of how our prior probabilities should be modified in the light of evidence.
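The prior dependence can be illustrated with a standard Beta-Binomial model (my choice of example, not the text's): two reasoners with different priors disagree markedly on scant evidence and converge only as evidence accumulates.

```python
def posterior_mean(alpha, beta, successes, trials):
    """Posterior mean of a binomial success probability under a
    Beta(alpha, beta) prior: (alpha + s) / (alpha + beta + n)."""
    return (alpha + successes) / (alpha + beta + trials)

# Two priors embodying different assumptions about the world:
# an indifferent Beta(1, 1) versus an opinionated Beta(50, 50)
# (confident that the probability is near 1/2).
for s, n in [(4, 5), (4000, 5000)]:
    flat = posterior_mean(1, 1, s, n)
    opinionated = posterior_mean(50, 50, s, n)
    print(n, round(flat, 3), round(opinionated, 3))
```

On 5 observations the two posteriors differ substantially; on 5000 they nearly agree, but the choice of prior was doing real inferential work all along.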

Moreover, if the only application of our probabilistic concepts is the confirmation of hypotheses and theories, it can be argued, and has been argued, that in the limit, Bayesian methods yield the right result as long as no possible hypothesis is assigned a zero prior probability. But this is not the only use of probabilities. In particular, it does not help us to understand how the notion of information can be used in the best way. As far as this notion is concerned, Keynes was right: In the long evidential run, one's favorite measure of information might very well turn out to be dead wrong. Hence, what is badly needed is a theory of how to change one's priors in the light of evidence. The priority of prior probabilities should not be understood as temporal priority, as far as actual inquiry is concerned.
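The zero-prior caveat can be checked directly: Bayesian conditionalization over a finite hypothesis set can never revive a hypothesis whose prior is 0, however strongly the evidence favors it. A minimal sketch with made-up likelihoods:

```python
def bayes_update(priors, likelihoods):
    """One step of Bayes' rule over a finite set of hypotheses."""
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Three hypotheses; H2 (index 2) is the one the data favor, but it gets prior 0.
priors = [0.5, 0.5, 0.0]
# Likelihood of each observed datum under H0, H1, H2 respectively.
likelihoods = [0.1, 0.2, 0.9]

for _ in range(100):  # 100 observations, all favoring H2
    priors = bayes_update(priors, likelihoods)
print(priors)  # H2 still has probability exactly 0; H1 absorbs the mass
```

Once a prior is 0, multiplication by any likelihood leaves it 0, which is why the limit theorem needs the no-zero-priors proviso, and why a theory of revising priors themselves would go beyond straightforward conditionalization.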

It seems to me that such a theory would make a substantial difference to many people's ways of thinking about probabilistic reasoning. It will be suggested in Chapter 9 that one victim of such a theory will be the currently fash-