1 Introduction

Joint attention has been credited with facilitating a vast range of social and cognitive functions in human ontogeny, ranging from the understanding of other mindsFootnote 1 and social and spatial perspective-taking (Moll & Kadipasaoglu 2013; Moll & Meltzoff 2011) to language acquisition (Tomasello 2014) and the comprehension of perceptual objects as publicly accessible (Seemann 2022). It also plays an obvious role in adult human cooperation and interaction. In this paper I argue that the reach of joint attention is even more comprehensive than these suggestions allow: it solves the “base problem” as it arises for some classic theories of perceptual common knowledge. The problem is that an account is needed of the perceptual base of some forms of common knowledge that gets by without itself invoking common knowledge. The paper solves the problem by developing a theory of joint attention as consisting in the exercise of joint know-how involving particular and sometimes distal targets and arguing that certain joint perceivers can always have a minimal form of propositional common knowledge about the location of these targets. On such a view, perceptual common knowledge is based on the experience of a process that is maintained by way of perceivers’ exercise of an object-involving form of joint know-how.

Some reductive theories of collective intentionality (e.g., Bratman 1992, 1999) require that agents’ intentions and subplans are common knowledge (or “out in the open”) between them. For these theories the base problem arises again. The enacted theory of joint attention can solve the problem. The argument is exactly parallel to the common knowledge case. The openness of joint agents’ intentions and meshing subplans is explained by appeal to their practical knowledge of how to maintain the process by way of which they pursue the collective intention. They can then make this knowledge explicit by linguistic communication. When they succeed in communicating knowledge of their meshing subplans as pursued in a joint action context, they necessarily have this knowledge in common. For theories of collective intentionality that include a common-knowledge condition, the experience of participating in a perceptually constituted joint action provides the base that renders harmless the regress that otherwise threatens reductive analyses.

The paper has three parts. I begin with a discussion of Schiffer’s (1972) and Lewis’s (1969) classic analyses of common knowledge and argue that they give rise to what I call the “base problem”. The problem is that insofar as their analyses require appeal to perceptual scenarios that produce common knowledge of facts about that scenario in the perceivers, an explanation is needed of how the perceptual base can produce these facts that does not involve common knowledge. In part two, I argue that solving the base problem is possible on an account of the base as a process maintained by the exercise of an object-involving form of joint know-how. Joint perceivers always enjoy a practical form of knowledge about their target, and some perceivers can make this knowledge explicit and then have a minimal form of common knowledge about their target. The base problem therefore has a solution. In part three, I show how this solution to the base problem can be applied to analyses of collective intentionality that include a common-knowledge requirement.

2 Common Knowledge and the “Base Problem”

The discussion of joint attention is closely tied to that of perceptual common knowledge. Indeed, Schiffer’s analysis assumes and Lewis’s analysis allows joint attention as what Lewis calls the “base” of common knowledge. In this section I trace the function of joint attention in their respective accounts and show why their treatments give rise to what I call the “base problem”.

Schiffer (1972) defines what he calls “mutual knowledge”Footnote 2 as follows:

S and A mutually know that p iff

S knows that p.

A knows that p.

S knows that A knows that p.

A knows that S knows that p.

S knows that A knows that S knows that p.

A knows that S knows that A knows that p.

Etc.
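The iterative structure of this definition can be stated compactly in the notation of epistemic logic. What follows is only a sketch: the operators $K_S$ and $K_A$ (“S knows that”, “A knows that”), the operator strings $\sigma_n$ and $\alpha_n$, and the abbreviation $M_{S,A}$ are notational assumptions introduced here for illustration. They are not Schiffer’s own symbols.

```latex
% Assumed notation: K_S, K_A are knowledge operators; M_{S,A} p abbreviates
% "S and A mutually know that p"; sigma_n, alpha_n are strings of operators.
\[
M_{S,A}\,p \;\iff\; \bigwedge_{n \ge 1} \bigl( \sigma_n\,p \,\wedge\, \alpha_n\,p \bigr),
\]
where
\[
\sigma_1 = K_S, \qquad \alpha_1 = K_A, \qquad
\sigma_{n+1} = K_S\,\alpha_n, \qquad \alpha_{n+1} = K_A\,\sigma_n .
\]
```

On this rendering, $\sigma_2\,p$ is $K_S K_A\,p$ and $\alpha_3\,p$ is $K_A K_S K_A\,p$, which match the third and sixth clauses of the definition. The “Etc.” corresponds to the unbounded conjunction over $n$.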

Schiffer illustrates the structure of mutual knowledge with an example in which this knowledge is produced perceptually. S and A are seated at a table with a candle placed between them, and the mutually known proposition is “there is a candle on the table”. Schiffer then asks how S and A each know that the other perceiver knows that there is a candle on the table. To answer that question, he introduces the concept of “normalcy”. A “normal” person is “a person with normal sense faculties, intelligence, and experience” (31). If such a person “has his eyes open and his head facing an object of a certain size (etc.), then that person will see that an object of a certain sort is before him”. Then, if S knows that A is normal and sees that A’s eyes are open and his head is facing the candle, S knows that A knows that there is a candle on the table; and so forth. Schiffer says that the resulting regress is “perfectly harmless” and that “the phenomenon which obtains in this case” is general: “it will obtain, broadly speaking, whenever S and A know that p, know that each other knows that p, and all of the relevant facts are ‘out in the open’” (32).

The “relevant facts” in question must include the condition of normalcy. Suppose S and A are normal in Schiffer’s sense but the fact that they are is not out in the open between them. Then S and A are not entitled to the inference that the other person knows that there is a candle on the table, and they are consequently not in a position to know in common that p. The epistemic openness of normalcy is a condition of common knowledge.

I now argue that on one possible construal this condition raises a problem for Schiffer’s contention that the iterative regress contained in the analysis of common knowledge is harmless. On this construal, the openness of the common knowledge-producing perceptual scenario is not simply a feature of the perceivers’ experience but is in need of further analysis. Then the question arises what makes it the case that the normalcy of A and S is out in the open between them. It would not be plausible to argue that it is normal for normal people to know that they meet the conditions of normalcy. This option is unavailable because the openness of normalcy cannot be itself based on an appeal to normalcy if it is to explain how normalcy can be out in the open between people. A different explanation is needed. It is not obvious what shape such an explanation might take. Certainly the normalcy of a situation cannot be known a priori, since not all people (the neurodivergent, the blind) are normal in Schiffer’s sense. Where conditions are known to be normal, at least part of this knowledge must be based on present or past experience. I can come to know that your eyesight is normal, for instance, by observing you competently navigate challenging and novel perceptual environments, or by inferring your normal eyesight from the normalcy of the many other people I’ve interacted with in the past. I can also come to know that you are of at least average intelligence by observing you solve certain tasks, or again infer it from the intelligence demonstrated by many other people previously. If this is right, knowledge of a person’s or context’s normalcy can only be inferentially acquired, on the basis of occurrent or past perceptual information or, perhaps, by being told.

This poses a problem for the requirement that the normalcy of a situation in which common knowledge is available has to be out in the open between perceivers. Since even well-justified inferences to normalcy can lead to false conclusions, there is no collective knowing whether the normalcy of a social perceptual scenario is epistemically open in the way required by its role as a condition of common knowledge. Common knowledge turns out to depend on whether agents’ justification for their beliefs leads to truth, and there is no knowing whether it does. Even where justification of mutual belief about normalcy leads to truth, this does not entail that knowledge of normalcy is out in the open between perceivers: since competent reasoners know that justification does not always lead to truth, collective agents could never know that they know in common that the condition of normalcy obtains. Yet in that case the introduction of normalcy as a condition of common knowledge makes common knowledge impossible: since the conditions that have to be met for common knowledge to be possible have to be out in the open between agents, since these conditions include normalcy, and since normalcy can never be out in the open, there cannot be common knowledge.

I have just sketched a two-step argument that shows that, on the construal at hand, the appeal to normalcy in Schiffer’s analysis of common knowledge makes the argument viciously circular. The first step of the argument is this:

1. Common knowledge requires that the condition of normalcy be met.

2. The fact that it is met has to be out in the open between S and A.

3. For a fact to be out in the open between S and A, they have to know it in common.

4. S and A have to know in common that normalcy obtains if they are to have common knowledge.

This first step shows that, on one construal, Schiffer’s analysis of common knowledge is circular: common knowledge requires that perceivers’ normalcy be common knowledge between them. You may think, as a reviewer did, that this step alone warrants the conclusion that Schiffer’s argument, on the construal at issue, is deficient. Then you can simply omit the following discussion. But you could think that the circle laid out in steps 1–4 is not vicious; that there is a straightforward explanation of how perceivers attain common knowledge of normalcy. I now supply an argument, already sketched in my previous remarks, to the effect that there is no such explanation. It will turn out that since S and A can never know in common that the conditions of normalcy are met, S and A cannot know in common that p. The second step of the argument shows that the circularity of Schiffer’s analysis, on the reading at issue, is in fact problematic.

5. Knowledge of normalcy can only be obtained by way of fallible (but rational) inferences.

6. Therefore, S and A can each come to falsely (but rationally) believe that normalcy obtains; and they can each falsely (but rationally) believe that the other knows that normalcy obtains.

7. Therefore, S and A can each falsely (but rationally) believe that they know in common that normalcy obtains.

8. Whenever S and A falsely believe that they know in common that normalcy obtains, they do not know in common that normalcy obtains.

9. Then they do not know in common that p [from (4)].

10. Since S and A cannot generally distinguish between knowing and falsely believing that normalcy obtains, they do not know whether they know in common that p.

11. Perceivers who know in common that p always know that they do.

12. S and A do not know in common that p.

I take (5 to 7) to be uncontroversial. Normalcy cannot be known a priori, knowledge of normalcy is inferential and therefore fallible, and hence it is possible that candidate collective knowers come to falsely believe that they know in common that normalcy obtains. I can falsely believe that you have “normal sense faculties” and therefore perceptually know that there is a candle on the table even though you are blind and don’t have this perceptual knowledge; and I can on this basis come to falsely believe that we know in common that there is a candle on the table [steps (8 and 9)]. But since false belief is not generallyFootnote 3 distinguishable from knowledge to the person who holds it, the person is not in the relevant cases in a position to distinguish between a situation in which there is common knowledge of normalcy and a situation in which there isn’t; since common knowledge of normalcy is a condition of the common knowledge that p, perceivers cannot know whether they know in common that p [step (10)]. This is a problem: since knowers who have common knowledge know that they do,Footnote 4 Schiffer’s perceivers cannot have it [steps (11 and 12)].Footnote 5
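The final inference, from steps (10) and (11) to (12), has the form of a modus tollens. As a sketch only, and assuming notation that is not part of the analyses under discussion (write $C\,p$ for “S and A know in common that p” and $K_i$ for “perceiver $i$ knows that”, with $i \in \{S, A\}$), it can be displayed as follows:

```latex
% Requires amsmath and amssymb. C and K_i are assumed, illustrative operators.
\begin{align*}
&\neg K_i\, C\,p              && \text{from (10): $i$ does not know that $C\,p$} \\
&C\,p \rightarrow K_i\, C\,p  && \text{(11): those who know in common that $p$ know that they do} \\
&\therefore\ \neg C\,p        && \text{(12), by modus tollens}
\end{align*}
```

Step (10) says more than the first premise records, since the perceivers also do not know that they lack common knowledge. Only the displayed conjunct is needed for the inference.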

A natural reaction to this diagnosis is to argue that it demands too much of common knowledge (and, by implication, of knowledge in general). Lewis’s (1969, p. 56) account of common knowledge analyses it in the weaker terms of “reason to believe”:

“Let us say that it is common knowledge in a population P that ___ if and only if some state of affairs A holds such that:

(1) Everyone in P has reason to believe that A holds.

(2) A indicates to everyone in P that everyone in P has reason to believe that A holds.

(3) A indicates to everyone in P that ___.”

Lewis calls A the “basis for common knowledge in P that ___” and claims that “A provides the members of P with part of what they need to form expectations of arbitrarily high order, regarding sequences of members of P, that ___.” You can think of Schiffer’s “candle” scenario as such a basis for common knowledge.Footnote 6 Then, S and A have reason to believe that the candle scenario holds (that they are each looking at the candle between them, in a way that they also see that the other is looking at the candle and seeing them looking at the candle); the candle scenario indicates to A and S that they have reason to believe that the scenario holds; and the candle scenario indicates to A and S that there is a candle on the table.
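Lewis’s three conditions can be set out schematically. The symbols below are assumptions made for illustration and are not Lewis’s own: $R_i\,\varphi$ abbreviates “$i$ has reason to believe that $\varphi$”, $A \rhd_i \varphi$ abbreviates “A indicates to $i$ that $\varphi$”, $A$ on its own stands for the proposition that the state of affairs A holds, and $\varphi$ fills the blank “___”.

```latex
% Assumed notation: R_i = "i has reason to believe that";
% A \rhd_i \varphi = "A indicates to i that \varphi"; P is the population.
\begin{align*}
\text{(1)}\quad & \forall i \in P:\; R_i\,A \\
\text{(2)}\quad & \forall i \in P:\; A \rhd_i \bigl(\forall j \in P:\; R_j\,A\bigr) \\
\text{(3)}\quad & \forall i \in P:\; A \rhd_i \varphi
\end{align*}
```

In the candle scenario, $P = \{S, A\}$, the basis is the scenario in which both are looking at the candle and at each other, and $\varphi$ is the proposition that there is a candle on the table. Note that the letter A does double duty in the text, naming both Lewis’s state of affairs and Schiffer’s second perceiver.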

On the face of it, this analysis of common knowledge escapes the objection to Schiffer above. If common knowledge is based on a scenario of joint attention in which subjects have reason to believe about each participant that they are jointly attending to a target, and the joint scenario indicates to them a perception-based fact about the target, then there cannot be a case in which they acquire false beliefs about the other’s beliefs about the target on the grounds of false (but rational) inferences from what is visible to them. That they come to know these facts in common is guaranteed by the stipulation, right at the outset, that “A holds”.

Whether this account works for scenarios in which the base of common knowledge is perceptual depends entirely on how you think about A. The question arises how to characterize joint attention. One option is to take it that joint attention is itself analysable in terms of common knowledge.Footnote 7 Then the proposal is flatly, and viciously, circular: common knowledge of a perceptual proposition is theorized as being based on joint attention, but joint attention is itself explained in terms of the common knowledge of that proposition. Another option is to take it that joint attention is somehow primitive. Then the charge needs to be avoided that joint attention just gets stipulated into existence to serve as the base of perceptual common knowledge.

The problems that arise for Lewis and Schiffer are thus closely related. For Schiffer, the problem is that the description of the perceptual base of common knowledge requires that certain conditions (those of “normalcy”) be out in the open between perceivers, where accounting for this openness itself requires an appeal to common knowledge. For Lewis, the description of the perceptual situation that may serve as the base of (true and rational) mutual belief must appeal to joint attention. If joint attention is defined as a perceptual scenario that produces common knowledge in its participants, then this definition again invokes common knowledge. In both cases, the analysis turns out to be circular. Also in both cases, the circle is vicious, since there is no good (non-circular) explanation of how the common knowledge invoked in the description of the perceptual scenario, in and about which common knowledge is attained, is itself acquired.

I call this the “base problem” arising for Schiffer’s and Lewis’s analyses:

(BP) For theories of common knowledge that require appeal to a perceptual base, a non-viciously circular and non-viciously regressive account is needed of this base that explains how it can produce common knowledge of a perceptual fact in the perceivers who help constitute the base.

3 Joint Attention

How might one address the base problem? The intuitive answer I shall be developing is that the experienced social world itself provides the base that makes common knowledge about it possible. The kinds of environmental, mental, and social facts that are out in the open between agents, so that joint action on objects contained in that environment becomes possible, are directly accessible to the agents in such a way that their being shared between these agents is a condition of that access. This is nothing more than a common-sense description of what happens when we perceive and act together: the world and its objects are perceptually available to us in a social mode that allows us, often effortlessly, to share relevant facts about them with each other and that thus facilitates joint action. The base of perceptual common knowledge, and of the complex social phenomena that build on it, is the social world that we share.

In the next sections, I develop a view that substantiates these intuitions.Footnote 8 Here is a sketch of the core idea. Joint attention can be thought of as a process that is maintained by way of the execution of a minimal form of joint know-how (see also Seemann, under review). For a certain class of joint perceivers (broadly, those capable of linguistic communication), the experience of joint attention is apt to produce in them a minimal kind of common knowledge that is “luminous” in Williamson’s (2000) sense. A mental state is luminous just when its subjects know that they are in that state. If common knowledge is luminous, then knowers who know in common that p always each know that they enjoy this knowledge. Suppose these knowers are linguistically communicating joint perceivers. These perceivers can always know, when jointly attending to a target, the proposition that expresses the target’s location in social space. Since social space is constituted in social interaction, this knowledge is necessarily of a common kind. Since it is produced intentionally, typically in linguistic communication, it is luminous. The explanation of how joint attention produces common knowledge avoids circularity because it begins with an account of joint know-how whose description does not invoke common knowledge and explains luminous common knowledge by appeal to the possibility of linguistic expression and communication of some facts that are already practically known to the agents who exercise this joint know-how.
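Luminosity, as applied here to common knowledge, can be put schematically. The notation is an assumption made for illustration and is not Williamson’s: $C_G\,p$ abbreviates “the members of group $G$ know in common that p” and $K_i$ abbreviates “perceiver $i$ knows that”.

```latex
% Assumed notation: C_G = common knowledge in group G; K_i = knowledge of member i.
\[
C_G\,p \;\rightarrow\; \bigwedge_{i \in G} K_i\, C_G\,p
\]
```

Read: whenever p is common knowledge in $G$, each member of $G$ knows that it is. The schema states only the luminosity condition itself. It does not show that the condition holds, which is the task of the sections that follow.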

3.1 Joint Attention as a Kind of Joint Know-How

Avoiding the base problem requires an account of joint attention that gets by without mentioning perceivers’ epistemically open intentional states and their interrelation. The account that most obviously delivers on this requirement is Campbell’s (2005, 2011) relational and “object-centered” theory of joint attention. Campbell sees joint attention as an experience that has the perceiver, co-perceiver, and a target object as constituents. This triadic experience is said to be a “primitive phenomenon of consciousness”. There is, on this view, a particular kind of experience available to creatures who are jointly attending to objects with others. The other person “enters the individuation of the experience”, which is thus of an irreducibly different kind than an individual perceiver’s experience of an object. On this experiential view, the base problem does not arise: since the experience is thought of in terms of a sui generis triadic perceptual relation, no appeal to common knowledge is required in the metaphysical description of the experience.

Campbell’s broad sketch of the experiential view is not without its problems. Battich and Geurts (2021) raise a variety of difficulties that put pressure on the conception of joint attention as primitive in Campbell’s sense. They point out that this conception of joint attention does not block recursion and that joint attention must be preceded by the recognition of the other person as a co-attender. But for that to be possible, something like Schiffer’s normalcy condition has to be met. As they put it, “the bottom line is that at least some knowledge must be involved in any analysis of joint attention” (Battich & Geurts 2021, p. 9). There is also the general worry that since, on any plausible account, perceptual experience is inherently perspectival, it is not obvious how to make metaphysical or phenomenological sense of an experience with two perceivers as constituents. How can there be an experience that presents its target as being perceived from a variety of perspectives? It would seem that the joint experience decomposes into two mereological constituents along the lines of Baron-Cohen (1999), who describes joint attention in terms of mutual “seeing-what-the-other-sees”.

Substantiating the experiential view so that it masters at least some of these challenges is possible if you think of episodes of joint attention as processes that are maintained by the execution of a practical form of knowledge that the perceivers can only deploy in interaction with each other. The description of these interactions has to be purposive but cannot rely on agents’ interrelated representational states, such as their intentions, that would be out in the open between them, since then the base problem arises again.Footnote 9 The traditional way to think about motor action without relying on intentions that represent their conditions of satisfaction is to subscribe to a view that has its roots in Merleau-Ponty’s (1945/2002) concept of “motor intentionality”, such as Dreyfus’s (1993/2014) notion of “skillful coping” or Gallagher’s (2005) and Hutto’s (2008) versions of enactivism about the mind and cognition. For the purposes of this paper, my aim is to develop an account that does not require subscription to this kind of view, though it is compatible with it. To this end, I introduce the technical notion of a “doing”. Doings are purposive bodily movements that can be described without appeal to intentional state concepts. Thinking of an agent’s contribution to the process that constitutes an instance of joint attention in terms of a doing is sufficient for the theory of joint attention I shall be developing. The question of whether doings should be further characterized in terms of some version of motor intentionality or by appeal to interlocking intentional states that are not out in the open between agents remains, for the purposes of this paper, open.  

(DOING) A doing is a proprioceptively experienced bodily movement that involves a perceptually present object and that the moving creature prolongs.

The core consideration is that it is always true that agents who act purposefully prolong what they are doing for as long as they are doing what they are doing, regardless of what their reasons are (if any). Compare the notions of “doing” and of “bodily moving”. It is not true of the latter notion that agents prolong their movements as long as they are moving. Suppose a doctor is probing your reflexes by tapping your knee, as a result of which your lower leg moves even if you form the firm intention to keep still. Then you are bodily moving even though you are not prolonging the movement while it is going on. So it is informative to say, about doings in contrast to other kinds of bodily movements, that agents prolong them while they are going on.

Creatures can prolong what they are doing for many reasons. They may prolong what they are doing because they are enjoying the activity or because they are pursuing some external goal, but they can also keep doing what they are doing if they have no apparent reason for doing so at all (think of the doodles you draw while on the phone with someone). Since everything we do eventually comes to an end, the notion of a doing is temporally indexed: it is only within a certain temporal interval that creatures prolong their doings. You thus can be doing something, in my sense of the term, even though what you are doing will end once you have achieved an external goal, or once you don’t find it interesting anymore, or once it is terminated by external factors (you have to do something else; you fall asleep). You still prolong the doing as long as it is going on. Call this the “intrinsic motivation” that is inherent to a doing. Intrinsic motivations are unlike distal intentions in that they could not be entertained outside of the doing. They are also different from what Searle (1983) calls “intentions-in-action” in that the prolongation of the doing does not require meeting internally represented conditions of satisfaction: it is not that creatures intend to prolong their doings and can succeed or fail in doing so. If they are doing something, they are necessarily prolonging what they are doing within the doing’s temporal boundaries. All this is compatible with the possibility that the doer entertains intentions, distal or not, and that these intentions play a causal, explanatory or justificatory role in the doing. But the notion of a doing is compatible also with the view that intentions play no role in (some of) the things we do and that there nevertheless is a distinction to be drawn between doings and reflex-like bodily movement.Footnote 10

Doings are always carried out in environments that are perceptually present to the agent, and they “involve” objects or scenes in the environment. “Involvement” here is a technical notion designed to capture doers’ purposive engagement with the environment while avoiding having to spell out this engagement by appeal to intentions that represent their conditions of satisfaction. Doings involve objects in the sense that interaction with the object constitutes the doing in question. For example, touching an object constitutively involves the object: you can only touch it if there is in fact direct contact between your body and the object. Touch thus does not have conditions of satisfaction that can be spelled out relative to physical contact (though “trying” or “intending” to touch has such conditions). Other kinds of doings can then be modelled on touch. Suppose you are pointing at a distal object, where the pointing qualifies as a doing (it is being prolonged while it is going on; it makes use of proprioception; it involves the distal object). The pointing involves the object in the sense that the doing could not take place without the object being pointed at. “Involvement” thus does not require physical manipulation. It only requires that specifying the doing, as the kind of doing it is, constitutively includes mentioning the object, so that you could not, say, execute the doing if the object were not there. Like knowing and perceiving (but unlike believing), doing is in this sense factive.

Some doings are carried out with other people. And some of these social doings can only be done with other people. You cannot play a game of tennis by yourself, you cannot seesaw by yourself, and you cannot jointly attend to a target by yourself. Call these doings “joint”:

(JD) A joint doing consists of at least two creatures’ bodily movements by way of which they prolong what they are doing, where the doing involves one object that is perceptually present to both creatures and where its prolongation requires that each creature co-ordinate its own movements with the movements of the perceptually present other creature.

More would need to be said about the crucial notion of co-ordination for a complete account, but JD is sufficient for present purposes. I understand joint attention as a kind of joint doing whose objects are, or could be, out of reach and that thus requires participants to deploy techniques, such as perceptual attention, that enable them to involve such objects. The participants in a joint doing exercise a kind of joint know-how: they know how to co-ordinate their movements with those of their co-agent so as to prolong what they are jointly doing. This minimal account of joint know-how, based as it is on the technical notion of a doing, gets by without requiring either subscription to or rejection of (some version of) the notion of motor intentionality. It only requires that the contributions of agents by way of which they maintain an episode of joint attention are describable as purposive movements that are being prolonged by these agents while they are going on. This requirement has two implications for the view of joint attention I am developing. First, on the resulting view joint attention is something agents do purposively. Joint attention is always endogenous; it could not be in its entirety the consequence of external factors. This is plausible: even though you and I can be made to look at the same object by external factors (a loud bang; a flash of light), this does not by itself amount to joint attention. Joint attention always requires purposive co-ordination of bodily movement. Second, participants in joint attention always have an intrinsic motivation to prolong the episode for as long as they do. Even where you and I jointly attend to the wallet I am forcing you to hand over to me at gunpoint, you are intrinsically motivated to attend to the wallet with me while the episode is going on. This is compatible with you not wanting to hand over your wallet to me and not wanting to be interacting with me at all. It only amounts to the difference between you attending to the wallet with me and refusing to co-ordinate your attention with mine, for instance by directing your gaze elsewhere.

3.2 Social Triangulation and Spatial Common Knowledge

Joint agents who prolong an episode of joint attention co-ordinate their movements with those of their co-agents and co-perceivers. This requires them to adapt their movements to those of the other agent and to put the other agent in a position to adapt their movements in turn. For this to be possible, they cannot simply react to the other’s contributions. They have to take an active role to put the other agent in a position to contribute to the joint doing.Footnote 11 Consider a case of joint attention: if you and I are to jointly attend to a target, it is not sufficient that I follow your pointing gesture so that my gaze comes to focus on the object you are making salient. My own movements have to contribute to what we are jointly doing. That is, I have to check whether the object I am focusing on really is the one you are making salient and I have to move so as to make that object salient to you. In short, if we are to jointly attend to a target, we have to point it out and keep it salient to each other. This requires that we each triangulate the target’s location relative to the standpoint of the other person. Call this “social triangulation”. Perceivers prolong an episode of joint attention by participating in a continuous process of making and keeping the target salient to each other by socially triangulating its location. When they are involved in this kind of mutual triangulation, they operate in what I call “social space”. Social space is a spatial framework in which targets are mutually singled out relative to the standpoints of agents’ co-perceivers. It thus is a framework in which not only one’s own but also one’s co-agent’s location is presented as a standpoint.

Perceivers who jointly attend to a target exercise a joint form of know-how, as described in the previous section. They then enjoy a practical kind of spatial knowledge: they always know where the target is located relative to the standpoint of their co-perceiver. This knowledge does not entail that perceivers know where the object is in allocentric space: successful social triangulation is possible even if the target’s location in allocentric space is misrepresented (by a mirror arrangement, for instance). Since social triangulation is dynamic (it requires that each perceiver continuously adapt their pointing gestures and other motor movements to those of their co-perceiver), the practical knowledge of a target’s location in social space is of a joint kind: no individual agent could have it on their own.Footnote 12

Some joint perceivers are capable of entertaining and linguistically communicating propositions that express the location of the target object in social space. These perceivers can communicate their practical knowledge of a target’s location in social space to their co-perceivers by saying things like,

(PK) THIS* is the location L of the target T we are looking at,

where their utterance of PK is accompanied by the kind of pointing gesture that is apt to help prolong an episode of joint attention and the uttered token of “THIS*” refers to the location of the target in the social spatial framework in which the involved perceivers operate. When they communicate with their co-perceivers by uttering PK and accompanying their utterance with the right kind of pointing gesture, speakers and hearers acquire luminous common knowledge of the object’s location in social space. It is a form of common knowledge because no speaker could entertain it on their own, because knowing the proposition expressed by an utterance of PK requires that one’s addressee know it also, and because it is always true that if one of the communicators knows it, then each communicator knows that each communicator knows it.Footnote 13 The communication that takes place between the speaker and the hearer of PK can only be an exchange between two participants in an episode of joint attention. It expresses and makes explicit spatial knowledge that speaker and hearers, as joint perceivers and agents, already possess in practical form.

In joint perceptual contexts, PK is always true. The experience of joint attention supplies perceivers with reasons for PK. When asked why they are saying that PK, a speaker can always reinforce the salience of T by pointing it out to the hearer. Joint scenarios thus meet the conditions stipulated by Lewis: in an episode of joint attention, all joint perceivers have reason to believe that the perceptual scenario is joint; the joint scenario indicates to all perceivers that all perceivers have reason to believe that the perceptual scenario is joint; and the perceptual scenario indicates to all perceivers that PK. Once perceivers know in common that PK, they can construe iterations of what each perceiver knows about each perceiver’s knowledge that PK. Thus common knowledge is luminous: when perceivers know in common that PK, they each know that each knows that they know in common that PK. However, its luminosity is epistemologically unproblematic. The iterations that make PK luminous are entailed by the common knowledge that PK; they do not constitute this knowledge. The regress really is harmless.

This account avoids the base problem as it arises for Lewis and Schiffer. The perceptual base of common knowledge is joint attention, joint attention is defined as a process that is maintained by way of an exercise in joint know-how involving a distal object, and joint know-how is describable without appeal to agents’ cognitive (or otherwise intentional) states. Some of these agents can express their joint know-how linguistically. When these agents manage to linguistically communicate to each other the location of the target of their joint attention in social space, they know this location in common and this common knowledge is necessarily luminous. This account of joint attention can be inserted into Lewis’s account so that it meets the conditions of the base of common knowledge. For Schiffer, the problem was that on one reading the condition of the openness of normalcy could only be met if it was commonly known between perceivers, which led to a vicious circle. The problem can be avoided on an account of joint attention on which the involved perceivers’ normalcy is out in the open between perceivers in virtue of their experience of jointly attending to a target. The enacted theory of joint attention delivers such an account. The experience of joint attention is the experience of a process that is maintained through the exercise of a joint know-how. When a perceiver enjoys this experience, she is participating in a joint doing. When such perceivers can linguistically communicate the location of the target of their joint attention to each other, they luminously know the location of this target in common. These perceivers then also are in a position to know that the conditions of normalcy obtain, and that this is common knowledge between them. 
The enacted account turns things upside down with regard to the condition of normalcy: it suggests that where perceivers exercise a joint form of know-how that involves distal perceptual objects, they are normal in the required sense. Common knowledge of normalcy thus conceived is a consequence, not a prerequisite, of successfully exercised joint know-how.

The view that joint attention is a process that is maintained by perceivers’ joint know-how is attractive not only because it helps avoid the base problem for some classic theories of common knowledge. It also provides answers to some of the questions arising for the experiential account of joint attention proposed by Campbell and others. The enacted account conceives joint attention as a temporally extended process that is prolonged by way of the contributions of its participants. Each participant’s experience then is of the process that they help constitute. On this view, the other person enters the perceiver’s experience because the experience is of a process that is co-constituted by the contributions of each perceiver. And the experience has three constituents because it presents the target as singled out via a process of triangulation that, for each perceiver, takes the co-perceiver’s location as a standpoint. All this is compatible with the core tenet of the experiential view that the experience of joint attention is primitive. It is primitive in the sense that it cannot be reductively analysed in terms of the cognitive or phenomenal states of its participants. Of course, it is not primitive in the sense that nothing more could be said about it, but this is not a demand that could be met by any plausible theory.

For all this, the sceptic can still respond that the enacted account of joint attention does not explain how to handle the possibility that an individual perceiver may be mistaken about participating in a joint scenario. You can come to falsely believe that I am jointly attending to a target with you by misconstruing my direction of gaze, for instance. It is even possible that two perceivers each come to falsely believe that PK and thus each falsely believe that they know in common that PK. The objection is that common knowledge, though luminous, does not forestall the possibility of mutual false belief. This is, of course, true. But the argument is misguided. The challenge gets off the ground because of an implicit commitment to reductionism about the experience of joint attention. It takes it, tacitly, that what you can think of as the “subject base” of the experience is the individual perceiver, and that, therefore, the possibility of falsidical experience needs to be settled at the individual level. On such a view, however, joint attention could not produce luminous common knowledge and the base problem would have no solution. The enacted view is designed precisely to avoid this unattractive conclusion. The defender of enactivism does not have to deny that the bearers of joint experiences are individuals. But since the experience is of a joint process that resists decomposition into the individual contributions of its participants and that produces a minimal form of common knowledge in these participants, the experience of the participant is of an epistemologically different kind than that of the solo perceiver who falsely believes that she is participating in such a process. The defender of the enacted view is thus committed to a social version of epistemological disjunctivism about experience (Seemann 2019, pp. 67–72). 
On this view, the observation that perceptual mistakes are always possible does not imply that joint experiences are to be individuated by appeal to perceivers’ individual beliefs about the character of their experience.

4 The Base Problem and Collective Intentionality

I have argued that some classic analyses of common knowledge face the base problem and I have suggested that an enacted theory of joint attention can help avoid this problem. In this section I show that this theory does not just remove a difficulty for these analyses of common knowledge. It is also important in the context of some discussions about joint action and collective intentionality. The relevant theories are sometimes called “reductive” (Blomberg 2015). They are given this label because they attempt to explain how agents can act jointly via an analysis of the interrelation of individuals’ mental states. Broadly, for joint action to be possible, each agent has to intend to jointly do x with the other person; each agent has to intend to contribute what is required of them to bring about x; each agent has to believe that the other person intends to contribute what is required of them to bring about x; and they each have to intend to contribute what is required of them because they believe that others intend to contribute what is required of them also.Footnote 14 So joint action is explained reductively in terms of a complex structure of individuals’ intentions that interlock in certain ways.

Theories of collective intentionality that are reductive in this sense are not, however, eliminative. They do not propose that a complete explanation of complex social phenomena is possible without appeal to collectivity concepts of any kind. A popular move is to require that the intentions and their interrelations that explain how agents can act jointly on shared goals are “out in the open” between them.Footnote 15 To explain this cognitive openness of the situation in which collective intentions can be formed and executed, some theories introduce a common knowledge condition without which the reductive analysis remains incomplete. The reductivism of this family of approaches to collective intentionality requires the epistemic openness of the relation between agents’ relevant intentions and subplans and thus avoids eliminativism about the social by stipulating a common knowledge condition.

If the argument in the first part of this paper is correct and classic accounts of common knowledge face the base problem, then the problem arises also for reductive theories of collective intentionality insofar as they rely on these or related accounts. Consider Bratman (1999, p. 121):

We intend to J if and only if

  1. (a) I intend that we J and (b) you intend that we J.

  2. I intend that we J in accordance with and because of 1a, 1b, and meshing subplans of 1a and 1b; you intend that we J in accordance with and because of 1a, 1b, and meshing subplans of 1a and 1b.

  3. 1 and 2 are common knowledge between us.

(3) introduces the common knowledge condition that makes the theory susceptible to the base problem. Sometimes the condition can be satisfied by linguistic communication alone, in the absence of a perceptual context in which the action is executed. Suppose we intend (Searle’s example) to make a sauce hollandaise together later, and we now linguistically communicate our respective intentions and subplans (I will later pour the ingredients and you will mix them while I pour) to each other. Then we have satisfied the common knowledge condition. There is a question whether this kind of linguistic communication itself requires some kind of joint perceptual context, but I am leaving this question aside for present purposes. I am interested in cases in which a collective intention is directly tied to a perceptual joint action context in which agents coordinate their motor movements in pursuit of a shared goal. Suppose we are in our kitchen and the communication of our intentions and subplans is at least in part effected by our actions: having established earlier that we want to make a sauce together but having left open how, I now fetch the ingredients, demonstratively put the bowl on the table and hand you the mixer; you take the mixer from me; I begin to pour the ingredients and you mix them. Then some of the epistemic openness that satisfies the common knowledge condition is perceptually constituted. In these cases, going by BP, a non-viciously circular and non-viciously regressive account is needed of the perceptual base that makes available common knowledge of the agents’ intentions and thus ensures that these intentions are out in the open between perceivers. 
I have offered such an account: the enacted view of joint attention explains how joint perceivers capable of linguistically expressing some of the practical knowledge by way of whose execution they prolong a process of joint attention can acquire a minimal form of perceptual common knowledge about the location of its target. Elsewhere (Seemann 2019, pp. 73–77) I have called this kind of common knowledge “primary”: it could only be enjoyed in virtue of agents’ exercise of a perception-based form of joint know-how, and it is always luminously enjoyed by joint agents who can linguistically communicate to each other propositions of the form PK.

Once this primary kind of perceptual common knowledge is established through linguistic communication (and the performance of suitable accompanying demonstrative gestures), speakers can expand on it. They can, for instance, communicate to each other their intentions about the referent of a token of “THIS*”. In this way they can acquire a secondary kind of perception-based common knowledge. In the above example, our meshing subplans in the making of the sauce are practically known by us in the way we prolong what we are jointly doing, and they are thus out in the open even if they are not explicit common knowledge between us. But we can communicate to each other propositions that express the subplans by way of whose execution we seek to realise our collective intention, in ways that demonstratively reference what we are doing. I can assert, while moving so as to contribute to our joint making of the sauce, that I intend to contribute to my intention to make a sauce hollandaise with you by pouring the ingredients by way of my present movements and you can assert that you intend to contribute to your intention to make a sauce hollandaise with me by mixing the ingredients by way of your present movements. In the kinds of projects under consideration, what we are each doing requires coordination between our respective contributions. We can then formulate propositions that connect our collective intentions to the joint doings by way of which we realise these intentions. They take something like the following form:

(CP) I intend that we J by me φ-ing like *thus and you Ψ-ing like *so,

where “J” stands for our joint action, “φ” and “Ψ” for the subplans by way of which we realise J, and “*thus” and “*so” are the demonstratives whose token utterance refers to the motor movements by way of which we carry out our subplans. In the kinds of scenarios in which an utterance of “*thus” or “*so” refers to the motor movements by way of which agents participate in a joint doing, these subplans necessarily mesh in Bratman’s sense. When participants in a joint doing communicate with each other by expressing propositions of the form CP and accompanying them with suitable demonstrative gestures, they acquire propositional common knowledge of what it is that they are jointly doing. CP then plays exactly the same role for the intentions with which joint agents act in the pursuit of collective goals as PK does for the location of the object of a joint doing: in each case, the communication of these propositions between the participants in a joint doing transforms joint know-how into propositional common knowledge. Agents who can communicate and thus come to know CP can always also communicate and thus come to know in common that PK: in order to know what subplans agents are jointly executing in pursuit of a joint intention, they have to know in common where the target is that is involved in the execution of their subplans. The converse, however, is not true: it is always possible that speakers who seek to expand their collective knowledge by communicating to each other propositions expressing what they are doing fail to do so. Agents can be joint doers, know in common where the object is that is involved in what they are doing, and yet be mistaken about the collective intentions with which they believe themselves to be executing their actions. Secondary common knowledge presupposes primary common knowledge, but the reverse is not true.

I have argued that on the enacted view of joint attention, the base problem can be avoided for reductive analyses of collective intentions that rely on classic theories of common knowledge, at least for scenarios in which there is a direct demonstrative connection between agents’ joint doings and the intentions with which they are acting. The general idea is the same as in the discussion of common knowledge in part two of this paper: joint agents exercise an object-involving form of know-how by which they prolong what they are doing, and this know-how is describable without appeal to interlocking intentional states that are out in the open between the agents. They can then, given the requisite conceptual and linguistic capacities, communicate to each other propositions that express facts about the target they are acting on or the joint doing they are performing, and they can demonstratively connect these propositions to the doing and its target. This strategy always yields a minimal primary kind of spatial common knowledge that, because it is effected through linguistic communication, is always luminous and therefore solves the base problem for common knowledge. But it may also produce common knowledge of the subplans by way of whose joint execution agents pursue their collective intentions. Unlike the primary kind, this secondary form of common knowledge is not necessarily available to linguistic joint agents: they may be involved in a joint doing but form mistaken beliefs, even false mutual beliefs, about the intentions with which they are acting.