Principles of Videoconferencing
In this work, I am suggesting the development
of a morphology of videoconferenced communication.
The ideal system is flexible, so that each user can configure it
and create and assign meanings according to his/her preference.
Much is lost when people cannot communicate face-to-face.
The following notes represent an attempt
to work toward minimizing the disjunctions that have typified videoconferencing
and to compensate with new abilities enabled by electronic communication.
The Position of the Camera
Videoconferencing is a two-way medium. A can see B, and B can see A.
For A to see B's face, and for A to present his/her face to B, A must do two things:
A must look at a screen (which shows B's face),
and A must look towards a camera (which is directed at A's face).
For A to be able to do both these acts simultaneously,
the screen (showing B's face) and the camera (picking up A's face)
must be near each other (or nearly in line with each other).
Presently, the industry standard is for the camera to be positioned above the screen.
This is good, but not ideal, for if a (normal) camera is positioned above the screen,
A has two unsatisfying choices: if A looks into the eyes of (the screen image of) B,
A's gaze is directed beneath the camera, thus B will see an image of A looking downward;
if, on the other hand, A looks directly into the camera, A can't look directly at B's eyes.
In such cases, A generally looks at the screen,
not realizing or caring that on the screen B is watching,
(the image of) A appears to be looking downward--
and not realizing or caring that on the screen A is watching,
B likewise appears to be looking downward.
The first step toward improving this situation is to realize that there
is indeed a problem here,
that a non-connection is occurring, that the connection could be more direct.
The solution to the above-described problem is for the point of view
(p.o.v.) of the camera
that is directed at A's face to be coming from the spot directly between the eyes of B's screen image--
if B's face is filling the screen, then the camera's p.o.v. would be from 2/3 of the way up, vertically,
and centered on the screen, horizontally.
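The geometry described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function names,
the screen dimensions, and the viewing distance are hypothetical,
and A is assumed to sit directly in front of the target point.

```python
import math


def camera_target(screen_width, screen_height, eye_height_fraction=2 / 3):
    """Ideal camera p.o.v. on the screen, measured from the bottom-left corner.

    Assumes B's face fills the screen, so the spot between the eyes is
    horizontally centered and eye_height_fraction of the way up.
    """
    return (screen_width / 2, screen_height * eye_height_fraction)


def gaze_offset_degrees(camera_xy, target_xy, viewing_distance):
    """Angle between A's line of sight to the target and to the camera.

    Assumes A sits directly in front of the target point, viewing_distance
    away, with all lengths given in the same unit.
    """
    separation = math.dist(camera_xy, target_xy)
    return math.degrees(math.atan2(separation, viewing_distance))


# Hypothetical 40 x 30 cm screen viewed from 60 cm (assumed values).
target = camera_target(40, 30)
above_screen = (20, 30)  # camera at the top edge, horizontally centered
print(target)
print(gaze_offset_degrees(above_screen, target, 60))
print(gaze_offset_degrees(target, target, 60))  # camera at the ideal spot
```

With these made-up dimensions, a camera at the top edge sits 10 cm above the target,
so A appears to look downward by roughly 9.5 degrees; at the ideal spot the offset is zero.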
The main logistical difficulty in implementing this solution
is that if the camera is placed in front of the screen,
it partially blocks the screen image (of B's face).
This difficulty can be overcome in a number of ways:
1) The camera and the arm supporting it can be miniaturized.
2) The camera can remain above or beside the screen, but using mirrors,
the p.o.v. of the camera can be manipulated.
3) The camera can be embedded in the screen,
or can be placed behind the screen (if the screen is translucent)--
although if video were projected from behind the screen,
a camera behind the screen would cast a shadow.
4) Additional screens at the A site can display B's image unblocked--
not for A's benefit, but for others onsite.
The purpose of this discussion is not to insist
that a videoconference is invalid if the p.o.v. of the camera
does not originate at the spot between the eyes of (the screen image of) one's distant partner,
but rather to point out that the fullest, most direct and satisfying connection occurs when this is the case.
If, for practical reasons, the ideal cannot be achieved,
at least the participants should be aware of the lack
and do what they can to compensate for it.
Even if it is possible to place the p.o.v. of the camera between the eyes
of one's partner's image,
I do not mean to suggest that it must be kept there throughout a videoconference.
My purpose here is only to establish that placing the p.o.v. of the camera
between the eyes of one's partner's image is the ideal, the "default" position in a videoconference.
Seeing, Being Seen, and Seeing and Being Seen
To see and be seen in a videoconference,
one must remain in the camera's field of vision and look in one direction--
unless one is using miniaturized equipment attached to one's head,
or unless there are monitors and cameras all around the space.
If, in the course of a videoconference, one does not need to see the screen,
one can move around the space and face in different directions.
The camera (which is sending one's image outward)
must then likewise be moved around and kept in front of one--
if one wants to continue sending a frontal image to one's distant partner.
Within a single videoconference, participants can decide
to use a variety of screen configurations.
Different configurations can be appropriate
for different relational and dramatic situations.
Sometimes participants do not want to see their own images;
at other times they do. A good system should allow both options.
Among the options for screen configuration are:
1) "Just the Other."
One can choose to have the other party's image fill the entire screen
one is looking at. In a multiparty videoconference, a common system
is for one's voice to make one's image appear on the screens others are seeing--
that is, the present (or latest) speaker is seen by all.
2) "Split Screen."
A's image can be on one side of the screen, B's image on the other.
3) "Compartmentalization" (grid).
This is an elaboration of the split screen method.
If there are 4 parties, the screen can be divided into 4 rectangles (2 horizontal, 2 vertical);
if there are 30 parties, the screen can be divided into 30 rectangles (6 horizontal, 5 vertical); and so on.
4) "Picture-in-Picture" (p-in-p).
A looks at the screen: B's image fills the screen,
and A's image appears in a small rectangle in the screen's lower right corner.
Possible variations: The rectangle can be any proportion or size.
The rectangle can be moved around the screen.
Instead of a rectangle, the "p-in-p" can be a circle.
(To give an example of a dramatic use
of this kind of shaping and positioning of images--
the circle, showing A, could represent a crystal ball
which could be placed above the image of B's hand.)
5) "Mixed."
The images of the two (or more) parties can be placed within a single frame.
If this is done, people's images overlap when they move toward each other's images.
The mixture can be equal, or one image can be more intense than the other(s).
6) "Keyed and Superimposed."
The image from one side can be "keyed"--
that is, it can be reduced to a flat solid color, which can be dense or just an outline.
This can look like a mask. This mask can then be superimposed upon
the full regular rectangular video image produced by the other side.
7) "ChromaKeyed and Superimposed."
A solid color (traditionally, blue) behind A is replaced by the image supplied by B--
that is, A's face could be superimposed on the full regular rectangular video image
supplied by B.
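The "Compartmentalization" and "Picture-in-Picture" options above can be sketched as simple layout arithmetic.
This is a minimal illustration, not a description of any existing system:
the function names, the near-square grid rule, and the default scale and margin
are all assumptions. Coordinates are (x, y, width, height) with the origin at the top-left of the screen.

```python
import math


def grid_dimensions(parties):
    """Columns and rows of a near-square grid with room for every party."""
    columns = math.isqrt(parties - 1) + 1  # ceiling of the square root
    rows = math.ceil(parties / columns)
    return columns, rows


def grid_cells(parties, screen_width, screen_height):
    """One rectangle per party, filled left to right, top to bottom."""
    columns, rows = grid_dimensions(parties)
    cell_width = screen_width / columns
    cell_height = screen_height / rows
    return [
        ((i % columns) * cell_width, (i // columns) * cell_height,
         cell_width, cell_height)
        for i in range(parties)
    ]


def pip_rectangle(screen_width, screen_height, scale=0.25, margin=0.02):
    """Small rectangle in the lower right corner for one's own image.

    scale and margin are fractions of the screen size (assumed defaults).
    """
    width = screen_width * scale
    height = screen_height * scale
    x = screen_width - width - screen_width * margin
    y = screen_height - height - screen_height * margin
    return (x, y, width, height)


print(grid_dimensions(4))   # 2 across, 2 down
print(grid_dimensions(30))  # 6 across, 5 down
print(pip_rectangle(1000, 800, scale=0.25, margin=0.0))
```

The grid rule reproduces the two cases given in the text (4 parties and 30 parties);
other party counts may leave some cells empty.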
If the object is for both sides to see the identical picture,
one side must do the screen configuring.
(It is easy to decide who will do it if only one side has the configuring equipment.)
If A is configuring, he/she is combining B's incoming picture with the local picture of A,
and A then can both view and send out to B the composite picture A has created.
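The mixing and chroma-keying options described earlier can be sketched as follows,
from the point of view of a site A that is doing the configuring.
This is a toy illustration under stated assumptions: frames are nested lists of (r, g, b) tuples
of equal size, and the function names, the blue key color, and the tolerance value are hypothetical.

```python
def mix_pixels(pixel_a, pixel_b, weight_a=0.5):
    """Blend two (r, g, b) pixels; weight_a=0.5 gives an equal mixture."""
    return tuple(
        round(weight_a * a + (1 - weight_a) * b)
        for a, b in zip(pixel_a, pixel_b)
    )


def mix_frames(frame_a, frame_b, weight_a=0.5):
    """Mix two frames so one image can be more intense than the other."""
    return [
        [mix_pixels(a, b, weight_a) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def chroma_key(local, remote, key_color=(0, 0, 255), tolerance=60):
    """Replace the solid backdrop behind A with the picture supplied by B.

    A local pixel counts as backdrop when its color lies within
    `tolerance` of key_color (an assumed, simplistic distance test).
    """
    tolerance_squared = tolerance ** 2

    def is_backdrop(pixel):
        distance_squared = sum((c - k) ** 2 for c, k in zip(pixel, key_color))
        return distance_squared <= tolerance_squared

    return [
        [remote_pixel if is_backdrop(local_pixel) else local_pixel
         for local_pixel, remote_pixel in zip(local_row, remote_row)]
        for local_row, remote_row in zip(local, remote)
    ]


# One-row example: a blue backdrop pixel next to a skin-tone pixel.
local_a = [[(0, 0, 255), (200, 150, 120)]]
remote_b = [[(10, 20, 30), (40, 50, 60)]]
composite = chroma_key(local_a, remote_b)
print(composite)
```

In this sketch, the composite is what A would both view locally and send out to B.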
One value of Mixing and Superimposing is that
in cases where one wants to see images of both oneself and one's distant partner,
one can do so by looking at one spot--
one does not have to constantly look back and forth between self and other.
(If one has access to the mixing control mechanism,
one can briefly and/or partially bring up one's own image,
for one's own viewing or for both viewers).
A basic question here is whether all parties want to see the same composite
or whether each party wants to configure and select for itself.
There are cases where each is preferable,
and, again, the ideal system allows for both options.
If a person at site A is speaking and a number of people at site B are listening,
the screen might be configured in the following way:
Through the use of P-in-P, Mixing, or Superimposing,
a close-up of the speaker's face might appear in the center of the picture,
and a wide shot of multiple listeners at site B might fill the periphery.
Or--numerous close-ups of listeners' faces might be arranged in the periphery.
(These listeners could all be at site B, or could each be at a different site.)
In such cases, the size, nature, and location of a person's image
could indicate the role he/she is playing in the videoconference at that moment.
I am looking forward to experimenting in this area:
my hypothesis is that these factors--especially placement of images in figurative designs--
greatly influence the form and content of conversations.
One way to process an image is to key it, as mentioned above.
An image can be keyed to the color of one's choice.
An image can also be mosaic-ized (pixelated), frozen, stretched, etc.
Processing one's own or one's distant partner's image
can be a means of expressing one's feelings.
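The three kinds of processing named above (keying, mosaic-izing, and freezing) can be sketched as follows.
This is a simplified illustration under stated assumptions: a frame is a nested list of grayscale values
from 0 to 255, keying is approximated by a brightness threshold, and all names are hypothetical.

```python
def key_to_color(frame, color, threshold=128):
    """Reduce a frame to a flat mask of one color.

    Pixels at or above the threshold take the chosen color; the rest
    become None (transparent), ready to be superimposed elsewhere.
    """
    return [
        [color if value >= threshold else None for value in row]
        for row in frame
    ]


def mosaic(frame, block):
    """Pixelate a frame by averaging each block x block tile."""
    height, width = len(frame), len(frame[0])
    result = [row[:] for row in frame]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            columns = range(left, min(left + block, width))
            tile = [frame[y][x] for y in rows for x in columns]
            average = sum(tile) // len(tile)
            for y in rows:
                for x in columns:
                    result[y][x] = average
    return result


class Freezer:
    """Holds the last live frame while the picture is frozen."""

    def __init__(self):
        self.held = None

    def process(self, frame, frozen):
        # Update the held frame only while the picture is live.
        if not frozen or self.held is None:
            self.held = frame
        return self.held


red_mask = key_to_color([[0, 200]], (255, 0, 0))
blurred = mosaic([[0, 100], [100, 200]], 2)
print(red_mask)
print(blurred)
```

Keying a face to red, in this sketch, is simply a matter of passing a red color to key_to_color.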
There is no need to say, "Doing such-and-such always means a particular thing,"
but conventions, whether suggested by instinct or culture, do develop.
For example: Keying a person's face to any color can signify a rush
in the person doing the keying and/or in the person whose image is being keyed.
Keying a face to red may signify anger or passion.
Mosaic-izing a face may signify that this person is feeling
out-of-focus, bored, or confused, and is withdrawing emotionally--
traditionally (in video) this technique has been used as a transition to an edit.
Freezing a face may signify that the person doing the freezing
needs to think about the current matter for a moment--
a freeze-frame may mark a significant moment, a highlight.
Freezing the picture can be like putting a bookmark in a book,
or like underlining text on a page.
Drawing provides yet another track, or level,
on which videoconferencing partners can communicate.
Since so many tracks are usually lost
when people give up face-to-face communication
(such as smell, heat, and the three-dimensional kinetic sense),
it seems fair to provide other tracks in videoconferencing in return,
including some not usually available in face-to-face communication.
What I am trying to develop in the following discussion of electronic drawing is--
in the tradition of Wassily Kandinsky's work--a general lexicon,
to suggest a grammar of aesthetic expression in the videoconferencing context.
As in the cases of Screen-Configuring and Image-Processing,
with Drawing there can be no absolutes in ascribing meaning to visual markings.
Each individual and group of individuals should be free
to decide upon many things for themselves--
and only they need know what they mean to convey to each other
by each particular aesthetic manipulation.
Nonetheless, I am pointing out and suggesting
some possible default meanings for certain markings.
These standard, default meanings can always be changed and/or overridden,
but there may be value in recognizing what they are (if indeed there are any universals).
The default location for drawing in accompaniment of faces
is in the periphery of the screen around those faces,
and/or along the border (if there is any) that separates the faces.
Making designs around the periphery of the central faces
can signify that the drawer is engaged in and enjoying the conversation
(it can also signify boredom). Making such designs all around the periphery
can signify a certain steadiness, patience, satisfaction--
there is no need to interrupt the pattern, no need to interrupt the cycle
until it is complete, coming full circle.
To abort a peripheral development may signify impatience with the conversation.
By going full circle around the periphery, ending where one began,
maintaining the pattern throughout, creating a symmetrical and balanced composition,
one may be expressing contentedness (or one may be repressing true feelings).
Once one completes such a decorative pattern, one can erase it at once or a bit at a time.
It is common for videoconferencing participants to draw along the outlines
of faces, bodies, borders, and objects in the background.
Outlining is an act of adornment, embellishment--
one is following, accepting, internalizing and re-externalizing
what has been presented to one. One is going along with the program.
Drawing over a face is an act of oppression and subversion.
It is, in a sense, an interruption. This is often done in great emotion, or in humor.
Perhaps one is saying, "I can't say this in words or through facial expressions--
the quickest way to express what I mean is just to visually annihilate your image."
Usually the marking is erased quickly so that an unobscured vision of the face can return.
If one leaves markings over the face of one's conversation partner,
that marking provides an ongoing comment about that face and what is coming out of it.
Abstract designs: Repetitive decorations, embellishments, ornaments
are the equivalent of "doodling" on a piece of paper.
This activity is not subversive of what is being discussed,
but it may signify that the conversation is not holding one's entire attention--
or, the designs may ornament and so comment upon the video image (and the verbal conversation).
Abstract designs can be a conscious, semi-conscious, or unconscious expression of emotion.
They also signify--in that they result from--a physical action
(the movement of one's hand, arm, or whatever part of one's body is feeding data through the input device).
Representational designs: Surrounding a face with realistic, or figurative, objects
can connote the successful integration of that individual with the objects, or environments, being represented.
Or, the objects drawn can just represent things on that person's mind.
The act of drawing figurative objects in itself can signify that the drawer is enjoying him/herself,
or it may signify a desire to teach, educate, communicate something in particular.
The drawer is creating patiently and with discipline, working within the system--
the system of codifying and representing--
and is giving, sharing the product as a sort of a gift.
(This is true of drawing in videoconferencing in general.)
Spatial: Through the development and/or repetition of visual motifs,
one creates a visual spatial rhythm. If, for instance,
the spiral loops one is drawing around a periphery
start getting larger, smaller, or irregular in shape,
one is disrupting the spatial rhythm--
this may be a comment on the general relationship and conversation.
Temporal: The rate at which one creates markings is the temporal rhythm.
Drawing can be done to its own temporal rhythm,
or it can be done in time with the spoken words
or with the movements of the face(s) on the screen.
With both spatial and temporal rhythms,
it is the changes in rhythm that draw attention and make statements.
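The idea that changes in rhythm are what stand out can be sketched numerically.
This is only a rough illustration, assuming that loop sizes or stroke times have already been
measured somehow; the function names and the 25 percent tolerance are hypothetical choices.

```python
def intervals(timestamps):
    """Time between successive markings (the temporal rhythm)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def rhythm_changes(values, tolerance=0.25):
    """Indices where a value departs from the one before it.

    A change is flagged when the difference exceeds `tolerance` as a
    fraction of the previous value. The values can be loop sizes
    (spatial rhythm) or intervals between strokes (temporal rhythm).
    """
    changes = []
    for index in range(1, len(values)):
        previous, current = values[index - 1], values[index]
        if abs(current - previous) > tolerance * abs(previous):
            changes.append(index)
    return changes


# Spatial: spiral loops of steady size that suddenly grow larger.
loop_sizes = [10, 10, 10, 14, 14]
print(rhythm_changes(loop_sizes))

# Temporal: strokes made once per second, then a longer pause.
stroke_times = [0, 1, 2, 3, 5]
print(rhythm_changes(intervals(stroke_times)))
```

In both made-up examples, only the point where the pattern breaks is flagged; the steady stretches before and after it are not.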
The Psychology of Videoconferencing (incomplete)
In a library, it has been said, differentiation in time is overcome
to some extent
in that one can experience thoughts of authors who lived long ago.
Through videoconferencing, differentiation in space can be overcome (also only to some extent).
While videoconferencing provides a higher degree of contact
than most other forms of literary or electronic communication,
it is not and never will be a substitute for face-to-face communication.
In face-to-face communication, there is always the possibility of physical contact,
be it positive or negative, and physical contact continues to be a highlight of human existence.
Participants in a videoconference can only simulate physical togetherness
and plan for actual togetherness to occur in the future.
Thus, through videoconferencing (like all mediated communication),
one plans regarding ultimate unions, the unions which actually occur in face-to-face experience.
Perhaps because it potentially offers so much,
there tends to be extreme disappointment when a videoconference does not work.
Especially in this early stage of the field,
participants in a videoconference need to always be emotionally prepared
to scale down and simplify the event, should some tracks not be working at the moment.
One must always keep in mind that good will is the most important factor
in enabling good and productive communication.