How does time affect our ability to perceive and reason with causality?
A randomized controlled trial conducted in 2001 tested whether prayer could improve patient outcomes, such as reducing how long patients were in the hospital.1 The double-blind study (neither doctors nor patients knew who was in each group) enrolled 3,393 adult hospital patients who had bloodstream infections, with approximately half assigned to the control group and half to the prayer intervention group. Of the measured outcomes, both length of hospital stay and fever were reduced in the intervention group, with the difference being statistically significant (p-values of 0.01 and 0.04).
Yet if this intervention is so effective, why aren’t all hospitals employing it? One reason is that the patients in this study were in the hospital from 1990 to 1996—meaning that the prayers for their recovery took place long after their hospital stays and outcomes occurred. In fact, the prayers were not only retroactive, but also remote, occurring at a different place and time and by people who had no contact with the patients.
A cause affecting something in the past is completely contrary to our understanding of causality, which usually hinges on causes preceding their effects (if not being nearby in time as well) and there being a plausible physical connection linking cause and effect. Yet the study was conducted according to the usual standards for randomized trials (such as double blinding) and the results were statistically significant by commonly used criteria. While the article elicited many letters to the editor of the journal about the philosophical and religious implications, issues of faith were not the point. Instead, the study challenges readers to ask whether they would accept results that conflict severely with their prior beliefs if they came from a study that conformed to their standards for methodological soundness and statistical significance.
Can you envision a study that would convince you that a cause can lead to something happening in the past? The point here is that even though the study seems sound, we are unlikely to believe the intervention is the cause, because it so violates our understanding of the timing of cause and effect. If your prior belief in a hypothesis is low enough, then there may be no experiment that can meaningfully change it.
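One way to make the role of prior belief concrete is Bayes' rule in odds form. The sketch below is only an illustration and is not taken from the study: the three prior values and the likelihood ratio of 20 are assumed numbers, chosen to show how the same evidence moves a moderate prior a great deal and a tiny prior hardly at all.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability using Bayes' rule in odds form.

    likelihood_ratio is P(evidence | hypothesis) / P(evidence | not hypothesis).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Assumed value: evidence that is 20 times more likely if the
# hypothesis is true than if it is false.
LIKELIHOOD_RATIO = 20.0

# Assumed priors: an open-minded reader, a skeptic, and someone who
# considers backward causation all but impossible.
for prior in (0.5, 0.01, 1e-9):
    print(f"prior={prior:g}  posterior={posterior(prior, LIKELIHOOD_RATIO):.3g}")
```

With a prior of one in a billion, the posterior stays vanishingly small even after fairly strong evidence, which is the sense in which a single well-run experiment may not meaningfully change such a belief.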
Time is often what enables us to distinguish cause from effect (an illness preceding weight loss tells us that weight loss couldn’t have caused the illness), lets us intervene effectively (some medications must be given shortly after an exposure), and allows us to predict future events (knowing when a stock price will rise is much more useful than just knowing it will rise at some undetermined future time). Yet time can also be misleading, as we may find correlations between unrelated time series with similar trends, we can fail to find causes when effects are delayed (such as between environmental exposures and health outcomes), and unrelated events may be erroneously linked when one often precedes the other (umbrella vendors setting up shop before it rains certainly do not cause the rain).
How is it that we can go from a correlation, such as between exercise and weight loss, to deducing that exercise causes weight loss and not the other way around? Correlation is a symmetric relation (the correlation between height and age is exactly the same as that for age and height), yet causal relationships are asymmetric (hot weather can cause a decrease in running speed without running causing changes in the weather). While we can rely on background knowledge, knowing that it’s implausible for the speed at which someone runs to affect the weather, one of the key pieces of information that lets us go from correlations to hypotheses for causality is time.
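The symmetry of correlation can be checked directly. In this minimal sketch the ages and heights are made-up values for five hypothetical children, and `pearson` is a small helper written for the illustration rather than a library function.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up data: ages in years and heights in centimeters.
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

r_age_height = pearson(ages, heights)
r_height_age = pearson(heights, ages)

# Swapping the arguments does not change the result.
print(r_age_height, r_height_age)
```

Nothing in the calculation distinguishes the first variable from the second, so the number alone cannot say which one is the cause.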
Hume dealt with the problem of asymmetry by stipulating that cause and effect cannot be simultaneous and that the cause must be the earlier event. Thus, if we observe a regular pattern of occurrence, it can only be that the earlier event is responsible for the later one.2 However, Hume’s philosophical work was mainly theoretical, and while it makes sense intuitively that our perception of causality depends on temporal priority, it does not mean that this is necessarily the case.
When you see one billiard ball moving toward another, striking it, and the second being launched forward, you rightly believe the first ball has caused the second to move. On the other hand, if there was a long delay before the second ball moved or the first ball actually stopped short of it, you might be less likely to believe that the movement was a result of the first ball. Is it the timing of events that leads to a perception of causality, or does this impression depend on spatial locality?
To understand this, we now pick back up with the psychologist Albert Michotte, whom we last saw in Chapter 2. In the 1940s, he conducted a set of experiments to disentangle how time and space affect people’s perception of causality.3 In a typical experiment, participants saw two shapes moving on a screen and were asked to describe what they saw. By varying different features of the motion, such as whether the shapes touched and whether one moved before the other, he tried to pinpoint what features contributed to participants having impressions of causality.
Michotte’s work is considered a seminal study in causal perception, though there has been controversy over his methods and documentation of results. In many cases it’s unclear how many participants were in each study, what their demographic characteristics were, exactly how responses were elicited, and how participants were recruited. Further, the exact responses and why they were interpreted as being causal or not are not available. According to Michotte, many of the participants were colleagues, collaborators, and students—making them a more expert pool than the population at large. While the work was an important starting point for future experiments, the results required further replication and follow-up studies.4
In Michotte’s experiments where two shapes both traveled across the screen, with neither starting to move before or touching the other (as in Figure 4-1a), participants tended not to describe the movement in causal terms.5 On the other hand, when one shape traveled toward the other, and the second moved after its contact with the first (as in Figure 4-1b), participants often said that the first circle was responsible for the movement of the second,6 using causal language such as pushes and launches. Even though the scenes simply depict shapes moving on a screen, with no real causal dependency between their trajectories, people still interpret and describe the motion causally.7 This phenomenon, where viewers describe the motion of the second shape as being caused by the first shape acting as a launcher, is called the launching effect. Notably, introducing a spatial gap between the shapes (as in Figure 4-1c) did not remove impressions of causality.8 That is, if the order of events remained the same, so that one circle moved toward another, stopped before touching it, and the second circle started moving right after the first stopped, participants still used causal language. From this experiment, it seems in some cases temporal priority might be a more important cue than spatial contiguity, but this may depend on features of the problem and the exact spatial distance.
While the original methodology cannot be replicated exactly from the published descriptions, other work has confirmed the launching effect. Its prevalence, though, has been lower than suggested by Michotte, with potentially only 64–87% of observers describing the motion as causal when they first see it.9
Now imagine one ball rolling toward another. The first ball stops once it touches the second, and, after a pause, the second ball starts to roll in the same direction the first had been moving. Did the first ball cause the second to move? Does it matter if the delay is 1 second or 10 seconds? Hume argued that contiguity in space and time are essential to inferring a relationship, but in practice we don’t always see every link in a causal chain. To examine the effect of a delay on impressions of causality, Michotte created scenes just like those we have seen with the two balls, with a pause between one shape ending its motion and the other beginning to move, as shown in Figure 4-1d. He found that despite the spatial contiguity (the shapes did touch), a delay in motion eradicated all impressions of causality.10
Aside from questions about the participants’ level of expertise (and how much they knew about both the experiments and Michotte’s hypotheses), one of the limitations of these experiments is that the participants are only describing the behavior of shapes on a screen, rather than attempting to discover a system’s properties by interacting with it. Think of this as the difference between watching someone push an elevator call button and seeing when the elevator arrives, and being able to push the button yourself at the exact intervals of your choosing. While Michotte’s work showed that people might describe scenes in causal terms under certain conditions, what happens in a physical system where participants can control when the cause happens?
Building on Michotte’s studies, Shanks, Pearson, and Dickinson (1989) conducted seminal work on how time modulates causal judgments, and unlike Michotte’s experiments, the system was an instrument with which the participants interacted. Here, pushing the space bar on a keyboard led to a flashing triangle being displayed on a computer screen, and participants had to determine to what extent pushing the space bar caused the triangle to appear.
With physical objects, we have good reason to suspect that one object doesn’t cause another to move if there’s a long delay between the objects coming in contact and movement beginning. In other cases, though, effects should not be expected to appear instantaneously. Exposure to a pathogen does not immediately cause disease, regulatory policies can take years to have measurable effects, and weight loss due to exercise is a gradual process. It seems problematic, then, that experiments seem to show that delays always reduce judgments of causality or lead to spurious inferences.
More recent research has found that while delays may make it more difficult to correctly judge causality, this could in part depend on what timing people expect to see. While a 10-minute delay between hitting a golf ball and it moving conflicts severely with our knowledge of physics, a 10-year delay between exposure to a carcinogen and developing cancer is not unexpected. The role of the delay length may depend in part on what we already know about the problem and how we expect things to work. In many of the psychological experiments mentioned so far, the setups are evocative of situations participants are familiar with and in which they expect an immediate effect. For example, Michotte’s moving circles stand in for balls (where one expects the second ball to move immediately upon being hit, and a delay would be unusual), and Shanks et al.’s work involved keyboards (where one expects a button press to yield a quick response). On the other hand, if participants were given scenarios such as evaluating whether smoking caused lung cancer given the smoking history for an individual and their lung cancer diagnosis, they might find that a person taking up smoking and being diagnosed with cancer a week later is highly implausible, as smoking likely takes much longer to cause cancer.
To investigate this, Buehner and May (2003) performed a study similar to that of Shanks et al., except they manipulated the participants’ expectations by giving them background knowledge that there could be a delay between pressing a key and the triangle lighting up. Comparisons between two groups of participants, where only one received information about the potential delay, showed that while delays always led to lower judgments of the efficacy of causes, the instruction reduced this effect. Further, the order of experiments (whether participants saw the delayed or contiguous effects first) significantly affected results. That is, if participants saw the delayed effect first, their causal ratings were much higher than if they experienced the contiguous condition first. This effect of experience lends support to the idea that it is not simply the order of events or the length of a delay that influences judgment, but how these interact with prior knowledge. Participants in Michotte’s experiments saw circles moving on a screen, but interpreted the shapes as if they were physical objects and thus brought their own expectations of how momentum is transferred.
While prior information limited how much delays reduced judgments of causality in Buehner and May’s study, this effect was puzzlingly still present even though participants knew that a delay was possible. One explanation for the results is that the experimental setup still involved pushing a button and an effect happening on the screen. It’s possible that strong prior expectations of how quickly computers process keyboard input could not be eliminated by the instructions, and that participants still used their prior experience of the timing of button presses and responses even when instructed otherwise.
In each scenario, while the delays no longer reduced causal judgment, participants still judged instantaneous effects as being caused even when that wasn’t supported by the information they were given about the problem. Part of the challenge is designing an experiment that ensures that participants have strong expectations for the delay length and that these are consistent with their prior knowledge of how things work. Later work used a tilted board where a marble would enter at the top and roll down out of sight to trigger a light switch at the bottom. The angle of the board could be varied, so that if it is nearly vertical, a long delay between the marble entering and light illuminating seems implausible, while if the board is nearly horizontal such a delay would be expected. This is similar to the fast and slow mechanisms used in the psychological experiments we looked at in Chapter 2. Using this setup, Buehner and McGregor (2006) demonstrated that, in some cases, an instantaneous effect may make a cause seem less likely. While most prior studies showed that delays make it harder to find causes, and at best can have no impact on inference, this study was able to show that in some cases a delay can actually facilitate finding causes (with a short delay on a board tilted at a low angle reducing judgments of causality). This is a key contribution, as it showed that delays do not always impede inferences or make causes seem less plausible. Instead, the important factor is how the observed timing relates to our expectations.
Note that in these experiments the only question was to what extent pressing the button caused a visual effect, or whether a marble caused a light to illuminate, rather than distinguishing between multiple possible candidate causes. In general, we need to not only evaluate how likely a particular event is to cause an outcome but also develop the hypotheses for what factors might be causes in the first place. If you contract food poisoning, for example, you’re not just assessing whether a single food was the cause of this poisoning, but evaluating all of the things you’ve eaten to determine the culprit. Time may be an important clue, as foods from last week are unlikely candidates, while more recently eaten foods provide more plausible explanations.
Some psychological studies have provided evidence of this type of thinking, showing that when the causal relationships are unknown, timing information may in fact override other clues, like how often the events co-occur. However, this can also lead to incorrect inferences. In the case of food poisoning, you might erroneously blame the most recent thing you ate based on timing alone, while ignoring other information, such as what foods or restaurants are most frequently associated with food poisoning. A study by Lagnado and Sloman (2006) found that even when participants were informed that there could be time delays that might make the order of observations unreliable, participants often drew incorrect conclusions about causal links. That is, they still relied on timing for identifying relationships, even when this information conflicted with how often the factors were observed together.
Now imagine you flip a switch. You’re not sure what the switch controls, so you flip it a number of times. Sometimes a light turns on immediately after, but other times there’s a delay. Sometimes the delay lasts 1 minute, but other times it lasts 5 minutes. Does the switch cause the light to turn on? This is a bit like what happens when you push the button at a crosswalk, where it does not seem to make the signal change any sooner. The reason it’s hard to determine whether there’s a causal relationship is that the delay between pushing the button and the light changing varies so much. Experiments that varied the consistency of a delay showed that constant lags between a cause and effect (e.g., a triangle always appears on the screen exactly 4 seconds after pressing a button versus varied delays between 2 and 6 seconds) led to higher causal ratings, and that as the variability in delays increased, causal ratings decreased.12 Intuitively, if the delay stays within a small range around the average, it seems plausible that slight variations in other factors or even delays in observation could explain this. On the other hand, when there’s huge variability in timing, such as side effects from a medication occurring anywhere from a day up to 10 years after the medication is taken, then there is more plausibly some other factor that determines the timing (hastening or delaying the effect), more than one mechanism by which the cause can yield the effect, or a confounded relationship.
Say a friend tells you that a new medication has helped her allergies. If she says that the medication caused her to stop sneezing, what assumptions do you make about the order of starting the medication and no longer sneezing? Based on the suggested relationship, you probably assume that taking the medication preceded the symptoms stopping. In fact, while timing helps us find causes, the close link between time and causality also leads us to infer timing information from causal relationships. Some research has found that knowledge of causes can influence how we perceive the duration of time between two events,13 and even the order of events.14
One challenge is that the two events might seem to be simultaneous only because of the granularity of measurements or limitations on our observation ability. For example, microarray experiments measure the activities of thousands of genes at a time, and measurements of these activity levels are usually made at regular intervals, such as every hour. Two genes may seem to have the exact same patterns of activity—simultaneously being over- or underexpressed—when looking at the data, even if the true story is that one being upregulated causes the other to be upregulated shortly after. Yet if we can’t see the ordering and we don’t have any background knowledge that says one must have acted before the other, all we can say is that their expression levels are correlated, not that one is responsible for regulating the other.
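The effect of measurement granularity can be sketched in a few lines. The event times below are hypothetical (one gene changing 15 minutes after another), and `first_observed_sample` is a helper invented for this illustration; it assumes a change becomes visible at the first measurement taken at or after the moment it happens.

```python
# Hypothetical event times, in minutes since the start of the experiment.
GENE_A_CHANGES_AT = 125  # the cause
GENE_B_CHANGES_AT = 140  # the effect, 15 minutes later


def first_observed_sample(event_time: int, sampling_interval: int) -> int:
    """Index of the first measurement taken at or after the event.

    Measurements are assumed to occur at 0, sampling_interval,
    2 * sampling_interval, and so on.
    """
    return -(-event_time // sampling_interval)  # ceiling division


for interval in (60, 5):
    a = first_observed_sample(GENE_A_CHANGES_AT, interval)
    b = first_observed_sample(GENE_B_CHANGES_AT, interval)
    if a == b:
        verdict = "appear simultaneous"
    else:
        verdict = "A is seen first"
    print(f"sampling every {interval} min: A at sample {a}, B at sample {b} ({verdict})")
```

With hourly measurements both changes first show up in the same sample, so the ordering is lost; with measurements every 5 minutes the cause is seen before the effect.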
Similarly, medical records do not contain data on every patient every day, but rather form a series of irregularly spaced timepoints (being only the times when people seek medical care). Thus we might see that, as of a particular date, a patient both is on a medication and has a seeming side effect, but we know only that these were both present, not whether the medication came first and is a potential cause of the side effect. In long-term cohort studies, individuals may be interviewed on only a yearly basis, so if an environmental exposure or other factor has an effect at a shorter timescale, that ordering cannot be captured by this (assuming the events can even be recalled in an unbiased way). In many cases either event could plausibly come first, and their co-occurrence doesn’t necessitate a particular causal direction.
The most challenging case is when there is no timing information at all, such as in a cross-sectional survey where data is collected at a single time. One example is surveying a random subset of a population to determine whether there is an association between cancer and a particular virus. Without knowing which came first, one couldn’t know which causes which if they appear correlated (does the virus cause cancer or does cancer make people more susceptible to viruses?), or if there is any causality at all. Further, if a causal direction is assumed based on some prior belief about which came first, rather than which actually did come first, we might be misled into believing a causal relationship when we can find only correlations. For example, many studies have tried to determine whether phenomena such as obesity and divorce can spread through social networks due to influence from social ties (i.e., contagion). Without timing information, there’s no way to say which direction is more plausible.15
While some philosophers, such as Hans Reichenbach, have tried to define causality in probabilistic terms without using timing information (instead trying to get the direction of time from the direction of causality),16 and there are computational methods that in special cases can identify causal relationships without temporal information,17 most approaches assume the cause is before the effect and use this information when it is available.
One of the only examples of a cause and effect that seem to be truly simultaneous, so that no matter what timescale we measured at, we couldn’t find which event was first, comes from physics. In what’s called the Einstein–Podolsky–Rosen (EPR) paradox, two particles are entangled so that if the momentum or position of one changes, these features of the other particle change to match.18 What makes this seemingly paradoxical is that the particles are separated in space and yet the change happens instantaneously—necessitating causality without spatial contiguity or temporal priority (the two features we’ve taken as key). Einstein called nonlocal causality “spooky action at a distance,”19 as causation across space would require information traveling faster than the speed of light, in violation of classical physics.20 Note, though, that there’s a lot of controversy around this point among both physicists and philosophers.21
One proposal to deal with the challenge of the EPR paradox is with backward causation (sometimes called retrocausality). That is, allowing that causes can affect events in the past, rather than just the future. If when the particle changed state it sent a signal to the other entangled particle at a past timepoint to change its state as well, then the state change would not require information transfer to be faster than the speed of light (though it enables a sort of quantum time travel).22 In this book, though, we’ll take it as a given that time flows in one direction and that, even if we might not observe the events as sequential, a cause must be earlier than its effect.
Does a decrease in the pirate population cause an increase in global temperature? Does eating mozzarella cheese cause people to study computer science?23 Do lemon imports cause highway fatalities to fall?
Figure 4-2a depicts the relationship between lemon imports and highway fatalities, showing that as more lemons are imported, highway deaths decrease.24 Though these data have a Pearson correlation of –0.98, meaning they are almost perfectly negatively correlated, no one has yet proposed increasing lemon imports to stop traffic deaths.
Now look at what happens in Figure 4-2b when we plot both imports and deaths as a function of time. It turns out that imports are steadily decreasing over time, while deaths are increasing over the same period. The data in Figure 4-2a are actually also a time series, in reverse chronological order. Yet we could replace lemon imports with any other time series that is declining over time—Internet Explorer market share, arctic sea ice volume, smoking prevalence in the US—and find the exact same relationship.
The reason is that these time series are nonstationary, meaning that properties such as their average values change with time. For example, variance could change so that the average lemon import is stable, but the year-to-year swings are not. Electricity demand over time would be nonstationary on two counts, as overall demand is likely increasing over time, and there is seasonality in the demand. On the other hand, the outcome of a long series of coin flips is stationary, since the probability of heads or tails is exactly the same at every timepoint.
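A crude way to see the difference is to compare the average of the first half of a series with the average of the second half. This is only a rough illustration, not a formal test of stationarity, and both series are simulated: the coin flips use a fixed random seed, and the "demand" values are an invented series with a steady upward trend.

```python
import random


def half_means(series):
    """Return the mean of the first half and the mean of the second half."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return sum(first) / len(first), sum(second) / len(second)


random.seed(1)  # fixed seed so the illustration is repeatable

# Stationary: fair coin flips (1 = heads), same probability at every step.
coin_flips = [random.randint(0, 1) for _ in range(10_000)]

# Nonstationary: an invented demand series that rises steadily over time.
demand = [100 + 0.5 * t for t in range(10_000)]

coin_first, coin_second = half_means(coin_flips)
demand_first, demand_second = half_means(demand)

print(f"coin flips: {coin_first:.3f} vs {coin_second:.3f}")
print(f"demand:     {demand_first:.1f} vs {demand_second:.1f}")
```

The two halves of the coin-flip series have nearly the same average, while the halves of the trending series differ enormously.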
Having a similar (or exactly opposite) trend over time may make some time series correlated, but it does not mean that one causes another. Instead, it is yet another way that we can find a correlation without any corresponding causation. Thus, if stocks in a group are all increasing in price over a particular time period, we might find correlations between all of their prices even if their day-to-day trends are quite different. In another example, shown in Figure 4-3, autism diagnoses seem to grow at a similar rate as the number of Starbucks stores,25 as both happen to grow exponentially—but so do many other time series (such as GDP, number of web pages, and number of scientific articles). A causal relationship here is clearly implausible, but that’s not always obviously the case, and a compelling story can be made to explain many correlated time series. If I’d instead chosen, say, percent of households with high-speed Internet, there wouldn’t be any more evidence of a link than that both happen to be increasing, but some might try to develop an explanation for how the two could be related. Yet this is still only a correlation, and one that may disappear entirely if we look at a different level of temporal granularity or adjust for the fact that the data are nonstationary.
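To see how a shared trend alone can produce a strong correlation, the sketch below simulates two series that have nothing in common except that both rise over time. The slopes, noise levels, and random seed are arbitrary assumptions, and `pearson` is a helper written for the illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


random.seed(0)  # fixed seed so the illustration is repeatable
N_YEARS = 50

# Each series is an upward trend plus its own independent noise.
series_a = [1.0 * t + random.gauss(0, 2) for t in range(N_YEARS)]
series_b = [3.0 * t + random.gauss(0, 6) for t in range(N_YEARS)]

r_levels = pearson(series_a, series_b)
print(f"correlation of the raw series: {r_levels:.2f}")
```

The year-to-year noise in the two series is generated independently, yet the correlation of the raw values is very high because the trend dominates the calculation.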
Another type of nonstationarity is when the population being sampled changes over time. In 2013, the American Heart Association (AHA) and American College of Cardiology (ACC) released new guidelines for treating cholesterol along with an online calculator for determining 10-year risk of heart attack or stroke.26 Yet some researchers found that the calculator was overestimating risk by 75–150%, which could lead to significant overtreatment, as guidelines for medication are based on each patient’s risk level.27
The calculator takes into account risk factors like diabetes, hypertension, and current smoking, but it does not and cannot ask about all possible factors that would affect risk level, such as details about past smoking history. The coefficients in the equations (how much each factor contributes) were estimated from data collected in the 1990s, so the implicit assumption is that the other population features will be the same in the current population. However, smoking habits and other important lifestyle factors have changed over time. Cook and Ridker (2014) estimate that 33% of the population (among whites) smoked at the beginning of one longitudinal study, compared to less than 20% of the same population today,28 leading to a different baseline level of risk and potentially resulting in the overestimation.29
We often talk about external validity, which is whether a finding can be extrapolated outside of a study population (we’ll look at this in much more depth in Chapter 7), but another type of validity is across time. External validity refers to how what we learn in one place tells us about what will happen in another. For example, do the results of a randomized controlled trial in Europe tell us whether a medication will be effective in the United States? Over time there may also be changes in causal relationships (new regulations will change what affects stock prices), or their strength (if most people read the news online, print ads will have less of an impact). Similarly, an advertiser might figure out how a social network can influence purchases, but if the way people use the social network changes over time, that relationship will no longer hold (e.g., going from links only to close friends to many acquaintances). When using causal relationships one is implicitly assuming that the things that make the relationships work are stable across time.
A similar scenario could occur if we looked at, say, readmission rates in a hospital over time. Perhaps readmissions increased over time, starting after a new policy went into effect or after there was a change in leadership. Yet it may be that the population served by the hospital has also changed over time and is now a sicker population to begin with. In fact, the policy itself may have changed the population. We’ll look at this much more in Chapter 9, as we often try to learn about causal relationships to make policies while the policies themselves may change the population. As a result, the original causal relationships may no longer hold, making the intervention ineffective. One example we’ll look at is the class size reduction program in California schools, where a sudden surge in demand for teachers led to a less experienced population of instructors.
New causal relationships may also arise, such as the introduction of a novel carcinogen. Further, the meaning of variables may change. For example, language is constantly evolving, with both new words emerging and existing words being used in new ways (e.g., bad being used to mean good). If we find a relationship involving content of political speeches and favorability ratings, and the meaning of the words found to cause increases in approval changes, then the relationship will no longer hold. As a result, predictions of increased ratings will fail and actions such as crafting new speeches may be ineffective. On a shorter timescale, this can be true when there are, say, daily variations that aren’t taken into account.
There are a few strategies for dealing with nonstationary time series. One can of course just ignore the nonstationarity, but better approaches include using a shorter time period (if a subset of the series is stationary) when there’s enough data to do so, or transforming the time series into one that is stationary.
A commonly used example of nonstationarity, introduced by Elliott Sober,30 is the relationship between Venetian sea levels and British bread prices, which seem correlated as both increase over time. Indeed, using the data Sober makes up for the example, shown in Figure 4-4a (note that units for the variables are not given), the Pearson correlation for the variables is 0.8204. While the two time series are always increasing, the exact amount of the increases each year varies and what we really want to understand is how those changes are related. The simplest approach is then to look at the differences, rather than the raw values. That is, how much did sea levels or bread prices increase relative to the previous year’s measurement? Using the change from year to year, as shown in Figure 4-4b, the correlation drops to 0.4714.
This approach, called differencing (literally, taking the difference between consecutive data points), is the simplest way of making a time series stationary. Even if two time series have the same long-term trends (such as a consistent increase), the differenced data may no longer be correlated if the daily or yearly fluctuations differ. In general, just differencing does not guarantee that the transformed time series will be stationary, and more complex transformations of the data may be needed.31
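Differencing is simple enough to sketch directly. The yearly values below are made up for this illustration and are not Sober's data: both series rise every year, but in the years when one rises by 2 the other rises by only 1. The helpers `pearson` and `difference` are likewise written for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def difference(series):
    """Return the change between each pair of consecutive values."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up yearly values with no particular units.
sea_level = [1, 3, 4, 6, 7, 9, 10, 12]
bread_price = [2, 3, 5, 6, 8, 9, 11, 12]

r_levels = pearson(sea_level, bread_price)
r_changes = pearson(difference(sea_level), difference(bread_price))

print(f"raw values:     {r_levels:.3f}")
print(f"yearly changes: {r_changes:.3f}")
```

In this deliberately extreme example the raw values are almost perfectly positively correlated, while the yearly changes are perfectly negatively correlated, so the apparent relationship comes entirely from the shared upward trend.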
This is one reason work with stock market data often uses returns (the change in price) rather than actual price data. Note that this is exactly what went wrong with the lemons and highway deaths, and why we could find similar relationships for many pairs of time series. If the overall trends are similar and significant, then they contribute the most to the correlation measure, overwhelming any differences in the shorter-term swings, which may be totally uncorrelated.32
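The effect of differencing on a correlation can be sketched in a few lines of Python. This is only an illustration: the two series below are invented for the sketch (they are not Sober's data), and the names `sea_level`, `bread_price`, `pearson`, and `difference` are hypothetical. Both series rise at every step, but their step-to-step increases were chosen to be unrelated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def difference(series):
    """First difference: the change from each point to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Invented data for illustration only (no units, not from any real source).
sea_level = [10, 11, 14, 15, 18, 19, 22]
bread_price = [5, 6, 7, 10, 13, 14, 15]

# Correlation of the raw values is dominated by the shared upward trend.
raw_r = pearson(sea_level, bread_price)

# Correlation of the year-to-year changes reflects only the fluctuations.
diff_r = pearson(difference(sea_level), difference(bread_price))

print(f"raw: {raw_r:.2f}, differenced: {diff_r:.2f}")
```

With these made-up numbers the raw correlation is about 0.98, while the correlation of the differences is essentially zero, because the shared trend is the only thing the two series have in common.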
Is there an optimal day of the week to book a flight? Should you exercise in the morning or at night? How long should you wait before asking for a raise? Economists often talk about seasonal effects, which are patterns that recur at the same time each year and are a form of nonstationarity, but temporal trends are found in many other types of time series, such as movie attendance (which is affected by seasonality and holidays) and emergency room visits (which may spike with seasonal illnesses). This means that if we find factors that drive movie ticket sales in the winter, these factors may not be applicable if we try to use them to increase sales in the summer. Other patterns may be due to day of the week (e.g., due to commuting habits) or the schedule of public holidays.
While the order of events may help us learn about causes (if we observe an illness preceding weight loss, we know that weight loss couldn’t have caused the illness) and form better predictions (knowing when to expect an effect), using causes effectively requires more information than just knowing which event came first. We need to know first if a relationship is only true at some times, and second what the delay is between cause and effect.
This is why it’s crucial to collect and report data on timing. Rapid treatment can improve outcomes in many diseases, such as stroke, but efficacy doesn’t always decrease linearly over time. For instance, it has been reported that if treatment for Kawasaki disease is started within 10 days after symptoms start, patients have a significantly lower risk of future coronary artery damage. Treatment before day 7 is even better, but treatment before day 5 does not further improve outcomes.33 In other cases, whether a medication is taken in the morning or at night may alter its efficacy. Thus if a medication is taken at a particular time or just each day at the same time during a trial, but in real use outside the trial the timing of doses varies considerably, then it may not seem to work as well as clinical trials predicted.
Determining when to act also requires knowing how long a cause takes to produce an effect. This could mean determining when before an election to run particular advertisements, when to sell a stock after receiving a piece of information, or when to start taking antimalarial pills before a trip. In some cases, actions may be ineffective if they don’t account for timing, such as showing an ad too early (when other, later, causes can intervene), making trading decisions before a stock’s price has peaked, or not starting prophylactic medication early enough for it to be protective.
Similarly, timing may also affect our decision of whether to act at all, as it affects our judgments of both the utility of a cause and its potential risks. The utility of a cause depends on both the likelihood that the effect will occur (all else being equal, a cause with a 90% chance of success would be preferable to one with only a 10% chance), and how long it will take. Smoking, for example, is known to cause lung cancer and cardiovascular disease, but these don’t develop immediately after taking up smoking. Knowing the likelihood of cancer alone is not enough to make informed decisions about the risk of smoking unless you also know the timing. It’s possible that, for some people, a small chance of illness in the near future may seem more risky than a near certainty of disease in the far future.
However, when deciding on an intervention we’re usually not just making a decision about whether to use some particular cause to achieve an outcome, but rather choosing between potential interventions. On an episode of Seinfeld, Jerry discusses the multitude of cold medication options, musing, “This is quick-acting, but this is long-lasting. When do I need to feel good, now or later?”34 While this information adds complexity to the decision-making process, it enables better planning based on other constraints (e.g., an important meeting in an hour versus a long day of classes).
Time is one of the key features that lets us distinguish causes from correlations, as we assume that when there is a correlation, the factor that came first is the only potential cause. Yet because the sequence of events is so crucial, it may be given too much credence when trying to establish causality.
Say a school cafeteria decides to reduce its offerings of fried and high-calorie foods, and increase the amount of fruits, vegetables, and whole grains that are served. Every month after this, the weight of the school’s students decreases. Figure 4-5 shows a made-up example with the median weight (the value such that half of the students weigh less and half weigh more) of the students over time. There’s a sudden drop after the menu change, and this drop is sustained for months after. Does that mean the healthy new offerings caused a decrease in weight?
This type of figure, where there’s a clear change in a variable’s value after some event, is often used to make such a point, but it simply cannot support that reasoning. Common examples of this are when proponents of a particular law point to a drop in mortality rates after its introduction, or an individual thinks a medication led to a side effect because it started a few days after they began taking the drug.
In the cafeteria case, we have no idea whether the students are the same (perhaps students who liked healthier foods transferred in and students who hated the new menu transferred out), whether students or parents asked for the new menu because they were already trying to lose weight, or whether there was another change at the same time that was responsible for the effect (perhaps sporting activities and recess time increased simultaneously). It’s rarely, if ever, just one thing changing while the rest of the world stays completely the same, so presenting a time series with just two variables mistakenly gives the impression of isolating the effect of the new factor. It is still only a correlation, albeit a temporal one.
Interventions in the real world are much more complicated and less conclusive than laboratory experiments. Say there’s a suspected cluster of cancer cases in an area with an industrial plant. Eventually the plant is closed, and steps are taken to reverse the contamination of water and soil. If the cancer rate decreases after the plant closure, can we conclude that it was responsible for the disease? We actually have no idea whether the decrease was a coincidence (or if the initial increase itself was simply a coincidence), whether something else changed at the same time and was truly responsible, and so on. Additionally, the numbers are often quite small, so any variations are not statistically significant.
This is a well-known logical fallacy referred to as “post hoc ergo propter hoc,” which means “after this, therefore because of this.” That is, one erroneously concludes that one event is caused by another simply because it follows the first one. For example, one may examine how some rate changed after a particular historical event: did the rate of car accident deaths decrease after the introduction of seat belt laws? However, many changes are happening at the same time, and the system itself may even change as a result of the intervention. We’ll talk about that challenge in depth in Chapter 7, but perhaps healthier cafeteria food only indirectly leads to weight loss by prompting people to exercise more. Similarly, temporal patterns such as a sports team winning every time it rains before a game may lead one to think there’s a causal relationship, even though these events are most plausibly explained as coincidental. This problem often arises if we focus on a short timespan, ignoring long-term variations. Two extremely snowy winters in a row, taken in isolation, may lead to erroneous conclusions about winter weather patterns. By instead looking at decades of data, we can understand yearly fluctuations in the context of the overall trend. Finally, two events may co-occur only because other factors make them likely to occur at the same time. For instance, if children are introduced to new foods around the same age that symptoms of a particular illness become apparent, many may report a seeming link between the two because they always happen at around the same time.
A related fallacy is “cum hoc ergo propter hoc” (with this, therefore because of this), which is finding a causal link between events that merely occur together. The difference is that post hoc involves a temporal ordering of events, which is why that error is especially common.
As always, there could be a common cause of the first event and effect (e.g., do depression medications make people suicidal, or are depressed people more likely to be suicidal and take medication?), but the effect also may have happened anyway, and is merely preceded by the cause. For example, say I have a headache and take some medication. A few hours later, my headache is gone. Can I say that it was due to the medication? The timing makes it seem like the headache relief was a result of the medication, but I cannot say for sure whether it would have happened anyway in the absence of medication. I would need many trials of randomly choosing to take or not take medication and recording how quickly a headache abated to say anything at all about this relationship. In Chapter 7 we’ll see why this is still a weak experiment, and we should be comparing the medication to a placebo.
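The repeated, randomized version of the headache experiment can be sketched as a small simulation. Everything here is an assumption made for illustration: headaches are taken to resolve on their own after 2 to 6 hours, the medication shortens that by a fixed `true_effect_hours`, and the function names are hypothetical. The sketch also ignores the placebo comparison, so it models only the weaker experiment described above.

```python
import random


def mean_difference(records):
    """Mean headache duration with medication minus mean duration without."""
    treated = [hours for took, hours in records if took]
    control = [hours for took, hours in records if not took]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate_trials(n_trials, true_effect_hours, seed=0):
    """Simulate headaches where medication is assigned by a coin flip.

    Assumed model: a headache lasts 2 to 6 hours untreated, and the
    medication shortens it by true_effect_hours (never below zero).
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_trials):
        took = rng.random() < 0.5          # random assignment
        hours = rng.uniform(2.0, 6.0)      # duration without medication
        if took:
            hours = max(0.0, hours - true_effect_hours)
        records.append((took, hours))
    return records


# A medication that does nothing: every treated headache still ends
# "a few hours later," yet the average difference is close to zero.
no_effect = mean_difference(simulate_trials(10_000, true_effect_hours=0.0))

# A medication that really shortens headaches by one hour.
real_effect = mean_difference(simulate_trials(10_000, true_effect_hours=1.0))

print(f"no effect: {no_effect:+.2f} h, real effect: {real_effect:+.2f} h")
```

In any single simulated trial the headache ends a few hours after the pill either way. Only the comparison across many randomly assigned trials separates the useless medication from the useful one.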
Just as events being nearby in time can lead to erroneous conclusions of causality, lengthy delays between cause and effect may lead to failure to infer a causal link. While some effects happen quickly—striking a billiard ball makes it move—others are brought about by slow-acting processes. Smoking is known to cause lung cancer, but there’s a long delay between when someone starts smoking and when they get cancer. Some medications lead to side effects decades after they’re taken. Changes in fitness due to exercise build slowly over time, and if we are looking at weight, this may seem to initially increase if muscle builds before fat is lost. If we expect an effect to closely follow its cause, then we may fail to draw connections between these genuinely related factors. While it is logistically difficult for scientists to collect data over a period of decades to learn about factors affecting health, this is also part of the difficulty for individuals correlating factors such as diet and physical activity with their health.
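One way to see how a delay can hide a relationship is to compare correlations at different lags. The sketch below uses an invented binary series in which the effect simply repeats the cause three steps later. The data, the three-step delay, and the names `pearson` and `lagged_correlation` are all assumptions for illustration, and a high lagged correlation is still only a correlation, not evidence of causality.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(cause, effect, lag):
    """Correlate cause[t] with effect[t + lag] over the overlapping span."""
    if lag == 0:
        return pearson(cause, effect)
    return pearson(cause[:-lag], effect[lag:])


# Invented data: 1 marks a time step at which the event occurred.
cause = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
delay = 3
# The effect repeats the cause three steps later (nothing happens before that).
effect = [0] * delay + cause[:-delay]

same_time = lagged_correlation(cause, effect, lag=0)
with_delay = lagged_correlation(cause, effect, lag=delay)

print(f"lag 0: {same_time:.2f}, lag {delay}: {with_delay:.2f}")
```

Looking only at what happens at the same time gives a weak, slightly negative correlation here, while allowing for the three-step delay recovers a perfect one. An observer who expects effects to follow immediately would miss the connection.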