epicmodel

library(epicmodel)
#> Loading required package: magrittr

In this vignette, we describe the general workflow for sufficient-component cause (SCC) model creation with epicmodel. But first of all, if you haven’t done so, read Rothman’s paper (Rothman 1976)!

Task 1: Create steplist

The steplist is the input from which the SCC model is created. The steplist contains many steps that together form the mechanisms of outcome occurrence. These steps are basically IF/THEN-statements and can be chained together to form mechanisms, e.g., IF cell A produces cytokine X THEN cell B produces cytokine Y => IF cell B produces cytokine Y THEN cell C produces cytokine Z. To learn why SCC models are based on these steps and how their exact structure looks like, see vignette("steplist").

For the purpose of this introduction, it is enough to know that steps need a very specific structure in order to enable automatic SCC model creation. epicmodel therefore provides the Steplist Creator Shiny App, which helps creating a steplist that has the necessary structure. You can launch the shiny app with:

launch_steplist_creator()

Throughout this introduction, we will use the built in steplist_rain.

steplist <- steplist_rain

steplist_rain is a dummy example and describes weather-induced ways of getting wet. Let’s first inspect the steplist using print().

print(steplist)
#> ✖ unchecked (please run `check_steplist()` before continuing)
#> WHAT:  7  WHAT segments
#> DOES:  6  DOES segments
#> WHERE:  3  WHERE segments
#> MODULE:  3  modules
#> STEP:  10  STEPs
#> ICC:  0  incompatible component-cause pairs
#> OUTCOME:  1  outcome definition

Print tells us what the steplist contains. Again, see vignette("steplist") to find out what all these different elements are.

You can see two important things. First, the steplist contains only 10 steps, which shows that it’s probably not a realistic example. Second, the first line tells you that the steplist is “unchecked” and asks you to run check_steplist(). check_steplist() ensures that the steplist’s structure is fine for SCC model creation.

steplist_checked <- check_steplist(steplist)
#> 
#> ── Checking epicmodel_steplist steplist ────────────────────────────────────────
#> ✔ Checking WHAT IDs was successful.
#> ✔ Checking DOES IDs was successful.
#> ✔ Checking WHERE IDs was successful.
#> ✔ Checking Module IDs was successful.
#> ✔ Checking ICC IDs was successful.
#> ✔ Checking WHAT keywords was successful.
#> ✔ Checking DOES keywords was successful.
#> ✔ Checking WHERE keywords was successful.
#> ✔ Checking Module keywords was successful.
#> ✔ Checking Modules was successful.
#> ✔ Checking ICC entries was successful.
#> ✔ Checking WHAT segments was successful.
#> ✔ Checking DOES segments was successful.
#> ✔ Checking WHERE segments was successful.
#> ✔ Checking references was successful.
#> ✔ Checking start/end steps was successful.
#> ✔ Checking THEN statements was successful.
#> ✔ Checking THEN/IF/IFNOT equality was successful.
#> ✔ Checking outcome definitions was successful.
#> ── Summary ─────────────────────────────────────────────────────────────────────
#> ✔ Checking successful!

You can see from the output that check_steplist() conducts many checks. See the function documentation for a detailed description. In our case, all checks have been successful and therefore our steplist is now “checked”. When running print() again, you will see a confirmation.

print(steplist_checked)
#> ✔ checked successfully
#> WHAT:  7  WHAT segments
#> DOES:  6  DOES segments
#> WHERE:  3  WHERE segments
#> MODULE:  3  modules
#> STEP:  10  STEPs
#> ICC:  0  incompatible component-cause pairs
#> OUTCOME:  1  outcome definition

With a checked steplist, you can also use summary() and plot(). Let’s start with summary():

summary(steplist_checked)
#> 
#> ── Outcome Definitions ──
#> 
#> • you get wet
#> 
#> ── Component causes ──
#> 
#> • no vacation
#> • weekday
#> • rain
#> • get groceries
#> 
#> ── Interventions ──
#> 
#> • take vacation
#> • take umbrella
#> • work from home
#> 
#> ── End steps ──
#> 
#> • you get wet
#> 
#> ── Other steps ──
#> 
#> • walk to work
#> • go outside

The output contains the outcome definition as well as all the steps split into different types: Component causes, Interventions, End steps, and Other steps. vignette("steplist") will tell you more about these different types of steps. Now, let’s try plot().

plot(steplist_checked)

#>    Label   Module           Step
#> 1    CC1 activity    no vacation
#> 2    CC2     fate        weekday
#> 3    CC3  weather           rain
#> 4    CC4 activity  get groceries
#> 5     I1 activity  take vacation
#> 6     I2 activity  take umbrella
#> 7     I3 activity work from home
#> 8     S1 activity   walk to work
#> 9     S2 activity     go outside
#> 10    E1  weather    you get wet

plot() shows you how your steps are chained together. The graph is created using the DiagrammeR package. The legend shown below the graph links the node labels to your steps and tells you to which module a step belongs. Steps that belong together can be grouped into modules when creating the steplist. The node colors in the graph, e.g., depend on the step’s module. See vignette("modules") to learn more. Later, we will see similar mechanisms for every sufficient cause.

Task 2: Create SCC model from steplist

But first, we need to actually create the SCC model. This is done by function create_scc(). See the function documentation to learn more about the applied algorithm. It only has a single input: the checked steplist.

scc_model <- create_scc(steplist_checked)
#> 
#> ── Create SCC Model ──
#> 
#> ✔ 15/15 | Check if set of component causes is sufficient
#> ✔ 5/5 | Check if sufficiency dependends on IFNOT conditions
#> ✔ 5/5 | Check if sufficient cause is minimal
#> ℹ 2/5 sufficient causes are minimal

As our steplist is very small, SCC model creation is fast. However, with realistic steplists, create_scc() can take some time because it evaluates every possible combination of component causes. From the output we see that 15 sets of component causes have been checked. When running summary(steplist_checked) before, we saw that there are 4 component causes. Because every one of them can be present or absent, we have 2 ^ 4 = 16 combinations. Of course, there’s no need to evaluate the set where all component causes are absent, which leaves us with 15 combinations. We can further see that 5 of these 15 combinations were sufficient, i.e., fulfilled the outcome definition. We ignore the “Check if sufficiency depends on IFNOT conditions” for this introduction, but you can learn more from vignette("scc"). Finally, we see that 2 out of the 5 sufficient causes are minimal. For a set of component causes to form a sufficient cause, it needs to be minimally sufficient, i.e., if you would remove any of its component causes, the remaining ones would not be sufficient anymore. In our example, you will see shortly that component causes “rain” and “get groceries” are sufficient to “get wet”. If it would not be raining or if you would not go outside to get groceries, you would not get wet. Therefore, it is minimally sufficient. If you add another component cause, e.g., it’s a “weekday”, to the set, it would of course still lead to the outcome but it would not be minimally sufficient anymore. Now, let’s inspect our SCC model using print() or summary(), which are identical for SCC models.

print(scc_model)
#> 
#> ── Outcome Definitions ──
#> 
#> • you get wet
#> 
#> ── SC 1 ──
#> 
#> ✔ Always sufficient
#> Component causes:
#> • rain
#> • get groceries
#> 
#> Modules
#> • activity: 50% (2/4)
#> • weather: 50% (2/4)
#> 
#> ── SC 2 ──
#> 
#> ✔ Always sufficient
#> Component causes:
#> • no vacation
#> • weekday
#> • rain
#> 
#> Modules
#> • activity: 50% (3/6)
#> • weather: 33% (2/6)
#> • fate: 17% (1/6)
#> 

In the output, we are first reminded of our outcome definition before the sufficient causes (SC) are listed. As we saw before, there are two of them: SC 1 & SC 2. Both sufficient causes are reported to be “Always sufficient”. This refers to their sufficiency status. Due to the structure of steplists, it is possible that sets of component causes are sufficient, i.e., lead to the outcome, only under some specific circumstances. To learn more, see vignette("scc"). This is, however, not the case here. Based on our steplist, both sets of component causes, i.e. “rain” & “get groceries”, as well as “no vacation” & “weekday” & “rain” are always sufficient to cause outcome “you get wet”. Modules also make an appearance here. Again, see vignette("modules") to learn more.

Task 3: Use SCC model

Learning about sufficient causes is interesting, but we can do a lot more with our SCC model.

Causal pies

First, of course, we want to see some causal pies, for which we can simple use plot(). You’ll notice that we additionally specified unknown = FALSE. This argument controls if unknown causes are included in the plot. To learn more about unknown causes, see vignette("scc").

plot(scc_model, unknown = FALSE)

Standardized effect size

In epidemiology, we usually calculate risk ratios or odds ratios to estimate the strength of the effect of the exposure on the outcome. We learn from SCC models, however, that these effect sizes are no natural constants but that their value depends on the population under study. epicmodel offers a function to derive a standardized effect estimate from the SCC model. Let’s look at the output to explore how it works.

effect_size(scc_model)
#> ✔ 4/4 | Check impact of every component cause
#>                        Component Cause                  Impact
#> 1                                 rain necessary [5/8 vs. 0/8]
#> 2                        get groceries      4.00 [4/8 vs. 1/8]
#> 3 IFNOT take vacation THEN no vacation      1.50 [3/8 vs. 2/8]
#> 4                              weekday      1.50 [3/8 vs. 2/8]

From the output, we see that epicmodel calculates a value for each component cause. To do this, it lists all possible sets of component causes (there are 16 of them as we already calculated), splits them into the half where the component cause of interest is present and the half where the component cause of interest is absent, and records the sets that would cause the outcome for each half. Let’s look, e.g., at the square brackets [] for “get groceries”: The first part records sets with “get groceries” present and 4 of the 8 lead to the outcome. The second part records sets without “get groceries” and only 1 of the 8 leads to the outcome. The number in front is simply the ratio of these two fractions. You can see that the standardized effect size for “get groceries” with 4.00 is higher than for “no vacation” or “weekday” with 1.50, which makes sense because “get groceries” only needs “rain” to be sufficient, while “no vacation” and “weekday” need “rain” as well as each other. “rain” is marked as a necessary cause because none of the 8 sets of component causes that do not include “rain” lead to outcome occurrence.

Mechanisms

When printing our SCC model, we saw that “no vacation”, “weekday”, and “rain” together formed a sufficient cause. Maybe you wondered, how the three of them would cause “you get wet”? We can explore further by inspecting the mechanisms. We already saw the complete one when plotting our steplist. Using mechanism(), we can split them up by sufficient cause. Using plot(), produces one graph for every sufficient cause. using print() provides us with the legend we already know from earlier. Graphs can be downloaded with export_mechanism().

mech <- mechanism(scc_model)
print(mech)
plot(mech)
#>  Label Module   Step          
#>  CC1   activity no vacation   
#>  CC2   fate     weekday       
#>  CC3   weather  rain          
#>  CC4   activity get groceries 
#>  I1    activity take vacation 
#>  I2    activity take umbrella 
#>  I3    activity work from home
#>  S1    activity walk to work  
#>  S2    activity go outside    
#>  E1    weather  you get wet

Interventions

Let’s take another look at our steplist summary:

summary(steplist_checked)
#> 
#> ── Outcome Definitions ──
#> 
#> • you get wet
#> 
#> ── Component causes ──
#> 
#> • no vacation
#> • weekday
#> • rain
#> • get groceries
#> 
#> ── Interventions ──
#> 
#> • take vacation
#> • take umbrella
#> • work from home
#> 
#> ── End steps ──
#> 
#> • you get wet
#> 
#> ── Other steps ──
#> 
#> • walk to work
#> • go outside

You can see that three steps are listed as interventions, “take vacation”, “take umbrella”, and “work from home”. For now, we don’t need to now, how interventions are defined exactly (again, see vignette("steplist") for the details), but from their names we can see that they are actions which might prevent the outcome. epicmodel can investigate their impact based on your SCC model. In general, there are two directions such an investigation can take:

  • Which sufficient causes can be prevented by a given intervention?
  • Which intervention prevents the outcome for an individual with a given set of component causes?

Let’s investigate first, which sufficient causes can be prevented by intervention “work from home”. We specify "all" for argument causes and "THENd4e1", i.e., the step ID for intervention “work from home”, for argument intervention.

intervene(scc_model, causes = "all", intervention = "THENd4e1")
#> 
#> ── Intervention ────────────────────────────────────────────────────────────────
#> 
#> ── Cause Set 1 ──
#> 
#> • rain
#> • get groceries
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✖ No considered intervention is able to prevent the outcome
#> 
#> ── Cause Set 2 ──
#> 
#> • no vacation
#> • weekday
#> • rain
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✔ Complete prevention by the following minimal intervention sets
#> 
#> ── Intervention Set 1
#> • work from home

In the output, we get a comparison of the status without and with intervention for each sufficient cause. For Cause Set 1, it is reported that “No considered intervention is able to prevent the outcome”, but for Cause Set 2, there is complete prevention by intervention set “work from home”.

Now, let’s imagine you are in the following situation: it’s raining, it’s a weekday, you have vacation, but want to get some groceries, i.e., c("THENa1","THENa5","THENd2a3") in step IDs. Since you are open to all suggestions, let’s specify "all" for argument intervention.

intervene(scc_model, causes = c("THENa1","THENa5","THENd2a3"), intervention = "all")
#> 
#> ── Intervention ────────────────────────────────────────────────────────────────
#> 
#> ── Cause Set 1 ──
#> 
#> • weekday
#> • rain
#> • get groceries
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✔ Complete prevention by the following minimal intervention sets
#> 
#> ── Intervention Set 1
#> • take umbrella

Surprise, you can completely prevent getting wet by taking an umbrella! Please note that epicmodel reports only minimal intervention sets. You could, e.g., additionally work from home (even though it doesn’t really make sense in this example) and still prevent getting wet. However, you only need to take the umbrella. For a better example, consider the following situation: it’s raining, it’s a weekday, you don’t have vacation, and you don’t have an umbrella. We can look at this scenario with:

intervene(scc_model, causes = c("IFNOTd6a6THENd5a6","THENa5","THENa1"), intervention = c("THENd6a6","THENd4e1"))
#> 
#> ── Intervention ────────────────────────────────────────────────────────────────
#> 
#> ── Cause Set 1 ──
#> 
#> • no vacation
#> • weekday
#> • rain
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✔ Complete prevention by the following minimal intervention sets
#> 
#> ── Intervention Set 1
#> • work from home
#> 
#> ── Intervention Set 2
#> • take vacation

The output shows that, to prevent the outcome, you can either work from home or take vacation. Both together would work as well, of course, but again, this intervention set is not minimal.

DAGs

Finally, epicmodel can transform SCC models to directed acyclic graphs (DAGs) following VanderWeele & Robins (2007).

dag <- scc_to_dag(scc_model)
dag$dag
#> dag {
#> CC1 [pos="1.000,0.250"]
#> CC2 [pos="1.000,0.750"]
#> CC3 [pos="1.000,1.250"]
#> CC4 [pos="1.000,1.750"]
#> O [outcome,pos="4.000,1.000"]
#> SC1 [pos="3.000,0.000"]
#> SC2 [pos="3.000,1.000"]
#> USC [pos="3.000,2.000"]
#> U_SC1 [pos="2.000,0.000"]
#> U_SC2 [pos="2.000,1.000"]
#> U_USC [pos="2.000,2.000"]
#> CC1 -> SC1
#> CC1 -> SC2
#> CC2 -> SC1
#> CC3 -> SC2
#> CC4 -> SC2
#> SC1 -> O
#> SC2 -> O
#> USC -> O
#> U_SC1 -> SC1
#> U_SC2 -> SC2
#> U_USC -> USC
#> }

Element dag of the output contains a dagitty object thanks to the dagitty package. Printing it prints its dagitty model code, which, e.g., can be pasted to DAGitty online on https://www.dagitty.net for further processing. You can also plot dagitty objects with epicmodel’s plot_dag() function, which tries to mimic DAGitty’s online layout in a ggplot object.

plot_dag(dag$dag)

References

  • Rothman KJ (1976): Causes. American Journal of Epidemiology 104(6):587-592.
  • VanderWeele TJ, Robins JM (2007): Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect. American Journal of Epidemiology 166 (9): 1096–1104.