We can “read” genes with ease now, but still can’t say what most of them “mean.” To show why we need clearer “causology” and fitter metaphors, let's scrutinize cars and their parts like we do bodies and genes.
1. We can “read” genes with ease now, but still can’t say what most of them “mean.” Mastering precisely how they “cause” higher-level traits will require clearer “causology” and fitter metaphors.
2. Genes (more precisely, gene products) contribute to fiendishly complex processes that confound the standard stats grinder. To illustrate, imagine scrutinizing cars and their parts the way we do bodies and genes in “genome-wide association studies” (GWAS). The details don’t matter here, beyond that a car-GWAS would analyze a car-level trait, like fuel efficiency, against variations in the properties of all the car’s parts.
3. Consider a car that comes in standard and sporty models. The sporty models have larger, gas-guzzling engines and optional pimped-up painted brake calipers. If sporty buyers more often pick red brakes, then, statistically speaking, red brakes carry a greater gas-guzzling “risk.”
4. If I’m not mistaken (please correct me, stats geeks), no stats-only data wizardry can distinguish such non-causal entanglements from causal ones (p-values can’t discern “phantom patterns”).
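To make the red-brake entanglement concrete, here’s a toy simulation (all numbers invented for illustration, not real car data): sporty models get bigger engines and a taste for red calipers; brake color never touches fuel use, yet the group averages split anyway.

```python
# Toy "car-GWAS" confounding sketch. Red brake calipers have zero causal
# effect on fuel use here, but sporty buyers pick both big engines and
# red brakes, so a stats-only comparison still flags red brakes.
import random
import statistics

random.seed(0)

cars = []
for _ in range(2000):
    sporty = random.random() < 0.5
    # Engine size is the real cause of fuel use; sporty models run bigger.
    engine_liters = 3.0 if sporty else 1.6
    fuel_use = engine_liters * 4 + random.gauss(0, 1)  # arbitrary units
    # Brake color is pure style, chosen far more often by sporty buyers.
    red_brakes = random.random() < (0.8 if sporty else 0.1)
    cars.append((red_brakes, fuel_use))

red = [fuel for has_red, fuel in cars if has_red]
plain = [fuel for has_red, fuel in cars if not has_red]
print(f"mean fuel use, red brakes:   {statistics.mean(red):.1f}")
print(f"mean fuel use, plain brakes: {statistics.mean(plain):.1f}")
# The red-brake group guzzles noticeably more on average -- a real pattern,
# and a "significant" one at this sample size, but brakes cause none of it.
```

Swap the buyer preference to 50/50 for both models and the gap vanishes — the tell that the brakes were never doing anything.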
5. Generally, part-level properties can have non-causal and non-random “links” to higher-level traits. And including non-causal factors distorts the statistics (misallocating the variation that seems “explained by,” “accounted for,” or “linked to”). Lacking causal insights, you always run the “red-brake” risk.
6. Regarding metaphors, gene products work more like words than car parts (genes aren’t static “blueprints”). They act via sentence-like structures with collective effects and multiple “meanings.” But we lack the rules (~cellular syntax, gene grammar) for how parts of biology compose life’s activity-sentences.
7. Genes also sort of work like music: Typically “played” in precise synchrony to orchestrate many molecular melodies (simultaneous biochemical sentences) enabling enormous ensemble effects.
8. And life typically has way more moving parts than cars, and more complex transient causal structures. Its traits often have multiple hetero-causal etiologies (causal roadmaps exhibiting sufficient-but-not-necessary logic). Current stats can’t disentangle hetero-causal effects (larger type-mixed samples often won’t help).
9. Thankfully, fitter thinking is afoot: for instance, geno-pheno mapping (Massimo Pigliucci), better “Laws of Biology” (Kevin Mitchell), Reductionist Bias Corrections (Krakauer), and Causal Structure Modeling (Judea Pearl).
10. Biology and social science need less parts-focused thinking (you can’t grasp chess by studying the properties of its pieces alone), and ways to handle different kinds of causes and roles—see Krakauer’s Figure 4, Aristotle’s four causes, Tinbergen’s four questions, Marr’s three levels. Much in these fields is more process-or-algorithm shaped (often resisting Occam’s Razor).
11. Related iffy thinking exists far beyond genomics. As mostly practiced, stats presume a flat or “heap” causal structure that’s often ill-suited for process-oriented life, or car making, or even cooking (cooks need step-by-step recipes to turn parts into wholes).
12. Statistical analysis without causal insights often runs the red-brake risk. The habit of adding variables to “control for” factors can misallocate variation (and is itself often a nonsensical or low-quality quantification).
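Here’s a minimal sketch of that misallocation, reusing the same invented car setup (synthetic numbers, not real data): an ordinary least-squares fit happily reports a big chunk of fuel-use variation as “explained by” brake color, a factor with zero causal effect.

```python
# How variation gets misallocated: regress fuel use on a non-causal factor
# (brake color) and R^2 still credits it with "explaining" variation,
# because it rides along with the real cause (engine size). Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
sporty = rng.random(n) < 0.5
engine = np.where(sporty, 3.0, 1.6)      # liters; the real cause
fuel = engine * 4 + rng.normal(0, 1, n)  # arbitrary units
red = (rng.random(n) < np.where(sporty, 0.8, 0.1)).astype(float)

def r_squared(x, y):
    # Fraction of variance in y "accounted for" by a one-predictor OLS fit.
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print(f"R^2, brake color only: {r_squared(red, fuel):.2f}")
print(f"R^2, engine size only: {r_squared(engine, fuel):.2f}")
# Brake color "accounts for" a sizable share of variation it never caused.
```

The numbers a regression hands back look authoritative either way; only the causal story tells you which allocation is nonsense.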
13. Similar structureless-sausage data risks pervade black-box approaches to Big Data and AI.
14. You know that correlation doesn’t imply causation, but AI doesn’t “know” that.
Illustration by Julia Suits, The New Yorker cartoonist & author of The Extraordinary Catalog of Peculiar Inventions
Hilarious examples that show how correlation does not equal causality.
Big Think has been talking about the dangers of confusing correlation with causality for some time. You know how it goes: the amount of one thing seems to correspond to the amount of something else, so can we conclude that the first caused the second, or vice versa? Nope. Make that a double-nope.
Sometimes things just look like each other without actually being at all connected. Here's the Nic Cage/drowning data:
It's not just an academic matter, either: confusing correlation with causality can lead to dangerously wrong conclusions, as it has for the people who incorrectly believe vaccines cause autism.
So let's have some fun with this, and enjoy just how idiotic the correspondences between obviously unrelated data sets can be. All of these are from author Tyler Vigen's awesome collection, including the Nicolas Cage/drowning one above. He also has a book, Spurious Correlations. These are some of our favorites from Vigen's web page.
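One way Vigen-style charts happen is plain data dredging: compare enough unrelated series and some pair will line up beautifully by chance. A quick sketch using nothing but simulated random walks (no real-world data involved):

```python
# Data dredging demo: among many independent random walks, the best-matching
# pair looks impressively correlated -- with zero connection between them.
import random

random.seed(1)

def random_walk(steps=20):
    x, walk = 0.0, []
    for _ in range(steps):
        x += random.gauss(0, 1)
        walk.append(x)
    return walk

def pearson_r(a, b):
    # Plain Pearson correlation coefficient.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

walks = [random_walk() for _ in range(200)]
best = max(
    abs(pearson_r(walks[i], walks[j]))
    for i in range(len(walks))
    for j in range(i + 1, len(walks))
)
print(f"strongest |r| among {len(walks)} unrelated series: {best:.2f}")
# With ~20,000 pairs to cherry-pick from, a near-perfect match is routine.
```

That cherry-picking step is exactly what browsing thousands of real-world time series for the funniest match amounts to.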
Here's a graph that "proves" that the more the U.S. spends on space, the more our suicide rate by hanging and other forms of asphyxiation goes up.
There are lots of reasons to question the value of the Miss America pageant, but this is new. The winner's age seems to cause murders by steam, hot vapors, and hot objects.
In what has to be the best endorsement for arcade game-playing, look at how many computer doctorates it "produces."
And finally, for those of us who have always resented those Scripps spelling smartypantses, venomous spiders feel for us. And then bite us.
Got enough proof now for any argument that's based on ridiculous correspondences? There are more over at Vigen's website, and they're all pretty hilarious.