Understanding Data - Context
Data is an abstraction of real life, and real life can be complicated, but if you gather enough context, you can at least put forth a solid effort to make sense of it.
Look up at the night sky, and the stars look like dots on a flat surface. The lack of visual depth makes the translation from sky to paper fairly straightforward, which makes it easier to imagine constellations. Just connect the dots. However, although you perceive stars to be the same distance away from you, they are actually at widely varying distances, measured in light years.
If you could fly out beyond the stars, what would the constellations look like? This is what Santiago Ortiz wondered as he visualized stars from a different perspective, as shown in Figure 1-25.
The initial view places the stars in a global layout, the way you see them. You look at Earth beyond the stars, but as if they were an equal distance away from the planet.
Zoom in, and you can see constellations how you would from the ground, bundled in a sleeping bag in the mountains, staring up at a clear sky.
The perceived view is fun to see, but flip the switch to show actual distance, and it gets interesting. Stars transition, and the easy-to-distinguish constellations are practically unrecognizable. The data looks different from this new angle.
This is what context can do. It can completely change your perspective on a dataset, and it can help you decide what the numbers represent and how to interpret them. After you do know what the data is about, your understanding helps you find the fascinating bits, which leads to worthwhile visualization.
Without context, data is useless, and any visualization you create with it will also be useless. Using data without knowing anything about it, other than the values themselves, is like hearing an abridged quote secondhand and then citing it as a main discussion point in an essay. It might be okay, but you risk finding out later that the speaker meant the opposite of what you thought.
You have to know the who, what, when, where, why, and how -- the metadata, or the data about the data -- before you can know what the numbers are actually about.
Who: A quote in a major newspaper carries more weight than one from a celebrity gossip site that has a reputation for stretching the truth. Similarly, data from a reputable source typically implies better accuracy than a random online poll.
For example, Gallup, which has measured public opinion since the 1930s, is more reliable than, say, someone (for example, me) experimenting with a small, one-off Twitter sample late at night during a short period of time. Whereas the former works to create samples representative of a region, there are unknowns with the latter.
Speaking of which, in addition to who collected the data, who the data is about is also important. Going back to the gumballs, it's often not financially feasible to collect data about everyone or everything in a population. Most people don't have time to count and categorize a thousand gumballs, much less a million, so they sample. The key is to sample evenly across the population so that it is representative of the whole. Did the data collectors do that?
How: People often skip methodology because it tends to be complex and for a technical audience, but it's worth getting to know the gist of how the data of interest was collected.
If you're the one who collected the data, then you're good to go, but when you grab a dataset online, provided by someone you've never met, how will you know if it's any good? Do you trust it right away, or do you investigate? You don't have to know the exact statistical model behind every dataset, but look out for small samples, high margins of error, and unfit assumptions about the subjects, such as indices or rankings that incorporate spotty or unrelated information.
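To get a feel for why small samples deserve suspicion, here is a minimal sketch of the standard margin-of-error approximation for a proportion. It assumes a simple random sample and a 95% confidence level; the function name and the sample sizes are illustrative, not taken from any particular dataset.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample; z=1.96 corresponds to roughly 95% confidence.
    proportion=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Hypothetical sample sizes: a late-night one-off poll vs. a larger survey.
for n in (50, 1000):
    print(f"n={n}: about +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions the 50-person sample carries a margin several times wider than the 1,000-person one, and that is before accounting for whether the sample was representative at all.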
Sometimes people generate indices to measure the quality of life in countries, and a metric like literacy is used as a factor. However, a country might not have up-to-date information on literacy, so the data gatherer simply uses an estimate from a decade earlier. That's going to cause problems because then the index works only under the assumption that the literacy rate one decade earlier is comparable to the present, which might not be (and probably isn't) the case.
What: Ultimately, you want to know what your data is about, but before you can do that, you should know what surrounds the numbers. Talk to subject experts, read papers, and study accompanying documentation.
In introductory statistics courses, you typically learn about analysis methods, such as hypothesis testing, regression, and modeling, in a vacuum, because the goal is to learn the math and concepts. But when you get to real-world data, the goal shifts to information gathering. You shift from, "What is in the numbers?" to "What does the data represent in the world; does it make sense; and how does this relate to other data?"
A major mistake is to treat every dataset the same and use the same canned methods and tools. Don't do that.
When: Most data is linked to time in some way in that it might be a time series, or it's a snapshot from a specific period. In both cases, you have to know when the data was collected. An estimate made decades ago does not equate to one in the present. This seems obvious, but it's a common mistake to take old data and pass it off as new because it's what's available. Things change, people change, and places change, and so naturally, data changes.
Where: Things can change across cities, states, and countries just as they do over time. For example, it's best to avoid global generalizations when the data comes from only a few countries. The same logic applies to digital locations. Data from websites, such as Twitter or Facebook, encapsulates the behavior of its users and doesn't necessarily translate to the physical world.
Although the gap between digital and physical continues to shrink, the space between is still evident. For example, an animated map that represented the "history of the world" based on geotagged Wikipedia entries showed popping dots for each entry in a geographic space. The end of the video is shown in Figure 1-26.
The result is impressive, and there is a correlation to the real-life timeline for sure, but it's clear that because Wikipedia content is more prominent in English-speaking countries, the map shows more in those areas than anywhere else.
Why: Finally, you must know the reason data was collected, mostly as a sanity check for bias. Sometimes data is collected, or even fabricated, to serve an agenda, and you should be wary of these cases. Government and elections might be the first things that come to mind, but so-called information graphics around the web, filled with keywords and published by sites trying to grab Google juice, have also grown up to be a common culprit. (I fell for these a couple of times in my early days of blogging for FlowingData, but I learned my lesson.)
Learn all you can about your data before anything else, and your analysis and visualization will be better for it. You can then pass what you know on to readers.
However, just because you have data doesn't mean you should make a graphic and share it with the world. Context can help you add a dimension -- a layer of information -- to your data graphics, but sometimes it means it's better to hold back because it's the right thing to do.
In 2010, Gawker Media, which runs large blogs like Lifehacker and Gizmodo, was hacked, and 1.3 million usernames and passwords were leaked. They were downloadable via BitTorrent. The passwords were encrypted, but the hackers cracked about 188,000 of them, which exposed more than 91,000 unique passwords. What would you do with that kind of data?
The mean thing to do would be to highlight usernames with common (read: poor) passwords, or you could go so far as to create an application that guessed passwords, given a username.
A different route might be to highlight just the common passwords, as shown in Figure 1-27. This offers some insight into the data without making it too easy to log in with someone else's account. It might also serve as a warning to others to change their passwords to something less obvious. You know, something with at least two symbols, a digit, and a mix of lowercase and uppercase letters. Password rules are ridiculous these days. But I digress.
With data like the Gawker set, a deep analysis might be interesting, but it could also do more harm than good. In this case, data privacy is more important, so it's better to limit what you show and look at.
Whether you should use data is not always clear-cut though. Sometimes, the split between what's right and wrong can be gray, so it's up to you to make the call. For example, on October 22, 2010, Wikileaks, an online organization that releases private documents and media from anonymous sources, released 391,832 United States Army field reports, now known as the Iraq War Logs. The reports recorded 66,081 civilian deaths out of 109,000 recorded deaths, between 2004 and 2009.
The leak exposed incidents of abuse and erroneous reporting, such as civilian deaths classified as "enemy killed in action." On the other hand, it can seem unjustified to publish findings about classified data obtained through less than savory means.
Maybe there should be a golden rule for data: Treat others' data the way you would want your data treated.
In the end, it comes back to what data represents. Data is an abstraction of real life, and real life can be complicated, but if you gather enough context, you can at least put forth a solid effort to make sense of it.
Excerpted with permission from the publisher, Wiley, from Data Points: Visualization That Means Something by Nathan Yau. Copyright © 2013
Nathan Yau, author of Data Points: Visualization That Means Something, has a PhD in statistics and is a statistical consultant who helps clients make use of their data through visualization. He created the popular site FlowingData.com, and is the author of Visualize This: The FlowingData Guide to Design, Visualization, and Statistics, also published by Wiley.
While legalization has benefits, a new study suggests it may have one big drawback.
- A new study finds that rates of marijuana use and addiction have gone up in states that have recently legalized the drug.
- The problem was most severe for those over the age of 26, with cases of addiction rising by more than a third.
- The findings complicate the debate around legalization.
Cannabis Use Disorder, is that when you get so high you can’t figure out how to smoke anymore?
Cannabis use disorder, also known as CUD or cannabis/marijuana addiction, is a psychological disorder described in DSM-5 as "the continued use of cannabis despite clinically significant impairment." This includes people who are unable to cut down on their usage despite wanting to, those who often use it despite finding it severely impairs their ability to function, and those who put themselves in danger to secure access to the drug.
While an understanding that marijuana can be addictive has existed for some time, and the image of the pothead who smokes so much they can hardly function is prevalent in our society, the effects of legalization on addiction rates have somehow gone understudied until now. Importantly, previous studies had failed to consider usage rates amongst populations over the age of 25.
The new study, published in JAMA Psychiatry, focused on self-reported data on monthly drug use in four states where marijuana is now legal (Colorado, Washington, Alaska, and Oregon), from both before and after the drug was legalized in each state, and compared it with data from states that have not yet legalized.
The data gave insights into the drug use habits of the respondents: whether they had smoked at all in the last month, how frequently they used the drug, and whether they had ever had issues with how much they were using. The researchers ultimately considered the responses of 505,796 individuals.
The increase in cannabis usage they found was considerable. The number of respondents over the age of 26 who claimed to have used the drug in the last month went up by 23% compared with their counterparts in states that have yet to legalize. Abuse of the drug by this group rose by 37%.
Teen usage rose by 25%, and addiction rates rose as well. This increase was small, though, and the authors have suggested it may be due to an unknown factor. The rate of usage or abuse for respondents between the ages of 18 and 25 did not increase at all.
After breaking the results down by demographics, the primary finding held; adults over the age of 26 are using marijuana more often when it is legalized, and they are starting to use it too much.
The grain of salt
As in any study where findings are self-reported, the exact numbers you see here should be taken with a grain of salt. They could be slightly higher or lower. As this study relies on people self-reporting their usage of a drug that is still illegal in many places, it is very possible that the apparent spike in addiction rates is caused by more accurate reporting, as people who live in an area where pot is still illegal may be less likely to report smoking it every day.
And it should be repeated a thousand times over that correlation and causation are not the same thing. There could be some unknown factor causing these increases in each case.
Despite these qualifications, the study is still useful in giving us a general sense of what may happen in states that have yet to legalize.
What does this mean for society and drug users?
While claims of "reefer madness" are greatly exaggerated, marijuana has several well established and thoroughly studied side effects. While occasional use isn't terribly harmful, addiction can be. Lead author Magdalena Cerdá of New York University explains in the study that heavy marijuana use is associated with "psychological and physical health concerns, lower educational attainment, decline in social class, unemployment, and motor vehicle crashes."
A substantial increase in the number of people who are addicted to the stuff will incur costs to society down the line.
Of course, a 37% increase in problematic usage means that the percentage of adults smoking too much went from 0.9% to 1.23% of the population responding to the survey. This makes it far less prevalent than issues with alcohol, which affected around 6% of all Americans in 2018.
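The gap between the headline figure and the share of people affected is easy to check. The following sketch uses only the two numbers quoted above (the baseline share and the 37% relative increase); the function name is illustrative.

```python
def apply_relative_increase(baseline_pct, relative_increase_pct):
    """Return the new percentage after a relative (not absolute) increase."""
    return baseline_pct * (1 + relative_increase_pct / 100)

baseline = 0.9                                  # share of adults before legalization, in percent
after = apply_relative_increase(baseline, 37)   # 37% relative increase
absolute_change = after - baseline              # change in percentage points

print(f"after: {after:.2f}% of respondents")
print(f"absolute change: {absolute_change:.2f} percentage points")
```

A 37% relative increase on a base of under 1% works out to roughly a third of a percentage point, which is why the same finding can sound alarming or modest depending on how it is reported.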
Recently, Big Think's Philip Perry wrote a piece about how legalization could improve the health of millions by allowing the government to regulate the purity of commercially sold marijuana. This remains true. However, it must be weighed against the findings of this study, which suggest that at least some of these health gains will be wiped out by increased addiction rates.
What does this mean for legalization efforts?
The legalization steamroller will undoubtedly keep rolling along. While health concerns are one factor in the debate over marijuana, they are only one of many. In Illinois, where I live, weed will become legal on January 1st of 2020. The legalization campaign and legislation were more concerned with issues of social justice, the failures of prohibition, and finding a new source of tax revenue (since we're half broke) than with matters of potential addiction.
As Vox reports, the authors of the study aren't suggesting that legalization shouldn't take place; that is another, broader debate. They merely wish to present the fact that legalization has a particular side effect that we should be aware of.
While this study is unlikely to change anybody's stance on whether weed should be legalized, it does show us a critical element to be considered when discussing drug policy. No drug is perfectly safe, and we have reason to believe that legalizing marijuana will mean that more people will have a hard time with it. Let's hope that legalization proponents keep that in mind as they rack up their victories.
When Olympic athletes perform dazzling feats of athletic prowess, they are using the same principles of physics that gave birth to stars and planets.
- Much of the beauty of gymnastics comes from the physics principle called the conservation of angular momentum.
- Conservation of angular momentum tells us that when a spinning object changes how its matter is distributed, it changes its rate of spin.
- Conservation of angular momentum links the formation of planets in star-forming clouds to the beauty of a gymnast's spinning dismount from the uneven bars.
It is that time again when we watch in awe as Olympic athletes perform dazzling feats of athletic prowess. But as we stare in rapt attention at the speed, grace, and strength they exhibit, it is also a good time to pay attention to how they embody, literally, fundamental principles that shape the entire universe. Yes, I'm talking about physics. On our screens, these athletes are giving us lessons in the principles that giants like Isaac Newton struggled mightily to articulate.
Naturally, there are many Olympic events from which we could learn some basic principles of physics. Swimming shows us hydrodynamic drag. Boxing teaches us about force and impulse. (Ouch!) But today, we will focus on gymnastics and the cosmic importance of the conservation of angular momentum.
The conservation of angular momentum
Much of the beauty of gymnastics comes from the spins and flips athletes perform as they launch themselves into the air from the vault or uneven bars. These are all examples of rotations — and so much of the structure and history of the universe, from planets to galaxies, comes down to the physics of rotating objects. And so much of the physics of rotating objects comes down to the conservation of angular momentum.
Let's start with the conservation of regular or "linear" momentum. Momentum is the product of mass and velocity. Way back in the age of Galileo and Newton, physicists came to understand that in the interactions between bodies, the sum of their momentums had to be conserved (which really means "does not change"). This is a familiar idea to anyone who has played billiards: when a moving pool ball strikes a stationary one, the first ball stops while the second scoots away. The total momentum of the system (the mass times velocity of both balls taken together) is conserved, leaving the originally moving ball unmoving and the originally stationary ball carrying all the system's momentum.
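The billiards case can be worked through numerically. This is a minimal sketch assuming an idealized head-on, perfectly elastic collision in one dimension, with no spin or friction; the ball mass and cue-ball speed are made-up illustrative values.

```python
def elastic_collision_1d(m1, v1, m2, v2):
    """Final velocities after a head-on, perfectly elastic 1D collision.

    Derived from conservation of momentum and kinetic energy.
    """
    total_mass = m1 + m2
    v1_final = ((m1 - m2) * v1 + 2 * m2 * v2) / total_mass
    v2_final = ((m2 - m1) * v2 + 2 * m1 * v1) / total_mass
    return v1_final, v2_final

ball_mass = 0.17   # kg, hypothetical pool ball
cue_speed = 2.0    # m/s, hypothetical

v1_final, v2_final = elastic_collision_1d(ball_mass, cue_speed, ball_mass, 0.0)

momentum_before = ball_mass * cue_speed + ball_mass * 0.0
momentum_after = ball_mass * v1_final + ball_mass * v2_final

print(v1_final, v2_final)               # first ball stops, second carries the speed
print(momentum_before, momentum_after)  # total momentum is unchanged
```

With equal masses the moving ball hands all of its momentum to the stationary one, and the total stays the same before and after.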
Rotating objects also obey a conservation law, but now it is not just the mass of an object that matters. The distribution of mass (that is, where the mass is located relative to the center of the rotation) is also a factor. Conservation of angular momentum tells us that if a spinning object is not subject to any outside twisting forces (torques), then any changes in how its matter is distributed must lead to a change in its rate of spin. Comparing the conservation of angular momentum to the conservation of linear momentum, the "distribution of mass" is analogous to mass, and the "rate of spin" is analogous to velocity.
There are many places in cosmic physics where this conservation of angular momentum is key. My favorite example is the formation of stars. Every star begins its life as a giant cloud of slowly spinning interstellar gas. The clouds are usually supported against their own gravitational weight by gas pressure, but sometimes a small nudge from, say, a passing supernova blast wave will force the cloud to begin gravitational collapse. As the cloud begins to shrink, the conservation of angular momentum forces the spin rate of material in the cloud to speed up. As material is falling inward, it also rotates around the cloud's center at ever higher rates. Eventually, some of that gas is going so fast that a balance between the gravity of the newly forming star and what is called centrifugal force is achieved. That stuff then stops moving inward and goes into orbit around the young star, forming a disk, some material of which eventually becomes planets. So, the conservation of angular momentum is, literally, why we have planets in the universe!
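To see how strongly a shrinking cloud spins up, here is a rough sketch that treats a parcel of gas as a point mass circling the cloud's center, so that its angular momentum is mass times radius squared times spin rate. That is a simplifying assumption (real clouds are messier), and the numbers are purely illustrative.

```python
def spin_rate_after_contraction(initial_rate, initial_radius, final_radius):
    """Spin rate of a point-mass parcel after its orbital radius changes.

    With m * r**2 * omega held constant, omega scales as (r_initial / r_final)**2.
    """
    return initial_rate * (initial_radius / final_radius) ** 2

# Hypothetical: the parcel ends up ten times closer to the center.
speedup = spin_rate_after_contraction(initial_rate=1.0, initial_radius=10.0, final_radius=1.0)
print(speedup)
```

Under this assumption, falling to one-tenth of the original distance multiplies the spin rate by a hundred, which hints at why infalling gas eventually moves fast enough to settle into orbit.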
Gymnastics, a cosmic sport
How does this appear in gymnastics? When athletes hurl themselves into the air to perform a flip, the only force acting on them is gravity. But since gravity only affects their "center of mass," it cannot apply forces in a way that changes the athlete's spin. But the gymnasts can do that for themselves by using the conservation of angular momentum.
By changing how their mass is arranged, gymnasts can change how fast they spin. You can see this in the dismount phase of the uneven bar competitions. When a gymnast comes off the bars and performs a flip by tucking their legs inward, they can quickly increase their rotation rate in midair. The sudden dramatic increase in the speed of their flip is what makes us gasp in astonishment. It is both scary and a beautiful testament to the athletes' ability to intuitively control the physics of their bodies. And it is also the exact same physics that controls the birth of planets.
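The tuck can be put into numbers with the usual relation that angular momentum equals moment of inertia (the "distribution of mass") times spin rate. This is only a sketch: the moment-of-inertia values below are hypothetical, chosen to show the effect rather than measured from any athlete.

```python
def spin_rate_after_tuck(layout_rate, layout_inertia, tuck_inertia):
    """Spin rate after changing body shape in midair.

    Angular momentum L = I * omega is conserved, so
    omega_tuck = omega_layout * I_layout / I_tuck.
    """
    return layout_rate * layout_inertia / tuck_inertia

layout_rate = 1.0      # rotations per second while stretched out (hypothetical)
layout_inertia = 12.0  # kg*m^2, body extended (hypothetical)
tuck_inertia = 4.0     # kg*m^2, knees pulled in (hypothetical)

tuck_rate = spin_rate_after_tuck(layout_rate, layout_inertia, tuck_inertia)
print(tuck_rate)  # rotations per second in the tuck

# Angular momentum is the same before and after the tuck.
print(layout_inertia * layout_rate, tuck_inertia * tuck_rate)
```

Pulling mass toward the axis cuts the moment of inertia, and the spin rate rises by the same factor, with no outside push required.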
"As above so below," goes the old saying. You should keep that in mind as you watch the glory that is the Olympics. That is because it is not just athletes that have this intuitive understanding of physics. We all have it, and we use it every day, from walking down the stairs to swinging a hammer. So, it is no exaggeration to claim that the first place we came to understand the deepest principles of physics was not in contemplating the heavens but in moving through the world in our own earthbound flesh.
How the British obsession with tea triggered wars, led to bizarre espionage, and changed the world — many times.
- Today, tea is the single most popular drink worldwide, with a global market that outstrips all the nearest rivals combined.
- The British Empire went to war over tea, ultimately losing its American colonies and twice beating the Chinese in the "Opium Wars."
- The British desire to secure homegrown tea resulted in their sending botanist Robert Fortune on a Hollywood-worthy mission to secure Chinese tea plants and steal horticultural secrets.
After water, tea is the most common drink in the world. It is more popular than coffee, soft drinks, and alcohol combined. 84 percent of Brits enjoy a daily "cuppa," but this is a mere bagatelle against the Turks, who drink on average three to four cups every day. The tea industry is worth $200 billion worldwide and is set to grow by half by 2025.
Tea is such a huge part of many cultures that it even has origin myths. For instance, one involves the Buddha waking up after falling asleep during his meditation. Disgusted at his lack of self-discipline, he cut off his eyelids and threw them to the ground. These lids then grew into tea plants to help future meditators stay awake.
Tea really matters to a lot of people. And it mattered so much to the British and their empire that it directed their entire foreign policy. It also inspired one of the most incredible and ridiculous tales of 19th century espionage.
A spot of tea
When the European powers of the 16th century first traded with, then militarily colonized, various East Asian nations, it was impossible not to come across tea. By the 9th century, the Tang Dynasty of China had already popularized tea across the region. Tea was already firmly entrenched when the Portuguese became the first Europeans to sample it (in 1557), followed by the Dutch, who first shipped a batch back to mainland Europe.
Britain was relatively late to the tea party, not arriving until well into the 17th century. In fact, in Samuel Pepys' 1660 diaries, he makes reference to "a cup of tee (a China drink) of which I had never drunk before." It was only after King Charles II's Portuguese wife popularized it at court that tea became a fashionable societal drink.
After the Brits got going, there was no stopping them. Tea became a huge business. However, since tea was monopolized by the East India Company and the government imposed a whopping 120 percent tax on it, an army of smuggler gangs opened back channels to get tea to the poorer masses. Eventually, in 1784, Prime Minister William Pitt the Younger got wise to the popular cry for tea. To stamp out the black market, he slashed the tax on the leaf to just 12.5 percent. From then on, tea became the everyman's drink — marketed as medicinal, invigorating, and tasty.
A cup, a cup, my kingdom for a cup!
Tea became so important to the British that it even sparked wars across the empire.
Most famously, when the British imposed a three-pence-per-pound tax on all tea the East India Company exported to America, it led to the outraged destruction of an entire ship's tea cargo. The "Boston Tea Party" was the first major defiant act of the American colonies and led ultimately to ham-fisted and insensitive countermeasures from the London government. These, in turn, sparked the U.S. War of Independence.
Less well known is how Britain went to war with China over tea. Twice.
Back then, tea was only being grown and exported from China to British India and then around the empire. As such, it led to a massive trade imbalance, where the largely self-sufficient China only wanted British silver in return for their famous and delicious homegrown tea leaves. This sort of economic policy, known as mercantilism, made Britain really mad.
In retaliation, Britain grew opium and flooded China with the drug. When China (quite understandably) objected to this, Britain sent in the gunboats. The subsequent "Opium Wars" were only ever going to go one way, and when China sued for peace, they were lumped with $20 million worth of reparations and had to cede Hong Kong to Britain (which was only returned in 1997).
The tea spy: on her majesty's secret service
But even these wars did not resolve the trade deficit with China. The attempts to make tea in British India resulted in insipid rubbish, and the British needed the good stuff. So, they turned to a Scottish botanist named Robert Fortune, whose mission was simple: cross the border into China, integrate himself amongst Chinese tea farmers, and smuggle out both their expertise and preferably their tea plants.
Fortune accepted the mission, even though he could not speak a word of Chinese and had barely left his native Britain. (A forefather of 007 he was not.) But not one to let these details get in the way, he shaved his hair, plaited a pigtail that resembled those worn by the Chinese, and then set off on his adventure.
And what an adventure it was. He came under attack by bandits and brigands, his ship was bombarded by pirates, and he had to endure fever, tropical storms, and typhoons. In spite of all this, Fortune not only managed to learn Chinese and travel around the forbidden city of Suzhou and its surrounding tea-farming land, but he also integrated himself into secluded peasant communities. When the skeptical tea farmers challenged Fortune on why he was so tall, he fooled them by claiming that he was a very important state official (all of whom were tall, apparently).
An Indian speciali-tea
Amazingly, Fortune had good fortune and got away with it. Over the course of his three-year mission, he secreted out several shipments of new tea plants to Britain as well as the art of bonsai (previously, a closely held secret). Most of the smuggled tea plants died from mold and moisture in transit, but Fortune persisted, and eventually the British began to cultivate their own tea plants using Chinese tea farming techniques in their colonial Indian soils.
It was not long until an Indian variant, almost indistinguishable from the stolen Chinese one, began to dominate the market, not least for Britain's huge and growing empire. Within 20 years of Fortune's remarkable mission, the East India Company had more than fifty contractors pumping out tea worldwide.
Today, things have reverted. China now produces not only substantially more than India (in second place) but more than the top ten countries combined. In total, 40 percent of the world's tea comes from China. But it was British tea, and Robert Fortune's incredible and unlikely mission, that catalyzed the huge global market. Without this overly confident Scottish plant-lover, the world's love of tea might look very different.