Predictive policing: Data can be used to prevent crime, but is that data racially tinged?
As predictive analytics advances decision making across the public and private sectors, nowhere could this prove more important – or more risky – than in law enforcement. If the rule of law is the cornerstone of society, getting it right is literally foundational. But the art of policing by data, without perpetuating or even magnifying the human biases captured within the data, turns out to be a very tricky art indeed.
Predictive policing introduces a scientific element to law enforcement decisions, such as whether to investigate or detain, how long to sentence, and whether to parole. In making such decisions, judges and officers take into consideration the calculated probability a suspect or defendant will be convicted for a crime in the future. Calculating predictive probabilities from data is the job of predictive modeling (aka machine learning) software. It automatically establishes patterns by combing historical conviction records, and in turn these patterns – together a predictive model – serve to calculate the probability for an individual whose future is as-yet unknown. Such predictive models base their calculations on the defendant's demographic and behavioral factors. These factors may include prior convictions, income level, employment status, family background, neighborhood, education level, and the behavior of family and friends.
Ironically, the advent of predictive policing came about in part to address the very same social justice infringements for which it’s criticized. With stop and frisk and other procedures reported to be discriminatory and often ineffective, there emerged a movement to turn to data as a potentially objective, unbiased means to optimize police work. Averting prejudice was part of the impetus. But the devil’s in the detail. In the process of deploying predictive policing and analyzing its use, complications involving racial bias and due process revealed themselves.
The first-ever comprehensive overview, The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement, strikes an adept balance in covering both the promise and the peril of predictive policing. No one knows better than the book's author, law professor Andrew Guthrie Ferguson, what a high-wire act it is to deploy this technology justly. The book's mission is to highlight the risks and set a cautionary tone – however, Ferguson avoids the common misstep of writing off predictive policing as an endeavor that will always intrinsically stand in opposition to racial justice. The book duly covers the technical capabilities, underlying technology, historical developments, and numerical evidence that support both its deployed value and its further potential (on a closely related topic, I covered the analogous value of applying predictive analytics to homeland security).
The book then balances this out by turning to the pitfalls, inadvertent yet dire threats to civil liberties and racial justice. Here are some of the main topics the book covers in that arena.
As Ferguson puts it, “The question arises about how to disentangle legacy police practices that have resulted in disproportionate numbers of African American men being arrested or involved in the criminal justice system… if input data is infected with racial bias, how can the resulting algorithmic output be trusted?” It turns out that predictive models consulted for sentencing decisions falsely flag black defendants more often than white defendants. That is, among those who will not re-offend, the predictive system inaccurately labels black defendants as higher-risk more often than it does for white defendants. In what is the most widely cited piece on bias in predictive policing, ProPublica reports that the nationally used COMPAS model (Correctional Offender Management Profiling for Alternative Sanctions) falsely flags black defendants at almost twice the rate of white defendants (44.9% and 23.5%, respectively). However, this is only part of a mathematical conundrum that, to some, blurs the meaning of “fairness.” Despite the inequity in false flags, each individual flag is itself racially equitable: Among those flagged as higher risk, the portion falsely flagged is similar for both black and white defendants. Ferguson’s book doesn’t explore this hairy conundrum in detail, but you can learn more in an article I published about it.
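To make that mathematical conundrum concrete, here is a minimal sketch with hypothetical confusion-matrix counts (invented for illustration, not the actual COMPAS numbers): when the two groups' underlying rates of reoffense differ, the false positive rates can diverge sharply even while each individual flag is about equally likely to be correct for both groups.

```python
# Hypothetical confusion-matrix counts for two groups, chosen only to
# illustrate the fairness conundrum -- these are NOT COMPAS data.
groups = {
    "A": {"tp": 300, "fp": 200, "fn": 100, "tn": 400},  # higher base rate of reoffense
    "B": {"tp": 100, "fp": 70,  "fn": 150, "tn": 680},  # lower base rate of reoffense
}

for name, c in groups.items():
    # Share of people who will NOT reoffend who are nonetheless flagged:
    fpr = c["fp"] / (c["fp"] + c["tn"])
    # Share of flags that turn out to be correct:
    ppv = c["tp"] / (c["tp"] + c["fp"])
    print(f"group {name}: false positive rate = {fpr:.2f}, precision = {ppv:.2f}")
```

With these made-up counts, group A's false positive rate (0.33) is more than three times group B's (0.09), yet the precision of a flag is nearly identical (0.60 vs. 0.59) – both fairness criteria cannot, in general, be satisfied at once when base rates differ.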
Ground Truth: One Source of Data Bias
The data analyzed to develop crime-predicting models includes proportionately more prosecutions of black criminals than white ones and, conversely, proportionately fewer cases of black criminals getting away with crime (false negatives) than of white criminals. Starting with a quote from the ACLU’s Ezekiel Edwards, Ferguson spells out why this is so:
"Time and again, analysis of stops, frisks, searches, arrests, pretrial detentions, convictions, and sentencing reveal differential treatment of people of color.” If predictive policing results in more targeted police presence, the system runs the risk of creating its own self-fulfilling prediction. Predict a hot spot. Send police to arrest people at the hot spot. Input the data memorializing that the area is hot. Use that data for your next prediction. Repeat.
Since the prevalence of this is, by definition, not observed and not in the data, measures of model performance do not reveal the extent to which black defendants are unjustly flagged more often. After all, the model doesn’t predict crime per se; it predicts convictions – you don’t know what you don’t know. Although Ferguson doesn’t refer to this as a lack of ground truth, that is the widely used term for this issue, one that is frequently covered, e.g., by The Washington Post and by data scientists.
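The "predict a hot spot, send police, record arrests, repeat" loop can be sketched as a toy simulation (all numbers hypothetical): two areas with identical true crime rates, where the area with more recorded arrests keeps being flagged "hot" and therefore keeps receiving most of the patrols.

```python
# Toy simulation of the self-fulfilling feedback loop (hypothetical numbers):
# two neighborhoods with IDENTICAL underlying crime rates; the one with more
# recorded arrests is flagged "hot" and receives most of the patrols.
true_rate = 0.1                    # same underlying offending in both areas
recorded = [60.0, 40.0]            # biased historical arrest counts

for year in range(20):
    hot = 0 if recorded[0] >= recorded[1] else 1
    patrols = [20.0, 20.0]
    patrols[hot] = 80.0            # "send police to the hot spot"
    # Recorded arrests scale with patrol presence, not with true crime.
    for i in range(2):
        recorded[i] += patrols[i] * true_rate

print(f"area 0 share of recorded arrests: {recorded[0] / sum(recorded):.2f}")
# prints "area 0 share of recorded arrests: 0.73"
```

Area 0 starts with 60% of recorded arrests and ends with 73%, despite no difference in actual offending – the data "memorializes" where police looked, not where crime was.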
Constitutional Issues: Generalized Suspicion
A particularly thorny dispute about fairness – and an open constitutional question – arises when predictive flags bring about searches and seizures. The Fourth Amendment dictates that any search or seizure be “reasonable,” but this requirement is vulnerable to corruption when predictive flags lead to generalized suspicion, i.e., suspicion based on bias (such as the individual’s race) or on factors that are not specific to the individual (such as the location in which the individual happens to be). For example, Ferguson tells of a black driver in a location flagged for additional patrolling due to a higher calculated probability of crime. The flag has brought a patrol car nearby, and the officer pulls the driver over partly on subjective “gut” suspicion, aided by a minor vehicle violation that may serve to explain the stop’s “reasonableness”: the windows are more heavily tinted than the law permits. The ambiguity of this scenario illustrates the dilemma. Do such predictive flags lead to false stops that are rationalized retroactively rather than meeting an established standard of reasonableness? “The shift to generalized suspicion also encourages stereotyping and guilt by association. This, in turn, weakens Fourth Amendment protections by distorting the individualized suspicion standard on the street,” Ferguson adds. This could also magnify the cycle perpetuating racial bias, further corrupting ground truth in the data.
Transparency: Opening Up Otherwise-Secret Models that Help Determine Incarceration
Crime-predicting models must be nakedly visible, not amorphous black boxes. To keep their creators, proponents, and users accountable, predictive models must be open and transparent so they’re inspectable for bias. A model’s inner workings matter when assessing its design, intent, and behavior. For example, race may hold some influence on a model’s output by way of proxies. Although such models almost never input race directly, they may incorporate unchosen, involuntary factors that approximate race, such as family background, neighborhood, education level, and the behavior of family and friends. For example, FICO credit scores have been criticized for incorporating factors such as the “number of bank accounts kept, [which] could interact with culture – and hence race – in unfair ways.”
Despite this, model transparency is not yet standard. For example, the popular COMPAS model, which informs sentencing and parole decisions, is sealed tight. The ways in which it incorporates such factors are unknown – to law enforcement, to the defendant, and to the public. In fact, the model’s creators recently revealed that it incorporates only six of the 137 factors collected, but which six remains a proprietary secret. However, the founder of the company behind the model has stated that, if factors correlated with race, such as poverty and joblessness, “…are omitted from your risk assessment, accuracy goes down” (so we are left to infer that the model may incorporate such factors).
In his book, Ferguson calls for accountability but stops short of demanding transparency, largely giving the vendors of predictive models a pass, in part to protect “private companies whose business models depend on keeping proprietary technology secret.” I view this allowance as inherently contradictory, since a lack of transparency necessarily compromises accountability. Ferguson also argues that most lay consumers of model output, such as patrolling police officers, would not be equipped to comprehend a model’s inner workings anyway. But that is no counterargument to the benefit of transparency for third-party analytics experts who could serve to audit a predictive model. A couple of years before his book came out, Ferguson had influenced my thinking in the opposite direction with a quote he gave me for my writing. He told me, “Predictive analytics is clearly the future of law enforcement. The problem is that the forecast for transparency and accountability is less than clear.”
I disagree with Ferguson’s position that model transparency may in some cases be optional (a position he also covers in an otherwise-valuable presentation accessible online). This opacity infringes on liberty. Keeping the inner workings of crime-predictive models proprietary is like having an expert witness without allowing the defense to cross-examine, or like enforcing a public policy whose details are confidential. There’s a movement to make such algorithms transparent in the name of accountability and due process, advanced in part by pertinent legislation in Wisconsin and in New York City, although the U.S. Supreme Court declined to take on a pertinent case last year.
Deployment: It’s How You Use It that Matters
In conclusion, Ferguson lands on the most pertinent point: It’s how you use it. “This book ends with a prediction: Big data technologies will improve the risk-identification capacities of police but will not offer clarity about appropriate remedies.” By “remedy,” this lawyer is referring to the way police respond, the actions taken. When it comes to fairness in predictive policing, it is less the underlying number crunching and more the manner in which it’s acted upon that makes the difference.
Should judges use big data tools for sentencing decisions? The designer of the popular COMPAS crime-predicting model did not originally intend for it to be used this way. However, he “gradually softened on whether this could be used in the courts or not.” The Wisconsin Supreme Court, for its part, set limits on the use of proprietary scores in future sentencing decisions: risk scores “may not be considered as the determinative factor in deciding whether the offender can be supervised safely and effectively in the community.”
To address the question of how model predictions should be acted upon, I urge law enforcement to educate and guide decision makers on how big data tools inevitably encode racial inequity. Train judges, parole boards, and officers to understand the pertinent caveats when they’re given the calculated probability a suspect, defendant, or convict will offend or reoffend. In so doing, empower these decision makers to incorporate such considerations in whatever manner they deem fit – just as they already do with the predictive probabilities in the first place. See my recent article for more on the considerations upon which officers of the law should reflect.
Ferguson’s legal expertise serves well as he addresses the dilemma of translating predictions based on data into police remedies – and it serves well throughout the other varied topics of this multi-faceted, well-researched book. The Amazon description calls the book “a must read for anyone concerned with how technology will revolutionize law enforcement and its potential threat to the security, privacy, and constitutional rights of citizens.” I couldn’t have put it better myself.
Eric Siegel, Ph.D., founder of the Predictive Analytics World and Deep Learning World conference series – which include the annual PAW Government – and executive editor of The Predictive Analytics Times, makes the how and why of predictive analytics (aka machine learning) understandable and captivating. He is the author of the award-winning Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, a former Columbia University professor, and a renowned speaker, educator, and leader in the field.
Geologists discover a rhythm to major geologic events.
- It appears that Earth has a geologic "pulse," with clusters of major events occurring every 27.5 million years.
- Working with the most accurate dating methods available, the authors of the study constructed a new history of the last 260 million years.
- Exactly why these cycles occur remains unknown, but there are some interesting theories.
Our hearts beat at a resting rate of 60 to 100 beats per minute. Lots of other things pulse, too. The colors we see and the pitches we hear, for example, are due to the different wave frequencies ("pulses") of light and sound waves.
Now, a study in the journal Geoscience Frontiers finds that Earth itself has a pulse, with one "beat" every 27.5 million years. That's the rate at which major geological events have been occurring as far back as geologists can tell.
According to lead author and geologist Michael Rampino of New York University's Department of Biology, "Many geologists believe that geological events are random over time. But our study provides statistical evidence for a common cycle, suggesting that these geologic events are correlated and not random."
The new study is not the first time that there's been a suggestion of a planetary geologic cycle, but it's only with recent refinements in radioisotopic dating techniques that there's evidence supporting the theory. The authors of the study collected the latest, best dating for 89 known geologic events over the last 260 million years:
- 29 sea level fluctuations
- 12 marine extinctions
- 9 land-based extinctions
- 10 periods of low ocean oxygenation
- 13 gigantic flood basalt volcanic eruptions
- 8 changes in the rate of seafloor spread
- 8 episodes of global pulses in intraplate magmatism
The dates provided the scientists a new timetable of Earth's geologic history.
Tick, tick, boom
Putting all the events together, the scientists performed a series of statistical analyses that revealed that events tend to cluster around 10 different dates, with peak activity occurring every 27.5 million years. Between the ten busy periods, the number of events dropped sharply, approaching zero.
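As a hedged sketch of the kind of periodicity test involved – using synthetic event dates, not the study's 89 actual dates – a simple Rayleigh-type statistic scores how strongly a set of ages clusters at one phase of a candidate period; scanning candidate periods then picks out the cycle built into the synthetic data.

```python
import math
import random

random.seed(1)

# Synthetic event ages (Myr): 10 clusters spaced 27.5 Myr apart over ~260 Myr,
# each with 3 events jittered by dating uncertainty. NOT the study's data.
events = [k * 27.5 + random.gauss(0, 2.0) for k in range(10) for _ in range(3)]

def rayleigh_r(ages, period):
    """Mean resultant length: near 1 if ages cluster at one phase of `period`."""
    phases = [2 * math.pi * a / period for a in ages]
    c = sum(math.cos(p) for p in phases) / len(phases)
    s = sum(math.sin(p) for p in phases) / len(phases)
    return math.hypot(c, s)

# Scan candidate periods from 10.0 to 49.9 Myr in 0.1 Myr steps.
best = max((p / 10 for p in range(100, 500)), key=lambda p: rayleigh_r(events, p))
print(f"best-fitting period: {best:.1f} Myr")
```

The actual study uses more sophisticated spectral and circular statistics, but the principle is the same: the score peaks where events repeat in phase, and stays low for periods the data do not share.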
Perhaps the most fascinating question that remains unanswered for now is exactly why this is happening. The authors of the study suggest two possibilities:
"The correlations and cyclicity seen in the geologic episodes may be entirely a function of global internal Earth dynamics affecting global tectonics and climate, but similar cycles in the Earth's orbit in the Solar System and in the Galaxy might be pacing these events. Whatever the origins of these cyclical episodes, their occurrences support the case for a largely periodic, coordinated, and intermittently catastrophic geologic record, which is quite different from the views held by most geologists."
Assuming the researchers' calculations are at least roughly correct — the authors note that different statistical formulas may result in further refinement of their conclusions — there's no need to worry that we're about to be thumped by another planetary heartbeat. The last occurred some seven million years ago, meaning the next won't happen for about another 20 million years.
Research shows that those who spend more time speaking tend to emerge as the leaders of groups, regardless of their intelligence.
If you want to become a leader, start yammering. It doesn't even necessarily matter what you say. New research shows that groups without a leader can find one if somebody starts talking a lot.
This phenomenon, described by the "babble hypothesis" of leadership, depends neither on group member intelligence nor personality. Leaders emerge based on the quantity of speaking, not quality.
Researcher Neil G. MacLaren, lead author of the study published in The Leadership Quarterly, believes his team's work may improve how groups are organized and how individuals within them are trained and evaluated.
"It turns out that early attempts to assess leadership quality were found to be highly confounded with a simple quantity: the amount of time that group members spoke during a discussion," shared MacLaren, who is a research fellow at Binghamton University.
While we tend to think of leaders as people who share important ideas, leadership may boil down to whoever "babbles" the most. Understanding the connection between how much people speak and how they become perceived as leaders is key to growing our knowledge of group dynamics.
The power of babble
The research involved 256 college students, divided into 33 groups of four to ten people each. They were asked to collaborate on either a military computer simulation game (BCT Commander) or a business-oriented game (CleanStart). The players had ten minutes to plan how they would carry out a task and 60 minutes to accomplish it as a group. One person in the group was randomly designated as the "operator," whose job was to control the user interface of the game.
To determine who became the leader of each group, the researchers asked the participants both before and after the game to nominate one to five people for this distinction. The scientists found that those who talked more were also more likely to be nominated. This remained true after controlling for a number of variables, such as previous knowledge of the game, various personality traits, or intelligence.
In an interview with PsyPost, MacLaren shared that "the evidence does seem consistent that people who speak more are more likely to be viewed as leaders."
Another finding was that gender bias seemed to have a strong effect on who was considered a leader. "In our data, men receive on average an extra vote just for being a man," explained MacLaren. "The effect is more extreme for the individual with the most votes."
The great theoretical physicist Steven Weinberg passed away on July 23. This is our tribute.
- The recent passing of the great theoretical physicist Steven Weinberg brought back memories of how his book got me into the study of cosmology.
- Going back in time, toward the cosmic infancy, is a spectacular effort that combines experimental and theoretical ingenuity. Modern cosmology is an experimental science.
- The cosmic story is, ultimately, our own. Our roots reach down to the earliest moments after creation.
When I was a junior in college, my electromagnetism professor had an awesome idea. Apart from the usual homework and exams, we were to give a seminar to the class on a topic of our choosing. The idea was to gauge which area of physics we would be interested in following professionally.
Professor Gilson Carneiro knew I was interested in cosmology and suggested a book by Nobel Prize Laureate Steven Weinberg: The First Three Minutes: A Modern View of the Origin of the Universe. I still have my original copy in Portuguese, from 1979, that emanates a musty tropical smell, sitting on my bookshelf side-by-side with the American version, a Bantam edition from 1979.
Inspired by Steven Weinberg
Books can change lives. They can illuminate the path ahead. In my case, there is no question that Weinberg's book blew my teenage mind. I decided, then and there, that I would become a cosmologist working on the physics of the early universe. The first three minutes of cosmic existence — what could be more exciting for a young physicist than trying to uncover the mystery of creation itself and the origin of the universe, matter, and stars? Weinberg quickly became my modern physics hero, the one I wanted to emulate professionally. Sadly, he passed away July 23rd, leaving a huge void for a generation of physicists.
What excited my young imagination was that science could actually make sense of the very early universe, meaning that theories could be validated and ideas could be tested against real data. Cosmology, as a science, only really took off after Einstein published his paper on the shape of the universe in 1917, two years after his groundbreaking paper on the theory of general relativity, the one explaining how we can interpret gravity as the curvature of spacetime. Matter doesn't "bend" time, but it affects how quickly it flows. (See last week's essay on what happens when you fall into a black hole).
The Big Bang Theory
For most of the 20th century, cosmology lived in the realm of theoretical speculation. One model proposed that the universe started from a small, hot, dense plasma billions of years ago and has been expanding ever since — the Big Bang model; another suggested that the cosmos stands still and that the changes astronomers see are mostly local — the steady state model.
Competing models are essential to science but so is data to help us discriminate among them. In the mid 1960s, a decisive discovery changed the game forever. Arno Penzias and Robert Wilson accidentally discovered the cosmic microwave background radiation (CMB), a fossil from the early universe predicted to exist by George Gamow, Ralph Alpher, and Robert Herman in their Big Bang model. (Alpher and Herman published a lovely account of the history here.) The CMB is a bath of microwave photons that permeates the whole of space, a remnant from the epoch when the first hydrogen atoms were forged, some 400,000 years after the bang.
The existence of the CMB was the smoking gun confirming the Big Bang model. From that moment on, a series of spectacular observatories and detectors, both on land and in space, have extracted huge amounts of information from the properties of the CMB, a bit like paleontologists who excavate the remains of dinosaurs and dig for more bones to get details of a past long gone.
How far back can we go?
Confirming the general outline of the Big Bang model changed our cosmic view. The universe, like you and me, has a history, a past waiting to be explored. How far back in time could we dig? Was there some ultimate wall we cannot pass?
Because matter gets hot as it gets squeezed, going back in time meant looking at matter and radiation at higher and higher temperatures. There is a simple relation that connects the age of the universe and its temperature, measured in terms of the temperature of photons (the particles of visible light and other forms of invisible radiation). The fun thing is that matter breaks down as the temperature increases. So, going back in time means looking at matter at more and more primitive states of organization. After the CMB formed 400,000 years after the bang, there were hydrogen atoms. Before, there weren't. The universe was filled with a primordial soup of particles: protons, neutrons, electrons, photons, and neutrinos, the ghostly particles that cross planets and people unscathed. Also, there were very light atomic nuclei, such as deuterium and tritium (both heavier cousins of hydrogen), helium, and lithium.
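That simple relation, in the radiation-dominated era, is roughly T ∝ 1/√t. As an order-of-magnitude sketch (a standard textbook approximation, not a formula quoted from this essay), the temperature in kelvin is about 10¹⁰ divided by the square root of the age in seconds:

```python
import math

def temperature_kelvin(t_seconds):
    """Rough radiation-era rule of thumb: T(K) ~ 1e10 / sqrt(t in seconds).
    An order-of-magnitude approximation that ignores changes in the
    particle content of the cosmic soup."""
    return 1e10 / math.sqrt(t_seconds)

# A hundredth of a second, one second, and three minutes after the bang:
for t in [0.01, 1.0, 180.0]:
    print(f"t = {t:8.2f} s  ->  T ~ {temperature_kelvin(t):.1e} K")
```

This gives about 10¹¹ K at a hundredth of a second, 10¹⁰ K at one second, and a bit under 10⁹ K at three minutes – the window in which, as described below, primordial nucleosynthesis runs its course.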
So, to study the universe after 400,000 years, we need to use atomic physics, at least until large clumps of matter aggregate due to gravity and start to collapse to form the first stars, a few million years later. What about earlier on? The cosmic history is broken down into chunks of time, each the realm of different kinds of physics. Before atoms form, all the way back to about a second after the Big Bang, it's nuclear physics time. That's why Weinberg brilliantly titled his book The First Three Minutes. It is during the interval between one-hundredth of a second and three minutes that the light atomic nuclei (made of protons and neutrons) formed, a process called, with poetic flair, primordial nucleosynthesis. Protons collided with neutrons and, sometimes, stuck together due to the attractive strong nuclear force. Why did only a few light nuclei form then? Because the expansion of the universe made it hard for the particles to find each other.
What about the nuclei of heavier elements, like carbon, oxygen, calcium, gold? The answer is beautiful: all the elements of the periodic table after lithium were made and continue to be made in stars, the true cosmic alchemists. Hydrogen eventually becomes people if you wait long enough. At least in this universe.
In this article, we got all the way up to nucleosynthesis, the forging of the first atomic nuclei when the universe was a minute old. What about earlier on? How close to the beginning, to t = 0, can science get? Stay tuned, and we will continue next week.
To Steven Weinberg, with gratitude, for all that you taught us about the universe.
Long before Alexandria became the center of Egyptian trade, there was Thônis-Heracleion. But then it sank.