Understanding "why" may be the key to unlocking an AI's imagination.
- Humans are remarkably good at imagining new things by mixing and matching elements they already know.
- One of the holy grails of computer science is the development of an AI that can extrapolate from data, and USC researchers have a model for how that could happen.
- Rather than focusing on the tiny details of individual samples, the model uses groups of related samples to encourage AI to figure out broader principles.
One explanation of human imagination — and of creativity — is that it's the process of creating something new by combining existing elements in a novel way. It could be a daydream built on "what ifs," such as familiar rhythms and motifs turned into a new song, or seemingly unrelated bits of knowledge brought together for the first time as the building blocks of a breakthrough insight.
Using our imaginations comes naturally to us; we do it all the time, in ways big and small. For artificial intelligence, however, recombining the elements of different things does not come "naturally" at all. Machines learn by breaking objects down and cataloguing their attributes in order to identify them, but those traits are not treated as free-floating characteristics available for mixing and matching in new ways.
Now, a team of researchers from the University of Southern California has announced the development of something profoundly new: a model for an AI with imagination.
"Humans can separate their learned knowledge by attributes — for instance, shape, pose, position, color," the study's lead author Yunhao Ge tells USC Viterbi, "and then recombine them to imagine a new object. Our paper attempts to simulate this process using neural networks."
They're all machines that fly. Credit: yganko / Adobe Stock/Big Think
Machine learning typically occurs through the close examination of images and the recording of up-close attributes, such as the colors of pixels. The goal is for an algorithm to correctly identify a new image of the same (or a similar) object. AI makes no attempt to understand what the object is or how it works. Machine learning is mostly pattern recognition.
Scientists have long dreamed, however, of an AI that can extrapolate from what it has learned by inferring from small details an object's broader themes, including how it operates. Identifying a picture of an airplane is good; identifying why and how it's a "flying machine" is much better.
The key to endowing an AI with imagination, the USC researchers said, is "disentanglement," the idea that attributes can be unbundled, or separated, from the objects that exhibit those traits.
Indeed, disentanglement is the concept behind the increasingly amazing "deep fakes" proliferating on the internet. For instance, in a deep fake video, a face's movement is disentangled from its identity. This allows deep fake creators to, as Ge says, "synthesize new images and videos that substitute the original person's identity with another person, but keep the original movement."
Credit: khuruzero / Adobe Stock/Big Think
"Controllable disentangled representation learning" is the term USC's researchers have bestowed on the way their algorithm learns. Instead of being fed individual objects to catalogue, the AI is given a sample group of related images with the goal of analyzing them until it ultimately discovers the broader themes that unify them. The individual attributes can then be disentangled from the basic characteristics that identify an object.
"For instance," says Ge, "take the Transformer movie as an example. It can take the shape of Megatron car, the color and pose of a yellow Bumblebee car, and the background of New York's Times Square. The result will be a Bumblebee-colored Megatron car driving in Times Square, even if this sample was not witnessed during the training session."
The team has developed a dataset of 1.56 million images for training their AI.
From mimicking to understanding
According to the study's senior author Laurent Itti, "Deep learning has already demonstrated unsurpassed performance and promise in many domains, but all too often this has happened through shallow mimicry, and without a deeper understanding of the separate attributes that make each object unique."
Noting that their model can be applied to many different types of data, the researchers foresee AI being able to overcome its current myopia. For instance, it may help scientists discover new combinations of existing compounds based on an analysis of their disentangled properties. Autonomous driving AI could be that much more useful and safe if it can imagine, and thus anticipate, hazardous scenarios.
Itti concludes, "This new disentanglement approach, for the first time, truly unleashes a new sense of imagination in AI systems, bringing them closer to humans' understanding of the world."
A brief passage from a recent UN report describes what could be the first-known case of an autonomous weapon, powered by artificial intelligence, killing on the battlefield.
- Autonomous weapons have been used in war for decades, but artificial intelligence is ushering in a new category of autonomous weapons.
- These weapons are capable not only of moving autonomously but also of identifying and attacking targets on their own, without oversight from a human.
- There are currently no clear international restrictions on the use of these new autonomous weapons, but some nations are calling for preemptive bans.
Nothing transforms warfare more violently than new weapons technology. In prehistoric times, it was the club, the spear, the bow and arrow, the sword. The 16th century brought the musket. The World Wars of the 20th century introduced machine guns, planes, and atomic bombs.
Now we might be seeing the first stages of the next battlefield revolution: autonomous weapons powered by artificial intelligence.
In March, the United Nations Security Council published an extensive report on the Second Libyan War that describes what could be the first-known case of an AI-powered autonomous weapon killing people on the battlefield.
The incident took place in March 2020, when soldiers with the Government of National Accord (GNA) were battling troops supporting the Libyan National Army of Khalifa Haftar (called Haftar Affiliated Forces, or HAF, in the report). One passage describes how GNA troops may have used an autonomous drone to kill retreating HAF soldiers:
"Logistics convoys and retreating HAF were subsequently hunted down and remotely engaged by the unmanned combat aerial vehicles or the lethal autonomous weapons systems such as the STM Kargu-2... and other loitering munitions. The lethal autonomous weapons systems were programmed to attack targets without requiring data connectivity between the operator and the munition: in effect, a true 'fire, forget and find' capability."
Still, because the GNA forces were also firing surface-to-air missiles at the HAF troops, it's currently difficult to know how many, if any, troops were killed by autonomous drones. It's also unclear whether this incident represents anything new. After all, autonomous weapons have been used in war for decades.
Lethal autonomous weapons
Lethal autonomous weapon systems (LAWS) are weapon systems that can search for and fire upon targets on their own. It's a broad category whose definition is debatable. For example, you could argue that land mines and naval mines, used in battle for centuries, are LAWS, albeit relatively passive and "dumb." Since the 1970s, navies have used active protection systems that identify, track, and shoot down enemy projectiles fired toward ships, if the human controller chooses to pull the trigger.
Then there are drones, an umbrella term that commonly refers to unmanned weapons systems. Introduced in 1991 with unmanned (yet human-controlled) aerial vehicles, drones now represent a broad suite of weapons systems, including unmanned combat aerial vehicles (UCAVs), loitering munitions (commonly called "kamikaze drones"), and unmanned ground vehicles (UGVs), to name a few.
Some unmanned weapons are largely autonomous. The key question to understanding the potential significance of the March 2020 incident is: what exactly was the weapon's level of autonomy? In other words, who made the ultimate decision to kill: human or robot?
The Kargu-2 system
One of the weapons described in the UN report was the Kargu-2, a type of loitering munition. This class of unmanned aerial vehicle loiters above potential targets (often anti-air weapons) and, when it detects radar signals from enemy systems, swoops down and explodes in a kamikaze-style attack.
Kargu-2 is produced by the Turkish defense contractor STM, which says the system can be operated both manually and autonomously using "real-time image processing capabilities and machine learning algorithms" to identify and attack targets on the battlefield.
In other words, STM says its robot can detect targets and autonomously attack them without a human "pulling the trigger." If that's what happened in Libya in March 2020, it'd be the first-known attack of its kind. But the UN report isn't conclusive.
It states that HAF troops suffered "continual harassment from the unmanned combat aerial vehicles and lethal autonomous weapons systems," which were "programmed to attack targets without requiring data connectivity between the operator and the munition: in effect, a true 'fire, forget and find' capability."
What does that last bit mean? Basically, that a human operator might have programmed the drone to conduct the attack and then sent it a few miles away, where it didn't have connectivity to the operator. Without connectivity to the human operator, the robot would have had the final call on whether to attack.
To be sure, it's unclear if anyone died from such an autonomous attack in Libya. In any case, LAWS technology has evolved to the point where such attacks are possible. What's more, STM is developing swarms of drones that could work together to execute autonomous attacks.
Noah Smith, an economics writer, described what these attacks might look like on his Substack:
"Combined with A.I., tiny cheap little battery-powered drones could be a huge game-changer. Imagine releasing a networked swarm of autonomous quadcopters into an urban area held by enemy infantry, each armed with little rocket-propelled fragmentation grenades and equipped with computer vision technology that allowed it to recognize friend from foe."
But could drones accurately discern friend from foe? After all, computer-vision systems like facial recognition don't identify objects and people with perfect accuracy; one study found that very slightly tweaking an image can lead an AI to miscategorize it. Can LAWS be trusted to differentiate between a soldier with a rifle slung over his back and, say, a kid wearing a backpack?
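To see why researchers worry, here is a minimal, purely illustrative sketch of the "slightly tweaked image" problem, in the style of a fast-gradient-sign (FGSM) perturbation. The tiny classifier below is untrained and hypothetical; the point is only the mechanics, namely that an imperceptible, gradient-guided nudge to the pixels is all such attacks require. On real, trained vision systems, nudges this small routinely change the predicted label.

```python
# Illustrative-only sketch of an FGSM-style adversarial perturbation.
# The toy classifier is untrained and hypothetical; on real trained vision
# models, perturbations this small routinely flip the predicted label.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy classifier
image = torch.rand(1, 3, 32, 32, requires_grad=True)             # a "clean" input
label = model(image).argmax(dim=1)                                # current prediction

# Compute the gradient of the loss with respect to the input pixels.
loss = F.cross_entropy(model(image), label)
loss.backward()

epsilon = 2 / 255  # imperceptibly small per-pixel change
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("prediction before:", label.item())
print("prediction after: ", model(adversarial).argmax(dim=1).item())
```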
Opposition to LAWS
Unsurprisingly, many humanitarian groups are concerned about introducing a new generation of autonomous weapons to the battlefield. One such group is the Campaign to Stop Killer Robots, whose 2018 survey of roughly 19,000 people across 26 countries found that 61 percent of respondents said they oppose the use of LAWS.
In 2018, the United Nations Convention on Certain Conventional Weapons issued a rather vague set of guidelines aiming to restrict the use of LAWS. One guideline states that "human responsibility must be retained when it comes to decisions on the use of weapons systems." Meanwhile, at least a couple dozen nations have called for preemptive bans on LAWS.
The U.S. and Russia oppose such bans, while China's position is a bit ambiguous. It's impossible to predict how the international community will regulate AI-powered autonomous weapons in the future, but among the world's superpowers, one assumption seems safe: If these weapons provide a clear tactical advantage, they will be used on the battlefield.
A new method could make holograms for virtual reality, 3D printing, and more. It can even run on a smartphone.
Despite years of hype, virtual reality headsets have yet to topple TV or computer screens as the go-to devices for video viewing.
One reason: VR can make users feel sick. Nausea and eye strain can result because VR creates an illusion of 3D viewing although the user is in fact staring at a fixed-distance 2D display. The solution for better 3D visualization could lie in a 60-year-old technology remade for the digital world: holograms.
Holograms deliver an exceptional representation of the 3D world around us. Plus, they're beautiful. (Go ahead — check out the holographic dove on your Visa card.) Holograms offer a shifting perspective based on the viewer's position, and they allow the eye to adjust focal depth to alternately focus on foreground and background.
Researchers have long sought to make computer-generated holograms, but the process has traditionally required a supercomputer to churn through physics simulations, which is time-consuming and can yield less-than-photorealistic results. Now, MIT researchers have developed a new way to produce holograms almost instantly — and the deep learning-based method is so efficient that it can run on a laptop in the blink of an eye, the researchers say.
"People previously thought that with existing consumer-grade hardware, it was impossible to do real-time 3D holography computations," says Liang Shi, the study's lead author and a PhD student in MIT's Department of Electrical Engineering and Computer Science (EECS). "It's often been said that commercially available holographic displays will be around in 10 years, yet this statement has been around for decades."
Shi believes the new approach, which the team calls "tensor holography," will finally bring that elusive 10-year goal within reach. The advance could fuel a spillover of holography into fields like VR and 3D printing.
Shi worked on the study, published today in Nature, with his advisor and co-author Wojciech Matusik. Other co-authors include Beichen Li of EECS and the Computer Science and Artificial Intelligence Laboratory at MIT, as well as former MIT researchers Changil Kim (now at Facebook) and Petr Kellnhofer (now at Stanford University).
The quest for better 3D
Courtesy of the researchers
A typical lens-based photograph encodes the brightness of each light wave — a photo can faithfully reproduce a scene's colors, but it ultimately yields a flat image.
In contrast, a hologram encodes both the brightness and phase of each light wave. That combination delivers a truer depiction of a scene's parallax and depth. So, while a photograph of Monet's "Water Lilies" can highlight the paintings' color palette, a hologram can bring the work to life, rendering the unique 3D texture of each brush stroke. But despite their realism, holograms are a challenge to make and share.
First developed in the mid-1900s, early holograms were recorded optically. That required splitting a laser beam, with half the beam used to illuminate the subject and the other half used as a reference for the light waves' phase. This reference generates a hologram's unique sense of depth. The resulting images were static, so they couldn't capture motion. And they were hard copy only, making them difficult to reproduce and share.
Computer-generated holography sidesteps these challenges by simulating the optical setup. But the process can be a computational slog. "Because each point in the scene has a different depth, you can't apply the same operations for all of them," says Shi. "That increases the complexity significantly." Directing a clustered supercomputer to run these physics-based simulations could take seconds or minutes for a single holographic image. Plus, existing algorithms don't model occlusion with photorealistic precision. So Shi's team took a different approach: letting the computer teach physics to itself.
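As a rough picture of why the brute-force approach is slow (this is not the researchers' actual pipeline), consider a toy point-source simulation: every scene point contributes its own spherical wave to every hologram pixel, so the cost grows with the number of pixels times the number of points, and realistic scenes have millions of both. The wavelength, pixel pitch, and scene points below are made-up values chosen only to make the sketch runnable.

```python
# Toy illustration (not the MIT pipeline) of brute-force, physics-based
# hologram computation: each scene point emits a spherical wave, and every
# hologram pixel sums the contribution of every point.
import numpy as np

wavelength = 532e-9                 # green laser, meters
k = 2 * np.pi / wavelength          # wavenumber
pitch = 8e-6                        # hologram pixel pitch, meters
N = 256                             # hologram is N x N pixels

# A handful of scene points: (x, y, depth z, amplitude); values are made up.
points = np.array([
    [0.0,    0.0,   0.10, 1.0],
    [2e-4,  -1e-4,  0.12, 0.8],
    [-3e-4,  2e-4,  0.15, 0.6],
])

coords = (np.arange(N) - N / 2) * pitch
U, V = np.meshgrid(coords, coords)          # hologram-plane coordinates

field = np.zeros((N, N), dtype=complex)
for x, y, z, amp in points:                 # one full pass per scene point
    r = np.sqrt((U - x) ** 2 + (V - y) ** 2 + z ** 2)
    field += amp / r * np.exp(1j * k * r)   # spherical wave from this point

amplitude = np.abs(field)                   # what a photograph would record
phase = np.angle(field)                     # the extra information a hologram keeps
print(amplitude.shape, phase.shape)
```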
They used deep learning to accelerate computer-generated holography, allowing for real-time hologram generation. The team designed a convolutional neural network — a processing technique that uses a chain of trainable tensors to roughly mimic how humans process visual information. Training a neural network typically requires a large, high-quality dataset, which didn't previously exist for 3D holograms.
The team built a custom database of 4,000 pairs of computer-generated images. Each pair matched a picture — including color and depth information for each pixel — with its corresponding hologram. To create the holograms in the new database, the researchers used scenes with complex and variable shapes and colors, with the depth of pixels distributed evenly from the background to the foreground, and with a new set of physics-based calculations to handle occlusion. That approach resulted in photorealistic training data. Next, the algorithm got to work.
By learning from each image pair, the tensor network tweaked the parameters of its own calculations, successively enhancing its ability to create holograms. The fully optimized network operated orders of magnitude faster than physics-based calculations. That efficiency surprised the team themselves.
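For readers who want a feel for the training setup, here is a minimal sketch assuming synthetic (RGB-D image, hologram) pairs like those in the database described above. It is not the published tensor-holography network; the tiny convolutional model and the plain mean-squared-error loss are illustrative stand-ins.

```python
# Minimal training sketch, assuming synthetic (RGB-D, hologram) pairs.
# NOT the published tensor-holography network; sizes and loss are stand-ins.
import torch
import torch.nn as nn

class TinyHologramNet(nn.Module):
    """Maps a 4-channel RGB-D image to a 2-channel (amplitude, phase) hologram."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, rgbd):
        return self.net(rgbd)

model = TinyHologramNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in for one training batch of image/hologram pairs.
rgbd = torch.rand(8, 4, 64, 64)        # color + per-pixel depth
target = torch.rand(8, 2, 64, 64)      # precomputed amplitude + phase

for step in range(100):                # the real training loop runs far longer
    prediction = model(rgbd)
    loss = loss_fn(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```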
"We are amazed at how well it performs," says Matusik. In mere milliseconds, tensor holography can craft holograms from images with depth information — which is provided by typical computer-generated images and can be calculated from a multicamera setup or LiDAR sensor (both are standard on some new smartphones). This advance paves the way for real-time 3D holography. What's more, the compact tensor network requires less than 1 MB of memory. "It's negligible, considering the tens and hundreds of gigabytes available on the latest cell phone," he says.
The research "shows that true 3D holographic displays are practical with only moderate computational requirements," says Joel Kollin, a principal optical architect at Microsoft who was not involved with the research. He adds that "this paper shows marked improvement in image quality over previous work," which will "add realism and comfort for the viewer." Kollin also hints at the possibility that holographic displays like this could even be customized to a viewer's ophthalmic prescription. "Holographic displays can correct for aberrations in the eye. This makes it possible for a display image sharper than what the user could see with contacts or glasses, which only correct for low order aberrations like focus and astigmatism."
"A considerable leap"
Real-time 3D holography would enhance a slew of systems, from VR to 3D printing. The team says the new system could help immerse VR viewers in more realistic scenery, while eliminating eye strain and other side effects of long-term VR use. The technology could be easily deployed on displays that modulate the phase of light waves. Currently, most affordable consumer-grade displays modulate only brightness, though the cost of phase-modulating displays would fall if widely adopted.
Three-dimensional holography could also boost the development of volumetric 3D printing, the researchers say. This technology could prove faster and more precise than traditional layer-by-layer 3D printing, since volumetric 3D printing allows for the simultaneous projection of the entire 3D pattern. Other applications include microscopy, visualization of medical data, and the design of surfaces with unique optical properties.
"It's a considerable leap that could completely change people's attitudes toward holography," says Matusik. "We feel like neural networks were born for this task."
The work was supported, in part, by Sony.
New machine-learning algorithms from Columbia University detect cognitive impairment in older drivers.
An older person's cognitive health is not always obvious. Cognitive impairment and dementia manifest gradually over time, and a person may be unaware of their advance. During this subtle transition, such a person may continue living as they always have, going about their business at home and behind the wheel. But this could lead to a dangerous car accident.
So, researchers from Columbia University have announced the development of AI algorithms that can detect mild cognitive impairment and dementia in older people based on the way they drive. The authors report in the journal Geriatrics that their algorithm is 88 percent accurate.
"Driving is a complex task involving dynamic cognitive processes and requiring essential cognitive functions and perceptual motor skills," says senior author Guohua Li, professor of epidemiology. "Our study indicates that naturalistic driving behaviors can be used as comprehensive and reliable markers for mild cognitive impairment and dementia."
Random forest model
The algorithms the researchers developed were based on a common AI statistical method involving "decision trees" that form a "random forest model." The most successful algorithm, according to lead author Sharon Di, associate professor of civil engineering, was based on "variables derived from the naturalistic driving data and basic demographic characteristics, such as age, sex, race/ethnicity and education level."
Decision trees are familiar from internet memes in which answering "yes" or "no" about some attribute leads you down a path to another question, which in turn leads to a final conclusion.
Data used in the study
The algorithm was developed using data from the Longitudinal Research on Aging Drivers (LongROAD) study sponsored by the AAA Foundation for Traffic Safety. The data came from in-vehicle recording devices that captured the driving behaviors of 2,977 participants from August 2015 through March 2019. At the time the project began, the motorists' ages ranged from 65 to 79 years. From the raw data, the authors of the new study derived 29 behavioral variables, which they used to develop cognitive profiles of the drivers.
Credit: Zoran Zeremski/Adobe Stock
The researchers then developed a series of machine-learning models to predict cognitive issues, with differing success rates. Models based on driving variables alone were just 66 percent accurate, and models based on demographic characteristics alone fared worse at 29 percent, but combining both sets of variables produced an accuracy rate of 88 percent.
The researchers also explored the validity of individual factors as predictors of cognitive issues. In order of most reliable to least reliable, they were: (1) age; (2) percentage of trips traveled within 15 miles of home; (3) race/ethnicity; (4) minutes per round trip; and (5) number of hard braking events.
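As a rough illustration of how such a model fits together, here is a hypothetical scikit-learn sketch (not the Columbia team's code) that trains a random forest on synthetic driving and demographic features and then ranks them by importance. The column names, encodings, and data are invented for illustration.

```python
# Hypothetical sketch (not the Columbia team's code) of a random forest that
# combines driving variables with demographics, then ranks feature importance.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "age": rng.integers(65, 80, n),
    "pct_trips_within_15_miles": rng.uniform(0, 100, n),
    "race_ethnicity": rng.integers(0, 4, n),          # already label-encoded
    "minutes_per_round_trip": rng.uniform(5, 120, n),
    "hard_braking_events": rng.poisson(3, n),
})
labels = rng.integers(0, 2, n)  # 1 = mild cognitive impairment / dementia

X_train, X_test, y_train, y_test = train_test_split(data, labels, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

print("accuracy:", forest.score(X_test, y_test))
for name, importance in sorted(
    zip(data.columns, forest.feature_importances_), key=lambda t: -t[1]
):
    print(f"{name}: {importance:.3f}")
```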
Li is hopeful that his team's work can help keep roadways and older drivers safe. "If validated," he says, "the algorithms developed in this study could provide a novel, unobtrusive screening tool for early detection and management of mild cognitive impairment and dementia in older drivers."
By measuring a person's movements and poses, smart clothes could be used for athletic training, rehabilitation, or health monitoring.
In recent years there have been exciting breakthroughs in wearable technologies, like smartwatches that can monitor your breathing and blood oxygen levels.
But what about a wearable that can detect how you move as you do a physical activity or play a sport, and could potentially even offer feedback on how to improve your technique?
And, as a major bonus, what if the wearable were something you'd already be wearing anyway, like a shirt or a pair of socks?
That's the idea behind a new line of MIT-designed clothing that uses special fibers to sense a person's movement via touch. Among other things, the researchers showed that their clothes can determine whether the wearer is sitting, walking, or doing particular poses.
The group from MIT's Computer Science and Artificial Intelligence Lab (CSAIL) says that their clothes could be used for athletic training and rehabilitation. With patients' permission, they could even help passively monitor the health of residents in assisted-care facilities and determine if, for example, someone has fallen or is unconscious.
The researchers have developed a range of prototypes, from socks and gloves to a full vest. The team's "tactile electronics" use a mix of more typical textile fibers alongside a small amount of custom-made functional fibers that sense pressure from the person wearing the garment.
According to CSAIL graduate student Yiyue Luo, a key advantage of the team's design is that, unlike many existing wearable electronics, theirs can be incorporated into traditional large-scale clothing production. The machine-knitted tactile textiles are soft, stretchable, breathable, and can take a wide range of forms.
"Traditionally it's been hard to develop a mass-production wearable that provides high-accuracy data across a large number of sensors," says Luo, lead author on a new paper about the project that is appearing in this month's edition of Nature Electronics. "When you manufacture lots of sensor arrays, some of them will not work and some of them will work worse than others, so we developed a self-correcting mechanism that uses a self-supervised machine learning algorithm to recognize and adjust when certain sensors in the design are off-base."
The team's clothes have a range of capabilities. Their socks predict motion by looking at how different sequences of tactile footprints correlate to different poses as the user transitions from one pose to another. The full-sized vest can also detect the wearer's pose, activity, and the texture of the surfaces it contacts.
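A hedged sketch of the underlying idea: treat one frame of tactile-sensor pressure readings as a small image and classify it into a pose. The 32x32 sensor grid, the pose list, and the tiny network below are hypothetical, not the CSAIL team's actual design.

```python
# Hypothetical sketch: classify a wearer's pose from one frame of pressure
# readings. Grid size, pose list, and network are illustrative only.
import torch
import torch.nn as nn

POSES = ["sitting", "standing", "walking", "lunging"]

classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, len(POSES)),
)

pressure_frame = torch.rand(1, 1, 32, 32)   # one snapshot from the sock/vest grid
logits = classifier(pressure_frame)
print("predicted pose:", POSES[logits.argmax(dim=1).item()])
```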
The authors imagine a coach using the sensor to analyze people's postures and give suggestions on improvement. It could also be used by an experienced athlete to record their posture so that beginners can learn from them. In the long term, they even imagine that robots could be trained to learn how to do different activities using data from the wearables.
"Imagine robots that are no longer tactilely blind, and that have 'skins' that can provide tactile sensing just like we have as humans," says corresponding author Wan Shou, a postdoc at CSAIL. "Clothing with high-resolution tactile sensing opens up a lot of exciting new application areas for researchers to explore in the years to come."
The paper was co-written by MIT professors Antonio Torralba, Wojciech Matusik, and Tomás Palacios, alongside PhD students Yunzhu Li, Pratyusha Sharma, and Beichen Li; postdoc Kui Wu; and research engineer Michael Foshey.
The work was partially funded by Toyota Research Institute.