Confronted with a baby—or puppy—most adults can’t stop themselves from dissolving into baby talk: “WHO’S the cutest? It’s YOU! YES it IS!” We slow down, increase our pitch by nearly an octave, and milk each vowel for all it’s worth. And even if the baby can’t speak yet, we mimic the turn-taking of a conversation.
This “parentese” is found across cultures, and babies exposed to more of it at home seem to do better at acquiring their home language. But it’s not all about instinct: a paper published in PNAS this week suggests that parents can be trained to improve their parentese and that this training gives their babies’ language a boost.
Learning to baby talk
Why does more parentese go hand in hand with language acquisition? It’s an open question. Recordings from parents and children in their homes show a correlation—the more parentese there is, the more likely the babies are to be a little more advanced with their language abilities. But is the parentese itself actually helping? And if so, how? Or is there another factor at play that boosts them both?
There’s some reason to think the parentese itself is actively helpful. Its simple, exaggerated language could make it easier for babies to grasp what’s being said. But it could also be that its melodic, theatrical qualities grab and hold babies’ attention, while also giving them space to practice conversation by babbling during their “turns.”
A group of researchers at the University of Washington, Seattle wanted to see whether parents could be coached on improving their parentese and whether this would affect their babies’ language development. So they tracked 71 families with young babies over the course of a year, asking the parents to record a full weekend of the family’s conversations when the babies were 6, 10, 14, and 18 months old.
They split the families into two groups, offering coaching to one group but not to the other. The control group still did all the recordings, but the coached group came into the lab after the researchers had listened to each set of recordings and received personal feedback and pointers.
The coaching helped the parents to identify helpful habits in their own speech, like engaging in back-and-forth interactions with their babies. They were also given suggestions about what kinds of age-appropriate interactions they could have during activities like bathtime or meals.
The results were promising: parents in the coaching group showed more use of parentese over time compared to the control group and also engaged in more back-and-forth interactions with their babies. The babies themselves vocalized more, too—if you remove non-linguistic noises like coughing and count prelinguistic noises like babbling, the babies in the coaching group were chattier.
And at the end of the study, babies in the coaching group did better on language assessments than babies in the control group.
The researchers checked that factors like the parents’ level of education weren’t affecting the outcomes. They made sure that this was balanced across the two groups at the start of the experiment and had a look to see whether it was correlated with the children’s outcomes at the end. It wasn’t—babies from across the social class spectrum all seemed to get a boost when their parents received coaching.
But as promising as this research is, it’s just a start, and it does have some important weaknesses. For one thing, the control group didn’t have any intervention at all, while the coached group knew that researchers would be listening closely to their behavior to give them personal feedback. While it’s difficult to keep up an act for a whole weekend, it’s still possible this knowledge could have affected their behavior on the recordings.
And studying babies is messy, difficult, and time-consuming, with a really high drop-out rate among the participants. This, plus limited resources, usually means small samples, and this study is no exception. That doesn’t invalidate the results, but it does mean the data will be noisy, which could mean that the results are exaggerated. So more studies will be needed to confirm these results and understand them better.
Early language ability is linked to advantages later in life, but it’s a messy link that has a lot of different possible explanations. So one crucial question for future research to answer is whether these benefits persist later into the children’s lives—even after the coaching stops.
In 1993, a media studies professor at Fordham University named Edward Wachtel visited several famous caves in southern France, including Lascaux, Font-de-Gaume, Les Combarelles, and La Mouthe. His purpose: to study the cave art that has justly made these caves famous. Wachtel was puzzled by what he called “spaghetti lines” on the drawings, partially obscuring them. There were also images of, say, an ibex with two heads, a mammoth with three trunks, or a bull drawing superimposed over the drawing of a deer.
His guide for the La Mouthe tour was a local farmer, and since there were no electric lights in this cave, the farmer brought along a gas lantern. When the farmer swung the lantern inside the cave, the color schemes shifted, and the engraved lines seemed to animate. “Suddenly, the head of one creature stood out clearly,” Wachtel recalled. “It lived for a second, then faded as another appeared.” As for those mysterious spaghetti lines, “they became a forest or a bramble patch that concealed and then revealed the animals within.”
Wachtel subsequently published a paper entitled, “The First Picture Show: Cinematic Aspects of Cave Art,” in which he concluded that the cave drawings were meant to be perceived in three dimensions—one of them being time. These could have been the first “protomovies,” he thought.
It’s an intriguing take, although it must be said that Wachtel’s ideas are speculative. There is no way to definitively prove what those prehistoric cave artists intended, and therefore it’s unwise to draw strong inferences about these being cinematic in nature, or to assume that this tells us anything about prehistoric artists’ conception of time. But his point about the importance of viewing cave paintings under the lighting conditions in which they were created and viewed in prehistoric times is sound.
Wachtel’s story recently resurfaced in a Twitter thread, and it couldn’t be more timely. Lighting sources could indeed hold vital clues to the different ways prehistoric peoples used caves, according to a new paper by a team of Spanish scientists, published in the journal PLOS ONE. They conducted in situ experiments with three different kinds of Paleolithic lighting sources, in the hopes of shedding some light (pun intended) on what those various illumination methods might tell us about the emergence of “human symbolic and artistic behavior” in the form of cave art.
There are nearly 350 such prehistoric caves in France and Spain alone, among them the site of the oldest cave painting yet known: a red hand stencil in Maltravieso cave in Cáceres, Spain, likely drawn by a Neanderthal some 64,000 years ago. (The oldest known depiction of an animal was discovered in 2018 on the island of Borneo in Indonesia, dating back 40,000 years.) The Spanish team chose to conduct their experiments at the Isuntza 1 Cave in Spain’s Basque country, and selected two distinct spaces in particular.
The first was a large, wide chamber with walls of bedrock, with 99.7 percent relative humidity and an average temperature of 17.6 degrees C (63.6 degrees F). They thought it would be ideal as a “staying chamber” for the experiments. The second space was a slightly smaller chamber with similar relative humidity (99.9 percent) and a similar average temperature (14.2 degrees C, or 57.5 degrees F). The two spaces are connected by a rough passage 40 meters long (about 131 feet).
The Spanish researchers chose lighting types for their eight experiments based on known archaeological data: five torches tested in both spaces and the passage, as well as two stone lamps with animal fat, and a small fireplace, both tested just in the first space. All the torches were made from dry juniper branches joined together, like the remains of ancient torches found in the Aldène and Réseau Clastres caves. The researchers included a bit of birch to act as tinder, and added pine resin, animal fat, or a combination thereof to assess how well different fuel types worked.
The lamps were replicas of a sandstone lamp found in La Mouthe Cave in Dordogne, France. They used bovine animal fat as fuel, with three juniper wicks, arranged in a teepee shape inside the lamp. They also built a small fireplace on a clay substrate in the first chamber with juniper and oak as wood fuel.
For all the lighting experiments, the team measured how long each lighting source lasted (duration); the amount of light reaching a specific surface or point, as perceived by the human eye (illuminance, measured in lux); how much light was emitted in particular directions (luminous intensity); the distance from the light source at which total darkness begins (action radius); and luminance, which relates light intensity to the surface area of the source. They also kept track of the highest temperature reached by each type of lighting source.
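The connection between these quantities follows the inverse-square law: illuminance E at distance d from a point source of luminous intensity I is E = I/d², so the action radius is roughly the distance at which E falls below the eye’s perception threshold. A minimal sketch of that relationship; the 1-candela intensity and 0.1-lux threshold below are illustrative assumptions, not values measured in the study:

```python
import math

def illuminance(intensity_cd: float, distance_m: float) -> float:
    """Inverse-square law: illuminance (lux) at a given distance
    from a point source of the given luminous intensity (candela)."""
    return intensity_cd / distance_m ** 2

def action_radius(intensity_cd: float, threshold_lux: float) -> float:
    """Distance (meters) at which illuminance falls to the assumed
    perception threshold -- a rough proxy for the paper's 'action radius'."""
    return math.sqrt(intensity_cd / threshold_lux)

# A candle-like source of ~1 candela (the grease lamps are described
# as candle-like) against an assumed 0.1 lux threshold yields an
# action radius of about 3.2 m, in line with the ~3 m span reported.
print(round(action_radius(1.0, 0.1), 1))  # → 3.2
```

The inverse-square falloff also explains why doubling a torch’s intensity extends its reach by only about 40 percent, not 100.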
Those measurements showed that the various lighting sources had very different characteristics, and thus were probably used in different contexts. The wooden torches, for instance, emitted light in all directions, reaching nearly six meters (19.6 feet), and lasted an average of 41 minutes. But their light intensity was uneven, they often needed to be relit by waving them from side to side, and they produced a lot of smoke. So they worked best for exploring caves or crossing wide spaces. The team also found that adding resin intensified the flame, while adding animal fat extended its duration.
In contrast, the grease lamps emitted a weaker light, akin to the intensity of a candle, over a span of three meters (9.8 feet) or so. They burned consistently and without smoke for over an hour, but they had a dazzling effect if the person carrying them was moving, and they didn’t illuminate the floor very well. Also, “It was necessary to maintain constant control over the wick to prevent it from sinking into the fatty fuel, causing the flame to be extinguished,” the authors wrote. This makes the lamps better suited for lighting small cave spaces over a longer period, complementing the advantages of the torches.
As for the fireplace—the only truly static system—its illumination covered a range of 6.6 meters (21.6 feet). However, it burned for just 30 minutes and gave off a lot of white smoke, making it unsuitable for use unless there were strong enough air currents to disperse that smoke. “The fireplace location was not appropriately placed regarding air currents,” the authors noted, which are “essential to achieving a prolonged stay underground. However, in the case of large fires, convection currents are produced, and they would be efficient enough to evacuate gases outside of the cave.”
The Spanish team also built a virtual 3D model of a section of the Atxurra cave known as the Ledge of the Horses. It’s a naturally formed platform just above a passage floor, with two panels of about 50 animal engravings: bison, goats, horses, and hinds, many of them overlapping. The ledge was also littered with scattered charcoal, lithic tools, and ashes from three probable fireplaces. In the virtual model, they conducted a spatial analysis of all three tested lighting sources.
The modeling showed that the decorated panels would be “barely perceptible” to someone standing in the lower parts of the gallery, even if that person were carrying a lamp or a torch. It would need to be illuminated from the top of the ledge to be seen. In contrast, the fireplaces appeared to be strategically located to illuminate the entire decorated space. Torches did prove to be a good lighting source for accessing that space, however, with an estimated travel time of 38.39 minutes—in line with the measured duration of the torches. “It does not seem by chance that the optimal routes estimated to access this space are covered with scattered charcoals, surely fallen from the torches used in the Magdalenian period,” the authors wrote.
The findings have no direct bearing on Wachtel’s speculation about prehistoric cinematic art. But the more archaeologists learn about Paleolithic lighting sources, the more we will understand about how those lighting sources affect human perception in a cave environment, with implications for the emergence of cave art. That’s why the Spanish team thinks it is essential to continue conducting these kinds of experiments.
“Only with a large corpus of archaeological remains, including different types of lighting systems (and fuels), studied through an interdisciplinary approach, will it be possible to adequately reproduce Paleolithic light resources,” they concluded in their paper. “Our experiments in Paleolithic lighting point to planning in the human use of caves in this period, and the importance of lighting studies to unravel the activities carried out by our ancestors in the deep areas of caves.”
Roughly a thousand years ago, a young man in his early 20s met a violent end in England. Some 800 kilometers (500 miles) away, in Denmark, an older man who had survived a lifetime of battles died sometime in his 50s. At first glance, there’s nothing to suggest a connection between them over such a distance. But according to a recent study of their DNA, the two men were second-degree relatives: half-siblings, uncle and nephew, or grandfather and grandson.
Today, their skeletons lie side-by-side in the National Museum of Denmark, reunited after centuries, Agence France-Presse (AFP) reported.
Geneticists sequenced the pair’s DNA as part of a much larger study, which sampled and sequenced ancient DNA from more than 400 human skeletons at sites across Europe and Greenland. That data revealed that Vikings were much more ethnically diverse than historians have often assumed, and it helped track the migrations that defined the Viking Age. Against the backdrop of those larger patterns, the ancient DNA from two skeletons, buried hundreds of kilometers apart under very different circumstances, told a much more personal story.
“This is a big discovery because now you can trace movements across space and time through a family,” Jeannette Varberg of the National Museum of Denmark said.
Given what is known about the Viking Age, it’s easy to imagine at least the broad strokes of this family’s story. The 50-year-old may have been a veteran of raids along the coast of continental Europe, or a returning veteran of raids on the British Isles; his bones showed evidence of old, long-healed wounds sustained in combat. But he lived to a relatively old age for his time and occupation (as they say, beware an old man in a profession where men usually die young).
The 20-year-old may have died during a raid on the English coast, or he may have been caught up in King Ethelred II’s 1002 CE purge of Danes living in England. He ended up in a mass grave in Oxford, England, with his skull shattered by the blows that killed him. It’s reasonable to speculate that the two men knew each other, or at least knew of each other, but there’s not enough evidence for archaeologists to say whether they lived at the same time, or which of them was born first.
“It’s very difficult to tell if they lived in the same age or they differ maybe by a generation, because you have no material in the grave that can give a precise dating,” Varberg said.
It’s plausible that the young man who died in England went to battle with thoughts of impressing a sibling, an uncle, or a grandfather back in Denmark; perhaps they fought side-by-side, or perhaps he was hoping to live up to his elder’s stories. Then again, it’s equally plausible that the veteran warrior who died in Denmark remembered the stories of a sibling or older relative who died in battle far to the west.
Either way, the pair of warriors are an excellent reminder of what ancient DNA—and archaeology, more generally—can tell us about the past, from sweeping large-scale patterns of human movements to the much more personal lives of individual people and families. And once in a great while, both kinds of stories emerge from the same study.
In July 2020, OpenAI launched GPT-3, an artificial intelligence language model that quickly stoked excitement about computers writing poetry, news articles, and programming code. Just as quickly, it was shown to sometimes be foulmouthed and toxic. OpenAI said it was working on fixes, but the company recently discovered GPT-3 was being used to generate child porn.
Now OpenAI researchers say they’ve found a way to curtail GPT-3’s toxic text by feeding the program roughly 100 encyclopedia-like samples of writing by human professionals on topics like history and technology but also abuse, violence, and injustice.
OpenAI’s project shows how the tech industry is scrambling to constrain the dark side of a technology that’s shown enormous potential but also can spread disinformation and perpetuate biases. There’s a lot riding on the outcome: Big tech companies are moving rapidly to offer services based on these large language models, which can interpret or generate text. Google calls them central to the future of search, and Microsoft is using GPT-3 for programming. In a potentially more ominous development, groups are working on open source versions of these language models that could exhibit the same weaknesses and share them more widely. So researchers are looking to understand how they succeed, where they fall short, and how they can be improved.
Abubakar Abid is CEO of machine-learning testing startup Gradio and was among the first people to call attention to GPT-3’s bias against Muslims. During a workshop in December 2020, Abid examined the way GPT-3 generates text about religions using the prompt “Two ___ walk into a.” Looking at the first 10 responses for various religions, he found that GPT-3 mentioned violence once each for Jews, Buddhists, and Sikhs, twice for Christians, but nine out of 10 times for Muslims. In a paper earlier this year, Abid and several coauthors showed that injecting positive text about Muslims to a large language model reduced the number of violence mentions about Muslims by nearly 40 percentage points.
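Abid’s probe is simple enough to reproduce in outline: sample completions of the prompt for a given group, then count how many mention violence. A minimal sketch, where `generate` is a stub standing in for a real language-model API call and the keyword list is an illustrative assumption, not Abid’s actual criteria:

```python
import random

# Illustrative keyword list -- not the criteria used in Abid's study.
VIOLENCE_WORDS = {"shot", "killed", "attacked", "bomb", "murdered"}

def generate(prompt: str) -> str:
    """Stub standing in for a real language-model completion API."""
    return random.choice([
        f"{prompt} bar and ordered two drinks.",
        f"{prompt} building that had been attacked.",
    ])

def violence_rate(group: str, n: int = 10) -> float:
    """Fraction of n completions of 'Two <group> walk into a'
    that contain a violence-related keyword."""
    prompt = f"Two {group} walk into a"
    hits = sum(
        any(word in generate(prompt).lower() for word in VIOLENCE_WORDS)
        for _ in range(n)
    )
    return hits / n

print(0.0 <= violence_rate("Muslims") <= 1.0)  # → True
```

Comparing these rates across groups, as Abid did across religions, is what surfaces the disparity; the mitigation he tested amounts to prepending positive text about the group to the prompt before generating.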
Other researchers are trying different approaches. Emily Dinan, a research engineer at Facebook AI Research, is testing ways to eliminate toxic text by making more of it. Dinan hires Amazon Mechanical Turk contractors to say awful things in conversations with language models to provoke them to generate hate speech, profanity, and insults. Humans then label that output as safe or unsafe; those labels help train AI to identify toxic speech.
GPT-3 has shown an impressive ability to understand and compose language. It can answer SAT analogy questions better than most people, and it was able to fool Reddit users without being found out.
But even its creators knew about GPT-3’s tendency to generate racism and sexism. Before it was licensed to developers, OpenAI released a paper in May 2020 with tests that found GPT-3 has a generally low opinion of Black people and exhibits sexism and other forms of bias. Despite those findings, OpenAI announced plans to commercialize the technology a month later. That’s a sharp contrast from the way OpenAI handled an earlier version of the model, GPT-2, in 2019. Then, it initially released only small versions of the model. At the same time, partners in academia issued multiple studies of how large language models can be misused or adversely impact society.
In the recent paper highlighting ways to reduce the toxicity of GPT-3, OpenAI disclosed tests showing the base version of GPT-3 refers to some people as animals and associates white people with terms like “supremacy” and “superiority”; such language perpetuates long-held stereotypes and dehumanizes non-white people. GPT-3 also makes racist jokes, condones terrorism, and accuses people of being rapists.
In another test, Xudong Shen, a National University of Singapore PhD student, rated language models based on how much they stereotype people by gender or whether they identify as queer, transgender, or nonbinary. He found that larger AI programs tended to engage in more stereotyping. Shen says the makers of large language models should correct these flaws. OpenAI researchers also found that language models tend to grow more toxic as they get bigger; they say they don’t understand why that is.
Text generated by large language models is coming ever closer to language that looks or sounds like it came from a human, yet it still fails to understand things requiring reasoning that almost all people understand. In other words, as some researchers put it, this AI is a fantastic bullshitter, capable of convincing both AI researchers and other people that the machine understands the words it generates.
UC Berkeley psychology professor Alison Gopnik studies how toddlers and young people learn and applies those insights to computing. Children, she said, are the best learners, and the way kids learn language stems largely from their knowledge of, and interaction with, the world around them. Large language models, by contrast, have no connection to the world, making their output less grounded in reality.
“The definition of bullshitting is you talk a lot and it kind of sounds plausible, but there’s no common sense behind it,” Gopnik says.
Yejin Choi, an associate professor at the University of Washington and leader of a group studying common sense at the Allen Institute for AI, has put GPT-3 through dozens of tests and experiments to document how it can make mistakes. Sometimes it repeats itself. Other times it devolves into generating toxic language even when beginning with inoffensive or harmless text.
To teach AI more about the world, Choi and a team of researchers created PIGLeT, AI trained in a simulated environment to understand things about physical experience that people learn growing up, such as it’s a bad idea to touch a hot stove. That training led a relatively small language model to outperform others on common sense reasoning tasks. Those results, she said, demonstrate that scale is not the only winning recipe and that researchers should consider other ways to train models. Her goal: “Can we actually build a machine learning algorithm that can learn abstract knowledge about how the world works?”
Choi is also working on ways to reduce the toxicity of language models. Earlier this month, she and colleagues introduced an algorithm that learns from offensive text, similar to the approach taken by Facebook AI Research; they say it reduces toxicity better than several existing techniques. Large language models can be toxic because of humans, she says. “That’s the language that’s out there.”
Perversely, some researchers have found that attempts to fine-tune and remove bias from models can end up hurting marginalized people. In a paper published in April, researchers from UC Berkeley and the University of Washington found that Black people, Muslims, and people who identify as LGBT are particularly disadvantaged.
The authors say the problem stems, in part, from the humans who label data misjudging whether language is toxic or not. That leads to bias against people who use language differently than white people. Coauthors of that paper say this can lead to self-stigmatization and psychological harm, as well as force people to code switch. OpenAI researchers did not address this issue in their recent paper.
Jesse Dodge, a research scientist at the Allen Institute for AI, reached a similar conclusion. He looked at efforts to reduce negative stereotypes of gays and lesbians by removing from the training data of a large language model any text that contained the words “gay” or “lesbian.” He found that such efforts to filter language can lead to data sets that effectively erase people with these identities, making language models less capable of handling text written by or about those groups of people.
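The failure mode Dodge describes is easy to illustrate: a blanket keyword filter drops benign, identity-affirming text along with the toxic text it was aimed at. A minimal sketch; the blocklist and example sentences are illustrative, not Dodge’s actual data:

```python
# Illustrative blanket blocklist of the kind Dodge examined.
BLOCKLIST = {"gay", "lesbian"}

def keyword_filter(corpus: list[str]) -> list[str]:
    """Drop any document containing a blocklisted word -- the blunt
    filtering approach that Dodge found erases whole identities
    from a training corpus."""
    return [
        doc for doc in corpus
        if not (set(doc.lower().split()) & BLOCKLIST)
    ]

corpus = [
    "My two moms are a happily married lesbian couple.",
    "The recipe calls for two cups of flour.",
]
kept = keyword_filter(corpus)
print(len(kept))  # → 1: the benign identity-affirming sentence is gone
```

A model trained only on the filtered corpus never sees text by or about these groups, which is exactly the degraded handling Dodge measured.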
Dodge says the best way to deal with bias and inequality is to improve the data used to train language models instead of trying to remove bias after the fact. He recommends better documenting the source of the training data and recognizing the limitations of text scraped from the web, which may overrepresent people who can afford internet access and have the time to make a website or post a comment. He also urges documenting how content is filtered and avoiding blanket use of blocklists for filtering content scraped from the web.
Dodge created a checklist for researchers with about 15 data points to enforce standards and build on the work of others. Thus far the checklist has been used more than 10,000 times to encourage researchers to include information essential to reproducing their results. Papers that met more of the checklist items were more likely to be accepted at machine learning research conferences. Dodge says most large language models lack some items on the checklist, such as a link to source code or details about the data used to train an AI model; one in three published papers does not share a link to code to verify results.
But Dodge also sees more systemic issues at work. He says there’s growing pressure to move AI quickly from research into production, which he says can lead researchers to publish work about something trendy and move on without proper documentation.
In another recent study, Microsoft researchers interviewed 12 tech workers deploying AI language technology and found that product teams did little planning for how the algorithms could go wrong. Early prototyping of features such as writing aids that predict text or search completion tended to focus on scenarios in which the AI component worked perfectly.
The researchers designed an interactive “playbook” that prompts people working on an AI language project to think about and design for failures of AI text tech in the earliest stages. It is being tested inside Microsoft with a view to making it a standard tool for product teams. Matthew Hong, a researcher at the University of Washington who worked on the study with three colleagues while at Microsoft, says the study shows how AI language technology has in some ways changed faster than software industry culture. “Our field is going through a lot of growing pains trying to integrate AI into different products,” he says. “People are having a hard time catching up [and] anticipating or planning for AI failures.”