APS News

Profiles in Versatility

Physicist Tackles Voynich Manuscript with Statistical Methods

By Sophia Chen


Marcelo Montemurro insists that he isn’t obsessed with the Voynich manuscript.

The UK-based physicist has never held its calfskin pages in his hands. He has never been to the Yale Library stacks where the codex is kept. He doesn’t know what the manuscript says, and he doesn’t know if anyone ever will.

He just likes a good puzzle — and the Voynich manuscript is the king of them all.

No one knows who wrote its 200-something pages, nor the alphabet that was used. One passage, transliterated to Latin letters, reads: "qokedydy qokoloky qokeedy qokedy shedy." Its pages feature illustrations of unidentified plants, astronomical bodies — and naked women bathing in green liquid.

What experts do know: The calfskin has been carbon-dated to the 15th century. Around the turn of the 17th century, Rudolf II, Holy Roman Emperor, paid 600 gold ducats for it, about $40,000 in today’s dollars. For about 200 years, no records about the manuscript exist. In 1912, a Polish book dealer named Wilfrid Voynich found it stashed in the summer villa of a Renaissance pope. He asked approximately twenty of the best World War I cryptologists to make sense of the unknown script. They failed to decode a single word.

Academics puzzled over it for decades. From the sparse Latin scrawled on some of the pages, they argued over whether the author could be Roger Bacon, a medieval philosopher. Researchers transcribed the document into computer-readable scripts. In 2014, botanists claimed one of the images resembled a soap plant from Mexico. They suggested the manuscript was tied to the Aztecs.

Enter Montemurro. In 2006, he stumbled upon an article about the Voynich in a two-year-old issue of Scientific American. In the article, Gordon Rugg, a British computer scientist and psychologist, argued that the manuscript was meaningless. He’d overlaid a card with holes cut in it over a grid of random syllables — a Renaissance-era cryptography technique — and claimed he could produce Voynich-like words. Rugg thought that it could be a forger’s attempt to con a customer into buying a fake manuscript.

"I became absolutely interested in it," Montemurro says.

At the time, he was a relatively fresh hire at the University of Manchester’s department of neuroscience and psychology, where he still works today, studying neural networks.

But he had also researched linguistics for about five years, beginning when he was a physics Ph.D. student studying disordered magnets at Argentina’s National University of Córdoba. He set aside time from his Ph.D. work to dig through Shakespeare’s plays. With colleagues, he analyzed how the playwright distributed different types of speech throughout his work and searched for empirical laws that roughly predicted how sentences were strung together.

He thought about how to categorize language. Should he tally sentence by sentence or paragraph by paragraph? How do the statistics differ as sampling size changes?

The work, which he eventually published, wasn’t so different from physics, Montemurro says. To a physicist, a sentence is just a signal transmitted between two nodes in a network — where each node is a human brain. "I see language as a natural phenomenon on top of a social phenomenon," he says.

The Voynich was "the ideal system" to test his linguistic hypotheses. Since no one knew whether Voynichese is a language in the first place, he thought he could approach it with fewer preconceptions. So he downloaded the digital version of the manuscript.

To study the manuscript, he uses the concept of linguistic entropy, first proposed by Claude Shannon in 1950. It describes the level of predictability in a text. A sequence of repeating letters like "aaaaaaaa" has zero entropy because each letter can be predicted from its previous one. A random sequence of letters, however, has high entropy.
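The contrast between the two extremes can be made concrete with a few lines of Python. The sketch below is only an illustration, not Montemurro's own analysis: it computes the simplest form of Shannon entropy, in bits per letter, from letter frequencies alone and ignores the context-based predictability the paragraph above describes. The function name `shannon_entropy` is a hypothetical label chosen for this example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Frequency-based Shannon entropy of `text`, in bits per symbol.

    Illustrative only: this looks at how often each symbol occurs and
    ignores any dependence of a symbol on the ones before it.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum of p * log2(1/p) over every distinct symbol.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A single repeated letter is perfectly predictable.
print(shannon_entropy("aaaaaaaa"))
# Eight different letters, each used once, is the least predictable
# eight-letter string under this measure.
print(shannon_entropy("abcdefgh"))
```

Under this measure the repeated string scores zero bits per letter, while the string of eight distinct letters scores three, the maximum possible for eight equally likely symbols.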

Language, of course, is neither fully ordered nor fully random. Grammar provides a repeating structure, but content words differ between sections. By analyzing the entropy of a word, you can guess its purpose: Is it a proper noun that shows up in bursts, or an evenly distributed conjunction like "and"? Montemurro uses entropy to quantify information in a text without having to understand it at all.
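One way to picture the difference between a bursty word and an evenly spread one is to cut a text into equal chunks and ask how uniformly a given word is distributed across them. The Python sketch below is a simplified illustration under that assumption, not the method from Montemurro's papers; the function name `spread_entropy`, the choice of four chunks, and the toy sentence are all hypothetical.

```python
import math


def spread_entropy(tokens, word, parts=4):
    """Normalized entropy (0 to 1) of how `word` is spread over `parts` chunks.

    Illustrative only. The token list is cut into `parts` equal-sized
    chunks (leftover tokens at the end are ignored). A score near 1 means
    the word appears evenly everywhere; a score near 0 means it is
    concentrated in one chunk.
    """
    size = len(tokens) // parts
    counts = [tokens[i * size:(i + 1) * size].count(word) for i in range(parts)]
    total = sum(counts)
    if total == 0 or parts < 2:
        return 0.0
    # Entropy of the word's distribution over chunks, skipping empty chunks.
    h = sum((c / total) * math.log2(total / c) for c in counts if c)
    return h / math.log2(parts)


# Toy text: four chunks of four words each.
tokens = (
    "and voynich voynich voynich "
    "and the plant root "
    "and the star sky "
    "and the bath water"
).split()

print(spread_entropy(tokens, "and"))      # evenly spread across all chunks
print(spread_entropy(tokens, "voynich"))  # confined to the first chunk
```

In this toy text, "and" appears once in every chunk and gets the highest possible score, while "voynich" appears only in the first chunk and gets the lowest. Nothing in the calculation requires knowing what either word means.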

The work doesn’t fit in a conventional academic field, says Mirko Degli Esposti, a University of Bologna mathematician who collaborates with Montemurro. Some humanities experts distrust statistical methods. "They think that using these quantitative methods on a piece of poetry somehow kills all the creativity inside," Degli Esposti says.

Quantitative experts doubt the work, too. Math purists, for example, think humanities problems "contaminate" the field. But these "contaminations" keep math relevant, Degli Esposti says.

"A century ago [the contamination] was physics," he says. "Now it is medicine, language, and the humanities."

More physicists and mathematicians are studying humanities problems, Degli Esposti says. The problems can be practical, too: For example, he researches plagiarism detection.

Montemurro doesn’t care which box academics place him in. He doesn’t get funding to study the Voynich, anyway. He considers himself a physicist because of how he approaches a problem. Inspired by statistical physics, he wants to develop statistical theory to describe how individual words create language, like the way individual particles create magnetism.

With a colleague, Montemurro published a paper in 2013 that compares Voynich statistics to English, Chinese, Latin, and even Fortran. The manuscript’s statistics resemble those of real languages too much to be fake, they wrote.

Their next step, Montemurro says, is to correlate the manuscript’s words with its images. "It is likely that some of the language written close to the figures refers to the figures," he says.

Some of his fascination with the manuscript is, of course, its Da Vinci Code factor. Can it reveal something new about the culture of the Middle Ages?

"People speculate that the Voynich contains forbidden medicinal information or alchemy, or something that could not be disseminated freely at the time," he says. "It would be significant for a historian to learn."

But whenever the excitement builds in his voice, he steadies himself.

"The Voynich has a problem," he says. "Some people become obsessed with it and really start feeling very personal about their theories. … For me, I’m quite open. I’m not defending a particular theory about it."

What if the botanists’ Aztec hypothesis is correct? What if someone confirms that a con man faked it? "If they come with substantial evidence, I won’t feel personally betrayed," he says. "That would be fantastic."

The author is a freelance science writer based in Tucson, Arizona.

Marcelo Montemurro


©1995 - 2017, AMERICAN PHYSICAL SOCIETY
APS encourages the redistribution of the materials included in this newspaper provided that attribution to the source is noted and the materials are not truncated or changed.

Editor: David Voss
Staff Science Writer: Rachel Gaal
Contributing Correspondent: Alaina G. Levine
Publication Designer and Production: Nancy Bennett-Karasik