Theories of Everything with Curt Jaimungal

Greg Kondrak: Voynich Manuscript, Artificial Intelligence, Zodiac, Dorabella

February 4, 2023 1:29:11

Transcript

[0:00] The Economist covers math, physics, philosophy, and AI in a manner that shows how different countries perceive developments and how they impact markets. They recently published a piece on China's new neutrino detector. They cover extending life via mitochondrial transplants, creating an entirely new field of medicine. But it's also not just science they analyze.
[0:20] They analyze culture, finance, economics, business, and international affairs across every region. I'm particularly liking their new insider feature. It was just launched this month. It gives you, it gives me, front-row access to The Economist's internal editorial debates,
[0:36] where senior editors argue through the news with world leaders and policy makers in twice-weekly long-format shows. Basically an extremely high-quality podcast. Whether it's scientific innovation or shifting global politics, The Economist provides comprehensive coverage beyond headlines. As a TOE listener, you get a special discount. Head over to economist.com slash TOE to subscribe. That's economist.com slash TOE for your discount.
[1:06] You really want to be educated to be somebody who cannot be replaced by a computer. And I guarantee you that it will never be able to replace the most important part of us, which is the creativity. Are you sure about that?
[1:26] Today we talk about the Voynich manuscript, the Zodiac cipher, specialized non-human languages, the Dorabella cipher, ciphers in general, and how to go about decrypting them. Greg Kondrak is a professor of artificial intelligence, investigating natural language processing as it relates to language reconstruction. He's employed these machine learning techniques to attempt what can be considered the most
[1:47] objective decipherment of the Voynich manuscript, which is a considerably rare illustrated codex handwritten in an otherwise unknown writing system. It's evaded any attempt to decode it since the Italian Renaissance. The Voynich manuscript is written on expensive vellum, and it's just one of several puzzles that Professor Kondrak has tackled, another example being the Dorabella cipher. Greg Kondrak is also known for proving Chomsky's statement wrong, the statement that English orthography is close to optimal.
[2:12] My name is Curt Jaimungal. I have a background in mathematical physics. This podcast, Theories of Everything, is dedicated to the exploration of theories of everything from a theoretical physics perspective, as well as exploring the role consciousness has in the fundamental laws of nature. Each sponsor, as well as the patrons, improves the quality of the videos drastically: it improves the depth, it improves the frequency, and it goes toward paying the staff, for instance, someone who's editing this full-time right now, and then we have an operations manager. In that vein, I want to thank today's sponsor, Brilliant.
[2:42] If you're familiar with TOE, you're familiar with Brilliant, but for those who don't know, Brilliant is a place where you go to learn math, science, and engineering through these bite-sized interactive learning experiences. For example, and I keep saying this, I would like to do a podcast on information theory, particularly since Chiara Marletto, who is David Deutsch's student, has a theory of everything that she puts forward called constructor theory, which is heavily contingent on information theory. So I took their course on random variable distributions and knowledge and uncertainty,
[3:11] in order to learn a bit more about entropy. Now there's this formula for entropy, essentially hammered into you as an undergraduate, which seems to have fallen from the sky. However, when you take Brilliant's course, it was the first time that I could see that it's an extremely clear and intuitive formula. That is to say that it would be unnatural to define it in any other manner. Visit brilliant.org slash TOE, that is T-O-E, to get 20% off the annual subscription. And I recommend that you don't stop before four lessons.
[3:39] I think you'll be greatly surprised at the ease at which you can now comprehend subjects you previously had a difficult time grokking. At some point, I'll also go through the courses and give a recommendation
[4:02] Professor, what is the Voynich manuscript and why is it important?
[4:17] The Voynich manuscript is a medieval manuscript written in some kind of code. It was confirmed to be a genuine manuscript from the 15th century. It has illustrations, it has text. The script or the alphabet is unique to the manuscript. It has not been deciphered yet.
[4:49] Why hasn't it been deciphered? We don't know why it hasn't been deciphered. Some people say it's because there's nothing to decipher. It's just some kind of a joke. Or other people say that the encoding system is very complicated. And other people have other theories about it. Why do people say it's a joke and what do you make of that?
[5:18] Well, personally, I don't think, as I said, that this is a testable scientific hypothesis. You can guess that it's a joke, but it's very hard to prove something like that.
[5:33] Obviously, it cost a lot of money to produce this kind of manuscript in the Middle Ages, so that's one reason I don't think it was a joke. But also, there's some work showing that there are some statistical properties that indicate there is actually a language that is being encoded.
[6:00] I've seen several documentaries on the Voynich manuscript, several, so not just one or two, there's a variety of them and then there's also a whole subreddit, so a whole reddit group dedicated to solving this. Why is this so difficult compared to other ciphers in the past? Like, what is it about this?
[6:18] Well, the first difficulty here is that we don't know what language it is. Very often we have ciphers, we have messages, but we know what the language is. For example, the German Enigma machine was a very hard cipher, but we knew that it was German being encoded, which made it quite a bit easier.
[6:45] And the second thing is that we don't know the script or the alphabet. If that alphabet was used for something else, we would know how to speak it, how to pronounce these words. And third, the problem is that this is just a unique document. There's no other document that is written in this way. So it's all self-contained and we don't even know, we're not exactly sure where it was even produced.
[7:14] It's strange that there's no other document that's like that. Firstly, that the script is different, like you mentioned that we don't know how to pronounce the words. Though, even in the Enigma code, it's not as if the words were meant to be pronounced and then understood like that. You could have translated it to zeros and ones, and it still would have been difficult. It still would have been the same problem.
[7:32] So, why does the fact that we don't understand how to pronounce the alphabet make a difference? Why can't we just say, look, this letter appears, let's call that letter 28, let's call that other letter 50? Well, yeah, well, so we have letters, but we have words. Words are made of letters. That's the truth in every language, human language, that you have words that are made of
[7:53] phonemes, and usually every phoneme has its own letter. And then there are certain utterance regularities: for example, there are certain words that can be pronounced and other words that cannot be pronounced, and that varies from language to language.
[8:15] So the moment we know what language is being spoken, we usually know the lexicon of that language, typically a few thousand words, and they have different frequencies, so that makes it a lot easier to decipher anything. Even if you just replace every word with a number, that will still give you some information about the frequency of the words, as long as you have something to compare it to.
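A minimal Python sketch of the frequency point made above: replacing every word with a number hides the words but leaves the shape of the frequency distribution intact. The sample sentence and the helper names `encode_words` and `rank_profile` are made up for illustration and do not come from the interview.

```python
from collections import Counter

def encode_words(text):
    """Replace every distinct word with an arbitrary number (a toy word-level code)."""
    codebook = {}
    encoded = []
    for word in text.lower().split():
        if word not in codebook:
            codebook[word] = len(codebook) + 1
        encoded.append(codebook[word])
    return encoded

def rank_profile(tokens):
    """Relative frequencies sorted from most to least common; token identities are dropped."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)

plain = "the cat saw the dog and the dog saw the cat"
cipher = encode_words(plain)

# The numbers hide the words, but the frequency profile is unchanged.
assert rank_profile(cipher) == rank_profile(plain.split())
```

The profile only becomes useful once there is a reference profile from a known language to compare against, which is the "something to compare it to" mentioned above.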
[8:44] I know there are different kinds of ciphers, and I believe the simplest is called the substitution cipher. That's like when you're a kid and you just replace the letter A with C and so on. What other kinds of ciphers are there?
[8:57] So, principally there are substitution ciphers which is replacing symbols and transposition ciphers which is mixing them up, changing their sequence. And every cipher is a combination of those two methods.
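To make the two building blocks concrete, here is a small Python sketch. It is an illustration only: the random key, the block-reversal transposition, and the function names are assumptions chosen for brevity, not a description of any historical cipher.

```python
import random
import string

def make_substitution_key(seed=0):
    """Random one-to-one mapping from each lowercase letter to another letter."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))

def substitute(text, key):
    """Substitution: replace each symbol, keep every symbol in its position."""
    return "".join(key.get(ch, ch) for ch in text)

def transpose(text, block=3):
    """Transposition: keep the symbols, change their order (here: reverse each block)."""
    return "".join(text[i:i + block][::-1] for i in range(0, len(text), block))

key = make_substitution_key()
inverse = {v: k for k, v in key.items()}

message = "attack at dawn"
cipher = transpose(substitute(message, key))

# Undo the steps in reverse order: transposition first, then substitution.
recovered = substitute(transpose(cipher), inverse)
assert recovered == message
```

Reversing fixed-size blocks is its own inverse, which is why the same `transpose` call is used for both directions in this toy example.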
[9:15] Now, we don't know what Voynich is. We don't know if it's just a simple substitution or a substitution combined with transposition. The paper that we wrote assumed that transposition was involved, and we tried to come up with a general method of breaking these kinds of ciphers that combine substitutions and transpositions.
[9:42] I completely glossed over the history. Actually, not that I glossed over, I had forgotten to even ask you to give the audience an indication as to where was this document found? Why does it matter? Why is it that scholars even care about it? I'm sure there's plenty about the past that we don't know about, so why is it that many scholars, and not only scholars, groups of people, teams of people on Reddit are poring over trying to figure out what the heck is this saying? Where was it from? Can you give a bit of the history of it, please?
[10:12] Yeah, so it was found basically by a person whose name was Voynich, that's why it's called Voynich. He was a kind of collector at the beginning of the 20th century. And since then the manuscript has been traced back to the 17th century, to the court of a Holy Roman Emperor. And that's where the trail ends.
[10:41] But not long ago there was a chemical analysis done on the manuscript, so we're certain that it was actually written in the 15th century. So there is no doubt about that anymore.
[10:58] Now, you also asked the other question, which is why are people so fascinated with it? I think the main reason is that we are fascinated by puzzles. If we see something that seems to be a message, we want to know what that message is. We want to decipher it. And Voynich is like Mount Everest of cryptographic puzzles.
[11:27] It was actually studied for many years by people who were working for the US government and who were professional code breakers, who broke codes during the Second World War. In the case of the Enigma code, at least we could presume that it has something to do with government secrets and war. Whereas with this, do we have any indication as to what the subject matter is?
[11:56] Well, we have the illustrations. That's what makes it really interesting is we have plants, people, medicines, all kinds of strange illustrations that seem to be related to the text.
[12:13] So this is not just a text like some kind of ancient inscription, but it is actually a codex, which is like a compendium of some kind of knowledge, which was quite common in the Middle Ages. And is it quite common among a particular group that speaks a particular language? And thus we could figure out, okay, with this amount of probability, it's from this culture or this group of people.
[12:37] Yeah, so many of those codices are in Latin, which was the language of literature and science in the Middle Ages. And you can actually find very similar-looking books from the Middle Ages that are written in Latin. The best guesses about
[12:59] Is it controversial that it's in the 15th century and in northern Italy?
[13:14] Almost everything that you can say about the Voynich manuscript is controversial. So this is what I consider a reasonable guess, but pretty much everybody has a different opinion about where this manuscript comes from and what language it represents.
[13:32] Speaking of your research, now would be a great time to tell the audience what is it that you study and then how the heck did you become interested in the Voynich manuscript, other than a general curiosity for solving puzzles.
[13:57] Yeah, so I'm a computational linguist, I'd say. So I work at the computer science department at the University of Alberta in Canada. And I work on language in general, making computers understand human language, writing programs that can process human language and do the work for us, because there's so much text available that nobody can actually read all of it.
[14:25] About the decipherment, the person that made this really interesting for me was Professor Kevin Knight from the University of Southern California. He worked on various interesting projects, and I saw his presentation on the Voynich manuscript about 10 years ago, I'd say.
[14:48] And it was related to what I was doing. What Kevin, Dr. Knight said basically is that everything we do with language is a kind of decipherment. Because language is typically written, that's what we work with, a written language. Even if it's spoken, we work with some form of it that is a transcription.
[15:12] With respect to the images in the Voynich, which we'll overlay on screen, is it strange to depict what they depict? The pictures? Yeah, what are the pictures of and is there something unique about them?
[15:39] Yes, so generally if I tell you that it depicts plants, for example, that's not strange because that's what medieval codices do. But if you look closely at those plants, if you're an expert in plants, I'm not, but
[15:55] experts on plants look at them and say, well, these don't really look like real plants. These look like made-up plants. And then there are pictures of people, many naked bodies taking baths in some kind of green water.
[16:14] In general, pictures of people are not strange, but those particular pictures are really strange and they are unlike anything else that we know from the Middle Ages. They're strange because they're naked or they're strange because they're depicted alongside plants? What is it specifically that's unique?
[16:36] They're strange because it's not clear what they're depicting. Are they depicting people taking baths? That I don't think was very common in the Middle Ages. Or why are those figures, for example, all women, and why are they naked, right? In the 15th century that was not a normal thing to put in a book.
[16:58] There are also other things like zodiac signs or pictures of planets that you would expect to be quite normal because we even know what those zodiacs are. But it's difficult to connect the words that describe those pictures to the actual pictures.
[17:24] In one of the documentaries that I was watching about this, they said that a remarkable element is that there are extremely few errors. In a writing of this size, they would expect that there are some errors and then maybe you smudge it out or however they correct errors, and there's some way of detecting the frequency of the errors in a document. And most documents have, let's say, an error rate of 2%. They make an error in 2 out of every 100 words.
[17:51] Yeah, so first of all, I did not work with the actual text. I worked with the transcription that somebody made. But, from what I know, it is true that there are very few corrections.
[18:16] My personal opinion is that this may indicate that whoever was writing this, copying it, did not understand what they were writing. You usually make corrections if you write something and you say, oh, that's not what it should be, right? But if you write something in a language that you're totally unfamiliar with, you won't be able to notice that. That's interesting. In that case, that would imply that there's another copy.
[18:39] No, that means that there were people that maybe were copying the text into the manuscript because you would expect there was some draft that they were copying from because the manuscript itself was very expensive to write on.
[19:00] That's interesting. Then that means that it's extra difficult to decipher it because there are errors. We don't know this, but that would mean that there's more than the average amount of errors. What's the difference between enciphering, so creating a cipher out of something, and then encryption?
[19:24] Yeah, I'm not sure if there is a difference. You know, if you really try to find the difference, you could say that encryption is like you do encryption of a text, but you don't really know how it is done. You know, you apply some encryption program, whereas enciphering implies that you go letter by letter, look up the key and then encipher each letter separately.
[19:47] But in general I think it's pretty much the same thing. What were the different methods used to decipher the Voynich manuscript? Not just by you, but by others. What techniques do they employ?
[20:02] Well, that's kind of difficult to say because none of those methods actually worked. The decipherment has not been achieved in spite of many claims to the contrary. So there isn't really any algorithm involved. It's mostly based on people's intuition and theories.
[20:26] What are some of your attempts to decipher? Can you go through the successes and failures?
[20:44] Yes, so in our project, our assumption was that the first thing to basically to start with is to find out what language this is written in. If we don't know what the language it is written in, then there's no way to decipher it. So we devised some methods of detecting, identifying the language of the cipher even without deciphering it.
[21:10] and we used a large sample of about 400 languages and out of those 400 languages we assigned like a number, a score to each language in terms of the probability that this language is the language of the manuscript.
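One way to picture scoring candidate languages without deciphering anything is to compare frequency profiles that ignore symbol identity. The Python sketch below is an assumption-laden illustration and not the method from the paper discussed here: the function names, the L1 distance, and the two toy "language" samples are all invented for the example.

```python
import string
from collections import Counter

def sorted_profile(text):
    """Relative symbol frequencies, most common first, ignoring which symbol is which."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)

def profile_distance(p, q):
    """Sum of absolute differences, padding the shorter profile with zeros."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))

def rank_languages(cipher, samples):
    """Return (language, distance) pairs, best match first."""
    target = sorted_profile(cipher)
    scores = {lang: profile_distance(target, sorted_profile(text))
              for lang, text in samples.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Toy samples standing in for real language corpora (hypothetical data).
samples = {
    "lang_a": "the quick brown fox jumps over the lazy dog and the cat",
    "lang_b": "aaaa bbbb aaaa cccc aaaa bbbb",
}
# Build a toy cipher by shifting every letter of the lang_a sample by three.
shift3 = str.maketrans(string.ascii_lowercase,
                       string.ascii_lowercase[3:] + string.ascii_lowercase[:3])
cipher = samples["lang_a"].translate(shift3)

ranking = rank_languages(cipher, samples)
print(ranking[0][0])  # lang_a
```

In this toy case the cipher is made from the very same sample, so the best distance is exactly zero. With a real cipher and independent reference text the distances are only approximate, and a clear gap between first and second place is what would count as evidence.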
[21:28] How much higher was it than the second and third place?
[21:44] It was a clear difference, I would say, a significant difference from the second one on the list. So that was quite striking. Were you able to get any other historical documents that are written in Hebrew that have a similar art style and are of similar length just to see, well, is this common? Is this a common practice to the people who write in Hebrew or is this aberrant? Is this extremely unique?
[22:11] So the Hebrew manuscripts exist and they were written throughout Middle Ages in Hebrew by the Jewish scholars. And I'm not the only person that hypothesized that this was actually coming from the Jewish scholar community.
[22:33] Now, nobody used this kind of particular script, but this script does have some similarities to Hebrew script. For example, I actually don't speak Hebrew, or I don't know much about it, but I know that the Hebrew script does not include letters, sorry, include vowels, which makes the words shorter.
[23:02] And this is what we observe in Voynich that the words are quite short. And then the number of different symbols suggest that it is something like a substitution cipher because the number of symbols is similar to the number of phonemes in a typical language. So now that you have at least potentially identified the language, what's the next step?
[23:28] Yeah, a very good point. So, the next step is obviously to try to match every symbol to a different letter of the Hebrew alphabet. And that usually is easy. Breaking simple substitution ciphers is easy. But it doesn't work in this case. It does not produce any sensible decipherment.
[23:53] So we came up with this hypothesis that the letters within words are actually transposed to make it more difficult to decipher. And when you kind of move the letters around it becomes very difficult to decipher it. So we came up with a method that could handle this kind of transposition within words and we tested it on other languages and it worked very well.
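The effect of transposing letters within words can be sketched in Python with an anagram index: once letter order is lost, a word can only be identified by its multiset of letters, and several lexicon words may share it. This is a simplified illustration that assumes the substitution has already been undone; the tiny lexicon and the function names are hypothetical, and this is not the method from the paper.

```python
from collections import defaultdict

def build_anagram_index(lexicon):
    """Map each word's sorted letters to every lexicon word with those letters."""
    index = defaultdict(list)
    for word in lexicon:
        index["".join(sorted(word))].append(word)
    return index

def candidates(scrambled_word, index):
    """All lexicon words that are a reordering of the scrambled word."""
    return index.get("".join(sorted(scrambled_word)), [])

lexicon = ["cat", "act", "dog", "god", "bird"]
index = build_anagram_index(lexicon)

print(candidates("tca", index))   # ['cat', 'act']
print(candidates("rdib", index))  # ['bird']
```

The ambiguity in the first lookup is the core difficulty: a decipherment method then needs something like a language model over word sequences to choose among the candidates.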
[24:22] However, when we apply this to the Voynich manuscript, it still does not produce any kind of readable decipherment. When you're testing it with other languages, are you testing it with encipherments that you contrive or are you testing it with ciphers that already exist from those other languages?
[24:44] No, we tested it on a mass scale with synthetic ciphers, so computer-generated ciphers, but these were generated from the actual text in those languages. You mentioned that there's two kinds of ciphers, at least so far, so there's substitution and then transposed, or transposition. What else is there? So Pig Latin, where you just add some words, is that considered transposition?
[25:10] No, I think Pig Latin is like a game. What I'm saying is that it seems like there are other methods that exist, even if they're silly. So there's transposition, there's substitution. Is that primarily it or is there a seldom third?
[25:26] Well, I would say even Pig Latin you can probably express as some kind of substitution and transposition, right? So serious ciphers like Enigma are basically, again, a combination of substitution and transposition. Actually, Enigma is just pure substitution, really. There's no transposition there, except that, of course, the spaces between words are removed.
[25:54] Well, that's disappointing. You're like, okay, great. I've made a headway. I found out that it's Hebrew. At least you're somewhat confident it's Hebrew. Then you say, well, okay, let me devise some way of deciphering any substitution plus transposition combination. It works on other ciphers. Great. Let me apply it here. Doesn't work. So now what are you thinking and what's next?
[26:15] Well, first of all, I'm not confident that it is actually Hebrew. All I can say is that out of those 400 languages that we had samples of, this is the one that got the highest score. So if I had to pick one of those 400, then I would pick Hebrew. But the language may actually not be in that 400 sample. There's thousands of languages in the world.
[26:41] And in addition, it may not actually be any human language. Some people hypothesize that it's a made-up language like Esperanto.
[26:51] So, of course, we were excited to see some kind of clear preference for one of the languages. And we applied a kind of a scientific methodology to it. So we reported those results and they are replicable. If somebody else applies this to that sample, they will find exactly the same thing. But that doesn't mean that this language is actually Hebrew.
[27:23] So, yeah, if I was really convinced that this was Hebrew, I think the next thing I would have to do is to actually learn Hebrew, because that would be the only way to decipher that complicated manuscript full of errors. But of course, I have a lot of other projects to do, so I'm not going to study Hebrew for that purpose.
[27:50] But there are many people that know Hebrew and I'm sure if this was really Hebrew they would be able to decipher it themselves. So if someone watching speaks Hebrew and is a computer scientist and they also want to help, what should they do? Contact you? Or is there some program that they run?
[28:11] Some of those people were actually experts in Hebrew and in computers and in ciphers and even they could not make any progress. So then why do you think that if you were to learn Hebrew it would help?
[28:40] No, I said that if I really was 100% sure that it's Hebrew, then that would definitely help to know Hebrew, right? My work is from a point of view of a computer scientist, not from a point of view of a linguist or a cryptographer. So it's not as simple as saying identify the language, then suggest the different rules. So that is the substitution slash
[29:07] Transposition Combination.
[29:24] So, you know, as I said, the main value of the Voynich manuscript is that it forces you to come up with new methods that later may turn out to be useful for other things. What we came up with is a methodology for doing this and we proved it in our paper. It works with
[29:47] about 95% accuracy. If you take a language, whatever language, pick any language, provided it's in that 400 sample, substitute letters for other symbols, scramble them, give it to our program, it will decipher it with 95% accuracy. So that is proven and that is a replicable thing that was published in the paper.
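The test setup described above (substitute the letters, scramble them within words, then check how much a decipherment recovers) can be sketched in Python as follows. The helper names are hypothetical, and scoring by exact word matches is an assumption: the interview does not say how the 95% figure was measured.

```python
import random
import string

def make_synthetic_cipher(text, seed=0):
    """Substitute letters with a random key, then scramble letters within each word."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    key = dict(zip(string.ascii_lowercase, shuffled))
    cipher_words = []
    for word in text.lower().split():
        symbols = [key.get(ch, ch) for ch in word]
        rng.shuffle(symbols)  # transposition inside the word only
        cipher_words.append("".join(symbols))
    return " ".join(cipher_words), key

def word_accuracy(reference, hypothesis):
    """Fraction of word positions where the hypothesis matches the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0
    matches = sum(r == h for r, h in zip(ref, hyp))
    return matches / len(ref)

text = "the cat sat on the mat"
cipher, key = make_synthetic_cipher(text)

# A made-up decipherment with one wrong word out of six.
guess = "the cat sat in the mat"
print(round(word_accuracy(text, guess), 3))
```

Because the cipher is generated from known text, the correct answer is always available for scoring, which is what makes this kind of large-scale synthetic evaluation possible.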
[30:18] But that is made on the assumption that it is actually an actual human language included in that set of 400 languages that is being used for that purpose. The fact that it doesn't work with Voynich suggests that Voynich is not written in Hebrew or any of those 400 languages. So you ended up testing it on all 400 languages.
[30:45] No, we tested it on a smaller subset. I think it was six languages.
[30:52] Is it just computationally too difficult to do all of them? Like it takes up too much time? Can you not just tell the computer to run with it? The problem is that you need to build what's called a language model for each language. And for that you need a lot of text. And the European languages, usually all of them have a lot of text, like people write newspapers in them.
[31:17] But if you pick languages that are very small or very exotic, then it's very difficult to find any electronic text written in those languages. So then it's very difficult to derive a language model from those texts because they are too small. That was the reason. It's quite the conundrum.
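For readers unfamiliar with the term, a language model assigns higher scores to strings that look like the language it was trained on. The Python sketch below is a deliberately tiny character-bigram model with add-one smoothing; it is an illustrative assumption and far simpler than anything used in research, and the training sentence is made up.

```python
import math
from collections import Counter

def train_bigram_model(text):
    """Count character bigrams and their left contexts in a training text."""
    text = text.lower()
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    vocab = set(text)
    return bigrams, contexts, vocab

def log_prob(candidate, model):
    """Add-one smoothed log probability of a string under the bigram model."""
    bigrams, contexts, vocab = model
    candidate = candidate.lower()
    v = len(vocab) + 1  # one extra slot for unseen characters
    total = 0.0
    for a, b in zip(candidate, candidate[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
    return total

model = train_bigram_model("the cat sat on the mat and the rat sat on the hat")

# A plausible string scores higher than a scrambled version of the same letters.
assert log_prob("the hat", model) > log_prob("hte tah", model)
```

With very little training text most bigram counts are zero and the smoothing term dominates every score, which is one way to see why languages with little electronic text are hard to include.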
[31:42] At least you were able to develop some new techniques that can be applied to other problems. Have you made any other progress other than what you've just indicated? So I mentioned that we worked on another undeciphered text. How about we just transition to that and I'll come back and forth to the Voynich at different points. Why don't you tell us about the Dorabella cipher?
[32:05] You know, for me, like I know there are people that just spend all their lives on Voynich, right? They're like obsessed with Voynich. But for me, it was just one project of many. So after Voynich, after we decided that we've done everything we could with it, we left it to other people to puzzle over. And there was another cipher that caught my attention, which is called the Dorabella cipher.
[32:31] And this was written in 20th century. We know who wrote it. It was an English composer
[32:42] who wrote a postcard to his friend, and that postcard was enciphered. It included a cipher which is about 80 characters in a kind of strange script. And that postcard survived and was published after his death. The composer's name is Elgar. And nobody has been able to decipher that short text.
[33:12] So that's the Dorabella cipher, another undeciphered cipher. Now our approach was that maybe, you know, this is not a text in any language, maybe this is just music, because that guy was a composer.
[33:30] So what will happen if we try to decipher into music? So we came up with algorithms and implemented programs that can take a short piece of music that is encoded in some way and decipher it.
[33:54] And that's what happened. We published a paper that at the end produces a kind of reconstruction of a melody that is our best guess
[34:09] You said a peculiar statement about the Voynich manuscript, that it may not be a human language. Now do you mean to say a language that large groups of people speak or that it's an alien language, like it's not a homo sapien?
[34:27] It could be a made-up language, right? So you know that actually there exist languages that were invented, like Esperanto, and many languages, like hundreds of languages have been invented. This could be one of those, a language that was never spoken by any community, but somebody just kind of made up a language, and it's possible. Anybody can do that, invent their own language.
[34:56] I see. So still it's a human language in the sense that it's made by a human, but it's not a human language in the sense that it's not spoken by many people or even known about. So it's not as if an alligator made up this language or some other extradimensional entity made up the language or divinely inspired.
[35:14] That's right. A better word is probably natural language. We say a natural language is a language that occurs on the planet, spoken by some community of people. I see. Stephen Bax is another professor who is no longer with us, but he studied this manuscript and I'm curious if you can go through what his theories on it are and then also your commentary on it.
[35:41] Well, actually, I'm not an expert in his theories or any other theories. You know, the ultimate test of a theory is that it produces a decipherment, right? So, as far as I know, no reasonable decipherment has been produced by Dr. Bax or anybody else.
[36:05] So it's not a huge motivation to study somebody's method if that method has not actually worked. How does one go about the process of learning a language from a computational perspective?
[36:20] So, you know, everybody speaks a language. That's the universal thing. Every human being, they have their own native language. Plus, they may speak other languages. But the majority of people, I think they just learn very well their own native language and they learn it as children.
[36:39] If you try to learn a language after you're like somehow like 10 years old, then you'll find out that it actually becomes a different process. It becomes more difficult and you actually have to go to school or study books or go on the internet and somebody teaches you a language. This is not how children learn a language. So there's a big difference between the native language and the second language that we learn.
[37:09] For example, when I speak, you can probably tell that English is not my first language. My first language is Polish.
[37:18] So, because it's not my native language and because I learned it as a teenager, you can tell from my accent that I'm not a native speaker. So, this already tells you something about what people call the language instinct, the ability of people to acquire language. Now, linguistics is
[37:46] the science of language, which deals with various aspects of language, and those include things like phonetics, morphology, grammar, syntax, semantics, pragmatics, acquisition, many things.
[38:02] What are pragmatics, briefly, sorry? Yeah, pragmatics is probably what you're most interested in yourself. For example, sentiment analysis is pragmatics, right? If you deal with sentiment analysis, you're not really interested in finding out what people say, but what they feel about what they say, right?
[38:25] So that's what we call pragmatics. It's not just about the message. It's about all the other stuff on top of it. How do we feel about the message? That sounds terribly complicated.
[38:39] Is it? No. Well, it is difficult, but it's doable and it's not the hardest part. It is one of the tasks that people do and we have programs now that are very good at it. Okay, so continue on where you were, please.
[38:57] When you say that it sounds very complicated, it's because it's hard to define exactly what we mean by things like sentiment. What do you mean by sentiment? And then people say, well, are you angry? Are you happy? Are you sad?
[39:15] And then how many feelings do we have? Well, we have eight feelings. Really eight? No, maybe twelve, right? So these are things that are very difficult to define. It's much easier to deal with things like letters or phonemes, where we know exactly how many letters or phonemes we have in a language, and it's easier to write programs that deal with that.
[39:39] Yeah, so I'll give it a try. So you can imagine it's like a pipeline. So you start with, when you hear somebody speaking, you start with what are the sounds of the language, right? That's phonetics. And now once you've done that, then you try to figure out where one word starts, the other ends, right? You want to see, you want to identify the words because there's only a limited number of words.
[40:08] And that's what we call the lexicon, or lexical items.
[40:14] And then when you look at the words, you see that they are made up of sounds or letters, but they're also made of something bigger, which is called morphemes. And that's the stuff of morphology. That's the study of morphology. For example, if I say a word like ungrammaticality, then you can say, well, there are three parts of it, the un, the grammar, and the ality.
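The idea of splitting a word into morphemes can be sketched in a few lines of Python. This is only a toy illustration, not a method described in the conversation: the `PREFIXES` and `SUFFIXES` lists and the `segment` function are hypothetical, and greedy affix stripping like this will over-segment many real words. On "ungrammaticality" it gives a slightly finer split than the three parts mentioned above.

```python
# Hypothetical, hand-picked affix lists; a real analyzer would need far more.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ity", "al", "ness", "ly")


def segment(word):
    """Greedily strip one known prefix and any known suffixes from a word."""
    prefixes, suffixes = [], []
    stem = word

    # Strip at most one prefix, keeping a stem of at least 3 letters.
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) > len(prefix) + 2:
            prefixes.append(prefix)
            stem = stem[len(prefix):]
            break

    # Strip suffixes repeatedly from the right-hand side.
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix) + 2:
                suffixes.insert(0, suffix)
                stem = stem[:-len(suffix)]
                stripped = True
                break

    return prefixes + [stem] + suffixes


print(segment("ungrammaticality"))
```

Because the rules are purely string-based, the sketch has no notion of whether the remaining stem is a real word, which is one reason practical morphological analysis is harder than it looks.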
[40:42] That's the morphology. So these are considered the low levels of language, and as you go up it becomes more interesting. So first of all, how are words put together into sentences? How is it that you can ask somebody, is that a proper English sentence, and they say yes or no? They can tell even though they haven't studied linguistics.
[41:12] Every native speaker can tell you if a sentence is grammatical or not. That's the study that Noam Chomsky did in the 50s. Can we write a program that can tell a grammatical sentence from an ungrammatical sentence? And on top of that, on top of syntax is semantics, which is about the meaning of words. We can have perfectly grammatical sentences that are meaningless. And vice versa, we can have meaningful utterances that are not grammatical.
[41:43] This universal grammar of Chomsky's, it's true in the sense that you can create a program that can identify which sentences are grammatically correct and incorrect.
[42:03] Actually, I don't think so. I think that's what Chomsky tried to do all his life, but it has not been done as far as I know. But at least that was the state of the art about 10 years ago.
[42:20] Now, in the last few years we have seen the neural language models appearing, which are extremely effective and which, as you know, can produce completely grammatical text that also makes sense. Yeah, so by extension that means that these programs can tell the difference between a grammatical and ungrammatical sentence because they only produce grammatical sentences.
[42:47] Are there other universal concepts in language like universal grammar? So the universal concepts in language are the things that are in every language on earth, every natural language. If there is something that almost all languages possess but some languages don't, then it's not universal.
[43:12] There's a whole area of linguistics that is dealing with finding things that are universal in human languages. And as far as I know, there's a long list of those things. Have you used any machine learning or neural language processing in the decipherment of the Voynich?
[43:34] We did use machine learning, but not neural methods. No. The reason we didn't use neural methods for decipherment, and I think you have some experience already with these neural bots, is that they can make sense of everything.
[43:53] So, for example, Google Translate, if you give it something that doesn't make sense, it will still translate it into something that does. Obviously, we don't want something like that to be applied to Voynich manuscript, because we want to really know what's really there, not how to make sense out of it in some way, right?
[44:16] There isn't some way of identifying what makes sense and what doesn't in the same way that for some sentences you can identify if it's grammatically correct or incorrect, like that program has not been completely explicated like you mentioned with Chomsky, but maybe there's huge progress there. Is there not progress in saying this sentence makes sense or not? That is much harder to do. There is progress, yeah, every year there is progress, but we are still far from reaching that point.
[44:47] You've seen that there's ChatGPT and there's OpenAI's GPT-3. What's your opinion of them? Are you excited by them? Are you surprised by them?
[44:59] I'm excited that those tools become available, but I'm also kind of worried that people are too enthusiastic about them. And for me the problem is that they are basically what somebody called parrots. They're parrots that have heard a lot of language being spoken, everything that was ever written.
[45:24] And they are very good at repeating, putting together those sentences and words together. But there is no real understanding underneath. Those systems cannot tell us why they think these things that they say are true. They're basically repeating the words that have been written somewhere and rearranging
[45:49] To be fair, most people when they're putting out something that's creative, they're just repeating what they've seen and they're mixing it up and they believe it to be absolutely new. And also, just so you know,
[45:59] There is something creative about mixing up and then presenting it. And furthermore, most people, maybe even all of us, we don't know the motivations, like we'll confabulate some reason for why we created so-and-so. Like that's why the whole field of psychoanalysis came about, because we don't know why we do what we do. We make up some reason. So why does it matter that the computer doesn't know why it does what it's doing and that it's, quote-unquote, repeating? Well, mixing, let's say mixing, what's old?
[46:30] I don't think it matters if you're interested in a computer producing art, like writing a song or painting a picture. But it does matter if you rely on the computer to tell you what the truth is, right? Because if you don't, if somebody cannot explain to you why they believe something is true, then how can you trust them? These are deep questions.
[46:58] What I find remarkable is that you can just ask it for even a simple program, to code this in Python, code something that does this in Python, code something that does this in AutoHotKey or whatever it may be, and it does it, or gets 90% of the way there.
[47:14] So...
[47:35] Well, programming is a bit different story, because you can actually test programs. So if you ask whether it's a human or it's a bot to write a program, you can
[47:50] provide a specification, then you can go through the test procedure and find out if that program really does what it is supposed to do. So we don't actually have to trust anything, we can just test it. But if we don't have time to test it, then I would be wondering whether it's a good idea to depend on such a program.
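Testing a program against a specification, whoever or whatever wrote it, can be sketched as follows. Everything here is an illustrative assumption rather than something from the conversation: the specification is a simple Caesar shift over lowercase letters, and `caesar_shift`, `buggy_shift`, and `meets_spec` are hypothetical names.

```python
import string

LOWER = string.ascii_lowercase


def caesar_shift(text, k):
    """Shift lowercase letters by k positions, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch in LOWER:
            out.append(LOWER[(LOWER.index(ch) + k) % 26])
        else:
            out.append(ch)  # non-letters pass through unchanged
    return "".join(out)


def buggy_shift(text, k):
    """A plausible-looking candidate that forgets to wrap around."""
    return "".join(chr(ord(ch) + k) if ch in LOWER else ch for ch in text)


def meets_spec(shift):
    """Check a candidate against three properties of the specification."""
    for text in ["attack at dawn", "xyz", ""]:
        for k in (1, 3, 25):
            encrypted = shift(text, k)
            if len(encrypted) != len(text):
                return False  # length must be preserved
            if any(ch not in LOWER + " " for ch in encrypted):
                return False  # output must stay within letters and spaces
            if shift(encrypted, -k) != text:
                return False  # shifting back must recover the input
    return True


print(meets_spec(caesar_shift), meets_spec(buggy_shift))
```

The check only covers the sampled inputs, so passing it is evidence rather than proof; the point is that the verdict comes from running the tests, not from trusting the author of the code.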
[48:18] So going back to the Voynich, have you thought about if it's composed of at least one language, like maybe there are multiple?
[48:29] You know, it could be a lot of things there. You can make these encryption systems as complicated as you wish. So it's all possible. There is no limit. There will be no limit where we can say, well, we tried everything and now we know it doesn't make sense. So it must be some kind of a joke or some kind of random generator.
[48:59] But what is fascinating about Voynich is that we can use it to actually create new things, right? So like with Dorabella, we take the cipher and we create a melody, right? And many people take Voynich and they produce decipherments that are like their own pieces of art, like their own books.
[49:28] The only problem is that everybody produces a different one, so none of them can be actually correct. But it is still a creation, so I think that is very good about Voynich that it exists.
[49:43] Have you thought about Voynich from less of a computational perspective and more just from a human motivation one? What the heck is this about? Why would someone go through such lengths to encipher this? Or maybe it's not even lengths. Like you mentioned, it could be something trivial. We're just overlooking. Like what other theories come up in your mind? Just surmising, just conjecture.
[50:06] Yes, so one of the more interesting theories that I've encountered actually comes from this US expert on decipherment.
[50:18] In the end, he said that he thinks this is an artificial language. Somebody created an artificial language and wrote that Voynich manuscript in that language. Well, if that's the case, then it would be very difficult to decipher it because we don't know the principles of that language. It could be a language that is completely unpronounceable. It's just a sequence of symbols.
[50:48] What else have you heard that is at least somewhat convincing? Maybe this one's at the top, but is there a second?
[51:12] You know, anybody can look at those illustrations, they're on the web, right? And if you look at them for a long time, sometimes I think this somebody was not quite... It wasn't a work of an expert, it was a work of somebody who actually didn't know what they were doing and just tried to create something like what they saw before in other codices, in other books.
[51:42] a little bit like a neural language model that just looks at a lot of things, sees a lot, reads a lot and then produces something that looks like it should make sense but it doesn't. That's interesting.
[51:57] We know that those language models can be tricked to produce texts that just seem to make sense but are complete nonsense, right? For example, why it's good to eat crushed glass, right? They will give you all the reasons for that, why it is good to eat crushed glass.
[52:13] When it comes to the Dorabella cipher, there were some other people who came up with decipherments. I'm gonna read some right now. And then there's others, like, why am I very sad? And so on. I'm sure you've heard these.
[52:33] I've seen this before but it always makes me laugh when I hear it. For me it's complete nonsense. Why? Well, it is nonsense to imagine that a distinguished English composer would write something like that to his love interest.
[52:58] Imagine if you were to decipher Voynich. What would be next for you? No more ciphers or do you have your eye on another one? You know, I think this happened. I mean, Voynich has not been decrypted, but there was a very interesting decipherment recently of an actual cipher, which is called the Zodiac cipher. I don't know if you've heard of it.
[53:27] And that is actually correct, right? That decipherment is not fake. It is actually a correct decipherment. So I would probably ask that person about their feelings, like how they feel about cracking that cipher. Is it like a complete bliss or is it like some kind of disappointment?
[53:49] You know, I put so much work into it and then I find that this text is actually kind of, you know, not interesting at all. It's like some kind of deranged mind writing it. So, yes, you know, there are one kind of tragedy is you don't achieve your goal and the other tragedy is if you do achieve your goal.
[54:12] Yeah, that's interesting. Let's get philosophical here. To me that means that you have to enjoy the process more than the state. Even though there's some end state and that's supposedly driving the process, you have to fall in love with the process because you may, if you're lucky and maybe unlucky, reach that state.
[54:31] Yeah, absolutely. And this is something that I do feel about problems in computational linguistics that I love doing this stuff and I just would be able to do this, you know, for free.
[54:49] because it's such huge fun to do this. But Voynich was just one of the projects that I got interested in and I learned from the project, I got some experience from that project that I think made me a better scientist so that I can apply this experience to the projects that actually do have a solution.
[55:17] Well, right now we are very excited to be working on semantics, on lexical semantics, and we are proposing, you know, we are finding things that other scientists find may be controversial, right? But the huge advantage of the work that we do is that we can actually provide proofs, mathematical proofs of what we do.
[55:47] And this gives us the satisfaction of actually being certain that we are doing something right because we can prove it. Going back to loving the road more than where you're going, I feel the same with this podcast. It's about theories of everything in the physics sense. So my background is in mathematical physics. And a part of me, I feel like I'll be extremely disappointed if I encounter or if we discover as people, as scientists,
[56:16] the theory of everything. There is something that's terribly fun about learning it and investigating. I don't want it to be over. I don't think you have to worry about that. Personally, I think, you know, looking at how the universe is constructed, I'm pretty sure it has some built-in mechanism so that we can never actually figure it out completely. What gives you that intuition?
[56:47] Well, when you talk about theories of everything, you obviously talk to physicists who deal with quantum mechanics and things like that,
[57:04] and how there are certain principles that we can prove that we'll never know the truth, right? Like we'll never know where the particular particle is, what is its exact location and speed and so on.
[57:20] And this is for me an indication that these things are constructed in such a way that we will never be able to crack them completely. Alright, well, that's hopeful but also dismaying. At least it's both and not one without the other. So, about the zodiac.
[57:46] If that's a substitution cipher, was it a substitution and transposition or just substitution? Oh yeah, it was substitution and transposition and it was a very tricky transposition too. Yeah, why is that? And would your method have worked on the zodiac one?
[58:05] No, the method would not work on Zodiac because the assumption of our methods is that we know where the words are. So in Voynich there are spaces between words and we made this assumption that this is not just to confuse but they are actually words, right? Now in the Zodiac cipher there were no spaces between words.
[58:28] So, although it is possible to kind of hypothesize where the spaces are, that method, the particular method would not work on Zodiac. And the method used to crack the Zodiac cipher, can that method or methods be used to help with the Voynich?
[58:51] Actually, you know, I don't think so. I think the key of the decipherment in that case was just finding the specific pattern of transposition. So it was not any kind of cool new theory that is general and can be applied to various things. It was just kind of a stroke of luck. Like trial and error?
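To make "substitution and transposition" concrete, here is a minimal sketch of a generic textbook combination: a simple letter substitution followed by a columnar transposition. It is not the actual Zodiac scheme, whose details are not given in this conversation; the key (a reversed alphabet), the width of 4, and all function names are assumptions chosen for illustration.

```python
import string

ALPHABET = string.ascii_lowercase
KEY = ALPHABET[::-1]  # assumed substitution key: a -> z, b -> y, ...


def substitute(text, key):
    """Replace each lowercase letter with its counterpart in the key."""
    return text.translate(str.maketrans(ALPHABET, key))


def unsubstitute(text, key):
    """Invert the substitution."""
    return text.translate(str.maketrans(key, ALPHABET))


def transpose(text, width):
    """Write the text in rows of `width` and read it off column by column."""
    return "".join(text[col::width] for col in range(width))


def untranspose(text, width):
    """Rebuild the rows from the columns produced by `transpose`."""
    rows, extra = divmod(len(text), width)
    columns, pos = [], 0
    for col in range(width):
        length = rows + (1 if col < extra else 0)  # early columns may be longer
        columns.append(text[pos:pos + length])
        pos += length
    out = []
    for row in range(rows + 1):
        for column in columns:
            if row < len(column):
                out.append(column[row])
    return "".join(out)


def encrypt(plaintext, key, width):
    return transpose(substitute(plaintext, key), width)


def decrypt(ciphertext, key, width):
    return unsubstitute(untranspose(ciphertext, width), key)


secret = encrypt("meetmeatnoon", KEY, 4)
print(secret, decrypt(secret, KEY, 4), decrypt(secret, KEY, 3))
```

Even with the correct substitution key, decrypting with the wrong width yields scrambled text, which illustrates why finding the specific transposition pattern can be the decisive step.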
[59:19] It is always a challenge there when you do actual decipherment, but what I mean is that there is no method behind it that can be generalized and applied to other things. How has AI, and maybe this is a term that you don't want to use, but how has AI aided your field? So instead of saying AI, then reference a specific model like GANs have changed my field because of some sort of supervised learning in the form of A, B, and C changed my field.
[59:48] Yeah, so you cannot avoid the word neural nowadays when you talk about language understanding. It's a powerful new tool and everybody is very, very excited about it, including myself.
[60:09] So of course it changed everything because the story of language processing is that it started from a kind of a symbolic processing and then moved into the machine learning stage and then evolved into the neural methods which we use nowadays.
[60:31] So what is exciting about it is that every few years you have a new revolution and new methods and we make constant progress to the point that some people think that the problem of language has been solved. But it's not the case. Sorry, that the problem of language has been solved? The problem of language understanding has been solved. That we can basically now have programs that will do every language related task that we want.
[61:01] And it's not true. Who thinks that that problem has been solved?
[61:07] Well, you know, when I read these articles about the neural bots that can, you know, write newspaper articles or compose symphonies or something that sometimes you get an impression that, well, we're done, right? We can just leave it all to the computers and they will do everything for us.
[61:32] But what I tell my students is that you really want to become, to be educated, to be somebody who cannot be replaced by a computer. And I guarantee you that they will never be able to replace the most important part of us, which is the creativity. Are you sure about that? What is it about human creativity that a machine can't replicate?
[62:01] By the way, I'm not being skeptical, I just don't know. I'm curious what your thoughts are since you're in this field. Well, that's exactly what you said.
[62:11] Machine cannot replicate. Creativity and replication are opposite things. Creativity is doing something that has not been done before. Of course, you can say, well, it's just kind of building on what was before, but it's not replicating. It's not parroting. It's creating something new based on a deeper understanding of things.
[62:36] There's this old joke of if you want to create an apple pie from scratch, you have to first create the universe. It's like, well, did you get it from the farm? No, I bought it from this. Okay. But even if you had it from the farm, did you grow the dirt? Did you? Well, yes. Okay. But did you make the cow and so on and so on. In a sense,
[62:53] Whatever we think of as new, it's so tricky, like it depends on what the heck are we defining as novel, as creative. And I'm sure if we could look into our brain with a certain amount of resolution and we had the correct model, if it even could be modeled computationally.
[63:08] But regardless, maybe there's some non-computational model, if it can even be modeled, quote-unquote, model. The point is that I imagine it's conceivable to me that what we think of as outputting something creative is something that is algorithmic. Like, I'm not set on this, but it's conceivable.
[63:24] And if that's the case, then I don't see why a computer can't do it. Now whether or not a computer can feel and understand what it's doing, like that's a separate problem. But the actual output, I don't see an in principle reason why it can't be done. And I'm telling you this as a romantic, like I don't want this to be done. But I see more and more, like aspects that we thought computers could not do.
[63:45] I would like that to be the case. I want to be convinced of that.
[64:14] Well, first of all, when you said that something cannot be done, you cannot demonstrate that something cannot be done. You can only really demonstrate that something can be done by doing it, right? So I will not be able or I don't think anybody would be able to demonstrate that computers cannot do something.
[64:39] But I am a computer scientist. I've programmed a lot. I worked with computers a lot. And I know that computers are good at doing certain things repeatedly and repeating patterns that already exist, right?
[65:00] You cannot have an algorithm to create something that does not exist, right? That is novel, that is meaningful. Of course, you can create novel things. You can create chaos, right? You can create a random generator and this sequence of randomly generated numbers is unique. Is it novel? No, because it doesn't make sense. Are you afraid of where AI may be or are you more hopeful?
[65:30] I think it's a serious issue and we have to think about it, you know, because the danger I see is that people will trust those programs too much. And we build them and we are responsible for telling them what we want them to do. If we don't do this right, they may do surprising things that we never actually anticipated.
[65:56] I think the key thing is that we want these things to be transparent. We want to know if they tell us a statement, then we want to know why they think the statement is true. We want them to provide the proof of something that they state. Obviously, they are not at this level yet. For example,
[66:21] They can write basically history books, right? But we don't know whether they are hallucinating or is it actual facts they are talking about.
[66:33] So there must be some way of them providing evidence of what they are saying is true. Like put references when you make a statement. Exactly. So I've been talking to students recently about what is true. How can we decide if a sentence is true or false?
[67:02] And the fact is that, you know, some people say everything is relative. Some people think this is true and other people think that is true. What I want the students to do is to decide first whether the speaker, the author of the utterance, thinks it's true or not. And this is non-trivial.
[67:26] But if they can establish that the author of the utterance or sentence believes it's true, then it is true with respect to that person, right? So we can say this is a true statement according to this person. And it is then kind of clear that this is some kind of evidence based on somebody's belief.
[67:49] So I do believe we can tell whether a statement is true or false modulo the author of the statement.
[68:00] Except in the case of AI, like in the case of people we can because they have intentions. But AIs, no. Currently, no. Is there a subfield in computer science that's dedicated to this problem? How did the machine come upon this decision? Can it explain the reasons? Yeah, many people are working on that because many people have realized that this is what we need in order to be able to use those tools. And what's that field called or subfield?
[68:26] Is there a name for when you're specifically trying to pry open that black box and then pull out something that is understandable to us? Like how did it make the decision?
[68:54] The word I've heard used is interpretability. So you want to have a program that not just does the job, but is also interpretable. So we can interpret why it does the job as it does. So the current non-interpretability
[69:13] of AI. Is that what you see as its greatest threat? Or, like, you've heard of strong AI and you've heard of the singularity, and that machines may turn on humans, or that other people may use them, like if you invert certain parameters, then a machine that was developed to produce a drug that was helpful can be
[69:34] turned to produce a drug that's extremely potent and deleterious. Do you see the non-interpretability of machines as the greatest issue that we have right now, or is it somehow connected to all those other issues?
[69:46] I don't know if it's the greatest issue, but it's an important issue. Another important issue is the so-called bias, right? These language models are trained on texts that have been written by people that are biased, and they become biased themselves. Obviously, we don't want them to be guided by that kind of text.
[70:08] There's a phrase that you wrote down, English orthography is not close to optimal. Correct. Can you explain firstly what orthography is and then take us through that phrase?
[70:20] Yes, so orthography is the way we write language. So English exists primarily as a spoken thing, but we also write it down, like as every language. And the orthography is the way we write down the sounds. And as you may know, English doesn't have a very good orthography.
[70:42] Well, it doesn't seem to be good because it's very hard to learn and people that learn English, they make a lot of spelling errors and even native speakers find it difficult to write down words that they speak. So Noam Chomsky had that kind of a statement that English orthography is near optimal, is close to optimal, even though it appears not to be.
[71:09] So we had a project where we kind of showed that it actually is not optimal, it's not close to optimal, it could be much better. And so that's the essence of that paper. Why did Chomsky think that it was?
[71:32] Chomsky had very good reasons for saying what he said, but, you know, in science our job is to question everything, right? And that's what we did in that project. We wanted to question that statement which seems to be nowadays accepted as truth by everybody.
[71:58] and to show that, to provide evidence for that, we wrote programs and we did simulations and we published this to show that it is not actually optimal, it is not close to optimal, could be much better. Yeah, so basically that's the point here.
[72:21] What was Chomsky's reasons for suggesting it was optimal? Because as you pointed out, it seems on the face that it's clear it's not. Like the word tough is with an F, but it ends with GH. It seems clear that it's not. So Chomsky must have had some reasons, and like you mentioned, he had good reasons. What were they? And then what was his response to, if any, to your results?
[72:44] Yes, so Chomsky, when he wrote this in the 60s, he was going against the consensus, right, which was that English orthography is bad. And he questioned that and he said, no, it's actually near optimal. It would take a lot of time to go into those arguments, which are reasonable. However,
[73:13] There's more to it, right? Everything can be interpreted in different ways. The main assumption that is not spoken is that our writing system in English is based on the history of English and other languages.
[73:35] For example, a lot of English at some point was very influenced by French about a thousand years ago and that influenced the spelling of English. Now, even if we could change the orthography of English to something better, if there is something better,
[73:58] then that wouldn't be practically possible because people are just used to the way as it is written right now. And besides, English is spoken in many different countries and those countries would never agree on a new system.
[74:15] So in a sense Chomsky was right about so-called morphological consistency, that words that have the same morphemes, which are pronounced differently, should have the same representation for the morpheme. That representation shouldn't change.
[74:34] But there's also something called phonetic consistency, and you gave an example of that, and that is just not good, right? There are just too many arbitrary solutions that reflect the pronunciation as it was 500 years ago. For example, the word tough, as you said, was actually pronounced with a consonant at the end 500 years ago.
[74:58] There's morphological consistency, phonetic consistency, and then there's orthographical optimality. Can you place numbers on those? Like, you can say this language is 90% optimal orthographically and 50% morphologically consistent. Can you actually place numbers on them? Yes, so
[75:25] Let me give it a try. So, for example, Finnish is considered to have an extremely good orthography. It's completely consistent in all kinds of aspects.
[75:38] Some languages are, like Croatian for example, the orthography was created under the principle, write as you speak. So that has this consistency that you can just, you never make spelling mistakes, you just write as you speak. Sorry, which language was based like that? That sounds interesting.
[76:05] It used to be Serbo-Croatian, now these are separate languages, but it still applies to it. Now Spanish, which many people are familiar with, is another type of language where you always know how to read something.
[76:26] You may still make spelling mistakes, but you will never pronounce a written word in the wrong way. So that's another type of consistency. English doesn't have either of those. You as a native speaker will probably make mistakes unless you have a spell checker, even though you know perfectly well how to pronounce a word.
[76:52] and me as a second language learner of English, I will encounter words that I just don't know how to pronounce. So it is definitely a problem in English, but other languages are even more difficult, like the Japanese orthographic system is even more difficult than English.
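The two kinds of consistency described here, knowing how to read what is written and knowing how to write what is spoken, can be given rough numbers with a small sketch. This is not the measure used in the paper discussed above. The `consistency` function is a hypothetical illustration, and the three tiny pair lists are invented toy data, not real statistics about any language.

```python
from collections import Counter, defaultdict


def consistency(pairs, key_index):
    """Fraction of pairs predicted by always choosing the most common mapping.

    key_index=0 measures spelling -> sound (reading);
    key_index=1 measures sound -> spelling (writing).
    """
    groups = defaultdict(Counter)
    for pair in pairs:
        groups[pair[key_index]][pair[1 - key_index]] += 1
    total = sum(sum(counts.values()) for counts in groups.values())
    predictable = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return predictable / total


# Invented (spelling, sound) pairs, for illustration only.
write_as_you_speak = [("a", "a"), ("k", "k"), ("s", "s"), ("t", "t")]
always_readable = [("b", "b"), ("v", "b"), ("s", "s"), ("z", "s")]
irregular = [("gh", "f"), ("gh", "g"), ("f", "f"), ("g", "g")]

for name, pairs in [
    ("write as you speak", write_as_you_speak),
    ("always readable", always_readable),
    ("irregular", irregular),
]:
    print(name, consistency(pairs, 0), consistency(pairs, 1))
```

In the toy data, the "always readable" list scores perfectly for reading but not for writing, and the "irregular" list falls short on both, mirroring the distinction drawn above.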
[77:12] I'm curious if English stands out as best or worst in some metric, and if so, which? For instance, I heard that English can convey a complex sentence second best, something like that, and Mandarin is first. You can think of it as a simple language, as one that a child may just come up with on their own.
[77:31] I don't know the actual terminology, I just heard this. And I heard that English is actually pretty great, it's second in the world, and Mandarin is best at that. But anyway, the point is I just heard this, so what is English great at and not great at?
[77:53] Yes, so English and Chinese have something in common which is that they are analytic languages. So morphology in English is very basic compared to languages like Spanish or Polish.
[78:07] In Chinese it is even more simple, there is no morphology at all. So in that sense these analytical languages reach some kind of maximum within that particular condition. I know that
[78:33] English is, if you compare things written in different languages, sometimes you see on products like 20 languages with the same message, the English text will probably be one of the shortest ones. So I think this is maybe something you're referring to that, that it can actually convey the same message with fewer letters or fewer symbols.
[79:02] Reminds me of this joke about someone translating. I think this actually happened. I think it was Hideo Kojima, who's a video game creator, and he was on stage. He speaks Japanese, and he goes on for like 20 seconds, 30 seconds. The translator comes and says, "Thank you." You're like, that's not what he... like, just,
[79:23] If you're lazy or you've forgotten, that's fine, but there's no way that's all of what he said. Yeah. Well, I actually lived in Japan for a while. So this is actually the issue of pragmatics, right? Human language is not just exchanging messages. There's a lot, for example, related to politeness. And in Japanese, you spend a lot of time just being polite in addition to passing a message.
[79:53] Ah, like San. San at the end of a person's name. Is that to connote I am lower than you or respect? Yeah, there's a lot more tools for expressing this kind of relationship in Japanese.
[80:05] Do you know who Larry David is from Seinfeld?
[80:29] Larry David, the creator, he said that when Caesar was being assassinated by Brutus, that Caesar said something with the "tu", and then Larry David said that was too informal for an assassination.
[80:47] To end this, you did your master's thesis on a theoretical evaluation on selected backtracking algorithms. So how has your perspective on that subject since the writing of that thesis changed? How has it developed?
[81:01] Yeah, so this is part of what's called artificial intelligence, but it's a very formal thing called constraint satisfaction. And what I liked about it is that you can actually prove something, unlike in pure linguistics, where you can never prove anything. You can just argue about it and then some people will disagree.
[81:26] But I didn't stay in that area because I wanted to work with language. I love language. And then I found that in language it's very hard to prove anything because there are always exceptions. But now after all those years I'm coming back to the point that I think that I can actually use the language of mathematics to describe human language. And I find this very exciting. So I hope to be able to prove things
[81:56] What's one of the more out there theories of the Voynich manuscript as to what it's about, what it contains, information on that you don't believe in but you find interesting, maybe even plausible?
[82:19] So there was this hilarious paper, somebody trying to show that the language of Voynich is actually Lojban. I don't know if you've heard about it. It's an invented language.
[82:34] And this paper showed to me that you can actually provide evidence for anything, for any language. If it's Lojban, which was invented in the 20th century, and somebody wrote the Voynich manuscript in the 15th century in that language, then that means you can basically argue for anything.
[82:56] And that again shows the value of being able to actually prove something. And in the case of the Voynich manuscript, the proof would be actually in the pudding, which means deciphering it into some kind of text that makes sense. Do you think it will be deciphered in the next five years? I don't know. I hope it will be. I hope it will, but I wouldn't bet on it.
[83:25] You know, in history, people often said something would never be done, and then it was done. When I first heard about the Zodiac cipher, I thought, no, that's never going to be deciphered because it's probably just random noise. And then it turns out that it was deciphered. So that's a lesson for us.
[83:51] Meaning, in the case of the Zodiac, you thought that it was gibberish, that he didn't actually write anything. It's not something that was enciphered, it's just symbols. Yeah, I thought it was just intentional gibberish to confuse people. This is similar to the people that say that Voynich is a joke, right? They make the same assumption that somebody just did it to confuse people.
[84:16] Well, thank you for spending about two hours with me, or an hour and a half, on what is potentially a joke, but hopefully not. Take care, man. It's good to speak with you. Thank you. It was fun talking to you. Bye.
[84:33] The podcast is now concluded. Thank you for watching. If you haven't subscribed or clicked on that like button, now would be a great time to do so as each subscribe and like helps YouTube push this content to more people. Also, I recently found out that external links count plenty toward the algorithm, which means that when you share on Twitter, on Facebook, on Reddit, etc.
[84:54] It shows YouTube that people are talking about this outside of YouTube, which in turn greatly aids the distribution on YouTube as well. If you'd like to support more conversations like this, then do consider visiting theoriesofeverything.org. Again, it's support from the sponsors and you that allows me to work on TOE full time. You get early access to ad-free audio episodes there as well. Every dollar helps far more than you may think. Either way, your viewership is generosity enough. Thank you.
View Full JSON Data (Word-Level Timestamps)
{
  "source": "transcribe.metaboat.io",
  "workspace_id": "AXs1igz",
  "job_seq": 8884,
  "audio_duration_seconds": 5177.67,
  "completed_at": "2025-12-01T01:16:03Z",
  "segments": [
    {
      "end_time": 20.896,
      "index": 0,
      "start_time": 0.009,
      "text": " The Economist covers math, physics, philosophy, and AI in a manner that shows how different countries perceive developments and how they impact markets. They recently published a piece on China's new neutrino detector. They cover extending life via mitochondrial transplants, creating an entirely new field of medicine. But it's also not just science they analyze."
    },
    {
      "end_time": 36.067,
      "index": 1,
      "start_time": 20.896,
      "text": " Culture, they analyze finance, economics, business, international affairs across every region. I'm particularly liking their new insider feature. It was just launched this month. It gives you, it gives me, a front row access to The Economist's internal editorial debates."
    },
    {
      "end_time": 64.514,
      "index": 2,
      "start_time": 36.34,
      "text": " Where senior editors argue through the news with world leaders and policy makers in twice weekly long format shows. Basically an extremely high quality podcast. Whether it's scientific innovation or shifting global politics, The Economist provides comprehensive coverage beyond headlines. As a toe listener, you get a special discount. Head over to economist.com slash TOE to subscribe. That's economist.com slash TOE for your discount."
    },
    {
      "end_time": 83.319,
      "index": 3,
      "start_time": 66.22,
      "text": " You really want to be educated to be somebody who cannot be replaced by a computer. And I guarantee you that he will never be able to replace the most important part of us which is the creativity. Are you sure about that?"
    },
    {
      "end_time": 107.005,
      "index": 4,
      "start_time": 86.135,
      "text": " Today we talk about the Voynich manuscript, the zodiac cipher, specialized non-human languages, the Dorabella cipher, ciphers in general, and how to go about decrypting them. Greg Kondrak is a professor of artificial intelligence, investigating natural language processing in a way that it relates to language reconstruction. He's employed these machine learning techniques to attempt what can be considered the most"
    },
    {
      "end_time": 132.585,
      "index": 5,
      "start_time": 107.005,
      "text": " objective decipherment of the Voynich manuscript, which is a considerably rare illustrated codex handwritten in an otherwise unknown writing system. It's evaded any attempt to decode it since the Italian Renaissance. The Voynich manuscript is written on an expensive vellum, and it's just one of the several puzzles that Professor Kondrak has tackled, another example being the Dora Bella cipher. Greg Kondrak is also known for proving Chomsky's statement wrong, the statement that English orthography is close to optimal."
    },
    {
      "end_time": 162.039,
      "index": 6,
      "start_time": 132.585,
      "text": " My name is Kurt Jaimungal. I have a background in mathematical physics. This podcast is called Theories of Everything, is dedicated to the exploration of theories of everything from a theoretical physics perspective, but as well as exploring the role consciousness has to the fundamental laws of nature. Each sponsor, as well as the patrons, improves the quality of the videos drastically, it improves the depth, it improves the frequency, and it goes toward paying the staff, for instance, someone who's editing this full-time right now, and then we have an operations manager. In that vein, I want to thank today's sponsor, Brilliant."
    },
    {
      "end_time": 191.288,
      "index": 7,
      "start_time": 162.039,
      "text": " If you're familiar with Toe, you're familiar with Brilliant, but for those who don't know, Brilliant is a place where you go to learn math, science, and engineering through these bite-sized interactive learning experiences. For example, and I keep saying this, I would like to do a podcast on information theory, particularly Chiara Marletto, which is David Deutsch's student, has a theory of everything that she puts forward called constructor theory, which is heavily contingent on information theory. So I took their course on random variable distributions and knowledge and uncertainty,"
    },
    {
      "end_time": 219.667,
      "index": 8,
      "start_time": 191.596,
      "text": " in order to learn a bit more about entropy. Now there's this formula for entropy, essentially hammered into you as an undergraduate, which seems to have fallen from the sky. However, when you take Brilliant's course, it was the first time that I could see that it's an extremely clear and intuitive formula. That is to say that it would be unnatural to define it in any other manner. Visit brilliant.org slash TOE, that is T-O-E, to get 20% off the annual subscription. And I recommend that you don't stop before four lessons."
    },
    {
      "end_time": 242.278,
      "index": 9,
      "start_time": 219.667,
      "text": " I think you'll be greatly surprised at the ease at which you can now comprehend subjects you previously had a difficult time grokking. At some point, I'll also go through the courses and give a recommendation"
    },
    {
      "end_time": 256.22,
      "index": 10,
      "start_time": 242.278,
      "text": " Professor, what is the Voynich manuscript and why is it important?"
    },
    {
      "end_time": 287.534,
      "index": 11,
      "start_time": 257.773,
      "text": " The Voynich manuscript is a medieval manuscript written in some code that was actually confirmed to be actually a genuine manuscript from the 15th century. It has illustrations, it has text. The script or the alphabet is unique for the manuscript. It has not been deciphered yet."
    },
    {
      "end_time": 316.613,
      "index": 12,
      "start_time": 289.206,
      "text": " Why hasn't it been deciphered? We don't know why it hasn't been deciphered. Some people say it's because there's nothing to decipher. It's just some kind of a joke. Or other people say that the encoding system is very complicated. And other people have other theories about it. Why do people say it's a joke and what do you make of that?"
    },
    {
      "end_time": 333.797,
      "index": 13,
      "start_time": 318.797,
      "text": " Well, personally, I don't think, as I said, that this is a testable scientific hypothesis. You can guess that it's a joke, but it's very hard to prove something like that."
    },
    {
      "end_time": 360.742,
      "index": 14,
      "start_time": 333.797,
      "text": " Obviously, it cost a lot of money to produce this kind of manuscript in the Middle Ages, so that's one reason. I don't think it was a joke. But also, there's some work showing that there are some statistical properties that indicate there is actually a language that is being encoded."
    },
    {
      "end_time": 376.92,
      "index": 15,
      "start_time": 360.742,
      "text": " I've seen several documentaries on the Voynich manuscript, several, so not just one or two, there's a variety of them and then there's also a whole subreddit, so a whole reddit group dedicated to solving this. Why is this so difficult compared to other ciphers in the past? Like, what is it about this?"
    },
    {
      "end_time": 405.538,
      "index": 16,
      "start_time": 378.66,
      "text": " Well, the difficulty here is that the first difficulty is that we don't know what language it is. Very often we have ciphers, we have messages, but we know what the message is. For example, the German Enigma machine was a very hard cipher, but we knew that it was German being encoded, which made it quite easier."
    },
    {
      "end_time": 434.991,
      "index": 17,
      "start_time": 405.538,
      "text": " And the second thing is that we don't know the script or the alphabet. If that alphabet was used for something else, we would know how to speak it, how to pronounce these words. And third, the problem is that this is just a unique document. There's no other document that is written in this way. So it's all self-contained and we don't even know, we're not exactly sure where it was even produced."
    },
    {
      "end_time": 452.363,
      "index": 18,
      "start_time": 434.991,
      "text": " It's strange that there's no other document that's like that. Firstly, that the script is different, like you mentioned that we don't know how to pronounce the words. Though, even in the Enigma code, it's not as if the words were meant to be pronounced and then understood like that. You could have translated it to zeros and ones, and it still would have been difficult. It still would have been the same problem."
    },
    {
      "end_time": 473.575,
      "index": 19,
      "start_time": 452.363,
      "text": " So, why does the fact that we don't understand how to pronounce the alphabet make a difference? Why can't we just say, look, this letter appears, let's call that letter 28, let's call that other letter 50? Well, yeah, well, so we have letters, but we have words. Words are made of letters. That's the truth in every language, human language, that you have words that are made of"
    },
    {
      "end_time": 495.452,
      "index": 20,
      "start_time": 473.575,
      "text": " phonemes that usually every phoneme has its own letter and then there are certain utterance regularities for example there are certain words that can be pronounced and other words that cannot be pronounced and that varies from language to language"
    },
    {
      "end_time": 524.65,
      "index": 21,
      "start_time": 495.452,
      "text": " So the moment you know what language we're speaking, we usually know the lexicon of that language, typically a few thousand words, and they have different frequencies, so that makes it a lot easier to decipher anything. Even if you just replace every word with a number, that will still give you some information about the frequency of the words, as long as you do something to compare it to."
    },
    {
      "end_time": 536.476,
      "index": 22,
      "start_time": 524.65,
      "text": " I know there are different kinds of ciphers, and I believe the simplest is called the substitution cipher. That's like when you're a kid and you just replace the letter A with C and so on. What other kinds of ciphers are there?"
    },
    {
      "end_time": 555.93,
      "index": 23,
      "start_time": 537.773,
      "text": " So, principally there are substitution ciphers which is replacing symbols and transposition ciphers which is mixing them up, changing their sequence. And every cipher is a combination of those two methods."
    },
    {
      "end_time": 582.449,
      "index": 24,
      "start_time": 555.93,
      "text": " Now, we don't know what is Voynich. We don't know if it's just a simple substitution or is it a substitution combined with transposition. The paper that we wrote assumed that the transposition was involved and we tried to come up with a general method of breaking this kind of ciphers that combine substitutions and transpositions."
    },
    {
      "end_time": 611.34,
      "index": 25,
      "start_time": 582.449,
      "text": " I completely glossed over the history. Actually, not that I glossed over, I had forgotten to even ask you to give the audience an indication as to where was this document found? Why does it matter? Why is it that scholars even care about it? I'm sure there's plenty about the past that we don't know about, so why is it that many scholars, and not only scholars, groups of people, teams of people on Reddit are poring over trying to figure out what the heck is this saying? Where was it from? Can you give a bit of the history of it, please?"
    },
    {
      "end_time": 641.493,
      "index": 26,
      "start_time": 612.671,
      "text": " Yeah, so it was found basically by the person whose name was Voynich, that's why it's called Voynich. It was a kind of a collector in the beginning of the 20th century. And since then the manuscript has been tracked back to 17th century to the court of a Roman emperor. And that's where it kind of ends, that's where the trail ends."
    },
    {
      "end_time": 657.892,
      "index": 27,
      "start_time": 641.493,
      "text": " But not long ago there was a chemical analysis done on the manuscript, so we're certain that it was actually written in the 15th century. So there is no doubt about that anymore."
    },
    {
      "end_time": 686.186,
      "index": 28,
      "start_time": 658.473,
      "text": " Now, you also asked the other question, which is why are people so fascinated with it? I think the main reason is that we are fascinated by puzzles. If we see something that seems to be a message, we want to know what that message is. We want to decipher it. And Voynich is like Mount Everest of cryptographic puzzles."
    },
    {
      "end_time": 714.445,
      "index": 29,
      "start_time": 687.705,
      "text": " It was actually studied by many years by people that were working for the US government and that were professional breakers, code breakers that broke codes during the Second World War. In the case of the Enigma code, at least we could presume that it has something to do with government secrets and war. Whereas with this, do we have any indication as to what the subject matter is?"
    },
    {
      "end_time": 732.329,
      "index": 30,
      "start_time": 716.049,
      "text": " Well, we have the illustrations. That's what makes it really interesting is we have plants, people, medicines, all kinds of strange illustrations that seem to be related to the text."
    },
    {
      "end_time": 755.998,
      "index": 31,
      "start_time": 733.046,
      "text": " So this is not just a text like some kind of ancient inscription, but it is actually a codex, which is like a compendium of some kind of knowledge, which was quite common in Middle Ages. And is it quite common among a particular group that speaks a particular language? And thus we could figure out, okay, with this amount of probability, it's from this culture or this group of people."
    },
    {
      "end_time": 778.882,
      "index": 32,
      "start_time": 757.534,
      "text": " Yeah, so many of those codices are in Latin, which was the language of literature and science in Middle Ages. And you can actually find such books very similar looking from Middle Ages that are written in Latin. The best guesses about"
    },
    {
      "end_time": 794.548,
      "index": 33,
      "start_time": 779.155,
      "text": " Is it controversial that it's in the 15th century and in northern Italy?"
    },
    {
      "end_time": 812.773,
      "index": 34,
      "start_time": 794.667,
      "text": " Almost everything that you can say about Voynich manuscript is controversial. So this is what I consider a reasonable guess, but pretty much everybody has a different opinion where this manuscript comes from and what language it represents."
    },
    {
      "end_time": 836.305,
      "index": 35,
      "start_time": 812.773,
      "text": " Speaking of your research, now would be a great time to tell the audience what is it that you study and then how the heck did you become interested in the Voynich manuscript, other than a general curiosity for solving puzzles."
    },
    {
      "end_time": 865.572,
      "index": 36,
      "start_time": 837.637,
      "text": " Yeah, so I'm a computational linguist, I'd say. So I work at the computer science department at the University of Alberta in Canada. And I work on language in general, making computers understand human language, making writing programs that can process human language and do the work for us, because there's so much text that is available that nobody can actually read all of it."
    },
    {
      "end_time": 888.507,
      "index": 37,
      "start_time": 865.572,
      "text": " About the decipherment, the person that made this really interesting for me was Professor Kevin Knight from University of Southern California and he worked on various interesting projects and I saw his presentation of Voynich manuscript about 10 years ago, I'd say."
    },
    {
      "end_time": 912.807,
      "index": 38,
      "start_time": 888.507,
      "text": " And it was related to what I was doing. What Kevin, Dr. Knight said basically is that everything we do with language is a kind of decipherment. Because language is typically written, that's what we work with, a written language. Even if it's spoken, we work with some form of it that is a transcription."
    },
    {
      "end_time": 937.619,
      "index": 39,
      "start_time": 912.807,
      "text": " With respect to the images in the Voynich which will overlay on screen, is it strange to depict what they depict? The pictures? Yeah, what are the pictures of and is there something unique about them?"
    },
    {
      "end_time": 954.923,
      "index": 40,
      "start_time": 939.036,
      "text": " Yes, so generally if I tell you that it depicts plants, for example, that's not strange because that's what medieval codices do. But if you look closely at those plants, if you're an expert in plants, I'm not, but"
    },
    {
      "end_time": 974.889,
      "index": 41,
      "start_time": 955.674,
      "text": " experts on plants look at them and say well these don't really look like real plants. These look like made up plants. And then there are pictures of people that some kind of many naked bodies taking baths in some kind of green water"
    },
    {
      "end_time": 995.162,
      "index": 42,
      "start_time": 974.889,
      "text": " In general, pictures of people are not strange, but those particular pictures are really strange and they are unlike anything else that we know from middle ages. They're strange because they're naked or they're strange because they're depicted alongside plants? What is it specifically that's unique?"
    },
    {
      "end_time": 1018.37,
      "index": 43,
      "start_time": 996.305,
      "text": " There's change because it's not clear what they're depicting. Are they depicting people taking baths? That I don't think was very common in Middle Ages. Or why are those figures, for example, all women, and why they are naked, right? In the 15th century that was not a normal thing to put in a book."
    },
    {
      "end_time": 1043.063,
      "index": 44,
      "start_time": 1018.37,
      "text": " There are also other things like zodiac signs or pictures of planets that you would expect to be quite normal because we even know what those zodiacs are. But it's difficult to connect the words that describe those pictures to the actual pictures."
    },
    {
      "end_time": 1071.459,
      "index": 45,
      "start_time": 1044.889,
      "text": " In one of the documentaries that I was watching about this, they said that a remarkable element is that there are extremely few errors. In a writing of this size, they would expect that there are some errors and then maybe you smudge it out or however they correct errors, and there's some way of detecting the frequency of the errors in a document. And most documents have, let's say, error percentage 2, 2%. They make an error every 2 out of 100 words."
    },
    {
      "end_time": 1095.606,
      "index": 46,
      "start_time": 1071.459,
      "text": " Yeah, so first of all, I did not work with the actual text. I worked with the transcription that somebody made. But it is, from what I know, it is true that there is very few corrections."
    },
    {
      "end_time": 1119.531,
      "index": 47,
      "start_time": 1096.084,
      "text": " My personal opinion that this may indicate that whoever was writing this, copying it, did not understand what they were writing. You usually make corrections if you write something and you say, oh, that's not what it should be, right? But if you write something in a language that you're totally unfamiliar with, you won't be able to notice that. That's interesting. In that case, that would imply that there's another copy."
    },
    {
      "end_time": 1140.759,
      "index": 48,
      "start_time": 1119.77,
      "text": " No, that means that there were people that maybe were copying the text into the manuscript because you would expect there was some draft that they were copying from because the manuscript itself was very expensive to write on."
    },
    {
      "end_time": 1162.261,
      "index": 49,
      "start_time": 1140.759,
      "text": " That's interesting. Then that means that it's extra difficult to decipher it because there are errors. We don't know this, but that would mean that there's more than the average amount of errors. What's the difference between in-ciphering, so creating a cipher out of something, and then encryption?"
    },
    {
      "end_time": 1187.108,
      "index": 50,
      "start_time": 1164.172,
      "text": " Yeah, I'm not sure if there is a difference. You know, if you really try to find the difference, you could say that encryption is like you do encryption of a text, but you don't really know how it is done. You know, you apply some encryption program, whereas in ciphering implies that you go like letter by letter, look up the key and then cipher each letter separately."
    },
    {
      "end_time": 1200.333,
      "index": 51,
      "start_time": 1187.927,
      "text": " But in general I think it's pretty much the same thing. What were the different methods used to decipher the Voynich manuscript? Not just by you, but by others. What techniques do they employ?"
    },
    {
      "end_time": 1226.015,
      "index": 52,
      "start_time": 1202.312,
      "text": " Well, that's kind of difficult to say because none of those methods actually worked. The decipherment has not been achieved in spite of many claims to the contrary. So there isn't really any algorithm involved. It's mostly based on people's intuition and theories."
    },
    {
      "end_time": 1242.022,
      "index": 53,
      "start_time": 1226.015,
      "text": " What are some of your attempts to decipher? Can you go through the successes and failures?"
    },
    {
      "end_time": 1270.93,
      "index": 54,
      "start_time": 1244.036,
      "text": " Yes, so in our project, our assumption was that the first thing to basically to start with is to find out what language this is written in. If we don't know what the language it is written in, then there's no way to decipher it. So we devised some methods of detecting, identifying the language of the cipher even without deciphering it."
    },
    {
      "end_time": 1288.37,
      "index": 55,
      "start_time": 1270.93,
      "text": " and we used a large sample of about 400 languages and out of those 400 languages we assigned like a number, a score to each language in terms of the probability that this language is the language of the manuscript."
    },
    {
      "end_time": 1302.773,
      "index": 56,
      "start_time": 1288.37,
      "text": " How much higher was it than the second and third place?"
    },
    {
      "end_time": 1330.06,
      "index": 57,
      "start_time": 1304.445,
      "text": " It was a clear difference, I would say, a significant difference between the second run in the list. So that was quite striking. Were you able to get any other historical documents that are written in Hebrew that have a similar art style and are of similar length just to see, well, is this common? Is this a common practice to the people who write in Hebrew or is this aberrant? Is this extremely unique?"
    },
    {
      "end_time": 1353.353,
      "index": 58,
      "start_time": 1331.493,
      "text": " So the Hebrew manuscripts exist and they were written throughout Middle Ages in Hebrew by the Jewish scholars. And I'm not the only person that hypothesized that this was actually coming from the Jewish scholar community."
    },
    {
      "end_time": 1382.261,
      "index": 59,
      "start_time": 1353.353,
      "text": " Now, nobody used this kind of particular script, but this script does have some similarities to Hebrew script. For example, I actually don't speak Hebrew, or I don't know much about it, but I know that the Hebrew script does not include letters, sorry, include vowels, which makes the words shorter."
    },
    {
      "end_time": 1406.749,
      "index": 60,
      "start_time": 1382.261,
      "text": " And this is what we observe in Voynich that the words are quite short. And then the number of different symbols suggest that it is something like a substitution cipher because the number of symbols is similar to the number of phonemes in a typical language. So now that you have at least potentially identified the language, what's the next step?"
    },
    {
      "end_time": 1433.285,
      "index": 61,
      "start_time": 1408.558,
      "text": " Yeah, a very good point. So, the next step is obviously try to match every symbol to a different letter of the Hebrew alphabet. And that usually is easy. Breaking simple substitution ciphers is easy. But it doesn't work in this case. It does not produce any sensible decipherment."
    },
    {
      "end_time": 1461.203,
      "index": 62,
      "start_time": 1433.285,
      "text": " So we came up with this hypothesis that the letters within words are actually transposed to make it more difficult to decipher. And when you kind of move the letters around it becomes very difficult to decipher it. So we came up with a method that could handle this kind of transposition within words and we tested it on other languages and it worked very well."
    },
    {
      "end_time": 1483.439,
      "index": 63,
      "start_time": 1462.619,
      "text": " However, when we apply this to the Vonage manuscript, it still does not produce any kind of readable decipherment. When you're testing it with other languages, are you testing it with encipherments that you contrive or are you testing it with ciphers that already exist from those other languages?"
    },
    {
      "end_time": 1507.978,
      "index": 64,
      "start_time": 1484.906,
      "text": " No, we tested it on a mass scale with synthetic ciphers, so computer-generated ciphers, but these were generated from the actual text in those languages. You mentioned that there's two kinds of ciphers, at least so far, so there's substitution and then transposed, or transposition. What else is there? So Pig Latin, where you just add some words, is that considered transposition?"
    },
    {
      "end_time": 1524.241,
      "index": 65,
      "start_time": 1510.162,
      "text": " No, I think piglining is like a game. What I'm saying is that it seems like there are other methods that exist, even if they're silly. So there's transposition, there's substitution. Is that primarily it or is there a seldom third?"
    },
    {
      "end_time": 1553.302,
      "index": 66,
      "start_time": 1526.084,
      "text": " Well, I would say even Piglet and you can express it probably as some kind of substitution and transposition, right? So these serious ciphers like Enigma is basically, again, a combination of substitution and transposition. Actually, Enigma is just pure substitution, really. There's no transposition there, except that, of course, the spaces between words are removed."
    },
    {
      "end_time": 1574.514,
      "index": 67,
      "start_time": 1554.838,
      "text": " Well, that's disappointing. You're like, okay, great. I've made a headway. I found out that it's Hebrew. At least you're somewhat confident it's Hebrew. Then you say, well, okay, let me devise some way of deciphering any substitution plus transposition combination. It works on other ciphers. Great. Let me apply it here. Doesn't work. So now what are you thinking and what's next?"
    },
    {
      "end_time": 1600.725,
      "index": 68,
      "start_time": 1575.64,
      "text": " Well, first of all, I'm not confident that it is actually Hebrew. All I can say is that out of those 400 languages that we had samples of, this is the one that got the highest score. So if I had to pick one of those 400, then I would pick Hebrew. But the language may actually not be in that 400 sample. There's thousands of languages in the world."
    },
    {
      "end_time": 1610.469,
      "index": 69,
      "start_time": 1601.783,
      "text": " and in addition it may not actually be actually any human language. Some people hypothesize that it's a made-up language like Esperanto."
    },
    {
      "end_time": 1640.572,
      "index": 70,
      "start_time": 1611.834,
      "text": " So, of course, we were excited to see some kind of clear preference for one of the languages. And we applied a kind of a scientific methodology to it. So we reported those results and they are replicable. If somebody else applies this to that sample, they will find exactly the same thing. But that doesn't mean that this language is actually Hebrew."
    },
    {
      "end_time": 1670.708,
      "index": 71,
      "start_time": 1643.166,
      "text": " So, yeah, if I was really convinced that this was Hebrew, I think the next thing I would have to do is to actually learn Hebrew, because that would be the only way to decipher that complicated manuscript full of errors. But of course, I have a lot of other projects to do, so I'm not going to study Hebrew for that purpose."
    },
    {
      "end_time": 1690.811,
      "index": 72,
      "start_time": 1670.708,
      "text": " But there are many people that know Hebrew and I'm sure if this was really Hebrew they would be able to decipher it themselves. So if someone watching speaks Hebrew and is a computer scientist and they also want to help, what should they do? Contact you? Or is there some program that they run?"
    },
    {
      "end_time": 1718.968,
      "index": 73,
      "start_time": 1691.34,
      "text": " Some of those people were actually experts in Hebrew and in computers and in ciphers and even they could not make any progress. So then why do you think that if you were to learn Hebrew it would help?"
    },
    {
      "end_time": 1746.476,
      "index": 74,
      "start_time": 1720.503,
      "text": " No, I said that if I really was 100% sure that it's Hebrew, then that would definitely help to know Hebrew, right? My work is from a point of view of a computer scientist, not from a point of view of a linguist or a cryptographer. So it's not as simple as saying identify the language, then suggest the different rules. So that is the substitution slash"
    },
    {
      "end_time": 1763.302,
      "index": 75,
      "start_time": 1747.005,
      "text": " transposition combination."
    },
    {
      "end_time": 1787.227,
      "index": 76,
      "start_time": 1764.923,
      "text": " So, you know, as I said, the main value of the Voynich manuscript is that it forces you to come up with new methods that later may turn out to be useful for other things. What we came up with is a methodology for doing this and we proved it in our paper. It works."
    },
    {
      "end_time": 1816.357,
      "index": 77,
      "start_time": 1787.227,
      "text": " kind of 95% accuracy. If you take a language, whatever language, pick any language, provided it's in that 400 sample, substitute letters for other symbols, scramble them, give it to our program, it will decipher it with 95% accuracy. So that is proven and that is a replicable thing that was published in the paper."
    },
    {
      "end_time": 1844.121,
      "index": 78,
      "start_time": 1818.404,
      "text": " But that is made on the assumption that it is actually an actual human language included in that set of 400 languages that is being used for that purpose. The fact that it doesn't work with Voynich suggests that Voynich is not written in Hebrew or any of those 400 languages. So you ended up testing it on all 400 languages."
    },
    {
      "end_time": 1850.538,
      "index": 79,
      "start_time": 1845.572,
      "text": " No, we tested it on a smaller subset. I think it was six languages."
    },
    {
      "end_time": 1877.381,
      "index": 80,
      "start_time": 1852.005,
      "text": " Is it just computationally too difficult to do all of them? Like it takes up too much time? Can you not just tell the computer to run with it? The problem is that you need to build what's called a language model for each language. And for that you need a lot of text. And the European languages, usually all of them have a lot of text, like people write newspapers in them."
    },
    {
      "end_time": 1900.913,
      "index": 81,
      "start_time": 1877.381,
      "text": " But if you pick languages that are very small or very exotic, then it's very difficult to find any electronic text written in those languages. So then it's very difficult to derive a language model from those texts because they are too small. That was the reason. It's quite the conundrum."
    },
    {
      "end_time": 1924.377,
      "index": 82,
      "start_time": 1902.79,
      "text": " At least you were able to develop some new techniques that can be applied to other problems. Have you made any other progress other than what you've just indicated? So I mentioned that we worked on another undeciphered text. How about we just transition to that and I'll come back and forth to the Voynich at different points. Why don't you tell us about the Dorabella cipher?"
    },
    {
      "end_time": 1950.93,
      "index": 83,
      "start_time": 1925.247,
      "text": " You know, for me, like I know there are people that just spend all their lives on Voynich, right? They're like obsessed with Voynich. But for me, it was just one project of many. So after Voynich, after we decided that we've done everything we could with it, we left it to other people to puzzle over. And there was another cipher that caught my attention, which is called the Dorabella cipher."
    },
    {
      "end_time": 1959.991,
      "index": 84,
      "start_time": 1951.852,
      "text": " And this was written in 20th century. We know who wrote it. It was an English composer"
    },
    {
      "end_time": 1990.555,
      "index": 85,
      "start_time": 1962.005,
      "text": " who wrote a postcard to his friend, and that postcard was enciphered. It included a cipher which is about 80 characters in a kind of a strange script. And that postcard survived and was published after his death. The composer's name is Elgar. And nobody has been able to decipher that short text."
    },
    {
      "end_time": 2010.52,
      "index": 86,
      "start_time": 1992.244,
      "text": " So that's the Dorabella cipher, another undeciphered cipher. Now our approach was that maybe, you know, this is not a text in any language, maybe this is just music, because that guy was a composer."
    },
    {
      "end_time": 2031.732,
      "index": 87,
      "start_time": 2010.52,
      "text": " So what will happen if we try to decipher into music? So we came up with algorithms and implemented programs that can take a short piece of music that is encoded in some way and decipher it."
    },
    {
      "end_time": 2048.012,
      "index": 88,
      "start_time": 2034.48,
      "text": " And that's what happened. We published a paper that at the end produces a kind of reconstruction of a melody that is our best guess"
    },
    {
      "end_time": 2065.657,
      "index": 89,
      "start_time": 2049.036,
      "text": " You said a peculiar statement about the Voynich manuscript, that it may not be a human language. Now do you mean to say a language that large groups of people speak or that it's an alien language, like it's not a homo sapien?"
    },
    {
      "end_time": 2096.51,
      "index": 90,
      "start_time": 2067.807,
      "text": " It could be a made-up language, right? So you know that actually there exist languages that were invented, like Esperanto, and many languages, like hundreds of languages have been invented. This could be one of those, a language that was never spoken by any community, but somebody just kind of made up a language, and it's possible. Anybody can do that, invent their own language."
    },
    {
      "end_time": 2112.671,
      "index": 91,
      "start_time": 2096.51,
      "text": " I see. So still it's a human language in the sense that it's made by a human, but it's not a human language in the sense that it's not spoken by many people or even known about. So it's not as if an alligator made up this language or some other extradimensional entity made up the language or divinely inspired."
    },
    {
      "end_time": 2139.241,
      "index": 92,
      "start_time": 2114.428,
      "text": " That's right. A better word is probably natural language. We say natural languages are the languages that occur on the planet, spoken by some community of people. I see. Stephen Bax is another professor who is no longer with us, but he studied this manuscript and I'm curious if you can go through what his theories on it are and then also your commentary on it."
    },
    {
      "end_time": 2165.026,
      "index": 93,
      "start_time": 2141.152,
      "text": " Well, actually, I'm not an expert in his theories or any other theories. You know, the ultimate test of a theory is that it produces a decipherment, right? So, as far as I know, no reasonable decipherment has been produced by Dr. Bax or anybody else."
    },
    {
      "end_time": 2178.353,
      "index": 94,
      "start_time": 2165.026,
      "text": " So it's not a huge motivation to study somebody's method if that method has not actually worked. How does one go about the process of learning a language from a computational perspective?"
    },
    {
      "end_time": 2199.855,
      "index": 95,
      "start_time": 2180.555,
      "text": " So, you know, everybody speaks a language. That's the universal thing. Every human being, they have their own native language. Plus, they may speak other languages. But the majority of people, I think they just learn very well their own native language and they learn it as children."
    },
    {
      "end_time": 2229.48,
      "index": 96,
      "start_time": 2199.855,
      "text": " If you try to learn a language after you're like somehow like 10 years old, then you'll find out that it actually becomes a different process. It becomes more difficult and you actually have to go to school or study books or go on the internet and somebody teaches you a language. This is not how children learn a language. So there's a big difference between the native language and the second language that we learn."
    },
    {
      "end_time": 2237.705,
      "index": 97,
      "start_time": 2229.48,
      "text": " For example, when I speak, you can probably tell that English is not my first language. My first language is Polish."
    },
    {
      "end_time": 2264.497,
      "index": 98,
      "start_time": 2238.848,
      "text": " So, because it's not my native language and because I learned it as a teenager, you can tell from my accent that I'm not a native speaker. So, this already tells you something about what people call the language instinct, the ability of people to acquire language. Now, linguistics is"
    },
    {
      "end_time": 2282.927,
      "index": 99,
      "start_time": 2266.357,
      "text": " the science of language, which deals with various aspects of language, and those include things like phonetics and morphology, grammar, syntax, semantics, pragmatics, acquisition, many things."
    },
    {
      "end_time": 2305.964,
      "index": 100,
      "start_time": 2282.927,
      "text": " What are pragmatics, briefly, sorry? Yeah, pragmatics is probably what you're most interested in yourself is basically, for example, sentiment analysis is pragmatics, right? If you deal with sentiment analysis, you're not really interested in finding out what people say, but what they feel about what they say, right?"
    },
    {
      "end_time": 2318.422,
      "index": 101,
      "start_time": 2305.964,
      "text": " So that's what we call pragmatics. It's not just about the message. It's about all the other stuff upon it. How do we feel about the message? That sounds terribly complicated."
    },
    {
      "end_time": 2335.998,
      "index": 102,
      "start_time": 2319.531,
      "text": " Is it? No. Well, it is difficult, but it's doable and it's not the hardest part. It is one of the tasks that people do and we have programs now that are very good at it. Okay, so continue on where you were, please."
    },
    {
      "end_time": 2355.623,
      "index": 103,
      "start_time": 2337.295,
      "text": " When you say that it sounds very complicated, it's because it's hard to define exactly what we mean by things like sentiment. What do you mean by sentiment? And then people say, well, are you angry? Are you happy? Are you sad?"
    },
    {
      "end_time": 2379.138,
      "index": 104,
      "start_time": 2355.623,
      "text": " And then how many feelings do we have? Well, we have eight feelings. Really eight? No, maybe twelve, right? So these are things that are very difficult to define. It's much easier to deal with things like letters or phonemes, where we know exactly how many letters or phonemes we have in a language, and it's easier to write programs that deal with that."
    },
    {
      "end_time": 2408.49,
      "index": 105,
      "start_time": 2379.855,
      "text": " Yeah, so I'll give it a try. So you can imagine it's like a pipeline. So you start with, when you hear somebody speaking, you start with what are the sounds of the language, right? That's phonetics. And now once you've done that, then you try to figure out where one word starts, the other ends, right? You want to see, you want to identify the words because there's only a limited number of words."
    },
    {
      "end_time": 2412.773,
      "index": 106,
      "start_time": 2408.49,
      "text": " And that's what we call lexicon or lexicals."
    },
    {
      "end_time": 2442.585,
      "index": 107,
      "start_time": 2414.343,
      "text": " And then when you look at the words, you see that they are made up of sounds or letters, but they're also made of something bigger, which is called morphemes. And that's the stuff of morphology. That's the study of morphology. For example, if I say a word like ungrammaticality, then you can say, well, there are three parts of it, the un, the grammar, and the ality."
    },
    {
      "end_time": 2472.585,
      "index": 108,
      "start_time": 2442.585,
      "text": " That's the morphology. So these are considered the kind of low level, low levels of language and as you go up it becomes more interesting. So first of all, how are words put together into sentences? How is it that you can have sentences that you ask somebody, is that the proper English sentence and they say yes or no? They can tell even though they have no idea, they haven't studied linguistics."
    },
    {
      "end_time": 2501.903,
      "index": 109,
      "start_time": 2472.585,
      "text": " Every native speaker can tell you if a sentence is grammatical or not. That's the study that Noam Chomsky did in the 50s. Can we write a program that can tell a grammatical sentence from an ungrammatical sentence? And on top of that, on top of syntax is semantics, which is about the meaning of words. We can have perfectly grammatical sentences that are meaningless. And vice versa, we can have meaningful utterances that are not grammatical."
    },
    {
      "end_time": 2521.135,
      "index": 110,
      "start_time": 2503.592,
      "text": " This universal grammar of Chomsky's, it's true in the sense that you can create a program that can identify which sentences are grammatically correct and incorrect."
    },
    {
      "end_time": 2540.589,
      "index": 111,
      "start_time": 2523.78,
      "text": " Actually, I don't think so. I think that's what Chomsky tried to do all his life, but it has not been done as far as I know. But at least that was the state of the art about 10 years ago."
    },
    {
      "end_time": 2566.118,
      "index": 112,
      "start_time": 2540.589,
      "text": " Now, in the last few years we have seen the neural language models appearing, which are extremely effective and which, as you know, can produce completely grammatical text that also makes sense. Yeah, so by extension that means that these programs can tell the difference between a grammatical and ungrammatical sentence because they only produce grammatical sentences."
    },
    {
      "end_time": 2592.5,
      "index": 113,
      "start_time": 2567.449,
      "text": " Are there other universal concepts in language like universal grammar? So the universal concepts in language are the things that are in every language on earth, every natural language. If there is something that almost all languages possess but some languages don't, then it's not universal."
    },
    {
      "end_time": 2611.732,
      "index": 114,
      "start_time": 2592.5,
      "text": " There's a whole area of linguistics that is dealing with finding things that are universal in human languages. And as far as I know, there's a long list of those things. Have you used any machine learning or neural language processing in the decipherment of the Voynich?"
    },
    {
      "end_time": 2633.865,
      "index": 115,
      "start_time": 2614.002,
      "text": " We did use machine learning, but not neural methods. No. The reason we didn't use neural methods for decipherment, and I think you have some experience already with these neural bots, is that they can make sense of everything."
    },
    {
      "end_time": 2655.503,
      "index": 116,
      "start_time": 2633.865,
      "text": " So, for example, Google Translate, if you give it something that doesn't make sense, it will still translate it into something that does. Obviously, we don't want something like that to be applied to Voynich manuscript, because we want to really know what's really there, not how to make sense out of it in some way, right?"
    },
    {
      "end_time": 2685.401,
      "index": 117,
      "start_time": 2656.63,
      "text": " There isn't some way of identifying what makes sense and what doesn't in the same way that for some sentences you can identify if it's grammatically correct or incorrect, like that program has not been completely explicated like you mentioned with Chomsky, but maybe there's huge progress there. Is there not progress in saying this sentence makes sense or not? That is much harder to do. There is progress, yeah, every year there is progress, but we are still far from reaching that point."
    },
    {
      "end_time": 2696.834,
      "index": 118,
      "start_time": 2687.039,
      "text": " You've seen that there's ChatGPT and there's OpenAI's GPT-3. What's your opinion of them? Are you excited by them? Are you surprised by them?"
    },
    {
      "end_time": 2724.582,
      "index": 119,
      "start_time": 2699.019,
      "text": " I'm excited that those tools become available, but I'm also kind of worried that people are too enthusiastic about them. And for me the problem is that they are basically what somebody called parrots. They're parrots that have heard a lot of language being spoken, everything that was ever written."
    },
    {
      "end_time": 2747.346,
      "index": 120,
      "start_time": 2724.582,
      "text": " And they are very good at repeating, putting together those sentences and words together. But there is no real understanding underneath. Those systems cannot tell us why they think these things that they say are true. They're basically repeating the words that have been written somewhere and rearranging"
    },
    {
      "end_time": 2759.445,
      "index": 121,
      "start_time": 2749.155,
      "text": " To be fair, most people when they're putting out something that's creative, they're just repeating what they've seen and they're mixing it up and they believe it to be absolutely new. And also, just so you know,"
    },
    {
      "end_time": 2788.626,
      "index": 122,
      "start_time": 2759.957,
      "text": " There is something creative about mixing up and then presenting it. And furthermore, most people, maybe even all of us, we don't know the motivations, like we'll confabulate some reason for why we created so-and-so. Like that's why the whole field of psychoanalysis came about, because we don't know why we do what we do. We make up some reason. So why does it matter that the computer doesn't know why it does what it's doing and that it's, quote-unquote, repeating? Well, mixing, let's say mixing, what's old?"
    },
    {
      "end_time": 2817.978,
      "index": 123,
      "start_time": 2790.981,
      "text": " I don't think it matters if you're interested in a computer producing art, like writing a song or painting a picture. But it does matter if you rely on the computer to tell you what the truth is, right? Because if you don't, if somebody cannot explain to you why they believe something is true, then how can you trust them? These are deep questions."
    },
    {
      "end_time": 2834.309,
      "index": 124,
      "start_time": 2818.353,
      "text": " What I find remarkable is that you can just even a simple program asking it to code this in Python, code something that does this in Python, code something that does this in AutoHotKey or whatever it may be, and it does it or does it 90% the way there."
    },
    {
      "end_time": 2855.128,
      "index": 125,
      "start_time": 2834.667,
      "text": " So..."
    },
    {
      "end_time": 2870.896,
      "index": 126,
      "start_time": 2855.52,
      "text": " Well, programming is a bit different story, because you can actually test programs. So if you ask whether it's a human or it's a bot to write a program, you can"
    },
    {
      "end_time": 2896.869,
      "index": 127,
      "start_time": 2870.896,
      "text": " you provide a specification, then you can go through the testing, the test procedure and find out if that program really does what it does. So we don't actually have to trust anything, we can just test it. But if we don't have time to test it, then I would be wondering whether it's a good idea to depend on such a program."
    },
    {
      "end_time": 2906.681,
      "index": 128,
      "start_time": 2898.848,
      "text": " So going back to the Voynich, have you thought about if it's composed of at least one language, like maybe there are multiple?"
    },
    {
      "end_time": 2937.961,
      "index": 129,
      "start_time": 2909.684,
      "text": " You know, it could be a lot of things there. You can make these encryption systems as complicated as you wish. So it's all possible. There is no limit. There will be no limit where we can say, well, we tried everything and now we know it doesn't make sense. So it must be some kind of a joke or some kind of random generator."
    },
    {
      "end_time": 2968.387,
      "index": 130,
      "start_time": 2939.872,
      "text": " But what is fascinating about Voynich is that we can use it to actually create new things, right? So like with Dorabella, we take the cipher and we create a melody, right? And many people take Voynich and they produce decipherments that are like their own pieces of art, like their own books."
    },
    {
      "end_time": 2982.09,
      "index": 131,
      "start_time": 2968.387,
      "text": " The only problem is that everybody produces a different one, so none of them can be actually correct. But it is still a creation, so I think that is very good about Voynich, that it exists."
    },
    {
      "end_time": 3005.179,
      "index": 132,
      "start_time": 2983.507,
      "text": " Have you thought about Voynich from less of a computational perspective and more just from a human motivation one? What the heck is this about? Why would someone go through such lengths to decipher this? Or maybe it's not even lengths. Like you mentioned, it could be something trivial. We're just overlooking. Like what other theories come up in your mind? Just surmising, just conjecture."
    },
    {
      "end_time": 3017.398,
      "index": 133,
      "start_time": 3006.817,
      "text": " Yes, so one of the more interesting theories that I've encountered actually comes from this US expert on decipherment."
    },
    {
      "end_time": 3048.643,
      "index": 134,
      "start_time": 3018.643,
      "text": " In the end, he said that he thinks this is an artificial language. Somebody created an artificial language and wrote that Voynich manuscript in that language. Well, if that's the case, then it would be very difficult to decipher it because we don't know the principles of that language. It could be a language that is completely unpronounceable. It's just a sequence of symbols."
    },
    {
      "end_time": 3070.384,
      "index": 135,
      "start_time": 3048.643,
      "text": " What else have you heard that is at least somewhat convincing? Maybe this one's at the top, but is there a second?"
    },
    {
      "end_time": 3102.09,
      "index": 136,
      "start_time": 3072.517,
      "text": " You know, anybody can look at those illustrations, they're on the web, right? And if you look for them for a long time, sometimes I think this somebody was not quite... It wasn't a work of an expert, it was a work of somebody who actually didn't know what they were doing and just tried to create something like what they saw before in other codices, in other books."
    },
    {
      "end_time": 3116.101,
      "index": 137,
      "start_time": 3102.09,
      "text": " a little bit like a neural language model that just looks at a lot of things, sees a lot, reads a lot, and then produces something that looks like it should make sense but it doesn't. That's interesting."
    },
    {
      "end_time": 3133.763,
      "index": 138,
      "start_time": 3117.398,
      "text": " We know that those language models can be tricked to produce texts that just seem to make sense but are complete nonsense, right? For example, why it's good to eat crushed glass, right? We will give you all the reasons for that, why it is good to eat crushed glass."
    },
    {
      "end_time": 3153.012,
      "index": 139,
      "start_time": 3133.763,
      "text": " When it comes to the Dorabella cipher, there were some other people who came up with decipherments. I'm gonna read some right now. And then there's others, like, why am I very sad? And so on. I'm sure you've heard these."
    },
    {
      "end_time": 3176.578,
      "index": 140,
      "start_time": 3153.865,
      "text": " I've seen this before but it always makes me laugh when I hear it. For me it's complete nonsense. Why? Well, it is nonsense to imagine that a distinguished English composer would write something like that to his love interest."
    },
    {
      "end_time": 3205.828,
      "index": 141,
      "start_time": 3178.285,
      "text": " Imagine if you were to decipher Voynich. What would be next for you? No more ciphers or do you have your eye on another one? You know, I think this happened. I mean, Voynich has not been decrypted but there was a very interesting decipherment recently of actual cipher which was called Zodiac Cipher. I don't know if you've heard of it."
    },
    {
      "end_time": 3229.684,
      "index": 142,
      "start_time": 3207.073,
      "text": " And that is actually correct, right? That decipherment is not fake. It is actually a correct decipherment. So I would probably ask that person about their feelings, like how they feel about cracking that cipher. Is it like a complete bliss or is it like some kind of disappointment?"
    },
    {
      "end_time": 3252.79,
      "index": 143,
      "start_time": 3229.684,
      "text": " You know, I put so much work into it and then I find that this text is actually kind of, you know, not interesting at all. It's like some kind of deranged mind writing it. So, yes, you know, there are one kind of tragedy is you don't achieve your goal and the other tragedy is if you do achieve your goal."
    },
    {
      "end_time": 3270.572,
      "index": 144,
      "start_time": 3252.79,
      "text": " Yeah, that's interesting. Let's get philosophical here. To me that means that you have to enjoy the process more than the state. Even though there's some end state and that's supposedly driving the process, you have to fall in love with the process because you may, if you're lucky and maybe unlucky, reach that state."
    },
    {
      "end_time": 3289.855,
      "index": 145,
      "start_time": 3271.988,
      "text": " Yeah, absolutely. And this is something that I do feel about problems in computational linguistics that I love doing this stuff and I just would be able to do this, you know, for free."
    },
    {
      "end_time": 3317.125,
      "index": 146,
      "start_time": 3289.855,
      "text": " because it's such huge fun to do this. But Voynich was just one of the projects that I got interested in and I learned from the project, I got some experience from that project that I think made me a better scientist so that I can apply this experience to the projects that actually do have a solution."
    },
    {
      "end_time": 3347.09,
      "index": 147,
      "start_time": 3317.125,
      "text": " Well, right now we are very excited to be working on semantics, on lexical semantics, and we are proposing, you know, we are finding things that other scientists find may be controversial, right? But the huge advantage of the work that we do is that we can actually provide proofs, mathematical proofs of what we do."
    },
    {
      "end_time": 3376.323,
      "index": 148,
      "start_time": 3347.09,
      "text": " And this gives us the satisfaction of actually being certain that we are doing something right because we can prove it. Going back to loving the road more than where you're going, I feel the same with this podcast. It's about theories of everything in the physics sense. So my background is in mathematical physics. And a part of me, I feel like I'll be extremely disappointed if I encounter or if we discover as people, as scientists,"
    },
    {
      "end_time": 3405.367,
      "index": 149,
      "start_time": 3376.937,
      "text": " the theory of everything. There is something that's terribly fun about learning it and investigating. I don't want it to be over. I don't think you have to worry about that. Personally, I think, you know, looking at how the universe is constructed, I'm pretty sure it has some built-in mechanism so that we can never actually figure it out completely. What gives you that intuition?"
    },
    {
      "end_time": 3424.053,
      "index": 150,
      "start_time": 3407.346,
      "text": " Well, when you talk about theories of everything, you obviously talk to physicists that deal with quantum mechanics and things like that."
    },
    {
      "end_time": 3440.794,
      "index": 151,
      "start_time": 3424.053,
      "text": " and how there are certain principles that we can prove that we'll never know the truth, right? Like we'll never know where the particular particle is, what is its exact location and speed and so on."
    },
    {
      "end_time": 3465.316,
      "index": 152,
      "start_time": 3440.794,
      "text": " And this is for me an indication that these things are constructed in such a way that we will never be able to crack them completely. Alright, well, that's hopeful but also dismaying. At least it's both and not one without the other. So, about the zodiac."
    },
    {
      "end_time": 3481.63,
      "index": 153,
      "start_time": 3466.288,
      "text": " If that's a substitution cipher, was it a substitution and transposition or just substitution? Oh yeah, it was substitution and transposition and it was a very tricky transposition too. Yeah, why is that? And would your method have worked on the zodiac one?"
    },
    {
      "end_time": 3508.2,
      "index": 154,
      "start_time": 3485.52,
      "text": " No, the method would not work on Zodiac because the assumption of our methods is that we know where the words are. So in Voynich there are spaces between words and we made this assumption that this is not just to confuse but they are actually words, right? Now in the Zodiac cipher there were no spaces between words."
    },
    {
      "end_time": 3527.79,
      "index": 155,
      "start_time": 3508.2,
      "text": " So, although it is possible to kind of hypothesize where the spaces are, that method, the particular method would not work on Zodiac. And the method used to crack the Zodiac cipher, can that method or methods be used to help with the Voynich?"
    },
    {
      "end_time": 3559.241,
      "index": 156,
      "start_time": 3531.817,
      "text": " Actually, you know, I don't think so. I think the key of the decipherment in that case was just finding the specific pattern of transposition. So it was not any kind of cool new theory that is general and can be applied to various things. It was just kind of a stroke of luck. Like trial and error?"
    },
    {
      "end_time": 3586.681,
      "index": 157,
      "start_time": 3559.241,
      "text": " It is always a challenge there when you do actual decipherment, but what I mean is that there is no method behind it that can be generalized and applied to other things. How has AI, and maybe this is a term that you don't want to use, but how has AI aided your field? So instead of saying AI, then reference a specific model like GANs have changed my field because of some sort of supervised learning in the form of A, B, and C changed my field."
    },
    {
      "end_time": 3609.394,
      "index": 158,
      "start_time": 3588.78,
      "text": " Yeah, so you cannot avoid the word neural nowadays when you talk about language understanding. It's a powerful new tool and everybody is very, very excited about it, including myself."
    },
    {
      "end_time": 3631.391,
      "index": 159,
      "start_time": 3609.394,
      "text": " So of course it changed everything because the story of language processing is that it started from a kind of a symbolic processing and then moved into the machine learning stage and then evolved into the neural methods which we use nowadays."
    },
    {
      "end_time": 3660.094,
      "index": 160,
      "start_time": 3631.578,
      "text": " So what is exciting about it is that every few years you have a new revolution and new methods and we make constant progress to the point that some people think that the problem of language has been solved. But it's not the case. Sorry, that the problem of language has been solved? The problem of language understanding has been solved. That we can basically now have programs that will do every language related task that we want."
    },
    {
      "end_time": 3665.179,
      "index": 161,
      "start_time": 3661.476,
      "text": " And it's not true. Who thinks that that problem has been solved?"
    },
    {
      "end_time": 3692.654,
      "index": 162,
      "start_time": 3667.466,
      "text": " Well, you know, when I read these articles about the neural bots that can, you know, write newspaper articles or compose symphonies or something that sometimes you get an impression that, well, we're done, right? We can just leave it all to the computers and they will do everything for us."
    },
    {
      "end_time": 3721.613,
      "index": 163,
      "start_time": 3692.654,
      "text": " But what I tell my students is that you really want to become, to be educated, to be somebody who cannot be replaced by a computer. And I guarantee you that they will never be able to replace the most important part of us, which is the creativity. Are you sure about that? What is it about human creativity that a machine can't replicate?"
    },
    {
      "end_time": 3730.657,
      "index": 164,
      "start_time": 3721.92,
      "text": " By the way, I'm not being skeptical, I just don't know. I'm curious what your thoughts are since you're in this field. Well, that's exactly what you said."
    },
    {
      "end_time": 3754.002,
      "index": 165,
      "start_time": 3731.135,
      "text": " Machine cannot replicate. Creativity and replication are opposite things. Creativity is doing something that has not been done before. Of course, you can say, well, it's just kind of building on what was before, but it's not replicating. It's not parroting. It's creating something new based on a deeper understanding of things."
    },
    {
      "end_time": 3772.807,
      "index": 166,
      "start_time": 3756.305,
      "text": " There's this old joke of if you want to create an apple pie from scratch, you have to first create the universe. It's like, well, did you get it from the farm? No, I bought it from this. Okay. But even if you had it from the farm, did you grow the dirt? Did you? Well, yes. Okay. But did you make the cow and so on and so on. In a sense,"
    },
    {
      "end_time": 3787.944,
      "index": 167,
      "start_time": 3773.456,
      "text": " Whatever we think of as new, it's so tricky, like it depends on what the heck are we defining as novel, as creative. And I'm sure if we could look into our brain with a certain amount of resolution and we had the correct model, if it even could be modeled computationally."
    },
    {
      "end_time": 3804.343,
      "index": 168,
      "start_time": 3788.234,
      "text": " But regardless, maybe there's some non-computational model, if it can even be modeled, quote-unquote, model. The point is that I imagine it's conceivable to me that what we think of as outputting something creative is something that is algorithmic. Like, I'm not set on this, but it's conceivable."
    },
    {
      "end_time": 3824.701,
      "index": 169,
      "start_time": 3804.667,
      "text": " And if that's the case, then I don't see why a computer can't do it. Now whether or not a computer can feel and understand what it's doing, like that's a separate problem. But the actual output, I don't see an in principle reason why it can't be done. And I'm telling you this as a romantic, like I don't want this to be done. But I see more and more, like aspects that we thought computers could not do."
    },
    {
      "end_time": 3852.739,
      "index": 170,
      "start_time": 3825.503,
      "text": " I would like that to be the case. I want to be convinced of that."
    },
    {
      "end_time": 3877.944,
      "index": 171,
      "start_time": 3854.838,
      "text": " Well, first of all, when you said that something cannot be done, you cannot demonstrate that something cannot be done. You can only really demonstrate that something can be done by doing it, right? So I will not be able or I don't think anybody would be able to demonstrate that computers cannot do something."
    },
    {
      "end_time": 3899.377,
      "index": 172,
      "start_time": 3879.889,
      "text": " But I am a computer scientist. I've programmed a lot. I worked with computers a lot. And I know that the computers are good at doing repeatedly certain things and repeating patterns that already exist, right?"
    },
    {
      "end_time": 3928.746,
      "index": 173,
      "start_time": 3900.589,
      "text": " You cannot have an algorithm to create something that does not exist, right? That is novel, that is meaningful. Of course, you can create novel things. You can create chaos, right? You can create a random generator and this sequence of randomly generated numbers is unique. Is it novel? No, because it doesn't make sense. Are you afraid of where AI may be or are you more hopeful?"
    },
    {
      "end_time": 3956.169,
      "index": 174,
      "start_time": 3930.964,
      "text": " I think it's a serious issue and we have to think about it, you know, because the danger I see is that people will trust those programs too much. And we build them and we are responsible for telling them what we want them to do. If we don't do this right, they may do surprising things that we never actually anticipated."
    },
    {
      "end_time": 3981.084,
      "index": 175,
      "start_time": 3956.169,
      "text": " I think the key thing is that we want these things to be transparent. We want to know if they tell us a statement, then we want to know why they think the statement is true. We want them to provide the proof of something that they state. Obviously, they are not at this level yet. For example,"
    },
    {
      "end_time": 3991.903,
      "index": 176,
      "start_time": 3981.681,
      "text": " They can write basically history books, right? But we don't know whether they are hallucinating or is it actual facts they are talking about."
    },
    {
      "end_time": 4020.794,
      "index": 177,
      "start_time": 3993.746,
      "text": " So there must be some way of them providing evidence of what they are saying is true. Like put references when you make a statement. Exactly. So I've been talking to students recently about what is true. How can we decide if a sentence is true or false?"
    },
    {
      "end_time": 4046.527,
      "index": 178,
      "start_time": 4022.312,
      "text": " And the fact is that, you know, some people say everything is relative. Some people think this is true and other people think this is true. What I want the students to do is to decide first what is the speaker, the author of the utterance, if they think it's true or not. And this is non-trivial."
    },
    {
      "end_time": 4069.753,
      "index": 179,
      "start_time": 4046.527,
      "text": " But if they can establish that the author of the utterance or sentence believes it's true, then it is true with respect to that person, right? So we can say this is a true statement according to this person. And it is then kind of clear that this is some kind of evidence based on somebody's belief."
    },
    {
      "end_time": 4078.916,
      "index": 180,
      "start_time": 4069.753,
      "text": " So I do believe we can tell whether a statement is true or false modulo the author of the statement."
    },
    {
      "end_time": 4106.63,
      "index": 181,
      "start_time": 4080.418,
      "text": " Except in the case of AI, like in the case of people we can because they have intentions. But AIs, no. Currently, no. Is there a subfield in computer science that's dedicated to this problem? How did the machine come upon this decision? Can it explain the reasons? Yeah, many people are working on that because many people have realized that this is what we need in order to be able to use those tools. And what's that field called or subfield?"
    },
    {
      "end_time": 4133.729,
      "index": 182,
      "start_time": 4106.63,
      "text": " Is there a name for when you're specifically trying to pry open that black box and then pull out something that is understandable to us? Like how did it make the decision?"
    },
    {
      "end_time": 4153.882,
      "index": 183,
      "start_time": 4134.514,
      "text": " The word I've heard used is interpretability. So you want to have a program that not just does the job, but is also interpretable. So we can interpret why it does the job as it does. So the current non-interpretability"
    },
    {
      "end_time": 4174.411,
      "index": 184,
      "start_time": 4153.882,
      "text": " of AI. Is that what you see as its greatest threat? Or do you see that like you've heard strong AI and you've heard of the singularity and that machines may turn on humans or that other people may use like if you invert certain parameters then a drug that was that a machine developed to produce a drug that was helpful can be"
    },
    {
      "end_time": 4184.718,
      "index": 185,
      "start_time": 4174.411,
      "text": " turned to produce a drug that's extremely potent and deleterious. Do you see the non-interpretability of machines as the greatest issue that we have right now or is it somehow connected to all those other issues?"
    },
    {
      "end_time": 4206.988,
      "index": 186,
      "start_time": 4186.203,
      "text": " I don't know if it's the greatest issue, but it's an important issue. Another important issue is the so-called bias, right? These language models are trained on texts that have been written by people that are biased, and they become biased themselves. Obviously, we don't want them to be guided by such kind of texts."
    },
    {
      "end_time": 4218.814,
      "index": 187,
      "start_time": 4208.729,
      "text": " There's a phrase that you wrote down, English orthography is not close to optimal. Correct. Can you explain firstly what orthography is and then take us through that phrase?"
    },
    {
      "end_time": 4242.176,
      "index": 188,
      "start_time": 4220.606,
      "text": " Yes, so orthography is the way we write language. So English exists primarily as a spoken thing, but we also write it down, like as every language. And the orthography is the way we write down the sounds. And as you may know, English doesn't have a very good orthography."
    },
    {
      "end_time": 4269.735,
      "index": 189,
      "start_time": 4242.176,
      "text": " Well, it doesn't seem to be good because it's very hard to learn and people that learn English, they make a lot of spelling errors and even native speakers find it difficult to write down words that they speak. So Noam Chomsky had that kind of a statement that English orthography is near optimal, is close to optimal, even though it appears not to be."
    },
    {
      "end_time": 4288.831,
      "index": 190,
      "start_time": 4269.735,
      "text": " So we had a project where we kind of showed that it actually is not optimal, it's not close to optimal, it could be much better. And so that's the essence of that paper. Why did Chomsky think that it was?"
    },
    {
      "end_time": 4318.933,
      "index": 191,
      "start_time": 4292.346,
      "text": " Chomsky had very good reasons for saying what he said, but, you know, in science our job is to question everything, right? And that's what we did in that project. We wanted to question that statement which seems to be nowadays accepted as truth by everybody."
    },
    {
      "end_time": 4340.998,
      "index": 192,
      "start_time": 4318.933,
      "text": " and to show that, to provide evidence for that, we wrote programs and we did simulations and we published this to show that it is not actually optimal, it is not close to optimal, could be much better. Yeah, so basically that's the point here."
    },
    {
      "end_time": 4361.271,
      "index": 193,
      "start_time": 4341.527,
      "text": " What was Chomsky's reasons for suggesting it was optimal? Because as you pointed out, it seems on the face that it's clear it's not. Like the word tough is with an F, but it ends with GH. It seems clear that it's not. So Chomsky must have had some reasons, and like you mentioned, he had good reasons. What were they? And then what was his response to, if any, to your results?"
    },
    {
      "end_time": 4391.886,
      "index": 194,
      "start_time": 4364.172,
      "text": " Yes, so Chomsky was, when he wrote this in the 60s, he was going against the consensus, right, which was that English orthography is bad. And he questioned that and he said, no, it's actually near optimal. It would take a lot of time to go into those arguments, which are reasonable. However,"
    },
    {
      "end_time": 4415.213,
      "index": 195,
      "start_time": 4393.097,
      "text": " There's more to it, right? Everything can be interpreted in different ways. The main assumption that is not spoken is that our writing system in English is based on the history of English and other languages."
    },
    {
      "end_time": 4438.063,
      "index": 196,
      "start_time": 4415.213,
      "text": " For example, a lot of English at some point was very influenced by French about a thousand years ago and that influenced the spelling of English. Now, even if we could change the orthography of English to something better, if there is something better,"
    },
    {
      "end_time": 4453.49,
      "index": 197,
      "start_time": 4438.063,
      "text": " then that wouldn't be practically possible because people are just used to the way as it is written right now. And besides, English is spoken in many different countries and those countries would never agree on a new system."
    },
    {
      "end_time": 4474.121,
      "index": 198,
      "start_time": 4455.503,
      "text": " So in a sense Chomsky was right about so-called morphological consistency, that words that have the same morphemes, which are pronounced differently, should have the same representation for the morpheme. That representation shouldn't change."
    },
    {
      "end_time": 4496.51,
      "index": 199,
      "start_time": 4474.121,
      "text": " But there's also something called phonetic consistency and you gave example of that and that is just not good, right? There are just too many arbitrary solutions that reflect the pronunciation as it was 500 years ago. For example, the word tough as you said was actually pronounced with a consonant at the end 500 years ago."
    },
    {
      "end_time": 4524.753,
      "index": 200,
      "start_time": 4498.114,
      "text": " There's morphological consistency, phonetic consistency, and then there's orthographical optimality. Can you place numbers on those? Like, you can say this language is 90% optimal orthographically and 50% morphologically consistent. Can you actually place numbers on them? Yes, so"
    },
    {
      "end_time": 4537.329,
      "index": 201,
      "start_time": 4525.077,
      "text": " Let me give it a try. So, for example, Finnish is considered an extremely good orthography. It's completely consistent in all kind of aspects."
    },
    {
      "end_time": 4565.828,
      "index": 202,
      "start_time": 4538.387,
      "text": " Some languages are, like Croatian for example, the orthography was created under the principle, write as you speak. So that has this consistency that you can just, you never make spelling mistakes, you just write as you speak. Sorry, which language was based like that? That sounds interesting."
    },
    {
      "end_time": 4586.596,
      "index": 203,
      "start_time": 4565.828,
      "text": " It used to be Serbo-Croatian, now these are separate languages, but it still applies to it. Now Spanish, which many people are familiar with, is another type of language where you always know how to read something."
    },
    {
      "end_time": 4610.674,
      "index": 204,
      "start_time": 4586.596,
      "text": " You may still make spelling mistakes, but you will never pronounce a written word in the wrong way. So that's another type of consistency. English doesn't have either of those. You as a native speaker will probably make mistakes unless you have a spell checker, even though you know perfectly well how to pronounce a word."
    },
    {
      "end_time": 4630.64,
      "index": 205,
      "start_time": 4612.705,
      "text": " and me as a second language learner of English, I will encounter words that I just don't know how to pronounce. So it is definitely a problem in English, but other languages are even more difficult, like the Japanese orthographic system is even more difficult than English."
    },
    {
      "end_time": 4651.578,
      "index": 206,
      "start_time": 4632.654,
      "text": " I'm curious if English stands out as best or worst in some metric, and if so, which? For instance, I heard that English can convey a complex sentence second best, something like that, and Mandarin is first. You can think of it as a simple language, as one that a child may just come up with on their own."
    },
    {
      "end_time": 4671.101,
      "index": 207,
      "start_time": 4651.578,
      "text": " I don't know the actual terminology, I just heard this. And I heard that English is actually pretty great, it's second in the world, and Mandarin is best at that. But anyway, the point is I just heard this, so what is English great at and not great at?"
    },
    {
      "end_time": 4686.63,
      "index": 208,
      "start_time": 4673.336,
      "text": " Yes, so English and Chinese have something in common which is that they are analytic languages. So morphology in English is very basic compared to languages like Spanish or Polish."
    },
    {
      "end_time": 4712.329,
      "index": 209,
      "start_time": 4687.705,
      "text": " In Chinese it is even more simple, there is no morphology at all. So in that sense these analytical languages reach some kind of maximum within that particular condition. I know that"
    },
    {
      "end_time": 4740.964,
      "index": 210,
      "start_time": 4713.848,
      "text": " English is, if you compare things written in different languages, sometimes you see on products like 20 languages with the same message, the English text will probably be one of the shortest ones. So I think this is maybe something you're referring to that, that it can actually convey the same message with fewer letters or fewer symbols."
    },
    {
      "end_time": 4763.319,
      "index": 211,
      "start_time": 4742.654,
      "text": " Reminds me of this joke someone was translating. I think this actually happened. I think it was Hideo Kojima, who's a video game creator, and he was on stage. He speaks Japanese, and it goes on for like 20 seconds, 30 seconds. The translator comes: He says thank you. You're like, that's not what he's, like, just"
    },
    {
      "end_time": 4793.592,
      "index": 212,
      "start_time": 4763.831,
      "text": " If you're lazy or you've forgotten, that's fine, but there's no way that's all of what he said. Yeah. Well, I actually lived in Japan for a while. So this is actually the issue of pragmatics, right? Human language is not just exchanging messages. There's a lot, for example, related to politeness. And in Japanese, you spend a lot of time just being polite in addition to passing a message."
    },
    {
      "end_time": 4804.838,
      "index": 213,
      "start_time": 4793.592,
      "text": " Ah, like San. San at the end of a person's name. Is that to connote I am lower than you or respect? Yeah, there's a lot more tools for expressing this kind of relationship in Japanese."
    },
    {
      "end_time": 4827.466,
      "index": 214,
      "start_time": 4805.776,
      "text": " Do you know who Larry David is from Seinfeld?"
    },
    {
      "end_time": 4843.387,
      "index": 215,
      "start_time": 4829.411,
      "text": " Larry David, the creator, he said that when Caesar was being assassinated by Brutus, that Brutus said something with the tu, and then Larry David said that was too informal for an assassination."
    },
    {
      "end_time": 4859.514,
      "index": 216,
      "start_time": 4847.602,
      "text": " To end this, you did your master's thesis on a theoretical evaluation on selected backtracking algorithms. So how has your perspective on that subject since the writing of that thesis changed? How has it developed?"
    },
    {
      "end_time": 4886.032,
      "index": 217,
      "start_time": 4861.084,
      "text": " Yeah, so this is part of what's called artificial intelligence, but it's a very formal thing called constraint satisfaction. And what I liked about it is that you can actually prove something. Unlike in pure linguistics, you can never prove anything. You can just argue about it and then some people will disagree."
    },
    {
      "end_time": 4916.032,
      "index": 218,
      "start_time": 4886.032,
      "text": " But I didn't stay in that area because I wanted to work with language. I love language. And then I found that in language it's very hard to prove anything because there are always exceptions. But now after all those years I'm coming back to the point that I think that I can actually use the language of mathematics to describe human language. And I find this very exciting. So I hope to be able to prove things"
    },
    {
      "end_time": 4937.961,
      "index": 219,
      "start_time": 4916.032,
      "text": " What's one of the more out there theories of the Voynich manuscript as to what it's about, what it contains, information on that you don't believe in but you find interesting, maybe even plausible?"
    },
    {
      "end_time": 4953.626,
      "index": 220,
      "start_time": 4939.565,
      "text": " So there was this hilarious paper, somebody trying to show that the language of Voynich is actually Lojban. I don't know if you've heard about it. It's an invented language."
    },
    {
      "end_time": 4976.988,
      "index": 221,
      "start_time": 4954.821,
      "text": " And this paper showed to me that you can actually provide evidence for anything, for any language. If it's Lojban, which was invented in the 20th century, and somebody wrote the Voynich manuscript in the 15th century in that language, then that means you can basically argue for anything."
    },
    {
      "end_time": 5003.422,
      "index": 222,
      "start_time": 4976.988,
      "text": " And that again shows the value of if you can actually prove something. And in the case of the Voynich manuscript, the proof would be actually in the pudding, which means deciphering it into some kind of text that makes sense. Do you think it will be deciphered in the next five years? I don't know. I hope it will be. I hope it will, but I wouldn't bet on it."
    },
    {
      "end_time": 5031.254,
      "index": 223,
      "start_time": 5005.06,
      "text": " You know, in history, people often said something will never be done and it was done. When I first heard about the Zodiac cipher, I thought, no, that's never going to be deciphered because it's probably just random noise. And then it turns out that it was deciphered. So that's a lesson for us."
    },
    {
      "end_time": 5056.357,
      "index": 224,
      "start_time": 5031.254,
      "text": " Meaning, in the case of the zodiac, you thought that it was gibberish, that he didn't actually write anything. It's not something that was deciphered, it's just symbols. Yeah, I thought it was just the intentional gibberish to confuse people. This is similar to the people that say that Voynich is a joke, right? They make the same assumption that somebody just did it to confuse people."
    },
    {
      "end_time": 5072.602,
      "index": 225,
      "start_time": 5056.357,
      "text": " Well, thank you for spending about two hours with me, or an hour and a half, on what is potentially a joke, but hopefully not. Take care, man. It's good to speak with you. Thank you. It was fun talking to you. Bye."
    },
    {
      "end_time": 5094.582,
      "index": 226,
      "start_time": 5073.746,
      "text": " The podcast is now concluded. Thank you for watching. If you haven't subscribed or clicked on that like button, now would be a great time to do so as each subscribe and like helps YouTube push this content to more people. Also, I recently found out that external links count plenty toward the algorithm, which means that when you share on Twitter, on Facebook, on Reddit, etc."
    },
    {
      "end_time": 5121.527,
      "index": 227,
      "start_time": 5094.582,
      "text": " It shows YouTube that people are talking about this outside of YouTube, which in turn greatly aids the distribution on YouTube as well. If you'd like to support more conversations like this, then do consider visiting theories of everything dot org. Again, it's support from the sponsors and you that allows me to work on TOE full time. You get early access to ad free audio episodes there as well. Every dollar helps far more than you may think. Either way, your viewership is generosity enough. Thank you."
    }
  ]
}
