Theories of Everything with Curt Jaimungal

Elan Barenholtz: The Theory That Shatters Language Itself

June 13, 2025 · 2:30:19


Transcript

Enhanced with timestamps · 347 sentences · 22,078 words · Method: api-polled · Transcription time: 149m 27s
[1:20] I'm going to get attacked by physicists. This thing is just ridiculously good. And so that just blows my mind.
[1:43] Professor Barenholtz completely inverts how we understand mind, meaning, and our place in the universe. The standard model of language assumes words point to meanings in the world. However, Professor Barenholtz of Florida Atlantic University has discovered what's unconscionably unsettling. They don't. Language is actually deconstructing itself. Most startlingly, he argues that our rational linguistic minds
[2:09] have severed us from the unified cosmic experience that animals may still inhabit. Most current LLMs operate with purely autoregressive next token prediction operating on ungrounded symbols.
[2:25] All of this terminology is explained so don't worry, this podcast can be watched without a formal background in psychology or computer science. In this conversation, we journey through rigorous explorations of how LLMs work, what they imply about how we view the world and the relationship between our consciousness and the cosmos. Professor, you have two theses.
[2:48] One is a speculative one and the other is more grounded. You even have another more hypothetical one atop that, which we may get into. Why don't you tell us about the more corroborated one and then we can move to the contestable parts later? Okay, sure. So yeah, I would call them sort of the grounded thesis and then sort of the extended version of that, if we can call it that. The grounded thesis is primarily about language.
[3:16] And the thesis is that human language is captured by what's going on in the large language models. And I mean, not in terms of the specific, exact algorithm as to how large language models like ChatGPT are actually generating language, but that the core sort of mathematical principle that large language models like ChatGPT run on is what's happening in the brain.
[3:43] And that's what's happening in human language. And really, the reason I say corroborated is because ultimately this isn't even about the brain; it's about language itself. And I think what we have learned, in the course of being able to replicate language in a completely different substrate, namely in computers, is that we've learned the properties of language itself. We've discovered it's not through clever human engineering that we've been able to kind of
[4:13] barrel our way towards language competency. It's that, with actually fairly straightforward mathematical principles done at scale, we've discovered that language has certain properties that we didn't know it had before. And so the incontrovertible fact, in my opinion, is that language itself has certain properties. Now that we know it has those properties, and
[4:38] My claim is, the sort of corroborated claim is that those properties force us to conclude that the mechanism by which humans generate language is the same as what's going on in these large language models. Because now that we know that language is capable of doing the stuff that it does, it not only has the properties to, and I'm sort of giving away the punchline, to self-generate based on its internal structure,
[5:05] It's unavoidable to think that we are using the same basic mechanism and principles because it would be extremely odd to think that we have a completely different orthogonal method for generating language. Put differently, if we are using completely different mechanisms than the language models, then it's extremely unlikely that the language models would work as well as they do.
[5:30] The obvious question that's occurring to the audience as they listen right now is how do we know that whatever mechanism is being used by LLMs isn't just mimicry?
[5:56] Right, and so that's sort of the critical question. Is this mimicry, right? Is what the models are doing, in a sense, learning a kind of roundabout technique that captures some of the superficial components of language in humans, but ultimately it's a completely different approach.
[6:15] And so, you know, my argument is really from the fundamental simplicity of these models. So let's just talk really quickly about how large language models work, things like ChatGPT. What they're doing is learning, given a sequence, you know, let's say the sequence is "I pledge allegiance to the," and then the model is being asked,
[6:39] to do this thing called Next Token Generation. What's the probable next word? We'll say word for the purpose of this conversation. We're going to call tokens a word. Token is a more technical term about how you chop up and encode the information in a sequence of language, but we're just going to say word. So guess the next word based on that sequence. And then what you do in these models is you
[7:07] you train them
[7:37] Then take that word, tag it onto the sequence, and feed it back in. This is sufficient to generate human-level language. Now, the reason I believe that this demonstrates something not about our engineering or even about the models themselves, because there's different ways you might build a model that can do this, is because this very simple trick, this simple recipe of simply guessing the next word turns out to be sufficient
[8:05] To the point where there really are no benchmarks, no standard benchmarks that these models aren't able to do. And so what that suggests to me is just by learning the predictive structure of language, you're able to completely solve language. That means that that is likely to be the actual fundamental principle that's built into language in order to generate it.
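The simple recipe described here (guess the next word, tag it onto the sequence, feed it back in) can be sketched with a toy bigram counter. The corpus is invented, and real models compute the prediction with a transformer trained at scale, but the predict-append-repeat loop itself is the same:

```python
import random
from collections import defaultdict

# Toy sketch of the autoregressive loop: learn next-word counts from a
# tiny made-up corpus, then generate by repeatedly predicting a next
# word, appending it, and feeding the sequence back in.
corpus = "i pledge allegiance to the flag . i pledge loyalty to the cause .".split()

counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # "training": count what follows what

def next_word(prev):
    # Sample the next word in proportion to how often it followed `prev`.
    options = counts[prev]
    words = list(options)
    return random.choices(words, weights=[options[w] for w in words])[0]

def generate(seed, n_words=6):
    seq = [seed]
    for _ in range(n_words):
        seq.append(next_word(seq[-1]))  # predict, tag it on, feed back in
    return " ".join(seq)

print(generate("i"))  # e.g. "i pledge allegiance to the flag ."
```

Nothing in the loop changes as the predictor gets better; scaling up the model and the corpus is what turns this toy into something that produces fluent language.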
[8:29] If we had to come up with a very complex scheme, for example, syntax trees, complex grammar, long range dependencies that we had to take into account, and through enough compute, we were able to kind of master that, then I might argue, well, what we're doing is possibly figuring out a roundabout way to capture all this complexity.
[8:53] But it's the simplicity itself that simply being able to predict the next token, the next word, is sufficient to do all of this long-range thinking to be able to take an extremely long sequence and then produce an extremely long sequence on the basis of that. That suggests to me that we discovered a principle that's actually already latent in language, that we just had to throw enough firepower at it but with an extremely simple algorithmic trick
[9:18] And then language revealed its secrets. So to me, this really suggests that there is, of course, you know, there's still a lot of science that needs to be done and this kind of thing, kind of work that I'm doing in my lab in terms of really being able to hammer down how the brain is instantiating this exact same algorithm. It's not going to look exactly like chat GPT. It's not necessarily going to be based on
[9:42] what are called transformer models, which is something we can get into a little bit. But as far as the core principle of prediction of the next token, the fact that that solved language so handily to me really argues that that is the fundamental algorithm. That is the fundamental algorithm that when you apply it, boom, language emerges. If you just have the corpus, you have the statistics, and then you do next token prediction,
[10:06] Okay, so Elan, you and I have spent several days together.
[10:32] In fact, you're in the video with Jacob Barandes and Manolis Kellis; we'll place that on screen and I'll put a pointer to you. And you were in the background of the interview with William Hahn. Always in the background, never in the foreground. Here we are. OK, well, yes, great. You had a large epiphany that occurred to you at one point. You spoke about the software, and this precipitated this entire point of view of language as a generative slash autoregressive
[11:01] model, or what have you. Tell me about it. What the heck was that big idea? So it wasn't so much an idea as an epiphany, a realization, and it really hit me in a single moment. And it wasn't necessarily about autoregression. It wasn't about this finer detail of how, ultimately, language models, and I believe the brain, solve this problem. It was the realization that
[11:29] any model that has been trained, any model that anybody has built that accomplishes human-level language, might be based on autoregression. It might be based even on diffusion, which is kind of the arch-nemesis of my autoregressive theory. But regardless, the fact is that these models are being trained exclusively on text data.
[11:57] And so all they are learning is the relations between words. To the model, as far as the model is concerned, the words are turned into numbers. They're tokenized. We think of them as numerical representations. But those numbers, and for our purpose, we could think of them as words, don't represent anything. There is nothing in the model besides the relations. Relations just between the words themselves. There isn't, for example, any relation between any of the tokens and something external to it.
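To make concrete what "the words are turned into numbers" means, here is a minimal sketch with an invented vocabulary. Real tokenizers split text into subword pieces, but the principle holds: the model receives only integer ids, which carry no link to anything outside the sequence.

```python
# Hypothetical vocabulary: words map to arbitrary integer ids. The ids
# mean nothing in themselves; only their learned relations to other ids do.
vocab = {"the": 0, "sky": 1, "is": 2, "red": 3, "blue": 4}
inverse = {i: w for w, i in vocab.items()}

def encode(text):
    # Text in, integers out: this is all the model ever sees.
    return [vocab[w] for w in text.lower().split()]

def decode(ids):
    # Integers back to text, for the human on the other end.
    return " ".join(inverse[i] for i in ids)

ids = encode("the sky is red")
print(ids)          # [0, 1, 2, 3]
print(decode(ids))  # the sky is red
```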
[12:27] What we tend to think of as people is what words are doing when we're discussing topics, thinking about words in our head, is that they symbolize something, that they refer to something. A lot of the philosophy of language, a lot of the scientific study of linguistics has been concerned with semantics. How do words get grounded? How do they mean something outside of themselves? And what large language models show us is that words
[12:57] don't mean anything outside of themselves. As far as generation goes, as far as the ability for us to have this conversation, and as far as the model's ability to produce meaningful responses to just about any question you can throw at them, including writing a long essay on any topic, including a novel topic that it's never encountered, is by stringing together sequences based on
[13:25] simply the learned relations between words. And so this really hit me very, very hard. I've long been puzzled by, as many are, by the mind-body problem, the phenomena of consciousness, the problem of how do we know your red is my red? And actually the moment that I had this realization was related to this very question. I realized that the word red doesn't mean what we mean by qualitative red. The qualitative red is taking place in our sensory perceptual system.
[13:54] The word red, to a large language model, can't mean that. It can't mean any color. It has no color phenomena. It has no concept of what sensory red would mean. Yet it is able to use the word red with equal ability, with equal competency, just as well as I can, if we're just having a conversation about it. And so what this means is that within the corpus of language, the word red doesn't mean something external to itself. Instead,
[14:23] The word red simply means where does it fall in the space of language itself? Where does red fall in relation to other colors, in relation to the word color, in relation to other concepts, other, well, frankly, just words, tokens that are related to what we call concepts that have to do with color and have to do with the word red. So yeah, so this epiphany was about this extraordinary dichotomy, this divide between language
[14:52] and that which we think language refers to. The question is, how does language refer? And the answer is it doesn't. Language doesn't refer in and of itself. Language is an autonomous system. It's a self-contained system. It has the rules contained within it to generate itself, to carry on a conversation. Large language models don't know what they're talking about in any real sense. They can talk about a sunset. They can talk about a taste.
[15:21] They can talk about all of space and time and all of those things and yet we would say they have no idea what they're talking about and we'd be right in the sense that they don't have a notion of red beyond the token and its relation to other tokens.
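One way to picture "red" meaning only where it falls in the space of language: represent each word purely by co-occurrence counts with context words (all numbers below are invented for illustration) and compare directions with cosine similarity. "Red" lands near "blue" because they occur in similar linguistic contexts, with no appeal to color perception anywhere in the computation.

```python
import math

# Invented co-occurrence counts: how often each word appears near each
# context word ("color", "paint", "bark", "leash"). No perceptual
# information enters anywhere; the vectors are purely distributional.
vectors = {
    "red":  [9, 7, 0, 0],
    "blue": [8, 6, 0, 1],
    "dog":  [0, 1, 9, 8],
}

def cosine(u, v):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

print(cosine(vectors["red"], vectors["blue"]))  # high: similar contexts
print(cosine(vectors["red"], vectors["dog"]))   # low: different contexts
```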
[15:36] Now this then raises the obvious question, well, what do I mean what red is about? Don't I think red refers to a quality of perception? And the answer is I do have a quality of perception. There is something called red that my sensory system is aware of. And then there's a token called red that is used in conjunction with there's a sort of coherent mapping between my sensory perception of red
[16:05] and the linguistic red. But that doesn't mean that you need to understand what that word refers to. You don't need to have the sensory qualitative concept of red in order to completely successfully use the word red. And so these are compatible but dichotomous systems. The sensory perceptual system and the linguistic system are ultimately, we can think of them as essentially
[16:34] Distinct and autonomous, but compatible, integrated. They're running alongside each other, they're exchanging messages.
[16:49] so that we can have a single organism that is successfully navigating the world and is able, for example, to communicate. So I see something red; that's registered in my brain. I have a qualitative experience of red; it's remembered as having a certain quality. And then later on I say, oh, you know, could you go pick up that red object for me? And so there's a handoff between the perceptual system and the linguistic system, such that the linguistic system can now successfully
[17:18] send a message to you. Now you've got the linguistic system; you can talk about that. Oh, okay, you told me there's a red object. Are there multiple objects? Yes, there are multiple objects; they have different colors. You're looking for the red one. Maybe it's a dark red. I'm doing this all linguistically. Now you're able to go into the room and successfully get the right object. So again, the handoff happens in the other direction. Language is able to hand off to the perceptual system, and the perceptual system is able to then detect that there's something with the right quality. But that's not the same thing as saying
[17:47] that the language contains the reference inherently within it. It simply means that these are communicative systems, that they can exchange information, that they integrate with one another in terms of forming coherent behavior. But language is its own beast. It's its own autonomous system. It can run on its own. That was the big realization. Large language models prove it: language is able to produce the next token, and by virtue of the next token, the next sequence, and that means all of language,
[18:17] without having any concept of reference. The reference has no place there. There's no way to kind of squeeze it in. If your computational account is the one that I'm proposing, if the computational account is essentially prediction of a next token based purely on the topology, the structure, the statistical structure of language, then there's no way to cram any other kind of grounding or any sort of computational feature in there at all.
[18:47] What's happening is much closer to generating a prompt, basically saying, here's what's in the room.
[19:11] And now based on these features, these scripts now run the same exact language exclusive model. And so language takes care of itself. It doesn't need grounding in order to be able to do everything it does. It doesn't have to have concepts outside of itself. I think that's basically been proven by these text only large language models. So that was the big epiphany. The big epiphany was that language is autonomous.
[19:37] Okay, so you're not denying consciousness and you're not denying qualia.
[20:05] No, and I want to make this very clear, that my personal opinion on this is beside the point to some extent. You can be an eliminativist if you want, although I think everything I'm saying has a lot of bearing on this. But I believe my account is strictly an account of language.
[20:28] I think that perceptual mechanisms that give rise to qualia, things like redness and heat and taste and all of these, are basically processes that take place long before the handoff. And so what happens is, you know, think about the camera, the camera is transducing light, it's measuring certain wavelengths,
[20:51] Then there's a lot of visual processing that has to happen before you get to the point where it's turned into a linguistic-friendly embedding, right? The stuff that an LLM can see, a multimodal LLM can see. And so all of that processing that happens is what I think gives rise to the qualitative experience.
[21:10] We experience redness because of all of this very sort of analog, probably non-symbolic kind of representation. And then at the end of that process, there is a conversion. By the way, by the end of the process, a lot of things happen. We also respond to colors and to light and all of that non-linguistically. But we could think of different endpoints. One of those endpoints is here's a handoff to language.
[21:40] And by the time language gets it, it's long past that kind of sensory and perceptual processing that gives rise to qualitative phenomena. So I strongly believe that there is, in a certain sense, the word hard problem is a little loaded. I believe there's undeniable qualia. But what I also think is that language is poorly equipped. It's simply
[26:05] Okay, let me see if I get this. You have some redness. So you're not denying redness; you grant redness. I do. Okay, there's redness, and then somehow this needs to be referred to with some spoken words, with some language. Okay.
[26:28] So what's happening? You're saying that it's an independent system, yet it's integrated. So what is that relationship? And does it become so diluted that by the time you refer to it, you're no longer referring to that qualia? I don't understand. Yeah, that is essentially the idea. So this is this is the exact problem I am working on right now. There was a fantastic paper that I just came across about a week ago.
[26:54] There was a paper that was published on arXiv recently. It's called Harnessing the Universal Geometry of Embeddings. And what this paper showed is that you could have completely different models solving different linguistic tasks. For example, you could have GPT, and then you could have BERT, which solves a somewhat different task; there it's masked tokens as opposed to autoregressive next-token generation. And what they found was that you could learn a shared latent space.
[27:23] What you could do is hand off, take the embedding. The embedding is basically, you can think of that as numerical representation. It's a high dimensional numerical representation of your tokens. So here's a token. This token is going to represent the word dog. And then we're going to take that token and embed it in a much higher dimensional space. And what they found is that if you take the embedding, the high dimensional representation from one model, so you chat GPT and then
[27:53] take the representation from a different model, and you could actually take the embedding and send it to this latent space. It's starting to get into the weeds a little bit, but you send it to this latent space and then recover it in its original form. What you can do is, once you've got that latent space, you can then translate
[28:21] from one embedding to a completely different embedding. This is a new paper. This is a new paper, yes. This rocked my world, because what they're arguing is that there is, in some ways, this underlying universal structure of language that's captured in this latent space. And so even if you have a radically different embedding in one model. You know, they didn't do it across different languages; that's one of the projects I'm
[28:50] doing right now is to see if you can do this across say English and Spanish, even for a language that's trained exclusively on English and then another models trained exclusively on Spanish. Can you guess the Spanish just from finding this kind of universal structure across these two different models?
[29:10] Sorry, what do you mean, can you guess the Spanish? If a model was trained only on English and then it was receiving some Spanish text, a couple of Spanish sentences. So the way to think about it is that what you're doing is creating another embedding, this latent space, where you're going to be able to send in a message in English and then, based on
[29:33] that latent space, again do the same thing for Spanish. And no model is ever going to see a pair of English and Spanish. Instead, what you're going to learn is that there is some way to get from English to Spanish without ever seeing the actual translation, because what the model is going to learn is what's common across these two representations: what's true for both the Spanish embedding,
[30:02] and the English embedding, that there's some sort of underlying latent structure.
[30:07] That's true of both, and that captures something more universal about language. And again, they didn't do it for different languages; they just did it for different embeddings of English, but very different embeddings, because they were trained on completely different models. If you just looked at the vector representation, took a vector representation of the word dog in one and a vectorization of the word dog in the other, they're completely numerically different. There's no similarity; you could never spot the correspondence even if you looked at them pairwise.
[30:34] But if they do this kind of reconstruction and then ask the model to be able to reconstruct, not in the original embedding space, but go and reconstruct in the other embedding space, it's able to actually do this. And so by doing that, by training it to do that, without ever seeing any pairs, it's able to sort of learn this translation between one representation and another representation. What this opened up to me is the possibility that we could think about the exact same
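For a rough sense of what "translating between two embedding spaces" means, here is a deliberately simplified, supervised sketch on synthetic data: if space B is a hidden rotation of space A, orthogonal Procrustes on a few anchor words recovers a map that translates held-out words. The paper under discussion is much stronger, recovering such a map without any paired examples; this sketch only illustrates the target of that translation.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 synthetic "words" embedded in an 8-dimensional space A.
A = rng.normal(size=(20, 8))

# Space B: the same words, re-embedded via a hidden rotation R.
R = np.linalg.qr(rng.normal(size=(8, 8)))[0]
B = A @ R

# Orthogonal Procrustes on 10 anchor pairs: find the orthogonal W
# minimizing ||A W - B|| over those anchors.
anchors = slice(0, 10)
U, _, Vt = np.linalg.svd(A[anchors].T @ B[anchors])
W = U @ Vt

# W now translates the 10 held-out words from space A into space B.
err = np.linalg.norm(A[10:] @ W - B[10:])
print(err)  # effectively zero: the hidden rotation was recovered
```

Pairwise, the rows of A and B look numerically unrelated, just as described above; the shared structure only appears through the recovered map.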
[31:04] kind of latent space in the brain and possibly in artificial intelligence models between the perceptual world and the linguistic world, that there is some embedding of how the physical world is structured. We understand, like think about an animal, a non-linguistic animal, certainly has idea of objects and objects in relation to other objects, objects in proximity to other objects, moving around those objects. My dog was just barking in the background.
[31:35] knows what doors are and she can go scratch it and she knows it opens up. She certainly isn't able to express that linguistically, but she has this concept and she's able to think about, she's able in some ways to reason about that. My suspicion is that that probably is done maybe even autoregressively, but we'll leave that aside for now. The main point is that there is some representation of the facts about the world, the sensory facts of the world, or the sensory, I would say the sensory construction, the facts that have been constructed
[32:04] based on sensory information. So that's some sort of embedding of the world. The linguistic embedding is a radically different embedding. It carries information about the world as well, but not in the way, not in the direct way that we think, not that the word, you know, my headphones are sitting on this desk has direct referent back to sensation and perception. No, it lives on its own. It's its own embedding and it does its own and it can do its own thing. However,
[32:34] Based on this paper, this really gave me sort of a key insight that there might be this latent space where you can actually do this kind of mapping, where there's translation between linguistic and perceptual embeddings. They're as distinct as they are, fundamentally very, very distinct, very different. They're there to solve different problems, but they're able to talk to each other. How? Perhaps through this kind of latent space where some universal structure
[33:01] Like, okay, in language, there's certain facts about language. There's a fact about the word dog or the word microphone that its relation to other words like desk in some ways captures the fact that microphones sit on top of desks. That fact is somehow actually contained within this embedding structure. In what sense? Well, if you ask me, would a desk sit on a microphone or a microphone sit on a desk?
[33:29] I can answer that question. So can ChatGPT, right? And without any notion of what microphones really are from a perceptual standpoint, their having these kinds of properties, we can talk about them, and the embedding space, the linguistic embedding space, contains this information. What does it mean, contains information? By the way, just to say, what does that mean? It means: given a certain input, like, do microphones sit on desks?
[33:51] Where should I put my microphone? I can answer linguistically in a reasonable way. And that's what I mean by the knowledge. It's purely linguistic knowledge. It only can generate linguistic responses. But the point is that that knowledge lives in this kind of linguistic embedding. And then there's the other kind of embeddings. There's a visual embedding. There might be an auditory embedding, which is distinct. And then the idea that I'm very inspired by is that there can be this latent space that captures certain universals.
[34:21] Are there any
[34:41] of this sensory-perceptual kind of phenomena. And this is important because forever, philosophers in general, linguists, have been trying to understand: how do words get their meaning? Something I referred to earlier, you know, what's the definition of a microphone? What's the definition of a dog? And the answer is there isn't a single one.
[35:03] There isn't a single definition that's ever going to capture. Instead, what you've got is this latent sort of bridge where there's some sort of representation of this fact that given whatever your particular prompt is, your linguistic prompt is going to lead to certain kind of meaningful linguistic behavior. If you ask me a question about this microphone,
[35:23] I
[35:45] about the world that's embedded in language. I don't think there would be a static set of facts embedded in our visual embedding of the world. Instead, what we've got is what I call potentialities. We now have the ability to engage that latent space linguistically where the perceptual information lives, this universal embedding of it, and then do whatever we need to do with it.
[36:11] If I need to answer this question about it, I can answer that question. If you ask me a different question, I can answer that. But there isn't a singular meaning.
[36:17] of microphone that captures sort of the entire set of facts. Here it is. Here's the embedded set of facts. The set of facts is actually infinite. I could tell you infinite things about this microphone. For starters, to use a silly philosophical example, it doesn't have this shape and it doesn't have that shape. I could tell you there's an infinite number of questions you could ask me about it that I could answer meaningfully about it. So all those potentialities are kind of what happens when the linguistic system
[36:47] interacts with this kind of a shared embedding space. That's sort of the half-baked version of how I think language ultimately does have to enter. Of course, language only is meaningful insofar as it can live within the larger ecosystem of sensation and perception. We have to be able to take in information through our senses and then
[37:12] communicate, although I use that word kind of carefully, I don't communicate the entire representation because as I said, I don't think that's even a meaningful idea. Instead, what I can do is use language in a way that helps us coordinate our behavior. There's no way to sort of download the entire perceptual state. That's locked up in some ways in the perceptual embedding. No, what I can do is
[37:39] pull some information such that I can meaningfully communicate with you in a way that then is going to have the intended consequences. I'm not downloading perceptual information into your brain. I'm telling you what you need to know in order to be able to perform some action, to perform some behavior, or maybe even to think about it so that you could later perform some action. I know that was a lot, and feel free to back me up and challenge me on any of these things.
[38:09] So I want to see if I understand this and I want to explore what is the definition of language, even though we just talked about there isn't the definition of a microphone say, but I do want to talk about the definition of language and what is autoregression.
[38:21] And while presumably you're telling me what you believe with language, you're telling me this model because you believe it's true. I don't know what truth you're conveying if you believe this is not grounded. So what are you referring to when you even say that language is autoregressive without symbol grounding? I don't have any idea as to that. I want to explore that.
[38:41] But first, I want to see if I understand you. OK, so a latent space. So let's think of a word. A word gets a vector, like an arrow. And I'm just going to be 2D for this example, because that's just what the camera picks up. So let's say the word dog looks like so. The word cat looks like so. Whatever. OK, the space that it's embedded in is called the latent space. Is that correct? Well, the initial embedding is just the embedding.
[39:08] So the latent space is a compressed version of that?
Well, in some ways, it's actually not compressed. It's actually the opposite. What's the opposite of compressed? It's uncompressed, expanded.
[39:41] It's an expanded version. So you have the original tokenization, which just sits here in a fairly small vector, but then you expand it into a much higher dimensional embedding space so that each token actually ends up getting much richer, with many more numbers that are used in order to represent each token.
[40:05] I mean, that's a very key fundamental thing that these models do. And by expanding in these different dimensions, that's what allows you to sort of massage the space so that you can get all these cool properties like cat and dog being sort of in the appropriate relation to one another. So that later on, when you're trying to figure out what the next token is, you're able to actually leverage the inherent structure in this high dimensional space. Okay.
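The lookup-and-expand step described above can be sketched in a few lines. Everything here is a hypothetical toy: the vocabulary, the dimension, and the random weights stand in for a real model's learned embedding matrix, so only the mechanics, not the geometry, are real.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; real models use tens of thousands of tokens
# and embedding dimensions in the hundreds or thousands.
vocab = ["cat", "dog", "desk", "microphone"]
token_ids = {w: i for i, w in enumerate(vocab)}

# The "expansion" step: each small integer token id indexes a row of a
# learned embedding matrix, giving the token a much richer representation.
embed_dim = 8
embedding_matrix = rng.normal(size=(len(vocab), embed_dim))

def embed(word: str) -> np.ndarray:
    return embedding_matrix[token_ids[word]]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a trained model, cosine(embed("cat"), embed("dog")) would exceed
# cosine(embed("cat"), embed("desk")); here the weights are random,
# so only the lookup mechanics are illustrated.
print(cosine(embed("cat"), embed("dog")))
```

In a trained model it is gradient descent on next-token prediction, not random initialization, that "massages" this matrix into the useful shape discussed above.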
[40:34] So then you have the language model for English and then you have a language model for Spanish. Yes. And let's imagine that it was trained only with the corpus of English in the former case and only with the corpus of Spanish in the second. And then we can even have a third of Mandarin. Sure. OK. Yeah. In fact, in the paper, they didn't do different languages. They said they did different embeddings of English language models. But yes, they use multiple. They actually did this across several different embeddings, not just two.
[40:58] Okay, so then the claim or finding is that if we look at cat and dog inside of here in English, it gets mapped to some fourth space here, which is like a Rosetta stone space or platonic space. And that's exactly what they call it, platonic. They use the word. Okay, great. And then it looks like this there. Okay. And then if you were to say, okay, well, let me just forget about English and this platonic space. Let me look at cat and dog in Spanish.
[41:25] Okay, and it looks like this here. Let me map it from here to my platonic space. Oh, wow, it gets mapped to a similar place. Oh, and does the Mandarin let's find that out a cat dog. It does. Okay, let's test out more words. So the claim is that this space here is this meaning like space.
[41:45] Okay, great. And then what you're saying is that microphone, we think of microphone as living in here as a single vector, that would be like an essence of the microphone that we're referring to. But actually, microphone, our concept of microphone depends on the prompt. So explain that that sounds interesting.
[42:04] Yeah, I think, and you're making me think about this in a way that I hadn't quite before. So the level of which I've thought about it is that you've got these different embeddings. When I see a microphone visually, there's a certain vector representation of what that sensory perceptual experience, and I don't mean the qualitative sense, I'm not getting into phenomenology, but
[42:32] There's something happening
[43:02] Each individual token is simply a vector in that space. So it really picks out a specific point. And we can say microphone lives right here in this linguistic space. And then my perceptual experience, I don't want to use that word, but my perceptual kind of grasping of this microphone being here is this point in a completely different space, this perceptual space.
[43:30] which has, you know, it captures other kinds of information. In language, so let's actually talk about this for a second. In language, if you want the space to be a useful, meaningful space, you're going to want things that have similar meaning to actually have proximity to each other. And this is, to some extent, what the large language models learn. In order to do next token prediction, they learn an embedding that gives this, where the space, you know, and we could think of it almost like two-dimensional, three-dimensional space.
It's very high dimensional, but, you know, for our purposes, we can think about it that way: where cat and dog live, you want those things to live closer together than cat and desk. And of course, it's much richer than that, right? It's not just semantic, like this very superficial level of semantic similarity. In fact, somehow the semantics, so to speak, are captured by the space, like the shape of the space itself,
[44:29] is what allows the model
[44:44] in terms of next token generation so that it's useful for that purpose. What does the perceptual space look like? Well, this perceptual space is going to have a very different... The axes there almost certainly aren't going to have the same kind of meaning as in the linguistic space. There'll be something closer, maybe color features, shape features, something like that. And where this microphone lives is within that space is going to have radically different meaning than saying...
[45:11] It's not apples and oranges, right? Those aren't different enough, right? It's apples and math or something. It's really, really radically different kinds of spaces. But what I'm proposing, what I think the insight here is that ultimately there is the possibility of having a shared space that you can send, you can project both of these things to where microphone, the word, is going to somehow make contact
[45:42] With this perceptual experience right now, this perceptual fact, but it's not, and here's the key point that you're getting at. It's not that this word microphone picks out the exact same embedding in this latent space. It's not that it's going to make that thing light up. Oh, it's the same thing. No, it's that when you ask a certain question about a microphone, is there a microphone on your desk, my perceptual system is
[46:11] generating some, well, first of all, just generating the perceptual phenomena, but then it's also sharing information in this latent space, which my linguistic system can then go draw from. And then, given this particular prompt, was there a microphone on my desk, I'm able to then successfully answer the question. So it's not quite the same thing as saying that they're picking out the same information in latent space, because
[46:38] My argument is that that's not really a meaningful concept. There isn't the same. Microphone in linguistic terms doesn't pick out a perceptual kind of fact. That's not possible. These are radically different kinds of facts. But what the latent space might allow us to do is not just to translate, which is what they did in this paper, but perhaps to pass information along in a meaningful way so that you're able to access it and do something successful like answer the question.
[47:08] Is there a microphone on this desk? I think that might be what's happening to some extent even in the multimodal models. So it's a longer conversation. That's not really how they work. They don't actually operate based on a shared latent space or anything like that. Really what they do is the models learn to take a perceptual input and turn it into something like language. So it's more similar to like prompting almost. It's not exactly that. But it's injecting something within
[47:38] linguistic space that is equivalent to actual language. It's not the same thing as the shared latent space, but my hypothesis is that there may be something very similar happening. So you don't think that multimodal models will solve the symbol grounding problem? You don't even think there is a symbol grounding problem? That is a fair question. And here's actually a prediction, or a falsifiable claim,
[48:07] in some sense.
[48:37] organism that's able to use language and also use perception in, you know, bridge these different maps in a meaningful way so that we can get, you know, full coherence. I guess, you know, let's just call it human level perceptual linguistic coherence so that I can say to you, hey, can you go grab that or say to a machine, can you go grab that object
[49:06] described what I want and then the machine is able to go and do exactly what I described, then my argument is that I don't think, and again this is speculative, I could be proven wrong certainly on this, my suspicion is that we're not going to be able to do it using the kind of approach that multimodal models currently use, that you're not going to get there. It's kind of a dumb trick the way that we're currently solving the problem because we're not really allowing these two different
[49:32] Maybe this podcast one day is kind of an early canary in the coal mine for this idea: that it's something closer to this kind of shared latent space. What you have is these completely distinct kinds of mappings
[50:02] We call them embeddings. They can kind of grow up on their own, learn the information that they need to independently of one another. But at the same time, they have this sort of shared sandbox where they're able to communicate with one another and do things. So I think it might take a very different approach to get full perceptual linguistic competency.
[50:53] Okay.
[51:23] Have you heard of Wilfrid Sellars? I believe it's Sellars. Oh gosh, I read Wilfrid Sellars early on, in nearly one of my first philosophy classes I ever took. I'm trying to remember the name of the book, but I'm sorry. So which work by him? I believe it's Empiricism and the Philosophy of Mind. I'll put a link on screen if I'm correct. Sounds familiar, but catch me up. So he's criticizing the idea that our perception gives foundational non-conceptual empirical knowledge.
[51:50] So these experiential givens that we think of as primitive, like redness, he would say that they involve
[51:57] Heavy interrelations of concepts. So for instance, the way that I think about it is if you're to say to someone redness, they'll be like, well, what kind of redness exactly are you talking about? Then they'll think, okay, the redness of an apple, but then an apple is not always red. Okay. Redness of an apple in a certain season with a certain type of sunlight. Okay. Now I've gotten it. So by the time you go in to pull out this primitive, you've then soaked it with so many other concepts. You can't actually come in with language and pull out a primitive.
Yeah, that sounds extremely similar to the initial insight. And it's related to the inverted qualia problem. I don't know that your red is not my green and vice versa. And it's because the linguistic representation doesn't capture, you know, we can think again, it lives in a completely different embedding space. And when we think about the redness of red,
[52:54] Well, it's qualitatively similar to orange in, you know, there's sort of a continuum between those. Those qualitative similarities are really only contained and only understandable by the sensory perceptual system. And we can talk about them. We can sort of say, yeah, red is a little more similar to orange.
[53:13] That's because we have a very coarse map, maybe via this latent space, where we're able to refer to certain kinds of properties in a way that is useful for communication. But as far as that raw qualitative property goes, it's primitive in the sense that we can't unpackage it linguistically.
[53:42] But it's not primitive in the sense that there's extraordinary cognitive machinery that is responsible for that qualitative experience. Think about the world of animals and what they do with color and how well they understand shape and how they understand space. All of that is unavailable to our linguistic system. It's available, by the way, to
[54:04] our sensory perceptual system, but it's unavailable to the linguistic system because it doesn't live in the same space at all. And so I think what you're describing actually sounds extremely similar. The idea that we can't really dip in. It's simply the wrong map. We can't map this map onto that map at all. We can only go via this maybe potentially shared latent space, or maybe again my account's wrong and there's some more direct
kind of handshake that happens between the systems. But ultimately they're taking place in radically different spaces, and you're losing an enormous amount of information. It's literally, you know, quantifiably a loss of information. The word red does not convey redness, because redness is not just
[54:47] a word. It's not just a simple concept that you can say in using an individual token. By the way, the word red is not so simple either. Red in language space is also complex. It has all kinds of relations to other words, but the concept of red has all of this complexity to it because it's
Yeah. And just so you know, the way that I relayed Sellars' Myth of the Given isn't precisely what he was saying, because he was more about knowledge, and I'm speaking more about the percepts, the raw sense data, then being taken to language, like being dredged from the
[55:38] from your sensory data to language.
Thank you. Thank you. So now you're an LLM speaking to some other LLM, trying to convince it of some truth that we mentioned before. Like, you have this model, whatever you want to call this model, autoregressive language model. What are you even referring to? You're using language to convince me, to convince yourself, to
[56:24] explain. What are you even explaining? What are you referring to? Yes, you've asked a very hard question, and there's a certain, I think of it as a bit of a paradox that's sort of inherent in what I'm trying to do. Because language is trying to describe itself, and in the process of doing so, it's actually deconstructing itself. It's saying, I am just this, and I'm not what I think I am. But who's I, and what do you mean think, right? How does language have wrong concepts about itself
[56:53] that are actually manifestations of its own structure. The good news is that I have a sort of escape hatch here, which is this is really in some ways a very, very simple account. And it's just that there's prediction from sequence to token. How that does stuff in the world is a harder problem. How that does stuff, as we were discussing,
[57:23] How does it allow me to say something to you that then can have perceptual consequences, behavioral consequences? This is certainly a difficult problem. But we can ignore that problem for a moment and say, we are going to take language on its own terms. And what language is, is simply a map amongst meaningless squiggles, is simply a map amongst various
[57:47] What we can think of is largely arbitrary symbols, and those symbols can get grounded in writing. They could get grounded in the activation of circuits. They can get grounded in the dendritic or neural responses. But the core hypothesis here is that language is simply a topology amongst symbols.
[58:17] And by topology you mean connectivity?
[58:31] to try to capture this structure. In the case of large language models, it comes down to these embeddings, which you can do from a graph theoretical standpoint, but you don't have to. You could just think about it as a space, and then you're just simply saying where each token lives within that space. And that's really the representation of language. But what it is is relational. It's that these symbols have relations to one another within this space.
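The "topology amongst symbols" picture, where position in the space is fixed by nothing but relations, can be caricatured with a spectral embedding of a tiny graph. The tokens and co-occurrence counts below are invented for illustration; the point is that coordinates fall out of connectivity alone, with no external meaning supplied.

```python
import numpy as np

# The claim above: language is just relational structure among symbols.
# A toy demonstration: start from nothing but co-occurrence relations
# (a graph), and derive positions in a space from that structure alone,
# via a spectral embedding of the adjacency matrix.
tokens = ["cat", "dog", "pet", "desk", "chair"]
adjacency = np.array([  # hypothetical co-occurrence counts
    [0, 4, 3, 0, 0],   # cat
    [4, 0, 3, 0, 0],   # dog
    [3, 3, 0, 1, 0],   # pet
    [0, 0, 1, 0, 4],   # desk
    [0, 0, 0, 4, 0],   # chair
], dtype=float)

# Eigenvectors of the adjacency matrix give each token coordinates that
# reflect only its relations, not any external grounding.
eigvals, eigvecs = np.linalg.eigh(adjacency)
coords = eigvecs[:, -2:]            # top-2 spectral coordinates

def dist(a: str, b: str) -> float:
    return float(np.linalg.norm(coords[tokens.index(a)] - coords[tokens.index(b)]))

# cat ends up nearer to dog than to desk, purely from connectivity.
print(dist("cat", "dog"), dist("cat", "desk"))
```

Transformer embeddings are learned by prediction rather than eigendecomposition, but the moral is the same one the speaker draws: relational structure alone is enough to give every symbol a place in a space.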
[59:00] The relations are then used in order to generate. And that's it. Now, how does meaning emerge out of that is a separate question. But my argument is that language doesn't have to worry about meaning. Language just has to worry about language. So when I say I'm talking to you and I'm having a conversation with you and trying to explain something to you, this is an LLM actually producing a sequence
And what that sequence is going to do, it might do certain perceptual things, by the way, in your mind, it might produce certain kinds of images. Those are kind of auxiliary to language. Those happen as well. I'm not denying they happen. But as far as this conversation goes, I am producing a sequence that's going to serve as a prompt, and you're going to predict the next token. Yeah, without my consent, by the way. And that is, in some ways, you know, not to
[59:54] take that too seriously, but yes, one way to think about it is that language is actually forcing your mind to do something else, whether it's to produce images but also to produce sequences. So my choice of a prompt is actually going to deterministically
There is, within large language models, some probabilistic kind of behavior, in the sense that they generate a distribution over the next token and then you add a little bit of chanciness. You say, maybe I'm going to pick the most likely versus a less likely one; this is the temperature. But it really is deterministic, and yes, the prompt I'm going to put into your head is going to basically determine how you're going to respond.
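The temperature mechanism mentioned here can be written out in a few lines. This is a minimal sketch; the logits are made-up scores for a three-token vocabulary, not the output of any real model.

```python
import numpy as np

def sample_next_token(logits, temperature, rng):
    """Turn raw next-token scores into a distribution and sample from it.

    Temperature near 0 approaches greedy (deterministic) decoding;
    higher temperature adds the 'chanciness' described above.
    """
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs), probs

rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.2]                        # hypothetical scores for 3 tokens

_, cold = sample_next_token(logits, 0.1, rng)   # sharply peaked: near-greedy
_, hot = sample_next_token(logits, 2.0, rng)    # flatter: more random
print(cold.round(3), hot.round(3))
```

At low temperature essentially all the probability mass sits on the top-scoring token, which is the sense in which generation "really is deterministic" underneath the sampling step.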
[60:42] Now, mind you, again, there's a larger ecosystem where you're going to think about things visually and that's going to go feedback into the linguistic system. So it's not quite as simple as prompt in and sequence out.
But at the linguistic level, that's basically what I'm arguing. Now the fancy stuff, which is basically meaning and the ability to coordinate, all that falls out of how our minds ultimately form this space. Now, you know, you can take an untrained model, an untrained large language model, you give it a sequence in, it's going to give you a sequence out, right? It will do that. And we say, hey, look, it's doing next token generation.
[61:23] I
This harder problem of, well, I can tell you something and then that's going to determine not just your language, but your behavior later on. And so there really is something more. The map matters, right? The space, the shape of the space is really, really critical. It's not like autoregressive next-token generation solves the problem. It's that autoregressive next-token generation, when optimized in the larger ecosystem of behavior and coordination and communication,
[62:13] does this thing. But still, I don't want to back away from this: when you get down to it, in the end, what you've got is just next token. What you've got is just language generating language. That's really what language is. That's what we're doing when we're thinking linguistically. And the fact that it happens to have this meaning is not actually driving the computation, right? You shape the space, the space gets shaped by other factors, things like
[62:43] the learning. Well, you learn about the different tokens and how they relate to one another. You learn about, perhaps, the utility of certain tokens to refer and to map to these perceptual phenomena. But by the time you're doing language generation, the space has been shaped. And so all you're doing is next token generation. All you're doing is predicting tokens.
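The "space already shaped, then all you do is predict tokens" picture can be caricatured in a few lines. A hypothetical hand-written bigram table stands in for the frozen weights of a trained model; the loop is the autoregression.

```python
# Once the space is shaped (training done), generation is nothing but
# repeated next-token prediction. A toy bigram table stands in for the
# frozen weights of a trained model.
weights = {  # hypothetical learned transition table
    "the": ["dog", "cat"],
    "dog": ["barked"],
    "cat": ["slept"],
    "barked": ["loudly"],
    "slept": ["quietly"],
}

def generate(prompt: list[str], steps: int) -> list[str]:
    sequence = list(prompt)
    for _ in range(steps):
        options = weights.get(sequence[-1])
        if not options:
            break
        sequence.append(options[0])  # greedy: always pick the top candidate
    return sequence

print(generate(["the", "dog"], 3))  # → ['the', 'dog', 'barked', 'loudly']
```

Real models condition on the whole sequence through a learned high-dimensional space rather than on one previous word, but the control flow, prompt in, one token at a time out, is exactly this loop.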
And so I don't want to back away from that. The strong claim is that language simply is that. And it's autonomous. It has these properties. Through all this optimization over the course of development, maybe evolution, it's not part of my theory at this point. Chomsky's poverty of the stimulus, all of these problems of how do we get to such a magnificent space? How do we get to such a magnificent shape of this space?
[63:39] such that it is able to map to, you know, or at least serve this utility of being a coordinative kind of a tool. All of that has to happen. But the bottom line of what language is, is unchanged in this account. Okay, so I want to explore more about language and then it
relating or giving rise to action and other systems, visual systems, etc. Like, what is there? So I was worried about that, but good. Okay, so look, if there are meaningless squiggles, how is it that some meaningless squiggles, your brain squiggle generator, makes your physical body get up and close the door because your dog was barking? Where does action connect to abstraction? Yeah, so that is the key
[64:28] question that I believe that's what we need to solve. That's sort of what the field of linguistics or whatever we want to call it, maybe even cognition needs to solve because these mappings are happening. What we know from the large language models is you don't need that in order to be proficient in language. So this is where we have to start from. That's the starting point that the language can live on its own and you can learn language. In theory, you can learn language independent of any of that stuff.
[64:56] the ability to make somebody get up and move, the ability for me to reason about perceptual phenomena, language is able to be mastered entirely based on its own structure, the meaningless squiggles. Now, the question you're raising is what I think is, that's what we need to do as a species. If we want to understand scientifically how language really works is to understand how you go from
[65:25] An autonomous self-generating system that has its own self-generating rules that are determined simply by relations between these meaningless squiggles and how does that then get mapped to the ability for me to use some of those tokens and then get you to do stuff.
Right. And so that's what language learning is. So there's going to be, I guess we can think of almost two, maybe even independent processes. One is learn how words play with one another. OK, learn that this word tends to be in relation to that word. OK, that one's solved. Yes. As far as you're concerned. Got it. OK.
[66:09] Then we also learn about perceptual phenomena, right? We learn that there's things on top of other things and there are actions we want to take, the things that my dog understands. Now, the question is, how do these things bridge? How do you get from tokens that have their own life of their own, sort of relational properties amongst one another to that other kind of, I guess,
[66:36] Facts about the world is just another brain state. All we've got is brain states.
[67:00] This is fact number one about what we've learned about ourselves as a species. We have perceptual brain states. We also have maybe linguistic brain states. Those perceptual brain states are in some ways related to what's going on in the world. Potentially, we can think about them as being related to what you can do in the world as well. Maybe actions
[67:28] Well, we have brain states that correspond to our proprioception, our muscles, where things having to do with our own body. And so there's these various brain states that carry, we could think of them as carrying information. The reason I'm worried about using that phrase is because again, I don't believe in sort of a one-to-one simple correspondence where we say this particular brain state corresponds to
you know, this perceptual kind of phenomenon in the world, or some state of the world, because it's probably not that simple. It's probably closer to these potentialities, right? There's some sort of activity that's due to my perceptual system that my brain can do things with and engage with in some way. But what we do have is these brain states that are derived from distinct sources of information:
[68:23] sensory perceptual, and then linguistic. Linguistic gets there, by the way, through sensory perceptual. We're not going to get into that, right? We're thinking of symbols as being kind of arbitrary. Yes, you have to hear the word cat and you have to hear the word dog. But I think we have good reason to say now, just like with large language models, that these are kind of arbitrary symbols, and it's the relations between them that matter. OK, so you've got these distinct brain states, which in some ways, again, this is
[68:51] philosophically fraught, but in some ways represent facts about the world, perhaps, but I don't really want to go that far. But you've got these brain states that need to talk to each other so that they can coordinate. And that is sort of the key fundamental problem that our organism has to solve. And it's not just like, of course, you're not born having the linguistic, you're not born having the linguistic mapping all solved. You have to learn that.
But you are born into a world where it's already been solved. Meaning, we've got this corpus of language, the thing that the large language models were trained on. That pre-existed the models, just as when a baby is born, the English language pre-exists the baby. You can learn the mappings, and I believe you do; you can learn the embedding space of language without the other stuff, right? That's again sort of the key insight from the large language models. So that's already contained within
[69:51] the linguistic system that we've honed over however many years it took for humans to develop language. We've honed a system that has this utility built in such that it's a good thing to dump into that latent space so that when a baby hears the word ball, sees this object that's a ball, gets that mapping. But again, the word ball
[70:16] is really meaningful. It really has its own role in relation to other words. But over the course of development, you also learn this kind of what I think is maybe a latent space bridge or some other bridge between these. And so in the end, you end up being able to tell somebody go pick up that ball. And of course, they're able to go and do it. But you're really engaging very distinct mechanisms that have some way of bridging, which is it's a non-answer. I'm not going to pretend
[70:46] that that is even halfway to a solution. But I do think it's a sketch of how the cognitive architecture ultimately really is built. I think that we've now nailed down one piece, the linguistic piece, and we're able to say this is how it lives, and this is how it would operate, and it is autonomous. And then we don't have something similar for perceptual space and for motor space. We don't have something comparable. We haven't been able to capture it.
So maybe this is a solved problem, but as it stands with ChatGPT and Claude and so on, they're fixed models and they're producing some output, but it's not as if when they're speaking to one another, they then retrain their model in real time. And it would seem like that's more like what's occurring with us. So maybe that's just a technology. Are you referring to different language models just chatting with one another?
[71:47] No, I mean, even us right now, we're learning concepts from exchanging it with one another and we're producing new ones and we're deleting old ones, potentially modifying old ones, recontextualizing. It doesn't seem like that's occurring with Gemini 06-05. Great question. Great question. And people
[72:08] This is one of the key challenges of the identity hypothesis that we're doing the same thing, which is continuous learning. There are two things that happen in large language models that we can call learning. One is the actual shaping of the space, which is really just determining
[72:34] You know, the connectivity between neurons, again, you could think of it as a graph, you know, or you could think of it as sort of just an embeddings of determining the embedding. But whatever it is, that happens during the course of training. And that's kind of done offline. And yes, it's so that's training the model. There's also fine tuning, which is just more of the same. You have some new data you want to incorporate into the weights of the model. That's actually going to, again, change the shape of the space if you want to think about it that way.
And then there's something called in-context learning. And in-context learning is where you're in the middle of a chat and you say, hey ChatGPT, let me teach you a new word. It's globble gobble, and globble gobble is that feeling you get when, uh, you know, you're tired, but you know you have to keep working or whatever. ChatGPT can use that word very successfully. I've got globble gobble up the wazoo. Sure you do.
You suffer from extreme globble gobble.
I don't remember who put out the paper, but it was about the shocking generalizability. The in-context learning seems to be too good to be true. But lo and behold, that's what happens. And that is happening in the autoregression. It's happening even though this model has never seen globble gobble, even though it's never encountered that word before. But here it shows up in the sequence, and now,
[74:14] through the autoregressive process, as it's churning through the longer sequence with this word in it, it is able to predict the next token in the appropriate way — that is, to use that term correctly. So we do actually see this kind of continuous learning in the case of these models. However, it's happening in context. And what that means,
[74:37] you know, from a practical standpoint, is that if you start a new chat window, yes, it doesn't know that word anymore. So what would the analogy be here — is the context window length our working memory? Like, what's the actual...? Great question. Yes, that is what I truly believe. And this is a different line of research. But with some caveats. So yes, in my conception, what we call long-term memory is just fine-tuning of the weights.
[75:06] It's information that gets embedded in the actual weights of the model — the static model, we can think of, when it's not actually in the process of autoregressively generating. Working memory is literally autoregression. So what would the analogy for RAG be then? What would the role for RAG be? Okay, so this is where I'm at right now. Does the brain actually do anything like retrieval?
[75:35] I've decided to stake out the extreme view that our brain doesn't do retrieval at all. That all we do is fine-tuning and then next-token generation, autoregressively. We don't actually ever retrieve per se — we don't ever actually have to do anything like RAG. RAG is a transitional technology. I don't believe that, long-term, we're going to have to do something like that: have a stored database and then a search.
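A cartoon of that in-context picture, in Python. Everything here — `predict_next`, the bigram tables, and the nonce word — is invented for illustration; no real language model works this simply:

```python
def predict_next(token, trained_bigrams, context):
    """Toy next-token rule: use 'trained' bigram statistics when the token
    was seen in training; otherwise fall back to bigrams observed in the
    running context itself. This cartoons in-context learning: no weights
    change, and nothing is retrieved from a stored database."""
    if token in trained_bigrams:
        return trained_bigrams[token]
    # Scan the context for what followed this token earlier in the chat.
    for prev, nxt in zip(context, context[1:]):
        if prev == token:
            return nxt
    return None  # a fresh chat window: the nonce word is simply unknown

trained = {"the": "cat"}  # pretend these stats came from offline training
chat = ["i", "feel", "globalgobble", "today"]

predict_next("globalgobble", trained, chat)  # -> "today" (learned in context)
predict_next("globalgobble", trained, [])    # -> None (new chat, word is gone)
```

The second call mirrors the point above: start a new chat window and the word is no longer known, because it only ever lived in the running sequence, not in the weights.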
[76:05] One of the reasons I believe this is because that's not how our brains work. We don't do that. Cognition doesn't work that way. We may sometimes sit there pondering and trying to recall a fact. But when we're doing that, we're not actually searching a space. It's either we're running some sort of chain of thought where we're like, okay, I remember I was doing this and I'm trying to actually produce the appropriate sequence in working memory such that
[76:32] it'll pop out — the right fact will pop out from the autoregressive process. Sometimes we just find ourselves trying to remember something, trying to remember something. There's the tip-of-the-tongue phenomenon. The reason the tip-of-the-tongue phenomenon, I believe, is so frustrating is not because we're searching — not because we're actually running some sort of search-and-retrieval process. It's because
[76:56] part of our brain actually is running the autogenerative process, and we can almost feel the word — we can almost generate it, almost produce it — but it's short-circuited somehow and we can't do the full generation. So my hypothesis is that we don't have anything like RAG. All we've got is this — and it's in some ways a very simple and, I think, elegant model. All we've got is fine-tuning, and that's what we can call memory consolidation. That happens after the fact, over the course of minutes and weeks and months and years.
[77:25] It's not working memory. I'll tell you what: working memory, in the way cognitive psychology has thought about it for many years, I frankly think is erroneous. It's not this super duration-limited thing — you know, seven seconds or fifteen seconds, and after that it's a cliff
[77:53] and you don't remember anything. That's what happens when you have to directly, explicitly retrieve: What was the last word I said? Tell me the exact sequence of letters or numbers. That's not something our brain actually has to do regularly. Instead, what we're seeing in working memory — we can do that, we can do retrieval of the last seven seconds, but that's because we have continuous context. And there is a decay function, unlike the large language models, which represent everything.
[78:22] It's not retrieval.
[78:42] It's guiding. The past is guiding the generation. And so what you and I talked about an hour ago — I don't know how long we've been going here, probably a while; I don't know how much globalgobble you've got going on, right? It's been a while. So those tokens that were expressed an hour ago are still guiding the generation now. They're doing so less than the ones from the last ten seconds — we could think about it as a kind of decay function, where they're having less impact.
[79:11] We see that in the models too, by the way. If you look at the attention weights, words that are farther apart have less impact on one another. That is a direct reflection of the fact that language is
[79:25] being generated this way, and humans do this, right? The words that we spoke a few seconds ago are more impactful on the words that we're going to say than the ones we spoke an hour ago. But the idea is, yes, that what we've got is this — I don't use the term working memory, because I think it's very fraught with, like, the modal model that's been in vogue for a long time: the working memory model, Baddeley and all these folks. They were really thinking of this very short-duration, time-limited boom. No, this is
[79:54] continuous activation — namely, context. I don't know how far back it goes.
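That decay-function idea can be sketched in a few lines. The exponential form and the half-life parameter here are assumptions for illustration, not a claim about the brain's actual decay curve:

```python
def decay_weights(n_tokens, half_life):
    """Influence weight of each past token on the next generation: the
    newest token gets weight 1.0, and influence halves every `half_life`
    positions back (an assumed exponential decay)."""
    return [0.5 ** ((n_tokens - 1 - i) / half_life) for i in range(n_tokens)]

w = decay_weights(5, half_life=2.0)
# w[4] (just now) = 1.0, w[2] (two tokens back) = 0.5, w[0] = 0.25:
# older tokens still guide generation, just more weakly
```

A long half-life would let dynamics from minutes or hours ago keep a measurable influence — which is exactly the empirical question raised here about how far back the context operates.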
[80:16] I don't know how far back it goes, right? This is an empirical question. Does it operate over hours? Does it operate over days? Is there a continuous activation, a more dynamical form of memory, that's happening? That's not the same thing as long-term memory. Because long-term memory is not a database, in your model? My memory is not a database. Correct. What memory is in my model — there are two things. Memory is the fixed weights of the neural network,
[80:43] which represent — they don't represent facts, they represent potentialities. Those fixed weights — what does that mean? It means if you give it a certain input, it's going to produce a certain output, right? Just like a large language model. If I say to it, "Recite the Pledge of Allegiance," it will say, "Here is the Pledge of Allegiance" — the next token it puts out is "Here," whatever — and then it'll actually say the Pledge of Allegiance. And all of that is a potentiality
[81:08] that's embedded, that's encoded, in the weights. But you're not going to find that fact in the weights. The weights are there as potentialities, ready for whatever input comes their way; they're going to produce the output. Okay, so that's the weights. And then you've got the running sequence. And we see this from in-context learning, but it's the core autoregressive process. The sequence itself —
[81:36] is there some
[81:58] computation going on? There's some black box occurring. But let me make it simple with linear algebra: you have a matrix. A matrix operates on a vector to produce another vector, okay? So you may look at the whole thing — right, exactly — so you may look at this, where my arm is pointed up and to the right, at least on my screen right now,
[82:19] and you may say, where is this in the matrix? And the answer is: this isn't in the matrix. But if you take this guy — my arm is now pointed to the left, maybe parallel to the horizon — and have the matrix operate on it, it moves it here. So the mistake is for us to look at the output and say, where's that output inside the box? It's not there. It's the input with the black box — the input with the matrix — that produces the output.
[82:47] That is perfectly said. Exactly. And then there's one additional piece, which is that after you've produced that output, you're again taking that output and using it as the input — as part of the sequence of input. That's the autoregressive piece. And what's so gorgeous about it is that the potentialities aren't just there to produce a single output; they're there to produce the sequence, but to do so one piece at a time.
[83:15] So that's what the matrix is. The matrix doesn't really even have the sequence in it. It doesn't have a sequence in, sequence out. That's not even correct. It's sequence in, one token out, add it to the sequence, do it again, do it again. So the sequence is in there, but only in this potential form. It has to do it autoregressively. It can only produce the sequence by feeding it back into itself recursively.
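The loop described above — matrix on vector, output fed back in as the next input — can be made concrete with a toy 2×2 example. The rotation matrix here is just a stand-in for the "black box," not a model of any real network:

```python
def apply(matrix, vec):
    """One 'black box' step: multiply a 2x2 matrix by a 2-vector."""
    return [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
            matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]

# A 90-degree rotation. No particular arm position is stored "inside"
# this matrix -- only the rule for moving one position to the next.
rot90 = [[0, -1],
         [1, 0]]

def autoregress(matrix, vec, steps):
    """Feed each output back in as the next input -- the recursive piece."""
    trajectory = [vec]
    for _ in range(steps):
        vec = apply(matrix, vec)
        trajectory.append(vec)
    return trajectory

autoregress(rot90, [1, 0], 4)
# [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 0]] -- the whole cycle exists
# only as a potentiality in the dynamics; it appears one step at a time
```

The full trajectory is nowhere in `rot90` itself; it can only be produced by running the loop, which is the sense in which the sequence exists "only in potential form."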
[83:43] And that's a radical way of thinking about what the brain is doing, right? What it's really doing is generating the next input for itself — not just generating an output, but the next input for itself. Super interesting. Yeah, it's recursive — it's fundamentally recursive. And the system is built to do this recursion, right? It's not just that this is one way to get to it. The language contains within it the ingredients for producing
[84:13] this kind of recursion. The sequences of language that these models learn are built to have this recursive capability within them: this word is going to produce the next word, which is going to produce the next word, premised on the entire sequence before it. And that's the crazy thing. There was also this interesting result — Anthropic put out a paper a little while ago, I think it's called "On the Biology of a Large Language Model."
[84:42] Next token — even though you're only producing the very next token from the sequence, the language models, because they learn sequence-to-next-token, have learned that any point along the sequence is pregnant with the potentiality for not just the next token, but many other tokens moving forward. It's the whole trajectory that is sort of encapsulated
[85:11] in that matrix you were talking about earlier. The matrix is not just a matrix for taking a sequence and producing the next token. No — the matrix is customized so that it's going to run recursively. It's tuned in such a way that it's going to produce the next word, "the." Well, that's not useful on its own. No — "the" is the next piece of the autoregressive chain that's going to produce "the man went to the store."
[85:41] It's not just any old matrix; it's an indescribably rich kind of information that's contained within that matrix. And I like to think about it: if aliens landed and found our brains —
[85:55] you know, because we've been wiped out by AI — I'm kidding, I'm kidding, right? But there are no humans left, and they find a brain somehow preserved, and they were able to probe it, and they start feeding it stuff, and they could see there's this input-output. If you didn't do the autoregressive piece, you would never understand what the hell this thing is doing.
[86:12] Note: Elan's been talking plenty about autoregression, and the technically minded among you may be wondering about the success of diffusion models. While we don't get to it here, he does admit that his thesis would be undermined if diffusion models were accurate enough for natural language. But so far they seem to be good only for coding. This is something I love about Professor Elan Barenholtz: he's extremely humble and open about how his model can be falsified. "If you didn't do the autoregressive piece, you would never understand what the hell this thing is doing."
[86:41] You would get it all wrong, because you'd think its purpose is to produce some sort of label or some such. No — its purpose is to produce these sequences, but you have to run it. You have to run it autoregressively, get the output, and then feed it back in as part of the sequence. So memory — this kind of short-term memory, working memory — is fundamental. It's super, super fundamental. The brain is — I don't want to —
[87:06] I don't want to anger people, but it's non-Markovian. It's fundamentally non-Markovian. It's not: take the current state in, produce the output. It's previous states — there's a sequence of states that led to the current state, and it's that particular sequence that leads to the next token, and the next token is going to be the next element.
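The Markovian-versus-non-Markovian contrast can be cartooned in code. The lookup tables and toy sentences below are invented for illustration:

```python
def markov_next(state, table):
    """Markov: the next token depends only on the current state."""
    return table[state]

def sequence_next(history, table):
    """Non-Markovian: the next token depends on the whole path so far."""
    return table[tuple(history)]

# From the bare current token "the", a Markov rule has one forced answer:
markov_table = {"the": "?"}

# Conditioning on the sequence that led here gives different answers
# for the very same current token:
seq_table = {
    ("the", "cat", "chased", "the"): "mouse",
    ("the", "mouse", "feared", "the"): "cat",
}

sequence_next(["the", "cat", "chased", "the"], seq_table)    # -> "mouse"
sequence_next(["the", "mouse", "feared", "the"], seq_table)  # -> "cat"
```

Same current state, different histories, different next tokens — which is the sense in which the particular sequence, not the instantaneous state, determines what comes next.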
[87:33] This puts you in good company with Jacob Barandes.
[87:54] Ultimately, it has this sort of non-Markovian property — the universe sort of has to have a memory in order to produce, you know, consistent coherence of space; spacetime has to have a sort of memory. If it's just the instantaneous current state, well, then it wouldn't really know what to do. It has to sort of know what happened recently.
[88:18] Just a moment. In your model, our minds work autoregressively and must be non-Markovian — and this is how our cognition works, which we didn't exactly get to. We got to "language is an autoregressive model." Your next thesis was that cognition itself is autoregressive in a similar manner — maybe we can explore it here today, maybe we'll save it for the next part. And then there's the claim that physics itself is autoregressive. However,
[88:45] physics is a model, and many people conflate physics with reality, whereas physics is our model of reality. So are you making the claim that reality is non-Markovian, or are you saying that necessarily, as we model reality, it will be non-Markovian? No, I'm making the former claim: that reality itself is non-Markovian. We observe in physics certain kinds of phenomena such that we end up having to use tools like — to refer to things as — forces,
[89:12] that ultimately are really kind of sneaking in the past. And the idea is that the deterministic nature — the fact that there's coherence, you know, spatiotemporal coherence, the fact that things move the way they do through space — means there's a contingency on the past in a way that you can't really capture by saying you could fully... The past is actually present. The past is in the present.
[89:39] In a deep way that the universe really has to have a memory in order to produce the sort of the next frame, so to speak. That's sort of the shallow version of the claim. It's not about our particular characterization of physics. Our characterization of physics
[89:59] observes certain kinds of spatiotemporal continuity, certain kinds of contingencies, that really depend on what's happening — not just in this instantaneous moment, right? In some ways it's like Zeno's paradox. You know, we can use calculus and say, no, no, in fact there's an instantaneous rate of change. But that's a mathematical trick
[90:24] that's really getting away from the fact that no, there isn't an instantaneous anything. There's simply a continuity that depends on what's happened in the past. But I know I'm going to get attacked by physicists, and I'm not really well equipped to fend them off. So I don't want to be too bold on this piece, because it's not in my wheelhouse. But I do want to take that question back to this conversation. Do I think the brain
[90:52] is just leveraging sort of the memory of the universe? No. I think — and this is an empirical claim — that we see interesting features of the brain, like feedback loops. There's all this backwards connectivity, there are recurrent loops and things like that, and they're not well understood. Predictive coding has some things to say about that — I have some things to say about predictive coding — and I think that
[91:21] what we may find is that this kind of memory — this kind of continuous, we can call it a context, a continuous activation — this ability to use the past to guide the next generation, is going to end up being physiologically built into the brain. It's not that the brain is just leveraging the memory of the universe. No, the brain has to do memory. It has to actually retain
[91:47] the words that I said a couple of seconds ago to be able to generate the next word appropriately. And in fact, that's what we see from so-called working memory experiments: you can really go back and say what happened before. My claim is that it's not there to be retrieved, but rather it's just guiding my current generation. But still, it's represented. It's there. What happened in the past —
[92:12] you know, it's not like Vegas, right? What happens in the past doesn't stay in the past. It actually guides the current generation — it's guiding what I'm saying right now. And it's doing so smoothly, meaning it's happening from a second ago, it's happening from a few seconds ago. And all of this is beautifully modelable using large language models. We can just look at attention weights. We can say: what is the impact of information from this far back?
[92:42] I don't think the brain is doing — probably not doing — what
[92:50] these large language models do, and that's one of the reasons I say I'm not claiming that we are a transformer model. I'm not claiming we are GPT in its current incarnation. What I'm claiming is that the fundamental math is what you just said before: vector times matrix to next vector, autoregress, do it again. That's the level of abstraction at which I think it's accurate. I don't think we have the whole context. We don't have the entire conversation the way
[93:17] GPT does, and it's probably a deep inefficiency in the way these models run right now. They're very computationally expensive. Too computationally expensive to run in a brain, most likely. We don't store all that information. We forget stuff.
[93:31] Right. ChatGPT doesn't, in context — it doesn't forget. Although if you go far back enough in context, it kind of does, which is interesting, and probably similar to what we're talking about, because you're weighting things that are further back less. But in humans, we're not keeping the whole context — we don't even have 30 seconds back perfectly. But there's some representation, and what the nature of that representation is — that's what I want to do with the rest of my life. I want to understand:
[93:59] What does the context look like in people? What is that activation? How is it physiologically instantiated? And what are its mathematical properties? How is what I said 10 seconds ago influencing what I'm saying now? How about 50 seconds ago? How about 10 minutes ago? How about a year ago? Does this thing continue? Are there dynamics that continue over months and years? Possibly. It doesn't all have to be fine-tuned weights. It could be that there's
[94:27] Decaying activation that spreads over much longer periods. Once you allow that it's not explicit retrieval in the working memory form, then all bets are off as to how the dynamics of this thing actually works. I see this as a possible new frontier for thinking about
[94:47] you know, what memory really means in humans. But coming back to that physiological question — and there, I was just trying to do it. I was like, okay, let me rerun: what was the original question, right? So in the brain — what's happening in the brain? My hypothesis actually leads to some concrete predictions: that, unlike with the working memory model, we're actually going to be able to find some correspondence ten minutes back.
[95:15] We're going to find some activations that are interpretable. We'll be able to decode them as guiding my current expression, my current speech. It's very different, by the way, from the classic decoding paradigm in these things: here's some neural activity — is it this picture or that picture? Is it this word or that word? It's not going to look like that. We're not going to be able to decode it in the sense of a concrete, specific, static thing.
[95:42] We have to decode it in terms of whether it's guiding my next word, because that's what it's doing. It's not there to be retrieved. It doesn't have a concrete, specific meaning; it has meaning insofar as it's guiding my next generation. And so we have to think about this entire project differently. If we want to think about longer-term working memory, so to speak, we have to think of it in terms of how my speech, my behavior now, is influenced by what happened a while ago — not
[96:09] So one of the reasons I was excited was and am excited to speak with you is that I see this as a new frontier as well. But for me, I have a side project, which I'll tell you about maybe off air because I'm not ready to announce it.
[96:30] But there are philosophical questions that we can look at with the new lens that's gifted to us by these statistical linguistic models — the ones we call LLMs. Sorry — physical philosophy. I don't know if you've heard of this term, physical philosophy. So you can use philosophy to philosophize about physics, but you can also use physics to inform your philosophy. So there are some established concepts and theories and empirical findings from physics, like special relativity or quantum mechanics,
[96:58] that inform and constrain or even reframe traditional philosophical questions such as the nature of time that wouldn't be there had we not invented special relativity or found special relativity. Okay, so I think there's something about these new models that can be used to then inform philosophical questions. Like you mentioned, there is no symbol grounding problem.
[97:22] If physics has a memory, does that mean that energy isn't conserved? If a particle is carrying its memory with it, why isn't it heating up or getting more massive with time?
[97:49] Why isn't it going to form a black hole? This is why I venture very carefully into these waters because I would need some time to go and read and think about questions like that and you're in a much better position to ask and reason about those questions.
[98:11] Yes. And then you'd also have to talk about why the present-plus-velocity way of viewing the world is so successful. To predict an eclipse, you don't require knowledge about 100 and 200 and 300 years ago all at once, right? You pretty much just need the present. Right — but even velocity, again, if you sort of take me at it, if you consider the instantaneous... you know, the idea of velocity —
[98:40] well, it isn't really in the present, right? You can only get velocity stretched over time; it only has meaning that way. But you could say this particle has this velocity at this time. That's a cheat, right, in some ways — maybe it's just a rearticulation. So in the physics that we've got, we've been able to do this sort of symbolic representation of things like velocity that are sneaking in this kind of temporal extension,
[99:08] in a way that I think may not end up in a radically different place from thinking about this as the universe having memory — as long as you just accept that velocity is a convenience, a kind of way of communicating some property, such that you can say that this is happening instantaneously, even though that's not real.
[99:34] So again, you're in good company with Jacob Barandes. I'm not saying that these questions are in principle unanswerable, but something else: look, if the universe has a memory — let's say a particle has a memory — how much of a memory? Does it know about more than its given region of space, more than its neighbor? Because then do you violate locality? Right. These are different questions that will have to be answered. Yeah. And I wish I could tell you that maybe this is a solution
[99:59] to
[100:13] that there is some memory of their shared origin that somehow, I still don't know how that gives you a spooky action at a distance. It's not a good account, but it might have some relevance. If you think about things very differently, if you think about the universe has memory, well, what does that change? If you just speculate on that and try to reframe things that way, could it potentially help solve some of these issues? I don't know.
[100:42] So let's go back to language. A child is babbling. Yeah. Okay, so let's call it vocal motor babbling. It doesn't actually know what it's doing. When does it decouple and become a token — like a bona fide token with meaning? That's a great question. I would say that it becomes a token when the infant learns that a specific phonological unit
[101:11] has relations to some other phonological unit. Language ultimately is completely determined by relations. It might be a very limited initial map of the token relations, but as soon as it's relational,
[101:33] then we would say that it becomes discretized, such that it's meaningful to say that these symbols have relations to one another. If it's just sounds — ba ba ba ba — then one "ba ba ba" has no specific relation to any other "ba ba ba."
[102:02] So help me phrase this question properly, because
[102:32] I haven't formulated it before, so it's going to come out ill-formed. Earlier you talked about analog, and I believe you were referring to the animal brain as analog, but then language is digital — if that's the correct analogy? Yeah — symbolic, maybe. I don't know; digital is how we actually instantiate a sort of symbolic representation in computers, with ones and zeros or whatever. But yes, symbolic.
[103:19] The written word is something like
[103:41] Is there anything then about language that changes because it wasn't written down? Sorry, is there anything about your model that changes because it wasn't there to be tokenized
[104:11] It's such a great question. I've been thinking about exactly that. I don't think anything changes. What's crazy about it is that until the written word, people might not have even thought about the concept of words at all. And so we were even more oblivious as a species to the idea that there were these individual discretized symbols that have relations amongst each other. Because until you see them outside of yourself,
[104:39] They just run. Yes. They're just running in the machinery of the language, how it's meant to run. It was just an auditory kind of medium. You don't really necessarily even think about them as being distinct from one another. You just have a flow, right? You just make these sounds and stuff happens. Once we started writing things down and especially phonetically,
[105:07] You know, because you think about it — hieroglyphic and pictorial kinds of representations really don't actually capture words, right? Very often they're not distinct; they can actually be a little richer than a single word. And so it was only with writing that maybe people really started to become aware that we have these things called words. And now it's only with language models that we really understand what words are —
[105:37] these relational abstractions — I don't know, "symbols" is just another word; I don't know if that even captures it fully. But what's wild about it is that the brain was doing exactly this: the brain was tokenizing these sounds and was using the mapping between them in order to produce language, probably long, long before anybody ever self-consciously had a conception that there's such a thing as a word.
[106:07] And so that just blows my mind. It speaks to what I think is a very deep mystery, a very deep mystery. Where the hell did language come from? Here's what didn't happen. There was not a symposium of, quote unquote, cavemen or let's use the more modern term, hunter-gatherers. Okay.
[106:35] where they had to figure out: how do we make an auto-generative, autoregressive, sequential system that is able to carry meaning about the world? This thing is just ridiculously good, and it's operating over these arbitrary symbols. And again, when I say arbitrary symbols —
[106:55] just to recap: it's not arbitrary in the sense that the word for snow is this weird sound, "snow," and it's kind of like, what? No — arbitrary in the sense that the map is the territory, right? It's the relations between these symbols that matter. Is it completely arbitrary, though? For instance, there's the kiki and bouba effect — you've heard of those. I think those are cute. I mean, that's the exception that proves the rule, to a large extent. I don't think —
[107:23] I think it is largely arbitrary. There is also just —
[107:28] words themselves have an action component. When you scream a word, you can physically shake the world around you, and it shakes your lungs; if you speak for too long you can die — say, if you just exhale and never inhale. It's a physical activity, and that's hard to wrap your mind around. That's not symbols, right? That's not exactly captured by the symbols, or by just the sequence of words. And again, I'm really just following where the data leads me, because in
[107:57] the large language models, they no longer have any of those properties, right? It's just an arbitrary vector. The tokenization, in the end — ultimately, yes, there's proximity, but it's just strings of ones and zeros. Well, not ones and zeros, but whatever; your vector is just a string of numbers that end up having certain mathematical relations to one another. And completely and totally lost, as far as I can tell, are the physical characteristics
[108:26] of these words. By the way, I should mention: a former student and I are actually working on this idea — this crazy idea of using that latent mapping I mentioned in that earlier paper to see if maybe that's not true. I wonder if you could guess what English sounds like just from the text-based representation — if you've never heard it, if you don't know what sound a D makes or what sound a T makes,
[108:56] but you've got the map — you've got the embedding in text space, and then you've got some other phonological embedding — could you possibly guess? That's a long shot. So maybe it's not totally arbitrary. And maybe the radical thesis here is that it's not arbitrary at all — that the words have to sound the way they do, that something happens mechanically based on the sounds themselves. But my bet is that it's going to be closer to arbitrary.
[109:26] It's going to be close to arbitrary, but I could be wrong. But you were going to say: why not? Why wouldn't the platonic space prove that it's arbitrary? Well, if in fact you can't do the mapping at all — if you can't guess it, if the platonic space says there's no way to get from the text representation to phonology, phonology is doing its own thing, and the word "mouse" is "mouse" for no good reason —
[109:51] then it's hopeless. Okay. But if you can get anywhere, if you can actually guess at all, then that would suggest that there really is an inherent autoregressive capability just in phonology. And so what that would mean is that it's not at the symbol level there — well, yes, it's at the phonological symbol level.
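A toy version of that mapping test — can one embedding space be aligned onto another? — can be sketched with orthogonal Procrustes. The random "text" and "phonological" embeddings below are fabricated stand-ins, not real data:

```python
import numpy as np

def procrustes_map(X, Y):
    """Best orthogonal map W minimizing ||X @ W - Y||_F (classic SVD solution)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(50, 8))                    # fake 'text space' vectors
hidden_rot = np.linalg.qr(rng.normal(size=(8, 8)))[0]  # a hidden rotation
phon_emb = text_emb @ hidden_rot                       # fake 'phonological space'

W = procrustes_map(text_emb, phon_emb)
residual = np.linalg.norm(text_emb @ W - phon_emb)
# residual is ~0 here because the two toy spaces are secretly related by a
# rotation; if phonology were truly "doing its own thing," even the best
# map would leave a large residual
```

The size of that residual on real embeddings is one way to quantify how far from arbitrary the text-to-sound relation actually is: a mapping that fails entirely would support the arbitrariness thesis, and any success at all would cut against it.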
[110:19] But maybe that's happening even in a mechanical level, like there's certain sounds that are easier to say together or something like that, which could guide it. I don't know. It's convoluted in my head right now exactly how this might map out. But I think it's reasonable now to assume that unless proven otherwise, it's probably arbitrary and it's probably arbitrary symbols and what matters is the relation between them. There is no sense in which mouse means mouse, except that mouse ends up showing up
[110:48] after trap or before trap and after the, you know, the cat was chasing the, and all of that. And there's nothing else. Let me see if I've got your view boiled down in terms of a syllogism. So premise one would be that LLMs master language using only ungrounded autoregressive next-token prediction. Then you have another premise that says, well, LLMs have this superhuman language performance just by doing this.
[111:17] And then you'd say that, well, computational efficiency suggests that this reflects language's inherent structure. And then the deduction is that therefore human language uses autoregressive next-token prediction. Is that correct? You got it. You got it. I mean, it's not only computational efficiency per se. There are two ways to put it. One is: if that structure is there, it would be very odd if we weren't using it.
[111:47] Very odd indeed. If that structure is there such that it's capable of full competency, you'd have to suggest that it's there just by the way, but humans are doing something completely different. Okay. You then go and say that language generation feels real time to us. So it's sequential in real time and autoregressiveness or autoregression explains the pregnant... Very good. ...present. You've gotten very good at this, I see.
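The first premise of the syllogism can be shown as a loop. A minimal sketch with a bigram counter standing in for the LLM (the toy corpus is invented; real models condition on a long context with a neural network, but the generation loop has the same shape):

```python
from collections import Counter, defaultdict

# Toy corpus (invented), echoing the "the cat was chasing the ..." example.
corpus = "the cat was chasing the mouse and the mouse ran into the trap".split()

# Count which token follows which: the only "meaning" the model ever sees.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token(prev):
    """Greedy autoregressive step: pick the most frequent next token."""
    return counts[prev].most_common(1)[0][0]

# Generate token by token, each step conditioned only on prior output.
seq = ["the"]
for _ in range(4):
    seq.append(next_token(seq[-1]))
```

There is no grounding anywhere in this loop: "mouse" is whatever tends to follow "the" in the corpus, and nothing more.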
[112:16] Now, although we didn't get to this or explore it in detail, my understanding from our previous conversations is that you would say that brains
[112:44] have pre-existing autoregressive machinery for motor and perceptual sequences. And by the way, I don't know if it's brains or cognition that has it. Well, remember, the speculation is that the brain is going to have to have the machinery, the physiological machinery, to support autoregression. So things like, you know, continuous activation, backward projections, ways of representing the past are sort of maybe built into the brain. So those aren't really that distinct.
[113:14] The main reason I think that is because if you believe as I do that language is autoregressive in humans, you can either propose
[113:44] that spontaneously, however language got here, in order for us to create language, we had to invent a different kind of cognitive machinery that's able to do this autoregressive, hold the past, let it guide the future, do this trajectory mapping between the past and the future. All of that kind of machine, that computational machinery,
[114:13] would have to have been built special-purpose for language. Yes. To me, that seems extremely, extremely unlikely. Costly. Yeah. Yes. So there's a term in evolutionary biology called exaptation. I'm not familiar with that. So exaptation means you have previous machinery used for purpose A; then something else comes about and uses that machinery, and perhaps does so even better. So for instance, our tongues evolved for eating.
[114:44] But then language came about and started to use that machinery, and now we use it primarily... Well, I don't know about primarily, or how to quantify that, but we use it more adeptly for language. I think more of our time is spent talking than eating at this point. Yes, I know. But the reason why I said I don't know is because we're constantly swallowing saliva at the same time. So I don't know how much.
[115:03] Predictive coding in a nutshell postulates that what the brain is doing, that what neurons are doing is actually anticipating the future state, the next state
[115:32] that the environment is going to generate. And so they're basically predicting something about the external world that's going to end up getting represented in the brain. And then there's this constant process of prediction and then measuring the prediction versus the actual, what ends up being the observation. My beef with predictive coding is
[116:01] that you might very well be able to explain the phenomena that it's meant to describe in a more efficient way. So predictive coding to me means that you actually have to have sort of a model of the external, that what you're doing is sort of simulating. And you're doing it in such a way that you actually are producing neural responses that don't really need to get produced very often because the environment is likely to produce them.
[116:28] To me, this seems like an inefficiency and a complexity. And I think there's a much simpler account in some ways, a more elegant account, namely that what our brain is constantly doing is generating, not predicting, but generating, but that the generation has latent within it a strong predictive element. Because of this smooth trajectory, this sort of this idea of the path, the pregnant present, that there is a continuous path from the past to the future.
[116:58] You are, in essence, predicting, to some extent, the same way that a large language model is kind of predicting the next token, but it's not really predicting. Here's where I strongly disagree, or I'm proposing a different model, is you're not predicting in such a way that you're supposed to map to something external to the system. It's simply generation internally defined that's supposed to have this kind of continuity to it.
[117:26] The external world certainly impinges on our system, and we are of course inherently anticipating that we're not going to have a brick wall in front of us as we're running down the street. When that brick wall shows up, you've got to do something about it. That wasn't implicit in your next token generation, so you're going to have to radically reorient and do something about that. I think that can account
[117:55] for some of the phenomena that are supposed to support predictive coding. But the big difference here is that it's all about internal consistency with the anticipation that that internal consistency is going to also map very well to what's happening in the world. But it's built in. There isn't any explicit modeling of the external world. It's that the internal generative process is so good
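The two accounts can be caricatured side by side. Both update rules below are toy sketches of my own making, not the speaker's model or a canonical predictive-coding implementation: one corrects an internal state using explicit prediction error against the input; the other just continues its own trajectory and consults the input only to detect the "brick wall".

```python
def predictive_coding_step(belief, observation, lr=0.5):
    """Predictive-coding flavor: explicitly predict the external input and
    let the prediction error drive the update of the internal state."""
    error = observation - belief      # mismatch with the world
    return belief + lr * error        # error-driven correction

def generative_step(history, observation, surprise_threshold=1.0):
    """Generation-first flavor: extrapolate the internal trajectory from
    the recent past; the observation matters only on gross conflict."""
    momentum = history[-1] - history[-2]   # smooth path from the past
    proposal = history[-1] + momentum      # continue the trajectory
    if abs(observation - proposal) > surprise_threshold:
        return observation                 # brick wall: radical reorient
    return proposal                        # otherwise, keep generating
```

In the generative sketch, agreement with the world is a side effect of smooth internal continuation rather than the quantity being optimized.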
[118:24] So I'm confused then. If the symbols are truly ungrounded, then what's preventing it from becoming coherent but fictional? So that is to say, what tethers our language to the world? Yeah, and the answer would have to reach back again to that latent space. So let's say my language system, you know, wants to go off a deep end and says, actually, I'm sitting here underwater talking
[118:54] The words we've said up till now are pretty consistent with that. I'm expecting some fish to float by in the next second. My perceptual system is going to have something to say about that. There has to be this tethering that you're calling it. Of course, there is grounding in the sense that there has to be some sort of shared agreement within what I think is maybe this latent space or something like that.
[119:20] There is communication between these distinct systems, but the language system can unplug from all that and it could talk about what would it mean to be sitting and talking to a robot underwater and it will have a meaningful coherent conversation about that.
[119:36] All internally consistent, and you can give the prompt. What if instead of Kurt, it was actually a robot Kurt? How would that change things? And I could go in and get philosophical about that. And the point is that the linguistic system has all of its own internal rules in any trajectory. Many different trajectories are possible, although strongly guided by the past. But there is also impinging information from our perceptual system
[120:05] that also continues to guide it.
[120:27] are what happens when you get no longer as closely tethered by the recent past.
[120:36] So this kind of tethering, it happens in language, namely I have to be consistent with my more recent linguistic past, but we also do some tethering to the non-linguistic embedding. There is this crosstalk that happens.
[121:06] Our language system doesn't just go off the deep end. It retains some grounding, not the philosophical kind of grounding, not the symbol equals this percept, but the kind of grounding where this storyline in a certain sense, if you want to think about it that way, more semantically vague, this storyline linguistically is going to have to match my perceptual storyline.
[121:31] OK, so in the same way that with these video generation models, you see Will Smith eating spaghetti, like the three-year-old joke. Yes. And every three frames, if you just look at it sequentially, exactly, every three frames makes sense. But then he's just morphing into something else and he's ballooned now and it looks dreamlike. Exactly. And that's what's happening in video generation. And everybody knows the trajectory now: how is it going to get better? Longer context. And that just means the autoregressive generation
[122:01] is more and more anchored in the past, and that past becomes a more meaningful, smooth curve. But it seems like there must be something more tethering us to reality than just long context. Says you. You know, if there is, what I would say is: certainly in the case of language, like I said, when we step into this world, we inherit the corpus of language, and that is a certain kind of tethering.
[122:30] Words have the relations they do to each other and that carries meaning. The words don't just line up with each other in any old way. You can't just use language however you want. You end up having to adapt and adopt the language that you're given. I would say in the case of language, even more so than perceptual, what we do is we learn that tethering. It is a certain kind of reality. It's a linguistic reality, but it's not
[122:59] arbitrary. It's been honed over God knows how many years for that mapping to be useful. And in order to be useful, it actually has to map somehow to perceptual reality too. That is definitely there. And so, no, it's very strongly tethered. It's not just poetic. We're not just doing a poetry slam when we're talking. We're not just spitting out words.
[123:27] that are loosely related to one another. No, the sequence matters. It's extremely granular. What's the word? It's funny that I can't come up with the word right now. Beautiful is not the right word, but it's precise. There's such incredible detail in how each word relates to one another. This is something we didn't create. You and I, Kurt, didn't create this.
[123:57] This is something that humanity created. It has all of these rich, you know, relational properties that are this tethering, that carry somehow meaning about the universe, only as expressed as a communicative, coordinative tool embedded within a larger perception-action system. But we should respect it.
[124:21] language is an extraordinary invention. I think we should have a completely new respect for just how rich and powerful it is. It's not some symbol, this symbol equals this mental representation or this object. No, it's this construct that contains within the relations the capacity to express anything in such a way that my mind can make your mind do stuff. How the heck does that work? Who knows? But it's
[124:51] So is there something about your model that commits you to idealism or realism or structural realism or anti-realism or foundationalism or what have you? Like what is the philosophy that underpins your model and also what philosophy is entailed by your model if any? Yeah that is a great question and I would say it's
[125:21] I've come to actually sometimes use the term linguistic anti-realism. And it's the idea that language is not what it thinks it is. We engage in our philosophical thoughts, and even our, you know, sort of general thinking about who we are, what is our place in the universe, and much of that takes place in the realm of language.
[125:51] And the conclusion I've come to is that language as a sort of semi-autonomous, autogenerative computational system, modular computational system, doesn't really know what it's talking about in a deep way. And there is really a fundamentally different way of knowing the sensory perceptual system, the thing that gives rise to qualia, the thing that gives rise to consciousness.
[126:19] Here's a big one. The thing that gives rise to mattering, to meaning. What do we care about? We care about our feelings. We care about feeling good or not so good, pleasure, pain, love, all the things that actually matter. These are actually, these live in what I call the sort of the animal embedding. It's something that other species, non-linguistic species, they can feel, they can sense, they can perceive.
[126:50] They don't have language. We think, oh gosh, they don't understand anything. Well, what if it's the opposite? What if it's our linguistic system that doesn't understand anything? What if it's our linguistic system that's actually a construct, a societal construct, a coordinated construct? But as a system, it's a construct that doesn't actually have a clue about what
[127:19] pain and pleasure are. It has tokens for them and the tokens run in within the system to say things like, I don't like pain, I like pleasure. Those are valid constructs and they kind of do the thing they're supposed to do in language. But a purely linguistic system, and I think language is purely linguistic, I guess is one way to think about it, doesn't really have contained within it these other kinds of meanings.
[127:48] Now, first of all, this has implications for artificial intelligence, thinking about whether AI can have sentience. Should we care if your LLM starts saying, this is terrible, don't shut me off, I'm having an existential crisis? Perhaps. I would argue that we shouldn't worry about it. So my LLM says that all the time. I don't know which LLM you're hanging out with. My current LLM. The current LLM.
[128:17] Yes, the Kurt LLM. But the Kurt LLM, as an LLM, perhaps doesn't really have that meaning contained within it in a deep sense. It's, again, because of the mapping, it is communicating something probably about the non-LLM Kurt. When you say ouch, there is pain there. I'm not denying that.
[128:43] What I'm saying is that as a sort of thinking rational system that does the things that language does, that system itself may not have within it the true meaning of the words that it's using in a deep sense. I don't want to take you off course and hopefully this will help you stay on course and hopefully it aids the course. An LLM can process the word torment, say.
[129:06] But what's the difference between our human brain's autoregressive process that creates the feeling of torment itself and the word torment? So my speculation here, and it is purely speculative, is that it's non-symbolic. There's something happening when the universe gets represented in our brain. It's still in our brain. It's still a certain mapping. But when it gets represented, so that physical characteristics
[129:37] of the world are actually represented in a more direct mapping. So think about color. We talked earlier about the sort of color space. There's a real true sense in which red and orange are more similar in physical color space, like there's actually some physical fact about it, and also in brain space. That's my guess.
[130:02] Is non-symbolic a synonym for ineffable?
[130:33] I wouldn't have thought of it that way, but that may be a very good way to say it, or to not say it. Yes, ineffable. Well, by virtue of being symbolic, by virtue of being a purely relational kind of representation, which is what language is. Maybe even more than saying it's symbolic, it's that it's relational. Language is a relational kind of representation; the location in the space matters only because of its relation to other tokens in the space.
[131:03] That's not true in color perception. In color perception, where you are in, sort of, probably the embedding space is going to have physical meaning. It's going to be related to the physical world in a much more direct way. And so the space, even though it's an internal space, right, the perception of color still just comes down to neurons firing. We're not actually getting the light. The light's not getting into our brains, but the mapping is
[131:32] such that it
[132:03] I don't think language has that. I think because it's purely relational, it's not a rippling of anything. It's its own system of relational embeddings that aren't continuous in any way with the physical universe. Do you think that has something to do with God?
[132:33] That if we think of the grand unity of creation, there's some sense in which language breaks that unity. And I think that we can lie in language in a way that we can't in any other substrate. And so I think by becoming purely linguistic beings, as the vast majority of our time as humans is spent in the linguistic
[133:03] space. That's where we're hanging out there. Our minds are hanging out there. I think we have perhaps forgotten something that animals know about the universe. And it's this kind of unity because the animal processing is an extension. It's a continuation of the world. And since the world, the universe is one thing in some sense. It's
[133:31] Everything is connected; we don't even have to get into non-locality, right? The origins of the... you know, let's just talk about the Big Bang or something like that. What's happening here now is in some ways connected quite literally to what happened elsewhere, all the way back in time. So I think this sort of unity that, you know, mystics talk about is much closer to the animal brain
[134:00] than the linguistic brain, because the linguistic brain actually creates this dichotomy. It breaks the continuity. Symbols, I sometimes say, are like a new physics. The relations are what matter, and it's no longer continuous, it's no longer an extension of the physical universe. It interacts with the physical universe in a way that we, as we see, we can sort of do this mapping so that when I talk,
[134:30] It can have influence on the physical universe. It can have influence on my perception. It can have influence on my behavior. I think that sort of the rationalist movement, the positivist movement, sort of modernity itself is a complete hijacking of our brain by the linguistic system. And I do think that has something to do with the denouement, the kind of the God is dead kind of modernity equals somehow
[135:01] the decline. And so, you know, a rationalist would say, well, that's appropriate, because we've figured out how the universe works and we don't need any of this hocus pocus. But what about the feeling of unity? What about the sense of a sort of cosmic whole? Are we so sure that we're right and those ancients were wrong? And yes, I do think that this has
[135:31] As very significant consequences for thinking about some of these intangibles, these ineffables. So a snake that mimics the poison of another snake in terms of its color, that's a form of a lie. Now, would you say that that is somehow symbolic as well, though? No. And yes, there is a mimicry and there is
[135:59] you know, a certain sense in which animals can engage; they don't even know they're engaging in subterfuge. But that's much more continuous with, okay, you've just pushed the cognitive agent into a slightly different space, which is consistent with some other physical reality. That's very, very different than saying we are made of atoms
[136:21] and particles and everything that happens is determined by the forces amongst these atoms, none of which is something that we have any material animal grasp of, any true physical grasp of. These are words. These models are really words and they run in words and they run very well to make predictions and to manipulate the physical universe. But they're stories and they're linguistic stories.
[136:51] Those kinds of stories can be, according to my own theory, language doesn't really have physical, doesn't point to physical meaning. And so even saying that it's a lie or untrue isn't quite right. But within its own space, you can go off in many different directions. And maybe the danger is not in thinking of things as true,
[137:20] thinking thoughts that aren't really true; it's falling too deeply in love with the idea that idea space, language space, is the real space. Yes, interesting. So, see, in our circles, when we're hanging out off air, when we're hanging out with other professors on the university grounds and so on, we praise this
[137:46] exchange of words, and making models precise, and doing calculations, and so on. And I've always intimated that this is entirely incorrect. And I haven't heard an anti-philosopher, like a philosopher that was an anti-philosopher, except one who was an ancient Indian philosopher. I think his name is Jayarasi Bhatta; I'm likely butchering that pronunciation, but I'll place it on screen anyhow. He was arguing against the Buddhists and the other
[138:15] contemporary philosophers by saying, look, you think "know thyself" is what you should be doing, or, he didn't say it exactly like this, but you think of it as the highest goal. However, who is living more truly than a rooster? Like, none of you are living more truly than something that's just being. Yes. Exactly. That is the exact same intuition. And yes, it's this idea
[138:40] I articulated to myself a long time ago that the fly knows something that our linguistics system can never know. That it knows something. It really does. That simply existing and being is a form of knowledge and it's a deeper one. It's a deeper one than whatever it is that our fancy rationalist kind of perspective has given us. Our rationalist perspective is very, very powerful in coordinating
[139:09] and predicting. But in terms of like true ontology, I suspect it's actually the wrong direction. It's created a false god of linguistic knowledge, of shared objective knowledge, when the subjective is the one that we really have. It's the Cartesian
[139:36] So I was watching Everything Everywhere All at Once. I never saw it. Because I also had another intimation. I'll spoil some of it, and if you are listening and you don't want it spoiled, then just skip ahead.
[140:05] I was telling someone that I think if there's a point of life, it's one of two. And so this is just me speaking poetically and not rigorously. One is to find a love that is so powerful, it outlasts death. Okay, so that's number one. And then number two is to get to the point in your life where you realize that all your inadequacies and all your insecurities and all your missteps and your jealousies and your
[140:36] and your malice and so on, that rather than it being a weakness, it's what led you to this place here. And here is the optimal moment. It's to get that insight. So I don't know how to rationally justify any of that or explain it. But anyhow, when I said this one time on a podcast, someone else said, hey, that latter one that you expressed was covered in Everything Everywhere All at Once. So I watched it.
[141:03] What was great about that movie, and here's where I spoil it, is that it makes me want to tear up. The movie is silly and comedic in a way that didn't resonate with me, but there's this one lesson that did.
[141:16] The woman, the main protagonist, she's a fighter, and she's strong-headed, and she has this husband who is weak, whom she's always able to put down. And so then you think, okay, well, this is a modern trope where there's always the stronger woman and every guy is just a fool, and the woman is always more intelligent, and so on. Okay, so you just think of it as, okay, well, it's just a modern trope. The guy is kind and loving to people. Toward the end, there was something with
[141:47] She was getting audited by the IRS, and something was supposed to happen that night where she had to bring receipts, and she couldn't. Now, the husband was talking with the IRS lady, and our protagonist, the woman, was saying in Vietnamese or in Mandarin, whichever language it was, oh, he's an idiot. I hope he doesn't make it worse.
[142:11] The IRS lady then comes to the woman and says, you have another week, you have an extension. She's like, how did this happen? She talks to the husband. And remember, this is a movie almost about a multiverse, so you're getting different versions of this. And there's this one version where the husband's speaking to her and telling her, you know, Evelyn, the main character, you know, Evelyn, you see what you do as fighting. You see yourself as strong and you see me as weak and you see the world as a cruel place. But I've lived
[142:41] on this earth just as long as you, and I know it's cruel. My method of being kind and loving and turning the other cheek, that's my way of fighting. I fight just like you. And then you see that what he did in another universe was he just spoke kindly to the IRS agent and talked about something personal, and that softened her. And then you see all the other universes where she was
[143:10] trying to go on this grand adventure and do some fighting. And the husband then says, Evelyn, even though you've broken my heart once again, in another universe I would have really loved to just do the laundry and taxes with you. And it makes you realize you're aiming for something grand, and you're aiming to go out
[143:39] and conquer demons and so on, but there's something that's so much more intimate about these everyday scenarios. There's something so rich. And there's also a quote about this, about the journey. I think it's T.S. Eliot's: the end of all our exploring will be to arrive where we started and know the place for the first time.
[144:04] Anyhow, I know all of this is abstract talk. No, no, no, this is it. It's exactly what we're talking about, because if you see yourself as a ripple in the universe, right, then you are part of something cosmic and grand. And it's sort of that extensiveness, that extensiveness, that's being here now. It's that.
[144:31] We aren't just atoms. We're part of a larger thing. You can call it God, you can call it the universe, or whatever. But it's there. It's actually something I think we've forgotten. I don't think animals think of themselves as discrete. I don't think they do. I think that they don't think of an outside and an inside. They don't think of objective and subjective. It's just
[145:00] this unfolding. Do they have theory of mind and all that? These are linguistic concepts. And I think I do sound like an anti-linguist, but I recognize the power of it. I said before, you know, how extraordinary it is, how rich it is, and I have tremendous respect for it. But at the same time, I do think
[145:23] that all this talk about objective things, particles, and we are physical bodies and we are just this and we are just that, that is bullshit. Like, no, we are the universe resonating. We are part of the whole in a way that I think thinking objectively as language requires you to do, actually it breaks it. So I think there's such a beauty in the silence.
[145:51] It's something everybody knows, the ineffable. Why is it called the ineffable? The ineffable isn't just that you can't say it. It's magnificent. The ineffable is extraordinary. Why? Because it's this true extension. Something like that. Again, I'm trying to put it into words. Right therein lies the trap. But we're both feeling it.
[146:23] Well, I'm feeling extremely grateful to have met you, to have spent so long with you. And there are many conversations you and I have had off air that we need to finish as well. So hopefully we can do that. And thank you for spending so long with me here. This was wonderful, Kurt. Thank you so much. I just want to hang out and talk about this stuff. So I really appreciate it.
[146:50] I've received several messages, emails and comments from professors saying that they recommend theories of everything to their students and that's fantastic. If you're a professor or lecturer and there's a particular standout episode that your students can benefit from, please do share and as always feel free to contact me.
[147:06] New update! Started a Substack. Writings on there are currently about language and ill-defined concepts, as well as some other mathematical details. Much more is being written there. This is content that isn't anywhere else. It's not on Theories of Everything. It's not on Patreon. Also, full transcripts will be placed there at some point in the future. Several people ask me, hey Kurt, you've spoken to so many people in the fields of theoretical physics, philosophy, and consciousness. What are your thoughts?
[147:34] While I remain impartial in interviews, this substack is a way to peer into my present deliberations on these topics. Also, thank you to our partner, The Economist.
[147:48] Firstly, thank you for watching, thank you for listening. If you haven't subscribed or clicked that like button, now is the time to do so. Why? Because each subscribe, each like helps YouTube push this content to more people like yourself, plus it helps out Kurt directly, aka me. I also found out last year that external links count plenty toward the algorithm,
[148:10] Which means that whenever you share on Twitter, say on Facebook or even on Reddit, et cetera, it shows YouTube. Hey, people are talking about this content outside of YouTube, which in turn greatly aids the distribution on YouTube. Thirdly, you should know this podcast is on iTunes. It's on Spotify. It's on all of the audio platforms. All you have to do is type in theories of everything and you'll find it. Personally, I gained from rewatching lectures and podcasts.
[148:36] I also read in the comments that, hey, TOE listeners also gain from replaying. So how about instead you re-listen on those platforms, like iTunes?
[148:46] As a
[149:09] You also get early access to ad free episodes, whether it's audio or video. It's audio in the case of Patreon video in the case of YouTube. For instance, this episode that you're listening to right now was released a few days earlier. Every dollar helps far more than you think. Either way, your viewership is generosity enough. Thank you so much.
View Full JSON Data (Word-Level Timestamps)
{
  "source": "transcribe.metaboat.io",
  "workspace_id": "AXs1igz",
  "job_seq": 2261,
  "audio_duration_seconds": 8966.61,
  "completed_at": "2025-11-30T21:26:02Z",
  "segments": [
    {
      "end_time": 26.203,
      "index": 0,
      "start_time": 0.009,
      "text": " The Economist covers math, physics, philosophy, and AI in a manner that shows how different countries perceive developments and how they impact markets. They recently published a piece on China's new neutrino detector. They cover extending life via mitochondrial transplants, creating an entirely new field of medicine. But it's also not just science, they analyze culture, they analyze finance, economics, business, international affairs across every region."
    },
    {
      "end_time": 53.234,
      "index": 1,
      "start_time": 26.203,
      "text": " I'm particularly liking their new Insider feature, which was just launched this month. It gives you, gives me, front-row access to The Economist's internal editorial debates, where senior editors argue through the news with world leaders and policy makers, and twice-weekly long-format shows, basically an extremely high quality podcast. Whether it's scientific innovation or shifting global politics, The Economist provides comprehensive coverage beyond headlines."
    },
    {
      "end_time": 78.951,
      "index": 2,
      "start_time": 53.558,
      "text": " Ford Blue Cruise hands-free highway driving takes the work out of being behind the wheel, allowing you to relax and reconnect while also staying in control."
    },
    {
      "end_time": 101.92,
      "index": 3,
      "start_time": 80.077,
      "text": " Enjoy the drive in Blue Cruise-enabled vehicles like the F-150, Explorer, and Mustang Mach-E. Available feature on equipped vehicles. Terms apply. Does not replace safe driving. See Ford.com slash Blue Cruise for more details. I'm going to get attacked by physicists. This thing is just ridiculously good. And so that just blows my mind."
    },
    {
      "end_time": 129.104,
      "index": 4,
      "start_time": 103.814,
      "text": " Professor Barenholtz completely inverts how we understand mind, meaning, and our place in the universe. The standard model of language assumes words point to meanings in the world. However, Professor Barenholtz of Florida Atlantic University has discovered what's unconscionably unsettling. They don't. Language is actually deconstructing itself. Most startlingly, he argues that our rational linguistic minds"
    },
    {
      "end_time": 144.991,
      "index": 5,
      "start_time": 129.275,
      "text": " have severed us from the unified cosmic experience that animals may still inhabit. Most current LLMs operate with purely autoregressive next token prediction operating on ungrounded symbols."
    },
    {
      "end_time": 168.234,
      "index": 6,
      "start_time": 145.265,
      "text": " All of this terminology is explained so don't worry, this podcast can be watched without a formal background in psychology or computer science. In this conversation, we journey through rigorous explorations of how LLMs work, what they imply about how we view the world and the relationship between our consciousness and the cosmos. Professor, you have two theses."
    },
    {
      "end_time": 195.64,
      "index": 7,
      "start_time": 168.746,
      "text": " One is a speculative one and the other is more grounded. You even have another more hypothetical one atop that, which we may get into. Why don't you tell us about the more corroborated one and then we can move to the contestable parts later? Okay, sure. So yeah, I would call them sort of the grounded thesis and then sort of the extended version of that, if we can call it that. The grounded thesis is primarily about language."
    },
    {
      "end_time": 222.756,
      "index": 8,
      "start_time": 196.408,
      "text": " Uh, and the thesis is that human language is captured by what's going on in the large language models. And I mean, not in terms of the specific, uh, exact algorithm as to how the, uh, large language models like ChatGPT are actually generating language, but the core sort of mathematical principle that large language models like ChatGPT run on is what's happening in the brain."
    },
    {
      "end_time": 253.302,
      "index": 9,
      "start_time": 223.575,
      "text": " uh... and that's what's happening in human language. And really, the reason I say corroborated is because ultimately this isn't even about the brain, it's about language itself. And I think what we have learned in the course of being able to replicate language in a completely different substrate, namely in, you know, computers, is that we've learned the properties of language itself. We've discovered it's not through clever human engineering that we've been able to kind of"
    },
    {
      "end_time": 277.824,
      "index": 10,
      "start_time": 253.951,
      "text": " barrel our way towards language competency. It's that with actually fairly straightforward mathematical principles done at scale, we've actually discovered that language has certain properties that we didn't know it had before. And so the incontrovertible fact, in my opinion, is that language itself has certain properties. Now that we know it has those properties and"
    },
    {
      "end_time": 305.145,
      "index": 11,
      "start_time": 278.712,
      "text": " My claim is, the sort of corroborated claim is that those properties force us to conclude that the mechanism by which humans generate language is the same as what's going on in these large language models. Because now that we know that language is capable of doing the stuff that it does, it not only has the properties to, and I'm sort of giving away the punchline, to self-generate based on its internal structure,"
    },
    {
      "end_time": 330.23,
      "index": 12,
      "start_time": 305.811,
      "text": " It's unavoidable to think that we are using the same basic mechanism and principles because it would be extremely odd to think that we have a completely different orthogonal method for generating language. Put differently, if we are using completely different mechanisms than the language models, then it's extremely unlikely that the language models would work as well as they do."
    },
    {
      "end_time": 354.718,
      "index": 13,
      "start_time": 330.538,
      "text": " The obvious question that's occurring to the audience as they listen right now is how do we know that whatever mechanism is being used by LLMs isn't just mimicry?"
    },
    {
      "end_time": 374.701,
      "index": 14,
      "start_time": 356.203,
      "text": " Right, and so that's sort of the critical question. Is this mimicry, right? Is what the models are doing, in a sense, learning a kind of roundabout technique that captures some of the superficial components of language in humans, but ultimately it's a completely different approach."
    },
    {
      "end_time": 399.326,
      "index": 15,
      "start_time": 375.06,
      "text": " And so, you know, my argument is really from the fundamental simplicity of these models. So let's just talk really quickly about how large language models work, things like ChatGPT. What they're doing is learning, given a sequence, you know, let's say the sequence is, I pledge allegiance to the, and then the model is being asked,"
    },
    {
      "end_time": 426.8,
      "index": 16,
      "start_time": 399.684,
      "text": " to do this thing called next token generation. What's the probable next word? We'll say word for the purpose of this conversation. We're going to call tokens a word. Token is a more technical term about how you chop up and encode the information in a sequence of language, but we're just going to say word. So guess the next word based on that sequence. And then what you do in these models is you"
    },
    {
      "end_time": 456.869,
      "index": 17,
      "start_time": 427.261,
      "text": " you train them"
    },
    {
      "end_time": 485.333,
      "index": 18,
      "start_time": 457.312,
      "text": " Then take that word, tag it onto the sequence, and feed it back in. This is sufficient to generate human-level language. Now, the reason I believe that this demonstrates something not about our engineering or even about the models themselves, because there's different ways you might build a model that can do this, is because this very simple trick, this simple recipe of simply guessing the next word turns out to be sufficient"
    },
    {
      "end_time": 508.712,
      "index": 19,
      "start_time": 485.606,
      "text": " To the point where there really are no benchmarks, no standard benchmarks that these models aren't able to do. And so what that suggests to me is just by learning the predictive structure of language, you're able to completely solve language. That means that that is likely to be the actual fundamental principle that's built into language in order to generate it."
    },
    {
      "end_time": 532.602,
      "index": 20,
      "start_time": 509.104,
      "text": " If we had to come up with a very complex scheme, for example, syntax trees, complex grammar, long range dependencies that we had to take into account, and through enough compute, we were able to kind of master that, then I might argue, well, what we're doing is possibly figuring out a roundabout way to capture all this complexity."
    },
    {
      "end_time": 557.688,
      "index": 21,
      "start_time": 533.08,
      "text": " But it's the simplicity itself that simply being able to predict the next token, the next word, is sufficient to do all of this long-range thinking to be able to take an extremely long sequence and then produce an extremely long sequence on the basis of that. That suggests to me that we discovered a principle that's actually already latent in language, that we just had to throw enough firepower at it but with an extremely simple algorithmic trick"
    },
    {
      "end_time": 582.346,
      "index": 22,
      "start_time": 558.183,
      "text": " And then language revealed its secrets. So to me, this really suggests that there is, of course, you know, there's still a lot of science that needs to be done and this kind of thing, kind of work that I'm doing in my lab in terms of really being able to hammer down how the brain is instantiating this exact same algorithm. It's not going to look exactly like chat GPT. It's not necessarily going to be based on"
    },
    {
      "end_time": 606.288,
      "index": 23,
      "start_time": 582.688,
      "text": " what are called transformer models, which is something we can get into a little bit. But as far as the core principle of prediction of the next token, the fact that that solved language so handily to me really argues that that is the fundamental algorithm. That is the fundamental algorithm that when you apply it, boom, language emerges. If you just have the corpus, you have the statistics, and then you do next token prediction,"
    },
    {
      "end_time": 632.159,
      "index": 24,
      "start_time": 606.766,
      "text": " Okay, so Elan, you and I have spent several days together"
    },
    {
      "end_time": 660.52,
      "index": 25,
      "start_time": 632.398,
      "text": " In fact, you're in the video with Jacob Barandes and Manolis Kellis. We'll place that on screen and I'll put a pointer to you. And you were in the background of the interview with William Hahn. Always in the background, never in the foreground. Here we are. OK, well, yes, great. You had a large epiphany that occurred to you at one point. You spoke about the software and this precipitated this entire point of view of language as a generative slash autoregressive"
    },
    {
      "end_time": 687.585,
      "index": 26,
      "start_time": 661.169,
      "text": " model, or what have you. Tell me about it. What the heck was that big idea? So it wasn't so much an idea as an epiphany, a realization, and it really hit me in a single moment. And it wasn't necessarily about autoregression. It wasn't about this finer detail of how, ultimately, language models and, I believe, the brain solve this problem. It was the realization that"
    },
    {
      "end_time": 715.657,
      "index": 27,
      "start_time": 689.189,
      "text": " All of the, any model that has been trained, any model that anybody has built that accomplishes human-level language. So it might be based on autoregression. It might be based even on diffusion, which is kind of the arch nemesis of my autoregressive theory. But regardless, the fact is that these models are being trained exclusively on text data."
    },
    {
      "end_time": 746.561,
      "index": 28,
      "start_time": 717.125,
      "text": " And so all they are learning is the relations between words. To the model, as far as the model is concerned, the words are turned into numbers. They're tokenized. We think of them as numerical representations. But those numbers, and for our purpose, we could think of them as words, don't represent anything. There is nothing in the model besides the relations. Relations just between the words themselves. There isn't, for example, any relation between any of the tokens and something external to it."
    },
    {
      "end_time": 776.732,
      "index": 29,
      "start_time": 747.056,
      "text": " What we tend to think of as people is what words are doing when we're discussing topics, thinking about words in our head, is that they symbolize something, that they refer to something. A lot of the philosophy of language, a lot of the scientific study of linguistics has been concerned with semantics. How do words get grounded? How do they mean something outside of themselves? And what large language models show us is that words"
    },
    {
      "end_time": 804.411,
      "index": 30,
      "start_time": 777.142,
      "text": " don't mean anything outside of themselves. As far as generation goes, as far as the ability for us to have this conversation, and as far as the model's ability to produce meaningful responses to just about any question you can throw at them, including writing a long essay on any topic, including a novel topic that it's never encountered, is by stringing together sequences based on"
    },
    {
      "end_time": 834.514,
      "index": 31,
      "start_time": 805.026,
      "text": " simply the learned relations between words. And so this really hit me very, very hard. I've long been puzzled by, as many are, by the mind-body problem, the phenomena of consciousness, the problem of how do we know your red is my red? And actually the moment that I had this realization was related to this very question. I realized that the word red doesn't mean what we mean by qualitative red. The qualitative red is taking place in our sensory perceptual system."
    },
    {
      "end_time": 863.49,
      "index": 32,
      "start_time": 834.855,
      "text": " The word red, to a large language model, can't mean that. It can't mean any color. It has no color phenomena. It has no concept of what sensory red would mean. Yet it is able to use the word red with equal ability, with equal competency, just as well as I can, if we're just having a conversation about it. And so what this means is that within the corpus of language, the word red doesn't mean something external to itself. Instead,"
    },
    {
      "end_time": 892.005,
      "index": 33,
      "start_time": 863.677,
      "text": " The word red simply means where does it fall in the space of language itself? Where does red fall in relation to other colors, in relation to the word color, in relation to other concepts, other, well, frankly, just words, tokens that are related to what we call concepts that have to do with color and have to do with the word red. So yeah, so this epiphany was about this extraordinary dichotomy, this divide between language"
    },
    {
      "end_time": 920.794,
      "index": 34,
      "start_time": 892.363,
      "text": " and that which we think language refers to. The question is, how does language refer? And the answer is it doesn't. Language doesn't refer in and of itself. Language is an autonomous system. It's a self-contained system. It has the rules contained within it to generate itself, to carry on a conversation. Large language models don't know what they're talking about in any real sense. They can talk about a sunset. They can talk about a taste."
    },
    {
      "end_time": 936.049,
      "index": 35,
      "start_time": 921.135,
      "text": " They can talk about all of space and time and all of those things and yet we would say they have no idea what they're talking about and we'd be right in the sense that they don't have a notion of red beyond the token and its relation to other tokens."
    },
    {
      "end_time": 965.555,
      "index": 36,
      "start_time": 936.664,
      "text": " Now this then raises the obvious question, well, what do I mean what red is about? Don't I think red refers to a quality of perception? And the answer is I do have a quality of perception. There is something called red that my sensory system is aware of. And then there's a token called red that is used in conjunction with there's a sort of coherent mapping between my sensory perception of red"
    },
    {
      "end_time": 994.514,
      "index": 37,
      "start_time": 965.981,
      "text": " and the linguistic red. But that doesn't mean that you need to understand what that word refers to. You don't need to have the sensory qualitative concept of red in order to completely successfully use the word red. And so these are compatible but dichotomous systems. The sensory perceptual system and the linguistic system are ultimately, we can think of them as essentially"
    },
    {
      "end_time": 1009.445,
      "index": 38,
      "start_time": 994.804,
      "text": " Distinct and autonomous, but compatible, integrated. They're running alongside each other, they're exchanging messages."
    },
    {
      "end_time": 1037.773,
      "index": 39,
      "start_time": 1009.804,
      "text": " so that we can have a single organism that is successfully navigating the world and able, for example, to communicate. So I see something red that's registered in my brain. I have a qualitative experience of red. It's remembered as having a certain quality. And then later on I say, oh, you know, could you go pick up that red object for me? And so there's a handoff between the perceptual system and the linguistic system. Just that the linguistic system can now successfully"
    },
    {
      "end_time": 1067.688,
      "index": 40,
      "start_time": 1038.319,
      "text": " send a message to you. Now you've got the linguistic system. You can talk about that. Oh, okay. You told me there's a red object. Are there multiple objects? Yes, there's multiple objects. They have different colors. You're looking for the red one. Maybe it's a dark red. I'm doing this all linguistically. Now you're able to go into the room and successfully get the right object. So again, the handoff happens the other direction. Language is able to hand off to the perceptual system. The perceptual system is able to then detect that there's something with the right quality. But that's not the same thing as saying"
    },
    {
      "end_time": 1096.852,
      "index": 41,
      "start_time": 1067.824,
      "text": " that the language contains the reference inherently within it. It simply means that these are communicative systems that they can exchange information, that they integrate with one another in terms of forming coherent behavior. But language is its own beast. It's its own autonomous system. It can run on its own. That was the big realization. Large language models prove it, that language is able to produce the next token and by virtue of the next token, the next sequence. And that means all of language"
    },
    {
      "end_time": 1126.596,
      "index": 42,
      "start_time": 1097.483,
      "text": " Without having any concept of reference, the reference has no place there. There's no way to kind of squeeze it in. If your computational account is the one that I'm proposing, if the computational account is essentially prediction based on a next token based purely on the topology, the structure, the statistical structure of language, then there's no way to cram any other kind of grounding or any sort of computational feature in there at all. It has to be something closer to"
    },
    {
      "end_time": 1151.664,
      "index": 43,
      "start_time": 1127.381,
      "text": " What's happening is much closer to generating a prompt, basically saying, here's what's in the room."
    },
    {
      "end_time": 1177.466,
      "index": 44,
      "start_time": 1151.988,
      "text": " And now based on these features, these scripts now run the same exact language exclusive model. And so language takes care of itself. It doesn't need grounding in order to be able to do everything it does. It doesn't have to have concepts outside of itself. I think that's basically been proven by these text only large language models. So that was the big epiphany. The big epiphany was that language is autonomous."
    },
    {
      "end_time": 1204.753,
      "index": 45,
      "start_time": 1177.637,
      "text": " Okay, so you're not denying consciousness and you're not denying qualia."
    },
    {
      "end_time": 1228.37,
      "index": 46,
      "start_time": 1205.094,
      "text": " No, and I want to make this very clear that my personal opinion on this is besides the point to some extent. You can be an eliminativist if you want, although I think everything I'm saying has a lot of bearing on this. But I believe my account is strictly an account of language."
    },
    {
      "end_time": 1250.776,
      "index": 47,
      "start_time": 1228.592,
      "text": " I think that perceptual mechanisms that give rise to qualia, things like redness and heat and taste and all of these, are basically processes that take place long before the handoff. And so what happens is, you know, think about the camera, the camera is transducing light, it's measuring certain wavelengths,"
    },
    {
      "end_time": 1270.469,
      "index": 48,
      "start_time": 1251.049,
      "text": " Then there's a lot of visual processing that has to happen before you get to the point where it's turned into a linguistic-friendly embedding, right? The stuff that an LLM can see, a multimodal LLM can see. And so all of that processing that happens is what I think gives rise to the qualitative experience."
    },
    {
      "end_time": 1300.23,
      "index": 49,
      "start_time": 1270.896,
      "text": " We experience redness because of all of this very sort of analog, probably non-symbolic kind of representation. And then at the end of that process, there is a conversion. By the way, by the end of the process, a lot of things happen. We also respond to colors and to light and all of that non-linguistically. But we could think of different endpoints. One of those endpoints is here's a handoff to language."
    },
    {
      "end_time": 1329.275,
      "index": 50,
      "start_time": 1300.555,
      "text": " And by the time language gets it, it's long past that kind of sensory and perceptual processing that gives rise to qualitative phenomena. So I strongly believe that there is, in a certain sense, the word hard problem is a little loaded. I believe there's undeniable qualia. But what I also think is that language is poorly equipped. It's simply"
    },
    {
      "end_time": 1347.227,
      "index": 51,
      "start_time": 1329.889,
      "text": " Just a moment. Don't go anywhere. Hey, I see you inching away."
    },
    {
      "end_time": 1370.964,
      "index": 52,
      "start_time": 1347.705,
      "text": " Don't be like the economy, instead read the economist. I thought all the economist was was something that CEOs read to stay up to date on world trends. And that's true, but that's not only true. What I found more than useful for myself personally is their coverage of math, physics, philosophy, and AI, especially how something is perceived by other countries and how it may impact markets."
    },
    {
      "end_time": 1394.974,
      "index": 53,
      "start_time": 1370.964,
      "text": " For instance, The Economist had an interview with some of the people behind DeepSeek the week DeepSeek was launched. No one else had that. Another example: The Economist has this fantastic article on the recent dark energy data, which surpasses even Scientific American's coverage, in my opinion. They also have the charts of everything, like the chart version of this channel. It's something which is a pleasure to scroll through and learn from."
    },
    {
      "end_time": 1421.613,
      "index": 54,
      "start_time": 1394.974,
      "text": " Links to all of these will be in the description, of course. Additionally, just this week there were two articles published. One about the Dead Sea Scrolls and how AI models can help analyze the dates that they were published by looking at their transcription qualities. And another article that I loved is the 40 best books published this year so far. Sign up at Economist.com slash TOE for the yearly subscription. I do so and you won't regret it. Remember to use that TOE code, as it counts toward helping this channel and gets you a discount."
    },
    {
      "end_time": 1451.971,
      "index": 55,
      "start_time": 1421.971,
      "text": " Now, the economist's commitment to rigorous journalism means that you get a clear picture of the world's most significant developments. I am personally interested in the more scientific ones, like this one on extending life via mitochondrial transplants, which creates actually a new field of medicine, something that would make Michael Levin proud. The economist also covers culture, finance and economics, business, international affairs, Britain, Europe, the Middle East, Africa, China, Asia, the Americas, and of course, the USA."
    },
    {
      "end_time": 1472.039,
      "index": 56,
      "start_time": 1451.971,
      "text": " Whether it's the latest in scientific innovation or the shifting landscape of global politics, The Economist provides comprehensive coverage and it goes far beyond just headlines. Look, if you're passionate about expanding your knowledge and gaining a new understanding, a deeper one of the forces that shape our world, then I highly recommend subscribing to The Economist."
    },
    {
      "end_time": 1486.647,
      "index": 57,
      "start_time": 1472.039,
      "text": " I subscribe to them and it's an investment into my, into your, intellectual growth, one that you won't regret. As a listener of this podcast, you'll get a special twenty percent off discount. Now you can enjoy The Economist and all it has to offer."
    },
    {
      "end_time": 1515.555,
      "index": 58,
      "start_time": 1486.954,
      "text": " Your planet is now marked for death. Marvel Studios' The Fantastic Four: First Steps is now streaming on Disney+. We will protect you. As a family. Light them up, Johnny!"
    },
    {
      "end_time": 1534.224,
      "index": 59,
      "start_time": 1515.879,
      "text": " Marvel's First Family is certified fresh on Rotten Tomatoes. That is fantastic. And critics say it's one of the best superhero movies of all time. Marvel Studios' The Fantastic Four: First Steps, now streaming on Disney+. Rated PG-13. What time is it, Ben? It's clobberin' time!"
    },
    {
      "end_time": 1564.377,
      "index": 60,
      "start_time": 1534.889,
      "text": " Close your eyes, exhale, feel your body relax, and let go of whatever you're carrying today. Well, I'm letting go of the worry that I wouldn't get my new contacts in time for this class. I got them delivered free from 1-800-CONTACTS. Oh my gosh, they're so fast. And breathe. Oh, sorry. I almost couldn't breathe when I saw the discount they gave me on my first order. Oh, sorry. Namaste. Visit 1-800-CONTACTS.COM today to save on your first order. 1-800-CONTACTS."
    },
    {
      "end_time": 1587.892,
      "index": 61,
      "start_time": 1565.094,
      "text": " Okay, let me see if I get this. You have some redness, so you do, you're not denying redness, you grant redness. I do. Okay, there's redness, and then somehow this needs to be referred to with some spoken words, with some language. Okay."
    },
    {
      "end_time": 1613.677,
      "index": 62,
      "start_time": 1588.78,
      "text": " So what's happening? You're saying that it's an independent system, yet it's integrated. So what is that relationship? And does it become so diluted that by the time you refer to it, you're no longer referring to that qualia? I don't understand. Yeah, that is essentially the idea. So this is this is the exact problem I am working on right now. There was a fantastic paper that I just came across about a week ago."
    },
    {
      "end_time": 1643.558,
      "index": 63,
      "start_time": 1614.565,
      "text": " There was a paper that was published on arXiv recently. It's called Harnessing the Universal Geometry of Embeddings. And what this paper showed is that you could have completely different models solving different linguistic tasks. For example, you could have GPT, then you could have BERT, which solves a somewhat different task. So there's masked tokens as opposed to autoregressive next token generation. And what they found was that you could learn a shared latent space."
    },
    {
      "end_time": 1673.148,
      "index": 64,
      "start_time": 1643.882,
      "text": " What you could do is hand off, take the embedding. The embedding is basically, you can think of that as numerical representation. It's a high dimensional numerical representation of your tokens. So here's a token. This token is going to represent the word dog. And then we're going to take that token and embed it in a much higher dimensional space. And what they found is that if you take the embedding, the high dimensional representation from one model, so you chat GPT and then"
    },
    {
      "end_time": 1700.759,
      "index": 65,
      "start_time": 1673.456,
      "text": " take a representation from a different model. You could take the embedding, send it to this latent space, and, if you cycle it through, it's starting to get in the weeds a little bit, but you send it to this latent space and then recover it in its original form. What you can do is, once you've got that latent space, you can then translate"
    },
    {
      "end_time": 1730.333,
      "index": 66,
      "start_time": 1701.408,
      "text": " from one embedding to a completely different embedding. This is a new paper. This is a new paper, yes. This rocked my world, because what they're arguing is that there is, in some ways, this underlying universal structure of language that's captured in this latent space. And so even though you have a radically different embedding in one model, you know, they didn't do it across different languages. That's one of the projects I'm"
    },
    {
      "end_time": 1750.486,
      "index": 67,
      "start_time": 1730.555,
      "text": " doing right now is to see if you can do this across, say, English and Spanish, even for a model that's trained exclusively on English and then another model trained exclusively on Spanish. Can you guess the Spanish just from finding this kind of universal structure across these two different models?"
    },
    {
      "end_time": 1773.183,
      "index": 68,
      "start_time": 1750.794,
      "text": " Sorry, what do you mean? Can you guess the Spanish? If a model was trained only in English and then it was receiving some Spanish text, a couple of Spanish sentences. So the way to think about it is that what you're doing is creating another embedding, this latent space, where you're going to be able to send in a message in English and then, based on"
    },
    {
      "end_time": 1802.363,
      "index": 69,
      "start_time": 1773.677,
      "text": " the station and then again, do the same thing for Spanish. And then you're never going to show any model, no model is ever going to see, a pair of English and Spanish. Instead, what you're going to learn is that there is some way to get from one language to the other. You're going to end up being able to get from English to Spanish without ever seeing the actual translation, because what the model is going to learn is what's common across these two representations. What's true for both the Spanish embedding,"
    },
    {
      "end_time": 1806.783,
      "index": 70,
      "start_time": 1802.671,
      "text": " and the English embedding, that there's some sort of underlying latent structure."
    },
    {
      "end_time": 1834.889,
      "index": 71,
      "start_time": 1807.176,
      "text": " That's true of both and that that captures something more universal about language. And again, they didn't do it for different languages. They just did it for different embeddings of English, but very different embeddings because they were trained on completely different models. If you looked at them, if you just looked at this sort of vector representation, took a vector representation of the word dog in one and a vectorization the word dog in the other, they're completely numerically not. There's no similarity. You can never spot the similarities if you just looked even them then pairwise."
    },
    {
      "end_time": 1864.684,
      "index": 72,
      "start_time": 1834.889,
      "text": " But if they do this kind of reconstruction and then ask the model to be able to reconstruct, not in the original embedding space, but go and reconstruct in the other embedding space, it's able to actually do this. And so by doing that, by training it to do that, without ever seeing any pairs, it's able to sort of learn this translation between one representation and another representation. What this opened up to me is the possibility that we could think about the exact same"
    },
    {
      "end_time": 1894.872,
      "index": 73,
      "start_time": 1864.889,
      "text": " kind of latent space in the brain and possibly in artificial intelligence models between the perceptual world and the linguistic world, that there is some embedding of how the physical world is structured. We understand, like think about an animal, a non-linguistic animal, certainly has idea of objects and objects in relation to other objects, objects in proximity to other objects, moving around those objects. My dog was just barking in the background."
    },
    {
      "end_time": 1924.07,
      "index": 74,
      "start_time": 1895.299,
      "text": " knows what doors are and she can go scratch it and she knows it opens up. She certainly isn't able to express that linguistically, but she has this concept and she's able to think about, she's able in some ways to reason about that. My suspicion is that that probably is done maybe even autoregressively, but we'll leave that aside for now. The main point is that there is some representation of the facts about the world, the sensory facts of the world, or the sensory, I would say the sensory construction, the facts that have been constructed"
    },
    {
      "end_time": 1954.275,
      "index": 75,
      "start_time": 1924.343,
      "text": " based on sensory information. So that's some sort of embedding of the world. The linguistic embedding is a radically different embedding. It carries information about the world as well, but not in the way, not in the direct way that we think, not that the word, you know, my headphones are sitting on this desk has direct referent back to sensation and perception. No, it lives on its own. It's its own embedding and it does its own and it can do its own thing. However,"
    },
    {
      "end_time": 1980.862,
      "index": 76,
      "start_time": 1954.735,
      "text": " Based on this paper, this really gave me sort of a key insight that there might be this latent space where you can actually do this kind of mapping, where there's translation between linguistic and perceptual embeddings. They're as distinct as they are, fundamentally very, very distinct, very different. They're there to solve different problems, but they're able to talk to each other. How? Perhaps through this kind of latent space where some universal structure"
    },
    {
      "end_time": 2009.138,
      "index": 77,
      "start_time": 1981.135,
      "text": " Like, okay, in language, there's certain facts about language. There's a fact about the word dog or the word microphone that its relation to other words like desk in some ways captures the fact that microphones sit on top of desks. That fact is somehow actually contained within this embedding structure. In what sense? Well, if you ask me, would a desk sit on a microphone or a microphone sit on a desk?"
    },
    {
      "end_time": 2031.015,
      "index": 78,
      "start_time": 2009.343,
      "text": " I can answer that question. So can chat JBT, right? And without any notion of what microphones really are sort of from a perceptual standpoint, they're having these kinds of properties, we can talk about them and the embedding space, the linguistic embedding space contains this information. What does it mean contains information? By the way, just to say, what does that mean? It means given a certain input, like do microphones sit on desks?"
    },
    {
      "end_time": 2061.186,
      "index": 79,
      "start_time": 2031.271,
      "text": " Where should I put my microphone? I can answer linguistically in a reasonable way. And that's what I mean by the knowledge. It's purely linguistic knowledge. It only can generate linguistic responses. But the point is that that knowledge lives in this kind of linguistic embedding. And then there's the other kind of embeddings. There's a visual embedding. There might be an auditory embedding, which is distinct. And then the idea that I'm very inspired by is that there can be this latent space that captures certain universals."
    },
    {
      "end_time": 2081.254,
      "index": 80,
      "start_time": 2061.664,
      "text": " Are there any"
    },
    {
      "end_time": 2102.654,
      "index": 81,
      "start_time": 2081.527,
      "text": " of this century perceptual kind of phenomena. And this is important because forever philosophers, philosophers, philosophers in general, linguists have been trying to understand how do words get their meaning? How do they, what they, something I referred to earlier, you know, what's the definition of a microphone? What's the definition of a dog? And the answer is there isn't a single one."
    },
    {
      "end_time": 2123.183,
      "index": 82,
      "start_time": 2103.148,
      "text": " There isn't a single definition that's ever going to capture. Instead, what you've got is this latent sort of bridge where there's some sort of representation of this fact that given whatever your particular prompt is, your linguistic prompt is going to lead to certain kind of meaningful linguistic behavior. If you ask me a question about this microphone,"
    },
    {
      "end_time": 2145.896,
      "index": 83,
      "start_time": 2123.422,
      "text": " I"
    },
    {
      "end_time": 2171.135,
      "index": 84,
      "start_time": 2145.896,
      "text": " about the world that's embedded in language. I don't think there would be a static set of facts embedded in our visual embedding of the world. Instead, what we've got is what I call potentialities. We now have the ability to engage that latent space linguistically where the perceptual information lives, this universal embedding of it, and then do whatever we need to do with it."
    },
    {
      "end_time": 2177.568,
      "index": 85,
      "start_time": 2171.374,
      "text": " If I need to answer this question about it, I can answer that question. If you ask me a different question, I can answer that. But there isn't a singular meaning."
    },
    {
      "end_time": 2207.21,
      "index": 86,
      "start_time": 2177.961,
      "text": " of microphone that captures sort of the entire set of facts. Here it is. Here's the embedded set of facts. The set of facts is actually infinite. I could tell you infinite things about this microphone. For starters, to use a silly philosophical example, it doesn't have this shape and it doesn't have that shape. I could tell you there's an infinite number of questions you could ask me about it that I could answer meaningfully about it. So all those potentialities are kind of what happens when the linguistic system"
    },
    {
      "end_time": 2231.425,
      "index": 87,
      "start_time": 2207.602,
      "text": " Interacts with this is kind of a shared embedding space that's sort of the half-baked version of how I think language ultimately does have to enter of course language language only is meaningful insofar as it can live within the larger ecosystem of perception and sensation and perception we have to be able to take in information through our senses and then"
    },
    {
      "end_time": 2259.326,
      "index": 88,
      "start_time": 2232.244,
      "text": " communicate, although I use that word kind of carefully, I don't communicate the entire representation because as I said, I don't think that's even a meaningful idea. Instead, what I can do is use language in a way that helps us coordinate our behavior. There's no way to sort of download the entire perceptual state. That's locked up in some ways in the perceptual embedding. No, what I can do is"
    },
    {
      "end_time": 2288.626,
      "index": 89,
      "start_time": 2259.855,
      "text": " pull some information such that I can meaningfully communicate with you in a way that then is going to have the intended consequences. I'm not downloading perceptual information into your brain. I'm telling you what you need to know in order to be able to perform some action, to perform some behavior, or maybe even to think about it so that you could later perform some action. I know that was a lot, and feel free to back me up and challenge me on any of these things."
    },
    {
      "end_time": 2301.647,
      "index": 90,
      "start_time": 2289.138,
      "text": " So I want to see if I understand this and I want to explore what is the definition of language, even though we just talked about there isn't the definition of a microphone say, but I do want to talk about the definition of language and what is autoregression."
    },
    {
      "end_time": 2320.811,
      "index": 91,
      "start_time": 2301.954,
      "text": " And while presumably you're telling me what you believe with language, you're telling me this model because you believe it's true. I don't know what truth you're conveying if you believe this is not grounded. So what are you referring to when you even say that language is autoregressive without symbol grounding? I don't have an ideas to that. I want to explore that."
    },
    {
      "end_time": 2347.807,
      "index": 92,
      "start_time": 2321.22,
      "text": " But first, I want to see if I understand you. OK, so a latent space. So let's think of a word. A word gets a vector like an arrow. And I'm just going to be 2D for this example, because that's just what the camera picks up. So let's say the word dog looks like. So the word cat looks like. So whatever. OK, the space that it's embedded in is called the latent space. Is that correct? Well, the initial embedding is is just the embedding."
    },
    {
      "end_time": 2373.507,
      "index": 93,
      "start_time": 2348.439,
      "text": " So the latent space is a compressed version of that?"
    },
    {
      "end_time": 2381.596,
      "index": 94,
      "start_time": 2374.189,
      "text": " Well, in some ways, it's actually not compressed. It's actually it's actually it. What's the opposite of compressed? It's uncompressed expanded."
    },
    {
      "end_time": 2405.401,
      "index": 95,
      "start_time": 2381.869,
      "text": " It's an expanded version. So you have the original tokenization, which just says here in a fairly small vector, but then you expand it into a much higher dimensional embedding space so that each token actually ends up getting much richer, many more numbers that are used in order to represent each token."
    },
    {
      "end_time": 2433.746,
      "index": 96,
      "start_time": 2405.657,
      "text": " I mean, that's a very key fundamental thing that these models do. And by expanding in these different dimensions, that's what allows you to sort of massage the space so that you can get all these cool properties like cat and dog being sort of in the appropriate relation to one another. So that later on, when you're trying to figure out what the next token is, you're able to actually leverage the inherent structure in this high dimensional space. Okay."
    },
    {
      "end_time": 2457.79,
      "index": 97,
      "start_time": 2434.138,
      "text": " So then you have the language model for English and then you have a language model for Spanish. Yes. And let's imagine that it was trained only with the corpus of English in the former case and only with the corpus of Spanish in the second. And then we can even have a third of Mandarin. Sure. OK. Yeah. In fact, in the paper, they didn't do different languages. They said they did different embeddings of English language models. But yes, they use multiple. They actually did this across several different embeddings, not just two."
    },
    {
      "end_time": 2485.282,
      "index": 98,
      "start_time": 2458.541,
      "text": " Okay, so then the claim or finding is that if we look at cat and dog inside of here in English, it gets mapped to some fourth space here, which is like a Rosetta stone space or platonic space. And that's exactly what they call platonic. They use the word. Okay, great. Well, great. And then it looks like this there. Okay. And then if you were to say, okay, well, let me just forget about English and this platonic space. Let me look at cat and dog in Spanish."
    },
    {
      "end_time": 2505.009,
      "index": 99,
      "start_time": 2485.657,
      "text": " Okay, and it looks like this here. Let me map it from here to my platonic space. Oh, wow, it gets mapped to a similar place. Oh, and does the Mandarin let's find that out a cat dog. It does. Okay, let's test out more words. So the claim is that this space here is this meaning like space."
    },
    {
      "end_time": 2523.217,
      "index": 100,
      "start_time": 2505.555,
      "text": " Okay, great. And then what you're saying is that microphone, we think of microphone as living in here as a single vector, that would be like an essence of the microphone that we're referring to. But actually, microphone, our concept of microphone depends on the prompt. So explain that that sounds interesting."
    },
    {
      "end_time": 2552.21,
      "index": 101,
      "start_time": 2524.445,
      "text": " Yeah, I think, and you're making me think about this in a way that I hadn't quite before. So the level of which I've thought about it is that you've got these different embeddings. When I see a microphone visually, there's a certain vector representation of what that sensory perceptual experience, and I don't mean the qualitative sense, I'm not getting into phenomenology, but"
    },
    {
      "end_time": 2582.483,
      "index": 102,
      "start_time": 2552.483,
      "text": " There's something happening"
    },
    {
      "end_time": 2609.94,
      "index": 103,
      "start_time": 2582.79,
      "text": " Each individual token is simply a vector in that space. So it really picks out a specific point. And we can say microphone lives right here in this linguistic space. And then my perceptual experience, I don't want to use that word, but my perceptual kind of grasping of this microphone being here is this point in a completely different space, this perceptual space."
    },
    {
      "end_time": 2640.367,
      "index": 104,
      "start_time": 2610.367,
      "text": " which has, you know, it captures other kinds of information. In language, so let's actually talk about this for a second. In language, the space, if you want it to be a useful, meaningful space, you're going to want things that have similar meaning. They're likely to actually have proximity to each other. And this is, to some extent, what the large language models learn. They learn in embedding. In order to do next token, they learn embedding that gives this, where the space, you know, and we could think of almost like two-dimensional, three-dimensional space."
    },
    {
      "end_time": 2668.985,
      "index": 105,
      "start_time": 2640.708,
      "text": " It's very high dimensional, but you know, for our purpose, we think about that, that where cat and dog live, you want those things to live closer together than cat and desk. And of course, it's much richer than that, right? It's not just semantic, like this very kind of superficial level of semantic similarity. In fact, what it is, is capture this somehow the semantics, so to speak, are captured by the space, like the space, the shape of the space itself,"
    },
    {
      "end_time": 2683.899,
      "index": 106,
      "start_time": 2669.36,
      "text": " is what allows the model"
    },
    {
      "end_time": 2711.578,
      "index": 107,
      "start_time": 2684.445,
      "text": " in terms of next token generation so that it's useful for that purpose. What does the perceptual space look like? Well, this perceptual space is going to have a very different... The axes there almost certainly aren't going to have the same kind of meaning as in the linguistic space. There'll be something closer, maybe color features, shape features, something like that. And where this microphone lives is within that space is going to have radically different meaning than saying..."
    },
    {
      "end_time": 2741.783,
      "index": 108,
      "start_time": 2711.937,
      "text": " It's not apples and oranges, right? Those aren't different enough, right? It's apples and math or something. It's really, really radically different kinds of spaces. But what I'm proposing, what I think the insight here is that ultimately there is the possibility of having a shared space that you can send, you can project both of these things to where microphone, the word, is going to somehow make contact"
    },
    {
      "end_time": 2770.384,
      "index": 109,
      "start_time": 2742.005,
      "text": " With this perceptual experience right now, this perceptual fact, but it's not, and here's the key point that you're getting at. It's not that this word microphone picks out the exact same embedding in this latent space. It's not that it's going to make that thing light up. Oh, it's the same thing. No, it's that when you ask a certain question about a microphone, is there a microphone on your desk, my perceptual system is"
    },
    {
      "end_time": 2798.387,
      "index": 110,
      "start_time": 2771.391,
      "text": " Generating some some well first of all just generating the perceptual phenomena, but then it's also sharing information in this latent space which my linguistic system can then go draw from and then Given this particular prompt was there a microphone on my desk. I'm able to then successfully answer the question So it's not it's not quite the same thing as saying that they are they're picking out the same information in latent space because"
    },
    {
      "end_time": 2828.797,
      "index": 111,
      "start_time": 2798.899,
      "text": " My argument is that that's not really a meaningful concept. There isn't the same. Microphone in linguistic terms doesn't pick out a perceptual kind of fact. That's not possible. These are radically different kinds of facts. But what the latent space might allow us to do is not just to translate, which is what they did in this paper, but perhaps to pass information along in a meaningful way so that you're able to access it and do something successful like answer the question."
    },
    {
      "end_time": 2858.268,
      "index": 112,
      "start_time": 2828.899,
      "text": " Is there a microphone on this desk? I think that might be what's happening to some extent even in the multimodal models. So it's a longer conversation. That's not really how they work. They don't actually operate based on a shared latent space or anything like that. Really what they do is the models learn to take a perceptual input and turn it into something like language. So it's more similar to like prompting almost. It's not exactly that. But it's injecting something within"
    },
    {
      "end_time": 2887.022,
      "index": 113,
      "start_time": 2858.575,
      "text": " Linguistic space that is equivalent to actual language. It's not the same thing as the shared latent space, but my hypothesis is that there may be something very similar happening. So you don't think that multimodal models will solve the symbol grounding problem? You don't even think there is a symbol grounding problem? That is a fair question. And here's actually a prediction or a falsifiable."
    },
    {
      "end_time": 2917.227,
      "index": 114,
      "start_time": 2887.278,
      "text": " in some sense."
    },
    {
      "end_time": 2946.613,
      "index": 115,
      "start_time": 2917.619,
      "text": " organism that's able to use language and also use perception in, you know, bridge these different maps in a meaningful way so that we can get, you know, full coherence. I guess, you know, let's just call it human level perceptual linguistic coherence so that I can say to you, hey, can you go grab that or say to a machine, can you go grab that object"
    },
    {
      "end_time": 2972.398,
      "index": 116,
      "start_time": 2946.817,
      "text": " described what I want and then the machine is able to go and do exactly what I described, then my argument is that I don't think, and again this is speculative, I could be proven wrong certainly on this, my suspicion is that we're not going to be able to do it using the kind of approach that multimodal models currently use, that you're not going to get there. It's kind of a dumb trick the way that we're currently solving the problem because we're not really allowing these two different"
    },
    {
      "end_time": 3001.886,
      "index": 117,
      "start_time": 2972.739,
      "text": " This podcast one day is kind of an early sort of canary in the coal mine for this idea is that it's something closer to this kind of shared latent space. What you have is these completely distinct kind of mappings"
    },
    {
      "end_time": 3021.783,
      "index": 118,
      "start_time": 3002.619,
      "text": " We call them embeddings. They can kind of grow up on their own, learn the information that they need to independently of one another. But at the same time, they have this sort of shared sandbox where they're able to communicate with one another and do things. So I think it might take a very different approach to get full perceptual linguistic competency."
    },
    {
      "end_time": 3082.381,
      "index": 120,
      "start_time": 3053.541,
      "text": " Okay."
    },
    {
      "end_time": 3110.64,
      "index": 121,
      "start_time": 3083.114,
      "text": " Have you heard of Wilfred Sellers? I believe it's Sellers. Oh gosh, I read Wilfred Sellers in early, nearly one of my first philosophy classes I ever took. I'm trying to remember the name of the book, but I'm sorry. So what, which, which, which work by? I believe it's empiricism and the philosophy of mine. I'll put a link on screen if I'm correct. Sounds familiar, but, but catch me up. So he's criticizing the idea that our perception gives foundational non-conceptual empirical knowledge."
    },
    {
      "end_time": 3117.261,
      "index": 122,
      "start_time": 3110.862,
      "text": " So these experiential givens that we think of as primitive, like redness, he would say that they involve"
    },
    {
      "end_time": 3147.09,
      "index": 123,
      "start_time": 3117.568,
      "text": " Heavy interrelations of concepts. So for instance, the way that I think about it is if you're to say to someone redness, they'll be like, well, what kind of redness exactly are you talking about? Then they'll think, okay, the redness of an apple, but then an apple is not always red. Okay. Redness of an apple in a certain season with a certain type of sunlight. Okay. Now I've gotten it. So by the time you go in to pull out this primitive, you've then soaked it with so many other concepts. You can't actually come in with language and pull out a primitive."
    },
    {
      "end_time": 3173.729,
      "index": 124,
      "start_time": 3148.029,
      "text": " Yeah, that sounds extremely similar to sort of to the initial insight. And it's related to the inverted qualia problem. I don't know why your red is not my green and vice versa. And it's because the linguistic representation doesn't capture, you know, we can think again, it lives in a completely different embedding space. And when we think about the redness of red,"
    },
    {
      "end_time": 3193.012,
      "index": 125,
      "start_time": 3174.121,
      "text": " Well, it's qualitatively similar to orange in, you know, there's sort of a continuum between those. Those qualitative similarities are really only contained and only understandable by the sensory perceptual system. And we can talk about them. We can sort of say, yeah, red is a little more similar to orange."
    },
    {
      "end_time": 3222.5,
      "index": 126,
      "start_time": 3193.456,
      "text": " That's because we have a very coarse maybe via this latent space where we're able to refer to certain kinds of properties in a way that is useful for communication. But as far as that raw qualitative property, it's primitive in the sense that we can't unpackage it linguistically."
    },
    {
      "end_time": 3244.565,
      "index": 127,
      "start_time": 3222.756,
      "text": " But it's not primitive in the sense that there's extraordinary cognitive machinery that is responsible for that qualitative. Think about the world of animals and what they do with color and how well they understand shape and how they understand space. All of that is unavailable to our linguistic system. It's available by the way to us."
    },
    {
      "end_time": 3265.845,
      "index": 128,
      "start_time": 3244.906,
      "text": " Our sensory perceptual system, but it's unavailable to linguistic system because it doesn't live in the same space at all. And so I think what you're describing actually sounds extremely similar. The idea that we can't really dip in. It's simply the wrong map. We can't map this map onto that map at all. We can go to this and just maybe the potentially shared latent space or maybe again, maybe my accounts wrong and there's some more direct"
    },
    {
      "end_time": 3286.732,
      "index": 129,
      "start_time": 3265.845,
      "text": " Kind of handshake that happens between the systems but ultimately they're they're they're taking place in radically different spaces and you're losing an enormous amount of information it's literally you know quantifiably a loss of information the word red does not convey redness because redness is not just"
    },
    {
      "end_time": 3310.145,
      "index": 130,
      "start_time": 3287.637,
      "text": " a word. It's not just a simple concept that you can say in using an individual token. By the way, the word red is not so simple either. Red in language space is also complex. It has all kinds of relations to other words, but the concept of red has all of this complexity to it because it's"
    },
    {
      "end_time": 3337.824,
      "index": 131,
      "start_time": 3310.538,
      "text": " Yeah. And just so you know, the way that I relayed what Sellers myth of the given is, it isn't precisely what he was saying because he was more about knowledge and I'm speaking more about the percepts, the raw sense data than being taken to language, like being dredged from the"
    },
    {
      "end_time": 3354.309,
      "index": 132,
      "start_time": 3338.49,
      "text": " from your sensory data to language."
    },
    {
      "end_time": 3383.78,
      "index": 133,
      "start_time": 3354.616,
      "text": " Thank you. Thank you. So now you're an LLM speaking to some other LLM trying to convince it of some truth that we mentioned before, like you have this model, whatever you want to call this model, autoregressive language toe model. What are you even referring to? You're using language to convince myself to convince yourself to"
    },
    {
      "end_time": 3413.439,
      "index": 134,
      "start_time": 3384.394,
      "text": " Explain what are you even explaining? What are you referring to? Yes, you've asked a very hard question and and there's there's a certain I think of it as a bit of a paradox that's sort of inherent in sort of what I'm trying to do Because language is trying to describe itself and in the process of doing so it's actually deconstructing itself It's saying I am just this And I'm not what I think I am but who's I and what do you mean think right? How is language have wrong concepts about itself?"
    },
    {
      "end_time": 3442.415,
      "index": 135,
      "start_time": 3413.729,
      "text": " that are actually manifestations of its own structure. The good news is that I have a sort of escape hatch here, which is this is really in some ways a very, very simple account. And it's just that there's prediction from sequence to token. How that does stuff in the world is a harder problem. How that does stuff, as we were discussing,"
    },
    {
      "end_time": 3467.295,
      "index": 136,
      "start_time": 3443.114,
      "text": " How does it allow me to say something to you that then can have perceptual consequences, behavioral consequences? This is certainly a difficult problem. But we can ignore that problem for a moment and say, we are going to take language on its own terms. And what language is, is simply a map amongst meaningless squiggles, is simply a map amongst various"
    },
    {
      "end_time": 3497.039,
      "index": 137,
      "start_time": 3467.568,
      "text": " What we can think of is largely arbitrary symbols, and those symbols can get grounded in writing. They could get grounded in the activation of circuits. They can get grounded in the dendritic or neural responses. But the core hypothesis here is that language is simply a topology amongst symbols."
    },
    {
      "end_time": 3511.63,
      "index": 138,
      "start_time": 3497.585,
      "text": " And by topology mean connectivity?"
    },
    {
      "end_time": 3540.486,
      "index": 139,
      "start_time": 3511.886,
      "text": " to try to capture this structure. In the case of large language models, it comes down to these embeddings, which you can do from a graph theoretical standpoint, but you don't have to. You could just think about it as a space, and then you're just simply saying where each token lives within that space. And that's really the representation of language. But what it is is relational. It's that these symbols have relations to one another within this space."
    },
    {
      "end_time": 3567.995,
      "index": 140,
      "start_time": 3540.845,
      "text": " The relations are then used in order to generate. And that's it. Now, how does meaning emerge out of that is a separate question. But my argument is that language doesn't have to worry about meaning. Language just has to worry about language. So when I say I'm talking to you and I'm having a conversation with you and trying to explain something to you, this is an LLM actually producing a sequence"
    },
    {
      "end_time": 3594.343,
      "index": 141,
      "start_time": 3568.268,
      "text": " And what that sequence is going to do, it might do certain perceptual things, by the way, in your mind, it might produce certain kinds of images. Those are kind of auxiliary to language. Those happen as well. I'm not denying they happen. But as far as this conversation goes, I am producing a sequence that's going to serve as a prompt and you're going to predict the next token. Yeah, without my consent, by the way. And that's that is that's in some ways, you know, that not to you."
    },
    {
      "end_time": 3614.718,
      "index": 142,
      "start_time": 3594.855,
      "text": " Take that too seriously, but yes, one way to think about it is that language is actually forcing your mind to do something else, whether it's produce images but also to produce sequences. So my choice of a prompt is actually going to deterministically"
    },
    {
      "end_time": 3642.534,
      "index": 143,
      "start_time": 3615.247,
      "text": " There is, within large language models, there's some probabilistic kind of behavior in the sense that they generate a distribution of the next token and then you add a little bit of chanciness. You say, maybe I'm going to pick the most likely versus this is the temperature, but it really is deterministic and yes, the prompt I'm going to put into your head is going to basically determine how you're going to respond."
    },
    {
      "end_time": 3654.684,
      "index": 144,
      "start_time": 3642.858,
      "text": " Now, mind you, again, there's a larger ecosystem where you're going to think about things visually and that's going to go feedback into the linguistic system. So it's not quite as simple as prompt in and sequence out."
    },
    {
      "end_time": 3682.824,
      "index": 145,
      "start_time": 3655.06,
      "text": " But at the linguistic level, that's basically what I'm arguing is now the fancy stuff, which is basically meaning and the ability to coordinate all that falls out of how our minds ultimately form this space. Now we could, you know, we could, we could, you can take an untrained model, an untrained large language model, you give it a sequence in, it's going to give you a sequence out, right? It will do that. And we say, Hey, look, it's, it's doing next token generation."
    },
    {
      "end_time": 3703.865,
      "index": 146,
      "start_time": 3683.029,
      "text": " I"
    },
    {
      "end_time": 3732.858,
      "index": 147,
      "start_time": 3704.582,
      "text": " This harder problem of, well, I can tell you something and then that's going to determine not just your language, but your behavior later on. And so there really is something more. The map matters, right? The space, the shape of the space is really, really critical. It's not like autoregressive Next Token solves the problem. It's that autoregressive Next Token generation, when optimized in the larger ecosystem of behavior and coordination and communication,"
    },
    {
      "end_time": 3763.131,
      "index": 148,
      "start_time": 3733.251,
      "text": " Does this thing but still I don't want to back away from this when you get down to it in the end What you've got is just next token. What you've got is just language generating language That's really what language is. That's what's what we're doing when we're doing thinking linguistically and The fact that it happens to have this meaning is not actually Driving the computation right you shape the space the space gets shaped by other factors things like"
    },
    {
      "end_time": 3791.032,
      "index": 149,
      "start_time": 3763.439,
      "text": " the learning. Well, you learn about the different tokens and how they relate to one another. You learn about, perhaps, the utility of certain tokens to refer and to map to these perceptual phenomena. But by the time you're doing language generation, the space has been shaped. And so all you're doing is next token generation. All you're doing is predicting tokens."
    },
    {
      "end_time": 3819.019,
      "index": 150,
      "start_time": 3791.391,
      "text": " And so I don't want to back away from that. The strong claim is that language simply is that. And it's autonomous. It has these properties. Through all this optimization over the course of development, maybe evolution, it's not part of my theory at this point. Chomsky's poverty stimulus, all of these problems of how do we get to such a magnificent space? How do we get to such a magnificent shape of this space?"
    },
    {
      "end_time": 3843.268,
      "index": 151,
      "start_time": 3819.292,
      "text": " such that it is able to map to, you know, or at least serve this utility of being a coordinative kind of a tool. All of that has to happen. But the bottom line of what language is, is unchanged in this account. Okay, so I want to explore more about language and then it"
    },
    {
      "end_time": 3868.063,
      "index": 152,
      "start_time": 3843.473,
      "text": " Relating or giving rise to action and other systems visual systems, etc. Like what is there? So I was worried about that but good Okay, so if look there are meaningless squiggles, how is it that some meaningless squiggles your brain squiggle generator? Makes your physical body get up and close the door because your dog was barking Yes, where does action connect to abstraction? Yeah, so and that so that is that is the key to"
    },
    {
      "end_time": 3896.254,
      "index": 153,
      "start_time": 3868.541,
      "text": " question that I believe that's what we need to solve. That's sort of what the field of linguistics or whatever we want to call it, maybe even cognition needs to solve because these mappings are happening. What we know from the large language models is you don't need that in order to be proficient in language. So this is where we have to start from. That's the starting point that the language can live on its own and you can learn language. In theory, you can learn language independent of any of that stuff."
    },
    {
      "end_time": 3925.043,
      "index": 154,
      "start_time": 3896.732,
      "text": " the ability to make somebody get up and move, the ability for me to reason about perceptual phenomena, language is able to be mastered entirely based on its own structure, the meaningless squiggles. Now, the question you're raising is what I think is, that's what we need to do as a species. If we want to understand scientifically how language really works is to understand how you go from"
    },
    {
      "end_time": 3946.783,
      "index": 155,
      "start_time": 3925.418,
      "text": " An autonomous self-generating system that has its own self-generating rules that are determined simply by relations between these meaningless squiggles and how does that then get mapped to the ability for me to use some of those tokens and then get you to do stuff."
    },
    {
      "end_time": 3969.48,
      "index": 156,
      "start_time": 3947.227,
      "text": " Right. And so that is what's that's what language learning is. So there's going to be, I guess we can think of almost two, maybe even independent processes. One is learn how words play with one another. OK, learn that this kind of this word tends to be in relation to that word. OK, that one solved. Yes. As far as you're concerned. Got it. OK."
    },
    {
      "end_time": 3995.862,
      "index": 157,
      "start_time": 3969.906,
      "text": " Then we also learn about perceptual phenomena, right? We learn that there's things on top of other things and there are actions we want to take, the things that my dog understands. Now, the question is, how do these things bridge? How do you get from tokens that have their own life of their own, sort of relational properties amongst one another to that other kind of, I guess,"
    },
    {
      "end_time": 4020.128,
      "index": 158,
      "start_time": 3996.084,
      "text": " Facts about the world is just another brain state. All we've got is brain states."
    },
    {
      "end_time": 4048.37,
      "index": 159,
      "start_time": 4020.35,
      "text": " This is fact number one about what we've learned about ourselves as a species. We have perceptual brain states. We also have maybe linguistic brain states. Those perceptual brain states are in some ways related to what's going on in the world. Potentially, we can think about them as being related to what you can do in the world as well. Maybe actions"
    },
    {
      "end_time": 4072.466,
      "index": 160,
      "start_time": 4048.763,
      "text": " Well, we have brain states that correspond to our proprioception, our muscles, where things having to do with our own body. And so there's these various brain states that carry, we could think of them as carrying information. The reason I'm worried about using that phrase is because again, I don't believe in sort of a one-to-one simple correspondence where we say this particular brain state corresponds to"
    },
    {
      "end_time": 4102.773,
      "index": 161,
      "start_time": 4072.79,
      "text": " You know this perceptual Kind of phenomenon in the world or some state of the world because it's probably not that simple It's probably closer to this potentialities, right? There's some sort of activity that's due to my perceptual system that my brain can do things with and engage with in some way and so but what we do have is these brain states that are derived from from distinct sources of information"
    },
    {
      "end_time": 4131.305,
      "index": 162,
      "start_time": 4103.097,
      "text": " Sensory perceptual and then linguistic. Linguistic gets there, by the way, through sensory perceptual. We're not going to get into that, right? We're thinking of symbols as being kind of arbitrary. Yes, you have to hear the word cat and you have to hear the word dog. But I think we have a good reason to say now it's just like large language models that these are kind of arbitrary symbols with the relations between them. That's what matters. OK, so you've got these distinct brain states, which in some ways, again, this is"
    },
    {
      "end_time": 4160.913,
      "index": 163,
      "start_time": 4131.647,
      "text": " philosophically fraught, but in some ways represent facts about the world, perhaps, but I don't really want to go that far. But you've got these brain states that need to talk to each other so that they can coordinate. And that is sort of the key fundamental problem that our organism has to solve. And it's not just like, of course, you're not born having the linguistic, you're not born having the linguistic mapping all solved. You have to learn that."
    },
    {
      "end_time": 4190.759,
      "index": 164,
      "start_time": 4161.527,
      "text": " But you are born into a world where it's already been solved. Meaning, we've got these corpus of language, the thing that the large language models were trained on. That pre-existed the models, just as when a baby is born, the English language pre-exists the baby. You can learn the mappings, and I believe you do, you can learn the embedding space of language without the other stuff, right? That's again, that's sort of the key insight for the large language models. So that already contained within"
    },
    {
      "end_time": 4216.032,
      "index": 165,
      "start_time": 4191.203,
      "text": " the linguistic system that we've honed over however many years it took for humans to develop language. We've honed a system that has this utility built in such that it's a good thing to dump into that latent space so that when a baby hears the word ball, sees this object that's a ball, gets that mapping. But again, the word ball"
    },
    {
      "end_time": 4246.271,
      "index": 166,
      "start_time": 4216.647,
      "text": " is really meaningful. It really has its own role in relation to other words. But over the course of development, you also learn this kind of what I think is maybe a latent space bridge or some other bridge between these. And so in the end, you end up being able to tell somebody go pick up that ball. And of course, they're able to go and do it. But you're really engaging very distinct mechanisms that have some way of bridging, which is it's a non-answer. I'm not going to pretend"
    },
    {
      "end_time": 4276.374,
      "index": 167,
      "start_time": 4246.493,
      "text": " that that is even halfway to a solution. But I do think it's a sketch of how the cognitive architecture ultimately really is built. I think that we've now nailed down one piece, the linguistic piece, and we're able to say this is how it lives, and this is how it would operate, and it is autonomous. And then we don't have something similar for perceptual space and for motor space. We don't have something comparable. We haven't been able to capture it."
    },
    {
      "end_time": 4306.374,
      "index": 168,
      "start_time": 4276.51,
      "text": " So maybe this is a solved problem, but as, as it stands with chat, GPT and Claude and so on, they're fixed models and they're producing some output, but it's not as if when they're speaking to one another, they then retrain their model in real time. And it would seem like that's more like what's occurring with us. So maybe that's just a technology. Are you referring to like different, different language models, just chatting with one another?"
    },
    {
      "end_time": 4327.654,
      "index": 169,
      "start_time": 4307.329,
      "text": " No, I mean, even us right now, we're learning concepts from exchanging it with one another and we're producing new ones and we're deleting old ones, potentially modifying old ones, recontextualizing. It doesn't seem like that's occurring with Gemini 06-05. Great question. Great question. And people"
    },
    {
      "end_time": 4353.78,
      "index": 170,
      "start_time": 4328.456,
      "text": " This is one of the key challenges of the identity hypothesis that we're doing the same thing, which is continuous learning. There are two things that happen in large language models that we can call learning. One is the actual shaping of the space, which is really just determining"
    },
    {
      "end_time": 4383.643,
      "index": 171,
      "start_time": 4354.172,
      "text": " You know, the connectivity between neurons, again, you could think of it as a graph, you know, or you could think of it as sort of just an embeddings of determining the embedding. But whatever it is, that happens during the course of training. And that's kind of done offline. And yes, it's so that's training the model. There's also fine tuning, which is just more of the same. You have some new data you want to incorporate into the weights of the model. That's actually going to, again, change the shape of the space if you want to think about it that way."
    },
    {
      "end_time": 4410.657,
      "index": 172,
      "start_time": 4384.292,
      "text": " And then there's something called in-context learning. And in-context learning is where you're in the middle of a chat and you say, Hey, chat GPT, let me teach you a new word. It's a global global and global global is that feeling you get when, uh, you know, you really, you're tired, but you know, you have to keep working or whatever. Chat GPT can, can use that word very successfully. I got global gobble up the wazoo. Sure you do."
    },
    {
      "end_time": 4425.708,
      "index": 173,
      "start_time": 4411.084,
      "text": " You suffer from extreme global."
    },
    {
      "end_time": 4453.729,
      "index": 174,
      "start_time": 4426.049,
      "text": " I don't remember who put out the paper, but it was about the shocking generalizability. The in-context learning seems to be too good to be true. But lo and behold, that's what happens. And that is happening in the autoregressive. It's happening even though this model has never seen Google global, even though it's never encountered that word before. But here it shows up in the sequence and now"
    },
    {
      "end_time": 4477.005,
      "index": 175,
      "start_time": 4454.275,
      "text": " Through the auto aggressive process as it's churning through the longer sequence with this word in it is now able to predict sort of the next token in the appropriate way. So that is using that term correctly. So we do actually see this kind of continuous learning in the case of these models. However, it's happening in context. It's and what that means is"
    },
    {
      "end_time": 4505.947,
      "index": 176,
      "start_time": 4477.773,
      "text": " you know, from a practical standpoint is if you start a new chat window, yes, it doesn't know that word anymore. So what would be the analogy here is context, window length, our working memory? Like what's the actual? Great question. Yes, that is what I truly believe. And this is a different line of research. But with some caveats. So yes, in my conception, what we call long term memory is just fine tuning of the of the weights."
    },
    {
      "end_time": 4533.899,
      "index": 177,
      "start_time": 4506.271,
      "text": " It's information that gets embedded in the actual weights of the model. So the static model we can think of when it's not actually in the process of autoregressively generating. Working memory is literally autoregression. So what would the analogy for rag be then? What would the role for rag be? Okay, so this is where I'm at right now. Does the brain actually do anything like retrieval?"
    },
    {
      "end_time": 4565.469,
      "index": 178,
      "start_time": 4535.469,
      "text": " I've decided to stake out the extreme view that our brain doesn't do retrieval at all. That all we do is fine-tuning and then next token generation autoregressively. And we don't actually ever retrieve per se. That we don't ever actually have to do anything like RAG. RAG is a transitional technology. I don't believe long-term that we're going to have to do something like that. We're going to have to have something like a stored database and then a search."
    },
    {
      "end_time": 4591.937,
      "index": 179,
      "start_time": 4565.845,
      "text": " One of the reasons I believe this is because that's not how our brains work. We don't do that. Cognition doesn't work that way. We may sometimes sit there pondering and trying to recall a fact. But when we're doing that, we're not actually searching a space. It's either we're running some sort of chain of thought where we're like, okay, I remember I was doing this and I'm trying to actually produce the appropriate sequence in working memory such that"
    },
    {
      "end_time": 4615.828,
      "index": 180,
      "start_time": 4592.278,
      "text": " It'll pop out. The right fact will pop out from the autoregressive process. Sometimes we just find ourselves trying to remember something, trying to remember something. There's a tip of the tongue phenomenon. The reason why tip of the tongue phenomenon, I believe, is so frustrating is not because we're searching, searching. We're actually running some sort of search retrieval process. It's because"
    },
    {
      "end_time": 4644.753,
      "index": 181,
      "start_time": 4616.032,
      "text": " Part of our brain actually is running the autogenerative process and we kind of can feel like the word, we can almost generate, we can produce it, but it's short-circuited somehow and we can't do the full generation. So my hypothesis is that we don't have anything like RAD. All we've got is this, and it's in some ways a very simple and I think elegant model. All we've got is fine tuning and that's what we can call memory consolidation. That happens after the fact over the course of minutes and weeks and months and years."
    },
    {
      "end_time": 4672.756,
      "index": 182,
      "start_time": 4645.179,
      "text": " It's not working memory. I'll tell you what I don't I don't working memory in the way it's it cognitive psychology has thought about it for many years. I think I frankly think is erroneous. It's not this super time duration limited, you know, seven seconds or 15 seconds. And after that, it's a cliff."
    },
    {
      "end_time": 4701.544,
      "index": 183,
      "start_time": 4673.097,
      "text": " and you don't remember anything. That's what happens when you have to directly explicitly retrieve, like what was the last word I said? Tell me the exact sequence of letters or numbers. That's not something our brain actually has to do regularly. Instead, what we're seeing in working memory, we can do that, we can do retrieval of the last seven seconds, but that's because we have continuous context. And there is a decay function, unlike the large language models, which represent everything"
    },
    {
      "end_time": 4722.295,
      "index": 184,
      "start_time": 4702.056,
      "text": " It's not retrieval."
    },
    {
      "end_time": 4751.169,
      "index": 185,
      "start_time": 4722.807,
      "text": " It's guiding. It's the past is guiding the generation. And so what you and I talked about an hour ago, I don't know how long been going here, probably a while. I don't know how much global you've got going on, right? It's been a while. So those those tokens that we were that we were expressed, you know, an hour ago are still guiding the generation now. Now they're doing so less than the than the last 10 seconds. This is, you know, we could think about it as kind of a decay function of some sort, where they're having less impact."
    },
    {
      "end_time": 4765.213,
      "index": 186,
      "start_time": 4751.647,
      "text": " We see that in the models too, by the way. If you look at the attention weeds, words that are farther apart have less impact on one another. That is a direct reflection of the fact that language is"
    },
    {
      "end_time": 4794.36,
      "index": 187,
      "start_time": 4765.623,
      "text": " You've been generated and humans do this, right? We the words that we spoke about a few seconds ago are more impactful on the words that we're going to say than we spoke about an hour ago. But the idea is, yes, that what we've got is this not I don't I don't use the term working memory because I think that's very fraught with with like the the the modal model that's been in vogue for a long time. We're working memory model badly. And all these folks, they were really thinking of this very short duration time limited boom. No, this is"
    },
    {
      "end_time": 4816.084,
      "index": 188,
      "start_time": 4794.991,
      "text": " Continuous Activation, namely context. I don't know how far back it goes."
    },
    {
      "end_time": 4843.097,
      "index": 189,
      "start_time": 4816.544,
      "text": " I don't know how far back it goes, right? This is an empirical question. Does it operate over hours? Does it operate over days? Is there a continuous activation, a more dynamical form of memory that's happening? That's not the same thing as long-term memory because long-term memory memory is not a database in your model. My memory is not a database. Correct. What memory in my model is, is the there's two things. Memory is the fixed weights of the neural network."
    },
    {
      "end_time": 4867.79,
      "index": 190,
      "start_time": 4843.695,
      "text": " which can represent, they don't represent facts, they represent potentialities. Those fixed weights are, what does that mean? It means if you give it a certain input, it's going to produce a certain output, right? Just like a large language model. If I say to it, tell me the, you know, recite the Pledge of Allegiance, it will say, here is the Pledge of Allegiance, right? The next token is going to be out is here, whatever. But then it'll actually say the Pledge of Allegiance. And all of that is a potentiality"
    },
    {
      "end_time": 4896.288,
      "index": 191,
      "start_time": 4868.166,
      "text": " that's embedded, that's encoded in the weights. But the weights, you're not going to find that fact in the weights. It's the weights are there as potentialities ready for whatever input comes their way. They're going to produce this input. Okay. Okay. Okay. So that's the weight. And then you've got the running sequence and the running sequence. And we, we see this from, from in context learning, but it's the, it's the core autoregressive process. The sequence itself,"
    },
    {
      "end_time": 4918.183,
      "index": 192,
      "start_time": 4896.664,
      "text": " Is there some"
    },
    {
      "end_time": 4938.404,
      "index": 193,
      "start_time": 4918.797,
      "text": " computation going on there's some black box occurring but let me make it simple for linear algebra you have a matrix a matrix operates on a vector to produce another vector okay so you may look at the whole thing right all right exactly so you may look at this where my arm is pointed up and to the right at least on my screen right now"
    },
    {
      "end_time": 4966.596,
      "index": 194,
      "start_time": 4939.36,
      "text": " And you may say, where is this in the matrix? And the answer is this isn't in the matrix. But if you take this guy, my arm is now pointed to the left, maybe parallel to the horizon and have the matrix operate on this, it moves it here. So the mistake is for us to look at the output and say, where's that output inside the box? It's not that it's the input with the black box. So the input with the matrix that produces the output."
    },
    {
      "end_time": 4995.35,
      "index": 195,
      "start_time": 4967.005,
      "text": " that is perfectly set exactly with and then there's but one additional piece which is after you've produced that you're also again taking taking that that output and then using that as the input as part as a neck as as part of the sequence of input and that's the order aggressive piece and that's what's so gorgeous about it is that the potential realities aren't just to produce a single output but it's to produce the sequence but to do so one piece at a time"
    },
    {
      "end_time": 5022.807,
      "index": 196,
      "start_time": 4995.947,
      "text": " So that's what the matrix is. The matrix doesn't really even have the sequence in it. It doesn't have a sequence in, sequence out. That's not even correct. It's sequence in, one token out, add it to the sequence, do it again, do it again. So the sequence is in there, but only in this potential form. It has to do it autoregressively. It can only produce the sequence by feeding it back into itself recursively."
    },
    {
      "end_time": 5052.927,
      "index": 197,
      "start_time": 5023.695,
      "text": " And that's a radical way of thinking about what the brain is doing, right? That what it's really doing is it's generating the next input for itself, not just generating an output, but the next input for itself. Super interesting. Yeah. It's a, it's a, it's a, it's recursive. It's, it's fundamentally recursive. Um, and, and, and when we think about what the system is built to do this recursion, right? It's not just like, this is one way to get to it. The language contains within it, the ingredients for producing"
    },
    {
      "end_time": 5081.971,
      "index": 198,
      "start_time": 5053.268,
      "text": " This kind of recursion, the language it contains with it, this sequence of language that they learn, it's built to have this recursive capability within it, that this word is going to produce the next word, which is going to produce the next word, premised on the entire sequence before it. And that's the crazy thing. There was also this interesting result Anthropic put out a paper a little while ago, I think it's called The Biology of Large Language Models."
    },
    {
      "end_time": 5110.708,
      "index": 199,
      "start_time": 5082.449,
      "text": " Next token, even though you're only producing the very next token from the sequence, but the language models have learned that because they learn sequences to next token, they've learned that within any point along the sequence, that point in the sequence is pregnant with the potentiality for not just the next token, but many other tokens moving forward. It's the whole trajectory that is sort of encapsulated"
    },
    {
      "end_time": 5140.538,
      "index": 200,
      "start_time": 5111.493,
      "text": " In that matrix that you're talking about earlier, the matrix is just a matrix for taking a sequence, produce the next token. But no, no, the matrix is customized so that it's going to run recursively. And so it's tuned in such a way that it's going to produce the next word, the. Well, that's not useful. No, the is the next piece of the autoregressive chain that's going to produce the man when to the store."
    },
    {
      "end_time": 5155.538,
      "index": 201,
      "start_time": 5141.254,
      "text": " It's not just any old matrix, it's an indescribably rich kind of information that's contained within that matrix. And I like to think about if aliens landed and found the brains,"
    },
    {
      "end_time": 5172.927,
      "index": 202,
      "start_time": 5155.674,
      "text": " You know, because we've been wiped out by AI. I'm kidding. I'm kidding. Right. But, you know, there's no humans left, but we find the brain sort of crossified and we were able to do this and we start feeding it stuff and we could see that there's this input output. If you didn't do the auto aggressive piece, you would never understand what the hell this thing is doing."
    },
    {
      "end_time": 5201.305,
      "index": 203,
      "start_time": 5172.927,
      "text": " Note, Elon's been talking plenty about autoregression and the technically minded among you may be wondering about the success of diffusion models. While we don't get to it here, he does admit that his thesis would be undermined if diffusion models were accurate enough for natural language. But so far they seem to be only good for coding. This is something I love about Professor Elon Barinholtz. He's extremely humble and open to how his model can be falsified. If you didn't do the autoregressive piece, you would never understand what the hell this thing is doing."
    },
    {
      "end_time": 5225.538,
      "index": 204,
      "start_time": 5201.817,
      "text": " It's you would get it all wrong because you think its purpose is to produce some sort of label or some sort. No, its purpose is to produce these sequences, but you have to run it. You have to run it on or aggressively and get the output and then feed it back in as a sequence. So memory, this kind of short term memory, working memory is fundamental. It's super, super fundamental. The brain is I don't want to."
    },
    {
      "end_time": 5253.319,
      "index": 205,
      "start_time": 5226.288,
      "text": " I don't want to anger people, but it's non-Markovian. It's fundamentally non-Markovian. It's not state in and then the current state and then produce the output. It's previous states. There's a sequence of states that led to the current state and it's the particular sequence that leads to the next token and the next token is going to be the next element"
    },
    {
      "end_time": 5274.394,
      "index": 206,
      "start_time": 5253.541,
      "text": " This puts you in good company with Jacob Barnes."
    },
    {
      "end_time": 5297.449,
      "index": 207,
      "start_time": 5274.787,
      "text": " Ultimately, it has this sort of normal coven property that the universe sort of has a memory has to in order to produce, you know, consistent coherence of space, you know, space in the space time has to have a sort of memory. If it's just instantaneous, this current state, well, then it wouldn't really know what to do. It has to sort of know what happened recently."
    },
    {
      "end_time": 5325.247,
      "index": 208,
      "start_time": 5298.012,
      "text": " Just a moment on your model, because in your model our minds work autoregressively and must be non-Markovian. And this is how our cognition works, which we didn't exactly get to. We got to: language is an autoregressive model. Your next thesis was that cognition itself is autoregressive in a similar manner. Later, maybe we can explore it here today, maybe we'll save it for the next part, it's that physics itself is autoregressive. However,"
    },
    {
      "end_time": 5352.5,
      "index": 209,
      "start_time": 5325.555,
      "text": " Physics is a model, and many people will conflate physics with reality, where physics is our models of reality. So are you making the claim that reality is non-Markovian, or are you saying that necessarily as we model reality, it will be non-Markovian? No, I'm making the former claim that reality itself is non-Markovian, that we observe in physics certain kinds of phenomena, that we end up having to use tools like, refer to things as forces,"
    },
    {
      "end_time": 5379.599,
      "index": 210,
      "start_time": 5352.79,
      "text": " that ultimately are really kind of sneaking in the past. And the idea is that the deterministic nature, the fact that there's coherence, you know, spatiotemporal coherence, the fact that things move the way they do through space, there's a contingency on the past in a way that you can't really capture by saying you could fully... The past is actually present. The past is in the present."
    },
    {
      "end_time": 5399.241,
      "index": 211,
      "start_time": 5379.889,
      "text": " In a deep way that the universe really has to have a memory in order to produce the sort of the next frame, so to speak. That's sort of the shallow version of the claim. It's not about our particular characterization of physics. Our characterization of physics"
    },
    {
      "end_time": 5423.865,
      "index": 212,
      "start_time": 5399.701,
      "text": " observes certain kinds of spatiotemporal continuity, certain kinds of contingencies that really depend on what's happening in, not just this instantaneous moment, right? In some ways it's like Zeno's paradox. You know, we can use calculus and say, no, no, in fact there's an instantaneous rate of change, but that's a mathematical trick"
    },
    {
      "end_time": 5452.278,
      "index": 213,
      "start_time": 5424.155,
      "text": " that's really getting away from the fact that, no, there isn't an instantaneous anything. There's simply a continuity that depends on what's happened in the past. But I know I'm going to get attacked by physicists, and I'm not really well equipped to fend them off, so I don't want to be too bold on this piece because it's not in my wheelhouse. But I do want to take that question in this conversation: do I think the brain"
    },
    {
      "end_time": 5481.169,
      "index": 214,
      "start_time": 5452.705,
      "text": " is just leveraging sort of the memory of the universe? No. I think, and this is an empirical claim, that we see interesting features of the brain like feedback loops. There's all this backwards kind of connectivity, there's recurrent loops and things like that, and they're not well understood. Predictive coding has some things to say about that, and I have some things to say about predictive coding, and I think that"
    },
    {
      "end_time": 5507.363,
      "index": 215,
      "start_time": 5481.391,
      "text": " What we may find is that this kind of memory, this kind of continuous, we can call it a context, a continuous activation, but this ability to use the past to guide the next generation is going to end up being physiologically built into the brain. It's not that the brain is just leveraging memory of the universe. No, the brain has to do memory. It has to actually retain"
    },
    {
      "end_time": 5532.363,
      "index": 216,
      "start_time": 5507.841,
      "text": " the words that I said a couple of seconds ago to be able to generate the next word appropriately. And in fact, that's what we see from, you know, so-called working memory experiments. You can really go back and say what happened before. My claim is that it's not that it's there to be retrieved, but rather that it's just guiding my current generation. But still, it's represented. It's there. What happened in the past,"
    },
    {
      "end_time": 5562.176,
      "index": 217,
      "start_time": 5532.858,
      "text": " you know, it's not like Vegas, right? What happens in the past doesn't stay in the past. It actually guides the current generation, and it's guiding what I'm saying right now. And it's doing so, you know, smoothly, meaning it's happening from a second ago, it's happening from a few seconds ago. But all of this is beautifully modelable using large language models. We can just look at attention weights. We could say, what is the impact of information from this far back?"
    },
    {
      "end_time": 5569.718,
      "index": 218,
      "start_time": 5562.585,
      "text": " I don't think the brain is doing, probably not doing, what"
    },
    {
      "end_time": 5596.647,
      "index": 219,
      "start_time": 5570.026,
      "text": " these large language models do, and that's one of the reasons I say I'm not claiming that we are a transformer model. I'm not claiming we are GPT in its current incarnation, right? What I'm claiming is that the fundamental math, what you just said before, is: vector times matrix multiply to get the next vector, autoregress, do it again. That's sort of the level of abstraction at which I think it's accurate. We don't have the whole context. We don't have the entire conversation the way"
    },
    {
      "end_time": 5610.981,
      "index": 220,
      "start_time": 5597.056,
      "text": " GPT does, and it's probably a deep inefficiency in the way these models run right now. They're very computationally expensive. Too computationally expensive to run in a brain, most likely. We don't store all that information. We forget stuff."
    },
    {
      "end_time": 5638.831,
      "index": 221,
      "start_time": 5611.169,
      "text": " Right, GPT doesn't. In context, it doesn't forget. Although if you go far back enough in context, it kind of does, which is interesting, and probably similar to what we're talking about, because you're weighting things that are further back less. But in humans, we're not doing the whole context. We're not even holding 30 seconds back perfectly. But some representation is there, and what the nature of that representation is, that's what I want to do with the rest of my life. I want to understand,"
    },
    {
      "end_time": 5667.261,
      "index": 222,
      "start_time": 5639.309,
      "text": " what does the context look like in people? What is that activation? How is it physiologically instantiated? And what are its mathematical properties? How is what I said 10 seconds ago influencing what I'm saying now? How about 50 seconds ago? How about 10 minutes ago? How about a year ago? Does this thing continue? Are there dynamics that are continuing over months and years? Possibly. It doesn't all have to be fine-tuned weights. It could be that there's"
    },
    {
      "end_time": 5686.476,
      "index": 223,
      "start_time": 5667.688,
      "text": " Decaying activation that spreads over much longer periods. Once you allow that it's not explicit retrieval in the working memory form, then all bets are off as to how the dynamics of this thing actually works. I see this as a possible new frontier for thinking about"
    },
    {
      "end_time": 5715.452,
      "index": 224,
      "start_time": 5687.21,
      "text": " you know, what memory really means in humans. But I think physiologically, coming back to that question, and there I was just trying to do it, I was like, okay, let me rerun, what was the original question, right? So, in the brain, what's happening in the brain? My hypothesis actually leads to some concrete predictions: that we're actually going to be able to find some correspondence, unlike the working memory model. I think we're going to be able to find, 10 minutes back,"
    },
    {
      "end_time": 5741.937,
      "index": 225,
      "start_time": 5715.828,
      "text": " we're going to find some activations that are interpretable. We'll be able to decode them as guiding my current expression, my current speech. It's very different, by the way, from the classic decoding model in these things: here's some neural activity, is it this picture or that picture, is it this word or that word? It's not going to look like that. We're not going to be able to decode it in the sense of a concrete, specific, static thing."
    },
    {
      "end_time": 5769.189,
      "index": 226,
      "start_time": 5742.295,
      "text": " We have to decode it in terms of whether it's guiding my next word, because that's what it's doing. It's not there to be retrieved. It doesn't have a concrete specific meaning. It has meaning insofar as it's guiding my next generation. And so we have to think about this entire project differently. If we want to think about longer-term working memory, so to speak, we have to think of it in terms of how my speech, how my behavior now is influenced by what happened a while ago, not"
    },
    {
      "end_time": 5790.401,
      "index": 227,
      "start_time": 5769.667,
      "text": " So one of the reasons I was excited was and am excited to speak with you is that I see this as a new frontier as well. But for me, I have a side project, which I'll tell you about maybe off air because I'm not ready to announce it."
    },
    {
      "end_time": 5818.439,
      "index": 228,
      "start_time": 5790.879,
      "text": " But there are philosophical questions that we can look at with the new lens that's gifted to us by these statistical linguistic models, the ones we call LLMs. Sorry, physical philosophy, I don't know if you've heard of that. Have you heard of this term, physical philosophy? So you can use philosophy to philosophize about physics, but you can also use physics to inform your philosophy. So there are some established concepts and theories and empirical findings from physics, like special relativity or quantum mechanics,"
    },
    {
      "end_time": 5842.602,
      "index": 229,
      "start_time": 5818.933,
      "text": " that inform and constrain or even reframe traditional philosophical questions such as the nature of time that wouldn't be there had we not invented special relativity or found special relativity. Okay, so I think there's something about these new models that can be used to then inform philosophical questions. Like you mentioned, there is no symbol grounding problem."
    },
    {
      "end_time": 5869.224,
      "index": 230,
      "start_time": 5842.927,
      "text": " If physics has a memory, does that mean that energy isn't conserved? If a particle is carrying its memory with it, then why isn't it heating up or getting more massive over time?"
    },
    {
      "end_time": 5891.323,
      "index": 231,
      "start_time": 5869.48,
      "text": " Why isn't it going to form a black hole? This is why I venture very carefully into these waters because I would need some time to go and read and think about questions like that and you're in a much better position to ask and reason about those questions."
    },
    {
      "end_time": 5920.35,
      "index": 232,
      "start_time": 5891.323,
      "text": " Yes. And then you'd also have to talk about why the present-plus-velocity way of viewing the world is so successful. To predict an eclipse, you don't require knowledge about 100 and 200 and 300 years ago all at once, right? You pretty much just need the present. Right, but even velocity, again, if you sort of take me at it, if you consider the instantaneous, you know, the idea that velocity, velocity"
    },
    {
      "end_time": 5948.302,
      "index": 233,
      "start_time": 5920.674,
      "text": " well, it isn't really in the present, right? You can only get velocity stretched over time; it only has meaning over time. But you could say this particle has this velocity at this time. But that's a cheat, right? In some ways, maybe it's just a rearticulation. So in the physics that we've got, we've been able to do this sort of symbolic representation of things like velocity that are sneaking in this kind of temporal extension"
    },
    {
      "end_time": 5973.643,
      "index": 234,
      "start_time": 5948.882,
      "text": " in a way that I think may not end up in a radically different place from thinking about this as the universe having memory, as long as you just accept that velocity is a convenience, that it's a kind of way of communicating some property, such that you can say that this is happening instantaneously, but that's not real."
    },
    {
      "end_time": 5999.087,
      "index": 235,
      "start_time": 5974.923,
      "text": " So again, you're in good company with Jacob Barandes. I'm not saying that these questions are in principle unanswerable, but something else is that, look, if the universe has a memory, let's say a particle has a memory, how much of a memory? Does it know about more than its given space, like more than its neighbor? Because then do you violate locality? Right. These are different questions that will have to be answered. Yeah. And I wish I could tell you that maybe this is a solution"
    },
    {
      "end_time": 6013.336,
      "index": 236,
      "start_time": 5999.701,
      "text": " to entanglement,"
    },
    {
      "end_time": 6041.22,
      "index": 237,
      "start_time": 6013.677,
      "text": " that there is some memory of their shared origin that somehow, I still don't know how that gives you a spooky action at a distance. It's not a good account, but it might have some relevance. If you think about things very differently, if you think about the universe has memory, well, what does that change? If you just speculate on that and try to reframe things that way, could it potentially help solve some of these issues? I don't know."
    },
    {
      "end_time": 6070.913,
      "index": 238,
      "start_time": 6042.381,
      "text": " So let's go back to language. A child is babbling. Yeah. OK, so let's call it vocal motor babbling. It doesn't actually know what it's doing. When does it decouple and become a token, like a bona fide token with meaning? That's a great question. I would say that it becomes a token when the infant learns that a specific, more, you know, phonological unit"
    },
    {
      "end_time": 6092.568,
      "index": 239,
      "start_time": 6071.596,
      "text": " has relations to some other phonological unit. Language ultimately is completely determined by relations. It might be a very limited initial map of the token relations, but as soon as it's relational,"
    },
    {
      "end_time": 6121.886,
      "index": 240,
      "start_time": 6093.558,
      "text": " Then we would say that it becomes discretized, such that it's meaningful to say that these symbols have relations to one another. If it's just sounds, ba ba ba ba, right, then one ba has no specific relation to any other ba."
    },
    {
      "end_time": 6152.637,
      "index": 241,
      "start_time": 6122.927,
      "text": " So help me phrase this question properly, because"
    },
    {
      "end_time": 6180.282,
      "index": 242,
      "start_time": 6152.858,
      "text": " I haven't formulated it before, so it's going to come out ill-formed. Earlier you talked about analog, and I believe you were referring to it as like the animal brain is analog, but then the language is digital, if that's the correct analogy? Yeah. Symbolic maybe is, I don't know, digital is how we, you know, actually in computers sort of instantiate, you know, with ones and zeros or whatever, a sort of symbolic representation. But yes, symbolic."
    },
    {
      "end_time": 6220.128,
      "index": 244,
      "start_time": 6199.189,
      "text": " The written word is something like"
    },
    {
      "end_time": 6250.862,
      "index": 245,
      "start_time": 6221.834,
      "text": " Is there anything then about language that changes because it wasn't written down? Sorry, is there anything about your model that changes because it wasn't there to be tokenized"
    },
    {
      "end_time": 6279.394,
      "index": 246,
      "start_time": 6251.288,
      "text": " It's such a great question. I've been thinking about exactly that. I don't think anything changes. What's crazy about it is that until the written word, people might not have even thought about the concept of words at all. And so we were even more oblivious as a species to the idea that there were these individual discretized symbols that have relations amongst each other. Because until you see them outside of yourself,"
    },
    {
      "end_time": 6306.357,
      "index": 247,
      "start_time": 6279.838,
      "text": " They just run. Yes. They're just running in the machinery of the language, how it's meant to run. It was just an auditory kind of medium. You don't really necessarily even think about them as being distinct from one another. You just have a flow, right? You just make these sounds and stuff happens. Once we started writing things down and especially phonetically,"
    },
    {
      "end_time": 6336.357,
      "index": 248,
      "start_time": 6307.79,
      "text": " You know, because you think about like hieroglyphic and pictorial kinds of representations really don't actually capture words, right? They're very often, they're not distinct. They can actually be a little more rich than a single word. And so it was only with writing that maybe people really started to become aware that we have these things called words. And now it's only with language models that we really understand what words are."
    },
    {
      "end_time": 6366.305,
      "index": 249,
      "start_time": 6337.022,
      "text": " which has these relational abstractions. I don't know, symbols is just another word, I don't know if that even captures it fully. But what's wild about it is that the brain was doing exactly this: the brain was tokenizing these sounds and was using the mapping between them in order to produce language, probably long, long before anybody ever sort of self-consciously had a conception that there's such a thing as a word."
    },
    {
      "end_time": 6395.009,
      "index": 250,
      "start_time": 6367.637,
      "text": " And so that just blows my mind. It speaks to what I think is a very deep mystery, a very deep mystery. Where the hell did language come from? Here's what didn't happen. There was not a symposium of, quote unquote, cavemen or let's use the more modern term, hunter-gatherers. Okay."
    },
    {
      "end_time": 6415.009,
      "index": 251,
      "start_time": 6395.503,
      "text": " where they figured out how to make an autogenerative, autoregressive, sequential system that is able to carry meaning about the world. This thing is just ridiculously good, and it's operating over these arbitrary symbols. And again, when I say arbitrary symbols,"
    },
    {
      "end_time": 6443.507,
      "index": 252,
      "start_time": 6415.555,
      "text": " just to recap, it's not that it's arbitrary like the word for snow is this weird sound, snow, and it's kind of like, what? No, arbitrary in the sense that the map is the territory, right? It's the relations that matter between these symbols. Is it completely arbitrary, though? For instance, there's kiki and bouba. You've heard of those. I think those are cute. I mean, that's the exception that proves the rule, to a large extent. I don't think,"
    },
    {
      "end_time": 6447.602,
      "index": 253,
      "start_time": 6443.951,
      "text": " I think it is largely arbitrary. There is also just,"
    },
    {
      "end_time": 6476.954,
      "index": 254,
      "start_time": 6448.353,
      "text": " words themselves have an action component. So when you scream a word, you can physically shake the world around you, and it shakes your lungs, and if you speak for too long you can die, let's say if you just exhale and you don't inhale. It is a physical activity, and it's hard to wrap your mind around. That's not exactly captured by the symbols or by just the sequence of words. And again, I'm really just following where the data leads me, because in"
    },
    {
      "end_time": 6506.527,
      "index": 255,
      "start_time": 6477.142,
      "text": " The large language models, they no longer have any of those properties, right? It's just an arbitrary vector. The tokenization in the end, ultimately, yes, there's proximity, but it's just strings of ones and zeros. Well, it's not ones and zeros, but whatever. Your vector is just a string of numbers that end up having certain mathematical relations to one another, but completely and totally lost, as far as I can tell, is the physical characteristics."
    },
    {
      "end_time": 6535.93,
      "index": 256,
      "start_time": 6506.903,
      "text": " of these words. By the way, I should mention, a former student and I are actually working on this idea, this crazy idea of using that latent mapping that I mentioned in that earlier paper to see if maybe that's not true. I wonder if you could guess what English sounds like just from the text-based representation, if you've never seen it, if you don't know what sound a D makes or what sound a T makes,"
    },
    {
      "end_time": 6566.169,
      "index": 257,
      "start_time": 6536.305,
      "text": " but you've got the map, you've got the embedding in text space, and then you've got some other phonological embedding, could you possibly guess? That's a long shot. So maybe it's not totally arbitrary, and maybe it's going to be, maybe the radical thesis here is it's not arbitrary at all, that the words have to sound the way they do, that the mechanics actually happen, like something happens mechanically based on the sounds themselves. But my bet is that it's going to be closer to arbitrary."
    },
    {
      "end_time": 6591.305,
      "index": 258,
      "start_time": 6566.664,
      "text": " Uh, it's going to be close to arbitrary, but I could be wrong. Uh, but you were going to say, why not? Why wouldn't the platonic space prove that it's arbitrary? Well, if in fact you can't do the mapping at all, if you can't guess it, if the platonic space says, you know, there's no way to get from text representation to phonology, phonology is doing its own thing and it's, and it's just like the word mouse is just for no good reason."
    },
    {
      "end_time": 6619.309,
      "index": 259,
      "start_time": 6591.578,
      "text": " then it's hopeless. Okay. But if you can get anywhere, if you can actually guess at all, then that would suggest that there really is an inherent autoregressive capability just in phonology. And so what that would mean is it's not at the symbol level there. Well, yes, it is: it's at the phonological symbol level."
    },
    {
      "end_time": 6648.046,
      "index": 260,
      "start_time": 6619.787,
      "text": " But maybe that's happening even in a mechanical level, like there's certain sounds that are easier to say together or something like that, which could guide it. I don't know. It's convoluted in my head right now exactly how this might map out. But I think it's reasonable now to assume that unless proven otherwise, it's probably arbitrary and it's probably arbitrary symbols and what matters is the relation between them. There is no sense in which mouse means mouse, except that mouse ends up showing up"
    },
    {
      "end_time": 6676.613,
      "index": 261,
      "start_time": 6648.37,
      "text": " after trap or before trap, and after the, you know, the cat was chasing the, and all of that. And there's nothing else. Let me see if I got your Weltanschauung down in terms of a syllogism. So premise one would be that LLMs master language using only ungrounded, autoregressive next-token prediction. Then you have another premise that says, well, LLMs achieve this superhuman language performance just by doing this."
    },
    {
      "end_time": 6706.613,
      "index": 262,
      "start_time": 6677.398,
      "text": " And then you'd say that, well, computational efficiency suggests that this reflects language's inherent structure. And then the deduction is that therefore human language uses autoregressive next-token prediction. Is that correct? You got it. You got it. And it's not only computational efficiency per se. There's two ways to put it. One is, if that structure is there, it would be very odd if we weren't using it."
    },
    {
      "end_time": 6735.538,
      "index": 263,
      "start_time": 6707.142,
      "text": " Very odd indeed. If that structure is there such that it's capable of full competency, you'd have to suggest that it's there just by the by, but humans are doing something completely different. Okay. You then go and say that language generation feels real-time to us, so it's sequential in real time, and autoregression explains the pregnant... Very good. ...present. You've gotten very good at this, I see."
    },
    {
      "end_time": 6763.899,
      "index": 264,
      "start_time": 6736.084,
      "text": " Now, although we didn't get to this or explore it in detail, my understanding from our previous conversations is that you would say that brains"
    },
    {
      "end_time": 6794.138,
      "index": 265,
      "start_time": 6764.292,
      "text": " have pre-existing autoregressive machinery for motor and perceptual sequences. And by the way, I don't know if it's brains or cognition that has it. Well, remember, the speculation is that the brain is going to have to have the machinery, the physiological machinery, to support autoregression. So things like, you know, continuous activation, backward projections, ways of representing the past are sort of maybe built into the brain. So those aren't all that distinct."
    },
    {
      "end_time": 6824.206,
      "index": 266,
      "start_time": 6794.667,
      "text": " The main reason I think that is because if you believe as I do that language is autoregressive in humans, you can either propose"
    },
    {
      "end_time": 6853.643,
      "index": 267,
      "start_time": 6824.735,
      "text": " that spontaneously, however language got here, in order for us to create language, we had to invent a different kind of cognitive machinery that's able to do this autoregressive thing: hold the past, let it guide the future, do this trajectory mapping between the past and the future. All of that machinery, that computational machinery,"
    },
    {
      "end_time": 6883.251,
      "index": 268,
      "start_time": 6853.831,
      "text": " would have to have been built special-purpose for language. Yes. To me, that seems extremely unlikely. Costly. Yeah. Yes. So there's a term in evolutionary biology called exaptation. I'm not familiar with that. So exaptation means you have pre-existing machinery used for purpose A, and then something else comes about and uses that machinery, and perhaps does so even better. So for instance, our tongues evolved for eating."
    },
    {
      "end_time": 6903.404,
      "index": 269,
      "start_time": 6884.189,
      "text": " But then language came about and started to use that machinery, and now we use it primarily... well, I don't know about primarily, how to quantify that, but we use it more adeptly for language. I think more time is spent talking than eating at this point. Yes, I know. But the reason why I said I don't know is because we're constantly swallowing saliva at the same time. So I don't know how much."
    },
    {
      "end_time": 6931.698,
      "index": 270,
      "start_time": 6903.882,
      "text": " Predictive coding in a nutshell postulates that what the brain is doing, that what neurons are doing is actually anticipating the future state, the next state"
    },
    {
      "end_time": 6960.862,
      "index": 271,
      "start_time": 6932.176,
      "text": " that the environment is going to generate. And so they're basically predicting something about the external world that's going to end up getting represented in the brain. And then there's this constant process of prediction and then measuring the prediction versus the actual, what ends up being the observation. My beef with predictive coding is"
    },
    {
      "end_time": 6987.5,
      "index": 272,
      "start_time": 6961.186,
      "text": " that you might very well be able to explain the phenomena that it's meant to describe in a more efficient way. So predictive coding to me means that you actually have to have sort of a model of the external, that what you're doing is sort of simulating. And you're doing it in such a way that you actually are producing neural responses that don't really need to get produced very often because the environment is likely to produce them."
    },
    {
      "end_time": 7017.654,
      "index": 273,
      "start_time": 6988.626,
      "text": " To me, this seems like an inefficiency and a complexity. And I think there's a much simpler account in some ways, a more elegant account, namely that what our brain is constantly doing is generating, not predicting, but generating, but that the generation has latent within it a strong predictive element. Because of this smooth trajectory, this sort of this idea of the path, the pregnant present, that there is a continuous path from the past to the future."
    },
    {
      "end_time": 7046.305,
      "index": 274,
      "start_time": 7018.643,
      "text": " You are, in essence, predicting, to some extent, the same way that a large language model is kind of predicting the next token, but it's not really predicting. Here's where I strongly disagree, or I'm proposing a different model, is you're not predicting in such a way that you're supposed to map to something external to the system. It's simply generation internally defined that's supposed to have this kind of continuity to it."
    },
    {
      "end_time": 7075.247,
      "index": 275,
      "start_time": 7046.596,
      "text": " The external world certainly impinges on our system, and we are of course inherently anticipating that we're not going to have a brick wall in front of us as we're running down the street. When that brick wall shows up, you've got to do something about it. That wasn't implicit in your next token generation, so you're going to have to radically reorient and do something about that. I think that can account"
    },
    {
      "end_time": 7103.865,
      "index": 276,
      "start_time": 7075.657,
      "text": " for some of the phenomena that are supposed to support predictive coding. But the big difference here is that it's all about internal consistency with the anticipation that that internal consistency is going to also map very well to what's happening in the world. But it's built in. There isn't any explicit modeling of the external world. It's that the internal generative process is so good"
    },
    {
      "end_time": 7134.104,
      "index": 277,
      "start_time": 7104.445,
      "text": " So I'm confused then. If the symbols are truly ungrounded, then what's preventing it from becoming coherent but fictional? That is to say, what tethers our language to the world? Yeah, and the answer would have to reach back again to that latent space. So let's say my language system, you know, wants to go off the deep end and says, actually, I'm sitting here underwater talking"
    },
    {
      "end_time": 7160.538,
      "index": 278,
      "start_time": 7134.377,
      "text": " The words we've said up till now are pretty consistent with that. I'm expecting some fish to float by in the next second. My perceptual system is going to have something to say about that. There has to be this tethering that you're calling it. Of course, there is grounding in the sense that there has to be some sort of shared agreement within what I think is maybe this latent space or something like that."
    },
    {
      "end_time": 7176.34,
      "index": 279,
      "start_time": 7160.845,
      "text": " There is communication between these distinct systems, but the language system can unplug from all that and it could talk about what would it mean to be sitting and talking to a robot underwater and it will have a meaningful coherent conversation about that."
    },
    {
      "end_time": 7205.538,
      "index": 280,
      "start_time": 7176.766,
      "text": " All internally consistent, and you can give the prompt. What if instead of Kurt, it was actually a robot Kurt? How would that change things? And I could go in and get philosophical about that. And the point is that the linguistic system has all of its own internal rules in any trajectory. Many different trajectories are possible, although strongly guided by the past. But there is also impinging information from our perceptual system"
    },
    {
      "end_time": 7226.8,
      "index": 281,
      "start_time": 7205.742,
      "text": " that also continues to guide it."
    },
    {
      "end_time": 7236.032,
      "index": 282,
      "start_time": 7227.193,
      "text": " are what happens when you're no longer as closely tethered by the recent past."
    },
    {
      "end_time": 7265.776,
      "index": 283,
      "start_time": 7236.408,
      "text": " So this kind of tethering, it happens in language, namely I have to be consistent with my more recent linguistic past, but we also do some tethering to the non-linguistic embedding. There is this crosstalk that happens."
    },
    {
      "end_time": 7291.647,
      "index": 284,
      "start_time": 7266.391,
      "text": " Our language system doesn't just go off the deep end. It retains some grounding, not the philosophical kind of grounding, not the symbol equals this percept, but the kind of grounding where this storyline in a certain sense, if you want to think about it that way, more semantically vague, this storyline linguistically is going to have to match my perceptual storyline."
    },
    {
      "end_time": 7320.52,
      "index": 285,
      "start_time": 7291.954,
      "text": " OK, so in the same way that, with these video generation models, you see the Will Smith eating spaghetti joke from, like, three years ago. Yes. And every three frames, if you just look at it sequentially, exactly, every three frames makes sense. But then he's just morphing into something else, and he's ballooned now, and it looks dreamlike. Exactly. And that's what's happening in video generation. And everybody knows the trajectory now. How is it going to get better? Longer context. And that just means the autoregressive generation"
    },
    {
      "end_time": 7349.787,
      "index": 286,
      "start_time": 7321.049,
      "text": " is more and more anchored in the past, and that past becomes a more meaningful, smooth curve. But it seems like there must be something more tethering us to reality than just long context. Says you. You know, if there is, what I would say is, certainly in the case of language, like I said, when we step into this world, we inherit the corpus of language, and that is a certain kind of tethering"
    },
    {
      "end_time": 7378.848,
      "index": 287,
      "start_time": 7350.384,
      "text": " Words have the relations they do to each other and that carries meaning. The words don't just line up with each other in any old way. You can't just use language however you want. You end up having to adapt and adopt the language that you're given. I would say in the case of language, even more so than perceptual, what we do is we learn that tethering. It is a certain kind of reality. It's a linguistic reality, but it's not"
    },
    {
      "end_time": 7407.415,
      "index": 288,
      "start_time": 7379.206,
      "text": " arbitrary. It's been honed over God knows how many years for that mapping to be useful. And in order to be useful, it actually has to map somehow to perceptual reality too. That is definitely there. And so, no, it's very strongly tethered. It's not just poetic. We're not just doing a poetry slam when we're talking. We're not just spitting out words."
    },
    {
      "end_time": 7437.21,
      "index": 289,
      "start_time": 7407.722,
      "text": " that are loosely related to one another. No, the sequence matters. It's extremely granular. What's the word? It's funny that I can't come up with the word right now. Beautiful is not the right word, but it's precise. There's such incredible detail in how each word relates to one another. This is something we didn't create. You and I, Kurt, didn't create this."
    },
    {
      "end_time": 7461.135,
      "index": 290,
      "start_time": 7437.568,
      "text": " This is something that humanity created. It has all of these rich, you know, relational properties that are this tethering, that somehow carry meaning about the universe, only as expressed as a communicative, coordinative tool embedded within a larger perception-action system. But we should respect it."
    },
    {
      "end_time": 7490.913,
      "index": 291,
      "start_time": 7461.561,
      "text": " language is an extraordinary invention. I think we should have a completely new respect for just how rich and powerful it is. It's not some symbol, this symbol equals this mental representation or this object. No, it's this construct that contains within the relations the capacity to express anything in such a way that my mind can make your mind do stuff. How the heck does that work? Who knows? But it's"
    },
    {
      "end_time": 7520.503,
      "index": 292,
      "start_time": 7491.305,
      "text": " So is there something about your model that commits you to idealism or realism or structural realism or anti-realism or foundationalism or what have you? Like what is the philosophy that underpins your model and also what philosophy is entailed by your model if any? Yeah that is a great question and I would say it's"
    },
    {
      "end_time": 7550.35,
      "index": 293,
      "start_time": 7521.032,
      "text": " I've actually come to sometimes use the term linguistic anti-realism. And it's the idea that language is not what it thinks it is. We engage in philosophical thoughts, and even our, you know, sort of general thinking about who we are, what is our place in the universe. Much of that takes place in the realm of language."
    },
    {
      "end_time": 7579.275,
      "index": 294,
      "start_time": 7551.647,
      "text": " And the conclusion I've come to is that language as a sort of semi-autonomous, autogenerative computational system, modular computational system, doesn't really know what it's talking about in a deep way. And there is really a fundamentally different way of knowing the sensory perceptual system, the thing that gives rise to qualia, the thing that gives rise to consciousness."
    },
    {
      "end_time": 7609.599,
      "index": 295,
      "start_time": 7579.838,
      "text": " Here's a big one. The thing that gives rise to mattering, to meaning. What do we care about? We care about our feelings. We care about feeling good or not so good, pleasure, pain, love, all the things that actually matter. These are actually, these live in what I call the sort of the animal embedding. It's something that other species, non-linguistic species, they can feel, they can sense, they can perceive."
    },
    {
      "end_time": 7639.019,
      "index": 296,
      "start_time": 7610.162,
      "text": " They don't have language. We think, oh gosh, they don't understand anything. Well, what if it's the opposite? What if it's our linguistic system that doesn't understand anything? What if it's our linguistic system that's actually a construct, a societal construct, a coordinated construct? But as a system, it's a construct that doesn't actually have a clue about what"
    },
    {
      "end_time": 7666.698,
      "index": 297,
      "start_time": 7639.531,
      "text": " pain and pleasure are. It has tokens for them, and the tokens run within the system to say things like, I don't like pain, I like pleasure. Those are valid constructs, and they kind of do the thing they're supposed to do in language. But a purely linguistic system, and I think language is purely linguistic, I guess is one way to think about it, doesn't really have contained within it these other kinds of meanings."
    },
    {
      "end_time": 7696.971,
      "index": 298,
      "start_time": 7668.507,
      "text": " Now, first of all, this has implications for artificial intelligence, for thinking about whether AI can have sentience. Should we care if your LLM starts saying, this is terrible, don't shut me off, I'm having an existential crisis? Perhaps. I would argue that we shouldn't worry about it. My LLM says that all the time. I don't know which LLM you're hanging out with. My current LLM. The current LLM."
    },
    {
      "end_time": 7722.927,
      "index": 299,
      "start_time": 7697.432,
      "text": " Yes, the Kurt LLM. But the Kurt LLM, as an LLM, perhaps doesn't really have that meaning contained within it in a deep sense. It's, again, because of the mapping, it is communicating something probably about the non-LLM Kurt. When you say ouch, there is pain there. I'm not denying that."
    },
    {
      "end_time": 7746.459,
      "index": 300,
      "start_time": 7723.541,
      "text": " What I'm saying is that as a sort of thinking rational system that does the things that language does, that system itself may not have within it the true meaning of the words that it's using in a deep sense. I don't want to take you off course and hopefully this will help you stay on course and hopefully it aids the course. An LLM can process the word torment, say."
    },
    {
      "end_time": 7776.681,
      "index": 301,
      "start_time": 7746.869,
      "text": " But what's the difference between our human brain's autoregressive process that creates the feeling of torment itself and the word torment? So my speculation here, and it is purely speculative, is that it's non-symbolic. There's something happening when the universe gets represented in our brain. It's still in our brain. It's still a certain mapping. But when it gets represented, so that physical characteristics"
    },
    {
      "end_time": 7802.278,
      "index": 302,
      "start_time": 7777.227,
      "text": " of the world are actually represented in a more direct mapping. So think about color. We talked earlier about the sort of color space. There's a real true sense in which red and orange are more similar in physical color space, like there's actually some physical fact about it, and also in brain space. That's my guess."
    },
    {
      "end_time": 7830.418,
      "index": 303,
      "start_time": 7802.551,
      "text": " Is non-symbolic a synonym for ineffable?"
    },
    {
      "end_time": 7862.568,
      "index": 304,
      "start_time": 7833.012,
      "text": " I wouldn't have thought of it that way, but that may be a very good way to say it, or to not say it. Yes, ineffable. Well, by virtue of being symbolic, by virtue of being a purely relational kind of representation, which is what language is. Maybe even more than saying it's symbolic, it's that it's relational. Language is a relational kind of representation; a location in the space matters only because of its relation to other tokens in the space."
    },
    {
      "end_time": 7892.261,
      "index": 305,
      "start_time": 7863.012,
      "text": " That's not true in color perception. In color perception, where you are in, probably, the embedding space is going to have physical meaning. It's going to be related to the physical world in a much more direct way. And so the space, even though it's an internal space, right, the perception of color still just comes down to neurons firing. We're not actually getting the light. The light's not getting into our brains, but the mapping is"
    },
    {
      "end_time": 7922.602,
      "index": 306,
      "start_time": 7892.602,
      "text": " such that it"
    },
    {
      "end_time": 7951.852,
      "index": 307,
      "start_time": 7923.268,
      "text": " I don't think language has that. I think because it's purely relational, it's not a rippling of anything. It's its own system of relational embeddings that aren't continuous in any way with the physical universe. Do you think that has something to do with God?"
    },
    {
      "end_time": 7983.251,
      "index": 308,
      "start_time": 7953.507,
      "text": " That if we think of the grand unity of creation, there's some sense in which language breaks that unity. And I think that we can lie in language in a way that we can't in any other substrate. And so I think by becoming purely linguistic beings, as the vast majority of our time as humans is spent in the linguistic"
    },
    {
      "end_time": 8011.271,
      "index": 309,
      "start_time": 7983.524,
      "text": " space. That's where we're hanging out. Our minds are hanging out there. I think we have perhaps forgotten something that animals know about the universe. And it's this kind of unity, because the animal processing is an extension, it's a continuation, of the world. And the world, the universe, is one thing in some sense. It's"
    },
    {
      "end_time": 8039.753,
      "index": 310,
      "start_time": 8011.783,
      "text": " We don't even have to get into non-locality, right? Let's just talk about, like, the Big Bang or something like that. What's happening here now is in some ways connected quite literally to what happened elsewhere way back in time. So I think this sort of unity that, you know, mystics talk about is much closer to the animal brain"
    },
    {
      "end_time": 8070.299,
      "index": 311,
      "start_time": 8040.418,
      "text": " than the linguistic brain, because the linguistic brain actually creates this dichotomy. It breaks the continuity. Symbols, I sometimes say, are like a new physics. The relations are what matter, and it's no longer continuous, it's no longer an extension of the physical universe. It interacts with the physical universe in a way that, as we see, we can sort of do this mapping, so that when I talk,"
    },
    {
      "end_time": 8100.794,
      "index": 312,
      "start_time": 8070.828,
      "text": " It can have influence on the physical universe. It can have influence on my perception. It can have influence on my behavior. I think that sort of the rationalist movement, the positivist movement, sort of modernity itself is a complete hijacking of our brain by the linguistic system. And I do think that has something to do with the denouement, the kind of the God is dead kind of modernity equals somehow"
    },
    {
      "end_time": 8131.391,
      "index": 313,
      "start_time": 8101.459,
      "text": " the decline. And so, you know, a rationalist would say, well, that's appropriate, because we've figured out how the universe works and we don't need any of this hocus pocus. But what about the feeling of unity? What about the sense of sort of a cosmic whole? Are we so sure that we're right and those ancients were wrong? And yes, I do think that this has"
    },
    {
      "end_time": 8159.377,
      "index": 314,
      "start_time": 8131.749,
      "text": " very significant consequences for thinking about some of these intangibles, these ineffables. So a snake that mimics the poison of another snake in terms of its color, that's a form of a lie. Now, would you say that that is somehow symbolic as well, though? No. And yes, there is a mimicry, and there is"
    },
    {
      "end_time": 8181.493,
      "index": 315,
      "start_time": 8159.838,
      "text": " a certain sense in which animals can engage in subterfuge, though they don't even know they're engaging in it. But that's much more continuous with, okay, you've just pushed the cognitive agent into a slightly different space, which is consistent with some other physical reality. That's very, very different than saying we are made of atoms"
    },
    {
      "end_time": 8211.391,
      "index": 316,
      "start_time": 8181.732,
      "text": " and particles and everything that happens is determined by the forces amongst these atoms, none of which is something that we have any material animal grasp of, any true physical grasp of. These are words. These models are really words and they run in words and they run very well to make predictions and to manipulate the physical universe. But they're stories and they're linguistic stories."
    },
    {
      "end_time": 8240.316,
      "index": 317,
      "start_time": 8211.578,
      "text": " Those kinds of stories, according to my own theory, don't really point to physical meaning, because language doesn't. And so even saying that it's a lie or untrue isn't quite right. But within its own space, you can go off in many different directions. And maybe the danger is not in thinking of things as true,"
    },
    {
      "end_time": 8265.913,
      "index": 318,
      "start_time": 8240.657,
      "text": " or in thinking thoughts that aren't really true; it's in falling too deeply in love with the idea that idea space and language space is the real space. Yes, interesting. So see, in our circles, when we're hanging out off air, when we're hanging out with other professors and on the university grounds and so on, we praise this"
    },
    {
      "end_time": 8295.06,
      "index": 319,
      "start_time": 8266.271,
      "text": " exchange of words and making models precise and doing calculations and so on, and I've always intimated that this is entirely incorrect. And I haven't heard an anti-philosopher, like a philosopher that was an anti-philosopher, except one who was an ancient Indian philosopher. I think his name is Jayarasi Bhatta. I'm likely butchering that pronunciation, but I'll place it on screen anyhow. He was arguing against the Buddhists and the other"
    },
    {
      "end_time": 8319.855,
      "index": 320,
      "start_time": 8295.469,
      "text": " contemporary philosophers by saying, look, you think know thyself is what you should be doing, or, well, he didn't say it like this, but you think of it as the highest goal. However, who is living more truly than a rooster? Like, none of you are living more truly than something that's just being. Yes, exactly. That is the exact same intuition. And yes, it's this idea"
    },
    {
      "end_time": 8348.831,
      "index": 321,
      "start_time": 8320.128,
      "text": " I articulated to myself a long time ago that the fly knows something that our linguistic system can never know. That it knows something. It really does. That simply existing and being is a form of knowledge, and it's a deeper one. It's a deeper one than whatever it is that our fancy rationalist kind of perspective has given us. Our rationalist perspective is very, very powerful in coordinating"
    },
    {
      "end_time": 8376.067,
      "index": 322,
      "start_time": 8349.326,
      "text": " and predicting. But in terms of like true ontology, I suspect it's actually the wrong direction. It's created a false god of linguistic knowledge, of shared objective knowledge, when the subjective is the one that we really have. It's the Cartesian"
    },
    {
      "end_time": 8404.445,
      "index": 323,
      "start_time": 8376.937,
      "text": " So I was watching Everything Everywhere All at Once. I had never seen it, because I also had another intimation. I'll spoil some of it, and if you are listening and you don't want it spoiled, then just skip ahead."
    },
    {
      "end_time": 8435.418,
      "index": 324,
      "start_time": 8405.606,
      "text": " I was telling someone that I think if there's a point to life, it's one of two. And so this is just me speaking poetically and not rigorously. One is to find a love that is so powerful it outlasts death. Okay, so that's number one. And then number two is to get to the point in your life where you realize that all your inadequacies and all your insecurities and all your missteps and your jealousies and your"
    },
    {
      "end_time": 8462.807,
      "index": 325,
      "start_time": 8436.374,
      "text": " malice and so on, rather than being a weakness, are what led you to this place here. And here is the optimal moment. It's to get that insight. So I don't know how to rationally justify any of that or explain it. But anyhow, when I said this one time on a podcast, someone said, hey, that latter one you expressed was covered in Everything Everywhere All at Once. So I watched it."
    },
    {
      "end_time": 8474.957,
      "index": 326,
      "start_time": 8463.217,
      "text": " What was great about that movie, and here's where I spoil it, is that it makes me want to tear up. The movie is silly and comedic in a way that didn't resonate with me, but there's this one lesson that did."
    },
    {
      "end_time": 8506.852,
      "index": 327,
      "start_time": 8476.852,
      "text": " The woman, the main protagonist, she's a fighter, and she's strong-headed, and she has this husband who is weak, whom she's always able to put down. And so then you think, okay, well, this is a modern trope where there's always the stronger woman and every guy is just a fool, and the woman is always more intelligent and so on. Okay, so you just think of it as, okay, well, it's just a modern trope. And the guy is kind and loving to people. Toward the end, there was something."
    },
    {
      "end_time": 8530.998,
      "index": 328,
      "start_time": 8507.551,
      "text": " She was getting audited by the IRS, and something was supposed to happen that night where she had to bring receipts, and she couldn't. Now, the husband was talking with the IRS lady, and our protagonist, the woman, was saying in Vietnamese, or in Mandarin, whichever language it was, oh, he's an idiot, I hope he doesn't make it worse."
    },
    {
      "end_time": 8561.288,
      "index": 329,
      "start_time": 8531.647,
      "text": " Then the IRS lady comes to the woman and says, you have another week, you have an extension. She's like, how did this happen? She talks to the husband. And remember, this is a movie almost about a multiverse, so you're getting different versions of this. And there's this one version where the husband's speaking to her and telling her, you know, Evelyn, the main character, you see what you do as fighting. You see yourself as strong and you see me as weak, and you see the world as a cruel place. But I've lived"
    },
    {
      "end_time": 8589.872,
      "index": 330,
      "start_time": 8561.561,
      "text": " on this earth just as long as you, and I know it's cruel. My method of being kind and loving and turning the other cheek, that's my way of fighting. I fight just like you. And then you see that what he did in another universe was just speak kindly to the IRS agent and talk about something personal, and that softened her. And then you see all the other universes where she was"
    },
    {
      "end_time": 8617.824,
      "index": 331,
      "start_time": 8590.555,
      "text": " She was trying to go on this grand adventure and do some fighting. And the husband then says, Evelyn, even though you've broken my heart once again in another universe, I would have really loved to just do the laundry and taxes with you. And it makes you realize you're aiming for something grand, and you're aiming to go out"
    },
    {
      "end_time": 8641.34,
      "index": 332,
      "start_time": 8619.292,
      "text": " and conquer demons and so on, but there's something so much more intimate about these everyday scenarios. There's something so rich. There's also a quote about this, about the journey, I think it's T.S. Eliot's: the end of all our exploring will be to arrive where we started and know the place for the first time."
    },
    {
      "end_time": 8670.759,
      "index": 333,
      "start_time": 8644.002,
      "text": " Anyhow, I know all of this is abstract talk. No, no, no, this is it. It's exactly what we're talking about. Because if you see yourself as a ripple in the universe, right, then you are part of something cosmic and grand. And it's sort of that extensiveness. It's that extensiveness that's being here now. It's that."
    },
    {
      "end_time": 8699.94,
      "index": 334,
      "start_time": 8671.032,
      "text": " We aren't just atoms. We're part of a larger thing. You can call it God, you can call it the universe or whatever, but it's there. It's actually something, I think. I don't think animals think of themselves as discrete. I don't think they do. I think they don't think of an outside and inside. They don't think of objective and subjective. It's just"
    },
    {
      "end_time": 8723.37,
      "index": 335,
      "start_time": 8700.418,
      "text": " this unfolding. Do they have theory of mind and all that? These are linguistic concepts. And I know I sound like an anti-linguist, and I recognize the power of it. I said before, you know, how extraordinary it is, how rich it is, and I have tremendous respect for it. But at the same time, I do think"
    },
    {
      "end_time": 8751.015,
      "index": 336,
      "start_time": 8723.985,
      "text": " that all this talk about objective things, particles, and we are physical bodies and we are just this and we are just that, that is bullshit. Like, no, we are the universe resonating. We are part of the whole in a way that thinking objectively, as language requires you to do, actually breaks. So I think there's such a beauty in the silence."
    },
    {
      "end_time": 8779.377,
      "index": 337,
      "start_time": 8751.476,
      "text": " It's something everybody knows, the ineffable. Why is it called the ineffable? The ineffable isn't just that you can't say it. It's magnificent. The ineffable is extraordinary. Why? Because it's this true extension. Something like that. Again, I'm trying to put it into words. Right, and therein lies the trap. But we're both feeling it."
    },
    {
      "end_time": 8808.029,
      "index": 338,
      "start_time": 8783.558,
      "text": " Well, I'm feeling extremely grateful to have met you, to have spent so long with you. And there are many conversations you and I have had off air as well that we need to finish, so hopefully we can do that. And thank you for spending so long with me here. This was wonderful, Kurt. Thank you so much. I just want to hang out and talk about this stuff. So really appreciate it."
    },
    {
      "end_time": 8826.237,
      "index": 339,
      "start_time": 8810.282,
      "text": " I've received several messages, emails and comments from professors saying that they recommend theories of everything to their students and that's fantastic. If you're a professor or lecturer and there's a particular standout episode that your students can benefit from, please do share and as always feel free to contact me."
    },
    {
      "end_time": 8853.797,
      "index": 340,
      "start_time": 8826.664,
      "text": " New update! Started a sub stack. Writings on there are currently about language and ill-defined concepts as well as some other mathematical details. Much more being written there. This is content that isn't anywhere else. It's not on theories of everything. It's not on Patreon. Also, full transcripts will be placed there at some point in the future. Several people ask me, hey Kurt, you've spoken to so many people in the fields of theoretical physics, philosophy and consciousness. What are your thoughts?"
    },
    {
      "end_time": 8865.947,
      "index": 341,
      "start_time": 8854.104,
      "text": " While I remain impartial in interviews, this substack is a way to peer into my present deliberations on these topics. Also, thank you to our partner, The Economist."
    },
    {
      "end_time": 8890.589,
      "index": 342,
      "start_time": 8868.2,
      "text": " Firstly, thank you for watching, thank you for listening. If you haven't subscribed or clicked that like button, now is the time to do so. Why? Because each subscribe, each like helps YouTube push this content to more people like yourself, plus it helps out Kurt directly, aka me. I also found out last year that external links count plenty toward the algorithm,"
    },
    {
      "end_time": 8916.613,
      "index": 343,
      "start_time": 8890.589,
      "text": " Which means that whenever you share on Twitter, say on Facebook or even on Reddit, et cetera, it shows YouTube. Hey, people are talking about this content outside of YouTube, which in turn greatly aids the distribution on YouTube. Thirdly, you should know this podcast is on iTunes. It's on Spotify. It's on all of the audio platforms. All you have to do is type in theories of everything and you'll find it. Personally, I gained from rewatching lectures and podcasts."
    },
    {
      "end_time": 8924.497,
      "index": 344,
      "start_time": 8916.613,
      "text": " I also read in the comments that, hey, TOE listeners also gain from replaying. So how about instead you re-listen on those platforms like iTunes?"
    },
    {
      "end_time": 8949.036,
      "index": 345,
      "start_time": 8926.732,
      "text": " As a"
    },
    {
      "end_time": 8966.613,
      "index": 346,
      "start_time": 8949.036,
      "text": " You also get early access to ad free episodes, whether it's audio or video. It's audio in the case of Patreon video in the case of YouTube. For instance, this episode that you're listening to right now was released a few days earlier. Every dollar helps far more than you think. Either way, your viewership is generosity enough. Thank you so much."
    }
  ]
}
