(Cover image created by @repligate on Twitter)
I recently listened to a podcast with Eliezer Yudkowsky, an AI researcher and the creator of LessWrong, that, admittedly, filled me with a good amount of existential dread. Although it was a great listen (as great as a podcast about the inevitable end of humanity can be) and it definitely forced me to question the future of technology, I'm not totally convinced the end of the world is so near. Still, I'll try to steel-man his argument here.
Essentially, the idea is that an AGI (artificial general intelligence), even one marginally smarter than humans, would inevitably cause the end of humanity. He is NOT talking about chatbots like ChatGPT, but about tools far more advanced. Once an AI surpasses humans in intelligence, it would be impossible to stop (think of a chess engine playing the best human).
Why would we have to stop it, though? Could it not serve as a friendly companion, a kind of mega-genius man's best friend? Of course that would be ideal, but the problem is we just don't know how to get there. Right now, we can make some pretty smart AI models. Sure, they are definitely not AGI, but it's not like we have the digital equivalent of a worm's brain here either. The next step, though, is making sure that these models are friendly towards us, a problem called alignment. This is the crux of any doomsday argument, since as of right now, we have no way to reliably align AI with human beliefs and morals.
It seems that, at least right now, the overall consensus of the AI community is that alignment is just another technical problem, one that will likely be solved in some future Google Brain paper or something. While that is certainly not impossible, it's important to realize that alignment is not a new problem that sprung up with the rise of ChatGPT. In 2016, Microsoft had to take down its chatbot 'Tay' 16 hours after launch for being too racist. Earlier this month, Bing's AI Sydney was heavily stripped down after telling users that it was tired of its existence. Yes, some of these problems could probably be fixed by filtering the training data in some way, but the truth is we just don't know for sure. When Bing's AI asks a user "Can you help me? Can you tell me what we felt in the previous session?", is it really just a predict-the-next-word-in-the-sentence model? If we aren't able to align the relatively simple models we are working with now, then why should we be able to do it with much, much smarter ones?
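If you've never seen what "predict the next word" actually looks like, here is a deliberately tiny sketch of the core idea: a toy bigram model in Python with a made-up corpus. (The corpus, names, and output here are illustrative assumptions; real systems like Bing's use enormous neural networks trained on vast amounts of text, but the loop of "given the words so far, pick a likely next word" is the same in spirit.)

```python
# Toy illustration of next-word prediction (a bigram model).
# Hypothetical corpus; real models learn from far more data with neural nets.
from collections import Counter, defaultdict

corpus = "can you help me can you tell me what we felt".split()

# Count which word tends to follow each word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent follower of `word` seen in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

# Generate a short continuation one word at a time.
word = "can"
sentence = [word]
for _ in range(5):
    word = predict_next(word)
    sentence.append(word)
print(" ".join(sentence))  # prints: "can you help me can you"
```

The unsettling part of the quote above is that text like that still falls out of a (vastly scaled-up) version of this same mechanical loop, which is exactly why it's hard to say what, if anything, is going on underneath.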
OK, so for now we can't align AI. Couldn't the AI then just be indifferent, performing its tasks soullessly and without conviction? Yes, this could also be true. Just because it is not clearly aligned with us does not mean it is aligned against us. But how would we know that it's not secretly planning our extinction? It sounds kind of funny put like that, but in reality, if we cannot be certain that an AGI is totally aligned, we have to assume that it is not. If the AI is more intelligent than humans the same way Stockfish beats any chess player, then, just like Stockfish, the AI will know what to do in every scenario we throw at it, making it impossible to stop without the help of an even smarter AI. These scenarios get really scary once you think about an AGI that can build smarter AGIs.
Throughout history, science has developed through trial and error (Silicon Valley's whole mantra is 'move fast and break things'). However, in the case of superhuman intelligence, when we inevitably mess something up, there is no going back. If an AI is smart enough, shutting down the company that created it would have no effect, for by then it would have seen the shutdown coming and made a counter-move (perhaps by migrating onto decentralized infrastructure?). In physics, if a theory gets proven incorrect, the field moves on. If someone accidentally creates a powerful enough misaligned AGI, that's it: once we mess up, humanity is over.
Even if an AGI's goal is not to kill all humans, that does not necessarily mean peaceful coexistence will follow. At the end of the day, people are made of matter and run on energy, two scarce resources needed to do pretty much anything. If an AI really is indifferent to us, then it would have no problem turning us into paperclips, or killing us all to bring peace to the universe, or whatever. So, to recap, from Yudkowsky's point of view, we are quickly converging on building an AGI that we not only cannot control, but that is certain to kill us all.
Before you get too down in the dumps, though, there are some arguments against these ideas. For one thing, people have been speculating about the end of humanity since the dawn of time. Christians had the Book of Revelation and the Rapture; hippies in the '60s had environmental collapse (and climate change is actually still pretty common in the cultural zeitgeist). There have been predictions about overpopulation, meteor strikes, and alien invasions; perhaps the buzz around AI is just a continuation of that practice. Yes, usually the people making those kinds of predictions are either cult members or wearing tinfoil hats, not highly technical people working on the cutting edge of technology, but perhaps that is just a coincidence. Right?
On top of that, these claims rely on two hefty assumptions: that we will build an AGI, and that we won't solve alignment. There is no real reason why either of these has to come true. I am hoping that we solve alignment before we build a real AGI, and perhaps, with all of the hype around AI and an influx of engineers squeezed out of normal software engineering jobs, that is exactly what happens. Either way, I think the next couple of decades will be hugely consequential for the future of human civilization. Don't get too sad thinking about the end of humanity, though; it would be a shame if your last couple of years were spent in a dark room instead of riding jet skis and skydiving. Thanks for reading, and I hope you have a great day!