McCarthy and his Prediction Regarding ‘Scruffy’ AI
John McCarthy, one of the founders of artificial intelligence (AI) and the man widely credited with coining the term, stated on several occasions that if we insist on building AI systems by empirical methods (e.g., neural networks or evolutionary models), we might succeed in building “some kind of an AI,” but even the designers of such systems will not understand how they work. In hindsight, this was a remarkable prediction: the deep neural networks (DNNs) that currently dominate AI are thoroughly unexplainable, and their unexplainability is paradigmatic. Distributed connectionist architectures contain no concepts or human-understandable features, only microfeatures that are conceptually and cognitively hollow and, moreover, semantically meaningless.
Generalization, Explanation, and Understanding
To examine the relationship between generalization, explanation, and understanding, consider an example. Suppose we have two intelligent agents, AG1 and AG2, and suppose we ask both to evaluate the expression “3 * (5 + 2).” Suppose further that AG1 and AG2 are two different “kinds” of robots: AG1 was designed by McCarthy and his colleagues and thus follows the rationalist approach, its intelligence built in a top-down, model- (or theory-) driven manner, while AG2’s intelligence was arrived at through a bottom-up, data-driven (i.e., machine learning) approach.
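To make the contrast concrete, here is a minimal sketch (the names ag1_eval and ag2_eval, and the toy set of observed examples, are ours, purely for illustration). AG1 evaluates the expression compositionally from an explicit model of arithmetic, so it generalizes to any well-formed expression and can account for every step; AG2 is crudely stood in for by a lookup over previously seen input-output pairs, which is all a purely data-driven mapping can offer when asked to justify or extend its answers.

```python
import ast
import operator

# Explicit model of arithmetic: each syntactic operator maps to a known operation.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def ag1_eval(expression: str):
    """AG1: top-down, model-driven evaluation by recursion over the syntax tree."""
    def walk(node):
        if isinstance(node, ast.Constant):       # a number denotes its own value
            return node.value
        if isinstance(node, ast.BinOp):          # evaluate the parts, then combine them
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported construct")
    return walk(ast.parse(expression, mode="eval").body)

# AG2: a stand-in for a purely data-driven mapping -- here, just a lookup over
# observed examples. It offers no account of *why* the answer is what it is,
# and it has nothing to say about expressions it has not seen.
AG2_OBSERVATIONS = {"3 * (5 + 2)": 21, "2 + 2": 4}

def ag2_eval(expression: str):
    return AG2_OBSERVATIONS.get(expression)      # None for anything unseen

print(ag1_eval("3 * (5 + 2)"))   # 21, derived step by step from the model
print(ag1_eval("7 * (4 + 9)"))   # 91, the same rules generalize to new inputs
print(ag2_eval("3 * (5 + 2)"))   # 21, memorized
print(ag2_eval("7 * (4 + 9)"))   # None, no underlying theory to fall back on
```

The caricature is not that a neural network is literally a lookup table; the point is that nothing in a learned mapping corresponds to a rule such as “evaluate the parenthesized subexpression first,” which is precisely what an explanation of the answer 21 would appeal to.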
No Reasoning if there is no Understanding
The intricate relationship between generalization, explanation, and understanding, all of which lie beyond the subsymbolic architecture of DNNs, shows up in problems that require reasoning with quantified symbolic variables, such as deep understanding of natural language, as well as in simple problem solving such as that involved in planning.
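As a rough illustration of what reasoning with a quantified variable involves (the predicates and the helper forall_rule below are ours, not a formalism from any of the cited work), consider the rule “for all x: human(x) implies mortal(x).” The rule applies to every binding of x, known or novel, which is exactly the kind of guarantee a statistical association over surface strings does not provide.

```python
# Toy forward application of a universally quantified rule over symbolic facts.
facts = {("human", "socrates"), ("human", "hypatia"), ("planet", "mars")}

def forall_rule(premise: str, conclusion: str, known: set) -> set:
    """Apply 'for all x: premise(x) -> conclusion(x)' by binding x to every matching fact."""
    derived = {(conclusion, x) for (pred, x) in known if pred == premise}
    return known | derived

facts = forall_rule("human", "mortal", facts)
print(("mortal", "socrates") in facts)   # True: the rule holds for every binding of x
print(("mortal", "hypatia") in facts)    # True: including bindings never singled out before
print(("mortal", "mars") in facts)       # False: the premise does not apply
```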
As impressive as LLMs are, at least in their generative capabilities, these systems will never be able to perform high-level reasoning of the kind needed for deep language understanding, problem solving, and planning. The reason for the qualitative “never” in the preceding sentence is that the (theoretical and technical) limitations we have alluded to are not a function of scale, nor of the specifics of the model or of the training corpus.
Concluding Remark
The above criticism of DNNs and LLMs is not meant to imply that empirical, data-driven learning methods have no place; they do. But to insist on reducing the mind and human cognition to stochastic gradient descent and backpropagation is not only unscientific; it is also folly, and potentially harmful.
References
- John McCarthy (2007), From Here to Human-Level AI, Artificial Intelligence, 171, pp. 1174–1182.
- Jesse Lopes (2023), Can Deep CNNs Avoid Infinite Regress/Circularity in Content Constitution?, Minds and Machines, 33, pp. 507–524.
- Walid Saba (2024), Stochastic Text Generators do not Understand Language, but they can Help us Get There (in preparation, available here).
- Michael Strevens (2013), No Understanding without Explanation, Studies in History and Philosophy of Science, pp. 510–515.
- Joseph J. Williams and Tania Lombrozo (2009), The Role of Explanation in Discovery and Generalization: Evidence from Category Learning, Cognitive Science, 34, pp. 776–806.
- Frank C. Keil (2006), Explanation and Understanding, Annual Review of Psychology, 57, pp. 227–254.