Gary N. Smith

Senior Fellow, Walter Bradley Center for Natural and Artificial Intelligence

Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence, often involving stock market anomalies, statistical fallacies, and the misuse of data, has been widely cited. He is the author of dozens of research articles and 16 books, most recently Distrust: Big Data, Data-Torturing, and the Assault on Science (Oxford University Press, 2023).

Archives

Computers May Know “How” but They Still Don’t Know “Why”

Computers will not equal, let alone surpass, human intelligence.
Several years ago, a visiting professor taught an introductory statistics class at Pomona College, where I have taught for more than 40 years. When I asked how the class was going, he said, “At my university, students ask how; here they ask why.” Knowing how to do a statistical test is important, but knowing why a specific test might be appropriate is more important. During a recent Saturday morning bike ride, I got to thinking about how this distinction relates to the limitations of current AI systems. Take something as simple as pedaling a bicycle. A robotic bike rider equipped with a powerful AI system might be programmed to pedal, or it might be trained to pedal by randomly fiddling with different parts of the bike, but it would have no idea why pedaling moves the bike.

LLMs Are Still Faux Intelligence

Large language models are remarkable, but it's a huge mistake to think they're "intelligent" in any meaningful sense of the word.
In the popular television game show Jeopardy!, three contestants are given general-knowledge clues in the form of answers and respond with questions that fit the answers. For example, the clue, “Fifth President of the United States,” would be answered correctly with “Who is James Monroe?” In 2005 a team of IBM engineers was tasked with designing a computer system they named Watson that could defeat the best human Jeopardy players. Watson used hundreds of algorithms to identify keywords or phrases in a question, matched these keywords to people, places, and things in its massive database, and then formulated possible answers. The more the algorithms agreed on an answer, the more certain Watson was that it was the correct answer. In addition to its huge database (including all

A Modest Proposal for the MLB

Major League Baseball got greedy and needs to reform.
The Major League Baseball (MLB) season is finally over. Whew! In the World Series, the Texas Rangers (which tied with Houston and Philadelphia for the 6th best regular-season record) defeated the Arizona Diamondbacks (which had the 13th best regular-season record). What exactly was the point? Many of the games were entertaining but no one would seriously argue that these were the two best MLB teams. There was a time when the World Series pitted the American League team with the best regular-season record against the National League team with the best regular-season record. Then, the MLB got greedy and crafted an elaborate playoff system designed to increase revenue. First came two rounds, then three rounds, now four rounds. Currently, 12 out of 30 teams qualify for the postseason

The MLB Coin-Flipping Contest

What are the chances that wild-card teams will make it to the World Series and win?
“Let me tell you sonny, how it used to be in the good old days.” Is there anything worse than what Bruce Springsteen called “boring stories of glory days”? He goes on, “I hope when I get old I don’t sit around thinking about it but I probably will.” Well, this year’s World Series brought back memories of the glory days when there were two Major League Baseball leagues — the American League and the National League — and the regular season winners of each league met in the World Series. Now, each league has three divisions (East, Central, and West) and a complicated sequence of playoff games that involves the three division winners and three wild card teams in each league and pushes the World Series into November, so that it is possible that the Boys of Summer
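The coin-flip framing invites a quick back-of-the-envelope calculation. As an illustrative sketch (my numbers, not the article's), assume four playoff rounds, with the top two seeds in each league skipping the wild-card round, and treat every series as a 50-50 toss-up:

```python
# Illustrative coin-flip model of the MLB playoffs (assumed format:
# four rounds, with the top two seeds in each league skipping the
# wild-card round; every series treated as a 50-50 toss-up).
p = 0.5  # chance of winning any single series

# A wild-card-round team must win four straight series: wild-card
# series, division series, championship series, and World Series.
p_wildcard_champ = p ** 4
print(f"wild-card-round team wins it all: {p_wildcard_champ:.2%}")  # 6.25%

# A top-two seed skips the wild-card round, so only three series:
p_bye_champ = p ** 3
print(f"bye team wins it all: {p_bye_champ:.2%}")  # 12.50%

# Sanity check: each league has 2 bye teams and 4 wild-card-round
# teams, and one of the two leagues must produce the champion.
assert abs(2 * p_bye_champ + 4 * p_wildcard_champ - 0.5) < 1e-9
```

Under this toss-up model, each of a league's four wild-card-round teams has only a 6.25% title chance, which is the sense in which an expanded playoff resembles a coin-flipping contest.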

Blue Zone BS: The Longevity Cluster Myth

We need to be reminded how much real science has done for us and how real science is done.
I recently tried to watch the television series, “Live to 100: Secrets of the Blue Zones,” which Netflix promotes as “an insightful adventure through longevity hotspots around the world.” The enthusiastic host, Dan Buettner, seems to be genuinely driven to help us all live to 100. I alternated between laughing and groaning, and then gave up. Real science is currently under siege, pummeled by conspiracy nuts and undermined internally by a replication crisis created by sloppy science. We need to be reminded how much real science has done for us and how real science is done. The Blue Zone BS is not helpful. To the contrary, it promotes sloppy science. In 2004 a team led by Michel Poulain and Gianni Pes identified a mountainous area in central Sardinia, an Italian island in

Confusing Correlation with Causation

Computers are amazing. But they can't distinguish between correlation and causation.
Artificial intelligence (AI) algorithms are terrific at discovering statistical correlations but terrible at distinguishing between correlation and causation. A computer algorithm might find a correlation between how often a person has been in an automobile accident and the words they post on Facebook, being a good software engineer and visiting certain websites, and making loan payments on time and keeping one’s phone fully charged. However, computer algorithms do not know what any of these things are and consequently have no way of determining whether these are causal relationships (and therefore useful predictors) or fleeting coincidences (that are useless predictors). If the program is a black box, then humans cannot intervene and declare that these are almost certainly irrelevant
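A small simulation illustrates the danger (an illustrative sketch; the sample size and feature count are made up): screen enough purely random "predictors" and some will correlate impressively with any target, even though every correlation is a coincidence.

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)
n = 20  # 20 observations of some target variable
target = [rng.random() for _ in range(n)]

# Screen 1,000 candidate "predictors" that are pure noise and keep
# the strongest correlation found -- typically well above 0.5 by
# chance alone, despite having no causal connection to the target.
best = max(
    abs(corr([rng.random() for _ in range(n)], target))
    for _ in range(1000)
)
print(f"best |correlation| among noise features: {best:.2f}")
```

An algorithm that simply ranks correlations would report that "best" feature as a great predictor; a human who knows what the variables are would recognize it as noise.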

The LK-99 BS Further Undermines the Credibility of Science

The rejection or distortion of genuine science can have tragic consequences
Social media is afire with reports that South Korean researchers have synthesized a room-temperature and room-pressure superconductor they call LK-99. This is the biggest scientific news this year — yes, ChatGPT is now so last year. A representative Wow! from experts has been: “If LK-99 is the real deal, it could be a game-changer for everything from quantum computing and medical imaging to energy and transportation.” Long pursued by physicists and engineers, room-temperature, room-pressure superconductivity would revolutionize electronics and engineering by allowing current to move through wires without any energy loss. Everything would be cheaper and more efficient. Trains would levitate! Alas, the likelihood that this is BS research is very close to 100 percent.

Sabrina Ionescu’s Hot Hand

When basketball players hit a "streak," does that elevate the probability of success?
Most people believe that athletes sometimes get “hot” or “cold” with their performance elevated or depressed temporarily. For example, Purvis Short, who scored 59 points in an NBA game, said, “You’re in a world all your own. It’s hard to describe. But the basket seems to be so wide. No matter what you do, you know the ball is going to go in.” Similarly, during a timeout in a 2015 game, LeBron James told his teammates to pass the ball to Kevin Love, explaining after the game that, “He had the hot hand, I wanted to keep going to him.” On the other hand, statisticians tell us that streaks are likely even in random coin flips that have a rock-steady 50-50 chance of heads or tails. For example, in 10 coin flips there is a 46% chance of a streak of either four or more
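The 46% figure is easy to check with a short Monte Carlo simulation (an illustrative sketch, not code from the article):

```python
import random

def has_streak(flips, length=4):
    """Return True if the sequence contains a run of `length`
    or more identical outcomes (heads or tails)."""
    run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        if run >= length:
            return True
    return False

def streak_probability(n_flips=10, length=4, trials=100_000, seed=0):
    """Monte Carlo estimate of the chance of a streak of `length`
    or more in `n_flips` fair coin flips."""
    rng = random.Random(seed)
    hits = sum(
        has_streak([rng.randint(0, 1) for _ in range(n_flips)], length)
        for _ in range(trials)
    )
    return hits / trials

print(f"{streak_probability():.3f}")  # roughly 0.46
```

The exact answer (counting the 10-flip sequences with no run of four) is 548/1024, about 46.5%, so even a "rock-steady" coin produces streaks that look like hot hands nearly half the time.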

Using Data Like a Drunk Uses a Lamppost

Startup companies can be tempted to use statistics for support instead of real illumination
Karl Pearson, the great English mathematician and statistician, wrote, “Statistics is the grammar of science.” At its finest, it is. Statistical data and methods are the backbone of the scientific method that underlies the astonishing scientific advances that humans have made. Unfortunately, statistics can also be used to provide misleading support for false claims. The examples are numerous and so are the laments:

There are three kinds of lies: lies, damned lies, and statistics. —Benjamin Disraeli

There are two kinds of statistics, the kind you look up and the kind you make up. —Rex Stout

Definition of Statistics: The science of producing unreliable facts from reliable figures. —Evan Esar

If you torture the data long enough, they will confess. —Ronald Coase

We

The LLM Deep Fake—Follow the Money

Hype sells. Just look at what's happening with AI
Large language models (LLMs) are far more impressive now than they were just a few months or a year ago. Compare these exchanges I had with GPT-3 in June 2022 and with ChatGPT two days ago:

Gary: Can I improve my test scores if I study after taking the test?

GPT-3: That’s a good question. I’m not sure if you can improve your test scores by studying after taking the test, but it couldn’t hurt to try!

ChatGPT: Studying after taking a test is unlikely to directly improve your scores on that particular test since the test has already been completed. Once a test is finished, the opportunity to answer questions and provide responses has passed. However, studying after taking a test can still have benefits. It allows you to review

The Death of Peer Review?

Science is built on useful research and thoroughly vetted peer review
Two years ago, I wrote about how peer review has become an example of Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.” Once scientific accomplishments came to be gauged by the publication of peer-reviewed research papers, peer review ceased to be a good measure of scientific accomplishments. The situation has not improved. One consequence of the pressure to publish is the temptation researchers have to p-hack or HARK. P-hacking occurs when a researcher tortures the data in order to support a desired conclusion. For example, a researcher might look at subsets of the data, discard inconvenient data, or try different model specifications until the desired results are obtained and deemed statistically significant—and therefore publishable. HARKing
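A quick simulation shows why p-hacking works so reliably: a researcher who slices pure noise into enough subgroups will usually find something "statistically significant." (An illustrative sketch; the sample sizes and subgroup counts are hypothetical.)

```python
import math
import random

def z_test_significant(rng, n=50):
    """Compare two noise samples of size n with a z-test at the 5%
    level. Both groups come from the same distribution, so any
    'significant' difference is a false positive."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return abs(z) > 1.96

rng = random.Random(42)
researchers = 1000
subgroups = 20  # each researcher slices the same noise 20 ways

# Count how many researchers find at least one "significant" result.
lucky = sum(
    any(z_test_significant(rng) for _ in range(subgroups))
    for _ in range(researchers)
)
print(f"{lucky / researchers:.0%} found a 'significant' result")
# expected near 1 - 0.95**20, i.e., about 64%
```

Each individual test has only a 5% false-positive rate, yet roughly two-thirds of researchers who try 20 slices come away with a publishable "finding."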

A World Without Work? Here We Go Again

Large language models still can't replace critical thinking
On March 22, nearly 2,000 people signed an open letter drafted by the Future of Life Institute (FLI) calling for a pause of at least 6 months in the development of large language models (LLMs): Contemporary AI systems are now becoming human-competitive at general tasks, and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization? FLI is a nonprofit organization concerned with the existential risks posed by artificial intelligence. Its president is Max Tegmark, an MIT professor who is no

An Illusion of Emergence, Part 2

A figure can tell a story but, intentionally or unintentionally, the story that is told may be fiction
I recently wrote about how graphs that use logarithms on the horizontal axis can create a misleading impression of the relationship between two variables. The specific example I used was the claim made in a recent paper (with 16 coauthors from Google, Stanford, UNC Chapel Hill, and DeepMind) that scaling up the number of parameters in large language models (LLMs) like ChatGPT can cause “emergence,” which they define as qualitative changes in abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models. They present several graphs similar to this one that seem to show emergence: However, their graphs have the logarithms of the number

A Graph Can Tell a Story—Sometimes It’s an Illusion

Mistakes, chicanery, and "chartjunk" can undermine the usefulness of graphs
A picture is said to be worth a thousand words. A graph can be worth a thousand numbers. Graphs are, as Edward Tufte titled his wonderful book, the “visual display of quantitative information.” Graphs should assist our understanding of the data we are using. Graphs can help us identify tendencies, patterns, trends, and relationships. They should display data accurately and encourage viewers to think about the data rather than admire the artwork. Unfortunately, graphs are sometimes marred (intentionally or unintentionally) by a variety of misleading techniques or by what Tufte calls “chartjunk” that obscures rather than illuminates. I have described elsewhere many ways in which mistakes, chicanery, and chartjunk can undermine the usefulness of graphs. I recently saw a novel

Learning to Communicate

Why writing skills are so important, especially in today's artificial world
Educators have been shaken by fears that students will use ChatGPT and other large language models (LLMs) to answer questions and write essays. LLMs are indeed astonishingly good at finding facts and generating coherent essays — although the alleged facts are sometimes false and the essays are sometimes tedious BS supported by fake references. I am more optimistic than most. I am hopeful that LLMs will be a catalyst for a widespread discussion of our educational goals. What might students learn in schools that will be useful long after they graduate? There are many worthy goals, but critical thinking and communication skills should be high on any list. I’ve written elsewhere about how critical thinking abilities are important for students and cannot be reliably faked by

Text Generators, Education, and Critical Thinking: an Update

The fundamental problem remains that, not knowing what words mean, AI has no critical thinking abilities
This past October, I wrote that educational testing was being shaken by the astonishing ability of GPT-3 and other large language models (LLMs) to answer test questions and write articulate essays. I argued that, while LLMs might mimic human conversation, they do not know what words mean. They consequently excel at rote memorization and BS conversation but struggle mightily with assignments that are intended to help students develop their critical thinking abilities, such as:

- Develop and defend a reasonable position
- Judge well the quality of an argument
- Identify conclusions, reasons, and assumptions
- Judge well the credibility of sources
- Ask appropriate clarifying questions

Lacking any understanding of semantics, LLMs can do none of this. To illustrate, I asked

Let’s Take the “I” Out of AI

Large language models, though impressive, are not the solution. They may well be the catalyst for calamity.
When OpenAI’s text generator, ChatGPT, was released to the public this past November, the initial reaction was widespread astonishment. Marc Andreessen described it as “pure, absolute, indescribable magic.” Bill Gates said that the creation of ChatGPT was as important as the creation of the internet. Nvidia’s CEO, Jensen Huang, said that “ChatGPT is one of the greatest things ever created in the computing industry.” Conversations with ChatGPT are, indeed, very much like conversations with a super-intelligent human. For many, it seems that the 70-year search for a computer program that could rival or surpass human intelligence has finally paid off. Perhaps we are close to the long-anticipated singularity where computers improve rapidly and autonomously,

Does New A.I. Live Up to the Hype?

Experts are finding ChatGPT and other LLMs unimpressive, but investors aren't getting the memo
Original article was featured at Salon on February 21st, 2023. On November 30, 2022, OpenAI announced the public release of ChatGPT, a large language model (LLM) that can engage in astonishingly human-like conversations and answer an incredible variety of questions. Three weeks later, Google’s management — wary that they had been publicly eclipsed by a competitor in the artificial intelligence technology space — issued a “Code Red” to staff. Google’s core business is its search engine, which currently accounts for 84% of the global search market. Their search engine is so dominant that searching the internet is generically called “googling.” When a user poses a search request, Google’s search engine returns dozens of helpful

Goodhart’s Law and Scientific Innovation in Academia

Many university researchers are leaving academia so they can actually get things done
British economist Charles Goodhart was a financial advisor to the Bank of England from 1968 to 1985, a period during which many economists (“monetarists”) believed that central banks should ignore unemployment and interest rates. Instead, they believed that central banks should focus on maintaining a steady rate of growth of the money supply. The core idea was that central banks could ignore economic booms and busts because they are short-lived and self-correcting (Ha! Ha!) and should, instead, keep some measure of the money supply growing at a constant rate in order to keep the rate of inflation low and constant. The choice of which money supply to target was based on how closely it was statistically correlated with GDP. The British monetary authorities adopted this policy in

Large Language Models Can Entertain but Are They Useful?

Humans who value correct responses will need to fact-check everything LLMs generate
In 1987 economics Nobel Laureate Robert Solow said that the computer age was everywhere—except in productivity data. A similar thing could be said about AI today: It dominates tech news but does not seem to have boosted productivity a whit. In fact, productivity growth has been declining since Solow’s observation. Productivity increased by an average of 2.7% a year from 1948 to 1986, by less than 2% a year from 1987 to 2022. Labor productivity is the amount of goods and services we produce in a given amount of time—output per hour. More productive workers can build more cars, construct more houses, and educate more children. More productive workers can also enjoy more free time. If workers can do in four days what used to take five days, they can produce 25 percent more—or
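The closing arithmetic checks out (a trivial sketch of the calculation):

```python
# Doing five days' work in four means daily output rises by 5/4 - 1.
old_days, new_days = 5, 4
gain = old_days / new_days - 1
print(f"{gain:.0%}")  # 25%
```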