OpenAI and Bucerius Center Collaborate on GPT-4 Evaluation Paper

In a recent paper, Daniel M. Katz and his team demonstrated that OpenAI’s latest deep learning model excels at complex legal reasoning.

GPT-4, the new multimodal deep learning model from OpenAI, has passed the Uniform Bar Exam, demonstrating an enormous leap for machine learning and proving that an artificial intelligence program can perform complex legal tasks on par with or better than humans, according to a new paper co-authored by members of the Center for Legal Technology and Data Science (CLTDS) at Bucerius Law School. The paper was included and quoted in the technical report that accompanied OpenAI’s release of GPT-4.


AI technology

“GPT-4 represents a new frontier in AI’s role in the legal profession and society at large,” says first author Professor Daniel M. Katz, who collaborated with the legal AI company Casetext and fellow researcher Michael Bommarito, also a member of the legal data science research group at CLTDS. “The bar exam is an especially important test for AI to pass because it demonstrates the kind of skills that lawyers need in the real world, not just a classroom setting.

“Indeed, AI technology has the potential to become a ‘force multiplier’ that expands access to legal services to all members of society, including those who couldn’t previously afford to hire a lawyer.”

GPT-4 scored 75 percent on the bar exam, higher than the 68 percent average and good enough to place in the top 15 percent of human test takers. In a previous paper that Katz co-wrote, GPT-3.5 scored 50 percent and passed only two multiple-choice portions of the bar exam, placing it in the 10th percentile.

Wide implications for the legal profession

In this test, GPT-4 took not only the multiple-choice sections but also the essays (worth 30 percent) and the performance test (worth 20 percent). Although many have been skeptical about AI’s ability to pass sections that require generating language, GPT-4 did so by a significant margin, giving responses that were generally on par with the “representative good answers” provided by many state bars.

The latest GPT model also shows fewer “hallucinations,” in which an AI language model confidently asserts wrong answers that have no basis in reality. Passing the bar exam requires command not just of ordinary English but also of complex “legalese,” which is difficult even for humans. GPT’s rapid advancement in this field is sure to have wide implications for the legal profession.

Research on scalability and legal domain specificity of large language models

Professor Katz and fellow researcher Dirk Hartung, Executive Director of the Center for Legal Technology and Data Science, recently presented the findings in an exclusive talk for students and faculty on campus on March 15 and at a Special Edition of Bucerius Legal Tech Essentials, attended by over 600 participants, on March 28.

“Lawyers need to figure out how to really use these tools. And those that do, it’ll be a very positive thing for them. We’re sitting on the dawn of a major increase in potential capacity,” says Katz. “These are tools that allow you to more effectively do your work, so you need to learn how to use them to maximum efficacy.”

One opportunity to do so will be provided this summer in the Bucerius Summer Program for Legal Technology and Operations, whose Academic Directors are Katz and Hartung. In the meantime, one path forward will be a similar study of German legal exams and the bar exams, as well as more research on the scalability and legal domain specificity of large language models.



Dirk Hartung