Published: Fri 04 July 2025
By Stefan
In Learning.
As mentioned before, I have a very positive view of the long-term impact of large language models on higher education.
A recent article in The New Yorker on the topic departs a bit from the magazine's usual stance of cultural pessimism and pointed me to an interesting study on LLM tutoring.
In 2023, researchers at Harvard introduced a self-paced A.I. tutor in a popular physics course. Students who used the A.I. tutor reported higher levels of engagement and motivation and did better on a test than those who were learning from a professor. May, the Georgetown student, told me that she often has ChatGPT produce extra practice questions when she's studying for a test. Could A.I. be here not to destroy education but to revolutionize it?
In the introduction of a more recent study, the authors address a common misunderstanding about the role of large language models when they are offered as a general-purpose tool.
While these models can answer technical questions, their unguided use lets students complete assignments without engaging in critical thinking. After all, AI chatbots are generally designed to be helpful, not to promote learning.
In my own experience it can take a bit of persistence to make common LLM chatbots reliably take on the role of a tutor, and that is before even attempting the pedagogical best practices the authors mention.
The bare minimum requirement for any meaningful tutoring is to not give away the answer. If I am stuck on a topic, I want the tutor to give me a very subtle hint that gently nudges my thinking in the right direction, not a hint that acts as a huge neon-sign arrow pointing to the solution.
The reliability with which LLM web interfaces adhere to instructions appears to have increased greatly of late, but I remember times, not too long ago, when the only way to keep the model from leaking the answer was a very explicit prompt appended to every message.
VERY IMPORTANT: If the answer is incorrect you will not return complete step-by-step instructions, you will only provide a brief very slight hint on how to proceed.
VERY IMPORTANT: You will under no circumstances give away the answer.
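A minimal sketch of that workaround, assuming a simple chat loop (the function name and wrapper are my own illustration, not part of any particular tool):

```python
# Workaround sketch: append the guardrail instruction to every user message
# so the model is reminded on each turn not to reveal the solution.
GUARDRAIL = (
    "VERY IMPORTANT: If the answer is incorrect you will not return complete "
    "step-by-step instructions, you will only provide a brief very slight hint "
    "on how to proceed.\n"
    "VERY IMPORTANT: You will under no circumstances give away the answer."
)

def with_guardrail(user_text: str) -> str:
    """Return the user's message with the tutoring guardrail appended."""
    return f"{user_text}\n\n{GUARDRAIL}"
```
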
Of course, not giving away the answer is far from the only concern. Ideally there would be a custom tutoring interface that lets the user control the level of verbosity, the concreteness of hints, the form of explanatory artifacts (text, diagram, interactive website), and so on, and that has persistent memory to improve the learning experience over time.
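One way such controls could be represented, sketched here as a small Python dataclass that renders its settings into a system prompt (all field names and prompt wording are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class TutorConfig:
    # Hypothetical knobs a tutoring UI could expose.
    verbosity: str = "terse"       # "terse" | "normal" | "verbose"
    hint_strength: str = "subtle"  # "subtle" | "moderate" | "explicit"
    artifact: str = "text"         # "text" | "diagram" | "interactive website"

    def system_prompt(self) -> str:
        """Render the settings into a system prompt for the tutor."""
        return (
            f"You are a tutor. Keep replies {self.verbosity}. "
            f"When the student is stuck, give only a {self.hint_strength} hint. "
            f"Prefer explanations in the form of {self.artifact}. "
            f"Never give away the answer."
        )
```
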
This would obviously require accessing the models via API, but I have yet to find a provider-agnostic GUI or CLI for LLM APIs that really sparks joy in me. Claude Code has by far been the best experience, but its provider and prompts are obviously hardwired.
The base system prompt used in the physics tutoring system mentioned above reflects my personal experience, although the researchers took a more liberal approach to the level of information that may be provided.
Important: Only give away ONE STEP AT A TIME, DO NOT give away the full solution in a single message
Another interesting avenue to explore might be few-shot learning: providing a number of examples and counter-examples of what constitutes good tutoring.
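A sketch of what such few-shot seeding could look like, using the generic role/content message shape common to chat APIs (the example exchanges are my own invention):

```python
# Few-shot sketch: seed the conversation with example exchanges that
# demonstrate the desired tutoring style before the real question arrives.
FEW_SHOT = [
    {"role": "user", "content": "I'm stuck integrating x * e^x."},
    # Good tutoring: a nudge toward the technique, not the worked solution.
    {"role": "assistant", "content": "Which integration technique applies "
     "when the integrand is a product of two functions?"},
    {"role": "user", "content": "Is the derivative of sin(x^2) just cos(x^2)?"},
    # Good tutoring: points at the missing step without performing it.
    {"role": "assistant", "content": "Almost - what does the chain rule say "
     "about the inner function x^2?"},
]

def build_messages(question: str) -> list[dict]:
    """Prepend the few-shot examples to the student's actual question."""
    return FEW_SHOT + [{"role": "user", "content": question}]
```
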
What's nice is that recognition of handwritten mathematics on paper works really well, even for quickly jotted notes, at least with Claude. This inspired me to set up a mechanism that transfers images from my camera directly into a Claude Code-based tutoring environment using the custom destinations feature of a really nice app. An overhead camera stand and a remote shutter button could further minimize the distraction of taking a snapshot.
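For reference, handing a photographed page to the model boils down to base64-encoding the image into a user message; a sketch of building such a message in the content-block shape the Anthropic Messages API expects (the helper itself and the prompt text are my own):

```python
import base64

def image_message(path: str, prompt: str) -> dict:
    """Build a user message carrying a photographed page plus an
    instruction, as base64 image + text content blocks."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/jpeg",
                        "data": data}},
            {"type": "text", "text": prompt},
        ],
    }
```
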
What's currently lacking in Claude's web interface for mathematics tutoring is that the analysis tool includes only very rudimentary support for symbolic algebra and the artifacts don't support code execution. Claude Code, on the other hand, can generate code for SymPy or Lean and evaluate the results.
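For example, a tutoring backend could verify a student's rewriting step symbolically with SymPy instead of trusting the model's own arithmetic (the specific check is my illustration, not the system prompt's requirement):

```python
import sympy as sp

x = sp.symbols("x")

def step_is_valid(before: sp.Expr, after: sp.Expr) -> bool:
    """Check whether two expressions are symbolically equivalent,
    i.e. whether the student's rewriting step preserved the value."""
    return sp.simplify(before - after) == 0

# A correct expansion step ...
print(step_is_valid((x + 1)**2, x**2 + 2*x + 1))  # True
# ... and a common sign mistake.
print(step_is_valid((x - 1)**2, x**2 - 2*x - 1))  # False
```
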
Another advantage of Claude Code is that it can use arbitrarily large document collections as factual grounding and has configurable persistent memory. This makes it possible to use extensive sources, such as ProofWiki as JSON, that would not fit in the context window, and to implement spaced repetition based on a learning log.
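A minimal sketch of spaced repetition driven by a learning log, using a simplified Leitner-style box scheme (the intervals and scheme are my own example, not the actual memory mechanism):

```python
from datetime import date, timedelta

# Leitner-style review intervals in days, indexed by box number.
INTERVALS = [1, 3, 7, 21]

def next_review(box: int, reviewed_on: date, correct: bool) -> tuple[int, date]:
    """Move the item up one box on success, back to box 0 on failure,
    and schedule the next review date accordingly."""
    box = min(box + 1, len(INTERVALS) - 1) if correct else 0
    return box, reviewed_on + timedelta(days=INTERVALS[box])
```
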
As stated initially, I see large opportunities for large language models in higher education, and I am optimistic that polished applications providing something along the lines of the environment described above will soon be available.