OpenAI has made headlines again with their latest breakthrough in deep learning – GPT-4. This new language model, built on the foundations of GPT-3.5, is a multimodal model that accepts both image and text inputs, and generates text outputs. In this article, we will explore GPT-4’s capabilities and limitations and compare them to GPT-3.5.
Takeaway
- Multimodal capabilities: GPT-4 can accept both image and text inputs for text output, with impressive image recognition and understanding capabilities.
- Creative writing assistance: GPT-4 can help with creative writing tasks such as composing songs or writing screenplays, and can even learn a user’s writing style, with an increased capacity to handle up to 25,000 words.
- Academic benchmark performance: GPT-4 has achieved human-level performance on academic benchmarks, outscoring even ChatGPT by a large margin, scoring in the 90th percentile for the uniform bar exam and 99th percentile for the biology olympiad.
- Safety and alignment: OpenAI spent six months making GPT-4 safer and more aligned, reducing the likelihood of producing disallowed content by 82%, and improving factual accuracy by 40% compared to GPT-3.5.
- Human feedback: GPT-4 incorporates more human feedback, including feedback from ChatGPT users, to further improve its performance and behavior.
GPT-4: A Multimodal Language Model
OpenAI has been working tirelessly on GPT-4 for the past six months, using lessons from their adversarial testing program and ChatGPT to constantly iterate and align it. The result is a model that exhibits human-level performance in various professional and academic benchmarks. For example, GPT-4 can pass a simulated bar exam with a score around the top 10% of test-takers, which is a significant improvement from GPT-3.5’s score of around the bottom 10%.
One of the most significant advancements of GPT-4 is its ability to accept both text and image inputs. This feature enables users to specify tasks that require visual as well as language comprehension, opening up a whole new world of possibilities. When presented with inputs consisting of a combination of text and images, GPT-4 performs just as well as it does on text-only inputs across a range of domains. It can process a variety of inputs, including documents with text and photographs, diagrams, or screenshots.
GPT-4 vs GPT-3.5: Which One Should You Use for Complex Tasks?
When it comes to natural language processing models like GPT-3.5 and GPT-4, the difference may not be noticeable in everyday conversations. However, the gap becomes apparent when it comes to complex tasks. To compare the two models, OpenAI ran a series of tests, including simulating exams that were initially designed for humans. The results of their tests indicate that GPT-4 is more dependable and creative than its predecessor, GPT-3.5. It can handle much more nuanced instructions, making it the better option for tasks that require a high degree of complexity.
The Power of Visual Inputs in GPT-4
The ability to process visual inputs is a game-changer for GPT-4. This feature allows GPT-4 to perform more complex tasks with minimal training. GPT-4 can be enhanced with techniques that were initially developed for text-only language models, such as few-shot and chain-of-thought prompting. These approaches enable the model to leverage its vast knowledge base and perform more complex tasks with minimal training.
Steer Your AI’s Behavior with ChatGPT’s Steerability Feature
The ChatGPT team has recently introduced a new feature called steerability that enables developers and users to prescribe their AI’s style and task by describing the directions in the “system” message. This allows for a significantly customized user experience within bounds. While the adherence to the bounds is not perfect, the ChatGPT team is continuously working to improve the feature.
Limitations of GPT-4: Reducing Hallucinations but Still a Work in Progress
Despite the impressive capabilities of GPT-4, it is not without limitations. Like its predecessors, it still struggles with reliability, often generating errors in reasoning and “hallucinating” facts. OpenAI alerts users to be cautious when using language model outputs, particularly in high-stakes contexts. It is crucial to follow an appropriate protocol, such as human review, grounding with additional context, or avoiding high-stakes uses altogether, to suit the specific use-case.
The model can sometimes generate biased outputs, although researchers are working to address this issue. OpenAI, in their blog post on how should AI systems behave , have emphasized their aim to build AI systems with reasonable default behaviors that reflect a wide range of users’ values, allowing customization within broad bounds and seeking public input on these bounds.
Risks & mitigations
GPT-4, as a powerful AI language model, poses potential risks of generating harmful content. These risks include inaccurate information, buggy code, and dangerous advice. To address these risks, the GPT-4 model incorporates several safety measures to prevent harmful outputs. OpenAI has engaged over 50 experts from various domains, who have adversarially tested the model and provided feedback and data, which has been used to improve the model’s safety properties.
One of the key features of GPT-4’s safety measures is the incorporation of an additional safety reward signal during Reinforcement Learning from Human Feedback (RLHF) training. This signal reduces harmful outputs by training the model to refuse requests for such content. OpenAI has also collected a diverse dataset from various sources and applies the safety reward signal to both allowed and disallowed categories to prevent the model from refusing valid requests.
Thanks to these mitigations, GPT-4’s safety properties have significantly improved compared to GPT-3.5. The model’s tendency to respond to requests for disallowed content has decreased by 82%, and it responds to sensitive requests, such as medical advice and self-harm, in accordance with the team’s policies 29% more often than GPT-3.5. However, it’s still possible to generate content that violates usage guidelines.
OpenAI recognizes the potential social and economic impacts of GPT-4 and other AI systems and is collaborating with external researchers to better understand and assess these impacts. For now, it’s important to complement these measures with deployment-time safety techniques like monitoring for abuse. Overall, GPT-4’s capabilities are significant, but it’s important to recognize and address the potential risks it poses.
Training process
The GPT-4 base model underwent training using a vast web-scale corpus of data, which included a diverse range of ideologies and ideas. To refine the model’s responses and ensure they align with user intent, OpenAI implemented a reinforcement learning with human feedback (RLHF) process.
It’s worth noting that RLHF doesn’t directly enhance exam performance. Instead, the prompt engineering process is necessary post-training to guide the model in answering questions in a manner that aligns with users’ intent.
What is OpenAI Evals?
- OpenAI has introduced OpenAI Evals, a framework for evaluating models like GPT-4. The software allows users to run benchmarks and evaluate the models’ performance sample by sample.
- OpenAI Evals played a crucial role in the development of GPT-4 by identifying shortcomings and preventing regressions. It also enables users to track the performance of various model versions.
- The software is open-source, and users can create new classes to implement custom evaluation logic. The software includes templates, such as the “model-graded evals” template, which shows how GPT-4 can check its own work.
How Can I use GPT-4?
If you’re interested in using ChatGPT-4, there are a couple of options available to you. First, you can sign up with OpenAI to try out the basic version of ChatGPT, although there may be restrictions depending on your location.
However, if you want to access the latest and greatest version, you’ll need to become a ChatGPT Plus subscriber, which costs $20 per month. With this subscription, you’ll be able to use ChatGPT-4 to generate even more advanced responses and carry out a wider range of tasks.
In the future, it’s also possible that you may be able to access ChatGPT-4 through Microsoft’s search engine, Bing. Currently, you can click on the “chat” button on the Bing webpage, but you’ll likely be redirected to a sign-up page for a waitlist. Access is expected to be rolled out gradually to users in the coming months, so keep an eye out for updates.
Bottom Line
GPT-4 represents a significant milestone in the field of deep learning and natural language processing. Its multimodal capabilities and improved performance in complex tasks make it a promising tool for a wide range of applications, from creative writing to legal research.
The development of GPT-4 is a testament to OpenAI’s commitment to advancing the field of AI while ensuring its safe and ethical use. With the open-sourcing of their evaluation framework and ongoing efforts to scale their methodology, OpenAI continues to lead the way in AI research and development. As GPT-4 becomes more widely available, we can expect to see even more exciting developments in the field of natural language processing and AI more broadly.