GPT-4 and the evolution of large language models


Microsoft-backed research organisation OpenAI on Wednesday released its latest large language model, GPT-4, which powers Microsoft’s new Bing and several third-party apps. The new model can respond to images, for example by suggesting recipes from a photo of ingredients or by writing captions and descriptions. It can also process up to 25,000 words of text, about eight times as much as its predecessors, including ChatGPT, which was released in November last year and has since taken the internet by storm. OpenAI claims that GPT-4 is smarter and more sensible than both its previous iterations and other large language models.

Large language models, though an internet buzzword today, are not new to the field of artificial intelligence (AI). They are a type of AI technology that uses advanced machine learning (ML) algorithms to process and generate natural language. Because they are trained on massive amounts of text data, these models can generate human-like text, which is used in a variety of areas such as search engines, natural language processing, healthcare, robotics and code generation.
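
To make the idea of a language model concrete, here is a minimal sketch of the simplest kind: a bigram model that counts which word follows which in a tiny made-up corpus and then predicts the most likely next word. This is an illustrative assumption, not how modern large language models work internally; they replace raw counts with transformer networks trained on billions of words, but the basic idea of assigning probabilities to the next word given its context carries over to GPT-style models.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn raw follow-counts into a probability distribution over the next word."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


def generate(counts: dict[str, Counter], start: str, max_words: int = 5) -> str:
    """Greedy generation: repeatedly append the most probable next word."""
    words = [start.lower()]
    for _ in range(max_words):
        probs = next_word_probs(counts, words[-1])
        if not probs:  # no known continuation, so stop
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# A tiny, made-up training corpus.
corpus = [
    "the model reads text",
    "the model generates text",
    "the model generates text quickly",
]
model = train_bigram(corpus)
print(next_word_probs(model, "model"))  # reads ~0.33, generates ~0.67
print(generate(model, "the"))
```

Real models differ mainly in scale and in how much context they condition on: a bigram model looks at one previous word, while transformers attend to thousands.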

The roots of large language models go back to the 1950s, when AI was a nascent field. One of the earliest programs to converse in natural language was ELIZA, developed by Joseph Weizenbaum at MIT in 1966. ELIZA used a simple set of pattern-matching rules to mimic human conversation, responding to user input in a way that seemed natural and conversational.
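
ELIZA's rule-based approach can be sketched in a few lines. The rules, the reflection table and the function names below are hypothetical illustrations in the spirit of ELIZA's keyword-and-template scripts, not Weizenbaum's original DOCTOR script: each rule pairs a regular expression with a response template, the first matching rule wins, and a fallback keeps the conversation going when nothing matches.

```python
import re

# Hypothetical pattern/response rules in the spirit of ELIZA (not the original script).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."

# Swap first- and second-person words so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment: str) -> str:
    """Rewrite 'my job' as 'your job', 'I' as 'you', and so on."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(user_input: str) -> str:
    """Return the template of the first rule that matches, else a fallback."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return FALLBACK


print(respond("I need a holiday."))     # Why do you need a holiday?
print(respond("I am tired of my job"))  # How long have you been tired of your job?
print(respond("The weather is nice"))   # Please go on.
```

There is no learning here at all: every response comes from hand-written rules, which is exactly what separates ELIZA from the data-trained models discussed below.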


Over the years, as AI advanced, many large language models evolved and found widespread use in a variety of business applications. Let’s take a look at some of the key large language models released in recent years, including GPT-3, BERT, XLNet, and more.   

BERT (Bidirectional Encoder Representations from Transformers)  

Bidirectional Encoder Representations from Transformers, popularly known as BERT, is a large language model released by Google in November 2018. It is pre-trained on a massive amount of unlabelled text using unsupervised learning. Unlike generative models, BERT is an encoder: rather than producing new text, it builds representations of text that downstream tasks can use. BERT is a transformer-based model that uses self-attention to relate each word in a sequence to every other word, reading context from both the left and the right. Hence, it can handle long-range dependencies better than some other models. It performs well on a wide range of natural language processing tasks, including sentiment analysis and named entity recognition. However, experts note that it requires substantial computational resources to train and run.
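
Self-attention is the mechanism that lets BERT relate each word to every other word in a sentence. The sketch below implements scaled dot-product attention in plain Python under two simplifying assumptions: the 2-dimensional word vectors are made up, and they are used directly as queries, keys and values, whereas a real transformer derives these from learned projection matrices and runs many attention heads in parallel. Because no causal mask is applied, every token attends to tokens on both its left and its right, which is the "bidirectional" part of BERT's name.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: each position outputs a weighted average of
    all value vectors, weighted by how well its query matches each key."""
    d_k = len(keys[0])
    outputs, weight_rows = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # one weight per token, summing to 1
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        weight_rows.append(weights)
    return outputs, weight_rows


# Made-up toy embeddings; a real model learns these vectors.
tokens = ["bank", "river", "money"]
embeddings = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

# Simplification: use the embeddings directly as queries, keys and values.
outputs, weights = self_attention(embeddings, embeddings, embeddings)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 2) for w in row])
```

Each row of weights shows how much one token draws on every other token, left or right, when building its new representation.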


XLNet

In June 2019, Google researchers, in collaboration with Carnegie Mellon University in the US, developed XLNet, another large language model pre-trained on a massive amount of unlabelled text using unsupervised learning.

XLNet is also a transformer-based model, but it uses a training technique called permutation language modelling. Whereas BERT masks about 15% of the input tokens and learns to predict only those masked tokens, XLNet can learn to predict every token, in a randomly chosen order. This allows it to better capture the relationships between words in a sentence, improving its performance on a variety of natural language processing tasks. XLNet was trained on over 130 GB of text: in addition to the two datasets used for BERT, its developers included three more.
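
The difference between the two training objectives can be illustrated by listing, for each predicted token, which other positions it is allowed to see. This is a simplified sketch with hypothetical function names: BERT's actual recipe also replaces some of the selected tokens with random words or leaves them unchanged rather than always masking them, and XLNet in practice predicts only the last few tokens of each sampled order and relies on a two-stream attention mechanism.

```python
import random


def mlm_targets(tokens, mask_rate=0.15, seed=0):
    """BERT-style (simplified): hide ~15% of positions. Each hidden token is
    predicted from all unmasked tokens on both sides, but not from the other
    masked tokens."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    masked = sorted(rng.sample(range(len(tokens)), n_mask))
    visible = [i for i in range(len(tokens)) if i not in masked]
    return {pos: visible for pos in masked}


def plm_targets(tokens, seed=0):
    """XLNet-style (simplified): draw a random factorisation order. Every token
    is predicted from the tokens that precede it in that order, which keep
    their original positions in the sentence."""
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    return {pos: sorted(order[:step]) for step, pos in enumerate(order)}


tokens = "new york is a city in the united states".split()
mlm = mlm_targets(tokens)
plm = plm_targets(tokens)

for pos, ctx in mlm.items():
    print("MLM predicts", tokens[pos], "from", [tokens[i] for i in ctx])
# Show the first three predictions in XLNet's sampled order.
for pos, ctx in sorted(plm.items(), key=lambda kv: len(kv[1]))[:3]:
    print("PLM predicts", tokens[pos], "from", [tokens[i] for i in ctx])
```

Under the masked objective only a handful of tokens produce a training signal per sentence, whereas the permutation objective (in this simplified form) turns every token into a prediction target across different sampled orders.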

GPT-3 and GPT series  


GPT-3, or Generative Pre-trained Transformer 3, is a large language model launched in June 2020 by OpenAI. It is trained on a massive amount of text data using unsupervised learning and generates human-like text.

With 175 billion parameters, GPT-3 was one of the largest and most powerful language models available at its release. It is bigger, smarter and more interactive than its predecessors and can perform a wide range of natural language processing tasks, such as language translation, summarisation and sentiment analysis.

Earlier, OpenAI's release of the Generative Pre-Training (GPT) language model in 2018 (117 million parameters) and of GPT-2 in February 2019 (1.5 billion parameters) sparked a lot of excitement within the AI community. At the time, GPT outperformed other existing language models on tasks such as common-sense reasoning and reading comprehension.
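
The parameter counts quoted for the GPT series translate directly into hardware requirements. As a back-of-the-envelope sketch, assuming each parameter is stored as a 32-bit (4-byte) or 16-bit (2-byte) floating-point number, the memory needed just to hold a model's weights is the number of parameters multiplied by the bytes per parameter. Training and serving need considerably more (activations, gradients, optimiser state), so these figures are lower bounds.

```python
# Parameter counts as quoted in this article.
MODELS = {
    "GPT (2018)": 117e6,
    "GPT-2 (2019)": 1.5e9,
    "GPT-3 (2020)": 175e9,
}
# Assumed storage formats: 32-bit and 16-bit floating point.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed only to store the weights, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, n in MODELS.items():
    row = ", ".join(
        f"{fmt}: {weight_memory_gb(n, b):,.1f} GB" for fmt, b in BYTES_PER_PARAM.items()
    )
    print(f"{name}: {row}")
```

At 16-bit precision GPT-3's weights alone come to 175 × 10^9 × 2 bytes = 350 GB, far more than a single GPU's memory, which is why models of this size are split across many accelerators.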


ChatGPT, released in November last year, is in fact a model fine-tuned from OpenAI's GPT-3.5 series, a successor to GPT-3, and is specifically designed for chatbot applications. It was trained on a large dataset of conversational text, so it generates responses that are better suited to a chatbot context. ChatGPT quickly became popular and has been used to generate content for blogs, social media posts and even entire websites. Its ability to generate high-quality text that is difficult to distinguish from human writing, and to perform translation and summarisation, has made it a valuable tool for businesses and individuals alike.

Meanwhile, Microsoft reportedly invested a further $10 billion in OpenAI in January 2023 to “accelerate breakthroughs in AI and other advanced technologies”.

OpenAI was, however, cautious about the potential misuse of GPT-3 and initially restricted access to it. Eventually, the company made the model available through its API, allowing developers to access it and build their own AI applications.
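
For developers, API access means sending an HTTP request to OpenAI's servers rather than running the model locally. The sketch below only builds such a request with Python's standard library and does not send it. It assumes OpenAI's legacy completions endpoint and its JSON fields `model`, `prompt`, `max_tokens` and `temperature`; the model name `davinci`, the helper name `build_completion_request` and the placeholder key are assumptions, and OpenAI has since retired older GPT-3 models, so current documentation should be checked before use.

```python
import json
import urllib.request

# Assumption: OpenAI's legacy completions endpoint, as used for GPT-3.
COMPLETIONS_URL = "https://api.openai.com/v1/completions"


def build_completion_request(api_key: str, prompt: str, model: str = "davinci",
                             max_tokens: int = 64,
                             temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a text-completion request.
    The default model name is an assumption and may no longer be served."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,    # upper limit on generated tokens
        "temperature": temperature,  # higher values give more varied output
    }
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("sk-PLACEHOLDER", "Explain what a language model does in one sentence.")
print(req.get_method(), req.full_url)
# Actually sending it would need a valid key and network access:
# urllib.request.urlopen(req)
```

Keeping the model behind an API is what let OpenAI control who could use GPT-3 and how, in line with its caution about misuse.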


Nonetheless, GPT-3 has its limitations: it struggles with common-sense reasoning and can sometimes generate biased or offensive content. Experts noted that these issues are a reminder of the importance of responsible AI development and of the need for ongoing research.

MUM (T5)  

MUM, or Multitask Unified Model, was first introduced by Google in May 2021. Google claimed that MUM is 1,000 times more powerful than BERT. MUM combines several technologies to make Google searches more semantic and context-aware, with the aim of improving the user experience.


With MUM, Google aims to answer complex search queries for which a normal snippet on the SERP (search engine results page) is not sufficient. MUM is multimodal, so it understands information across text and images. It has 11 billion parameters and is trained across 75 different languages.

Other language models  

Several other language models were developed over the past few years. In August 2019, researchers at the University of North Carolina introduced LXMERT, a model that is pre-trained on large-scale tasks and then fine-tuned on target tasks. The same year, Microsoft invested $1 billion in OpenAI. Meanwhile, in July 2019, Facebook's AI division introduced RoBERTa, essentially an optimised version of BERT.

In March 2019, Chinese internet major Baidu developed ERNIE (Enhanced Representation through kNowledge IntEgration), a large-scale pre-trained language model with a capacity of 4.5 billion parameters. In December that year, Amazon Web Services (AWS) introduced DeepComposer, a generative AI service that composes music from a melody played on its companion keyboard.

In January 2021, OpenAI unveiled DALL-E, a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions.

In May that year, Google introduced LaMDA (Language Model for Dialogue Applications), a family of neural language models designed for dialogue. A few weeks later, GitHub announced Copilot, an AI pair programmer for writing code.

On April 4, 2022, Google AI announced the Pathways Language Model (PaLM) as part of its quest to create “a single model that could generalise across domains and tasks while being highly efficient”. PaLM scales to 540 billion parameters but, according to Google, was trained efficiently with Pathways, a new ML system that enables highly efficient training of very large neural networks across thousands of accelerator chips.
