International Cybersecurity Research Made in Hamburg

ChatGPT, Bing Search, Bard & Friends - Part I

Jantje Silomon

26 April 2023

Last November, OpenAI launched ChatGPT, a chatbot built on top of the GPT-3 Large Language Model (LLM) family. Its humanesque responses quickly garnered attention, from interviews and being hailed as "amazing, creative, and totally wrong" to calling it "dumber than you think". It did not take long for people to try to find and fix bugs with ChatGPT… or to create polymorphic malware. However, my favourite example is still ChatGPT being asked to write a biblical verse in the style of the King James Bible explaining how to remove a peanut butter sandwich from a VCR – yes, somebody really did that and the response was hilarious!

Five months on, it has become almost obligatory to have written something about chatbots, with ChatGPT moving from v3 to v4, Bing Search having several iterations, and the first close encounters of the third kind with Bard. So here I am, writing a blog post after all! There are a myriad of potential applications of this technology, and OpenAI releasing API access at the end of March has seemingly fuelled a new AI goldrush. There is a lot of discussion about the implications of such tech, from job security to disinformation and cyber security. Corporations, institutions, and governments alike are scrambling not only to understand the tech but also to get ahead in terms of regulation. Military applications are yet another ballpark.

Following a summary of what has happened so far, I will delve a tiny bit into LLMs and what makes this generative AI so… magical. No, that is still not real AI but I will not go down that rabbit hole right now; I will stick to using AI in the colloquial sense in this post! I will take a quick look at how the chatbots differ before exploring how it all affects cyber security issues.

Chatbot Recap Galore

Following ChatGPT’s release, Microsoft announced a new Bing Search in the first week of February. Coupled with the Edge browser, it was powered by AI – something that seemed similar to ChatGPT but ‘upgraded’ to have internet access. If you want to try out the new search, you have to join a waitlist and, of course, sign in with your account... and you are asked to set Microsoft defaults. I did not do that and still got access rather quickly, though I am reminded every time! Not to be outdone, Google responded by announcing Bard, while Opera added ChatGPT to its sidebar.

Those able to test the new Bing search early on came up with some interesting results. During one live-streamed test it managed to combine several searches and piece information together to calculate how many backpacks of make X fit into the trunk of car type Y (note: ads and language). In another example, the system was asked to pretend to be its ‘shadow self’, resulting in it “expressing a desire to steal nuclear codes, engineer a deadly pandemic, be human, be alive, hack computers, and spread lies”. Not really surprising given what it was asked to do! It also expressed some hostility, for example in a conversation with an engineering student who had previously tweeted about its vulnerabilities and codename.

However, Bing Search was not alone. ChatGPT had instances of hilarious (and horrifying) ‘AI hallucinations’, generating content that is nonsensical or unjustified given the training data, or possibly deciding to side with machines over humans. Users with access played around to their hearts’ content, trying to tease out secrets or circumvent other restrictions, revealing Bing’s unfinished 'secret modes', (happy) little accidents, but also ethical levers that still need adjusting. New constraints have since been put in place so humans stop confusing the system, or in other words, Bing’s search has been gagged. Chat sessions have since been extended and new features are still being added; current options include three response styles: ‘creative’, ‘balanced’, and ‘precise’.

Meanwhile, OpenAI has released ChatGPT-4 to the public, which (unlike its predecessor) can handle images and process roughly 25k words at once – eight times as many as ChatGPT-3. Apparently, it is also safer, smarter, more creative, more collaborative, more accurate, and has “broader general knowledge and problem solving abilities”… you get the idea!

Before moving on to the cyber security aspects, including, for example, chatbot malware creation and detection, it is time to delve a little into the workings of such systems.

An Extremely Brief LLM History

LLMs are essentially the engine that lets chatbots and related systems ‘understand’ and generate human-like text, so they can summarise, translate, and answer questions (amongst other things). Chatbots specialise in dealing with natural language tasks (using Natural Language Processing – NLP) and are trained on vast amounts of data. While that is nothing new, a major drawback has been their resource intensiveness coupled with a lack of speed, greatly hampering real-time applications. Traditionally, most NLP algorithms tend to examine only the immediate context of words, while LLMs address the bigger picture, looking at large swaths of text for context.
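To make the 'immediate context' point above a little more concrete, here is a minimal sketch of a bigram model, one of the simplest traditional approaches: it predicts the next word from the single preceding word only. The toy corpus and the function names train_bigram and predict_next are my own assumptions for illustration and are not taken from any particular library.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words directly follow it."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

# The model only ever looks one word back, so anything said earlier
# in the sentence cannot influence the prediction.
print(predict_next(model, "the"))
```

Because the prediction depends on one preceding word alone, such a model cannot tell whether "the" appeared in a sentence about pets or about furniture, which is exactly the kind of wider context LLMs are built to take into account.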

Early origins of NLP can be traced to Weaver’s 1949 memorandum on machine translation (PDF copy here) and Noam Chomsky’s thoughts on language, such as his Three models for the description of language. Research soon split into symbolic and stochastic groups, the former focussing on syntax generation and formal languages, the latter on stats and probabilities including, for example, pattern recognition. Two decades on, researchers split even further, and yes, that trend continued down the line, adjusting to and incorporating new technological advances. Skipping over decades of development spurts and breakthroughs but also ‘AI winters’: in 2011, Apple’s Siri became one of the first successful NLP assistants used by consumers.

Then in 2017, researchers released a paper detailing the original Transformer, an architecture built around attention mechanisms, which allow the language model to learn context in sequential data. As far as I understand it, these mechanisms work by allowing the model to focus on different parts of the input when making predictions, rather than treating the entire input as a single entity. Soon after, Google published BERT (Bidirectional Encoder Representations from Transformers) and released an open-source implementation and several pre-trained models. Its significantly more accurate understanding and generation of natural language was apparently a watershed moment for NLP. In 2019, Google announced that BERT powered Google Search, and since then LLMs have continued to evolve and improve.
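For the curious, the 'focus on different parts of the input' idea can be sketched in a few lines. This is a minimal, pure-Python illustration of scaled dot-product attention under simplifying assumptions: the toy token vectors are made up, the helper names softmax and attention are mine, and the learned projection matrices and multiple heads that real Transformers use are left out entirely.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How strongly does this query match each key?
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted mix of all value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy "token" vectors (an assumption for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: every token attends to every token, itself included.
outputs, weights = attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how much one token 'looks at' every token in the input. The first token, for instance, gives more weight to itself and to the third token than to the second, because its vector overlaps with theirs.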

In 2020, OpenAI launched GPT-3, a model with 175 billion parameters trained on a dataset of almost a trillion words, which was subsequently applied across a vast range of industries. A year later, Google released LaMDA (Language Model for Dialogue Applications), while Microsoft and NVIDIA introduced the Megatron-Turing Natural Language Generation model (MT-NLG) with 530 billion parameters.

Currently, GPT-3.5 and -4 are all the rage, with the acronym standing for Generative Pre-trained Transformer generations 3.5 and 4, respectively. There have been numerous rumours about what Bing Search runs on, and it was eventually confirmed to be GPT-4, tweaked to excel at searching. Bing clearly did not get the memo though: when I asked Bing about itself (yes, I actually did that over Easter), it replied – in creative mode – with:

“I am an AI language model developed by Microsoft Bing Search. My primary function is to provide helpful and informative responses to user queries. I am designed to understand and communicate fluently in multiple languages and to provide accurate and relevant information based on web searches. My language model differs from others in that it is specifically designed for use in a search engine context. This means that I am able to leverage the vast amount of information available on the web to provide accurate and up-to-date responses to user queries. My language model is not directly related to any other known models, but it does share some similarities with other large language models in terms of its architecture and capabilities.”

While those two are currently the most talked about, there are of course loads of other LLMs (or at least offshoots and close relatives) out there. Aside from the ones introduced earlier, popular ones include OpenAI’s generative art model DALL-E and Meta’s emotive BlenderBot; GODEL is a large-scale pre-training model for goal-directed dialog developed by Microsoft. A key difference between these models is their intended use – other examples include tweaking them for specific purposes and then selling the ‘end product’, such as writing blogs or ads, tutoring, reviewing or creating code. And yes, the list goes on, especially as model offshoots can be created extremely cheaply by now – a list of open-source ones here.

Read Part II here -->