Jakub Tutka | Jakub Tutka Portfolio

Where did artificial intelligence come from?

Artificial intelligence is today one of the fastest-growing fields of technology. Just a few years ago, few people knew exactly what AI was, and even fewer were interested in how it works. Meanwhile, this topic has already penetrated a significant portion of other industries, and in the future, likely also into our daily lives.

Although the concept of "artificial intelligence" itself appeared in the 1950s, the real breakthrough happened only recently – with the emergence of the so-called Large Language Models (LLM). The year 2018 brought two key events: Google presented the BERT model, which perfectly understood natural language, and OpenAI began working on its flagship series of GPT models, specializing in generating human-like text.

However, it was GPT-2 from 2019 that showed that larger models, powered by vast datasets, can achieve increasingly better results. The real revolution, however, proved to be GPT-3 – a model that not only impressed with its scale but also enabled learning new tasks based on user commands.

In 2022, ChatGPT was released – an interface based on GPT-3 and GPT-3.5, which enabled conversation with artificial intelligence in a free, easy-to-use chat interface. It was then that artificial intelligence reached a mass audience for the first time. Millions of people started using ChatGPT for writing texts, coding, language learning, or problem-solving.

Currently, hundreds of language models are available on the market, developed by companies such as Anthropic, Meta, Mistral, DeepSeek, Alibaba, and Google. Despite this, OpenAI and the GPT series still set the direction for the entire industry's development.

Given the current number of available models, choosing the right tool for our task is becoming increasingly important. In this article, I will focus on LLM models, chatbots, and text analysis.

Most important models

OpenAI GPT and "o" Models

GPT models are the most recognizable and influential artificial intelligence tools that have largely defined the current landscape of generative AI. The latest ones are omnimodal, meaning they can process text, images, sound, and even video – all within a single system.

Currently, models are divided into 2 main series

GPT series (currently GPT-4.1) are versatile general-purpose models. They are regularly improved and trained on vast datasets, and are excellent for conversations, translations, text summaries, and content creation.
The "o" series (e.g., o4-mini, o3) is a new generation of models designed for deep reasoning. They can analyze complex problems, break them down into parts, and independently assess the correctness of their own answers. They can be described as "thinking" models – they work slower but more precisely.

Key differences

GPT 1.1. Purpose – a universal and "conversational" model, designed to generate fluent, creative, and natural responses. It focuses on intuitive interaction with humans and understanding user intent. 1.2. Thinking process – it relies on patterns learned during training. It generates responses quickly and coherently but can sometimes "hallucinate", i.e., create information that sounds true but is incorrect. 1.3. Availability – available in both the free ChatGPT plan and via API.
"o" 2.1. Purpose – a model for tasks requiring precision, logic, and deeper thought. Excellent for programming, mathematics, learning, and analysis, where consistency and correctness of reasoning are key. 2.2. Thinking process – it uses the "chain-of-thought" reasoning technique – it analyzes the problem step by step, generates different variants, verifies them internally, and only then responds. This makes it significantly more accurate. 2.3. Availability – currently available only in paid plans (from $20 per month).

What distinguishes these models?

Naturalness – GPT models perfectly understand natural language and user intent.
Creativity – they are excellent for content creation: stories, scripts, emails, and even poetry.
Huge context window – GPT-4.1 supports over 1 million tokens, and "o" models up to 200 thousand, which allows for the analysis of very extensive documents, books, and other input data.

Limitations

GPT – great for conversations and writing, but less precise in analysis. They can invent data or misunderstand questions.
"o" models – significantly more accurate, but not available in the free plan.
Knowledge from 2024 – current models have a "knowledge cutoff" in January 2024, meaning they lack information about the latest events. However, some of them can use the internet, which can be helpful but is not always reliable

Summary

The GPT series initiated the era of generative artificial intelligence and remains its symbol.
Models are trained on gigantic datasets, which requires enormous computational resources and energy.
GPT models are "conversational", intuitive, and creative tools – ideal for the everyday user.
The "o" series initiated a new wave of AI – models that understand, analyze, and improve themselves.
The choice depends on needs: for casual conversations and writing – GPT; for complex analyses – "o" models.

Google Gemini

Gemini is Google's answer to OpenAI (GPT) and Anthropic (Claude) models – created with native multimodality and deep integration with Google services such as YouTube, Search, and Workspace in mind.

Although the Gemini model series debuted only in December 2023 (version 1.0), the revolution itself, on which today's LLMs are based, began much earlier – at Google. In 2017, a team of researchers from the company published the groundbreaking paper "Attention is All You Need", in which the Transformer architecture was presented for the first time. It was this architecture that became the foundation for models like BERT, GPT, Claude, and – indeed – Gemini.

Unlike GPT, which started as a text model, Gemini was designed from the outset as a multimodal model. This means it not only understands but also combines different data types – text, image, sound, and video – into one cohesive analysis.

<img src="https://firebasestorage.googleapis.com/v0/b/flagship-8de31.firebasestorage.app/o/blogs%2FQytUFpb68boQeP4LyNkIOoO7b4nqyj?alt=media&token=3b0fa48f-ff0f-4723-9b49-1a8a8745c37a"> Comparison of the latest Gemini 2.5 Pro model with other flagship models on the LMArena website

The latest Gemini 2.5 model was created in 2 versions

Pro - the most powerful version of the model, focused on response accuracy and state-of-the-art performance
Flash - incredibly fast, offering the most balanced capabilities

What distinguishes these models?

Huge context window – already from version 1.0, Gemini models offer an impressive context window – up to 1,048,576 tokens, which allows for analyzing tens of thousands of lines of code, entire books, and even hours of video recordings. This is an ideal solution for complex analyses and summaries of long content
Mixture-of-Experts (MoE) architecture – instead of running the entire model every time, MoE selects only a few specialized "experts" for a given query. The result is greater efficiency and faster responses with the same (or greater) precision
Response speed – thanks to the new architecture, Gemini models significantly outperform other flagship models
Access to the latest data – Gemini 2.5 has a knowledge cutoff in January 2025 and additionally integrates with Google Search, giving it access to up-to-date information – which distinguishes it from many competing models
Multimodality from the ground up - Gemini understands text, images, sound, and video, making it one of the most versatile models publicly available
Integrated YouTube – it is the only model that allows live analysis of YouTube videos from a pasted link
Deep reasoning – New versions (especially Gemini 2.0 Flash Thinking and 2.5 Pro) offer significantly better reasoning for complex instructions and improved logical and mathematical inference

Limitations

Hallucinations – like any LLM, Gemini can sometimes generate false information, although Google is constantly working to reduce this phenomenon in the latest versions.
Lack of specialization – the model is very versatile, but as a result, it is rarely the best in one specific field (e.g., coding or science).
Excessive caution – for safety reasons, Gemini can be too restrictive and may refuse to answer even harmless questions if it deems them potentially risky.

Anthropic Claude

Claude Sonnet is one of the most important new generation language models, developed by Anthropic – a startup founded in 2021 by former leading OpenAI researchers who decided to focus on safety, ethics, and transparency in creating artificial intelligence.

The first Claude models appeared in 2023, and they were distinguished by an approach called "Constitutional AI". Instead of relying solely on human evaluations (like RLHF), Claude operates based on a set of built-in principles – a "constitution" that guides it when generating responses. This approach was intended to minimize the risk of unethical or harmful answers without sacrificing quality.

Claude models are divided into 3 variants

Haiku – the fastest and lightest, ideal for customer service or chatbots.
Sonnet – a golden mean between power and speed, available for free on Claude.ai.
Opus – the top model for the most demanding tasks (available in a paid version).

What distinguishes these models?

Safety – Claude was designed with alignment with human values in mind. Its built-in "constitution" means the model is less likely to generate controversial or false content.
Advanced reasoning and analysis – Claude models are excellent at handling logical tasks, data analysis, document summarization, and understanding charts. Claude 4 Sonnet and Opus, in particular, are considered leaders in this field.
Proficiency in programming – Claude Sonnet 4 is currently considered one of the best models for coding – it outperforms competitors in tests such as HumanEval and SWE-Bench.
"Artifacts" feature – in the Claude.ai application, the model can generate and edit code or documents in a special panel next to the chat – a great tool for creative and iterative work.

Limitations

Smaller context window – only 200,000 tokens, which may be insufficient for analyzing large video files, books, or code repositories (for comparison: Gemini and GPT-4.1 have over 1 million tokens).
Limited multimodality – currently, Claude only supports text and images – without audio and video, which are already appearing in competing models
No search engine – Claude has no internet integration – after the data cutoff date, it does not know about the latest events.
Less creativity in some tasks – users indicate that in tasks requiring significant creativity, GPT models can sometimes offer more diverse and original responses.
Caution vs. creativity – thanks to ethical safeguards, Claude rarely generates risky content, but it can be too restrictive in tasks requiring role-playing or creating controversial characters.
"Dry" responses – compared to more creative models like GPT, responses can be more formal, factual, and less original.

Other important models

It's also worth mentioning other models, besides the "Big Three"

Meta Llama – very popular open-source models that compete with commercial solutions.
Mistral AI – the largest European competitor to the main players.
DeepSeek – a popular Chinese open-source model that took a very innovative approach to training and can compete with the largest companies in the world.

Use cases

In the world of artificial intelligence, there is no single "best" model – everything depends on what you need. In 2025, we are dealing with specialized systems, each designed with specific applications in mind.

Some models are excellent for creative tasks, while others are indispensable for data analysis or precise coding. Therefore, choosing the right tool is a crucial decision that can significantly increase the effectiveness of work, learning, or creativity.

To help you make an informed choice, I have prepared a brief comparison of the most important models:

GPT 1.1. Strengths – creativity, text writing, language learning, dialogues, idea generation 1.2. Ideal for – copywriters, teachers, students, people looking for inspiration 1.3. Differentiator – excellent at creating unique content and conducting natural conversations
Gemini 2.1. Strengths – huge context window (over 1M tokens), Google integration, YouTube video analysis, access to the latest information 2.2. Ideal for – researchers, analysts, students, translators, people working with large documents 2.3. Differentiator – the most versatile model – excels in both text and multimedia analysis
Claude 3.1. Strengths – data analysis, programming, mathematics, content safety 3.2. Ideal for – programmers, scientists, researchers, companies valuing ethical AI 3.3. Differentiator – precise reasoning, fewer errors, an approach aligned with human values
DeepSeek 4.1. Strengths – openness, low cost, good support for coding 4.2. Ideal for – developers, startups, small companies, open source projects 4.3. Differentiator – one of the best open-source models with access to a powerful API at a significantly lower cost than commercial counterparts

Summary

Genesis of AI – The article presents the evolution of artificial intelligence from its beginnings, through the groundbreaking Transformer architecture (2017), up to the explosion in popularity thanks to ChatGPT (2022).
Main Players – Three leading model families were analyzed: GPT from OpenAI (known for creativity and multimodality), Gemini from Google (distinguished by a huge context window and integration with Google services), and Claude from Anthropic (focused on safety, ethics, and precise reasoning).
Specialized Models – It was pointed out that besides the "big three", there are important open-source models such as Llama, Mistral, and DeepSeek, which offer competitive capabilities at lower costs.
There is no single "best" model – The main conclusion is that choosing the right AI tool depends on the specific task. GPT is ideal for content creation, Gemini for analyzing large datasets and multimedia, and Claude for precise coding and tasks requiring logic.

All articles Discuss this post

Overview of artificial intelligence models

Where did artificial intelligence come from?

Most important models

OpenAI GPT and "o" Models

Google Gemini

Anthropic Claude

Other important models

Use cases

Summary

Related links