
AI, LLMs, RAG, Neural Network in a Nutshell

Why and What


You are sipping a warm cup of coffee on a Wednesday morning and pondering how your work will be impacted by AI. You have been reading too many tech journals and too much media coverage of cutting-edge devices that seem straight out of a sci-fi movie. Let me tell you, the real, implementable solutions for actual productive work have not yet been realized.

All you are acquainted with right now is interacting with LLM chatbots that have a massive volume of understanding. It may look like a “Search engine on Steroids”— but it is actually more like “Matrix Multiplication on Steroids”.


Parameters


The appeal of Generative AI is its ability to generate something new that did not exist before. The question worth asking is, “Is the generated output useful for you?” The answer depends largely on how many billion parameters your model has and how well it is trained for your specific use case.

Now, let me show you a simple example: if you tell an LLM with around 20 B (billion) parameters to create a full ERP implementation of your existing SAP® systems, then you have probably gotten the AI equation wrong. Both in relationships and LLM prompt engineering, keeping expectations realistic is key.


RAG (Retrieval-Augmented Generation)


Let’s try one more scenario. You have a large inventory dataset, and you want to find certain information. You don’t want to look into serial numbers that look like SSH key mumbo-jumbo, matching each character one by one, figuring out the product information, and checking its availability and supplier details. Aha! This is the right use case for our LLM friend, but with a twist.

Because LLMs aren’t databases, asking them to memorize millions of serial numbers will just make them hallucinate fake ones. Instead, we use a clever trick called RAG (Retrieval-Augmented Generation). You let the LLM act as the “brain.” It writes a query, a tool runs that query against your database to fetch the exact product ID and the linked supplier info, and then the LLM explains the result to you in plain English. Comparing the inventory catalog and linking information are the types of use cases it is well-equipped to handle when paired with your tools. It’s more like a Math Olympiad winner who is strong in logic; as a rule of thumb, the more parameters it has, the better it can reason.
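To make the RAG idea concrete, here is a minimal sketch of the first two steps (retrieve, then augment the prompt). Everything in it is an assumption for illustration: the inventory records, product IDs, supplier names, and the helper names retrieve and build_prompt are all made up, and the final call to an actual LLM is left out.

```python
# Hypothetical inventory "database": every ID, product and supplier is invented.
INVENTORY = {
    "A7F3-9K2L": {"product": "Hydraulic pump", "in_stock": 14, "supplier": "Acme Pumps"},
    "Q1Z8-4M0X": {"product": "Pressure valve", "in_stock": 0, "supplier": "Valvo GmbH"},
}


def retrieve(product_id):
    """Retrieval step: an exact lookup in the database, never a guess."""
    return INVENTORY.get(product_id)


def build_prompt(question, product_id):
    """Augmentation step: put the retrieved facts in front of the question."""
    record = retrieve(product_id)
    if record is None:
        context = f"No record found for product ID {product_id}."
    else:
        context = (
            f"Product ID {product_id}: {record['product']}, "
            f"{record['in_stock']} in stock, supplier {record['supplier']}."
        )
    return f"Answer using only this context.\nContext: {context}\nQuestion: {question}"


prompt = build_prompt("Is the pump available, and who supplies it?", "A7F3-9K2L")
print(prompt)  # this text would then be sent to the LLM for the generation step
```

The point of the design is that the serial number never has to live inside the model. The database supplies the exact facts, and the LLM only has to read them and phrase the answer.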


Neural Network


Now comes the part about how you will use it. Does it require special hardware to run, or can our same old PC or Mac run this new-age Machine? The answer depends on how well you want The Machine (the LLM) to run. Inside, it is packed with neural network layers. These layers hold what the model has learned in the form of math formulas, and each time it learns something new, it changes those internal formulas. This adaptation is called training. To be specific, the neural network is stored as numbers, and these numbers are called weights. Over time, The Machine (the LLM) adjusts these numbers and makes itself better.

Think of it as many layers of autocorrect. Each layer performs a specific activity on your input. Or think of it as a big factory: the raw material, which is your prompt, enters, and at each step along the way it is transformed into something different. The final product of the factory is your output.
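Here is a toy sketch of both ideas: layers that transform an input using weights, and training that nudges a weight until the output improves. It is nowhere near a real LLM. The weights, the two-layer shape, and the single-weight training loop (plain gradient descent on a squared error) are assumptions chosen to keep the example small.

```python
def layer(inputs, weights, biases):
    """One layer: a weighted sum per neuron, then ReLU (negatives become 0)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs


# Made-up weights: in a real model these numbers come out of training.
W1 = [[0.5, 1.0], [1.0, -1.0]]
B1 = [0.0, 0.5]
W2 = [[1.0, 2.0]]
B2 = [0.5]


def forward(inputs):
    """The factory line: the input passes through one layer after another."""
    hidden = layer(inputs, W1, B1)
    return layer(hidden, W2, B2)


print(forward([2.0, 1.0]))  # [5.5]


def train_single_weight(examples, weight=0.0, learning_rate=0.1, steps=50):
    """Training in miniature: nudge one weight to shrink the prediction error."""
    for _ in range(steps):
        for x, target in examples:
            error = weight * x - target
            weight -= learning_rate * error * x  # gradient of 0.5 * error**2
    return weight


# The examples follow target = 2 * x, so the weight should end up close to 2.
learned = train_single_weight([(1.0, 2.0), (2.0, 4.0)])
print(round(learned, 4))  # 2.0
```

A real LLM does the same two things, just with billions of weights and far more layers.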


GPU (Graphics Processing Unit)


When so much activity is carried out, it overwhelms your regular CPU and memory. If you need it done at a reasonable speed so it feels more like standard computer usage rather than a slow science experiment, you will need special hardware that can carry out these massive operations very fast. We call this special hardware a GPU. Yes, the exact same GPU you use to play your favorite Need for Speed.

This special hardware can carry out the many multiplications and additions inside a matrix multiplication in parallel, and matrix multiplication is the core operation that moves data through the layers of an LLM. This hardware is costly, and you need to procure it before you can run your LLMs locally at a reasonable speed.
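Why does matrix multiplication parallelize so well? The sketch below uses plain Python lists (an assumption made for readability; real code would use a GPU library) to show that each cell of the result depends only on one row of the first matrix and one column of the second. No cell waits on any other, so a GPU can compute thousands of them at once. The matrices are made-up numbers.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Each cell needs only row i of a and column j of b, so all
            # rows * cols cells could be computed at the same time.
            result[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return result


activations = [[1.0, 2.0], [3.0, 4.0]]            # two inputs, two values each
weights = [[0.5, 0.0, 1.0], [1.0, 2.0, 0.0]]      # a layer with three outputs
print(matmul(activations, weights))  # [[2.5, 4.0, 1.0], [5.5, 8.0, 3.0]]
```

On a CPU the two loops run one cell after another. On a GPU each cell can be handed to its own thread, which is where the speed-up comes from.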

Now, the question that arises is: How do you know which LLMs to use and what kind of GPU you need?

Glad you are paying attention and asking these smart questions, which is keeping the soul of our conversation alive!

Now, where were we regarding The Machine?

Yes, models can serve different purposes. If you want image generation, the model needs to be trained on a massive dataset of images. If you want it to generate nursery rhymes, obviously, it needs to be trained on nursery rhymes. You get the point: it needs to be trained; otherwise, how on earth could it figure out what is expected of it?


Popular LLMs


For software development, some of the best models can be found on Ollama and Hugging Face. These are wonderful hubs where you can get some highly capable LLMs for free, or to be specific, under open-source or open-weight licenses.

Some fantastic open-weight models include Meta’s Llama 3, Google’s Gemma or OpenAI’s GPT-OSS.
You download these LLMs, and based on your GPU’s VRAM and capabilities, you pick a size.


LLMs Common Concepts

What is VRAM?

VRAM is the memory on your graphics card, and it is where these LLMs are loaded in order to run. Just like your computer’s RAM, VRAM holds data for quick access, but it is built to feed the GPU’s many parallel cores at very high speed. So, to run or train an LLM on a GPU, you need enough VRAM to hold it.
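A rough way to estimate how much VRAM a model needs is to multiply the number of parameters by the bytes used to store each one. The sketch below does only that. It is a back-of-the-envelope assumption, not a vendor formula: it ignores the extra memory needed for the context and intermediate results, and real 4-bit formats use slightly more than 4 bits per weight. The 35-billion-parameter model is a hypothetical example.

```python
def weights_size_gb(parameters_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    total_bytes = parameters_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_vram(parameters_billion, bits_per_weight, vram_gb):
    """True if the weights alone fit; real usage needs extra headroom."""
    return weights_size_gb(parameters_billion, bits_per_weight) <= vram_gb


# A hypothetical 35-billion-parameter model on a 20 GB graphics card.
print(weights_size_gb(35, 16))   # 70.0 (16-bit weights)
print(weights_size_gb(35, 4))    # 17.5 (4-bit weights)
print(fits_in_vram(35, 16, 20))  # False
print(fits_in_vram(35, 4, 20))   # True
```

This is why the number of bits per weight matters as much as the parameter count when you pick a model for your card.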



Your LLM needs 70 GB of VRAM, and you only have 20 GB?

No problem. You pick a quantized version of the same LLM (like a Q4 quantization), so you get decent quality with massive memory savings. Quantization is your best bet if you want to run these Large Language Models on a modest hardware setup. Think of it as a high-definition (HD) video in raw format, which is large, that you compress into a much smaller file with only a small, often barely noticeable, loss in quality. The same holds for LLMs: quantization stores each weight with fewer bits, so the model shrinks a lot while the quality drops only slightly.
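To see what quantization does to the weights, here is a deliberately simplified sketch: symmetric round-to-nearest quantization to 4-bit integers with one shared scale factor. This is an illustration of the idea, not the scheme any particular Q4 file format uses; real formats typically split the weights into small blocks with a scale per block. The weight values are made up.

```python
def quantize_4bit(weights):
    """Map floats to integers in -7..7 plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # 1.0 guards all-zero input
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the small integers."""
    return [c * scale for c in codes]


original = [0.82, -0.41, 0.05, -0.97, 0.33]  # made-up weights
codes, scale = quantize_4bit(original)
restored = dequantize(codes, scale)
worst_error = max(abs(o - r) for o, r in zip(original, restored))

print(codes)        # small integers that fit in 4 bits each
print(restored)     # close to the original values, but not identical
print(worst_error)  # never more than half of one quantization step
```

Each weight now needs 4 bits instead of 16 or 32, at the cost of a small rounding error per weight. That rounding error is the "small loss in quality" you trade for the memory savings.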


What if you don’t have a graphics card?

Oh, no problem! Let’s use a cloud LLM. Honestly, running a local GPU takes a lot of energy and effort anyway. Here are some of the best cloud options:
ChatGPT, Google Gemini, or Anthropic’s Claude. These are massive models, reportedly with hundreds of billions of parameters, although the vendors do not publish exact counts. You don’t need to worry about the hardware setup, as these come with subscription or API-based usage.

The main concern here for businesses is privacy. If you use the free consumer chatbots, your interactions might be used to improve the models. However, the enterprise and API offerings typically commit to not training on your data by default, so check each provider’s terms before you sign up. With those terms in place, they won’t train on your company secrets, copy your work, or publish it, meaning you still get to keep the appreciation award you deserve!



Wrapping Up

So, what have we learned in this section?

  • Why we need AI (The Machine).
  • What is behind this AI (LLMs, or Large Language Models).
  • How they are built internally (Parameters & Weights).
  • Why they need a special hardware setup (Graphics Cards / GPUs).

If you want to learn more on the subject, there is a highly recommended course called “Learn AI Development for SAP® Developers,” which demonstrates all of this in a hands-on way.