Malware Obfuscation (XOR)
Hi folks, havox here. I have planned to start a malware series, and this is the first post in it. Here we are going to learn malware techniques, reverse engineering, and detection techniques.
Obfuscation means making something harder to understand, or deliberately confusing.
Warning: This blog discusses real-world malware obfuscation techniques, including code examples. Use this knowledge only for ethical security research, reverse engineering, or defensive purposes. Never use these methods to create or distribute malicious software.
Intro
This is the first post of a series about the development of malicious software. In this series we will explore and try to implement multiple techniques used by malicious applications to execute code, hide from defenses and persist. Let’s create a C++ application that will run malicious shellcode while trying not to be caught by AV software. Why C++ and not C# or a PowerShell script? Because it is much more difficult to analyze a compiled binary than managed code or a script.
The proof of concept code for this can be found on GitHub at . If you like my project and post, please feel free to give the repo a star! :)
Today I am writing this complete blog on obfuscation. I will start from the basic meaning, then cover many different types with a proper explanation for each one, and then go deep into the three main encryption techniques that I always focus on: XOR, RC4 and AES. I will give clear working examples for each and also explain exactly how to detect them in real samples. Everything is explained step by step in simple words, like I talk to my friends. No shortcuts, everything expanded.
What Does Obfuscation Mean?
Obfuscation is basically the art of making code or data look messy and hard to understand on purpose. In normal programming we write clean code so that other developers can read it easily. But in the malware world or when someone wants to protect their secret stuff, they deliberately turn that clean code into something that looks like garbage. The real logic is still there but hidden behind layers so that antivirus engines, security researchers or even other hackers cannot figure out what is happening just by looking at the binary.
Think of it like this: you have a secret message “download malware from this link”. Instead of keeping it as plain text, you scramble it so that in the exe file it shows only random characters and weird symbols. The program itself knows how to unscramble it at runtime and use the real message. This way a signature based antivirus that matches on strings fails, because there is no clear string to match. Developers also use obfuscation sometimes to hide API keys or license checks in their commercial software. But in my experience, when you see heavy obfuscation in a binary, most of the time it is malware trying to stay alive longer. The main goals are always evasion and making analysis take more time; packers also shrink the file as a side effect.
Common Types of Obfuscation
There are so many ways people do obfuscation and malware authors keep inventing new combinations every month. Let me explain each major type clearly so you understand when you see them in real samples.
Renaming obfuscation is the simplest and most common starting point. Here the author changes all variable names, function names and class names to random garbage like a1(), b2c3x(), or even single letters mixed with numbers. All comments are removed and the code formatting is destroyed. It does not change how the program runs but it makes the decompiled code almost impossible to read quickly. You open it in Ghidra and every function looks meaningless.
Control flow obfuscation is next level and it messes with the program logic itself. Normal if-else or while loops are broken into hundreds of small jumps and fake conditions. They add switch statements that never get used or insert goto-like jumps that confuse the control flow graph. The program still does the same thing but the flow looks like a spider web. Malware loves this because automated analysis tools get lost in the fake paths.
Junk code insertion means adding tons of useless instructions that do nothing. You will see calculations that are never used, loops that run but the result is discarded, or register pushes and pops that cancel each other. These junk parts increase the file size and waste the analyst’s time while the real malicious code stays hidden in between.
Packing and compression is very popular in droppers and ransomware. The entire executable is compressed with tools like UPX, ASPack, or custom packers. When you open the file in a hex editor you see only a small unpacking stub followed by a blob of compressed data. At runtime that stub unpacks the real code into memory and runs it. This hides all strings and functions until execution.
String obfuscation and encoding is something you will see in almost every sample. Instead of keeping plain strings like URLs or registry keys, they split the string into pieces and join them at runtime, or they apply simple encoding like Base64, ROT13, or custom character replacement. Sometimes they even store strings as an array of numbers and convert them back using a loop. This way static string tools like strings.exe show almost nothing useful.
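The four tricks just described can be sketched in a few lines of Python. This is only an illustration: the string `PLAIN` is a harmless placeholder I picked, and the variable names are made up for the example.

```python
import base64
import codecs

# Harmless placeholder string, chosen only for illustration.
PLAIN = "http://example.com/config"

# 1) Split into pieces and joined back at runtime.
pieces = ["http://", "exam", "ple.com", "/config"]
joined = "".join(pieces)

# 2) Base64 encoding and decoding.
b64 = base64.b64encode(PLAIN.encode()).decode()
from_b64 = base64.b64decode(b64).decode()

# 3) ROT13 (only letters move; digits and punctuation stay as they are).
rot = codecs.encode(PLAIN, "rot_13")
from_rot = codecs.decode(rot, "rot_13")

# 4) Array of numbers converted back to characters in a loop.
numbers = [ord(c) for c in PLAIN]
from_numbers = "".join(chr(n) for n in numbers)

print(b64)
print(rot)
print(numbers[:7])
```

None of the stored forms contains the plain URL, but each one is reversed with a single call or a tiny loop, which is why these encodings slow down a strings check and nothing more.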
Polymorphic obfuscation is where the malware changes its appearance every time it spreads. The core logic stays the same but the decryptor or stub code is generated differently for each victim using random keys or random junk instructions. This breaks hash based and simple byte-signature detection because no two samples look the same on disk, although the decrypted body in memory is still identical.
Metamorphic obfuscation is even stronger than polymorphic. Here the entire malware body rewrites itself every time it runs or spreads. It changes instructions, reorders functions, adds or removes junk, so basically the whole binary becomes different while doing exactly the same job. This is rare because it is very hard to code, so you mostly see it in the work of highly skilled authors.
Code virtualization turns the original instructions into a custom virtual machine bytecode. The malware carries its own tiny VM interpreter and the real malicious code runs inside that VM. When you disassemble it you see only the VM loop and hundreds of custom opcodes. This is one of the hardest to reverse because you have to understand the VM first before you can see the real logic.
Binary padding and junk data addition is a cheap trick but still effective. They add huge blocks of random data or zero bytes at the end of sections to increase file size. Many quick scanners skip large files or certain entropy ranges so this helps the malware slip through.
All these types are often mixed together. A single malware can have packing + control flow + string encoding + XOR on top. Now let me go deep into the encryption based obfuscation, because this is the part I am working on right now and these three are used in a large share of advanced samples.
XOR Obfuscation
XOR is the king of simple encryption obfuscation. It is a bitwise operation where each byte of data is XORed with a key. The beauty is that doing the same operation again with the same key brings back the original data because A XOR B XOR B equals A again. Malware writers love it because it needs almost zero code, runs super fast, and no external libraries are required.
Usually they use a single byte key like 0xAA or 0x55 but sometimes they make it rolling – the key changes after every byte using addition or multiplication. You will see a small loop in the binary that takes a buffer and does XOR on each byte. In many ransomware samples the config or the C2 address is hidden with XOR.
Here is a clear Python example that shows exactly how it works:
Python
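A minimal sketch of the idea, assuming a harmless placeholder string and the 0xAA key mentioned above. The function names `xor_single` and `xor_rolling` are made up for illustration, and the rolling variant assumes the key grows by a fixed step after every byte, which is only one of many possible schemes.

```python
def xor_single(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key."""
    return bytes(b ^ key for b in data)


def xor_rolling(data: bytes, key: int, step: int = 1) -> bytes:
    """XOR with a key that is increased by `step` (mod 256) after each byte."""
    out = bytearray()
    for b in data:
        out.append(b ^ key)
        key = (key + step) & 0xFF
    return bytes(out)


secret = b"http://example.com/config"  # harmless placeholder

encoded = xor_single(secret, 0xAA)
decoded = xor_single(encoded, 0xAA)     # same call again restores the data

rolled = xor_rolling(secret, 0x55, step=3)
unrolled = xor_rolling(rolled, 0x55, step=3)

print(encoded.hex())
print(decoded)
print(unrolled)
```

Because the rolling key depends only on the position and not on the data, the same function both encodes and decodes.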
In real malware this loop is written in assembly with just a few instructions and runs on the entire .data section or a specific buffer. Variations include multi-byte XOR where they use 4-byte or 8-byte keys repeating.
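The multi-byte variation can be sketched like this. The 4-byte key and the buffer are invented for the example; the point it shows is that zero bytes XORed with a key give back the key itself, so padding inside an encoded buffer leaks a repeating key.

```python
from itertools import cycle


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data with a multi-byte key that repeats over the whole buffer."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = bytes.fromhex("deadbeef")          # hypothetical 4-byte key
buffer = b"settings" + b"\x00" * 8       # 8 text bytes + 8 bytes of zero padding

encoded = xor_repeating(buffer, key)

# 0 XOR k == k, so the padded tail is just the key repeated twice.
print(encoded[8:].hex())
print(xor_repeating(encoded, key))
```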
How to Detect XOR Obfuscation
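Detection is actually quite easy if you know what to look for. First calculate the entropy of the sections, but be careful how you read it. A single-byte XOR only relabels byte values, so it does not raise entropy at all: XORed text keeps the same low entropy as the original text. Entropy above about 7.0 points to packing, compression or a real cipher, while normal code is usually around 5-6. Use Detect It Easy (DIE) or PE-bear for a quick check. Then open the file in a hex editor and look for repeating patterns. Zero padding XORed with a key shows up as the key itself, so a long run of one unusual byte suggests a single-byte key and a pattern that repeats every N bytes suggests an N-byte key.
The best tool is Didier Stevens XORSearch. Just run “xorsearch.exe sample.exe http” and it will brute force all 256 XOR keys (it also tries ROL, ROT and SHIFT encodings) and report the key and offset wherever the string is found; the -s option, placed before the file name, saves the decoded file. In a disassembler you will see a very small loop with an XOR instruction on a register or memory byte. Dynamic analysis in x64dbg also works great: set a breakpoint on memory write and watch the buffer getting decrypted.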
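To make the entropy check concrete, here is a small sketch of Shannon entropy in bits per byte. The sample buffers are assumptions for the demo: repeated English-like text, the same text XORed with one byte, and random bytes standing in for strongly encrypted or compressed data.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, 8.0 at most)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


text = b"The quick brown fox jumps over the lazy dog. " * 200
xored = bytes(b ^ 0x55 for b in text)    # single-byte XOR of the same text
random_blob = os.urandom(65536)          # stand-in for encrypted or packed data

print(round(shannon_entropy(text), 3))
print(round(shannon_entropy(xored), 3))
print(round(shannon_entropy(random_blob), 3))
```

The first two numbers come out the same, because a one-byte XOR only renames byte values and keeps their frequencies. The random buffer lands very close to 8, which is the kind of value that packed or properly encrypted sections show.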
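The brute force idea is simple enough to sketch by hand. This is not XORSearch's actual code, only a toy version of the same search: try every one-byte key and report where a known string appears. The function name `xor_search` and the test buffer are made up for the example.

```python
def xor_search(data: bytes, needle: bytes) -> list:
    """Try all 256 one-byte XOR keys; return (key, offset) for every hit."""
    hits = []
    for key in range(256):
        decoded = bytes(b ^ key for b in data)
        offset = decoded.find(needle)
        if offset != -1:
            hits.append((key, offset))
    return hits


# Hypothetical encoded buffer: 4 filler bytes plus a placeholder URL, XORed with 0x5A.
plain = b"AAAA" + b"http://example.com/a"
blob = bytes(b ^ 0x5A for b in plain)

for key, offset in xor_search(blob, b"http"):
    print(f"key=0x{key:02X} offset={offset}")
```

In practice you search for strings you expect in any sample, such as "http", "This program" from the DOS stub, or common API names.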
RC4 Obfuscation
RC4 is a stream cipher and it is the next step after XOR. It uses a key (1 to 256 bytes long) to initialize a 256-byte S-box through the Key Scheduling Algorithm (KSA), then generates a pseudo-random keystream byte by byte and XORs the plaintext with that stream. It is still fast and the code size is small, so lots of banking trojans and RATs use it to hide their configuration or communication data.
The implementation has two main parts – first the KSA that shuffles the S-box using the key, then the PRGA that keeps generating the keystream. You can find many samples where the key is a hardcoded string like “MySuperSecretKey2025”.
Here is the full working example in Python so you can test it yourself:
Python
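A compact sketch of textbook RC4, with the KSA and PRGA parts described above kept as two separate loops. The function name `rc4` is my own choice, and the key and message in the demo are the widely published "Key" / "Plaintext" test pair, not anything taken from a sample.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: the same call encrypts and decrypts."""
    # KSA: start with 0..255 and shuffle it using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # PRGA: produce one keystream byte per data byte and XOR it in.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4(b"Key", ciphertext))
```

For this key and message the well known result is bbf316e8d940af0ad3, which is a handy check that an implementation (or a function you suspect is RC4 in a sample) behaves like the real thing.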
How to Detect RC4 Obfuscation
In static analysis look for the classic 256-byte array initialization from 0 to 255 followed by a swapping loop – that is the KSA signature. In IDA or Ghidra you will see a loop that does exactly 256 iterations with addition and swap. Many YARA rules exist for this pattern. The key is often visible as a string or byte array nearby. During dynamic run in sandbox you can dump the memory after the RC4 function runs and see the clean strings appear.
AES Obfuscation
AES is the strongest and most professional choice among these three. It is a block cipher that works on 16-byte blocks with 10, 12 or 14 rounds depending on key size (128/192/256 bit). It uses S-box substitution, ShiftRows, MixColumns and AddRoundKey operations. Malware uses AES when they want real security – for example to encrypt the full payload, ransomware files, or to protect C2 communication.
Because AES needs bigger code, many samples just call Windows CryptoAPI functions like CryptEncrypt instead of implementing it from scratch. When the cipher is implemented by hand or statically linked, you will see big constant tables (the AES S-box starts with bytes 63 7C 77 7B etc.) somewhere in the binary.
Here is a simple AES example in Python (in real malware it is either CryptoAPI or a custom implementation):
Python
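Python's standard library does not ship AES, so the sketch below implements AES-128 for a single 16-byte block in plain Python to show where SubBytes, ShiftRows, MixColumns and AddRoundKey sit. It is a teaching sketch under those assumptions: it is slow, has no mode of operation or padding, and every function name in it is made up for illustration. Real code should use a vetted crypto library.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list:
    """Derive the S-box: multiplicative inverse, then the affine transform."""
    sbox = [0x63] * 256                      # 0 has no inverse, S(0) = 0x63
    for x in range(1, 256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        value = inv
        for shift in range(1, 5):            # XOR with four left rotations
            value ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = value ^ 0x63
    return sbox


SBOX = build_sbox()


def expand_key(key: bytes) -> list:
    """AES-128 key schedule: returns 11 round keys of 16 bytes each."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]       # RotWord
            temp = [SBOX[b] for b in temp]   # SubWord
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[r * 4:r * 4 + 4], []) for r in range(11)]


def mix_columns(state: list) -> list:
    """Multiply each 4-byte column by the fixed 02 03 01 01 matrix."""
    out = []
    for c in range(0, 16, 4):
        col = state[c:c + 4]
        for r in range(4):
            out.append(gf_mul(col[r], 2) ^ gf_mul(col[(r + 1) % 4], 3)
                       ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
    return out


def encrypt_block(block: bytes, key: bytes) -> bytes:
    """Encrypt one 16-byte block with a 16-byte key (10 rounds)."""
    round_keys = expand_key(key)
    state = [b ^ k for b, k in zip(block, round_keys[0])]
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]                            # SubBytes
        state = [state[(i + 4 * (i % 4)) % 16] for i in range(16)]  # ShiftRows
        if rnd != 10:
            state = mix_columns(state)                              # MixColumns
        state = [b ^ k for b, k in zip(state, round_keys[rnd])]     # AddRoundKey
    return bytes(state)


key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
block = bytes.fromhex("00112233445566778899aabbccddeeff")
print(encrypt_block(block, key).hex())
print(bytes(SBOX[:4]).hex())
```

The key and block in the demo are the example vector from FIPS-197 Appendix C.1, whose listed ciphertext is 69c4e0d86a7b0430d8cdb78070b4c55a. The generated table starts with 63 7c 77 7b, the same constant bytes that show up in binaries carrying their own AES.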
How to Detect AES Obfuscation
The fixed AES S-box table (256 specific bytes) is a dead giveaway when the cipher is compiled into the binary: just search for the byte sequence 63 7C 77 7B F2. Round constants or the MixColumns matrix (02 03 01 01 etc.) are also good markers. Tools like capa, or YARA rules that look for crypto constants, often flag it automatically. If the sample uses CryptoAPI instead, there is no table to find, so check the imports and in dynamic analysis watch for calls to CryptEncrypt or large memory allocations followed by data transformation.
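The byte search can be sketched in a few lines. The first 16 S-box bytes are the standard ones; the function name `find_pattern` and the fake binary in the demo are assumptions for illustration.

```python
# First row of the standard AES S-box.
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")


def find_pattern(data: bytes, pattern: bytes) -> list:
    """Return every offset where `pattern` occurs in `data`."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


# Hypothetical stand-in for a binary: filler, then the table prefix, then filler.
fake_binary = b"\x90" * 64 + AES_SBOX_PREFIX + b"\x00" * 32

print(find_pattern(fake_binary, AES_SBOX_PREFIX))
print(find_pattern(fake_binary, AES_SBOX_PREFIX[:5]))
```

On a real file you would read the bytes with `open(path, "rb").read()` and pass them in the same way. A miss does not prove AES is absent, since the table can be generated at runtime, stored encoded, or replaced by AES-NI instructions.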
General Detection Techniques That Work on All Types
No matter which obfuscation is used, always start with an entropy check on all PE sections. High entropy sections almost always contain encrypted or packed data, but remember that simple XOR does not raise entropy, so also check the strings. Use FLOSS or strings with -n 8 to see if readable strings are missing. Run the sample in ANY.RUN or the Hybrid Analysis sandbox and monitor memory for the sudden appearance of clean URLs and PE headers, because that is the moment decryption happened. Tools like PEiD, DIE, and capa give instant hints. If you are deep into reversing, then manual unpacking in x64dbg, dumping the process with Scylla and re-analyzing the dump is the ultimate way.
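The "are readable strings missing" check can be sketched as a tiny strings-style extractor. This only mimics the idea of strings with a minimum length of 8 for ASCII text; the function name and both buffers are made up for the example.

```python
import re


def ascii_strings(data: bytes, min_len: int = 8) -> list:
    """Return runs of printable ASCII that are at least `min_len` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical buffers: one with a plain placeholder URL, one XORed with 0xAA.
clear = b"\x00\x00" + b"http://example.com/config" + b"\x00" + b"short" + b"\x00"
hidden = bytes(b ^ 0xAA for b in clear)

print(ascii_strings(clear))
print(ascii_strings(hidden))
```

The clear buffer gives back the URL, while the XORed copy gives an empty list. A binary that imports networking APIs but shows no URLs, hostnames or paths at all is a strong hint that its strings are encoded.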
Obfuscation is basically a cat and mouse game that never ends. Malware authors keep making it stronger and analysts keep finding new ways to break it. Once you understand these patterns and practice on real samples, you will start seeing through the garbage very quickly. I am still working on more parts of this series so if you have any specific sample or want me to add screenshots or more examples just tell me.