Finding Sources of Malware Samples

A malware analysis lab is only useful if you have samples to study. One part of my lab setup I am working on is expanding where I find malware samples, so that I see more interesting techniques and a more diverse set of samples.

1. Malware-Traffic-Analysis

Malware Traffic Analysis is a site that distributes malware samples, packet captures of the malware’s network traffic, and information about what type of malware it is. Usually I’ll download just the pcap file and try to find and extract any files from it that I can. Using those files, I’ll look for other stages of the malware, what it’s trying to accomplish on the system, and any indicators of compromise (IOCs).
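
File hashes are among the simplest IOCs to record once files have been carved out of a pcap. The following is a minimal sketch, assuming the extracted files have already been saved into a single directory by whatever extraction tool was used; the function names and the directory layout are hypothetical and only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large samples are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_extracted_files(directory):
    """Map each regular file directly inside `directory` to its SHA-256 hex digest."""
    return {
        path.name: sha256_of_file(path)
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

The resulting hashes can be noted alongside the domains and IP addresses seen in the capture, and searched for on sample-sharing sites without ever opening the files.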

2. MalwareBazaar

MalwareBazaar is a website similar to VirusTotal that allows people to upload files to check if they are malicious and identify what family they are from. Unlike VirusTotal, this site allows anybody to download samples for free.

The first way I use this site is to find a specific type of malware that I want to reverse. I first found this site while looking for a sample of the backdoor used in the SolarWinds attack. Using hashes from FireEye’s blog post, I was able to find the .dll of the software containing the backdoor just a few days after the blog post was released. If you don’t have hashes for a sample you are looking for, MalwareBazaar also does a good job of tagging the samples it can identify, making it easy to find multiple pieces of malware for a given family.
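
Both kinds of lookup, by hash and by tag, can also be scripted. The sketch below only builds the requests and does not send them. The endpoint URL, the form fields (`query`, `hash`, `tag`, `limit`), the query names (`get_info`, `get_taginfo`), and the `Auth-Key` header reflect my understanding of the abuse.ch API and should be treated as assumptions to verify against the current MalwareBazaar documentation. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the current MalwareBazaar API documentation.
API_URL = "https://mb-api.abuse.ch/api/v1/"


def build_query(fields, auth_key):
    """Build (but do not send) a form-encoded POST request for the API."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(API_URL, data=body)
    # Assumption: the API expects the caller's key in an Auth-Key header.
    request.add_header("Auth-Key", auth_key)
    return request


def hash_lookup(sha256_hash, auth_key):
    """Request metadata for one sample identified by its hash."""
    return build_query({"query": "get_info", "hash": sha256_hash}, auth_key)


def tag_lookup(tag, auth_key, limit=50):
    """Request recent samples carrying a given tag, such as a family name."""
    return build_query({"query": "get_taginfo", "tag": tag, "limit": limit}, auth_key)
```

Passing the returned object to `urllib.request.urlopen` would send the request; the response is JSON and can be parsed with the `json` module.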

The other way I use this site is to browse through the newly submitted files to find unidentified samples. This is good practice for starting with an unidentified file and dissecting it to see what it does and whether it can be classified as an existing type of malware. With a site like Malware Traffic Analysis, I found I knew too much about the malware before starting to reverse engineer it, so I had specific functionality I would look out for. It’s more realistic to start with nothing and try to find IOCs for the sample. Not every file I pull from this site is a good malware sample. Often it is just a buggy program someone wrote for nonmalicious purposes, but I also find weird, interesting files that are fun to look at for a while even if they aren’t good malware.

3. CTFs

CTFs, or capture the flag competitions, are challenges for practicing different cyber security skills. These challenges cover a large range of topics including web security, cryptography, reverse engineering, and open source intelligence. The best reverse engineering CTF I have done so far is Flare-On, made by FireEye.

4. Pastebin

Several years ago I bought a lifetime Pastebin Pro subscription that gave me access to the scraping API. This gave me the ability to scan every new public text post, compare it to Yara rules I wrote, and save anything that might be malicious for later reverse engineering. I found that Pastebin was a good source of base64-encoded PE files that could be decoded and dissected. Often these were second or later stages of malware infections. Other successful rules matched strings from malicious PowerShell scripts, packed JavaScript functions, and output from hacking tools. Pastebin has since cracked down on some of this malicious content, and a lot of the types of files I used to find are gone. I may start this project up again to look for other threats, or use the same concept on other text hosting websites.
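
To illustrate the base64-encoded PE case, here is a minimal sketch in plain Python. It is a stand-in for the idea, not the Yara rules described above: it looks for long runs of base64 characters in a paste, decodes them, and keeps anything whose MZ header points at a valid PE signature. The 128-character minimum run length and the function names are arbitrary choices for illustration, and the sketch assumes the encoded file sits on one unbroken line, so base64 wrapped across many short lines would be missed.

```python
import base64
import binascii
import re
import struct

# Runs of base64 characters long enough to plausibly hold a PE header.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{128,}={0,2}")


def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with an MZ header that points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew, the offset of the PE signature, is a little-endian uint32 at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def find_encoded_pe_files(paste_text: str) -> list[bytes]:
    """Decode each long base64 run in the text and keep those that look like PE files."""
    found = []
    for match in B64_RUN.finditer(paste_text):
        candidate = match.group(0)
        # Trim to a multiple of four characters so a truncated run can still decode.
        candidate = candidate[: len(candidate) - len(candidate) % 4]
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 after all
        if looks_like_pe(decoded):
            found.append(decoded)
    return found
```

Checking the PE signature as well as the leading `MZ` bytes cuts down on false positives, since any base64 text that happens to begin with `TV` decodes to something starting with `MZ`.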

5. Email Addresses

Every once in a while I will look through my spam folder and find some malware worth looking at. Usually the emails in there are just scams, sometimes with links to phishing sites, but occasionally I will find a malicious Word document with macros that lead to some interesting malware samples. This is my least useful place to find malware, but it also takes no setup, making it an easy first place to look. I plan on extending this idea by making a new email address just for malware, using it on sketchy sites, and posting it around on forums to see if I can get an increase in malicious spam.

6. Honeypots

Honeypots are intentionally vulnerable systems designed to be attacked. They can be connected to the Internet to see how attackers interact with them and to gain insight into their actions. This includes collecting any malware samples attackers try to download and execute on the system.

The honeypot project I am currently running is Cowrie. It is an open-source, medium-interaction honeypot written in Python. I have it exposed on ports 22 and 23 so that SSH and Telnet face the Internet. Attackers scanning the Internet can find my system, log in to the honeypot over SSH or Telnet using weak credentials configured in Cowrie, and then start attacking the server. Any files downloaded with wget are saved to a location outside of the honeypot, along with the source IP addresses, the credentials used to log in, and any commands entered in the fake terminal.
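
The saved data is easiest to work with when Cowrie’s JSON log output is enabled, which writes one event per line. Below is a minimal sketch for pulling downloads, credentials, and commands out of such a log. The event names (`cowrie.session.file_download`, `cowrie.login.success`, `cowrie.login.failed`, `cowrie.command.input`) and field names (`src_ip`, `url`, `shasum`, `username`, `password`, `input`) are what I would expect to see in that output; treat them as assumptions and compare them with a few lines from your own log. The function name is hypothetical.

```python
import json
from collections import Counter


def summarize_cowrie_log(lines):
    """Collect download records, credential counts, and commands from JSON log lines."""
    downloads = []
    credentials = Counter()
    commands = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupted lines
        event_id = event.get("eventid", "")
        if event_id == "cowrie.session.file_download":
            downloads.append((event.get("src_ip"), event.get("url"), event.get("shasum")))
        elif event_id in ("cowrie.login.success", "cowrie.login.failed"):
            credentials[(event.get("username"), event.get("password"))] += 1
        elif event_id == "cowrie.command.input":
            commands.append((event.get("src_ip"), event.get("input")))
    return downloads, credentials, commands
```

Because the function accepts any iterable of lines, an open file object can be passed directly, for example `summarize_cowrie_log(open("cowrie.json"))`, where the file name is a placeholder for wherever your log is written.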

An interesting part of collecting malware from a honeypot rather than other sources is that many of the samples still have live command and control servers running and distributing them, provided you start analysis soon enough after capture. Several times I have been able to capture one malware sample, go to the IP address the sample was downloaded from, and find the attacker’s command and control server hosting the sample compiled for multiple architectures, more stages of the malware, and other files they used to run the malware.

One goal I have with this honeypot is capturing IoT malware. SSH and Telnet with weak credentials are a very common initial entry point for IoT attacks, as shown by the Mirai botnet. The credentials, along with additional customization of the files on the system, the running processes, and the CPU architecture, can make the honeypot look like a real IoT device.