How to Get Started with YARA for Malware Analysis

June 16, 2022

According to security researchers, over 450,000 new malicious programs are discovered daily. As of April 2022, about 1343.64 million malicious programs had been reported in total.

Malicious programs, commonly known as malware, can be defined as software developed intentionally to harm computer systems or alter their normal mode of operation.

Malware spreads through various channels, such as downloads from pirated sites, removable storage media (e.g. USB drives), and emails with malicious links or attachments that appear to come from a legitimate source.

This article looks at YARA, a malware analysis tool that uses a rule-based approach to detect patterns of malware characteristics in files. We will cover what YARA is, how to install it on Windows and Linux, and how to write and run rules using its syntax.

Prerequisites

To follow along with this tutorial, the reader will need:

  • Some basic knowledge of computer programming.
  • Background knowledge of the C programming language.

What is YARA?

YARA is an open-source tool used for malware analysis. It was developed by Victor Alvarez of VirusTotal. YARA uses a rule-based approach to match patterns of malware characteristics in files. A rule usually contains strings, regular expressions, and special operators that describe certain characteristics of a malware family, followed by a boolean condition that decides whether the rule matches.

How to install YARA

We will install YARA on both Linux and Windows systems for this tutorial.

Linux installation

  • First, update your package lists and installed packages by running:
sudo apt update -y && sudo apt upgrade -y
  • Then install YARA:
sudo apt install yara

After running the command above, you can now access YARA from your command line.

Windows installation

  • Download this zip file from YARA's GitHub page.
  • Unzip the file and run the extracted executable (yara32.exe or yara64.exe, depending on your system) from a command prompt.

Mac users can install YARA with Homebrew by running brew install yara.

YARA rule syntax

Here is a sample YARA rule:

rule SectionSample{
  meta:
   author = "Felix Vaati"
   description = "Simple YARA rule"
  strings:
   $text_sample="readers" nocase
   $hex_sample={73 65 63 74 69 6f 6e 20 ?? 65 61 64 65 72 73}
  
  condition:
   any of them //checks whether a file contains any of the above strings
}

Let us go through the sample rule to understand its syntax:

Line 1

rule SectionSample{
}

Every YARA rule starts with the keyword rule, as seen in the first line of this rule. The keyword is followed by a rule name or identifier; in our case, it's SectionSample. Rule identifiers follow some naming conventions: they can contain underscores and any alphanumeric characters.

However, the first character cannot be a digit, just as in the C programming language. Although all alphanumeric characters are allowed, some words are reserved as keywords and hence can't be used as rule names, e.g. any, istartswith, meta, entrypoint.
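The naming rules above can be sketched as a small check in Python. This is only an illustration, not part of YARA itself: is_valid_rule_name is a hypothetical helper, and the keyword set is a partial subset assumed for the example (the YARA documentation has the full list).

```python
import re

# Partial, illustrative subset of YARA's reserved keywords (an assumption for
# this sketch; the YARA documentation lists all of them).
YARA_KEYWORDS = {
    "all", "and", "any", "at", "condition", "entrypoint", "filesize",
    "import", "include", "istartswith", "meta", "nocase", "not", "of",
    "or", "rule", "strings", "them",
}

# Letters, digits and underscores; the first character must not be a digit.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_rule_name(name: str) -> bool:
    """Hypothetical helper: True if `name` looks like a usable rule identifier."""
    return bool(IDENTIFIER_RE.fullmatch(name)) and name not in YARA_KEYWORDS


print(is_valid_rule_name("SectionSample"))  # True
print(is_valid_rule_name("1stRule"))        # False: starts with a digit
print(is_valid_rule_name("meta"))           # False: reserved keyword
```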

Line 2

  meta:
    author = "Felix Vaati"
    description = "Simple YARA rule"

This section contains the metadata of the YARA rule. Here you can include the author's name, the date you created the rule, a description of what the rule does, etc. Text values must be enclosed in double quotes.

Line 3

  strings:
   $text_sample="readers" nocase
   $hex_sample={73 65 63 74 69 6f 6e 20 ?? 65 61 64 65 72 73}

The strings section contains the values we want to search for in files. This section can contain text strings, hexadecimal strings, or regular expressions.

$text_sample and $hex_sample are the identifiers we use to refer to our strings. Hex strings can contain wildcards for bytes that are unknown. A wildcard byte is written as ??.

Strings can also have modifiers, which change how a string is matched. In our example, we have used the nocase modifier to tell the YARA engine to ignore case when matching our string.
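To make the matching behaviour of the two strings more concrete, here is a rough emulation in Python using the re module. It is a sketch of the idea, not how the YARA engine is implemented: hex_pattern_to_regex is a hypothetical helper, and it only handles whole-byte ?? wildcards.

```python
import re

# "readers" nocase, emulated as a case-insensitive bytes search.
TEXT_SAMPLE = re.compile(rb"readers", re.IGNORECASE)

# The hex string from the sample rule: "section ", one unknown byte, "eaders".
HEX_SAMPLE = "73 65 63 74 69 6f 6e 20 ?? 65 61 64 65 72 73"


def hex_pattern_to_regex(hex_string: str) -> re.Pattern:
    """Hypothetical helper: turn a YARA-style hex string into a bytes regex.

    Only whole-byte wildcards (??) are supported in this sketch.
    """
    parts = []
    for token in hex_string.split():
        if token == "??":
            parts.append(b".")  # any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that "." also matches a newline byte.
    return re.compile(b"".join(parts), re.DOTALL)


hex_regex = hex_pattern_to_regex(HEX_SAMPLE)
data = b"Hello section Readers!"

print(bool(TEXT_SAMPLE.search(data)))                 # True: case is ignored
print(bool(hex_regex.search(data)))                   # True: ?? matches "R"
print(bool(hex_regex.search(b"section of readers")))  # False
```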

Line 4

  condition:
   any of them //checks whether a file contains any of the above strings

For our YARA rule to be complete, it must contain a condition; otherwise, it is just another text file. Conditions are boolean expressions that tell the YARA engine when the rule should match. Our sample rule checks whether a file contains either of the two strings, $text_sample or $hex_sample.

Conditions can also refer to the location of a string in the file. This is especially helpful for identifying the file type, which reduces the chances of our rules producing false positives.

We can also include the file size in a condition when we know the approximate size of a malware file. Malware researchers often share the sizes of malware samples, and we can use that information to tighten our YARA rules.
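As an illustration, a condition such as $mz at 0 and filesize < 200KB combines both ideas. The Python sketch below mimics that logic on raw bytes; the "MZ" marker, the 200KB limit, and the matches_condition helper are assumptions chosen for the example, not values from a real rule.

```python
def matches_condition(data: bytes, max_size: int = 200 * 1024) -> bool:
    """Emulate a condition like: $mz at 0 and filesize < 200KB."""
    magic = b"MZ"  # illustrative marker: first two bytes of Windows executables
    string_at_start = data.startswith(magic)  # "$mz at 0"
    small_enough = len(data) < max_size       # "filesize < 200KB"
    return string_at_start and small_enough


print(matches_condition(b"MZ" + b"\x00" * 100))  # True
print(matches_condition(b"\x00MZ"))              # False: marker not at offset 0
```

Anchoring the string to a fixed offset and bounding the size means that a text file that merely mentions "MZ" somewhere in its body no longer matches.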

Comments

YARA rules can also contain comments, just like other programming languages. To write comments in YARA, we use // for single-line comments and /* */ for multi-line comments. We save YARA rules in files with the .yar extension. A single .yar file can contain more than one YARA rule.

Open any code editor of your choice, copy our sample YARA rule, and save the file as sample.yar or any other name. Then create a second file containing the word Readers along with some random text, and save it as a text file (test.txt in the examples below).

Running YARA rules

To run our YARA rule, we invoke the YARA engine with the yara command (on Windows, use the yara32 or yara64 executable, depending on your system), followed by the name of the rule file (the .yar file) and then the file we want to test. To scan many files at once, we can pass a directory such as . (the current directory) in place of a single file; adding the -r flag also scans its subdirectories.

┌──(dyrstiu㉿kali)-[~/Documents/Demo]
└─$ yara sample.yar test.txt
SectionSample test.txt

Checking more than one file, e.g. every file in the home directory:

┌──(dyrstiu㉿kali)-[~/Documents/Demo]
└─$ yara sample.yar ~/. 
SectionSample /home/kali/./stegsolve.jar
SectionSample /home/kali/./phoneinfoga

As with most programs, we can customize the output by passing flags. If we want the output of the YARA engine to include the matched strings and their offsets in the file, we use the -s flag when running our rule file.

The output is shown below.

┌──(dyrstiu㉿kali)-[~/Documents/Demo]
└─$ yara -s sample.yar ~/.                   
SectionSample /home/kali/./stegsolve.jar
0x363b6:$text_sample: Readers
0x37ba5:$text_sample: Readers
SectionSample /home/kali/./phoneinfoga
0x5a8b1f:$text_sample: readers
0x5abcff:$text_sample: readerS
0x5c33e9:$text_sample: ReaderS
0xaee50e:$text_sample: readers
0xaf3fea:$text_sample: readers
0xb250c7:$text_sample: readers
0xb284b8:$text_sample: readers
0xb36369:$text_sample: readers
0xb36ea1:$text_sample: readers
0xbd73d6:$text_sample: readers
0xdc0741:$text_sample: ReaderS
0xdca7a9:$text_sample: ReaderS

To know about other flags you can include when running your rule file, run yara --help.

YARA has modules we can use to improve our rules. The modules include pe, cuckoo, and elf, among others. We can also write our own modules. To use a module in a rule file, we write import followed by the module name in double quotes.

It is also possible to include another .yar file when developing a new rule file, by writing include followed by the path of the .yar file in double quotes. To reduce false positives, you can use functions in the condition, e.g. uint16be(0), which reads a 16-bit big-endian unsigned integer at offset 0 of the file.
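To show what such a function computes, here is a rough Python equivalent built on the struct module. It is only a sketch of the idea; the comparison value 0x4D5A (the bytes "MZ" read as a big-endian integer) is an illustrative assumption.

```python
import struct
from typing import Optional


def uint16be(data: bytes, offset: int) -> Optional[int]:
    """Read an unsigned 16-bit big-endian integer, or None if out of range."""
    if offset < 0 or offset + 2 > len(data):
        return None
    return struct.unpack_from(">H", data, offset)[0]


header = b"MZ\x90\x00"
print(hex(uint16be(header, 0)))       # 0x4d5a
print(uint16be(header, 0) == 0x4D5A)  # True: the data starts with "MZ"
```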

When you have a sample of the malware file, you can auto-generate YARA rules using available tools such as yarGen. yarGen is a Python tool developed by Florian Roth that generates YARA rules from the malware samples you provide.

To install yarGen, check this GitHub page for instructions. Before using yarGen, make sure you update its database so that the generated rules are accurate. After generating a rule file, you can tweak it to your preference.

Another interesting tool to look at is Valhalla. This is a YARA rule repository where you can find YARA rules for common malware. You can access Valhalla here.

You can also visit this GitHub page to check out more YARA rules. Python also has a YARA library (the yara-python package), so we can use YARA from our Python scripts. To use it, add import yara to your script.
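A minimal sketch of using the library is shown below, assuming the yara-python package is installed. The rule is a trimmed version of our sample, and scan_bytes is a hypothetical helper name used only for this example.

```python
RULE_SOURCE = r'''
rule SectionSample
{
    strings:
        $text_sample = "readers" nocase
    condition:
        any of them
}
'''


def scan_bytes(data: bytes) -> list:
    """Compile the rule and return the names of the rules that match `data`."""
    import yara  # imported lazily; provided by the yara-python package

    rules = yara.compile(source=RULE_SOURCE)
    return [match.rule for match in rules.match(data=data)]


if __name__ == "__main__":
    try:
        print(scan_bytes(b"Hello Readers"))
    except ImportError:
        print("yara-python is not installed")
```

The same compiled rules object can also scan a file on disk by passing its path to match instead of the data argument.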

Conclusion

In this tutorial, we learned what YARA is and how to install it. We also wrote a sample YARA rule that checks whether a file contains given strings, ran it against files, and looked at tools that help with generating and finding YARA rules.

Happy coding!


Peer Review Contributions by: Jethro Magaji