How to perform file searches
VirusTotal Intelligence allows you to search through our dataset in order to identify files that match certain criteria (hash, antivirus detections, metadata, submission file names, file format structural properties, file size, etc.). We could say that it is pretty much like the "Google" of malware.
To make the application easier to use, we have classified the search queries and modifiers into the following categories:
Retrieving files by hash
Identifying files according to antivirus detections
Content search (VTGrep)
File similarity search
Retrieving files by hash
To search for a file by its md5, sha1 or sha256, type the hash into the main search box. For example, to find the file with the following sha256, enter: 142b638c6a60b60c7f9928da4fb85a5a8e1422a9ffdc9ee49e17e56ccca9cf6e
If you have a list of hashes (e.g. exported from some other application) and want to search for all of them at once, use the search box on the main landing site: paste your hashes and press Enter. The list may mix hash types (md5, sha1 and sha256).
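Before pasting a list exported from another tool, it can help to drop malformed entries and duplicates. The following is a minimal sketch, not part of VirusTotal itself: the helper names `classify_hash` and `prepare_hash_list` are hypothetical, and the only assumption is that each hash is a hexadecimal digest of the standard length (32 characters for md5, 40 for sha1, 64 for sha256).

```python
import re

# Hex digest lengths for the three hash types mentioned in the text.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def classify_hash(value):
    """Return 'md5', 'sha1' or 'sha256', or None if value is not a valid digest."""
    value = value.strip()
    if not HEX_RE.fullmatch(value):
        return None
    return HASH_LENGTHS.get(len(value))


def prepare_hash_list(lines):
    """Keep valid hashes, lowercased and de-duplicated, in their original order."""
    seen = set()
    cleaned = []
    for line in lines:
        candidate = line.strip().lower()
        if classify_hash(candidate) and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


if __name__ == "__main__":
    exported = [
        "142b638c6a60b60c7f9928da4fb85a5a8e1422a9ffdc9ee49e17e56ccca9cf6e",
        "not-a-hash",
        "142B638C6A60B60C7F9928DA4FB85A5A8E1422A9FFDC9EE49E17E56CCCA9CF6E",
    ]
    # One hash per line, ready to paste into the search box.
    print("\n".join(prepare_hash_list(exported)))
```

The cleaned list can mix hash types freely, as described above.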
All of the previous search modalities and search modifiers can be combined through the use of search operators. The query language supports some Boolean operators as well as parentheses for grouping parts of the query together. The supported Boolean operators are AND, OR, and NOT.
By default, when you create queries that match multiple fields at the same time, each value is combined with a Boolean AND. For the query as a whole to match, all the specified values must match.
AND Boolean operator
To search for all PDFs that have an invalid XREF table, we have two options. We can omit the boolean operator altogether, since search modifiers are combined with AND by default:
type:pdf tag:invalid-xref
Another option is to introduce the AND operator explicitly:
type:pdf AND tag:invalid-xref
OR Boolean operator
We might be interested in retrieving all files that are either DLLs or executables. The OR operator can help us with this task:
type:pedll OR type:peexe
NOT Boolean operator
As an example, let us use the NOT boolean operator to find all Portable Executables that at least one antivirus vendor identifies with the family name "zbot" and that are not tagged as corrupt (i.e. malformed files that will not execute on a real system):
engines:zbot NOT tag:corrupt
Parentheses for grouping parts
More complex queries can be built with parentheses. Let us extend the previous query to also identify other banking malware variants:
(engines:zbot OR engines:sinowal) NOT (tag:corrupt)
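When queries are generated by a script rather than typed by hand, small helpers can keep the grouping correct. The sketch below is purely illustrative: `all_of`, `any_of` and `exclude` are hypothetical names, and it only assembles query strings using the AND, OR, NOT and parentheses syntax shown above. Any term that contains a space is wrapped in parentheses so that it is evaluated as a unit.

```python
def _group(term):
    """Wrap a term in parentheses if it contains a space (i.e. is a sub-query)."""
    return f"({term})" if " " in term else term


def all_of(*terms):
    """Combine terms so that every one of them must match."""
    return " AND ".join(_group(t) for t in terms)


def any_of(*terms):
    """Combine terms so that at least one of them must match."""
    return " OR ".join(_group(t) for t in terms)


def exclude(base, unwanted):
    """Match `base` while filtering out anything that matches `unwanted`."""
    return f"{_group(base)} NOT {_group(unwanted)}"


if __name__ == "__main__":
    print(all_of("type:pdf", "tag:invalid-xref"))
    print(any_of("type:pedll", "type:peexe"))
    print(exclude(any_of("engines:zbot", "engines:sinowal"), "tag:corrupt"))
```

The last call yields `(engines:zbot OR engines:sinowal) NOT tag:corrupt`, which differs from the banking malware query above only in omitting the optional parentheses around the single excluded tag.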
Example use cases
This section details some common searches users have asked for in the past. They serve only as examples of how the information in the previous sections fits together.
Studying malicious PDFs
To extract a study base of malicious PDFs from VirusTotal, the first idea that comes to mind is to do something as simple as:
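But this is not the only thing you can do. Very often PDFs with exploits will have an invalid XREF table, so it also makes sense to do something along the lines of:
type:pdf tag:invalid-xref
Tags for automatic actions and embedded JavaScript can narrow the search in a similar way: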
type:pdf tag:autoaction tag:js-embedded
Even easier, there is a specific tag for exploits (applied whenever we have enough indications that a file is or contains an exploit), so let us just make use of it:
tag:exploit
Retrieving exploit samples
The last example of the previous use case ended with the simplest form of searching for exploits:
tag:exploit
Even more interesting is the fact that you can search for samples tagged with specific Common Vulnerabilities and Exposures (CVE) identifiers:
Identifying potential false positives
A false positive is another way of saying "mistake". As applied to the field of antivirus programs, a false positive occurs when an antivirus program mistakenly flags an innocent file as being malicious. This may seem harmless enough, but false positives can be a real nuisance.
VirusTotal can be used to identify potential false positives. For example, signed Portable Executables with a low number of detections are very often false positives (although not always). Suppose we are interested in retrieving potential false positives by Symantec. We can do something along the lines of:
tag:signed positives:3- symantec:infected
These are most probably files that we want to check twice to verify that they are indeed malicious.
Solitary detections can sometimes also be false positives (although not always):
We surely also want to look at all files that are detected even though they appear in the National Software Reference Library or in a reputable online software collection:
(tag:nsrl OR tag:software-collection) AND symantec:infected AND positives:10-
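If the same review has to be repeated for several vendors or thresholds, the two Symantec queries above can be turned into templates. This is a sketch under stated assumptions: the function names are hypothetical, `positives:N-` is read here as "N detections or fewer" (as the "low number of detections" wording suggests), and it is assumed, not confirmed by this page, that other vendors can be queried with the same `<vendor>:infected` form shown for Symantec.

```python
import re


def _check(vendor, max_positives):
    """Reject values that would produce a malformed query string."""
    if not re.fullmatch(r"[a-z0-9_]+", vendor):
        raise ValueError(f"unexpected vendor name: {vendor!r}")
    if not isinstance(max_positives, int) or max_positives < 0:
        raise ValueError("max_positives must be a non-negative integer")


def signed_low_detection_query(vendor, max_positives=3):
    """Signed files flagged by `vendor` that have few detections overall."""
    _check(vendor, max_positives)
    return f"tag:signed positives:{max_positives}- {vendor}:infected"


def known_software_detected_query(vendor, max_positives=10):
    """Files from known software collections that `vendor` still flags."""
    _check(vendor, max_positives)
    return (
        "(tag:nsrl OR tag:software-collection) "
        f"AND {vendor}:infected AND positives:{max_positives}-"
    )


if __name__ == "__main__":
    print(signed_low_detection_query("symantec"))
    print(known_software_detected_query("symantec"))
```

With the default thresholds and `"symantec"` as the vendor, the two functions reproduce the queries shown above.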
The number of unique sources that sent a given file to VirusTotal can also be a good indicator of whether a given file is innocuous. Very often, the files that are extremely widespread are benign in nature (although not always):
If you would like further information about a use case of your own, please do not hesitate to contact us.