From time to time, I find myself having to rail against the misuse of VirusTotal’s service as a sort of surrogate AV product test. But I don't think I've done it here before, so here I go.
VirusTotal subjects the files people submit to a battery of command-line scanners, as a means of finding out whether or not they are malware. That’s fine for a rough assessment of the likelihood that a submitted file is malicious, but it isn't a detection test. On one hand, the fact that a scanner doesn't identify a file as malware does not mean it isn't malicious, of course, and we shouldn't forget that. On the other hand, if a file is identified as malicious by one or more scanners but not by others, it doesn't necessarily mean that the scanners that don’t flag it as malicious are incompetent.
- Obviously, sometimes a file is misdiagnosed as malicious (a false positive) and sometimes that false positive may ‘cascade’ through more than one lab because of insufficient checking when samples are shared.
- VirusTotal doesn’t exercise the whole functionality of an anti-virus scanner, still less that of a security suite: it uses command-line program versions, rather than the interactive and/or real-time scanning functionality that even a simple commercial scanner has.
- Generally, command-line scanners simply look at the code passively: unless they run the code in a safe environment (emulation or sandboxing) to see what it really does, they may not recognize some malcode that their on-access components would have recognized.
In other words, VirusTotal doesn’t tell you whether a product is capable of detecting a malicious file. At best, it tells you whether it’s capable of detecting it using that particular program module and configuration.
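The same caveat applies to programmatic lookups: all you can pull out of a VT file report is a tally of per-engine static verdicts. Here is a minimal sketch, assuming VT's current v3 REST API (`GET /api/v3/files/{sha256}` with an `x-apikey` header; the key and the example statistics dict are hypothetical, and the response structure shown is an illustrative subset):

```python
# Sketch: summarizing per-engine verdicts from a VirusTotal file report.
# Assumes the v3 REST API; the summary logic itself works on any
# "last_analysis_stats"-style dict, no network required.

import json
from urllib.request import Request, urlopen

VT_URL = "https://www.virustotal.com/api/v3/files/{}"

def fetch_report(sha256: str, api_key: str) -> dict:
    """Fetch a file report (requires a real API key)."""
    req = Request(VT_URL.format(sha256), headers={"x-apikey": api_key})
    with urlopen(req) as resp:
        return json.load(resp)

def summarize(stats: dict) -> str:
    """Render 'flagged/total' from a last_analysis_stats-style dict.
    Note: this counts command-line engine verdicts only; it says
    nothing about on-access or suite-level protection."""
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    total = flagged + stats.get("undetected", 0) + stats.get("harmless", 0)
    return f"{flagged}/{total} engines flagged the file"

# Illustrative stats dict (invented numbers, no network needed):
example = {"malicious": 3, "suspicious": 1, "undetected": 40, "harmless": 0}
print(summarize(example))  # -> 4/44 engines flagged the file
```

Whatever ratio such a lookup yields, it reflects only the static, command-line layer of each product: the very limitation discussed above.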
In a paper I co-authored with VirusTotal’s Julio Canto, we pointed out that: “VirusTotal uses a group of very heterogeneous engines. AV products may implement roughly equivalent functionality in enormously different ways, and VT doesn’t exercise all the layers of functionality that may be present in a modern security product.”
Imperva, a company selling an alternative security technology, recently used VT to ‘prove’ that anti-virus products are incapable of detecting new malware by submitting 82 samples to VT. According to the company – or rather according to articles by journalists to whom they’d shown the report – none of the samples were initially flagged as malicious by any of the engines used by VT, and some scanners didn’t recognize them until four weeks later. However, that 'proof' is based on a number of fallacies (and in any case, 82 unverifiable samples look pretty puny in the face of the tens - even hundreds - of thousands of samples an AV lab processes daily).
After my colleague Righard Zwienenberg made some of the points above (and quite a few others) in a blog article in response to that report, the Dutch-language web page security.nl published some counterclaims by Imperva (if I read it correctly – my knowledge of the Dutch language is mostly limited to oddments of Afrikaans slang):
- The report doesn’t compare scanners. Well, no: unlike many other ‘tests’ that rely on VT statistics to assess performance, the idea seems to be to deny the usefulness of any AV product (except maybe free AV). However, that’s not the point, which is that Imperva’s claim to assess “The average detection rate of a newly created virus” and to “monitor how quickly the vendors incorporated detection capabilities into their products” is invalidated, since submission of a sample to VT doesn’t exercise the whole functionality of the products used. All it determined was whether vendors incorporated detection into their command-line scanners over time. Most AV users today are largely reliant on their anti-malware's on-access scanning, rather than frequent passive scans of the whole disk.
- The intention was to ‘determine the effectiveness of scanners with a random collection of malware’. This misses the point entirely: competent AV testing isn’t about random samples. On the contrary, samples have to be verified and correctly classified. (And scanners have to be configured accordingly in order to provide an accurate test.) VirusTotal didn’t do this (which is fine, because VT itself is very definite about the fact that its service is not suitable for product testing). Whether it’s a single product or all AV products "under test" is beside the point.
- Anti-virus companies use VirusTotal in their own blog posts to indicate how many manufacturers detect specific malware. And yes, actually, we do from time to time, normally as a rough guide to whether the industry as a whole is aware of a very new threat. And perhaps we should make it clearer when we do so that it is only a (very) rough guide, and not a poorly conceived or deliberately misleading attempt to compare the performance of other products to our own.
As it happens, the number of blogs in which I’ve referred to VirusTotal reports in the context of specific malware is far, far fewer than the number of blogs (not to mention papers and articles) in which I’ve expressed my concern at the misuse of VT as a quasi-test. But I'll certainly make sure in future that when I refer to VT reports, I make it clear that the report is not an authoritative guide to whether individual products offer actual protection against the malware under consideration.
Actually, there’s a critical point here. It’s perfectly possible for a product to provide protection against malware for which it doesn’t actually have specific detection. I’m not referring here to generic and heuristic detections (effective though they sometimes are) but to multi-layering. Commercial anti-malware vendors would rather sell you a security suite than a ‘pure’ anti-virus scanner. Not because they get more money for them – well, not only because they get more money ;-) – but because a security suite offers protection on several levels. That doesn’t mean a security suite offers 100% protection against all threats, or even all malware, but it does mean that a security suite offers better protection than a ‘simple’ scanner (trust me, there’s nothing simple about a good AV scanner...) But VT only exercises a small part of that protective functionality.
Here are a couple of good articles by Paul Ducklin and Didier Stevens plus yet another from Prevx to which the VirusTotal site drew my attention. Kurt Wismer also posted a characteristically sensible blog on the topic a while back.
David Harley CITP FBCS CISSP
ESET Senior Research Fellow