VCM3: Duplicated files on virus collections
Duplicated files on virus collections
by Germano
I'm sure you got at least one or some hundreds dupes, during your trader life. That is pretty boring.
What's a dupe? The dupes are one or more COPY of a FILE. In other words these files are completely identical. In my opinion there are no reason to keep identical files into a vx collection. They are unuseful and they involved you to lose trades :-(
I recommend to cancel all dupes stored in your collection.
Here, the major F.A.Q. about dupes.
Q: What is a dupe?
A: How i said, a dupe is a identical COPY of a FILE, you already have in your collection.
They may coexists in separate folders of your directory structure or in the same folder using different names.
Q: In my collection there are many files infected by a unique virus (i.e.: i have 10 samples of Jerusalem.a), they are dupes?
A: No they don't. They are "different files infected by a unique virus". You can keep these samples.
Q: During trades, i get dupes very often, why?
A: There are many reasons. For example, the usage of AVP with different parameters could generate different logs.
The /redundant option able KAV to recognize more infected files; some "warning" could be recognized as "not-a-virus" or so.
During the comparation you thinking get new samples, but they are only samples recognized as different names.
Also checking old logs involve you to get dupes, that's because some vx names could be changed.
There is also an issue in F-prot: some files are recognized infected by a different vx simply changing the extension.
Q: How i can find dupes in my collection?
A: A valid solution is the usage of VIRWEED. VIRWEED makes a CRC32 database. A CRC32 is a hexadecimal number of 8 characters that identifies a file.
In theory a file has an unique CRC32. VIRWEED compare all CRC32 finding dupes, so this command...
VIRWEED -CIL C:\YOURVX
makes a CRC32 database named VIRWEED.CRC and makes a file named DUPES.LOG the list of all dupes found, then...
VIRWEED -CD C:\YOURVX
to delete all dupes stored in your collection automatically
Q: I'm tired getting dupes, how i do?
A: I higly recommend to rename all files to their CRC32. To do that you may use RENFILES, so...
RENFILES -RC C:\YOURVX
all files will be renamed automatically.
If one or more dupes are stored into a same folder RENFILES encounters a conflict. You know files with same name can't stay into a same folder.
In this case RENFILES solves the conflict renaming the dupes as XXXXXXXX_ Collision_X (i.e.: FEDCBA98.EXE, FEDCBA98_Collision_1.EXE, FEDCBA98_Collision_2.EXE etc.).
All files reporting the Collision suffix can be deleted.
Q: It's possible avoid dupes during a trade?
A: Yes, but the LOG you are looking for must be CRC32 format. That's the main reason i recommend the CRC32 renaming.
So make first a CRC32 database of your collection (the file VIRWEED.CRC), then...
VS2000 -FD C:\VIRWEED.CRC C:\OTHERTRADER.LOG
the second log (OTHERTRADER.LOG) will be cropped from all CRC32 contained in VIRWEED.CRC (your database); now...
VS2000 -C OTHERTRADER.LOG
and you check vx only for CRC32 you miss.
Q: I don't understand
A: me too :-p
Note by VirusBuster:
Other really recommended file weeder (software to remove duplicate files) is FAST! File Weeder by Bumblebee, included in VS2000 package.