Author |
Message |
mbarnstijn
Joined: 18 Nov 2018 Posts: 9
|
Posted: Sun Nov 18, 2018 1:13 pm Post subject: CRC verify after copy: why does VV calculate CRC on source? |
|
|
The target file has to be read back to verify that it was written properly.
But why not calculate the CRC as VV reads the source file during the copy, rather than read the source file again during the verify process?
I would consider this an oversight at best, or a bug at worst. During multiple very large (30GB+) file copies in a Replicate or Backup operation, where the source AND target are both on NAS devices, re-reading the source file seems to be a huge waste of time. That CRC could have been calculated on-the-fly as the source file was being read and copied to the target.
If you really MUST read the source file again, could you not try to read BOTH files at the same time, to make the best use of network bandwidth? Reading first one file and then the other is also quite inefficient. It should be relatively straightforward to determine whether the source or the target is slow to access, and to read both together when both are slow.
Naturally, all this is just my opinion. I'm sure the developers had good reason to do things their way. I just disagree! I'm watching a 40TB Replicate on track to take 20 days across an aggregated 4 Gigabit network link. 30MB/s is a pretty slow VV transfer average when a normal Windows copy of a large 30GB+ file runs at 90MB/s to the same NAS devices from the same computer.
Cheers,
--michael |
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8763
|
Posted: Mon Nov 19, 2018 12:50 am Post subject: |
|
|
Hi, I do not think the source CRC is recalculated after the copy during the verification process. What version of ViceVersa PRO are you using? Thanks.
_________________
--
TGRMN Software Support
http://www.tgrmn.com
http://www.compareandmerge.com |
|
mbarnstijn
Joined: 18 Nov 2018 Posts: 9
|
Posted: Mon Nov 19, 2018 11:53 am Post subject: |
|
|
I am running VV Pro 3001 on my Windows 7 x64 workstation.
I can monitor network and file access on both NAS devices. Yes, I do have full control of the network, and yes, there are no other devices accessing the NAS. I can see all network activity on each of the NAS's network monitor pages, and it's only my own workstation accessing those NAS at all times.
I can see VV Pro status as well.
During the copy of large files (20GB+), both NAS show ~30MB/s network usage, and show one open file each with the same name as the VV Pro status shows.
During the CRC check after the copy, first the source NAS shows ~30MB/s network usage with that same open file, and the target NAS shows no network activity and no open files. [Edit: I think it's not open, have to check again. It takes hours to get back to this point during many file copies!] Then the source NAS shows no network activity [edit: but the file remains open] while the target NAS shows ~30MB/s network usage and that same open file.
So, yes, VV Pro is reading the entire source file a second time to calculate the CRC for the CRC verification step. That's the only possible interpretation of the network and file activity, don't you think?
Cheers,
--michael |
|
mbarnstijn
Joined: 18 Nov 2018 Posts: 9
|
Posted: Mon Nov 19, 2018 3:00 pm Post subject: |
|
|
Note: I made edits to the above post to correct a mistake, and to remove one assumption I made without waiting to verify it. |
|
mbarnstijn
Joined: 18 Nov 2018 Posts: 9
|
Posted: Mon Nov 19, 2018 5:06 pm Post subject: |
|
|
Hmmm. The profile was set to use Windows File Copy rather than VV's internal copy method. That might affect the CRC calculation step, I thought, so I unchecked that box and started again. The copy speed increased from an average of 30MB/s to 45MB/s. I do have the profile set to NOT use internal memory for copying, because the files are so large -- from 10GB to 70GB -- and the profile also copies directly, not to a temporary file.
However, the CRC check after copy behaviour remains essentially the same.
I've confirmed this by sitting through an entire copy of a 45GB ISO image file.
After the copy is complete, and VV is showing that a CRC check is happening, the source NAS shows the file open in read mode and its network connection is active at 85MB/s. The target NAS has the file open (in RW mode the entire time that the source is being read!) and zero network activity.
Then the network activity on the source drops to zero (but the file remains open in read-only mode) and the network activity on the target jumps up to 85MB/s and the file is open (still in read-write mode!).
So I still contend that the source file CRC is calculated AFTER the copy by reading the entire source file again. If you didn't think that was supposed to happen, then I suggest this might be a "feature" worth removing!
I have another question, perhaps better asked in feature requests: why not copy more than one file at a time? On systems with multiple network ports, if aggregated, this could greatly speed up file transfer rates. In my case, both NAS devices have multiple aggregated gigabit network ports, and my workstation has 4 aggregated gigabit ports. It should be possible to saturate all 4 ports for the file copies. Even just an option to run two file copies at a time would give every user with more than one network port an immediate doubling in large file copy speed.
Thank you for taking the time to read this detailed post, and putting up with my earlier inaccuracies.
Cheers,
--michael |
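The "more than one file at a time" request above could look something like the following. This is only a rough sketch, not how ViceVersa works or will work: the function name `copy_tree_parallel` and the `workers` parameter are made up for illustration, and it uses nothing beyond the Python standard library.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(source_dir, target_dir, workers=2):
    """Copy every file under source_dir into target_dir, `workers` files at a time.

    Hypothetical helper for illustration only.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    files = [p for p in source_dir.rglob("*") if p.is_file()]

    def copy_one(src):
        # Recreate the relative folder layout under the target directory.
        dst = target_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        return dst

    # Each worker thread copies one file; the pool keeps `workers` copies in flight.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))
```

Calling `copy_tree_parallel(src, dst, workers=2)` keeps two copies running at once. Whether that actually doubles throughput depends on the network and the NAS devices, so it would need measuring on the real setup.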
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8763
|
Posted: Tue Nov 20, 2018 12:05 am Post subject: |
|
|
Hi,
Thanks for the detailed post. Regarding the CRC calculation, I will check the source code and let you know. Regarding copying multiple files at the same time, it is something we will be adding; for now it is possible to run multiple profiles at the same time if needed.
Thanks
_________________
--
TGRMN Software Support
http://www.tgrmn.com
http://www.compareandmerge.com |
|
mbarnstijn
Joined: 18 Nov 2018 Posts: 9
|
Posted: Tue Nov 20, 2018 1:20 am Post subject: |
|
|
Sure, I could run multiple profiles, but that's a bit of extra work here... I'm Replicating a single directory containing about 1000 folders, each with a few metadata files as well as one large ISO backup file of Blu-ray media that I own.
To run multiple profiles for this action, I would have to manually add half the folders to one profile and half to the other. A bit of a pain, don't you think?
Can I donate money to the coding team so that the multiple file copy feature gets added sooner? Only half kidding. Time is money. This might be worth it to me! |
|
mbarnstijn
Joined: 18 Nov 2018 Posts: 9
|
Posted: Sat Nov 24, 2018 10:46 am Post subject: |
|
|
If and when you have a beta version of VV Pro to test with a "correction" that avoids re-reading the source file for CRC calculation, I would be very happy to test that for you! Either post here or email me; I'll get back to you the same day.
Cheers,
--michael |
|
mbarnstijn
Joined: 18 Nov 2018 Posts: 9
|
Posted: Mon Nov 26, 2018 1:04 am Post subject: |
|
|
While you're at it, if you don't already, how about reading the source and target files simultaneously when the user requests a CRC comparison check while building the initial source/target file lists? (I.e. do it multithreaded.) |
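To make the multithreaded suggestion concrete, here is a minimal sketch that reads two files at the same time and computes a CRC-32 for each. It assumes plain CRC-32 as provided by Python's `zlib` module, which may not be the checksum ViceVersa uses; `file_crc32`, `crc_pair`, and the 1 MiB chunk size are illustrative choices, not anything from the product.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # bytes per read; an arbitrary choice for this sketch


def file_crc32(path, chunk_size=CHUNK):
    """Return the CRC-32 of a file, reading it in fixed-size chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            # Feed the running CRC back in so the result covers the whole file.
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF


def crc_pair(source_path, target_path):
    """Compute the source and target CRCs concurrently, one thread per file."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        src = pool.submit(file_crc32, source_path)
        dst = pool.submit(file_crc32, target_path)
        return src.result(), dst.result()
```

With source and target on different NAS devices, the two reads travel over separate connections, so the slower of the two sets the total time instead of their sum.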
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8763
|
Posted: Tue Nov 27, 2018 3:08 am Post subject: |
|
|
You are correct. I confirm that both the source and target CRCs are recalculated after the copy. This is something that needs to be investigated: during the file copy, the source CRC could be computed and stored while the file contents are read, and then compared against the destination CRC, which would still need to be recalculated to make sure the copy was OK.
_________________
--
TGRMN Software Support
http://www.tgrmn.com
http://www.compareandmerge.com |
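The approach described in this reply (compute the source CRC while copying, then re-read only the target) can be sketched roughly as follows. This is not ViceVersa's code: it assumes plain CRC-32 from Python's `zlib`, and the names `copy_with_crc`, `file_crc32`, and `copy_and_verify` are hypothetical.

```python
import zlib

CHUNK = 1024 * 1024  # bytes per read; an arbitrary choice for this sketch


def copy_with_crc(source_path, target_path, chunk_size=CHUNK):
    """Copy source to target and return the CRC-32 of the bytes read from source."""
    crc = 0
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            crc = zlib.crc32(block, crc)  # update the CRC on the fly
            dst.write(block)
    return crc & 0xFFFFFFFF


def file_crc32(path, chunk_size=CHUNK):
    """Return the CRC-32 of a file by reading it back in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF


def copy_and_verify(source_path, target_path):
    """Copy, then verify by re-reading only the target; the source is read once."""
    source_crc = copy_with_crc(source_path, target_path)
    target_crc = file_crc32(target_path)
    return source_crc == target_crc
```

The source is read exactly once here, so for a copy followed by a verify the total data read drops from three file lengths (source, source again, target) to two. One caveat: a read-back straight after writing may be served from a cache rather than from the disk itself, so how strong the verification is depends on the storage setup.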