TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8758
|
Posted: Tue Apr 14, 2015 9:09 am Post subject: |
|
|
Thanks, this is very useful. So we can exclude the tracking db as a cause of this issue.
The way ViceVersa works is this:
First (A) it scans the source and the targets, and then (B) it deletes / copies files. Just before a deletion, it rechecks that the attributes and timestamps of the file to be deleted have not changed, to make sure that the file is still the same file.
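To make the scan-then-recheck logic concrete, here is a minimal Python sketch. It is not ViceVersa's code: the names `FileSnapshot`, `take_snapshot` and `delete_if_unchanged` are hypothetical, and the set of compared fields (size, modification time, attributes) is an assumption based on the description in this thread.

```python
import os
from typing import NamedTuple, Optional


class FileSnapshot(NamedTuple):
    size: int
    mtime_ns: int
    attributes: int  # Windows attribute bitmask; 0 on platforms without one


def take_snapshot(path: str) -> Optional[FileSnapshot]:
    """Read metadata only (never file content); return None if the file is gone."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    # st_file_attributes only exists on Windows, so fall back to 0 elsewhere.
    attrs = getattr(st, "st_file_attributes", 0)
    return FileSnapshot(st.st_size, st.st_mtime_ns, attrs)


def delete_if_unchanged(path: str, scanned: FileSnapshot) -> str:
    """Phase (B): re-read the metadata and delete only if it matches phase (A)."""
    current = take_snapshot(path)
    if current is None:
        return "missing"
    if current != scanned:
        # Any difference, including a single attribute bit, blocks the deletion.
        return "skipped: file changed"
    os.remove(path)
    return "deleted"
```

With a strict equality check like this, a server that reports a different attribute value on the second read is enough to make every affected file a "skipped" one, even though nobody touched the file.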
Now it looks like under one account the file attributes and/or timestamps reported by the underlying system differ between (A) and (B), so ViceVersa skips the file, but when running under your own admin account all is fine and (A) = (B), so the file is deleted!
--
TGRMN Software Support
http://www.tgrmn.com
http://www.compareandmerge.com |
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Mon Apr 20, 2015 6:55 pm Post subject: |
|
|
Hi,
So what are the next steps? Can you provide a debug build to see what prevents the target files from being deleted (i.e. which part of the code generates the "[File changed] [SKIPPED]" error message)?
Since my last reply on this topic, I've also checked other jobs which replicate files to other systems (without using a tracking database). Even when the user account has full access to the target folders and files, the error message shows up as long as the service account is not a CIFS admin! This happens both when the job runs automatically in the background with VV Scheduler and when it runs in interactive mode with the same account. As soon as the service account is added as an administrator on the target CIFS server, orphan file deletion works as expected (without any change to the NTFS security; the local Administrators group is not part of the ACL). So there must be a file access difference between a normal account and an administrative account...
For security reasons, we can't run replication jobs with admin accounts, so we need to find another solution/workaround for this issue.
There is one more "weird" behaviour: although most of the orphan files can't be deleted on the target server, a very few of them are deleted! So when I run the same job again and again, the number of orphan target files drops over time, but very slowly. Usually, one of the first orphan files and one of the last orphan files in the list are deleted, but not always. After running the same job a dozen times, no more orphan files were deleted, and more than 50 orphan files were left. It's really crazy, random-like behaviour.
I was also wondering if it would somehow be related to SMB2 file share caching, as implemented by default on Windows Server 2008 and above. So I've done the exact same test after disabling the DirectoryCache and the FileInfoCache (as explained here: https://technet.microsoft.com/en-us/library/ff686200(v=ws.10).aspx), but the results are still the same... so the SMB2 file share caching can be ruled out.
Although I don't think it's related, maybe it has something to do with the UNC file/path length. Is this kind of UNC path really supported: \\?\UNC\cifssrv1\share\folder? When using this syntax, what is the maximum supported file/path length?
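On the path-length question: according to Microsoft's file-naming documentation, classic Win32 paths are limited to MAX_PATH (260 characters), while the \\?\ extended-length prefix raises the limit to roughly 32,767 characters. The Python sketch below only shows how a plain UNC path maps to the extended form; the helper names are hypothetical, and whether ViceVersa or the CIFS server impose further limits is not established in this thread.

```python
MAX_PATH = 260                 # classic Win32 limit, per Microsoft's documentation
EXTENDED_MAX_PATH = 32767      # approximate limit with the \\?\ prefix

EXTENDED_PREFIX = "\\\\?\\"          # the four characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # \\?\UNC\ replaces the leading \\ of a UNC path


def to_extended_length(path: str) -> str:
    # \\server\share\dir  becomes  \\?\UNC\server\share\dir
    # C:\dir\file         becomes  \\?\C:\dir\file
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    if path.startswith("\\\\"):
        return EXTENDED_UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path


def exceeds_classic_limit(path: str) -> bool:
    # Rough check only: MAX_PATH counts the terminating NUL character as well.
    return len(path) >= MAX_PATH


print(to_extended_length("\\\\cifssrv1\\share\\folder"))
```

If the skipped files included short paths as well as long ones, path length would be an unlikely explanation.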
|
|
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8758
|
Posted: Tue Apr 21, 2015 10:43 am Post subject: |
|
|
\\?\UNC\cifssrv1\share\folder should support very long paths. Are you seeing this error only for long paths?
If you remove the \\?\ notation, does deletion work properly?
Thanks.
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Tue Apr 21, 2015 12:34 pm Post subject: |
|
|
Hi TGRMN Software,
Quote: |
If you remove the \\?\ notation, does deletion work properly?
|
Just did the test and no, the output is the same without the leading \\?\ notation. Again, only 2 files (out of 38) were deleted; the rest were skipped with the "Skipped. [File changed]" error message.
Running it a 2nd time, 2 other orphan files were deleted (out of 36)
3rd time: 2 other orphan files were deleted (out of 34)
4th time: 2 files deleted (out of 32)
5th time: no files deleted.
6th time: 1 file deleted (out of 30)
7th time: 1 file deleted (out of 29)
8th time: 1 file deleted (out of 28)
9th time: no files deleted
As you can see, the results are not consistent; there seems to be a random factor somewhere...
The security ACLs do not change on the destination; the service account has "change" access to all files and folders.
I've also checked the Kerberos ticket size (because we have some accounts which belong to a long list of domain local/global groups), but it was OK: the service account's Kerberos ticket is "only" 7 KB, so the issue has nothing to do with some Kerberos ticket buffer caching in your application.
Weird, weird, weird, it's driving me crazy... |
|
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8758
|
Posted: Wed Apr 22, 2015 1:08 am Post subject: |
|
|
I will make a version that logs extra info on why the files are skipped. Could you write to support with your e-mail address? Thank you.
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Wed Apr 22, 2015 5:16 pm Post subject: |
|
|
Just sent the e-mail with my address inside...
Thanks |
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Wed Jun 10, 2015 4:38 pm Post subject: Debug version wanted |
|
|
Hi TGRMN Software support,
I'm still waiting for the special VV version which should log extra info when skipping files...
Does it take so long to build such a "debug" version?
It has been almost 3 months since I opened the inquiry, and the problem is getting worse on our side (more and more files are not properly deleted on the target), so we really need to address this issue.
Thanks... |
|
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8758
|
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Mon Jun 15, 2015 11:03 am Post subject: |
|
|
Hello TGRMN Software support,
Great, I've downloaded the special debug version and ran one of the problematic jobs...
For each orphan file, the log says:
2015-06-15 12:52:38 : Deleting file \\?\UNC\cifs-srv1\share1\folder1\folder2\HP-Service-Pack-Proliant\2013.02\Extracted\hp\swpackages\kmod-mpt2sas-PAE-13.10.02.00-2.rhel5u8.i686.rpm [File changed. File attributes changed from {32} to {544}] [SKIPPED].
I'll let you interpret the extra logging; apparently a file attribute has changed between the comparison pass and the delete pass. I wonder which attribute causes the issue... |
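The two numbers in the log line are Windows file-attribute bitmasks, so they can be decoded against the documented FILE_ATTRIBUTE_* constants. The Python sketch below does that for {32} and {544}; the table lists only the common flags, and the helper name `decode_attributes` is hypothetical.

```python
from typing import List

# Subset of the documented Windows FILE_ATTRIBUTE_* constants.
FILE_ATTRIBUTES = {
    0x0001: "READONLY",
    0x0002: "HIDDEN",
    0x0004: "SYSTEM",
    0x0010: "DIRECTORY",
    0x0020: "ARCHIVE",
    0x0080: "NORMAL",
    0x0100: "TEMPORARY",
    0x0200: "SPARSE_FILE",
    0x0400: "REPARSE_POINT",
    0x0800: "COMPRESSED",
    0x1000: "OFFLINE",
    0x2000: "NOT_CONTENT_INDEXED",
    0x4000: "ENCRYPTED",
}


def decode_attributes(value: int) -> List[str]:
    """Return the names of all known attribute bits set in value."""
    return [name for bit, name in FILE_ATTRIBUTES.items() if value & bit]


scanned, rechecked = 32, 544                   # values taken from the log line
print(decode_attributes(scanned))              # ['ARCHIVE']
print(decode_attributes(rechecked))            # ['ARCHIVE', 'SPARSE_FILE']
print(decode_attributes(scanned ^ rechecked))  # ['SPARSE_FILE'] (the bit that changed)
```

Since 544 = 32 + 512, the only difference between the two reads is the 0x200 bit, FILE_ATTRIBUTE_SPARSE_FILE.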
|
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8758
|
Posted: Tue Jun 16, 2015 12:24 am Post subject: |
|
|
Thanks. Apparently, just before being deleted, the file is reported as having the file attribute FILE_ATTRIBUTE_SPARSE_FILE.
|
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8758
|
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Tue Jun 16, 2015 10:34 am Post subject: |
|
|
Hi TGRMN Software support,
Finally something to "eat"
I wonder why the file would be presented with a different "sparse" attribute when it is first scanned and then again just before it is deleted; this is very strange. Logically, if you use the same system call to check the attributes, it should return the same value, unless there is a caching effect somewhere. Our EMC VNX CIFS servers do have an embedded file cache which could potentially change the way files are presented to the client. But as long as the file itself is not read, the file cache shouldn't be involved. When scanning/comparing the files, I assume you read only the file attributes (i.e. the metadata) and not the file itself.
I'll ask our EMC support if they know something about this sparse attribute and how it could change over time on the same file, even when the file is not accessed and only the metadata is read.
If it turns out that the VNX systems work "as designed" and we can't prevent this "on-the-fly" change of the sparse attribute, would it be possible to modify the ViceVersa code so that it doesn't care if this specific attribute changes, maybe by providing an option to ignore a change of the sparse file attribute? |
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Tue Jun 16, 2015 1:41 pm Post subject: |
|
|
But hold on, why would the sparse file attribute change for a user with "change" access and not change for a user with "full" access?
It's either a bug in the ViceVersa code, or a bug in the VNX Operating Environment...
Do you really use the same code and the same variable length/type to store the attribute list during the compare phase and the delete phase? |
|
|
TGRMN Software Site Admin
Joined: 10 Jan 2005 Posts: 8758
|
Posted: Tue Jun 16, 2015 10:48 pm Post subject: |
|
|
Hi, I suspect it's an issue with the VNX Operating Environment. ViceVersa uses exactly the same API in both phases, and just before deleting it tries to make sure the file has not changed (name, size, attributes, timestamp). Also, the fact that the sparse attribute is reported randomly, depending on user and/or time of day, is suspicious. We can have this changed in the code: I will send a new version that ignores file attribute changes just before deletion. Let's see if that is the only issue. Thanks.
|
|
iRalph
Joined: 20 Mar 2015 Posts: 12
|
Posted: Fri Jun 19, 2015 11:11 am Post subject: |
|
|
Hello TGRMN Software support,
I did some more in-depth testing; here are the results...
I could reproduce the problem ViceVersa faces when it scans files first and tries to delete them during a second pass. The SparseFile attribute indeed changes over time. I suspected the Windows client redirector cache to be the culprit, but it doesn't seem to be the root cause. According to Microsoft, FileInfoCacheLifetime has a default value of 10 seconds. With PowerShell, I've tested the SparseFile attribute several times on the same folder (and not on a specific file, this is important). The first pass always shows the SparseFile attribute set, for example:
PS C:\> Get-ChildItem -path "\\cifs-srv1\share\folder\x64\v8.70" | select name,attributes
Name         Attributes
----         ----------
cp012156.exe Archive, SparseFile
cp012639.exe Archive, SparseFile
cp012817.exe Archive, SparseFile
When I run the same command 2 to 5 seconds later, I get the same result.
If I wait 5-10 seconds more, the result is different:
Name         Attributes
----         ----------
cp012156.exe Archive
cp012639.exe Archive
cp012817.exe Archive
As you can see, the SparseFile attribute is gone! It stays like this if I repeatedly check the file attributes, e.g. every 5-10 seconds.
However, if I wait 35 seconds before testing the attributes one more time, the SparseFile attribute is back again!
Name         Attributes
----         ----------
cp012156.exe Archive, SparseFile
cp012639.exe Archive, SparseFile
cp012817.exe Archive, SparseFile
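The manual test above (list the folder, wait, list again) can be scripted. The Python sketch below is one way to do it, under these assumptions: it runs on a Windows client, where `os.scandir` exposes `st_file_attributes` from the directory listing, and the function names `read_attributes`, `watch_sparse_flag` and `flipped` are hypothetical. It illustrates the procedure only and is not a replacement for the PowerShell commands used in the test.

```python
import os
import time
from typing import Callable, Dict, List, Sequence, Tuple

FILE_ATTRIBUTE_SPARSE_FILE = 0x200

Sample = Tuple[float, Dict[str, bool]]


def read_attributes(folder: str) -> Dict[str, int]:
    """One directory listing: file name -> attribute bitmask."""
    with os.scandir(folder) as entries:
        # st_file_attributes is only available on Windows; report 0 elsewhere.
        return {e.name: getattr(e.stat(), "st_file_attributes", 0)
                for e in entries if e.is_file()}


def watch_sparse_flag(folder: str, delays: Sequence[float],
                      read: Callable[[str], Dict[str, int]] = read_attributes,
                      sleep: Callable[[float], None] = time.sleep) -> List[Sample]:
    """List the folder after each delay and record the sparse flag per file."""
    elapsed = 0.0
    samples: List[Sample] = []
    for delay in delays:
        sleep(delay)
        elapsed += delay
        attrs = read(folder)
        samples.append((elapsed, {name: bool(value & FILE_ATTRIBUTE_SPARSE_FILE)
                                  for name, value in attrs.items()}))
    return samples


def flipped(samples: List[Sample]) -> List[str]:
    """Names whose sparse flag differs between two consecutive listings."""
    names = set()
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        names.update(n for n in cur if n in prev and cur[n] != prev[n])
    return sorted(names)
```

For example, `flipped(watch_sparse_flag(r"\\cifs-srv1\share\folder\x64\v8.70", [0, 10, 35]))` would list the folder immediately, 10 seconds later and 35 seconds after that, and return the files whose SparseFile flag changed in between.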
For testing purposes, I decided to bypass the Windows client redirector cache. For this I had to define three registry entries, as described here: https://technet.microsoft.com/en-us/library/ff686200(v=ws.10).aspx
Key: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters
Value1: FileInfoCacheLifetime [DWORD] = 0
Value2: DirectoryCacheLifetime [DWORD] = 0
Value3: FileNotFoundCacheLifetime [DWORD] = 0
I guess the first value would be enough, but to make sure I've also disabled the Directory and the FileNotFound caches. After a reboot of the ViceVersa/VVengine server, disabling the caches didn't bring any benefit: the SparseFile attribute is shown first, then 10 seconds later it's gone. And again, after waiting 35 seconds more, the attribute is back...
Then I decided to "play" with the SMB protocol version. As our VVengine/ViceVersa server runs Windows 2008 R2, the supported SMB protocols are SMB1 and SMB2 (SMB2.1). On Windows 2003 (soon out of support...), which supports only SMB1, I've tested the same PowerShell command on the same file share/folder: the SparseFile attribute was always shown, it didn't change like on Windows 2008 R2!
So I decided to disable the SMB2 protocol on Windows 2008 R2 (https://support.microsoft.com/en-us/kb/2696547), rebooted the server of course, and guess what? The SparseFile attribute was always shown correctly!
I continued my tests with Windows 2012 R2, which supports SMB1, SMB2 (2.1) and SMB3 (3.02). When using the SMB3 protocol (3.02), the SparseFile attribute is never shown on online files. However, it is always shown on offline files. With SMB3 on Windows 2012 R2, the attribute list is consistently the same, no matter how long I wait before running successive PowerShell commands. Interestingly, after disabling the SMB3 and SMB2 protocols on Windows 2012 R2, thus using the SMB1 protocol, the files still lack their SparseFile attribute. So Windows 2012 R2 does not get the SparseFile attribute (unless the file is offline), no matter which SMB protocol is used.
So the issue with this bloody SparseFile attribute seems to be linked to the SMB2 (SMB2.1) protocol, either on the client side (Windows 2008 R2) or on the server side (EMC VNX2 appliances).
I've opened a support ticket with EMC to clarify whether the problem comes from the VNX Operating Environment or from Windows 2008 R2. Needless to say, I fully patched the Windows 2008 R2 server with the WSUS critical and important updates (June 2015) before running the above tests.
ViceVersa is not at fault when it comes to file attribute handling, as I could reproduce the issue with PowerShell.
However, it would be good to include a ViceVersa option to discard any SparseFile attribute change between the inventory phase and the execution phase, so that orphan target files are properly deleted.
As I am in contact with you by e-mail, I know that you're working on a new build which should include this option (or dismiss this attribute change) in the file comparison engine.
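The kind of option discussed here could look like the following Python sketch: compare the attribute bitmasks from the two phases, but mask out the bits that should be ignored. This is only an illustration of the idea; the function name `attributes_changed` and the `ignore_mask` parameter are hypothetical, and the actual ViceVersa build may instead ignore all attribute changes before deletion, as mentioned earlier in the thread.

```python
FILE_ATTRIBUTE_SPARSE_FILE = 0x200


def attributes_changed(scanned: int, current: int, ignore_mask: int = 0) -> bool:
    """True if any attribute bit outside ignore_mask differs between the two reads."""
    # XOR keeps only the bits that differ; the mask then drops the ignored ones.
    return ((scanned ^ current) & ~ignore_mask) != 0


# Values from the debug log: {32} at scan time, {544} just before deletion.
print(attributes_changed(32, 544))                              # True: file is skipped
print(attributes_changed(32, 544, FILE_ATTRIBUTE_SPARSE_FILE))  # False: file is deleted
```

Masking only the sparse bit keeps the safety check for every other attribute, so a file that became read-only or hidden between the two phases would still be skipped.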
|