Tuesday 16 December 2014

Understanding DFS replication processes and what to do if it stops working

First of all, a quick introduction to Microsoft's DFSR.  DFS Replication (DFSR) was introduced in Windows Server 2003 R2 and is the mechanism used to replicate files between servers.  This is especially useful when you are using DFS namespaces to publish file shares, as you want all possible targets in DFS to hold the same files.  If you're now lost, you should probably go and read up on DFS concepts on Microsoft's sites.

Now on to the nuts and bolts of how the DFS Replication service works.

The DFS Replication service maintains a database of filenames, paths and hashes in the System Volume Information\DFSR folder.  It also holds a copy of the database in memory while the service is running.  There is only a single database *per drive letter*.  Do not mess with this!

DfsrPrivate is a link in each folder configured for replication which points into the DFSR folder under System Volume Information.
When the first member is added to a replication group, it is designated primary and builds its database of file hashes.
During the build process it is marked as primary (check with dfsradmin membership list /RgName:xxxxx /Attr:MemName,RfName,IsPrimary where xxxxx is your replication group name; use the full path including the FQDN if present).
Once the database build is complete, the primary flag goes away and an Event 4112 is logged in the event log.  The replication state also changes to 4 (Normal).

e.g. (where dfsns is the namespace)

Before:
D:\>dfsradmin membership list /RgName:domain.local\dfsns\RepGroup1 /Attr:MemName,RfName,IsPrimary
MemName  RfName   IsPrimary
SERVER1  FOLDER1  Yes
D:\>Wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo get replicationgroupname,replicatedfoldername,state | find /I "FOLDER1"
FOLDER1      domain.local\dfsns\RepGroup1   0

After:
D:\>dfsradmin membership list /RgName:domain.local\dfsns\RepGroup1 /Attr:MemName,RfName,IsPrimary
MemName  RfName   IsPrimary
SERVER1  FOLDER1  No
D:\>Wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo get replicationgroupname,replicatedfoldername,state | find /I "FOLDER1"
FOLDER1  domain.local\dfsns\RepGroup1   4
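
If you want to script the check rather than eyeball it, here is a minimal sketch in Python that parses the two outputs shown above and reports whether the initial build has finished.  The function names are my own invention for illustration, the state names are as I understand them from Microsoft's documentation of the DfsrReplicatedFolderInfo WMI class, and the parsing assumes member and folder names contain no spaces.

```python
# Hypothetical helpers: the names below are illustrative, not part of any DFSR tooling.
# State codes as I understand them from the DfsrReplicatedFolderInfo WMI class docs.
STATE_NAMES = {
    0: "Uninitialized",
    1: "Initialized",
    2: "Initial Sync",
    3: "Auto Recovery",
    4: "Normal",
    5: "In Error",
}


def parse_membership(output):
    """Parse 'dfsradmin membership list' text into (member, folder, is_primary) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Keep only three-column rows whose last column is Yes/No.
        # This skips blank lines, the header row and any status messages.
        if len(parts) != 3 or parts[2].lower() not in ("yes", "no"):
            continue
        member, folder, primary = parts
        rows.append((member, folder, primary.lower() == "yes"))
    return rows


def parse_state_line(line):
    """Parse one wmic result line: folder name, replication group name, state code."""
    folder, group, state = line.split()
    return folder, group, int(state)


def initial_build_complete(membership_output, state_line, folder):
    """True when no member is still primary for the folder and its state is 4 (Normal)."""
    still_primary = any(
        is_primary
        for _member, name, is_primary in parse_membership(membership_output)
        if name == folder
    )
    _folder, _group, state = parse_state_line(state_line)
    return not still_primary and state == 4
```

Feed it the text captured from the two commands, and only add the second member once initial_build_complete returns True for the folder in question.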

If another server is introduced to the replication group while this process is happening, bad things ™ happen, so let it complete!  This can take several hours on very large servers.

The database is built with a fence value assigned against each file.  This is used to resolve conflicts (e.g. the same file being found in the same place on another server).

All files on the primary member are assigned the “Initial Primary” fence value.  This guarantees that this server is considered the authoritative server during initial replication.  If another server is introduced before the initial database build is complete, that second server considers itself primary too and assigns the same fence value, hence conflicts and the bad things ™.

Once the database has been built, a second member can be introduced.  This will start building its database and assign the Initial Sync fence value to all files it finds (assuming there are existing files).  The second server will then compare the database with the first server and start copying over any missing or different files.

•    If a file doesn’t exist on the secondary member, it will just be copied over and the process moves on to the next file.
•    If a file exists already, then the fence values are compared for conflict resolution.
   o    The higher fence value wins and overwrites the lower.  RDC (remote differential compression) is used to check the files for differences (comparing the file hashes) and changed blocks are copied to the second member if required (usually nothing will be copied because the files are usually the same, e.g. when they were preseeded).
   o    If they have the same fence value, the bad things ™ now occur as there is a conflict.  Conflict resolution is invoked and uses the creation time and last modified time to decide which file should win.  Conflicting files are moved to the DfsrPrivate\ConflictAndDeleted folder.
This means that the live data on the primary server is moved out of live shares.  It will stay in the ConflictAndDeleted folder until the folder runs out of quota, at which time it will be flushed.
This isn’t necessarily the end of the world because the file is still on the second server and *should* eventually be replicated back, but if a user is looking for their file before this happens, they will not see it.

More information on the sequencing can be found on the MS page here:

As indicated below, the Initial Primary fence value (2) is higher than the Initial Sync value (1).  These are the two values assigned during the initial setup of replication.

Value  Name             Description
0      Unfence          This file or folder will lose all conflicts.
1      Initial Sync     Initial fence value for non-primary member.
2      Initial Primary  Initial fence value for primary member.
3      Default          Default fencing value.
4      Fence            Fence with current time stamp.
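
To make the comparison rules concrete, here is a rough Python model of the initial sync decisions described above.  It is only a sketch for illustration and not how the DFSR service is actually implemented: the class and function names are hypothetical, the fence values are taken straight from the table, and the tie-break for equal fences (newest last-modified time wins) is my simplifying assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Fence(IntEnum):
    """Fence values from the table above (higher beats lower)."""
    UNFENCE = 0
    INITIAL_SYNC = 1
    INITIAL_PRIMARY = 2
    DEFAULT = 3
    FENCE = 4


@dataclass
class FileRecord:
    """Hypothetical stand-in for one file's entry in a member's database."""
    path: str
    fence: Fence
    modified: float  # last-modified time, seconds since the epoch


def resolve(local, remote):
    """Return (winner, loser, conflict) for two records describing the same path."""
    if local.fence != remote.fence:
        # Different fences: the higher value simply wins, no conflict.
        winner = local if local.fence > remote.fence else remote
        loser = remote if winner is local else local
        return winner, loser, False
    # Equal fences: a real conflict.  ASSUMPTION: newest write wins here;
    # the loser is what would end up in ConflictAndDeleted.
    winner = local if local.modified >= remote.modified else remote
    loser = remote if winner is local else local
    return winner, loser, True


def plan_initial_sync(primary, secondary):
    """List the action for each path in the primary's records (dicts keyed by path)."""
    actions = {}
    for path, record in primary.items():
        other = secondary.get(path)
        if other is None:
            actions[path] = "copy to secondary"
            continue
        winner, _loser, conflict = resolve(record, other)
        side = "primary" if winner is record else "secondary"
        actions[path] = side + " wins" + (" (conflict)" if conflict else "")
    return actions
```

With a properly built primary (Initial Primary against Initial Sync) every existing file resolves cleanly in the primary's favour.  Give both sides the Initial Primary fence, which is what happens when the second member is added too early, and every shared file becomes a conflict.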

How to hide unknown devices in Dell OpenManage Essentials (or IT Assistant)

When discovering a large number of devices, usually you'll pick up a few non-Dell bits of hardware, such as Virtual Machines or network switches.

Luckily Dell have an option to avoid cluttering up OME with these, although of course they have put it in a weird place.

Instead of being with all the main configuration options under Preferences, it's actually under the discovery schedule settings in Manage / Discovery and Inventory / Discovery Schedule.