User talk:BenjaminCPierce

There We Go
Sorry, but a lot of the formatting was lost. This would be a perfect fit at Wikibooks, by the way. karmafist 04:27, 16 December 2005 (UTC)

General Questions

What are the differences between Unison and rsync?

Rsync is a mirroring tool; Unison is a synchronizer. That is, rsync needs to be told "this replica contains the true versions of all the files; please make the other replica look exactly the same." Unison is capable of recognizing updates in both replicas and deciding which way they should be propagated.
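To make the distinction concrete, here is a minimal Python sketch of the per-file decision each kind of tool makes. It is our own illustration and not Unison's code. It assumes each replica's current contents and the contents recorded at the last successful sync (the archive) are available as plain strings, with None meaning "absent". The function names are hypothetical.

```python
from typing import Optional

def reconcile(archive: Optional[str], a: Optional[str], b: Optional[str]) -> str:
    """Synchronizer: compare each replica with the state recorded at the last sync."""
    changed_a = a != archive
    changed_b = b != archive
    if not changed_a and not changed_b:
        return "nothing to do"
    if changed_a and not changed_b:
        return "propagate A -> B"
    if changed_b and not changed_a:
        return "propagate B -> A"
    if a == b:
        return "nothing to do"  # both sides made the same change
    return "conflict"

def mirror(a: Optional[str], b: Optional[str]) -> str:
    """Mirror: replica A is always assumed to hold the true version."""
    return "nothing to do" if a == b else "copy A -> B"

# B was edited since the last sync ("v1" -> "v2"); A was not.
print(reconcile("v1", "v1", "v2"))  # propagate B -> A
print(mirror("v1", "v2"))           # copy A -> B (B's edit would be overwritten)
```

The synchronizer needs the archive to tell "B changed" apart from "A changed back". A mirror needs no archive, because it never has to decide on a direction.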

Both Unison and rsync use the so-called "rsync algorithm," by Andrew Tridgell and Paul Mackerras, for performing updates. This algorithm streamlines updates in small parts of large files by transferring only the parts that have changed.
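The core of the rsync idea fits in a few dozen lines of Python. This is a simplified illustration, not the code used by Unison or rsync. The block size (64 bytes), the use of MD5 as the strong checksum, and the names signatures, delta and patch are our own choices. The receiver indexes the full blocks of its old copy by a cheap weak checksum. The sender then slides a window over the new file one byte at a time, updating that checksum in constant time ("rolling" it). It sends a block reference wherever a block matches and literal bytes everywhere else.

```python
import hashlib
import random

M = 1 << 16  # modulus for the two 16-bit halves of the weak checksum

def weak(block: bytes) -> tuple[int, int]:
    """Weak checksum: a = sum of the bytes, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def signatures(old: bytes, n: int) -> dict:
    """Receiver side: index each full block of the old file by weak checksum."""
    table: dict = {}
    for start in range(0, len(old) - n + 1, n):
        block = old[start:start + n]
        table.setdefault(weak(block), []).append((hashlib.md5(block).digest(), start // n))
    return table

def delta(new: bytes, table: dict, n: int) -> list:
    """Sender side: emit ("block", index) for matches and ("data", bytes) otherwise."""
    ops, literal = [], bytearray()
    i = 0
    sig = weak(new[:n]) if len(new) >= n else None
    while i + n <= len(new):
        a, b = sig
        match = None
        for strong, index in table.get((a, b), []):
            if hashlib.md5(new[i:i + n]).digest() == strong:  # confirm the weak hit
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("block", match))
            i += n
            if i + n <= len(new):
                sig = weak(new[i:i + n])
        else:
            literal.append(new[i])
            if i + n < len(new):
                # Roll the window one byte: drop new[i], add new[i + n].
                a = (a - new[i] + new[i + n]) % M
                b = (b - n * new[i] + a) % M
                sig = (a, b)
            i += 1
    literal += new[i:]
    if literal:
        ops.append(("data", bytes(literal)))
    return ops

def patch(old: bytes, ops: list, n: int) -> bytes:
    """Receiver side: rebuild the new file from old blocks plus literal data."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * n:(value + 1) * n] if kind == "block" else value
    return bytes(out)

rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(1024))
new = old[:500] + b"INSERTED" + old[500:]          # a small edit in the middle
ops = delta(new, signatures(old, 64), 64)
literal = sum(len(v) for k, v in ops if k == "data")
print(patch(old, ops, 64) == new, literal, "of", len(new), "bytes sent literally")
```

With a small insertion in the middle of a 1 KB file, only the bytes around the edit are sent literally. Everything else crosses the network as block references.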

What are the differences between Unison and CVS, Subversion, etc.?

Both CVS and Unison can be used to keep a remote replica of a directory structure up to date with a central repository. Both are capable of propagating updates in both directions and recognizing conflicting updates. Both use the rsync protocol for file transfer.

Unison's main advantage is being somewhat more automatic and easier to use, especially on large groups of files. CVS requires manual notification whenever files are added or deleted. Moving files is a bit tricky. And if you decide to move a directory... well, heaven help you.

CVS, on the other hand, is a full-blown version control system, and it has lots of other features (version history, multiple branches, etc.) that Unison (which is just a file synchronizer) doesn't have.

Similar comments apply to newer version control systems such as Subversion.

Has anybody tried to use Unison in conjunction with CVS in order to replicate a CVS repository for active development in more than one geographical location at the same time? Do you foresee any issues with trying to do such a thing, or do you have any tips as to how to get a setup like that working?

Unison and CVS (or Subversion, etc.) can be used together. The easiest way is to replicate your files with Unison but keep your CVS repository on just one machine (and do a commit on that machine after each time you synchronize with Unison, if files in that directory have changed). More complex schemes are also possible (e.g., using a remote CVS server and checking in from any host with one of the replicas), but should be used with care. In particular, if you use a remote CVS server, it is important that you do _not_ tell Unison to ignore the files in the CVS subdirectory.

Will unison behave correctly if used transitively? That is, if I synchronize both between host1:dir and host2:dir and between host2:dir and host3:dir at different times? Are there any problems if the "connectivity graph" has loops?

This mode of usage will work fine. As far as each "host pair" is concerned, filesystem updates made by Unison when synchronizing any other pair of hosts are exactly the same as ordinary user changes to the filesystem. So if a file started out having been modified on just one machine, then every run of Unison on a pair of hosts where one has heard about the change and the other hasn't will propagate the change to the other host. Running Unison between machines that have both already heard about the change will leave that file alone. So, no matter what the connectivity graph looks like (as long as it is not partitioned), eventually every host will agree on the new value of the file.

The only thing to be careful of is changing the file again on the first machine (or, in fact, any other machine) before all the machines have heard about the first change -- this can result in Unison reporting conflicting changes to the file, which you'll then have to resolve by hand. The best topology for avoiding such spurious conflicts is a star, with one central server that synchronizes with everybody else.
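Both points can be checked with a toy simulation. In this Python sketch (our own model, not Unison code), each pair of hosts keeps its own archive for a single file. The function sync applies the synchronizer rule: whichever side differs from the pair's archive wins, and a conflict is reported only when both sides differ from the archive and from each other. Host names and file versions are made up.

```python
def fresh(hosts: list, pairs: list, value: str = "v1"):
    """All hosts start with the same contents, and every pair's archive agrees."""
    state = {h: value for h in hosts}
    archives = {frozenset(p): value for p in pairs}
    return state, archives

def sync(state: dict, archives: dict, x: str, y: str) -> str:
    """Synchronize one file between hosts x and y, using that pair's own archive."""
    key = frozenset((x, y))
    base, a, b = archives[key], state[x], state[y]
    if a != base and b != base and a != b:
        return "conflict"              # both sides changed since this pair last met
    if a != base:
        state[y] = a                   # x has news that y has not heard yet
    elif b != base:
        state[x] = b                   # y has news that x has not heard yet
    archives[key] = state[x]
    return "ok"

# Triangle: h1 edits the file twice before h3 has heard about the first edit.
tri, tri_arch = fresh(["h1", "h2", "h3"], [("h1", "h2"), ("h2", "h3"), ("h1", "h3")])
tri["h1"] = "v2"
sync(tri, tri_arch, "h1", "h2")
tri["h1"] = "v3"
sync(tri, tri_arch, "h2", "h3")               # h3 now has v2
tri_result = sync(tri, tri_arch, "h1", "h3")  # archive says v1; h1 has v3, h3 has v2
print(tri_result)                             # conflict

# Star: the same edits, but every synchronization goes through the hub.
leaves = ["h1", "h2", "h3"]
star, star_arch = fresh(["hub"] + leaves, [("hub", h) for h in leaves])
star["h1"] = "v2"
sync(star, star_arch, "h1", "hub")
star["h1"] = "v3"
sync(star, star_arch, "hub", "h2")
for h in leaves + leaves:
    sync(star, star_arch, "hub", h)
print(star)                                   # every host ends up with v3
```

In the star, the hub always has the newest version any spoke could have received from it. So a second edit on h1 is seen as a one-sided change rather than a conflict.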

Is it OK to run several copies of Unison concurrently?

This will work fine, as long as each running copy is synchronizing a different pair of roots (i.e., as long as each copy is using a different archive file).
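To see why different pairs of roots do not interfere, note that each pair has its own archive file, whose name is derived from the two roots. The naming scheme below is hypothetical. It only mimics the arNNNN... style of name found in the .unison directory and is not Unison's actual formula. What matters is that the name does not depend on which root is listed first, and that different pairs get different files. So two concurrent runs on different pairs never touch the same archive, while two runs on the same pair would.

```python
import hashlib

def archive_name(root1: str, root2: str) -> str:
    """Hypothetical: derive one archive file name per unordered pair of roots."""
    key = "\n".join(sorted([root1, root2]))  # same name whichever root comes first
    return "ar" + hashlib.md5(key.encode()).hexdigest()

print(archive_name("/home/me", "ssh://server//home/me"))
print(archive_name("/home/me", "ssh://laptop//home/me"))  # different pair, different file
```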

What will happen if I do a local (or NFS, etc.) sync and some file happens to be part of both replicas?

It will look to Unison as though somebody else has been modifying the files it is trying to synchronize, and it will fail (safely) on these files.

What happens if Unison gets killed while it is working? Do I have to kill it nicely, or can I use kill -9? What if the network goes down during a synchronization? What if one machine crashes but the other keeps running?

Don't worry; be happy. See the section "Invariants" of the user manual.

What about race conditions when both Unison and some other program or user are both trying to write to a file at exactly the same moment?

Unison works hard to make these "windows of danger" as short as possible, but they cannot be eliminated completely without relying on (non-portable) support from the operating system.

What will happen if I run Unison after my archive files get deleted or damaged?

A missing or damaged archive is treated the same as a completely empty one. This means that Unison will consider all the files in both replicas to be new. Any files that exist only in one replica will be transferred to the other replica (because it will look as though they have just been created); files that exist on both replicas but have different contents will be flagged as conflicts; files that have the same contents on both replicas will simply be noted in the rebuilt archive. If just one of the archive files is missing or damaged, Unison will ignore the other one and start from an empty archive.
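The three cases above can be summarized in a short Python sketch. It is illustrative only, not Unison's code. It assumes each replica is described by a dictionary mapping root-relative paths to content fingerprints. The file names and fingerprints are made up.

```python
def classify_without_archive(a: dict, b: dict) -> dict:
    """a and b map root-relative paths to content fingerprints (missing = absent)."""
    plan = {}
    for path in sorted(a.keys() | b.keys()):
        if path not in b:
            plan[path] = "copy to B"          # looks newly created in A
        elif path not in a:
            plan[path] = "copy to A"          # looks newly created in B
        elif a[path] != b[path]:
            plan[path] = "conflict"           # both "new", with different contents
        else:
            plan[path] = "record in archive"  # identical: nothing to transfer
    return plan

replica_a = {"notes.txt": "f1", "only-a.txt": "f2", "same.txt": "f3"}
replica_b = {"notes.txt": "f9", "only-b.txt": "f4", "same.txt": "f3"}
plan = classify_without_archive(replica_a, replica_b)
for path, action in plan.items():
    print(f"{path}: {action}")
```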

Using Unison on Specific Operating Systems

Generic Unix Questions

Is it OK to mount my remote filesystem using NFS and run unison locally, or should I run a remote server process?

NFS-mounting the replicas is fine, as long as the local network is fast enough. Unison needs to read a lot of files (in particular, it needs to check the last-modified time of every file in the repository every time it runs), so if the link bandwidth is low then running a remote server is much better.

I've heard that the Unix file locking mechanism doesn't work very well under NFS. Is this a problem for Unison?

No.

What will happen if I try to synchronize a special file (e.g., something in /dev, /proc, etc.)?

Unison will refuse to synchronize such files. It only understands ordinary files, directories, and symlinks.

Is it possible to run Unison from inetd (the Unix internet services daemon)?

Toby Johnson has contributed a detailed mini-HOWTO describing how to do this. (Yan Seiner wrote an earlier HOWTO, on which Toby's is based.)

OS X

Does Unison work on Mac OSX?

Recent versions of Unison work well on OS X, including support for synchronizing files with resource forks, handling of creator strings, etc.

A few caveats:

* OSX native filesystems are case-insensitive (i.e., 'a' and 'A' are the same file), but Unison doesn't recognize this. A workaround is to add the line

ignorecase = true

to your profile.

* Unison will be confused by some files that are frequently updated by OSX, and will report lots of errors of the form "XXX has been modified during synchronization." These files --- in particular, files with names like .FBCLockFolder and .FBCIndex --- should be ignored by adding

ignore = Name .FBCIndex
ignore = Name .FBCLockFolder

to your profile.

* Unison does not run on Mac OS 9 or earlier.

Installing Unison from Fink (on OS X) does not work.

We've had reports that Fink installation only works when the unstable packages are selected. See [1] for more information.

How do I pass environment variables to the Aqua version of Unison?

Create a file .MacOSX/environment.plist in your home directory containing:

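<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>UNISONLOCALHOSTNAME</key>
  <string>my_canonical_host_name_for_unison</string>
</dict>
</plist>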

For more information about this file, see [2].

Windows

Unison creates two different windows, the main user interface and a blank console window. Is there any way to get rid of the second one?

The extra console window is there for ssh to use to get your password. Unfortunately, in the present version of unison the window will appear whether you're using ssh or not. Karl Moerder contributed some scripts that he uses to make the command window a bit more attractive. He starts unison from a shortcut to a .cmd file. This lets him control the attributes of the command window, making it small and gray and centering the passphrase request. His scripts can be found at [3].

It is also possible to get rid of the window entirely (for users that only want socket mode connections) by playing games with icons. If you make a symbolic link to the executable, you can edit the properties box to make this window come up iconic. That way when you click on the link, you seem to just get a unison window (except on the task bar, where the text window shows).

It looks like the root preferences are specified using backslashes, but path and ignore preferences are specified with forward slashes. What's up with that?

Unison uses two sorts of paths: native filesystem paths, which use the syntax of the host filesystem, and "portable" paths relative to the roots of the replicas, which always use / to separate the path components. Roots are native filesystem paths; the others are root-relative.
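A small Python sketch of the two kinds of path, using the standard pathlib module. The root C:\Users\me and the file names are made-up examples. The helper names to_portable and to_native are our own and are not Unison functions.

```python
from pathlib import PurePosixPath, PureWindowsPath

def to_portable(root: str, native: str) -> str:
    """Native Windows path under root -> root-relative path separated by '/'."""
    relative = PureWindowsPath(native).relative_to(PureWindowsPath(root))
    return PurePosixPath(*relative.parts).as_posix()

def to_native(root: str, portable: str) -> str:
    """Root-relative '/'-separated path -> native Windows path under root."""
    return str(PureWindowsPath(root, *portable.split("/")))

print(to_portable(r"C:\Users\me", r"C:\Users\me\docs\report.txt"))  # docs/report.txt
print(to_native(r"C:\Users\me", "docs/report.txt"))                  # C:\Users\me\docs\report.txt
```

The root stays in native syntax. Everything that names a file inside the replicas (path, ignore) is written in the portable form, so the same profile line means the same file on both hosts.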

I'm having trouble getting unison working with openssh under Windows. Any suggestions?

Antony Courtney contributed the following comment:

I ran into some difficulties trying to use this ssh client with Unison, and tracked down at least one of the problems. I thought I'd share my experiences, and provide a 'known good' solution for other users who might want to use this Windows / Unison / ssh / Cygwin combination. If you launch Unison from bash, it fails (at least for me). Running unison_win32-gtkui.exe, I get a dialog box that reads:

Fatal error: Error in checkServer: Broken pipe [read]

and a message is printed to stderr in the bash window that reads:

ssh: unison_win32-gtkui.exe: no address associated with hostname.

My guess is that this is caused by some incompatibility between the OCaml Win32 library routines and Cygwin with regard to setting up argv[] for child processes.

The solution is to launch Unison from a DOS command prompt instead; or see section X.

How can I get Unison to talk through a Putty link on Windows?

Martin Cleaver has written up a set of instructions [4].

I'm using Windows XP + Cygwin. Unison stops responding after I enter my ssh password. What can I do to fix it?

This appears to be a problem with the Cygwin DLL version 1.5.11, perhaps in conflict with Win XP. Downgrading to DLL version 1.5.10-3 usually solves the problem.

To downgrade using the Cygwin setup program, run Cygwin's setup.exe, choose the "Base" tree of packages, and toggle the "cygwin: the UNIX emulation engine" package until it is set to install v1.5.10-3.

Thanks to Michael McDougall for this answer.

(Uh oh... Michael writes, later: Actually, I've been communicating with one of the people afflicted with this bug and I think my solution is out of date. It looks like the new Cygwin DLL still has the bug, but the Cygwin setup program only lets you downgrade one version--now you can choose between 1.5.11 and 1.5.12, both of which break Unison.)

Karl Crary has a different workaround for this problem, using socket connections and port forwarding. The key idea is to create a script looking like this:

unison -socket NNNN &
ssh -R NNNN:localhost:NNNN user@remote.site /path/to/unison -killServer socket://localhost:NNNN//local-root remote-root

Is there a way, under Windows, to click-start Unison and make it synchronize according to a particular profile?

Greg Sullivan sent us the following useful trick:

In order to make syncing a particular profile "clickable" from the Win98 desktop, when the profile uses ssh, you need to create a .bat file that contains nothing but "unison profile-name" (assuming unison.exe is in the PATH). I first tried the "obvious" strategy of creating a shortcut on the desktop with the actual command line "unison profile", but that hangs. The .bat file trick works, though, because it runs command.com and then invokes the .bat file.

Troubleshooting

Are there any general troubleshooting strategies?

A general recommendation is that, if you've gotten into a state you don't understand, deleting the archive files on both replicas (files with names like arNNNNNNNNNNNNNNN in the .unison directory) will return you to a blank slate. If the replicas are identical, then deleting the archives is always safe. If they are not identical, then deleting the archives will cause all files that exist on one side but not the other to be copied, and will report conflicts for all non-identical files that do exist on both sides.

(If you think the behavior you're observing is an actual bug, then you might consider moving the archives to somewhere else instead of deleting them, so that you can try to replicate the bad behavior and report more clearly what happened.)

The text mode user interface fails with "Uncaught exception Sys_blocked_io" when running over ssh2.

The problem here is that ssh2 puts its standard file descriptors into non-blocking mode. But unison and ssh share the same stderr (so that error messages from the server are displayed), and the nonblocking setting interferes with Unison's interaction with the user. This can be corrected by redirecting the stderr when invoking Unison:

unison -ui text 2>/dev/tty

(The redirection syntax is a bit shell-specific. On some shells, e.g., csh and tcsh, you may need to write

unison -ui text >& /dev/tty

instead.)
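The failure mode can be reproduced without ssh. This Python sketch (Unix only; not part of Unison) puts a pipe's read end into non-blocking mode, which the answer above says ssh2 does to the descriptors it shares. It then shows that a read with no data available fails immediately instead of waiting. The OCaml runtime reports the corresponding condition on its channels as Sys_blocked_io. The second function, restore_blocking, is a hypothetical illustration of how a program could detect and undo the setting. The redirection shown above remains the suggested fix.

```python
import os

def read_from_nonblocking_pipe() -> str:
    """Show what a read sees when its descriptor was switched to non-blocking mode."""
    r, w = os.pipe()
    os.set_blocking(r, False)   # roughly what ssh2 does to the descriptors it shares
    try:
        os.read(r, 1)           # no data has been written yet
        return "read returned"
    except BlockingIOError:     # EAGAIN: "no I/O possible right now"
        return "BlockingIOError"
    finally:
        os.close(r)
        os.close(w)

def restore_blocking(fd: int) -> bool:
    """Put fd back into blocking mode; return True if it had been non-blocking."""
    if os.get_blocking(fd):
        return False
    os.set_blocking(fd, True)
    return True

print(read_from_nonblocking_pipe())  # BlockingIOError
```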

What does "DANGER.README: permission denied" mean?

If you see something like

Propagating updates [accounting/fedscwh3qt2000.wb3] failed: error in renaming locally: /DANGER.README: permission denied

it means that unison is having trouble creating the temporary file DANGER.README, which it uses as a "commit log" for operations (such as renaming its temporary file accounting/fedscwh3qt2000.wb3.unison.tmp to the real location accounting/fedscwh3qt2000.wb3) that may leave the filesystem in a bad state if they are interrupted in the middle. This is pretty unlikely, since the rename operation happens fast, but it is possible; if it happens, the commit log will be left around and Unison will notice (and tell you) the next time it runs that the consistency of that file needs to be checked.

The specific problem here is that Unison is trying to create DANGER.README in the directory specified by your HOME environment variable, which seems to be set to /, where you do not have write permission.
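The commit-log pattern described above can be sketched in Python. This is an illustration under assumptions, not Unison's implementation. The function names commit_rename and leftover_log are hypothetical, as are the log's contents and the file names in the demo. Only the log name DANGER.README comes from the text. Step 1 is exactly where the "permission denied" error would arise if log_dir (for Unison, the directory named by HOME) is not writable.

```python
import os
import tempfile

LOG_NAME = "DANGER.README"

def commit_rename(log_dir: str, tmp_path: str, final_path: str) -> None:
    """Rename tmp_path to final_path, bracketed by a commit log in log_dir."""
    log_path = os.path.join(log_dir, LOG_NAME)
    # 1. Record the operation before starting it. If log_dir is not writable
    #    (e.g. HOME is "/"), this open() fails with PermissionError.
    with open(log_path, "w") as log:
        log.write(f"renaming {tmp_path} to {final_path}\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. The short "window of danger": an interruption here leaves the log behind.
    os.replace(tmp_path, final_path)
    # 3. The operation completed, so the log is no longer needed.
    os.remove(log_path)

def leftover_log(log_dir: str):
    """On startup: return the contents of a leftover log, or None if there is none."""
    log_path = os.path.join(log_dir, LOG_NAME)
    if not os.path.exists(log_path):
        return None
    with open(log_path) as log:
        return log.read()

with tempfile.TemporaryDirectory() as d:
    tmp = os.path.join(d, "report.wb3.unison.tmp")
    final = os.path.join(d, "report.wb3")
    with open(tmp, "w") as f:
        f.write("new contents")
    commit_rename(d, tmp, final)
    done, left = os.path.exists(final), leftover_log(d)
print(done, left)  # True None
```

If the process dies between steps 1 and 3, the next run finds the log through leftover_log and knows which file needs checking.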

The command line "unison work ssh://remote.dcs.ed.ac.uk/work" fails, with "fatal error: could not connect to server." But when I connect directly with ssh remote.dcs.ed.ac.uk, I see that my PATH variable is correctly set, and the unison executable is found.

In the first case, Unison is using ssh to execute a command, and in the second, it is giving you an interactive remote shell. Under some ssh configurations, these two use different startup sequences. You can test whether this is the problem here by trying, e.g.,

ssh remote.dcs.ed.ac.uk 'echo $PATH'

and seeing whether your PATH is the same as when you do

ssh remote.dcs.ed.ac.uk
[give password and wait for connection]
echo $PATH

One method that should always work is this [thanks to Richard Atterer for this]: log into the machine, set up PATH so that the program is found, and then execute

echo "PATH=$PATH" >>~/.ssh/environment

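All this seems to be controlled by the configuration of ssh, but we have not understood the details; if someone does, please let us know. (One relevant setting: OpenSSH's sshd only reads ~/.ssh/environment when its sshd_config contains "PermitUserEnvironment yes", which is off by default.)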

When I use ssh to log into the server, everything looks fine (and I can see the Unison binary in my path). But when I do 'ssh unison' it fails. Why?

[Thanks to Nick Phillips for the following explanation.]

It's simple. If you start ssh, enter your password etc. and then end up in a shell, you have a login shell.

If you do "ssh myhost.com unison" then unison is not run in a login shell.

This means that different shell init scripts are used, and most people seem to have their shell init scripts set up all wrong.

With bash, for example, your .bash_profile only gets used if you start a login shell. This usually means that you've logged in on the system console, on a terminal, or remotely. If you start an xterm from the command line you won't get a login shell in it. If you start a command remotely from the ssh or rsh command line you also won't get a login shell to run it in (this is of course a Good Thing -- you may want to run interactive commands from it, for example to ask what type of terminal they're using today).

If people insist on setting their PATH in their .bash_profile, then they should probably do at least one of the following:

* stop it;
* read the bash manual, section "INVOCATION";
* set their path in their .bashrc;
* get their sysadmin to set a sensible system-wide default path;
* source their .bash_profile from their .bashrc ...

It's pretty similar for most shells.

Unison crashes with an "out of memory" error when used to synchronize really huge directories (e.g., with hundreds of thousands of files).

You may need to increase your maximum stack size. On Linux and Solaris systems, for example, you can do this using the ulimit command (see the bash documentation for details).

Why does unison run so slowly the first time I start it?

On the first synchronization, unison doesn't have any "memory" of what your replicas used to look like, so it has to go through, fingerprint every file, transfer the fingerprints across the network, and compare them to what's on the other side. Having done this once, it stashes away the information so that in future runs almost all of the work can be done locally on each side.
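Here is a rough Python sketch of that first pass and of later passes. It is illustrative only. The function names are hypothetical, and MD5 stands in for whatever fingerprint Unison actually uses. On the first run, every file on both sides is read and hashed, and one side's fingerprints must cross the network for comparison. Afterwards, each side can compare a fresh scan against its own saved archive locally, so only the list of changes needs to be exchanged.

```python
import hashlib
import os
import tempfile

def fingerprint(path: str, chunk: int = 1 << 16) -> str:
    """Hash a file's full contents, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def scan(root: str) -> dict:
    """Map each file's root-relative, '/'-separated path to its fingerprint."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            result[rel] = fingerprint(full)
    return result

def changed_since(archive: dict, current: dict) -> set:
    """Paths added, removed, or modified relative to the saved archive."""
    return {p for p in archive.keys() | current.keys() if archive.get(p) != current.get(p)}

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("hello")
    archive = scan(root)                           # first run: read and hash everything
    unchanged = changed_since(archive, scan(root))
    with open(os.path.join(root, "b.txt"), "w") as f:
        f.write("new")
    after_add = changed_since(archive, scan(root))
print(unchanged, after_add)  # set() {'b.txt'}
```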

I can't seem to override the paths selected in the profile by using a -path argument on the command line.

Right: the path preference is additive (each use adds an entry to the list of paths within the replicas that Unison will try to synchronize), and there is no way to remove entries once they have gotten into this list. The solution is to split your preference file into different "top-level" files containing different sets of path preferences and make them all include a common preference file to avoid repeating the non-path preferences. See the section "Profile Examples" of the user manual for a complete example.

I can't seem to override the roots selected in the profile by listing the roots on the command line. I get "Fatal error: Wrong number of roots (2 expected; 4 provided)."

Roots should be provided either in the preference file or on the command line, not both. See the section "Profile Examples" of the user manual for further advice.
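The two answers above come down to how list-valued preferences are combined. This Python sketch uses a hypothetical function and data, not Unison's preference parser. It treats root and path as lists that every setting appends to, whether the setting comes from a profile or from the command line, and then checks the number of roots. The host and directory names are made up.

```python
def collect_prefs(profile: list, command_line: list) -> dict:
    """Combine (name, value) settings from a profile and the command line.

    root and path only ever grow: each setting appends to the list and never
    replaces or removes earlier entries. Other preferences are ignored here."""
    prefs = {"root": [], "path": []}
    for name, value in profile + command_line:
        if name in prefs:
            prefs[name].append(value)
    if len(prefs["root"]) != 2:
        raise ValueError(
            f"Fatal error: Wrong number of roots (2 expected; {len(prefs['root'])} provided)")
    return prefs

profile = [("root", "/home/me"), ("root", "ssh://server//home/me"),
           ("path", "current"), ("path", "Mail")]
print(collect_prefs(profile, [("path", "src")])["path"])  # ['current', 'Mail', 'src']
try:
    collect_prefs(profile, [("root", "/tmp/a"), ("root", "/tmp/b")])
except ValueError as err:
    print(err)  # Fatal error: Wrong number of roots (2 expected; 4 provided)
```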

I get a persistent 'rsync failure' error when transferring a particular file. What can I do?

We're not sure what causes this failure, but a workaround is to set the rsync flag to false.

Tricks and Tips

I want to use Unison to synchronize really big replicas. How can I improve performance?

When you synchronize a large directory structure for the first time, Unison will need to spend a lot of time walking over all the files and building an internal data structure called an archive. There is no way around this: Unison uses these archives in a critical way to do its work. While you're getting things set up, you'll probably save time if you start off focusing Unison's attention on just a subset of your files, by including the option -path some/small/subdirectory on the command line. When this is working to your satisfaction, take away the -path option and go get lunch while Unison works. This rebuilding operation will sometimes need to be repeated when you upgrade Unison (major upgrades often involve changes to the format of the archive files; minor upgrades generally do not.)

Next, make sure that you are not "remote mounting" either of your replicas over a network connection. Unison needs to run close to the files that it is managing, otherwise performance will be very poor. Set up a client-server configuration as described in the installation section of the manual.

If your replicas are large and at least one of them is on a Windows system, you will probably find that Unison's default method for detecting changes (which involves scanning the full contents of every file on every sync---the only completely safe way to do it under Windows) is too slow. In this case, you may be interested in the fastcheck preference, documented in the section "Fast Update Checking" of the user manual.

In normal operation, the longest part of a Unison run is usually the time that it takes to scan the replicas for updates. This requires examining the filesystem entry for every file (i.e., doing an fstat on each inode) in the replica. This means that the total number of inodes in the replica, rather than the total size of the data, is the main factor limiting Unison's performance.
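To see why the number of files, rather than their total size, dominates, compare with a scan that only calls stat(). This Python sketch is in the spirit of the fastcheck preference but is not Unison's implementation. The (modification time, size, inode) triple is our assumption; the manual describes what Unison actually compares. Each file costs one stat() call, however large it is, and file contents are never read.

```python
import os
import tempfile

def stat_scan(root: str) -> dict:
    """One stat() per file: record (mtime in ns, size, inode); never read contents."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            result[rel] = (st.st_mtime_ns, st.st_size, st.st_ino)
    return result

def fast_changed(archive: dict, root: str) -> tuple:
    """Return (paths whose stat signature differs from the archive, new signatures)."""
    current = stat_scan(root)
    changed = {p for p in archive.keys() | current.keys() if archive.get(p) != current.get(p)}
    return changed, current

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "a.txt")
    with open(path, "w") as f:
        f.write("hello")
    archive = stat_scan(root)
    quiet, _ = fast_changed(archive, root)
    with open(path, "a") as f:
        f.write(" world")                 # size changes, so the signature changes
    touched, _ = fast_changed(archive, root)
print(quiet, touched)  # set() {'a.txt'}
```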

Update detection times can be improved (sometimes dramatically) by telling Unison to ignore certain files or directories. See the description of the ignore and ignorenot preferences in the section "Preferences" of the user manual.

(One could also imagine improving Unison's update detection by giving it access to the filesystem logs kept by some modern "journaling filesystems" such as ext3 or ReiserFS, but this has not been implemented. We have some ideas for how to make it work, but it will require a bit of systems hacking that no one has volunteered for yet.)

Another way of making Unison detect updates faster is by "aiming" it at just a portion of the replicas by giving it one or more path preferences. For example, if you want to synchronize several large subdirectories of your home directory between two hosts, you can set things up like this:

Create a common profile (called, e.g., common) containing most of your preferences, including the two roots:

root = /home/bcpierce
root = ssh://saul.cis.upenn.edu//home/bcpierce
ignore = Name *.o
ignore = Name *.tmp
etc.

Create a default profile default.prf with path preferences for all of the top-level subdirectories that you want to keep in sync, plus an instruction to read the common profile:

path = current
path = archive
path = src
path = Mail
include common

Running unison default will synchronize everything.

(If you want to synchronize everything in your home directory, you can omit the path preferences from default.prf.)

Create several more preference files similar to default.prf but containing smaller sets of path preferences. For example, mail.prf might contain:

path = Mail
include common

Now running unison mail will scan and synchronize just your Mail subdirectory.

Once update detection is finished, Unison needs to transfer the changed files. This is done using a variant of the rsync protocol, so if you have made only small changes in a large file, the amount of data transferred across the network will be relatively small. Unison carries out many file transfers at the same time, so the per-file set up time is not a significant performance factor.

Is there a way to get Unison not to prompt me for a password every time I run it (e.g., so that I can run it every half hour from a shell script)?

It's actually ssh that's asking for the password. If you're running the Unison client on a Unix system, you should check out the 'ssh-agent' facility in ssh. If you do

ssh-agent bash

(or ssh-agent startx, when you first log in) it will start you a shell (or an X Windows session) in which all processes and sub-processes are part of the same ssh-authorization group. If, inside any shell belonging to this authorization group, you run the ssh-add program, it will prompt you once for a password and then remember it for the duration of the bash session. You can then use Unison over ssh---or even run it repeatedly from a shell script---without giving your password again.

It may also be possible to configure ssh so that it does not require any password: just enter an empty password when you create a pair of keys. If you think it is safe enough to keep your private key unencrypted on your client machine, this solution should work even under Windows.

Can Unison be used with SSH's port forwarding features?

Mark Thomas says the following procedure works for him:

After having problems with unison spawning a command-line ssh in Windows, I noticed that unison also supports a socket mode of communication (great software!), so I tried the port forwarding feature of ssh using a graphical SSH terminal [5].

To use unison I start TTSSH with port forwarding enabled and log in to the Linux box, where the unison server (unison -socket xxxx) is started automatically. In Windows I just run unison and connect to localhost (unison socket://localhost:xxxx/ ...).

How can I use Unison from a laptop whose hostname changes depending on where it is plugged into the network?

This is partially addressed by the rootalias preference. See the discussion in the section "Archive Files" of the user manual.