Case sensitivity

Michael Haggerty mhagger at alum.mit.edu
Mon Aug 31 22:25:58 BST 2009


Larry Drews wrote:
> However, I am running into a problem with filename case sensitivity.  The 
> filenames from StarTeam are not uniform, their case usage is all over the 
> map.  I have ended up with two files in the bzr repository that differ only 
> in case usage.

Just for fun I'd like to point out that there is another related problem
that also occurs on *nix systems--that of Unicode normalization.  The
Subversion project has struggled with this problem, too [1,2].

I'm not an expert, but here is the problem as I understand it from
reading the SVN mailing lists:

Many Unicode characters can be represented in multiple, equivalent ways
[3].  For example, the precomposed character 'ü' (U+00FC) is canonically
equivalent to the two-code-point sequence 'u' (U+0075) followed by a
combining diaeresis (U+0308).

Thus a single "logical" filename, encoded in (say) UTF-8, can be
represented as multiple distinct byte strings.  This is similar to the
case-sensitivity problem: a case-insensitive file system considers some
filenames to be logically equivalent even though they are represented
by different byte strings.
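
To make that concrete, here is a small Python 3 sketch (standard
library only; the variable names are just for illustration) showing
that the two spellings of 'ü' compare unequal and encode to different
UTF-8 bytes:

```python
import unicodedata

composed = "\u00fc"     # 'ü' as a single precomposed code point
decomposed = "u\u0308"  # 'u' followed by COMBINING DIAERESIS

# The strings look identical when rendered but are different sequences.
print(composed == decomposed)       # False
print(composed.encode("utf-8"))     # b'\xc3\xbc'
print(decomposed.encode("utf-8"))   # b'u\xcc\x88'

# Normalizing both to the same form makes them compare equal.
same = unicodedata.normalize("NFC", decomposed) == composed
print(same)                         # True
```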

To get around this problem, Unicode defines four "normal forms": each
is either fully composed or fully decomposed, and is based on either
canonical or compatibility equivalence (NFC, NFD, NFKC and NFKD).  Each
Unicode string maps to a unique string in each of these normal forms,
and two strings that are equivalent under the corresponding notion of
equivalence map to the same string in that normal form.
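
As a rough illustration, Python's unicodedata.normalize() can produce
all four forms.  The sample string and the helper name normal_forms are
made up for this sketch; the string combines a compatibility character
(the 'fi' ligature, U+FB01) with a decomposed 'ü':

```python
import unicodedata

def normal_forms(text):
    """Return the four Unicode normal forms of text, keyed by name."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

# 'fi' ligature (U+FB01) followed by 'u' + COMBINING DIAERESIS (U+0308)
sample = "\ufb01u\u0308"
forms = normal_forms(sample)

for name, value in forms.items():
    # Show each form as a list of code points so the difference is visible.
    print(name, [f"U+{ord(ch):04X}" for ch in value])
```

The canonical forms (NFC, NFD) leave the ligature alone and only
compose or decompose the 'ü'; the compatibility forms (NFKC, NFKD)
also replace the ligature with the plain letters 'f' and 'i'.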

IIRC, Windows and Linux don't enforce a normal form but typically use
the composed normal form, whereas Mac OS X enforces a decomposed normal
form.  This means that the filename used to write a file under Mac OS X
is not necessarily the same, byte for byte, as the filename read back,
which caused huge problems for Mac OS X users of Subversion.
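
One way a conversion tool might look for both kinds of trouble at once
is to group filenames under a comparison key.  This is only a sketch:
comparison_key and find_collisions are hypothetical helpers, and the
choice of NFC followed by str.casefold() is an assumption, not what any
particular file system or version control tool actually does:

```python
import unicodedata
from collections import defaultdict

def comparison_key(name):
    # Assumption: NFC plus casefold() approximates "logically the same
    # name" on a case-insensitive, normalizing file system.
    return unicodedata.normalize("NFC", name).casefold()

def find_collisions(names):
    """Return groups of names that share the same comparison key."""
    groups = defaultdict(list)
    for name in names:
        groups[comparison_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

names = ["README", "Readme", "u\u0308ber.txt", "\u00fcber.txt", "notes.txt"]
collisions = find_collisions(names)
for group in collisions:
    print([n.encode("utf-8") for n in group])
```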

So while you are agonizing about case insensitivity, you might also
consider Unicode normalization :-)

Michael

[1] http://trac.macports.org/ticket/17813
[2] http://svn.collab.net/repos/svn/trunk/notes/unicode-composition-for-filenames
[3] http://en.wikipedia.org/wiki/Unicode_equivalence



