In many object-oriented languages, such as C# and Java, assignments to variables of simple types are atomic. The C# specification guarantees that reads and writes of bool, char, byte, sbyte, short, ushort, uint, int, float and reference types are atomic. Reads and writes of long, ulong, double and decimal are not guaranteed to be atomic. The CLR promises that a simple assignment such as int num = 1; is atomic. In other words, the write happens as one indivisible operation, so no other thread can ever observe num in a half-written state, holding part of the old value and part of the new one. Consider the following, though:
int num = 1;
++num;
We know the initial assignment is atomic, but what about the increment? To quote from the C# Language Specification, "there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement". ++num is really shorthand for num = num + 1. That means reading num from memory into a register, adding 1 to it, and then writing the result back to the memory allocated to the num variable. This is a multi-step sequence, and in a multi-threaded environment another thread can run in the middle of it. That can happen because of a context switch (i.e. execution being halted on one thread and resumed on another) or simply because two threads are running at the same time on different cores. If two threads both read 1 and both write back 2, one of the increments is silently lost. To safeguard against this, we need to synchronize the operation. Fortunately, the framework has a set of methods that deal specifically with synchronizing increments and decrements efficiently:
int num = 1;
// Atomically increments num (to 2) and returns the incremented value.
System.Threading.Interlocked.Increment(ref num);
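To see the difference in practice, here is a minimal sketch in which four tasks increment two shared counters, one with ++ and one with Interlocked.Increment. It assumes a reasonably modern .NET runtime, because it uses Task.Run and C# 7 tuple syntax, which the examples above do not need. The CounterDemo class and its members are hypothetical names made up for this demo.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class: compares a plain ++ counter with Interlocked.Increment.
public static class CounterDemo
{
    private static int _plainCount;
    private static int _atomicCount;

    // Starts `workers` tasks that each bump both counters `perWorker` times,
    // waits for all of them, and returns the final values.
    public static (int Plain, int Atomic) Run(int workers, int perWorker)
    {
        _plainCount = 0;
        _atomicCount = 0;

        var tasks = new Task[workers];
        for (int w = 0; w < workers; w++)
        {
            tasks[w] = Task.Run(() =>
            {
                for (int i = 0; i < perWorker; i++)
                {
                    _plainCount++;                          // read, add, write: updates can be lost
                    Interlocked.Increment(ref _atomicCount); // one atomic read-modify-write
                }
            });
        }

        Task.WaitAll(tasks);
        return (_plainCount, _atomicCount);
    }

    public static void Main()
    {
        var (plain, atomic) = Run(4, 1_000_000);
        Console.WriteLine($"++ counter:          {plain}");  // may be less than 4000000
        Console.WriteLine($"Interlocked counter: {atomic}"); // always 4000000
    }
}
```

On a multi-core machine the ++ total usually comes out short, and the shortfall varies between runs. The Interlocked total always matches the number of increments.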
Reference Types Too!
It is not only (most) value types that can be atomically assigned, however - the rule also holds for reference types. So for example, the CLR ensures that the following assignment is atomic:
Person p = new Person("Fredrik");
Considering what we discussed above, this may seem a bit strange, because constructing an object surely involves an even more complex set of operations than incrementing an integer. However, remember that the right-hand side of an assignment is evaluated before it is assigned to the variable. So first the new object instance is constructed, and only then is a reference to it stored in p. Until the constructor has finished, no other thread has a reference through which it could reach the object. The final store is a single reference-sized write, which is exactly what the CLR guarantees to be atomic, so we can safely say the assignment is atomic. Note that this does not mean the constructor is implicitly synchronized, however. If the constructor accesses any shared resources, it is up to you to synchronize access to them. But the assignment itself is safe.
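The sketch below illustrates the point that a reader only ever sees whole references. A writer task keeps swapping a shared field between two fully constructed Person objects while a reader task samples it, and every read turns out to be one of the two objects. The Name property, the volatile modifier and the ReferenceSwapDemo class are assumptions of this demo rather than part of the original example. volatile is only an extra precaution so the reader keeps seeing fresh values; atomicity of the reference write does not depend on it.

```csharp
using System;
using System.Threading.Tasks;

// Minimal hypothetical Person class; the article only assumes a constructor taking a name.
public sealed class Person
{
    public string Name { get; }
    public Person(string name) { Name = name; }
}

public static class ReferenceSwapDemo
{
    // Shared reference. `volatile` is an extra precaution of this sketch so the
    // reader keeps observing new writes instead of a cached value.
    private static volatile Person _current;

    // A writer task keeps replacing _current with one of two fully constructed
    // Person objects while a reader task samples it. Returns the number of reads
    // that matched neither object (expected: 0).
    public static int CountUnexpectedReads(int iterations)
    {
        var fredrik = new Person("Fredrik");
        var bob = new Person("Bob");
        _current = fredrik;
        int unexpected = 0;

        var writer = Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
                _current = (i % 2 == 0) ? bob : fredrik; // one reference-sized write
        });

        var reader = Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                Person seen = _current; // one reference-sized read
                if (!ReferenceEquals(seen, fredrik) && !ReferenceEquals(seen, bob))
                    unexpected++;
            }
        });

        Task.WaitAll(writer, reader);
        return unexpected;
    }

    public static void Main()
    {
        Console.WriteLine(CountUnexpectedReads(1_000_000)); // 0
    }
}
```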
Complex Assignments
So far, we've talked about atomic assignments. We touched upon the idea of complex assignments with the increment example, but what I really mean by a complex assignment is something like the following Friends property on an imaginary Person class:
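private IList<Person> _friends = null;
public IList<Person> Friends
{
    get
    {
        if (null == _friends)
        {
            // Naive: _friends becomes visible to other threads as soon as it
            // is assigned, i.e. before Bob and Joe have been added.
            _friends = new List<Person>();
            _friends.Add(new Person("Bob"));
            _friends.Add(new Person("Joe"));
        }
        return _friends;
    }
}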
Here, we've implemented a really naive lazy-loaded property. The first time it is accessed, the list is created and populated. The assignment of the _friends variable is said to be complex because it takes several operations to reach the desired state: we have to construct the list and then initialize it by adding the Bob and Joe items. I'm sure you can see what might happen here. Suppose a context switch occurs after the list has been constructed and assigned but before the two items have been added. Another thread may then come along and find an empty (or half-filled) list when accessing the Friends property. Let's solve this. Consider the following improvement:
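private IList<Person> _friends = null;
public IList<Person> Friends
{
    get
    {
        if (null == _friends)
        {
            // Build and populate the list in a local variable first...
            List<Person> temp = new List<Person>();
            temp.Add(new Person("Bob"));
            temp.Add(new Person("Joe"));
            // ...then publish it with a single, atomic reference assignment.
            _friends = temp;
        }
        return _friends;
    }
}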
We introduce a temporary variable to hold the list until we've both constructed and initialized it. This way _friends only ever changes from null to a reference to the finished list, so we 'fake' an atomic assignment and protect our data integrity. Bart De Smet recently wrote an interesting post about how C# 3.0 object initializers apply this pattern.
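C# 3.0 collection initializers follow the same populate-then-publish order. The C# specification describes new List<T> { a, b } as having the same effect as creating the list in an invisible temporary variable, calling Add for each element, and only then assigning the temporary to the target. The sketch below writes the Friends property that way. The Name property and Main are assumptions added so the example runs on its own. Like the explicit temp version, this only prevents other threads from seeing a half-filled list. It does not stop two threads from each building a list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Person class with a lazily initialized Friends list.
public class Person
{
    public string Name { get; }
    public Person(string name) { Name = name; }

    private IList<Person> _friends = null;

    public IList<Person> Friends
    {
        get
        {
            if (null == _friends)
            {
                // The collection initializer fills a hidden temporary list first
                // and only then assigns it to _friends, like the explicit temp.
                _friends = new List<Person> { new Person("Bob"), new Person("Joe") };
            }
            return _friends;
        }
    }

    public static void Main()
    {
        var fredrik = new Person("Fredrik");
        Console.WriteLine(fredrik.Friends.Count);   // 2
        Console.WriteLine(fredrik.Friends[0].Name); // Bob
    }
}
```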
(Super-)Synchronize Me
There's one potential problem with the above, though: it is not synchronized. This means that we may actually end up constructing and assigning the list several times. Imagine that a context switch happens after thread A has evaluated the if condition, found the list to be null, and entered the if scope. Thread A will go on to construct and initialize the list, and then assign it to the backing field of the property. However, suppose the context switch lets thread B access the property before thread A has a chance to assign its list to the _friends field. Thread B then also sees that _friends is null and starts to create its own list. In many cases this has no bad side effects apart from a possibly insignificant performance hit, but it can also be disastrous. What if thread B executed this:
IList<Person> friends = person.Friends;
When the context switches again, thread A assigns the list it created to _friends, orphaning the one thread B created. The reference that thread B stored for later use no longer points to the same list the Person object holds. Ouch! Sometimes this won't matter, but if thread B later adds a friend to its copy, that change never shows up in person.Friends. Fortunately we can fix this fairly easily by employing the double-checked locking pattern:
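// volatile makes the write of the finished list a release and the unlocked
// read an acquire, so a thread that sees a non-null _friends also sees the
// Bob and Joe entries, even under the weaker ECMA CLI memory model.
private volatile IList<Person> _friends = null;
// One lock object per Person instance: initializing one person's friends
// does not block threads that are working with other Person objects.
private readonly object _friendsLock = new object();
public IList<Person> Friends
{
    get
    {
        // First check, without the lock: the fast path once the list exists.
        if (null == _friends)
        {
            lock (_friendsLock)
            {
                // Second check, under the lock: another thread may have
                // assigned the list while we were waiting for the lock.
                if (null == _friends)
                {
                    List<Person> temp = new List<Person>();
                    temp.Add(new Person("Bob"));
                    temp.Add(new Person("Joe"));
                    _friends = temp;
                }
            }
        }
        return _friends;
    }
}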
Now thread B will block on the lock until thread A has finished constructing and initializing the list. When thread B is later granted the lock, it tests again whether _friends has been assigned in the meantime, sees that it has, and skips the if scope. Because of the outer check, threads that arrive after the list exists never take the lock at all. We've effectively ensured that the logic inside the if scope will only ever be executed once.
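Here is a minimal sketch to check that claim. Sixteen tasks race to read Friends on the same object for the first time, and the object counts how often the initialization block ran. The InitCount instrumentation, the Name property and the AllReadersGetSameList helper are hypothetical additions for this demo. The sketch also uses a collection initializer in place of the explicit temp variable, and it marks _friends volatile, which is the commonly recommended way to keep double-checked locking correct under the ECMA CLI memory model.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical Person class using the double-checked locking Friends property,
// instrumented to count how many times the initialization block runs.
public class Person
{
    private volatile IList<Person> _friends;
    private readonly object _friendsLock = new object();
    private int _initCount; // demo-only instrumentation

    public string Name { get; }
    public Person(string name) { Name = name; }

    public int InitCount => Volatile.Read(ref _initCount);

    public IList<Person> Friends
    {
        get
        {
            if (_friends == null)            // fast path, no lock
            {
                lock (_friendsLock)
                {
                    if (_friends == null)    // re-check under the lock
                    {
                        _initCount++;        // only ever modified under the lock
                        var temp = new List<Person> { new Person("Bob"), new Person("Joe") };
                        _friends = temp;     // publish the finished list
                    }
                }
            }
            return _friends;
        }
    }
}

public static class Program
{
    // Lets `readers` tasks race to read person.Friends for the first time and
    // reports whether they all received the very same list instance.
    public static bool AllReadersGetSameList(Person person, int readers)
    {
        var tasks = new Task<IList<Person>>[readers];
        for (int i = 0; i < readers; i++)
            tasks[i] = Task.Run(() => person.Friends);
        Task.WaitAll(tasks);

        foreach (var t in tasks)
            if (!ReferenceEquals(t.Result, tasks[0].Result))
                return false;
        return true;
    }

    public static void Main()
    {
        var fredrik = new Person("Fredrik");
        bool same = AllReadersGetSameList(fredrik, 16);
        Console.WriteLine($"same list: {same}, initializations: {fredrik.InitCount}");
        // same list: True, initializations: 1
    }
}
```

In .NET 4 and later, Lazy<T> packages the same run-once guarantee behind its Value property. Its constructor that takes only a factory delegate uses LazyThreadSafetyMode.ExecutionAndPublication, so it can replace the hand-written lock when that framework version is available.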
With increasing demand for multi-threaded applications, the principles discussed here become more and more important to grasp in order to write software that keeps the integrity of its data intact. If you want to know more, a good place to start is Bill Wagner's excellent article 'Write Code for a Multithreaded World', which was published in the August 2007 issue of Visual Studio Magazine.