BabyCache

Mauro Guido
13y
10.7k
0
0

Article

What is a cache?

As you surely know, a cache is a component that stores data as it were a little database. Original data are always stored in a file, typically a relational database, and accessing it every time you need them could negatively affect your overall application performance. On the other hand, accesses to the cache are very fast because caches are in-memory components.

So, how can we use a cache? Have a look at the following flowchart:

Fig. 1

Using a cache, you need to access the database only when the cache doesn't contain the requested data, improving your application performance.

How BabyCache works

There are many cache libraries on the Internet, but I preferred to build my own library, my very simple library, I decided to call it "BabyCache" just for that reason. It's a cache of a fixed dimension and you can decide that dimension when you create a new instance. Let's see how a 10-element cache works:

Fig. 2

Fig. 2 shows a 10-element cache with 9 elements in it. Every cache element is composed by a key-value pair and the keys are represented in the figure. We can imagine the BabyCache as a set of elements ordered by the time the elements are stored (every element has been stored before than the elements on its left and after than the elements on its right; so the newest element is in the first position on the left and the oldest element is in the last position on the right). Two operations can be performed on our cache: you can insert a new element or you can ask the cache for a stored element. Let's have a look at these simple operations:

Insert: the order of the elements into the cache must be preserved even if you add a new element; so a new element makes the other elements to move to right; if the cache is already full, the oldest element is discarded.

Fig. 3 - The element with key "8" has just been added to the cache shown in fig. 2

Fig. 4 - The element with key "58" has just been added to the cache shown in fig. 3 - The element with key "23" has been discarded
Get: if you ask the cache for a stored element, you will be given the value corresponding to the given key. Now, since a recently used data will very probably used in the immediate future (temporal locality), it is useful to "refresh" that element moving it in the first position.

Fig. 5 - The element with key "3" has just been asked the cache shown in fig. 4 - That element is moved in the first position

If you are a simple user, you can stop reading this article right now. The curious people can go ahead and have a look at the BabyCache's code.

The code

The BabyCache project is composed of three files:

IBabyCache: It's a simple interface that defines the main methods of the cache. The elements into the cache are all of the same type: the keys are all of type TKey, the values are all of type TValue. This interface has been created just to allow you to simply add your own implementation of the cache. It defines the signature of four methods:

- Add: adds a new element into the cache. The new element has a key of type TKey and a value of type TValue
- GetObject: returns the value of the element corresponding to the specified key
- Clear: clears the cache, i.e. deletes all the elements in it
- Count: returns the number of elements currently stored into the cache
BabyCacheElement: it's the single element that will be stored into the cache. It has a constructor and two simple methods to get the key and the value of the element.
BabyCacheManager: it's the main class of this library. The most important element of this class is the private attribute _cache. It is a LinkedList of BabyCacheElement and it is this that will contain the cache elements. I chose a LinkedList mainly for one reason: both inserting an element in the first position and removing an element from the last position are O(1) operations, so they are not depending on the cache size. The other attribute _max_size is the maximum size of the cache, i.e. the maximum number of elements that the cache can contain.

- The constructor is very simple, it just creates a cache of the specified size:

- The Add method is very simple as well. It first checks the cache size and then removes the last (oldest) element if the cache is full. Then it creates a new instance of BabyCacheElement with the specified key and value and adds it as the first element in the cache. This is a very simple operation since the single operations (counting the elements in the cache, removing the last element, create a new instance of BabyCacheElement, and adding the new element in the first position) are all O(1) operations, resulting in an overall O(1) operation

- Let's have a look at the GetObject method. For every BabyCacheElement in the cache, it checks whether the key of the current object is equal to the specified key and, if so, that object is moved in the first position (removed from the current position and added to the first position); then its value is returned to the caller. If no object corresponds to the specified key, a null value is returned.

This operation is a little heavier than the Add one. Comparing the specified key with the current object's key, adding the new element in the first position and returning the element's value are all O(1) operations, but removing the element from the cache is a O(n) operation (the Remove instruction will be executed at most one time, if the specified key is found). The foreach instruction iterates over the cache's elements so it is a O(n) operation. The overall operation is then O(2n): a O(n) iteration and a O(n) operation performed one time only during the iteration.

For educational purpose only, I created another method for fetching the desired object's value. That method just creates a new LinkedList and adds the old list's elements as it iterates over them. The desired element, if found, is not suddenly added to the new list but stored and added at the end of the iteration (the element has to be refreshed so it is moved to the first position)

All the operations in the foreach instruction are all O(1) operations and so the operations performed if the element is found are. The overall complexity is so O(n), better than the previous method. In this case, however, the AddLast method is called n times, differently from the previous method when an element was added to the collection at most one time. Here are the average execution times of the two methods above on two different computers (PC#1 and PC#2) and on two different types of caches (<string, string> and <int, string>) both of 1,000,000 elements (so many elements for a cache, I know, but I needed them to have not short execution times):

	<string, string> PC#1	<string, string> PC#2	<int, string> PC#1	<int, string> PC#2
GetObject	193 ms	102 ms	155 ms	88 ms
GetObjectByDuplicating	623 ms	307 ms	1262 ms	464 ms

We can observe here than the GetObject method is always faster than the GetObjectByDuplicating one; so because in the first method an element is added to the collection at most one time.

Concurrency

What about concurrency? What would happen if you used BabyCache in a web application where multiple threads might access the cache at the same time? If you use a single BabyCache instance (for example if you use BabyCache as a static attribute of a class) what would happen if one thread is reading while another one is deleting an element from the cache? That was the situation I had while developing a new feature in my website about the Italian fantasy soccer: information about players are not related to a single user, so I decided to store the recently used players data into a shared cache. Two unwanted situations might happen while using a shared BabyCache instance in a web application:

Let's suppose we have two threads, T1 and T2, and a full cache. T1 is asking for the last (older) element in the cache, T2 is adding an element not present in the cache. Here is when this critical situation happens:

The first thread T1 has just detected the desired element which is in the last position and is now going to remove that element in order to refresh it. At the same time, a second thread T2 is adding a new element to the cache and, since the cache is full, the last element is removed. The first thread T1 needs to remove the detected element but it is no longer present in the cache so the Remove instruction has no effect. The following instruction adds the element to the cache while the second thread adds the new element. When both threads end their execution, our cache's size will be _max_size + 1 !!!. The critical situation is better explained by the following sequence diagram:
The second situation is more serious than the first one. It happens when a thread modifies the cache while another thread is iterating over it to find out an element. A "System.InvalidOperationException: Collection was modified after the enumerator was instantiated" will be thrown.

ConcurrentBabyCache

I created another version of BabyCache to avoid the concurrency problems I've just explained in the previous paragraph. It's a simple subclass of BabyCacheManager and its code is very clear.

As you can see in the picture above, this class uses the BabyCacheManager's methods calling them inside a lock block, so one thread at a time can access the cache.

Conclusion

There are probably many ways to improve my BabyCache's performance. For example, a new collection type might be used in place of the LinkedList. So, any suggestion and correction are welcome! That's all, enjoy your BabyCache.