Producer Consumer: A Comparison of C and Go

April 02, 2014 by Alex Coco

I’ve been planning to learn some Go for a while now. A friend of mine loves it and takes every opportunity to get me to try it. I finally did, and I’m really enjoying it. One of the courses I’m taking right now is Operating Systems, which has significantly less to do with operating systems than it does with deadlocks and synchronization in C. One of the problems we explored was the producer/consumer problem, so I decided to write a solution to it in both C and Go and compare them.

The Problem

Suppose you have two independent tasks: produce data and consume data. Producing data could be anything. For example, you might be fetching data from a web service, reading data from disk, or generating it yourself. Similarly, consuming data could mean anything that requires data that has been previously produced.

One way to complete these tasks is to do them sequentially: first you produce some data then you consume it. While this works initially, it’s not very efficient when your two tasks take a different amount of time to complete. You might be reading a file from disk and then trying to do an expensive computation on the data you read. If the expensive computation takes longer than reading the file then most of your time will be spent waiting for the computation to complete.

Modelling this problem as a producer/consumer problem makes a lot of sense. To do this, we’ll want to have the two tasks run in parallel. The producer and consumer threads will run at the same time, each waiting for data to be produced or consumed. Where this becomes especially powerful is when we have more than one producer or consumer. If producing data is twice as fast as consuming it, then it makes sense to use two consumers for every producer to get the maximum amount of work done.

The Solution

To be honest, an implementation in C is longer than I’d like and I don’t want to have a wall of C on my page. Here’s a link to source for the full example in C.

The buffer.c file contains code to create a buffer with limited capacity. Inserting and removing data from this buffer is threadsafe; I’m using the pthread library’s pthread_mutex_t and pthread_cond_t types for locking and waiting on conditions. buffer_enqueue will wait until there is free space before inserting, and buffer_dequeue will wait until it is not empty before removing.

Having a threadsafe buffer makes the rest of the code significantly easier to read. Here’s an excerpt from producer_consumer.c:

void *producer(void *p) {
	buffer *buff = (buffer*) p;

	/* constantly produce data */
	while (1) {
		/* simulate work to produce data */
		sleep(PRODUCE_TIME);

		/* produce data (integer between 1 and 5) */
		int data = rand() % 5 + 1;

		printf("Produced data: %i\n", data);
		fflush(stdout);

		/* insert data into the buffer */
		buffer_enqueue(buff, data);
	}
}

That function is run by every producer thread. It is responsible for generating some data (integers, in this case) and inserting it into the buffer. Now here’s the function run by every consumer thread:

void *consumer(void *p) {
	buffer *buff = (buffer*) p;

	/* constantly consume data */
	while (1) {
		/* retrieve data */
		int data = buffer_dequeue(buff);

		/* simulate work to consume data */
		sleep(CONSUME_TIME);

		printf("Consumed data: %i\n", data);
		fflush(stdout);
	}
}

This works in the opposite order to the producer: it first removes data from the buffer, then consumes it. You’ll notice that this code just loops forever, inserting data into and removing it from the buffer. It looks so simple because, in this implementation, it is the buffer data structure’s responsibility to wait until the action can be performed. In other words, buffer_enqueue and buffer_dequeue block until they can proceed.

If my producers ran faster than my consumers, they would fill up the buffer and each would wait until a consumer removed an item before adding a new one. It’s deceptively simple!

Go

I’d like to compare this to the implementation in Go. Here’s a link to the complete file. At a glance, the two programs look extremely similar. That’s because they are! The most noticeable difference is that we don’t need to implement our own threadsafe buffer.

The C implementation used a buffer made from scratch with concurrency primitives such as pthread_mutex_t and pthread_cond_t to make sure that two threads didn’t write into the buffer at the same time. Furthermore, the C implementation uses pthreads to start the producer and consumer functions in their own POSIX threads.

Go uses channels to communicate between goroutines. A goroutine is like a lightweight thread that can be used to run a function concurrently. Go provides a super simple syntax for doing this: just prefix a function call with go. This replaces all the work of creating thread attributes and starting threads.
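Here’s a minimal sketch of what that looks like. The function name produce and the values it sends are illustrative, not taken from the linked file:

```go
package main

import "fmt"

// produce sends a few integers into the channel, then closes it.
// (Illustrative; the linked example loops forever instead.)
func produce(buff chan<- int) {
	for i := 1; i <= 3; i++ {
		buff <- i
	}
	close(buff)
}

func main() {
	// A buffered channel of capacity 2 plays the role of buffer.c.
	buff := make(chan int, 2)

	// Starting a goroutine is just "go" before the call --
	// no thread attributes, no pthread_create.
	go produce(buff)

	// Receiving blocks until data arrives; range stops once the
	// channel is closed and drained.
	for data := range buff {
		fmt.Println("Consumed data:", data)
	}
}
```

Compare that one `go produce(buff)` line to the pthread_attr_init/pthread_create boilerplate the C version needs.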

In Go, a channel is like a threadsafe queue. This does exactly what the buffer in buffer.c does. It allows us to send (buff <- data) and receive (<-buff) data in a threadsafe way without having to think about it. The Go implementation creates a buffered channel of integers with a fixed capacity. This means that sending will block until there is room, and receiving will block until the channel is not empty.
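A tiny sketch of that blocking behavior, using a made-up capacity of 1 for illustration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Capacity 1: the second send must wait until a receive makes room.
	buff := make(chan int, 1)

	go func() {
		buff <- 1 // fills the buffer immediately
		buff <- 2 // blocks here until main receives the 1
	}()

	time.Sleep(50 * time.Millisecond) // give the goroutine time to fill the buffer

	fmt.Println("Received:", <-buff) // frees space; the blocked send proceeds
	fmt.Println("Received:", <-buff)
}
```

This is the same guarantee buffer_enqueue and buffer_dequeue provide in the C version, but it comes for free with the language.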

Go is a powerful language that makes it very easy to write concurrent programs. It’s a system programming language with high level concepts built right in and it’s a pleasure to work with. I’m looking forward to using it in future projects.