How To Distribute Items Into Buckets In C#

The Partition Problem is a very old conundrum in Computer Science. The usual question goes: given some buckets of some capacity, and some items of some size, how can I distribute the items by the buckets in an optimal manner?

The answer to this is more complex than it seems at first glance, as it depends on what you mean by some buckets and optimal.

Do you know in advance how many buckets you have? Or their size?
Do you have to fill more buckets with fewer items or less buckets with more items?
Do you allow bucket overflow or enforce capacity?
Do you want the perfect distribution regardless of performance or is fast-and-furious good enough for you?

There are lots of ways of answering these questions and even more ways of implementing solutions for them, from simple greedy algorithms to more complex heuristics based ones.

Today I’ll get you started the basics for a simple and generic distribution algorithm based on a descending greedy approach, with a couple of variations:

  • Fixed bucket count: Distribute all items up to available buckets. This is a common scenario and a good place to start.
  • Fixed bucket size, uncapped bucket count: Prioritize filling up buckets up to their size, before adding new ones.
  • Auto-capped bucket size, uncapped bucket count: Cap bucket size by the largest item, then behave as the previous method. This is useful when you want to adjust some work to the size of the longest task, so all tasks complete together as possible.

So let’s set this up.

Bucket

First, we create a helper class to act as a bucket:

    public class Bucket
    {
        public int TotalCost { get; set; } = 0;
        public IList Items { get; } = new List();

        private Func cost;

        public Bucket(Func cost)
        {
            this.cost = cost;
        }

        public void Add(T item)
        {
            Items.Add(item);
            TotalCost += cost(item);
        }
    }

This class abstracts the bucket concept out of the algorithm and helps keep things simple. Note how the bucket accepts a cost function. This makes it simple to keep the current bucket cost updated without having to recalculate things all the time.

Bucketizer

Now we add a Bucketizer class to hold the algorithms. You can also call it Potatoes, but I’ll leave that up to you.

    public class Bucketizer
    {
    }

Bucketize By Bucket Count

So let’s say you know how many buckets you have available upfront. This variation will distribute any number of items over a fixed number of buckets. If you have fewer items than buckets, then the method returns only those buckets that got filled.

        public Bucket[] BucketizeByBucketCount(IEnumerable items, Func cost, int maxBucketCount)
        {
            // validate
            if (items == null) throw new ArgumentNullException(nameof(items));
            if (cost == null) throw new ArgumentNullException(nameof(cost));
            if (maxBucketCount  cost(x)).ToArray();

            // check if there is anything to bucket
            if (ordered.Length < 1)
            {
                // return the empty bucket array
                return buckets.ToArray();
            }

            // how many items we have vs how many buckets
            var needs = (ordered.Length < maxBucketCount ? ordered.Length : maxBucketCount);

            // distribute items by bucket count
            foreach (var item in ordered)
            {
                // can we still add buckets
                if (buckets.Count  x.TotalCost).First().Add(item);
                }
            }

            // done
            return buckets.ToArray();
        }

This simple algorithm takes all of the items, orders them by descending order, and then puts each item into the bucket with the most space available. However, it will create a new bucket first, up to the maximum number, before reusing existing buckets. The magic here is the descending order – this will give you very decent results most of the time while keeping the code easy to understand.

Note that the code above is not super fast – it has (Buckets * Items) max iterations plus the initial cost of ordering – but you can optimize it with a dictionary (keyed on current bucket weight) once you adjust it to your needs.

I’ll leave that exercise up to you.

Bucketize By Bucket Size

So you don’t care about the number of buckets. Instead you want to fill up as few buckets as possible up a maximum bucket size. In that case, you can go with this:


        public Bucket[] BucketizeByBucketSize(IEnumerable items, Func cost, int maxBucketSize)
        {
            // validate
            if (items == null) throw new ArgumentNullException(nameof(items));
            if (cost == null) throw new ArgumentNullException(nameof(cost));
            if (maxBucketSize < 1) throw new ArgumentOutOfRangeException(nameof(maxBucketSize));

            // our buckets
            var buckets = new List();

            // order items by descending order of cost
            var ordered = items.OrderByDescending(x => cost(x)).ToArray();

            // check if there is anything to bucket
            if (ordered.Length < 1)
            {
                // return the empty bucket array
                return buckets.ToArray();
            }

            // add the first item regardless
            var bucket = new Bucket(cost);
            bucket.Add(ordered[0]);
            buckets.Add(bucket);

            // add following items
            for (int i = 1; i  b.TotalCost + icost  b.TotalCost).FirstOrDefault();

                // is there a bucket for this
                if (bucket != null)
                {
                    // add the item to the found bucket
                    bucket.Add(item);
                }
                else
                {
                    // add the item to a new bucket
                    bucket = new Bucket(cost);
                    bucket.Add(item);
                    buckets.Add(bucket);
                }
            }

            // done
            return buckets.ToArray();
        }

For each item, the algorithm will first try to find a bucket where the item can fit. If it does, great, job done. If not, then it creates a new bucket for the item. Again, the magic is in the descending order – by focusing on the biggest items first, you end up with a decent distribution most of the time. Kind of like dropping the big stones in the flask before pouring the sand, if you know what I mean.

Bucketize By Top Item

Okay, so you have neither a fixed bucket count, nor a fixed bucket size. Instead you have a certain number of items, a couple of which are very expensive while most are on clearance sale. You want to adjust the bucket size and count so that all items fit to the most expensive task. This scenario is common in distributed systems – when you need to distribute variable cost tasks that work as a unit – but can be anything you want.

This problem is in fact a particular case of the previous example – you are still defining a max bucket size, it just happens to be the size of the biggest item.

        public Bucket[] BucketizeByTopItem(IEnumerable items, Func cost)
        {
            // validate
            if (items == null) throw new ArgumentNullException(nameof(items));
            if (cost == null) throw new ArgumentNullException(nameof(cost));

            // check if there are any items
            if (!items.Any())
            {
                // return empty result if so
                return new Bucket[0];
            }

            // find the top cost
            var top = items.Select(i => cost(i)).Max();

            // bucketize by max bucket size using top item cost
            return BucketizeByBucketSize(items, cost, top);
        }

All this code does is to discover the item with the highest cost, and the delegate the rest of the logic to the previous method. Nothing to it.

There are many other ways of bucketizing items and in far more optimal ways that the examples above – but I hope this gives you a head start.

Till next time.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s