Rebuilding QueryModels from an EventStore

Rebuilding all read-models (projections) from an EventStore can be slow, because we process hundreds of thousands of events with several projections. This is one area where a boost in performance is crucial, and it is where I want to understand how much our logging infrastructure impacts performance.

For this scenario I have a small batch of events (~4k) and the log4net logging level set to DEBUG; this generates ~35k log messages in this reference scenario.

Projections are grouped in slots: each slot contains a certain number of projections, and each slot processes events in a separate thread (the exact architecture will be the subject of a future post).
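Just to fix the idea (this is not the real Jarvis code, only a minimal sketch with invented names), a slot can be modeled as a queue of events drained by its own task, so slots rebuild in parallel without sharing state:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Each slot owns a queue of events and drains it on a dedicated task;
// slots never share state, so they can rebuild independently.
public class ProjectionSlot
{
    private readonly BlockingCollection<string> _events = new BlockingCollection<string>();
    public List<string> Processed { get; } = new List<string>();

    public void Enqueue(string evt) => _events.Add(evt);
    public void CompleteAdding() => _events.CompleteAdding();

    public Task RunAsync() => Task.Run(() =>
    {
        foreach (var evt in _events.GetConsumingEnumerable())
            Processed.Add(evt); // here each projection in the slot would handle evt
    });
}
```

In the real system an event dispatcher would enqueue each commit to every slot, and a slot's counter of remaining events is what the graphs below plot.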

First run

Here is what happened during the first run.

First Run - no log enabled

This is a low resolution image, but it represents how much time the system needs to rebuild all projections with no log enabled. The Y-axis represents how many events remain to dispatch for each slot (each graph is one slot), the X-axis represents time.

This is the reference scenario; now logging is enabled at DEBUG level, both for our MongoBufferedAppender and for the ColoredConsoleAppender.

What happens with all logs enabled

The situation really changed a lot:

Second Run - ALL logs enabled

Even though the image is low resolution and there is no legend on the X-axis, one thing is evident: enabling logging slows the whole process down considerably. We do not need an exact measurement to see it.

We also have a really strange behaviour: most of the projections consume events linearly, and three slots are much slower (this is normal), but when the other slots finish rebuilding, the three slow slots speed up considerably. This means that the logging infrastructure is somehow binding together the performance of different slots.

Believe it or not, the reason is the ColoredConsoleAppender. We have multiple services running as console applications with TopShelf; they can run as Windows services in the production environment, but we use the console appender to have a visual clue of what is happening during development/debug, when the services run as standard console applications.

Stopping ColoredConsoleAppender

The next run is done simply by setting the threshold level of the ColoredConsoleAppender to INFO instead of DEBUG; this reduces the number of log messages to ~500 for the entire run. Here is the result.
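In the log4net configuration this is a single `threshold` element on the appender; the appender definition below is the standard log4net one (the layout is only illustrative):

```xml
<appender name="ColoredConsoleAppender" type="log4net.Appender.ColoredConsoleAppender">
  <!-- Drop DEBUG messages before they ever reach the console -->
  <threshold value="INFO" />
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date %-5level %message%newline" />
  </layout>
</appender>
```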

Third Run - only mongodbappender

The whole rebuild is much faster. It is still slower than having no log enabled, but the scenario is far better. Again, you do not need an exact measurement: the impact of disabling the ColoredConsoleAppender is so great that it is evident that logging to the console has a huge impact on performance.

Actually, I have never liked the console appender in log4net much, for various reasons.

  • It slows down your application.
  • It is difficult to read logs: console apps have a limited buffer space and logs disappear quickly.
  • Logs disappear when the program is closed.
  • You have no way to query logs; you only have a huge sequence of text scrolling in a console app.
  • If the program crashes, you usually see a red log with the exception and the console app immediately closes before you can read what happened.

All these considerations lead to an obvious conclusion: the console appender is not useful; its only purpose is showing you that the program is doing something and is not blocked somewhere.

If you want to use the logging infrastructure to get an idea of the progress of your application, please convince yourself that this is not a good idea. Using logs to verify that the application is doing something has little utility, and it is not what logs are built for.

Now it is time for the final run.

Using MongoDbAppender with writer on a dedicated thread and LooseFix

We can further improve performance if we know how log4net works internally. Our MongoDbAppender inherits from BufferingAppenderSkeleton, and you can read on my English blog why it slows down performance: BufferingAppenderSkeleton performance problem.
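The core of the "dedicated thread" idea can be sketched like this (a minimal illustration with invented names, not the real appender code; the real appender batches writes into MongoDB): appending only enqueues the message, and a single background thread does the slow write, so the logging call returns immediately.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Sketch: Append only enqueues, a background thread drains the queue and
// performs the slow write (in the real appender, a batch insert into MongoDB).
public class BackgroundWriter : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Thread _writer;
    public List<string> Written { get; } = new List<string>();

    public BackgroundWriter()
    {
        _writer = new Thread(() =>
        {
            foreach (var msg in _queue.GetConsumingEnumerable())
                Written.Add(msg); // slow I/O happens here, off the caller's thread
        }) { IsBackground = true };
        _writer.Start();
    }

    public void Append(string msg) => _queue.Add(msg); // caller returns immediately

    public void Dispose()
    {
        _queue.CompleteAdding(); // flush remaining messages on shutdown
        _writer.Join();
    }
}
```

The trade-off is that a hard crash can lose the messages still in the queue, which is usually acceptable for diagnostic logs.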

Armed with this knowledge I did a final run using LooseFix and asking the mongo appender to use a dedicated thread to save logs, and here is the result.

Fourth Run - mongodbappender with LooseFix and dedicated thread

Even if the image is small, you can verify that it is quite similar to the first one. This is the desired situation: the logging infrastructure cannot be completely transparent performance-wise, but if the loss in performance is negligible, the advantage of logging lots of information is invaluable.

MongoDbAppender has several advantages over ColoredConsoleAppender:

  • It is durable; logs are kept after the application is closed.
  • You can query logs with lots of conditions (even regular expressions).
  • You can set up a simple process to move logs from various Mongo sources to a single location (e.g. Elasticsearch + Kibana).
  • If you poll the log collection and read the most recent 10 logs plus the total count, you can get an idea of progress (not a perfect solution, but at least you can use it for this purpose if you want).

What we learned

This example gives us some guidelines on how to handle logs in production:

Logging is useful, and the more information you log, the easier it is to find a problem. The golden rule is: measure performance penalties in production, use only a single appender to persist logs to a durable medium that allows searching, and you are ready to go.

Gian Maria.

The mythical perfect Unit Test Fixture

After years of unit testing I'm really convinced that the real complexity in writing a good unit test is setting up a strong test fixture. Here is an example: in Jarvis we have a test that runs fine in the R# test runner, in the NUnit GUI test runner, and in the Team City build test runner, but is red when it runs inside the Visual Studio Test Runner with the NUnit Test Adapter.

After a brief investigation we discovered that the cause was a different handling of the System.Threading.Thread.CurrentPrincipal object in the various test runners. Here is our situation.

What is happening inside VS?

Here is a stupid test that confirmed our suspicions.

[TestFixture]
public class verify_principal
{
    [Test]
    public void verify_principal_is_empty()
    {
        Assert.That(System.Threading.Thread.CurrentPrincipal.Identity.Name, Is.Null.Or.Empty);
    }
}

This simple test verifies that the name of the identity associated with the thread used to run the test is null or empty. This test runs green with every test runner, except the NUnit test adapter in Visual Studio.

When the test is executed in Visual Studio, the CurrentPrincipal identity is a valid Windows identity equal to your current Windows user, and this makes some of our tests fail when executed inside Visual Studio. This confirms that something different happens inside Visual Studio, and I want to fix it.

Fixture Fixture Fixture

The test that failed is this one:

[Test]
public void raising_events_without_user_context_should_throw()
{
    var router = new AggregateRootEventRouter<SampleAggregate.State>();
    var ex = Assert.Throws<InvalidPrincipalException>(() =>
        {
            router.Dispatch(new SampleAggregateCreated(new SampleAggregateId(1)));
        });
}

This test verifies that if you try to raise a domain event without a user context, the engine throws an exception. Clearly this test fails with the Visual Studio NUnit test adapter, because CurrentPrincipal.Identity is equal to the current user, while it runs perfectly with the other test runners.

Actually, we can blame the VS Test Runner for this behaviour, but the real cause of the erratic test is an imperfect test fixture. Let's think for a moment about the purpose of this test: it verifies how our engine behaves if no user context is set. It works perfectly in all test runners except Visual Studio because they happen to run the test without an identity set in CurrentPrincipal, but this is not guaranteed by the NUnit framework.

If you believe that the real culprit is the VS Test Runner, consider what happens if, before this test executes, another test sets a valid principal in the Thread.CurrentPrincipal property and does not restore the original value when it finishes. You will end up with a failure in all test runners, but the failure happens only if the test that sets the principal is executed before this one. What you have is:

  1. You run all tests; the test is red.
  2. You run the test alone; the test is green (because the test that changes CurrentPrincipal is not run).
  3. You run all tests again; the test is red (ouch).
  4. You change the name of some test and the test returns green (maybe because the test that changes CurrentPrincipal now runs after the erratic test).

This is called an erratic test, because it can fail due to external conditions. This kind of test brings real pain when it fails, because you need to understand whether it fails because the underlying assumption is wrong (you have a bug) or because an external situation makes it fail (the bug is in the test). In the long run this kind of test should be either fixed or removed from your test suite.

The above test is better refactored this way:

[Test]
public void raising_events_without_user_context_should_throw()
{
    var oldPrincipal = System.Threading.Thread.CurrentPrincipal;
    System.Threading.Thread.CurrentPrincipal =
        new GenericPrincipal(new GenericIdentity(""), new String[] { });
    try
    {
        var router = new AggregateRootEventRouter<SampleAggregate.State>();
        var ex = Assert.Throws<InvalidPrincipalException>(() =>
            {
                router.Dispatch(new SampleAggregateCreated(new SampleAggregateId(1)));
            });
    }
    finally
    {
        // Restore the original principal even if the assertion fails,
        // so this test does not pollute the fixture of other tests.
        System.Threading.Thread.CurrentPrincipal = oldPrincipal;
    }
}

This test is superior for a number of reasons:

  • It explicitly clears the user context before the test.
  • It explicitly tells you that the engine uses System.Threading.Thread.CurrentPrincipal to check the current user.
  • It is stable and runs correctly in every test runner.
  • It is stable even if a previous test sets a valid CurrentPrincipal on the current thread for some other reason.

Lesson learned

To create stable and reproducible tests, you need to be sure that the system is in the desired state in the fixture setup phase, where "desired state" covers every external component or piece of the system that is exercised in the test. A typical error is believing that the desired fixture was already set up for you by some other "entity", like the test runner or other tests.
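One way to make the setup and the restore impossible to forget is to centralize them in a small helper (a sketch with an invented name; in NUnit you would typically put the same save/restore logic in [SetUp]/[TearDown] methods):

```csharp
using System;
using System.Security.Principal;
using System.Threading;

// Runs an action under a known, anonymous principal and always restores
// the previous one, even when the action throws.
public static class PrincipalFixture
{
    public static void RunWithoutUserContext(Action action)
    {
        var previous = Thread.CurrentPrincipal;
        Thread.CurrentPrincipal =
            new GenericPrincipal(new GenericIdentity(""), new string[0]);
        try
        {
            action();
        }
        finally
        {
            Thread.CurrentPrincipal = previous; // restore the ambient state no matter what
        }
    }
}
```

With this helper, the erratic test above becomes a one-liner body wrapped in RunWithoutUserContext, and no test can leak its principal into another.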

Gian Maria.

Code that was never executed in production

One of the greatest risks in software development is letting untested code slip into production. The general rule is: you should always execute your code and verify that it behaves as you expect before moving it to production. Clearly we should prefer automated testing over manual testing, but at the very least we need to be sure that every part of the code was executed at least once before promoting it to production.

A real example

Suppose we have this code that represents an Id based on a simple string.

public class StringId
{
    public String Value { get; set; }

    public static implicit operator String(StringId id)
    {
        return id.Value;
    }

    public override bool Equals(object obj)
    {
       //....
    }
}

And we also have a Customer class based on this type of Id:

public class CustomerId : StringId
{
    public CustomerId(String rawValue)
    {
        this.Value = rawValue;
    }
}

public class Customer
{
    public CustomerId Id { get; set; }

    public String Name { get; set; }

    public String Surname { get; set; }

    public Int32 Age { get; set; }

    public Double TotalOrderAmount { get; set; }
}

Finally, we have some services that handle data access to a SQL database with NHibernate, and we have a function that returns all customers. To simplify this discussion, let's consider a service that keeps data in memory.

public class CustomerService
{
    private readonly Dictionary<CustomerId, Customer> _inMemoryRepo =
        new Dictionary<CustomerId, Customer>();

    public void AddCustomer(Customer customer)
    {
        _inMemoryRepo[customer.Id] = customer;
    }

    public IEnumerable<Customer> GetAllCustomers()
    {
        return _inMemoryRepo.Values
            .OrderBy(c => c.TotalOrderAmount)
            .ThenBy(c => c.Id);
    }
}

Trivial code, isn’t it? Since it is super simple, you probably think there is no need for testing. But if you believe that the above code is simple and correct, you missed a bug.

Let’s try to write a simple unit test that exercises the code.

[TestMethod]
public void verify_get_all_customers()
{
    CustomerService sut = new CustomerService();
    sut.AddCustomer(new Customer() { Id = new CustomerId("c1"), TotalOrderAmount = 10.0d });
    sut.AddCustomer(new Customer() { Id = new CustomerId("c2"), TotalOrderAmount = 5.0d });
    sut.AddCustomer(new Customer() { Id = new CustomerId("c3"), TotalOrderAmount = 16.0d });

    var result = sut.GetAllCustomers();
    result.Select(c => c.Id.Value)
        .Should()
        .Have.SameSequenceAs(new [] { "c2", "c1", "c3" });
}

This test is oversimplified, but it represents a similar test you could write if the service accessed the database through a repository interface.

This test is green, but you have a bug!

The real problem shows up only with certain test data. The following test highlights the bug.

[TestMethod]
public void verify_get_all_customers_with_same_amount()
{
    CustomerService sut = new CustomerService();
    sut.AddCustomer(new Customer() { Id = new CustomerId("c1"), TotalOrderAmount = 16.0d });
    sut.AddCustomer(new Customer() { Id = new CustomerId("c2"), TotalOrderAmount = 10.0d });
    sut.AddCustomer(new Customer() { Id = new CustomerId("c3"), TotalOrderAmount = 16.0d });

    var result = sut.GetAllCustomers();
    result.Select(c => c.Id.Value)
        .Should()
        .Have.SameSequenceAs(new[] { "c2", "c1", "c3" });
}

Now examine the differences: it is basically the same test as before, but now we have two customers with the same TotalOrderAmount, and the test fails with this message:

System.ArgumentException: At least one object must implement IComparable.

And this is caused by the .ThenBy LINQ instruction. This instruction tries to order objects by the Id property, but since it is a complex type, LINQ does not know how to order instances of the CustomerId class. The previous test never exercised the condition specified by the .ThenBy.

The solution is really trivial: just implement the IComparable interface in the StringId base class and everything is green. The real problem happens when this error is discovered in production and you need to do a quick fix.
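A sketch of the fix, on a trimmed-down StringId (the Value property and the members shown earlier stay as they are); implementing both the generic and the non-generic interface is the safe choice, because LINQ's default comparer falls back to the non-generic IComparable when the generic one does not match the key type exactly:

```csharp
using System;
using System.Linq;

// Once StringId is comparable, OrderBy/ThenBy can sort on it directly.
public class StringId : IComparable<StringId>, IComparable
{
    public string Value { get; set; }

    public int CompareTo(StringId other) =>
        other == null ? 1 : string.Compare(Value, other.Value, StringComparison.Ordinal);

    // ThenBy(c => c.Id) on a derived key type ends up on this non-generic overload.
    public int CompareTo(object obj) => CompareTo(obj as StringId);
}
```

An even cheaper alternative, if you cannot touch StringId, is to sort on the raw value instead: `.ThenBy(c => c.Id.Value)`.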

Lesson learned

This kind of problem is caused by letting code that was never executed, neither manually nor with unit testing, slip into production. Such code constitutes a time bomb in your system that can explode at any moment.

A possible solution is trying to cover all lines of code with unit test code coverage, but code coverage is not perfect. In the previous example, if you run only the first unit test (the one that succeeds), you get 100% code coverage for the CustomerService class. This happens because the code that was really not covered is inside the implementation of the ThenBy LINQ operator.

This is a classic example of how difficult it is to write unit tests that are able to discover every bug in your code.

Gian Maria.