async multithreading support #147

alexsku · 2016-12-22T21:58:26Z

This is my attempt to improve multithreading support for miniprofiler v4. I think the current version of the mini profiler would have issues in scenarios where the current thread spawns more than one task and waits for them.

        [Fact]
        public void Head_WhenMultipleTasksSpawned_EachSetsItsOwnHead()
        {
            MiniProfiler.Settings.ProfilerProvider = new DefaultProfilerProvider();
            var allTasks = new SemaphoreSlim(0, 1);
            var completed = new TaskCompletionSource<int>();
            using (GetRequest("http://localhost/Test.aspx", startAndStopProfiler: false))
            {
                MiniProfiler.Start();
                var sut = MiniProfiler.Current;
                var head = sut.Head;

                Task.Run(() => {
                    Assert.Equal(head, sut.Head);
                    using (sut.Step("test1"))
                    {
                        allTasks.Release();
                        completed.Task.Wait();
                    }
                });
                allTasks.Wait();
                Task.Run(() => {
                    using (sut.Step("test2"))
                    {
                        allTasks.Release();
                        completed.Task.Wait();
                    }
                });
                allTasks.Wait();

                Assert.NotNull(sut.Head);
                Assert.Equal(head, sut.Head);
                completed.SetResult(0);
            }
        }

In this scenario there would be multiple threads working with the mini profiler (since they would share it via AsyncLocal). This could lead to random failures since Timing class is not thread safe. Also this could lead to inconsistencies since the spawned tasks would replace the Head timing for the mini profiler instance, with one of the tasks's timing potentially becoming a child of another's.

unit tests are no longer run in parallel

alexsku · 2016-12-22T21:59:10Z

src/MiniProfiler.Shared/Helpers/Net45/AsyncLocal.cs

+    /// Unlike DbContextScope's implementation this implementation doesn't support serialization/deserialization 
+    /// ths not allowing to cross app domain barrier.
+    /// </summary>
+    internal class AsyncLocal<T>


a polyfill for AsyncLocal which is not available before .Net 4.6

alexsku · 2016-12-22T22:00:01Z

src/MiniProfiler.Shared/MiniProfiler.cs

+            get
+            {
+                Settings.EnsureProfilerProvider();
+                return Settings.ProfilerProvider.CurrentHead;


It is possible that the current thread spawns more than one task on separate threads and wait for them. As a result technically it is possible that there would be more than one separate head for a miniprofiler instance (one for each spawned thread). This code offloads managing heads to the profiler provider, so if the provider handles spawning threads (i.e. it relies on AsyncLocal or CallContext) then it should be able to manage head properly.

alexsku · 2016-12-22T22:00:28Z

src/MiniProfiler.Shared/ProfileProviders/DefaultProfilerProvider.cs

 using System.Runtime.Remoting.Messaging;
-using System.Web;
+using System.Web; 
+using StackExchange.Profiling.Helpers.Net45;


Since there is AsyncLocal polyfill we can keep the implementation of the provider the same - it will be either using the polyfill or .net version of AsyncLocal

This is a pretty drastic change in behavior, from defaulting to 1 per request, to 1 per request or background thread. This would break many expectations in Stack Overflow's code, as an example.

alexsku · 2016-12-22T22:01:08Z

src/MiniProfiler.Shared/Timing.cs

        /// </summary>
        [DataMember(Order = 5)]
-        public List<Timing> Children { get; set; }
+        public List<Timing> Children


since technically there could be multiple threads working with the same Timing instance at the same time (my example above with a thread spawning two tasks and waiting for them) Children and CustomTimings need to be protected from multithreaded access. Here I added a backing field _children and protected it with _lockObject. The property always return a copy of _children so the caller can iterate through the list safely. I didn't see any code that would attempt to modify the children of a Timing via Children property so I think nothing is broken by this change

Ideally the property should return IReadOnlyList (by just casting the created List) but protobuf doesn't really work well with IReadOnlyList (due to this issue: protobuf-net/protobuf-net#90). It might be possible to use IImmutableList (backed by ImmutableArray) which is supported by protobuf. But the problem would be with CustomTimings property which returns Dictionary. IReadOnlyDictionary would not work with protobuf either, and ImmutableDictionary has significantly worse performance comparing to Dictionary which makes me reluctant to use it.

Another option is to change the property to GetChildren/SetChildren method pair to emphasize that there are not just access to the backing filed but actually copy the contents of it.

Please let me know what you prefer I can change it

This approach (like earlier comments) creates a lot of allocations. A new list per accessor really just isn't tenable from a performance or goals standpoint for the library. I'm just not sure any of the proposed solutions is lightweight enough to work.

Do you think there will be many accesses to the Children property? While collecting timings during the profiler session Children property is no longer used, instead the backing list is used, where the reallocations do not happen. I think the only time when Children property is accessed is during the processing of captured mini profiler snapshot. And even there i wouldn't expect many of the accesses. Maybe I'm missing something, could you give me examples that you think where the property will be accessed a lot?

As an example, if you have a large profiling tree then the serialization at the end will pay a very heavy toll in duplication...for no reason in the situation. We'd just be generating n copies of structures as part of a normal operation that has no racing at all.

What the lock is trying to protect is the actual reference race, but there's a premium price on the member access within that reference as a result.

I doubt the overhead of the duplication would be that high. The current code is using List to keep children and while children are added it is doing similar duplication internally potentially multiple time. This would be just one more duplication. And it is extremely fast (less than a nanosecond for small lists (less than 20 elements) in my tests)

alexsku · 2016-12-22T22:01:43Z

src/MiniProfiler.Shared/Timing.cs

        /// </summary>
        [DataMember(Order = 6)]
-        public Dictionary<string, List<CustomTiming>> CustomTimings { get; set; }
+        public Dictionary<string, List<CustomTiming>> CustomTimings


Similarly to Children CustomTimings is backed by a field (_customTimings) and is protected with _lockObject

Same as comments above - many copies and a lot of overhead added here for only a little bit of safety.

Also note that simply by the locking nature here, MiniProfiler would be causing performance problems itself. That's definitely a no-no for any profiler.

I don't think locks add much overhead actually. Most of the time it would be uncontested locking where the overhead is fairly minimal (this is an example i found someone attempted to measure it: http://kynosarges.org/LockPerformance.html) and when the locks are contested clearly we want to have them.

Also I don't think there is any alternative to using locks. As soon as the code starts using DefaultProfileProvider (from the v4 branch) there is a possibility of multiple threads accessing single Timing, the only thing the application needs to do is to send requests to a couple of data sources (e.g. redis and database) asynchronously and wait for them all at one place (like using Task.WhenAll). I'm not sure how you envision to support mitigate this without using locks.

These will add overhead. As an example we run this for every single request at Stack Overflow. So the profiler can't be in the way or causing contention. As an example, we have mobile feed threads that may returns tens of thousands of tasks in a very narrow window. - Some of them will wait on the locks.

Equally important, anyone accessing these gets a copies. This really is a no-go under any circumstances. One of the founding principles of MiniProfiler is the lightweight nature. The amount of allocations involved for operational overhead here is simply untenable. We'll have to find another approach.

In my opinion the concurrency issue that i'm trying to fix is highly likely with web applications using async actions. Here is my pr (on top of V4) that reproduces the concurrency issue in a web app: alexsku#4

In the code I follow a standard pattern where the action executes multiple async service calls and then awaits for them together using Task.WhenAll. The service calls internally make an async call with ConfigureAwait(false). The completions of those calls would execute in parallel causing the issue.

Also the overhead of the lock is very small when the lock is not contested. I wrote a small test to measure the overhead: alexsku#5 and the overhead of lock is negligible (~25 nanoseconds). I think even with the lock the MiniProfiler stays fairly lightweight.

alexsku · 2016-12-22T22:02:26Z

src/MiniProfiler/ProfileProviders/WebRequestProfilerProvider.cs

+        /// <summary>
+        /// Current head timing.
+        /// </summary>
+        public override Timing CurrentHead


I'm not sure what is the intended future of WebRequestProfilerProvider and why DefaultProfilerProvider cannot be used all the time. I added this property only to support the changes to the interface.

Another option is to "merge" WebRequestProfilerProvider and DefaultProfilerProvider, in this case WebRequestProfilerProvider.Current and WebRequestProfilerProvider.CurrentHead would be backed by AsyncLocals (or the polyfills). I don't see specific downside with this and the upside is that this provider would support async controller actions (that are using .ConfigureAwait(false) internally).

I agree it's possible WebRequest provider can go away, but it's a fairly drastic behavior change and possible weight and breaking change to do such. Consider at the moment the case where there is no profiler because of the async context breaks to backend code. If suddenly there is a profiler on massive operations, that may add significant weight and allocations (even from a stack trace strings standpoint), as well as serialization costs at the end to that path. This can easily break things.

I'm not against it per se, but it'd definitely be a top breaking change alert in the v4 notes.

I think maybe the behavior change concern can be mitigated by introducing AsyncWebRequestProfilerProvider that would support the async operations while WebRequestProfilerProvider can stay unchanged. As a result the switch to the new behavior would be an explicit action required from the user of the library

Perhaps, the name's very unintuitive (e.g. "is this only to profile async things?") but I agree that may be the sane migration approach for users.

alexsku · 2016-12-22T22:02:45Z

tests/MiniProfiler.Tests/App.config

    </DbProviderFactories>
  </system.data>
+  <appSettings>
+    <add key="xunit.parallelizeTestCollections" value="false"/>


it looks like xunit prefers to run unit tests in parallel, which causes random issues when static fields (e.g. ProfilerProvider) are getting changed. I think the best is to disable parallelization, i think this disables it (https://xunit.github.io/docs/configuring-with-xml.html)

The proper fix here is to categorize the tests, which is already happening in some cases. Disabling parallelization just hides actual issues.

this is true, however i think this might be pretty tough to do with the mini profiler library tests. The library relies on a static variable to keep the profiler provider and some of the tests replace profiler providers, if a provider is replaced in a middle of test's execution it might result in some fairly weird failures. I think pretty much all tests would need to be in the same category and in future it would be a pain to maintain the categorization, since the requirements are not quite obvious.

And even with the parallelization disabled the unit tests execute pretty fast.

I'm already doing this today with the tests that have different storages, etc. required. At a higher level: it doesn't matter if it's hard to do, it matters if it's the right thing to do. Disabling parallelization doesn't solve the problem, it masks a problem.

alexsku · 2016-12-22T22:03:05Z

tests/MiniProfiler.Tests/BaseTest.cs

            var props = from p in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
                        where p.IsDefined(typeof(System.Runtime.Serialization.DataMemberAttribute), false)
-                        && !p.PropertyType.GetInterfaces().Any(i => i.Equals(typeof(IDictionary)) || i.Equals(typeof(IList)))
+                        && !p.PropertyType.GetInterfaces().Any(i => i.Equals(typeof(IEnumerable)))


I think most of implementations of enumerable cannot be compared using Object.Equals, so i think this is the best catch all case for all enumerable (otherwise every time i try to use IImmutableList the tests fail since is not an IList)

This was explicit because, as an example, strings are IEnumerable. This would need to change back to the previous behavior.

Right, i didn't think about string :)

alexsku · 2016-12-22T22:03:48Z

tests/MiniProfiler.Tests/MiniProfilerTest.cs

+            }
+        }
+
+        [Fact]


this test is a demonstration of the scenario i'm trying to support when one thread spawns two tasks and wait for their completion. In this scenario I would expect the tasks's Timings to be children of the main thread's Timing instance

alexsku · 2016-12-22T22:04:10Z

tests/MiniProfiler.Tests/Storage/SqlServerStorageTest.cs


        public SqlServerStorageTest(SqlCeStorageFixture<SqlServerStorageTest> fixture)
        {
+            MiniProfiler.Settings.ProfilerProvider = new WebRequestProfilerProvider();


the test should be using WebRequestProfilerProvider the DefaultProfilerProvider doesn't save the stopped miniprofiler instances and fails these tests.

NickCraver · 2016-12-23T01:48:25Z

src/MiniProfiler.Shared/ClientTimings.cs

+            get
+            {
+                lock(_lockObject)
+                    return _timings == null ? null : new List<ClientTiming>(_timings);


This creates a duplicate list on every access, that's a lot of allocations for the safety.

NickCraver

Let me start with: there's some really good stuff in here. While I really like the idea of making this more multi-threaded friendly, the amount of locking in here is a no-go to achieve that. As a profiler we must above all have as little impact as possible. The locks and allocations added here would have a rather drastic impact (GC and lock stalls) to any large operations or anything with a lot of profiling.

I just don't think that in general the lock/copy approach is going to work.

Specific notes:

I'm not sure where the Immutable reference comes in, it looks like this was vestige from an earlier path?
There's good code cleanup on the storage nodes, we should pull this in.
The idea of collapsing the WebRequest provider (or possibly rebasing it on Default) is worth looking at.
Killing .ForEach() is good, and on my list overall.
AsyncLocal stub for .Net 4.5 probably is the way to go for simplification overall, not a bad idea at all.

NickCraver · 2016-12-23T01:51:10Z

src/MiniProfiler.Shared/MiniProfiler.Shared.csproj

  </ItemGroup>
  <ItemGroup>
    <PackageReference Include="NETStandard.Library" Version="1.6.1" />
+    <PackageReference Include="System.Collections.Immutable" Version="1.3.0" />


Compared to current, we're taking a package dependency where there is none today (the .Library should move under netstandard only here). The jump from 0 to 1 is pretty big.

I'm not even seeing where this is actually used...am I missing a diff?

Oops, i should remove it, it is a left over from my experiments with using immutable collections

NickCraver · 2016-12-23T01:54:06Z

src/MiniProfiler.Shared/ProfileProviders/BaseProfilerProvider.cs


+        /// <summary>
+        /// GEts or sets the current head timing. This is used by <see cref="MiniProfiler.Head"/>.
+        /// </summary>


Really not a fan of this - this is the Profiler provider, not "things about the profiler" provider. It shouldn't have knowledge or intertwined dependencies of the inner-workings, since implementations of this can and will live outside this library. Generally, this doesn't have any known internal state at all.

I understand your concern, my thinking was that there could be more than one head timing in an async scenario. However in the current design whether async scenario is supported is left up to the profiler provider, so it looks like the only way to support is to store current head in the profiler provider. Do you think it would be better to move the AsyncLocal backing the head into mini profiler?

The timing is specific to the operation, so yes, possibly making the head AsyncLocal makes the most sense. We'll have to think through if that actually breaks anything though. And the context fetch isn't free either for all requests, so it's not a clear win on the performance side.

NickCraver · 2016-12-23T01:55:47Z

src/MiniProfiler.Shared/ProfileProviders/DefaultProfilerProvider.cs

 using System.Runtime.Remoting.Messaging;
-using System.Web;
+using System.Web; 
+using StackExchange.Profiling.Helpers.Net45;


This is a pretty drastic change in behavior, from defaulting to 1 per request, to 1 per request or background thread. This would break many expectations in Stack Overflow's code, as an example.

NickCraver · 2016-12-23T02:02:55Z

src/MiniProfiler.Shared/Timing.cs

        /// </summary>
        [DataMember(Order = 5)]
-        public List<Timing> Children { get; set; }
+        public List<Timing> Children


This approach (like earlier comments) creates a lot of allocations. A new list per accessor really just isn't tenable from a performance or goals standpoint for the library. I'm just not sure any of the proposed solutions is lightweight enough to work.

NickCraver · 2016-12-23T02:04:04Z

src/MiniProfiler.Shared/Timing.cs

        /// </summary>
        [DataMember(Order = 6)]
-        public Dictionary<string, List<CustomTiming>> CustomTimings { get; set; }
+        public Dictionary<string, List<CustomTiming>> CustomTimings


Same as comments above - many copies and a lot of overhead added here for only a little bit of safety.

Also note that simply by the locking nature here, MiniProfiler would be causing performance problems itself. That's definitely a no-no for any profiler.

NickCraver · 2016-12-23T02:09:04Z

src/MiniProfiler/ProfileProviders/WebRequestProfilerProvider.cs

+        /// <summary>
+        /// Current head timing.
+        /// </summary>
+        public override Timing CurrentHead


I agree it's possible WebRequest provider can go away, but it's a fairly drastic behavior change and possible weight and breaking change to do such. Consider at the moment the case where there is no profiler because of the async context breaks to backend code. If suddenly there is a profiler on massive operations, that may add significant weight and allocations (even from a stack trace strings standpoint), as well as serialization costs at the end to that path. This can easily break things.

I'm not against it per se, but it'd definitely be a top breaking change alert in the v4 notes.

NickCraver · 2016-12-23T02:09:47Z

tests/MiniProfiler.Tests/App.config

    </DbProviderFactories>
  </system.data>
+  <appSettings>
+    <add key="xunit.parallelizeTestCollections" value="false"/>


The proper fix here is to categorize the tests, which is already happening in some cases. Disabling parallelization just hides actual issues.

NickCraver · 2016-12-23T02:11:25Z

src/MiniProfiler.Providers.SqlServer/SqlServerStorage.cs

-                    });
-                    SaveClientTimings(profiler.ClientTimings.Timings, conn);
+                    }
+                    SaveClientTimings(clientTimings, conn);


+1 - good cleanup cleanup on these.

NickCraver · 2016-12-23T02:11:42Z

src/MiniProfiler.Providers.SqlServer/SqlServerStorage.cs

-                if (profiler.ClientTimings != null && profiler.ClientTimings.Timings != null && profiler.ClientTimings.Timings.Any())
+
+                var clientTimings = profiler.ClientTimings?.Timings;
+                if (clientTimings != null && clientTimings.Any())


clientTiming?.Any() ?? false

alexsku · 2017-01-04T04:20:43Z

I found an issue describing a similar problem to which i'm trying to fix: #35
I updated the issue pointing to this PR as a possible fix

NickCraver · 2017-02-18T19:08:56Z

I'm closing this out as-is for all the comments above. The approach, unfortunately, just doesn't scale. If we used this at Stack Overflow we'd encounter an unsustainable increase in allocations. I promise to look at the issue in v4, but this methodology on the whole just isn't tenable.

alexsku added 4 commits December 20, 2016 19:32

async multithreading support

5e6bfcf

minor improvements to support mutithreading

35ecbfb

returning IImmutableList instead of List

5805953

unit tests are no longer run in parallel

reverted back to using List as return value of Children

0000723

alexsku commented Dec 22, 2016

View reviewed changes

NickCraver reviewed Dec 23, 2016

View reviewed changes

alexsku mentioned this pull request Jan 4, 2017

Timing.Children is not threadsafe. #35

Closed

NickCraver closed this Feb 18, 2017

+                          }
+                      }
+                      [Fact]

async multithreading support #147

async multithreading support #147

Uh oh!

Conversation

alexsku commented Dec 22, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NickCraver left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment