Skip to content

Conversation

@alexsku
Copy link

@alexsku alexsku commented Dec 22, 2016

This is my attempt to improve multithreading support for miniprofiler v4. I think the current version of the mini profiler would have issues in scenarios where the current thread spawns more than one task and waits for them.

        [Fact]
        public void Head_WhenMultipleTasksSpawned_EachSetsItsOwnHead()
        {
            MiniProfiler.Settings.ProfilerProvider = new DefaultProfilerProvider();
            var allTasks = new SemaphoreSlim(0, 1);
            var completed = new TaskCompletionSource<int>();
            using (GetRequest("http://localhost/Test.aspx", startAndStopProfiler: false))
            {
                MiniProfiler.Start();
                var sut = MiniProfiler.Current;
                var head = sut.Head;

                Task.Run(() => {
                    Assert.Equal(head, sut.Head);
                    using (sut.Step("test1"))
                    {
                        allTasks.Release();
                        completed.Task.Wait();
                    }
                });
                allTasks.Wait();
                Task.Run(() => {
                    using (sut.Step("test2"))
                    {
                        allTasks.Release();
                        completed.Task.Wait();
                    }
                });
                allTasks.Wait();

                Assert.NotNull(sut.Head);
                Assert.Equal(head, sut.Head);
                completed.SetResult(0);
            }
        }

In this scenario there would be multiple threads working with the mini profiler (since they would share it via AsyncLocal). This could lead to random failures since Timing class is not thread safe. Also this could lead to inconsistencies since the spawned tasks would replace the Head timing for the mini profiler instance, with one of the tasks's timing potentially becoming a child of another's.

/// Unlike DbContextScope's implementation this implementation doesn't support serialization/deserialization
/// ths not allowing to cross app domain barrier.
/// </summary>
internal class AsyncLocal<T>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a polyfill for AsyncLocal which is not available before .Net 4.6

get
{
Settings.EnsureProfilerProvider();
return Settings.ProfilerProvider.CurrentHead;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that the current thread spawns more than one task on separate threads and wait for them. As a result technically it is possible that there would be more than one separate head for a miniprofiler instance (one for each spawned thread). This code offloads managing heads to the profiler provider, so if the provider handles spawning threads (i.e. it relies on AsyncLocal or CallContext) then it should be able to manage head properly.

using System.Runtime.Remoting.Messaging;
using System.Web;
using System.Web;
using StackExchange.Profiling.Helpers.Net45;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is AsyncLocal polyfill we can keep the implementation of the provider the same - it will be either using the polyfill or .net version of AsyncLocal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty drastic change in behavior, from defaulting to 1 per request, to 1 per request or background thread. This would break many expectations in Stack Overflow's code, as an example.

/// </summary>
[DataMember(Order = 5)]
public List<Timing> Children { get; set; }
public List<Timing> Children
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since technically there could be multiple threads working with the same Timing instance at the same time (my example above with a thread spawning two tasks and waiting for them) Children and CustomTimings need to be protected from multithreaded access. Here I added a backing field _children and protected it with _lockObject. The property always return a copy of _children so the caller can iterate through the list safely. I didn't see any code that would attempt to modify the children of a Timing via Children property so I think nothing is broken by this change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the property should return IReadOnlyList (by just casting the created List) but protobuf doesn't really work well with IReadOnlyList (due to this issue: protobuf-net/protobuf-net#90). It might be possible to use IImmutableList (backed by ImmutableArray) which is supported by protobuf. But the problem would be with CustomTimings property which returns Dictionary. IReadOnlyDictionary would not work with protobuf either, and ImmutableDictionary has significantly worse performance comparing to Dictionary which makes me reluctant to use it.

Another option is to change the property to GetChildren/SetChildren method pair to emphasize that there are not just access to the backing filed but actually copy the contents of it.

Please let me know what you prefer I can change it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach (like earlier comments) creates a lot of allocations. A new list per accessor really just isn't tenable from a performance or goals standpoint for the library. I'm just not sure any of the proposed solutions is lightweight enough to work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there will be many accesses to the Children property? While collecting timings during the profiler session Children property is no longer used, instead the backing list is used, where the reallocations do not happen. I think the only time when Children property is accessed is during the processing of captured mini profiler snapshot. And even there i wouldn't expect many of the accesses. Maybe I'm missing something, could you give me examples that you think where the property will be accessed a lot?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, if you have a large profiling tree then the serialization at the end will pay a very heavy toll in duplication...for no reason in the situation. We'd just be generating n copies of structures as part of a normal operation that has no racing at all.

What the lock is trying to protect is the actual reference race, but there's a premium price on the member access within that reference as a result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt the overhead of the duplication would be that high. The current code is using List to keep children and while children are added it is doing similar duplication internally potentially multiple time. This would be just one more duplication. And it is extremely fast (less than a nanosecond for small lists (less than 20 elements) in my tests)

/// </summary>
[DataMember(Order = 6)]
public Dictionary<string, List<CustomTiming>> CustomTimings { get; set; }
public Dictionary<string, List<CustomTiming>> CustomTimings
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to Children CustomTimings is backed by a field (_customTimings) and is protected with _lockObject

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as comments above - many copies and a lot of overhead added here for only a little bit of safety.

Also note that simply by the locking nature here, MiniProfiler would be causing performance problems itself. That's definitely a no-no for any profiler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think locks add much overhead actually. Most of the time it would be uncontested locking where the overhead is fairly minimal (this is an example i found someone attempted to measure it: http://kynosarges.org/LockPerformance.html) and when the locks are contested clearly we want to have them.

Also I don't think there is any alternative to using locks. As soon as the code starts using DefaultProfileProvider (from the v4 branch) there is a possibility of multiple threads accessing single Timing, the only thing the application needs to do is to send requests to a couple of data sources (e.g. redis and database) asynchronously and wait for them all at one place (like using Task.WhenAll). I'm not sure how you envision to support mitigate this without using locks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will add overhead. As an example we run this for every single request at Stack Overflow. So the profiler can't be in the way or causing contention. As an example, we have mobile feed threads that may returns tens of thousands of tasks in a very narrow window. - Some of them will wait on the locks.

Equally important, anyone accessing these gets a copies. This really is a no-go under any circumstances. One of the founding principles of MiniProfiler is the lightweight nature. The amount of allocations involved for operational overhead here is simply untenable. We'll have to find another approach.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion the concurrency issue that i'm trying to fix is highly likely with web applications using async actions. Here is my pr (on top of V4) that reproduces the concurrency issue in a web app: alexsku#4

In the code I follow a standard pattern where the action executes multiple async service calls and then awaits for them together using Task.WhenAll. The service calls internally make an async call with ConfigureAwait(false). The completions of those calls would execute in parallel causing the issue.

Also the overhead of the lock is very small when the lock is not contested. I wrote a small test to measure the overhead: alexsku#5 and the overhead of lock is negligible (~25 nanoseconds). I think even with the lock the MiniProfiler stays fairly lightweight.

/// <summary>
/// Current head timing.
/// </summary>
public override Timing CurrentHead
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what is the intended future of WebRequestProfilerProvider and why DefaultProfilerProvider cannot be used all the time. I added this property only to support the changes to the interface.

Another option is to "merge" WebRequestProfilerProvider and DefaultProfilerProvider, in this case WebRequestProfilerProvider.Current and WebRequestProfilerProvider.CurrentHead would be backed by AsyncLocals (or the polyfills). I don't see specific downside with this and the upside is that this provider would support async controller actions (that are using .ConfigureAwait(false) internally).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's possible WebRequest provider can go away, but it's a fairly drastic behavior change and possible weight and breaking change to do such. Consider at the moment the case where there is no profiler because of the async context breaks to backend code. If suddenly there is a profiler on massive operations, that may add significant weight and allocations (even from a stack trace strings standpoint), as well as serialization costs at the end to that path. This can easily break things.

I'm not against it per se, but it'd definitely be a top breaking change alert in the v4 notes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe the behavior change concern can be mitigated by introducing AsyncWebRequestProfilerProvider that would support the async operations while WebRequestProfilerProvider can stay unchanged. As a result the switch to the new behavior would be an explicit action required from the user of the library

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, the name's very unintuitive (e.g. "is this only to profile async things?") but I agree that may be the sane migration approach for users.

</DbProviderFactories>
</system.data>
<appSettings>
<add key="xunit.parallelizeTestCollections" value="false"/>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like xunit prefers to run unit tests in parallel, which causes random issues when static fields (e.g. ProfilerProvider) are getting changed. I think the best is to disable parallelization, i think this disables it (https://xunit.github.io/docs/configuring-with-xml.html)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proper fix here is to categorize the tests, which is already happening in some cases. Disabling parallelization just hides actual issues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is true, however i think this might be pretty tough to do with the mini profiler library tests. The library relies on a static variable to keep the profiler provider and some of the tests replace profiler providers, if a provider is replaced in a middle of test's execution it might result in some fairly weird failures. I think pretty much all tests would need to be in the same category and in future it would be a pain to maintain the categorization, since the requirements are not quite obvious.

And even with the parallelization disabled the unit tests execute pretty fast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm already doing this today with the tests that have different storages, etc. required. At a higher level: it doesn't matter if it's hard to do, it matters if it's the right thing to do. Disabling parallelization doesn't solve the problem, it masks a problem.

var props = from p in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
where p.IsDefined(typeof(System.Runtime.Serialization.DataMemberAttribute), false)
&& !p.PropertyType.GetInterfaces().Any(i => i.Equals(typeof(IDictionary)) || i.Equals(typeof(IList)))
&& !p.PropertyType.GetInterfaces().Any(i => i.Equals(typeof(IEnumerable)))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of implementations of enumerable cannot be compared using Object.Equals, so i think this is the best catch all case for all enumerable (otherwise every time i try to use IImmutableList the tests fail since is not an IList)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was explicit because, as an example, strings are IEnumerable. This would need to change back to the previous behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, i didn't think about string :)

}
}

[Fact]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is a demonstration of the scenario i'm trying to support when one thread spawns two tasks and wait for their completion. In this scenario I would expect the tasks's Timings to be children of the main thread's Timing instance


public SqlServerStorageTest(SqlCeStorageFixture<SqlServerStorageTest> fixture)
{
MiniProfiler.Settings.ProfilerProvider = new WebRequestProfilerProvider();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test should be using WebRequestProfilerProvider the DefaultProfilerProvider doesn't save the stopped miniprofiler instances and fails these tests.

get
{
lock(_lockObject)
return _timings == null ? null : new List<ClientTiming>(_timings);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a duplicate list on every access, that's a lot of allocations for the safety.

Copy link
Member

@NickCraver NickCraver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me start with: there's some really good stuff in here. While I really like the idea of making this more multi-threaded friendly, the amount of locking in here is a no-go to achieve that. As a profiler we must above all have as little impact as possible. The locks and allocations added here would have a rather drastic impact (GC and lock stalls) to any large operations or anything with a lot of profiling.

I just don't think that in general the lock/copy approach is going to work.

Specific notes:

  • I'm not sure where the Immutable reference comes in, it looks like this was vestige from an earlier path?
  • There's good code cleanup on the storage nodes, we should pull this in.
  • The idea of collapsing the WebRequest provider (or possibly rebasing it on Default) is worth looking at.
  • Killing .ForEach() is good, and on my list overall.
  • AsyncLocal stub for .Net 4.5 probably is the way to go for simplification overall, not a bad idea at all.

</ItemGroup>
<ItemGroup>
<PackageReference Include="NETStandard.Library" Version="1.6.1" />
<PackageReference Include="System.Collections.Immutable" Version="1.3.0" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to current, we're taking a package dependency where there is none today (the .Library should move under netstandard only here). The jump from 0 to 1 is pretty big.

I'm not even seeing where this is actually used...am I missing a diff?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, i should remove it, it is a left over from my experiments with using immutable collections


/// <summary>
/// GEts or sets the current head timing. This is used by <see cref="MiniProfiler.Head"/>.
/// </summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really not a fan of this - this is the Profiler provider, not "things about the profiler" provider. It shouldn't have knowledge or intertwined dependencies of the inner-workings, since implementations of this can and will live outside this library. Generally, this doesn't have any known internal state at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your concern, my thinking was that there could be more than one head timing in an async scenario. However in the current design whether async scenario is supported is left up to the profiler provider, so it looks like the only way to support is to store current head in the profiler provider. Do you think it would be better to move the AsyncLocal backing the head into mini profiler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timing is specific to the operation, so yes, possibly making the head AsyncLocal makes the most sense. We'll have to think through if that actually breaks anything though. And the context fetch isn't free either for all requests, so it's not a clear win on the performance side.

using System.Runtime.Remoting.Messaging;
using System.Web;
using System.Web;
using StackExchange.Profiling.Helpers.Net45;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty drastic change in behavior, from defaulting to 1 per request, to 1 per request or background thread. This would break many expectations in Stack Overflow's code, as an example.

/// </summary>
[DataMember(Order = 5)]
public List<Timing> Children { get; set; }
public List<Timing> Children
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach (like earlier comments) creates a lot of allocations. A new list per accessor really just isn't tenable from a performance or goals standpoint for the library. I'm just not sure any of the proposed solutions is lightweight enough to work.

/// </summary>
[DataMember(Order = 6)]
public Dictionary<string, List<CustomTiming>> CustomTimings { get; set; }
public Dictionary<string, List<CustomTiming>> CustomTimings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as comments above - many copies and a lot of overhead added here for only a little bit of safety.

Also note that simply by the locking nature here, MiniProfiler would be causing performance problems itself. That's definitely a no-no for any profiler.

/// <summary>
/// Current head timing.
/// </summary>
public override Timing CurrentHead
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's possible WebRequest provider can go away, but it's a fairly drastic behavior change and possible weight and breaking change to do such. Consider at the moment the case where there is no profiler because of the async context breaks to backend code. If suddenly there is a profiler on massive operations, that may add significant weight and allocations (even from a stack trace strings standpoint), as well as serialization costs at the end to that path. This can easily break things.

I'm not against it per se, but it'd definitely be a top breaking change alert in the v4 notes.

</DbProviderFactories>
</system.data>
<appSettings>
<add key="xunit.parallelizeTestCollections" value="false"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proper fix here is to categorize the tests, which is already happening in some cases. Disabling parallelization just hides actual issues.

});
SaveClientTimings(profiler.ClientTimings.Timings, conn);
}
SaveClientTimings(clientTimings, conn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - good cleanup cleanup on these.

if (profiler.ClientTimings != null && profiler.ClientTimings.Timings != null && profiler.ClientTimings.Timings.Any())

var clientTimings = profiler.ClientTimings?.Timings;
if (clientTimings != null && clientTimings.Any())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clientTiming?.Any() ?? false

@alexsku
Copy link
Author

alexsku commented Jan 4, 2017

I found an issue describing a similar problem to which i'm trying to fix: #35
I updated the issue pointing to this PR as a possible fix

@NickCraver
Copy link
Member

I'm closing this out as-is for all the comments above. The approach, unfortunately, just doesn't scale. If we used this at Stack Overflow we'd encounter an unsustainable increase in allocations. I promise to look at the issue in v4, but this methodology on the whole just isn't tenable.

@NickCraver NickCraver closed this Feb 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants