Hello,
After I shelved my massive, broken cleanup PR, I set out to try again with smaller steps, more commits, and more testing, to make sure I didn't break anything.
My very first move was to remove the NuGet dependencies within the library, so that the projects reference each other directly as projects, which allows tighter coupling and faster development. After doing so, I observed an extremely sharp decline in the ability of the CartPole-v1 example to "learn" and make meaningful progress, even with the exact same hyperparameters. I am unsure in exactly which version of RLMatrix this regression became noticeable. My memory is a little foggy because I performed this test several weeks ago, but I believe that, because the current master branch code references a `0.4.x` version, I picked an older version, `0.2.0`, at random, and the issue went away.
I wrote a very simple example demonstrating this issue on the `nouveau-2.0` branch of my fork of RLMatrix. A video demonstrating the situation can be found below. Simply changing `Old` to `false` in `CartPole-v1.csproj` creates a stark contrast in performance. I would be happy to provide relevant logs or other debugging information if needed; finding details that could explain this is a little out of my range of expertise.
2025-03-18.19-54-57-00.00.00.000-00.01.40.967.mp4
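For readers who have not opened the branch, here is a minimal sketch of how a switch like `Old` can be wired up in MSBuild. It is an illustration under stated assumptions, not the actual contents of `CartPole-v1.csproj`: only the `Old` property name and the `0.2.0` version come from the description above. The package ID `RLMatrix`, the target framework, and the relative project path are hypothetical.

```xml
<!-- Hypothetical sketch: toggles between the published NuGet package and the in-repo project. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- Assumed target framework; use whatever the repository actually targets. -->
    <TargetFramework>net8.0</TargetFramework>
    <!-- true: build against the older NuGet release; false: build against local source. -->
    <Old>true</Old>
  </PropertyGroup>

  <!-- Older behaviour: reference the released package (package ID is an assumption). -->
  <ItemGroup Condition="'$(Old)' == 'true'">
    <PackageReference Include="RLMatrix" Version="0.2.0" />
  </ItemGroup>

  <!-- Newer behaviour: reference the library project directly (path is hypothetical). -->
  <ItemGroup Condition="'$(Old)' != 'true'">
    <ProjectReference Include="..\..\src\RLMatrix\RLMatrix.csproj" />
  </ItemGroup>
</Project>
```

With a layout like this, the same example code compiles against either the old package or the current source, so any difference in learning behaviour comes from the library version rather than from the example itself.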
There absolutely could be something I'm missing here! I'm hoping that maybe I'm misusing the newer code in some way that makes it inconsistent with the older code.