Labels: discussion (Discussion of a typical issue), good first issue (Good for newcomers)
Description
This issue will keep track of DI-engine's updates over the next few versions:
Future
- feature(wrh): add EDT code #808
- feature(xrk): add q-transformer #783
- feature(zc): add MetaDiffuser and prompt-dt #771
- feature(zjow): add envpool new pipeline #753
- feature(whl): add rlhf pipeline. #748
- refactor(gry): refactor reward model #636
- feature(whl): add PC+MCTS code #603
- feature(wgt): enable DI using torch-rpc to support GPU-p2p and RDMA-rpc #562
- feature(zms): add new league middlewares and other models and tools. #458
v0.5.4
(Expected to release in June 2025)
- fix(pu): fix noise layer's usage based on the original paper #866
- feature(pu): adapt to unizero-multitask ddp, and adapt ppo to support jericho config #860
- feature(nyz&dcy): add LLM/VLM reward model #859
- feature(nyz&dcy): add LLM/VLM RLHF loss (PPO/GRPO/RLOO) #857
- feature(wqj): add vllm collector #856
- polish(pu): delete unused enable_fast_timestep argument #855
- feature(nyz): add rlhf dataset #854
- AttributeError: 'MultiDiscrete' object has no attribute 'low' #852
- Noisy Net Issue #850
- LSTM Layer normalization #849
- feature(zjow): add Implicit Q-Learning #821
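Two of the v0.5.4 items above concern noise layers: #866 fixes the noise layer's usage against the original NoisyNet paper, and #850 reports a Noisy Net issue. As context, here is a framework-free sketch of the factorised-Gaussian scheme from Fortunato et al. (2017). This is an illustration only, not DI-engine's actual implementation; the class name `NoisyLinear` and all parameter names here are hypothetical.

```python
import numpy as np

def _scaled_noise(size, rng):
    # f(x) = sign(x) * sqrt(|x|), the factorised-Gaussian noise transform
    x = rng.standard_normal(size)
    return np.sign(x) * np.sqrt(np.abs(x))

class NoisyLinear:
    """Factorised-Gaussian noisy linear layer (Fortunato et al., 2017).

    Standalone numpy sketch: learnable means (w_mu, b_mu) plus learnable
    noise scales (w_sigma, b_sigma); fresh noise is drawn per forward pass.
    """

    def __init__(self, in_features, out_features, sigma0=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_features)
        self.w_mu = self.rng.uniform(-bound, bound, (out_features, in_features))
        self.w_sigma = np.full((out_features, in_features), sigma0 / np.sqrt(in_features))
        self.b_mu = self.rng.uniform(-bound, bound, out_features)
        self.b_sigma = np.full(out_features, sigma0 / np.sqrt(in_features))
        self.in_features, self.out_features = in_features, out_features

    def __call__(self, x):
        # Factorised noise: one vector per input dim, one per output dim
        eps_in = _scaled_noise(self.in_features, self.rng)
        eps_out = _scaled_noise(self.out_features, self.rng)
        w = self.w_mu + self.w_sigma * np.outer(eps_out, eps_in)
        b = self.b_mu + self.b_sigma * eps_out
        return x @ w.T + b
```

Because the noise is resampled each call, repeated forward passes on the same input give different outputs during exploration, which is the point of the technique.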
v0.5.3
- Bug in reset method #846
- Reset env_info #845
- Priority Experience Replay Bug #844
- feature(xyy):add HPT model to implement PolicyStem+DuelingHead #841
- Installing DI-engine via conda is incompatible with Python 3.9 #838
- feature(pu): add resume_training option to allow the envstep and train_iter resume seamlessly #835
- feature(pu): add pistonball_env, its unittest and qmix config #833
- polish(TairanMK): update trading env #831
- polish(mark): add hybrid action space support to ActionNoiseWrapper #829
- feature(whl): add AWR algorithm. #828
- No Hidden Size List for ContinuousQAC? #826
- Observation shape in the custom marl environment #823
- feature(nyz): adapt DingEnvWrapper to gymnasium #817
- KeyError: 'obs_shape' #812
- feature(zym): update ppo config to support discrete action space #809
v0.5.2
- style(nyz): relax flask requirement #811
- feature(wrh): add taxi env latest version and dqn config #807
- style(hus): add new badge (hellogithub) in readme #805
- polish(zym): optimize ppo continuous act #801
- feature(wrh): add taxi env #799
- cannot run GTrXL demo since v0.5.0 #796
- doc(hus): update discord link and badge in readme #795
- Running lunarlander_dqn_deploy inside Docker fails #793
- feature(nyz): add GPU utils #788
- fix(zjow): fix complex obs demo for ppo pipeline #786
- env(rjy): add ising model env #782
- feature(xrk): add new env named Frozen Lake and DQN algorithm. #781
- feature(ooo): add deprecated function decorator #778
- fix(eltociear): typo in config.py #776
- feature(nyz): add MADDPG pettingzoo example #774
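Several v0.5.2 items above add tabular-style environments with DQN configs (Frozen Lake in #781, Taxi in #799/#807). As a minimal, framework-free illustration of the value-learning loop such configs drive, here is tabular Q-learning on a toy deterministic chain standing in for those grid worlds. Everything in this sketch is hypothetical and not DI-engine code.

```python
import random

def q_learning_chain(n_states=5, episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a deterministic 1-D chain: the agent starts at
    state 0 and receives reward 1 for reaching state n_states - 1.
    Actions: 0 = step left, 1 = step right (clipped at the chain's ends)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda i: q[s][i])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # one-step TD update toward r + gamma * max_a' Q(s', a')
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy (argmax over each row of the Q-table) steps right from every non-terminal state, which is the optimal policy on this chain.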
v0.5.1
- feature(nyz): add MADDPG pettingzoo example #774
- polish(pu): polish comments in a2c/bcq/fqf/ibc policy #768
- polish(pu): polish NGU atari configs #767
- polish(rjy): polish pg/iqn/edac policy doc #764
- doc(zjow): polish the notation of classes and functions in torch_utils and utils #763
- doc(rjy): polish d4pg/ppg/qrdqn policy doc #762
- fix(pu): fix hppo entropy_weight to avoid nan error in log_prob #761
- doc(zjow): add API doc for ding agent #758
- feature(zjow): add qgpo policy for new DI-engine pipeline #757
- polish(rjy): polish the comments of collate_fn/profiler_helper/metric #755
- feature(luyd): fix dt new pipeline of mujoco #754
- polish(rjy): polish comments in normalizer_helper and lock_helper #752
- doc(whl): polish doc for loss, compression helper and bfs helper. #747
- polish(pu): polish comments and styles in files within torch_utils/network #745
- feature(cy): add dreamerV3 + MiniGrid code #725
- feature(rjy): add HAPPO algorithm #717
v0.5.0
- polish(zc): change PD config name #749
- polish(pu): polish comments in env_wrappers.py and ding_env_wrapper.py #742
- doc(zjow): polish ding model common/template note #741
- polish(rjy): polish comments in wqmix/ngu/pg model #739
- doc(whl): polish doc for data_helper, model_helper, parameter, metric, math_helper, backend_helper #738
- env(zjow): add Huggingface model card support for ppof envs #737
- polish(rjy): polish comments of qmix/pdqn/mavac #736
- feature(luyd): add collector logging in new pipeline #735
- doc(whl): add code doc for LT,DT,PC,BC models #734
- doc(zjow): update README.md and Colab demo #733
- polish(nyz): polish dqn and ppo comments #732
- feature(cy): polish anytrading #731
- feature(zjow): polish ppof agent code for opendilab huggingface #730
- fix(zjow): fix typo for QAC class #729
- test(luyd): add model test code #728
- style(eltociear): fix typo in optimizer_helper.py #726
- doc(zjow): polish rl_utils doc #724
- style(nyz): polish model template comments #722
- how to use logger #715
- fix(luyd): fix new pipeline impala in Lunarlander and Atari env #713
- feature(lxy): add dropout layers to dqn #712
- Error occurs when running in parallel #709
- feature(zc): add plan diffuser #700
- feature(whl): add tabmwp env and prompt pg policy #667
- feature(zjow): add new pipeline agent sac/ddpg/a2c #637
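Among the v0.5.0 items, #712 adds dropout layers to DQN. For context, a generic sketch of inverted dropout, the standard formulation that scales surviving activations at train time so no rescaling is needed at inference. This is a standalone illustration, not the DI-engine implementation.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1 / (1 - p), so the expected activation is unchanged and
    inference can use the activations as-is."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p  # True where the unit survives
    return x * mask / (1.0 - p)
```

With `training=False` the input passes through untouched, mirroring how dropout layers behave in eval mode.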
v0.4.9
- fix(lixuelin): to_ndarray fails to assign dtype for scalars #708
- feature(whl): add example of dqn eval #706
- fix(zyz): fix type spell error #704
- feature(pu): add three variants of Bilinear classes and a FiLM class #703
- feature(zjow): add middleware for ape-x structure pipeline #696
- doc(zjow): update README.md with openxlab badge #695
- refactor(lyd): refactor dt_policy in new pipeline and add img input support #693
- enable_save_figure() has no detailed document, and its comment is wrong. #688
- polish(nyz): polish offpolicy RL multi-gpu DDP training #679
- feature(nyz): fix py37 macos ci bug and update default pytorch to 1.12.1 #678
- feature(cxy): add cliffwalking env #677
- feature(cy): add tensor stream merging tools #673
- polish(nyz): simplify requirements #672
- feature(zp): add dreamerv3 algorithm #652
- feature(zc): add bcq algorithm #640
v0.4.8
- HELP: failed to run gym_hybrid_pdqn_config.py #664
- feature(nyz): add MAPPO/MASAC task example #661
- polish(pu): add LN and GN norm_type support in ResBlock #660
- fix(zjow): update td3bc d4rl config #659
- fix(pu): fix last_linear_layer_weight_bias_init_zero in MLP and add its unittest #650
- Running a pre-written config from DI-Zoo raises an error #646
- feature(zjow): add PPO demo for complex env observation #644
- feature(zc): add EDAC and modify config of td3bc #639
- Missing of Regularization of GCL #627
- feature(wgt): add barrier middleware #570
v0.4.7
- fix(nyz): fix confusing shallow copy operation bug about next_obs #641
- feature(wgt): add torch-rpc fix dockerfile #628
- feature(whl): add gpt utils #625
- polish(gry): polish reward model and td error #624
- refactor(nyz): remove policy cfg sub fields requirements #620
- polish(gry): polish dqn config #611
- Tag v0.4.6 uses __version__ 0.4.5 #609
- feature(nyz): add PPOF ch4 reward demo support #608
- feature(lxy): add popart & value rescale & symlog to ppof #605
- style(elt): fix typo in time_helper.py #602
- Document of SAC default config needs to be updated #601
- fix(psharold): unsqueeze action_args in PDQN when shape is 1 #599
- fix(nyz): update ptz to latest version #597
- The "serial_pipeline" method has a small bug #592
- feature(lxy): modify ppof rewardclip and add atari config #589
- feature(gry): add MDQN algorithm #590
- Problem with DQfD doc #585
- MPDQN throw a RuntimeError #583
- feature(zjow): add wandb logger features; fix relative bugs for wandb online logger #579
- Examples for using Imitation Learning / Offline RL Algorithms #576
- fix(lisong): fix icm/rnd+onppo config bugs and app_key env bugs #564
- feature(whl): add PC algorithm #514
- feature(rjy): add dmc2gym+sac baseline both in state input and pixel input #451
v0.4.6
- feature(nyz): add ppof ch3 demo #581
- feature(nyz): setup evogym docker #580
- polish(whl): Update cartpole config #578
- feature(gry): add acrobot env and dqn config #577
- feature(zc): add carracing in box2d #575
- feature(zt): add metadrive-simulator env and related onppo config #574
- feature(lxy): add procedure cloning model #573
- style(eltociear): fix typo in contrastive_loss.py #572
- polish(nyz): polish example demos #568
- feature(nyz): add PPOF new interface support #567
- fix(nyz): fix py38 unittest bugs #565
- feature(nyz): add new gym hybrid viz #563
- feature(cy): add BDQ algorithm #558
v0.4.5
- feature(nyz): add policy gradient algo implementation #544
- feature(zjow): add load and save method for replaybuffer #542
- feature(wyh): madqn algorithm #540
- feature(whl): add more DingEnvWrapper example #525
- feature(nyz): add evaluator more info viz support #538
- env(zjow): add env gym_pybullet_drones #526
- env(lisong): add beergame supply chain optimization env #512
- ModuleNotFoundError: No module named 'ding' #535
- feature(zjow): add trackback log for subprocess env manager #534
- feature(nyz): add MADDPG algo #550
- Continuous action example for IMPALA #543
- polish(zjow): rename 'eval reward' -> 'episode return' #536
- feature(nyz): add new middleware distributed demo #321