Hello, when an LSTM network is used as the basic unit, the environment's state transition model is used for sampling, and the sequence of states from t consecutive time steps (s1, ..., st) is used as the input to the network.
How does Recurrent PPO set this value of t (the sequence length)? Could you point me to where it is set in the code?
Thanks