Question about Recurrent PPO (RPPO): where is the LSTM input sequence length set?

Hello, when an LSTM network is used as the basic unit, the environment's state transition model is sampled, and the sequence of states from t consecutive time steps (s1, ..., st) is used as the input to the network.
How does Recurrent PPO set this sequence length t? Could you point me to where it is set in the code?
Thanks