
Adds some new features #7

Closed
vybhav-ibr wants to merge 29 commits into jgillick:main from vybhav-ibr:develop

Conversation

@vybhav-ibr

Adds some new features, such as sensor managers, more control types (velocity control, force control, hybrid control), and more command types (position tracking, pose tracking), and adds support for direct deployment to ROS 2 using topic-based control.
Adds examples demonstrating learning with the skrl library, and examples showcasing how sensors can be used as part of the observations.
Note: currently the skrl example adapted from the 'simple' example doesn't converge. I am not sure if it requires some hyperparameter tuning or there is a bug; based on the recorded video, it looks like the agent keeps resetting.
I have not included examples for most of the other sensors, the position/pose commands, or the new velocity and force control; I will add them after a successful merge. Some docstrings for the newly added features are incomplete or wrong, so the generated docs may be inadequate.

…rol type specific dof indices and dof names, add support for drone entities, add support for direct deployment to ROS using the ros2-topic-based-control
…ng, add a rudimentary IMU-sensor based reward for rewarding stability
…ption, add a function to convert from observation dict to a gymnasium space, modify the action space to have multiple gymnasium Box spaces, one for each observation type
@jgillick
Owner

jgillick commented Nov 24, 2025

Thanks for submitting this PR, I really appreciate the contribution.

For future reference, it's best to break up big PRs into much smaller iterative ones. A PR with 6k+ lines is massive and not quick or easy to review.

Integrating with ROS is really interesting; I've not used ROS. I'd love to learn more about how your Genesis Forge environments integrate with that system.

I took a quick look through the PR, and here's some high-level feedback for us to get started on:

  • I just made some changes in the main branch, so be sure to merge those into your branch.
  • Be sure to run the Black python formatter on your code, so that it follows formatting standards.
  • Avoid single-letter variable names.
  • Address TODO comments; we don't want these merged into main.
  • The examples will need to converge before we merge them into main. I can try to help with that.
  • Keep the examples to the bare minimum necessary code, so it's clear to other people what is required vs. optional. Your examples seem to include ROS statements that aren't used in the example. Since ROS isn't part of the minimum example, remove those bits in the example code.

Sensors

I'm impressed by all the new sensors you've added. However, I'm not sure that adding them all as full managers is necessary. In general, managers are intended to encapsulate complex systems. If they're just wrappers around Genesis sensors, and used in observations/rewards/terminations, it might be best to just initialize the sensors in your environment and pass the Genesis sensor instance to the observation/reward/termination functions. You can see an example in this environment, which uses a lidar height sensor.

That said, the camera sensor manager you've added does seem to do some step-specific render handling, so this might be a good use for a manager. To simplify the manager and prevent us from duplicating all the sensor parameters in the initializer, it would be best to initialize the cameras in your environment and then pass them into the manager.

This is a good start. Thanks again for the contribution.

@vybhav-ibr
Author

Thanks for the feedback, and sorry for the long PR. Since a lot of the features added are variations of already existing features, I figured it would be alright, but I will keep it in mind. I will format the code and edit/add the necessary docstrings.

Regarding the ROS integration, I will try to guard any imports or ROS-specific code. The ROS imports in the examples will be removed, and I will add a separate example for deployment with ROS.
The ROS-specific code used here is quite basic; the only ROS concepts used are 'publishers' and 'subscribers'. Publishers are used for sending messages, subscribers are for receiving messages, and a topic is like a pathway between the publishers and the subscribers. In this case a JointState message is used. The message type is like a standard archetype which is understood by both the publisher and the subscribers. If a ROS 2 managed env is used, the observations like 'base_pos', 'base_vel', and 'joint_pos' for the network are received by the subscriber, and the actions are sent to the robot using the publisher. I think this feature is really helpful. I will be testing the ROS stuff more.
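The publisher/subscriber flow described above can be sketched in plain Python; `Topic`, the callback wiring, and the `JointState`-style dict below are simplified stand-ins for the real rclpy and sensor_msgs APIs, not actual ROS code:

```python
class Topic:
    """A named pathway connecting publishers to subscribers (simplified)."""

    def __init__(self, name):
        self.name = name
        self._callbacks = []

    def subscribe(self, callback):
        # Robot side: register a callback to receive messages.
        self._callbacks.append(callback)

    def publish(self, msg):
        # Policy side: deliver the message to every subscriber.
        for callback in self._callbacks:
            callback(msg)


received = []
joint_commands = Topic("/joint_commands")
joint_commands.subscribe(received.append)

# A JointState-style dict: joint names plus commanded positions.
joint_commands.publish({"name": ["hip", "knee"], "position": [0.1, -0.4]})
```

In the real setup, the subscriber callback would run on the robot and feed the actuators, while the publisher side would be driven by the trained policy.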

Regarding the sensors, I initially thought of basing all the sensors on the lidar example you already have, but I decided on using separate managers for the following reasons:

  1. Future ROS integration: if the camera is passed as an argument, it would not be very intuitive to create a class for the interface between ROS and Genesis, and pass an instance of that class to the observation manager.
  2. To model realistic sensor performance, we should use sensors which send data at a specified frequency. If the camera is rendered at every step, it would be computationally expensive, and it would not be representative of realistic sensor performance.
  3. Having separate sensor managers would be helpful for domain randomization. We could, for instance, augment the images rendered by the camera (as is common in computer vision) to train a more robust feature detector. Similar modifications can be done to the lidar sensors to add noise (there is some inherent noise in the sensor), and changing the noise profile for each agent would make the RL policy more generalizable. It would also be nice to buffer the sensor data and choose a random entry from the last n entries (I am currently not doing this because I am concerned about VRAM usage), which is also more representative of a real sensor. Although these are currently not part of the code, having separate managers would make it easier to implement something like this in the future, or for the user to extend the code.
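Point 3's buffering idea might look something like this minimal sketch (not part of the PR; `BufferedSensor` and its method names are hypothetical):

```python
import random
from collections import deque


class BufferedSensor:
    """Keep the last n readings and deliver a random recent one,
    approximating variable sensor latency. Hypothetical sketch."""

    def __init__(self, read_fn, n=4, seed=None):
        self.read_fn = read_fn            # callable returning the latest reading
        self.buffer = deque(maxlen=n)     # bounded: trades realism vs. memory
        self.rng = random.Random(seed)

    def step(self):
        # Called every sim step: record the newest reading.
        self.buffer.append(self.read_fn())

    def sample(self):
        # Any of the last n readings may be "delivered", like a delayed sensor.
        return self.rng.choice(list(self.buffer))


sensor = BufferedSensor(read_fn=lambda: 1.0, n=3)
for _ in range(5):
    sensor.step()
delayed_reading = sensor.sample()
```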

I will leave the 'skrl' and 'skrl_with_sensors' examples to you; I couldn't get them to converge. I think the issue is in the skrl env wrapper.

Is "prevent us from duplicating all the sensor parameters in the initializer" a huge concern? They have default values, so the user would only need to provide the args if they need to. Realistically, I would only expect the user to provide the entity_name, link_name, frequency, and in some cases resolution. If this is not intuitive, I could add an example or README.md which uses the sensors.

Hope this clears up any issues that need addressing; let me know if any further changes are required.

@jgillick
Owner

is "prevent us from duplicating all the sensor parameters in the initializer" a huge concern ?

For me, it is for a couple of reasons.

  1. If we force the initialization of sensors to be created inside the manager, then anytime Genesis updates the sensor function parameters (adds/changes features), Genesis Forge will need to be updated to get these updates. It also makes backwards compatibility much harder. It essentially locks Genesis and Genesis Forge versions together.
  2. For me, the goal of Genesis Forge is not to abstract Genesis away from the developer, but to provide very useful utilities and tooling to be used with Genesis. It should be really easy to see all the things that are in the scene. They should be built and configured using Genesis and then passed into Genesis Forge where necessary. In many cases, just passed as a parameter to a reward function (etc).

You can see an example of this with the EntityManager. This manager could have been created to initialize the entity itself, but instead, the entity is created with the scene and then passed in. It also avoids us having to support all the ways and edge cases for initializing a rigid entity.
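The pass-it-in pattern described above can be illustrated generically; `Scene`, `Entity`, and `EntityManager` here are simplified placeholders, not the actual Genesis Forge classes:

```python
class Entity:
    def __init__(self, name):
        self.name = name


class Scene:
    """Simplified stand-in for a simulator scene."""

    def __init__(self):
        self.entities = []

    def add_entity(self, name):
        entity = Entity(name)
        self.entities.append(entity)
        return entity


class EntityManager:
    """The manager wraps an existing entity instead of constructing one,
    so it never has to mirror every entity-creation parameter."""

    def __init__(self, entity):
        self.entity = entity


scene = Scene()
robot = scene.add_entity("go2")        # created with the scene...
manager = EntityManager(entity=robot)  # ...then passed into the manager
```

The manager stays decoupled from however the entity was constructed, which is the point being made about avoiding wrapper abstractions.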

For me, there needs to be a very strong reason to create wrapper abstractions around existing functionality.

Having separate sensor managers would be helpful for domain randomization

Genesis has some domain randomization built into some of its sensors already. Not all sensors, but I wouldn't be surprised if they plan to add more.

If you're interested in domain randomization for observations, you can already add noise to observable values.

To model realistic sensor performance, we should use sensors which send data at a specified frequency, if the camera is rendered at every step. it would be computationally expensive and it would not be a representation of realistic sensor performance,

This is why I said that the camera manager was one that probably makes sense. Because it's doing more than just wrapping the Genesis sensor and is handling how frequently it updates. That said, this might also make sense to be included in Genesis directly.

future ros-integration, if the camera is passed as an argument, it would not be very intuitive to create a class for the interface between ros and genesis, and pass an instance of the class to the observation manager

I'd love to understand this more. In this case, how is ROS using the sensor?

@jgillick
Owner

If a ros2 managed env is used, the observations like 'base_pos', 'base_vel', 'joint_pos' for the network are received by the subscriber, and the actions are sent to the robot using the publisher. I think this feature is really helpful. I will be testing the ros-stuff more

That's pretty cool. I'd love to see an example of this, if you have one. Even better would be a video. So, if I'm understanding this correctly, ROS essentially creates the observations, and then the environment passes the processed actions back to ROS to send to the actuators. Is this correct?

One thought I've been having along Sim2Real is, instead of baking the deploy code inside of Genesis Forge (because then we need to support all systems & edge cases), use the action manager directly:

# Load the policy (see eval script)
runner = OnPolicyRunner(env, cfg, log_path, device=gs.device)
runner.load(model)
policy = runner.get_inference_policy(device=gs.device)

while True:
    # Get observations from ROS
    obs = generate_observations(env)

    # Get actions and convert them to actuator positions
    actions = policy(obs)
    actuator_pos = env.action_manager.step(actions)

    # ...send to actuators...

With this approach, the base environment is dedicated to training, but the action manager can still be used to translate the policy values to actuator positions or torques. Would this method work well for your ROS systems?

@jgillick
Owner

What's the purpose of classes Ros2Env and Ros2ManagedEnvironment? From what I can tell, they're just duplicates of GenesisEnv and ManagedEnvironment.

@jgillick
Owner

I took a pass at adding SKRL to the simple example. Does this PR branch converge for you: #10

cd examples/simple/

python ./train_skrl.py
python ./eval_skrl.py

@vybhav-ibr
Author

Got it! Based on your comments, I think the best approach is to create the sensors and pass them to the respective sensor manager, since your justification for using the camera sensor manager (reading the sensor data at a specified frequency) is currently implemented for all the sensors.

At the moment only the IMU sensor and the contact sensor have configurable sensor noise; the ray-based sensors and the rasterizer camera don't have any noise added, apart from the minute inherent measurement noise. Usually, ray-based sensors in a simulator are considered to have no noise, and artificial noise is added for training a robust policy.

If the user created an RL policy which uses any of the sensors, ROS integration would be required for running the RL policy with the data received from the sensors on the real robot. The sensor manager could be modified to 'subscribe' to a ROS topic and pass that received data whenever 'sensor_manager.get_rgb_img()' is called. If we were not to have the sensor manager, we would need to create another class which mimics the 'gs.vis.camera.Camera' object; this class would need to have a 'render' or 'read' method which returns the data. So we would need a ROS-interface class for each of the sensors to ensure deployment compatibility with ROS. I just don't think this is a very intuitive or elegant solution, but it can be done. Keep in mind that this is only required for deployment with ROS.

That's pretty cool. I'd love to see an example of this, if you have one. Even better would be a video. So, if I'm understanding this correctly, ROS essentially creates the observations, and then the environment passes the processed actions back to ROS to send to the actuators. Is this correct

I don't have a real robot; I will try to create an example with the Gazebo simulator (commonly used with ROS). And your understanding of the ROS functionality is spot on.

What's the purpose of classes Ros2Env and Ros2ManagedEnvironment? From what I can tell, they're just duplicates of GenesisEnv and ManagedEnvironment.

They are based on the Genesis env, the added functionality being to remove any of the Genesis-specific stuff when deploying with ROS. Unlike the Genesis env, the scene.build and scene.step methods are not called when using the ros2-env, as there is no scene to build or step. The ros2-env also has some additional class attributes, mainly for informing the action manager, observation manager, etc. to not run any Genesis-specific code. The idea is that the user would train with a ManagedEnv, and then for deployment with ROS, the user would use the Ros2ManagedEnv.

# Load the policy (see eval script)
runner = OnPolicyRunner(env, cfg, log_path, device=gs.device)
runner.load(model)
policy = runner.get_inference_policy(device=gs.device)

while True:
    # Get observations from ROS
    obs = generate_observations(env)

    # Get actions and convert them to actuator positions
    actions = policy(obs)
    actuator_pos = env.action_manager.step(actions)

    # ...send to actuators...

We could do it this way, but currently I am using the actuator manager, as the actions are sent to ROS as a JointState message archetype. The message needs the joint names and the joint control type. The JointState message has a joint names field, a joint positions field, a joint velocities field, and a joint effort field. If the joint positions field has three values, then it is assumed that the first three joints from the joint names are position-controlled, the next three joints are velocity-controlled, and so on.
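The packing convention described above could be decoded like this; `unpack_joint_state` and the dict-based message are hypothetical illustrations, not the real sensor_msgs/JointState type:

```python
def unpack_joint_state(msg):
    """Given n position values, the first n names are position-controlled,
    the next len(velocity) names velocity-controlled, the rest effort-controlled.
    Hypothetical sketch of the convention, using a plain dict as the message."""
    names = msg["name"]
    n_pos = len(msg.get("position", []))
    n_vel = len(msg.get("velocity", []))
    return {
        "position": dict(zip(names[:n_pos], msg.get("position", []))),
        "velocity": dict(zip(names[n_pos:n_pos + n_vel], msg.get("velocity", []))),
        "effort": dict(zip(names[n_pos + n_vel:], msg.get("effort", []))),
    }


msg = {
    "name": ["j1", "j2", "j3", "j4"],
    "position": [0.1, 0.2, 0.3],  # j1..j3 are position-controlled
    "velocity": [1.5],            # j4 is velocity-controlled
    "effort": [],
}
controls = unpack_joint_state(msg)
```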

It is convenient to have the ROS-specific functions in all the managers (these will only be called when the env passed to them is of type Ros2Env); this way the transition between Genesis training and ROS 2 deployment is very much plug and play.
ROS is also cross-platform, so users could use ROS with Linux, macOS, or Windows. If all the imports and ROS-specific functions are guarded, with some exception handling, there shouldn't be any issues caused by ROS.

If you are not on board with ROS-specific code in any of the managers, we could create copies of all the managers with the ROS stuff, so for training the user would use ActionManager and for deployment with ROS the user would use Ros2ActionManager. This would double the amount of code with minimal benefit.

@jgillick
Owner

This all is very interesting and I'm wrapping my head around the variety of ways that we could do this. Here's my current brainstorming, so far:

Initial thoughts...

I think it's interesting to view the training environment as the same script that is deployed with the robot. On one hand, most of the code in training environments is used for training (rewards/terminations) and would not be necessary for the deployed robot. On the other hand, there's something elegant about being able to reuse the code as-is.

Genesis Forge attempts to be lightweight, where possible, and not tied to a particular deployment system. That said, I see the value in having some lightweight methods for integrating ROS with a training environment.

Sensors

With the sensors you've defined, it looks like they're all basically doing the same thing and rate limiting the reads. There are two ways we could simplify this:

One sensor manager

Instead of creating a manager for every sensor type, there could be a single sensor manager that is used for all sensors. All it does is add rate limiting to the read call.

imu_manager = SensorManager(
    sensor=self.imu,
    frequency=0.2,
)

ObservationManager(
    self,
    cfg={
        "imu": {
            "fn": imu_manager.observation
        }
    }
)

The problem with this approach is that not all sensors have the same read functions. For example, you might want to use read_image from a camera. Having a sensor manager for rate limiting, and optionally noise, seems a little cumbersome.

Observation function class

The dedicated manager might be overkill for just limiting observation frequency. In this case, we could define this as an observation function class. The nice thing about this is that it could be extended to things outside of Genesis sensors.

Here's an example of what that might look like

class RateLimitedObservation(MdpFnClass):
    def __init__(self, env: GenesisEnv):
        super().__init__(env)
        self.last_data = None
        self.next_read_time = 0

    def __call__(
        self,
        env: GenesisEnv,
        read: Callable,
        frequency: float,
    ):
        if self.last_data is None or env.scene.cur_t >= self.next_read_time:
            self.last_data = read()
            self.next_read_time = env.scene.cur_t + frequency
        return self.last_data

# ...

ObservationManager(
    self,
    cfg={
        "imu": {
            "fn": RateLimitedObservation,
            "params": {
                "read": lambda: imu.read(),
                "frequency": 0.5,
            },
        },
    },
)

I think I prefer this approach because it's simple and should work for nearly anything we want to observe.

ROS integration

For this, if I'm understanding things correctly, the only things we need from the training environment are the actuator, actions, and observation managers -- everything else should be disabled.

Looking at the actuator manager in your PR, it looks like the required ROS changes have nearly doubled the size of the file. It makes me wonder if there's a more elegant solution. I haven't dug into it too deeply yet, but is this trying to fetch the real actuator state from ROS, as well as matching the actuator name to the action index?

If you need to know which actuator belongs to which action returned from the action manager, you can get that from <ActuatorManager>.dofs_names. This property will return the actuator joint names, in the same order as the actions.

If you need the actual actuator state values for observations, how would you feel about passing a subscribable topic to the observation manager, instead (I'm making some assumptions about how this works in ROS):

ObservationManager(
    self,
    cfg={
        "dof_position": {
            "fn": lambda env: self.actuator_manager.get_dofs_position(),
            "ros_topic": "/joint_position",
        },
    },
)

In theory, this could work for any values, including sensors. If ROS is enabled, the ObservationManager would pull the values from the ROS topic, otherwise, it would use the function. Alternatively, the observation name could potentially be the ROS topic, instead of adding a dedicated config value.
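A minimal sketch of this dispatch idea, assuming the manager falls back to the simulator function when ROS is disabled (`ObservationSource` and `latest_msg` are illustrative names, not the real ObservationManager internals):

```python
class ObservationSource:
    """One observation entry: a simulator function plus an optional ROS topic.
    Hypothetical sketch, not the actual ObservationManager implementation."""

    def __init__(self, fn, ros_topic=None):
        self.fn = fn
        self.ros_topic = ros_topic
        self.latest_msg = None  # would be filled by a ROS subscriber callback

    def get(self, env, ros_enabled=False):
        # If ROS is enabled and a topic is configured, use the subscribed
        # value; otherwise fall back to the simulator function.
        if ros_enabled and self.ros_topic is not None:
            return self.latest_msg
        return self.fn(env)


dof_position = ObservationSource(
    fn=lambda env: [0.0, 0.0],
    ros_topic="/joint_position",
)

sim_obs = dof_position.get(env=None)                    # training: simulator
dof_position.latest_msg = [0.12, -0.30]                 # deploy: subscriber wrote
real_obs = dof_position.get(env=None, ros_enabled=True)
```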

Overhead concerns

I think one of the flaws of the current actuator manager (and, really, all parts of Genesis Forge) is that it pulls the actuator names directly from the simulator model, and requires the genesis simulator to be running. For deployment, having genesis running might be more overhead than you want. Especially, if you're trying to run a trained model, you'll want to dedicate as much of the robot's CPU/GPU to the task at hand. I don't have an immediate solution to this, but it's something I'm going to think about in greater detail.

Generated deployment script

Another crazy thought is that a robot deployment file or config could be generated during training, which can be used entirely without Genesis. The nice thing with this approach is that it prevents having conditional ROS code around all the things connecting to Genesis, creating domain randomization, or doing sensor rate limiting.

This script would automatically be wired to the trained model, and have defined inputs (observations) and outputs (actuator action values). If made generic enough, it could also be used for robots which aren't using ROS (like a robot I'm building right now).

It could be called as a simple function:

env = TrainEnvironment()
while True:
    actuator_values = env.step({
        "imu": ...,
        "dof_position": ...,
    })
    # ... send actions to ROS ...

This function would pass the observations, in the correct order, to the trained model, and then process the returned actions through the action manager in order to return the actuator values.

I'm not entirely sure how this would be created, but it would create some separation between training and deployed environments, while ensuring they still have important parity.

ROS Wrapper

Alternatively, a ROS wrapper environment could be created that would modify the base environment if ROS is enabled. This would:

  • Disable rewards, terminations, and reset managers
  • Update the ObservationManager to replace all observations with functions that return data from ROS, not the simulator.
  • Pass actions back to ROS to be sent to the actuators.

You can learn more about wrappers here.


What are your thoughts?

@vybhav-ibr
Author

vybhav-ibr commented Nov 26, 2025

Some initial clarifications about the full RL-ROS deployment workflow I planned:
1. The user would train just as they do normally.
2. The user would modify the env file to make their main Env class inherit from Ros2ManagedEnv instead of the ManagedEnv used during training.
3. The user would run eval.py, just as they would normally.

I think it would be possible to remove the rewards in the Ros2ManagedEnv, or even the ManagedEnv for that matter; we just need to disable the reward manager based on the Env type (we can base this on the EnvMode literal, which is not currently being used, as far as I am aware). This will help with any overhead.

using one sensor manager

If one sensor manager per sensor type is too bulky, using one manager seems like a good alternative. We would have one manager which accepts a sensor or a dict of sensors. The sensor manager would have a 'step' method, and based on the last_read_timestep, we would 'read' whichever sensor is necessary. The 'read' would be an overloaded internal method which accepts the sensor object and additional args (in the case of the camera). There would also be an option to add some basic noise to the reading if required.
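A rough sketch of that single-manager idea, assuming a dict of read functions and a per-sensor period check (all names here are hypothetical, not proposed API):

```python
import random


class SensorManager:
    """One manager for all sensors: step() only re-reads a sensor once its
    period has elapsed, optionally adding Gaussian noise. Hypothetical sketch."""

    def __init__(self, sensors, frequency_hz, noise_std=0.0, seed=None):
        self.sensors = sensors                  # {"imu": read_fn, ...}
        self.period = 1.0 / frequency_hz
        self.noise_std = noise_std
        self.last_read_time = {name: None for name in sensors}
        self.cache = {}
        self.rng = random.Random(seed)

    def step(self, sim_time):
        for name, read_fn in self.sensors.items():
            last = self.last_read_time[name]
            if last is None or sim_time - last >= self.period:
                value = read_fn()
                if self.noise_std:
                    value = value + self.rng.gauss(0.0, self.noise_std)
                self.cache[name] = value
                self.last_read_time[name] = sim_time
        return self.cache


reads = []
manager = SensorManager({"imu": lambda: reads.append(1) or 42.0}, frequency_hz=10)
manager.step(0.00)  # first read
manager.step(0.05)  # too soon: cached value reused
manager.step(0.10)  # period elapsed: re-read
```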

ObservationManager(
    self,
    cfg={
        "imu": {
            "fn": RateLimitedObservation,
            "params": {
                "read": lambda: imu.read(),
                "frequency": 0.5,
            },
        },
    },
)

I don't really like this approach. For all the sensors except the camera, this would be fine, but for the camera we would need to render all the cameras in a loop (one camera per env, so num_envs iterations), then concat the readings from all the cameras into an np array, convert that to a tensor, and then pass it on. I think this should be abstracted from the user.

Even for the 'one sensor manager' approach, the user would need to pass an array of cameras in the case of the camera sensors, so we would need to accept a sensor object (a list of sensor objects in the case of the camera), or a dict of sensor objects (in the case of the camera sensor, a dict of lists of camera sensor objects). This can be done, but it is a bit of a hassle, which is worth it as it decreases the number of sensor managers and makes the code more maintainable in the future.

ROS Integration

it looks like the required ROS changes have nearly doubled the size of the file.

Some increase in the LOC is to be expected, as we are creating a 'publisher' for sending the data to ROS and a 'subscriber' for getting the data from ROS. The subscriber is required because the [dof_pos, dof_vel, dof_forces] values for the observation are also received from the actuator manager. And around half of the increase in LOC can be attributed to the newly added features. For instance, one of the properties of the actuator manager was dof_names; I needed to add six more properties [pos_dof_names, vel_dof_names, force_dof_names, pos_dof_idx, vel_dof_idx, force_dof_idx]. These were not required previously because only position control was supported. A similar situation for control_dofs_position: I have added control_dofs_velocity and control_dofs_force. Even if we remove the ROS-specific stuff, the LOC will be larger because the actuator manager was incomplete before. And I don't think an increase in LOC alone makes code inelegant, but I could try to decrease the LOC by a small amount.

If you need the actual actuator state values for observations, how would you feel about passing a subscribable topic to the observation manager, instead (I'm making some assumptions about how this works in ROS)

We can pass the topic as a parameter, but the observation manager would need to have a 'subscriber' to receive the data. I will try to explain with a simple analogy. Assume that a ROS message (in our case a message of archetype JointState) is your suitcase and you are at the airport baggage claim. A ROS topic would be the conveyor belt, you would be the 'subscriber', and the staff member loading the bags onto the conveyor would be the 'publisher'. Hope this clears up any doubts.

Addressing overhead concerns

I share some of the concerns you stated. My proposed solution is that during ROS-based policy inference (when the user is using the Ros2ManagedEnv), the scene.build and scene.step functions would not be called, so the simulator is not actually running (and for using properties like robot.links, robot.joints, etc., the scene doesn't have to be running; the scene doesn't even have to be built, for that matter. This has very minimal overhead, if any). I am not opposed to the idea of generating a config file either. IsaacLab does something similar; perhaps you could refer to their implementation for inspiration. A generated config file would be passed to the Ros2Env and GenericDeploymentEnv. We could store all the Genesis-specific data such as dof_names, joint_names, joint actuation_types, etc. as a dict and dump it to YAML; shouldn't be too hard to implement.
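A minimal sketch of the generated-config idea; JSON is used here to keep the example dependency-free, but YAML would work the same way, and all the keys are illustrative:

```python
import json

# Collected once during training, when the Genesis scene is available.
# The keys here are illustrative placeholders, not a defined schema.
robot_config = {
    "dof_names": ["hip", "knee", "ankle"],
    "control_types": {"hip": "position", "knee": "position", "ankle": "velocity"},
    "default_positions": [0.0, 0.8, -1.5],
}

with open("robot_deploy_config.json", "w") as f:
    json.dump(robot_config, f, indent=2)

# At deploy time, a Ros2Env-style class would load this file instead of
# querying the (absent) Genesis scene.
with open("robot_deploy_config.json") as f:
    loaded = json.load(f)
```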

Personally, I don't see anybody running torch RL policies on the edge without a workstation, as most robots don't have a GPU. The common practice is to use a CPU-optimized format such as ONNX, but this is a good thing to optimize.

env = TrainEnvironment()
while True:
    actuator_values = env.step({
        "imu": ...,
        "dof_position": ...,
    })

This is interesting; I will think about this more. If we can make it work, the deployment would only require Python and torch, and we could deploy the policy on a Raspberry Pi, for instance.

ROS Wrapper

This is similar to my plan for the Ros2ManagedEnv. I will modify it to not calculate rewards, terminations, etc., and I will make this optional.

Update the ObservationManager to replace all observations with functions that return data from ROS, not the simulator.

Currently, managers such as the action manager and actuator manager choose whether to send the data to ROS or to Genesis based on the Env type. I will try to see how a RosWrapper could be used instead of a Ros2ManagedEnv, and I will report back with my findings.

After I share my findings (it may take up to 2 days), I think we should do one final checklist and then I can start implementing.

@jgillick
Owner

jgillick commented Nov 26, 2025

The user would modify the env file to make their main Env class inherit from the Ros2ManagedEnv

If the developer is transitioning Sim2Real frequently, this approach sounds kinda heavy, or at least a nuisance, to have to edit the file between training and deploy and back again. If creating a wrapper doesn't work, another approach is:

  • Extract the environment observation, actuator, and action managers into their own files
  • Import them into the training environment
  • Create a ROS specific environment file that imports them.
  • Now you can edit the manager configs from a central place, to be used in both environments.

This way, you don't have to change the base class each time you want to switch from training to deploy.

I think it would be possible to remove the rewards in the Ros2ManagedEnv or even the ManagedEnv for that manager, just need to disable the reward manager based on the Env type

This was actually the original intention of the env type and could be useful here. 😃

I dont really like this approach, for all the senors except the camera this would be fine, but for the camera we would need to render all the cameras in a loop, which would be equal to the num_envs, and then concat the readings from all the cameras into an np array, convert to a tensor and then pass that. I think this should be abstracted from the user.

If it works for all sensors but the camera, then I'd start there. If the class is purely used for observations, a manager is not the correct tool, but an observation function class is.

For cameras, I'm not sure the loop to create a camera for each environment, and then manually batch the observations together, is the correct or most efficient method. Have you tried this in an environment that has reached convergence using cameras? Have you tried [batched rendering with gs-madrona](https://genesis-world.readthedocs.io/en/latest/user_guide/getting_started/visualization.html#batch-rendering-with-gs-madrona)? For this particular case, I'd look to Genesis for batching.

Some increase in the LOC is to be expected

I agree, but at this point, we're making the actuator manager highly dependent on ROS. It also means that any changes to this file are likely to break either the training or ROS implementation. One way around this is to extend the class to create a ROS specific one that overrides some of the functions, instead of peppering them with ROS conditions. But, I'm also not sure this would be necessary. If, essentially, the ROS changes in this file are for observations, then you're probably making more work for yourself by trying to make the actuator manager ROS aware. Instead, you just need to get the actuator states to the observation manager.

Regarding the non-ROS features: the way the *_dofs_idx functions are defined, they will perform the lookup every time the property is accessed, which is not very efficient. A better way would be to perform one loop in the build phase and aggregate the dofs by control type there, but we can address that after we figure out the overall direction of things. It would also be good to have a type hint for the supported control type strings.
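The build-phase aggregation could be sketched like this, with a `Literal` type hint for the control types (the class and field names here are hypothetical, not the PR's actual actuator manager):

```python
from typing import Literal

# Type hint restricting control types to the supported strings.
ControlType = Literal["position", "velocity", "force"]


class ActuatorIndex:
    """One pass over the actuator configs at build time, grouping dof names
    and indices per control type, instead of re-scanning on every property
    access. Hypothetical sketch."""

    def __init__(self, dofs):  # dofs: list of (name, control_type) pairs
        self.names = {"position": [], "velocity": [], "force": []}
        self.idxs = {"position": [], "velocity": [], "force": []}
        for idx, (name, control_type) in enumerate(dofs):
            self.names[control_type].append(name)
            self.idxs[control_type].append(idx)

    def dofs_names(self, control_type: ControlType):
        return self.names[control_type]  # cached list, no per-access lookup


index = ActuatorIndex([("hip", "position"), ("knee", "position"), ("wheel", "velocity")])
```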

It would be really helpful to me, if I could see the environment you built this for. That way, I could see how you've defined the actuator manager. Is your environment in github?

IsaacLab does something similar; perhaps you could refer to their implementation for inspiration

Absolutely. A lot of the inspiration for Genesis Forge came from IsaacLab. I've spent a lot of time in their source code, and I like how their code isn't tied to any deployment platform, like ROS. However, I know there is a ROS bridge that links the environment to the ROS system. The beautiful thing about this approach is that the deployment system is not hardcoded into the internals.

@jgillick (Owner) commented Dec 1, 2025

Good news: it looks like the Genesis team has updated their camera sensor for batching. Here's their example with multiple camera configurations, and the original PR. This update should remove the need to manually mount a camera for each environment and batch the camera rendering, ultimately improving the speed of your environment significantly.

@vybhav-ibr (Author)

Hey, sorry for the delay; I was busy with other things. I have made the required modifications, but it is still a draft. I would like to hear your thoughts about the approach and the progress.

Here is everything that has changed:

1. Removed the sensor managers and used the MDP function as you suggested; now that the camera is a supported sensor, all the sensors can be read without any issues.
2. Made most of the code Genesis-agnostic; for deployment, we can use the whole genesis-forge repo without installing Genesis on the robot.
3. Investigated whether to use a wrapper for ROS. It was possible, but I decided against it, as it wasn't really necessary.
4. With the current approach, the user would not need to modify the env file between the four modes: training (with Genesis), evaluation (with Genesis), playing (with Genesis, without rewards, terminations, resets), and deploy (without Genesis, without rewards, terminations, resets).
5. Refactored the actuator manager. Changes include an ActuatorConfig class, making it easier to load from a config (not yet implemented); caching the dofs_names and dofs_idxs; removing any ROS-specific code; and supporting getting the values from both the genesis_entity and the config files.
6. Added an example for the ROS-based deployment. I tested that the values are being sent and received with an accompanying dummy ros_node.
7. Overrode the config function in the env class in the eval file.

Steps to test:

Use the 'pr' branch of the repo, go to the new_simple example, complete the training, evaluate the policy (use eval.py with the env_mode arg set to 'eval'), then play the policy (use eval.py with the env_mode arg set to 'play'). The eval_fixed provides a ROS-based deployment; look at the function overriding for handling the actions and the config overriding function for modifying the managers.

What is to be done:

1. Fix doc-strings, formatting, and cleanup.
2. Config loading, and whether to use a custom data type as arguments for the managers. I would like to do this for all managers (what is your opinion on this?).
3. Modify all the examples accordingly.
4. Add an example for deploying without ROS or Genesis, in pure Python.

After we finalize the uncertainties, I will complete and open a PR.

@jgillick (Owner)

Thanks for the update. I don't see any new commits in the PR or a pr branch in the repo you linked. Have you pushed your changes up to GitHub?

I think, at this stage, it would be best to break this PR up into several smaller PRs. This PR contains so much code, with significant architectural changes, that it'll be hard to discuss and merge it all at once. Smaller PRs will allow us to discuss each feature and merge the smaller ones quickly, while giving us time to refine the others more effectively.

Off the top of my head, here's how I would break it up (you might want to break it up further, since I haven't had the chance to fully go through all 6,000 lines):

  • Sensor MDP function class, with functional example
  • New reward functions
  • Action/Actuator managers (this might need to be broken up into individual chunks)
  • ROS integration

Make sure each PR satisfies the following checklist:

  • Run the code through the Black Python code formatter.
  • The code should be tested and reach convergence. Experimental code that doesn't reach convergence will not be merged. I'm happy to help where I can.
  • Examples are helpful. Here's a SKRL example you can look at: Add SKRL example #10
  • Doc strings should be up-to-date and relevant for the code/functions
  • No TODO comments
  • Meticulously review your own PR after submitting it. It's something I do with my own PRs, and I always find bugs and cleanup opportunities in my own code when I do.

Over the next month, I'll be spending time doing some Sim2Real work with one of my environments. This might help inform some of our conversations about ROS2.

Thanks for all your work and interest!

@vybhav-ibr (Author)

I agree this PR is too long; I will split it into 4-5 individual ones. Before that, I would like to make sure I have your input on some of the features. I will open the PRs after following all the guidelines, once the code is at a stage where minimal further changes would be required.

I seem to have linked the wrong repo; here is the correct one: repo. Use the 'pr' branch.

Thoughts about these?

  1. Make every manager accept a custom type as the arg. This would make it easier to load from a config in the future; one disadvantage is no direct parameter hinting, although the docstring still has the definition:
class ActuatorConfig(TypedDict):
    env: GenesisEnv
    joint_names: Union[List[str], str]
    default_pos: Union[float, NoisyValue, Dict[str, float]]
    ... so on

class ActuatorManager(BaseManager):
    """
    Example::

        class MyEnv(GenesisEnv):
            def __init__(self, *args, **kwargs):
                super().__init__(*args, **kwargs)

                actuator_config = {
                    "env": self,
                    "joint_names": ".*",
                    "kv": 0.5,
                    "dofs_limit": {"joint1": (-1.57, 1.57), "joint2": (-1.57, 1.57)},
                    "max_force": 8.0,
                }
                self.actuator_manager = ActuatorManager(actuator_config)
    """
    def __init__(
        self,
        actuator_config: ActuatorConfig
    ) -> None:
  2. Use of a single Env/ManagedEnv class, and the use of EnvMode to decide whether to calculate rewards, resets, and terminations (refer to the GenesisEnv and GenesisManagedEnv files).
  3. Config method overriding in the eval.py file to make the RL policy receive observations from the ros_node (or a custom communication interface). Refer to eval_fixed.py in the new_simple example in the aforementioned repo to see the implementation; the tests using the ROS interface were successful!
  4. Work towards making genesis-forge installable directly on the robot, so that the end user would be able to use all the managers even without Genesis installed. We could provide two install types: `pip install "genesis-forge[genesis]"` and `pip install "genesis-forge[edge]"`. This is helpful for sim2real testing, to see whether different observation noises/scales/offsets have an effect on policy performance. I have implemented this as an alternative to your comment about 'generating a python script to deploy on the edge'.
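If the split install described above were pursued, one way to express it is with optional-dependency extras in pyproject.toml. This is only a sketch: the extra names and the `genesis-world` package name are assumptions, and the actual dependency lists would need to be filled in.

```toml
[project]
name = "genesis-forge"

[project.optional-dependencies]
# Full simulation stack for training on a workstation.
genesis = ["genesis-world"]
# Lightweight on-robot install: managers only, no simulator.
edge = ["numpy"]
```

With this layout, `pip install "genesis-forge[edge]"` on the robot would pull only the deployment dependencies, while `pip install "genesis-forge[genesis]"` installs the full training stack.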

The LOC for the current version has decreased significantly; I think around 2.5k lines (excluding the changes to the examples and docstrings). Spread across multiple PRs, it should be relatively easy to review.

@jgillick (Owner) commented Dec 11, 2025

make every manager accept a custom type as the arg, this would make it easier to load from a config in the future, one disadvantage is no direct parameter hinting, although the docstring still has the definition

This is a clever idea, but let's hold off on the config object method for the time being. This concept optimizes for the "on-robot" experience while potentially degrading the training development experience. The first experience a developer has with Genesis Forge training environments should be straightforward and intuitive to implement, with full type hinting.

I need to spend a little more time exploring this side of the development pipeline before I can offer much guidance on that.

use of a single Env,ManagedEnv class, and the utilization of EnvMode to decide whether to calculate rewards, resets, terminations(refer to the GenesisEnv and GenesisManagedEnv files)

This can be handled by the developer of the environment, which means Genesis Forge doesn't need to be opinionated about this.

For example:

RewardManager(
    enabled=self.mode == "train",
    ...
)
config method overriding, in the eval.py file to make the RL policy receive observation from the ros_node (or a custom communication interface)

Isn't that the same as #1? I feel like we can do this more simply. I'm going to prototype a few things, and then I'll link you to an idea.

work towards making genesis-forge installable directly on the robot.

This is an interesting consideration. I don't think it's a top priority, but I'd be interested to see what this looks like.

@jgillick (Owner) commented Dec 11, 2025

I've pulled together a couple prototype PRs that might help influence your work:

POC Sim2Real Concept

PR: #11

The eval_deploy.py script doesn't actually do anything when run, but demonstrates how the environment could be deployed, with sensor values injected into the observation manager and actuator values extracted from the action manager.

Adapted to your example, it would look something like this:

env = Go2SimpleEnv(num_envs=1, headless=False, env_mode="deploy")

obs = env.observation_manager.observation_cfg
obs["angle_velocity"].fn = lambda env: ros_interface.get_angular_velocity()
obs["linear_velocity"].fn = lambda env: ros_interface.get_linear_velocity()
obs["projected_gravity"].fn = lambda env: ros_interface.get_projected_gravity()
obs["dof_position"].fn = lambda env: ros_interface.get_dofs_position()
obs["dof_velocity"].fn = lambda env: ros_interface.get_dofs_velocity()

# ...

obs, _ = env.reset()
with torch.no_grad():
    while True:
        rclpy.spin_once(ros_interface, timeout_sec=0)
        actions = policy(obs)
        obs, _rews, _dones, _infos = env.step(actions)
        
        ros_interface._pos_actions = env.action_manager.get_actions()

Multiple action managers

PR: #12

This is another take on having multiple action managers. The idea here is that the actuator manager should not be aware of the control method of the actuators (position, force, velocity). This removes the need for it to be responsible for categorizing the actuators into explicit groups or to need a hybrid action manager to control them. In this case, you can define each action manager separately, and use the actuator_filter argument to select which actuators will be controlled by it. (I might rename actuator_filter)

For example:

self.actuator_manager = ActuatorManager(
    self,
    joint_names=[
        ".*_wheel",
        ".*_arm_joint"
    ],
    kp=20,
    kv=0.5,
)

self.wheel_actions = VelocityActionManager(
    self,
    actuator_filter=[ ".*_wheel" ],
    actuator_manager=self.actuator_manager,
)

self.arm_actions = PositionActionManager(
    self,
    actuator_filter=[ ".*_arm_joint" ],
    actuator_manager=self.actuator_manager,
    scale=0.25,
    clip=(-100.0, 100.0),
    use_default_offset=True,
)

If you adapted this approach to the Sim2Real deploy example above, you could easily get the processed actuator values for deploy, like this (assuming you have 3 action managers):

obs, _ = env.reset()
with torch.no_grad():
    while True:
        rclpy.spin_once(ros_interface, timeout_sec=0)
        actions = policy(obs)
        obs, _rews, _dones, _infos = env.step(actions)

        ros_interface._pos_actions = env.pos_action_manager.get_actions()
        ros_interface._vel_actions = env.vel_action_manager.get_actions()
        ros_interface._force_actions = env.force_action_manager.get_actions()

@jgillick closed this Jan 5, 2026