Conversation
… one type of control
…cific code to a conditional block
…rol type specific dof indices and dof names, add support for drone entities, add support for direct deployment to ROS using the ros2-topic-based-control
…ng, add a rudimentary IMU-sensor based reward for rewarding stability
…added sensor manager
…ption, add a function to convert from an observation dict to a gymnasium space, modify the action space to have multiple gymnasium box spaces, one for each observation type
|
Thanks for submitting this PR, I really appreciate the contribution. For future reference, it's best to break up big PRs into much smaller iterative ones. A PR with 6k+ lines is massive and not quick or easy to review. Integrating with ROS is really interesting; I've not used ROS. I'd love to learn more about how your Genesis Forge environments integrate with that system. I took a quick look through the PR, and here's some high-level feedback for us to get started on:
**Sensors**

I'm impressed by all the new sensors you've added. However, I'm not sure that adding them all as full managers is necessary. In general, managers are intended to encapsulate complex systems. If they're just wrappers around Genesis sensors, and used in observations/rewards/terminations, it might be best to just initialize the sensors in your environment and pass the Genesis sensor instance to the observation/reward/termination functions. You can see an example in this environment, which uses a lidar height sensor.

That said, the camera sensor manager you've added does seem to do some step-specific render handling, so this might be a good use for a manager. To simplify the manager and prevent us from duplicating all the sensor parameters in the initializer, it would be best to initialize the cameras in your environment and then pass them into the manager.

This is a good start. Thanks again for the contribution. |
|
Thanks for the feedback, and sorry for the long PR. Since a lot of the features added are variations of already existing features, I figured it would be alright, but I will keep it in mind. I will format the code and edit/add the necessary docstrings. Regarding the ROS integration, I will try to guard any imports or ROS-specific code; the ROS imports in the examples will be removed, and I will also add a separate example for deployment with ROS. Regarding the sensors, I initially thought of basing all the sensors on the lidar example you already have, but I decided on using separate managers for the following reasons:
I will leave the 'skrl' and 'skrl_with_sensors' examples to you. I couldn't get them to converge; I think the issue is in the SKRL env wrapper. Is "prevent us from duplicating all the sensor parameters in the initializer" a huge concern? They have default values, so the user would only need to provide the args if they need to. Realistically, I would only expect the user to provide the entity_name, link_name, frequency, and in some cases resolution. If this is not intuitive, I could add an example(s)/README.md which uses the sensors. Hope this clears up any issues that need addressing; let me know if any further changes are required |
For me, it is for a couple of reasons.
You can see an example of this with the EntityManager. This manager could have been created to initialize the entity itself, but instead, the entity is created with the scene and then passed in. It also avoids us having to support all the ways and edge cases for initializing a rigid entity. For me, there needs to be a very strong reason to create wrapper abstractions around existing functionality.
Genesis has some domain randomization built into some of its sensors already. Not all sensors, but I wouldn't be surprised if they plan to add more. If you're interested in domain randomization for observations, you can already add noise to observable values.
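To illustrate adding noise to observable values, here is a minimal, hypothetical sketch (the helper below is not Genesis Forge API; it just wraps any read callable with Gaussian noise, using only the standard library):

```python
import random

def noisy_observation(read_fn, std=0.01, rng=None):
    """Wrap a raw observation read with additive Gaussian noise per element.

    read_fn: callable returning a sequence of observation values.
    std: standard deviation of the injected noise (assumed units of the value).
    """
    rng = rng or random.Random(0)  # seeded for reproducibility in this sketch

    def observation():
        return [v + rng.gauss(0.0, std) for v in read_fn()]

    return observation

# Example: a fake DOF-position read for three joints
obs_fn = noisy_observation(lambda: [0.0, 0.5, -0.5], std=0.01)
sample = obs_fn()
```

The same wrapper shape would apply whether the underlying read comes from a Genesis sensor or anything else.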
This is why I said that the camera manager was one that probably makes sense. Because it's doing more than just wrapping the Genesis sensor and is handling how frequently it updates. That said, this might also make sense to be included in Genesis directly.
I'd love to understand this more. In this case, how is ROS using the sensor? |
That's pretty cool. I'd love to see an example of this, if you have one. Even better would be a video. So, if I'm understanding this correctly, ROS essentially creates the observations, and then the environment passes the processed actions back to ROS to send to the actuators. Is this correct?

One thought I've been having along Sim2Real is, instead of baking the deploy code inside of Genesis Forge (because then we need to support all systems & edge cases), use the action manager directly:

```python
# Load the policy (see eval script)
runner = OnPolicyRunner(env, cfg, log_path, device=gs.device)
runner.load(model)
policy = runner.get_inference_policy(device=gs.device)

while True:
    # Get observations from ROS
    obs = generate_observations(env)

    # Get actions and convert them to actuator positions
    actions = policy(obs)
    actuator_pos = env.action_manager.step(actions)

    # ...send to actuators...
```

With this approach, the base environment is dedicated to training, but the action manager can still be used to translate the policy values to actuator positions or torques. Would this method work well for your ROS systems? |
|
What's the purpose of classes |
|
I took a pass at adding SKRL to the simple example. Does this branch PR converge for you: #10

```shell
cd examples/simple/
python ./train_skrl.py
python ./eval_skrl.py
```
|
|
Got it! Based on your comments, I think the best approach is to create the sensors and pass them to the respective sensor manager, as your justification for using the camera sensor manager (reading the sensor data at a specified frequency) is currently implemented for all the sensors.

At the moment only the IMU sensor and the contact sensor have configurable sensor noise; the ray-based sensors and the rasteriser camera don't have any noise added, apart from the minute inherent measurement noise. Usually, ray-based sensors in a simulator are considered to have no noise, and artificial noise is added for training a robust policy.

If the user created an RL policy which uses any of the sensors, ROS integration would be required for running the RL policy with the data received from the sensors on the real robot. The sensor manager could be modified to 'subscribe' to a ROS topic and pass that received data whenever 'sensor_manager.get_rgb_img()' is called. If we were to not have the sensor manager, we would need to create another class which mimics the 'gs.vis.camera.Camera' object; this class would need to have a 'render' or 'read' method which returns the data. So we would need a ROS-interface class for each of the sensors to ensure deployment compatibility with ROS. I just don't think this is a very intuitive or elegant solution, but it can be done. Keep in mind that this is only required for deployment with ROS.
I don't have a real robot; I will try to create an example with the Gazebo simulator (commonly used with ROS). And your understanding of the ROS functionality is spot on.

> What's the purpose of classes Ros2Env and Ros2ManagedEnvironment? From what I can tell, they're just duplicates of GenesisEnv and ManagedEnvironment.

They are based on the Genesis env, the added functionality being to remove any of the Genesis-specific stuff when deploying with ROS. Unlike GenesisEnv, the scene.build and scene.step methods are not called when using the Ros2Env, as there is no scene to build or step. The Ros2Env also has some additional class attributes, mainly for informing the action manager, observation manager, etc. not to run any Genesis-specific code. The idea is that the user would train with a ManagedEnv and then, for deployment with ROS, use the Ros2ManagedEnv.

We could do it this way, but currently I am using the actuator manager, as the actions are sent to ROS as a JointState message archetype. The message needs the joint names and the joint control type. The JointState message has a joint names field, joint positions field, joint velocities field, and joint effort field. If the joint positions field has three values, then it is assumed that the first three joints from the joint names are position controlled, the next three joints are velocity controlled, and so on. It is convenient to have the ROS-specific functions in all the managers (these will only be called when the env passed to them is of type Ros2Env); this way, the transition between Genesis training and ROS 2 deployment is very much plug and play. If you are not on board with ROS-specific code in any of the managers, we could create copies of all the managers with the ROS stuff, so for training the user would use ActionManager and for deployment with ROS the user would use Ros2ActionManager, but this will double the amount of code with minimal benefit |
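The packing convention described above (the first len(position) names are position controlled, the next len(velocity) names are velocity controlled, the rest are effort controlled) can be sketched in plain Python. This is an illustrative stand-in, independent of the actual sensor_msgs/JointState type and of the PR's implementation:

```python
def unpack_joint_state(name, position, velocity, effort):
    """Group joint names by control type, following the ordering convention
    described in the conversation: positions first, then velocities, then efforts."""
    n_pos, n_vel = len(position), len(velocity)
    return {
        "position": dict(zip(name[:n_pos], position)),
        "velocity": dict(zip(name[n_pos:n_pos + n_vel], velocity)),
        "effort": dict(zip(name[n_pos + n_vel:], effort)),
    }

msg = unpack_joint_state(
    name=["j1", "j2", "j3", "j4"],
    position=[0.1, 0.2],   # j1, j2 are position controlled
    velocity=[1.5],        # j3 is velocity controlled
    effort=[0.8],          # j4 is effort controlled
)
```

The real message fields would come from a subscribed JointState topic; only the grouping logic is shown here.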
|
This all is very interesting and I'm wrapping my head around the variety of ways that we could do this. Here's my current brainstorming, so far:

**Initial thoughts**

I think it's interesting to view the training environment as the same script that is deployed with the robot. On one hand, most of the code in training environments is used for training (rewards/terminations) and would not be necessary for the deployed robot. On the other hand, there's something elegant about being able to reuse the code as-is. Genesis Forge attempts to be lightweight, where possible, and not tied to a particular deployment system. That said, I see the value in having some lightweight methods for integrating ROS with a training environment.

**Sensors**

With the sensors you've defined, it looks like they're all basically doing the same thing and rate limiting the reads. There are two ways we could simplify this:

**One sensor manager**

Instead of creating a manager for every sensor type, there could be a single sensor manager that is used for all sensors. All it does is add rate limiting to the reads:

```python
imu_manager = SensorManager(
    sensor=self.imu,
    frequency=0.2,
)
ObservationManager(
    self,
    cfg={
        "imu": {
            "fn": imu_manager.observation
        }
    }
)
```

The problem with this approach is that not all sensors have the same read functions. For example, you might want to use …

**Observation function class**

The dedicated manager might be big overkill for just limiting observation frequency. In this case, we could define this as an observation function class. The nice thing about this is that it could be extended to things outside of Genesis sensors. Here's an example of what that might look like:

```python
class RateLimitedObservation(MdpFnClass):
    def __init__(
        self,
        env: GenesisEnv,
        read: Callable,
        frequency: float,
    ):
        super().__init__(env)
        self.read = read
        self.frequency = frequency
        self.last_data = None
        self.next_read_time = 0

    def __call__(self, env: GenesisEnv):
        if self.last_data is None or self.env.scene.cur_t >= self.next_read_time:
            self.last_data = self.read()
            self.next_read_time = self.env.scene.cur_t + self.frequency
        return self.last_data

# ...
ObservationManager(
    self,
    cfg={
        "imu": {
            "fn": RateLimitedObservation,
            "params": {
                "read": lambda: imu.read(),
                "frequency": 0.5,
            },
        },
    },
)
```

I think I prefer this approach because it's simple and should work for nearly anything we want to observe.

**ROS integration**

For this, if I'm understanding things correctly, the only things we need from the training environment are the actuator, actions, and observation managers -- everything else should be disabled. Looking at the actuator manager in your PR, it looks like the required ROS changes have nearly doubled the size of the file. It makes me wonder if there's a more elegant solution. I haven't dug into it too deeply yet, but is this trying to fetch the real actuator state from ROS, as well as matching the actuator name to the action index? If you need to know which actuator belongs to which action returned from the action manager, you can get that from …

If you need the actual actuator state values for observations, how would you feel about passing a subscribable topic to the observation manager, instead (I'm making some assumptions about how this works in ROS):

```python
ObservationManager(
    self,
    cfg={
        "dof_position": {
            "fn": lambda env: self.actuator_manager.get_dofs_position(),
            "ros_topic": "/joint_position",
        },
    },
)
```

In theory, this could work for any values, including sensors. If ROS is enabled, the ObservationManager would pull the values from the ROS topic; otherwise, it would use the function. Alternatively, the observation name could potentially be the ROS topic, instead of adding a dedicated config value.

**Overhead concerns**

I think one of the flaws of the current actuator manager (and, really, all parts of Genesis Forge) is that it pulls the actuator names directly from the simulator model, and requires the Genesis simulator to be running. For deployment, having Genesis running might be more overhead than you want. Especially if you're trying to run a trained model, you'll want to dedicate as much of the robot's CPU/GPU to the task at hand. I don't have an immediate solution to this, but it's something I'm going to think about in greater detail.

**Generated deployment script**

Another crazy thought is that a robot deployment file or config is generated during training, which can be used entirely without Genesis. The nice thing with this approach is that it prevents having conditional ROS code around all the things connecting to Genesis or creating domain randomization or sensor rate limiting. This script would automatically be wired to the trained model, and have defined inputs (observations) and outputs (actuator action values). If made generic enough, it could also be used for robots which aren't using ROS (like a robot I'm building right now). It could be called as a simple function:

```python
env = TrainEnvironment()
while True:
    actuator_values = env.step({
        "imu": ...,
        "dof_position": ...,
    })
    # ... send actions to ROS ...
```

This function would pass the observations, in the correct order, to the trained model, and then process the returned actions through the action manager in order to return the actuator values. I'm not entirely sure how this would be created, but it would create some separation between training and deployed environments, while ensuring they still have important parity.

**ROS Wrapper**

Alternatively, a ROS wrapper environment could be created that would modify the base environment if ROS is enabled. This would:

You can learn more about wrappers, here. What are your thoughts? |
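The wrapper idea could be sketched roughly as follows. This is purely illustrative; the class and method names are assumptions, not existing Genesis Forge API, and the demo objects at the bottom are stand-ins:

```python
class Ros2Wrapper:
    """Hypothetical wrapper that swaps the observation plumbing of a trained
    environment for ROS-backed callables, leaving the training code untouched."""

    def __init__(self, env, observation_sources):
        # observation_sources: observation name -> callable returning the live value
        self.env = env
        self.observation_sources = observation_sources

    def observe(self):
        # Pull every observation from ROS instead of the simulator
        return {name: fn() for name, fn in self.observation_sources.items()}

    def step(self, actions):
        # Translate policy output to actuator targets via the action manager,
        # without calling scene.step() at all
        return self.env.action_manager.step(actions)

# Demo with stand-in objects
class _FakeActionManager:
    def step(self, actions):
        return [a * 0.5 for a in actions]  # pretend scaling to actuator targets

class _FakeEnv:
    action_manager = _FakeActionManager()

wrapped = Ros2Wrapper(_FakeEnv(), {"imu": lambda: [0.0, 0.0, 9.8]})
obs = wrapped.observe()
targets = wrapped.step([1.0, -1.0])
```

The point of the shape is that the base environment never needs ROS conditionals; the wrapper owns the substitution.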
|
Some initial clarifications about the full RL-ROS deployment workflow I planned. I think it would be possible to remove the rewards in the Ros2ManagedEnv, or even the ManagedEnv for that matter; we just need to disable the reward manager based on the env type (we can base this on the EnvMode literal, which is not currently being used, as far as I am aware). This will help with any overhead.

**Using one sensor manager**

If one sensor manager per sensor type is too bulky, using one manager seems like a good alternative: we would have one manager which accepts a sensor or a dict of sensors. The sensor manager would have a 'step' method, and based on the last_read_timestep, we will 'read' whichever sensor is necessary. The 'read' would be an overloaded internal method which accepts the sensor object and additional args (in the case of the camera). There would also be an option to add some basic noise to the reading if required.

I don't really like this approach. For all the sensors except the camera this would be fine, but for the camera we would need to render all the cameras in a loop (which would be equal to num_envs), concat the readings from all the cameras into an np array, convert to a tensor, and then pass that. I think this should be abstracted from the user. Even for the 'one sensor manager' approach, the user would need to pass an array of cameras in the case of the camera sensors, so we would need to accept a sensor object (a list of sensor objects in the case of the camera sensor), or a dict of sensor objects (in the case of the camera sensor this would be a dict of lists of camera sensor objects). This can be done, but it is a bit of a hassle, which is worth it as it decreases the number of sensor managers and makes the code more maintainable in the future.

**ROS integration**

Some increase in the LOC is to be expected, as we are creating a 'publisher' for sending the data to ROS, and a 'subscriber' for getting the data from ROS. The subscriber is required because the [dof_pos, dof_vel, dof_forces] values for the observation are also received from the actuator manager. And around half of the increase in LOC can be attributed to the newly added features. For instance, one of the properties of the actuator manager was dof_names; I needed to add more properties [pos_dof_names, vel_dof_names, force_dof_names, pos_dof_idx, vel_dof_idx, force_dof_idx]. These were not required previously because only position control was supported. A similar situation for control_dofs_position: I have added control_dofs_velocity and control_dofs_force. Even if we remove the ROS-specific stuff, the LOC will be larger because the actuator manager was incomplete before. And I don't think an increase in LOC alone makes code inelegant, but I could try to decrease the LOC by a small amount.

We can pass the topic as a parameter, but the observation manager would need to have a 'subscriber' to receive the data. I will try to explain with a simple analogy. Assume that a ROS message (in our case a message of archetype JointState) is your suitcase and you are at the airport baggage claim. A ROS topic would be the conveyor belt, you would be the 'subscriber', and the staff member loading the bags onto the conveyor would be the 'publisher'. Hope this clears up any doubts.

**Addressing overhead concerns**

I share some of the concerns you stated. My proposed solution is that during ROS-based policy inference (when the user is using the Ros2ManagedEnv), the scene.build and scene.step functions would not be called, so the simulator is not actually running (and for using properties like robot.links, robot.joints, etc., the scene doesn't have to be running; the scene doesn't even have to be built, for that matter. This has very minimal overhead, if any). I am not opposed to the idea of generating a config file either; IsaacLab does something similar, perhaps you could refer to their implementation for inspiration. A generated config file would be passed to the Ros2Env and GenericDeploymentEnv. We could store all the Genesis-specific data such as dof_names, joint_names, joint actuation_types, etc. as a dict and dump it to YAML; shouldn't be too hard to implement.

Personally, I don't see anybody running torch RL policies on the edge without a workstation, as most robots don't have a GPU. The common practice is to use a CPU-optimized model format such as ONNX, but this is a good thing to optimise. This is interesting, I will think about it more; if we can make it work, the deployment would only require Python and torch, and we could deploy the policy on a Raspberry Pi, for instance.

**ROS Wrapper**

This is similar to my plan for the Ros2ManagedEnv. I will modify it to not calculate rewards, terminations, etc. I will make this optional.

Currently, managers such as the action manager and actuator manager choose whether to send the data to ROS or to Genesis based on the env type. I will try to see how a RosWrapper could be used instead of a Ros2ManagedEnv, and I will report back with my findings. After I share my findings (may take up to 2 days), I think we should do one final checklist and then I could start implementing |
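The generated-config idea (dumping dof_names, joint_names, and actuation types so the deployment env never loads the simulator) could look roughly like this. This is a sketch with hypothetical keys and a hypothetical helper; it uses stdlib json here, while the actual proposal suggests YAML:

```python
import json

def export_deploy_config(path, dof_names, joint_names, actuation_types):
    """Dump the simulator-derived metadata needed at deploy time, so the
    deployment environment never has to build or step a Genesis scene."""
    config = {
        "dof_names": dof_names,
        "joint_names": joint_names,
        "actuation_types": actuation_types,
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config

cfg = export_deploy_config(
    "deploy_config.json",
    dof_names=["hip", "knee"],
    joint_names=["hip_joint", "knee_joint"],
    actuation_types={"hip_joint": "position", "knee_joint": "position"},
)
```

A Ros2Env or generic deployment env could then load this file instead of querying robot.joints from a live scene.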
If the developer is transitioning Sim2Real frequently, this approach sounds kinda heavy, or at least a nuisance, to have to edit the file between training and deploy and back again. If creating a wrapper doesn't work, another approach is:
This way, you don't have to change the base class each time you want to switch from training to deploy.
This was actually the original intention of the env type and could be useful here. 😃
If it works for all sensors but the camera, then I'd start there. If the class is purely used for observations, a manager is not the correct tool, but an observation function class is. For cameras, I'm not sure the loop to create a camera for each environment, and then manually batch the observations together, is the correct or most efficient method. Have you tried this in an environment that has reached convergence using cameras? Have you tried [batched rendering with gs-madrona](https://genesis-world.readthedocs.io/en/latest/user_guide/getting_started/visualization.html#batch-rendering-with-gs-madrona)? For this particular case, I'd look to Genesis for batching.
I agree, but at this point, we're making the actuator manager highly dependent on ROS. It also means that any changes to this file are likely to break either the training or the ROS implementation. One way around this is to extend the class to create a ROS-specific one that overrides some of the functions, instead of peppering them with ROS conditions. But I'm also not sure this would be necessary. If, essentially, the ROS changes in this file are for observations, then you're probably making more work for yourself by trying to make the actuator manager ROS aware. Instead, you just need to get the actuator states to the observation manager. Regarding the non-ROS features, the way the … It would be really helpful to me if I could see the environment you built this for. That way, I could see how you've defined the actuator manager. Is your environment on github?
Absolutely. A lot of the inspiration for Genesis Forge came from IsaacLab. I've spent a lot of time in their source code, and I like how their code isn't tied to any deployment platform, like ROS. However, I know there is a ROS bridge that links the environment to the ROS system. The beautiful thing about this approach is that the deployment system is not hardcoded into the internals. |
|
Good news, it looks like the Genesis team has updated their camera sensor for batching. Here's their example with multiple camera configurations, and the original PR. This update should remove the need to manually mount a camera for each environment and batch the camera rendering -- ultimately improving the speed of your environment significantly. |
|
Hey, sorry for the delay, I was busy with other things. I have made the required modifications, but it is still a draft; I would like to hear your thoughts about the approach and the progress.

Here is everything that has been changed:
1. Removed the sensor managers and used the MDP function as you suggested; now that camera is a supported sensor, all the sensors can be read without any issues.

Steps to test:
1. Use the 'pr' branch of the repo.
2. Go to the new_simple example and complete the training.
3. Evaluate the policy (use eval.py with the env_mode arg set to 'eval').
4. Play the policy (use eval.py with the env_mode arg set to 'play'). The eval_fixed provides a ROS-based deployment; look at the function overriding for handling the actions and the …

What is to be done:
1. Fix docstrings, formatting, and cleanup.

After we finalize the uncertainties, I will complete and open a PR. |
|
Thanks for the update. I don't see any new commits in the PR or a …

I think, at this stage, it would be best to break this PR up into several smaller PRs. This PR contains so much code, with significant architectural changes, that it'll be hard to discuss and merge them all at once. Smaller PRs will allow us to discuss each feature and merge the smaller ones quickly, while giving us time to refine the other ones more effectively. Off the top of my head, here's how I would break it up (you might want to break it up further, since I haven't had the chance to fully go through all 6,000 lines):
Make sure each PR satisfies the following checklist:
Over the next month, I'll be spending time doing some Sim2Real work with one of my environments. This might help some of our conversations with ROS2. Thanks for all your work and interest! |
|
I agree this PR is too long, and I will split it into 4-5 individual ones. Before that, I would like to make sure I have your input on some of the features. I will open the PRs after following all the guidelines and when the code is at a stage where minimal further changes would be required. I seem to have linked the wrong repo, here is the correct repo (use the 'pr' branch). Thoughts about these?
```python
class ActuatorConfig(TypedDict):
    env: GenesisEnv
    joint_names: Union[List[str], str]
    default_pos: Union[float, NoisyValue, Dict[str, float]]
    # ... and so on

class ActuatorManager(BaseManager):
    """
    Example::

        class MyEnv(GenesisEnv):
            def __init__(self, *args, **kwargs):
                super().__init__(*args, **kwargs)
                actuator_config = {
                    "env": self,
                    "joint_names": ".*",
                    "kv": 0.5,
                    "dofs_limit": {"joint1": (-1.57, 1.57), "joint2": (-1.57, 1.57)},
                    "max_force": 8.0,
                }
                self.actuator_manager = ActuatorManager(actuator_config)
    """

    def __init__(
        self,
        actuator_config: ActuatorConfig,
    ) -> None:
        ...
```

The LOC for the current version has decreased significantly, by I think around 2.5k lines (excluding the changes to the examples and docstrings). Spread across multiple PRs, it should be relatively easy to review |
This is a clever idea, but let's hold off on the config object method for the time being. This concept optimizes for the "on-robot" experience, while potentially negatively affecting the training development experience. The first experience a developer has with Genesis Forge training environments should be straightforward and intuitive to implement with full type hinting. I need to spend a little more time exploring this side of the development pipeline before I can offer much guidance on that.
This can be handled by the developer of the environment, which means Genesis Forge doesn't need to be opinionated about this. For example:

```python
RewardManager(
    enabled=self.mode == 'train',
    ...
)
```
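A toy illustration of that enabled-flag pattern (a stand-in class, not the real RewardManager) shows how a manager becomes a no-op outside of training:

```python
class TogglableManager:
    """Minimal stand-in: an `enabled` flag gates all work done per step."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self.calls = 0

    def step(self):
        if not self.enabled:
            return None  # skip reward computation entirely in deploy mode
        self.calls += 1
        return self.calls

# Training env computes rewards; deploy env skips them
train_mgr = TogglableManager(enabled=True)
deploy_mgr = TogglableManager(enabled=False)
```

The environment developer decides the condition (e.g. `self.mode == 'train'`), so the library itself stays unopinionated.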
Isn't that the same as #1? I feel like we can do this more simply. I'm going to prototype a few things, and then I'll link you to an idea.
This is an interesting consideration. I don't think it's a top priority, but I'd be interested to see what this looks like. |
|
I've pulled together a couple of prototype PRs that might help influence your work:

**POC Sim2Real Concept**

PR: #11

The … Adapted to your example, it would look something like this:

```python
env = Go2SimpleEnv(num_envs=1, headless=False, env_mode="deploy")

obs = env.observation_manager.observation_cfg
obs["angle_velocity"].fn = lambda env: ros_interface.get_angular_velocity()
obs["linear_velocity"].fn = lambda env: ros_interface.get_linear_velocity()
obs["projected_gravity"].fn = lambda env: ros_interface.get_projected_gravity()
obs["dof_position"].fn = lambda env: ros_interface.get_dofs_position()
obs["dof_velocity"].fn = lambda env: ros_interface.get_dofs_velocity()
# ...

obs, _ = env.reset()
with torch.no_grad():
    while True:
        rclpy.spin_once(ros_interface, timeout_sec=0)
        actions = policy(obs)
        obs, _rews, _dones, _infos = env.step(actions)
        ros_interface._pos_actions = env.action_manager.get_actions()
```

**Multiple action managers**

PR: #12

This is another take on having multiple action managers. The idea here is that the actuator manager should not be aware of the control method of the actuators (position, force, velocity). This removes the need for it to be responsible for categorizing the actuators into explicit groups or to need a hybrid action manager to control them. In this case, you can define each action manager separately, and use the …

For example:

```python
self.actuator_manager = ActuatorManager(
    self,
    joint_names=[
        ".*_wheel",
        ".*_arm_joint",
    ],
    kp=20,
    kv=0.5,
)
self.wheel_actions = VelocityActionManager(
    self,
    actuator_filter=[".*_wheel"],
    actuator_manager=self.actuator_manager,
)
self.arm_actions = PositionActionManager(
    self,
    actuator_filter=[".*_arm_joint"],
    actuator_manager=self.actuator_manager,
    scale=0.25,
    clip=(-100.0, 100.0),
    use_default_offset=True,
)
```

If I adapted this approach to the Sim2Real deploy example, from above, you can easily get the processed actuator values for deploy, like this (assuming you have 3 action managers):

```python
obs, _ = env.reset()
with torch.no_grad():
    while True:
        rclpy.spin_once(ros_interface, timeout_sec=0)
        actions = policy(obs)
        obs, _rews, _dones, _infos = env.step(actions)
        ros_interface._pos_actions = env.pos_action_manager.get_actions()
        ros_interface._vel_actions = env.vel_action_manager.get_actions()
        ros_interface._force_actions = env.force_action_manager.get_actions()
```
|
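The `actuator_filter` patterns above presumably select joints by regex over their names; a minimal sketch of that grouping logic (the helper below is illustrative, not the prototype's implementation):

```python
import re

def filter_joints(joint_names, patterns):
    """Return the joints whose names fully match any of the regex patterns."""
    return [
        name for name in joint_names
        if any(re.fullmatch(p, name) for p in patterns)
    ]

joints = ["left_wheel", "right_wheel", "shoulder_arm_joint", "base_link"]
wheels = filter_joints(joints, [".*_wheel"])      # velocity-controlled group
arms = filter_joints(joints, [".*_arm_joint"])    # position-controlled group
```

Each action manager would then operate only on its matched subset of the shared actuator manager's joints.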
Adds some new features, such as sensor managers, more control types (velocity control, force control, hybrid control), and more command types (position tracking, pose tracking), and adds support for direct deployment to ROS 2 using topic-based control.
Adds examples demonstrating learning with the skrl library, and examples showcasing how sensors can be used as part of the observations.
Note: currently the skrl example adapted from the 'simple' example doesn't converge. I am not sure if it requires some hyper-parameter tuning or there is a bug; based on the recorded video, it looks like the agent keeps resetting.
I have not included examples for most of the other sensors, the position/pose commands, or the new velocity and force control; I will add them after a successful merge. Some docstrings for the newly added features are incomplete/wrong, so the docs generated may be inadequate.