Clarification Needed on num_depot Usage in MDCPDP Environment #220
-
Any update on this?
-
I modified the code in the `MDCPDPGenerator` class:
-
Hi @Moonbohoon and @699Felix! (Sorry that we completely missed this discussion for such a long time 😅) We have made some updates to the environment and model, which we report here:

These fixes are based on feedback from @Moonbohoon in #220 (long overdue 😅) and on updates to the parallel autoregressive counterpart of this environment in PARCO.

Below is an example of instantiating a model with the new environment:

```python
import torch
from rl4co.utils.trainer import RL4COTrainer
from rl4co.envs.routing.mdcpdp.env import MDCPDPEnv, MDCPDPGenerator
from rl4co.models.zoo.am.policy import AttentionModelPolicy
from rl4co.models.nn.env_embeddings.init import MDCPDPInitEmbedding
from rl4co.models.nn.env_embeddings.context import MDCPDPContext

# Build the policy (untrained in this example; used for greedy rollouts below)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
embed_dim = 128
policy = AttentionModelPolicy(
    env_name="mdcpdp",
    init_embedding=MDCPDPInitEmbedding(embed_dim),
    context_embedding=MDCPDPContext(embed_dim),
    embed_dim=embed_dim,
).to(device)

# depot_mode="multiple" gives each agent its own depot;
# set it to "single" for M agents with the same starting location
generator = MDCPDPGenerator(
    min_capacity=2,
    max_capacity=3,
    num_agents=10,
    num_loc=60,
    depot_mode="multiple",
)
env_ar = MDCPDPEnv(generator, problem_mode="open")

# Sample a batch of 26 instances and reset the environment
td_gen_ar = env_ar.generator(26)
td_reset_ar = env_ar.reset(td_gen_ar.clone()).to(device)

# Inference with greedy decoding
with torch.inference_mode():
    out_ar = policy(td_reset_ar.clone(), env_ar, decode_type="greedy")

# Plotting (untrained!)
actions_ar = out_ar["actions"]
print("Average tour length: {:.2f}".format(-out_ar["reward"].mean().item()))
for i in range(2):
    print(f"Tour {i} length: {-out_ar['reward'][i].item():.2f}")
    env_ar.render(td_reset_ar[i], actions_ar[i].cpu())
```
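The example above prints `-reward` as the tour length, so the reward is the negative route cost, and it passes `problem_mode="open"`. As a rough illustration of what that means, here is a minimal pure-Python sketch. It assumes (not confirmed in this thread) that an open route ends at the agent's last visited node instead of returning to its depot, that distances are Euclidean, and that per-agent costs are summed. The real environment may use a different objective (for example, the length of the longest single route), and the names `route_length` and `reward` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def route_length(depot: Point, stops: Sequence[Point], open_route: bool) -> float:
    """Euclidean length of one agent's route, starting at its depot.

    Assumption: an open route ends at the last visited node; a closed
    route travels back to the agent's own depot.
    """
    path: List[Point] = [depot, *stops]
    if not open_route:
        path.append(depot)  # closed route: return to the starting depot
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def reward(
    depots: Sequence[Point],
    routes: Sequence[Sequence[Point]],
    open_route: bool,
) -> float:
    """Negative total length over all agents (hypothetical sum objective)."""
    total = sum(route_length(d, r, open_route) for d, r in zip(depots, routes))
    return -total


if __name__ == "__main__":
    depots = [(0.0, 0.0), (1.0, 1.0)]
    routes = [[(3.0, 4.0), (3.0, 0.0)], []]  # second agent stays at its depot
    r_open = reward(depots, routes, open_route=True)
    r_closed = reward(depots, routes, open_route=False)
    # Report tour length the same way the example does: -reward
    print(f"open: {-r_open:.2f}, closed: {-r_closed:.2f}")
```

For the first agent the open route costs 5 + 4 = 9, and closing it adds the 3-unit trip back to `(0, 0)`, so the same visit order is cheaper in open mode. An idle agent contributes zero in both modes.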
-
I'm currently working with the `MDCPDPEnv` environment from this repository and encountered an issue regarding the usage of `num_depot`. In the `_step` and `_get_reward` functions of `env.py` and in `render.py`, `num_depot` is derived from `td["capacity"].shape[-1]`, which seems unusual to me. Here's the relevant code snippet:

I believe it should be derived from `depot`, not `capacity`. Is this understanding correct? Any guidance or confirmation on this issue would be greatly appreciated. Thank you!
Best regards,