New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP Client retry different IP address #50333
Comments
This can already be managed from the caller side by doing DNS resolution before issuing the request and using the |
Exactly - the proposal was for a more friendly way to do this. I'm not exactly sure how this could work. ;) |
Couldn't we add a option like "cycle_ips" to This shouldnt be a big problem or am I missing something? |
The problem is that the curl driver does the dns resolution async, and if we do it ourselves, that blocks, as far as I'm aware. |
Yes, but why does that matter? It is a very nieche and optin feature and we only have to resolve the DNS once manually, because we pass the result via the A possible solution which I haven't tested right now but will most likely work is that we only resolve the IPs manually AFTER the first request has failed. There is a good chance that when cURL resolves a domain, it uses the same DNS cache as PHP will access when calling Most likely we would have to do that anyway because otherwise we do not have access to the full URL that is requested because of relative paths. When only doing that after the first request has failed, we can use the async context ($context->getInfo('url')) or, if configured, the <?php
$start = microtime(true);
$ips = [];
foreach (dns_get_record('symfony.com', \DNS_A | \DNS_AAAA) as $dnsRecord) {
$ips[] = $dnsRecord['ip'] ?? $dnsRecord['ipv6'];
}
$duration = microtime(true) - $start;
var_dump([
'IPs' => $ips,
'Duration' => $duration
]); The first run took about 0.05 seconds, subsequent runs took on average 0.002 seconds. I couldn't find it documented which DNS cache PHP uses, but it will most likely be from the OS which cURL should use as well. I would give it a shot of implementing it after my previous PR to the |
I wouldn't say it is niche at all. Anyone using AWS services would benefit from this, because it is supper common to get routed to a bad machine and for retries to fail in the way that I described, once you are running at scale. I see it happen most weeks, at a scale of around ~100M requests to SNS per month. It would have also mitigated an outage that occurred last year where SNS was unavailable for ~90 minutes at half of the IP addresses in the A-record, resulting in half my requests failing. |
The reason non-async-dns matters is because a use case is sending 10k parallel requests to SNS, and we don't want to have blocking setting up the requests, |
I think I am missing something here. I proposed only resolving the DNS hostname AFTER a request has failed. Wouldn't the blocking actually work in our favor then because it is only resolved once (cause parallel execution is halted and later resumed) and after that the local DNS cache is used and latency is at around 0.002 seconds (2ms)? At 10k requests, that would add about 20seconds of delay. If that is a problem, I dont think there is a general way the component can fix that besides maybe using a static class variable as DNS cache? Then we would need some kind of policy to invalidate the variable cache to avoid memory leaks. Maybe the TTL of the DNS record? Calling |
Thank you for this suggestion. |
Description
If an A or AAAA record (or similarly for CNAMEs) resolves to multiple IP addresses, and a request fails, it would be great to be able to request a retry using a different IP address, rather than retries using the original cached IP. It is pretty common that a particular IP can be experiencing failures and not cycled out on the other end for multiple minutes, or never at all.
Example
An example use case is sending requests to AWS SNS in us-east-1. I have experienced being routed to a server that always returns a 5xx error, and retries would never recover - being able to retry against a different IP address would have resolved the issue, allowing the retry to succeed.
The text was updated successfully, but these errors were encountered: