I have a simple Python socket server that receives "command" codes which are supposed to be ASCII-encoded. Most payloads decode correctly with data.decode("utf-8"), but some fail, and falling back to latin-1 turns them into seemingly random characters.
Here are two examples:
byte_string1 = b'\xa3\xb67' # When client sends 67
byte_string2 = b'\xa3\xb6\xa3\xb6' # When client sends 66
I can see 67 and 6-6 in the raw bytes, but I have been unable to extract them. Is there a proper way to handle these?
My current attempt is below; I expect to get strings back from the received bytes:
import logging
import socket

logger = logging.getLogger(__name__)  # assumed logger setup


def get_command(data):
    """Decode received bytes to a string, falling back through encodings."""
    try:
        command = data.decode("utf-8")
    except UnicodeDecodeError as err1:
        logger.debug(f"utf-8 UnicodeDecodeError: {err1} for data: {data}")
        try:
            # latin-1 maps every byte value to a character, so this never raises.
            command = data.decode("latin-1")
        except UnicodeDecodeError as err2:
            logger.debug(f"latin-1 UnicodeDecodeError: {err2} for data: {data}")
            logger.debug(
                f"Taking a guess that the bytes are integers, for data: {data}"
            )
            command = [b for b in data]
    return command


server_ip = '0.0.0.0'
server_port = 1234
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind((server_ip, server_port))
server_socket.listen(5)

# Wait for a client to connect.
client_socket, client_address = server_socket.accept()
while True:
    data = client_socket.recv(1024)
    if not data:
        break
    command = get_command(data)
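To see what the fallback chain above does with the two sample inputs, here is a small sketch using only standard-library codecs. The helper name describe is hypothetical. The last step, decoding as GBK, is purely a guess and not something the post confirms: in GBK/GB2312 the byte pair A3 B6 is the fullwidth digit ６, which would line up with the 67 and 66 the client is said to send.

```python
import unicodedata

samples = [b'\xa3\xb67', b'\xa3\xb6\xa3\xb6']


def describe(data):
    """Return (hex dump, utf-8 decodable?, latin-1 text, GBK text folded to ASCII)."""
    hex_dump = " ".join(f"{byte:02X}" for byte in data)
    try:
        data.decode("utf-8")
        utf8_ok = True
    except UnicodeDecodeError:
        # 0xA3 is a UTF-8 continuation byte, so it cannot start a character.
        utf8_ok = False
    # latin-1 assigns a character to every byte value, so this never raises;
    # it is why the integer-list fallback in get_command is never reached.
    latin1_text = data.decode("latin-1")
    # ASSUMPTION (not confirmed by the post): the client encodes text as GBK.
    gbk_text = data.decode("gbk")
    # NFKC normalization folds fullwidth digits to their ASCII equivalents.
    folded = unicodedata.normalize("NFKC", gbk_text)
    return hex_dump, utf8_ok, latin1_text, folded


for sample in samples:
    print(describe(sample))
```

If the GBK guess is wrong for the real client, the hex dump is still the safest thing to log, since it shows exactly which bytes arrived.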
Your data contains \xa3, which is outside the ASCII range, and you shouldn't be trying to decode with anything other than ASCII. This protocol could use a better description.
recv does not honor any message boundaries you might think exist from the remote send. It is purely byte-oriented, and if you want to decode something more than 1 byte long, you need to handle the receive not getting all the data at once.
String 1 is 3 bytes, A3 B6 37, and string 2 is 4 bytes, A3 B6 A3 B6. I don't see how that translates to the client sending 67 and 66. Is there a standard header here? And by 66, do you mean two ASCII digits? A single decimal integer, or perhaps hex?
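The point about recv can be sketched roughly as follows. Since the protocol isn't described, this assumes each command ends with a newline byte, which is an assumption made only for illustration; iter_commands and its parameters are hypothetical names. Bytes are buffered until a complete command has arrived and only then decoded as strict ASCII, so a stray byte such as \xa3 is surfaced instead of being silently turned into another character.

```python
import socket


def iter_commands(sock, delimiter=b"\n", bufsize=1024):
    """Yield complete delimiter-terminated commands from a stream socket."""
    buffer = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # the peer closed the connection
            break
        buffer += chunk
        # One recv may hold part of a command, or several commands at once.
        while delimiter in buffer:
            raw, buffer = buffer.split(delimiter, 1)
            try:
                yield raw.decode("ascii")
            except UnicodeDecodeError:
                # Non-ASCII payload: hand back the raw bytes for inspection.
                yield raw


if __name__ == "__main__":
    # Demonstration with a local socket pair standing in for a real client.
    sender, receiver = socket.socketpair()
    sender.sendall(b"6")                    # first half of a command
    sender.sendall(b"7\n66\n\xa3\xb67\n")   # the rest, plus two more commands
    sender.close()
    for command in iter_commands(receiver):
        print(repr(command))
    receiver.close()
```

Any trailing bytes that never receive a delimiter are dropped when the peer disconnects; a real protocol would need to decide how to treat them.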