c# tcp - Configure socket ACK timeout?

2 Answers

This is an old question, but it hits home with me... As alluded to in your original question, this should be done at the application layer.

I'm hoping my experience may be helpful as I had the exact same thoughts as you (and even fought with other developers on my team over this insisting TCP should get the job done). In reality its quite easy to mess up TCP with wireless connections, conflicting network MTUs and sometimes poorly implemented routers/access points which ACK prematurely or during failure conditions. But also because TCP is intended to stream from one source to one destination, not really to ensure full-duplex transacted communications.

I spent a number of years working for an embedded device manufacturer and wrote a complete client-server system for wireless barcode terminals in a warehouse. Not cellular in this case, but wifi can be just as bad (but even WiFi will prove the desired task useless). FYI, my system is still running reliably in production today after almost 7 years, so I think my implementation is reasonably robust (it experiences regular interference from industrial manufacturing machines/welders/air compressors/mice chewing network wires, etc).

Understanding the problem

@rodolk has posted some good info. TCP level ACKs do not necessarily correspond 1-1 with each of your application network transmissions (and will invariably NOT be 1-1 if you send more than the network's MTU or maximum packet size even if Nagle is disabled).

Ultimately the mechanisms of TCP & IP (Transport and Network layers) are to ensure delivery of your traffic in one direction (from source to destination) with some limits on maximum retries/etc. Application communication is ultimately about full duplex (two-way) Application layer communications that sit on top of TCP/IP. Mixing those layers is not a good strategy. Think of HTTP request-response on top of TCP/IP. HTTP does not rely on TCP ACKS to implement its own time outs, etc. HTTP would be a great spec to study if you are interested.

But let's even pretend that it was doing what you want. You always send less than 1 MTU (or max packet size) in 1 transmission and receive exactly 1 ACK. Introduce your wireless environment and everything gets more complex. You can have a failure between the successful transmission and the corresponding ACK!

The problem is that each direction of the wireless communication stream is not necessarily of equal quality or reliability and can change over time based on local environmental factors and movement of the wireless device.

Devices often receive better than they can transmit. It is common for the device to receive your transmission perfectly, reply with some kind of "ACK" which is transmitted, but that wireless ACK never reaches its destination due to signal quality, transmission distance, RF interference, signal attenuation, signal reflection, etc. In industrial applications this could be heavy machinery turning on, welding machines, fridges/freezers, fluorescent lighting, etc. In urban environment it could be mobility within structures, parking garages, steel building structures, etc.

At what point in this scenario does the client take action (save/commit data or change state) and at what point does the server consider the action successful (save/commit data or change state)? This is very difficult to solve reliably without additional communication checks in your application layer (sometimes including 2-way ACK for transactions ie: client transmits, server ACKS, client ACKS the ACK :-) You should not rely on TCP level ACKs here as they will not reliably equate to successful full duplex communication and will not facilitate a reliable retry mechanism for your application.

Application layer technique for unreliable wireless communications on embedded devices

Our technique was that every application level message was sent with a couple byte application level header that included a packet ID # (just an incrementing integer), the length of the entire message in bytes and a CRC32 checksum for the entire message. I can't remember for sure, but I believe we did this in 8 bytes, 2 | 2 | 4. (Depending on the maximum message length you want to support).

So let's say you are counting inventory in the warehouse, you count an item and count 5 units, the barcode terminal sends a message to the server saying "Ben counted 5 units of Item 1234". When the server receives the message, it would wait until it received the full message, verify the message length first, then CRC32 checksum (if the length matched). If this all passed we sent back an application response to this message (something like an ACK for the application). During this time the barcode terminal is waiting for the ACK from the server and will retransmit if it doesn't hear back from the server. If the server receives multiple copies of the same packet ID it can de-duplicate by abandoning uncommitted transactions. However if the barcode scanner does receives its ACK from the server, it would then reply with one more final "COMMIT" command to the server. Because the first 2 messages just validated a working full duplex connection, the commit is incredibly unlikely to fail within this couple ms timeframe. FYI, this failure condition is fairly easy to replicate at the edge of your WiFi coverage, so take your laptop/device and go for a walk until the wifi is just "1 bar" or the lowest connection speed often 1 mbps.

So you are adding 8 bytes header to the beginning of your message, and optionally adding one extra final COMMIT message transmission if you require a transacted request/response when only one side of the wireless communication might fail.

It will be very hard to justify saving 8 bytes per message with a complex application layer to transport layer hooking system (such as hooking into winpcap). Also you may or may not be able to replicate this transport layer hooking on other devices (maybe your system will run on other devices in the future? Android, iOS, Windows Phone, Linux, can you implement the same application layer communication for all these platforms? I would argue you should be able to implement your application on each device regardless of how the TCP stack is implemented.)

I'd recommend you keep your application layer separate from the transport and network layers for good separation of concerns, and tight control over retry conditions, time-outs and potentially transacted application state changes.

connect in

Is there a way to configure the timeout in which a socket expects to receive an ACK for sent data before it decides that the connection has failed?

I'm aware this can be done at the application level as well, but since every packet I send is ACK'd anyway, and I just want to know if my data is received, using additional data at the application level to accomplish the same thing seems wasteful. (Not to mention, my particular application uses per-byte charged cellular links.)

Note: As per my previous question -- What conditions cause NetworkStream.Write to block? -- you cannot rely on .Write throwing an exception in order to determine that data is not being sent properly.

I'm not a C# expert but I think I can help respond. You are trying to have TCP-layer control data from the application. This is not easy and as with any application layer protocol you would need some kind of application layer response like Request-Response in HTTP.

The problem with knowing that ALL your written data was actually received by the other end is that TCP is stream oriented. That means that you might send 1KB of data through the socket, that KB is stored in a TCP snd buffer, and that KB might be sent with 3 TCP segments that may be acknowledged (TCP ACK) altogether or separately. It's asynchronous. So, at some point TCP might have sent only 300 Bytes of your 1,000 KB of data, just an example.

Now the other question is whether you open the connection and close the connection every time you send a chunk of data (A) or you have the connection always open (B).

In (A) it is simpler because, if the connection fails opening, that's it. It might take more than one minute to have the timeout but you don't send more than a few 20-Byte IP and (20-Byte) TCP headers (sometimes more than 20 Bytes for IP and TCP options).

In (B) you will realize of success or failure when you want to send data. There are 3 cases I would consider:

1-The other end of the socket closed or reset the TCP connection. In that case you should immediately receive and error response or, in C, a signal indicating broken pipe, and I suppose it'll become an exception in C#.

2-The other end becomes unreachable and hasn't closed/reset the socket. This is difficult to detect because TCP will send messages that will time out and after a few retry/timeouts it will decide the connection is broken. The time for timeout and number of retries may be configurable but at OS level (for all applications). I don't think you can configure that by socket. In this case your application won't realize at the moment it sends the data.

3-Data was successfully received by the other end and acknowledged at TCP layer.

The complex part is to differentiate between (2) and (3) as fast as possible. I will assume you are asking about this. I don't think there is any possibility of doing it completely unless you hack the kernel.

Anyway, getting an ACK from the server at application layer could mean just 1 or 2 Bytes telling the amount of data received. That in addition to 20+20 Bytes for IP and TCP basic headers.

If there is any possibility of doing what you say, I would try this but I have never tested:

You can play with the send buffer size and select function. You can set the send buffer size of a socket with setsockopt and OS_SNDBUF socket option. http://msdn.microsoft.com/en-us/library/system.net.sockets.socket_methods(v=vs.110).aspx

If you know you are always going to send 2 KB, you set the send buffer size at 2 KB. Usually you can change it only after connecting. http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.sendbuffersize(v=vs.110).aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1

Then you call Select or Poll method on the Socket to check if it is writable.

As long as one TCP message is acked, Select or Poll should indicate the socket is writable because the sent data is removed from the send buffer.

Note this algorithm has limitations:

  1. The operating system could define a minimum buffer size.
  2. If the algorithm is possible, Select and Poll will tell you the socket is writable when there is buffer space available but only one part of you data was actually received and ACKED by the other end.
  3. If you send variable-size messages this is not possible.

If you cannot apply the mentioned algorithm you might need to pay the extra cost of an additional TCP message with ~42 Bytes with an application-layer simple ACK.

Sorry for not being able to provide a definitive solution. Maybe OS's should implement the capability to tell you available buffer Bytes and that would solve your problem.

EDIT: I'm adding another suggestion from my comments.

If you have the possibility of having other process using Winpcap, you could capture TCP responses from the other end!!! For example, using local IPC like shared memory or just sockets one application can tell the other about the data of the socekt (src IP, src port, dst IP, dst port). The other second process, call it monitoring process, can detect an ACK received from the other endpoint by sniffing the connection. Also winpcap could be used linking to native code ...


c# sockets network-programming tcpclient