c - SO_KEEPALIVE does not work during a call to write()?
I'm developing a socket application that must be robust to network failures.
The application has two running threads: one waits for messages from the socket (a read() loop), and the other sends messages to the socket (a write() loop).
I'm trying to use SO_KEEPALIVE to handle network failures. It works fine if I'm only blocked on read(): a few seconds after the connection is lost (network cable removed), read() fails with the message 'Connection timed out'.
But if I try to write() after the network is disconnected (and before the timeout ends), both write() and read() block forever, without any error.
This is a stripped-down sample program that connects stdin/stdout to the socket. It listens on port 5656:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int socket_fd;

    void error(const char *msg) {
        perror(msg);
        exit(1);
    }

    // Read from stdin and write to the socket
    void* write_daemon(void* _arg) {
        while (1) {
            char c;
            int ret = scanf("%c", &c);
            if (ret <= 0) error("read stdin");
            int ret2 = write(socket_fd, &c, sizeof(c));
            if (ret2 <= 0) error("write socket");
        }
        return NULL;
    }

    // Read from the socket and write to stdout
    void* read_daemon(void* _arg) {
        while (1) {
            char c;
            int ret = read(socket_fd, &c, sizeof(c));
            if (ret <= 0) error("read socket");
            int ret2 = printf("%c", c);
            if (ret2 <= 0) error("write stdout");
        }
        return NULL;
    }

    // Enable and configure keepalive to detect network problems
    void config_socket() {
        int enable_keep_alive = 1;
        int keepalive_idle = 1;      // very short intervals, for testing only
        int keepalive_count = 1;
        int keepalive_interval = 1;
        int result;

        //=> http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#setsockopt
        result = setsockopt(socket_fd, SOL_SOCKET, SO_KEEPALIVE, &enable_keep_alive, sizeof(int));
        if (result < 0) error("SO_KEEPALIVE");
        result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPIDLE, &keepalive_idle, sizeof(int));
        if (result < 0) error("TCP_KEEPIDLE");
        result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPINTVL, &keepalive_interval, sizeof(int));
        if (result < 0) error("TCP_KEEPINTVL");
        result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPCNT, &keepalive_count, sizeof(int));
        if (result < 0) error("TCP_KEEPCNT");
    }

    int main(int argc, char *argv[]) {
        // Create the server socket, bound to port 5656
        int listen_socket_fd;
        int tr = 1;
        struct sockaddr_in serv_addr, cli_addr;
        socklen_t clilen = sizeof(cli_addr);
        pthread_t write_thread, read_thread;

        listen_socket_fd = socket(AF_INET, SOCK_STREAM, 0);
        if (listen_socket_fd < 0) error("socket()");
        if (setsockopt(listen_socket_fd, SOL_SOCKET, SO_REUSEADDR, &tr, sizeof(int)) < 0)
            error("SO_REUSEADDR");

        memset(&serv_addr, 0, sizeof(serv_addr));
        serv_addr.sin_family = AF_INET;
        serv_addr.sin_addr.s_addr = INADDR_ANY;
        serv_addr.sin_port = htons(5656);
        if (bind(listen_socket_fd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
            error("bind()");

        // Wait for a client to connect
        if (listen(listen_socket_fd, 5) < 0) error("listen()");
        socket_fd = accept(listen_socket_fd, (struct sockaddr *) &cli_addr, &clilen);
        if (socket_fd < 0) error("accept()");

        config_socket();
        pthread_create(&write_thread, NULL, write_daemon, NULL);
        pthread_create(&read_thread, NULL, read_daemon, NULL);
        close(listen_socket_fd);

        // Exit the main thread only; the two worker threads keep running
        pthread_exit(NULL);
    }
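For reference, the time keepalive needs to flag a dead, idle connection follows from the three values set in config_socket(). The sketch below is only an illustration and not part of the program above: keepalive_detection_seconds is a hypothetical helper, and it assumes the usual Linux behaviour, where the first probe goes out after the idle time and the connection is dropped once `count` probes, spaced `interval` seconds apart, have gone unanswered.

```c
#include <assert.h>

/* Hypothetical helper (not part of any library): approximate number of
 * seconds before keepalive reports a dead connection, assuming the
 * connection is idle and every probe goes unanswered. */
static int keepalive_detection_seconds(int idle, int interval, int count) {
    assert(idle >= 0 && interval >= 0 && count >= 0);
    /* idle time before the first probe, then one interval per probe */
    return idle + interval * count;
}
```

With the test values above (1, 1, 1) this gives 2 seconds, which matches the "few seconds" seen on read(). With the Linux defaults (7200, 75, 9) it gives 7875 seconds.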
To reproduce the problem, connect to port 5656 with telnet. The program exits a couple of seconds after the connection is lost, unless I try to write something in the terminal. In that case, it blocks forever.
So, the questions are: what's wrong? How can I fix it? Are there other alternatives?
Thanks!
I've tried using Wireshark to inspect the network connection. If I don't call write(), I can see TCP keep-alive packets being sent, and the connection closes after a few seconds.
If, instead, I try to write(), it stops sending keep-alive packets and starts sending TCP retransmissions instead (which seems OK to me). The problem is that the time between retransmissions grows bigger and bigger after each failure, and it seems to never give up and close the socket.
Is there a way to set the maximum number of retransmissions, or something similar? Thanks
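The growing gaps are TCP's exponential backoff, so the socket does give up eventually, just much later than the keepalive timeout. On Linux the limit is the system-wide net.ipv4.tcp_retries2 sysctl (default 15), and the kernel documentation quotes a hypothetical timeout of 924.6 seconds for that default. The arithmetic can be sketched as below. retransmit_give_up_seconds is a hypothetical helper, and the 0.2 s starting timeout and 120 s cap are assumptions matching the kernel's minimum and maximum RTO. The real starting value depends on the measured round-trip time.

```c
#include <assert.h>

/* Hypothetical helper: approximate seconds before TCP gives up on
 * unacknowledged data. Sums (retries + 1) timer intervals, each double
 * the previous one and capped at rto_max. All times are in seconds. */
static double retransmit_give_up_seconds(double rto_initial, double rto_max,
                                         int retries) {
    assert(rto_initial > 0 && rto_max >= rto_initial && retries >= 0);
    double total = 0.0;
    double rto = rto_initial;
    for (int i = 0; i <= retries; i++) {
        total += rto;          /* wait out the current timeout */
        rto *= 2;              /* exponential backoff */
        if (rto > rto_max)
            rto = rto_max;     /* never wait longer than the cap */
    }
    return total;
}
```

Calling retransmit_give_up_seconds(0.2, 120.0, 15) gives roughly 924.6 seconds, which is why the connection looks as if it never closes during a short test.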
I have found the TCP_USER_TIMEOUT socket option (RFC 5482), which closes the connection if sent data is not ACKed within the specified interval.
It works fine for me =)
    // Defined in include/uapi/linux/tcp.h (since Linux 2.6.37)
    #ifndef TCP_USER_TIMEOUT
    #define TCP_USER_TIMEOUT 18
    #endif

    int tcp_timeout = 10000;  // milliseconds: 10 seconds before aborting write()
    result = setsockopt(socket_fd, SOL_TCP, TCP_USER_TIMEOUT, &tcp_timeout, sizeof(int));
    if (result < 0) error("TCP_USER_TIMEOUT");
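To check that the option was actually accepted, the value can be read back with getsockopt(). This is a minimal sketch, assuming Linux. set_user_timeout_ms is a hypothetical helper name, and it uses IPPROTO_TCP, which has the same value as SOL_TCP. The option takes an unsigned int in milliseconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18   /* value from linux/tcp.h, for older headers */
#endif

/* Hypothetical helper: set TCP_USER_TIMEOUT (in milliseconds) on fd and
 * return the value the kernel reports back, or -1 on error. */
static int set_user_timeout_ms(int fd, unsigned int timeout_ms) {
    unsigned int readback = 0;
    socklen_t len = sizeof(readback);

    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &timeout_ms, sizeof(timeout_ms)) < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &readback, &len) < 0)
        return -1;
    return (int)readback;
}
```

In the program above, this could be called as set_user_timeout_ms(socket_fd, 10000) right after config_socket(), with a negative return treated as a failure.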
Yet I feel I shouldn't have to use both SO_KEEPALIVE and TCP_USER_TIMEOUT. Maybe it's a bug somewhere?
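On the question of other alternatives: one option that relies on neither socket option is an application-level timeout on the read side. The sketch below waits with poll() and reports ETIMEDOUT if nothing arrives in time. read_with_timeout is a hypothetical helper. The approach only makes sense if the peer is expected to send something regularly (a heartbeat message, for example, which the sample program does not have), so treat it as an assumption about the protocol rather than a drop-in fix.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: behaves like read(), but fails with errno set to
 * ETIMEDOUT if no data (or EOF/error) shows up within timeout_ms.
 * If poll() is interrupted by a signal, the wait restarts from scratch. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms) {
    assert(buf != NULL);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;             /* poll() itself failed; errno is set */
    if (ready == 0) {
        errno = ETIMEDOUT;     /* nothing arrived in time */
        return -1;
    }
    return read(fd, buf, len); /* data, EOF, or a socket error is pending */
}
```

In read_daemon, the call read(socket_fd, &c, sizeof(c)) could be swapped for read_with_timeout(socket_fd, &c, sizeof(c), 10000), so that a silent peer is reported after 10 seconds regardless of what the TCP timers are doing.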