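Why Are Heartbeat Packets (HeartBeat) Necessary?
Almost every online game server includes a heartbeat (HeartBeat or Ping) mechanism, and I used one myself while recently developing a mobile game server. It is worth asking: are heartbeats really necessary? Why do we need them? Does TCP not provide a way to detect a dropped connection? Can TCP's KeepAlive mechanism replace HeartBeat?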
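The reason is that TCP does not immediately notify the application when a connection is lost. For example, if the client drops off the network without closing the connection (a lost link or a power failure, say), the server's TCP connection does not detect the loss and stays in the connected state. This causes real trouble: the client is gone, yet the server keeps maintaining its connection and keeps running that player's game logic as usual.
A heartbeat is a mechanism for detecting a dropped connection promptly: heartbeat data is sent at a fixed interval to check whether the peer is still connected. It is part of the application protocol.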
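The detection logic described above can be sketched independently of any transport. This is a minimal illustration, not the author's implementation: the class name `HeartbeatMonitor`, its methods, and the 15-second timeout are all assumptions made for the example, and the current time is passed in explicitly so the behavior is easy to follow.

```python
class HeartbeatMonitor:
    """Track the last heartbeat time per peer and report silent peers.

    Hypothetical helper for illustration only.
    """

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence tolerated
        self.last_seen = {}      # peer id -> time of last heartbeat

    def beat(self, peer, now):
        """Record that a heartbeat from `peer` arrived at time `now`."""
        self.last_seen[peer] = now

    def silent_peers(self, now):
        """Return the peers whose last heartbeat is older than the timeout."""
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=15)
monitor.beat('player-1', now=0)
monitor.beat('player-2', now=0)
monitor.beat('player-1', now=10)      # player-1 keeps sending heartbeats
print(monitor.silent_peers(now=20))   # ['player-2']
```

A real server would call `beat` whenever a heartbeat message arrives and would close the connections returned by `silent_peers`.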
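Question 1: Why doesn't TCP provide disconnect detection itself?
First, disconnect detection requires sending probe packets periodically, which consumes some bandwidth and occupies some network resources. If it were built into TCP as a default feature, applications that do not need disconnect detection would waste bandwidth for nothing.
Second, and most importantly, TCP's lack of timely notification follows from one of its main design goals: the ability to maintain communication when the network fails. TCP grew out of research sponsored by the U.S. Department of Defense, as a protocol that could keep communication reliable even under severe network damage such as war or natural disaster. Network failures are usually temporary, and routers sometimes quietly restore connectivity after a brief loss. For this reason, TCP itself does not provide prompt disconnect detection.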
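Question 2: Can TCP's KeepAlive mechanism be used to detect connection state promptly?
TCP has a KeepAlive switch that, when turned on, can detect dead connections. The default idle time is usually 2 hours and can be changed. Note, however, that this is a system-wide TCP setting. If you lower tcp_keepalive_time and tcp_keepalive_intvl (reference: Link) in order to detect broken connections sooner, the KeepAlive interval shrinks for every application on the machine that uses KeepAlive. That is clearly unacceptable, because different applications have different needs.
(On some platforms, the socket implementation already supports setting KeepAlive parameters per connection.)
KeepAlive is essentially designed to detect connections that have been inactive for a long time, so it is not well suited to detecting connection state promptly.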
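For platforms that do support per-connection settings, the sketch below shows roughly what that looks like from Python. The function name `enable_keepalive` and the values 60/10/3 are assumptions chosen for illustration. `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are platform-dependent socket options (available on Linux, for instance), so each one is applied only if the running platform exposes it.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Turn on KeepAlive for one socket and tune it where the platform allows.

    Hypothetical helper; the default values are illustrative, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The options below are not defined on every platform, hence the guards.
    if hasattr(socket, 'TCP_KEEPIDLE'):    # idle seconds before the first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, 'TCP_KEEPINTVL'):   # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, 'TCP_KEEPCNT'):     # failed probes before giving up
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock, idle=30, interval=5, count=2)
sock.close()
```

Even with these settings, KeepAlive only reports that a connection is dead; it cannot carry application data or tell you whether the peer's application is still responsive, which is part of why an application-level heartbeat is still useful.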
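Question 3: Why are heartbeat packets (HeartBeat) a good way to detect connection state promptly?
- They are more flexible: you control the detection interval, the detection method, and so on.
- Heartbeats work over both TCP and UDP, so the upper-layer heartbeat logic keeps working when you switch between the two. (In practice this situation rarely comes up.)
- In some cases, heartbeats can carry additional information that is synchronized periodically between server and client (for example, frame-count synchronization).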
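As an illustration of the last point, a heartbeat can be a small fixed-size binary message rather than a bare marker. The layout below is purely an assumption for this sketch (a 4-byte magic value, a sequence number, a frame number, and a send timestamp); it is not a format defined by the article or by any particular game.

```python
import struct

# Assumed layout, network byte order: magic (4 bytes), sequence number
# (uint32), frame number (uint32), send time (float64). 20 bytes in total.
HEARTBEAT_FORMAT = '!4sIId'
HEARTBEAT_MAGIC = b'PyHB'


def pack_heartbeat(seq, frame, timestamp):
    """Build a heartbeat message carrying a frame number and a timestamp."""
    return struct.pack(HEARTBEAT_FORMAT, HEARTBEAT_MAGIC, seq, frame, timestamp)


def unpack_heartbeat(data):
    """Return (seq, frame, timestamp), or None if data is not a heartbeat."""
    if len(data) != struct.calcsize(HEARTBEAT_FORMAT):
        return None
    magic, seq, frame, timestamp = struct.unpack(HEARTBEAT_FORMAT, data)
    if magic != HEARTBEAT_MAGIC:
        return None
    return seq, frame, timestamp


message = pack_heartbeat(seq=7, frame=1200, timestamp=1.5)
print(len(message))               # 20
print(unpack_heartbeat(message))  # (7, 1200, 1.5)
```

The receiver can use the sequence number to notice lost heartbeats and the timestamp to estimate latency, while the frame number serves the synchronization purpose mentioned above.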
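Conclusion
If you need to detect the state of a TCP connection promptly, heartbeat packets (HeartBeat) are still necessary.
Example:
Each computer periodically sends a UDP packet representing a heartbeat to the server. The server tracks the time elapsed since each computer's last heartbeat and reports the computers that have been silent for too long.
Client program: HeartbeatClient.py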
    """Heartbeat client: periodically sends a UDP packet."""
    import socket
    import time

    SERVER_IP = '192.168.0.15'
    SERVER_PORT = 43278
    BEAT_PERIOD = 5  # seconds between heartbeats

    print('Sending heartbeat to IP %s , port %d' % (SERVER_IP, SERVER_PORT))
    print('press Ctrl-C to stop')
    # UDP is connectionless, so one socket can be reused for every heartbeat
    hbSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        hbSocket.sendto(b'PyHB', (SERVER_IP, SERVER_PORT))
        if __debug__:
            print('Time: %s' % time.ctime())
        time.sleep(BEAT_PERIOD)
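The server program receives and tracks the heartbeats. The address of the computer it runs on must match SERVER_IP in the client program. The server must support concurrency, because heartbeats from different computers may arrive at the same time. A server can support concurrency in two ways: multithreading and asynchronous operation. Below is a multithreaded ThreadbearServer.py that uses only modules from the Python standard library: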
    """Multithreaded heartbeat server."""
    import socket
    import threading
    import time

    UDP_PORT = 43278
    CHECK_PERIOD = 20   # seconds between checks for silent clients
    CHECK_TIMEOUT = 15  # seconds of silence before a client is reported


    class Heartbeats(dict):
        """Manage shared heartbeats dictionary with thread locking."""

        def __init__(self):
            super().__init__()
            self._lock = threading.Lock()

        def __setitem__(self, key, value):
            """Create or update the dictionary entry for a client."""
            with self._lock:
                super().__setitem__(key, value)

        def getSilent(self):
            """Return a list of clients with heartbeat older than CHECK_TIMEOUT."""
            limit = time.time() - CHECK_TIMEOUT
            with self._lock:
                return [ip for (ip, ipTime) in self.items() if ipTime < limit]


    class Receiver(threading.Thread):
        """Receive UDP packets and log them in the heartbeats dictionary."""

        def __init__(self, goOnEvent, heartbeats, recSocket):
            super().__init__()
            self.goOnEvent = goOnEvent
            self.heartbeats = heartbeats
            self.recSocket = recSocket

        def run(self):
            while self.goOnEvent.is_set():
                try:
                    data, addr = self.recSocket.recvfrom(5)
                    if data == b'PyHB':
                        self.heartbeats[addr[0]] = time.time()
                except socket.timeout:
                    # No packet within CHECK_TIMEOUT; re-check goOnEvent
                    pass


    def main(num_receivers=3):
        receiverEvent = threading.Event()
        receiverEvent.set()
        heartbeats = Heartbeats()
        # Bind a single socket and share it among the receiver threads.
        # Binding one socket per thread to the same port typically fails
        # with "Address already in use".
        recSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        recSocket.settimeout(CHECK_TIMEOUT)
        recSocket.bind(('', UDP_PORT))
        receivers = []
        for i in range(num_receivers):
            receiver = Receiver(goOnEvent=receiverEvent, heartbeats=heartbeats,
                                recSocket=recSocket)
            receiver.start()
            receivers.append(receiver)
        print('Threaded heartbeat server listening on port %d' % UDP_PORT)
        print('press Ctrl-C to stop')
        try:
            while True:
                silent = heartbeats.getSilent()
                print('Silent clients: %s' % silent)
                time.sleep(CHECK_PERIOD)
        except KeyboardInterrupt:
            print('Exiting, please wait...')
            receiverEvent.clear()
            for receiver in receivers:
                receiver.join()
            recSocket.close()
            print('Finished.')


    if __name__ == '__main__':
        main()
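NB: When running this program you may see "socket.error: [Errno 98] Address already in use" (on Linux) or "socket.error: [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted" (on Windows); under Python 3 the same error is reported as OSError. For a fix, see the blog post "Solving the socket.error: [Errno 98] Address already in use problem".
As an alternative, below is the asynchronous AsyBeatserver.py, which draws on the power of Twisted: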
    """Asynchronous heartbeat server; run with: twistd -ny AsyBeatserver.py"""
    import time

    from twisted.application import internet, service
    from twisted.internet import protocol
    from twisted.python import log

    UDP_PORT = 43278
    CHECK_PERIOD = 20   # seconds between checks for silent clients
    CHECK_TIMEOUT = 15  # seconds of silence before a client is reported


    class Receiver(protocol.DatagramProtocol):
        """Receive UDP packets and log them in the "clients" dictionary."""

        def datagramReceived(self, data, addr):
            ip, port = addr
            if data == b'PyHB':
                self.callback(ip)


    class DetectorService(internet.TimerService):
        """Detect clients not sending heartbeats for too long."""

        def __init__(self):
            internet.TimerService.__init__(self, CHECK_PERIOD, self.detect)
            self.beats = {}

        def update(self, ip):
            self.beats[ip] = time.time()

        def detect(self):
            """Log a list of clients with heartbeat older than CHECK_TIMEOUT."""
            limit = time.time() - CHECK_TIMEOUT
            silent = [ip for (ip, ipTime) in self.beats.items() if ipTime < limit]
            log.msg('Silent clients: %s' % silent)


    application = service.Application('Heartbeat')
    # define and link the silent clients' detector service
    detectorSvc = DetectorService()
    detectorSvc.setServiceParent(application)
    # create an instance of the Receiver protocol, and give it the callback
    receiver = Receiver()
    receiver.callback = detectorSvc.update
    # define and link the UDP server service, passing the receiver in
    udpServer = internet.UDPServer(UDP_PORT, receiver)
    udpServer.setServiceParent(application)
    # each service is started automatically by Twisted at launch time
    log.msg('Asynchronous heartbeat server listening on port %d\n'
            'press Ctrl-C to stop\n' % UDP_PORT)
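If installing Twisted is not an option, the same asynchronous idea can be sketched with the standard library's asyncio module. This is only an illustrative variant under assumptions, not part of the original programs: the names `HeartbeatProtocol`, `silent_clients`, and `serve` are made up for the example, and the port and timing constants simply mirror the ones used above.

```python
import asyncio
import time

UDP_PORT = 43278
CHECK_PERIOD = 20   # seconds between checks for silent clients
CHECK_TIMEOUT = 15  # seconds of silence before a client is reported


class HeartbeatProtocol(asyncio.DatagramProtocol):
    """Record the arrival time of each 'PyHB' datagram, keyed by sender IP."""

    def __init__(self, beats):
        self.beats = beats

    def datagram_received(self, data, addr):
        if data == b'PyHB':
            self.beats[addr[0]] = time.time()


def silent_clients(beats, now, timeout=CHECK_TIMEOUT):
    """Return the IPs whose last heartbeat is older than `timeout` seconds."""
    limit = now - timeout
    return [ip for ip, beat_time in beats.items() if beat_time < limit]


async def serve(port=UDP_PORT, check_period=CHECK_PERIOD):
    """Listen for heartbeats and periodically print the silent clients."""
    beats = {}
    loop = asyncio.get_running_loop()
    transport, _ = await loop.create_datagram_endpoint(
        lambda: HeartbeatProtocol(beats), local_addr=('0.0.0.0', port))
    try:
        while True:
            await asyncio.sleep(check_period)
            print('Silent clients: %s' % silent_clients(beats, time.time()))
    finally:
        transport.close()
```

To start the server, call `asyncio.run(serve())`. Because everything runs in a single event loop thread, the shared dictionary needs no lock, which is the same simplification the Twisted version enjoys.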