2018-07-24
14/042,237
2013-09-30
US 10,033,837 B1
2018-07-24
Davoud Zand
LeClairRyan PLLC
2036-04-10
A system, medium and method of performing dictionary compression is disclosed. A first data segment in a data stream to be sent from a transmitter device (TD) to a receiver device (RD) is selected. A global bloom filter of the TD is queried to determine if the RD has a stored copy of a first plurality of content data bytes and corresponding first identifier and data length information for the first data segment. A first encoded data packet is prepared and sent which includes the first identifier and data length information without the first plurality of content data bytes. The RD utilizes the received first identifier and data length information to retrieve the first plurality of content data bytes associated with the first data segment from the RD's data store and decodes the first data segment to include the first plurality of content data bytes.
H04L69/04 » CPC main
Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass Protocols for data compression, e.g. ROHC
G06F15/16 IPC
Digital computers in general ; Data processing equipment in general Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
The present application claims the benefit of priority based on U.S. Provisional Patent Application Ser. No. 61/707,890, filed on Sep. 29, 2012, in the name of Saxon Amdahl, entitled “System and Method for Utilizing a Data Reducing Module for Dictionary Compression of Encoded Data”, which is hereby incorporated by reference in its entirety.
The present disclosure is directed to a system and method for utilizing a data reducing module for dictionary compression of encoded data.
Symmetric data deduplication technology is used between network traffic management devices that have web acceleration functionality. In particular, deduplication technology is used to reduce the amount of bandwidth consumed between the web accelerator devices across a wide area network, especially for repeated data transfers. Although deduplication technology works well, such existing technologies are subject to scaling challenges, especially in large mesh networks.
What is needed is a system and method that utilizes deduplication technology that is easily scalable for large mesh networks.
In an aspect, a method of performing dictionary compression utilizing identifier and data length information is disclosed. The method comprises handling, at a transmitter device (TD), a data stream to be transmitted over a network to a receiver device (RD), wherein the data stream includes a plurality of content data bytes. The method comprises selecting a first data segment in the data stream having a universal first identifier and data length information and a corresponding first plurality of content data bytes. The method comprises querying a global bloom filter of the TD to determine if the RD has a stored copy of the first plurality of content data bytes and the corresponding first identifier and data length information for the first data segment, wherein the global bloom filter indicates that the RD has the stored copy of the first identifier and data length information and first plurality of content data bytes for the first data segment. The method comprises preparing a first encoded data packet, wherein the first encoded data packet includes the first identifier and data length information without the first plurality of content data bytes for the first data segment. The method comprises sending the first encoded data packet over a network connection to the RD, wherein the RD utilizes the received first identifier and data length information to retrieve the first plurality of content data bytes associated with the first data segment from the RD's data store and decodes the first data segment to include the first plurality of content data bytes.
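The transmit and receive steps summarized above can be illustrated with a minimal Python sketch. It assumes a truncated SHA-256 digest as the identifier, a plain Python set of (identifier, length) pairs standing in for the global bloom filter (a set answers exactly, whereas a bloom filter is probabilistic), and a dictionary standing in for the RD's data store. The names `EncodedPacket`, `transmit` and `receive` are hypothetical and are not taken from the disclosure.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

Key = Tuple[bytes, int]  # (identifier, data length)


@dataclass(frozen=True)
class EncodedPacket:
    identifier: bytes
    length: int
    content: Optional[bytes] = None  # None means "reference only, no content bytes"


def make_identifier(segment: bytes) -> bytes:
    # Assumed stand-in for the universal identifier: a truncated SHA-256 digest.
    return hashlib.sha256(segment).digest()[:16]


def transmit(segment: bytes, rd_view: Set[Key]) -> EncodedPacket:
    """TD side: omit the content bytes when the RD appears to hold the segment."""
    ident = make_identifier(segment)
    if (ident, len(segment)) in rd_view:
        return EncodedPacket(ident, len(segment))
    return EncodedPacket(ident, len(segment), segment)


def receive(packet: EncodedPacket, rd_store: Dict[Key, bytes]) -> bytes:
    """RD side: rebuild a reference from the local store, or store a literal."""
    key = (packet.identifier, packet.length)
    if packet.content is None:
        return rd_store[key]
    rd_store[key] = packet.content
    return packet.content


if __name__ == "__main__":
    rd_store: Dict[Key, bytes] = {}
    rd_view: Set[Key] = set()
    segment = b"repeated payload " * 8

    first = transmit(segment, rd_view)        # RD not known to hold it: literal
    assert receive(first, rd_store) == segment
    rd_view.add((first.identifier, first.length))  # assumed update step on the TD

    second = transmit(segment, rd_view)       # now sent as identifier + length only
    assert second.content is None
    assert receive(second, rd_store) == segment
```

The first transfer of a segment carries its bytes; once the TD believes the RD holds the segment, later transfers shrink to the identifier and length.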
In an aspect, a processor readable medium having stored thereon instructions for utilizing universal identifier and data length information associated with content data bytes is disclosed. The medium comprises processor executable code which, when executed by at least one processor of a transmitting network device (TD), causes the TD to handle a data stream to be transmitted over a network to a receiver device (RD), wherein the data stream includes a plurality of content data bytes. The TD selects a first data segment in the data stream having an assigned first identifier and data length and a corresponding first plurality of content data bytes. The TD queries a global bloom filter to determine if the RD has a stored copy of the first plurality of content data bytes and the corresponding first identifier and data length for the first data segment, wherein the global bloom filter indicates that the RD has the stored copy of the first identifier and data length and first plurality of content data bytes for the first data segment. The TD prepares a first encoded data packet to send to the RD, wherein the first encoded data packet includes the first identifier and data length without the first plurality of content data bytes for the first data segment. The TD instructs its network interface to send the first encoded data packet over a network connection to the RD, wherein the RD utilizes the received first identifier and data length to retrieve the first plurality of content data bytes associated with the first data segment from the RD's data store and decodes the first data segment to include the first plurality of content data bytes.
In an aspect, a transmitter device (TD) configured to communicate compressed data packets to a receiver device (RD) over a network is disclosed. The TD comprises a network interface capable of transmitting compressed data packets over a network to one or more receiver devices (RD). The TD comprises a memory having stored thereon code embodying machine executable programmable instructions for utilizing a universal identifier and data length associated with content data bytes. The TD comprises a processor configured to execute the stored programming instructions in the memory, which when executed by the processor, causes the processor to handle a data stream to be transmitted over the network to the RD, wherein the data stream includes a plurality of content data bytes. The processor further selects a first data segment in the data stream having an assigned first identifier and data length and a corresponding first plurality of content data bytes. The processor further queries a global bloom filter of the TD to determine if the RD has a stored copy of the first plurality of content data bytes and the corresponding first identifier and data length for the first data segment, wherein the global bloom filter indicates that the RD has the stored copy of the first identifier and data length and first plurality of content data bytes for the first data segment. The processor further prepares a first encoded data packet to send to the RD, wherein the first encoded data packet includes the first identifier and data length without the first plurality of content data bytes for the first data segment. The processor further instructs the network interface to send the first encoded data packet over a network connection to the RD, wherein the RD utilizes the received first identifier and data length to retrieve the first plurality of content data bytes associated with the first data segment from the RD's data store and decodes the first data segment to include the first plurality of content data bytes.
FIG. 1 is a diagram of an example system environment utilizing a data reducer module in accordance with an aspect of the present disclosure;
FIG. 2A is a block diagram of a network traffic management device configured to implement the data reducer module in accordance with an aspect of the present disclosure;
FIG. 2B is a block diagram of the data reducer module in accordance with an aspect of the present disclosure;
FIG. 3 illustrates a flow chart representing the process of transmitting data packets in accordance with an aspect of the present disclosure;
FIG. 4 illustrates a flow chart representing the process of receiving data packets in accordance with an aspect of the present disclosure;
FIG. 5 illustrates a flow chart representing the process of handling variable length matches in accordance with an aspect of the present disclosure; and
FIG. 6 illustrates flow charts representing the process of migrating local bloom filters of other network devices into a network device's global bloom filter in accordance with an aspect of the present disclosure.
While these examples are susceptible of embodiments in many different forms, there is shown in the drawings and will herein be described in detail preferred examples with the understanding that the present disclosure is to be considered as an exemplification and is not intended to limit the broad aspect to the embodiments illustrated.
FIG. 1 is a diagram of an example system environment that includes a plurality of network traffic management devices in accordance with an aspect of the present disclosure. As shown in FIG. 1, the example system environment 100 employs a plurality of network devices comprising one or more client devices 106, one or more servers 102, and at least three network traffic management devices 109, 110, 111. As shown in FIG. 1, the example environment 100 includes one or more client-side network traffic management devices 109, 110 as well as one or more server-side network traffic management devices 111 which communicate with one another in relation to the present disclosure. The example system environment 100 may include other numbers and types of network devices in other arrangements.
The network traffic management devices 109 and 110 are coupled to the client devices 106 and the network traffic management device 111 is coupled to the servers 102 via a local area network (LAN) 104. Generally, communications sent over the network 108 between client devices 106 and servers 102 are received, handled and transmitted via the network traffic management devices 109, 110, 111.
Client devices 106 comprise network devices capable of connecting to other network devices, such as network traffic management device(s) 109-111 and/or servers 102. Such connections are performed over wired and/or wireless networks, such as network 108, to send and receive data, such as Web-based requests and responses to those requests, and/or to perform other tasks. Non-limiting and non-exhaustive examples of such client devices 106 include personal computers (e.g., desktops, laptops, tablets), mobile and/or smart phones, smart TVs, standalone media devices and the like. In an example, client devices 106 run Web browsers that may provide an interface through which operators, such as human users, can request resources from different web server-based applications or Web pages via the network 108, although other server resources may be requested by clients. One or more Web-based applications may run on the server 102 that provide the requested data back to one or more exterior network devices, such as client devices 106, in the form of responses.
The servers 102 comprise one or more server computing machines capable of operating one or more Web-based and/or non Web-based applications that may be accessed by network devices (e.g. client devices, network traffic management devices) over the network 108. The servers 102 may provide other data representing requested resources, such as particular Web page(s), image(s) of physical objects, and any other objects, responsive to requests from other network devices. It should be noted that the server 102 may perform other tasks and provide other types of resources. It should be noted that while only two servers 102 are shown in the environment 100 depicted in FIG. 1, greater or lesser numbers and types of servers may be implemented in the environment 100. It is also contemplated that one or more of the servers 102 may be a cluster of servers managed by one or more network traffic management devices 109-111.
It is to be understood that the one or more servers 102 may be hardware and/or software, and/or may represent a system with multiple servers that may include internal or external networks. In an aspect, the servers 102 may be any version of Microsoft® IIS servers or Apache® servers, although other types of servers may be used. In an aspect, the server 102 utilizes software that allows it to run the RADIUS (Remote Authentication Dial-In User Service) protocol, the DIAMETER protocol and the like to provide authentication, authorization, and accounting (AAA) services for dial-up PPP/IP, Mobile IP access and mobile telecommunications networks. Further, additional servers may be coupled to the network 108 and many different types of applications may be available on servers coupled to the network 108.
As shown in the example environment 100 depicted in FIG. 1, one or more network traffic management devices 109, 110 are interposed between client devices 106 and the network 108, whereas one or more network traffic management devices 111 are interposed, via LAN 104, between the network 108 and the servers 102. Again, the environment 100 could be arranged in other manners with other numbers and types of network devices. It should be understood that the devices and the particular configuration shown in FIG. 1 are provided for exemplary purposes only and thus are not limiting.
Generally, the network traffic management devices 109-111 perform web acceleration functions, but can also perform other functions. Such other functions include, but are not limited to, managing network communications between one another as well as between the client devices 106 and the servers 102, load balancing, access control, validating HTTP requests using JavaScript code and the like. Client requests may be destined for one or more servers 102, and may take the form of one or more data packets over the network 108. The requests pass through one or more intermediate network devices and/or intermediate networks, until they ultimately reach the one or more network traffic management devices 109-111.
Network 108 comprises a publicly accessible network, such as the Internet; however, it is contemplated that the network 108 may comprise other types of private and public networks. Communications, such as requests from clients 106 and responses from servers 102, take place over the network 108 according to standard network protocols, such as the HTTP and TCP/IP protocols in this example. As per the TCP/IP protocols, communications between the client device 106 and the server(s) 102 may be sent as one or more streams of data packets over the network 108, via one or more network traffic management devices 109-111. It is also contemplated that streams of data packets may be sent among network traffic management devices. Such protocols can be used by the network devices to establish connections, send and receive data for existing connections, and the like. However, the principles discussed herein are not limited to this example and can include other protocols.
Further, it should be appreciated that network 108 may include local area networks (LANs), wide area networks (WANs), direct connections and any combination thereof, as well as other types and numbers of network types. On an interconnected set of LANs or other networks, including those based on differing architectures and protocols, routers, switches, hubs, gateways, bridges, and other intermediate network devices may act as links within and between LANs and other networks to enable messages and other data to be sent from and to network devices. Also, communication links within and between LANs and other networks typically include twisted wire pair (e.g., Ethernet), coaxial cable, analog telephone lines, full or fractional dedicated digital lines including T1, T2, T3, and T4, Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links and other communications links known to those skilled in the relevant arts. In essence, the network 108 includes any communication method by which data may travel between client devices 106, servers 102 and network traffic management devices 109-111.
LAN 104 comprises a private local area network that allows communications between the network traffic management device 111 and the one or more servers 102, although the LAN 104 may comprise other types of private and public networks with other devices. Local area networks, which are well understood by those skilled in the relevant arts, have already been generally described above in connection with network 108 and thus will not be described further.
FIG. 2A is a block diagram of a network traffic management device shown in FIG. 1 in accordance with the present disclosure. FIG. 2B is a block diagram of the data reduction module of the network traffic management device in accordance with an aspect of the present disclosure. Referring now to FIG. 2A, an example network traffic management device 109-111 includes one or more device processors 200, one or more device I/O interfaces 202, one or more network interfaces 204 and one or more device memories 206, which are coupled together by bus 208. As shown in FIG. 2A, the network traffic management devices 109-111 include a data reduction module 210 stored in the memory 206, although the data reduction module 210 may alternatively be stored elsewhere within or external to the network traffic management device 109-111. It should be noted that the network traffic management device could include other types and/or numbers of components and is thus not limited to the example shown in FIG. 2A.
Device processor 200 comprises one or more microprocessors or cores configured to execute computer/machine readable and executable instructions stored in device memory 206 or elsewhere. Such instructions, when executed by one or more processors, implement network traffic management related functions of the network traffic management device 109-111. In addition, the instructions of the data reduction module 210, when executed by one or more processors, cause the processor 200 to perform one or more portions of the novel processes described below. The processor 200 may comprise other types and/or combinations of processors, such as digital signal processors, micro-controllers, application specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”), field programmable logic devices (“FPLDs”), field programmable gate arrays (“FPGAs”), and the like.
Device I/O interfaces 202 comprise one or more user input and output device interface mechanisms. The interface may include a computer keyboard, mouse, touchscreen display device, and the corresponding physical ports and underlying supporting hardware and software to enable the network traffic management device 109-111 to communicate with the outside environment. Such communication may include accepting user data input and providing user output, although other types and numbers of user input and output devices may be used.
Network interface 204 comprises one or more mechanisms that enable the one or more network traffic management devices 109-111 to engage in network communications over the LAN 104 and the network 108 using one or more desired protocols (e.g. TCP/IP, UDP, HTTP, RADIUS, DNS). However, it is contemplated that the network interface 204 may be constructed for use with other communication protocols and types of networks. Network interface 204 is sometimes referred to as a transceiver, transceiving device, or network interface card (NIC), which transmits and receives network data packets to one or more networks, such as LAN 104 and network 108. In an example where the one or more network traffic management devices 109-111 include more than one device processor 200 (or a processor 200 has more than one core), each processor 200 (and/or core) may use the same single network interface 204 or a plurality of network interfaces 204. Further, the network interface 204 may include one or more physical ports, such as Ethernet ports, to couple the one or more network traffic management devices 109-111 with other network devices, such as servers 102/client devices 106. Moreover, the interface 204 may include certain physical ports dedicated to receiving and/or transmitting certain types of network data, such as device management related data for configuring the one or more network traffic management devices 109-111 and/or client request/server response related data.
Bus 208 may comprise one or more internal device component communication buses, links, bridges and supporting components, such as bus controllers and/or arbiters. The bus enables the various components of the one or more network traffic management devices 109-111, such as the processor 200, device I/O interfaces 202, network interface 204, and device memory 206, to communicate with one another. However, it is contemplated that the bus may enable one or more components of the one or more network traffic management devices 109-111 to communicate with components in other devices as well. Example buses include HyperTransport, PCI, PCI Express, InfiniBand, USB, FireWire, Serial ATA (SATA), SCSI, IDE and AGP buses. However, it is contemplated that other types and numbers of buses may be used, whereby the particular types and arrangement of buses will depend on the particular configuration of the one or more network traffic management devices 109-111.
Device memory 206 comprises non-transitory computer readable media, namely non-transitory computer readable or processor readable storage media, which are examples of machine-readable storage media. Computer readable storage/machine-readable storage media may include volatile, nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information. Such storage media include computer readable/machine-executable instructions, data structures, program modules, or other data, which may be obtained and/or executed by one or more processors, such as device processor 200. Such instructions, when executed by one or more processors, cause or allow the network traffic management device 109-111 to perform actions including implementing an operating system for controlling general operations, managing network traffic, implementing the data reduction module 210, and performing the processes described below in accordance with the present disclosure. Examples of non-transitory computer readable storage media include one or more types of RAM, BIOS, ROM, EEPROM, flash/firmware memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the information.
The data reduction module 210 is depicted in FIG. 2A as being within memory 206 for exemplary purposes only; it should be appreciated that the module 210 may alternatively be located elsewhere. For instance, the data reduction module 210 may be located and executed on a client device 106 or server 102, as opposed to the network traffic management devices 109-111, in accordance with an aspect of the present disclosure. Generally, instructions embodying the data reduction module 210 are executed by the device processor 200 to carry out the process described herein. With regard to the discussion below, the environment utilizes one or more network traffic management devices which transmit (“transmitting device”) encoded compressed data to one or more receiving network traffic management devices (“receiving device”) over an established TCP/IP (or other appropriate network protocol) connection. With respect to the environment shown in FIG. 1, the transmitting device is described herein as network traffic management device 111 whereas the receiving device is described herein as network traffic management device 110. It should be noted, however, that network traffic management device 110 may be considered the transmitting device whereas network traffic management device 111 may be considered the receiving device. Network traffic management device 109 can also perform the functions associated with transmitting compressed encoded data and/or receiving compressed encoded data in accordance with the present disclosure.
FIG. 2B illustrates a block diagram of the data reduction module of the network traffic management device in accordance with an aspect of the present disclosure. As shown in FIG. 2B, the data reduction module 210 includes one or more software components including an identifier generator 212, an index component 214 including a first index component 214A and a second index component 214B, a local bloom filter 216, a global bloom filter 218, and a data store 220. It should be noted that the above identified components are defined for explanation purposes, and these components of the data reduction module 210 need not be organized into discrete, individual components for actual implementation purposes.
In an aspect, the network traffic management devices 110, 111 utilize the data reduction module 210 to compress or decompress encoded data packets using a dictionary compression technique in which a universal identifier and data length are sent from a transmitting device to a receiving device. The identifier represents a virtual address; the virtual address together with the data length is used to encode, and thereby replace, the bytes of the associated content data. The receiver device determines that it has a local copy of the content data when it finds a matching identifier and data length in its local index.
In particular, a chunk of data that is to be encoded and sent from one network traffic management device, say device 111, to another network traffic management device, say device 110, is first processed by the identifier generator 212 of the transmitter 111. In an aspect, the identifier generator 212 determines selective length segments of repetitive and/or redundant data for use in compressing the data into discrete data structures. The identifier generator 212 examines potential starting positions within the input data and selects a series of bytes of data as a candidate input matching data segment for data encoding. The selected series of bytes of data may be determined utilizing a best fitness function within a sliding window to identify beginning and ending boundaries for the data segment. In other words, the fitness function evaluates at least a portion of the series of bytes for a repeating pattern that may be present in other data streams. Upon identifying an appropriate byte pattern, the identifier generator 212 generates an encoded representation of that byte pattern, which is described herein as the universal identifier. In an aspect, the identifier is a strong identifier in that it has a relatively low probability of representing another byte pattern. The identifier generator 212 performs this process on the data stream to be transmitted, whereby portions of the data stream are converted into a plurality of data segments, each of which has a corresponding unique generated identifier. Details of the identifier generation process and fitness function technique are described in U.S. Pat. No. 7,882,084 entitled, “Compression of Data Transmitted Over a Network” owned by F5 Networks, Inc.
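The fitness function of U.S. Pat. No. 7,882,084 is not reproduced here, so the following Python sketch swaps in a different, commonly used technique, content-defined chunking with a Rabin-Karp style rolling hash, purely to illustrate how a sliding window can choose segment boundaries from the data itself and how each resulting segment can be paired with a strong identifier and a length. The window width, boundary mask, minimum and maximum segment lengths, and the truncated SHA-256 identifier are all assumptions chosen for illustration.

```python
import hashlib
from typing import List, Tuple

WINDOW = 16                # sliding-window width in bytes (assumed)
BASE = 257                 # polynomial base for the rolling hash (assumed)
MOD = (1 << 31) - 1        # modulus keeping the hash bounded (assumed)
MASK = (1 << 8) - 1        # cut when the low 8 hash bits are zero (assumed)
MIN_LEN, MAX_LEN = 64, 4096


def _segment(seg: bytes) -> Tuple[bytes, int, bytes]:
    # Strong identifier (truncated SHA-256), data length, and content bytes.
    return hashlib.sha256(seg).digest()[:16], len(seg), seg


def chunk(data: bytes) -> List[Tuple[bytes, int, bytes]]:
    """Split data at content-defined boundaries chosen by a rolling hash."""
    out = []
    start = 0
    h = 0
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # slide: remove oldest byte
        h = (h * BASE + b) % MOD                     # slide: add newest byte
        length = i - start + 1
        if (length >= MIN_LEN and (h & MASK) == 0) or length >= MAX_LEN:
            out.append(_segment(data[start:i + 1]))
            start, h = i + 1, 0                      # restart the window after a cut
    if start < len(data):
        out.append(_segment(data[start:]))           # trailing partial segment
    return out
```

Because a cut depends only on the last `WINDOW` bytes (once a segment is at least `MIN_LEN` long), identical runs of content tend to produce identical segments and identifiers even when they appear at different offsets in different streams.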
Along with the identifier, the identifier generator 212 also determines a length value which indicates the length of the content data associated with the corresponding identifier. As will be discussed below, the generated identifiers, as well as their associated offset and content data information, are shared among communicating network traffic management devices 110, 111 to achieve efficient encoding.
In an aspect, the data reduction module 210 stores the content data bytes of a particular data segment in the data store 220. Additionally, the data reduction module 210 stores the generated identifier and byte length for the corresponding content data in one or more index components 214. The index component 214 of the network traffic management device 109-111 allows lookup functions to be performed with respect to the content data stored in the data store 220.
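The relationship between the data store 220 and the index component 214 can be sketched as follows, assuming a single in-memory byte array for the store and a dictionary that maps each identifier to an (offset, length) pair for the index. The class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class DataStore:
    """Sequential content store plus an identifier index (hypothetical names)."""

    def __init__(self) -> None:
        self._bytes = bytearray()                       # plays the role of data store 220
        self._index: Dict[bytes, Tuple[int, int]] = {}  # identifier -> (offset, length)

    def put(self, identifier: bytes, content: bytes) -> None:
        if identifier in self._index:
            return                                      # already stored; keep the first copy
        offset = len(self._bytes)
        self._bytes.extend(content)                     # append in arrival order
        self._index[identifier] = (offset, len(content))

    def get(self, identifier: bytes, length: int) -> Optional[bytes]:
        entry = self._index.get(identifier)
        if entry is None or entry[1] != length:
            return None                                 # no copy matching id and length
        offset, n = entry
        return bytes(self._bytes[offset:offset + n])
```

A lookup succeeds only when both the identifier and the data length match, mirroring the receiver-side check described for the local index.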
In an aspect, the index component 214 may be configured as a first index component 214A and a second index component 214B. In this aspect, the first index component 214A contains a table of entries, in which each entry contains an identifier and its storage location information, namely an index that provides the offset of the stored data for that identifier along with its data length information. In this aspect, the first index component may be housed and executed in local RAM, whereas the second index component may be housed and executed on a separate hardware memory that is not in the local RAM of the device 109-111. It should be noted that the first and second index components are exemplary and it is contemplated that the functions of the first and second index components can be incorporated into one data structure.
The data store 220 is configured to store the content data bytes of each compressed data segment. In an aspect, the bytes of the data segments are stored sequentially in the data store 220 in the same order in which they are received in the data stream. In an aspect, the data store 220 is within the network traffic management device, although it is contemplated that the data store may be located remotely from the network traffic management device. As will be described, portions of the data store are reserved based on the size of the encoded content, although the location(s) in the data store where the data is stored need not be pre-established. In an aspect, at least a portion of the data store 220 is configured to operate in RAM, as opposed to long-term memory. In an aspect, the data store 220 may be configured to have a plurality of segments, wherein each segment has an allocated size that allows storage of a corresponding number of bytes. In such an aspect, the segments have a size of 1 MB, although other allocated sizes of greater or lesser amount are contemplated. Additionally, more than one index (e.g., primary and secondary index values) can be used and stored by the data store 220.
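The segmented layout described above can be sketched in Python, assuming that content bytes are appended in arrival order, that a new fixed-size segment is reserved only when the current one is full, and that a stored run may span a segment boundary. The 1 MB default comes from the text; the class name and the spanning behavior are assumptions.

```python
SEGMENT_SIZE = 1 << 20  # 1 MB, the example segment size given in the text


class SegmentedStore:
    """Append-only store split into fixed-size segments (hypothetical design)."""

    def __init__(self, segment_size: int = SEGMENT_SIZE) -> None:
        self.segment_size = segment_size
        self.segments = []   # list of bytearray, each at most segment_size long
        self.used = 0        # total bytes written, i.e. the next logical offset

    def append(self, data: bytes) -> int:
        """Write bytes sequentially and return their logical offset."""
        offset = self.used
        view = memoryview(data)
        while len(view) > 0:
            if not self.segments or len(self.segments[-1]) == self.segment_size:
                self.segments.append(bytearray())  # reserve a new segment when full
            seg = self.segments[-1]
            room = self.segment_size - len(seg)
            seg.extend(view[:room])
            view = view[room:]
        self.used += len(data)
        return offset

    def read(self, offset: int, length: int) -> bytes:
        """Reassemble `length` bytes starting at a logical offset."""
        if offset < 0 or offset + length > self.used:
            raise ValueError("read outside stored data")
        out = bytearray()
        while length > 0:
            seg_no, pos = divmod(offset, self.segment_size)
            piece = self.segments[seg_no][pos:pos + length]
            out.extend(piece)
            offset += len(piece)
            length -= len(piece)
        return bytes(out)
```

Because every segment except the last is full, a logical offset maps to a segment and position with a single `divmod`.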
As will be discussed in more detail below, the data reduction module 210 in the transmitter device 111 analyzes the data stream and identifies data segments whose content data bytes match data segments that already have associated identifier and data length information. The transmitter device 111 replaces those content data bytes with the generated identifier and data length information to compress the data stream. In particular, the data reduction module 210 of the transmitter 111 writes the identifier and data length information, along with the ID of the receiving device 110, into a compressed data packet to ensure proper delivery of the compressed data packet to the receiver 110.
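To make the saving concrete, the following sketch packs a reference into a hypothetical fixed layout with Python's standard `struct` module: a 4-byte receiver ID, a 16-byte identifier and a 4-byte data length, in network byte order. The disclosure does not specify a wire format, so the field widths and order here are assumptions.

```python
import struct

# Hypothetical wire layout (network byte order, no padding):
#   receiver_id: 4-byte unsigned int
#   identifier:  16 bytes
#   length:      4-byte unsigned int
REF_FORMAT = "!I16sI"
REF_SIZE = struct.calcsize(REF_FORMAT)  # 24 bytes


def encode_reference(receiver_id: int, identifier: bytes, length: int) -> bytes:
    """Build a reference record that replaces the segment's content bytes."""
    if len(identifier) != 16:
        raise ValueError("identifier must be 16 bytes in this sketch")
    return struct.pack(REF_FORMAT, receiver_id, identifier, length)


def decode_reference(packet: bytes):
    """Return (receiver_id, identifier, length) from a reference record."""
    return struct.unpack(REF_FORMAT, packet[:REF_SIZE])
```

Under this assumed layout a repeated segment of any size is replaced by a 24-byte record, so a 4096-byte segment, for example, shrinks to 24 bytes on the wire.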
As shown in FIG. 2B, the network traffic management device 110, 111 includes a local bloom filter 216 and a global bloom filter 218. In particular, the local bloom filter 216 is a data structure in the network traffic management device which informs the network traffic management device whether the identifier and data length information and the associated content data of a particular data segment are stored in that network traffic management device's data store 220. In an aspect, the network traffic management device's local bloom filter 216 will set one or more bits to a true or false value based on whether the identifier and data length information as well as the actual content data are stored in that network traffic management device's data store 220. As will be discussed in more detail below, the local bloom filter 216 of a particular network traffic management device may be shared with one or more other network traffic management devices to provide a global or world view of the identifier and data length information stored in that device.
The global bloom filter 218 is a data structure in the network traffic management device which keeps track of whether one or more other network traffic management devices have a copy of the identifier and data length information, as well as the corresponding content data, for a particular data segment in their own data stores 220. If a network traffic management device (say device 111) determines that another network traffic management device (say device 110) has a locally stored copy of the identifier and data length information of a particular data segment, device 111 updates its global bloom filter 218 by setting the one or more bits assigned to device 110 for that data segment to a true value. In contrast, if it is determined that device 110 does not have a copy of the identifier and data length information for that particular data segment, device's 111 global bloom filter 218 will have the one or more bits set to a false value. Additionally, one or more of the bits may be set as a result of a hash collision with a different data segment, so a true result from the bloom filter indicates only that device 110 probably has the data, whereas a false result is definitive.
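As a hedged illustration of the local and global bloom filters, the sketch below keeps one small bloom filter per peer on the transmitter side. The filter size `m`, the hash count `k`, the SHA-256-derived bit positions, the `"identifier:length"` key format and all class and method names are assumptions made for the example.

```python
import hashlib
from typing import Dict, Iterator


class BloomFilter:
    """Minimal bloom filter (hypothetical sizing: m bits, k hash functions)."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key: str) -> Iterator[int]:
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # True can be a false positive (colliding keys); False is always correct.
        return all(self.bits[pos] for pos in self._positions(key))


class GlobalBloomFilter:
    """Transmitter-side view: a separate set of bits for each peer device."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m, self.k = m, k
        self.per_peer: Dict[str, BloomFilter] = {}

    def mark_stored(self, peer: str, identifier: int, length: int) -> None:
        # Called when a "store" message or filter update says the peer holds the segment.
        self.per_peer.setdefault(peer, BloomFilter(self.m, self.k)).add(f"{identifier}:{length}")

    def peer_has(self, peer: str, identifier: int, length: int) -> bool:
        bf = self.per_peer.get(peer)
        return bf is not None and bf.might_contain(f"{identifier}:{length}")
```

Keying the filter on both the identifier and the data length mirrors the text, which always pairs the two pieces of information.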
As mentioned above, the receiving device 110 provides information updates to the transmitting device 111 to notify the transmitting device 111 that the identifier and data length information for a particular data segment has been received and stored locally in the receiving device's 110 index components. In one aspect, the receiver's 110 data reduction module 210 sends a “store” message to the transmitting device 111, in which the “store” message identifies the receiving device 110 and notifies the transmitter 111 that the identifier and data length information were successfully processed by the receiving device 110. The “store” message is then stored in the transmitter's cache, and the transmitter's global bloom filter 218 is updated to indicate that device 110 has the identifier and data length information.
In another aspect, the transmitter device 111 receives update information from the receiver device 110 in the form of a bloom filter update, wherein a copy of the receiver's 110 local bloom filter 216 is sent to the transmitting device 111 and merged with the transmitting device's 111 global bloom filter 218. This is discussed below in relation to FIG. 6.
In an aspect, the data reduction module 210 in the transmitter device 111 marks, reserves, pins or otherwise locks the identifier and data length information for a compressed data segment it has sent until the transmitter device 111 receives an acknowledgement message from the receiver 110, indicating that the identifier and data length information has been processed. This prevents the locked identifier and data length information from being erased, deleted or overwritten in the transmitter 111 until the transmitter 111 receives the acknowledgement message. Once the acknowledgement message is received at the transmitter device 111, its data reduction module 210 will unlock or otherwise remove the reservation placed on the data segment's identifier and data length information, thereby allowing such information to be deleted, erased or overwritten.
In another aspect, the transmitter 111 stores confirmation information, and/or information regarding outstanding references, for one or more data packets locally and uses the information to perform reference accounting, thereby allowing network traffic management devices 110, 111 to maintain status information of the sent encoded data packets even in the event of a connection failure or termination between the receiver 110 and transmitter 111. In other words, segments are pinned by the flow while references to them are outstanding, which prevents referenced data from being pruned. In particular, each generated reference increments the reference count for a segment, and each ACK received on the connection decrements it. If the flow ends, all of the outstanding references in the flow are removed, and if one or more old references remain, the device sends a probe to determine whether the peer failed to send an ACK message.
In this example aspect, the data reduction module 210 of the transmitter 111 writes confirmation information into the header of the encoded data packet being sent to the receiver 110, the confirmation information identifying the flow, the particular data segment, and an unconfirmed value representing the number of content data bytes which are locked for that data segment. In this aspect, the transmitter's 111 data reduction module 210 also assigns the same unconfirmed value to the data segment in its data store 220. For each content data byte that is successfully decoded and written at the receiver 110, the receiver's 110 data reduction module 210 updates a confirm value for the corresponding data segment, which is then written in the header of the response sent over the network back to the transmitter 111. Upon processing the header, the transmitter 111 identifies the corresponding data segment and decrements the associated unconfirmed value for each ACK message received from the receiver 110.
For example, for a data segment A containing 3 content data bytes, the receiver 110 will set the confirm value in the header to 3 for data segment A. Upon receiving the confirmation message, the transmitter's 111 data reduction module 210 processes the header to identify the data segments and their confirm values. For each content data byte confirmed in the confirmation message, the transmitter's 111 data reduction module 210 decrements the unconfirmed value in its data store 220 upon receiving an ACK message from the receiver 110, so that the unconfirmed value for data segment A returns to 0.
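The reference accounting described above can be sketched as a per-flow counter table. This is a minimal sketch under assumed names (`ReferenceAccounting`, `on_reference_sent`, `on_ack`, `on_flow_end`, `can_prune`); the probe for stale references is not modeled.

```python
from collections import defaultdict


class ReferenceAccounting:
    """Hypothetical per-flow reference counts that pin segments against pruning."""

    def __init__(self) -> None:
        self.refcount = defaultdict(int)                         # segment -> outstanding refs
        self.flow_refs = defaultdict(lambda: defaultdict(int))   # flow -> segment -> refs

    def on_reference_sent(self, flow: str, segment: int) -> None:
        # Each generated reference adds to the segment's reference count.
        self.refcount[segment] += 1
        self.flow_refs[flow][segment] += 1

    def on_ack(self, flow: str, segment: int) -> None:
        # Each ACK received on the connection decrements the count.
        if self.flow_refs[flow][segment] > 0:
            self.flow_refs[flow][segment] -= 1
            self.refcount[segment] -= 1

    def on_flow_end(self, flow: str) -> None:
        # When the flow ends, drop every reference still outstanding on it.
        for segment, n in self.flow_refs.pop(flow, {}).items():
            self.refcount[segment] -= n

    def can_prune(self, segment: int) -> bool:
        # A segment is pinned while any reference to it is outstanding.
        return self.refcount[segment] == 0
```

Tracking references per flow, rather than only per segment, is what allows a terminated connection to release its pins without waiting for ACKs that will never arrive.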
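The unconfirmed and confirm values from the data segment A example can be modeled as simple counters. The header layout (a dictionary with `flow`, `segment`, `unconfirmed` and `confirm` keys), the class name and the helper function are hypothetical; the sketch applies a whole confirm value at once rather than one ACK at a time.

```python
from typing import Dict


class UnconfirmedTracker:
    """Hypothetical transmitter-side record of locked, not-yet-confirmed content bytes."""

    def __init__(self) -> None:
        self.unconfirmed: Dict[str, int] = {}

    def on_send(self, flow: str, segment: str, num_bytes: int) -> dict:
        # Lock num_bytes for the segment and describe them in the outgoing header.
        self.unconfirmed[segment] = self.unconfirmed.get(segment, 0) + num_bytes
        return {"flow": flow, "segment": segment, "unconfirmed": num_bytes}

    def on_confirm(self, header: dict) -> bool:
        # Decrement by the receiver's confirm value; True means the segment may be unlocked.
        segment = header["segment"]
        remaining = self.unconfirmed.get(segment, 0) - header["confirm"]
        if remaining < 0:
            raise ValueError("more bytes confirmed than were sent")
        self.unconfirmed[segment] = remaining
        return remaining == 0


def receiver_confirm_header(flow: str, segment: str, decoded_bytes: int) -> dict:
    # Receiver side: report how many content bytes were decoded and written.
    return {"flow": flow, "segment": segment, "confirm": decoded_bytes}
```

For segment A, sending 3 bytes sets the unconfirmed value to 3, and a response header carrying a confirm value of 3 brings it back to 0, at which point the segment can be unlocked.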
FIG. 3 illustrates a flow chart representing the process of transmitting compressed encoded data packets in accordance with an aspect of the present disclosure. The process is performed in a network traffic management device 111 which is preparing to transmit compressed data segments to a receiving network traffic management device 110. As shown in FIG. 3, a series of bytes of a data stream that is to be transmitted by the transmitting network traffic management device 111 is received by the transmitter's 111 data reduction module 210 (Block 300). In an aspect, the series of bytes is of a length that fills a buffer (not shown) in the transmitting device 111 that is to be sent to the receiving device 110. The data stream comprises a plurality of data segments, wherein each data segment comprises a plurality of content data bytes.
In an aspect, the data reduction module 210 of the transmitting device 111 receives a data segment in the data stream and analyzes it to identify the unique identifier assigned to it (Block 302). Upon identifying that data segment's identifier, the data reduction module 210 of the transmitting device 111 queries its own global bloom filter 218 to determine whether the receiver device 110 has a locally stored copy of the identifier and data length information for the selected data segment (Block 304). As mentioned above, the receiving device 110 communicates status updates to the transmitting device 111, which are then stored in the transmitter's 111 global bloom filter 218. Because the global bloom filter 218 holds this information, the data reduction module 210 of the transmitter device 111 can decide whether only the identifier and data length information (as opposed to the actual content data as well) needs to be sent to the receiver device 110 for the selected data segment.
In response to the query in Block 304, if the transmitting device's 111 global bloom filter 218 indicates that the receiving device 110 has a stored copy of the identifier and data length information for the selected data segment, the process proceeds to Block 306A. In contrast, if the transmitting device's 111 global bloom filter 218 indicates that the receiving device 110 does not have a stored copy of the identifier and data length information for the selected data segment, the process proceeds to Block 306B.
Referring to Block 306A, the data reduction module 210 in the transmitter device 111 determines whether a local index indicates that the identifier and data length information is stored locally in the transmitter device 111. If the transmitter device's 111 index component 214 indicates that there is a matching entry for the queried identifier and data length information, the process proceeds directly to Block 310A.
In contrast, if the transmitter device's 111 index component 214 indicates that there is no matching entry of the identifier and data length information for the data segment, the process proceeds to Block 308A. Accordingly, the transmitter's 111 data reduction module 210 stores the identifier and data length information in the transmitter device's 111 index component 214 and stores the corresponding content data bytes for the data segment in its data store 220 (Block 308A). In an aspect, the transmitter's data reduction module 210 also updates its local bloom filter 216 to indicate that it now has a stored copy of the content data in the data store 220. The process then proceeds to Block 310A.
As shown in FIG. 3, the data reduction module 210, in an aspect, marks, reserves or otherwise locks the content data bytes and associated identifier and data length information for each data segment stored in the transmitter device 111 (Block 310A). As shown in later steps in FIG. 3, the identifier and data length information and content data for the sent data segment remain locked or reserved until the data reduction module 210 in the transmitter device 111 receives an acknowledgement message from the receiver device 110. Also in Block 310A, the data reduction module 210 writes only the identifier and data length information for the selected data segment into one or more data packets as compressed data packet(s) to be transmitted to the receiving device 110.
In encoding the data segment, the data reduction module 210 does not include the actual content data associated with the selected data segment, but provides only the identifier and data length information associated with it. Since the transmitter's 111 global bloom filter 218 indicated in Block 304 that the receiver 110 has a copy of the identifier and data length information as well as the content data for the selected data segment, the receiver's 110 data reduction module 210 should be able to use only the identifier and data length information to locate and retrieve the associated content data from its own data store 220 and decode the data segment accordingly (FIG. 4). Because a bloom filter can return a false positive, the receiver 110 may occasionally lack the content data, in which case it requests a resend as described with regard to FIG. 4.
In an aspect, since the data being sent from the transmitter 111 is compressed, the transmitter's 111 data reduction module 210 may add one or more bits to the data packet indicating a “hit”, and writes the identifier, the length and, optionally, an offset into the data packet. In an aspect, the data reduction module 210 repeats this process, selecting and analyzing additional data segments until the transmitter device's 111 buffer is filled to capacity. The transmitter device 111 then sends a data stream comprising a plurality of data packets containing encoded and/or unencoded data segments to the receiver device 110 over the network (Block 312A).
As mentioned above, the identifier and data length information and content data for the sent data segment remain locked or reserved in the transmitter device 111 until it receives an acknowledgement message from the receiver device 110. Once the acknowledgement message is received from the receiver device 110 (Block 314), the transmitter's 111 data reduction module 210 unlocks or otherwise removes the reservation placed on the data segment's identifier and data length information and content data (Block 316), thereby allowing such information to be deleted or overwritten.
Referring back to Block 304, if the transmitting device's 111 global bloom filter 218 indicates that the receiving device 110 does not have the identifier and data length information and a stored copy of the corresponding content data bytes for the selected data segment, the process proceeds to Block 306B.
In Block 306B, the transmitter device 111 determines whether a local index indicates that the identifier and data length information is stored locally in the transmitter device 111, as described and illustrated earlier with reference to Block 306A. If the transmitter's 111 index component 214 finds a match for the queried identifier and data length information for the data segment, the process proceeds to Block 310B, as discussed in more detail below. In contrast, if the transmitter's 111 index component 214 and/or local bloom filter 216 does not find a match for the queried identifier and data length information for the data segment, the data reduction module 210 stores the content data bytes corresponding to the data segment in the data store 220. The transmitter's 111 data reduction module 210 also adds the identifier and data length information to the local index component 214 and updates its local bloom filter 216 (Block 308B).
From Block 306B or 308B, the data reduction module 210 then writes the identifier and data length information and the content data bytes for the selected data segment (i.e., an uncompressed data segment) into a data packet (Block 310B). In an aspect, since the data is an uncompressed data segment, the transmitter's 111 data reduction module 210 may add one or more bits to the data packet indicating a “miss”. The uncompressed data segment is then sent to the receiver device 110 (Block 312B).
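To make the hit encoding concrete, the following sketch packs a record with Python's standard `struct` module. The wire layout (one flag byte, an 8-byte identifier, a 4-byte length and an optional 4-byte offset, all in network byte order), the flag values and the function names are assumptions for illustration; the disclosure does not specify field sizes.

```python
import struct
from typing import Optional

HIT, HAS_OFFSET = 0x01, 0x02   # hypothetical flag bits
HEADER = "!BQI"                # flags, identifier, length (13 bytes, no padding)


def encode_hit(identifier: int, length: int, offset: Optional[int] = None) -> bytes:
    # A "hit" carries only the identifier, length and optional offset.
    flags = HIT | (HAS_OFFSET if offset is not None else 0)
    record = struct.pack(HEADER, flags, identifier, length)
    if offset is not None:
        record += struct.pack("!I", offset)
    return record


def encode_miss(identifier: int, content: bytes) -> bytes:
    # A "miss" carries the content bytes after the same header.
    return struct.pack(HEADER, 0, identifier, len(content)) + content


def decode(record: bytes) -> dict:
    flags, identifier, length = struct.unpack_from(HEADER, record, 0)
    pos = struct.calcsize(HEADER)
    result = {"hit": bool(flags & HIT), "identifier": identifier,
              "length": length, "offset": None, "content": None}
    if flags & HIT:
        if flags & HAS_OFFSET:
            (result["offset"],) = struct.unpack_from("!I", record, pos)
    else:
        result["content"] = record[pos:pos + length]
    return result
```

Under this assumed layout, a hit costs 13 bytes (17 with an offset) regardless of how many content bytes it stands for, which is where the bandwidth saving comes from.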
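The branch structure of FIG. 3 can be summarized in a short Python sketch. Python sets stand in for the bloom filters (so, unlike real bloom filters, they never return false positives), a dictionary stands in for the index component 214 and data store 220, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Key = Tuple[int, int]  # (identifier, data length)


@dataclass
class Transmitter:
    """Hypothetical transmitter: sets stand in for bloom filters 216/218, a dict for 214/220."""
    global_view: Dict[str, Set[Key]] = field(default_factory=dict)  # receiver -> keys it holds
    local_filter: Set[Key] = field(default_factory=set)
    store: Dict[Key, bytes] = field(default_factory=dict)
    locked: Set[Key] = field(default_factory=set)

    def encode_segment(self, receiver: str, identifier: int, content: bytes) -> dict:
        key = (identifier, len(content))
        receiver_has = key in self.global_view.get(receiver, set())  # Block 304
        if key not in self.store:                                    # Block 306A / 306B
            self.store[key] = content                                # Block 308A / 308B
            self.local_filter.add(key)
        if receiver_has:
            self.locked.add(key)                                     # Block 310A: lock until ACK
            return {"hit": True, "identifier": identifier, "length": len(content)}
        # Block 310B: uncompressed segment carrying the content bytes ("miss")
        return {"hit": False, "identifier": identifier,
                "length": len(content), "content": content}

    def on_ack(self, key: Key) -> None:
        self.locked.discard(key)                                     # Blocks 314-316: unlock
```

Usage: the first time a segment is sent it travels as a miss; once the global view records that the receiver holds it, the same segment is encoded as a hit and stays locked until the acknowledgement arrives.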
FIG. 4 illustrates a flow chart representing the process performed by a network traffic management device which receives a data stream from a transmitter device in accordance with an aspect of the present disclosure. As shown in FIG. 4, the receiving device 110 receives the data stream from the transmitting device 111 in which one or more data packets may be encoded and/or unencoded in a manner consistent with the novel process described above in FIG. 3 (Block 400).
As shown in FIG. 4, the data reduction module 210 of the receiving device 110 selects and extracts a data segment from a data packet in the data stream (Block 402). In an aspect, upon processing a selected data segment, the data reduction module 210 may determine whether the data segment contains a hit bit (Block 404). If the data reduction module 210 determines that the selected data segment does not contain a hit bit, the No branch is taken to Block 408A. At Block 408A, the data reduction module 210 in the receiver device 110 determines whether a content index indicates that the identifier and content data for the data segment are stored in the receiver device's 110 data store 220. If the content index indicates that the identifier and content data are stored in the receiver device's 110 data store 220, the data reduction module 210 proceeds to Block 414A, as described and illustrated in more detail below.
In contrast, if the content index indicates that the identifier and content data are not stored in the receiver device's 110 data store 220, the receiver's 110 data reduction module 210 enters the identifier and data length information in one or more index components 214 and also stores the corresponding content data bytes in the data store 220 (Block 412A). Additionally, the receiver's 110 data reduction module 210 updates its own local bloom filter 216 to indicate that a stored copy of the content data and a matching entry of the identifier and data length information are present in the receiver device 110 (Block 412A).
Referring back to Block 404, if the receiver's 110 data reduction module 210 determines that the selected data segment in the data packet includes a hit bit, the Yes branch is taken to Block 408B. As shown in Block 408B, the data reduction module 210 in the receiver device 110 determines whether a content index indicates that the identifier and content data for the data segment are stored in the receiver device's 110 data store 220, as described and illustrated earlier with reference to Block 408A. If so, the data reduction module 210 locates the content data using the local index, identifier, data length, and optional offset, and retrieves it from the receiver's 110 data store 220 (Block 410). The process then proceeds to Block 414A.
In contrast, if the content index indicates that the identifier and content data are not stored in the receiver device's 110 data store 220, the receiver's 110 data reduction module 210 sends a resend message back to the transmitting device 111 (Block 412B). In particular, the resend communication identifies the receiver device 110, the selected data segment, the identifier and data length information and other relevant information to allow the transmitting device 111 to locate the content data and send it to the receiver device 110.
The transmitter's 111 data reduction module 210, upon receiving and processing the resend communication, retrieves the content data from its data store 220 using the identifier and data length information (Block 310B in FIG. 3). Because the transmitter's 111 data reduction module 210 locks the content data until a confirmation message for that data segment is received from the receiver 110, the content data is assumed to still be present in the transmitter's 111 data store 220. Once the content data is retrieved, the transmitter's 111 data reduction module 210 writes the content data into a resend data packet to be sent to the requesting receiving device 110 (Block 312B in FIG. 3). The resend data packet is then sent to the receiving device 110 (Block 314 in FIG. 3). In an aspect, the receiver's 110 data reduction module 210 receives the resend data packet from the transmitter 111 (Block 414B). The process then proceeds to Block 412A.
In an aspect, the transmitter's 111 data reduction module 210, before resending the data packet, may write additional information which instructs the receiver's 110 data reduction module 210 to automatically store the content data as well as update its index component 214 with the identifier and data length information upon receiving it (as opposed to again querying its local bloom filter). This allows the receiver device 110 to quickly process, decode and write the data packet to its buffer.
Irrespective of whether the receiver device 110 initially has the content data for the identifier, the data reduction module 210 thereafter writes the content data for the identifier for decoding (Block 414A). In an aspect, after the content data for the data segment is written for decoding, the data reduction module 210 sends a confirmation message back to the transmitting device 111 for that data segment (Block 416). In particular, the confirmation message notifies the transmitting device 111 that the data segment has been successfully processed at the receiver 110, so that the stored content data and identifier and data length information for the selected data segment can be unlocked and made deletable in the transmitting device 111.
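The receiver side of FIG. 4, including the resend path, can be sketched in the same hypothetical style. A set and a dictionary again stand in for the local bloom filter 216, index component 214 and data store 220, records are plain dictionaries, and `request_resend` is an assumed callback representing the resend exchange with the transmitter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Key = Tuple[int, int]  # (identifier, data length)


@dataclass
class Receiver:
    """Hypothetical receiver: dict/set stand in for data store 220, index 214 and filter 216."""
    store: Dict[Key, bytes] = field(default_factory=dict)
    local_filter: Set[Key] = field(default_factory=set)
    confirmations: List[Key] = field(default_factory=list)

    def _remember(self, key: Key, content: bytes) -> None:
        # Block 412A: index the identifier/length, store content, update local filter.
        self.store[key] = content
        self.local_filter.add(key)

    def decode(self, record: dict, request_resend: Callable[[Key], bytes]) -> bytes:
        key = (record["identifier"], record["length"])
        if not record["hit"]:                  # Block 404, No branch
            content = record["content"]
            if key not in self.store:          # Block 408A
                self._remember(key, content)
        elif key in self.store:                # Block 408B
            content = self.store[key]          # Block 410
        else:
            content = request_resend(key)      # Blocks 412B and 414B
            self._remember(key, content)       # then Block 412A
        self.confirmations.append(key)         # Block 416: confirmation queued for the transmitter
        return content                         # Block 414A: content written for decoding
```

The resend branch is what makes bloom filter false positives on the transmitter side safe: a wrongly predicted hit costs an extra round trip, not corrupted output.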
FIG. 5 illustrates a flow chart representing a process performed by the data reduction module of a transmitting device with regard to an encoded data segment having partially matched content data bytes stored in the transmitting device, in accordance with an aspect of the present disclosure. As shown in FIG. 5, the data reduction module 210 of the transmitting device 111 processes a data segment in a data stream to be compressed and transmitted to the receiving device 110 (Block 500). In the example process of FIG. 5, the data reduction module 210 of the transmitting device 111 examines the content data bytes of the data segment and at some point during the compression, identifies a miss indicating a portion of the data in the data segment that does not match stored data. In step 504, the data reduction module 210 of the transmitting device 111 determines whether the previous result, prior to generating the miss, was a hit indicating that the data in the previous data segment matched a stored data segment. If the data reduction module 210 determines that the previous result was a hit, then the Yes branch is taken to step 506.
In step 506, the data reduction module 210 of the transmitting device 111 retrieves the data bytes, identifier, and length that follow the stored segment that generated the hit associated with the previous result. In step 508, the data reduction module 210 of the transmitting device 111 identifies non-matching content data byte(s) and their location(s) with respect to matching content byte(s) in the data bytes retrieved from the data store 220 in step 506. With respect to the matching content byte(s), in step 510, the data reduction module 210 of the transmitting device 111 generates partial matching offset(s) and length(s).
In step 512, the data reduction module 210 of the transmitting device 111 determines whether any partial matches were generated as a result of steps 506-510. Optionally, before writing the data packet in step 514, the data reduction module 210 of the transmitting device 111 also checks the global bloom filter 218 to determine whether the receiver device 110 has the content for the generated partial matching identifier. If the data reduction module 210 determines that partial matches were generated (and that the receiver device 110 has the content), the Yes branch is taken to step 514, in which the data reduction module 210 of the transmitting device 111 writes at least one bit indicating a partial match, a generated partial matching identifier, the offset(s), and the length(s) into a data packet.
Referring back to steps 504 and 512, if the data reduction module 210 determines that the previous result was not a hit or that no partial matches were generated (or that the receiver device 110 does not have the content for the generated partial matching identifier), the No branch is taken from step 504 or 512, respectively, to step 516. In step 516, the data reduction module 210 of the transmitting device 111 writes at least one bit indicating a miss, the content data, an identifier, and a length into a data packet. The data packets generated in steps 514 and 516 are then sent to the receiver device 110.
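The partial-match idea of FIG. 5 can be illustrated by comparing a missed segment position by position against the stored bytes that followed the previous hit. This sketch assumes a position-aligned comparison, a hypothetical minimum run length (`MIN_RUN = 4`), and that non-matching bytes are carried as literals alongside the runs; the disclosure does not specify how the non-matching bytes are transmitted, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MIN_RUN = 4  # hypothetical minimum length for a run worth referencing


def matching_runs(candidate: bytes, stored_next: bytes) -> List[Tuple[int, int]]:
    """Steps 508-510: (offset, length) runs where candidate and stored bytes agree."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    n = min(len(candidate), len(stored_next))
    for i in range(n):
        if candidate[i] == stored_next[i]:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= MIN_RUN:
                runs.append((start, i - start))
            start = None
    if start is not None and n - start >= MIN_RUN:
        runs.append((start, n - start))
    return runs


def encode_after_miss(segment: bytes, segment_id: int, previous_was_hit: bool,
                      next_stored: Optional[Tuple[int, bytes]],
                      receiver_has_next: bool) -> dict:
    """next_stored: (identifier, bytes) of the stored segment following the previous hit."""
    if previous_was_hit and next_stored is not None:              # step 504
        next_id, next_bytes = next_stored                         # step 506
        runs = matching_runs(segment, next_bytes)                 # steps 508-510
        if runs and receiver_has_next:                            # step 512 (+ optional check)
            covered = {i for off, ln in runs for i in range(off, off + ln)}
            literals: Dict[int, int] = {i: segment[i] for i in range(len(segment))
                                        if i not in covered}
            return {"kind": "partial", "identifier": next_id, "runs": runs,
                    "literals": literals, "length": len(segment)}  # step 514
    return {"kind": "miss", "identifier": segment_id,
            "content": segment, "length": len(segment)}           # step 516


def reconstruct(record: dict, next_bytes: bytes) -> bytes:
    """Receiver-side inverse of a partial-match record (same assumptions)."""
    out = bytearray(record["length"])
    for off, ln in record["runs"]:
        out[off:off + ln] = next_bytes[off:off + ln]
    for i, value in record["literals"].items():
        out[i] = value
    return bytes(out)
```

For a 20-byte segment that differs from the stored bytes in a single position, this yields two runs and one literal byte instead of resending all 20 content bytes.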
FIG. 6 illustrates an exemplary process of updating a portion of a network traffic management device's global bloom filter in accordance with an aspect of the present disclosure. As shown in FIG. 6, a network traffic management device will update at least a portion of its global bloom filter 218 by selecting a sub-region of its global bloom filter 218 which contains update information of one or more local bloom filters 216 of at least one other network device. In an aspect, a particular sub-region of a network device's global bloom filter 218 may contain portions of two or more other network devices' local bloom filters 216. For example, a sub-region in network device's 111 global bloom filter 218 may contain a portion of network device's 110 local bloom filter and a portion of network device's 109 local bloom filter (FIG. 1) in accordance with an aspect of the present disclosure.
Regarding FIG. 6, the network device 111 may select a sub-region of its global bloom filter 218 to update (Block 600). In an aspect, the network device 111 updates only those sub-regions of its global bloom filter 218 in which the proportion of true (“1”) or false (“0”) bits for the corresponding other network devices 109, 110 has crossed a threshold value or percentage. The reason is that bloom filters are normally only additive: the network device 111 sets bits only when it learns that content exists at one or more other network devices 109, 110, and does not clear them when that content is later evicted, so regions of the global bloom filter tend to fill up with “1” values and produce more false positives. Once a threshold has been crossed (for example, 50% of the bits set to “1”, although higher or lower thresholds may be used), the network device 111 selects those other network devices 109, 110 from which to obtain updates.
Accordingly, the data reduction module 210 of network device 111 monitors the number of “1” bits that are set, as a percentage of the global bloom filter region corresponding to another network device 109, 110. The list of remote devices to be queried consists of the devices that are able to set bits in that region, whether or not they actually have. This list may be further restricted to the top N devices to which the transmitting device 111 sends compressed data packets.
The data reduction module 210 accordingly identifies those other corresponding network devices 109, 110 with which the network device 111 has recently exchanged compressed data packets and sends an update request to those network devices 109, 110 (Block 602). In particular, network device's 111 data reduction module 210 requests updates of the local bloom filters 216 of a specified set of other network devices 109, 110 that correspond to the matching portions of device's 111 global bloom filter sub-region.
As shown in FIG. 6, the data reduction module 210 of the network device 111 thereafter clears the portions of the global bloom filter sub-region corresponding to the requested network devices 109, 110 (Block 604). The data reduction module 210 then receives the requested local bloom filter updates from the specified set of other RD(s), which in this example includes network devices 109 and 110 (Block 606). The data reduction module 210 of the transmitter device 111 thereafter performs an XOR of the local bloom filter update(s) received from the various receiving network devices 109, 110 with the current version of the global bloom filter portion to generate a merged, updated global bloom filter sub-region (Block 608). The merged, updated global bloom filter sub-region is then stored in the corresponding sub-region of the requesting network device's 111 global bloom filter 218 (Block 610).
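A rough sketch of the threshold test and top-N restriction follows. For a bloom filter with k hash functions in which a fraction ρ of the bits are set, the false-positive probability is approximately ρ^k (for example, 0.5^3 = 0.125), which is why regions that fill up are worth refreshing. The per-peer bit layout, the 0.5 default threshold, the `top_n` default and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fill_ratio(bits: Sequence[int]) -> float:
    """Fraction of bits set to 1 in one peer's slice of a global-filter region."""
    return sum(bits) / len(bits) if bits else 0.0


def peers_to_refresh(region_bits: Dict[str, Sequence[int]], sent_counts: Counter,
                     threshold: float = 0.5, top_n: int = 8) -> List[str]:
    """Peers whose slice crossed the threshold, limited to the top-N compressed-traffic peers."""
    crossed = [peer for peer, bits in region_bits.items() if fill_ratio(bits) >= threshold]
    top = {peer for peer, _ in sent_counts.most_common(top_n)}
    return sorted(peer for peer in crossed if peer in top)
```

Here `sent_counts` is assumed to count compressed data packets sent to each peer, so `Counter.most_common` directly yields the "top N devices" described above.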
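The clear-and-merge steps of Blocks 604 through 610 can be sketched as follows, assuming each peer owns a disjoint slice of the sub-region and one byte is used per bit for readability. The sketch merges with bitwise OR, the usual bloom filter union; because the refreshed slices are cleared first and slices do not overlap, this gives the same result as the XOR described above. The function name and the `layout` mapping are hypothetical.

```python
from typing import Dict, Tuple


def refresh_subregion(subregion: bytearray, layout: Dict[str, Tuple[int, int]],
                      updates: Dict[str, bytes]) -> bytearray:
    """Return a refreshed copy of one global-filter sub-region.

    layout maps each peer to the [start, end) slice of the sub-region it owns
    (hypothetical); updates maps each responding peer to its local-filter bits.
    """
    merged = bytearray(subregion)
    for peer in updates:                         # Block 604: clear the refreshed peers' bits
        start, end = layout[peer]
        merged[start:end] = bytes(end - start)
    for peer, bits in updates.items():           # Blocks 606-608: merge each update
        start, end = layout[peer]
        if len(bits) != end - start:
            raise ValueError(f"update from {peer} has the wrong size")
        for i, bit in enumerate(bits):
            merged[start + i] |= bit             # OR into a cleared slice (same as XOR here)
    return merged                                # Block 610: caller writes it back
```

Peers that were not asked for an update keep their existing bits, so a refresh of device 109's slice leaves device 110's portion of the same sub-region untouched.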
Having thus described the basic concepts, it will be rather apparent to those skilled in the art that the foregoing detailed disclosure is intended to be presented by way of example only, and is not limiting. Various alterations, improvements, and modifications will occur to those skilled in the art and are intended, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested hereby, and are within the spirit and scope of the examples. Additionally, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefor, is not intended to limit the claimed processes to any order except as may be specified in the claims.
1. A method for facilitating deduplication in mesh networks using dictionary compression of encoded data to improve bandwidth utilization and scalability, the method implemented by a network traffic management system comprising one or more transmitter devices (TDs), one or more receiver devices (RDs), one or more client devices, or one or more server devices and comprising:
selecting a first data segment in a received data stream having a universal first identifier and data length information and a corresponding first plurality of content data bytes;
querying a global probabilistic data structure to determine when an RD has a stored copy of the first plurality of content data bytes and the corresponding universal first identifier and data length information for the first data segment, and when the determining indicates that the RD has the stored copy of the universal first identifier and data length information and first plurality of content data bytes for the first data segment:
performing dictionary compression by preparing a first encoded data packet, wherein the first encoded data packet includes the universal first identifier and data length information without the first plurality of content data bytes for the data segment; and
sending the first encoded data packet over a network to the RD to facilitate retrieval of the first plurality of content data bytes associated with the first data segment from the RD's data store and decoding of the first data segment to include the first plurality of content data bytes without requiring that the first plurality of content data bytes for the data segment be transmitted over the network.
2. The method of claim 1, further comprising:
querying a local probabilistic data structure to determine when a copy of the universal first identifier and data length information and first content data bytes for the first data segment is stored locally; and
storing the universal first identifier and data length information in one or more index components, storing the first content data bytes in a data store, and updating the local probabilistic data structure to reflect that a copy of the universal first identifier and data length information and the first content data bytes for the first data segment is locally stored, when the determining indicates that a copy of the universal first identifier and data length information and first content data bytes is not locally stored.
3. The method of claim 1, further comprising:
marking the universal first identifier and data length information and the first plurality of content data bytes for the first data segment as locked, wherein the universal first identifier and data length information and the first plurality of content data bytes cannot be deleted when marked as locked;
receiving a confirmation communication from the RD for the first data segment;
updating the global probabilistic data structure to reflect that the RD has the universal first identifier and data length information and first plurality of content data bytes of the first data segment stored locally in the RD; and
marking the universal first identifier and data length information and the first plurality of content data bytes for the first data segment as being unlocked, wherein the universal first identifier and data length information and the first plurality of content data bytes are able to be deleted when marked as unlocked.
4. The method of claim 1, further comprising:
receiving a resend request from the RD, wherein the resend request comprises an instruction to send the first plurality of content data bytes for the first data segment;
retrieving the first content data bytes from a data store utilizing the universal first identifier and data length information;
sending the universal first identifier and the first plurality of bytes to the RD, wherein the universal first identifier and the first plurality of bytes remain in a locked status.
5. The method of claim 1, further comprising:
selecting a second data segment in the data stream, the second data segment including a plurality of second content data bytes;
determining a partial match between the plurality of content data bytes of the second data segment and content data bytes of a stored segment following a stored version of the first content data bytes of the first data segment;
identifying one or more non-matching content data bytes between the second data segment and the first data segment; and
generating a universal second identifier corresponding to a universal third identifier of the stored segment following the stored version of the first content data bytes of the first data segment.
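One way to read the partial-match step of claim 5 is as a prediction. After the first segment matches, the next incoming segment is compared against whichever stored segment followed the first one, and only the differing bytes are reported alongside that successor's identifier. The sketch below is hypothetical: it assumes equal-length segments, a byte-by-byte comparison, and an arbitrary "at most half the bytes differ" threshold, none of which the claims specify.

```python
from typing import Dict, List, Optional, Tuple


def partial_match(
    new_seg: bytes,
    prev_id: bytes,
    successor: Dict[bytes, bytes],
    store: Dict[bytes, bytes],
) -> Optional[Tuple[bytes, List[Tuple[int, int]]]]:
    """Compare new_seg with the stored segment that followed prev_id.

    successor maps an identifier to the identifier of the segment that
    followed it. Returns (successor_id, mismatches), where mismatches is a
    list of (offset, new_byte) pairs, or None when there is no usable match.
    """
    next_id = successor.get(prev_id)
    if next_id is None:
        return None
    stored = store[next_id]
    if len(stored) != len(new_seg):  # assumption: equal lengths only
        return None
    mismatches = [(i, b) for i, (a, b) in enumerate(zip(stored, new_seg)) if a != b]
    # Hypothetical threshold: a partial match needs at most half the bytes to differ.
    if len(mismatches) * 2 > len(new_seg):
        return None
    return next_id, mismatches
```

For example, if segment `A` was followed by the stored segment `B` containing `b"world"`, an incoming `b"worle"` can be encoded as a reference to `B` plus a single differing byte at offset 4.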
6. The method of claim 1, wherein the global probabilistic data structure comprises a bloom filter.
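A Bloom filter answers "possibly present" or "definitely absent" using a fixed-size bit array. The minimal sketch below derives its k bit positions from SHA-256 by double hashing. The sizes and the hashing scheme are illustrative assumptions. A false positive in the TD's global filter would lead the TD to send only an identifier that the RD does not actually hold, which is presumably the situation the resend request of claim 4 addresses.

```python
import hashlib
from typing import List


class BloomFilter:
    """Minimal Bloom filter: m bits, k positions derived from SHA-256."""

    def __init__(self, m_bits: int = 8192, k: int = 4) -> None:
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: bytes) -> List[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```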
7. A non-transitory computer readable medium having stored thereon instructions for facilitating deduplication in mesh networks using dictionary compression of encoded data to improve bandwidth utilization and scalability comprising executable code, which when executed by at least one processor, causes the at least one processor to:
select a first data segment in a received data stream having a universal first identifier and data length information and a corresponding first plurality of content data bytes;
query a global probabilistic data structure to determine when a receiver device (RD) has a stored copy of the first plurality of content data bytes and the corresponding universal first identifier and data length information for the first data segment, and when the determining indicates that the RD has the stored copy of the universal first identifier and data length information and first plurality of content data bytes for the first data segment:
perform dictionary compression by preparing a first encoded data packet, wherein the first encoded data packet includes the universal first identifier and data length information without the first plurality of content data bytes for the data segment; and
send the first encoded data packet over a network to the RD to facilitate retrieval of the first plurality of content data bytes associated with the first data segment from the RD's data store and decoding of the first data segment to include the first plurality of content data bytes without requiring that the first plurality of content data bytes for the data segment be transmitted over the network.
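The core encoding decision of claim 7 can be sketched as a hypothetical function. The TD looks the segment up in the structure it keeps for a given RD. On a hit it emits only the identifier and length; on a miss it emits the bytes. Two assumptions are made here. First, the universal identifier is taken to be a truncated SHA-256 content hash, which the claims do not specify. Second, a `set` stands in for the global probabilistic data structure.

```python
import hashlib
from typing import Set, Tuple, Union

# Hypothetical packet shapes: ("ref", id, length) or ("raw", id, bytes).
Packet = Union[Tuple[str, bytes, int], Tuple[str, bytes, bytes]]


def universal_id(data: bytes) -> bytes:
    # Assumption: the identifier is a 16-byte truncated SHA-256 content hash.
    return hashlib.sha256(data).digest()[:16]


def encode_segment(data: bytes, rd_has: Set[bytes]) -> Packet:
    """Encode one segment for an RD.

    rd_has stands in for the global probabilistic data structure kept for
    that RD; a real Bloom filter could return false positives.
    """
    seg_id = universal_id(data)
    if seg_id in rd_has:
        # Dictionary compression: identifier and length only, no content bytes.
        return ("ref", seg_id, len(data))
    # Otherwise send the bytes so the RD can add them to its data store.
    return ("raw", seg_id, data)
```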
8. The non-transitory computer readable medium of claim 7, wherein the executable code, when executed by the at least one processor, further causes the processor to:
query a local probabilistic data structure to determine when a copy of the universal first identifier and data length information and first content data bytes for the first data segment is stored locally; and
store the universal first identifier and data length information in one or more index components, store the first content data bytes in a data store, and update the local probabilistic data structure to reflect that a copy of the universal first identifier and data length information and the first content data bytes for the first data segment is locally stored, when the determining indicates that a copy of the universal first identifier and data length information and first content data bytes is not locally stored.
9. The non-transitory computer readable medium of claim 7, wherein the executable code, when executed by the at least one processor, further causes the processor to:
mark the universal first identifier and data length information and the first plurality of content data bytes for the first data segment as locked, wherein the universal first identifier and data length information and the first plurality of content data bytes cannot be deleted when marked as locked;
receive a confirmation communication from the RD for the first data segment;
update the global probabilistic data structure to reflect that the RD has the universal first identifier and data length information and first plurality of content data bytes of the first data segment stored locally in the RD; and
mark the universal first identifier and data length information and the first plurality of content data bytes for the first data segment as being unlocked, wherein the universal first identifier and data length information and the first plurality of content data bytes are able to be deleted when marked as unlocked.
10. The non-transitory computer readable medium of claim 7, wherein the executable code, when executed by the at least one processor, further causes the processor to:
receive a resend request from the RD, wherein the resend request comprises an instruction to send the first plurality of content data bytes for the first data segment;
retrieve the first content data bytes from a data store utilizing the universal first identifier and data length information; and
send the universal first identifier and the first plurality of bytes to the RD, wherein the universal first identifier and the first plurality of bytes remain in a locked status.
11. The non-transitory computer readable medium of claim 7, wherein the executable code, when executed by the at least one processor, further causes the processor to:
select a second data segment in the data stream, the second data segment including a plurality of second content data bytes;
determine a partial match between the plurality of content data bytes of the second data segment and content data bytes of a stored segment following a stored version of the first content data bytes of the first data segment;
identify one or more non-matching content data bytes between the second data segment and the first data segment; and
generate a universal second identifier corresponding to a universal third identifier of the stored segment following the stored version of the first content data bytes of the first data segment.
12. The non-transitory computer readable medium of claim 7, wherein the global probabilistic data structure comprises a bloom filter.
13. A transmitter device (TD) comprising memory comprising programmed instructions stored thereon and one or more processors configured to be capable of executing the stored programmed instructions to:
select a first data segment in a received data stream having a universal first identifier and data length information and a corresponding first plurality of content data bytes;
query a global probabilistic data structure to determine when a receiver device (RD) has a stored copy of the first plurality of content data bytes and the corresponding universal first identifier and data length information for the first data segment, and when the determining indicates that the RD has the stored copy of the universal first identifier and data length information and first plurality of content data bytes for the first data segment:
perform dictionary compression by preparing a first encoded data packet, wherein the first encoded data packet includes the universal first identifier and data length information without the first plurality of content data bytes for the data segment; and
send the first encoded data packet over a network to the RD to facilitate retrieval of the first plurality of content data bytes associated with the first data segment from the RD's data store and decoding of the first data segment to include the first plurality of content data bytes without requiring that the first plurality of content data bytes for the data segment be transmitted over the network.
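On the receiving side, the decoding described in claim 13 amounts to a lookup in the RD's data store. The hypothetical function below returns the stored bytes when both the identifier and the length check out. Otherwise it returns a resend request, which is assumed here to consist of just the identifier. The length check is an illustrative use of the transmitted data length information, not something the claims require.

```python
from typing import Dict, Optional, Tuple


def decode_ref(
    seg_id: bytes, length: int, data_store: Dict[bytes, bytes]
) -> Tuple[Optional[bytes], Optional[bytes]]:
    """RD-side decoding of an identifier-plus-length packet.

    Returns (decoded_bytes, None) on success, or (None, resend_request)
    when the RD does not hold a matching copy, for example after a false
    positive in the TD's global filter.
    """
    data = data_store.get(seg_id)
    if data is None or len(data) != length:
        # Hypothetical resend-request format: just the identifier.
        return None, seg_id
    return data, None
```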
14. The transmitter device (TD) of claim 13, wherein the one or more processors are further configured to be capable of executing the programmed instructions stored in the memory to:
query a local probabilistic data structure to determine when a copy of the universal first identifier and data length information and first content data bytes for the first data segment is stored locally; and
store the universal first identifier and data length information in one or more index components, store the first content data bytes in a data store, and update the local probabilistic data structure to reflect that a copy of the universal first identifier and data length information and the first content data bytes for the first data segment is locally stored, when the determining indicates that a copy of the universal first identifier and data length information and first content data bytes is not locally stored.
15. The transmitter device (TD) of claim 13, wherein the one or more processors are further configured to be capable of executing the programmed instructions stored in the memory to:
mark the universal first identifier and data length information and the first plurality of content data bytes for the first data segment as locked, wherein the universal first identifier and data length information and the first plurality of content data bytes cannot be deleted when marked as locked;
receive a confirmation communication from the RD for the first data segment;
update the global probabilistic data structure to reflect that the RD has the universal first identifier and data length information and first plurality of content data bytes of the first data segment stored locally in the RD; and
mark the universal first identifier and data length information and the first plurality of content data bytes for the first data segment as being unlocked, wherein the universal first identifier and data length information and the first plurality of content data bytes are able to be deleted when marked as unlocked.
16. The transmitter device (TD) of claim 13, wherein the one or more processors are further configured to be capable of executing the programmed instructions stored in the memory to:
receive a resend request from the RD, wherein the resend request comprises an instruction to send the first plurality of content data bytes for the first data segment;
retrieve the first content data bytes from a data store utilizing the universal first identifier and data length information; and
send the universal first identifier and the first plurality of bytes to the RD, wherein the universal first identifier and the first plurality of bytes remain in a locked status.
17. The transmitter device (TD) of claim 13, wherein the one or more processors are further configured to be capable of executing the programmed instructions stored in the memory to:
select a second data segment in the data stream, the second data segment including a plurality of second content data bytes;
determine a partial match between the plurality of content data bytes of the second data segment and content data bytes of a stored segment following a stored version of the first content data bytes of the first data segment;
identify one or more non-matching content data bytes between the second data segment and the first data segment; and
generate a universal second identifier corresponding to a universal third identifier of the stored segment following the stored version of the first content data bytes of the first data segment.
18. The transmitter device (TD) of claim 13, wherein the global probabilistic data structure comprises a bloom filter.
19. A network traffic management system comprising one or more transmitter devices (TDs), one or more receiver devices (RDs), one or more client devices, or one or more server devices, the system comprising memory comprising programmed instructions stored thereon and one or more processors configured to be capable of executing the stored programmed instructions to:
select a first data segment in a received data stream having a universal first identifier and data length information and a corresponding first plurality of content data bytes;
query a global probabilistic data structure to determine when an RD has a stored copy of the first plurality of content data bytes and the corresponding universal first identifier and data length information for the first data segment, and when the determining indicates that the RD has the stored copy of the universal first identifier and data length information and first plurality of content data bytes for the first data segment:
perform dictionary compression by preparing a first encoded data packet, wherein the first encoded data packet includes the universal first identifier and data length information without the first plurality of content data bytes for the data segment; and
send the first encoded data packet over a network to the RD to facilitate retrieval of the first plurality of content data bytes associated with the first data segment from the RD's data store and decoding of the first data segment to include the first plurality of content data bytes without requiring that the first plurality of content data bytes for the data segment be transmitted over the network.
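The claims do not define a wire format for the encoded data packet. The sketch below assumes one for illustration: a 1-byte type, a 16-byte identifier, and a 4-byte length, all in network byte order, packed with Python's `struct` module. Under this assumed layout a reference packet is 21 bytes, so a 1500-byte segment already held by the RD would cost 21 bytes on the wire instead of 1500.

```python
import struct
from typing import Tuple

# Hypothetical layout (network byte order, no padding):
#   1-byte type (0 = reference), 16-byte identifier, 4-byte unsigned length.
REF_FORMAT = "!B16sI"
REF_TYPE = 0


def pack_ref(seg_id: bytes, length: int) -> bytes:
    if len(seg_id) != 16:
        raise ValueError("identifier must be 16 bytes")
    return struct.pack(REF_FORMAT, REF_TYPE, seg_id, length)


def unpack_ref(packet: bytes) -> Tuple[bytes, int]:
    kind, seg_id, length = struct.unpack(REF_FORMAT, packet)
    if kind != REF_TYPE:
        raise ValueError("not a reference packet")
    return seg_id, length
```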
20. The network traffic management system of claim 19, wherein the global probabilistic data structure comprises a bloom filter.
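Because claims 6, 12, 18 and 20 specify a Bloom filter, the standard approximation for its false-positive rate shows how often the TD might send an identifier the RD lacks. After $n$ keys have been inserted into an $m$-bit filter with $k$ hash functions, the rate is approximately:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\ln 2 .
% Example: m/n = 10 bits per key, k = 7
% p \approx (1 - e^{-0.7})^{7} \approx 0.0082 \quad (\text{about } 0.8\%)
```

The filter sizes in this example are illustrative and are not taken from the claims.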