Patent application title:

Local hash value generation in non-volatile data storage systems

Publication number:

-

Publication date:
Application number:

14/035,869

Filed date:

2013-09-24

Patent granted

Patent number:

US 9,524,235 B1

Grant date:

2016-12-20

PCT filing:

-

PCT publication:

-

Examiner:

Matthew Bradley

Agent:

Morgan, Lewis & Bockius LLP

Adjusted expiration:

2035-03-25

Smart Summary: Local hash value generation can be done directly in non-volatile data storage systems, like flash memory, instead of relying on the host computer. This system uses a method called a Bloom filter, which helps manage data more efficiently. When the host sends requests for data, the storage system generates specific bit positions using multiple hash functions. These bit positions are then set in the Bloom filter stored within the memory. This approach allows for better data handling and reduces the need for traditional read and write methods.

Abstract:

The various implementations described herein include systems, methods and/or devices used to enable local hash value generation in a non-volatile data storage system (e.g., using a flash memory device). In one aspect, rather than having Bloom filter logic in a host, Bloom filter functionality is integrated in the non-volatile data storage system. In some implementations, at a non-volatile data storage system, the method includes receiving from a host a plurality of requests that specify respective elements. The method further includes, for each respective element specified by the received requests, (1) generating a respective set of k bit positions in a Bloom filter, using k distinct hash functions, where k is an integer greater than 2, and (2) setting the respective set of k bit positions in the Bloom filter, which is stored in a non-volatile storage medium of the non-volatile data storage system.

Inventors:

Assignee:

Applicant:

Classification:

G06F12/0246 »  CPC main

Accessing, addressing or allocating within memory systems or architectures; Addressing or allocation; Relocation; User address space allocation, e.g. contiguous or non contiguous base addressing; Free address space management; Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory

G06F13/12 »  CPC further

Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units; Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor

G06F2212/7202 »  CPC further

Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures; Details relating to flash memory management Allocation control and policies

Y10S707/99932 »  CPC further

Data processing: database and file management or data structures; Database or file accessing Access augmentation or optimizing

Y10S707/99933 »  CPC further

Data processing: database and file management or data structures; Database or file accessing Query processing, i.e. searching

G06F12/00 IPC

Accessing, addressing or allocating within memory systems or architectures

G06F12/02 IPC

Accessing, addressing or allocating within memory systems or architectures Addressing or allocation; Relocation

Description

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application No. 61/858,522, filed on Jul. 25, 2013, which is incorporated by reference herein.

TECHNICAL FIELD

The disclosed embodiments relate generally to memory systems, and in particular, to using non-volatile data storage systems to implement Bloom filters.

BACKGROUND

Semiconductor memory devices, including flash memory, typically utilize memory cells to store data as an electrical value, such as an electrical charge or voltage. A flash memory cell, for example, includes a single transistor with a floating gate that is used to store a charge representative of a data value. Flash memory is a non-volatile data storage device that can be electrically erased and reprogrammed. Non-volatile memory retains stored information even when not powered, as opposed to volatile memory, which requires power to maintain the stored information. In an address-targeted write to memory, a host supplies an address and the data to be written. In an address-targeted read from memory, a host supplies an address from which to read. However, when memory is used to implement data structures such as Bloom filters, using address-targeted read and write methods to access memory is not ideal.

SUMMARY

Various implementations of systems, methods and devices within the scope of the appended claims each have several aspects, no single one of which is solely responsible for the attributes described herein. Without limiting the scope of the appended claims, after considering this disclosure, and particularly after considering the section entitled “Detailed Description” one will understand how the aspects of various implementations are used to enable local hash value generation in a non-volatile data storage system (e.g., using a flash memory device). In one aspect, rather than having Bloom filter logic in a host, Bloom filter functionality is integrated in the non-volatile data storage system. In some implementations, an object “X” is directly transferred by the host to the non-volatile data storage system. In other implementations, the object “X” is hashed by the host and a fingerprint of object “X” is transferred by the host to the non-volatile data storage system.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the present disclosure can be understood in greater detail, a more particular description may be had by reference to the features of various implementations, some of which are illustrated in the appended drawings. The appended drawings, however, merely illustrate the more pertinent features of the present disclosure and are therefore not to be considered limiting, for the description may admit to other effective features.

FIG. 1 is a block diagram illustrating an implementation of a data storage system, in accordance with some embodiments.

FIG. 2 is a block diagram illustrating an implementation of a management module, in accordance with some embodiments.

FIG. 3 is a prophetic diagram of voltage distributions that may be found in a single-level flash memory cell (SLC) over time, in accordance with some embodiments.

FIGS. 4A-4B illustrate a flowchart representation of a method for data processing, in accordance with some embodiments.

In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.

DETAILED DESCRIPTION

Many applications use a data structure called a Bloom filter to determine whether an element is a member of a set (e.g., whether an object is already physically present in a storage media). Bloom filter arrays (the storage aspect of a Bloom filter) can be implemented with dynamic random-access memory (DRAM), but this can become prohibitively expensive as the size of the set grows. In embodiments disclosed below, for applications where large Bloom filters are needed, NAND flash storage devices are used.

The various implementations described herein include systems, methods and/or devices used to enable local hash value generation in a non-volatile data storage system. Some implementations include systems, methods and/or devices to integrate Bloom filter functionality in the non-volatile data storage system.

More specifically, some implementations include a method for data processing. In some implementations, at a non-volatile data storage system, the method includes receiving from a host a plurality of requests that specify respective elements. The method further includes, for each respective element specified by the received requests, (1) generating a respective set of k bit positions in a Bloom filter, using k distinct hash functions, where k is an integer greater than 2, and (2) setting the respective set of k bit positions in the Bloom filter, wherein the Bloom filter is stored in a non-volatile storage medium of the non-volatile data storage system.

In some embodiments, the method includes generating the respective set of k bit positions in the Bloom filter using one or more processors of the non-volatile data storage system.

In some embodiments, the method includes generating the respective set of k bit positions in the Bloom filter using k parallel processors of the non-volatile data storage system.
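
As a rough illustration of generating the k bit positions in parallel, the sketch below uses a thread pool as a stand-in for k parallel processors. The salted BLAKE2b hash family, the function names hash_position and positions_parallel, and the parameter values are assumptions made for illustration; the disclosure does not name particular hash functions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_position(element: bytes, index: int, n_bits: int) -> int:
    """Hypothetical hash function number `index`: salted BLAKE2b, reduced mod n_bits."""
    salt = index.to_bytes(8, "little")
    digest = hashlib.blake2b(element, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % n_bits


def positions_parallel(element: bytes, k: int, n_bits: int) -> list[int]:
    """Compute the k bit positions concurrently, one worker per hash function."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        # pool.map returns results in the order of the input indices.
        return list(pool.map(lambda i: hash_position(element, i, n_bits), range(k)))
```

Because each hash function depends only on the element and its own index, the k computations are independent and produce the same positions as a sequential loop.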

In some embodiments, the non-volatile storage medium includes one or more flash memory devices.

In some embodiments, the non-volatile data storage system is distinct from the host.

In some embodiments, the non-volatile data storage system is embedded in the host.

In some embodiments, the method further includes receiving a first element for testing with respect to the Bloom filter. The method further includes testing whether the first element is present in the Bloom filter, by (1) processing the first element with the k distinct hash functions to generate a first set of k bit positions, (2) reading the first set of k bit positions from the Bloom filter, (3) returning a first result if all the k bit positions in the Bloom filter from the first set are set, and (4) returning a second result if one or more of the k bit positions in the Bloom filter from the first set are not set.

In some embodiments, the respective elements specified by the plurality of requests comprise a plurality of objects.

In some embodiments, the respective elements specified by the plurality of requests comprise n-bit fingerprints of a plurality of objects, where n is at least 64.

In another aspect, any of the methods described above are performed by a non-volatile data storage system comprising (1) a non-volatile storage medium storing a Bloom filter, (2) one or more processors, and (3) memory storing one or more programs, which when executed by the one or more processors cause the non-volatile data storage system to perform any of the methods described above.

In yet another aspect, a non-transitory computer readable storage medium stores one or more programs configured for execution by one or more processors of a non-volatile data storage system, the one or more programs comprising instructions for causing the non-volatile data storage system to perform any of the methods described above.

In yet another aspect, a non-volatile data storage system is configured to process data in accordance with any of the methods described above. In some embodiments, the non-volatile data storage system includes means for receiving from a host a plurality of requests that specify respective elements, and means for processing each respective element specified by the received requests, including (1) means for generating a respective set of k bit positions in a Bloom filter, using k distinct hash functions, where k is an integer greater than 2, and (2) means for setting the respective set of k bit positions in the Bloom filter, wherein the Bloom filter is stored in a non-volatile storage medium of the non-volatile data storage system.

Numerous details are described herein in order to provide a thorough understanding of the example implementations illustrated in the accompanying drawings. However, some embodiments may be practiced without many of the specific details, and the scope of the claims is only limited by those features and aspects specifically recited in the claims. Furthermore, well-known methods, components, and circuits have not been described in exhaustive detail so as not to unnecessarily obscure more pertinent aspects of the implementations described herein.

FIG. 1 is a diagram of an implementation of a data storage system 100, in accordance with some embodiments. While some example features are illustrated, various other features have not been illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the example implementations disclosed herein. To that end, as a non-limiting example, the data storage system 100 includes a memory controller 120, and a storage medium 130, and is used in conjunction with a computer system 110. In some implementations, storage medium 130 is a single flash memory device while in other implementations storage medium 130 includes a plurality of flash memory devices. In some implementations, storage medium 130 is NAND-type flash memory or NOR-type flash memory. Further, in some implementations memory controller 120 is a solid-state drive (SSD) controller. However, other types of storage media may be included in accordance with aspects of a wide variety of implementations.

Computer system 110 is coupled to memory controller 120 through data connections 101. However, in some implementations computer system 110 includes memory controller 120 as a component and/or a sub-system. Computer system 110 may be any suitable computer device, such as a computer, a laptop computer, a tablet device, a netbook, an internet kiosk, a personal digital assistant, a mobile phone, a smart phone, a gaming device, a computer server, or any other computing device. Computer system 110 is sometimes called a host or host system. In some implementations, computer system 110 includes one or more processors, one or more types of memory, a display and/or other user interface components such as a keyboard, a touch screen display, a mouse, a track-pad, a digital camera and/or any number of supplemental devices to add functionality.

Storage medium 130 is coupled to memory controller 120 through connections 103. Connections 103 are sometimes called data connections, but typically convey commands in addition to data, and optionally convey metadata, error correction information and/or other information in addition to data values to be stored in storage medium 130 and data values read from storage medium 130. In some implementations, however, memory controller 120 and storage medium 130 are included in the same device as components thereof. Furthermore, in some implementations memory controller 120 and storage medium 130 are embedded in a host device, such as a mobile device, tablet, other computer or computer controlled device, and the methods described herein are performed by the embedded memory controller. Storage medium 130 may include any number (i.e., one or more) of memory devices including, without limitation, non-volatile semiconductor memory devices, such as flash memory. For example, flash memory devices can be configured for enterprise storage suitable for applications such as cloud computing, or for caching data stored (or to be stored) in secondary storage, such as hard disk drives. Additionally and/or alternatively, flash memory can also be configured for relatively smaller-scale applications such as personal flash drives or hard-disk replacements for personal, laptop and tablet computers. Furthermore, as discussed in more detail below, flash memory devices can be configured to implement data structures such as Bloom filter array(s) 131.

A Bloom filter (e.g., Bloom filter array(s) 131) is a probabilistic data structure used to determine if an element “x” is a member of a set “S” with high probability. A Bloom filter is constructed using an N-bit array that is initially cleared, together with k hash functions, each of which maps an element to a bit position in the array, such that 0 ≤ Hash(x, k) ≤ N−1. For each element “x” in set “S,” k hash functions are computed, and the k corresponding bits in the N-bit array are set. In some embodiments, a Bloom filter is initially cleared by resetting the N-bit array to all zeros, and the k corresponding bits in the N-bit array are set to ones. In some embodiments, a Bloom filter is initially cleared by resetting the N-bit array to all ones, and the k corresponding bits in the N-bit array are set to zeros. While the labeling of memory cell states as having specific data values is somewhat arbitrary, with respect to flash memory devices, memory cells that have been reset are typically said to represent ones, and memory cells that have been set are typically said to represent zeros. However, any labeling or mapping of memory cell states to data values can be used, as long as it is used consistently.
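
The construction described above can be sketched as follows, using the convention in which the array is cleared to all zeros and set bits are ones. This is a minimal sketch, not the disclosed implementation: the salted BLAKE2b hash family and the names bit_positions, make_filter, and add are assumptions introduced here.

```python
import hashlib


def bit_positions(element: bytes, k: int, n_bits: int) -> list[int]:
    """Return k positions, each in the range 0..n_bits-1."""
    positions = []
    for i in range(k):
        # Assumption: salting with the index i stands in for k distinct hash functions.
        digest = hashlib.blake2b(
            element, digest_size=8, salt=i.to_bytes(8, "little")
        ).digest()
        positions.append(int.from_bytes(digest, "little") % n_bits)
    return positions


def make_filter(n_bits: int) -> bytearray:
    """An N-bit array, initially cleared to all zeros."""
    return bytearray((n_bits + 7) // 8)


def add(bits: bytearray, element: bytes, k: int, n_bits: int) -> None:
    """Set the k bits that correspond to the element."""
    for pos in bit_positions(element, k, n_bits):
        bits[pos // 8] |= 1 << (pos % 8)
```

The opposite convention (cleared to all ones, set bits stored as zeros) would only change the initial fill value and the bit-setting operation.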

As an example, to test an element “w” for membership in the set “S,” the k hash functions are computed for element “w” and the k resulting bit positions are tested. If all k bit positions are set, then the element “w” is most likely a member of set “S,” with a possibility of this membership being a “false positive.” A false positive is when the Bloom filter returns a result that an element is a member of the set “S,” when in actuality it is not. Bloom filters return fewer false positives when the number of elements in the set “S” is an order of magnitude smaller than the number of bits in the bit array. The probability of a false positive is given by equation (1):

$$\left(1 - e^{-k(n + 0.5)/(m - 1)}\right)^{k} \qquad (1)$$

In equation (1), k represents the number of hash functions per element, m represents the number of bits in the Bloom filter (the size of the N-bit array described above), and n is the number of elements stored in the Bloom filter.
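
Equation (1) can be evaluated directly. The short sketch below does so for the symbols k, m, and n as defined above; the function name and the sample values used with it are illustrative assumptions only.

```python
import math


def false_positive_probability(k: int, m: int, n: int) -> float:
    """Evaluate equation (1): (1 - e^(-k(n + 0.5)/(m - 1)))^k."""
    return (1.0 - math.exp(-k * (n + 0.5) / (m - 1))) ** k
```

For instance, calling false_positive_probability(3, 1000, 100) evaluates the expression for a 1000-bit filter holding 100 elements with three hash functions per element, and the result grows as n increases for fixed k and m.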

Storage medium 130 is divided into a number of addressable and individually selectable blocks. In some implementations, the individually selectable blocks are the minimum size erasable units in a flash memory device. In other words, each block contains the minimum number of memory cells that can be erased simultaneously. Each block is usually further divided into a plurality of pages and/or word lines, where each page or word line is typically an instance of the smallest individually accessible (readable) portion in a block. In some implementations (e.g., using some types of flash memory), the smallest individually accessible unit of a data set, however, is a sector, which is a subunit of a page. That is, a block includes a plurality of pages, each page contains a plurality of sectors, and each sector is the minimum unit of data for reading data from the flash memory device.

For example, one block comprises any number of pages, for example, 64 pages, 128 pages, 256 pages or another suitable number of pages. Blocks are typically grouped into a plurality of zones. Each block zone can be independently managed to some extent, which increases the degree of parallelism for parallel operations and simplifies management of storage medium 130.
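
To make the block and page hierarchy concrete, the sketch below maps a Bloom filter bit position to a block, a page within that block, a byte within that page, and a bit within that byte. The geometry (4 KB pages, 128 pages per block), the linear layout of the filter across pages, and the names BitAddress and locate_bit are all assumptions chosen for illustration.

```python
from typing import NamedTuple

PAGE_BYTES = 4096        # assumed page size
PAGES_PER_BLOCK = 128    # assumed pages per block


class BitAddress(NamedTuple):
    block: int
    page: int
    byte: int
    bit: int


def locate_bit(position: int) -> BitAddress:
    """Map a filter bit position onto an assumed linear block/page layout."""
    byte_index, bit = divmod(position, 8)
    page_index, byte = divmod(byte_index, PAGE_BYTES)
    block, page = divmod(page_index, PAGES_PER_BLOCK)
    return BitAddress(block, page, byte, bit)
```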

In some implementations, memory controller 120 includes a management module 121, a host interface 129, a storage medium interface (I/O) 128, and additional module(s) 125. Memory controller 120 may include various additional features that have not been illustrated for the sake of brevity and so as not to obscure more pertinent features of the example implementations disclosed herein, and a different arrangement of features may be possible. Host interface 129 provides an interface to computer system 110 through data connections 101. Similarly, storage medium I/O 128 provides an interface to storage medium 130 through connections 103. In some implementations, storage medium I/O 128 includes read and write circuitry, including circuitry capable of providing reading signals to storage medium 130 (e.g., reading threshold voltages for NAND-type flash memory).

In some implementations, management module 121 includes one or more processing units (CPUs, also sometimes called processors) 122 configured to execute instructions in one or more programs (e.g., in management module 121). In some implementations, the one or more CPUs 122 are shared by one or more components within, and in some cases, beyond the function of memory controller 120. Management module 121 is coupled to host interface 129, additional module(s) 125 and storage medium I/O 128 in order to coordinate the operation of these components.

Additional module(s) 125 are coupled to storage medium I/O 128, host interface 129, and management module 121. As an example, additional module(s) 125 may include an error control module to limit the number of uncorrectable errors inadvertently introduced into data during writes to memory or reads from memory. In some embodiments, additional module(s) 125 are executed in software by the one or more CPUs 122 of management module 121, and, in other embodiments, additional module(s) 125 are implemented in whole or in part using special purpose circuitry (e.g., to perform encoding and decoding functions).

During an address-targeted write operation, host interface 129 receives data to be stored in storage medium 130 from computer system 110. The data held in host interface 129 is made available to an encoder (e.g., in additional module(s) 125), which encodes the data to produce one or more codewords. The one or more codewords are made available to storage medium I/O 128, which transfers the one or more codewords to storage medium 130 in a manner dependent on the type of storage medium being utilized.

An address-targeted read operation is initiated when computer system (host) 110 sends one or more host read commands on control line 111 to memory controller 120 requesting data from storage medium 130. Memory controller 120 sends one or more read access commands to storage medium 130, via storage medium I/O 128, to obtain raw read data in accordance with memory locations (addresses) specified by the one or more host read commands. Storage medium I/O 128 provides the raw read data (e.g., comprising one or more codewords) to a decoder (e.g., in additional module(s) 125). If the decoding is successful, the decoded data is provided to host interface 129, where the decoded data is made available to computer system 110. In some implementations, if the decoding is not successful, memory controller 120 may resort to a number of remedial actions or provide an indication of an irresolvable error condition.

Bloom filter implementations using address-targeted write and read operations would require transferring large amounts of data between computer system (host) 110 and data storage system 100. For example, to add an object “X” to Bloom filter array(s) 131, computer system 110 would generate k hashes and then initiate k read-modify-write commands to data storage system 100. In some examples, this would require the sensing, transfer, modification, and write back of k×4 KB pages. As another example, to test an element for presence in Bloom filter array(s) 131, computer system 110 would initiate k read commands. Instead of using address-targeted write and read operations, which require computer system 110 to generate k hashes and/or initiate k commands to data storage system 100, Bloom filter functionality is integrated in data storage system 100, as described below and with reference to FIG. 2.
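
The transfer cost of the host-driven approach can be estimated with simple arithmetic. The sketch below assumes that each of the k bit positions falls in a different 4 KB page, that a read-modify-write moves each page once to the host and once back, and that a test moves each page once; the function names are hypothetical.

```python
PAGE_BYTES = 4 * 1024  # 4 KB pages, as in the example above


def host_side_add_bytes(k: int, page_bytes: int = PAGE_BYTES) -> int:
    """Bytes moved for k read-modify-write commands (read plus write back)."""
    return 2 * k * page_bytes


def host_side_test_bytes(k: int, page_bytes: int = PAGE_BYTES) -> int:
    """Bytes moved for k read commands."""
    return k * page_bytes
```

Under these assumptions the cost scales linearly with k and with the page size, regardless of how small the element itself is.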

When Bloom filter functionality is integrated in data storage system 100, computer system 110 is not required to generate k hashes and initiate k commands in order to add an object “X” to Bloom filter array(s) 131. Instead, in some implementations, computer system 110 transfers object “X” directly to data storage system 100 as an element to add to Bloom filter array(s) 131. In some implementations, computer system 110 generates a fingerprint of object “X” (e.g., an n-bit fingerprint of object “X,” where n is at least 64) and transfers the fingerprint of object “X” directly to data storage system 100 as an element to add to Bloom filter array(s) 131. For each element received from computer system 110 to add to Bloom filter array(s) 131, data storage system 100 generates k bit positions in Bloom filter array(s) 131, using k distinct hash functions, where k is an integer greater than 2. Further, data storage system 100 sets the k bit positions in Bloom filter array(s) 131 (e.g., using write circuitry in storage medium I/O 128). Thus, only a single host command (e.g., “Add Element”) is needed to add an element to Bloom filter array(s) 131, reducing data transfers between computer system 110 and memory controller 120.
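
A host-side fingerprint of the kind mentioned above might be produced as in the following sketch. The choice of BLAKE2b and the function name fingerprint are assumptions; the text only requires an n-bit fingerprint with n of at least 64.

```python
import hashlib


def fingerprint(obj: bytes, n_bits: int = 64) -> int:
    """Return an n-bit fingerprint of obj as an integer (n_bits from 64 to 512)."""
    if n_bits < 64 or n_bits > 512 or n_bits % 8 != 0:
        raise ValueError("n_bits must be a multiple of 8 between 64 and 512")
    # BLAKE2b supports digest sizes up to 64 bytes (512 bits).
    digest = hashlib.blake2b(obj, digest_size=n_bits // 8).digest()
    return int.from_bytes(digest, "big")
```

The host would then send only this fixed-size value, rather than the whole object, as the element in its single add command.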

Further, when Bloom filter functionality is integrated in data storage system 100, computer system 110 is not required to initiate k read commands in order to test whether an element is present in Bloom filter array(s) 131. Instead, similar to the process described above for adding an element to Bloom filter array(s) 131, in some implementations, computer system 110 transfers an element (e.g., object “X” or a fingerprint of object “X”) directly to data storage system 100 in order to test whether the element is present in Bloom filter array(s) 131. For each element received from computer system 110 for testing, data storage system 100 processes the element with k distinct hash functions to generate k bit positions in Bloom filter array(s) 131 and reads the k bit positions from Bloom filter array(s) 131 (e.g., using read circuitry in storage medium I/O 128). In some embodiments, data storage system 100 returns a first result in accordance with a determination that all the k bit positions are set (e.g., indicating that the element is present in Bloom filter array(s) 131 with high probability) or returns a second result in accordance with a determination that at least a predetermined number (e.g., one or more) of the k bit positions in the Bloom filter are not set (e.g., indicating that the element is not present in Bloom filter array(s) 131). Thus, only a single host command (e.g., “Test Element”) is needed to test for an element's presence in Bloom filter array(s) 131, reducing data transfers between computer system 110 and memory controller 120.

In some implementations, computer system 110 resets Bloom filter array(s) 131 with a single host command (e.g., “Reset Filter”). Data storage system 100 responds to a reset command by resetting Bloom filter array(s) 131 to an empty state. In some embodiments, Bloom filter array(s) 131 is cleared by resetting the array to all zeros. In some embodiments, Bloom filter array(s) 131 is cleared by resetting the array to all ones. As explained above, with respect to flash memory devices, memory cells that have been reset are typically said to represent ones.
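
The three single-command operations described above (add, test, and reset) can be modeled with a toy device class. This sketch uses the flash convention in which the empty state is all ones and a set bit is stored as zero. The class name BloomFilterDevice, its method names, and the salted BLAKE2b hash family are assumptions made for illustration and do not represent an actual command set.

```python
import hashlib


class BloomFilterDevice:
    """Toy model of a storage system that hashes elements locally."""

    def __init__(self, n_bits: int, k: int) -> None:
        self.n_bits = n_bits
        self.k = k
        self.cells = bytearray()
        self.reset_filter()

    def _positions(self, element: bytes) -> list[int]:
        # Hypothetical hash family: BLAKE2b salted with the function index.
        result = []
        for i in range(self.k):
            salt = i.to_bytes(8, "little")
            digest = hashlib.blake2b(element, digest_size=8, salt=salt).digest()
            result.append(int.from_bytes(digest, "little") % self.n_bits)
        return result

    def reset_filter(self) -> None:
        """Reset command: the empty state is all ones."""
        self.cells = bytearray(b"\xff" * ((self.n_bits + 7) // 8))

    def add_element(self, element: bytes) -> None:
        """Add command: a set bit is stored as 0 under this convention."""
        for pos in self._positions(element):
            self.cells[pos // 8] &= ~(1 << (pos % 8)) & 0xFF

    def test_element(self, element: bytes) -> bool:
        """Test command: True (first result) only if all k bits are set."""
        return all(
            ((self.cells[pos // 8] >> (pos % 8)) & 1) == 0
            for pos in self._positions(element)
        )
```

In this model the host issues one call per operation and never sees the k bit positions, which stay inside the device.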

When Bloom filter functionality is integrated in data storage system 100, data transfers between storage medium I/O 128 and storage medium 130 can also be reduced. In some embodiments, storage medium 130 is implemented using NAND flash memory. NAND flash memory devices have on-chip logical function capabilities with the ability to do simple bit-wise operations (e.g., AND, OR, INVERT, and XOR). Bloom filters require the ability to test and set single bits at a time. By using the NAND flash memory device's integrated logical function registers, these calculations are offloaded from the drive's processor(s) (e.g., CPUs 122), allowing for higher performance.
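
The bit-wise test and set operations can be sketched with a page modeled as a single large integer. Here an OR with a mask sets bits and an AND with the same mask tests them, using the convention that set bits are ones. The helper names are hypothetical, and the sketch does not model any particular device's registers or commands.

```python
def build_mask(offsets: list[int]) -> int:
    """Build a page-wide mask with a 1 at each bit offset."""
    mask = 0
    for offset in offsets:
        mask |= 1 << offset
    return mask


def set_bits(page: int, mask: int) -> int:
    """Set operation: OR the mask into the page contents."""
    return page | mask


def all_bits_set(page: int, mask: int) -> bool:
    """Test operation: AND with the mask and compare against the mask."""
    return (page & mask) == mask
```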

Flash memory devices utilize memory cells to store data as electrical values, such as electrical charges or voltages. Each flash memory cell typically includes a single transistor with a floating gate that is used to store a charge, which modifies the threshold voltage of the transistor (i.e., the voltage needed to turn the transistor on). The magnitude of the charge, and the corresponding threshold voltage the charge creates, is used to represent one or more data values. In some implementations, during a read operation, a reading threshold voltage is applied to the control gate of the transistor and the resulting sensed current or voltage is mapped to a data value.

The terms “cell voltage” and “memory cell voltage,” in the context of flash memory cells, mean the threshold voltage of the memory cell, which is the minimum voltage that needs to be applied to the gate of the memory cell's transistor in order for the transistor to conduct current. Similarly, reading threshold voltages (sometimes also called reading signals and reading voltages) applied to flash memory cells are gate voltages applied to the gates of the flash memory cells to determine whether the memory cells conduct current at that gate voltage. In some implementations, when a flash memory cell's transistor conducts current at a given reading threshold voltage, indicating that the cell voltage is less than the reading threshold voltage, the raw data value for that read operation is a “1,” and otherwise the raw data value is a “0.”
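
The read rule just described reduces to a single comparison, as the sketch below shows. The function name and any voltage values passed to it are illustrative assumptions.

```python
def raw_read_value(cell_voltage: float, reading_threshold: float) -> int:
    """Return 1 if the cell conducts at the reading threshold, else 0."""
    # The transistor conducts when its threshold (cell) voltage is below
    # the gate voltage applied during the read.
    conducts = cell_voltage < reading_threshold
    return 1 if conducts else 0
```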

FIG. 2 is a block diagram illustrating an exemplary management module 121, in accordance with some embodiments. Management module 121 typically includes one or more processing units (CPUs) 122 for executing modules, programs and/or instructions stored in memory 206 and thereby performing processing operations, memory 206, and one or more communication buses 208 for interconnecting these components. Communication buses 208 optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. Management module 121 is coupled to host interface 129, additional module(s) 125, and storage medium I/O 128 by communication buses 208. Memory 206 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 206 optionally includes one or more storage devices remotely located from the CPU(s) 122. Memory 206, or alternately the non-volatile memory device(s) within memory 206, comprises a non-transitory computer readable storage medium. In some embodiments, memory 206, or the computer readable storage medium of memory 206 stores the following programs, modules, and data structures, or a subset thereof:

    • a hash function generation module 216 that is used for processing an element with k distinct hash functions to generate k bit positions in a Bloom filter (e.g., Bloom filter array(s) 131, FIG. 1);
    • an add element module 218 that is used for adding elements to the Bloom filter;
    • a test element module 224 that is used for testing whether an element is present in the Bloom filter;
    • a delete element module 232 that is used for deleting an element from the Bloom filter;
    • a reset module 238 that is used for resetting the Bloom filter to an empty state; and
    • a fingerprint module 240 that is used for generating an n-bit fingerprint of an object to be added to the Bloom filter, where n is at least 64.

In some embodiments, the add element module 218 optionally includes the following modules or sub-modules, or a subset thereof:

    • an add element processing module 220 that is used for processing the element to be added with k distinct hash functions to generate k bit positions in a Bloom filter and/or communicating with hash function generation module 216 to obtain the k bit positions; and
    • a bit setting module 222 that is used for setting the k bit positions in the Bloom filter.

In some embodiments, the test element module 224 optionally includes the following modules or sub-modules, or a subset thereof:

    • a test element processing module 226 that is used for processing the element to be tested with k distinct hash functions to generate k bit positions in a Bloom filter and/or communicating with hash function generation module 216 to obtain the k bit positions;
    • a bit reading module 228 that is used for reading the k bit positions from the Bloom filter; and
    • a test result module 230 that is used for returning a first result if all the k bit positions in the Bloom filter are set and returning a second result if at least a predetermined number (e.g., one or more) of the k bit positions in the Bloom filter are not set.

In some embodiments, the delete element module 232 optionally includes the following modules or sub-modules, or a subset thereof:

    • a delete element processing module 234 that is used for processing the element to be deleted with k distinct hash functions to generate k bit positions in a Bloom filter and/or communicating with hash function generation module 216 to obtain the k bit positions; and
    • a bit resetting module 236 that is used for resetting the k bit positions in the Bloom filter.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory 206 may store a subset of the modules and data structures identified above. Furthermore, memory 206 may store additional modules and data structures not described above. In some embodiments, the programs, modules, and data structures stored in memory 206, or the computer readable storage medium of memory 206, provide instructions for implementing any of the methods described below with reference to FIGS. 4A-4B.

Although FIG. 2 shows a management module 121, FIG. 2 is intended more as a functional description of the various features which may be present in a management module than as a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated.

As discussed below with reference to FIG. 3, a single-level flash memory cell (SLC) stores one bit (“0” or “1”). Thus, the storage density of a SLC memory device is one bit of information per memory cell. A multi-level flash memory cell (MLC), however, can store two or more bits of information per cell by using different ranges within the total voltage range of the memory cell to represent a multi-bit bit-tuple. In turn, the storage density of a MLC memory device is multiple bits per cell (e.g., two bits per memory cell).

FIG. 3 is a simplified, prophetic diagram of voltage distributions 300 found in a single-level flash memory cell (SLC) over time, in accordance with some embodiments. The voltage distributions 300 shown in FIG. 3 have been simplified for illustrative purposes. In this example, the SLC's voltage range extends approximately from a voltage, VSS, at a source terminal of an NMOS transistor to a voltage, VDD, at a drain terminal of the NMOS transistor. As such, voltage distributions 300 extend between VSS and VDD.

Sequential voltage ranges 301 and 302 between source voltage VSS and drain voltage VDD are used to represent corresponding bit values “1” and “0,” respectively. Each voltage range 301, 302 has a respective center voltage V1 301b, V0 302b. As described below, in many circumstances the memory cell current sensed in response to an applied reading threshold voltage is indicative of a memory cell voltage different from the respective center voltage V1 301b or V0 302b corresponding to the respective bit value written into the memory cell. Errors in cell voltage, and/or the cell voltage sensed when reading the memory cell, can occur during write operations, read operations, or due to “drift” of the cell voltage between the time data is written to the memory cell and the time a read operation is performed to read the data stored in the memory cell. For ease of discussion, these effects are collectively described as “cell voltage drift.” Each voltage range 301, 302 also has a respective voltage distribution 301a, 302a that may occur as a result of any number or combination of error-inducing factors, examples of which are identified above.

In some implementations, a reading threshold voltage VR is applied between adjacent center voltages (e.g., applied proximate to the halfway region between adjacent center voltages V1 301b and V0 302b). Optionally, in some implementations, the reading threshold voltage is located between voltage ranges 301 and 302. In some implementations, reading threshold voltage VR is applied in the region proximate to where the voltage distributions 301a and 302a overlap, which is not necessarily proximate to the halfway region between adjacent center voltages V1 301b and V0 302b.
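The role of the reading threshold voltage can be sketched in a few lines. This is an illustrative model only: the numeric voltages are hypothetical, and the sketch assumes that voltage range 301 (bit value “1”) lies below VR and voltage range 302 (bit value “0”) lies at or above it.

```python
def read_slc_bit(cell_voltage: float, v_r: float) -> int:
    """Return the bit value sensed for one SLC cell.

    Assumption: voltages below the reading threshold V_R fall in range 301
    (bit "1"); voltages at or above V_R fall in range 302 (bit "0").
    """
    return 1 if cell_voltage < v_r else 0


# Hypothetical center voltages, chosen only for illustration.
V1_CENTER = 1.0
V0_CENTER = 3.0
V_R = (V1_CENTER + V0_CENTER) / 2  # halfway between the two center voltages

drifted_one = 1.6    # a stored "1" that drifted upward but stays below V_R
drifted_zero = 1.9   # a stored "0" that drifted below V_R

print(read_slc_bit(drifted_one, V_R))   # 1 (read correctly)
print(read_slc_bit(drifted_zero, V_R))  # 1 (misread: the cell was written as "0")
```

The second read illustrates why VR may instead be placed near the overlap of distributions 301a and 302a rather than exactly halfway between the center voltages.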

As explained above, a SLC memory device stores one bit of information (“0” or “1”) per memory cell. In some embodiments, a Bloom filter is implemented in a SLC memory device, and uses a single-level flash memory cell for each bit of the N-bit array of the Bloom filter. In some embodiments (e.g., using some types of flash memory), the Bloom filter is initially cleared by resetting each bit of the N-bit array to “1” and elements are added to the Bloom filter by setting the corresponding k bits generated from the k hash functions to “0.” In some embodiments, the Bloom filter is initially cleared by resetting each bit of the N-bit array to “0” and elements are added to the Bloom filter by setting the corresponding k bits generated from the k hash functions to “1.”
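The two bit conventions described above differ only in which value counts as cleared. A minimal sketch, using a hypothetical helper class `BitArray` and an in-memory list in place of flash cells:

```python
class BitArray:
    """N-bit array with a configurable cleared value (hypothetical helper)."""

    def __init__(self, n_bits: int, cleared_value: int):
        if cleared_value not in (0, 1):
            raise ValueError("cleared_value must be 0 or 1")
        self.cleared_value = cleared_value
        self.marked_value = 1 - cleared_value
        # Clearing the filter resets every bit to the cleared value.
        self.bits = [cleared_value] * n_bits

    def mark(self, position: int) -> None:
        """Record an element's bit position by writing the opposite value."""
        self.bits[position] = self.marked_value

    def is_marked(self, position: int) -> bool:
        return self.bits[position] == self.marked_value


# Convention 1: cleared to "1", elements are added by writing "0".
flash_style = BitArray(8, cleared_value=1)
flash_style.mark(3)
print(flash_style.bits)   # [1, 1, 1, 0, 1, 1, 1, 1]

# Convention 2: cleared to "0", elements are added by writing "1".
conventional = BitArray(8, cleared_value=0)
conventional.mark(3)
print(conventional.bits)  # [0, 0, 0, 1, 0, 0, 0, 0]
```

Either way, the logic that adds and tests elements only needs to ask whether a position is marked, not which raw value is stored.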

In order to increase storage density in flash memory, flash memory has developed from single-level cell (SLC) flash memory to multi-level cell (MLC) flash memory so that two or more bits can be stored by each memory cell. A MLC flash memory device is used to store multiple bits by using voltage ranges within the total voltage range of the memory cell to represent different bit-tuples. A MLC flash memory device is typically more error-prone than a SLC flash memory device created using the same manufacturing process because the effective voltage difference between the voltages used to store different data values is smaller for a MLC flash memory device. Moreover, due to any number or combination of factors, such as electrical fluctuations, defects in the storage medium, operating conditions, device history, and/or write-read circuitry, a typical error includes a stored voltage level in a particular MLC being in a voltage range that is adjacent to the voltage range that would otherwise be representative of the correct storage of a particular bit-tuple. The impact of such errors can be reduced by gray-coding the data, such that adjacent voltage ranges represent single-bit changes between bit-tuples.
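The effect of gray-coding can be checked with a short sketch. It assumes a two-bit MLC whose four voltage ranges are numbered 0 to 3 from lowest to highest; the particular assignment of bit-tuples to ranges is hypothetical and may differ in an actual device.

```python
def gray_code(index: int) -> int:
    """Standard reflected binary Gray code of a range index."""
    return index ^ (index >> 1)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


BITS_PER_CELL = 2  # assumed two-bit MLC
levels = [gray_code(i) for i in range(1 << BITS_PER_CELL)]
print([format(v, "02b") for v in levels])  # ['00', '01', '11', '10']

# A read that lands in an adjacent voltage range corrupts exactly one bit.
for lower, upper in zip(levels, levels[1:]):
    assert hamming_distance(lower, upper) == 1
```

With a plain binary assignment (00, 01, 10, 11), the step between the second and third ranges would flip two bits at once, which is the case gray-coding avoids.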

FIGS. 4A-4B illustrate a flowchart representation of a method 400 for data processing, in accordance with some embodiments. As noted above with respect to FIG. 1, when a host (e.g., computer system 110, FIG. 1) adds an element to a Bloom filter (e.g., Bloom filter array(s) 131), only a single host command is needed for each element. To add a plurality of elements to the Bloom filter, the host sends a plurality of requests with respective elements to be added, which initiates performance of method 400.

At least in some implementations, method 400 is performed by a non-volatile data storage system (e.g., data storage system 100, FIG. 1) or one or more components of the non-volatile data storage system (e.g., memory controller 120 and/or storage medium 130, FIG. 1). In some embodiments, method 400 is governed by instructions that are stored in a non-transitory computer readable storage medium and that are executed by one or more processors of a device, such as the one or more processing units (CPUs) 122 of management module 121, shown in FIGS. 1 and 2.

A non-volatile data storage system receives (402) from a host (e.g., computer system 110, FIG. 1) a plurality of requests that specify respective elements. In some implementations, the plurality of requests are requests to add respective elements to a Bloom filter (e.g., Bloom filter array(s) 131). As noted above, in some implementations, only a single host command (e.g., “Add Element”) is needed to add a respective element to the Bloom filter. As an example, if three elements are to be added to the Bloom filter, the host would send three requests, the first request specifying the first element to be added to the Bloom filter, the second request specifying the second element to be added to the Bloom filter, and the third request specifying the third element to be added to the Bloom filter.

In some embodiments, the non-volatile data storage system is (404) distinct from the host. For example, in some implementations, one or more components of the non-volatile data storage system (e.g., memory controller 120 and storage medium 130 of data storage system 100, FIG. 1) are distinct from and coupled to a host (e.g., computer system 110) by one or more connections (e.g., connections 101 and control line 111, FIG. 1).

In some embodiments, the non-volatile data storage system is (406) embedded in the host. In some implementations, one or more components of the non-volatile data storage system (e.g., memory controller 120 and storage medium 130 of data storage system 100, FIG. 1) are included in a device as components thereof. Furthermore, in some implementations, one or more components of the non-volatile data storage system (e.g., memory controller 120 and storage medium 130 of data storage system 100, FIG. 1) are embedded in a host device, such as a mobile device, tablet, other computer or computer controlled device, and the methods described herein are performed by the embedded data storage system.

In some embodiments, the respective elements specified (408) by the plurality of requests comprise a plurality of objects. In some implementations, for example, an object is a file (e.g., a 1 MB file). In some implementations, for example in data deduplication applications, an object is an email attachment in a forwarded email message. In some implementations, an object is mapped into an n-bit fingerprint by the non-volatile data storage system (e.g., data storage system 100, FIG. 1) before being processed for insertion in the Bloom filter. In some implementations, a fingerprint module (e.g., fingerprint module 240, FIG. 2) is used to generate an n-bit fingerprint of an object to be added to the Bloom filter, where n is at least 64, as described above with respect to FIG. 2.

In some embodiments, the respective elements specified (410) by the plurality of requests comprise n-bit fingerprints of a plurality of objects, where n is at least 64. In some implementations, an object is mapped into an n-bit number by a host (e.g., computer system 110, FIG. 1). In some implementations, for example, a 64-bit hash function is used to map data sets of variable length (e.g., a file or an email attachment) to data sets of a fixed length (e.g., 64 bits).
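The mapping from a variable-length object to a fixed-length fingerprint can be sketched as follows. The description does not name a specific hash function; BLAKE2b truncated to 8 bytes is used here purely as an assumption, and `fingerprint64` is a hypothetical helper name.

```python
import hashlib


def fingerprint64(obj: bytes) -> int:
    """Map an object of any length to a 64-bit fingerprint.

    Assumption: BLAKE2b with an 8-byte digest stands in for the
    unspecified 64-bit hash function.
    """
    digest = hashlib.blake2b(obj, digest_size=8).digest()
    return int.from_bytes(digest, "big")


small_object = b"hello"
large_object = b"x" * 1_000_000  # roughly a 1 MB file

for obj in (small_object, large_object):
    fp = fingerprint64(obj)
    print(f"{len(obj):>8} bytes -> {fp:016x}")  # always 16 hex digits (64 bits)
```

Whether the fingerprint is computed by the host or by the storage system, the Bloom filter logic afterwards only has to handle fixed-length inputs.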

Next, for each respective element specified (412) by the received requests, the non-volatile data storage system generates (414) a respective set of k bit positions (sometimes called a respective group of k bit positions) in a Bloom filter, using k distinct hash functions, where k is an integer greater than 2. As an example, if k is equal to 16, for a respective element specified in the received request, the non-volatile data storage system uses 16 distinct hash functions to generate a respective set of 16 bit positions in the Bloom filter. In some implementations, the respective set of k bit positions in the Bloom filter is generated in firmware (e.g., in management module 121, FIGS. 1 and 2). In some implementations, the respective set of k bit positions in the Bloom filter is generated in hardware (e.g., a hardware hash engine). In some implementations, the respective set of k bit positions in the Bloom filter is generated by a hash function generation module (e.g., hash function generation module 216, FIG. 2) and/or an add element processing module (e.g., add element processing module 220, FIG. 2), as described above with respect to FIG. 2.
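Generating k bit positions from k distinct hash functions can be sketched as below. The description does not specify the hash functions; here the i-th function is modeled, as an assumption, by SHA-256 over a 2-byte index prefix followed by the element. The names `bit_position` and `bit_positions` and the array size `N_BITS` are hypothetical.

```python
import hashlib

K = 16            # number of hash functions, as in the example above
N_BITS = 1 << 20  # hypothetical Bloom filter size in bits


def bit_position(element: bytes, i: int, n_bits: int = N_BITS) -> int:
    """Bit position produced by the i-th hash function.

    Assumption: prefixing the element with the index i gives k
    different hash functions from a single underlying hash (SHA-256).
    """
    digest = hashlib.sha256(i.to_bytes(2, "big") + element).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def bit_positions(element: bytes, k: int = K, n_bits: int = N_BITS) -> list:
    """Respective set of k bit positions for one element."""
    return [bit_position(element, i, n_bits) for i in range(k)]


positions = bit_positions(b"example element")
print(len(positions))  # 16
print(all(0 <= p < N_BITS for p in positions))  # True
```

Two of the k hash functions may occasionally map an element to the same position, so the set can contain fewer than k unique values; this does not affect correctness of the add or test operations.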

In some embodiments, the non-volatile data storage system generates (416) the respective set of k bit positions in the Bloom filter using one or more processors of the non-volatile data storage system (e.g., CPUs 122, FIG. 1).

In some embodiments, the non-volatile data storage system generates (418) the respective set of k bit positions in the Bloom filter using k parallel processors of the non-volatile data storage system. In some other embodiments, the non-volatile data storage system generates the respective set of k bit positions in the Bloom filter using at least k/2 parallel processors of the non-volatile data storage system, while in yet other embodiments, the non-volatile data storage system generates the respective set of k bit positions in the Bloom filter using at least k/4 parallel processors of the non-volatile data storage system. In some implementations, the aforementioned one or more processors of the non-volatile data storage system (e.g., CPUs 122, FIG. 1) comprise parallel processors, and the respective set of k bit positions in the Bloom filter is generated using the parallel processors.
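The k, k/2, and k/4 variants differ only in how many hash computations run at once. A minimal sketch, in which a Python thread pool stands in for the parallel processors (an assumption made for illustration; threads are not hardware hash engines) and the hash construction is the same hypothetical index-prefixed SHA-256:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def bit_position(element: bytes, i: int, n_bits: int) -> int:
    """i-th hash function, modeled as SHA-256 over an index prefix (assumption)."""
    digest = hashlib.sha256(i.to_bytes(2, "big") + element).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def bit_positions_parallel(element: bytes, k: int, n_bits: int, workers: int) -> list:
    """Compute the k bit positions using `workers` concurrent workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in index order, regardless of completion order.
        return list(pool.map(lambda i: bit_position(element, i, n_bits), range(k)))


K, N_BITS = 16, 1 << 20
element = b"example element"
sequential = [bit_position(element, i, N_BITS) for i in range(K)]

for workers in (K, K // 2, K // 4):  # k, k/2 and k/4 workers
    assert bit_positions_parallel(element, K, N_BITS, workers) == sequential
```

The number of workers changes how the work is scheduled but not the resulting set of bit positions.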

Furthermore, for each respective element specified (412) by the received requests, the non-volatile data storage system sets (420) the respective set of k bit positions in the Bloom filter (e.g., Bloom filter array(s) 131, FIG. 1), wherein the Bloom filter is stored in a non-volatile storage medium (e.g., storage medium 130, FIG. 1) of the non-volatile data storage system.
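Setting the bit positions amounts to a byte-and-mask update for each position. In the sketch below, an in-memory `bytearray` stands in for the Bloom filter array in the non-volatile storage medium (an assumption; the sketch does not model how flash memory is actually programmed), and `set_bits` and `get_bit` are hypothetical helper names.

```python
def set_bits(bloom: bytearray, positions) -> None:
    """Set each listed bit position in the packed bit array."""
    for pos in positions:
        bloom[pos >> 3] |= 1 << (pos & 7)  # byte index, then bit within byte


def get_bit(bloom: bytearray, pos: int) -> int:
    """Read back a single bit position."""
    return (bloom[pos >> 3] >> (pos & 7)) & 1


bloom = bytearray(4)         # toy 32-bit Bloom filter, cleared to 0
set_bits(bloom, [0, 9, 31])  # positions as produced by the hash step
print(bloom.hex())           # 01020080
print(get_bit(bloom, 9), get_bit(bloom, 10))  # 1 0
```

Setting a position that is already set leaves the array unchanged, so adding the same element twice is harmless.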

In some embodiments, the non-volatile storage medium comprises (422) one or more flash memory devices. In some implementations, the non-volatile storage medium (e.g., storage medium 130, FIG. 1) is a single flash memory device, while in other implementations, the non-volatile storage medium includes a plurality of flash memory devices. In some implementations, the non-volatile storage medium (e.g., storage medium 130, FIG. 1) is NAND-type flash memory or NOR-type flash memory.

Optionally, the non-volatile data storage system receives (424) a first element for testing with respect to the Bloom filter. In some embodiments, a non-volatile data storage system (e.g., data storage system 100, FIG. 1) receives a first element for testing with respect to the Bloom filter (e.g., Bloom filter array(s) 131, FIG. 1) from a host (e.g., computer system 110, FIG. 1). As noted above, in some implementations, only a single host command (e.g., “Test Element”) is needed to test whether the element is present in the Bloom filter.

Next, the non-volatile data storage system tests (426) whether the first element is present in the Bloom filter by processing (428) the first element with the k distinct hash functions to generate a first set of k bit positions (sometimes called a first group of k bit positions). As an example, if k is equal to 16, the non-volatile data storage system (e.g., data storage system 100, FIG. 1) processes the first element with 16 distinct hash functions to generate a first set of 16 bit positions in the Bloom filter (e.g., Bloom filter array(s) 131, FIG. 1). In some implementations, the respective set of k bit positions in the Bloom filter is generated in firmware (e.g., in management module 121, FIGS. 1 and 2). In some implementations, the respective set of k bit positions in the Bloom filter is generated in hardware (e.g., a hardware hash engine). In some implementations, the respective set of k bit positions in the Bloom filter is generated by a hash function generation module (e.g., hash function generation module 216, FIG. 2) and/or a test element processing module (e.g., test element processing module 226, FIG. 2), as described above with respect to FIG. 2.

In some embodiments, the non-volatile data storage system generates the respective set of k bit positions in the Bloom filter using one or more processors of the non-volatile data storage system (e.g., CPUs 122, FIG. 1).

In some embodiments, the non-volatile data storage system generates the respective set of k bit positions in the Bloom filter using k parallel processors (or, alternatively, at least k/2 parallel processors, or at least k/4 parallel processors, as discussed above) of the non-volatile data storage system. In some implementations, the one or more processors of the non-volatile data storage system (e.g., CPUs 122, FIG. 1) comprise parallel processors, and the respective set of k bit positions in the Bloom filter is generated using the parallel processors.

The non-volatile data storage system further tests (426) whether the first element is present in the Bloom filter by reading (430) the first set of k bit positions from the Bloom filter. Using the example above where k is equal to 16, the non-volatile data storage system reads the set of 16 bit positions from the Bloom filter (e.g., Bloom filter array(s) 131, FIG. 1). In some implementations, the k bit positions are read from the Bloom filter using a bit reading module (e.g., bit reading module 228, FIG. 2), as described above with respect to FIG. 2.

Testing (426) whether the first element is present in the Bloom filter further includes returning (432) a first result in accordance with a determination that all the k bit positions in the Bloom filter from the first set are set. In some implementations, using the example above where k is equal to 16, the non-volatile data storage system (e.g., data storage system 100, FIG. 1) returns a first result in accordance with a determination that all 16 bit positions in the Bloom filter (e.g., Bloom filter array(s) 131, FIG. 1) from the first set are set, indicating that the first element is present in the Bloom filter with high probability. In some implementations, the first result is returned (e.g., in accordance with a determination that all the k bit positions in the Bloom filter from the first set are set) using a test result module (e.g., test result module 230, FIG. 2), as described above with respect to FIG. 2.
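The qualifier “with high probability” reflects the possibility of false positives. Under the common textbook assumption that the k hash functions behave as independent, uniformly distributed functions (an assumption not stated in this description), the probability that a test returns the first result for an element that was never added is approximately as follows, where N is the number of bits in the Bloom filter and m is a symbol introduced here for the number of elements already added:

```latex
p_{\text{fp}} \approx \left(1 - e^{-k m / N}\right)^{k}
```

For a fixed k and N, this estimate grows as more elements are added, since a larger fraction of the N bits is already set.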

Further, testing (426) whether the first element is present in the Bloom filter includes returning (434) a second result in accordance with a determination that one or more of the k bit positions in the Bloom filter from the first set are not set. In some implementations, the non-volatile data storage system (e.g., data storage system 100, FIG. 1) returns a second result in accordance with a determination that one or more of the k bit positions in the Bloom filter (e.g., Bloom filter array(s) 131, FIG. 1) from the first set are not set, indicating that the first element is not present in the Bloom filter. In some implementations, the second result is returned (e.g., in accordance with a determination that one or more of the k bit positions in the Bloom filter from the first set are not set) using a test result module (e.g., test result module 230, FIG. 2), as described above with respect to FIG. 2.
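The test operation described above (steps 428 through 434) can be put together in one short sketch. The class name `BloomFilterSketch`, the result labels, the filter size, and the index-prefixed SHA-256 hash construction are all assumptions made for illustration; an in-memory `bytearray` stands in for the stored Bloom filter.

```python
import hashlib

FIRST_RESULT = "possibly present"  # hypothetical label for the first result
SECOND_RESULT = "not present"      # hypothetical label for the second result


class BloomFilterSketch:
    def __init__(self, n_bits: int, k: int):
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray((n_bits + 7) // 8)  # all bits cleared to 0

    def _positions(self, element: bytes) -> list:
        # k hash functions modeled as SHA-256 over a 1-byte index prefix
        # (assumption; requires k <= 256).
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + element).digest()[:8], "big")
            % self.n_bits
            for i in range(self.k)
        ]

    def add(self, element: bytes) -> None:
        for p in self._positions(element):
            self.bits[p >> 3] |= 1 << (p & 7)

    def test(self, element: bytes) -> str:
        positions = self._positions(element)                              # step 428
        values = [(self.bits[p >> 3] >> (p & 7)) & 1 for p in positions]  # step 430
        # Steps 432 and 434: first result only if every bit is set.
        return FIRST_RESULT if all(values) else SECOND_RESULT


bf = BloomFilterSketch(n_bits=1 << 16, k=16)
print(bf.test(b"alpha"))  # not present (nothing has been added yet)
bf.add(b"alpha")
print(bf.test(b"alpha"))  # possibly present
```

The second result is definitive, because a single unset bit proves the element was never added, while the first result only indicates presence with high probability.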

In some implementations, with respect to any of the methods described above, the storage medium (e.g., storage medium 130, FIG. 1) is a single flash memory device, while in other implementations, the storage medium (e.g., storage medium 130, FIG. 1) includes a plurality of flash memory devices.

In some implementations, with respect to any of the methods described above, a data storage system includes a non-volatile storage medium (e.g., storage medium 130, FIG. 1), one or more processors (e.g., CPUs 122, FIGS. 1 and 2) and memory (e.g., memory 206, FIG. 2) storing one or more programs configured for execution by the one or more processors and configured to perform or control performance of any of the methods described above.

It will be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first contact could be termed a second contact, and, similarly, a second contact could be termed a first contact, without changing the meaning of the description, so long as all occurrences of the “first contact” are renamed consistently and all occurrences of the “second contact” are renamed consistently. The first contact and the second contact are both contacts, but they are not the same contact.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the claims. As used in the description of the embodiments and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.

The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the claims to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain principles of operation and practical applications, to thereby enable others skilled in the art.

Claims

What is claimed is:

1. A data processing method, comprising:

at a memory controller in a non-volatile data storage system:

receiving from a computer system, external to the non-volatile data storage system, a plurality of requests that specify respective elements to be stored in the non-volatile data storage system;

for each respective element received from the computer system specified by the received requests:

generating a respective set of k bit positions in a Bloom filter, using k distinct hash functions, where k is an integer greater than 2; and

setting the respective set of k bit positions in the Bloom filter, wherein the Bloom filter is stored in a non-volatile storage medium of the non-volatile data storage system;

receiving from the computer system a first element for testing with respect to the Bloom filter; and

testing whether the first element is present in the Bloom filter, by:

processing the first element with the k distinct hash functions to generate a first set of k bit positions;

reading the first set of k bit positions from the Bloom filter;

returning a first result in accordance with a determination that all the k bit positions in the Bloom filter from the first set are set; and

returning a second result in accordance with a determination that one or more of the k bit positions in the Bloom filter from the first set are not set.

2. The method of claim 1, including generating the respective set of k bit positions in the Bloom filter using one or more processors of the non-volatile data storage system.

3. The method of claim 1, including generating the respective set of k bit positions in the Bloom filter using k parallel processors of the non-volatile data storage system.

4. The method of claim 1, wherein the non-volatile storage medium comprises one or more flash memory devices.

5. The method of claim 1, wherein the respective elements specified by the plurality of requests comprise a plurality of objects.

6. The method of claim 1, wherein the respective elements specified by the plurality of requests comprise n-bit fingerprints of a plurality of objects, where n is at least 64.

7. A non-volatile data storage system, comprising:

a non-volatile storage medium storing a Bloom filter;

one or more processors; and

memory storing one or more programs, which when executed by the one or more processors cause a memory controller in the non-volatile data storage system to:

receive from a computer system, external to the non-volatile data storage system, a plurality of requests that specify respective elements to be stored in the non-volatile data storage system;

for each respective element received from the computer system specified by the received requests:

generate a respective set of k bit positions in the Bloom filter, using k distinct hash functions, where k is an integer greater than 2; and

set the respective set of k bit positions in the Bloom filter;

receive from the computer system a first element for testing with respect to the Bloom filter; and

test whether the first element is present in the Bloom filter, by:

processing the first element with the k distinct hash functions to generate a first set of k bit positions;

reading the first set of k bit positions from the Bloom filter;

returning a first result in accordance with a determination that all the k bit positions in the Bloom filter from the first set are set; and

returning a second result in accordance with a determination that one or more of the k bit positions in the Bloom filter from the first set are not set.

8. The system of claim 7, wherein the respective set of k bit positions in the Bloom filter is generated using the one or more processors of the non-volatile data storage system.

9. The system of claim 7, wherein the one or more processors of the non-volatile data storage system comprise k parallel processors, and the respective set of k bit positions in the Bloom filter is generated using the k parallel processors.

10. The system of claim 7, wherein the non-volatile storage medium comprises one or more flash memory devices.

11. The system of claim 7, wherein the respective elements specified by the plurality of requests comprise a plurality of objects.

12. The system of claim 7, wherein the respective elements specified by the plurality of requests comprise n-bit fingerprints of a plurality of objects, where n is at least 64.

13. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a non-volatile data storage system, the one or more programs comprising instructions for causing a memory controller in the non-volatile data storage system to:

receive from a computer system, external to the non-volatile data storage system, a plurality of requests that specify respective elements to be stored in the non-volatile data storage system;

for each respective element received from the computer system specified by the received requests:

generate a respective set of k bit positions in the Bloom filter, using k distinct hash functions, where k is an integer greater than 2; and

set the respective set of k bit positions in the Bloom filter, wherein the Bloom filter is stored in a non-volatile storage medium of the non-volatile data storage system;

receive from the computer system a first element for testing with respect to the Bloom filter; and

test whether the first element is present in the Bloom filter, by:

processing the first element with the k distinct hash functions to generate a first set of k bit positions;

reading the first set of k bit positions from the Bloom filter;

returning a first result in accordance with a determination that all the k bit positions in the Bloom filter from the first set are set; and

returning a second result in accordance with a determination that one or more of the k bit positions in the Bloom filter from the first set are not set.

14. The non-transitory computer readable storage medium of claim 13, wherein the respective set of k bit positions in the Bloom filter is generated using the one or more processors of the non-volatile data storage system.

15. The non-transitory computer readable storage medium of claim 13, wherein the one or more processors of the non-volatile data storage system comprise k parallel processors, and the respective set of k bit positions in the Bloom filter is generated using the k parallel processors.

16. The non-transitory computer readable storage medium of claim 13, wherein the non-volatile storage medium comprises one or more flash memory devices.

17. The non-transitory computer readable storage medium of claim 13, wherein the respective elements specified by the plurality of requests comprise a plurality of objects.

18. The non-transitory computer readable storage medium of claim 13, wherein the respective elements specified by the plurality of requests comprise n-bit fingerprints of a plurality of objects, where n is at least 64.
