2016-05-03
10/826,798
2004-04-15
US 9,330,060 B1
2016-05-03
David N Werner
2026-10-19
A method and device for encoding and decoding video image data. An MPEG decoding and encoding process using data flow pipeline architecture implemented using complete dedicated logic is provided. A plurality of fixed-function data processors are interconnected with at least one pipelined data transmission line. At least one of the fixed-function processors performs a predefined encoding/decoding function upon receiving a set of predefined data from said transmission line. Stages of pipeline are synchronized on data without requiring a central traffic controller. This architecture provides better performance in smaller size, lower power consumption and better usage of memory bandwidth.
G06F15/82 » CPC main
Digital computers in general ; Data processing equipment in general; Architectures of general purpose stored program computers data or demand driven
H04N19/43 » CPC further
Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation Hardware specially adapted for motion estimation or compensation
H04N19/436 » CPC further
Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
H04N19/439 » CPC further
Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using cascaded computational arrangements for performing a single operation, e.g. filtering
H04N19/42 IPC
Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
This application claims priority to the provisional patent application, Ser. No. 60/463,017, entitled “Data Flow Pipeline Architecture for MPEG Video Codec,” with filing date Apr. 15, 2003, and assigned to the assignee of the present application.
The present invention relates to the field of digital video image processing. More particularly, embodiments of the present invention relate to methods and devices for encoding and decoding video image data without requiring a separate digital signal processor (DSP) or an embedded processor to perform the main data-stream management.
The conventional art of designing and configuring a Moving Picture Experts Group (MPEG) encoding and decoding system is confronted with several technical limitations and difficulties. In particular, processing video image data based on the MPEG video standard involves many complex algorithms and requires several processing stages. Each of these algorithms consists of many computationally intensive tasks, and all of the encoding and decoding procedures must execute in real time. To generate real-time video images, conventional methods of configuring an MPEG system generally require a very high performance solution. A conventional configuration usually requires a digital signal processor (DSP) or embedded processor to handle mainstream processes and may also require additional hardware assist logic circuits.
However, the conventional configurations create several technical challenges and difficulties. Implementation of a conventional configuration first requires the selection of an appropriate high performance DSP platform to support the high processing demand, which increases the production cost of such a system. The processor selected based on this DSP platform then fetches and executes software programs stored in memory, which increases size and power consumption and also degrades the processing bandwidth due to the data transfer operations between the memory and the processor. The handling and control of data transfer sequencing and synchronization further adds to the DSP overhead, which further slows down the MPEG encode/decode operations.
Even though current digital video encoding and compression techniques are able to take advantage of redundancies inherent in natural imagery to dramatically improve the efficiency in video image data storage and processing and to allow for faster transmission of images, there are still needs to lower the power consumption, to increase the processing speed and to achieve more compact video storage. Particularly, this is a challenging task as the decoding of the MPEG compressed video data involves five basic operations: 1) bit stream parser and variable decoder; 2) inverse scan and run-level code decoder; 3) de-quantization and inverse discrete cosine transform function (IDCT); 4) motion compensation; and 5) YUV to RGB color conversion.
For example, FIG. 1 shows a functional block diagram of a conventional MPEG video image display system, in accordance with the prior art. In particular, DSP/RISC 110 controls, manages and co-processes the fixed functions necessary for the image data processing, such as discrete cosine transform function (DCT), motion estimation (ME) and motion compensation, and any components of codec functions. These functions include the five operations described above. A quantitative estimate of the complexity of the general MPEG video real-time decoding process in terms of the number of required instruction cycles per second reveals that, for a typical general-purpose RISC processor, all of the resources of the microprocessor are exhausted by, for example, the color conversion operation alone. Real-time decoding refers to decoding at the rate at which the video signals were originally recorded (e.g., 30 frames per second). An exemplary digital television signal generates about 10.4 million picture elements (pixels) per second. Since each pixel has three independent color components (primary colors: red, green and blue), the total data element rate is more than 30 million per second, which is of the same order of magnitude as current CPU clock speeds. Thus, even at the highest current CPU clock speed of 200 MHz, there are only about 20 clock cycles available for processing each pixel, and fewer than 7 clock cycles per color component.
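The cycle budget quoted above can be checked with a few lines of arithmetic. The following is only a sketch that reuses the figures stated in the text (10.4 million pixels per second, three color components, a 200 MHz clock); the function and constant names are illustrative and do not come from the specification.

```python
# Figures taken from the paragraph above; names are illustrative only.
PIXELS_PER_SECOND = 10.4e6   # exemplary digital television signal
COMPONENTS_PER_PIXEL = 3     # red, green and blue
CPU_CLOCK_HZ = 200e6         # clock speed quoted in the text


def cycle_budget(clock_hz, pixels_per_second, components_per_pixel):
    """Return (data elements per second, cycles per pixel, cycles per component)."""
    elements_per_second = pixels_per_second * components_per_pixel
    cycles_per_pixel = clock_hz / pixels_per_second
    cycles_per_component = clock_hz / elements_per_second
    return elements_per_second, cycles_per_pixel, cycles_per_component


elements, per_pixel, per_component = cycle_budget(
    CPU_CLOCK_HZ, PIXELS_PER_SECOND, COMPONENTS_PER_PIXEL
)
# elements is about 31.2 million per second, per_pixel is about 19.2,
# and per_component is about 6.4, consistent with the rounded figures above.
```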
Furthermore, to convert the video signals of a digital television signal from YUV format to RGB format in real time, for example, using even the fastest conventional microprocessors requires approximately 200 million instruction cycles per second (nearly all of the data processing bandwidth of such a microprocessor). Depending on the type of processor used and several other factors such as bit rate, average symbol rate, etc., implementing each of the IDCT function and motion compensation in real time may require, for example, anywhere from approximately 90 million operations per second (MOPS) to 200 MOPS for full resolution images. Existing general-purpose microprocessors are extremely inefficient in handling real-time decompression of full-size, digital motion video signals compressed according to MPEG standards. Typically, additional hardware is needed for such real-time decompression, which adds to system complexity and cost.
The requirement for performing these tasks using a processor that involves the execution of software programs increases the cost, power consumption, and size of the system and further degrades the bandwidth and speed of video image data processing. For these reasons, there is a need for a more efficient implementation of real-time decompression of digital motion video compressed according to MPEG standards such that the difficulties and limitations of the conventional techniques can be resolved.
Various embodiments of the present invention provide a device configuration and method for carrying out video image data encoding/decoding function implemented with pipelined, data-driven functional blocks to eliminate the requirement of using a digital signal processor (DSP) as a central processor to overcome the above-mentioned prior art difficulties and limitations. In one embodiment, the functional blocks may be fixed functions.
In one embodiment, the present invention provides an MPEG-4 video image data encoding/decoding device including fixed-function processors connected in a pipelined configuration. A fixed function processor carries out a predefined encoding/decoding function upon receiving a set of predefined data in a data driven manner such that a central processor is not required. By configuring the fixed function processors in a pipelined architecture, a high degree of parallel processing capability can be achieved. The pipelined functional blocks can be configured such that the functional blocks are highly portable and can be conveniently maintained, are easily scalable and can be implemented in different encoding/decoding devices. As the configuration and operations are significantly simplified, the encoding/decoding device can achieve low power consumption, and a functional block that is not in use can be powered down in an idle state until it is activated again when data is received. Since each functional block may be a dedicated processor, the memory size can be optimally designed to minimize resource waste arising from the storage of a large amount of data required for performing multiple logic functions.
In one embodiment, the present invention provides a video image data encoding/decoding device including a plurality of fixed-function data processors interconnected with at least one pipelined data transmission line. The fixed-function processors perform predefined encoding/decoding functions upon receiving a set of predefined data from the transmission line. In one embodiment, the plurality of fixed function data processors may include a data buffer queue for receiving a set of predefined data from the transmission line. In another embodiment, the plurality of fixed function data processors may include a control queue for initiating a performance of the predefined encoding/decoding function upon receiving a set of predefined data from the transmission line.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:
FIG. 1 shows a functional block diagram of a conventional MPEG video image display system, in accordance with the prior art.
FIG. 2 shows a functional block diagram of a data-driven MPEG encode-decode system, in accordance with an embodiment of the present invention.
FIG. 3 shows a functional block diagram of a data-driven MPEG decoder architecture, in accordance with an embodiment of the present invention.
FIG. 4 shows a functional block diagram of a data-driven MPEG encoder architecture, in accordance with an embodiment of the present invention.
FIG. 5 is a functional block diagram for showing a pipelined fixed function, in accordance with an embodiment of the present invention.
FIG. 6 is a functional block diagram showing a general pipelined architecture of a system implemented with pipelined fixed functions, in accordance with an embodiment of the present invention.
FIG. 2 shows an exemplary functional block diagram of a data-driven MPEG encode/decode system 200, in accordance with an embodiment of the present invention. In one embodiment, MPEG encode/decode system 200 is a data-driven pipelined configuration of MPEG-4 decoder architecture. MPEG encode/decode system 200 includes bit stream fixed function logic 210, IDCT fixed function logic 220, motion estimation/compensation fixed function logic 230, and post process fixed function logic 240.
FIG. 2 shows the pipelined data flow where the data transfers from one functional block, e.g., from bit stream fixed function logic 210 to inverse discrete cosine transform (IDCT) fixed function logic 220 and motion estimation/compensation fixed function logic 230 and then to post process fixed function logic 240, are automatically controlled based on a data driven process. The configuration is significantly simplified because there is no need to employ a high performance digital signal processor (DSP) to control and coordinate the overall data flow. Fixed function logic of a data-driven MPEG encode/decode system 200 includes dedicated logic that operates independently of other fixed functions. Fixed function logic carries out a predefined encoding/decoding function upon receiving a set of predefined data in a data driven manner such that a central processor is not required.
It should be appreciated that the dedicated logic processors can also be implemented with higher performance without requiring a high cost implementation because of the simplified configuration that does not require synchronizations and complicated check and branch operations. The speed of carrying out the encoding/decoding function is improved because the pipelined functional blocks can each perform their assigned dedicated function simultaneously. These significant benefits are achieved because fewer overhead resources are spent on tracking and synchronizing the data flows among many processors, as is required in encoding/decoding devices implemented with conventional configurations.
FIG. 3 shows a functional block diagram of data-driven MPEG decoder architecture 300, in accordance with an embodiment of the present invention. In one embodiment, data-driven MPEG decoder architecture 300 is operable to decode MPEG-4 video image data. In one embodiment, the pipelined configuration is constructed by employing blocks that each carry out fixed logic and represent a pipeline stage. The smaller functional block inside each larger functional block represents the fixed function (e.g., pipelined stages implemented inside this larger functional block).
The MPEG encoded data is stored in external memory 310. In one embodiment, bit stream decoder stage 320 accesses the MPEG encoded data. The data bits are fetched from external memory 310 to the bit stream decoder function 322, where the MPEG data bits are decoded. In one embodiment, a data bus allows communication as required among the stages and functions of data-driven MPEG decoder architecture 300 and external memory 310. As understood by those skilled in the art, a “bus” may comprise a shared set of wires or electrical signal paths to which other elements connect. However, as also understood by those skilled in the art, required communication paths may also be provided by other structures, such as individual point-to-point connections from each element to a switch, dedicated connections for each pair of elements that communicate with each other, or any combination of dedicated and shared paths. Therefore, it should be appreciated that the term “bus” refers to any structure that provides the communication paths required by the methods and device described below.
The decoded MPEG data bits are pushed to two other stages: preprocess stage 330 and motion compensation stage 350, for further computation. In one embodiment, preprocess stage 330 comprises five functions: DC scalar calculation function 332 for determining the discrete transform value; predict direction function 334 for determining the prediction direction; AC/DC prediction function 336 for calculating the predicted AC and DC values; de-quantization function 338 for reversing the quantization and calculating the result value; and Run Level Coding (RLC) and Inverse Scan (I-Scan) function 340 for decoding the RLC and reversing the scan process to lay out the correct order of values.
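As an illustration of what a run-level decode followed by an inverse scan does, the sketch below expands (run, level) pairs into a 64-entry coefficient list and places those coefficients back into an 8x8 block. It assumes 8x8 blocks and the classic zigzag scan order, neither of which is stated in the specification, and it is written in software purely for explanation; the described embodiments use dedicated logic. All function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in classic zigzag order (assumed scan)."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def run_level_decode(pairs, length=64):
    """Expand (run, level) pairs: each pair means 'run' zeros followed by one 'level'."""
    coeffs = [0] * length
    pos = 0
    for run, level in pairs:
        pos += run
        if pos >= length:
            raise ValueError("run-level data overruns the block")
        coeffs[pos] = level
        pos += 1
    return coeffs


def inverse_scan(coeffs, n=8):
    """Place scan-ordered coefficients back at their positions in the block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block


# Three non-zero coefficients; every remaining position stays zero.
block = inverse_scan(run_level_decode([(0, 12), (1, -3), (0, 5)]))
```

In this example the three levels land at scan positions 0, 2 and 3, which the assumed zigzag order maps to block positions (0, 0), (1, 0) and (2, 0).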
From preprocess stage 330, a decoded block matrix is pushed to inverse discrete cosine transform (IDCT) stage 360. IDCT stage 360 performs the IDCT function 362 of transforming the matrix from the frequency domain into the time domain. The decoded block matrix elements represent the correct color space values.
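For reference, the transform performed by an IDCT function can be written directly from the standard 8x8 inverse DCT definition. The sketch below is a plain, unoptimized evaluation of that formula, assuming 8x8 blocks; it is not the circuit of the described embodiments, which would typically use a fast, fixed-point structure. The name `idct_2d` is illustrative.

```python
import math

N = 8  # block size assumed for illustration


def idct_2d(coeffs):
    """Direct evaluation of the 2-D inverse DCT for an N x N coefficient block."""

    def scale(k):
        # Normalization factor: 1/sqrt(2) for the DC index, 1 otherwise.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (
                        scale(u) * scale(v) * coeffs[u][v]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[x][y] = total * 2.0 / N
    return out


# A block with only a DC coefficient decodes to a flat block of equal samples.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 8.0
flat = idct_2d(dc_only)
```

With a DC coefficient of 8 and all other coefficients zero, every output sample equals 1, which is a convenient sanity check for the normalization.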
While bit stream decoder stage 320 sends data to preprocess stage 330, the decoded bit stream is also sent as motion vectors to motion compensation stage 350. At motion compensation function 352 of motion compensation stage 350, the previous frame data is retrieved from external memory 310 and processed into a block matrix for the next stage copy and transfer.
The final pipelined stage may be copy and transfer stage 370 that is implemented to receive the block matrices sent from motion compensation stage 350 and IDCT stage 360. At copy and retire function 372, the block matrices are combined if necessary, and the final decoded picture is written back to external memory 310 to complete the data flow that drives the functions implemented as pipelined stages to carry out the functions sequentially.
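The combining step can be pictured as adding the residual block from the IDCT path to the predicted block from the motion compensation path. The sketch below assumes 8-bit samples clamped to the range 0 to 255 and treats a missing prediction as "no combination needed"; both are assumptions made for illustration, and `reconstruct_block` is a hypothetical name.

```python
def reconstruct_block(residual, prediction=None):
    """Add a residual block to an optional predicted block, clamping to 8-bit samples."""
    rows = []
    for r, residual_row in enumerate(residual):
        row = []
        for c, value in enumerate(residual_row):
            # Without a prediction (nothing to combine) the block is used as is.
            base = prediction[r][c] if prediction is not None else 0
            row.append(max(0, min(255, base + value)))
        rows.append(row)
    return rows


combined = reconstruct_block([[10, -20]], [[250, 5]])  # clamps to [[255, 0]]
```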
FIG. 4 shows a functional block diagram of a data-driven MPEG encoder architecture 400, in accordance with an embodiment of the present invention. In one embodiment, data-driven MPEG encoder architecture 400 is operable to encode MPEG-4 video image data. In one embodiment, the pipelined configuration is constructed by employing blocks that each carry out fixed logic and represent a pipeline stage. The smaller functional block inside each larger functional block represents the fixed function (e.g., pipelined stages implemented inside this larger functional block).
The data of the original picture is stored in external memory 410. In one embodiment, motion estimation stage 420 accesses the original picture data. Motion estimation function 422 is operable to retrieve the picture data, search for the optimal block matrix, and send the optimal block matrix to discrete cosine transform (DCT) stage 430. In one embodiment, motion estimation function is also operable to transmit the motion form vector of the picture data to bit stream encoding stage 440. Also, the pipelined process transfers the decoder motion compensation data to DCT stage 430 and to copy and retire stage 490 for decoded picture reconstruction.
DCT function 432 of DCT stage 430 is implemented to transform the matrix from time domain to frequency domain upon receiving the data from motion estimation stage 420. The result is transmitted to quantization stage 450. Quantization function 452 of quantization stage 450 is operable to calculate and quantize the values of the received data. The quantized data is then forwarded to inverse preprocess stage 460 and to de-quantization stage 470. De-quantization function 472, IDCT function 482, and copy and retire function 492 operate in a similar manner as de-quantization function 338, IDCT function 362, and copy and retire function 372 of FIG. 3, respectively, for an MPEG-4 decoder. The reconstructed picture may be saved back to external memory 410 for future use, completing the data flow in which the data-driven pipelined stages perform the encoder functions in a sequential pipelined fashion.
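The reason an encoder carries its own de-quantization path is that quantization is lossy: the encoder has to reconstruct the same picture the decoder will see. The sketch below shows this with a plain uniform quantizer and a single step size. That is a simplifying assumption, since the quantization rules of the MPEG-4 standard are more involved, and the function names are illustrative.

```python
def quantize(block, qstep):
    """Uniform quantizer: divide by qstep, rounding half away from zero."""

    def q(value):
        magnitude = (abs(value) + qstep // 2) // qstep
        return magnitude if value >= 0 else -magnitude

    return [[q(v) for v in row] for row in block]


def dequantize(levels, qstep):
    """Inverse of the uniform quantizer: scale each level back by qstep."""
    return [[level * qstep for level in row] for row in levels]


original = [[100, -37, 3]]
levels = quantize(original, 8)   # sent on toward the bit stream encoder
rebuilt = dequantize(levels, 8)  # what a decoder would recover
```

Here `rebuilt` differs from `original` by at most half a step per coefficient, and it is this rebuilt data, not the original, that feeds the reconstruction path.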
Inverse preprocess stage 460 includes AC/DC prediction function 464 and RLC and scan stage 462. The quantized block matrix from quantization stage 450 is combined with AC/DC predictions and scanned to find all the RLC. The RLC is then pushed to bit stream encoding stage 440. Bit stream encoding stage 440 gathers all the information about the picture including RLC and motion vectors from inverse preprocess stage 460 and motion estimation stage 420. Bit stream encoder function 442 is performed to encode the final bit stream of MPEG and store back to external memory 410 to complete the data flow. Bit stream encoding stage 440 is also implemented with bit rate control function 444 to prepare the compression ratio of next frame of the video.
FIG. 5 is a functional block diagram showing a pipelined fixed function 500, in accordance with an embodiment of the present invention. In one embodiment, the stage blocks as shown in FIGS. 2, 3, and 4 include functional elements shown in pipelined fixed function 500 of FIG. 5. In one embodiment, the function logic is a fixed function circuit (e.g., IDCT function 362 of FIG. 3). Each function requires two inputs from the previous stage: a control input and a data input. A function block includes at least one control queue 510 for queuing control inputs and at least one data buffer queue 520 for queuing data inputs. When control queue 510 has a command, function logic 530 starts to process the data from data buffer queue 520. At the completion of functional performance, the functional stage stores the result to the data buffer queue 520 provided in a next stage. Meanwhile, a command is sent to the control queue 510 of the next functional stage to initiate the functional performance designated for that stage.
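A software model can make the handshake of FIG. 5 concrete. In the sketch below, each stage owns a control queue and a data queue, fires only when a command is waiting, and hands its result and a command to the next stage. The class `FixedFunctionStage`, its methods, and the `results` list are hypothetical names chosen for illustration; the described embodiments implement this behavior in dedicated logic, not software.

```python
from collections import deque


class FixedFunctionStage:
    """Software model of one function block with a control queue and a data queue."""

    def __init__(self, name, function, next_stage=None):
        self.name = name
        self.function = function      # stands in for the function logic
        self.next_stage = next_stage
        self.control_queue = deque()  # commands from the previous stage
        self.data_queue = deque()     # data from the previous stage
        self.results = []             # collects output when there is no next stage

    def push(self, command, data):
        """Called by the previous stage: store the data, then post the command."""
        self.data_queue.append(data)
        self.control_queue.append(command)

    def step(self):
        """Process one item if a command is waiting; return True if work was done."""
        if not self.control_queue:
            return False              # nothing to do: the stage stays idle
        command = self.control_queue.popleft()
        data = self.data_queue.popleft()
        result = self.function(data)
        if self.next_stage is not None:
            self.next_stage.push(command, result)
        else:
            self.results.append((command, result))
        return True


# Two toy stages: the first doubles its input, the second adds one.
sink = FixedFunctionStage("add_one", lambda x: x + 1)
source = FixedFunctionStage("double", lambda x: x * 2, next_stage=sink)
source.push("block-0", 5)
source.step()
sink.step()
```

After both steps, `sink.results` holds the pair `("block-0", 11)`. No object outside the two stages decides when either one runs; a stage with an empty control queue simply does nothing.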
FIG. 6 is a functional block diagram showing a general pipelined architecture 600 of a system implemented with pipelined fixed functions, in accordance with an embodiment of the present invention. In one embodiment, pipelined architecture 600 is a data driven processing system that is configured by sequentially connecting a plurality of fixed function blocks (e.g., pipelined fixed function 500 of FIG. 5). The configuration as shown can be flexibly and effectively implemented in many different data processing systems to simplify system configuration, to lower the implementation cost and to minimize the complicated control and synchronization problems often encountered in the conventional central control processing systems.
The encoding and decoding systems as detailed in the described embodiments are divided into blocks of pipeline stages. Each block automatically synchronizes passing and buffering of data, and the system is completely data driven. There is no need for a central processor to control the sequence and data. Thus, the streamlined design provides a very efficient and high performance engine.
In one embodiment, pipeline stages are partitioned to follow the logic sequence in an MPEG encoding/decoding process. A stage of the pipeline is programmed to look at the control queue. For example, with reference to FIG. 6, if there is a request for data computation received at control queue 610a, fixed function 600a starts to process the data in data buffer queue 620a. When fixed function 600a finishes the data computation, the data is stored in data buffer queue 620b of fixed function 600b (e.g., the next stage). At the same time, fixed function 600a transmits the sequence information into control queue 610b of fixed function 600b.
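To show several stages running at the same time with no central sequencer, the sketch below models each stage as a thread that blocks on its own control queue and forwards data and sequence information to the next stage. Threads and `queue.Queue` are only a software stand-in for concurrently operating hardware blocks; `stage_worker`, `run_pipeline`, and the `STOP` marker are hypothetical names, and the end-of-stream marker is an addition needed to terminate the model.

```python
import queue
import threading

STOP = object()  # end-of-stream marker, needed only to shut the model down


def stage_worker(function, control_in, data_in, control_out, data_out):
    """One stage: wait for a command, process the matching data, notify the next stage."""
    while True:
        command = control_in.get()       # blocks until the previous stage posts a command
        if command is STOP:
            control_out.put(STOP)
            return
        data = data_in.get()
        data_out.put(function(data))     # store the result in the next stage's data queue
        control_out.put(command)         # then pass the sequence information along


def run_pipeline(functions, items):
    """Chain the stages, feed the items in, and collect (sequence, result) pairs."""
    n = len(functions)
    control = [queue.Queue() for _ in range(n + 1)]
    data = [queue.Queue() for _ in range(n + 1)]
    workers = [
        threading.Thread(
            target=stage_worker,
            args=(f, control[i], data[i], control[i + 1], data[i + 1]),
        )
        for i, f in enumerate(functions)
    ]
    for worker in workers:
        worker.start()
    for sequence, item in enumerate(items):
        data[0].put(item)
        control[0].put(sequence)
    control[0].put(STOP)
    for worker in workers:
        worker.join()
    results = []
    while True:
        command = control[n].get()
        if command is STOP:
            break
        results.append((command, data[n].get()))
    return results


output = run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
```

The only coordination between stages is the arrival of data and commands in the queues. `run_pipeline` merely wires the stages together and feeds the first one, which is roughly the role external memory plays for the first stage in the embodiments above.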
In one embodiment, the data buffer queues and control queues as shown in FIGS. 5 and 6 are implemented using a dual buffer scheme (ping-pong buffer), with one buffer for the current process and one for the previous stage to use. The advantage of using a dual buffer is that a stage can continue processing without being stalled by the next stage.
Various embodiments of the present invention, a device and method for encoding and decoding video image data, are described. In one embodiment, the present invention includes a plurality of fixed-function data processors interconnected with at least one pipelined data transmission line wherein each of the fixed-function processors performs a predefined encoding/decoding function upon receiving a set of predefined data from the transmission line. In one embodiment, the plurality of fixed-function data processors includes a data buffer queue for receiving a set of predefined data from the transmission line. In another embodiment, the plurality of fixed-function data processors includes a control queue for initiating a performance of the predefined encoding/decoding function upon receiving a set of predefined data from the transmission line.
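A ping-pong buffer can be modeled as two slots with alternating roles: while the consuming stage reads one half, the producing stage may already fill the other. The sketch below is one minimal way to express this in software; the class and method names are hypothetical, and a hardware implementation would use two memory banks with a select signal instead of Python lists.

```python
class PingPongBuffer:
    """Two-slot buffer: the producer fills one half while the consumer drains the other."""

    def __init__(self):
        self._slots = [None, None]
        self._full = [False, False]
        self._write_index = 0
        self._read_index = 0

    def can_write(self):
        return not self._full[self._write_index]

    def can_read(self):
        return self._full[self._read_index]

    def write(self, item):
        """Fill the current write half, then switch the producer to the other half."""
        if not self.can_write():
            raise BufferError("both halves are full")
        self._slots[self._write_index] = item
        self._full[self._write_index] = True
        self._write_index ^= 1

    def read(self):
        """Drain the current read half, then switch the consumer to the other half."""
        if not self.can_read():
            raise BufferError("no filled half to read")
        item = self._slots[self._read_index]
        self._full[self._read_index] = False
        self._read_index ^= 1
        return item


buf = PingPongBuffer()
buf.write("block A")   # producer fills the first half
buf.write("block B")   # and can fill the second before anything is consumed
first = buf.read()     # consumer drains the first half, freeing it for reuse
```

With two halves, a producing stage stalls only when both are still occupied, so a brief delay in the consuming stage does not immediately block the stage before it.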
In another embodiment, the present invention provides a method for encoding and/or decoding a video image. The method includes sequentially pipelining a set of data via a data transmission line connected between a plurality of fixed-function data processors for sequentially performing a predefined encoding/decoding function upon receiving the set of data from the data transmission line.
Various embodiments of the invention, a method and device for encoding and decoding video image data, are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the invention should not be construed as limited by such embodiments, but rather construed according to the below claims.
1. A video image data encoding/decoding device comprising:
a plurality of fixed-function data processors interconnected with at least one pipelined data transmission line wherein each of said plurality of fixed-function data processors performs a predefined encoding/decoding function upon receiving a set of predefined data from another of said plurality of fixed-function data processors,
wherein said plurality of fixed-function data processors are synchronized on data without a central controller,
wherein each of said plurality of fixed-function data processors is data driven, and each of said plurality of fixed-function data processors comprises dedicated logic that operates independently from the remaining said plurality of fixed-function data processors, and each of said plurality of fixed-function data processors comprises a first queue to queue a set of predefined data and a second queue to queue a set of predefined control data, said set of predefined data and set of predefined control data are received from a previous fixed-function data processor of said plurality of fixed-function data processors, and each of said plurality of fixed-function data processors is operable to simultaneously store a set of predefined data to a first queue of a subsequent fixed-function data processor of said plurality of fixed-function data processors and to send a set of predefined control data to a second queue of said subsequent fixed-function data processor,
wherein said first queue comprises a ping-pong buffer and said second queue comprises a ping-pong buffer.
2. The video image data encoding/decoding device of claim 1 wherein said first queue is operable for receiving a set of predefined data from said transmission line.
3. The video image data encoding/decoding device of claim 1 wherein said second queue is operable for initiating a performance of said predefined encoding/decoding function upon receiving a set of predefined control data from said transmission line.
4. The video image data encoding/decoding device of claim 1 wherein said first queue is operable for receiving a set of predefined data from said transmission line and said second queue is operable for initiating a performance of said predefined encoding/decoding function upon receiving a set of predefined control data.
5. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises a bit-stream decoder.
6. The video image data encoding/decoding device of claim 5 wherein at least one of said plurality of fixed-function data processors comprises a motion compensation processor.
7. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises a discrete cosine transformation (DCT) logic processor.
8. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises an inverse discrete cosine transformation (IDCT) logic processor.
9. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises a direction prediction processor.
10. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises a de-quantization processor.
11. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises an AC/DC prediction processor.
12. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises a Run Level Coding (RLC) and Inverse Scan (I-Scan) logic processor.
13. The video image data encoding/decoding device of claim 1 wherein at least one of said plurality of fixed-function data processors comprises a copy and retire processor operable to combine data block matrices.
14. The video image data encoding/decoding device of claim 1 further comprising a data bus for transmitting data between said video image data encoding/decoding device and an external memory.
15. The video image data encoding/decoding device of claim 1, wherein at least one of said plurality of fixed-function data processors is powered down.
16. The video image data encoding/decoding device of claim 1, wherein at least one of said plurality of fixed-function data processors comprises a bit stream encoder processor.
17. The video image data encoding/decoding device of claim 1, wherein each of said plurality of fixed-function data processors is operable to automatically synchronize passing and buffering of data.
18. The video image data encoding/decoding device of claim 1, wherein a pipeline architecture comprises said plurality of fixed-function data processors.
19. A method for encoding/decoding video image data, said method comprising:
receiving a first set of predefined image data at a first data driven processor for performing a first predefined encoding/decoding function, wherein said set of predefined image data is queued by a first queue of said first data driven processor and wherein a first set of predefined control data associated with said first set of predefined data is queued by a second queue of said first data driven processor;
performing said first predefined encoding/decoding function via said first data driven processor; and
transmitting via said first data driven processor a second set of predefined image data to at least a second data driven processor for performing a second predefined encoding/decoding function, said second set of predefined image data is queued by a first queue of said second data driven processor, said first data driven processor and said second data driven processor are synchronized on data without a central controller, and said first data driven processor is operable to simultaneously perform said transmitting and send a second set of predefined control data to said second data driven processor, wherein said second set of predefined control data is queued by a second queue of said second data driven processor;
wherein said first queue of said first data driven processor comprises a ping-pong buffer, said second queue of said first data driven processor comprises a ping-pong buffer, said first queue of said second data driven processor comprises a ping-pong buffer, and said second queue of said second data driven processor comprises a ping-pong buffer.
20. The method as recited in claim 19 wherein said second queue of said first data driven processor is operable for initiating said performing upon receiving said first set of predefined control data.
21. The method as recited in claim 20 further comprising:
performing said second predefined encoding/decoding function via said second data driven processor, said second queue of said second data driven processor is operable for initiating said performing said second predefined encoding/decoding function upon receiving said second set of predefined control data.
22. A data encoding/decoding system comprising:
a first data driven processor operable to receive a first set of predefined image data and a first set of predefined control data from a previous data driven processor, said first data driven processor is operable to perform a first predefined encoding/decoding function, said first set of predefined image data is queued by a data buffer queue of said first data driven processor and a first set of predefined control data associated with said first set of predefined image data is queued by a control queue of said first data driven processor;
a second data driven processor connected to said first data driven processor, said second data driven processor comprises a data buffer queue and a control queue; and
wherein said first data driven processor is operable to simultaneously store a second set of predefined image data to said data buffer queue of said second data driven processor and send a second set of predefined control data to said control queue of said second data driven processor that is associated with said second set of predefined image data, said second data driven processor is operable to perform a second predefined encoding/decoding function, said first data driven processor and said second data driven processor are synchronized on data without a central controller,
wherein said data buffer queue of said first data driven processor comprises a ping-pong buffer, said control queue of said first data driven processor comprises a ping-pong buffer, said data buffer queue of said second data driven processor comprises a ping-pong buffer, and said control queue of said second data driven processor comprises a ping-pong buffer.
23. The system as recited in claim 22 wherein each of said first and second data driven processors is operable to synchronize passing and buffering of data automatically.
24. The system as recited in claim 22 further comprising a third data driven processor connected to said second data driven processor, said third data driven processor comprises a data buffer queue and a control queue, said data buffer of said third data driven processor comprises a ping-pong buffer and said control queue of said third data driven processor comprises a ping-pong buffer.