

| Volume 6, Issue 4, July 2019 |

# High Speed 2-D Discrete Wavelet Transform using Modified Distributed Arithmetic Technique

#### Ankit Pathak

M Tech. Scholar, Dept. of Electronics and Communication, SCOPE College of Engineering, Bhopal, India

**ABSTRACT**: The discrete wavelet transform (DWT) is applied to stationary and non-stationary signals, and it is used for speech, audio and video signals. Distributed arithmetic (DA) is a general and effective technique for implementing multiplier-less filters, and it has been exploited in the past to implement the DWT as well. However, DA requires a ROM whose size increases exponentially with the number of inputs, which greatly increases the complexity. This paper presents a high-speed, area-efficient 2-D DWT based on the 9/7 filter using a Modified Distributed Arithmetic (MDA) technique. The MDA technique is applied to the low-pass and high-pass filters of the DWT; it consists of adders and shift registers and is free of multipliers. The design is implemented in Xilinx software and verified through its register transfer level (RTL) view and simulation waveforms.

**KEYWORDS**: 2-D Discrete Wavelet Transform (DWT), MDA, Low-Pass Filter Bank, High-Pass Filter Bank, Xilinx Simulation.

#### I. INTRODUCTION

The wavelet transform captures both time and frequency information, based on a multiresolution analysis framework, in contrast to conventional transforms such as the fast Fourier transform (FFT) and the discrete cosine transform (DCT) [1]. The wavelet transform has been used in many fields, including image and signal processing, signal compression, astronomical data analysis, digital fingerprinting and noise reduction. However, there has long been a gap between expectation and reality because of its difficult, tedious computation and limited computation speed.

SRAM-based FPGAs are well suited to arithmetic-intensive DSP functions such as multiply-and-accumulate (MAC); they avoid high development costs and allow design changes even after manufacturing [2].

The DWT is computationally intensive, and most of its applications demand real-time processing. One way of achieving high-speed performance is to use fast computational algorithms on general-purpose computers [3]. Another way is to exploit the parallelism inherent in the computation for concurrent processing by a set of parallel processors.

However, it is not cost-effective to use a general-purpose computer for a specific application. General-purpose computers also require more space, consume more power and take more computation time. The development of very large scale integration (VLSI) technology enables digital signal processing (DSP) system designers to build high-performance, low-cost and low-power systems on a single chip [4]. VLSI systems offer great potential for concurrency and an enormous amount of computing power within a small area. Computation is cheap because hardware is not the limiting factor in a VLSI system [5]; non-localized global communication, however, is expensive and dissipates considerable power. A high degree of parallelism and nearest-neighbor communication are therefore crucial for realizing high-performance VLSI systems [6]. With this in view, high-performance application-specific VLSI structures have evolved rapidly in recent years. Special-purpose VLSI systems maximize processing concurrency through parallel and pipelined processing and provide a cost-effective alternative for real-time applications. Consequently, the 2-D DWT is now implemented in VLSI systems to meet the timing requirements of real-time applications.

Numerous design schemes have been proposed over the last decades for efficient implementation of the 2-D DWT in VLSI. Researchers have adopted different algorithm formulations, mapping schemes and architectural design methods to reduce the computation time, arithmetic complexity or memory complexity of 2-D DWT structures. However, the area-delay performance of the existing structures differs only marginally. The JPEG2000 standard [4-6] defines the DWT as a linear space-to-frequency transform of the image domain for irreversible compression. This irreversible DWT is implemented using DA with factorized coefficients derived from the Daubechies 9/7 filter.

#### II. MULTILEVEL DISCRETE WAVELET TRANSFORM

Multiresolution analysis (MRA) is a characteristic feature of sub-band decomposition and is used to obtain a better spectral representation of the signal. In MRA, the signal is decomposed over more than one DWT level, which is known as multilevel DWT: the low-pass output of the first DWT level is further decomposed in the same manner to obtain the second level of DWT decomposition, and the process is repeated for higher DWT levels [7]. Several algorithms have been suggested for computing the multilevel DWT. One of the most important is the pyramid algorithm (PA) proposed by Mallat. The PA for the 1-D DWT is given by

$$Y_l^j(n) = \sum_{i=0}^{k-1} h(i)\, Y_l^{j-1}(2n-i) \tag{1}$$

$$Y_h^j(n) = \sum_{i=0}^{k-1} g(i)\, Y_l^{j-1}(2n-i) \tag{2}$$

Here $Y_l^j(n)$ is the n-th low-pass sub-band component of the j-th DWT level, $Y_h^j(n)$ is the n-th high-pass sub-band component of the j-th DWT level, $h(i)$ and $g(i)$ are the low-pass and high-pass filter coefficients, $k$ is the filter length, and $Y_l^0$ is the input signal. Two-dimensional signals, such as images, are analyzed using the 2-D DWT, as shown in Figure 1. The 2-D DWT is currently applied in many image processing applications such as image compression and reconstruction [8]. It is a mathematical technique that decomposes an input image in the multiresolution frequency space [9], producing four sub-bands called low-low (LL), low-high (LH), high-low (HL) and high-high (HH).
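The recursion in equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration, assuming that samples outside the signal are zero and that both bands of level $j$ are computed from the low-pass band of level $j-1$. The function names `analysis_step` and `pyramid_dwt` and the two-tap averaging/differencing filters in the usage are hypothetical choices for illustration, not the 9/7 filters of this design.

```python
def analysis_step(x, h, g):
    """One DWT level following equations (1) and (2); samples outside x are 0."""
    def filt(coeffs, n):
        # sum over i of c(i) * x(2n - i), skipping indices outside the signal
        return sum(c * x[2 * n - i] for i, c in enumerate(coeffs) if 0 <= 2 * n - i < len(x))

    count = (len(x) + 1) // 2  # one output per even-indexed input sample
    low = [filt(h, n) for n in range(count)]
    high = [filt(g, n) for n in range(count)]
    return low, high


def pyramid_dwt(x, h, g, levels):
    """Multilevel DWT: only the low-pass band is decomposed again."""
    details = []
    low = list(x)
    for _ in range(levels):
        low, high = analysis_step(low, h, g)
        details.append(high)
    return low, details


approx, details = pyramid_dwt([1, 2, 3, 4, 5, 6, 7, 8], [0.5, 0.5], [0.5, -0.5], levels=2)
print(approx)   # [0.25, 3.5]
print(details)  # [[0.5, 0.5, 0.5, 0.5], [0.25, 1.0]]
```

Because of the zero boundary, the first output of each level sees only one input sample; a practical implementation would use a proper boundary extension instead.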



Figure 1: Three-Level Diagram of the Low-Pass/High-Pass Filter Bank
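A single level of the separable 2-D DWT that produces the four sub-bands can be sketched by applying the same 1-D step to every row and then to every column. This is a sketch under stated assumptions: the helper names are hypothetical, samples outside the image are taken as zero, and the sub-band labels use the convention that the first letter names the row (horizontal) filter and the second the column (vertical) filter; labelling conventions differ between texts.

```python
def split(x, h, g):
    """One 1-D analysis level (equations (1)-(2)); samples outside x are 0."""
    def filt(c, n):
        return sum(ci * x[2 * n - i] for i, ci in enumerate(c) if 0 <= 2 * n - i < len(x))
    m = (len(x) + 1) // 2
    return [filt(h, n) for n in range(m)], [filt(g, n) for n in range(m)]


def transpose(rows):
    return [list(col) for col in zip(*rows)]


def filter_columns(block, h, g):
    """Apply the 1-D step down every column of a row-major block."""
    lows, highs = [], []
    for col in transpose(block):
        lo, hi = split(col, h, g)
        lows.append(lo)
        highs.append(hi)
    return transpose(lows), transpose(highs)


def dwt2_level(image, h, g):
    """Separable 2-D DWT level: filter the rows first, then the columns."""
    row_low, row_high = [], []
    for row in image:
        lo, hi = split(row, h, g)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = filter_columns(row_low, h, g)   # row low-pass, column low/high-pass
    hl, hh = filter_columns(row_high, h, g)  # row high-pass, column low/high-pass
    return ll, lh, hl, hh


ll, lh, hl, hh = dwt2_level([[2] * 4 for _ in range(4)], [0.5, 0.5], [0.5, -0.5])
print(ll[1][1], lh[1][1], hl[1][1], hh[1][1])  # 2.0 0.0 0.0 0.0
```

For a constant 4 x 4 image, at position (1, 1), away from the first row and column that see the zero padding, LL holds the average value 2 and the other three sub-bands are 0, as expected for a smooth input.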

#### III. PROPOSED ARCHITECTURE

The block diagram of the 9/7 wavelet-coefficient-based multilevel discrete wavelet transform using the MDA structure is shown in Figure 2. Each input sample passes through an 8-bit register, after which the symmetrically delayed inputs are added pairwise as in equations (3) to (6):

$$m(1) = X(n) + X(n-8) \tag{3}$$

$$m(2) = X(n-1) + X(n-7) \tag{4}$$

$$m(3) = X(n-2) + X(n-6) \tag{5}$$

$$m(4) = X(n-3) + X(n-5) \tag{6}$$

$$m(5) = X(n-4) \tag{7}$$
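The saving from the pre-additions in equations (3) to (7) can be checked with a short sketch: for a symmetric 9-tap filter, folding the delay line needs only five coefficient products instead of nine. The tap values here are hypothetical integers chosen only to show that both forms agree; the sketch assumes `window[t]` holds $X(n-t)$ and that `taps[t]` is the coefficient of $X(n-t)$ with `taps[t] == taps[8 - t]`.

```python
def preadd(window):
    """Equations (3)-(7): window[t] holds X(n - t) for t = 0..8."""
    return [window[t] + window[8 - t] for t in range(4)] + [window[4]]


def folded_output(window, taps):
    """Symmetric 9-tap FIR evaluated with 5 products on the pre-added terms."""
    assert all(taps[t] == taps[8 - t] for t in range(9)), "taps must be symmetric"
    m = preadd(window)
    return sum(taps[t] * m[t] for t in range(5))


def direct_output(window, taps):
    """Reference: the same FIR evaluated with all 9 products."""
    return sum(taps[t] * window[t] for t in range(9))


taps = [1, 2, 3, 4, 5, 4, 3, 2, 1]    # hypothetical symmetric taps
window = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # X(n), X(n-1), ..., X(n-8)
print(direct_output(window, taps), folded_output(window, taps))  # 125 125
```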

MDA is used in the 9/7 filter to remove the multipliers. MDA must be applied twice to obtain the high-pass output $Y_{H1}$ and the low-pass output $Y_{L1}$ of the 1-D 9/7 filter.






Figure 2: Block Diagram of 9/7 Wavelet Coefficient based Discrete Wavelet Transform

Here $h_0$, $h_1$, $h_2$, $h_3$, $h_4$ are the low-pass filter coefficients and $g_0$, $g_1$, $g_2$, $g_3$ are the high-pass filter coefficients. For the high-pass filter bank, the MDA technique is applied to $g_0$, $g_1$, $g_2$ and $g_3$ with the inputs $r_1$, $r_2$, $r_3$ and $r_4$, giving the high-pass output $Y_H$ of the 9/7 filter. For the low-pass filter bank, the MDA technique is applied to $h_0$, $h_1$, $h_2$, $h_3$ and $h_4$ with the inputs $m_1$, $m_2$, $m_3$, $m_4$ and $m_5$, giving the low-pass output $Y_L$. The low-pass computation is worked through step by step below:

$$Y_L = \begin{bmatrix} h_0 & h_1 & h_2 & h_3 & h_4 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \end{bmatrix} \tag{7}$$

Let $m_1=1$, $m_2=2$, $m_3=3$, $m_4=4$ and $m_5=5$. Multiplying the row by the column gives the low-pass output 122. The Daubechies 9/7 low-pass filter coefficients $h_0$, $h_1$, $h_2$, $h_3$ and $h_4$ are 0.6029490, 0.2668641, -0.0782232, -0.0168641 and 0.02674875, respectively. Multiplying each coefficient by 128 ($2^7$) and rounding to the nearest integer gives 77, 34, -10, -2 and 3, respectively:

$$Y_L = \begin{bmatrix} 77 & 34 & -10 & -2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix} = 77 + 68 - 30 - 8 + 15 = 122 \tag{8}$$
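The quantization and the worked product above can be reproduced in a few lines. This sketch assumes rounding to the nearest integer after scaling by 128, which is consistent with the integer values listed in the text; the names `quantize` and `dot` are illustrative.

```python
# Daubechies 9/7 low-pass coefficients h0..h4 as listed in the text
H = [0.6029490, 0.2668641, -0.0782232, -0.0168641, 0.02674875]
SCALE = 128  # 2**7, i.e. 7 fractional bits


def quantize(coeffs, scale=SCALE):
    """Scale each coefficient and round it to the nearest integer."""
    return [round(c * scale) for c in coeffs]


def dot(row, column):
    """Row-by-column product used for Y_L."""
    return sum(a * b for a, b in zip(row, column))


Q = quantize(H)
print(Q)                        # [77, 34, -10, -2, 3]
print(dot(Q, [1, 2, 3, 4, 5]))  # 122
```

The scaled result 122 corresponds to 122/128 ≈ 0.953, compared with about 0.968 for the unquantized coefficients, which shows the rounding error introduced by keeping only 7 fractional bits.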

Writing the quantized low-pass coefficients as 8-bit two's-complement numbers, the low-pass output $Y_L$ of the 9/7 filter becomes

$$Y_L = \begin{bmatrix} 01001101 & 00100010 & 11110110 & 11111110 & 00000011 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ m_5 \end{bmatrix} \tag{9}$$

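Next, the DA matrix is formed from the filter coefficients. For the low-pass filter, row $k$ of the DA matrix holds bit $k$ of the five 8-bit coefficients $h_0, \dots, h_4$, from the least significant bit (top row) to the sign bit (bottom row):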




$$\left[X_{k}\right] = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}$$
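The 8-bit two's-complement words and the rows of the DA matrix can be derived mechanically from the quantized coefficients. In this sketch, which uses hypothetical helper names, row `k` collects bit `k` of every coefficient, least significant bit first, matching the row order of the matrix above.

```python
WIDTH = 8
Q = [77, 34, -10, -2, 3]  # quantized h0..h4


def to_twos(value, width=WIDTH):
    """Two's-complement bit string of value in `width` bits."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def da_matrix(coeffs, width=WIDTH):
    """Row k holds bit k of every coefficient, least significant bit first.

    Python's >> on negative ints acts like an infinitely sign-extended
    two's-complement shift, so the sign bits come out correctly.
    """
    return [[(c >> k) & 1 for c in coeffs] for k in range(width)]


print([to_twos(c) for c in Q])
# ['01001101', '00100010', '11110110', '11111110', '00000011']
for row in da_matrix(Q):
    print(row)
```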

Figure 3 shows the proposed MDA procedure for the low-pass filter. Step 1: all inputs are converted to binary numbers: $m_1 = 001$, $m_2 = 010$, $m_3 = 011$, $m_4 = 100$, $m_5 = 101$.

Step 2: all binary inputs are sign-extended to 4 bits:

$$s(1) = 0001, s(2) = 0010, s(3) = 0011, s(4) = 0100, s(5) = 0101$$

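Step 3: the sign-extended inputs are applied to the adder array. Each adder output is the sum of the inputs selected by the 1s in one row of the DA matrix; for example, the first row (1 0 0 0 1) gives $m(1) = m_1 + m_5 = 0110$:

$$m(1) = 0110,\; m(2) = 1110,\; m(3) = 1000,\; m(4) = 0101,$$

$$m(5) = 0111,\; m(6) = 1001,\; m(7) = 1000 \tag{11}$$

The last row holds the sign bits of the coefficients, which carry negative weight, so its sum $m_3 + m_4 = 0111$ is negated in two's complement:

$$m(8) = \mathrm{not}(m_3 + m_4) + 1 = 1001$$

Figure 3: Proposed 1-D Architecture for the Low-Pass Filter Using the MDA Technique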
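The adder-array outputs in Step 3 follow directly from the DA matrix. The sketch below, with hypothetical names, sums the inputs selected by each row, negates the sum of the sign-bit row, and formats each result as a 4-bit two's-complement field as written in the text.

```python
DA = [
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],  # sign-bit row (weight -2**7)
]
INPUTS = [1, 2, 3, 4, 5]  # m_1 .. m_5 from Step 1


def adder_array(da, inputs):
    """Sum the inputs selected by each row; negate the sign-bit row."""
    sums = [sum(x for bit, x in zip(row, inputs) if bit) for row in da]
    sums[-1] = -sums[-1]
    return sums


def nibble(value):
    """4-bit two's-complement field, as written in Step 3."""
    return format(value & 0xF, "04b")


partials = adder_array(DA, INPUTS)
print(partials)                       # [6, 14, 8, 5, 7, 9, 8, -7]
print([nibble(p) for p in partials])
```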

Step 4: the adder-array outputs are applied to the MUX one at a time, starting with m(1). Before each new MUX output is added, the accumulated value is shifted right by 1 bit:

$$Y_P(0) = MUX(1) = 0110$$

$$Y_P(1) = \big(Y_P(0) \gg 1\big) + MUX(2) = 0011.0 + 1110 = 10001.0$$

$Y_P(1)$ is again shifted right by 1 bit and MUX(3) = 1000 is added. Continuing this process through MUX(8), where the negated sign row m(8) = 1001 is added, gives the final output

$$Y_P(7) = 00001111010 = 122$$

The carry out of the final addition is rejected. The result equals the value obtained above by direct row-by-column multiplication.
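The shift-and-add accumulation of Step 4 can be modelled with exact fractions. This is a behavioral sketch, not the hardware: it keeps every bit shifted out of the accumulator, so no carry needs to be discarded, whereas the circuit works with a fixed word length and rejects the final carry. The name `shift_accumulate` is hypothetical.

```python
from fractions import Fraction

# Adder-array outputs m(1)..m(8) from Step 3 (LSB row first, sign row negated)
PARTIALS = [6, 14, 8, 5, 7, 9, 8, -7]


def shift_accumulate(partials):
    """LSB-first DA: shift the running sum right 1 bit, then add the next term.

    After the loop, acc == sum(p_k * 2**k) / 2**(len(partials) - 1).
    """
    acc = Fraction(partials[0])  # Y_P(0)
    for p in partials[1:]:
        acc = acc / 2 + p        # Y_P(k) = (Y_P(k-1) >> 1) + MUX(k+1)
    return acc


acc = shift_accumulate(PARTIALS)
y = acc * 2 ** (len(PARTIALS) - 1)  # undo the 7 right shifts
print(acc, y)                       # 61/64 122
print(format(int(y), "011b"))       # 00001111010
```

Scaling the accumulator by $2^7$ restores the integer result 122, whose 11-bit pattern matches $Y_P(7)$.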

For the 2-D sub-band WT, the outputs $Y_{H1}$ and $Y_{L1}$ of the 1-D high-pass and low-pass filters are passed through a series of shift registers, and the samples are then read in parallel using a parallel data access method. This method minimizes the memory requirement of the 2-D sub-band WT.

#### IV. SIMULATION RESULTS

All design and testing of the algorithm described in this paper were carried out in Xilinx 14.1i. This version offers several useful features, including low memory requirements, fast debugging and low cost, which made the program easy to debug.



Figure 4: RTL View of 1-D Wavelet Transform






Figure 5: RTL View of 2-D Wavelet Transform

Table 1: 1-D DWT using MDA Technique

Device utilization summary (estimated values):

| Logic Utilization                 | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice Registers         | 35   | 301440    | 0%          |
| Number of Slice LUTs              | 213  | 150720    | 0%          |
| Number of fully used LUT-FF pairs | 10   | 238       | 4%          |
| Number of bonded IOBs             | 29   | 400       | 7%          |
| Number of BUFG/BUFGCTRLs          | 1    | 32        | 3%          |

Timing Summary (speed grade -2):

- Minimum period: 1.380 ns (maximum frequency: 724.638 MHz)
- Minimum input arrival time before clock: 0.471 ns
- Maximum output required time after clock: 11.542 ns
- Maximum combinational path delay: 10.949 ns

The 2-D sub-band WT presented in this paper, including all low-pass and high-pass filters, has been functionally verified. Table 1 lists the number of slice registers and slice LUTs used by the 1-D design, and the timing summary gives its maximum combinational path delay. The register transfer level (RTL) view of the 2-D sub-band tree structure is shown in Figure 5.




Table 2: 2-D DWT using MDA Technique

Device utilization summary (estimated values):

| Logic Utilization                 | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice Registers         | 228  | 301440    | 0%          |
| Number of Slice LUTs              | 728  | 150720    | 0%          |
| Number of fully used LUT-FF pairs | 134  | 822       | 16%         |
| Number of bonded IOBs             | 44   | 400       | 11%         |
| Number of BUFG/BUFGCTRLs          | 1    | 32        | 3%          |

Table 3: Comparison of the existing and proposed 1-D DWT designs

| Design               | No. of Slice Registers | No. of Slice LUTs | LUT-FF Pairs | Bonded IOBs | Maximum Frequency (MHz) |
|----------------------|------------------------|-------------------|--------------|-------------|-------------------------|
| Mamatha I et al. [1] | 139                    | 417               | 71           | 358         | 633.43                  |
| Proposed Design      | 35                     | 213               | 10           | 29          | 724.638                 |

#### V. CONCLUSION

The 2-D sub-band wavelet transform is built from two basic blocks for image compression, namely the low-pass filter and the high-pass filter. Wavelet transforms have wide application in areas such as image compression, signal processing and VLSI design. In this work, a result-biased MDA-based filter structure for the approximate computation of the DWT has been presented. The proposed concept has been applied to the well-known 9/7 wavelet filters to reduce the complexity of MDA-based architectures for DWT computation. The proposed structure improves on previously published work: compared with the design of [1], the proposed 1-D structure uses fewer slice registers (35 versus 139), slice LUTs (213 versus 417) and bonded IOBs (29 versus 358), and reaches a higher maximum frequency (724.638 MHz versus 633.43 MHz), as shown in Table 3.

## REFERENCES

- [1] Mamatha I, Shikha Tripathi and Sudarshan TSB, "Pipelined Architecture for Filter Bank based 1-D DWT," in Proc. 2016 3rd International Conference on Signal Processing and Integrated Networks (SPIN), IEEE, 2016.
- [2] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
- [3] M. Alam, C. A. Rahman and G. Jullien, "Efficient distributed arithmetic based DWT architectures for multimedia applications," in Proc. IEEE Workshop on SoC for Real-Time Applications, pp. 333-336, 2003.
- [4] X. Cao, Q. Xie, C. Peng, Q. Wang and D. Yu, "An efficient VLSI implementation of distributed architecture for DWT," in Proc. IEEE Workshop on Multimedia and Signal Processing, pp. 364-367, 2006.
- [5] Archana Chidanandan and Magdy Bayoumi, "Area-Efficient NEDA Architecture for the 1-D DCT/IDCT," in Proc. ICASSP, 2006.

#### International Journal of Advanced Research in Arts, Science, Engineering & Management (IJARASEM)



| ISSN: 2395-7852 | www.ijarasem.com | Bimonthly, Peer Reviewed & Referred Journal


- [6] M. Martina and G. Masera, "Low-complexity, efficient 9/7 wavelet filters VLSI implementation," IEEE Trans. on Circuits and Systems II, Express Briefs, vol. 53, no. 11, pp. 1289-1293, Nov. 2006.
- [7] M. Martina and G. Masera, "Multiplierless, folded 9/7-5/3 wavelet VLSI architecture," IEEE Trans. on Circuits and Systems II, Express Briefs, vol. 54, no. 9, pp. 770-774, Sep. 2007.
- [8] Gaurav Tewari, Santu Sardar and K. A. Babu, "High-Speed & Memory Efficient 2-D DWT on Xilinx Spartan3A DSP using Scalable Polyphase Structure with DA for JPEG2000 Standard," IEEE, 2011, ISBN 978-1-4244-8679-3.
- [9] B. K. Mohanty and P. K. Meher, "Memory Efficient Modular VLSI Architecture for High-Throughput and Low-Latency Implementation of Multilevel Lifting 2-D DWT," IEEE Trans. on Signal Processing, vol. 59, no. 5, May 2011.
- [10] B. K. Mohanty and P. K. Meher, "Efficient Multiplierless Designs for 1-D DWT using 9/7 Filters Based on Distributed Arithmetic," ISIC, 2009.