JPEG2000 Parallelization

  • Here we highlight topics related to parallelizing JPEG2000.
    1. Performance results.
      • HD images run on both JPEG2000 implementations, Jasper and Kakadu.
      • Size overhead
      • Compression ratio
      • Results run by single and multiple processes
      • Program behavior of jasper.
    2. Flags for OpenMP.
    3. Program profiling.
    4. JPEG2000 flow chart (for parallelization).
  • Outline of experiments for the project
    • Motivation - poor scalability of JPEG2000 in previous work.
      1. Related work (two papers on parallelizing JPEG2000).
      2. Jasper (1.900) vs. Kakadu (5.2.5) on images ranging from large (1000 megapixels) to small (3.8 megapixels).
          1. Size and computing-time trends.
          2. Jasper and Kakadu performance evaluation.
        • Profile the program with VTune or the Thread Profiler tool. (To be done.)
          1. Find the program hotspots of Kakadu.
      3. Kakadu scalability (speedup) results on our smallest image (300 MP).
        1. On the Xeon 3.2 x2 machine. - Kakadu speedup result
        2. On T1-2000. (To be done.)
    • Experimental results
  1. Partition-size exploration - see which partition size of the biggest image gives better performance.
  • Use images with randomly generated (uniformly distributed) pixels to model the worst-case compression and decompression overhead.
  • Later, we could apply another model, say a normal distribution, to represent other types of images such as real-world pictures or medical images. (To be done.)
  1. Size overhead - only on the biggest and smallest images. Refer to the section “Size overhead”.
  2. Results of parallel processes on images from 3.8 to 1000 megapixels (about 5 image sizes will be tested). (To be done.)

Performance metrics

  • Speedup
  • Compression ratio
    1. Determine whether any compression degradation occurs when we simply crop the image into pieces and compress those pieces independently.
    2. Take a source ppm image as an example:
      • 29,942,801 bytes (ppm) → 12,133,149 bytes (jp2)
    3. Four ppm pieces compressed into four jp2 files: 12,138,942 bytes in total
    4. Compression degradation: (12138942 - 12133149) / 12133149 = 0.0477452309 %
  • Compression ratio of JPEG2000
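
The two metrics above can be written down as small helper functions. This is only a sketch of the arithmetic; the function names are hypothetical and the byte counts are the ones quoted in the example.

```python
def speedup(t_serial, t_parallel):
    """Speedup = time of the serial run / time of the parallel run."""
    return t_serial / t_parallel


def compression_degradation(whole_jp2_bytes, pieces_jp2_bytes):
    """Relative size increase, in percent, when the pieces are compressed independently."""
    return (pieces_jp2_bytes - whole_jp2_bytes) / whole_jp2_bytes * 100.0


# Byte counts taken from the example above.
whole = 12_133_149   # jp2 produced from the whole ppm image
pieces = 12_138_942  # total size of the four jp2 files produced from the four ppm pieces
degradation = compression_degradation(whole, pieces)
print(f"compression degradation: {degradation:.4f} %")
```
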

Size overhead

image name | resolution | original ppm image size | translated jp2 image size | ratio (jp2/ppm)
Canon_PowerShot_A640_img_0287.ppm | 3648×2736 | 29,241 KB | 11,848 KB | 40.51%
Canon_PowerShot_A520_img_0259.ppm | 2272×1704 | 11,342 KB | 3,410 KB | 30.06%
  • Overhead due to dividing the image into pieces (ppm).
image name | original ppm image size | number of sub-images | sum of sub-images' sizes | size overhead (sub-images/original)
Canon_PowerShot_A640_img_0287.ppm | 29,942,801 B | 27 | 29,943,216 B | 1.00001
Canon_PowerShot_A520_img_0259.ppm | 11,614,481 B | 17 | 11,614,736 B | 1.00002
  • Overhead due to dividing the image into pieces (jp2). Flags: jasper -O rate=1 -O mode=int -f original.ppm -F output.jp2
image name | original jp2 image size | number of sub-images | sum of sub-images' sizes | size overhead (sub-images/original)
Jasper
Canon_PowerShot_A520_img_0259.jp2 | 3,492,496 B | 17 | 3,515,083 B | 1.00647
Canon_PowerShot_A640_img_0287.jp2 | 12,133,161 B | 27 | 12,234,385 B | 1.00834
Kakadu
Canon_PowerShot_A640_img_0287.jp2 | 12,133,161 B | 27 | 12,239,278 B | 1.00412
Canon_PowerShot_A520_img_0259.jp2 | 3,492,496 B | 17 |  B | 1.
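
The size-overhead column is the sum of the sub-image sizes divided by the original size. The sketch below recomputes a few of the ratios from the tables and also reproduces the ppm file sizes from the resolutions. It assumes a binary (P6) ppm with a minimal header, no comment lines and 3 bytes per pixel; the function names are hypothetical.

```python
def ppm_file_size(width, height, maxval=255):
    """Size in bytes of a binary (P6) ppm, assuming a minimal header without comments."""
    header = f"P6\n{width} {height}\n{maxval}\n".encode("ascii")
    return len(header) + width * height * 3  # 3 bytes per RGB pixel


def size_overhead(original_bytes, sub_images_total_bytes):
    """Ratio of the summed sub-image sizes to the original image size."""
    return sub_images_total_bytes / original_bytes


# Resolutions and byte counts taken from the tables above.
a640_ppm = ppm_file_size(3648, 2736)
a520_ppm = ppm_file_size(2272, 1704)
ppm_overhead_a640 = size_overhead(29_942_801, 29_943_216)
jp2_overhead_a520_jasper = size_overhead(3_492_496, 3_515_083)

print(a640_ppm, a520_ppm)
print(f"{ppm_overhead_a640:.5f} {jp2_overhead_a520_jasper:.5f}")
```

Under this header assumption the ppm overhead is only the extra per-file headers, which is why it stays so close to 1.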

Preliminary Performance Results

  • Brief description:
    1. HD images are run on both JPEG2000 implementations, Jasper and Kakadu.
    2. (Kakadu): Use the Linux “time” command and the internal timing information in Kakadu.
    3. (Jasper): Timing code (using the clock() function) is instrumented in the source, in the main() and jp2_decode() functions, to get the execution time for each image.
    4. The trend can be seen in the results.
  • Testing Environment: NOTE: Platform is different from the one used for program profiling.
    1. Vito:Linux nullNode01 2.6.19-gentoo-r5 #1 SMP Tue May 1 18:29:47 CST 2007 x86_64 Intel(R) Xeon(TM) CPU 3.20GHz GenuineIntel GNU/Linux
    2. Intel(R) C Compiler for Intel(R) EM64T-based applications, Version 9.1 Build 20061101
    3. Basic flags used: “-g -O0 -openmp -lompstub -lomp_db”
    4. optimi
  • Tested images:
    1. Canon PowerShot_A640_img_0287.jp2 (original image)
    2. Canon\ PowerShot_A640_img_0287_left_top.jp2 (sub-image)
    3. Canon\ PowerShot_A640_img_0287_right_top.jp2 (sub-image)
    4. Canon\ PowerShot_A640_img_0287_right_bottom.jp2 (sub-image)
    5. Canon\ PowerShot_A640_img_0287_left_bottom.jp2 (sub-image)
    6. Canon_PowerShot_A640_img_0287-27.jp2(27th sub-image)
    7. Canon_PowerShot_A520_img_0259.jp2 (original image)
    8. Canon_PowerShot_A520_img_0259-17.jp2 (17th sub-image)

Results with single process

  • Here only Jasper, one of the JPEG2000 implementations, is run on the HD images.
  • The command below is used to generate the result, where test.jp2 is Canon PowerShot_A640_img_0287.ppm in jp2 format.
    1. “./jasper --input test.jp2 --output test.ppm >output”
  • Timing results: “flags: -g -O0 -openmp -lompstub -lomp_db”
Input image | image size (KB) | Time in main() | Time in jp2_decode() | Time in jpc_decode() | (jp2_decode / main) x 100%
Canon PowerShot_A640_img_0287.jp2 | 11,849 KB | 17.04 s | 14.56 s | 14.56 s | 85.4%
Canon PowerShot_A640_img_0287_left_top.jp2 | 2,881 KB | 4.1 s | 3.48 s | 3.48 s | 84.9%
Canon PowerShot_A640_img_0287_right_top.jp2 | 2,795 KB | 4.01 s | 3.38 s | 3.38 s | 84.3%
Canon PowerShot_A640_img_0287_right_bottom.jp2 | 3,267 KB | 4.46 s | 3.83 s | 3.83 s | 85.8%
Canon PowerShot_A640_img_0287_left_bottom.jp2 | 2,913 KB | 4.14 s | 3.51 s | 3.51 s | 84.8%
  • Timing results: “flags: -openmp_report2 -O2 -openmp -lompstub -lomp_db -lguide”
Input image | image size (KB) | Time in main() | Time in jp2_decode() | Time in jpc_decode() | (jp2_decode / main) x 100%
Canon_PowerShot_A640_img_0287_left_top.jp2 | 2,881 KB | 2.16 s | 1.9 s | 1.9 s | 87.96%
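
The last column of the tables, and a rough projection for running the four quadrants at the same time, can be derived from the single-process timings. The sketch below uses the “-g -O0” numbers from the first table. The projected speedup is an assumption-laden upper bound (four idle cores, no memory or I/O contention), not a measurement.

```python
# main() timings in seconds, from the "-g -O0" table above.
whole_main = 17.04
quadrant_main = {
    "left_top": 4.10,
    "right_top": 4.01,
    "right_bottom": 4.46,
    "left_bottom": 4.14,
}


def decode_share(t_decode, t_main):
    """Share of main() spent in jp2_decode(), in percent."""
    return t_decode / t_main * 100.0


# Running the four quadrants one after another costs about as much as the whole image.
serial_sum = sum(quadrant_main.values())

# If the quadrants ran fully in parallel, the slowest one would bound the run time.
critical_path = max(quadrant_main.values())
projected_speedup = whole_main / critical_path

print(f"decode share (whole image): {decode_share(14.56, whole_main):.1f} %")
print(f"sum of quadrant times: {serial_sum:.2f} s")
print(f"projected upper-bound speedup: {projected_speedup:.2f}")
```
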

Results with multiple processes

  • Two JPEG2000 implementations, Jasper and Kakadu, are used to measure performance on high-definition images.
  • Timing results: “flags: -openmp_report2 -par_report3 -fp_report -O3 -openmp -lompstub -lguide -Ob2 -parallel -tpp7 -mp -axP”
    1. This is process-level parallelism: we fork the processes so that they run concurrently and measure the timing.
    2. Performance results for images run in a single process and in multiple processes.
    3. Run with a single process:
      1. “Canon_PowerShot_A640_img_0287-27.jp2” is a 1/27 sub-image of “Canon_PowerShot_A640_img_0287.jp2”.
      2. “Canon_PowerShot_A520_img_0259-17.jp2” is a 1/17 sub-image of “Canon_PowerShot_A520_img_0259.jp2”.
    4. Run with multiple processes:
      1. “(Canon_PowerShot_A640_img_0287)/27” means the 27 equal-sized sub-images are processed concurrently in the background.
      2. “(Canon_PowerShot_A520_img_0259)/17” means the 17 equal-sized sub-images are processed concurrently in the background.
      3. Problem: the sum of the sub-image sizes might exceed the size of the single image. This is not a concern; it can be explained by the tiling method.
    • Command used: time ../jasper -f Canon_PowerShot_A640_img_0287.ppm -F Canon_PowerShot_A640_img_0287-1.jp2 -O rate=1 -O mode=int
  • Timing information is gathered with the “time” command (user + system time) under Linux.
    1. This avoids process scheduling time interfering with the measured program performance, since several processes may be running in the system at the same time.
    2. Results with multiple processes - performance comparison between Jasper and Kakadu run as single and multiple processes.
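
The user + system measurement can also be taken from a script instead of the “time” command. The following is a minimal sketch, assuming a Unix system where Python's resource module is available; time_concurrent is a hypothetical helper, and the workload in the demo is a stand-in for the real encoder command lines.

```python
import resource
import subprocess
import sys
import time


def time_concurrent(commands):
    """Run all commands at once; return (wall_seconds, user_plus_system_seconds).

    RUSAGE_CHILDREN covers children that have terminated and been waited for,
    so the difference before/after gives the CPU time of this batch (Unix only).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    for proc in procs:
        proc.wait()
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


if __name__ == "__main__":
    # Stand-in workload; replace with the real encoder command lines.
    busy = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    wall, cpu = time_concurrent([busy, busy])
    print(f"wall: {wall:.2f} s, user+system: {cpu:.2f} s")
```

Wall-clock time includes the time a process waits to be scheduled, while user + system time does not, which is the reason given above for using the latter.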

Configure usage --- add "-openmp" flag

  • Find “Extra stuff for research purposes” in the configure file and add the “-openmp” flag there.
    1. The OpenMP flag will then be added to the generated makefile when we run “./configure --enable-special0”.

Partial code of “configure” is presented below:

 # Check whether --enable-special0 or --disable-special0 was given.
 if test "${enable_special0+set}" = set; then
   enableval="$enable_special0"
   case "${enableval}" in
   yes)
     if test "$GCC" = yes; then
       # OpenMP flags added for this experiment
       CFLAGS="-openmp_report2 -O2 -openmp -lompstub -lomp_db -lguide "
     fi
     ;;
   esac
 fi

Program profiling

JPEG2000 encode/decode flow

  • How do we parallelize JPEG2000 with tiling?
    1. Traditional approach - Regular JPEG2000 flow.
    2. Simplified approach - Simplified JPEG2000 flow.
  • Differences between these flows.
    1. Traditional approach: a single instance of the software implementation runs on the tiled input image, and the decode/encode part is parallelized into threads that decode/encode the tiles independently.
    2. Simplified approach: N instances of the software implementation run in threads from the beginning to decode/encode the pre-partitioned image.
  • Why do we use the second approach?
    1. Because it saves time: we get a performance improvement without digging into the code.
    2. We simply partition the image into sub-images and execute the program with threads that serve the sub-images.
  • Does any validation need to be done to prove that our method (the simplified approach) is valid?
    1. No. By the JPEG2000 standard, each tile can be coded independently without interference. Here we do the tiling manually, and it does not make any difference.
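
The manual tiling step can be illustrated as follows. This is a sketch, not the tool actually used for the experiments: split_ppm is a hypothetical helper that assumes a binary (P6) ppm with maxval 255, no comment lines in the header, and a grid that divides the image evenly.

```python
import re

# Minimal P6 header: magic, width, height, maxval, then exactly one whitespace byte.
_HEADER = re.compile(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s")


def split_ppm(data, rows, cols):
    """Split a binary ppm into rows*cols equal tiles, returned in row-major order."""
    match = _HEADER.match(data)
    if match is None:
        raise ValueError("not a P6 ppm with a plain header")
    width, height, maxval = (int(g) for g in match.groups())
    if maxval != 255:
        raise ValueError("only maxval 255 (1 byte per sample) is handled")
    if width % cols or height % rows:
        raise ValueError("grid must divide the image evenly")
    raster = data[match.end():]
    tile_w, tile_h = width // cols, height // rows
    row_bytes = width * 3  # 3 bytes per RGB pixel
    tiles = []
    for r in range(rows):
        for c in range(cols):
            body = bytearray()
            # Copy the tile's slice of every scanline it covers.
            for y in range(r * tile_h, (r + 1) * tile_h):
                start = y * row_bytes + c * tile_w * 3
                body += raster[start:start + tile_w * 3]
            header = f"P6\n{tile_w} {tile_h}\n255\n".encode("ascii")
            tiles.append(header + bytes(body))
    return tiles


# Tiny 4x2 image whose raster bytes are 0..23, split into a 2x2 grid.
demo = b"P6\n4 2\n255\n" + bytes(range(24))
demo_tiles = split_ppm(demo, rows=2, cols=2)
print(len(demo_tiles), [len(t) for t in demo_tiles])
```

Each tile is a complete ppm file, so it can be passed to the encoder on its own.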

Regular JPEG2000 flow

  • A single Jasper instance is called, and several threads are invoked while it performs the encode/decode job.

Simplified JPEG2000 flow (our approach to explore parallelism with minimum effort)

  • Several instances of Jasper are called to encode/decode the pre-partitioned images in parallel.
  • The image is divided into sub-images before encoding and decoding are performed.
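
The simplified flow amounts to starting one encoder process per sub-image and waiting for all of them. Below is a minimal sketch of such a launcher. The helper names are hypothetical, the jasper options are the ones quoted in the command line earlier on this page, and the demo uses a stand-in command so that it runs without Jasper installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def jasper_encode_command(ppm_path, jp2_path, jasper="jasper"):
    """Build the encoder command line for one sub-image (options as quoted earlier)."""
    return [jasper, "-f", ppm_path, "-F", jp2_path, "-O", "rate=1", "-O", "mode=int"]


def run_concurrently(commands, run=subprocess.run):
    """Start one process per command, wait for all, and return the exit codes in order."""
    workers = max(1, len(commands))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, commands))
    return [result.returncode for result in results]


if __name__ == "__main__":
    # Stand-in commands; with Jasper installed, build them with jasper_encode_command().
    stand_in = [[sys.executable, "-c", "pass"] for _ in range(4)]
    print(run_concurrently(stand_in))
```

The threads here only wait on the child processes, so the parallelism still comes from the separate encoder instances.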

arm-contest/arm-contest/jpeg2000_parallelization.txt · Last modified: 2010/05/22 09:19 (external edit)
 