2006 ARM Code-O-Rama - Optimized Design of a Lossless-JPEG Decoder

GOAL (copied from the official website):

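The 2006 ARM Code-O-Rama design contest is organized by ARM and the National Chip Implementation Center (CIC) of the National Applied Research Laboratories, and co-organized by the Taipei Computer Association (TCA). It is the first contest in the country to be led by industry while combining the strengths of industry, government, and academia, which has made it one of the most closely watched design competitions in the semiconductor industry. As of early 2006, more than 2 billion processors based on ARM technology had been sold worldwide, and ARM technology has become an industry standard in semiconductor design. That ARM, as a global semiconductor industry leader, initiated and leads this contest makes the event all the more significant.

The contest aims to test how college and university students combine design technique and creativity in a real application. The problem was drafted by CIC and approved by a [Problem-Setting and Judging Committee] made up of members from industry, government, and academia. To keep the contest fair and discriminating, the problem is set so as not to favor any particular research area: teams obtain Reference C Code from the Internet, port it to the designated ARM ADS platform, and optimize it against the scoring criteria.

The contest slogan, [Open, Original, Optimal], grew out of this idea.

  • Open literally means open. It reflects that the contest problem starts from open source code downloaded from the Internet, and also that semiconductor designers should keep an open mind so that they can design without prejudice.
  • Original means original and inventive. It reflects the contest's aim of having students exercise their design creativity.
  • Optimal means that each team applies its own ingenuity and technique to optimize the code. It also points to the final goal of semiconductor design: bringing consumers the best experience.

Besides generous prize money and valuable RVDS software, finalist and winning teams receive certificates issued by ARM, and their work may be published in ARM IQ magazine, which is distributed worldwide. Better still, every team that registers can attend a valuable ARM training course free of charge, become more familiar with ARM's advanced technology, and prepare for a career in semiconductor design. The event is expected to attract students from semiconductor-related departments across the country, and your enthusiastic participation is welcome!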

Reference

Performance Diary

Performance Status

How to

Issues

  • Try to understand the JpegLS algorithm
  • Check how many registers can be used in one LDM/STM instruction
  • How to remove unnecessary functions from the C library
  1. Can we write a program that uses no C library?
Baseline (no C library calls in the code): library size is 984
With only printf used in the code: library size is 9856 (RO + RW + ZI)
With only memcpy used in the code: library size is 1068 (RO + RW + ZI)
  • Code that executes only once shouldn't be moved to RAM.
Ex:
GetSoi executes once over the whole run of the program.
Move GetSoi to RAM, then execute it: 739286657 cycles
Execute GetSoi directly from ROM: 739286554 cycles (103 cycles fewer)
  • What if the ARM C compiler doesn't recognize the “inline” keyword?
    1. Solution: add this statement at the top of each file where the error occurs: “#define inline __inline”
  • What if the ARM C compiler can't open “malloc.h”?
    1. Solution: replace malloc.h with alloca.h
  • What about the undefined symbol bzero? (ljpgtopnm - Lossless JPEG)
    1. Solution: use the MEMSET() macro, which is defined in jpeg.h, instead of bzero().
  • What about the error “implicit cast of pointer to non-equal pointer” when calling ReadJpegData() in GetJpegChar()?

Replace this ---> (numInputBytes = 2+ReadJpegData( inputBuffer+2,JPEG_BUF_SIZE-2)

with this --->  (numInputBytes = 2+ReadJpegData( ((char *) inputBuffer+2),JPEG_BUF_SIZE-2)
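The cast fix can be tried in isolation with a minimal sketch like the one below. It assumes that inputBuffer is an array of unsigned char and that ReadJpegData() takes a char * and returns the number of bytes read; neither declaration is shown on this page. The ReadJpegData() body here is a made-up stand-in and refill() is a hypothetical wrapper, both for illustration only.

```c
#include <assert.h>
#include <string.h>

#define JPEG_BUF_SIZE 16            /* hypothetical size, for illustration only */

typedef unsigned char Uchar;

static Uchar inputBuffer[JPEG_BUF_SIZE];   /* assumed to be unsigned char */

/* Hypothetical stand-in for ReadJpegData(): copies a few made-up bytes
 * into buffer and returns how many bytes were copied. */
static int ReadJpegData(char *buffer, int numBytes)
{
    static const unsigned char fake[] = { 0xFF, 0xD8, 0xFF, 0xC3 };
    int n = (int)sizeof(fake);

    assert(buffer != NULL);
    if (n > numBytes)
        n = numBytes;
    memcpy(buffer, fake, (size_t)n);
    return n;
}

/* Hypothetical wrapper showing the corrected call. The cast binds tighter
 * than '+', so the argument is ((char *)inputBuffer) + 2: the pointer is
 * converted first and then advanced by two bytes. */
static int refill(void)
{
    int numInputBytes;

    numInputBytes = 2 + ReadJpegData((char *)inputBuffer + 2, JPEG_BUF_SIZE - 2);
    return numInputBytes;
}
```

With the stand-in above, refill() returns 6: the two bytes kept at the front of the buffer plus the four bytes read.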
  • Map calls to open() onto fopen().
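One possible shape for the open() to fopen() mapping is sketched below. The flag constants, fopen_mode() and open_as_stream() are hypothetical names introduced for illustration; the page does not say how the mapping was actually written, and the sketch assumes the decoder only needs binary read and binary write access.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the O_RDONLY / O_WRONLY flags that the
 * open()-based code would otherwise take from <fcntl.h>. */
#define MY_O_RDONLY 0
#define MY_O_WRONLY 1

/* Translate an open()-style access flag into an fopen() mode string.
 * Binary modes are used because image data must not be altered by
 * text-mode translation. */
static const char *fopen_mode(int flags)
{
    return (flags == MY_O_WRONLY) ? "wb" : "rb";
}

/* Replacement for an open() call: returns a FILE * instead of a file
 * descriptor, or NULL on failure. */
static FILE *open_as_stream(const char *path, int flags)
{
    assert(path != NULL);
    return fopen(path, fopen_mode(flags));
}

/* Replacement for a read() call on the stream: returns the number of
 * bytes actually read. */
static int read_stream(FILE *fp, void *buf, int count)
{
    assert(fp != NULL && buf != NULL && count >= 0);
    return (int)fread(buf, 1, (size_t)count, fp);
}
```

Code that stored a file descriptor in an int would then store the FILE * instead and call read_stream() where it used to call read().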

Report

Topics to cover:

  1. Debugger internal statistics of the program before and after optimization
    • Discuss the improvement in cycle count, CPI, etc. from optimization without algorithmic changes
  2. Describe why, in a multimedia embedded system, the instruction cache is more important than the data cache.
  3. Describe the performance enhancement due to caching the left and diagonal pixels
  4. Describe the effort we made to use internal memory to buffer read and write data, and the reason it failed
  5. Rerun the benchmarks on the new version of the program
  6. Describe how our implementation scales with picture size and can handle large row sizes (done)
  7. Describe the effort we made to reduce stack size (done)
    • Remove auto variables
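The idea behind topic 3 can be sketched as follows. This is a minimal illustration, not the contest code: it assumes the lossless JPEG predictor "left + above - diagonal" (the page does not say which predictor the test images use), 16-bit samples, and a hypothetical decode_row() that receives already-decoded differences. The left and upper-left (diagonal) neighbors are carried in local variables from one iteration to the next, so each pixel needs only one load from the previous row.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short Sample;      /* assumed 16-bit sample storage */

/* Reconstruct one row (not the first row of the image) from its
 * prediction differences. prev is the row above, cur is the output. */
static void decode_row(const Sample *prev, const int *diff,
                       Sample *cur, size_t width)
{
    unsigned left, diag, above;
    size_t x;

    assert(width > 0);

    /* First column: there is no left neighbor, so predict from above. */
    left = (prev[0] + (unsigned)diff[0]) & 0xFFFFu;
    cur[0] = (Sample)left;
    diag = prev[0];                 /* becomes the diagonal for x = 1 */

    for (x = 1; x < width; x++) {
        above = prev[x];            /* the only neighbor load per pixel */
        left = (left + above - diag + (unsigned)diff[x]) & 0xFFFFu;
        cur[x] = (Sample)left;      /* this pixel is the next "left" */
        diag = above;               /* this "above" is the next diagonal */
    }
}
```

Without the cached values, each pixel would reload cur[x - 1] and prev[x - 1] from memory in addition to prev[x].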

To be done

  1. Source level
    1. Algorithm optimization
    2. Remove every C library function (except memcpy)
      1. See “Compilers and Libraries Guide” ch. 4.3, “Building an application without the C library”
    3. Add an image buffer
  2. ADS related
    1. Find the balance between ARM and Thumb code
    2. Remove unnecessary exception handlers (ADS adds some useless handlers)
  3. Program analysis
    1. Variable and function usage profile
    2. Stack size estimation
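For the item about removing C library functions, one common way to drop printf is to format numbers by hand. The sketch below is an assumption about how that could look, not the code used in the contest entry: u32_to_dec() is a hypothetical helper that converts a value such as a cycle count to decimal text, and the routine that finally emits the characters is left out because it depends on the target setup.

```c
#include <assert.h>
#include <stddef.h>

/* Convert value (treated as 32-bit) to a NUL-terminated decimal string.
 * buf must hold at least 11 bytes: 10 digits plus the terminator.
 * Returns the number of digits written. */
static size_t u32_to_dec(unsigned long value, char *buf)
{
    char tmp[10];                   /* digits, least significant first */
    size_t n = 0;
    size_t i;

    assert(buf != NULL);
    value &= 0xFFFFFFFFUL;          /* keep within 10 decimal digits */

    do {
        tmp[n++] = (char)('0' + (int)(value % 10));
        value /= 10;
    } while (value != 0);

    for (i = 0; i < n; i++)         /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

Calling u32_to_dec(739286657UL, buf) fills buf with "739286657" and returns 9.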
 
arm-contest/arm-contest.txt · Last modified: 2010/05/22 09:19 (external edit)
 