CRYX's note about the JPEG decoding algorithm.Copyright 1999 Cristi Cuturicu.DISCLAIMER...........You get this file for free, so you cannot have any legal requests from me.If you don't agree, read no more.No warranty is provided with this doc, there might be bugs or errors in it(although I've tried to avoid them), so use the information contained in thisfile at your own risk.This is NOT an official documentation, for further information please referto the JPEG ISO standard.All product names mentioned in this file are trademarks or registered trademarksof their respective owners.Not for reproduction (electronic or hardcopy) except for personal use.THE JPEG COMPRESSION and THE JPG FILE FORMAT.............................................Long ago, I've been looking on the net a good doc which could have explained tome the JPEG compression, and particularly the JPG file format.And recently I've found the ISO-ITU JPEG standard in a file called itu-1150.ps(JPEG standard = ISO standard 10918-1 or CCITT standard recommendation T.81: "Information Technology - Digital compression and coding of continuous-tonestill images - Requirements and guidelines")Though this standard is quite complete, it has a lot of not interesting partsin its 186 pages, and I had to dig in it, and then write my own JPG viewer,to get from this standard the main stuff a programmer needs : The Baseline Sequential DCT JPG compression.First a note : Mainly because of the fact that the majority of the JPG files areBaseline Sequential JPGS, this doc concerns only the Baseline Sequential JPGcompression and particularly the JFIF implementation of it.It DOES NOT cover the JPG Progresive or Hierarchical compression.(For more details about these read the itu-1150 standard. It can be found at www.wotsit.org or somewhere at www.jpeg.org/jpeg)I've thought that it would be easier for the reader to understand the JPGcompression if I'll explain the steps of the JPG encoder.(The decoder steps will be the inverse of the encoder's steps, but in reverseorder, of course)THE JPEG ENCODER STEPS----------------------1) The afine transformation in colour space : [R G B] -> [Y Cb Cr]---------------------------------------------------------------------(It is defined in the CCIR Recommendation 601)(R,G,B are 8-bit unsigned values) | Y | | 0.299 0.587 0.114 | | R | | 0 | | Cb | = |- 0.1687 - 0.3313 0.5 | * | G | + |128| | Cr | | 0.5 - 0.4187 - 0.0813| | B | |128|The new value Y = 0.299*R + 0.587*G + 0.114*B is called the luminance.It is the value used by the monochrome monitors to represent an RGB colour.Physiologically, it represents the intensity of an RGB colour perceived bythe eye.You see that the formula for Y it's like a weighted-filter with different weightsfor each spectral component: the eye is most sensitive to the Green componentthen it follows the Red component and the last is the Blue component.The values Cb = - 0.1687*R - 0.3313*G + 0.5 *B + 128 Cr = 0.5 *R - 0.4187*G - 0.0813*B + 128are called the chromimance values and represent 2 coordinates in a systemwhich measures the nuance and saturation of the colour ([Approximately], thesevalues indicate how much blue and how much red is in that colour).These 2 coordinates are called shortly the chrominance.[Y,Cb,Cr] to [R,G,B] Conversion (The inverse of the previous transform)--------------------------------RGB can be computed directly from YCbCr ( 8-bit unsigned values) as follows:R = Y + 1.402 *(Cr-128)G = Y - 0.34414*(Cb-128) - 0.71414*(Cr-128)B = Y + 1.772 *(Cb-128)A note relating Y,Cb,Cr to the human visual system---------------------------------------------------The eye, particulary the retina, has as visual analyzers two kind of cells :Cells for night view which perceive only nuances of gray ranging from intensewhite to the darkest black and cells for the day view which perceive the colornuance.The first cells, given an RGB colour, detect a gray level similar to that givenby the luminance value.The second cells, responsible for the perception of the colour nuance, are thecells which detects a value related to that of the chrominance.2) Sampling------------The JPEG standard takes into account the fact that the eye seems to be moresensitive at the luminance of a colour than at the nuance of that colour.(the white-black view cells have more influence than the day view cells)So, on most JPGS, luminance is taken in every pixel while the chrominance istaken as a medium value for a 2x2 block of pixels.Note that it is not neccessarily that the chrominance to be taken as a mediumvalue for a 2x2 block , it could be taken in every pixel, but good compressionresults are achieved this way, with almost no loss in visual perception of thenew sampled image.A note : The JPEG standard specifies that for every image component (like, forexample Y) must be defined 2 sampling coefficients: one for the horizontalsampling and one for vertical sampling.These sampling coefficients are defined in the JPG file as relative to themaximum sampling coefficient (more on this later).3) Level shift--------------All 8-bit unsigned values (Y,Cb,Cr) in the image are "level shifted": they areconverted to an 8-bit signed representation, by subtracting 128 from their value.4) The 8x8 Discrete Cosine Transform (DCT)------------------------------------------The image is break into 8x8 blocks of pixels, then for each 8x8 block isapplied the DCT transform. Note that if the X dimension of the original imageis not divisible by 8, the encoder should make it divisible, by completing theremaining right columns (until X becomes a multiple of 8) with the right-mostcolumn of the original image.Similar, if the Y dimension is not divisible by 8, the encoder should completethe remaining lines with the bottom-most line of the original image.The 8x8 blocks are processed from left to right and from top to bottom.A note: Since a pixel in the 8x8 block has 3 components (Y,Cb,Cr) the DCTis applied separately to 3 blocks 8x8: The first 8x8 block is the block which contains the luminance of the pixels in the original 8x8 block The second 8x8 block is the block which contains the Cb value of the pixels in the original 8x8 block And, similar, the third 8x8 block contains the Cr values.The purpose of the DCT transform is that instead of processing the originalsamples, you work with the spatial frequencies present in the original image.These spatial frequencies are very related to the level of detail present in animage. High spatial frequencies corresponds to high levels of detail, whilelower frequencies corresponds to lower levels of detail.The DCT transform is very similar to the 2D Fourier transform which shifts fromthe time domain (the original 8x8 block) to the frequency domain (the new 8x8=64 coefficients which represents the amplitudes of the spatial frequenciesanalyzed)The mathematical definition of Forward DCT (FDCT) and Inverse DCT (IDCT) is :FDCT: c(u,v) 7 7 2*x+1 2*y+1F(u,v) = --------- * sum sum f(x,y) * cos (------- *u*PI)* cos (------ *v*PI) 4 x=0 y=0 16 16 u,v = 0,1,...,7 { 1/2 when u=v=0 c(u,v) = { { 1 otherwiseIDCT: 1 7 7 2*x+1 2*y+1f(x,y) = --- * sum sum c(u,v)*F(u,v)*cos (------- *u*PI)* cos (------ *v*PI) 4 u=0 v=0 16 16 x,y=0,1...7Applying these formulas directly is computationally expensive, especiallywhen there have been developed faster algorithms for implementing forward orinverse DCT. A notable one called AA&N leaves only 5 multiplies and 29 addsto be done in the DCT itself. More info and an implementation of it can be found in the free software for JPEG encoders/decoders made by Independent JPEG Group (IJG), their C source can be found at www.ijg.org.5) The zig-zag reordering of the 64 DCT coefficients-----------------------------------------------------So, after we performed the DCT transform over a block of 8x8 values, we havea new 8x8 block.Then, this 8x8 block is traversed in zig-zag like this :(The numbers in the 8x8 block indicate the order in which we traverse thebidimensional 8x8 matrix) 0, 1, 5, 6,14,15,27,28, 2, 4, 7,13,16,26,29,42, 3, 8,12,17,25,30,41,43, 9,11,18,24,31,40,44,53, 10,19,23,32,39,45,52,54, 20,22,33,38,46,51,55,60, 21,34,37,47,50,56,59,61, 35,36,48,49,57,58,62,63As you see , first is the upper-left corner (0,0), then the value at (0,1),then (1,0) then (2,0), (1,1), (0,2), (0,3), (1,2), (2,1), (3,0) etc.After we are done with traversing in zig-zag the 8x8 matrix we have now a vectorwith 64 coefficients (0..63)The reason for this zig-zag traversing is that we traverse the 8x8 DCT coefficientsin the order of increasing the spatial frequencies. So, we get a vector sortedby the criteria of the spatial frequency: The first value in the vector (atindex 0) corresponds to the lowest spatial frequency present in the image -It's called the DC term. As we increase the index in the vector, we get valuescorresponding to higher frequencies (The value at index 63 corresponds to theamplitude of the highest spatial frequency present in the 8x8 block).The rest of the DCT coefficients are called AC terms.6) Quantization----------------At this stage, we have a sorted vector with 64 values corresponding to theamplitudes of the 64 spatial frequencies present in the 8x8 block.These 64 values are quantized: Each value is divided by a dividend specifiedin a vector with 64 values --- the quantization table , then it's rounded tothe nearest integer. for (i = 0 ; i<=63; i++ ) vector[i] = (int) (vector[i] / quantization_table[i] + 0.5)Here is the example of the quantization table for luminance(Y) given in anannex of the JPEG standard.(It is given in a form of an 8x8 block; in order toobtain a 64 vector it should be zig-zag reordered) 16 11 10 16 24 40 51 61 12 12 14 19 26 58 60 55 14 13 16 24 40 57 69 56 14 17 22 29 51 87 80 62 18 22 37 56 68 109 103 77 24 35 55 64 81 104 113 92 49 64 78 87 103 121 120 101 72 92 95 98 112 100 103 99This table is based upon "psychovisual thresholding" , it has "been used withgood results on 8-bit per sample luminance and chrominance images".Most existing encoders use simple multiples of this example, but the values arenot claimed to be optimal (An encoder can use ANY OTHER quantization table)The table is specified in the JPEG file with the DQT(Define Quantization Table)marker.Most commonly there is one table for Y, and another one for thechrominance (Cb and Cr).The quantization process has the key role in the JPEG compression.It is the process which removes the high frequencies present in the originalimage -- in consequence the high detail.We do this because of the fact that the eye is much more sensitive to lowerspatial frequencies than to higher frequencies, so we can remove, with verylittle visual loss, higher frequencies.This is done by dividing values at high indexes in the vector (the amplitudesof higher frequencies) with larger values than the values by which are dividedthe amplitudes of lower frequencies.The bigger the values in the quantization table are, the bigger is the error(in consequence the visual error) introduced by this lossy process, and thesmaller is the visual quality.Another important fact is that in most images the colour varies slow from onepixel to another, so most images will have a small quantity of high detail-> a small amount (small amplitudes) of high spatial frequencies - but they havea lot of image information contained in the low spatial frequencies.In consequence in the new quantized vector, at high spatial frequencies, we'llhave a lot of consecutive zeroes.7) The Zero Run Length Coding (RLC)-------------------------------Now we have the quantized vector with a lot of consecutive zeroes. We can exploitthis by run length coding the consecutive zeroes.IMPORTANT: You'll see later why, but here we skip the encoding of the first coefficient of the vector (the DC coefficient) which is coded a bit different.(I'll present its coding later on this doc)Let's consider the original 64 vector a 63 vector (it's the 64 vector withoutthe first coefficient)Say that we have 57,45,0,0,0,0,23,0,-30,-16,0,0,1,0,0,0, 0 , 0 ,0 , only 0,..,0Here it is how the RLC JPEG compression is done for this example :(0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; EOBAs you see, we encode for each value different by 0 the number of consecutivezeroes PRECEDING that value, then we add the value.Another note : EOB is the short form for End of Block, it's a special codedvalue (a marker). If we've reached in a position in the vector from which we have till the end of the vector only zeroes, we'll mark that positionwith EOB and finish the RLC compression of the quantized vector.[Note that if the quantized vector doesn't finishes with zeroes (has the lastelement not 0) we'll not have the EOB marker.]ACTUALLY, EOB has as an equivalent (0,0) and it will be (later) Huffman codedlike (0,0), so we'll encode : (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-16) ; (2,1) ; (0,0)Another MAJOR thing: Say that somewhere in the quantized vector we have: 57, eighteeen zeroes, 3, 0,0 ,0,0 2, thirty-three zeroes, 895, EOBThe JPG Huffman coding makes the restriction (you'll see later why) thatthe number of previous 0's to be coded as a 4-bit value, so it can't overpassthe value 15 (0xF).So, the previous example would be coded as : (0,57) ; (15,0) (2,3) ; (4,2) ; (15,0) (15,0) (1,895) , (0,0)(15,0) is a special coded value which indicates that there follows 16 consecutivezeroes.Note : 16 zeroes not 15 zeroes.8) The final step === Huffman coding-------------------------------------First an IMPORTANT note : Instead of storing the actual value , the JPEG standardspecifies that we store the minimum size in bits in which we can keep that value(it's called the category of that value) and then a bit-coded representationof that value like this: Values Category Bits for the value 0 0 - -1,1 1 0,1 -3,-2,2,3 2 00,01,10,11 -7,-6,-5,-4,4,5,6,7 3 000,001,010,011,100,101,110,111 -15,..,-8,8,..,15 4 0000,..,0111,1000,..,1111 -31,..,-16,16,..,31 5 00000,..,01111,10000,..,11111 -63,..,-32,32,..,63 6 . -127,..,-64,64,..,127 7 . -255,..,-128,128,..,255 8 . -511,..,-256,256,..,511 9 . -1023,..,-512,512,..,1023 10 . -2047,..,-1024,1024,..,2047 11 . -4095,..,-2048,2048,..,4095 12 . -8191,..,-4096,4096,..,8191 13 . -16383,..,-8192,8192,..,16383 14 .-32767,..,-16384,16384,..,32767 15 .In consequence for the previous example: (0,57) ; (0,45) ; (4,23) ; (1,-30) ; (0,-8) ; (2,1) ; (0,0)let's encode ONLY the right value of these pairs, except the pairs that arespecial markers like (0,0) or (if we would have) (15,0) 57 is in the category 6 and it is bit-coded 111001 , so we'll encode itlike (6,111001) 45 , similar, will be coded as (6,101101) 23 -> (5,10111) -30 -> (5,00001) -8 -> (4,0111) 1 -> (1,1)And now , we'll write again the string of pairs: (0,6), 111001 ; (0,6), 101101 ; (4,5), 10111; (1,5), 00001; (0,4) , 0111 ; (2,1), 1 ; (0,0)The pairs of 2 values enclosed in bracket paranthesis, can be represented on abyte because of the fact that each of the 2 values can be represented on a nibble(the counter of previous zeroes is always smaller than 15 and so it is thecategory of the numbers [numbers encoded in a JPG file are in range -32767..32767]).In this byte, the high nibble represents the number of previous 0s, and thelower nibble is the category of the new value different by 0.The FINAL step of the encoding consists in Huffman encoding this byte, and thenwriting in the JPG file, as a stream of bits, the Huffman code of this byte,followed by the remaining bit-representation of that number.For example, let's say that for byte 6 ( the equivalent of (0,6) ) we have aHuffman code = 111000; for byte 69 = (4,5) (for example) we have 1111111110011001 21 = (1,5) --- 11111110110 4 = (0,4) --- 1011 33 = (2,1) --- 11011 0 = EOB = (0,0) --- 1010The final stream of bits written in the JPG file on disk for the previous exampleof 63 coefficients (remember that we've skipped the first coefficient ) is 111000 111001 111000 101101 1111111110011001 10111 11111110110 00001 1011 0111 11011 1 1010The encoding of the DC coefficient-----------------------------------DC is the coefficient in the quantized vector corresponding to the lowestfrequency in the image (it's the 0 frequency) , and (before quantization) ismathematically = (the sum of 8x8 image samples) / 8 .(It's like an average value for that block of image samples).It is said that it contains a lot of energy present in the original 8x8 imageblock. (Usually it gets large values).The authors of the JPEG standard noticed that there's a very close connectionbetween the DC coefficient of consecutive blocks, so they've decided to encodein the JPG file the difference between the DCs of consecutive 8x8 blocks(Note: consecutive 8x8 blocks of the SAME image component, like consecutive8x8 blocks for Y , or consecutive blocks for Cb , or for Cr)Diff = DC - DC i (i-1)So DC of the current block (DC ) will be equal to : DC = DC + Diff i i i-1And in JPG decoding you will start from 0 -- you consider that the firstDC coefficient = 0 ; DC = 0 0And then you'll add to the current value the value decoded from the JPG(the Diff value)SO, in the JPG file , the first coefficient = the DC coefficient is actuallythe difference, and it is Huffman encoded DIFFERENTLY from the encoding of AC coefficients.Here it is how it's done:(Remember that we now code the Diff value)Diff corresponds as you've seen before to a representation made by category andit's bit coded representation.In the JPG file it will be Huffman encoded only the category value, like this:Diff = (category, bit-coded representation)Then Diff will be coded as (Huffman_code(category) , bit-coded representation)For example, if Diff is equal to -511 , then Diff corresponds to (9, 000000000)Say that 9 has a Huffman code = 1111110(In the JPG file, there are 2 Huffman tables for an image component: one for DCand one for AC)In the JPG file, the bits corresponding to the DC coefficient will be: 1111110 000000000And,applied to this example of DC and to the previous example of ACs, for thisvector with 64 coefficients, THE FINAL STREAM OF BITS written in the JPG filewill be: 1111110 000000000 111000 111001 111000 101101 1111111110011001 10111 11111110110 00001 1011 0111 11011 1 1010(In the JPG file , first it's encoded DC then ACs)THE HUFFMAN DECODER (A brief summary) for the 64 coefficients (A Data Unit)of an image component (For example Y)-------------------------------------------------------------So when you decode a stream of bits from the image in the JPG file, you'll do:Init DC with 0.1) First the DC coefficient decode : a) Fetch a valid Huffman code (you check if it exists in the Huffman DC table) b) See at what category this Huffman code corresponds c) Fetch N = category bits , and determine what value is represented by (category, the N bits fetched) = Diff d) DC + = Diff e) write DC in the 64 vector : " vector[0]=DC "2) The 63 AC coefficients decode :------- FOR every AC coefficient UNTIL (EOB_encountered OR AC_counter=64) a) Fetch a valid Huffman code (check in the AC Huffman table) b) Decode that Huffman code : The Huffman code corresponds to (nr_of_previous_0,category)[Remember: EOB_encountered = TRUE if (nr_of_previous_0,category) = (0,0) ] c) Fetch N = category bits, and determine what value is represented by (category,the N bits fetched) = AC_coefficient d) Write in the 64 vector, a number of zeroes = nr_of_previous_zero e) increment the AC_counter with nr_of_previous_0 f) Write AC_coefficient in the vector: " vector[AC_counter]=AC_coefficient "-----------------Next Steps-----------So, now we have a 64 elements vector.We'll do the reverse of the steps presentedin this doc:1) Dequantize the 64 vector : "for (i=0;i<=63;i++) vector[i]*=quant[i]"2) Re-order from zig-zag the 64 vector into an 8x8 block3) Apply the Inverse DCT transform to the 8x8 blockRepeat the upper process [ Huffman decoder, steps 1), 2) and 3)] for every8x8 block of every image component (Y,Cb,Cr).4) Up-sample if it's needed5) Level shift samples (add 128 to the all 8-bit signed values in the 8x8 blocksresulting from the IDCT transform)6) Tranform YCbCr to RGB7--- And VOILA ... the JPG imageThe JPEG markers and/or how it's organized the image information in the JPG file(The Byte level)--------------------------------------------------------------------------------NOTE: The JPEG/JFIF file format uses Motorola format for words, NOT Intel format,i.e. : high byte first, low byte last -- (ex: the word FFA0 will be written inthe JPEG file in the order : FF at the low offset , A0 at the higher offset)The JPG standard specifies that the JPEG file is composed mostly of pieces called"segments".A segment is a stream of bytes with length <= 65535.The segment beginning isspecified with a marker.A marker = 2 bytes beginning with 0xFF ( the C hexadecimal notation for 255),and ending with a byte different by 0 and 0xFF.Ex: 'FFDA' , 'FFC4', 'FFC0'.Each marker has a meaning: the second byte (different by 0 and 0xFF) specifieswhat does that marker.For example, there is a marker which specifies that you should start the decodingprocess , this is called (the JPG standard's terminology): SOS=Start Of Scan = 'FFDA'Another marker called DQT = Define Quantization Table = 0xFFDB does what thisname says: specifies that in the JPG file, after the marker (and after 3 bytes,more on this later) it will follow 64 bytes = the coefficients of the quantizationtable.If, during the processing of the JPG file, you encounter an 0xFF, then again aa byte different by 0 (I've told you that the second byte for a marker is not 0)and this byte has no marker meaning (you cannot find a marker corresponding tothat byte) then the 0xFF byte you encountered must be ignored and skipped.(In some JPGS, sequences of consecutive 0xFF are for some filling purposes andmust be skipped)You see that whenever you encounter 0xFF , you check the next byte and see ifthat 0xFF you encountered has a marker meaning or must be skipped.What happens if we actually need to encode the 0xFF byte in the JPG fileas an *usual* byte (not a marker, or a filling byte) ?(Say that we need to write a Huffman code which begins with 11111111 (8 bits of1) at a byte alignment)The standard says that we simply make the next byte 0 , and write the sequence'FF00' in the JPG file.So when your JPG decoder meets the 2 byte 'FF00' sequence, it should considerjust a byte: 0xFF as an usual byte.Another thing: You realise that these markers are byte aligned in the JPG file.What happens if during your Huffman encoding and inserting bits in the JPG file'sbytes you have not finished to insert bits in a byte, but you need to write amarker which is byte aligned ?For the byte alignment of the markers, you SET THE REMAINING BITS UNTIL THEBEGINNING OF THE NEXT BYTE TO 1, then you write the marker at the next byte.A short explanation of some important markers found in a JPG file.-------------------------------------------------------------------SOI = Start Of Image = 'FFD8' This marker must be present in any JPG file *once* at the beginning of the file.(Any JPG file starts with the sequence FFD8.)EOI = End Of Image = 'FFD9' Similar to EOI: any JPG file ends with FFD9.RSTi = FFDi (where i is in range 0..7) [ RST0 = FFD0, RST7=FFD7] = Restart MarkersThese restart markers are used for resync. At regular intervals, they appearin the JPG stream of bytes, during the decoding process (after SOS)(They appear in the order: RST0 -- interval -- RST1 -- interval -- RST2 --... ...-- RST6 -- interval -- RST7 -- interval -- RST0 --...)(Obs: A lot of JPGs don't have restart markers)The problem with these markers is that they interrupt the normal bit order inthe JPG's Huffman encoded bitstream.Remember that for the byte alignment of the markers the remaining bits are setto 1, so your decoder has to skip at regular intervals the useless fillingbits (those set with 1) and the RST markers.-------Markers...-------At the end of this doc, I've included a very well written technical explanationof the JPEG/JFIF file format, written by Oliver Fromme, the author of the QPEGviewer.There you'll find a pretty good and complete definition for the markers.But, anyway, here is a list of markers you should check:SOF0 = Start Of Frame 0 = FFC0SOS = Start Of Scan = FFDAAPP0 = it's the marker used to identify a JPG file which uses the JFIF specification = FFE0COM = Comment = FFFEDNL = Define Number of Lines = FFDCDRI = Define Restart Interval = FFDDDQT = Define Quantization Table = FFDBDHT = Define Huffman Table = FFC4The Huffman table stored in a JPG file---------------------------------------Here it is how JPEG implements the Huffman tree: instead of a tree, it definesa table in the JPG file after the DHT (Define Huffman Table) marker.NOTE: The length of the Huffman codes is restricted to 16 bits.Basically there are 2 types of Huffman tables in a JPG file : one for DC andone for AC (actually there are 4 Huffman tables: 2 for DC,AC of luminance and 2 for DC,AC of chrominance)They are stored in the JPG file in the same format which consist of:1) 16 bytes :byte i contains the number of Huffman codes of length i (length in bits) i ranges from 1 to 16 162) A table with the length (in bytes) = sum nr_codes_of_length_i i=1which contains at location [k][j] (k in 1..16, j in 0..(nr_codes_with_length_k-1))the BYTE value associated to the j-th Huffman code of length k.(For a fixed length k, the values are stored sorted by the value of the Huffmancode)From this table you can find the actual Huffman code associated to a particularbyte.Here it is an example of how the actual code values are generated:Ex: (Note: The number of codes for a given length are here for this particular example to figure it out, they can have any other values)SAY that, For length 1 we have nr_codes[1]=0, we skip this length For length 2 we have 2 codes 00 01 For length 3 we have 3 codes 100 101 110 For length 4 we have 1 code 1110 For length 5 we have 1 code 11110 For length 6 we have 1 code 111110 For length 7 we have 0 codes -- skip (if we had 1 code for length 7, we would have 1111110) For length 8 we have 1 code 11111100 (You see that the code is still shifted to left though we skipped the code value for 7) ..... For length 16, .... (the same thing)I've told you that in the Huffman table in the JPG file are stored the BYTE valuesfor a given code.For this particular example of Huffman codes:Say that in the Huffman table in the JPG file on disk we have (after that 16 byteswhich contains the nr of Huffman codes with a given length): 45 57 29 17 23 25 34 28These values corressponds , given that particular lengths I gave you before ,to the Huffman codes like this : there's no value for code of length 1 for codes of length 2 : we have 45 57 for codes of length 3 : 3 values (ex : 29,17,23) for codes of length 4 : only 1 value (ex: 25) for codes of length 5 : 1 value ( ex: 34) .. for code of length 7, again no value, skip to code with length 8 for code of length 8 : 1 value 28IMPORTANT note: For codes of length 2: the value 45 corresponds to code 00 57 to code 01 For codes of length 3: the value 29 corresponds to code 100 17 ----||--- 101 23 ----||--- 110 ETC...(I've told you that for a given length the byte values are stored in the orderof increasing the value of the Huffman code.)Four Huffman tables corresponding to DC and AC tables of the luminance, andDC and AC tables for the chrominance, are given in an annex of the JPEGstandard as a suggestion for the encoder. The standard says that these tables have been tested with good compressionresults on a lot of images and reccommends them, but the encoder can use anyother Huffman table. A lot of JPG encoders use these tables. Some of them offeryou an option: entropy optimization - if it's enabled they'll use Huffmantables optimized for that particular image.The JFIF (Jpeg Format Interchange File) file--------------------------------------------- The JPEG standard (that in the itu-1150.ps file) is somehow very general,the JFIF implementation is a particular case of this standard (and it is, of course,compatible with the standard) . The JPEG standard specifies some markers reserved for applications(by applications I mean particular cases of implementing the standard) Those markers are called APPn , where n ranges from 0 to 0xF ; APPn = FFEn The JFIF specification uses the APP0 marker (FFE0) to identify a JPG file whichuses this specification. You'll see in the JPEG standard that it refers to "image components".These image components can be (Y,Cb,Cr) or (YIQ) or whatever. The JFIF implementations uses only (Y,Cb,Cr) for a truecolor JPG, or only Y fora monochrome JPG. You can get the JFIF specification from www.jpeg.orgThe sampling factors--------------------Note: The following explanation covers the encoding of truecolor (3 components)JPGS; for gray-scaled JPGs there is one component (Y) which is usually nodown-sampled at all, and does not require any inverse transformation like theinverse (Y,Cb,Cr) -> (R,G,B). In consequence, the gray-scaled JPGS are thesimplest and easiest to decode: for every 8x8 block in the image you do theHuffman decoding of the RLC coded vector then you reorder it from zig-zag,dequantize the 64 vector and finally you apply to it the inverse DCT and add128 (level shift) to the new 8x8 values.I've told you that image components are sampled. Usually Y is taken every pixel,and Cb, Cr are taken for a block of 2x2 pixels.But there are some JPGs in which Cb , Cr are taken in every pixel, or someJPGs where Cb, Cr are taken every 2 pixels (a horizontal sampling at 2 pixels,and a vertical sampling in every pixel)The sampling factors for an image component in a JPG file are defined in respect(relative) to the highest sampling factor.Here are the sampling factors for the most usual example: Y is taken every pixel , and Cb,Cr are taken for a block of 2x2 pixels(The JFIF specification gives a formula for sampling factors which I think thatworks only when the maximum sampling factor for each dimension X or Y is <=2)The JPEG standard does not specify the sampling factors , it's more general).You see that Y will have the highest sampling rate : Horizontal sampling factor = 2 = HY Vertical sampling factor = 2 = VY For Cb , Horizontal sampling factor = 1 = HCb Vertical sampling factor = 1 = VCb For Cr Horizontal sampling factor = 1 = HCr Vertical sampling factor = 1 = VCrActually this form of defining the sampling factors is quite useful.The vector of 64 coefficients for an image component, Huffman encoded, is called DU = Data Unit (JPEG's standard terminology)In the JPG file , the order of encoding Data Units is : 1) encode Data Units for the first image component: for (counter_y=1;counter_y<=VY;counter_y++) for (counter_x=1;counter_x<=HY;counter_x++) { encode Data Unit for Y } 2) encode Data Units for the second image component: for (counter_y=1;counter_y<=VCb ;counter_y++) for (counter_x=1;counter_x<=HCb;counter_x++) { encode Data Unit for Cb } 3) finally, for the third component, similar: for (counter_y=1;counter_y<=VCr;counter_y++) for (counter_x=1;counter_x<=HCr;counter_x++) { encode Data Unit for Cr }For the example I gave you (HY=2, VY=2 ; HCb=VCb =1, HCr,VCr=1)here it is a figure ( I think it will clear out things for you) : YDU YDU CbDU CrDU YDU YDU( YDU is a Data unit for Y , and similar CbDU a DU for Cb, CrDU = DU for Cr )This usual combination of sampling factors is referred as 2:1:1 for bothvertical and horizontal sampling factors.And, of course, in the JPG file the encoding order will be : YDU,YDU,YDU,YDU,CbDU,CrDUYou know that a DU (64 coefficients) defines a block of 8x8 values , so herewe specified the encoding order for a block of 16x16 image pixels(An image pixel = an (Y,Cb,Cr) pixel [my notation]) : Four 8x8 blocks of Y values (4 YDUs), one 8x8 block of Cb values (1 CbDU)and one 8x8 block of Cr values (1 CrDU)(Hmax = the maximum horizontal sampling factor , Vmax = the maximum verticalsampling factor)In consequence for this example of sampling factors (Hmax = 2, Vmax=2), theencoder should process SEPARATELY every 16x16 = (Hmax*8 x Vmax*8) image pixelsblock in the order mentioned.This block of image pixels with the dimensions (Hmax*8,Vmax*8) is called, inthe JPG's standard terminology, an MCU = Minimum Coded UnitFor the previous example : MCU = YDU,YDU,YDU,YDU,CbDU,CrDUAnother example of sampling factors : HY =1, VY=1 HCb=1, VCb=1 HCr=1, VCr=1Figure/order : YDU CbDU CrDUYou see that here is defined an 8x8 image pixel block (MCU) with 3 8x8 blocks: one for Y, one for Cb and one for Cr (There's no down-sampling at all)Here (Hmax=1,Vmax=1) the MCU has the dimension (8,8), and MCU = YDU,CbDU,CrDUFor gray-scaled JPGs you don't have to worry about the order of encodingdata units in an MCU. For these JPGs, an MCU = 1 Data Unit (MCU = YDU)In the JPG file, the sampling factors for every image component are definedafter the marker SOF0 = Start Of Frame 0 = FFC0A brief scheme of decoding a JPG file--------------------------------------The decoder reads from the JPG file the sampling factors, it finds out thedimensions of an MCU (Hmax*8,Vmax*8) => how many MCUs are in the whole image,then decodes every MCU present in the original image (a loop for all theseblocks, or until the EOI marker is found [it should be found when the loopfinishes, otherwise you'll get an incomplete image]) - it decodes an MCUby decoding every Data Unit in the MCU in the order mentioned before, andfinally, writes the decoded (Hmax*8 x Vmax*8) truecolor pixel block into the(R,G,B) image buffer.MPEG-1 video and JPEG----------------------The interesting part of the MPEG-1 specification (and probably MPEG-2) is thatit relies heavily on the JPEG specification.It uses a lot of concepts presented here. The reason is that every 15 frames ,or when it's needed, there's an independent frame called I-frame (Intra frame)which is JPEG coded.(By the way, that 16x16 image pixels block example I gave you, is called,in theMPEG's standard terminology, a macroblock)Except the algorithms for motion compensation, MPEG-1 video relies a lot on theJPG specifications (the DCT transform , quantization, etc.)Hope you're ready now to start coding your JPG viewer or encoder.About the author of this doc----------------------------The author of this doc is Cristi Cuturicu, student at University Politehnicain Bucharest (UPB), Department of Computer Science.You can contact him by e-mail: cccrx@kermit.cs.pub.ro cryx@ulise.cs.pub.roAnd if you are a software company needing a programmer then get in touch.A technical explanation of the JPEG/JFIF file format,written by Oliver Fromme, the author of the QPEG viewer-------------------------------------------------------Legal NOTE: The legal rules mentioned in the Disclaimer in top of this fileapply also to the following informations so neither Oliver Fromme, neither Ican be held responsible for errors or bugs in the following informations.The author of the following informations is: Oliver Fromme Leibnizstr. 18-61 38678 Clausthal GERMANYJPEG/JFIF file format:~~~~~~~~~~~~~~~~~~~~~~ - header (2 bytes): $ff, $d8 (SOI) (these two identify a JPEG/JFIF file) - for JFIF files, an APP0 segment is immediately following the SOI marker, see below - any number of "segments" (similar to IFF chunks), see below - trailer (2 bytes): $ff, $d9 (EOI)Segment format:~~~~~~~~~~~~~~~ - header (4 bytes): $ff identifies segment n type of segment (one byte) sh, sl size of the segment, including these two bytes, but not including the $ff and the type byte. Note, not Intel order: high byte first, low byte last! - contents of the segment, max. 65533 bytes. Notes: - There are parameterless segments (denoted with a '*' below) that DON'T have a size specification (and no contents), just $ff and the type byte. - Any number of $ff bytes between segments is legal and must be skipped.Segment types:~~~~~~~~~~~~~~ *TEM = $01 usually causes a decoding error, may be ignored SOF0 = $c0 Start Of Frame (baseline JPEG), for details see below SOF1 = $c1 dito SOF2 = $c2 usually unsupported SOF3 = $c3 usually unsupported SOF5 = $c5 usually unsupported SOF6 = $c6 usually unsupported SOF7 = $c7 usually unsupported SOF9 = $c9 for arithmetic coding, usually unsupported SOF10 = $ca usually unsupported SOF11 = $cb usually unsupported SOF13 = $cd usually unsupported SOF14 = $ce usually unsupported SOF14 = $ce usually unsupported SOF15 = $cf usually unsupported DHT = $c4 Define Huffman Table, for details see below JPG = $c8 undefined/reserved (causes decoding error) DAC = $cc Define Arithmetic Table, usually unsupported *RST0 = $d0 RSTn are used for resync, may be ignored *RST1 = $d1 *RST2 = $d2 *RST3 = $d3 *RST4 = $d4 *RST5 = $d5 *RST6 = $d6 *RST7 = $d7 SOI = $d8 Start Of Image EOI = $d9 End Of Image SOS = $da Start Of Scan, for details see below DQT = $db Define Quantization Table, for details see below DNL = $dc usually unsupported, ignore SOI = $d8 Start Of Image EOI = $d9 End Of Image SOS = $da Start Of Scan, for details see below DQT = $db Define Quantization Table, for details see below DNL = $dc usually unsupported, ignore DRI = $dd Define Restart Interval, for details see below DHP = $de ignore (skip) EXP = $df ignore (skip) APP0 = $e0 JFIF APP0 segment marker, for details see below APP15 = $ef ignore JPG0 = $f0 ignore (skip) JPG13 = $fd ignore (skip) COM = $fe Comment, for details see below All other segment types are reserved and should be ignored (skipped).SOF0: Start Of Frame 0:~~~~~~~~~~~~~~~~~~~~~~~ - $ff, $c0 (SOF0) - length (high byte, low byte), 8+components*3 - data precision (1 byte) in bits/sample, usually 8 (12 and 16 not supported by most software) - image height (2 bytes, Hi-Lo), must be >0 if DNL not supported - image width (2 bytes, Hi-Lo), must be >0 if DNL not supported - number of components (1 byte), usually 1 = grey scaled, 3 = color YCbCr or YIQ, 4 = color CMYK) - for each component: 3 bytes - component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q) - sampling factors (bit 0-3 vert., 4-7 hor.) - quantization table number Remarks: - JFIF uses either 1 component (Y, greyscaled) or 3 components (YCbCr, sometimes called YUV, colour).APP0: JFIF segment marker:~~~~~~~~~~~~~~~~~~~~~~~~~~ - $ff, $e0 (APP0) - length (high byte, low byte), must be >= 16 - 'JFIF'#0 ($4a, $46, $49, $46, $00), identifies JFIF - major revision number, should be 1 (otherwise error) - minor revision number, should be 0..2 (otherwise try to decode anyway) - units for x/y densities: 0 = no units, x/y-density specify the aspect ratio instead 1 = x/y-density are dots/inch 2 = x/y-density are dots/cm - x-density (high byte, low byte), should be <> 0 - y-density (high byte, low byte), should be <> 0 - thumbnail width (1 byte) - thumbnail height (1 byte) - n bytes for thumbnail (RGB 24 bit), n = width*height*3 Remarks: - If there's no 'JFIF'#0, or the length is < 16, then it is probably not a JFIF segment and should be ignored. - Normally units=0, x-dens=1, y-dens=1, meaning that the aspect ratio is 1:1 (evenly scaled). - JFIF files including thumbnails are very rare, the thumbnail can usually be ignored. If there's no thumbnail, then width=0 and height=0. - If the length doesn't match the thumbnail size, a warning may be printed, then continue decoding.DRI: Define Restart Interval:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - $ff, $dd (DRI) - length (high byte, low byte), must be = 4 - restart interval (high byte, low byte) in units of MCU blocks, meaning that every n MCU blocks a RSTn marker can be found. The first marker will be RST0, then RST1 etc, after RST7 repeating from RST0.DQT: Define Quantization Table:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - $ff, $db (DQT) - length (high byte, low byte) - QT information (1 byte): bit 0..3: number of QT (0..3, otherwise error) bit 4..7: precision of QT, 0 = 8 bit, otherwise 16 bit - n bytes QT, n = 64*(precision+1) Remarks: - A single DQT segment may contain multiple QTs, each with its own information byte. - For precision=1 (16 bit), the order is high-low for each of the 64 words.DAC: Define Arithmetic Table:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Current software does not support arithmetic coding for legal reasons. JPEG files using arithmetic coding can not be processed.DHT: Define Huffman Table:~~~~~~~~~~~~~~~~~~~~~~~~~~ - $ff, $c4 (DHT) - length (high byte, low byte) - HT information (1 byte): bit 0..3: number of HT (0..3, otherwise error) bit 4 : type of HT, 0 = DC table, 1 = AC table bit 5..7: not used, must be 0 - 16 bytes: number of symbols with codes of length 1..16, the sum of these bytes is the total number of codes, which must be <= 256 - n bytes: table containing the symbols in order of increasing code length (n = total number of codes) Remarks: - A single DHT segment may contain multiple HTs, each with its own information byte.COM: Comment:~~~~~~~~~~~~~ - $ff, $fe (COM) - length (high byte, low byte) of the comment = L+2 - The comment = a stream of bytes with the length = LSOS: Start Of Scan:~~~~~~~~~~~~~~~~~~~ - $ff, $da (SOS) - length (high byte, low byte), must be 6+2*(number of components in scan) - number of components in scan (1 byte), must be >= 1 and <=4 (otherwise error), usually 1 or 3 - for each component: 2 bytes - component id (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q), see SOF0 - Huffman table to use: - bit 0..3: AC table (0..3) - bit 4..7: DC table (0..3) - 3 bytes to be ignored (???) Remarks: - The image data (scans) is immediately following the SOS segment.