Saturday, May 15, 2010

Data structure alignment

Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do this in word sized chunks (e.g. 4 byte chunks on a 32-bit system). Data alignment means putting the data at a memory offset equal to some multiple of the word size, which increases the system's performance due to the way the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of the last data structure and the start of the next, which is data structure padding.

For example, when the computer's word size is 4 bytes, the data to be read should be at a memory offset which is some multiple of 4. When this is not the case, e.g. the data starts at the 14th byte instead of the 16th byte, then the computer has to read two 4-byte chunks and do some calculation before the requested data has been read, or it may generate an alignment fault. Even though the previous data structure ends at the 14th byte, the next data structure should start at the 16th byte. Two padding bytes are inserted between the two data structures to align the next data structure to the 16th byte

Typical alignment of C structs on x86

Data structure members are stored sequentially in a memory so that in the structure below the member Data1 will always precede Data2 and Data2 will always precede Data3:

struct MyData
{
short Data1;
short Data2;
short Data3;
};

If the type "short" is stored in two bytes of memory then each member of the data structure depicted above would be 2-byte aligned. Data1 would be at offset 0, Data2 at offset 2 and Data3 at offset 4. The size of this structure would be 6 bytes.

The type of each member of the structure usually has a default alignment, meaning that it will, unless otherwise requested by the programmer, be aligned on a pre-determined boundary. The following typical alignments are valid for compilers from Microsoft, Borland, and GNU when compiling for 32-bit x86:

* A char (one byte) will be 1-byte aligned.
* A short (two bytes) will be 2-byte aligned.
* An int (four bytes) will be 4-byte aligned.
* A float (four bytes) will be 4-byte aligned.
* A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux.
* A long double (twelve bytes) will be 4-byte aligned on Linux.
* Any pointer (four bytes) will be 4-byte aligned on Linux. (e.g.: char*, int*)

The only notable difference in alignment for a 64-bit Linux system when compared to a 32 bit is:

* A double (eight bytes) will be 8-byte aligned.
* A long double (Sixteen bytes) will be 16-byte aligned.
* Any pointer (eight bytes) will be 8-byte aligned.

Here is a structure with members of various types, totaling 8 bytes before compilation:

struct MixedData
{
char Data1;
short Data2;
int Data3;
char Data4;
};

After compilation the data structure will be supplemented with padding bytes to ensure a proper alignment for each of its members:

struct MixedData /* after compilation */
{
char Data1;
char Padding1[1]; /* For the following 'short' to be aligned on a 2 byte boundary */
short Data2;
int Data3;
char Data4;
char Padding2[3];
};

The compiled size of the structure is now 12 bytes. It is important to note that the last member is padded with the number of bytes required to conform to the largest type of the structure. In this case 3 bytes are added to the last member to pad the structure to the size of a long word.

It is possible to change the alignment of structures to reduce the memory they require (or to conform to an existing format) by reordering structure members or changing the compiler’s alignment (or “packing”) of structure members.

struct MixedData /* after reordering */
{
char Data1;
char Data4; /* reordered */
short Data2;
int Data3;
};

The compiled size of the structure now matches the pre-compiled size of 8 bytes. Note that Padding1[1] has been replaced (and thus eliminated) by Data4 and Padding2[3] is no longer necessary as the structure is already aligned to the size of a long word.

The alternative method of enforcing the MixedData structure to be aligned to a one byte boundary will cause the pre-processor to discard the pre-determined alignment of the structure members and thus no padding bytes would be inserted.

Thanks to wikipedia:
http://en.wikipedia.org/wiki/Data_structure_alignment

No comments:

Post a Comment