Sunday, October 12, 2008

Template Version memcpy

I came across the following code snippet a few days ago:


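template<int size, typename T>
void template_memcpy(T* dest, const T* src)
{
    // A struct wrapping a fixed-size array of `size` elements (an element
    // count, not a byte count). Assigning one such struct to another makes
    // the compiler generate the copy for the whole array.
    struct type {
        T data[size];
    };

    // View both pointers as pointers to the wrapper struct and use its
    // implicitly generated copy assignment operator.
    *reinterpret_cast<type *>(dest) = *reinterpret_cast<const type *>(src);
}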
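To check the behaviour outside the debugger, here is a small self-contained sketch. It repeats the template above and adds a hypothetical helper, template_memcpy_matches_memcpy, that copies a 31-byte char array both with memcpy and with template_memcpy and compares the results. The template argument is an element count; it equals sizeof(src) here only because the elements are chars.

```cpp
#include <cassert>
#include <cstring>

// Same template as above: copies `size` elements of T by assigning a
// struct that wraps a T[size] array.
template<int size, typename T>
void template_memcpy(T* dest, const T* src)
{
    struct type {
        T data[size];
    };
    *reinterpret_cast<type*>(dest) = *reinterpret_cast<const type*>(src);
}

// Hypothetical helper: copies a 31-byte char array with both memcpy and
// template_memcpy and reports whether both destinations match the source.
bool template_memcpy_matches_memcpy()
{
    char src[31];
    for (int i = 0; i < 31; ++i) {
        src[i] = static_cast<char>('A' + (i % 26));
    }

    char dest_a[31];
    char dest_b[31];
    std::memcpy(dest_a, src, sizeof(src));
    // T is deduced as char, so sizeof(src) is also the element count.
    template_memcpy<sizeof(src)>(dest_b, src);

    return std::memcmp(dest_a, dest_b, sizeof(src)) == 0
        && std::memcmp(dest_b, src, sizeof(src)) == 0;
}
```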


The code looks pretty cool. It does the job of a conventional memcpy, but leaves the actual copying to the code the compiler generates for assigning one struct to another.

I ran a test with the Visual C++ 2008 compiler to see whether the code generated for template_memcpy is as good as that of a conventional C-style memcpy.

Our source is a 31-byte char array. Take a look at the conventional C-style memcpy call, together with its disassembly.


memcpy(dest, src, sizeof(src));
00241053 B9 07 00 00 00 mov ecx,7
00241058 8D 74 24 1C lea esi,[esp+1Ch]
0024105C 8B FB mov edi,ebx
0024105E F3 A5 rep movs dword ptr es:[edi],dword ptr [esi]
00241060 66 A5 movs word ptr es:[edi],word ptr [esi]
00241062 A4 movs byte ptr es:[edi],byte ptr [esi]


From the assembly code, we can see that the CPU:

1) Moves a double word (4 bytes) from source to destination 7 times (rep movs with ecx = 7).
2) Moves a word (2 bytes) from source to destination once.
3) Moves a byte from source to destination once.

That accounts for 7 × 4 + 2 + 1 = 31 bytes.
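The same split can be written as a small calculation. The sketch below (the names MoveCounts and split_copy are hypothetical) divides a byte count into 4-byte, 2-byte and 1-byte moves. It only mirrors this one listing; it is not a description of how Visual C++ chooses instructions in general.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch: how a copy of `total` bytes maps onto the
// rep movs dword / movs word / movs byte sequence in the listing.
struct MoveCounts {
    std::size_t dwords;  // 4-byte moves (the count loaded into ecx)
    std::size_t words;   // 2-byte moves
    std::size_t bytes;   // 1-byte moves
};

MoveCounts split_copy(std::size_t total)
{
    MoveCounts m;
    m.dwords = total / 4;        // whole double words
    m.words = (total % 4) / 2;   // a leftover word, if 2 or 3 bytes remain
    m.bytes = total % 2;         // a leftover byte, if the count is odd
    return m;
}
```

For the 31-byte array, split_copy(31) yields 7 double words, 1 word and 1 byte, matching the three instructions above.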

How about the generated code for template_memcpy?


template_memcpy<sizeof(src)>(dest, src);
002410C7 B9 07 00 00 00 mov ecx,7
002410CC 8D 74 24 1C lea esi,[esp+1Ch]
002410D0 8B FB mov edi,ebx
002410D2 F3 A5 rep movs dword ptr es:[edi],dword ptr [esi]
002410D4 66 A5 movs word ptr es:[edi],word ptr [esi]
002410D6 A4 movs byte ptr es:[edi],byte ptr [esi]


See? The generated code is identical, so template_memcpy is as good as the conventional memcpy, isn't it?

OK. So why would we choose template_memcpy over memcpy?

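template_memcpy comes in handy when you want to copy an array of objects. The wrapper struct's implicitly generated assignment operator assigns each array element in turn, so every object is copied with its own operator=, which a raw memcpy would bypass.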

Instead of:


MyObject src[100];
MyObject *dest = new MyObject[sizeof(src) / sizeof(src[0])];

// Copy element by element with MyObject's assignment operator.
for (size_t i = 0; i < sizeof(src) / sizeof(src[0]); i++) {
    dest[i] = src[i];
}

delete[] dest;


we can write:


MyObject src[100];
MyObject *dest = new MyObject[sizeof(src) / sizeof(src[0])];

// One call; the element count (100) is a compile-time constant.
template_memcpy<sizeof(src) / sizeof(src[0])>(dest, src);

delete[] dest;


The code looks cleaner, doesn't it?
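To see what happens with non-trivial objects, the sketch below gives a hypothetical MyObject a std::string member and a copy assignment operator that counts its calls. Because the wrapper struct's generated assignment copies each element in turn, the counter should reach 100, one call per element. Like the original, it relies on reinterpret_cast between the array and the wrapper struct, which works in practice on common compilers but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same template as in the post.
template<int size, typename T>
void template_memcpy(T* dest, const T* src)
{
    struct type {
        T data[size];
    };
    *reinterpret_cast<type*>(dest) = *reinterpret_cast<const type*>(src);
}

// Hypothetical MyObject: owns a std::string and counts how many times its
// copy assignment operator runs.
struct MyObject {
    static int assignments;
    std::string name;

    MyObject& operator=(const MyObject& other)
    {
        ++assignments;
        name = other.name;
        return *this;
    }
};

int MyObject::assignments = 0;

// Copies 100 objects with template_memcpy and returns true if every name
// arrived intact. MyObject::assignments is reset just before the copy.
bool copy_objects()
{
    MyObject src[100];
    for (int i = 0; i < 100; ++i) {
        src[i].name = "object " + std::to_string(i);
    }

    const std::size_t count = sizeof(src) / sizeof(src[0]);
    MyObject* dest = new MyObject[count];

    MyObject::assignments = 0;
    template_memcpy<sizeof(src) / sizeof(src[0])>(dest, src);

    bool ok = true;
    for (std::size_t i = 0; i < count; ++i) {
        if (dest[i].name != src[i].name) {
            ok = false;
        }
    }
    delete[] dest;
    return ok;
}
```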

Of course, template_memcpy has a shortcoming: it only supports sizes that are known at compile time :)
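When the element count is only known at run time, writing template_memcpy<n>(dest, src) with a non-constant n will not compile, because a template argument must be a constant expression. A minimal fallback sketch, assuming element-wise assignment is what you want, is std::copy; the helper names runtime_copy and copy_runtime_sized are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fallback for counts known only at run time. std::copy
// assigns element by element, like the struct copy in template_memcpy.
template<typename T>
void runtime_copy(T* dest, const T* src, std::size_t count)
{
    std::copy(src, src + count, dest);
}

// Example: the element count comes from the vector's size at run time.
std::vector<int> copy_runtime_sized(const std::vector<int>& src)
{
    std::vector<int> dest(src.size());
    if (!src.empty()) {
        runtime_copy(dest.data(), src.data(), src.size());
    }
    return dest;
}
```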
