MY SPACE: SSE Implementation

This is Toshiya's introduction, which explains how to implement a ray tracer with SSE:

http://graphics.ucsd.edu/courses/cse168_s10/ucsd/SIMD_Ray_Tracing_Tips.pdf

My experiment:

Add two float[4] arrays together 10e8 times, and compare that with adding two __m128 values using _mm_add_ps().
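The experiment could be set up roughly like this. It is only a sketch of the idea, not my original test code: the function names add_scalar, add_sse and time_adds are made up for illustration, and I assume gcc on an x86 machine with SSE available.

```c
#include <time.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Scalar version: accumulate b into a, one float at a time. */
static void add_scalar(float a[4], const float b[4], long iters) {
    for (long n = 0; n < iters; ++n)
        for (int i = 0; i < 4; ++i)
            a[i] += b[i];
}

/* SSE version: the same accumulation, four floats per _mm_add_ps. */
static void add_sse(float a[4], const float b[4], long iters) {
    __m128 va = _mm_loadu_ps(a);  /* unaligned load: a may not be 16-byte aligned */
    __m128 vb = _mm_loadu_ps(b);
    for (long n = 0; n < iters; ++n)
        va = _mm_add_ps(va, vb);
    _mm_storeu_ps(a, va);         /* write the four lanes back to a */
}

/* Hypothetical helper: CPU seconds spent in one of the two versions.
   The result is written to out so the caller can compare both versions. */
static double time_adds(int use_sse, long iters, float out[4]) {
    const float b[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    out[0] = out[1] = out[2] = out[3] = 0.0f;
    clock_t start = clock();
    if (use_sse)
        add_sse(out, b, iters);
    else
        add_scalar(out, b, iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_adds(0, iters, result) and time_adds(1, iters, result) from main with the same iters and print the two return values to compare them. Be aware that an optimizing compiler may rewrite such simple loops, so the timings are only a rough indication.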

Here is what I found through my experiment:

1. For the SSE implementation to really pay off, you have to compile with optimization enabled (set an optimization level, e.g. -O2, when you compile with gcc). With optimization, the SSE version can be at most nearly 4 times as fast as the implementation without SSE.

2. In my experiment, there is no big difference between using __m128 variables directly and using a __m128* whose memory is allocated with _mm_malloc(sizeof(__m128),16).

(From Toshiya: there is no difference between those two. The only thing that matters is how the data (non-__m128 data) is loaded. Using _mm_load together with _mm_malloc will be compiled to faster instructions.)
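To illustrate Toshiya's remark, here is a small sketch of the two ways of loading plain float data. I am assuming that "_mm_load" refers to _mm_load_ps, which requires a 16-byte-aligned address (which is what _mm_malloc(..., 16) provides), and that the alternative is _mm_loadu_ps, which accepts any address. The function names sum_aligned, sum_unaligned and make_aligned_buffer are made up for this example.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* with gcc this also declares _mm_malloc / _mm_free */

/* Sum n floats (n must be a multiple of 4) from a 16-byte-aligned buffer. */
static float sum_aligned(const float *data, int n) {
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(data + i));   /* aligned load, typically movaps */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Same sum, but the buffer may have any alignment. */
static float sum_unaligned(const float *data, int n) {
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));  /* unaligned load, typically movups */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* Hypothetical helper: allocate n floats on a 16-byte boundary and fill
   them with 0, 1, ..., n-1. Release the buffer with _mm_free, not free. */
static float *make_aligned_buffer(int n) {
    float *p = (float *)_mm_malloc(n * sizeof(float), 16);
    if (p)
        for (int i = 0; i < n; ++i)
            p[i] = (float)i;
    return p;
}
```

Passing a pointer that is not 16-byte aligned to sum_aligned can crash, so the aligned load only makes sense when the allocation guarantees the alignment. How much faster it is depends on the CPU, so it is worth measuring on your own machine.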


Tuesday, June 22, 2010
