
Efficient summing of a vector

Hi Camm, master in the ways of Intel assembly.

You once wrote that you shaved an instruction off the way I sum an SSE
register. I use a sequence like the one below to sum the four floats in
the register #reg into its lowest element, using xmm7 as scratch, but it
seems like a clumsy way to do it. How can it be done in 4 instructions?

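#define SSE_HSUM(reg) /* hypothetical macro name; only the body was quoted */ \
        __asm__ __volatile__ ("movhlps " #reg ", %%xmm7\n"     /* xmm7[0..1] = reg[2..3]           */ \
                              "addps " #reg ", %%xmm7\n"       /* xmm7[0] = r0+r2, xmm7[1] = r1+r3 */ \
                              "movaps %%xmm7, " #reg "\n"      /* copy the partial sums into reg   */ \
                              "shufps $1, " #reg ", %%xmm7\n"  /* xmm7[0] = r1+r3                  */ \
                              "addss %%xmm7, " #reg "\n"       /* reg[0] = r0+r1+r2+r3             */ \
                              : : : "xmm7")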
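For comparison, one possible 4-instruction sequence, assuming SSE2 is available (this is not necessarily the trick Camm had in mind): pshufd copies and shuffles in a single instruction, so it can replace the movaps/shufps pair. In AT&T order that gives movhlps reg, xmm7; addps xmm7, reg; pshufd $1, reg, xmm7; addss xmm7, reg, leaving the sum in the low element of reg. The sketch below expresses the same data flow with C intrinsics; the function name hsum_ps_sse2 is hypothetical, and because the compiler picks the actual instructions, the 4-instruction count is not guaranteed.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also provides the SSE1 ones */

/* Hypothetical helper: horizontal sum of the four floats in v,
 * following the movhlps / addps / pshufd / addss data flow. */
static float hsum_ps_sse2(__m128 v)
{
    /* [v2, v3, v2, v3]: high half moved into the low half (movhlps) */
    __m128 hi = _mm_movehl_ps(v, v);
    /* lane 0 = v0 + v2, lane 1 = v1 + v3 (addps) */
    __m128 sum = _mm_add_ps(v, hi);
    /* lane 0 = sum[1]; pshufd copies and shuffles in one step */
    __m128 odd = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(sum), 1));
    /* lane 0 = (v0 + v2) + (v1 + v3) (addss) */
    sum = _mm_add_ss(sum, odd);
    return _mm_cvtss_f32(sum);
}
```

For example, hsum_ps_sse2(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)) returns 10.0f. Note that pshufd is an integer-domain instruction, so on some CPUs using it between floating-point operations can cost an extra bypass delay even though it saves an instruction.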

Hope you can help me,