OpenMP Memory Benchmark 2014-03-08

Writing a memory bandwidth benchmark in OpenMP

There are few memory bandwidth benchmarks that can run with multiple threads. I set out to write my own, and in the process I learned about some of the pitfalls one can run into.

The problem:

I want to write a simple benchmark that takes advantage of two things: SSE data structures and OpenMP parallelism. The SSE code I need for the memory bandwidth test is minimal and can be summarized as follows:

SSE Float4 class:


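#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_setzero_ps, _mm_set1_ps, _mm_add_ps

// Four packed single-precision floats held in one 128-bit SSE register
class __attribute__ ((aligned(16))) float4 {
public:
  __m128 mmvalue;

  inline float4()         :mmvalue(_mm_setzero_ps()) { }  // all four lanes = 0
  inline float4(float a)  :mmvalue(_mm_set1_ps(a)) {}     // all four lanes = a
  inline float4(__m128 m) :mmvalue(m) {}
  // Lane-wise addition of two float4 values
  inline float4 operator+(const float4& b) const { return _mm_add_ps(mmvalue, b.mmvalue); }
};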
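A benchmark loop around a vector type like this might look like the following minimal sketch. It assumes a STREAM-style "triad" kernel (a[i] = b[i] + s * c[i]) rather than reproducing the exact kernel from this post, and it uses plain floats so it builds on any compiler; `triad_bandwidth` and `TriadResult` are hypothetical names. One common pitfall it guards against: the arrays are initialized inside an OpenMP loop with the same static schedule as the timed loop, so that under a first-touch memory policy each thread's pages end up near the core that later reads them.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

// Result of a hypothetical STREAM-style triad run.
struct TriadResult {
  double best_gbs;  // best observed bandwidth in GB/s (1e9 bytes per second)
  float last;       // a[n-1], returned so the compiler cannot discard the kernel
};

// a[i] = b[i] + s * c[i]: each element reads b and c and writes a, so
// 3 * sizeof(float) bytes are counted per element (write-allocate traffic ignored).
// Build with -fopenmp to run in parallel; without it the pragmas are ignored.
TriadResult triad_bandwidth(std::size_t n, int repeats, float s) {
  assert(n > 0 && repeats > 0);
  const long long len = static_cast<long long>(n);

  // new float[n] leaves the memory untouched, so the first write decides placement.
  std::unique_ptr<float[]> a(new float[n]), b(new float[n]), c(new float[n]);

  // Parallel first touch with the same static schedule as the timed loop.
  #pragma omp parallel for schedule(static)
  for (long long i = 0; i < len; ++i) { a[i] = 0.0f; b[i] = 1.0f; c[i] = 2.0f; }

  double best_sec = 1e300;
  for (int r = 0; r < repeats; ++r) {
    auto t0 = std::chrono::steady_clock::now();
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < len; ++i) a[i] = b[i] + s * c[i];
    auto t1 = std::chrono::steady_clock::now();
    best_sec = std::min(best_sec, std::chrono::duration<double>(t1 - t0).count());
  }
  best_sec = std::max(best_sec, 1e-9);  // guard against a zero timer reading

  const double bytes = 3.0 * sizeof(float) * static_cast<double>(n);
  return {bytes / best_sec / 1e9, a[n - 1]};
}
```

Reporting the best of several repeats, as STREAM does, reduces the influence of timer noise and other activity on the machine.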

The importance of Node Interleaving on AMD compute nodes 2014-03-04

Enabling Node Interleaving in the BIOS can greatly increase the performance of a compute node. With node interleaving enabled, memory addresses are spread across all of the NUMA nodes, so data is not concentrated behind a single memory controller. With it disabled, each NUMA node's memory is exposed separately, and the user must explicitly place data in memory local to the CPU that uses it to get the best performance.
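To make the idea concrete, here is a toy model of interleaved placement, assuming physical memory is striped round-robin across NUMA nodes in fixed-size chunks. The 4 KB chunk size and the 8 nodes (four sockets with two NUMA nodes each, as in the spec table below) are assumptions for illustration; the real interleave granularity depends on the BIOS and memory controller. `node_for_address` and `bytes_per_node` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: with node interleaving on, consecutive chunks of the physical
// address space are assigned to NUMA nodes in round-robin order.
constexpr std::uint64_t kChunkBytes = 4096;  // assumed interleave granularity
constexpr int kNumNodes = 8;                 // assumed: 4 sockets x 2 NUMA nodes

int node_for_address(std::uint64_t addr) {
  return static_cast<int>((addr / kChunkBytes) % kNumNodes);
}

// Count how many bytes of the range [start, start + len) land on each node.
std::vector<std::uint64_t> bytes_per_node(std::uint64_t start, std::uint64_t len) {
  assert(len > 0);
  std::vector<std::uint64_t> counts(kNumNodes, 0);
  std::uint64_t addr = start;
  const std::uint64_t end = start + len;
  while (addr < end) {
    // Advance to the end of the current chunk or the end of the range.
    const std::uint64_t chunk_end = (addr / kChunkBytes + 1) * kChunkBytes;
    const std::uint64_t stop = chunk_end < end ? chunk_end : end;
    counts[node_for_address(addr)] += stop - addr;
    addr = stop;
  }
  return counts;
}
```

In this model a 1 MB buffer is split evenly, 128 KB per node. With interleaving disabled, the same buffer would typically sit entirely on whichever node the OS chose (on Linux, by default, the node of the thread that first touched it), so every thread would share that one memory controller.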

An explanation of Node Interleaving can be found here

The end result: a 4-5x increase in memory bandwidth.

In our lab we have several 64-core AMD nodes with the following specs:

  • Supermicro H8QGL-6F/H8QGL-iF
  • Supermicro 1042-LTF SuperServer
Processor            AMD Opteron 6274
Nickname             Interlagos
Clock                2.2 GHz
Sockets/Node         4
Cores/Socket         16
NUMA nodes/Socket    2
DP GFlops/Socket     140.8
Memory/Socket        32 GB
Bandwidth/Socket     102.4 GB/s
Memory               DDR3 1333 MHz
L1 cache (excl.)     16 KB
L2 cache             2 MB per 2 cores
L3 cache             8 MB per 8 cores
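The per-socket figures in the table scale to the whole node as follows. This is plain arithmetic on the table values (theoretical peaks, not measurements), and `NodeSpec` and its helper functions are hypothetical names.

```cpp
#include <cassert>

// Per-socket values copied from the spec table above (theoretical peaks).
struct NodeSpec {
  int sockets = 4;
  int cores_per_socket = 16;
  int numa_per_socket = 2;
  double memory_gb_per_socket = 32.0;
  double bandwidth_gbs_per_socket = 102.4;
  double dp_gflops_per_socket = 140.8;
};

int total_cores(const NodeSpec& s) { return s.sockets * s.cores_per_socket; }
int total_numa_nodes(const NodeSpec& s) { return s.sockets * s.numa_per_socket; }
double total_memory_gb(const NodeSpec& s) { return s.sockets * s.memory_gb_per_socket; }
double total_bandwidth_gbs(const NodeSpec& s) { return s.sockets * s.bandwidth_gbs_per_socket; }
double total_dp_gflops(const NodeSpec& s) { return s.sockets * s.dp_gflops_per_socket; }

// Bandwidth behind a single NUMA node's memory controller.
double bandwidth_per_numa_gbs(const NodeSpec& s) {
  assert(s.numa_per_socket > 0);
  return s.bandwidth_gbs_per_socket / s.numa_per_socket;
}
```

So each node has 64 cores in 8 NUMA nodes, 128 GB of memory and 409.6 GB/s of theoretical aggregate bandwidth, while a single NUMA node offers 51.2 GB/s, one eighth of the total. That ratio is one way to see why data placement can matter so much on this machine.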

OpenCL Performance Tuning 2014-02-27

This post will discuss the process I went through to optimize a sparse matrix vector multiply (SPMV) that I use in my multibody dynamics code. I wanted to document the process and information that I gathered so that it may help others, and so that I could refer back to it.
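The matrix storage format is not shown here, so as a point of reference this is a minimal serial SPMV sketch assuming compressed sparse row (CSR) storage. A serial version like this is handy for checking the output of each OpenCL kernel variant; `CsrMatrix` and `spmv` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) storage: the nonzeros of row r are
// values[row_ptr[r] .. row_ptr[r+1]) with matching column indices in col_idx.
struct CsrMatrix {
  std::size_t rows = 0, cols = 0;
  std::vector<std::size_t> row_ptr;  // size rows + 1
  std::vector<int> col_idx;          // size nnz
  std::vector<double> values;        // size nnz
};

// y = A * x, computed as one sparse dot product per row.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
  assert(x.size() == A.cols && A.row_ptr.size() == A.rows + 1);
  std::vector<double> y(A.rows, 0.0);
  for (std::size_t r = 0; r < A.rows; ++r) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
      sum += A.values[k] * x[A.col_idx[k]];
    y[r] = sum;
  }
  return y;
}
```

In a GPU or OpenCL port, the outer loop over rows is a common starting point for distributing work, for example one work-item per row.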

Here is how I plan to go about this: first, I want to determine the peak performance I could hope to achieve on my testing machine. I know I will never reach theoretical peak performance, so comparing my code's performance to the theoretical maximum doesn't make sense. Instead, I will rely on clpeak as my control. Then I will take my SPMV operation and try to get it to run as close as possible to what the benchmark says is achievable.
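One way to turn a measured bandwidth into a target for SPMV is a simple roofline-style bound: attainable GFLOP/s is at most the smaller of the compute peak and the bandwidth times the arithmetic intensity. The sketch below assumes CSR storage with double-precision values and 32-bit column indices, counts 2 flops (one multiply, one add) per nonzero, and lets the caller choose how many bytes of x traffic to charge per nonzero. The function name is hypothetical, and the inputs in any usage are placeholders, not clpeak results.

```cpp
#include <algorithm>
#include <cassert>

// Roofline-style upper bound for CSR SpMV, in GFLOP/s.
// Per nonzero: 2 flops and at least 8 B (double value) + 4 B (int column index),
// plus x_bytes_per_nnz of traffic for reading x (0 if x stays in cache).
// Traffic for row_ptr and y is ignored.
double spmv_gflops_bound(double peak_gflops, double bandwidth_gbs, double x_bytes_per_nnz) {
  assert(bandwidth_gbs > 0.0 && x_bytes_per_nnz >= 0.0);
  const double flops_per_nnz = 2.0;
  const double bytes_per_nnz = 8.0 + 4.0 + x_bytes_per_nnz;
  const double memory_bound = bandwidth_gbs * flops_per_nnz / bytes_per_nnz;
  return std::min(peak_gflops, memory_bound);
}
```

At best 2 flops per 12 bytes, even the table's 409.6 GB/s theoretical aggregate (4 x 102.4) would cap SPMV near 68 GFLOP/s, well under the 563.2 DP GFLOP/s peak (4 x 140.8), which is why SPMV is usually treated as a bandwidth-bound kernel.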

I plan to document every kernel modification I make, even those that make the code slower. I am by no means an OpenCL expert (I currently have three days of experience with it), but I do know a bit about writing parallel code.

Testing Setup

Hardware:

The specifications for the test rig are as follows:

  • 1 x Supermicro 1042-LTF SuperServer
  • 4 x AMD Opteron 6274 2.2GHz 16 core processor
Processor            AMD Opteron 6274
Nickname             Interlagos
Clock                2.2 GHz
Sockets/Node         4
Cores/Socket         16
NUMA nodes/Socket    2
DP GFlops/Socket     140.8
Memory/Socket        32 GB
Bandwidth/Socket     102.4 GB/s
Memory               DDR3 1333 MHz
L1 cache (excl.)     16 KB
L2 cache             2 MB per 2 cores
L3 cache             8 MB per 8 cores

Povray Color Ramp 2014-01-14

I sometimes need to use a color ramp in POV-Ray to render the velocity of a particle. This is more of a reference for myself than anything else.

2013-06-03: Original post
2014-01-14: Updated

//velocity of the object is stored in vx, vy, vz
//position of the object is stored in px, py, pz (assumed variable names)
//x, y, z are POV-Ray's built-in unit vectors, used below as the red, green, blue channels
#local c=<1,1,1>;
#local p=sqrt(vx*vx+vy*vy+vz*vz);
#if (p <= 0.5)
	//handle color values from blue to green
	#local c = (y * p *2.0  + z * (.5- p)*2.0);
#elseif (p > 0.5 & p < 1.0)
	//handle color values from green to red
	#local c = (x * (p - .5)* 2.0 + y * (1.0 - p)*2.0);
#else
	//clamp color to red for maximum value
	#local c=<1,0,0>;
#end

sphere {<0,0,0>, 1 translate < px, py, pz >  pigment {color rgb c }finish {diffuse 1 ambient 0 specular 0 } }
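If the colors need to be computed outside POV-Ray (for example when writing the scene file from a simulation), the same ramp can be expressed in C++. This sketch mirrors the branches of the POV-Ray code; `Rgb` and `velocity_color` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Rgb { double r, g, b; };

// Map speed |v| to blue (0) -> green (0.5) -> red (>= 1.0), like the POV-Ray ramp.
Rgb velocity_color(double vx, double vy, double vz) {
  const double p = std::sqrt(vx * vx + vy * vy + vz * vz);
  assert(std::isfinite(p) && "velocity must be finite");
  if (p <= 0.5) {
    // blue -> green
    return {0.0, p * 2.0, (0.5 - p) * 2.0};
  } else if (p < 1.0) {
    // green -> red
    return {(p - 0.5) * 2.0, (1.0 - p) * 2.0, 0.0};
  }
  return {1.0, 0.0, 0.0};  // clamp to red for the maximum value
}
```

A speed of 0.75, for example, falls in the second branch and gives equal parts red and green (0.5, 0.5, 0).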

Three.js Quaternion Camera 2013-12-25

This is a three.js camera class based on my quaternion camera code. The math is very similar; the syntax differs because of the vector and quaternion classes in three.js.

Based on https://github.com/mrdoob/three.js/blob/master/examples/js/controls/PointerLockControls.js

/**
 * @author hammad mazhar / http://hamelot.co.uk/
 * based on https://github.com/mrdoob/three.js/blob/master/examples/js/controls/PointerLockControls.js
 */
THREE.FreeCamera = function (camera) {

    var scope = this;

    camera.rotation.set(0, 0, 0);
    camera.position.set(0, 1, 0);
    camera.useQuaternion = true;

    var MOVE = {
        LEFT: {
            value: 0,
            name: "Left",
            code: "L"
        },
        RIGHT: {
            value: 1,
            name: "Right",
            code: "R"
        },
        FORWARD: {
            value: 2,
            name: "Forward",
            code: "F"
        },
        BACK: {
            value: 3,
            name: "Back",
            code: "B"
        },
        UP: {
            value: 4,
            name: "Up",
            code: "U"
        },
        DOWN: {
            value: 5,
            name: "Down",
            code: "D"
        }
    };
    var camera_scale = .5;
    var camera_direction = new THREE.Vector3(0, 0, 1);
    var camera_up = new THREE.Vector3(0, 1, 0);
    var camera_position_delta = new THREE.Vector3(0, 0, 0);
    var camera_position = new THREE.Vector3(0, 0, 0);
    var camera_look_at = new THREE.Vector3(0, 0, 2);

    var max_pitch_rate = 5.0;
    var max_heading_rate = 5.0;
    var camera_pitch = 0.0;
    var camera_heading = 0.0;

    var mouse_delta_x = 0.0;
    var mouse_delta_y = 0.0;
    var mouse_pos_x = 0.0;
    var mouse_pos_y = 0.0;
    var move_camera = false;

    var isOnObject = false;
    var canJump = false;

    var onMouseMove = function (event) {
        if (scope.enabled === false) return;

        // movementX/Y already hold the change since the previous mousemove event
        var movementX = event.movementX || event.mozMovementX || event.webkitMovementX || 0;
        var movementY = event.movementY || event.mozMovementY || event.webkitMovementY || 0;

        // Same sign convention as (previous position - current position)
        mouse_delta_x = -movementX;
        mouse_delta_y = -movementY;
        if (move_camera) {
            ChangeHeading(.08 * mouse_delta_x);
            ChangePitch(.08 * mouse_delta_y);
        }
    };
    var onMouseDown = function (event) {
        // The camera only rotates while a mouse button is held down
        move_camera = true;
    };
    var onMouseUp = function (event) {
        move_camera = false;
    };
    var ChangePitch = function (degrees) {
        if (degrees < -max_pitch_rate) {
            degrees = -max_pitch_rate;
        } else if (degrees > max_pitch_rate) {
            degrees = max_pitch_rate;
        }
        camera_pitch += degrees;

        //Check bounds for the camera pitch
        if (camera_pitch > 360.0) {
            camera_pitch -= 360.0;
        } else if (camera_pitch < -360.0) {
            camera_pitch += 360.0;
        }
    }
    var ChangeHeading = function (degrees) {
        //Check bounds with the max heading rate so that we aren't moving too fast
        if (degrees < -max_heading_rate) {
            degrees = -max_heading_rate;
        } else if (degrees > max_heading_rate) {
            degrees = max_heading_rate;
        }
        //If the camera is upside down (pitch beyond +/-90 degrees), reverse the
        //heading direction so that mouse movement still turns the view the expected way
        if ((camera_pitch > 90 && camera_pitch < 270) || (camera_pitch < -90 && camera_pitch > -270)) {
            camera_heading -= degrees;
        } else {
            camera_heading += degrees;
        }
        //Check bounds for the camera heading
        if (camera_heading > 360.0) {
            camera_heading -= 360.0;
        } else if (camera_heading < -360.0) {
            camera_heading += 360.0;
        }

    }

    var moveCamera = function (move) {
        if (scope.enabled === false) return;
        var t = new THREE.Vector3(0, 0, 0);
        switch (move) {
        case MOVE.UP:
            t.copy(camera_up);
            t.multiplyScalar(camera_scale);
            camera_position_delta.add(t);
            break;
        case MOVE.DOWN:
            t.copy(camera_up);
            t.multiplyScalar(camera_scale);
            camera_position_delta.sub(t);
            break;
        case MOVE.LEFT:
            t.crossVectors(camera_direction, camera_up);
            t.multiplyScalar(camera_scale);
            camera_position_delta.sub(t);
            break;
        case MOVE.RIGHT:
            t.crossVectors(camera_direction, camera_up);
            t.multiplyScalar(camera_scale);
            camera_position_delta.add(t);
            break;
        case MOVE.FORWARD:
            t.copy(camera_direction);
            t.multiplyScalar(camera_scale);
            camera_position_delta.add(t);
            break;
        case MOVE.BACK:
            t.copy(camera_direction);
            t.multiplyScalar(camera_scale);
            camera_position_delta.sub(t);
            break;
        }
    }
    var onKeyDown = function (event) {

        switch (event.keyCode) {
        case 81: //q
        case 33: //PgUp
            moveCamera(MOVE.DOWN);
            break;
        case 69: // e
        case 34: // PgDown
            moveCamera(MOVE.UP);
            break;
        case 38: // up
        case 87: // w
            moveCamera(MOVE.FORWARD);
            break;
        case 37: // left
        case 65: // a
            moveCamera(MOVE.LEFT);
            break;
        case 40: // down
        case 83: // s
            moveCamera(MOVE.BACK);
            break;
        case 39: // right
        case 68: // d
            moveCamera(MOVE.RIGHT);
            break;
        }

    };

    document.addEventListener('mousemove', onMouseMove, false);
    document.addEventListener('mousedown', onMouseDown, false);
    document.addEventListener('mouseup', onMouseUp, false);
    document.addEventListener('keydown', onKeyDown, false);

    this.enabled = false;

    this.getObject = function () {
        return camera;
    };

    this.isOnObject = function (boolean) {
        isOnObject = boolean;
        canJump = boolean;

    };

    this.getDirection = function () {
        // returns the camera's current (normalized) view direction
        return camera_direction;
    };

    this.getLookAt = function () {
        return camera_look_at;
    };

    this.update = function (delta) {
        var axis = new THREE.Vector3(0, 0, 0);
        //console.log(camera_direction);
        camera_direction.subVectors(camera_look_at, camera_position);
        camera_direction.normalize();
        // Right-hand axis for the pitch rotation; setFromAxisAngle expects a unit axis
        axis.crossVectors(camera_direction, camera_up).normalize();
        var pitch_quat = new THREE.Quaternion(0, 0, 0, 1);
        var heading_quat = new THREE.Quaternion(0, 0, 0, 1);
        var temp = new THREE.Quaternion(0, 0, 0, 1);
        pitch_quat.setFromAxisAngle(axis, camera_pitch * Math.PI / 180.0);
        heading_quat.setFromAxisAngle(camera_up, camera_heading * Math.PI / 180.0);
        temp.multiplyQuaternions(pitch_quat, heading_quat);
        camera_direction.applyQuaternion(temp);
        camera_position.add(camera_position_delta);
        camera_look_at.addVectors(camera_position, camera_direction);

        camera.position.copy(camera_position);
        camera.up.copy(camera_up);
        camera.lookAt(camera_look_at);
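        // Ease to a stop: decay the rotation rates once the mouse button is released,
        // and damp the accumulated movement every frame
        if (!move_camera) {
            camera_pitch = camera_pitch * .5;
            camera_heading = camera_heading * .5;
        }
        camera_position_delta.multiplyScalar(.8);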
    };
};
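Outside of three.js, the core of `update()` can be sketched in C++ with small hypothetical `Vec3`/`Quat` types. The sketch normalizes the pitch axis (the cross product of two unit vectors is only unit length when they are perpendicular) and composes the rotations as pitch * heading, so the heading rotation is applied first and the pitch rotation second, like `multiplyQuaternions(pitch_quat, heading_quat)` above.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // (x, y, z) vector part, w scalar part

Vec3 cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalize(Vec3 v) {
  const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  assert(len > 0.0);  // fails if dir is parallel to up (degenerate pitch axis)
  return {v.x / len, v.y / len, v.z / len};
}

// Unit quaternion for a rotation of `degrees` about the unit vector `axis`.
Quat from_axis_angle(Vec3 axis, double degrees) {
  const double kPi = 3.14159265358979323846;
  const double half = degrees * kPi / 360.0;  // half angle in radians
  const double s = std::sin(half);
  return {axis.x * s, axis.y * s, axis.z * s, std::cos(half)};
}

// Hamilton product a*b: rotating by a*b applies b first, then a.
Quat mul(Quat a, Quat b) {
  return {a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
          a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
          a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
          a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z};
}

// Rotate v by unit quaternion q: v' = v + 2w(u x v) + 2 u x (u x v), u = (q.x, q.y, q.z).
Vec3 rotate(Quat q, Vec3 v) {
  const Vec3 u{q.x, q.y, q.z};
  Vec3 t = cross(u, v);
  t = {2.0 * t.x, 2.0 * t.y, 2.0 * t.z};  // t = 2 (u x v)
  const Vec3 ut = cross(u, t);            // 2 u x (u x v)
  return {v.x + q.w * t.x + ut.x, v.y + q.w * t.y + ut.y, v.z + q.w * t.z + ut.z};
}

// One camera step: heading about `up`, then pitch about the normalized right axis.
Vec3 update_direction(Vec3 dir, Vec3 up, double pitch_deg, double heading_deg) {
  const Vec3 axis = normalize(cross(dir, up));
  const Quat pitch = from_axis_angle(axis, pitch_deg);
  const Quat heading = from_axis_angle(up, heading_deg);
  return normalize(rotate(mul(pitch, heading), dir));
}
```

With dir = (0, 0, 1) and up = (0, 1, 0), a 90 degree heading turns the view to (1, 0, 0) and a 90 degree pitch turns it to (0, 1, 0).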