You are almost there. There are just a few things you need to tweak to get it working.
- I recommend using more input points. The five-point algorithm used to compute the essential matrix involves finding the roots of a tenth-degree polynomial, so with only five input points you would get multiple possible solutions. The bare minimum to get a single solution is six points, as you have done. However, I had to double the number of points you were using to get any decent results. A little depth variation would also help (though it is not mandatory), since all your points are coplanar.
- You need to pick a good threshold for RANSAC (the default method) to actually work. The OpenCV documentation shows that the default RANSAC threshold is 1.0, which in my opinion is a bit large even when using pixel coordinates. With pixel coordinates I would recommend something around 0.1 pixels. With normalized image coordinates, as you are using, you should pick something even smaller, like 1e-4. In normalized coordinates, your threshold of 1.0 corresponds to an angle of 45 degrees from the optical axis (arctan(1.0) = 45 degrees). Such a large threshold permits many possible essential matrices for which every point is an inlier, with no way to distinguish between them. If you aren't sure how to pick a good threshold, try LMEDS instead. It doesn't require a threshold and has comparable computation time.
- One of your points is invalid because it is at the origin of one of the camera views, and in the other camera view it is behind the camera. I assume you were trying to make the camera move backwards in the z direction with the -1. However, the direction of the translation vector is counterintuitive: it is the direction the points translate in the camera frame, not the direction the camera moves. This post explains this concept in more detail.
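
To make the threshold and translation points concrete, here is a minimal sketch in plain Python (standard library only). The helper names, the 800-pixel focal length, and the sample point are all hypothetical and chosen only for illustration. It also assumes the usual convention that a point maps from the first camera frame to the second as X2 = R·X1 + t, with R taken as the identity to keep things simple.

```python
import math


def threshold_angle_deg(threshold_normalized):
    """Angle (in degrees) from the optical axis that corresponds to a
    distance measured in normalized image coordinates (focal length = 1)."""
    return math.degrees(math.atan(threshold_normalized))


def pixel_to_normalized_threshold(threshold_px, focal_length_px):
    """Convert a pixel threshold into normalized image coordinates by
    dividing by the focal length in pixels (hypothetical helper)."""
    return threshold_px / focal_length_px


def transform_point(point, translation):
    """Map a 3D point from camera 1's frame to camera 2's frame,
    assuming X2 = R * X1 + t with R = identity."""
    return tuple(p + t for p, t in zip(point, translation))


def second_camera_center(translation):
    """Position of camera 2 expressed in camera 1's frame.
    With R = identity this is C = -t, i.e. opposite to t."""
    return tuple(-t for t in translation)


# A threshold of 1.0 in normalized coordinates is a 45-degree cone.
print(threshold_angle_deg(1.0))

# 0.1 pixels with an assumed 800-pixel focal length is about 1.25e-4.
print(pixel_to_normalized_threshold(0.1, 800.0))

# With t = (0, 0, -1) the points move toward the camera by 1 unit ...
t = (0.0, 0.0, -1.0)
print(transform_point((0.0, 0.0, 2.0), t))  # depth drops from 2 to 1
print(transform_point((0.0, 0.0, 1.0), t))  # lands on the camera origin
print(transform_point((0.0, 0.0, 0.5), t))  # negative depth: behind camera

# ... which means the camera itself moved forward, not backward.
print(second_camera_center(t))
```

The normalized threshold computed this way is the kind of value you would pass as the `threshold` argument of `cv2.findEssentialMat` when your input points are already in normalized image coordinates.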