This seems like a serious undertaking - one you could easily write a thesis on, never mind build for a high school science fair project.
If you want to finish this in the next hundred years, I'd say you'll need to drop a couple of the features shown in the promotional video. First and foremost, the 3-dimensional tools floating around showing you what to do. You may be able to work something out in OpenCV with image overlays for directions, but 3D is a bit far-fetched. Remember, you don't have an unlimited supply of computational power here, and you'll need every last drop of it for the object detection and recognition you're doing. Also, I doubt Python will be fast enough for this. You'll probably have to bite the bullet and go C/C++ to leverage the full possibilities of OpenCV and other libraries. Sorry bud. :D
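To give an idea of what a flat image overlay for directions boils down to, here is a rough sketch in plain C++ with no OpenCV dependency. GreyImage and blendOverlay are names I made up for illustration, and the image is greyscale to keep it short; in OpenCV you would more likely reach for something like cv::addWeighted on a region of the frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit greyscale image, standing in for a real frame type.
struct GreyImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // row-major, width * height entries

    GreyImage(int w, int h, std::uint8_t fill)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), fill) {}

    std::uint8_t& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                      static_cast<std::size_t>(x)];
    }
    std::uint8_t at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                      static_cast<std::size_t>(x)];
    }
};

// Blend `overlay` (say, an arrow bitmap) onto `frame`, top-left corner at (x0, y0).
// alpha = 0 leaves the frame untouched, alpha = 1 replaces it outright.
// Overlay pixels that fall outside the frame are skipped.
void blendOverlay(GreyImage& frame, const GreyImage& overlay, int x0, int y0, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (int y = 0; y < overlay.height; ++y) {
        for (int x = 0; x < overlay.width; ++x) {
            const int fx = x0 + x;
            const int fy = y0 + y;
            if (fx < 0 || fy < 0 || fx >= frame.width || fy >= frame.height) continue;
            const float mixed = alpha * overlay.at(x, y) + (1.0f - alpha) * frame.at(fx, fy);
            frame.at(fx, fy) = static_cast<std::uint8_t>(mixed + 0.5f);  // round to nearest
        }
    }
}
```

That is one multiply-add per overlay pixel, which is why a flat overlay is affordable where rendered 3D tools are not.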
The barcode reading could be a challenge. I think you could use ZXing or something like it to read the barcode and identify which puzzle it is. Keep in mind that you'll only be able to support a few puzzles, unless you want to go writing 'technician' AIs for every single electrical device with a barcode on it, which I doubt you want to do. Not only will you only have a couple of puzzles, but also only a couple of solutions each - just like in the pictures you show, a single blatantly obvious loose cable or something - because you'll have to manually teach the machine what a puzzle looks like when it's solved and what it looks like when it isn't.
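Once the barcode has been decoded to a string, the "few puzzles" limit is really just a small hand-written table. A minimal sketch, assuming the decoder hands you a std::string; Puzzle, PuzzleCatalogue, and the barcode values are all hypothetical and not part of ZXing or any other library:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record for one hand-taught puzzle.
struct Puzzle {
    std::string name;
    std::string solvedLooksLike;  // placeholder for whatever "solved" reference you store
};

// Maps a decoded barcode string to the one puzzle you taught the system for it.
class PuzzleCatalogue {
public:
    void add(const std::string& barcode, const Puzzle& puzzle) {
        assert(!barcode.empty());
        puzzles_[barcode] = puzzle;
    }

    // Returns nullptr for any barcode you never taught the system about.
    const Puzzle* find(const std::string& barcode) const {
        const auto it = puzzles_.find(barcode);
        return it == puzzles_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Puzzle> puzzles_;
};
```

Anything that returns nullptr is a device you haven't written a 'technician' for, and the sensible response is to say so rather than guess.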
Detecting and recognising objects like the screwdriver is actually not terribly difficult. You could adapt the method shown in this video. Like you said, you'd probably have to paint the tools a distinct colour so that other objects in the surrounding area aren't mistaken for tools or other objects of interest. Detecting the screws/bolts/nuts/other small things could also be a challenge, because they're small, uniform, and not distinct enough from their surroundings; unless of course you painted them some wacky colour that made them really stand out. Colour matching is generally the way to go, because detecting objects based on their size and appearance (other than colour) is pretty much a nightmare. You'd have to write your own algorithms to detect the patterns in the objects, and it's completely unnecessary. Not to mention how much more computationally intense detection and recognition algorithms that aren't based on colour are. Colours are the way to go.
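To show how little work colour matching actually is, here is a rough sketch in plain C++ (no OpenCV, so it compiles on its own). The names Rgb, Detection, and findPaintedTool are mine, and the per-channel RGB tolerance is a simplification; with OpenCV you would more likely convert to HSV and use cv::inRange, which tends to cope better with lighting changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Result of a scan: centroid of the matching pixels, if enough were found.
struct Detection {
    bool found;
    double x;
    double y;
    int pixelCount;
};

// True when every channel of `p` lies within `tolerance` of `target`.
bool matchesColour(Rgb p, Rgb target, int tolerance) {
    const auto close = [tolerance](int a, int b) {
        return (a > b ? a - b : b - a) <= tolerance;
    };
    return close(p.r, target.r) && close(p.g, target.g) && close(p.b, target.b);
}

// Scan a row-major RGB frame for pixels near the paint colour and return their
// centroid. Fewer than `minPixels` matches is treated as noise, not a tool.
Detection findPaintedTool(const std::vector<Rgb>& frame, int width, int height,
                          Rgb target, int tolerance, int minPixels) {
    assert(width > 0 && height > 0);
    assert(frame.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    long long sumX = 0;
    long long sumY = 0;
    int count = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const Rgb p = frame[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                                static_cast<std::size_t>(x)];
            if (matchesColour(p, target, tolerance)) {
                sumX += x;
                sumY += y;
                ++count;
            }
        }
    }
    if (count == 0 || count < minPixels) return Detection{false, 0.0, 0.0, count};
    return Detection{true, static_cast<double>(sumX) / count,
                     static_cast<double>(sumY) / count, count};
}
```

It is a single pass over the frame with three comparisons per pixel, and that is the whole reason painting the tools a wacky colour pays off.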
I'm not entirely sure how this whole system works with the funky glasses - are they a small embedded system that you develop on top of? Or is this whole system powered by a Raspberry Pi? Actually, with this level of detail a laptop or desktop (with a serious GPU) would be necessary.
3D stereoscopic? Don't mean to sound cynical, but in your dreams. I'm not saying you can't program it. I'm saying that if by some act of the computational resource Gods you're able to get this video streaming in real time, I assure you that rendering it in stereo 3D will probably have the same effect as dropping the computer that's running it out of a plane at 30,000 feet, or throwing it in front of a freight train - heck, you might as well toss it in the bath tub. Put simply: the 3D stereo might be a bit too much for the physical system to handle. I'll talk a little bit about performance after the cameras.
In terms of cameras, for the love of God, don't get 2 cameras. What you said about resolution and object detection & recognition is true: the higher the resolution, the harder the program hammers the system it's running on. If this is running on a pair of glasses, a Raspberry Pi, or even a laptop, it won't be able to handle anything more than 320 x 240. If you had some serious hardware acceleration on the GPU, maybe you could get away with 640 x 480 or even 800 x 600, and the latter only if you can perform some kind of computational resource miracle, in which case I'll hire you as a sysadmin.
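To put numbers on the resolution point, this little sketch just counts pixels. It assumes the detection cost grows roughly in proportion to the number of pixels per frame, which is a simplification, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Pixels the detector has to look at in one frame.
std::int64_t pixelsPerFrame(int width, int height) {
    assert(width > 0 && height > 0);
    return static_cast<std::int64_t>(width) * height;
}

// How many times more pixels a resolution has than the 320 x 240 baseline.
double costVersusBaseline(int width, int height) {
    return static_cast<double>(pixelsPerFrame(width, height)) /
           static_cast<double>(pixelsPerFrame(320, 240));
}

// Pixels per second at a given frame rate (the frame rate is your choice).
std::int64_t pixelsPerSecond(int width, int height, int fps) {
    assert(fps > 0);
    return pixelsPerFrame(width, height) * fps;
}
```

For example, costVersusBaseline(640, 480) is 4.0 and costVersusBaseline(800, 600) is 6.25, so going up to 800 x 600 means pushing more than six times the data through every stage of the detection.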
You'll be fine with a single webcam. In fact, you'll probably have to resize the image coming from the webcam as well, just to deal with all this detection.
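Shrinking the webcam image can be as simple as averaging blocks of pixels. A minimal sketch in plain C++, greyscale only, with an integer shrink factor; GreyFrame and downscale are hypothetical names, and with OpenCV you would just call cv::resize instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit greyscale frame, row-major.
struct GreyFrame {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;
};

// Shrink by an integer factor, averaging each factor x factor block.
// Trailing rows/columns that don't fill a whole block are dropped.
GreyFrame downscale(const GreyFrame& src, int factor) {
    assert(factor >= 1);
    assert(src.width >= factor && src.height >= factor);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * static_cast<std::size_t>(src.height));

    GreyFrame dst;
    dst.width = src.width / factor;
    dst.height = src.height / factor;
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * static_cast<std::size_t>(dst.height));

    const unsigned blockArea = static_cast<unsigned>(factor) * static_cast<unsigned>(factor);
    for (int dy = 0; dy < dst.height; ++dy) {
        for (int dx = 0; dx < dst.width; ++dx) {
            unsigned sum = 0;
            for (int ky = 0; ky < factor; ++ky) {
                for (int kx = 0; kx < factor; ++kx) {
                    const std::size_t sx = static_cast<std::size_t>(dx * factor + kx);
                    const std::size_t sy = static_cast<std::size_t>(dy * factor + ky);
                    sum += src.pixels[sy * static_cast<std::size_t>(src.width) + sx];
                }
            }
            dst.pixels[static_cast<std::size_t>(dy) * static_cast<std::size_t>(dst.width) +
                       static_cast<std::size_t>(dx)] =
                static_cast<std::uint8_t>(sum / blockArea);
        }
    }
    return dst;
}
```

Calling downscale(frame, 2) on a 640 x 480 capture gives you a 320 x 240 image, and the averaging knocks down some sensor noise as a side effect.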
If you have any questions, comments, concerns, or anything else, feel free to send me an email.
[Note: this is a duplicate of my response to your post on the link in your question]