Over the past few months, Avram has taken a deep dive into the realm of AI, especially generative AI, and its various versions and uses. Last week, we looked at Auto-GPT, and this week we take a look at Microsoft JARVIS.
Microsoft JARVIS is an AI platform designed to provide natural language processing capabilities to enable multi-stage task management. The idea is that you can take a piece of information and ask the system to perform tasks based on the results of processing the initial information.
An example shown off on the project site involves starting with an image and ending with a different image, with natural language processing and interpretation in the middle. The team supplied the system with an image of a young boy riding a bike. They then asked the system to create a new image of a girl reading a book while in the position of the young boy from the source image.
The system then went through a series of steps in order to produce the new image. First, it had to process the initial image and locate the boy. Once the boy was identified and located, the system then had to figure out the position in which he was sitting. It made a skeletal structure diagram of the boy, similar to what the Kinect produces. Using that skeletal structure, the system created a new image of a girl sitting on a bed reading a book. The overall skeletal structure of the girl was similar enough to the boy's to call the task complete.
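To make that chain of steps a little more concrete, here is a minimal Python sketch of how a multi-stage task might be wired together, with each stage feeding its result to the next. It is not Microsoft's code: the `Task` class, `run_pipeline`, and the three stage functions are hypothetical stand-ins that return canned values instead of calling real models, and the file name is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """One stage in the chain (hypothetical structure, for illustration only)."""
    name: str
    run: Callable[[Dict[str, object]], object]
    needs: List[str] = field(default_factory=list)


def run_pipeline(tasks, inputs):
    """Run the tasks in order, storing each result under the task's name."""
    results = dict(inputs)
    for task in tasks:
        missing = [key for key in task.needs if key not in results]
        if missing:
            raise KeyError(f"{task.name} is missing inputs: {missing}")
        results[task.name] = task.run(results)
    return results


# Stand-in stages: a real system would call a vision or image model here.
def detect_person(ctx):
    return {"label": "boy", "box": (40, 20, 200, 300)}


def estimate_pose(ctx):
    # Depends on the detection result from the previous stage.
    return {
        "joints": ["head", "shoulders", "hips", "knees"],
        "source_box": ctx["detect_person"]["box"],
    }


def generate_image(ctx):
    # Depends on both the text prompt and the estimated pose.
    joint_count = len(ctx["estimate_pose"]["joints"])
    return f"image of '{ctx['prompt']}' posed with {joint_count} joints"


tasks = [
    Task("detect_person", detect_person, needs=["image"]),
    Task("estimate_pose", estimate_pose, needs=["detect_person"]),
    Task("generate_image", generate_image, needs=["prompt", "estimate_pose"]),
]

results = run_pipeline(
    tasks,
    {"image": "boy_on_bike.jpg", "prompt": "a girl reading a book"},
)
print(results["generate_image"])
```

The point of the sketch is the dependency chain: the final image step cannot run until the pose step has finished, and the pose step cannot run until the boy has been located. If an earlier stage gets something wrong, every later stage inherits the mistake.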
While Microsoft's team was able to show a fantastic demo of the system, Avram's experience has been a bit different. In preparation for this week's episode, he spent several hours trying to get the software working. This was a big disappointment for him, as it had been working perfectly until this afternoon. The website, which is run through Hugging Face, stopped responding in a useful amount of time. While it usually takes about a second for a command to complete, tonight it was taking about 30 minutes.
As a result, he tried to install it locally instead, which is supposed to be a routine process. After a challenging installation, he was able to get the system to function. Well, function as well as it was going to for the evening.
Avram decided to do a similar experiment to the one performed by Microsoft. Rather than generating a new image, he wanted to provide an image and find out where best to purchase the product shown in it. So, he provided an image of a Samsung SSD and was told to purchase it online or in a retail store. It offered no recommendations or specific stores where the drive could be purchased.
You might wonder why that happened. Well, the system believed, rather than an image of an SSD, it was a screenshot of a conversation between a user and an AI chatbot. As such, it had absolutely no idea what he was trying to accomplish and simply panicked. Definitely not a great way for a system designed to complete tasks to respond when facing simple adversity.
Scott is a developer who has worked on projects of varying sizes, including all of the PLUGHITZ Corporation properties. He is also known in the gaming world for his time supporting the rhythm game community, through DDRLover and hosting tournaments throughout the Tampa Bay Area. Currently, when he is not working on software projects or hosting F5 Live: Refreshing Technology, Scott can often be found returning to his high school days working with the Foundation for Inspiration and Recognition of Science and Technology (FIRST), mentoring teams and helping with ROBOTICON Tampa Bay. He has also helped found a student software learning group, the ASCII Warriors, currently housed at AMRoC Fab Lab.
Avram's been in love with PCs since he played the original Castle Wolfenstein on an Apple II+. Before joining Tom's Hardware, he served for 10 years as Online Editorial Director for sister sites Tom's Guide and Laptop Mag, where he programmed the CMS and many of the benchmarks. When he's not editing, writing or stumbling around trade show halls, you'll find him building Arduino robots with his son and watching every single superhero show on the CW.