At the conclusion of a nearly two-hour keynote at the Google I/O conference last week, CEO Sundar Pichai amused the audience by using the company's AI to scan the transcript and count the number of times "AI" was mentioned. The result: nearly 125 mentions.
Hence, the annual developers' conference could have been aptly named Google A/I.
Pichai and his team outlined how Google's generative AI engine, Gemini, is being integrated into popular products and services like Gmail, Google apps, the Android operating system, and Google's leading Search service. Gemini also serves as the foundation for new video and audio generation tools and for "AI Agents."
"Google is fully in our Gemini era," Pichai announced at the start of the May 14 event. "We view this as the key to advancing our mission: to organize the world's information from every input, make it accessible via any output, and merge global information with your personal data in a way that's genuinely useful for you."
In short, Google, one of the world's most powerful and influential companies, aims to ensure that generative AI becomes part of the daily lives of billions of its users. The push comes as Google is widely perceived to be trailing OpenAI, the creator of ChatGPT, in the generative AI race.
But is Google's approach to AI beneficial for humanity? Now is the time to question Google and other AI developers, including OpenAI, Microsoft, Meta, Anthropic, and soon Apple, all of which market themselves as innovators finding new ways to empower people.
Regarding Google, users might appreciate features like "Ask Photos," as demonstrated by Pichai. The tool can sift through a Google Photos library to find images that document a child's swimming progress over time, or pull up a car's license plate without needing a keyword search. "It knows the cars that appear frequently, triangulates which one is yours, and provides the license plate number," Pichai explained.
People might enjoy having Gemini scan their Gmail inbox, extract mentions of upcoming events, and summarize related PDFs. If an event was held on Google Meet, Gemini can recap the highlights. And if an email asks for volunteer sign-ups, Gemini can check your calendar, tell you whether you're available, and draft a polite RSVP, as Pichai noted.
Many will likely welcome AI Agents that assist with "organizing, reasoning, and synthesizing." They can help return shoes bought online, for instance, by finding the receipt in your inbox, retrieving the order number, filling out a return form, and scheduling a UPS pickup. They can also track down local services, such as restaurants and dog walkers, when you move to a new city, and update your address across the many sites that hold your personal information.
Others might be intrigued by Project Astra, a Gemini-based "multimodal" agent, which represents a step toward Google DeepMind's goal of developing artificial general intelligence (AGI)—an AI that behaves more like a human. Astra can handle text, images, audio, and video (hence "multimodal") and can "see" the world around you in real time using your smartphone camera. In one demo, a person asked Astra to remind them where they left their glasses.
"An agent like this needs to understand and respond to our complex and dynamic world just like we do," said Demis Hassabis, co-founder and CEO of Google DeepMind. "It has to take in and remember what it sees to understand context and take action. Moreover, it must be proactive, teachable, and personal, allowing you to interact with it naturally, without lag or delay."
Businesses are also turning to Google and other AI developers for tools to enhance productivity and profitability. Actor and director Donald Glover and his creative studio, Gilga, explored how AI could aid in visual storytelling and tested a text-to-video tool introduced by Google called Veo. The Gilga team used Veo to create a short film and reported that it allowed them to "visualize things on a time scale that's 10 or a hundred times faster."
This capability enables filmmakers to quickly iterate on new ideas. "That's what's really cool about it," Glover said in Gilga's 90-second testimonial on YouTube. "You can make mistakes faster. At the end of the day, especially in art, you just want to make mistakes faster."
However, not everyone may be as enthusiastic about AI, especially considering how much of your personal and work life (including your physical spaces) you'd need to expose to Google—even if the company claims, "We take your privacy seriously."
Others might worry about relying on Google for "organizing, reasoning, and synthesizing," questioning how accurately the company's AI can prioritize, summarize, highlight, and suggest information. How can you be sure it truly captures the nuances of what's happening around you?
OpenAI's latest model, GPT-4o, makes ChatGPT quicker and chattier, and it's offered for free.
There was speculation that OpenAI might try to outdo Google by unveiling an AI search engine a day before Google I/O. Instead, the company announced an upgrade to the model behind its chatbot.
OpenAI is also rolling out the powerful new model for free and has introduced a desktop version as part of its push to get more consumers to try it. Earlier this year, as Lacy reported, OpenAI eliminated the need for account sign-ups.
Dubbed GPT-4o (with the "o" standing for omni), the model can "respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is comparable to human response time in a conversation," according to OpenAI. The company also said it can comprehend and converse in 50 languages; one demonstration showcased a real-time exchange in Italian and English.
Additionally, it boasts enhanced "vision" capabilities, enabling it to recognize and appropriately respond to various inputs such as selfies, handwritten math equations, screenshots, documents, and photos.
During the product launch, Mark Chen, head of frontiers research at OpenAI, highlighted a new real-time conversational speech feature that lets users interrupt the chatbot rather than wait for it to finish responding. GPT-4o can also detect and project emotions, making for a chattier, more engaging conversation. In demos featuring both male and female voices, the chatbot sounded upbeat and cheery.
For those skeptical of these claims, OpenAI encourages checking out the demos to experience the model firsthand. In one demonstration, an OpenAI employee asks GPT-4o to assess a joke he's working on, prompting the chatbot to engage in a lighthearted exchange and ultimately affirm the joke's quality.
The new model is part of OpenAI's broader push to turn ChatGPT into an AI voice assistant, with interactions initiated by the wake-up phrase "Hey, ChatGPT." The move comes as Google advances its own Gemini-powered AI Agents and as Apple is expected to announce AI enhancements to Siri at its developers' conference in June.