Exploring Microsoft Copilot Vision: Potential and Limitations

In 2024, Microsoft introduced Copilot Vision, a feature initially available only to Pro subscribers in the United States. It lets users ask questions about the webpage they are viewing, adding a new dimension to browsing. Microsoft has since expanded it to free users, though it remains accessible only in the United States. Despite the excitement surrounding Copilot Vision, early hands-on experience shows both promising potential and significant drawbacks.

What is Microsoft Copilot Vision?

Copilot Vision is an AI-powered tool integrated within Microsoft Edge, designed to assist users by reading and interpreting web pages in real time. With a simple voice command, users can ask questions about the content on a webpage, and Copilot Vision will attempt to provide answers based on the information displayed. This feature is meant to serve as an advanced assistant, offering users a more interactive and intuitive browsing experience.

The Launch of Copilot Vision for Free Users

After the initial launch for Pro users, Microsoft extended Copilot Vision to free users, but only in the United States. Users can access it through Microsoft Edge by searching for “Copilot Vision” in Bing, selecting the appropriate result, and accepting the terms and conditions to activate the feature. Once enabled, Copilot Vision is available from the sidebar in Edge, where users can ask questions about the page or have the assistant describe its content.

Initial Hands-On Experience

Early hands-on tests revealed a mixture of successes and limitations with Copilot Vision. Users reported that while the assistant could describe some aspects of a webpage, its responses were often incomplete or disconnected. For instance, on some occasions, Copilot Vision would stop speaking abruptly in the middle of a response, or its interpretation would be inaccurate.

An attempt to have Copilot describe the content of a webpage led to multiple failed responses. It could not identify all of the page elements correctly and missed certain buttons and features. In one notable failure, Copilot identified only one prominent button on a Microsoft webpage and missed a second button that played a video.

Another critical limitation became evident when users asked Copilot to interact with page elements, such as playing embedded videos or clicking buttons. Copilot declined these requests, stating that it couldn’t access anything on the page. This indicates that Copilot Vision, at this stage, can observe a page but cannot control its elements or functionality.

Testing Copilot Vision with Other Websites

Further tests were conducted on other websites, such as WindowsLatest and Amazon UK. Copilot Vision could describe articles and navigate through content, but its scope remained limited. When asked about specific product details, such as the performance of an SSD, Copilot Vision could only discuss what was visible on the screen, without the ability to search for additional information or access deeper product specifications.

When navigating Amazon UK, Copilot Vision successfully identified SSDs but struggled to compare products effectively. It could not compare the write speed of two products unless it was explicitly mentioned on the visible page. Additionally, Copilot Vision showed a lack of memory, as it would forget details once the page was scrolled, leading to incomplete responses.

Limitations of Copilot Vision

Despite its promise, Copilot Vision falls short in several areas:
1. Inconsistent Responses: The assistant often stops speaking in the middle of a response, leaving answers incomplete or inaccurate.
2. Limited Interaction with Web Elements: Copilot cannot interact with page elements, such as buttons or videos, and is unable to execute basic actions like scrolling or playing videos.
3. Memory Issues: Copilot Vision struggles with remembering information across different sections of a webpage, often forgetting previous details once users scroll down.
4. Inability to Search Beyond Visible Content: Copilot can only read the portion of a page currently on screen, so it cannot look up additional information or details that are not displayed.

What Undercode Says:

Copilot Vision shows significant promise, but it is clear that there is a lot of work left to do before it becomes a truly useful tool for users. While the ability to interact with web content is an exciting concept, its current limitations make it more of a novelty than a practical tool. The assistant’s inability to interact with basic page elements like buttons or videos is a major barrier, as it severely limits the user’s experience.

Moreover, the inconsistent responses and memory issues make it challenging to rely on Copilot Vision for detailed, continuous interactions with a webpage. Users may find it frustrating when Copilot fails to remember earlier interactions or provides incomplete answers. To truly unlock the potential of Copilot Vision, Microsoft will need to focus on improving its memory, response accuracy, and ability to interact with dynamic web content.

From a user experience perspective, Copilot Vision has a long way to go before it can match the expectations set by other AI-powered tools. Microsoft’s commitment to pushing the boundaries of AI in the browser is clear, but for Copilot Vision to be truly effective, it needs to evolve beyond its current state. Whether through better integration with page elements, improved interaction capabilities, or a more reliable response system, Copilot Vision’s success will depend on overcoming these significant obstacles.

Fact Checker Results:

Response Inconsistency: Copilot Vision has trouble offering consistent answers, often stopping mid-conversation or delivering incomplete responses.
Limited Page Interaction: The tool cannot interact with or modify elements like buttons, videos, or other features on a webpage.
Restricted Visibility: Copilot Vision can only view visible portions of a page, limiting its ability to fully comprehend and interact with all content.

References:

Reported By: https://www.windowslatest.com/2025/03/28/microsoft-just-added-copilot-vision-to-edge-for-free-on-windows-11-hands-on/