As far as we're concerned, they're all just different terms for the same thing. Historically these phrases have been preferred by different communities, which in recent years have come together thanks to amazing advances in AI technology. The term Computer Vision has traditionally been used in academic research, as a sub-discipline within the machine learning (ML) community, while Machine Vision is somewhat more common in industrial and manufacturing settings. AI Vision is a more modern and broadly encompassing term which brings in current AI techniques like GPT-style transformers and natural language queries. It's totally reasonable to try to define subtle differences between the terms, but here at Groundlight we don't bother to distinguish. We just want to make it easy for you to solve your real-world vision problems!
Yes! Visit https://dashboard.groundlight.ai/ and create an account. Once you’ve created your account, navigate to the ‘Explore’ page. From there you can create a detector, capture images and examine results all from your browser.
A detector is what we call a computer vision model that you create using Groundlight AI. It answers a specific question, and its underlying model is tuned to the specific images or scenes you provide, whether from a camera feed or a set of still images.
An image query is an individual image, paired with your detector's question, that you send to the detector for an answer. The detector first attempts to answer with a specialized ML model, and if it is unsure, escalates to a person, all behind the scenes. Either way, you get back an answer, and this happens for each image sent to the detector.
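Here's a minimal sketch of that flow using the Groundlight Python SDK; the detector name, question, and image path below are placeholders for illustration:

```python
from groundlight import Groundlight

gl = Groundlight()  # reads your API token from the GROUNDLIGHT_API_TOKEN environment variable

# Create (or fetch) a detector that answers one specific question
detector = gl.get_or_create_detector(
    name="garage-door",  # hypothetical detector name
    query="Is the garage door fully closed?",
)

# Each call below is one image query: one image in, one answer out
iq = gl.submit_image_query(detector=detector, image="snapshot.jpg")
print(f"Answer: {iq.result.label}  (confidence: {iq.result.confidence})")
```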
If you want high-throughput analysis for video, ask us about setting up an edge endpoint. In the cloud, rate limits depend on your account tier: 6 queries per minute for free accounts, 30 per minute for Pro, 60 per minute for Business, and whatever rate you need for Enterprise. To run faster still, you will want an edge endpoint, which uses a local copy of your detector's model and will typically respond to an image query in under 20 ms. An edge endpoint can scale horizontally to arbitrarily high throughput depending on the hardware available: a small server can easily handle hundreds of frames per second, and by adding multiple GPUs in an edge cluster Groundlight can scale to hundreds of video streams.
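If you're using the SDK, pointing it at an edge endpoint is just a matter of changing the endpoint URL. A minimal sketch, assuming you already have an edge endpoint running and reachable at the local URL shown (your host and port may differ):

```python
from groundlight import Groundlight

# Same SDK, but inference runs on your local hardware instead of the cloud
gl = Groundlight(endpoint="http://localhost:30101")  # assumed local edge endpoint URL

detector = gl.get_or_create_detector(
    name="conveyor-check",  # hypothetical detector
    query="Is there a part on the conveyor belt?",
)

# With a local copy of the model, queries like this typically return in milliseconds
iq = gl.submit_image_query(detector=detector, image="frame.jpg")
```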
For a Groundlight detector, Ground Truth is a select set of labeled images of the utmost quality - images where we're sure that the answer is correct. These are the images we use to measure how well the detector is performing, so the more ground truth you provide, the more we will know about detector performance. Generally these answers must be provided by you, the person defining the detector, because only you know what the detector is trying to do and exactly what your question means. Your inputs are the source of truth for accuracy and for how the detector should behave. In cases where you haven't provided many answers, a Groundlight staff expert can act as a proxy for your inputs, generally after a conversation with you.
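You can provide these labels in the web dashboard, or programmatically. A minimal sketch with the Python SDK, reusing the hypothetical detector and image from the earlier example:

```python
from groundlight import Groundlight

gl = Groundlight()
detector = gl.get_or_create_detector(
    name="garage-door",  # hypothetical detector
    query="Is the garage door fully closed?",
)

iq = gl.submit_image_query(detector=detector, image="snapshot.jpg")

# You know the correct answer for this image, so record it as a label
gl.add_label(iq, "YES")
```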
The confidence threshold is a knob you can tune on any detector to manage the trade-off between fast ML answers and reliable human answers. At one extreme, a value of 1.0 means every single image query should be checked by a person. At the other extreme, a value of 0.5 means nothing gets checked by a person and we will always take the ML prediction as is. In between you can tune the detector's behavior to match your application's needs and budget.
To dig in deeper, let's look at how a typical confidence threshold of 0.9 behaves on a detector. Every image query first gets an ML prediction - either YES, or NO, and a confidence value. If the confidence value is over 90%, we trust the ML answer. If the confidence value is under 90% then it is considered unsure, and gets escalated to human review.
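For example, here's a minimal sketch of setting a 0.9 threshold and inspecting the result with the Python SDK (the detector details are placeholders):

```python
from groundlight import Groundlight

gl = Groundlight()

# Queries below 90% ML confidence will be escalated behind the scenes
detector = gl.get_or_create_detector(
    name="door-check",  # hypothetical detector name
    query="Is the door closed?",
    confidence_threshold=0.9,
)

# wait gives the query up to 60 seconds to reach a confident answer
iq = gl.submit_image_query(detector=detector, image="door.jpg", wait=60)
print(f"Answer: {iq.result.label}, confidence: {iq.result.confidence}")
```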
When an image query runs through the ML model, it outputs a prediction (either YES or NO) and a confidence score, from 50% to 100%. The confidence score is a careful, calibrated estimate of the odds that the ML prediction is correct. So if the confidence score is 99% (or 0.99 - same thing), that's awesome: it means your ML model is doing such a fantastic job that there's a 99% chance its prediction is correct. This is what happens when you have a mature, well-trained detector.
But if your ML prediction comes with a confidence score of 75%, that means the answer will be correct about 75% of the time - there's a one-in-four chance it made a mistake. If the confidence score is something like 52%, the detector is admitting that it's basically just guessing. This is typical when it has never seen anything like the image before.
All of this is written specifically for binary YES/NO questions. For other detector modes, some details are different, but the idea is the same.
Yes! Check out our blog post here. GPT is not reliable enough for most real-world visual analyses. It can analyze images in the sense of producing text related to the content of an image, and sometimes this produces the correct answer to a specific question. But very often it only understands the general sense of the image, and will make incorrect statements when asked about specifics. Moreover, customizing GPT or any other LLM for your exact needs is generally a fairly involved effort. For trustworthy, repeatable, and actionable answers, you are better off training a specialized model. Groundlight lets you do just that: all the machine learning engineering and ops are hidden behind a simple service.
You can ask binary questions, where the response is either YES or NO, and counting questions, where the response is an integer. We are building additional modes: multiple choice, bounding boxes, and text recognition are all in the works, and they're available as previews for Enterprise accounts. Interested? Schedule a free consultation with us.
When you submit an image query to a detector, the ML model runs inference and makes a prediction. If the confidence on that prediction is below the detector's configured confidence threshold, the image query will be escalated to a person for a more definitive answer. That person’s response answers the question that was asked, and also goes back into retraining the model. In some scenarios, there is an intermediate step where a larger ML model attempts to answer the image query before it is further escalated to human labelers. Groundlight has a 24/7 labor force of online labelers available to answer any image queries that the models struggle to understand.
In some cases, our cloud labelers will also decide that an image query is ‘unclear.’ You can find those in the ‘Flagged’ section of the Detector Details page (go to the ‘Detectors’ tab and select the detector you wish to see details for). This can happen because of poor image quality or ambiguity in the question. For example, if you ask "Is the door closed?" but the door is slightly ajar, the cloud labelers will mark the query as unclear so that you can clarify your intent. You can help by labeling the unclear examples yourself, and by providing clear instructions in the query text and notes. For more detail, see our blog post on best practices.
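If your application would rather wait for a confident answer than manage escalation itself, the Python SDK offers a convenience call for that. A minimal sketch, reusing the hypothetical detector from earlier:

```python
from groundlight import Groundlight

gl = Groundlight()
detector = gl.get_or_create_detector(
    name="door-check",  # hypothetical detector
    query="Is the door closed?",
)

# Blocks until the answer meets the detector's confidence threshold,
# escalating behind the scenes (to bigger models or human labelers) as needed
iq = gl.ask_confident(detector=detector, image="door.jpg")
print(iq.result.label)
```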
Yes! You always own your data - see our terms of service here. Data can be downloaded using the SDK, and paid accounts can export their data from the web dashboard.
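For example, here's a minimal sketch of fetching your image queries with the Python SDK (the page size is arbitrary, and the exact fields available on each result may vary by SDK version):

```python
from groundlight import Groundlight

gl = Groundlight()

# Fetch the most recent page of image queries across your detectors
batch = gl.list_image_queries(page=1, page_size=50)
print(f"You have {batch.count} image queries in total")

for iq in batch.results:
    print(iq.id, iq.result.label if iq.result else "(no answer yet)")
```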
Paid accounts can download models to run on a Groundlight Hub or a custom edge endpoint. Connect with a Groundlight team member to learn more.
No. With other computer vision tools, there is an explicit cost for training and a cost for inference. With Groundlight, each time a label is provided by you or one of our cloud labelers, the models automatically retrain to stay as accurate as possible. Cloud labels provided by Groundlight staff are part of the paid service and are subject to limitations based on your account tier.
There are multiple ways to add an image to a detector. You can do it manually using the dashboard web interface. But for real automation, you probably want to write or use an application that submits images through the Groundlight Python SDK. If you're writing code to grab images from a camera, we recommend the Python library FrameGrab to simplify and standardize access to network cameras, USB web cameras, and high-quality industrial cameras.
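A minimal sketch of that pattern, assuming a USB webcam and the framegrab package (pip install framegrab); the detector and the 10-second polling interval are placeholders:

```python
import time

from framegrab import FrameGrabber
from groundlight import Groundlight

gl = Groundlight()
detector = gl.get_or_create_detector(
    name="desk-occupied",  # hypothetical detector
    query="Is someone sitting at the desk?",
)

# Connect to the first available USB camera
grabber = FrameGrabber.create_grabber({"input_type": "generic_usb"})

try:
    while True:
        frame = grabber.grab()  # current frame as a numpy array
        iq = gl.submit_image_query(detector=detector, image=frame)
        print(iq.result.label)
        time.sleep(10)  # one query every 10 seconds
finally:
    grabber.release()
```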
Another option is to use a Groundlight Hub. Groundlight Hub lets you connect to any local cameras: go to the ‘Hubs’ tab in your Groundlight AI account and select the hub with the camera(s) you wish to fetch images from. From there, you can select the camera, detector, stream, and alert you'd like to enable.
Image queries will get an ML answer generally in less than a second. If escalated to human review, cloud labels are generally available in under a minute, often in 30 seconds or less, depending on the system load and your account's priority.
Groundlight can work with almost any standard camera, and the best camera to use depends on your application. For Groundlight to be effective, images need enough resolution that a human could look at them and answer the query well. Groundlight supports cameras with either USB or RTSP interfaces. Detectors also tend to perform better on images that are nearly square, rather than extremely tall or wide.
If you are a developer, check out FrameGrab to easily send images to Groundlight from a camera feed. FrameGrab makes it easy to use any network camera that supports the RTSP protocol, which almost any modern network or Ethernet camera provides.
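For an RTSP camera, the FrameGrab configuration is a small dictionary; a minimal sketch, with a placeholder URL and credentials (the exact RTSP URL format varies by camera vendor):

```python
from framegrab import FrameGrabber

# Placeholder RTSP URL; substitute your camera's address and credentials
config = {
    "input_type": "rtsp",
    "id": {"rtsp_url": "rtsp://admin:password@192.168.1.10/stream"},
}

grabber = FrameGrabber.create_grabber(config)
frame = grabber.grab()  # a single frame from the camera, as a numpy array
grabber.release()
```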
Groundlight does not directly support 3-D or depth cameras, or advanced sensors such as LiDAR. But we have seen these work in some situations. Feel free to get in touch if you think that would be useful.
The primary means to control escalation is by setting the confidence threshold on a detector. A higher confidence threshold results in more escalation, and vice versa. It is also possible to set your detector to ALWAYS or NEVER escalate to human review. If your question requires special expertise to answer, we can also enable escalation to be routed to your own private labeler workforce.
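With the Python SDK, the ALWAYS/NEVER behavior can be set per query via the human_review parameter; a minimal sketch, reusing the hypothetical detector from above:

```python
from groundlight import Groundlight

gl = Groundlight()
detector = gl.get_or_create_detector(
    name="door-check",  # hypothetical detector
    query="Is the door closed?",
)

# Force this query to a human labeler, regardless of ML confidence
iq = gl.submit_image_query(detector=detector, image="door.jpg", human_review="ALWAYS")

# Or take the raw ML answer, with no chance of escalation
iq = gl.submit_image_query(detector=detector, image="door.jpg", human_review="NEVER")
```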
The model starts training from the first label it receives. The number of labels necessary for good performance will depend on how easily the question can be answered from the images. Some visual questions are easy and will be learned quickly, while others take longer. Many questions get good results with a few hundred labels.
Most binary detectors’ models use up to 5,000 labeled image queries to train. Per detector, we keep all labeled images up to 5,000, and up to 25,000 total images (labeled and unlabeled). You can keep adding labels beyond that: if more than 5,000 are available, Groundlight's algorithms automatically select the most useful ones for training, based on recency and label authority.
If you're a developer and you're looking for resources to build computer vision applications using Groundlight, visit our Documentation.