Study notes
1. Analyze images
Computer Vision
- Part of artificial intelligence (AI) in which software interprets visual input: images or video feeds.
- Designed to help you extract information from images:
- Description and tag generation
Determining an appropriate caption for an image, and identifying relevant "tags".
- Object detection
Detecting the presence and location of specific objects within the image.
- Face detection
Detecting the presence, location, and features of human faces in the image.
- Image metadata, color, and type analysis
Determining the format and size of an image, its dominant color palette, and whether it contains clip art.
- Category identification
Identifying an appropriate categorization for the image, and whether it contains any known landmarks.
- Brand detection
Detecting the presence of any known brands or logos.
- Moderation rating
Determining whether the image includes any adult or violent content.
- Optical character recognition
Reading text in the image.
- Smart thumbnail generation
Identifying the main region of interest in the image to create a smaller "thumbnail".
- Provision one of:
- A single-service Computer Vision resource.
- The Computer Vision API in a multi-service Cognitive Services resource.
Use the Analyze Image REST method or the equivalent method in the SDK (Python, C#, etc.).
You can use scoped functions to retrieve specific subsets of the image features, including the image description, tags, and objects in the image.
Returns a JSON document containing the requested information.
Sample:
{
"categories": [
{
"name": "_outdoor_mountain",
"confidence": "0.9"}
],
"adult": {"isAdultContent": "false", …},
..
..
}
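A minimal Python sketch of how an Analyze Image call could be prepared and its response read. The `/vision/v3.2/analyze` path, the `visualFeatures` query parameter, and the `Ocp-Apim-Subscription-Key` header are assumptions based on the v3.2 REST API, so check them against the version you provisioned. The function names are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


def build_analyze_request(endpoint, key, image_url, features):
    """Return (url, headers, body) for an Analyze Image call; nothing is sent."""
    # Requested features are passed as one comma-separated query value.
    query = urlencode({"visualFeatures": ",".join(features)}, safe=",")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"  # assumed path
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed header name
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": image_url})
    return url, headers, body


def top_category(response_text):
    """Return (name, confidence) of the highest-confidence category, or None."""
    doc = json.loads(response_text)
    categories = doc.get("categories", [])
    if not categories:
        return None
    # The sample in these notes quotes the confidence, so coerce it to float.
    best = max(categories, key=lambda c: float(c["confidence"]))
    return best["name"], float(best["confidence"])
```

The returned URL, headers, and body can be passed to any HTTP client. `top_category` only reads the "categories" array, so it works whatever other features were requested.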
Generate a smart-cropped thumbnail
Creates a thumbnail with different dimensions (and aspect ratio) from the source image, and can optionally use image analysis to determine the region of interest in the image (its main subject) and make that the focus of the thumbnail.
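To illustrate the idea of focusing a thumbnail on a region of interest, here is a small geometric sketch. It is not the service's algorithm: the function `crop_box` and its centering rule are assumptions made for illustration only. It picks the largest window with the thumbnail's aspect ratio and centers it on the region of interest, clamped to the image bounds.

```python
def crop_box(img_w, img_h, roi, thumb_w, thumb_h):
    """Return (left, top, width, height) of the crop window in pixels.

    roi is the region of interest as (left, top, width, height) in pixels.
    """
    target = thumb_w / thumb_h
    # Largest window with the target aspect ratio that fits inside the image.
    if img_w / img_h > target:
        crop_h = img_h
        crop_w = round(img_h * target)
    else:
        crop_w = img_w
        crop_h = round(img_w / target)
    # Center the window on the region of interest, then clamp to the image.
    cx = roi[0] + roi[2] / 2
    cy = roi[1] + roi[3] / 2
    left = min(max(round(cx - crop_w / 2), 0), img_w - crop_w)
    top = min(max(round(cy - crop_h / 2), 0), img_h - crop_h)
    return left, top, crop_w, crop_h
```

For a 1000x500 image with the subject near the right edge, a square thumbnail takes the right-hand 500x500 window rather than the center of the image.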
2. Analyze video
Extract info:
- Facial recognition
- OCR
- Speech transcription
- Topics - key topics discussed in the video.
- Sentiment analysis
- Labels - label tags that identify key objects or themes throughout the video.
- Content moderation
- Scene segmentation
Creating custom models for:
- People.
Add images of the faces of people you want to recognize in videos, and train a model. This feature is subject to Limited Access approval, in keeping with Microsoft's Responsible AI standard.
- Language.
Specific terminology that may not be in common usage.
- Brands.
Train a model to recognize specific names as brands relevant to your business.
- Animated characters.
Detect the presence of individual animated characters in a video.
- Video Analyzer for Media widgets
Share insights from specific videos with others without giving them full access to your account in the Video Analyzer for Media portal.
- Video Analyzer for Media API
A REST API that you subscribe to in order to get a subscription key. Use it to automate video indexing tasks, such as uploading and indexing videos, retrieving insights, and determining endpoints for Video Analyzer widgets.
Results are returned as JSON.
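The automation flow (get a token, upload a video, retrieve its insights) might be sketched as follows. The `api.videoindexer.ai` root, the URL paths, and the query parameter names are assumptions based on the classic Video Indexer REST API and should be verified in the API portal. The helper names are hypothetical, and the code only builds requests without sending them.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.videoindexer.ai"  # assumed public API root


def access_token_request(location, account_id, subscription_key):
    """Request description for exchanging the subscription key for a token."""
    url = f"{API_ROOT}/Auth/{location}/Accounts/{account_id}/AccessToken?allowEdit=true"
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return "GET", url, headers


def upload_request(location, account_id, access_token, name, video_url):
    """Request description for uploading and indexing a video by URL."""
    query = urlencode({"name": name, "videoUrl": video_url, "accessToken": access_token})
    return "POST", f"{API_ROOT}/{location}/Accounts/{account_id}/Videos?{query}"


def index_request(location, account_id, access_token, video_id):
    """Request description for retrieving the JSON insights of one video."""
    query = urlencode({"accessToken": access_token})
    return "GET", f"{API_ROOT}/{location}/Accounts/{account_id}/Videos/{video_id}/Index?{query}"
```

Each helper returns the HTTP method and URL (plus headers for the token call), so the three steps can be run in order with any HTTP client.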
Image classification
Computer vision technique in which a model is trained to predict a class label for an image based on its contents.
- multiclass classification - multiple classes, each image can belong to only one class.
- multilabel classification - an image might be associated with multiple labels.
- Use existing (labeled) images to train a Custom Vision model.
- Create a client application that allows others to submit new images, for which the model generates predictions.
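The difference between multiclass and multilabel classification shows up in how per-tag probabilities become predictions. The sketch below assumes the model returns a probability for each tag. The function names and the 0.5 threshold are illustrative choices, not part of Custom Vision.

```python
def predict_multiclass(probabilities):
    """Multiclass: return the single tag with the highest probability."""
    return max(probabilities, key=probabilities.get)


def predict_multilabel(probabilities, threshold=0.5):
    """Multilabel: return every tag at or above the threshold, sorted by name."""
    return sorted(tag for tag, p in probabilities.items() if p >= threshold)
```

With probabilities such as apple 0.7, banana 0.2, orange 0.1, the multiclass rule picks only "apple". In the multilabel case, every tag that clears the threshold is kept.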
Object detection
Computer vision technique in which a model is trained to detect the presence and location of one or more classes of object in an image.
- Class label of each object detected in the image.
- Location of each object within the image, indicated as coordinates of a bounding box that encloses the object.
The hardest part is training the model, because every object in every image must be labeled:
- Label every object in each image using the interactive UI in the Custom Vision portal.
Train the model as soon as you have some relevant images labeled, then use smart labeling: the system suggests labels and you only confirm or change them.
- Use a labeling tool, e.g. the one provided in Azure Machine Learning Studio or the Microsoft Visual Object Tagging Tool (VoTT), which suits team work.
In this case, you may need to adjust the output to match the measurement units expected by the Custom Vision API.
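A typical unit adjustment is converting pixel coordinates exported by a labeling tool into proportional values. The sketch below assumes the tool exports boxes as (x_min, y_min, x_max, y_max) in pixels and that Custom Vision expects left, top, width, and height as fractions of the image size. The function name is hypothetical.

```python
def to_normalized_region(box_px, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to proportional
    (left, top, width, height) values between 0 and 1."""
    x_min, y_min, x_max, y_max = box_px
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image")
    return (
        x_min / img_w,
        y_min / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    )
```

Because the values are proportional, the same region stays valid if the image is later resized.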
In fact, there are multiple 'actions':
- Face detection
- Face analysis
- Face recognition
- Computer Vision service
Detects human faces and returns a bounding box for each face's location (as in object detection).
- The Face service
Does what Computer Vision does (bounding box and location), plus:
- Comprehensive facial feature analysis: head pose, glasses, blur, exposure, noise, occlusion
- Facial landmark location
- Face comparison and verification
- Facial recognition
Considerations for face analysis:
- Data privacy and security
- Transparency
- Fairness and inclusiveness
When you need to positively identify individuals, you can train a facial recognition model using face images:
Training process:
- Create a Person Group that defines the set of individuals you want to identify.
- Add a Person to the Person Group for each individual you want to identify.
- Add detected faces from multiple images to each person, preferably in various poses.
The IDs of these faces will no longer expire after 24 hours (persisted faces).
- Train the model.
The trained model can be used to:
- Identify individuals in images.
- Verify the identity of a detected face.
- Analyze new images to find faces that are similar to a known, persisted face.
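The training process above can be laid out as an ordered list of REST calls. The `/face/v1.0/persongroups/...` paths are assumptions based on the Face REST API v1.0 and should be checked against the current reference. `training_plan` is a hypothetical helper that only describes the calls and sends nothing. The real person ID is returned by the "add a Person" call, so a `{personId}` placeholder stands in for it.

```python
def training_plan(endpoint, group_id, people):
    """Return ordered (method, url, body) calls to build and train a Person Group.

    people maps a person's name to a list of image URLs of that person.
    """
    base = f"{endpoint.rstrip('/')}/face/v1.0/persongroups/{group_id}"  # assumed path
    plan = [("PUT", base, {"name": group_id})]  # 1. create the Person Group
    for name, images in people.items():
        # 2. add a Person; the response would carry the new person ID
        plan.append(("POST", f"{base}/persons", {"name": name}))
        for image_url in images:
            # 3. add one persisted face per image of that person
            plan.append(
                ("POST", f"{base}/persons/{{personId}}/persistedFaces", {"url": image_url})
            )
    plan.append(("POST", f"{base}/train", None))  # 4. train the model
    return plan
```

For one person with two images the plan has five calls: create the group, add the person, add two faces, then train.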
References
Create computer vision solutions with Azure Cognitive Services - Training | Microsoft Learn
Limited Access features for Cognitive Services - Azure Cognitive Services | Microsoft Learn
Responsible AI investments and safeguards for facial recognition | Azure Blog and Updates | Microsoft Azure
Labeling images and text documents - Azure Machine Learning | Microsoft Learn
VoTT/README.md at master · microsoft/VoTT (github.com)