Computer Vision Algorithms: Decoding the Visual World

Written by Coursera Staff

Computer vision algorithms make it possible for AI models to respond to visual cues. Explore how algorithms like image classification and object detection work, how to use them, and the types of computer vision models you can use to write them.

[Featured Image] Two VR developers use computer vision algorithms as they test their creations in a lab environment.

Computer vision is a technology powered by artificial intelligence that enables robots, computers, and other machines to process visual information and react accordingly. Computer vision works using cameras and lenses that capture images and AI algorithms that instruct the machine how to interact with the image. These computer vision algorithms enable AI models to detect and classify objects within images, track objects through video or sequential images, and generate unique visuals. 

These functions allow professionals in many different industries to use computer vision for better information gathering and productivity. Computer vision can be used for business intelligence, more accurate medical diagnostics, agriculture, home protection monitoring, and self-driving vehicles. 

Explore how computer vision works and some of the core computer vision algorithms that empower this technology, including image classification, object detection, object tracking, edge detection, segmentation, and image generation.

Core computer vision algorithms and how they work

Computer vision algorithms enable the artificial intelligence needed to process, understand, classify, and manipulate images. This technology works similarly to the way that humans see and understand visual information. Just as you have learned to process visual data throughout your life, computer vision uses training data to provide the AI model with a foundation of visual information. When you see something new, your brain compares it to things you’ve seen in the past to try to classify or make sense of the unfamiliar. Similarly, computer vision algorithms draw on patterns noticed in training data to make sense of a new image. 

The algorithm is the instruction you provide to the AI model that gives it functionality. You can use different types of algorithms to enable different functions of an AI model, such as detecting or classifying objects or generating new images. You can think about computer vision algorithms in two ways: by their function and by the structure or architecture of the model. 

Explore core computer vision algorithms by function and learn the type of algorithm structure you might use to accomplish each task, including image classification, object detection, object tracking, edge detection, segmentation, and image generation. 

Image classification

Image classification is a method of sorting images by class or a primary characteristic that describes the image. For example, an image classification program might sort pictures of apples, bananas, and oranges. You can also use image classification to predict if an image belongs to a certain class, such as determining whether something is or is not a watermelon. You can use this technology in many different ways, including automatically categorizing uploaded images or enabling a camera to focus on faces before taking a picture. 

Possible types of algorithms for image classification: Convolutional neural networks, deep convolutional neural networks, logistic regression, support vector machines, and k-nearest neighbors (KNN)
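As a minimal sketch of one of these approaches, k-nearest neighbors, the toy example below classifies a flattened 2x2 "image" by comparing its raw pixel values against labeled training images and taking a majority vote. The pixel values and classes are invented purely for illustration; real systems work on much larger images and learned features:

```python
import numpy as np

def knn_classify(train_images, train_labels, query, k=3):
    """Classify a flattened image by majority vote among its k nearest
    training images (Euclidean distance in raw pixel space)."""
    dists = np.linalg.norm(train_images - query, axis=1)
    nearest = np.argsort(dists)[:k]        # indices of the k closest images
    classes, counts = np.unique(train_labels[nearest], return_counts=True)
    return int(classes[np.argmax(counts)])  # most common class among them

# Toy 2x2 "images" flattened to 4 pixels: class 0 is dark, class 1 is bright.
train = np.array([[0.1, 0.2, 0.1, 0.0],
                  [0.2, 0.1, 0.0, 0.1],
                  [0.9, 0.8, 1.0, 0.9],
                  [0.8, 0.9, 0.9, 1.0]])
labels = np.array([0, 0, 1, 1])

print(knn_classify(train, labels, np.array([0.85, 0.9, 0.8, 0.95])))  # → 1
```

A bright query image lands nearest the bright training examples, so the vote returns class 1.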

Object detection

Object detection is an algorithm that digs a little deeper by locating objects within an image, typically drawing a labeled bounding box around each one, so a system can assess whether objects meet quality standards or sort them within a class by appearance. For example, a farmer might use object detection to look for signs of illness in livestock, or factory workers might use it to look for defective products on the assembly line. Home security systems also use object detection to spot threats like unfamiliar faces. 

Possible types of algorithms for object detection: Region-based convolutional neural networks, Single Shot Detector, YOLO (You Only Look Once), RetinaNet, and Feature Pyramid Network
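Detectors like SSD and YOLO typically emit many overlapping candidate boxes and then prune duplicates with non-maximum suppression, which compares boxes by intersection over union (IoU). The numpy sketch below shows that post-processing step with invented boxes and confidence scores:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop heavily overlapping duplicates."""
    order = np.argsort(scores)[::-1]  # best score first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep

# Two near-duplicate detections of one object, plus one distinct detection.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]])
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))  # → [0, 2]
```

The second box overlaps the first at roughly 0.82 IoU, so it is suppressed; the distant third box survives.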

Object tracking

Object tracking algorithms allow computers to follow objects as they move through a visual field, such as on a video feed or in photos taken sequentially. This technology appears in many settings, such as traffic monitoring, medical imaging, and autonomous cars, which need to track other moving vehicles, pedestrians, and stationary objects to avoid collisions. 

Possible types of algorithms for object tracking: Single Shot Detector (SSD), dense optical flow, Kalman filtering, and mean shift
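One of these techniques, Kalman filtering, fuses noisy per-frame position measurements with a motion model to produce a smooth track. The sketch below follows one coordinate of an object under an assumed constant-velocity model; the noise settings and measurements are invented for the demo:

```python
import numpy as np

# Constant-velocity model: state is [position, velocity], measurement is position.
F = np.array([[1.0, 1.0],   # position += velocity each frame
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])  # we observe position only
Q = np.eye(2) * 1e-3        # process noise (assumed)
R = np.array([[0.25]])      # measurement noise (assumed)

x = np.array([[0.0], [0.0]])  # initial state estimate
P = np.eye(2)                 # initial uncertainty

def kalman_step(x, P, z):
    """One predict/update cycle for a position measurement z."""
    x = F @ x                          # predict state forward one frame
    P = F @ P @ F.T + Q
    y = z - H @ x                      # innovation: measurement minus prediction
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y                      # corrected state
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Noisy position measurements of an object moving about 2 units per frame.
for z in [2.1, 3.9, 6.2, 8.0, 9.9]:
    x, P = kalman_step(x, P, np.array([[z]]))

print(float(x[0, 0]), float(x[1, 0]))
```

After five frames the filter's estimate settles near the latest position with a velocity close to the true 2 units per frame, even though no velocity was ever measured directly.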

Feature/edge detection

Another type of computer vision algorithm used for image processing and image classification is feature detection. Detecting features such as edges, objects, subjects, and the background of an image is a critical task a computer must complete before it can classify the image or make decisions about it. Detecting edges is particularly important because edges outline the objects in the image, helping the computer understand how to process it. After you apply an edge detection algorithm, you can analyze the image in other ways, such as feature detection, line detection, or edge thinning. 

Possible types of algorithms for feature and edge detection: Canny, Roberts cross, Laplacian of Gaussian, and fuzzy logic
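The Roberts cross operator is among the simplest edge detectors: it convolves the image with two 2x2 diagonal kernels and combines the responses into a gradient magnitude. A small numpy sketch on a synthetic image containing one vertical edge:

```python
import numpy as np

def roberts_edges(img):
    """Roberts cross operator: gradient magnitude from two 2x2 diagonal kernels."""
    gx = img[:-1, :-1] - img[1:, 1:]   # kernel [[1, 0], [0, -1]]
    gy = img[:-1, 1:] - img[1:, :-1]   # kernel [[0, 1], [-1, 0]]
    return np.sqrt(gx**2 + gy**2)

# Synthetic 6x6 image: dark left half, bright right half -> one vertical edge.
img = np.zeros((6, 6))
img[:, 3:] = 1.0

edges = roberts_edges(img)
print(edges)
```

The output is zero everywhere except the column where brightness jumps, which lights up with magnitude √2, i.e. the operator recovers exactly the outline separating the two regions.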

Instance or semantic segmentation

Instance and semantic segmentation are methods for finding the boundaries of the different objects within an image and labeling each pixel accordingly. In a picture of an adult woman holding a newborn baby, semantic segmentation would assign every pixel belonging to either of them the same class label, "person." Instance segmentation would go further and separate that region into two distinct individuals. 

Possible types of algorithms for instance and semantic segmentation: Mask region-based convolutional neural networks (Mask R-CNN), fully convolutional networks (FCNs), and U-Net
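To make the distinction concrete: given a semantic mask marking every "person" pixel, splitting it into individual people is an instance-level problem. The toy sketch below does this for well-separated blobs using a 4-connected flood fill; real instance segmentation models also handle overlapping objects, which this simple approach cannot:

```python
import numpy as np

def label_instances(mask):
    """Split a binary semantic mask into instances: each 4-connected blob
    of foreground pixels receives its own integer ID (1, 2, ...)."""
    labels = np.zeros_like(mask, dtype=int)
    next_id = 0
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:   # unvisited foreground pixel
                next_id += 1
                stack = [(i, j)]
                while stack:                        # flood fill this blob
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y, x] and labels[y, x] == 0:
                        labels[y, x] = next_id
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels

# Semantic mask: one class ("person"), but two separate blobs.
mask = np.array([[1, 1, 0, 0, 1],
                 [1, 0, 0, 0, 1],
                 [0, 0, 0, 1, 1]])
labels = label_instances(mask)
print(labels.max())  # → 2 (two instances found in one semantic class)
```

The single "person" class in the mask becomes two labeled instances, which is exactly the extra information instance segmentation provides over semantic segmentation.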

Image generation

An image generation algorithm can create unique images based on natural language prompts. To accomplish this, some of these AI models use deep learning and a gamified competition between two neural networks that together form a generative adversarial network (GAN). One network, the generator, creates original images based on training data. The other network, the discriminator, tries to spot the differences between the AI-generated images and real images from the training data. The two networks compete until the generator wins the game, that is, until the discriminator can no longer tell the generated images apart from real ones. This winning version produces the output. Other algorithms that you can use for image generation include neural style transfers and diffusion models.

Possible types of algorithms for image generation: Generative adversarial networks, neural style transfers, diffusion models
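Real GANs pit two deep networks against each other over images. The purely illustrative numpy sketch below shrinks the same adversarial loop down to one-dimensional "data" (numbers near 4.0), with a linear generator and a logistic discriminator taking alternating gradient steps; every setting here is invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

def real_batch(n):
    """'Real' data the generator should learn to imitate: samples near 4.0."""
    return rng.normal(4.0, 0.5, n)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Generator g(z) = a*z + b with z ~ N(0, 1): starts producing samples near 0.
a, b = 1.0, 0.0
# Discriminator d(x) = sigmoid(w*x + c): probability that x is real.
w, c = 0.0, 0.0
lr = 0.05

for step in range(2000):
    z = rng.normal(0.0, 1.0, 64)
    fake = a * z + b
    real = real_batch(64)

    # Discriminator ascent on log d(real) + log(1 - d(fake)):
    # reward it for calling real data real and fake data fake.
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    c += lr * np.mean((1 - d_real) - d_fake)

    # Generator ascent on log d(fake): shift fakes toward what fools d.
    d_fake = sigmoid(w * fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

# Mean of generated samples: started near 0, driven toward the real mean of 4.
print(round(float(np.mean(a * rng.normal(0.0, 1.0, 1000) + b)), 1))
```

The generator never sees the real data directly; it only receives the discriminator's gradient, yet its output distribution drifts toward the real one, which is the core GAN idea.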

Computer vision careers

If you want to use computer vision algorithms to build AI models that solve real-world problems, consider a career in computer vision. Three potential options include computer vision engineer, robotics engineer, and virtual reality developer. 

Computer vision engineer

Average annual pay in the US (Glassdoor): $122,587 [1]

Job outlook (projected growth from 2023 to 2033): 26 percent [2]

As a computer vision engineer, you will use computer vision algorithms to build machine learning solutions, either for your employer or for clients who hire your team. You will develop, test, and train computer vision algorithms as well as create user guides or train staff on how to use your AI solutions. 

Robotics engineer

Average annual pay in the US (Glassdoor): $111,176 [3]

Job outlook (projected growth from 2023 to 2033): 11 percent [4]

As a robotics engineer, you will design, develop, test, and train robots for use in many different industries, such as manufacturing, automotive, health care, national defense, and utilities. Although the exact work you do will depend on your industry and project, you’re likely to work with a team of professionals to create, troubleshoot, and implement robots and automated systems. 

AR/VR developer

Average annual pay in the US (Glassdoor): $109,200 [5]

Job outlook (projected growth from 2023 to 2033): 17 percent [6]

As an augmented reality or virtual reality developer, you will work as part of a software development team creating projects that allow users to experience AR or VR. You may create programs for systems like Oculus Rift, iOS, HTC Vive, Gear VR, or PS4.

Learn more about computer vision algorithms on Coursera. 

Computer vision algorithms enable robots and machines to detect and process visual information and respond accordingly. If you’d like to learn more about computer vision algorithms or to start a career in a related field, you can begin today on Coursera. For example, you could enroll in First Principles of Computer Vision Specialization offered by Columbia University. 

Article sources

1. Glassdoor. “Salary: Computer Vision Engineer in the United States,” https://www.glassdoor.com/Salaries/computer-vision-engineer-salary-SRCH_KO0,24.htm. Accessed November 1, 2024. 


This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.