LeakLens

Document Type

Abstract

Publication Date

Fall 12-1-2024

Abstract

Protecting intellectual property demands considerable effort from film production companies in today's fast-paced Internet world. They must ensure that content such as movie scenes, scripts, and photos of celebrities is not leaked and widely distributed online. To minimize the time and cost of finding leaked content, LeakLens offers content security professionals an advanced web application for image detection and similarity analysis. Using machine learning, the app lets users customize image searches to achieve greater accuracy and fewer false positives than existing tools such as Google Lens or TinEye. Key functionalities include uploading original image files, scraping the Internet for similar images, storing image embeddings, and indexing them in a vector database. The app uses Python and the OpenAI CLIP (Contrastive Language–Image Pretraining) model to convert images into embeddings, which are then used for similarity scoring.

The image detection process involves two main phases. In the image search phase, users upload the images that need to be monitored for leaks. The app processes these images with OpenAI CLIP, converts them into embeddings, and stores them in Pinecone, the vector database. It then retrieves and ranks the five most similar images by similarity score. In the training and indexing phase, content security personnel scrape images using keyword searches. These images are converted into embeddings and indexed in the Pinecone database, improving the system's accuracy and efficiency.
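The indexing and top-5 retrieval steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the LeakLens implementation: it assumes cosine similarity as the scoring metric, uses a hypothetical in-memory `InMemoryIndex` class as a stand-in for Pinecone, and uses short hand-written vectors where the app would use OpenAI CLIP image embeddings.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryIndex:
    """Hypothetical toy vector index standing in for Pinecone.

    Maps image IDs to embedding vectors. In the real app the vectors would
    be CLIP image embeddings; here they are plain lists of floats.
    """
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def upsert(self, image_id: str, embedding: list[float]) -> None:
        # Indexing phase: store (or overwrite) the embedding of a scraped image.
        self.vectors[image_id] = embedding

    def query(self, embedding: list[float], top_k: int = 5) -> list[tuple[str, float]]:
        # Search phase: score every stored image against the query embedding
        # and return the top_k highest-scoring (image_id, score) pairs.
        scored = [
            (image_id, cosine_similarity(embedding, vec))
            for image_id, vec in self.vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    index = InMemoryIndex()
    # Embeddings of scraped images (made-up 3-dimensional vectors).
    index.upsert("scraped_a", [1.0, 0.0, 0.0])
    index.upsert("scraped_b", [0.9, 0.1, 0.0])
    index.upsert("scraped_c", [0.0, 1.0, 0.0])
    # Embedding of an uploaded original image to be monitored.
    for image_id, score in index.query([1.0, 0.0, 0.0]):
        print(f"{image_id}: {score:.3f}")
```

Here the query ranks `scraped_a` first, then `scraped_b`, then `scraped_c`; with only three stored images, the default `top_k=5` returns all three. A linear scan like this is only practical for small collections, which is why a production system would delegate storage and nearest-neighbor search to a dedicated vector database such as Pinecone.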

Comments

Completed as part of the Computer Science Senior Capstone Project.

This document is currently not available here.
