ZD Tech: Tools for identifying online content come one after another, and no two are alike


Hello everyone and welcome to ZD Tech, ZDNet’s daily editorial podcast. My name is Clarisse Treilles, and today I am reviewing the different techniques for identifying audio, video and image files.

In a museum, an original work is generally signed by the artist's own hand. On the internet, the principle is the same, but the tools differ. Some platforms rely on recognition systems based on digital fingerprints, known as "fingerprinting", while others use digital watermarks.

Fingerprinting techniques are the most widespread on the Internet. YouTube uses them in its "Content ID" system, and Facebook in its Rights Manager tool.

Gather a lot of data to compare fingerprints

The first thing to understand is that a fingerprint is distinct from the work itself. The technique relies on a unique digital representation of a piece of content. To generate a fingerprint, the content is reduced or simplified so that only its characteristic elements are kept. Note that this process is not reversible: it is impossible to recreate the original content from its fingerprint.

To verify the authenticity of a document in this way, it is necessary to have a content recognition system. It generally consists of a database where the fingerprints of all the documents to be identified are stored. Once this database has been created, the system is used as a search engine.

Still, fingerprinting has one drawback: it must be fed by a large reference database. This requires significant storage capacity, which can be costly, especially for small producers.
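To make the principle concrete, here is a minimal sketch in Python of a fingerprint and of a reference database queried like a search engine. It is a toy illustration under stated assumptions, not the method used by Content ID or Rights Manager: the `fingerprint` function, the `ReferenceDatabase` class, the 16-bit size and the distance threshold are all hypothetical choices made for this example, and the "content" is just a list of numbers standing in for audio samples.

```python
from statistics import mean


def fingerprint(samples, bits=16):
    """Toy fingerprint (an assumption for this sketch): split the content
    into `bits` blocks and keep one bit per block, 1 if the block is
    louder than average, 0 otherwise. The samples cannot be rebuilt
    from these few bits, so the process is not reversible."""
    block = len(samples) // bits
    block_means = [mean(samples[i * block:(i + 1) * block]) for i in range(bits)]
    overall = mean(block_means)
    return tuple(1 if m > overall else 0 for m in block_means)


def hamming(a, b):
    """Number of positions where two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


class ReferenceDatabase:
    """Hypothetical reference base: stores fingerprints, never the works."""

    def __init__(self):
        self._entries = {}

    def register(self, title, samples):
        self._entries[title] = fingerprint(samples)

    def identify(self, samples, max_distance=2):
        """Return the closest registered title, or None if nothing is close."""
        query = fingerprint(samples)
        best = min(self._entries.items(),
                   key=lambda entry: hamming(query, entry[1]),
                   default=None)
        if best is None or hamming(query, best[1]) > max_distance:
            return None
        return best[0]


# Two made-up "works": one starts quiet and ends loud, the other the reverse.
song_a = [0] * 80 + [100] * 80
song_b = [100] * 80 + [0] * 80

db = ReferenceDatabase()
db.register("song_a", song_a)
db.register("song_b", song_b)

# A degraded copy (quieter, with a constant offset) is still recognized.
degraded = [0.8 * s + 3 for s in song_a]
print(db.identify(degraded))     # song_a
print(db.identify([50] * 160))   # None: unknown content
```

The sketch also shows where the storage cost comes from: every work to be recognized needs its own entry in the reference database.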

Get the unique signature of a file with a hash

Alongside this, there is also the technique known as "hashing". With it, any file or piece of data can be represented by an alphanumeric character string that is unique in practice, called the hash. It is a bit like the unique signature of a digital work: two strictly identical files will always have the same hash.

The hash is certainly practical, but not very flexible: the technique only works when two files are perfectly identical. The slightest change to the original file, such as simply converting an image to another format, produces a different hash. Comparing hashes therefore does not identify every copy of an image, only the exact copies.
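This behavior is easy to observe with Python's standard `hashlib` module. The sketch below uses SHA-256, which is one common hash function among others and is an assumption of this example, since no particular algorithm is named here. The byte strings stand in for file contents, and `file_hash` is a hypothetical helper name.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hash of some file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"pixel data of an image"
exact_copy = bytes(original)      # strictly identical content
altered = original + b"\x00"      # a single extra byte

print(file_hash(original) == file_hash(exact_copy))  # True
print(file_hash(original) == file_hash(altered))     # False
```

Even a one-byte difference produces an unrelated hash, which is why a re-encoded or resized image escapes this kind of comparison.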

Marking digital content with the watermarking technique

Finally, the last method we are going to talk about is digital watermarking. Unlike the other two methods, watermarking modifies the content itself: a mark is embedded in the file so that it can be found again later.

To achieve this, you need two things: first a marker to embed the watermark in the content, then a detector to find it again. The mark can be visible, such as a logo on an image or a video, or it can be invisible to the naked eye.
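The marker and detector pair can be sketched for the invisible case. The example below hides a mark in the least significant bit of each pixel value, a classic textbook approach chosen here as an assumption: it is fragile and not how production watermarking systems necessarily work. The names `embed`, `detect` and `MARK`, as well as the pixel values, are hypothetical.

```python
def embed(pixels, mark_bits):
    """Marker: write each bit of the mark into the least significant
    bit of the first pixels. Each pixel changes by at most 1."""
    if len(mark_bits) > len(pixels):
        raise ValueError("content too small for this mark")
    marked = list(pixels)
    for i, bit in enumerate(mark_bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked


def detect(pixels, mark_bits):
    """Detector: check whether the expected mark is present."""
    if len(mark_bits) > len(pixels):
        return False
    return [p & 1 for p in pixels[:len(mark_bits)]] == list(mark_bits)


MARK = [1, 0, 1, 1, 0, 0, 1, 0]                         # made-up 8-bit mark
image = [200, 13, 77, 54, 255, 0, 128, 91, 40, 41]      # grayscale values 0-255

marked = embed(image, MARK)
print(marked)                  # [201, 12, 77, 55, 254, 0, 129, 90, 40, 41]
print(detect(marked, MARK))    # True
print(detect(image, MARK))     # False
```

Because each value shifts by at most one level out of 256, the change is imperceptible to the eye, yet the detector can still read the mark back.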




