5 Tech Tidbits Lawyers Should Know When Collecting Web Content

Last Updated March 2024

In an era dominated by the internet, it’s surprising that the legal industry is only starting to account for the realities of our increasingly digital age. While the nuances of modern technology can be difficult to master, a fundamental understanding of it helps legal professionals meet their ethical duty to provide competent representation to clients and have more insightful conversations with technology experts.

Here are some quick tech tidbits and tools to know about when collecting web content, such as websites, videos or social media profiles, for research or a case:

1. Hash Value Matching

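Hash value matching is an algorithm-based process used to confirm that screenshots and other collected web content are exact duplicates of the original data source. If two files produce the same hash value, their contents are, for practical purposes, identical; if even one byte differs, the hash values will not match.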

“We do a couple of digital timestamps,” said Page Vault CTO Todd Price, “and at the heart of our PDF capture is a hash value, which is simply a small digital fingerprint for any digital data such as a PDF file. A digital timestamp also adds a date and time at which the hash value was obtained, so not only do you know that this content existed in this form, you know the exact time and date it existed.”

The Judicial Conference of the United States took hash value matching into consideration when it proposed amendments to Federal Rule of Evidence 902 intended to streamline the admission of electronically stored evidence.
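For readers who want to see what a "digital fingerprint" looks like in practice, here is a minimal sketch using Python's standard `hashlib` module. It assumes SHA-256 as the hash algorithm (the article does not say which algorithm Page Vault uses), and the function names `fingerprint` and `matches_original` are hypothetical. The timestamp comes from the local system clock, so it only illustrates the idea; it is not a trusted digital timestamp of the kind described in the quote above.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fingerprint(data: bytes) -> dict:
    """Pair a hash value with the UTC time at which it was computed.

    Assumption: the local clock stands in for a real timestamping service.
    """
    return {
        "sha256": sha256_hex(data),
        "hashed_at": datetime.now(timezone.utc).isoformat(),
    }


def matches_original(original_hash: str, candidate: bytes) -> bool:
    """Check whether a candidate file hashes to the recorded value."""
    return hmac.compare_digest(original_hash, sha256_hex(candidate))


if __name__ == "__main__":
    capture = b"%PDF-1.7 example capture contents"
    record = fingerprint(capture)
    print(record["sha256"], record["hashed_at"])
    print(matches_original(record["sha256"], capture))         # same bytes
    print(matches_original(record["sha256"], capture + b" "))  # one byte added
```

Changing even a single byte of the file produces a different hash value, which is what makes the comparison useful for showing that a copy has not been altered.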

2. Metadata

Metadata is underlying information embedded in web content such as documents, images, videos and webpages. It gives more detailed information about the data presented.

In Page Vault’s e-Discovery Trends 2018: Web Content Collection report, litigators identified the following metadata as the most valuable to collect: 100% said the time and date of the capture, 83% said the website URL of the content, 44% said IP addresses of the capturing computer and web server, 39% said the webpage source code (HTML), and 38% said the name of the person making the capture.

Page Vault documents all the metadata with each capture, such as the address of the server and the IP address, all of which it also verifies and timestamps. To further emphasize the importance of technology competency, Page Vault attorney Patrick Schweihs has said, “Judges are getting more tech-savvy. Terms like metadata, time stamps, URLs: that nomenclature is not foreign to judges anymore. And for lawyers, the rules of professional responsibility are being updated to include sanctions for lawyers so that when something goes wrong, they can’t claim they didn’t understand the technology.”
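To make the list of metadata fields more concrete, the sketch below shows one way a capture record could be structured in Python. It is only an illustration built around the fields litigators named in the survey above. The `CaptureRecord` class, its field names, and the `build_capture_record` helper are all hypothetical and do not reflect Page Vault's actual format. The IP addresses are passed in by the caller rather than looked up, and the example values are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CaptureRecord:
    """Hypothetical container for the metadata collected with one capture."""
    captured_at: str    # time and date of the capture (UTC, ISO 8601)
    url: str            # website URL of the content
    capturing_ip: str   # IP address of the capturing computer
    server_ip: str      # IP address of the web server
    html_source: str    # webpage source code (HTML)
    captured_by: str    # name of the person making the capture
    html_sha256: str    # hash value of the HTML, for later verification


def build_capture_record(url: str, html_source: str, captured_by: str,
                         capturing_ip: str, server_ip: str) -> CaptureRecord:
    """Assemble a record; the capture time is taken from the local clock."""
    return CaptureRecord(
        captured_at=datetime.now(timezone.utc).isoformat(),
        url=url,
        capturing_ip=capturing_ip,
        server_ip=server_ip,
        html_source=html_source,
        captured_by=captured_by,
        html_sha256=hashlib.sha256(html_source.encode("utf-8")).hexdigest(),
    )


if __name__ == "__main__":
    record = build_capture_record(
        url="https://example.com/post/123",       # placeholder URL
        html_source="<html><body>Example</body></html>",
        captured_by="Jane Doe",                   # placeholder name
        capturing_ip="203.0.113.10",              # documentation-range IP
        server_ip="198.51.100.20",                # documentation-range IP
    )
    print(json.dumps(asdict(record), indent=2))
```

Storing the record alongside the capture itself keeps the "who, what, when and where" of the collection in one place.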

3. Google Dorking or Google Hacking

A Google dork query is an advanced type of Google search that can be used to uncover vulnerabilities and sensitive information through specialized search commands. While it can be used for protective purposes, such as fixing website security issues, it can also be used by cybercriminals to find sensitive financial information, classified information, and other compromising data stored on law firm websites and servers.

This type of advanced search can also be helpful for legal professionals when trying to track down specific content or evidence on the web. A few examples of advanced searches that could assist legal professionals include:

  • “site:” to pull up webpages associated with a certain website or domain
  • “intext:” to search for a specific word or phrase within online text
  • “filetype:” to search for specific documents by file type
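The operators above can be combined in a single search. The following Python sketch assembles such a query and turns it into a Google search URL. The helper names `build_query` and `search_url` are hypothetical, and the domain and search phrase are placeholders. The script only builds the text of the search; it does not contact Google.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(terms: str = "", site: Optional[str] = None,
                intext: Optional[str] = None,
                filetype: Optional[str] = None) -> str:
    """Combine plain search terms with site:, intext: and filetype: operators."""
    parts = []
    if terms:
        parts.append(terms)
    if site:
        parts.append(f"site:{site}")
    if intext:
        # Wrap multi-word phrases in quotes so they are searched as a phrase.
        parts.append(f'intext:"{intext}"' if " " in intext else f"intext:{intext}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """Return a Google search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query(site="example.com",
                        intext="settlement agreement",
                        filetype="pdf")
    print(query)
    print(search_url(query))
```

The same query text can simply be typed into the Google search box; the URL form is handy when a search needs to be documented or repeated later.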


4. Shodan

Shodan is a kind of search engine that finds devices connected to the internet. The devices can range from nuclear power plants to traffic lights to hospital heart rate monitors. What is most concerning is that many of these connected devices have minimal password protection or none at all. While Shodan can be a security liability for firms and companies that are connected to the internet, it can also be useful when conducting case research or looking for evidence.


This post was authored by Eric Pesale, an attorney who writes about e-discovery, data security and other legal topics for law firms, publications, and companies, and is the founder and chief legal contributor of Write For Law. He is a graduate of New York Law School and the University of North Carolina at Chapel Hill, and has been published in CSO, the New York Law Journal and Above the Law. Eric can be reached on Twitter at @ericpesale.