As we develop this e-discovery knowledge base, we will be talking a lot about people, and about how choosing the right teams and the right people can reduce cost and add value. We consider the “people” part of the equation so important that we made our first substantive post a Primer on People. But without the context of the e-discovery process, knowing the “people” part only gets you so far in understanding how to drive down costs.

Whether you are an e-discovery pro or dealing with it for the first time, a primer on the process is helpful in setting a common foundation.

Most lawyers did not receive any training in law school on how to develop a framework for discovery. Those who have developed such a framework have done so through experience.

There are some helpful resources on how to conceptualize the e-discovery process. One of the most prevalent is the Electronic Discovery Reference Model—or the EDRM framework.

The EDRM framework is nice and has a certain logic to it. And it is a good way to teach new attorneys and give them an introduction to e-discovery.

Still, while helpful, it is not very practical or actionable. In our experience, we have never seen it explicitly used to guide decisions in a case.

Instead, we’ll identify a more practical description of the e-discovery process, breaking it down into six core steps:

Step 1: Locate your relevant data.

Here you need to identify the sources of information within your possession or control. We will be talking a lot about this over time (watch for our upcoming checklist to help you methodically work through your data sources to identify the relevant data). While there are dozens of potential sources of relevant data, the most common sources include emails, hard drives, shared network drives, and the like.

In one case, we worked with the client and key fact witnesses to identify nearly 30 possibly relevant custodians, all of whom had (1) email, (2) hard drives, (3) paper documents, and (4) access to the shared drives. If each custodian had approximately 10 GB of data across these sources (a typical amount), the total data pool would be around 300 GB, the equivalent of nearly 20 million pages of data!
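The arithmetic behind that estimate can be sketched in a few lines of Python. The pages-per-gigabyte ratio is simply back-calculated from the figures in this example (300 GB to roughly 20 million pages). It is an assumption for illustration only, and real ratios vary widely by file type. The function name is hypothetical.

```python
# Rough data-volume estimate for the 30-custodian example.
# ASSUMPTION: pages per GB is back-calculated from "300 GB is about 20 million pages";
# actual page counts depend heavily on the mix of file types.
PAGES_PER_GB = 20_000_000 / 300


def estimate_volume(custodians: int, gb_per_custodian: float) -> tuple[float, float]:
    """Return (total_gb, estimated_pages) for a set of custodians."""
    total_gb = custodians * gb_per_custodian
    return total_gb, total_gb * PAGES_PER_GB


total_gb, pages = estimate_volume(custodians=30, gb_per_custodian=10)
print(f"{total_gb:.0f} GB, roughly {pages / 1_000_000:.0f} million pages")
```

Changing the custodian count or the per-custodian volume shows how quickly the starting pool grows, which is why the later culling steps matter so much.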

This data is your starting point. What you do with that data starts next.

Step 2: Preserve your data.

Whatever information you have identified for each custodian in Step 1, you generally need to take steps to preserve it from destruction. The federal (and, increasingly, the state) rules of procedure provide a proportionality framework from which you can determine which information needs to be preserved in light of the burden/cost of preservation. You may want to develop strategies not only for the preservation of data that is relevant to your particular case but also as part of a policy of data retention for your business as a whole. (A company-wide data retention policy will also likely make the location of data in Step 1 an easier process.) Ultimately, you want to create a decision-making matrix you can apply to your preservation decisions. (We’ll share our matrix with you in a future post.) In general, you should plan to preserve the relevant custodians’ email, hard drives, and hard-copy documents.

We will talk more about preservation of data down the road, but in the above example we did the following for each of our 30 custodians:

  • We turned off the company’s automatic email destruction policy (pursuant to which emails were automatically purged after 2 years);
  • We forensically* collected and stored .pst files from their email;
  • We took a forensic image of each of their hard drives;
  • We forensically imaged the information from the shared drives; and
  • We scanned their hard copy documents.

(*By “forensic” we mean a professional process that maintains the metadata and other key attributes of the data in a manner that is defensible in court if the collection process is challenged. We’ll discuss this process further in future posts as well.)

Step 3: Pull data from Step 1 into your data stack.

While 20 million pages of data is a lot, for a variety of reasons, it is unlikely that you will need to produce every page.

It may be that the other side asks for only a portion of the broader set of information you collected. Or, as the case proceeds, perhaps much of the data ultimately turns out to be irrelevant. In either case, as you move through the process you should be thinking about how to identify the subset of information that you may need to produce to the other side or that you want to use yourself.

You can identify this subset through many means. First, you may limit the custodians to just the most relevant. Second, you can use search terms to isolate a more relevant subset of the emails and other documents. Third, you may use a contextual analysis to identify portions of the shared network drive that are not immediately relevant or responsive (and therefore exclude that portion from your data set). You likely will do all three of these things, and more.
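As a minimal sketch of the first two techniques (limiting custodians and applying search terms), the Python below filters a small, made-up document set. The `Document` class, the `cull` function, and the sample data are all hypothetical. Real review platforms use indexed search with Boolean and proximity operators rather than the plain substring matching assumed here.

```python
from dataclasses import dataclass


@dataclass
class Document:
    custodian: str
    text: str


def cull(documents, key_custodians, search_terms):
    """Keep documents from key custodians that hit at least one search term."""
    terms = [term.lower() for term in search_terms]
    kept = []
    for doc in documents:
        if doc.custodian not in key_custodians:
            continue  # dropped by the custodian limit
        body = doc.text.lower()
        if any(term in body for term in terms):
            kept.append(doc)  # case-insensitive search-term hit
    return kept


# Hypothetical sample data, for illustration only.
docs = [
    Document("alice", "Draft supply agreement attached for review"),
    Document("alice", "Lunch on Friday?"),
    Document("bob", "Comments on the supply agreement"),
]
stack = cull(docs, key_custodians={"alice"}, search_terms=["supply agreement"])
print(len(stack), "of", len(docs), "documents remain")
```

Whatever tooling you use, the limits you apply need to be defensible, so it helps to record which custodians and terms were used and why.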

The bottom line is that the more you can defensibly limit the set of potentially responsive information, the fewer documents you need to review, the less data you need to process, store, and host, and ultimately the lower the cost of your discovery process.

After completing Steps 1-3, you have created a subset of data (your “data stack”) that you will review and potentially produce.

Step 4: Review your data stack to identify a smaller responsive/useful stack.

Step 4 often is the largest cost of e-discovery.  It is also the most important.

For example, say you are able to reduce your original data stack of 300 GB to a smaller set of just 30 GB of data. That still translates to potentially 2 million pages or up to 750,000 documents.

The first thing you will need to do is process and host the data. For this, you might retain a technology vendor or a managed services provider (which we discussed in our Primer on People).

After you have the data in the database—using a software platform such as Relativity—then you can review the documents and code them for responsiveness, privilege, and the like.

Under a typical approach, you might first use standard technology tools to further cull the dataset above (e.g., by removing duplicate documents or by applying email threading so that only the most inclusive iterations of an email chain are left for review). Through the use of such tools, it is not uncommon to reduce the document count by one-third or more.
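To make the de-duplication idea concrete, here is a minimal sketch that removes exact duplicates by hashing each document's text. It is an illustration under simplifying assumptions, not a description of how Relativity or any other platform works: production tools typically hash file contents and selected metadata, and email threading (not shown) is considerably more involved. The function name and sample strings are hypothetical.

```python
import hashlib


def deduplicate(documents):
    """Return documents with exact-duplicate text removed, keeping the first copy."""
    seen = set()
    unique = []
    for text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


# Hypothetical sample data, for illustration only.
sample = ["Q3 forecast attached", "Q3 forecast attached", "Please call me"]
unique_docs = deduplicate(sample)
print(len(sample), "->", len(unique_docs))
```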

Then you might set up a team of first-pass reviewers to go document by document to identify documents that may be responsive to whatever document requests have been served on you by the other side.

At 50-70 documents/hour, a fairly typical pace for most document reviewers, going through what remains (say, 375,000 documents) takes roughly 5,400 to 7,500 attorney-hours. The cost, even at a reduced hourly rate, could easily spill well into the six figures just for the first-pass review.

This is where expert discovery counsel can work magic. Expert discovery counsel can slice and dice the data and review it far more effectively than the traditional linear review process often favored by BigLaw or managed review firms. Instead of 50-70 documents/hour, expert discovery counsel can (and we often do) achieve effective rates of over 200 documents/hour.
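A quick calculation shows how much the review pace matters for the 375,000-document example. The hourly rate below is a purely hypothetical placeholder, not a figure from our cases, and the same rate is applied at every pace only to isolate the effect of speed. In practice, rates differ between contract reviewers and expert discovery counsel.

```python
def review_estimate(documents: int, docs_per_hour: float, hourly_rate: float) -> tuple[float, float]:
    """Return (hours, cost) for a first-pass review at a given pace and rate."""
    hours = documents / docs_per_hour
    return hours, hours * hourly_rate


DOCUMENTS = 375_000
HOURLY_RATE = 50.0  # ASSUMPTION: hypothetical rate, for illustration only

for pace in (50, 70, 200):
    hours, cost = review_estimate(DOCUMENTS, pace, HOURLY_RATE)
    print(f"{pace:>3} docs/hour: {hours:,.0f} hours, ${cost:,.0f}")
```

At 200 documents/hour the same set takes 1,875 hours rather than 5,400 to 7,500, so the hours fall to roughly a quarter to a third of the linear-review figure.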

The hosting, processing, and, especially, the review, are often collectively the biggest cost of e-discovery.

Given its importance, we will be focusing quite a bit on Step 4 throughout this knowledge base.

No matter how you approach Step 4, you will have a reduced pile of data. What you do with it is in Steps 5 and 6.

Step 5: Produce data from the reduced pile from Step 4.

Once you have reviewed and collected your set of responsive documents, you need to “produce” them to the other side.

Before doing so, you must do several things:

  • Ensure you will produce only the information that is relevant/responsive and that you agreed to produce;
  • Properly withhold privileged information;
  • Properly code/label documents, including, where necessary, with a confidentiality designation under an assumed protective order so that the documents are protected from disclosure beyond an agreed group of people.
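The checks above can be pictured as a simple filter over the coded documents. The sketch below is only an illustration: the `ReviewedDoc` fields, the `build_production` function, the "CONFIDENTIAL" label, and the sample IDs are all hypothetical, and an actual production also involves privilege logs, numbering, and format requirements that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class ReviewedDoc:
    doc_id: str
    responsive: bool
    privileged: bool
    confidential: bool


def build_production(docs):
    """Split coded documents into a production set and a privilege-withheld list."""
    production, withheld = [], []
    for doc in docs:
        if not doc.responsive:
            continue  # non-responsive documents are not produced
        if doc.privileged:
            withheld.append(doc.doc_id)  # responsive but withheld as privileged
            continue
        label = "CONFIDENTIAL" if doc.confidential else ""
        production.append((doc.doc_id, label))
    return production, withheld


# Hypothetical sample data, for illustration only.
coded = [
    ReviewedDoc("DOC-001", responsive=True, privileged=False, confidential=True),
    ReviewedDoc("DOC-002", responsive=True, privileged=True, confidential=False),
    ReviewedDoc("DOC-003", responsive=False, privileged=False, confidential=False),
]
production, withheld = build_production(coded)
print("Produce:", production)
print("Withhold:", withheld)
```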

What’s the one thing that often gets lost in this process? See Step 6.

Step 6: Use your data to win the case.

One frequently forgotten step in the discovery process is making sure you are identifying documents that help your case! This means not just flagging a “hot” document that cuts against you, but actively looking for the documents and contextual knowledge that tell you who the players are, what context and nuances informed important decisions or key points along the case’s timeline, and what other previously unknown facts can help you win on a dispositive motion or at trial.

While the value of this type of review often does not show up in the bill, it will show up on the scoreboard. You want to make sure you have a review team that you can trust to put together the work product that catches these critically helpful items.


Ultimately, the point of discovery is to identify both the documents that you are obligated to produce to the other side and the documents that give you an understanding of the facts necessary to win the case.

In our next post we will start to put these concepts together, and we will talk about some of the hidden cost drivers of e-discovery.  If you have any questions about this post or any in the series, or if we can help with an upcoming review or discovery project, please contact us via [email protected].