Working with Big Data in Law Enforcement

Tuesday, June 12, 2018

One of the core business processes of law enforcement is converting information into real intelligence to predict, solve, and in some cases prevent crime. On its own, information is simply data without the context needed to understand how to apply it to a problem. Applying processes like data mining, data analysis, and data profiling enables a policing organization to use that information to make operational, strategic, and tactical decisions.

Data Mining

Data mining is the process of using machine learning, statistics, and artificial intelligence to search through large data sets in order to identify patterns and systematic relationships. This process is used to solve problems and predict future activities or trends. One of the main challenges in Big Data is that the sheer volume of available information makes it difficult to capture, store, share, and transfer. This is evident in law enforcement, where the number of tips, complaints, reports, and other public safety records that agencies handle on a regular basis is enormous. In response to the Laci Peterson disappearance back in 2003, it was reported that the police received over 2,600 tips in less than 17 days. During the D.C. sniper investigation, the police received an average of 1,000 tips per day. This volume is what makes law enforcement a suitable field for applying data mining.

Data mining can take this vast amount of information and identify characteristics of criminal behavior by extracting the important entities that are used to produce useful intelligence. In the examples above, text from investigative tips is selected, reviewed, and characterized based on common elements and patterns. When this grouping is performed as an unsupervised learning method, it is referred to as clustering.
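
To make the idea of clustering tips more concrete, here is a minimal sketch in Python. It is not how any particular agency does it: the sample tips are invented, the stop-word list is a tiny stand-in, and the greedy Jaccard-similarity grouping (with an assumed threshold of 0.3) is just one simple unsupervised approach among many.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "near", "was", "and", "with"}

def tokens(text):
    """Lowercase a tip and return its set of non-stop-word tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def jaccard(a, b):
    """Share of tokens two tips have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_tips(tips, threshold=0.3):
    """Greedy single-pass clustering: a tip joins the first cluster
    whose first (seed) tip is similar enough, else starts a new one."""
    clusters = []  # each cluster is a list of (tip, token_set) pairs
    for tip in tips:
        toks = tokens(tip)
        for cluster in clusters:
            if jaccard(toks, cluster[0][1]) >= threshold:
                cluster.append((tip, toks))
                break
        else:
            clusters.append([(tip, toks)])
    return [[tip for tip, _ in cluster] for cluster in clusters]

# Invented example tips, not taken from any real investigation.
tips = [
    "White van seen near the gas station",
    "Saw a white van parked at the gas station",
    "Man in red jacket running on Main Street",
    "Red jacket suspect seen on Main Street",
]
groups = cluster_tips(tips)
for group in groups:
    print(group)
```

With these four tips, the two van sightings land in one group and the two red-jacket sightings in another. A real system would likely use weighted terms and a more robust clustering algorithm, but the principle of grouping by shared elements is the same.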

Data mining is sometimes described as a virtuous cycle of iterative learning. The first step is identifying the business problem. For law enforcement, this could be predicting crimes, offenders, perpetrator identities, or victims by identifying areas at increased risk. These predictions can also be made by identifying persons considered at risk of a crime, such as domestic violence. As part of this step, it must be determined whether data mining is actually necessary to solve the problem. For example, it is widely accepted that all school zones should have a restricted speed limit. One would not need to mine data on how many vehicular accidents involving child pedestrians occur in these areas to decide which zones should be restricted and which shouldn’t.

Most of the work goes into transforming the data into information that leads to actionable rules through the discovery of patterns. It is important to define the data required to solve the business problem. For example, identifying the areas of a city that are at risk of criminal activity requires historical crime data and 911 call records. Additional data, such as text from tip databases, may be needed in order to tie back to actual criminal activities. For predictive modeling, the data should contain the outcome being predicted. For example, to predict which people engaged in high-risk activity might become victims, it is imperative to have the criminal records of repeat offenders. It is also during this step that the data needs to be validated, explored, and cleaned in order to build the data mining model.

Next, the results of the data model need to be acted upon. In law enforcement, this could be crime mapping, where at-risk hot spots are identified geographically on a map. It could also mean making smarter personnel assignments so that officers are deployed where they are most needed. For example, Richmond, VA, experiences an increase in gunfire every New Year’s Eve. Using data models built from reported random shootings and their locations, the police deployed additional manpower to the affected areas on the following New Year’s Eves. As a result, there was a “47% decrease in random gunfire and 246% increase in the number of weapons seized”, and a cost saving in personnel of $15,000. This is an example of obtaining the correct predictions from a predictive data mining model and then acting on them by intervening to reduce criminal behavior.
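
As a rough illustration of how hot spots can be found from incident locations, the sketch below bins latitude/longitude points into square grid cells and ranks the cells by incident count. The coordinates are invented for the example and are not real incident data, and the cell size of 0.01 degrees is an arbitrary assumption. Real crime mapping tools typically use proper GIS projections and density estimation.

```python
import math
from collections import Counter

def hot_spots(incidents, cell_size=0.01, top_n=3):
    """Bin (lat, lon) points into square grid cells and return the
    top_n busiest cells as ((row, col), count) pairs."""
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in incidents
    )
    return counts.most_common(top_n)

# Invented coordinates for illustration only.
incidents = [
    (37.5412, -77.4361),
    (37.5418, -77.4365),
    (37.5415, -77.4369),
    (37.5533, -77.4603),
    (37.5537, -77.4608),
    (37.5291, -77.4112),
]
top = hot_spots(incidents, top_n=2)
for cell, count in top:
    print(cell, count)
```

The busiest cells are the candidates for extra patrols. Choosing the cell size is a trade-off: cells that are too large hide detail, and cells that are too small spread the same cluster across several neighbors.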

Lastly, the results should be measured in order to assess the impact of the action that was taken. This step of the cycle often yields more data for additional data mining efforts. People who commit crimes learn over time and may change the way they operate: they can become more sophisticated in how they carry out their crimes or find smarter ways to elude the police. This can reduce the effectiveness of the algorithms used in the data mining process, which identifies a new problem and starts the cycle all over again.
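
Measuring impact often comes down to comparing a count before and after an intervention. The helper below is a small sketch of that percent-change arithmetic. The function name and the counts are hypothetical and are not the figures behind the Richmond results.

```python
def pct_change(before, after):
    """Percent change from a baseline count to a follow-up count.
    Negative means a decrease, positive means an increase."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return (after - before) / before * 100

# Hypothetical counts for illustration only.
gunfire_change = pct_change(80, 60)     # fewer incidents after the intervention
seizure_change = pct_change(50, 125)    # more seizures after the intervention
print(gunfire_change, seizure_change)
```

Note that an increase of more than 100% simply means the follow-up count is more than double the baseline.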

Data Analysis

Data analysis is the component of data mining that transforms data into actionable insight. The process includes preparing the data by inspecting, cleaning, and modeling it to extract useful information that can influence decision making. To do this effectively, Gwen Shapira, a system architect at Confluent, outlined seven key steps in the data analysis process. Although law enforcement may have some unique core business processes and goals, these steps can be applied to analyzing crime-related data.

The first step in the process, even before data is captured, is to determine the business objectives and a measurable way to know whether they are being met. Although law enforcement agencies may have several objectives and goals, their primary objective is the prediction, prevention, and solving of criminal activity, as well as the apprehension of persons who commit crimes.

The second step, according to Shapira, is to identify business goals, metrics, and levers that provide direction and help avoid analysis that does not serve the objectives. Amid negative news headlines about police brutality, a goal for a law enforcement agency could be to monitor use-of-force incidents. Metrics could include the locations where use-of-force incidents occur, the officer’s history of prior use-of-force incidents, and the number of use-of-force incidents across the officer’s entire unit for comparison.
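
One way such a metric could be computed is to count incidents per officer and compare each count to the average for that officer's unit. The sketch below assumes a hypothetical record format of (unit, officer ID) pairs and an arbitrary flagging ratio of 2.0. Both are illustrative assumptions, not an established standard.

```python
from collections import Counter
from statistics import mean

def compare_to_unit(incidents, ratio=2.0):
    """incidents: list of (unit, officer_id) pairs, one per incident.
    Returns (unit, officer_id, count, unit_mean) for officers whose
    count is at least `ratio` times their unit's mean count."""
    per_officer = Counter(incidents)  # (unit, officer_id) -> count
    by_unit = {}
    for (unit, officer), n in per_officer.items():
        by_unit.setdefault(unit, {})[officer] = n
    flagged = []
    for unit, counts in by_unit.items():
        unit_mean = mean(counts.values())
        for officer, n in counts.items():
            if n >= ratio * unit_mean:
                flagged.append((unit, officer, n, unit_mean))
    return flagged

# Invented records for illustration only.
records = [
    ("unit_a", "o1"), ("unit_a", "o1"), ("unit_a", "o1"), ("unit_a", "o1"),
    ("unit_a", "o2"), ("unit_a", "o3"),
    ("unit_b", "o4"), ("unit_b", "o5"),
]
flagged = compare_to_unit(records)
print(flagged)
```

One limitation of this sketch is that officers with no incidents never appear in the records, so the unit mean is computed only over officers with at least one incident. A fuller version would bring in a unit roster and, as the next step suggests, the types of calls each officer was dispatched to.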

Collecting data from diverse sources is another step in data analysis. This makes it possible to find better correlations when building the data models, which in turn produces better actionable insight. Continuing the example from the second step, the agency would want not only the use-of-force records for the officer in question but also data on the types of dispatched calls, by offense, to better relate each incident to the acceptable level of force.

Validating and cleaning the data is an important step because junk data will generate the wrong results. This can be challenging depending on how the information is captured. The main difficulty is working with data where free text is allowed, such as tip databases and incident reports. In law enforcement and intelligence systems, information often resides in unstructured or narrative formats. To clean and validate such data, analysis methods and tools are used to increase its regularity, for example by fixing typos and removing stop words, and to add structure, for example by tagging words.
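
The cleaning operations mentioned above (fixing typos, removing stop words, and tagging words) can be sketched in a few lines of Python. The stop-word list, the typo corrections, and the tag lexicon here are all small hypothetical stand-ins. A production pipeline would use much larger, domain-specific resources.

```python
import re

# Hypothetical lookup tables for illustration only.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "was", "and"}
CORRECTIONS = {"vehical": "vehicle", "suspct": "suspect"}
TAG_LEXICON = {"blue": "COLOR", "vehicle": "VEHICLE", "suspect": "PERSON"}

def clean(text):
    """Lowercase, split into words, fix known typos, drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    words = [CORRECTIONS.get(w, w) for w in words]
    return [w for w in words if w not in STOP_WORDS]

def tag(words):
    """Attach a category tag to each known word; 'O' means untagged."""
    return [(w, TAG_LEXICON.get(w, "O")) for w in words]

cleaned = clean("The suspct fled in a blue vehical.")
tagged = tag(cleaned)
print(cleaned)
print(tagged)
```

After this pass, the narrative sentence has become a short list of regular, tagged terms that downstream models can count and compare.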

The next step is building data models that associate the data with core business processes in order to predict outcomes. In law enforcement, a crime linking model enables investigators to group information from crime scenes to build a criminal profile and eventually strengthen the case against a person of interest. Shapira suggests creating data science teams that focus on streaming the data through these models and formatting the results as reports and dashboards.

Lastly, these processes are meant to be repeated, since repetition leads to continuous improvement. The driving levers can be modified based on the results, which need to be measured by the team performing the analysis. After multiple iterations, the data should generate more accurate predictions, enabling better decisions for the law enforcement agency.

Data Profiling

Data profiling collects summary information about the data to determine its accuracy and completeness in order to improve the quality of the information. It provides an orderly, repeatable, consistent, and measurable way to evaluate data. There are two types of data profiling: one based on a sample, and the other based on profiling the data “in place”.

In law enforcement, data is collected from many different sources, for example to determine the areas most at risk of criminal activity. These sources could include 911 calls, arrest and incident reports, records, and tip databases. Sample-based profiling would take a representative sample of this data, such as certain offenses, a specific date range, or even a specific time of day. The second type of profiling would simply query all of the collected data. Because of the enormous amount of data captured in law enforcement, data profiling helps to organize the data and uncover inconsistencies that might cause problems. For example, years ago I worked with a database that stored information about arrest incidents and allowed free text in the city field. In reviewing the data, it was discovered that the city of Chicago was spelled several different ways. This distorted the statistics pulled to reflect crimes that had occurred within the city. As a result, it was determined that a standardization rule was required.
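
The Chicago example can be reproduced in miniature. The sketch below first profiles a free-text city column (row count, completeness, distinct spellings) and then applies a simple standardization rule using fuzzy matching from Python's standard `difflib` module. The sample values, the list of canonical city names, and the 0.8 similarity cutoff are assumptions made for illustration. This is not the rule that was actually adopted.

```python
import difflib
from collections import Counter

# Hypothetical reference list of accepted city names.
CANONICAL = ["Chicago", "Evanston", "Cicero"]

def profile(values):
    """Summarize a column: row count, completeness, distinct values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    return {
        "rows": len(values),
        "completeness": len(non_empty) / len(values) if values else 0.0,
        "distinct": Counter(non_empty),
    }

def standardize(value, cutoff=0.8):
    """Map a free-text city to its closest canonical name, if any.
    Values with no sufficiently close match are returned unchanged."""
    candidate = value.strip().title()
    match = difflib.get_close_matches(candidate, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else value

# Invented column values for illustration only.
raw = ["Chicago", "chicago", "CHICAGO ", "Chcago", "Chicgo", ""]
before = profile(raw)
cleaned = [standardize(v) for v in raw]
after = profile(cleaned)
print(len(before["distinct"]), "spellings before,", len(after["distinct"]), "after")
```

Profiling before and after the rule shows whether it worked: the number of distinct spellings should collapse to one, while the completeness figure stays the same because the empty value is left alone for a person to review.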

Challenges and Risks of Data Mining, Data Analysis, and Data Profiling

One of the challenges encountered when using data mining in law enforcement is the use of free text fields. Converting these unstructured values into data mining variables can, however, be accomplished with a predefined dictionary and a coding manual specific to the industry. Text mining, in this case, would be done at the level of either a single word or a sentence. The manual would contain explicit rules and instructions for associating terms and phrases with categories, which would then be used to count how often those terms or phrases appear in the free text.
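
A dictionary-based coding step like this could look something like the following sketch. The categories and phrases in the coding manual are invented placeholders. A real manual would be far larger and maintained by subject-matter experts.

```python
import re
from collections import Counter

# Hypothetical coding manual: category -> terms and phrases.
CODING_MANUAL = {
    "weapon": ["gun", "knife", "shots fired"],
    "vehicle": ["car", "van", "truck"],
}

def code_text(text):
    """Count how many times each category's terms appear in free text.
    Matching is case-insensitive and on whole words or phrases."""
    lowered = text.lower()
    counts = Counter()
    for category, phrases in CODING_MANUAL.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[category] += len(re.findall(pattern, lowered))
    return counts

# Invented narrative for illustration only.
narrative = "Shots fired from a white van; a second van left and a gun was recovered."
coded = code_text(narrative)
print(dict(coded))
```

The resulting per-category counts are the structured variables that a data mining model can work with, in place of the original narrative.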

Public versus private data issues can pose a risk when analyzing crime data, because the analysis may infringe on legal boundaries and privacy. Sex offender information, for example, is commonly shared with different jurisdictions as well as with public entities, such as schools, that would be affected by the presence of a registered sex offender in their area. The registry is therefore made public to warn the community, but the identity of the victim is not made public and cannot be data mined by entities outside of law enforcement. The same goes for records of crimes committed by juveniles, which are not made available to all agencies.

As discussed earlier, the volume of information available to law enforcement can be a challenge in itself, as can the variety of data sources. When analyzing large amounts of data, it is important to apply the correct type of data mining technique and to define the less obvious relationships within the data. Techniques that use data clustering to categorize crimes can help with crime pattern recognition.

Big data has become an essential partner in helping the police do their job. Whether it’s used in “predictive policing” to help deploy resources to areas deemed at risk, or to help solve a crime committed by a repeat offender, the way data resources are analyzed, profiled, and mined can provide the necessary intelligence. However, data mining must be used appropriately, and it is important that police personnel are educated on how it should be used.