Technologies and Analyses in CBS’ Person of Interest

Person of Interest is a broadcast television show on CBS where a “machine” predicts the person most likely to die within 24-48 hours. Then it’s up to a mercenary and a data scientist to find that person and help them escape their fate. A straightforward plot, really, but not so simple in terms of the technologies and analyses behind the scenes that could make a modern-day prediction machine a reality. I have taken the liberty of framing some components that could be part of such a project. Can you help discover more?

In Person of Interest, “the machine” delivers either a single name or a group of names predicted to meet an untimely death. However, in order to predict such an event, the machine must collect and analyze reams of big data and then produce a result set, which is delivered to “Harold” (the computer scientist).

In real life, such an effort would be a massive undertaking on a national basis, much less by state or city. However, let’s set aside the enormity (and plausibility) of such a scenario and instead see if we can identify various technologies and analyses that could make a modern-day “Person of Interest” a reality.

It is useful to think of this analytics challenge in terms of a framework: data sources, data acquisition, data repository, data access and analysis, and finally, delivery channels.

First, let’s start with data sources. In Person of Interest, the “machine” collects data from various sources: cameras (images, audio and video), call detail records, voice (landline and mobile), GPS for location data, sensor networks, and text sources (social media, web logs, newspapers, the internet, etc.). Data sets stored in relational databases, both publicly available and not, might also be used for predictive purposes.

Next, data must be assimilated or acquired into a data management repository (most likely a multi-petabyte bank of computer servers). If data are acquired in near real time, they may go into a data warehouse and/or Hadoop cluster (perhaps cloud based) for analysis and mining purposes. If data are analyzed in real time, it’s possible that complex event processing (CEP) technologies (i.e., in-memory stream processing) are used to analyze data “on the fly” and make instant decisions.
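To make the CEP idea concrete, here is a minimal in-memory sketch of a sliding-window event processor. Everything here is hypothetical for illustration: the event types ("camera", "cdr"), the ten-minute window, and the rule that fires when two event types co-occur for the same subject are all invented, not taken from the show or any real system.

```python
from collections import deque
from datetime import datetime, timedelta

class SlidingWindowCEP:
    """Minimal complex-event-processing sketch: hold recent events
    in memory and fire a rule the instant a pattern completes."""

    def __init__(self, window=timedelta(minutes=10)):
        self.window = window
        self.events = deque()  # (timestamp, event_type, subject)

    def ingest(self, ts, event_type, subject):
        self.events.append((ts, event_type, subject))
        # Expire events that have slid out of the time window.
        while self.events and ts - self.events[0][0] > self.window:
            self.events.popleft()
        return self._check_rule(subject)

    def _check_rule(self, subject):
        # Hypothetical rule: the same subject appears on a camera AND
        # in a call detail record within the window -> flag them.
        types = {etype for (_, etype, subj) in self.events if subj == subject}
        return "flag" if {"camera", "cdr"} <= types else None

cep = SlidingWindowCEP()
t0 = datetime(2012, 1, 1, 12, 0)
cep.ingest(t0, "camera", "X")
result = cep.ingest(t0 + timedelta(minutes=3), "cdr", "X")
print(result)  # -> flag
```

The point of the design is that the decision is made “on the fly,” inside the ingest path, rather than after the data lands in a warehouse or Hadoop cluster.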

Analysis can be done at various points—during data streaming (CEP), in the data warehouse after data ingest (which could take just a few minutes), or in Hadoop (batch processed). Along the way, various algorithms may be running which perform functions such as:

  • Pattern analysis – recognizing and matching voice, video, graphics, or other multi-structured data types. Could be mining both structured and multi-structured data sets.
  • Social network (graph) analysis – analyzing nodes and links between persons. Possibly using call detail records, web data (Facebook, Twitter, LinkedIn and more).
  • Sentiment analysis – scanning text to reveal meaning, as when someone says, “I’d kill for that job” – do they really mean they would murder someone, or is it just a figure of speech?
  • Path analysis – what are the most frequent steps, paths and/or destinations by those predicted to be in danger?
  • Affinity analysis – if person X is in a dangerous situation, how many others just like him/her are also in a similar predicament?

It’s also possible that an access layer is needed for BI-style reporting, dashboards, or visualization techniques.

Finally, the result set—in this case, the name of the person “the machine” predicts is most likely to be killed in the next twenty-four hours—could be delivered to a device in the field: a mobile phone, tablet, computer terminal, etc.

These are just some of the technologies that would be necessary to make a “real life” prediction machine possible, just like in CBS’ Person of Interest. And I haven’t even discussed networking technologies (internet, intranet, compute fabric, etc.) or middleware that would also factor into the equation.

What technologies are missing? What types of analysis are also plausible to bring Person of Interest to life? What’s on the list that should not be? Let’s see if we can solve the puzzle together!



  1. Please forgive spelling or grammatical mistakes. This is just the unstructured stuff that went through my mind when I saw the show. I plan to build my own version, to be operational within the next 80 years (I have neither the time nor the resources at this moment):

    Recognize people (faces, emotions, gestures, stress/frustration, predictability, drive and will)
    and places (land, water, airspace, buildings, structures, tunnels, facilities).
    Understand natural language, commands, offers, options; break and make sentences, paragraphs and large texts; connect past mentions to recent data; rate relevance of data to the current situation.

    Study, extrapolate, think.

    Find solutions; store and categorize data; link data to one another / find relations between people, places, data.

    Ponder options; rate feasibility, success rate and importance; plan for contingencies.

    Predict outcome, test outcome, learn from a wrong outcome and change.

    Rate confidence in the predicted outcome; use confidence to control the degree of change in the prediction.

    Learn to categorize data into different sets based on source and relation.

    Math and physics; statistics for prediction.

    Drop and grasp data from related data while thinking toward a solution; relate data constantly.

    Run ? to see from a different point of view, that of the object.

    Hack through preexisting channels.

    Update status. Save data on 2 drives. Drive 1 holds unmodified, unlinked raw data files stored under categories of place and time, accessible through drive 2 by relation to data; constantly review data in drive 1 to keep logic and prediction factually accurate.
    Drive 2 consists of modified data: related, linked, emphasized, edited, cleared, extrapolated and otherwise treated data, plus models and collages.

    • Aves, very ambitious! Not sure if you were joking about creating such a system in 80 years. Interestingly enough, the technologies to do much of what you plan are available today. It’s not inconceivable that such a “machine” could be made to work in the next 3-5 years. And that is a scary thought.

      Thanks for commenting!

  2. In a recent episode, an anti-virus program was unleashed to find the “machine code” and couldn’t find it, so the machine’s software is not in any computer registry, right? So all these technologies discussed would have to be recreated in a way that runs without any of the host computers knowing it.

  3. Steve K, thanks for commenting. Anti-virus software scans to keep bad code out; it’s probably not going to find any “machine code.” Using today’s technologies, like Hadoop and associated Apache projects, the vision outlined in POI isn’t far-fetched at all. In fact, it’s a reality in today’s government agencies.
