Real-Time Digital Twins Simplify Development

05.06.20


The Challenge: Track Thousands of Data Sources

Writing applications for streaming analytics can be complex and time-consuming. Developers need to write code that extracts patterns from an incoming telemetry stream and takes appropriate action. In most cases, applications just export interesting patterns for visualization or offline analysis. Existing programming models make it too difficult to perform meaningful analysis in real time.

This obstacle clearly presents itself when tracking the behavior of large numbers of data sources (thousands or more) and attempting to generate individualized feedback within several milliseconds after receiving a telemetry message. There’s no easy way to separately examine incoming telemetry from each data source, analyze it in the context of dynamically evolving information about the data source, and then generate an appropriate response.

An Example: Contact Self-Tracing

For example, consider the contact self-tracing application described in a previous blog post. This application tracks messages from thousands of users logging risky contacts who might transmit the COVID-19 virus to them. For each user, a list of contacts must be maintained. If a user signals the app that he or she has tested positive, the app successively notifies chained contacts until all connected users have been notified.

Implementing this application requires that state information be maintained for each user (such as the contact list) and updated whenever a message from that user arrives. In addition, when an incoming message signals that a user has tested positive, contact lists for all connected users must be quickly accessed to generate outgoing notifications.

In addition to this basic functionality, it would be highly valuable to compute real-time statistics, such as the number of users reporting positive by region, the average length of connection chains for all contacts, and the percentage of connected users who also report testing positive.
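As a rough illustration of what two of these statistics involve, here is a minimal sketch that aggregates over a collection of per-user snapshots. The `UserSnapshot` type, the `UserStatus` enum, and the `ContactStats` helper are hypothetical names introduced only for this example; they are not part of any platform API, and a real deployment would run the aggregation inside the hosting platform rather than over a local list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-user snapshot; all names here are illustrative assumptions.
public enum UserStatus { Normal, SignaledPositive, Notified }

public class UserSnapshot
{
    public string Region;
    public UserStatus Status;
    public int NumHopsIfNotified;   // chain length back to the user who reported positive
}

public static class ContactStats
{
    // Number of users reporting positive, keyed by region.
    public static Dictionary<string, int> PositivesByRegion(IEnumerable<UserSnapshot> users)
    {
        return users.Where(u => u.Status == UserStatus.SignaledPositive)
                    .GroupBy(u => u.Region)
                    .ToDictionary(g => g.Key, g => g.Count());
    }

    // Average chain length across notified users (0 if nobody has been notified).
    public static double AverageChainLength(IEnumerable<UserSnapshot> users)
    {
        var hops = users.Where(u => u.Status == UserStatus.Notified)
                        .Select(u => (double)u.NumHopsIfNotified)
                        .ToList();
        return hops.Count == 0 ? 0.0 : hops.Average();
    }
}
```

Regions with no positive reports simply do not appear in the dictionary returned by `PositivesByRegion`, so callers should use `TryGetValue` or `ContainsKey` rather than indexing directly.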

A Conventional Implementation

This application cannot be implemented by most streaming analytics platforms. As illustrated below, it requires standing up a set of cooperating services (encompassing a variety of skills), including:

  • A web service to process incoming messages (including notifications) by making calls to a backend database
  • A database service to host state information for each user
  • A backend analytics application (for example, a Spark app) that extracts information from the database, analyzes it, and exports it for visualization
  • A visualization tool that displays key statistics

Implementing and integrating these services requires significant work. After that, issues of scaling and high availability must be addressed. For example, how do we keep the database service from becoming a bottleneck when the number of users and update rate grow large? Can the web front end process notifications (which involve recursive chaining) without impacting responsiveness? If not, do we need to offload this work to an application server farm? What happens if a web or application server fails while processing notifications?

Enter the Real-Time Digital Twin

The real-time digital twin (RTDT) model was designed to address these challenges and simplify application development and deployment while also tackling scaling, high availability, real-time analytics, and visualization. This model uses standard object-oriented techniques to let developers easily specify both the state information to be maintained for each data source and the code required to process incoming messages from that data source. The rest is automatically handled by the hosting platform, which makes use of scalable, in-memory computing techniques that ensure high performance and availability.

Another big plus of this approach is that the state information for each instance of an RTDT can be immediately analyzed to provide important statistics in real time. Because this live state information is held in memory distributed across a cluster of servers, the in-memory computing platform can analyze it and report results every few seconds. There’s no need to suffer the delay and complexity of copying it out to a data lake for offline analysis using an analytics platform like Spark.

The following diagram illustrates the steps needed to develop and deploy an RTDT model using the ScaleOut Digital Twin Streaming Service™, which includes built-in connectors for exchanging messages with data sources:

Implementing Contact Self-Tracing Using a Real-Time Digital Twin Model

Let’s take a look at just how easy it is to build an RTDT model for tracking users in the contact self-tracing application. Here’s an example in C# of the state information that needs to be tracked for each user:

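using System.Collections.Generic;

public class UserTwin : DigitalTwinBase
{
    public string Alias;
    public string Email;
    public string MobileNumber;
    public Status CurrentStatus;  // Normal, SignaledPositive, or Notified
    public int NumHopsIfNotified; // hops to the user who reported positive
    // Initialized here so the message handlers can add entries without null checks:
    public List<Contact> Contacts = new List<Contact>();
    public Dictionary<string, Contact> Notifiers = new Dictionary<string, Contact>();
}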

This simple set of properties is sufficient to track each user. In addition to phone and/or email, each user’s current status (i.e., normal, reporting a positive test, or notified of a contact who has tested positive) is maintained. If the user is notified by another contact, the number of hops to the connected user who reported testing positive is also recorded to help create statistics. There’s also a list of immediate contacts, which is updated whenever the user reports a risky interaction. Lastly, there’s a dictionary of incoming notifications to help prevent sending out duplicates.
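The snippets in this post reference a few supporting types that are not shown. The following is a minimal sketch of what they might look like, inferred only from how the fields are used here. The member lists are assumptions, and the hosting platform may additionally require a base class or serialization support that this sketch omits.

```csharp
using System;

// Inferred from usage in this post; not taken from any published API.
public enum Status { Normal, SignaledPositive, Notified }

public enum MsgType { AddContact, SignalPositive, Notify }

// One entry in a user's contact list or notifier dictionary.
public class Contact
{
    public string Alias;          // alias of the other user
    public DateTime ContactTime;  // when the contact (or report) occurred
    public int NumHops;           // distance from the user who reported positive
}

// A message sent by a user's device or forwarded by another twin.
public class UserTwinMessage
{
    public string Id;             // alias of the sender
    public MsgType MsgType;       // which action the message requests
    public string ContactAlias;   // contact being added, or the root reporter
    public DateTime ContactTime;
    public int NumHops;
}
```

Because `Status.Normal` is the first enum member, a newly created twin starts in the normal state without any explicit initialization.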

The RTDT stream-processing platform automatically creates an instance of this state object for every user when an account is created. The user can then send messages to the platform to either record an interaction or report having tested positive. Here’s the application code required to process these messages:

foreach (var msg in newMessages)
{
    switch (msg.MsgType)
    {
        case MsgType.AddContact:
            newContact = new Contact();
            newContact.Alias = msg.ContactAlias;
            newContact.ContactTime = DateTime.UtcNow;
            dt.Contacts.Add(newContact);
            break;

        case MsgType.SignalPositive:
            dt.CurrentStatus = Status.SignaledPositive;

            // signal all of our contacts that we have tested positive:
            notifyMsg = new UserTwinMessage();
            notifyMsg.Id = dt.Alias;
            notifyMsg.MsgType = MsgType.Notify;
            notifyMsg.ContactAlias = dt.Alias;
            notifyMsg.ContactTime = DateTime.UtcNow;
            notifyMsg.NumHops = 0;

            foreach (var contact in dt.Contacts)
            {
                msgResult = context.SendToTwin("UserTwin", contact.Alias,
                                               notifyMsg);
            }
            break;
    }
}

Note that when a user signals that he or she has tested positive, this code sends a Notify message to all RTDT instances corresponding to users in the contact list. Handling this message type requires adding one more case to the switch statement above, as follows:

case MsgType.Notify:
    if (dt.CurrentStatus != Status.SignaledPositive)
        dt.CurrentStatus = Status.Notified;

    // if we have already heard from the root contact, ignore the message:
    if (msg.ContactAlias == dt.Alias || dt.Notifiers.ContainsKey(msg.ContactAlias))
        break;

    // otherwise, add the notifier and signal the user:
    else
    {
        if (dt.NumHopsIfNotified == 0)
            dt.NumHopsIfNotified = msg.NumHops;

        newContact = new Contact();
        newContact.Alias = msg.ContactAlias;
        newContact.ContactTime = msg.ContactTime;
        newContact.NumHops = msg.NumHops + 1;
        dt.Notifiers.Add(msg.ContactAlias, newContact);

        notifyMsg = new UserTwinMessage();
        notifyMsg.Id = dt.Alias;
        notifyMsg.MsgType = MsgType.Notify;
        notifyMsg.ContactAlias = msg.ContactAlias;
        notifyMsg.ContactTime = msg.ContactTime;
        notifyMsg.NumHops = msg.NumHops + 1;
    
        msgResult = context.SendToDataSource(notifyMsg);

        // finally, notify all our contacts except the root if it's a contact:

        foreach (var contact in dt.Contacts)
        {
            if (contact.Alias != msg.ContactAlias)
                msgResult = context.SendToTwin("UserTwin", contact.Alias,
                                               notifyMsg);
        }
    }
    break;

That’s all there is to it. The key point is that the amount of code required to implement this application is small. Compare that to standing up web, application, database, and analytics services. In addition, integrated, real-time analytics within the RTDT platform can examine state variables to easily generate and visualize key statistics. For example, the CurrentStatus property and a Region property (not shown here) can be used to determine the number of users who have tested positive in each region. Likewise, the NumHopsIfNotified property can be used to determine the average number of connections traversed to notify users.
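To see how the chained notifications terminate even when contact lists form cycles, here is a small platform-free simulation of the same logic. It is only a sketch: `ContactSimulation`, `SimTwin`, and `SimMessage` are hypothetical names, a local queue stands in for the platform's twin-to-twin message delivery, and the notifier dictionary is reduced to a set of root aliases.

```csharp
using System;
using System.Collections.Generic;

public enum SimStatus { Normal, SignaledPositive, Notified }

public class SimMessage
{
    public string Target;     // alias of the twin that should receive this message
    public string RootAlias;  // user who originally reported positive
    public int NumHops;
}

public class SimTwin
{
    public string Alias;
    public SimStatus Status = SimStatus.Normal;
    public int NumHopsIfNotified;
    public List<string> Contacts = new List<string>();
    public HashSet<string> Notifiers = new HashSet<string>();
}

public class ContactSimulation
{
    private readonly Dictionary<string, SimTwin> twins = new Dictionary<string, SimTwin>();
    private readonly Queue<SimMessage> pending = new Queue<SimMessage>();

    public SimTwin AddUser(string alias, params string[] contacts)
    {
        var twin = new SimTwin { Alias = alias };
        twin.Contacts.AddRange(contacts);
        twins[alias] = twin;
        return twin;
    }

    public SimTwin Get(string alias) { return twins[alias]; }

    // Mirrors the SignalPositive case, then drains the queue.
    // Returns the number of Notify messages delivered to known twins.
    public int SignalPositive(string alias)
    {
        var root = twins[alias];
        root.Status = SimStatus.SignaledPositive;
        foreach (var c in root.Contacts)
            pending.Enqueue(new SimMessage { Target = c, RootAlias = alias, NumHops = 0 });

        int delivered = 0;
        while (pending.Count > 0)
        {
            var msg = pending.Dequeue();
            SimTwin twin;
            if (!twins.TryGetValue(msg.Target, out twin)) continue; // unknown alias
            delivered++;
            HandleNotify(twin, msg);
        }
        return delivered;
    }

    // Mirrors the Notify case.
    private void HandleNotify(SimTwin twin, SimMessage msg)
    {
        if (twin.Status != SimStatus.SignaledPositive)
            twin.Status = SimStatus.Notified;

        // Ignore our own report, or a root we have already heard from.
        if (msg.RootAlias == twin.Alias || !twin.Notifiers.Add(msg.RootAlias))
            return;

        if (twin.NumHopsIfNotified == 0)
            twin.NumHopsIfNotified = msg.NumHops;

        // Forward to every contact except the root reporter.
        foreach (var c in twin.Contacts)
            if (c != msg.RootAlias)
                pending.Enqueue(new SimMessage { Target = c, RootAlias = msg.RootAlias, NumHops = msg.NumHops + 1 });
    }
}
```

In this sketch the duplicate check on `Notifiers` is what stops the chain: once a twin has recorded a given root reporter, any later message about that same root is dropped instead of being forwarded again, so mutual contacts cannot bounce a notification back and forth indefinitely.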

Summing Up

There’s no doubt that it’s a daunting challenge to create streaming analytics applications that track large numbers of data sources and respond individually to each one while simultaneously generating aggregate statistics that help maintain situational awareness. As we have seen, real-time digital twins can cut through this complexity and enable powerful applications to be quickly built with minimal code. This simplicity also makes them “agile” in the sense that they can be easily modified or extended to handle evolving requirements. You can find detailed information here to help you learn more.
