Streaming saturation for large RDF graphs with dynamic schema information (DBPL 2019)

Sat 22 - Wed 26 June 2019, Phoenix, Arizona, United States

Who

Mohammad Amin Farvardin, Dario Colazzo, Khalid Belhajjame, Carlo Sartiani

Track

DBPL 2019


When

Sun 23 Jun 2019 11:20 - 11:40 (GMT-07:00), room 106C, session "Graphs and Streams"

Abstract

In the Big Data era, RDF data, just like other kinds of data, is produced in high volumes. While there exist proposals for reasoning over large RDF graphs using big data platforms, there is a dearth of solutions that do so in environments where RDF data is dynamic, and where new instance and schema triples can arrive at any time. With this in mind, we present in this work the first solution for reasoning over large streams of RDF data using big data platforms. In doing so, we focus on the saturation operation. Unlike existing solutions, which saturate RDF data in bulk, our solution carefully identifies the subset of the existing (and already saturated) RDF dataset that needs to be considered given the RDF statements that have recently been delivered by the stream. Thereby, it performs the saturation in an incremental manner. The experimental analysis that we performed shows that our solution outperforms existing bulk-based saturation solutions, which we use as a baseline.
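The abstract does not state which inference rules are used or how the big data implementation works. The following is therefore only a minimal single-machine sketch of the general idea of incremental saturation, under loudly stated assumptions: it covers a subset of the W3C RDFS entailment rules (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, rdfs11), it uses abbreviated predicate names, and the class `IncrementalSaturator` and its methods are hypothetical. Each newly arriving triple, whether an instance triple or a schema triple, is joined only against the indexed triples that can match it, instead of re-saturating the whole graph. It is not the authors' algorithm.

```python
from collections import defaultdict

# Abbreviated predicate names (assumption: plain strings stand in for full IRIs).
TYPE, SC, SP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOM, RNG = "rdfs:domain", "rdfs:range"


class IncrementalSaturator:
    """Hypothetical helper: keeps a graph saturated under a subset of RDFS rules."""

    def __init__(self):
        self.triples = set()
        self.by_pred = defaultdict(set)  # p -> {(s, o)}
        self.by_sp = defaultdict(set)    # (s, p) -> {o}
        self.by_po = defaultdict(set)    # (p, o) -> {s}

    def _index(self, t):
        s, p, o = t
        self.triples.add(t)
        self.by_pred[p].add((s, o))
        self.by_sp[(s, p)].add(o)
        self.by_po[(p, o)].add(s)

    def _consequences(self, t):
        """Join one new triple with only the stored triples that can match it."""
        s, p, o = t
        out = []
        # Rules where t plays the role of an ordinary (instance) triple.
        out += [(s, q, o) for q in self.by_sp[(p, SP)]]           # rdfs7
        out += [(s, TYPE, c) for c in self.by_sp[(p, DOM)]]       # rdfs2
        out += [(o, TYPE, c) for c in self.by_sp[(p, RNG)]]       # rdfs3
        if p == TYPE:
            out += [(s, TYPE, b) for b in self.by_sp[(o, SC)]]    # rdfs9
        # Rules where t is a schema triple: revisit only the data it affects.
        if p == SC:
            out += [(s, SC, c) for c in self.by_sp[(o, SC)]]      # rdfs11, t on the left
            out += [(a, SC, o) for a in self.by_po[(SC, s)]]      # rdfs11, t on the right
            out += [(x, TYPE, o) for x in self.by_po[(TYPE, s)]]  # rdfs9
        elif p == SP:
            out += [(s, SP, r) for r in self.by_sp[(o, SP)]]      # rdfs5, t on the left
            out += [(q, SP, o) for q in self.by_po[(SP, s)]]      # rdfs5, t on the right
            out += [(x, o, y) for x, y in self.by_pred[s]]        # rdfs7
        elif p == DOM:
            out += [(x, TYPE, o) for x, _ in self.by_pred[s]]     # rdfs2
        elif p == RNG:
            out += [(y, TYPE, o) for _, y in self.by_pred[s]]     # rdfs3
        return out

    def add_batch(self, batch):
        """Insert one stream batch; return every triple it added (explicit or inferred)."""
        added = []
        work = list(batch)
        while work:
            t = work.pop()
            if t in self.triples:
                continue  # already known: nothing new can follow from it
            self._index(t)
            added.append(t)
            work.extend(self._consequences(t))
        return added


sat = IncrementalSaturator()
sat.add_batch([("ex:alice", TYPE, "ex:Student")])
# Schema triples arriving later still reach the already-stored instance data.
print(sorted(sat.add_batch([("ex:Student", SC, "ex:Person"),
                            ("ex:Person", SC, "ex:Agent")])))
```

The design is delta-driven (semi-naive): a triple is indexed before its consequences are computed, so for any pair of premises the one inserted second finds the other already in the indexes, and only triples sharing a class or property with the new statement are touched.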

Mohammad Amin Farvardin

PSL, Université Paris-Dauphine, LAMSADE

Dario Colazzo

Khalid Belhajjame