CEDAR team, INRIA and LIX

CEDAR is a joint team between Inria Saclay and LIX (CNRS – UMR 7161 and Ecole Polytechnique), focused on rich data analytics at cloud scale.

CEDAR research is organized along two main themes:

- We seek to exploit parallel data processing infrastructures to build highly scalable Big Data storage and processing tools. In this area, we investigate several topics, including heterogeneous hybrid stores, efficient query answering, and algorithms for scalable, fast data analytics.

- To enhance the usefulness of Big Data, we study new paradigms of user interaction with Big Data, based on exploratory querying, analytics for semantic graphs, and intuitive query tools over highly heterogeneous data.

We collaborate with the INRIA teams ILDA@Saclay, TYREX@Grenoble, Graphik and Zenith@Montpellier, and also with colleagues from U. Rennes 1, Ecole Polytechnique, Télécom ParisTech, LIMSI (U. Paris Sud), CentraleSupélec, and INSA Lyon. International partners include UC San Diego, AT&T Research, U. Dresden and U. Bolzano. Current industrial partners include SemSoft, Business & Decision and Le Monde. The INRIA predecessor of CEDAR is the OAK project-team, joint between INRIA and Université Paris-Sud (2012-2015).

21/11/2017

Our former and current PhD students, Julien Leblay and Maxime Buron, at in Singapore

25/08/2017

The poster paper “A Framework for Efficient Representative Summarization of RDF Graphs” by Šejla Čebirić, François Goasdoué and Ioana Manolescu has been accepted at the International Semantic Web Conference.

25/08/2017

The demo paper “Dagger: Digging for Interesting Aggregates in RDF Graphs” by Yanlei Diao, Ioana Manolescu and Shu Shang has been accepted at the International Semantic Web Conference.

11/04/2017

[CEDAR Seminar] Šejla Čebirić will give a talk on Friday, April 14, at 2 pm, in the Grace Hopper room of the Turing Building.

Title:
Structural and Semantic Summarization of RDF Graphs

Abstract:
The Resource Description Framework (RDF) is the W3C’s graph data model for Semantic Web applications. RDF graphs are often large and heterogeneous, so users may find it hard to become familiar with the structure and semantics of a graph.
We consider the problem of automatically building, without user input, compact RDF graph summaries that represent the complete structure and semantics of the graph and are representative and accurate for a large, useful dialect of SPARQL. From an RDF graph, we build a summary of both its explicit and its implicit data (the latter is due to RDF semantic constraints); a summary for which this is possible is termed complete. We introduce four novel summaries and show that two of them are complete. Further, we provide a sufficient condition for RDF summarization completeness, and show that the previously studied bisimulation summaries satisfy this condition.
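To make "implicit data" and "bisimulation summaries" concrete, here is a minimal Python sketch under simplifying assumptions: triples are plain string tuples, saturation applies only the two RDFS rules for `rdfs:subClassOf` (rdfs9 and rdfs11), and the summary is the quotient of the saturated graph by forward bisimulation, computed by naive partition refinement. It is not one of the four summaries from the talk; the data and function names are illustrative only.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def saturate(triples):
    """Return the triples plus the implicit ones entailed by rdfs9 and rdfs11."""
    graph = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in graph if p == SUBCLASS]
        new = set()
        for c, d in subclass_pairs:
            for s, p, o in graph:
                if o == c and p == RDF_TYPE:
                    new.add((s, RDF_TYPE, d))  # rdfs9: instances of c are instances of d
                elif o == c and p == SUBCLASS:
                    new.add((s, SUBCLASS, d))  # rdfs11: subClassOf is transitive
        if new <= graph:
            return graph
        graph |= new


def forward_bisimulation(graph):
    """Map each node to a block id; nodes in the same block are forward-bisimilar."""
    nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}
    out = defaultdict(set)
    for s, p, o in graph:
        out[s].add((p, o))
    block = {n: 0 for n in nodes}  # start from a single block
    while True:
        signatures = {}
        refined = {}
        for n in sorted(nodes):
            # Signature: current block plus outgoing (property, target block) pairs.
            sig = (block[n], frozenset((p, block[o]) for p, o in out[n]))
            refined[n] = signatures.setdefault(sig, len(signatures))
        if len(signatures) == len(set(block.values())):
            return refined  # no block was split: the partition is stable
        block = refined


def summarize(triples):
    """Quotient of the saturated graph: one summary node per bisimulation block."""
    graph = saturate(triples)
    block = forward_bisimulation(graph)
    return {(block[s], p, block[o]) for s, p, o in graph}


data = [
    ("alice", "writes", "paper1"),
    ("bob", "writes", "paper2"),
    ("paper1", RDF_TYPE, "Article"),
    ("paper2", RDF_TYPE, "Article"),
    ("Article", SUBCLASS, "Publication"),
]
summary = summarize(data)
```

On this toy graph, saturation adds the two implicit `rdf:type Publication` triples; `alice` and `bob` then share a block, as do `paper1` and `paper2`, so the six nodes collapse into four summary nodes while every property path of the saturated graph is still represented.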

20/03/2017

The paper “Extracting Linked Data from statistic spreadsheets” by Tien-Duc Cao, Ioana Manolescu and Xavier Tannier has been accepted at the International Workshop on Semantic Big Data, held in conjunction with the ACM Conference 2017.

09/12/2016

[CEDAR Seminar] Yannis Papakonstantinou will give a talk this Monday at 11 am, Turing Building, Flowers room.

Title:
The SQL++ Query Language: Support for native JSON, while backwards-compatible with SQL

Speaker:
Yannis Papakonstantinou, Prof. Computer Science and Engineering, UCSD
www.db.ucsd.edu/people/yannis.htm

Abstract:
SQL-on-Hadoop, NewSQL and NoSQL databases provide semi-structured data models (typically JSON-based). They are now moving towards declarative, SQL-like query languages. However, their idiomatic, non-SQL language constructs, their many variations and the lack of formal syntax and semantics pose problems. Notably, database vendors end up with unclear semantics and complicated implementations, as they add one feature at a time.

The presented SQL++ semi-structured data model bridges JSON and the SQL data model. The SQL++ query language is backwards compatible with SQL, while supporting native JSON. We show that a relatively small set of SQL restriction removals and feature additions is enough to provide a SQL-compatible extension to semistructured data. SQL++ is currently being adopted by the industry.
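One construct that lets SQL-style queries navigate nested JSON is a left-correlated FROM clause, where a later FROM item ranges over an attribute of an earlier one. The Python sketch below emulates the semantics of the SQL++ query quoted in its opening comment over hypothetical data. Treating an absent `projects` attribute as an empty collection is an assumption of the sketch; that edge case is exactly the kind of behavior that varies across systems.

```python
# SQL++ query being emulated (left-correlated FROM clause):
#   SELECT e.name AS name, p.title AS title
#   FROM employees AS e, e.projects AS p
#   WHERE p.budget > 100

employees = [
    {"name": "Ann", "projects": [{"title": "OLAP", "budget": 120},
                                 {"title": "ETL", "budget": 80}]},
    {"name": "Bo", "projects": []},
    {"name": "Cy"},  # no "projects" attribute at all
]


def correlated_from(collection, attr):
    """Yield (e, p) bindings where p ranges over e[attr] for each e in collection."""
    for e in collection:
        # Assumption: an absent attribute contributes no bindings.
        for p in e.get(attr, []):
            yield e, p


def run_query(collection):
    """Apply the WHERE filter and build one SELECT result object per binding."""
    return [
        {"name": e["name"], "title": p["title"]}
        for e, p in correlated_from(collection, "projects")
        if p["budget"] > 100
    ]


result = run_query(employees)
```

Only Ann's OLAP project survives the filter; Bo contributes no bindings because his array is empty, and Cy contributes none under the absent-attribute assumption.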

The extension to Configurable SQL++ adds configuration options that describe alternative choices of language semantics and formally capture the variations among existing database languages. Configurable SQL++ is unifying: by appropriate settings of the configuration options, its semantics can morph into the semantics of any of the eleven popular semistructured databases we surveyed, as the experimental validation shows. In this way, Configurable SQL++ allows a formal characterization of the capabilities of the emerging query languages.
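One way to picture a configuration option is as a parameter of the evaluator that changes how an edge case behaves. In this Python sketch, the options `on_absent` and `null_equals_null` are hypothetical: they are invented for illustration and are not taken from the Configurable SQL++ specification. They show how a single evaluator can mimic different systems' handling of absent attributes and nulls in an equality filter.

```python
from dataclasses import dataclass


class QueryError(Exception):
    """Raised when a configuration treats an edge case as an error."""


@dataclass(frozen=True)
class Config:
    # Hypothetical option: what a comparison does when the attribute is absent.
    on_absent: str = "filter"  # "filter" (predicate is not true) or "error"
    # Hypothetical option: whether a stored null equals a literal null.
    null_equals_null: bool = False


_ABSENT = object()  # sentinel distinguishing "no attribute" from a stored None


def eq(record, attr, value, cfg):
    """Evaluate record.attr = value under the given configuration."""
    actual = record.get(attr, _ABSENT)
    if actual is _ABSENT:
        if cfg.on_absent == "error":
            raise QueryError(f"attribute {attr!r} is absent")
        return False
    if actual is None or value is None:
        return cfg.null_equals_null and actual is None and value is None
    return actual == value


def where(records, attr, value, cfg):
    """Keep the records whose equality predicate evaluates to true."""
    return [r for r in records if eq(r, attr, value, cfg)]


rows = [{"a": 1}, {"a": None}, {}]
```

With the default `Config()`, filtering `rows` on `a = None` returns nothing, while `Config(null_equals_null=True)` keeps the record holding a stored null; `Config(on_absent="error")` rejects the query as soon as it meets the empty record.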

Bio:

Yannis Papakonstantinou is a Professor of Computer Science and Engineering at the University of California, San Diego. His research is in the intersection of data management technologies and the web, where he has published over ninety-five research articles and received over 13,000 citations. He has given multiple tutorials and invited talks, has served on journal editorial boards and has chaired and participated in program committees for many international conferences and workshops. He also teaches for UCSD’s Master of Advanced Studies in Data Science.

18/11/2016

[CEDAR Seminar] Stanislav Kikot and Roman Kontchakov from Birkbeck College, London, will give a talk on "Ontology-based data access via query rewriting: theory and practice".

When: Friday, November 25, 10 am
Where: Salle Thomas Flowers, Inria Turing Building

Abstract:

The talk consists of two parts. In the first part, we present our recent theoretical results on the computational complexity of answering OWL 2 QL ontology-mediated queries (OMQs) with tree-shaped and bounded-treewidth conjunctive queries (CQs) by means of query rewriting. In particular, we show that OMQs with bounded-depth ontologies have nonrecursive datalog (NDL) rewritings that can be constructed and evaluated in LOGCFL for combined complexity, and even in NL if their CQs are tree-shaped with a bounded number of leaves. For OMQs with arbitrary ontologies and bounded-leaf CQs, NDL-rewritings are constructed and evaluated in LOGCFL. On the other hand, we show that answering OMQs with tree-shaped CQs is not fixed-parameter tractable if the ontology depth or the number of leaves in the CQs is regarded as the parameter, and that answering OMQs with a fixed ontology (of infinite depth) is NP-complete for tree-shaped CQs and in LOGCFL for bounded-leaf CQs.
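The complexity results above are parameterized by the shape of the CQ. As a small illustration, assuming a CQ is given as a list of unary and binary atoms over variables and that "tree-shaped" means its Gaifman graph (variables linked when they co-occur in an atom) is a tree, this Python sketch checks tree-shapedness and counts leaves, i.e., variables of degree 1. Reflexive atoms such as R(x, x) are ignored when building the graph, a simplifying assumption.

```python
from collections import defaultdict


def gaifman_edges(cq):
    """Undirected edges between distinct variables co-occurring in a binary atom."""
    edges = set()
    for _pred, args in cq:
        if len(args) == 2 and args[0] != args[1]:
            edges.add(frozenset(args))
    return edges


def variables(cq):
    return {v for _pred, args in cq for v in args}


def is_tree_shaped(cq):
    """A graph is a tree iff it is connected and has |V| - 1 edges."""
    vs = variables(cq)
    edges = gaifman_edges(cq)
    if len(edges) != len(vs) - 1:
        return False
    adj = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:  # depth-first search to test connectivity
        for n in adj[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == vs


def leaf_count(cq):
    """Number of variables with exactly one neighbour in the Gaifman graph."""
    degree = defaultdict(int)
    for edge in gaifman_edges(cq):
        for v in edge:
            degree[v] += 1
    return sum(1 for d in degree.values() if d == 1)


path = [("R", ("x", "y")), ("S", ("y", "z"))]
star = [("R", ("x", "y1")), ("R", ("x", "y2")), ("R", ("x", "y3")), ("A", ("x",))]
triangle = [("R", ("x", "y")), ("R", ("y", "z")), ("R", ("z", "x"))]
```

A path query is tree-shaped with two leaves, a star with k arms has k leaves, and the triangle is not tree-shaped at all.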

In the second part, we report on our practical experience with the ontology-based data access platform Ontop. We concentrate on answering OMQs over the NPD FactPages ontology and argue that, for such real-world OMQs, the sources of theoretical intractability do not play a significant role in computing rewritings. On the other hand, compiling a large part of the ontology into mappings (known as T-mappings) with various optimisation techniques based, in particular, on the use of integrity constraints in the source database, allows Ontop to produce efficient SQL queries over the database.
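The idea behind T-mappings can be sketched as follows: instead of rewriting every query with the class hierarchy, the hierarchy is compiled once into the mappings, so the mapping for a class also includes the mappings of all its subclasses. This Python sketch assumes only subclass axioms between class names; the class IRIs, table names and SQL strings are hypothetical and do not come from the NPD FactPages ontology or from Ontop's actual mapping language.

```python
from collections import defaultdict

# Hypothetical source mappings: each class is populated by one or more SQL queries.
mappings = {
    "ex:Wellbore": ["SELECT id FROM wellbore"],
    "ex:ExplorationWellbore": ["SELECT id FROM wellbore WHERE purpose = 'exploration'"],
    "ex:DevelopmentWellbore": ["SELECT id FROM dev_wellbore"],
}

# Hypothetical ontology axioms: (subclass, superclass).
subclass_of = [
    ("ex:ExplorationWellbore", "ex:Wellbore"),
    ("ex:DevelopmentWellbore", "ex:Wellbore"),
]


def t_mappings(mappings, subclass_of):
    """Saturate the mappings: each class also gets the queries of its subclasses."""
    subs = defaultdict(set)
    for sub, sup in subclass_of:
        subs[sup].add(sub)

    def descendants(c, seen):
        for d in subs[c]:
            if d not in seen:
                seen.add(d)
                descendants(d, seen)
        return seen

    result = {}
    classes = set(mappings) | {c for pair in subclass_of for c in pair}
    for c in classes:
        queries = []
        for d in [c, *sorted(descendants(c, set()) - {c})]:
            for q in mappings.get(d, []):
                if q not in queries:  # skip syntactic duplicates
                    queries.append(q)
        result[c] = queries
    return result


def unfold(cls, tmap):
    """SQL for the atomic query cls(x): the union of its saturated mappings."""
    return "\nUNION\n".join(tmap[cls])


tmap = t_mappings(mappings, subclass_of)
```

Here the exploration-wellbore query is contained in the plain wellbore query, so an optimizer that knows about this containment (for example, from integrity constraints on the source) could drop it from the union; the sketch does not attempt that optimization.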

18/11/2016

Yanlei Diao's ERC Consolidator proposal "Charting a New Horizon of Big and Fast Data Analysis through Integrated Algorithm Design" has been accepted by the EU.

09/11/2016

Duc's team wins Start-up contest at StartupWeekendParis. Congrats!

28/10/2016

PigReuse: A Reuse-based Optimizer for Pig Latin has been presented by Jesús Camacho-Rodríguez at

17/10/2016

“XStream: Explaining Anomalies in Event Stream Monitoring”, by Haopeng Zhang, Yanlei Diao, and Alexandra Meliou, has been accepted at EDBT 2017.

10/10/2016

[CEDAR Seminar] Antoine Amarilli from Télécom Paris will give a talk on Query Answering with Guarded Logics and Expressive Constraints.

When: Thursday, October 20, 2 pm
Where: Salle Thomas Flowers, Inria Turing Building

Abstract:

The query answering problem, also called the entailment or certain-answer problem, is a fundamental reasoning problem in knowledge representation and databases. It asks whether a conjunctive query is certain given incomplete data and logical constraints, i.e., whether all completions of the data that satisfy the constraints must also satisfy the query; it is the complement of the satisfiability problem for the data, the constraints, and the negation of the query.
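For TGDs whose chase terminates, certain answers to a Boolean CQ can be computed by chasing the data with the constraints and then evaluating the query on the result. This Python sketch implements a restricted chase with a round limit; it is a generic textbook technique, not the guarded-logic algorithms discussed in the talk. Variables are strings starting with `?` and labeled nulls are generated as `_:n0`, `_:n1`, and so on (conventions assumed here). If the round limit is reached before the chase stops, a missing match is reported as unknown (`None`).

```python
import itertools


def is_var(term):
    return term.startswith("?")


def unify(args, values, subst):
    """Extend subst so that args map onto values, or return None on a clash."""
    s = dict(subst)
    for a, v in zip(args, values):
        if is_var(a):
            if s.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return s


def matches(atoms, facts, subst=None):
    """Yield every extension of subst that maps all atoms into facts."""
    subst = dict(subst or {})
    if not atoms:
        yield subst
        return
    (pred, *args), rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] == pred and len(fact) == len(args) + 1:
            s = unify(args, fact[1:], subst)
            if s is not None:
                yield from matches(rest, facts, s)


def chase(facts, tgds, max_rounds=10):
    """Restricted chase; returns (facts, terminated)."""
    facts = set(facts)
    fresh = itertools.count()
    for _ in range(max_rounds):
        new = set()
        for body, head in tgds:
            for s in list(matches(body, facts)):
                if next(matches(head, facts | new, s), None) is not None:
                    continue  # head already satisfied: the restricted chase skips it
                for atom in head:
                    for t in atom[1:]:
                        if is_var(t) and t not in s:
                            s[t] = f"_:n{next(fresh)}"  # fresh labeled null
                new |= {(atom[0], *(s.get(t, t) for t in atom[1:])) for atom in head}
        if not new:
            return facts, True
        facts |= new
    return facts, False


def certain(facts, tgds, query, max_rounds=10):
    """True / False for a Boolean CQ, or None if the chase was cut off."""
    chased, terminated = chase(facts, tgds, max_rounds)
    if next(matches(query, chased), None) is not None:
        return True  # a match in a partial chase is already a certain answer
    return False if terminated else None


facts = {("Emp", "ann")}
tgds = [
    ([("Emp", "?x")], [("WorksFor", "?x", "?y")]),  # every employee works for someone
    ([("WorksFor", "?x", "?y")], [("Dept", "?y")]),  # whoever is worked for is a department
]
```

With these rules, "Ann works for some department" is certain even though no department is named in the data, while `Dept("sales")` is not entailed.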

While this problem is undecidable for general TGDs or FO constraints, existing work has studied logical fragments for which it is decidable and has lower complexity, in particular the guarded paradigm, i.e., (frontier-)guarded TGDs and the guarded (negation) fragment of FO, based on restricting the shape of rules and of quantification. Yet, the guardedness paradigm cannot express many constraints that are important in practice to reason about data: transitivity (e.g., reachability), number restrictions (e.g., functional dependencies), and order relations (e.g., on numbers).
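The guardedness conditions are purely syntactic, so they are easy to check. In this Python sketch, atoms are assumed to be tuples of a predicate followed by terms, with variables prefixed by `?`. A TGD is guarded if one body atom contains all body variables, and frontier-guarded if one body atom contains all frontier variables, i.e., those shared by body and head. The transitivity rule fails both tests, which illustrates why it falls outside the guarded paradigm.

```python
def atom_vars(atom):
    """Variables (terms starting with '?') of an atom (predicate, term, ...)."""
    return {t for t in atom[1:] if t.startswith("?")}


def all_vars(atoms):
    return set().union(*(atom_vars(a) for a in atoms))


def is_guarded(body):
    """Some body atom (the guard) contains every body variable."""
    variables = all_vars(body)
    return any(variables <= atom_vars(a) for a in body)


def is_frontier_guarded(body, head):
    """Some body atom contains every frontier variable (shared by body and head)."""
    frontier = all_vars(body) & all_vars(head)
    return any(frontier <= atom_vars(a) for a in body)


# Each rule is a (body, head) pair; ?w in guarded_rule is existential.
transitivity = ([("R", "?x", "?y"), ("R", "?y", "?z")], [("R", "?x", "?z")])
guarded_rule = ([("R", "?x", "?y"), ("A", "?x")], [("S", "?y", "?w")])
frontier_only = ([("R", "?x", "?y"), ("S", "?y", "?z")], [("T", "?y")])
```

In `transitivity`, the frontier {x, z} is split across two body atoms, so no single atom can act as a guard.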

This talk will review our work (presented at IJCAI'15 and IJCAI'16) on extending guarded rules and logics to express such constraints, and its effect on the decidability and complexity of query answering.

Address

Palaiseau
91120

Opening hours

Monday 07:00 - 20:00
Tuesday 07:00 - 20:00
Wednesday 07:00 - 20:00
Thursday 07:00 - 20:00
Friday 07:00 - 20:00
