Articles

Elastic scalable transaction processing in LeanXcale
Information Systems Elsevier, 2022

Ricardo Jimenez-Peris, Diego Burgos, Francisco Ballesteros, Marta Patiño, Patrick Valduriez

Distributed Database Systems: The Case for NewSQL
Transactions on Large-Scale Data- and Knowledge-Centered Systems, Springer Berlin / Heidelberg, 2021

Patrick Valduriez, Ricardo Jimenez-Peris, M. Tamer Özsu

BestNeighbor: Efficient Evaluation of kNN Queries on Large Time Series Databases
Knowledge and Information Systems (KAIS), Springer, In press, 2020

Oleksandra Levchenko, Boyan Kolev, Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas, Patrick Valduriez, Dennis Shasha

Parallel Streaming Implementation of Online Time Series Correlation Discovery on Sliding Windows with Regression Capabilities
CLOSER 2019: International Conference on Cloud Computing and Services Science

Boyan Kolev, Reza Akbarinia, Ricardo Jimenez-Peris, Oleksandra Levchenko, Florent Masseglia, Marta Patiño, Patrick Valduriez

Distributed Algorithms to Find Similar Time Series
ECML-PKDD 2019: European Conference on Machine Learning and Knowledge Discovery in Databases

Oleksandra Levchenko, Boyan Kolev, Djamel-Edine Yagoubi, Dennis Shasha, Themis Palpanas, Patrick Valduriez, Reza Akbarinia, Florent Masseglia

Parallel Polyglot Query Processing on Heterogeneous Cloud Data Stores with LeanXcale
IEEE BigData 2018: IEEE International Conference on Big Data

Boyan Kolev, Oleksandra Levchenko, Esther Pacitti, Patrick Valduriez, Ricardo Vilaça, Rui Gonçalves, Ricardo Jiménez-Peris, Pavlos Kranas

Highly Scalable Real-Time Analytics with CloudDBAppliance
XLDB 2017: Extremely Large Databases Conference

Boyan Kolev, Oleksandra Levchenko, Florent Masseglia, Reza Akbarinia, Esther Pacitti, Patrick Valduriez

CloudMdsQL: Querying Heterogeneous Cloud Data Stores with a Common Language
Distributed and Parallel Databases, Springer, 2016

Boyan Kolev, Patrick Valduriez, Carlyna Bondiombouy, Ricardo Jiménez-Peris, Raquel Pau, José Pereira

Design and Implementation of the CloudMdsQL Multistore System
CLOSER 2016: International Conference on Cloud Computing and Services Science

Boyan Kolev, Carlyna Bondiombouy, Oleksandra Levchenko, Patrick Valduriez, Ricardo Jiménez-Peris, Raquel Pau, José Pereira

Benchmarking Polystores: the CloudMdsQL Experience
Workshop on Methods to Manage Heterogeneous Big Data and Polystore Databases

Boyan Kolev, Raquel Pau, Oleksandra Levchenko, Patrick Valduriez, Ricardo Jiménez-Peris, José Pereira

The CloudMdsQL Multistore System
ACM SIGMOD 2016: International Conference on the Management of Data

Boyan Kolev, Carlyna Bondiombouy, Patrick Valduriez, Ricardo Jiménez-Peris, Raquel Pau, José Pereira

StreamCloud: An Elastic and Scalable Data Streaming System
IEEE Transactions on Parallel and Distributed Systems, 2012

Vincenzo Gulisano, Ricardo Jimenez-Peris, Marta Patino-Martínez, Claudio Soriente, Patrick Valduriez

Presentations

Innovation: startup strategies

Stratégies d’Innovation – 12ème edition, Marcusevans France

Online, France, 19-20 May 2021

by Patrick Valduriez

Abstract:
Technological innovation driven by startups is hard to formalize (and manage) because the context may be unknown or quickly changing. To be successful, the innovation process involves not only inventions (new methods) but also context, e.g. user behavior, and timing, e.g. market readiness. In this talk, I illustrate various innovation strategies based on startup success stories, in particular LeanXcale, which delivers a new-generation HTAP DBMS product. I also give hints on how to promote innovation within startups.


NewSQL: principles, systems and current trends

Tutorial, IEEE Big Data 2019, Los Angeles, USA, 12 December 2019

by Patrick Valduriez and Ricardo Jimenez-Peris

Abstract:
NewSQL is the latest technology in the big data management landscape, enjoying fast growth in the DBMS and BI markets. NewSQL combines the scalability and availability of NoSQL with the consistency and usability of SQL. By providing online analytics over operational data, NewSQL opens up new opportunities in many application domains where real-time decision-making is critical. Important use cases are Google AdWords, proximity marketing, real-time pricing, risk monitoring, real-time fraud detection, etc. NewSQL may also simplify data management by removing the traditional separation between operational database and data warehouse / data lake (no more ETLs!). However, a hard problem is scaling out transactions in mixed operational and analytical (HTAP) workloads over big data, possibly coming from different data stores (HDFS, SQL, NoSQL). Today, only a few NewSQL systems have solved this problem. This tutorial provides an in-depth presentation of NewSQL, with its principles, architectures and techniques. We provide a taxonomy of NewSQL systems based on major dimensions including targeted workloads, capabilities and implementation techniques. We illustrate with popular NewSQL systems such as Google F1/Spanner, LeanXcale, CockroachDB, SAP HANA, MemSQL and Splice Machine. In particular, we put a spotlight on some of the more advanced systems. We also compare with major NoSQL and SQL systems, and discuss integration within big data ecosystems and corporate information systems. Finally, we discuss the current trends and research directions.


Principles of Distributed Database Systems: spotlight on NewSQL

Tutorial, SBBD 2020: Brazilian Symposium on Databases, Rio de Janeiro, Brazil (virtual), 29 September 2020

by Patrick Valduriez

Abstract:
The first edition of the book Principles of Distributed Database Systems, co-authored with Prof. Tamer Özsu (University of Waterloo), appeared in 1991, when the technology was new and there were few products. In the Preface to the first edition, we quoted Michael Stonebraker, who claimed in 1988 that in the following 10 years, centralized DBMSs would be an “antique curiosity” and most organizations would move towards distributed DBMSs. That prediction has certainly proved to be correct, and most systems in use today are either distributed or parallel. The fourth edition of this classic textbook provides major updates, in particular new chapters on big data platforms, NoSQL, NewSQL and polystores. In this tutorial, we introduce these major updates, with a focus on NewSQL. A first in-depth presentation of NewSQL was given in a tutorial at IEEE Big Data 2019 with Prof. Ricardo Jimenez-Peris (CEO and founder at LeanXcale). In this tutorial, I provide a taxonomy of NewSQL systems based on major dimensions including targeted workloads, capabilities and implementation techniques. I illustrate with popular NewSQL systems such as Google Spanner, LeanXcale, CockroachDB, SAP HANA, MemSQL and Splice Machine. Finally, I discuss the current trends and research directions.


Distributed Database Systems: the case for NewSQL

CWI Lectures on Database Research, Amsterdam, Netherlands (virtual), 19 November 2020

by Patrick Valduriez

Abstract:
NewSQL is the latest technology in the big data management landscape, enjoying fast growth in the DBMS and BI markets. NewSQL combines the scalability and availability of NoSQL with the consistency and usability of SQL. By blending capabilities previously available only in different kinds of database systems, such as fast data ingestion and SQL queries, and by providing online analytics over operational data, NewSQL opens up new opportunities in many application domains where real-time decision-making is critical. Important use cases are eAdvertisement (such as Google AdWords), IoT, performance monitoring, proximity marketing, risk monitoring, real-time pricing, real-time fraud detection, etc. NewSQL may also simplify data management by removing the traditional separation between NoSQL and SQL (ingest data fast, query it with SQL), as well as between operational database and data warehouse / data lake (no more ETLs!). However, a hard problem is scaling out transactions in mixed operational and analytical (HTAP) workloads over big data, possibly coming from different data stores (HDFS, SQL, NoSQL). Today, only a few NewSQL systems have solved this problem. In this talk, I introduce the solution for scalable transaction and polystore data management in LeanXcale, a recent NewSQL DBMS.
