Articles
- Elastic scalable transaction processing in LeanXcale
Information Systems, Elsevier, 2022
Ricardo Jimenez-Peris, Diego Burgos, Francisco Ballesteros, Marta Patiño, Patrick Valduriez.
- Distributed Database Systems: The Case for NewSQL
Transactions on Large-Scale Data- and Knowledge-Centered Systems, Springer Berlin / Heidelberg, 2021
Patrick Valduriez, Ricardo Jimenez-Peris, M. Tamer Özsu.
- BestNeighbor: Efficient Evaluation of kNN Queries on Large Time Series Databases
Knowledge and Information Systems (KAIS), Springer, In press, 2020
Oleksandra Levchenko, Boyan Kolev, Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas, Patrick Valduriez, Dennis Shasha
- Parallel Streaming Implementation of Online Time Series Correlation Discovery on Sliding Windows with Regression Capabilities
CLOSER 2019: International Conference on Cloud Computing and Services Science
Boyan Kolev, Reza Akbarinia, Ricardo Jiménez-Peris, Oleksandra Levchenko, Florent Masseglia, Marta Patiño, Patrick Valduriez
- Distributed Algorithms to Find Similar Time Series
ECML-PKDD 2019: European Conference on Machine Learning and Knowledge Discovery in Databases
Oleksandra Levchenko, Boyan Kolev, Djamel-Edine Yagoubi, Dennis Shasha, Themis Palpanas, Patrick Valduriez, Reza Akbarinia, Florent Masseglia
- Parallel Polyglot Query Processing on Heterogeneous Cloud Data Stores with LeanXcale
IEEE BigData 2018: IEEE International Conference on Big Data
Boyan Kolev, Oleksandra Levchenko, Esther Pacitti, Patrick Valduriez, Ricardo Vilaça, Rui Gonçalves, Ricardo Jiménez-Peris, Pavlos Kranas
- Highly Scalable Real-Time Analytics with CloudDBAppliance
XLDB 2017: Extremely Large Databases Conference
Boyan Kolev, Oleksandra Levchenko, Florent Masseglia, Reza Akbarinia, Esther Pacitti, Patrick Valduriez
- CloudMdsQL: Querying Heterogeneous Cloud Data Stores with a Common Language
Distributed and Parallel Databases, Springer, 2016
Boyan Kolev, Patrick Valduriez, Carlyna Bondiombouy, Ricardo Jiménez-Peris, Raquel Pau, José Pereira
- Design and Implementation of the CloudMdsQL Multistore System
CLOSER 2016: International Conference on Cloud Computing and Services Science
Boyan Kolev, Carlyna Bondiombouy, Oleksandra Levchenko, Patrick Valduriez, Ricardo Jiménez-Peris, Raquel Pau, José Pereira
- Benchmarking Polystores: the CloudMdsQL Experience
Workshop on Methods to Manage Heterogeneous Big Data and Polystore Databases
Boyan Kolev, Raquel Pau, Oleksandra Levchenko, Patrick Valduriez, Ricardo Jiménez-Peris, José Pereira
- The CloudMdsQL Multistore System
ACM SIGMOD 2016: International Conference on the Management of Data
Boyan Kolev, Carlyna Bondiombouy, Patrick Valduriez, Ricardo Jiménez-Peris, Raquel Pau, José Pereira
- StreamCloud: An Elastic and Scalable Data Streaming System
IEEE Transactions on Parallel and Distributed Systems, 2012
Vincenzo Gulisano, Ricardo Jiménez-Peris, Marta Patiño-Martínez, Claudio Soriente, Patrick Valduriez
Presentations
Innovation: startup strategies
Stratégies d’Innovation – 12ème edition, Marcusevans France
Online, France, 19-20 May 2021
by Patrick Valduriez
Abstract:
Technological innovation driven by startups is hard to formalize (and manage), as the context may be unknown or quickly changing. To be successful, the innovation process involves not only inventions (new methods) but also context, e.g. user behavior, and timing, e.g. market readiness. In this talk, I illustrate various innovation strategies based on startup success stories, in particular LeanXcale, which delivers a new-generation HTAP DBMS product. I also give hints on promoting innovation within startups.
NewSQL: principles, systems and current trends
Tutorial, IEEE Big Data 2019, Los Angeles, USA, 12 December 2019
by Patrick Valduriez and Ricardo Jimenez-Peris
Abstract:
NewSQL is the latest technology in the big data management landscape, enjoying fast growth in the DBMS and BI markets. NewSQL combines the scalability and availability of NoSQL with the consistency and usability of SQL. By providing online analytics over operational data, NewSQL opens up new opportunities in many application domains where real-time decision-making is critical. Important use cases are Google AdWords, proximity marketing, real-time pricing, risk monitoring, real-time fraud detection, etc. NewSQL may also simplify data management by removing the traditional separation between operational database and data warehouse / data lake (no more ETLs!). However, a hard problem is scaling out transactions in mixed operational and analytical (HTAP) workloads over big data, possibly coming from different data stores (HDFS, SQL, NoSQL). Today, only a few NewSQL systems have solved this problem. This tutorial provides an in-depth presentation of NewSQL, with its principles, architectures and techniques. We provide a taxonomy of NewSQL systems based on major dimensions including targeted workloads, capabilities and implementation techniques. We illustrate with popular NewSQL systems such as Google F1/Spanner, LeanXcale, CockroachDB, SAP HANA, MemSQL and Splice Machine. In particular, we put a spotlight on some of the more advanced systems. We also compare with major NoSQL and SQL systems, and discuss integration within big data ecosystems and corporate information systems. Finally, we discuss the current trends and research directions.
Principles of Distributed Database Systems: spotlight on NewSQL
Tutorial, SBBD 2020: Brazilian Symposium on Databases, Rio de Janeiro, Brazil (virtual), 29 September 2020
by Patrick Valduriez
Abstract:
The first edition of the book Principles of Distributed Database Systems, co-authored with Prof. Tamer Özsu (University of Waterloo), appeared in 1991, when the technology was new and there were not too many products. In the Preface to the first edition, we quoted Michael Stonebraker, who claimed in 1988 that in the following 10 years, centralized DBMSs would be an “antique curiosity” and most organizations would move towards distributed DBMSs. That prediction has certainly proved to be correct, and most systems in use today are either distributed or parallel. The fourth edition of this classic textbook provides major updates, in particular new chapters on big data platforms, NoSQL, NewSQL and polystores. In this tutorial, I introduce these major updates, with a focus on NewSQL. A first in-depth presentation of NewSQL was given in a tutorial at IEEE Big Data 2019 with Prof. Ricardo Jimenez-Peris (CEO and founder at LeanXcale). Here, I provide a taxonomy of NewSQL systems based on major dimensions including targeted workloads, capabilities and implementation techniques. I illustrate with popular NewSQL systems such as Google Spanner, LeanXcale, CockroachDB, SAP HANA, MemSQL and Splice Machine. Finally, I discuss the current trends and research directions.
Distributed Database Systems: the case for NewSQL
CWI Lectures on Database Research, Amsterdam, Netherlands (virtual), 19 November 2020.
by Patrick Valduriez
Abstract:
NewSQL is the latest technology in the big data management landscape, enjoying fast growth in the DBMS and BI markets. NewSQL combines the scalability and availability of NoSQL with the consistency and usability of SQL. By blending capabilities previously available only in different kinds of database systems, such as fast data ingestion and SQL queries, and by providing online analytics over operational data, NewSQL opens up new opportunities in many application domains where real-time decision-making is critical. Important use cases are eAdvertisement (such as Google AdWords), IoT, performance monitoring, proximity marketing, risk monitoring, real-time pricing, real-time fraud detection, etc. NewSQL may also simplify data management by removing the traditional separation between NoSQL and SQL (ingest data fast, query it with SQL), as well as between operational database and data warehouse / data lake (no more ETLs!). However, a hard problem is scaling out transactions in mixed operational and analytical (HTAP) workloads over big data, possibly coming from different data stores (HDFS, SQL, NoSQL). Today, only a few NewSQL systems have solved this problem. In this talk, I introduce the solution for scalable transaction and polystore data management in LeanXcale, a recent NewSQL DBMS.