1
OrientDB Manual - version 2.0 This documentation is also available in PDF format. Past releases: v1.7.8 Welcome to OrientDB - the first Multi-Model Open Source NoSQL DBMS that brings together the power of graphs and the flexibility of documents into one scalable, highperformance operational database. OrientDB is sponsored by Orient Technologies, LTD. OrientDB has features of both Document and Graph DBMSs. Written in Java and designed to be exceptionally fast: it can store up to 220,000 records per second on common hardware. Not only can it embed documents like any other Document database, but it manages relationships like Graph Databases with direct connections among records. You can traverse parts of or entire trees and graphs of records in a few milliseconds. OrientDB supports schema-less, schema-full and schema-mixed modes and it has a strong security profiling system based on users and roles. Thanks to the SQL layer, OrientDB query language is straightforward and easy to use, especially for those skilled in the relational DBMS world. To learn some key advantages of using OrientDB, take a look at Why OrientDB?
2
Is OrientDB really FREE? OrientDB is free for any use including commercial with its Apache 2 Open Source License. Orient Technologies, the company behind OrientDB, offers optional Professional Services such as Developer and Production Support, Training and Consultancy with transparent and competitive pricing. These options are available to ensure you’re maximizing OrientDB’s capabilities for your particular needs and use case.
3
OrientDB Community Start learning about OrientDB with the OrientDB Manual. For any questions, visit the OrientDB Community Group. Need help? Go to Online Support. Do you want to hear about OrientDB at a conference or meetup? Take a look at Events.
4
Need Further Assistance? If you have any questions or need assistance with OrientDB, please let us know. Check out our Get in Touch page for different ways of getting in touch with us.
Every effort has been made to ensure the accuracy of this manual. However, Orient Technologies, LTD. makes no warranties with respect to this documentation and disclaims any implied warranties of merchantability and fitness for a particular purpose. The information in this document is subject to change without notice.
5
Getting Started In the last few years, there's been an explosion of many NoSQL solutions and products. The meaning of the word "NoSQL" is not a campaign against the SQL language. In fact, OrientDB allows for SQL syntax! NoSQL is probably best described by the following: NoSQL, meaning "not only SQL", is a movement encouraging developers and business people to open their minds and consider new possibilities beyond the classic relational approach to data persistence. Alternatives to relational database management systems have existed for many years, but they have been relegated primarily to niche use cases such as telecommunications, medicine, CAD and others. Interest in NoSQL alternatives like OrientDB is increasing dramatically. Not surprisingly many of the largest web companies like Google, Amazon, Facebook, Foursquare, and Twitter are using NoSQL based solutions in production environments. What motivates companies to leave the comfort of a well established relational database world? It all surrounds the fact that you can better solve today's data problems with a modern database. Specifically, there are a few key areas: Performance Scalability (often extreme) Smaller footprint Developer productivity Schema flexibility Most of these areas also happen to be requirements of modern web applications. A few years ago, developers designed systems that could handle hundreds of concurrent users. Today it is not uncommon to have a potential target of thousands or millions of users connected and served at the same time. Changing technology requirements have been taken into account on the application front by creating frameworks, introducing standards and leveraging best practices. However, in the database world, the situation has remained more or less the same for over 30 years. From the 1970s until recently, relational DBMSs have played the dominant role. Programming languages and methodologies have evolved, but the concept of data persistence and DBMS have remained unchanged for the most part: tables, records and joins.
6
NoSQL Models NoSQL-based solutions in general provide a powerful, scalable, and flexible way to solve data needs and use cases which have previously been managed by relational databases. To summarize the NoSQL options, we'll examine the most common models or categories: Key / Value databases: data model is reduced to a simple hash table which consists of key / value pairs. It is often easily distributed across multiple servers. The most recognized products of this group include Redis, Dynamo, and Riak. Column-oriented databases: data is stored in sections of columns offering more flexibility and easy aggregation. Facebook's Cassandra, Google's BigTable, and Amazon's SimpleDB are some examples of column-oriented databases. Document databases: data model consists of document collections where individual documents can have multiple fields without necessarily defining a schema. The best known products of this group are MongoDB and CouchDB. Graph databases: domain model consists of vertices interconnected by edges creating a rich graph structure. The best known products of this group are OrientDB and Neo4j. OrientDB is a document-graph database, meaning it has full native graph capabilities coupled with features normally only found in document databases. Each of these categories or models has its own peculiarities, strengths and limitations. There is not a single category or model which is better than all the others, however certain types of databases are better at solving specific problems. This leads to the motto of NoSQL: choose the best tool for your specific use case. The goal of Orient Technologies in building OrientDB was to create a robust, highly performant database that can perform optimally in the widest possible set of use cases. Our product is designed to be the best "go to" solution for data persistence. In the following parts of this tutorial, we will look closely at OrientDB, the original open-source, multi-model, next generation NoSQL product.
7
Multi-Model The OrientDB engine supports Graph, Document, Key/Value, and Object models, so you can use OrientDB as a replacement for a product in any of these categories. However the main reason why users choose OrientDB is its ability to act as a true MultiModel DBMS by combining all the features of the four models into one. These are not just interfaces to the database engine, but the engine, itself, was built to support all four models. This is also the main difference with other DBMSs that claim to be Multi-Model since they just implement an additional layer with an API that mimics additional models, but under the hood they're truly one model therefore limiting speed and scalability.
8
Document Model The data in this model is stored inside documents. A document is a set of key/value pairs (also referred to as fields or properties) where a key allows access to its value. Values can hold primitive data types, embedded documents, or arrays of other values. Documents are not typically forced to have a schema which can be a benefit because they remain flexible and easy to modify. Documents are stored in collections enabling developers to group data as they decide. OrientDB uses the concepts of "classes" and "clusters" instead of "collections" for grouping documents. This provides several benefits that we will discuss in further sections of the documentation. OrientDB's Document model adds the concept of a "LINK" as a relationship between documents. With OrientDB you can decide whether to embed documents or link to them directly. When you fetch the document all the links are automatically resolved by OrientDB. This is the most important difference with any other Document Database like MongoDB. The table below illustrates the comparison between the relational model, the document model, and the OrientDB document model: Relational Model
Document Model
OrientDB Document Model
Table
Collection
Class or Cluster
Row
Document
Document
Column
Key/value pair
Document field
Relationship
not available
Link
9
Graph Model A graph represents a network-like structure consisting of Vertices (also known as Nodes) interconnected by Edges (also known as Arcs). OrientDB's graph model is represented by the concept of a property graph, which defines the following: Vertex - an entity that can be linked with other Vertices and has the following mandatory properties: unique identifier set of incoming Edges set of outgoing Edges Edge - an entity that links two Vertices and has the following mandatory properties: unique identifier link to incoming Vertex (also known as head) link to outgoing Vertex (also known as tail) label that defines the type of connection/relationship between head and tail vertex In addition to mandatory properties, each vertex or edge can also hold a set of custom properties. These properties can be defined by users, which can make vertices and edges appear similar to documents. In the table below, you can find a comparison between the graph model, the relational data model, and the OrientDB graph model: Relational Model
Graph Model
OrientDB Graph Model
Table
Vertex and Edge Class
Class that extends "V" (for Vertex) and "E" (for Edges)
Row
Vertex
Vertex
Column
Vertex and Edge property
Vertex and Edge property
Relationship
Edge
Edge
10
Key/Value Model This is the simplest model of the three. Everything in the database can be reached by a key, where the values can be simple and complex types. OrientDB supports Documents and Graph Elements as values allowing a richer model than the classic Key/Value one. The Key/Value model provides "buckets" to group key/value pairs in different containers. The most classic use cases with Key/Value Model are: POST the value as payload of the HTTP call -> // GET the value as payload from the HTTP call -> // DELETE the value by Key, by calling the HTTP call -> // The table below illustrates the comparison between the relational model, the Key/Value model, and the OrientDB Key/Value model: Relational Model
Key/Value Model
OrientDB Key/Value Model
Table
Bucket
Class or Cluster
Row
Key/Value pair
Document
Column
not available
Document field or Vertex/Edge property
Relationship
not available
Link
11
Object Model This model has been inherited by Object Oriented programming and supports Inheritance between types (sub-types extends the super-types), Polymorphism when you refer to a base class, and Direct binding from/to Objects used in programming languages. The table below illustrates the comparison between the relational model, the Object model, and the OrientDB Object model: Relational Model
Object Model
OrientDB Object Model
Table
Class
Class or Cluster
Row
Object
Document or Vertex
Column
Object property
Document field or Vertex/Edge property
Relationship
Pointer
Link
12
Installation OrientDB is available in two editions: Community Edition This edition is released as an open source project under the Apache 2 license. This license allows unrestricted free usage for both open source and commercial projects. Enterprise Edition OrientDB Enterprise edition is commercial software built on top of the Community Edition. Enterprise is developed by the same team that developed the OrientDB engine. It serves as an extension of the Community Edition by providing Enterprise features such as: Query Profiler Distributed Clustering configuration Metrics Recording Live Monitoring with configurable Alerts An Enterprise Edition license is included without charge if you purchase Support.
Prerequisites Both editions run on every operating system that has an implementation of the Java Virtual Machine (JVM), for example: All Linux distributions, including ARM (Raspberry Pi, etc.) Mac OS X Microsoft Windows from 95/NT or later Solaris HP-UX IBM AIX This means the only requirement for using OrientDB is to have Java version 1.6 or higher installed.
Download Binaries The easiest and fastest way to start using OrientDB is to download binaries from the Official OrientDB Download Page.
Compile Your Own Community Edition
13
Alternatively, you can clone the Community Edition project from GitHub and compile it. This allows you access to the latest functionality without waiting for a distribution binary. To build the Community Edition, you must first install the Apache Ant tool and follow these steps:
> git clone
[email protected]:orientechnologies/orientdb.git > cd orientdb > ant clean install
After the compilation, all the binaries are placed under the ../releases directory.
Change Permissions The Mac OS X, Linux, and UNIX based operating systems typically require you to change the permissions to execute scripts. The following command will apply the necessary permissions for these scripts in the bin directory of the OrientDB distribution:
> chmod 755 bin/*.sh > chmod -R 777 config
Use inside of OSGi container OrientDB uses a ConcurrentLinkedHashMap implementation provided by https://code.google.com/p/concurrentlinkedhashmap/ to create the LRU based cache. This library actively uses the sun.misc package which is usually not exposed as a system package. To overcome this limitation you should add property org.osgi.framework.system.packages.extra with
value sun.misc to your list of framework
properties. It may be as simple as passing an argument to the VM starting the platform:
> java -Dorg.osgi.framework.system.packages.extra=sun.misc
Other Resources To learn more about how to install OrientDB on specific environments, please refer to the guides below: Install as service on Unix, Linux and MacOSX Install as service on Windows 14
Install with Docker Install on Linux Ubuntu Install on JBoss AS Install on GlassFish Install on Ubuntu 12.04 VPS (DigitalOcean) Install on Vagrant
15
Run the server After you have downloaded the binary distribution of OrientDB and unpacked it, you are now able to start the server. This is done by executing server.sh (or server.bat on Windows) located in bin directory:
> cd bin > ./server.sh
You should now see the following output:
. .` ` , `:. `,` ,:` .,. :,, .,, ,,, . .,.::::: ```` ,` .::,,,,::.,,,,,,`;; .: `,. ::,,,,,,,:.,,.` ` .: ,,:,:,,,,,,,,::. ` ` `` .: ,,:.,,,,,,,,,: `::, ,, ::,::` : :,::` :::: ,:,,,,,,,,,,::,: ,, :. : :: : .: :,,,,,,,,,,:,:: ,, : : : : .: ` :,,,,,,,,,,:,::, ,, .:::::::: : : .: `,...,,:,,,,,,,,,: .:,. ,, ,, : : .: .,,,,::,,,,,,,: `: , ,, : ` : : .: ...,::,,,,::.. `: .,, :, : : : .: ,::::,,,. `: ,, ::::: : : .: ,,:` `,,. ,,, .,` ,,. `, S E R V E R `` `. `` `
2012-12-28 01:25:46:319 INFO Loading configuration from: config/orientdb-server-config.xml... [OServerCon 2012-12-28 01:25:46:625 INFO OrientDB Server v1.6 is starting up... [OServer] 2012-12-28 01:25:47:142 INFO -> Loaded memory database 'temp' [OServer] 2012-12-28 01:25:47:289 INFO Listening binary connections on 0.0.0.0:2424 [OServerNetworkListener] 2012-12-28 01:25:47:290 INFO Listening http connections on 0.0.0.0:2480 [OServerNetworkListener] 2012-12-28 01:25:47:317 INFO OrientDB Server v1.6 is active. [OServer]
The log messages explain what happens when the server starts: 1. The server configuration is loaded from config/orientdb-server-config.xml file. To
16
know more about this step, look into OrientDB Server 2. By default, a "temp" database is always loaded into memory. This is a volatile database that can be used to store temporary data 3. Binary connections are listening on port 2424 for all configured networks (0.0.0.0). If you want to change this configuration edit the config/orientdb-server-config.xml file and modify port/ip settings 4. HTTP connections are listening on port 2480 for all configured networks (0.0.0.0). If you want to change this configuration edit the config/orientdb-server-config.xml file and modify port/ip settings OrientDB server listens on 2 different ports by default. Each port is dedicated to binary and HTTP connections respectively: binary port is used by the console and clients/drivers that support the network binary protocol HTTP port is used by OrientDB Studio web tool and clients/drivers that support the HTTP/REST protocol or tools like CURL
17
Run the console OrientDB provides a command line interface. It can be used to connect to and work with remote or local OrientDB servers. You can start the command line interface by executing console.sh (or console.bat on Windows) located in the bin directory:
> cd bin > ./console.sh
You should now see a welcome message:
OrientDB console v.1.6 www.orientechnologies.com Type 'help' to display all the commands supported. orientdb>
Type the "help" or "?" command to see all available console commands:
orientdb> help AVAILABLE COMMANDS: * alter class Alter a class in the database schema ... * help Print this help * exit Close the console
Connecting to server instance Some console commands such as list databases or create database can be run while only connected to a server instance (you do not have to be connected to a database). Other commands require you to be connected to a database. Before you can connect to a fresh server instance and fully control it, you need to know the root password. The root password is located in config/orientdb-server-config.xml (just search for the users element). If you want to change it, modify the XML file and then restart the server. If you have the required credentials, you should now be able to connect using the following command: 18
orientdb> connect remote:localhost root someUglyPassword Connecting to remote Server instance [remote:localhost] with user 'root'...OK
Next, you can (for example) list databases using the command:
orientdb> list databases Found 1 databases: * GratefulDeadConcerts (plocal)
To connect to another database we can again use the connect command from the console and specify the server URL, username, and password. By default each database has an "admin" user with password "admin" (change the default password on your real database). To connect to the GratefulDeadConcerts database on the local server execute the following:
orientdb> connect remote:localhost/GratefulDeadConcerts admin admin Connecting to database [remote:localhost/GratefulDeadConcerts] with user 'admin'...OK
Let's analyze the URL we have used: remote:localhost/GratefulDeadConcerts . The first part is the protocol, "remote" in this case, which contacts the server using the TCP/IP protocol. "localhost" is the host name or IP address where the server resides; in this case it is on the same machine. "GratefulDeadConcerts" is the name of the database to which we want to connect. The OrientDB distribution comes with the bundled database GratefulDeadConcerts which represents the Graph of the Grateful Dead's concerts. This database can be used by anyone to start exploring the features and characteristics of OrientDB. For more detailed information about the commands see the console page.
19
Classes It's easier to introduce OrientDB's basic concepts by outlining the Document Database API. It is more similar to Relational DBMS concepts and therefore a little more natural to follow for many developers. These basic concepts are shared between all of OrientDB's APIs: Document, Object, and Graph. As with the relational DBMS, OrientDB has the concept of records as an element of storage. There are different types of records, but for the next examples we will always use the document type. A document is composed of attributes and can belong to one class. Going forward we will also refer to the attributes with the terms "fields" and "properties". The concept of class is well known to those who program using object-oriented languages. Classes are also used in OrientDB as a type of data model according to certain rules. To learn more about Classes in general take a look at Wikipedia. To list all the configured classes, type the classes command in the console:
orientdb> classes CLASSES: ----------------------------------------------+---------------------+-----------+ NAME | CLUSTERS | RECORDS | ----------------------------------------------+---------------------+-----------+ AbstractPerson | -1 | 0 | Account | 11 | 1126 | Actor | 91 | 3 | Address | 19 | 166 | Animal | 17 | 0 | .... | .... | .... | Whiz | 14 | 1001 | ----------------------------------------------+---------------------+-----------+ TOTAL 22775 | --------------------------------------------------------------------------------+
To create a new class, use the create class command:
orientdb> create class Student Class created successfully. Total classes in database now: 92
OrientDB allows you to work in a schema-less mode, without defining properties. 20
However, properties are mandatory if you define indexes or constraints. To create a new property use the create property command. Here is an example of creating three properties against the Student class:
orientdb> create property Student.name string Property created successfully with id=1 orientdb> create property Student.surname string Property created successfully with id=2 orientdb> create property Student.birthDate date Property created successfully with id=3
To display the class Student , use the info class command:
orientdb> info class Student
Class................: Student Default cluster......: student (id=96) Supported cluster ids: [96] Properties: -------------------------------+-------------+-------------------------------+-----------+----------+--- NAME | TYPE | LINKED TYPE/CLASS | MANDATORY | READONLY | NOT -------------------------------+-------------+-------------------------------+-----------+----------+--- birthDate | DATE | null | false | false | fal name | STRING | null | false | false | fal surname | STRING | null | false | false | fal -------------------------------+-------------+-------------------------------+-----------+----------+----
To add a constraint, use the alter class command. For example, let's specify that the name field
should be at least 3 characters:
orientdb> alter property Student.name min 3 Property updated successfully
To see all the records in a class, use the browse class command:
> browse class OUser
21
In this case we are listing all of the users of the database. This is not particularly secure. You should further deepen the OrientDB security system, but for now OUser is a class like any other. For each query the console always shows us the number of the records in the result set and the record's ID.
---+---------+--------------------+--------------------+--------------------+------------------- #| RID |name |password |status |roles ---+---------+--------------------+--------------------+--------------------+------------------- 0| #5:0|admin |{SHA-256}8C6976E5B5410415BDE908BD4DEE15DFB167A9C873FC4BB8A81F6F2AB448A 1| #5:1|reader |{SHA-256}3D0941964AA3EBDCB00CCEF58B1BB399F9F898465E9886D5AEC7F31090A0F 2| #5:2|writer |{SHA-256}B93006774CBDD4B299389A03AC3D88C3A76B460D538795BC12718011A909F ---+---------+--------------------+--------------------+--------------------+--------------------
The first column is a number used as an identifier to display the record's detail. To show the first record in detail, it is necessary to use the display record command with the number of the record, in this case 0:
orientdb> display record 0 -------------------------------------------------ODocument - Class: OUser id: #5:0 v.0 ------------------------------------------------- name : admin password : {SHA-256}8C6976E5B5410415BDE908BD4DEE15DFB167A9C873FC4BB8A81F6F2AB448A918 status : ACTIVE roles : [#4:0=#4:0]
22
Clusters We've already talked about classes. A class is a logical concept in OrientDB. Clusters are also an important concept in OrientDB. Records (or documents/vertices) are stored in clusters.
23
What is a cluster? A cluster is a place where a group of records are stored. Perhaps the best equivalent in the relational world would be a Table. By default, OrientDB will create one cluster per class. All the records of a class are stored in the same cluster which has the same name as the class. You can create up to 32,767 (2^15-1) clusters in a database. Understanding the concepts of classes and clusters allows you to take advantage of the power of clusters while designing your new database. Even though the default strategy is that each class maps to one cluster, a class can rely on multiple clusters. You can spawn records physically in multiple places, thereby creating multiple clusters. For example:
The class "Customer" relies on 2 clusters: USA_customers, containing all USA customers. This is the default cluster as denoted by the red star. China_customers, containing all Chinese customers. The default cluster (in this case, the USA_customers cluster) is used by default when the generic class "Customer" is used. Example:
24
When querying the "Customer" class, all the involved clusters are scanned:
If you know the location of a customer you're looking for you can query the target cluster directly. This avoids scanning the other clusters and optimizes the query:
25
To add a new cluster to a class, use the ALTER CLASS command. To remove a cluster use REMOVECLUSTER in ALTER CLASS command. Example to create the cluster "USA_Customers" under the "Customer" class:
ALTER CLASS Customer ADDCLUSTER USA_Customers
The benefits of using different physical places to store records are: faster queries against clusters because only a sub-set of all the class's clusters must be searched good partitioning allows you to reduce/remove the use of indexes parallel queries if on multiple disks sharding large data sets across multiple disks or server instances There are two types of clusters: Physical Cluster (known as local) which is persistent because it writes directly to the file system Memory Cluster where everything is volatile and will be lost on termination of the process or server if the database is remote For most cases physical clusters are preferred because the database must be 26
persistent. OrientDB creates physical clusters by default so you don't have to worry too much about it for now. To view all clusters, from the console run the clusters command:
orientdb> clusters CLUSTERS: ----------------------------------------------+------+---------------------+-----------+ NAME | ID | TYPE | RECORDS | ----------------------------------------------+------+---------------------+-----------+ account | 11| PHYSICAL | 1107 | actor | 91| PHYSICAL | 3 | address | 19| PHYSICAL | 166 | animal | 17| PHYSICAL | 0 | animalrace | 16| PHYSICAL | 2 | .... | ....| .... | .... | ----------------------------------------------+------+---------------------+-----------+ TOTAL 23481 | ---------------------------------------------------------------------------------------+
Since by default each class has its own cluster, we can query the database's users by class or by cluster:
orientdb> browse cluster OUser
---+---------+--------------------+--------------------+--------------------+------------------- #| RID |name |password |status |roles ---+---------+--------------------+--------------------+--------------------+------------------- 0| #5:0|admin |{SHA-256}8C6976E5B5410415BDE908BD4DEE15DFB167A9C873FC4BB8A81F6F2AB448A 1| #5:1|reader |{SHA-256}3D0941964AA3EBDCB00CCEF58B1BB399F9F898465E9886D5AEC7F31090A0F 2| #5:2|writer |{SHA-256}B93006774CBDD4B299389A03AC3D88C3A76B460D538795BC12718011A909F ---+---------+--------------------+--------------------+--------------------+--------------------
The result is identical to browse class ouser executed in the classes section because there is only one cluster for the OUser class in this example. The strategy where OrientDB selects the cluster when inserts a new record is configurable and pluggable. For more information take a look at Cluster Selection.
27
Record ID In OrientDB each record has its own self-assigned unique ID within the database called Record ID or RID. It is composed of two parts:
#:
cluster-id is the id of the cluster. Each database can have a maximum of 32,767 clusters (2^15-1) cluster-position is the position of the record inside the cluster. Each cluster can handle up to 9,223,372,036,854,780,000 (2^63) records, namely 9,223,372 Trillion of records! So the maximum size of a database is 2^78 records = 302,231,454,903 Trillion of records. We have never tested such high numbers due to the lack of hardware resources, but we most definitely have users working with OrientDB databases in the billions of records. A RID (Record ID) is the physical position of the record inside the database. This means that loading a record by its RID is blazing fast, even with a growing database. With document and relational DBMS the more data you have, the slower the database will be. Joins have a heavy runtime cost. OrientDB handles relationships as physical links to the records. The relationship is assigned only once when the edge is created O(1). Compare this to an RDBMS that “computes“ the relationship every single time you query a database O(LogN). Traversing speed is not affected by the database size in OrientDB. It is always constant, whether for one record or 100 billion records. This is critical in the age of Big Data! To load a record directly via the console, use the load record command. Below, we load the record #12:4 of the "demo" database.
orientdb> load record #12:4 -------------------------------------------------ODocument - Class: Company id: #12:4 v.8 ------------------------------------------------- addresses : [NOT LOADED: #19:159] salary : 0.0 employees : 100004 id : 4 name : Microsoft4 initialized : false
28
salary2 : 0.0 checkpoint : true created : Sat Dec 29 23:13:49 CET 2012
The load record command returns some useful information about this record: It's a document. OrientDB supports different types of records. This tutorial covers documents only. The class is "Company" The current version is 8. OrientDB has a MVCC system. It will be covered at a later point. Just know that every time you update a record its version is incremented by 1. We have different field types: "salary" and "salary2" are floats, "employees" and "id" are integers, "name" is a string, "initialized" and "checkpoint" are booleans and "created" is a date-time The field "addresses" has been NOT LOADED. It is also a LINK to another record #19:159. This is a relationship. (More explanation to follow)
29
SQL Most NoSQL products have a custom query language. OrientDB focuses on standards when it comes to query languages. Instead of inventing "Yet Another Query Language", we started from the widely used and well understood SQL. We then extended it to support more complex graph concepts like Trees and Graphs. Why SQL? SQL ubiquitous in the database developer world, it is familiar. Also, it is more readable and concise than Map Reduce scripts.
Select To start, let's write a query that returns the same result as the previous browse cluster ouser and browse class ouser :
select from OUser
Starting from this simple query, we can notice 2 interesting things: This query has no projections. This stands for "the entire record" like using the star (*). OUser is a class. By default queries are executed against classes. The target can also be: a cluster, by prefixing with "cluster:". Example: select from cluster:OUser one or more Record IDs, by using the RecordID directly. For example: select from #10:3 or select from [#10:1, #10:3, #10:5]
an index, by prefixing with "index:". For example: select value from index:dictionary where key = 'Jay'
Similar to standard SQL, OrientDB supports WHERE conditions to filter the returning records by specifying one or more conditions. For example:
select from OUser where name like 'l%'
Returns all OUser records where the name starts with 'l'. For more information, look at all the supported operators and functions: SQL-Where.
30
OrientDB also supports the ORDER BY clause to order the result set by one or more fields. For example:
select from Employee where city = 'Rome' order by surname asc, name asc
This will return all of the Employees who live in Rome, ordered by surname and name in ascending order. You can also use the GROUP BY clause to group results. For example:
select sum(salary) from Employee where age < 40 group by job
This returns the sum of the salaries of all the employees with age under 40 grouped by job type. To limit the result set you can use the LIMIT keyword. For example, to limit the result set to maximum of 20 items:
select from Employee where gender = 'male' limit 20
Thanks to the SKIP keyword you can easily manage pagination. Use SKIP to pass over records from the result set. For example, to divide the result set in 3 pages you could do something like:
select from Employee where gender = 'male' limit 20 select from Employee where gender = 'male' skip 20 limit 20 select from Employee where gender = 'male' skip 40 limit 20
Now that we have the basic skills to execute queries, let's discuss how to manage relationships.
Insert OrientDB supports ANSI-92 syntax:
insert into Employee (name, surname, gender) values ('Jay', 'Miner', 'M')
And the simplified:
31
insert into Employee set name = 'Jay', surname = 'Miner', gender = 'M'
Since OrientDB was created for the web, it can natively ingest JSON data:
insert into Employee content {name : 'Jay', surname : 'Miner', gender : 'M'}
Update The ANSI-92 syntax is supported. Example:
update Employee set local = true where city = 'London'
Also using JSON with the "merge" keyword to merge the input JSON with current record:
update Employee merge { local : true } where city = 'London'
Delete This also respects the ANSI-92 compliant syntax:
delete from Employee where city 'London'
32
Relationships The most important feature of a graph database is the management of relationships. Many users come to OrientDB from MongoDB or other document databases because they lack efficient support of relationships.
33
Relational Model The relational model (and RDBMS - relational database management systems) has long been thought to be the best way to handle relationships. Graph databases suggest a more modern approach to this topic. Most database developers are familiar with the relational model given it's 30+ years of dominance, spreading over generations of developers. Let's review how these systems manage relationships. As an example, we will use the relationships between the Customer and Address tables.
1-to-1 relationship RDBMSs store the value of the target record in the "address" column of the Customer table. This is called a foreign key. The foreign key points to the primary key of the related record in the Address table:
To retrieve the address pointed to by customer "Luca", the query in a RDBMS would be:
SELECT B.location FROM Customer A, Address B WHERE A.name = 'Luca' AND A.address = B.id
34
This is a JOIN! A JOIN is executed at run-time every time you retrieve a relationship.
1-to-Many relationship Since RDBMS have no concept of collections the Customer table cannot have multiple foreign keys. The way to manage a 1-to-Many relationship is by moving the foreign key to the Address table.
To extract all addresses of Customer 'Luca', the query in RDBMS reads:
SELECT B.location FROM Customer A, Address B WHERE A.name = 'Luca' AND B.customer = A.id
Many-to-Many relationship The most complex case is the Many-to-Many relationship. To handle this type of association, RDBMSs need a separate, intermediary table that matches both Customer and Addresses in all required combinations. This results in a double JOIN per record at runtime;
35
To extract all addresses of Customer 'Luca's the query in RDBMS becomes:
SELECT B.location FROM Customer A, Address B, CustomerAddress C WHERE A.name = 'Luca' AND B.id = A.id
The problem with JOINS With document and relational DBMS, the more data you have, the slower the database will perform. Joins have heavy runtime costs. In comparison, OrientDB handles relationships as physical links to the records, assigned only once when the edge is created O(1). Compare this to an RDBMS that “computes“ the relationship every single time you query a database O(LogN). With OrientDB, speed of traversal is not affected by the database size. It is always constant regardless if it has one record or 100 billion records. This is critical in the age of Big Data. Searching for an ID at runtime each time you execute a query, for every record could be very expensive! The first optimization with RDMS is using indexes. Indexes speed up searches but they slow down INSERT , UPDATE and DELETE operations. In addition, they occupy substantial space on disk and in memory. You also need to qualify - are you sure the lookup into an index is actually fast? Let's try to understand how indexes work.
Do indexes solve the problem with JOIN? 36
The database industry has plenty of indexing algorithms. The most common in both Relational and NoSQL DBMS is the B+Tree. All balanced trees work in similar ways. Here is and example of how it would work when you're looking for "Luca": after only 5 hops the record is found.
But what if there were millions or billions of records? There would be many, many more hops. And this operation is executed on every JOIN per record! Imagine joining 4 tables with thousands of records: the number of JOINS could be in the millions!
37
Relations in OrientDB OrientDB doesn't use JOINs. Instead it uses LINKs. A LINK is a relationship managed by storing the target RID in the source record. It's much like storing a pointer between 2 objects in memory. When you have Invoice -> Customer, then you have a pointer to Customer inside Invoice as an attribute. It's exactly the same. In this way it's like your database was in memory, a memory of several exabytes. What about 1-to-N relationships? These relationships are handled as a collection of RIDs, like you would manage objects in memory. OrientDB supports different kinds of relationships: LINK, to point to one record only LINKSET, to point to several records. Like Java Sets, the same RID can only be included once. The pointers also have no order LINKLIST, to point to several records. Like Java Lists, they are ordered and can contain duplicates LINKMAP, to point to several records with a key stored in the source record. The Map values are the RIDs. Works like the Java Map= 1.4: CREATE A SERVER ADMIN CLIENT AGAINST A REMOTE SERVER OServerAdmin serverAdmin = new OServerAdmin("remote:localhost").connect("admin", "admin"); serverAdmin.createDatabase("GratefulDeadConcerts", "graph", "local");
The iStorageMode can be memory or plocal.
630
Drop a database To drop a database from a server you can use the console's drop database command or via API using the OServerAdmin.dropDatabase() method.
// CREATE A SERVER ADMIN CLIENT AGAINST A REMOTE SERVER OServerAdmin serverAdmin = new OServerAdmin("remote:localhost/GratefulDeadConcerts").connect( serverAdmin.dropDatabase("GratefulDeadConcerts");
631
Check if a database exists To check if a database exists in a server via API use the OServerAdmin.existsDatabase() method.
// CREATE A SERVER ADMIN CLIENT AGAINST A REMOTE SERVER OServerAdmin serverAdmin = new OServerAdmin("remote:localhost/GratefulDeadConcerts").connect( serverAdmin.existsDatabase("local");
632
JDBC Driver OrientDB is a NoSQL DBMS that support a subset of SQL ad query language.
633
Include in your projects com.orientechnologies orientdb-jdbc 1.7
NOTE: to use SNAPSHOT version remember to add the Snapshot repository to your pom.xml .
634
How can be used in my code? The driver is registered to the Java SQL DriverManager and can be used to work with all the OrientDB database types: memory, plocal and remote The driver's class is com.orientechnologies.orient.jdbc.OrientJdbcDriver . Use your knowledge of JDBC API to work against OrientDB.
635
First get a connection Properties info = new Properties(); info.put("user", "admin"); info.put("password", "admin"); Connection conn = (OrientJdbcConnection) DriverManager.getConnection("jdbc:orient:remote:localhost/test"
Then execute a Statement and get the ResultSet:
Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery("SELECT stringKey, intKey, text, length, date FROM Item"); rs.next(); rs.getInt("@version"); rs.getString("@class"); rs.getString("@rid"); rs.getString("stringKey"); rs.getInt("intKey"); rs.close(); stmt.close();
The driver retrieves OrientDB metadata (@rid,@class and @version) only on direct queries. Take a look at tests code to see more detailed examples.
636
Advanced features Connection pool By default a new database instance is created every time you ask for a JDBC connection. OrientDB JDBC driver provides a Connection Pool out of the box. Set the connection pool parameters before to ask for a connection:
Properties info = new Properties(); info.put("user", "admin"); info.put("password", "admin"); info.put("db.usePool", "true"); // USE THE POOL info.put("db.pool.min", "3"); // MINIMUM POOL SIZE info.put("db.pool.max", "30"); // MAXIMUM POOL SIZE Connection conn = (OrientJdbcConnection) DriverManager.getConnection("jdbc:orient:remote:localhost/test"
637
JPA There are two ways to configure OrientDB JPA
638
Configuration The first - do it through /META-INF/persistence.xml Folowing OrientDB properties are supported as for now: javax.persistence.jdbc.url, javax.persistence.jdbc.user, javax.persistence.jdbc.password, com.orientdb.entityClasses You can also use tag Example: