Hadi Fadlallah
list all created indexes in Neo4j

Export indexes and constraints from SQL Server graph databases to Neo4j

April 6, 2021 by

In our previously published article in this series, we explained how to migrate SQL Server graph tables into Neo4j and why migration could be beneficial. We only mentioned how to migrate node and edge tables, and we did not mention indexes and constraints. This article is an extension of the previous one, where we will explain how to export the supported indexes and constraints from SQL Server to the Neo4j graph database. In addition, all codes are added to the project we already published on GitHub.

Before getting started

One thing worth mentioning is that this article has nothing to do with edge constraints introduced in SQL Server 2019. It is good to know that SQL Server does not provide a native graph database; they have added graph capabilities on the relational data model’s top. Later, edge constraints were added to prevent edge tables from storing more than one type of relation useless in a native graph database where no tables are found, and data is stored as JSON.

If you are interested to learn more about SQL Server 2019 edge constraints, you can refer to the following article: Graph Database features in SQL Server 2019 – Part 1.

Exporting Indexes

As we mentioned in the introduction, we can export only supported types of indexes from SQL Server to the Neo4j graph database. One of the good points of adding graph capabilities to the SQL Server database engine is that most of the relational data features can be applied to those tables, such as portioning, adding non clustered column store indexes, filtered indexes…

Like SQL Server indexes, a database index in Neo4j is a redundant copy of some of the data in the database to make searches of related data more efficient. Three types of indexes are available:

  1. Single property index: It is a named index on a single property for all nodes with a particular label
  2. Composite index: It is a named index on multiple properties for all nodes with a particular label
  3. Full-text indexes: Powered by Apache Lucene and used to index nodes and relationships by string properties

This means that not all SQL Server indexing features are supported in Neo4j; we cannot add column stores or filters. This section mentions three types of indexes to migrate from SQL Server: non-clustered indexes, unique indexes, full-text search indexes. Since all rowstore indexes will be migrated using the same method, they will be illustrated together, while full-text indexes migration is explained separately.

  • Note: We ignored clustered indexes since they contain all columns in the table, while we really don’t know if it is useful to recreate them in a graph database. Besides, unique indexes will be migrated as a regular index to the graph database, and a unique constraint will be added later (more details in the next section)

Preparing the test environment

All experiments in this tutorial are executed over the environment we were working within the previous article. One change we made is to add an extra column on the “Likes” edge table and to create two non-clustered indexes using the following SQL command:

Non-Clustered rowstore indexes

To recreate Rowstore indexes, we need to retrieve the following metadata for each one:

  • The table name
  • Is it a node or edge table?
  • The index name
  • The list of columns
  • The order of columns

We used the following query to retrieve that information from SQL Server:

In the where clause, we ignored all system indexes since they are useless in our case.

List available indexes in sql server

Figure 1 – Available indexes in the database

To facilitate our migration process and to minimize the line of codes in our project, we generated the Cypher statement via SQL to be executed later within the C# project.

The result of this query should look like the following:

generate Neo4j cypher create index statement from SQL  server

Figure 2 – Cypher create index statements

Unique Indexes

Unique indexes are also non-clustered indexes where a unique constraint is forced on the indexed columns. Both SQL Server and Neo4j automatically create an index on the columns/properties added in a unique constraint. For this reason, we ignored unique indexes from the migration process since they will be auto-generated after exporting unique constraints.

Full-text indexes

In Neo4j, full-text indexes are created using two stored procedures:

  • Nodes tables: db.index.fulltext.createNodeIndex()
  • Edges tables: db.index.fulltext.createRelationshipIndex()

Like what we have done for regular indexes, we can directly use an SQL statement in SQL Server to generate the Cypher statement to create full-text indexes in the graph database. The main difference is that we should split the full-text indexes that are created over Nodes and Edges at the same time in SQL Server since this is not allowed in NoSQL.

We can use the following SQL statement to generate the relevant Cypher code (we added a prefix to the indexes names to prevent index name duplicates error in Neo4j):

To Test this SQL statement, we added a description column to the StadiumCities edge table. Then, we create a full-text catalog named “City_Teams”, where we indexed two Node tables (Cities, Teams) and one Edge table (StadiumCities).

generating neo4j create full-text indexes statement from sql server

Figure 3 – Generate Cypher full-text indexes creation statements

Exporting Constraints

In the Neo4j graph database, there are four types of constraints:

  • Unique node property constraints
  • Node property existence constraints
  • Relationship property existence constraints
  • Node key constraints

Noting that only unique node property constraints are available in the community edition, while all others are only available in the enterprise edition.

Unique node property constraints

In SQL Server, unique constraints can be created over multiple columns, while in Neo4j, this type of constraint can be created for one column. This means that we cannot migrate unique constraints for multiple columns. However, since SQL Server’s unique constraints are defined as unique indexes – even if created using constraints, a unique index is created automatically – we will use the following statement to retrieve all unique constraints created over one column and generate the relevant Cypher statement:

In our example, this statement will generate the following result:

generating Cypher create unique constraint statement

Figure 4 – Cypher create a unique constraint generated statement

  • Note: Adding a unique property constraint on a property will also add a single-property index on that property, so such an index cannot be added separately.

Node/Relationship property existence constraints

In SQL Server, the existing constraints are defined using the “NOT NULL” property that can be added to a column. To generate the Cypher create constraints statements we used the following SQL Command:

This command will generate the following result:

generating Cypher create existence constraint statement

Figure 5 – Cypher create existence constraints statements

We ignored the node table primary key columns since they are related to the Node key constraints. Since there are no Edge keys, we generated an existence constraint for the Edge tables’ primary keys.

Node key constraints

We will consider that the Node Key constraints in the graph databases are defined using Primary key constraints in SQL Server. We can use the following SQL statement to generate the relevant Cypher statements:

This SQL statement will generate the following result:

generating Cypher create node key constraint statement

Figure 6 – Cypher create node keys constraints generated statements

  • Note: Adding a node key constraint for a set of properties will also add a composite index on those properties, so such an index cannot be added separately

Adding Indexes and constraints migration functions to the project

After providing the SQL commands used to generate the Cypher statement used in the database migration process, we have to use those states in the C# project we created in the previous article.

First, we created a new static class called “SQLCommands” to store the five SQL statements needed to migrate indexes and constraints. Then in the SQLReader class, we added one function to execute each SQL command and to store the result within a list of strings:

  • GetIndexes
  • GetFullTextIndexes
  • GetUniqueConstraints
  • GetExistenceConstraints
  • GetNodeKeyConstraints

In the Neo3jWriter class, we added three functions to execute the Cypher statements generated and stored within the SQLReaderClass:

  • ImportIndexes
  • ImportFullTextIndexes
  • InportConstraints

The migration code should look like the following:

We converted the line of code related to the existence and node key constraints to comments since they are not supported in the community edition where we are running our experiments.

After running the code, we can check that all indexes were migrated successfully using CALL db.indexes(); command:

list all created indexes in Neo4j

Figure 7 – List all indexes

If we execute the CALL db.constraints() stored procedure, we can check that all constraints are migrated successfully.

list all created constraints in Neo4j

Figure 8 – List created constraints

Conclusion

This article is considered to be an extension of the previously published article in this series. We explained how to export the supported indexes from SQL Server graph tables into a Neo4j graph database by generating Cypher queries using SQL Statement and executing them using a .NET code.

Table of contents

Import data from MongoDB to SQL Server using SSIS
Getting started with the Neo4j graph database
Migrating SQL Server graph databases to Neo4j
Export indexes and constraints from SQL Server graph databases to Neo4j
Migrating SQL Server graph databases to ArangoDB
Hadi Fadlallah
Graph database, Indexes

About Hadi Fadlallah

Hadi is an SQL Server professional with more than 10 years of experience. His main expertise is in data integration. He's one of the top ETL and SQL Server Integration Services contributors at Stackoverflow.com . Also, he published several article series about Biml, SSIS features, Search engines, Hadoop, and many other technologies. Besides working with SQL Server, he worked with different data technologies such as NoSQL databases, Hadoop, Apache Spark. He is a MongoDB, Neo4j, and ArangoDB certified professional. On the academic level, Hadi holds two master's degrees in computer science and business computing. Currently, he is a Ph.D. candidate in data science focusing on Big Data quality assessment techniques. Hadi really enjoys learning new things everyday and sharing his knowledge. You can reach him on his personal website. View all posts by Hadi Fadlallah

168 Views