SaaS deployment: 1 database or multiple databases

Lately I've seen a lot of discussion on the forum about issues with SaaS deployments, and I've noticed that there are 2 different approaches to this.
I am trying to collect (more) arguments and generally want to start a discussion on why to choose a 1 database SaaS model (with table filters) or a multiple database SaaS model (with the switchServer function).
The aim of this discussion is also to get better insight into the needs and/or stumbling blocks of developers who are doing a SaaS deployment.
Hopefully Servoy can use this to make SaaS deployment even better.

Let me kick this off by explaining the concepts and by giving a pro and a con for each:

1 database
With the 1 database model you have 1 single database for all your SaaS client data. Each SaaS client only sees their own data, enforced by filtering with Servoy's table filter functions, relationships or plain SQL queries.
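
A minimal sketch of what such a filter could look like in a login method (the server name, column and global are invented examples; check the exact addTableFilterParam signature for your Servoy version):

  // restrict every table that has a company_id column to the rows
  // of the SaaS client that just logged in
  var ok = databaseManager.addTableFilterParam(
      'crm_server',        // the single shared database server
      null,                // null = apply to all tables with this column
      'company_id',        // column to filter on
      '=',                 // operator
      globals.company_id); // the logged-in client's company id
  if (!ok) {
      // better to kick the user out than risk showing other clients' data
      security.logout();
  }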
Pros

  • When importing an updated solution, Servoy makes all the appropriate changes to the database for you, and you are done with it.

Cons

  • Restoring a backup for a specific customer requires more DBA skills and might be a general PITA.
  • Clients may have concerns that their data could ‘bleed through’ to other users (which might happen if you have a bug in your code).

Multiple databases
With the multiple database model you store each client’s data in their own database. Servoy switches (by way of a method call) to the appropriate database after a user logs in.
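
In code that switch is essentially a one-liner; a sketch (server names invented, and the target server must already be registered on the Servoy application server):

  // repoint the solution's default 'client_data' server to the database
  // belonging to the company that just logged in
  databaseManager.switchServer('client_data', 'client_data_' + globals.company_code);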
Pros

  • Restoring a backup for a client is easy.
  • Client data will, no matter what, be separated from other customers’ data.

Cons

  • When importing a solution Servoy will only make the appropriate changes to the default database, not to any of the dynamically switched databases. This makes deployment of updates much harder,
    if possible at all when you have no access to the back-end databases other than through the Servoy admin pages.

Do any of you have other arguments for or against either method? Or maybe a better method than the 2 mentioned in this post?

Hi Robert,

we like the second option: multiple databases (exact same schema),
and share the same pros you mentioned.

the con is indeed the updating of our solution:
schema updates are only applied to the active server, so the schema of every other customer must be updated by hand (manually or with scripts).
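
A rough sketch of what such a by-hand update script could look like (assuming the rawSQL plugin is installed; the helper, server list and DDL are invented examples):

  // hypothetical helper: push one schema change to every client server.
  // serverNames comes from our own admin table; Servoy does not track it.
  function updateAllClientSchemas(serverNames, tableName, sql) {
      for (var i = 0; i < serverNames.length; i++) {
          // passing the table name lets Servoy flush its cache for that table
          if (!plugins.rawSQL.executeSQL(serverNames[i], tableName, sql)) {
              application.output('Schema update failed on ' + serverNames[i]);
          }
      }
  }

  // example call (DDL is illustrative only):
  // updateAllClientSchemas(['client_a', 'client_b'], 'customers',
  //     'ALTER TABLE customers ADD fax_number varchar(30)');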

For us, this is too much hassle, so we have set up the first option.

If we want to go for the second option (in the future?) we would like to see an extra option:

  • an option (somewhere in our solution, under solution settings?) where we could provide a list of servers that must be updated.

just my 2 cents! ;-)

HJK:
…we like the second option: multiple databases (exact same schema),
and share the same pros you mentioned.

We have used the second method with both Servoy and Lasso for a number of years, primarily because IF a database is corrupted for any reason, recovery is simple and you don’t lose your entire client base in one disaster.

To code this with minimum hassle we have a core database that contains solution-specific data and a switch database for client-related data that is the default for the solution. On login the switch is changed to the client’s database, totally transparently to them.
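
A sketch of that login switch (table, column and server names invented; getDataSetByQuery and switchServer are the relevant Servoy calls):

  // look up the client's database server in the core database,
  // then repoint the default 'switch' server to it
  var ds = databaseManager.getDataSetByQuery('core',
      'SELECT db_server FROM clients WHERE client_code = ?',
      [globals.client_code], 1);
  databaseManager.switchServer('switch', ds.getValue(1, 1));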

All IDs are database-managed, and any changes to the switch schema are replicated to the client DBs using Navicat’s excellent sync features. So far, over numerous changes and iterations, we have not had a problem and the whole process has worked extremely well.

Cheers
Gordon

Gordon:
We have used the second method with both Servoy and Lasso for a number of years, primarily because IF a database is corrupted for any reason, recovery is simple and you don’t lose your entire client base in one disaster.

Not sure if this is a very good reason for or against the single database concept though.
If your database gets corrupted, it will most likely be because of a disk failure, in which case all your databases might be corrupt.
Or are you talking about MySQL’s MyISAM storage engine? I have never understood why people find it acceptable to have to repair a table in a SQL database; it’s quite unheard of in any other SQL environment.
If you use MySQL I suggest you use a storage engine like InnoDB, which is much more stable.

As for more pros/cons, I just thought of the issue of database connections.
The more databases you have, the more connections you need to make to the database server, and each named connection keeps 1 or more connections open.
This eats up a lot of (RAM) resources on your server, and in some commercial environments (Oracle?) you pay per database connection.
With a single database you only need 1 connection pool of x connections.

Maybe other developers that offer SaaS, like Adblocks and Stef, can share their experiences too, and explain why they made the choices they made.

We have been using the 1 database option for over a year now. On one of our servers we have a SaaS web client solution with over 500 companies and 99.99% uptime. The only adjustment we have had to make over the past twelve months is to increase the maximum Java heap size (by making a change in the wrapper.conf file).

One of the factors that influenced our decision to go with the 1 database approach was the competitive environment in the vertical market that we serve. We knew that we would have to deliver our solution at a low price point to a large number of users in order to remain competitive.

Dean Westover
Choices Software, Inc.
SAN Partner

In our case it isn’t a true SaaS/commercial issue, but database-wise it is very similar, just slightly more complicated because the ‘solutions’ aren’t quite the same. And so my question also becomes: at what point are there ‘too many’ Servoy solutions running off one Servoy Server and one Servoy Repository?

I am looking at creating a number of research database solutions, each tracking a different form of cancer. There are a certain number of ‘core tables’ that will be common to all (patient demographics, lab results, surgpath reports, treatments, etc.). But thereafter they can differ enormously.

So in my case I’m wondering whether it is better to have one database that serves all. In that case there will be many, many tables, and many of those tables will only relate to a very small subset of patient records in the ‘core tables’. Alternatively, I could take that core group of tables, replicate them in a number of different databases and then build the specific tables related to that particular group of patients. In that case I will have to continuously import new patients from the central repository into the ‘satellite’ databases as they arrive. That’s not too big a deal, as it can be modularized and incorporated into every solution. But it will mean having different Servoy repositories and different Servoy servers, grouping similar patient groups/diagnoses together.

But I wish I had a better grasp of when/how these types of decisions affect the speed of the applications and what is most efficient. How many solutions can/should one run on one Servoy Server/one Servoy repository? How many tables should one limit a database to before speed and efficiency start to go down? These are things I’m trying to get a better understanding of.

John

Hi John,

Interesting case you present.
I believe you ask 2 questions:

  1. How many solutions can one Servoy Server handle?
  2. When should I partition/move my data/tables into separate databases?

I think the answer to both questions is simple: infrastructure.
In other words, hardware and network.
Most of the heavy lifting is done by the database server, so if you have a beefy server with a nice speedy RAID/SAN for storage you can take on the world (okay, sorta ;) ).
Ditto for the machine that runs Servoy Server; it scales with hardware.

I also mentioned the network. When you have multiple remote (WAN) locations that need to access the solutions and the network connection is poor (internet), that might be a reason to partition the data and move it to a more local server.

As for the number of tables in your database slowing you down: the number of tables doesn’t matter much, I believe. The number of joins over tables in a query does, however; the more joins in 1 query, the slower your query will become. So when you model your database, realize that you don’t have to normalize all your data to the max. Storage is cheap, CPU time isn’t, so some redundant storage might be a good thing in terms of query speed.

I believe the only reason you would have to partition your data is when your tables grow so big that partitioning them is the only way to get decent performance. But even then you can do this within the same database, transparently to Servoy.
Of course, this scenario only comes into play with many, many millions of rows.

And I guess the last 2 reasons to use 1 or more databases or Servoy servers, in your situation, are preference and budget. Not really technical reasons :)

Hope this helps.

Hi Robert,

Thanks for your thoughts. I’m still mulling over the best way to proceed. This is for a major project here that I don’t yet have the go-ahead on, but part of getting the go-ahead is that I need a pretty good plan in place for how to proceed and why I am choosing that. :)

You bring up some good points. I think you are right, and I’ll probably go with one central database holding all the actual data, with one machine dedicated to just that. This will mean a very large database holding hundreds of tables in the end. All solutions will use a core group of tables (patient demographics, visits, diagnosis, etc.), but then each department (urology, head & neck, lymphoma, etc.) will have its own set of tables unique to its type of cancer. So the joins will for the most part be quite different for each department/query/solution.

I, like you, try not to ‘over-normalize’ my data, partly because it also makes my Servoy solutions more complicated! :) However, one thing that only recently really sank in (while looking at data warehousing/column-based databases) is that a row-oriented RDBMS retrieves the entire row of a selected table even if you are only ‘going after’ one column. So the number of columns in a table has an effect on speed, which is one of the benefits of optimized normalization.

As for the number of Servoy servers/solutions/Servoy repositories, I’m leaning towards separating those out on a department basis, for a few reasons in no particular order:

  1. I myself don’t like it when I have too many solutions/modules running off the same Servoy server/repository. I can’t actually say that I’ve ever seen a slowdown, but I don’t like scrolling through a long list of unrelated solutions/modules. Not a good reason, but it’s mine!

  2. If I break it down so that every department has its own machine that handles/houses both the Servoy Server and the servoy_repository database (only), then a) it is a little simpler in terms of cost/pricing per department, and b) a decent machine capable of running that isn’t very costly. The core modules will be on every department machine and will be used as the basis of their individual, department-related solutions. Updating the core modules will mean importing them into the department machines, but they’ll all be close by. (And if/when we upgrade to Servoy 4.0/Eclipse, it should be even easier to make changes across all repositories.)

Up until now, though, I have always run the servoy_repository as a separate database but on the same database server as my data, whether that was MS SQL Server, Oracle or iAnywhere. So this would be a departure from that, i.e. Servoy gets its logic from one database server (the same machine as the Servoy server) and its data from another machine. The network is normally fast, so that shouldn’t be an issue.

But I’m still not 100% sure which way I’ll go… Not even sure which vendor for the main central database; that will depend somewhat on how/who’s hosting it. If it’s me and I have to DBA it, then I’ll probably go with iAnywhere and hope that it is capable of handling the (potentially) hundreds of tables - no reason to think not, I just don’t know. Musings on a cold (for California) Saturday afternoon (but sunny, of course!).