Database of Databases

(dbdb.io)

156 points | by kiyanwang 14 days ago

24 comments

  • danpalmer 14 days ago

    Well I guess that answers it, SQLite is the database of all databases.

    > https://github.com/cmu-db/dbdb.io/blob/master/dbdb/settings....

    • punnerud 14 days ago
      • overcast 14 days ago

        Are these websites just completely unable to handle 1,000 simultaneous requests?

        • mimischi 14 days ago

          Someone should make a database of caches or CDNs. Then combine knowledge! /s

        • polote 14 days ago

          This is more in the range of 10s of requests per second at this position on HN

          • etxm 14 days ago

            They should have used a JSON file as the backend.

          • samsquire 14 days ago

            A simple query engine, a bit like DynamoDB, can be built with a trie and a hash map. SQL is straightforward to parse (but a pain to implement).

            https://github.com/samsquire/hash-db https://github.com/samsquire/sql-database

            These are my very straightforward codebases where I've been experimenting with SQL parsing (and execution) and DynamoDB-style querying.

            You need an efficient range operator to implement a database.
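A minimal sketch of the idea in the comment above, not code from the linked repositories: a hash map serves point lookups, and a sorted key list (swapped in here for the trie, purely to keep the example short) serves range scans. The class name `TinyStore` and its methods are hypothetical.

```python
import bisect


class TinyStore:
    """Hash map for point lookups plus a sorted key list for range scans."""

    def __init__(self):
        self._rows = {}   # key -> value, O(1) average point lookup
        self._keys = []   # all keys, kept sorted for range queries

    def put(self, key, value):
        # Only new keys need to be placed into the sorted index.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def range(self, low, high):
        """Return (key, value) pairs with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return [(k, self._rows[k]) for k in self._keys[start:end]]
```

Keeping the keys sorted is what makes the range operator cheap: two binary searches find the bounds, and the scan touches only matching keys. Inserting into a Python list is linear, so a real engine would likely use a tree or trie here instead.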

            • altrunox 14 days ago

              Damn, I don't think I've read about most of the site's top databases... Maybe I should try to use something different than PostgreSQL, at least in some side projects.

              • fizixer 14 days ago

                Any good learning resources for implementing your own DBMS, and DB internals? (not DB theory, not SQL, not intro).

              • freeopinion 13 days ago

                https://dbdb.io/browse?foreign-keys=supported&query-interfac...

                x Supports Foreign Keys

                x Supports Stored Procedures

                x Supports SQL Query Interface

                No PostgreSQL?

                • brummm 14 days ago

                  But does it include itself?

                  • hackbinary 14 days ago

                    Also reminded me of the awesome list of awesome lists:

                    https://github.com/sindresorhus/awesome

                    • samsquire 14 days ago

                      They're using SQLite which is single threaded and single user by design. So this website will not be able to service that much traffic.

                      • simonw 14 days ago

                        SQLite isn't single threaded - and it can support multiple readers very well.

                        The limitation with SQLite is that it doesn't support concurrent writes well - it needs to take a lock on the entire database to perform a write.

                        Writes are crazy fast (a few ms) so this often isn't a problem - but it does mean you wouldn't want to use it to build a site that has many people writing at once, like Hacker News for example.

                        For a site that has low (or no) writes, SQLite works really well even at a much larger scale - 100s of requests a second.

                        • the_duke 13 days ago

                          Enabling the write-ahead log makes SQLite behaviour much, much better under (write) contention:

                          > PRAGMA journal_mode=WAL;

                          • simonw 13 days ago

                            Yeah I was getting occasional "database is locked" read errors on a project that had crons writing to the SQLite file which I solved by switching on WAL mode.

                            It still doesn't let you have concurrent writes but it does mean that reads won't error if a write is going on at the same time.
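As a rough illustration of the two comments above, here is a sketch using Python's standard `sqlite3` module. The function name `wal_demo` and the `hits` table are made up for the example. It switches a file-backed database to WAL mode, then reads from a second connection while a write transaction is still open.

```python
import sqlite3


def wal_demo(path):
    """Enable WAL on the database at `path` and read during an open write."""
    # isolation_level=None: transactions are started and ended by hand.
    writer = sqlite3.connect(path, isolation_level=None)
    # The PRAGMA returns the journal mode now in effect.
    mode = writer.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    writer.execute("CREATE TABLE IF NOT EXISTS hits (n INTEGER)")
    writer.execute("INSERT INTO hits VALUES (1)")   # committed immediately

    writer.execute("BEGIN IMMEDIATE")               # take the write lock
    writer.execute("INSERT INTO hits VALUES (2)")   # not yet committed

    # A second connection can still read; it sees only committed rows.
    reader = sqlite3.connect(path, timeout=0)
    seen = reader.execute("SELECT COUNT(*) FROM hits").fetchone()[0]

    writer.execute("COMMIT")
    reader.close()
    writer.close()
    return mode, seen
```

WAL mode is stored in the database file, so it needs to be set only once per database. It applies to file-backed databases, not to `:memory:` ones.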

                        • goostavos 14 days ago

                          I've survived a few HN front pages on SQLite + $5 DO droplet. Don't disparage my primary tech stack like this ^_^

                          The traffic from a HN frontpage is relatively low per second tbh.

                        • tyingq 14 days ago

                          Should be fine for reads. The sqlite.org site is dynamic, pulling from SQLite data for ~20% of the pages, and it does fine with HN piling on. Being single-threaded would be an issue for writes, but I don't see why they would be doing writes for a page view. See https://www.sqlite.org/whentouse.html

                          • pfraze 14 days ago

                            Also, caching layers.

                            Presumably a service uses SQLite to simplify their ops. So long as the caching layer is equally simple to maintain, then SQLite continues to make sense to me

                            • tyingq 14 days ago

                              Their repo shows the "DummyCache" (no caching) as the default for the Python/Django setup: https://github.com/cmu-db/dbdb.io/blob/master/dbdb/settings....

                              I wonder if that's how it is in production.
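For readers unfamiliar with the setting being discussed, here is a hedged sketch of what swapping Django's no-op cache for a real one could look like in a settings module. The backend paths are Django's own, but the `LOCATION` name and the 600-second timeout are assumptions for illustration, and nothing here reflects the project's actual production configuration.

```python
# Hypothetical Django settings fragment (illustrative values only).

# No-op backend: every cache call succeeds but nothing is ever stored.
CACHES_DUMMY = {
    "default": {
        "BACKEND": "django.core.cache.backends.dummy.DummyCache",
    }
}

# In-process memory cache: simplest real backend, with no extra service to run.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "dbdb-pages",   # assumed name, any unique string works
        "TIMEOUT": 600,             # seconds before an entry expires
    }
}
```

A per-process memory cache is not shared between worker processes, so a busier site would more likely point `BACKEND` at a shared cache service.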

                              • kevincox 14 days ago

                                For a mostly static site you would want to cache "over" Django, probably in a caching proxy or CDN. Of course there are a lot of details, such as the fact that many CDNs will always go to the origin on an edge miss instead of locating another copy in the CDN. Still, running a caching nginx on the same box as Django is probably far more performant than caching "under" Django.

                                • tyingq 14 days ago

                                  It's intermittently working now. Shows the server as Apache 2.4.18 / Ubuntu. Apache can page cache, but I assume they aren't using it. I don't see any typical CDN headers either.

                                  Also, somewhat odd, they are bounding the browser-side cache to 10 minutes:

                                  Date: Wed, 16 Sep 2020 18:19:13 GMT

                                  Expires: Wed, 16 Sep 2020 18:29:14 GMT
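The browser cache window implied by the two headers quoted above can be checked with a few lines of standard-library Python. The helper name `cache_window_seconds` is made up for this sketch.

```python
from email.utils import parsedate_to_datetime


def cache_window_seconds(date_header, expires_header):
    """Seconds between an HTTP Date header and an Expires header."""
    issued = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    return (expires - issued).total_seconds()


window = cache_window_seconds(
    "Wed, 16 Sep 2020 18:19:13 GMT",
    "Wed, 16 Sep 2020 18:29:14 GMT",
)
print(window)  # 601.0, i.e. ten minutes plus one second
```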

                              • tbran 13 days ago

                                For this kind of content site, I'd convert the Django site to a static site with django-bakery [1] and stick it on Netlify.

                                I suppose there would be a bunch of rewriting to do searching (lunr.js maybe) and filtering client side.

                                Buuuuuut, at this point, just turning on caching would be easier.

                                [1]: https://github.com/datadesk/django-bakery

                              • ehsankia 14 days ago

                                Well it clearly wasn't :) Site is down already

                                • tyingq 14 days ago

                                  There are several layers. It's Python/Django. I don't think we know what the issue really is. They could be, for example, logging visits to SQLite. Or there could be other issues unrelated to SQLite.

                                • remram 14 days ago

                                  Well it looks down now.

                                • bob1029 14 days ago

                                  There is zero reason a carefully engineered application utilizing SQLite cannot completely saturate the IO capabilities of the host it resides on.

                                  Going even further, there are no competing technologies (i.e. hosted SQL solutions) which, when running in single node/instance mode, are competitive with the performance of well-tuned SQLite.

                                  PRAGMA journal_mode=WAL makes all the difference in the universe.

                                  • kreetx 14 days ago

                                    .. and it does appear to be very slow at the moment.

                                  • bird_monster 14 days ago

                                    The number of databases I'd like to play with and explore greatly outweighs the time or reasons I have to explore them.

                                    • asplake 14 days ago

                                      So this isn’t in fact instances of databases inside databases? Disappointed! Surely possible with SQLite, right? Has it been done?

                                      • throwaway894345 14 days ago

                                        I was really hoping it was a SQL interface for querying data stored in multiple databases; something like Presto.

                                      • cocktailpeanuts 14 days ago

                                        Thought it was some groundbreaking new database technology....

                                        Speaking of,

                                        What would an interesting version of "database of databases" look like?

                                        • chatmasta 13 days ago

                                          This is basically what we're building at Splitgraph [0]. We're calling it a "data delivery network." You connect to one SQL endpoint and can query (and join across) 40k+ different datasets. It's built on Postgres, and as far as your SQL client is concerned, it's talking to a Postgres database with 40k tables in it. Right now we forward queries to public data portals, but eventually you'll be able to connect live data sources to the DDN without writing any code. We want it to be as easy as configuring Cloudflare; you just upload a set of read-only credentials in the web UI and we take care of the rest. For more private use cases, we're planning to offer private deployments to AWS/GCP/Azure.

                                          Technically, this is database virtualization, which isn't really a new concept. We're implementing it as a database proxy, using PgBouncer instances to intercept queries and route them to Splitgraph engines. Within a Splitgraph engine (which is Postgres + some custom code), each "table" is either a "mounted" live database via a foreign data wrapper (FDW), or part of a point-in-time, versioned database snapshot called a "data image" that you can build with sgr.

                                          [0] https://www.splitgraph.com

                                          • tyingq 14 days ago

                                            The closest I can think of would be the "Data Governance" space. Tools like Collibra: https://www.collibra.com/

                                            Screenshot that makes it easier to understand: https://www.collibra.com/wp-content/uploads/Blog-DataLineage...

                                            • LukeEF 14 days ago

                                              TerminusHub is a database filled with databases (for collaboration): https://terminusdb.com/hub/

                                              • indymike 14 days ago

                                                Apparently SQLite. Who would have thought?

                                                • ralusek 13 days ago

                                                  Probably CosmosDB or MarkLogic.

                                                • nerpderp82 14 days ago

                                                  The lulz thing about the site is that you can't query it with SQL.

                                                  Let me download the sqlite database directly! Andy?

                                                  • thrownaway954 14 days ago

                                                    pretty cool. learned about a lot of new databases beyond the big 6.

                                                    this site could benefit greatly from not running on a database and being statically generated. even the browse section could just be a Vue.js app powered by a JSON collection.

                                                    • colonelpopcorn 14 days ago

                                                      Did anyone else pronounce this like dibby-dibby-yo?

                                                      • josefrichter 14 days ago

                                                        So this is being reposted over and over again. Not saying it's not useful, but still..

                                                        • joshdance 14 days ago

                                                          Website is down.

                                                          • llacb47 14 days ago

                                                            HN Hug of death.

                                                            • maxcx 14 days ago

                                                              upvote!

                                                              • chickenpotpie 14 days ago

                                                                Reminds me of my favorite Wikipedia article: https://en.wikipedia.org/wiki/List_of_lists_of_lists

                                                              • hackbinary 14 days ago

                                                                This article reminded me of this site:

                                                                http://howfuckedismydatabase.com/

                                                                • damsdu78 14 days ago

                                                                  I don't understand the goal of this site. :|

                                                                  • eshyong 14 days ago

                                                                    It's satire written by a disgruntled engineer :P

                                                                    • hackbinary 14 days ago

                                                                      What's not to understand?

                                                                      The answers for access and Oracle are both brilliant and true.

                                                                      • ssijak 14 days ago

                                                                        Amazon affiliate links