script - python sql pandas

Moving back and forth between an on-disk database and a fast in-memory database? (2)

Frankly, I wouldn't fool around too much with in-memory databases, unless you really do need an indexed structure that you know will always fit entirely within available memory. SQLite is extremely smart about its I/O, especially when you wrap everything (including reads ...) into transactions, as you should. It will very efficiently keep things in memory as it is manipulating data structures that fundamentally live on external storage, and yet it will never exhaust memory (nor, take too much of it). I think that RAM really does work better as "a buffer" instead of being the primary place where data is stored ... especially in a virtual storage environment, where everything must be considered as "backed by external storage anyway."

Python's sqlite3 :memory: option provides speedier queries and updates than the equivalent on-disk database. How can I load a disk-based database into memory, do fast operations on it, and then write the updated version back to disk?

The question How to browse an in memory sqlite database in python seems related but it focuses on how to use a disk-based browsing tool on an in-memory db. The question How can I copy an in-memory SQLite database to another in-memory SQLite database in Python? is also related but it is specific to Django.

My current solution is to read all of the tables, one-at-a-time, from the disk-based database into lists of tuples, then manually recreate the entire database schema for the in-memory db, and then load the data from the lists of tuples into the in-memory db. After operating on the data, the process is reversed.

There must be a better way!

The answer at How to load existing db file to memory in Python sqlite3? provided the important clues. Building on that answer, here is a simplification and generalization of that code.

It eliminates eliminate the unnecessary use of StringIO and is packaged into reusable form for both reading into and writing from an in-memory database.

import sqlite3

def copy_database(source_connection, dest_dbname=':memory:'):
    '''Return a connection to a new copy of an existing database.                        
       Raises an sqlite3.OperationalError if the destination already exists.             
    script = ''.join(source_connection.iterdump())
    dest_conn = sqlite3.connect(dest_dbname)
    return dest_conn

if __name__ == '__main__':
    from contextlib import closing

    with closing(sqlite3.connect('pepsearch.db')) as disk_db:
        mem_db = copy_database(disk_db)

    mem_db.execute('DELETE FROM documents WHERE uri="pep-3154"')

    copy_database(mem_db, 'changed.db').close()