>
Home

Latest Post

Did Postgres Lose My Data?

Hello, let me introduce myself. Today, we’ll pretend that I’m a linguistics researcher. I don’t know much about databases, but I do know a lot about the Balti language of northern Pakistan. That’s why I’m excited about my current translation project.

The most interesting part of this translation project is that I’m doing some computer-driven analysis of a large body of text in the Balti language. For that, I’m going to need a database. You – dear friend – will help me with this! (Because you are an expert with databases!)

Our favorite database is PostgreSQL, so we should use it for my analysis work. You create an EC2 instance running an LTS release of Ubuntu and you create a database for me on the instance:

aws ec2 run-instances --key-name mac --instance-type t2.micro --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=research-db}]' --image-id ami-0172070f66a8ebe63 --region us-east-1

sudo apt install postgresql-common
sudo sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install postgresql-15

create database research_texts template=template0 locale_provider=icu icu_locale="en-US"

Pretty easy!

Now I expect to be loading a very large amount of data for analysis. With a little help, I got DBeaver running on my computer and a connection to our “research_texts” database. After some discussion we decide that I should partition the main table, which has a huge list of Balti words along with cross-references and notes for each one.

As the wikipedia page for Balti explains, most people in this region of Pakistan use a Perso-Arabic alphabet which has between 40 and 50 characters. We’ll divide my table into four partitions, with about a quarter of the alphabet each. (And just to be safe we’ll add a default partition as well.)

create table arabic_dictionary_research (
  word text,
  crossreferences text,
  notes text
) partition by range (word);

create table arabic_dictionary_research_p1 partition of arabic_dictionary_research
  for values from ('ا') to ('ح');
create table arabic_dictionary_research_p2 partition of arabic_dictionary_research
  for values from ('ح') to ('س');
create table arabic_dictionary_research_p3 partition of arabic_dictionary_research
  for values from ('س') to ('ل');
create table arabic_dictionary_research_p4 partition of arabic_dictionary_research
  for values from ('ل') to ('ے');
create table arabic_dictionary_research_p5 partition of arabic_dictionary_research
  default;

And finally my work begins! This isn’t my real research, but lets put some example data into the table to get an idea what it might look like.

insert into arabic_dictionary_research
  select 'ب'||generate_series(1,1000), repeat('important cross-references!',100), 'notes'
    union all
  select 'د'||generate_series(1,1000), repeat('important cross-references!',100), 'notes'
    union all
  select 'م'||generate_series(1,1000), repeat('important cross-references!',100), 'notes'
    union all
  select 'ࣈ'||generate_series(1,1000), repeat('important cross-references!',100), 'notes'
    union all
  select 'ق'||generate_series(1,1000), repeat('important cross-references!',100), 'notes'
    union all
  select 'ٶ'||generate_series(1,1000), repeat('important cross-references!',100), 'notes'
;

select word,length(crossreferences),notes from arabic_dictionary_research where word='د100';
select word,length(crossreferences),notes from arabic_dictionary_research where word='ࣈ100';

Update 27-Mar-2023: I published this article over the weekend using my personal computer at home. When I got to work this morning I glanced at the article and saw a square box appearing in many SQL queries where I expected to see the Perso-Arabic letter Graf. That character displays correctly from my home computer (macOS 13 Ventura) and my iPhone (iOS 16) but not on my work laptop (macOS 12 Monterey). I now realize that many readers of this article will also see the box, depending on how up-to-date your mobile or computer operating system is. I’ve added a thumbnail here which links to the image from my phone, so that readers can see how the text is intended to appear. If you are following along with this article by copying and pasting commands, feel free to exclude the lines with a box character if you’d prefer. The examples still work without those lines.

Note: since Arabic Unicode characters have the “right-to-left” attribute set, these words are rendered right-to-left in DBeaver on my Mac.

Well let me tell you: linguistics is hard work. After a month I’ve made a lot of progress. However, my Machine Learning jobs are now using lots of resources on the database server and it slows down my ad-hoc SQL analysis a lot. Separately, I have a growing concern about making extra sure we don’t lose all the work I’ve done so far.

Continue reading

What is Ardent?

ADJECTIVE:
1. Warmth of feeling; passionate
2. Strong enthusiasm or devotion; fervent
3. Burning/fiery or glowing/shining
(American Heritage Dictionary)

Social

As of 2022: I'm on Twitter a lot. On Slack when I have time. Haven't been on IRC for a long time. I've de-supported all other (old) social accounts listed here, but I'll keep them handy for the Zombie Apocalypse.

Slack: jer_s@postgresteam
IRC: jer_s@FreeNode (#postgresql, #ansible, #oracle, ##oracledb)

AIM, MSN, Google: jeremy.schneider@ardentperf.com
Yahoo: ardentperf
ICQ: 614052660

Twitter

Disclaimer

This is my personal website. The views expressed here are mine alone and may not reflect the views of my employer.

contact: 312-725-9249 or schneider @ ardentperf.com


https://about.me/jeremy_schneider

oaktableocmaceracattack

(a)

Enter your email address to receive notifications of new posts by email.

Join 56 other subscribers
%d bloggers like this: