Context processing in multi-table schemas
MOSTLY AI provides powerful features for training on and generating multi-table datasets. For such datasets, the Sequential Context Processor (SCP) is a key feature that helps generate nested child tables with rich context from the surrounding tables. As a result, the synthetic data that you generate with MOSTLY AI can retain the existing correlations between nested linked tables in complex schemas.
Example of context processing
To examine how sequential context processing works, let’s consider a multi-table scenario with six tables, as listed below.
- customers table
- loans table
- payments table
- accounts table
- cards table
- transactions table
The diagram below illustrates the relationships in this table schema.
For ease of reading, the text below refers to generating records in the context of other records. However, the same rules of context apply when training MOSTLY AI generators.
So, when you read "generated in the context of", bear in mind that "trained on in the context of" applies equally.
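As a rough aid for following the sections below, the relationships between the six tables can be written down as a parent map. This is only a sketch for reasoning about the hierarchy: the `PARENT` dictionary and the helper names are hypothetical and are not part of any MOSTLY AI API.

```python
# Hypothetical representation of the example schema: table -> parent table.
# The names mirror the tables in this example; this is not MOSTLY AI's API.
PARENT = {
    "customers": None,          # subject table, top of the hierarchy
    "loans": "customers",
    "payments": "loans",
    "accounts": "customers",
    "cards": "accounts",
    "transactions": "accounts",
}


def ancestors(table):
    """Return the chain of ancestor tables, nearest first."""
    chain = []
    current = PARENT[table]
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def children(table):
    """Return the tables whose parent is `table`, in declaration order."""
    return [name for name, parent in PARENT.items() if parent == table]


print(ancestors("payments"))   # ['loans', 'customers']
print(children("customers"))   # ['loans', 'accounts']
```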
customers table
At the top of the hierarchy is the customers table. This table acts as the primary context for all child and grandchild tables that follow in the hierarchy. This is also known as a subject table.
loans table
The loans table is the first child table in the hierarchy. Together with the parent customers table, it forms a two-table scenario, where the customers table is the subject table and the loans table is a linked table.
The records in the loans table are generated in the context of the parent records from the customers table.
In addition, each record in the loans table is generated in the context of all same-table sibling records. Let’s consider how the loans records are generated for the parent Lucia Garcia.
After the first loan record for Lucia Garcia is generated, that first record is then used as context for the second loan record of Lucia Garcia.
Afterwards, all previously generated sibling records (with parent Lucia Garcia) are passed as context for each new sibling to be generated.
To summarize, when MOSTLY AI generates a loan record, it does so in the context of:
- parent customer record (Lucia Garcia)
- same-table sibling loan records (the second Consumer loan is generated in the context of the first two loans, Consumer and Mortgage, of Lucia Garcia)
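The sibling rule above can be illustrated with a small sketch. It only mimics the order in which context accumulates. The function name and the record labels are made up for illustration, and no actual generation happens here.

```python
def sibling_contexts(parent, siblings):
    """Yield (record, context) pairs for the children of one parent.

    The context of each record is the parent plus every sibling that was
    produced before it, mirroring the sequential rule described above.
    """
    generated = []
    for record in siblings:
        yield record, {"parent": parent, "previous_siblings": list(generated)}
        generated.append(record)


# Hypothetical labels for the three loans of one customer.
loans = ["Consumer loan 1", "Mortgage loan", "Consumer loan 2"]
for loan, context in sibling_contexts("Lucia Garcia", loans):
    print(loan, "<-", context["previous_siblings"])
```

The first loan sees only the parent, while the second Consumer loan sees both loans generated before it.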
payments table
The payments table is the first grandchild table in the hierarchy. It is a child to the loans table, and a grandchild to the customers table.
When MOSTLY AI generates a payment record, it does so in the context of:
- the grandparent customer record (Lucia Garcia)
- parent loan record (Consumer loan)
- parent sibling loan records (the Mortgage and the two Consumer loans)
- same-table sibling payment records (the payment records that belong to the same Consumer loan)
The context not used is as follows:
- same-table cousin records (payment records that belong to other loan parent records)
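To make the difference between siblings and cousins concrete, here is a minimal sketch over toy data. All identifiers are hypothetical. It assumes that same-table siblings means the payments generated earlier for the same loan, as with loans, and it follows the list above in counting all of the customer's loans under parent siblings.

```python
# Toy data for one customer; all identifiers are hypothetical.
LOAN_OWNER = {
    "consumer_1": "Lucia Garcia",
    "mortgage": "Lucia Garcia",
    "consumer_2": "Lucia Garcia",
}
# Payments in generation order: (payment id, parent loan id).
PAYMENTS = [
    ("pay_1", "consumer_1"),
    ("pay_2", "mortgage"),
    ("pay_3", "consumer_1"),
]


def payment_context(index):
    """Split the records around PAYMENTS[index] into used and unused context."""
    _, loan_id = PAYMENTS[index]
    customer = LOAN_OWNER[loan_id]
    return {
        "grandparent": customer,
        "parent": loan_id,
        # All loans of the same customer, as in the list above.
        "parent_siblings": [l for l, c in LOAN_OWNER.items() if c == customer],
        # Earlier payments that share the same parent loan.
        "same_table_siblings": [p for p, l in PAYMENTS[:index] if l == loan_id],
        # Payments under a different loan of the same customer: not used.
        "not_used_cousins": [
            p for p, l in PAYMENTS if l != loan_id and LOAN_OWNER[l] == customer
        ],
    }


context = payment_context(2)
print(context["same_table_siblings"])  # ['pay_1']
print(context["not_used_cousins"])     # ['pay_2']
```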
accounts table
The accounts table is the second child of the customers table and a sibling of the loans table. Just like the loans records, the accounts records are generated in the context of the customers records.
However, the Sequential Context Processor also includes the loans records as context. This means that every time an account record is generated, MOSTLY AI provides as context:
- parent customer record (Lucia Garcia)
- cross-table sibling loan records (the 2 Consumer loans and the Mortgage loan)
- same-table sibling account records (the Savings account is passed as context when generating the Checking account)
The context not used is as follows:
- any records from the payments table
cards table
The cards table is the second grandchild table in the hierarchy. It is a child to the accounts table and a grandchild to the customers table.
When MOSTLY AI generates a card record, it does so in the context of:
- the grandparent customer record (Lucia Garcia)
- the parent account record (the Savings account of Lucia Garcia)
- all same-table parent sibling account records (the Checking account of Lucia Garcia)
- all cross-table parent sibling loan records (the 2 Consumer and 1 Mortgage records)
- all previously generated same-table sibling card records
The context not used is as follows:
- all cross-table cousin records from the payments table (payments records whose parent loan record has Lucia Garcia as its parent)
- all same-table cousin records from the cards table (cards records that have another account as a parent whose parent is Lucia Garcia)
transactions table
The transactions records are generated with the richest context of all the tables.
When MOSTLY AI generates transaction records for an account (for example, the Savings account as shown in the diagram below), it does so in the context of:
- the grandparent customer record (Lucia Garcia)
- the parent account record (the Savings account that belongs to Lucia Garcia)
- all same-table parent sibling account records (the Checking account that also belongs to Lucia Garcia)
- all cross-table parent sibling loan records (the 2 Consumer and 1 Mortgage loans that belong to Lucia Garcia)
- all cross-table sibling card records (the Debit and Credit cards that also belong to the same parent Savings account)
- all previously generated same-table sibling transaction records
The context not used is as follows:
- cross-table cousin records from the cards table (cards that have another account as parent whose parent is Lucia Garcia)
- cross-table cousin records from the payments table (payments records whose parent loan record belongs to Lucia Garcia)
- same-table cousin records from the transactions table (transaction records with a different account parent record that belongs to Lucia Garcia)
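The per-table rules in this example can be approximated at the table level with a short sketch. It rests on one assumption inferred from the examples rather than stated in the text: cross-table context flows only from tables that appear earlier in the schema to tables that appear later (loans to accounts, cards to transactions). The data structure and function names are hypothetical.

```python
# Tables in the order used in this example, mapped to their parent table.
# Hypothetical representation for illustration only.
PARENT = {
    "customers": None,
    "loans": "customers",
    "payments": "loans",
    "accounts": "customers",
    "cards": "accounts",
    "transactions": "accounts",
}
ORDER = list(PARENT)


def earlier_children(parent, before):
    """Children of `parent` that come before table `before` in ORDER."""
    cutoff = ORDER.index(before)
    return [t for t in ORDER[:cutoff] if PARENT[t] == parent]


def context_tables(table):
    """Map each context role to the table(s) that supply it for `table`."""
    parent = PARENT[table]
    if parent is None:
        return {}  # the subject table has no context tables
    roles = {
        "parent": [parent],
        "same-table siblings": [table],
        # Assumption: only earlier sibling tables act as context.
        "cross-table siblings": earlier_children(parent, table),
    }
    grandparent = PARENT[parent]
    if grandparent is not None:
        roles["grandparent"] = [grandparent]
        roles["same-table parent siblings"] = [parent]
        roles["cross-table parent siblings"] = earlier_children(grandparent, parent)
    return roles


for role, tables in context_tables("transactions").items():
    print(role, tables)
```

Cousin tables never appear in the result because the function only looks at the parent, the grandparent, and their direct children.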
Summary of context processing scenarios
The table below summarizes the types of records that can be passed as context by the Sequential Context Processor.
| Record type | TABULAR | LANGUAGE |
|---|---|---|
| parent | yes | yes |
| grandparent | yes | yes |
| same-table siblings | yes | yes |
| cross-table siblings | yes | yes |
| same-table parent siblings | yes | no |
| cross-table parent siblings | yes | no |
| same-table cousins | no | no |
| cross-table cousins | no | no |
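For quick reference, the summary table can also be expressed as a lookup. This is simply the table above transcribed into a dictionary. The `SUPPORT` name and the helper function are hypothetical.

```python
# The summary table above, transcribed as a lookup (hypothetical structure).
SUPPORT = {
    "parent": {"TABULAR": True, "LANGUAGE": True},
    "grandparent": {"TABULAR": True, "LANGUAGE": True},
    "same-table siblings": {"TABULAR": True, "LANGUAGE": True},
    "cross-table siblings": {"TABULAR": True, "LANGUAGE": True},
    "same-table parent siblings": {"TABULAR": True, "LANGUAGE": False},
    "cross-table parent siblings": {"TABULAR": True, "LANGUAGE": False},
    "same-table cousins": {"TABULAR": False, "LANGUAGE": False},
    "cross-table cousins": {"TABULAR": False, "LANGUAGE": False},
}


def usable_context(model_type):
    """List the record types that can be passed as context for a model type."""
    return [role for role, flags in SUPPORT.items() if flags[model_type]]


print(usable_context("LANGUAGE"))
```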