Personal Programming Notes

To err is human; to debug, divine.

Best Friend Forever

BFF is Problem C in Google Code Jam 2016, Round 1A. The problem statement, summarized, is as follows:

Every kid in your class has a single best friend forever (BFF).
You want to form the largest possible circle of kids such that each kid in the circle is sitting directly next to their BFF, either to the left or to the right.
Given a line that contains N integers F1, F2, ..., FN, where Fi is the student ID number of the BFF of the kid with student ID i, find the greatest number of kids that can be in the circle.

Bash Trap

In this post, we discuss common usage of the Bash trap builtin to ensure proper cleanup in Bash scripts. We also discuss the common idiom trap cleanup INT TERM EXIT, in which signals such as INT and TERM are trapped in addition to EXIT. While this idiom may be valid on some Unix systems, it is usually redundant and can be simply wrong (duplicate executions) in most cases, as shown on Mac. A simple test is provided to verify whether the idiom is applicable on your current system.
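
The kind of check described above can be sketched as follows. This is not the test from the post itself; it is a minimal illustration, and the names count_cleanups, cleanup, and marker are hypothetical. It assumes bash and mktemp are available. Each child script installs the trap on the given signals, sends itself TERM, and appends one line to a marker file every time the cleanup function runs.

```shell
#!/usr/bin/env bash
# Sketch: count how many times a trapped cleanup function runs when the
# script receives TERM. `count_cleanups`, `cleanup`, and `marker` are
# hypothetical names used only for this illustration.

count_cleanups() {
  local signals=$1 marker
  marker=$(mktemp)
  # The child installs the trap on the requested signals, then signals itself.
  # $signals is intentionally unquoted so it splits into separate arguments.
  bash -c '
    out=$1; shift
    cleanup() { echo ran >> "$out"; }
    trap cleanup "$@"
    kill -TERM $$
    sleep 0
  ' _ "$marker" $signals || true
  # One line per cleanup execution.
  wc -l < "$marker" | tr -d "[:space:]"
  rm -f "$marker"
}

echo "trap cleanup EXIT:          $(count_cleanups "EXIT") execution(s)"
echo "trap cleanup INT TERM EXIT: $(count_cleanups "INT TERM EXIT") execution(s)"
```

If the second count is higher than the first on your system, the extra signals in the idiom cause the duplicate executions mentioned above, and trapping EXIT alone is enough.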

Vertica: Refresh Your Projections

Most information presented in this post is directly quoted from this page.

Epoch: An epoch is a 64-bit number that represents a logical time stamp for the data in Vertica. The epoch advances when the logical state of the system changes or when the data is committed with a DML operation (INSERT, UPDATE, MERGE, COPY, or DELETE). The EPOCHS system table contains the date and time of each closed epoch and the corresponding epoch number of the closed epoch.

epochs table
=> select * from epochs;

epoch_close_time            epoch_number
2016-03-04 21:44:24.192495    610131

Ancient History Mark (AHM): A large epoch map can increase the catalog size. The ancient history mark is the epoch prior to which historical data can be purged from physical storage. You cannot run any historical queries prior to the AHM. By default, Vertica advances the AHM at an interval of 5 minutes.

There are scenarios in which the ancient history mark does not advance, for example when there is an unrefreshed projection. To find unrefreshed projections, use the following command:

SELECT * FROM projections where is_up_to_date = 'f';

The HPE page already mentions that the AHM will not advance if any projection is not up to date. However, this also means that the AHM will not advance if there’s no activity (data insert, update, or delete) on a table. The AHM could lag behind at the create epoch of some unrefreshed projection. Therefore, we need to make sure we always refresh projections after creating them.

Generally, you can refresh a projection by executing the START_REFRESH meta-function, which is a background process, or the REFRESH meta-function, which is a foreground process.

select START_REFRESH();

Links

  1. Epoch and AHM
  2. Best Practices

Use One Mocking Framework ONLY

We know that mocking is a critical enabler for unit tests and automated functional tests that don’t require networks and databases and can complete in reasonable time. In a large corporation such as Intuit, different business groups tend to adopt different mocking tools/frameworks for their development and test automation needs. The choice of mocking framework is usually decided by the personal preference and experience of a few key members of the development/automation team. Mocking tools work by integrating with and replacing critical parts of the Java class loader. This means that when multiple mocking tools are in use, they contend to replace the class loader in the JVM. The consequences are complex and unexpected: random test failures and unreliable tests. For example, we might have tests that work fine locally but start failing when run in combination with others (using other mocking tools) because different mocking frameworks take over the class loader in a different order or in different ways.

To fix that, we need to standardize and settle early on a single mocking framework for an organization or a project. Sadly, this is often overlooked until it is too late.

Symlinks in Git

Let’s say we have folders with many symbolic links in them, linking to other files in the same Git repository.

Before
$ ls -l link
... link -> /path/to/target

Unfortunately, after being committed into Git, they turned into plain text files. Note that right after committing and pushing into Git, the symlinks still work fine. However, after some branch switches and code merges, the symlinks become plain text files with the link target as their contents.

After
$ cat link
/path/to/target
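
When this happens, it can help to check how Git itself sees the link. The sketch below is only a diagnostic aid, not the fix from this post. It builds a throwaway repository (target.txt and link are names made up here) and prints the mode Git records for the link, plus the core.symlinks setting. Git stores symlinks with mode 120000, and according to the git-config documentation, core.symlinks=false makes Git check symlinks out as small plain files containing the link text. Whether that setting is the cause in any particular repository is an assumption you would need to verify.

```shell
#!/usr/bin/env bash
# Sketch: inspect how Git records a symlink, using a throwaway repository.
# `target.txt` and `link` are hypothetical names created for this example.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "hello" > target.txt
ln -s target.txt link
git add target.txt link

# Mode 120000 in the index means Git stored the entry as a symlink.
mode=$(git ls-files -s link | cut -d' ' -f1)
echo "index mode of link: $mode"

# core.symlinks=false tells Git to check symlinks out as plain text files.
echo "core.symlinks: $(git config --get core.symlinks || echo unset)"
```

Running the same two inspection commands inside an affected repository shows whether the index still records the entry as a symlink and whether the local configuration disables symlink checkout.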

Vertica Projections

Projections are key in Vertica performance tuning. Details of Vertica projections are discussed in the following blog posts from HP-Vertica:

  1. https://www.vertica.com/2011/09/01/the-power-of-projections-part-1/
  2. https://www.vertica.com/2011/09/02/the-power-of-projections-part-2/
  3. https://www.vertica.com/2011/09/06/the-power-of-projections-part-3/

In summary, Vertica projections represent collections of columns (like tables), but they are optimized for analytics at the physical storage structure level and are not constrained by the logical schema. For each regular table, Vertica requires a minimum of one projection, called a “superprojection”. Vertica creates a default superprojection when a CREATE TABLE statement runs. Part 3 also compares Vertica projections with “Materialized Views” and “Indexes” in traditional databases.

For Vertica performance tuning, we create multiple projections and customize each projection and its parameters to achieve the best performance. Database Designer is a tool provided by Vertica to help us find the optimal projections, based on data statistics and frequent queries.