A couple of months back we were taking a look into the functional aspects of Redshift Spectrum. One of the things we noticed was that while we were able to create views against external tables, when trying to query them you’d get an an empty result set back. For example:
--create view against external table create view attr_tbl_ext_vw as select * from spec_user.attr_tbl_ext_arch; --run select select count(*) from attr_tbl_ext_vw; count ------- 0 (1 row)
When we inspect the view DDL we see the following:
select definition from pg_views where viewname='attr_tbl_ext_vw'; SELECT attr_tbl_ext_arch.commit_ts, attr_tbl_ext_arch.batch_name, attr_tbl_ext_arch.notes_1, attr_tbl_ext_arch.notes_2, attr_tb l_ext_arch.notes_3, attr_tbl_ext_arch.status, attr_tbl_ext_arch.payment_type, attr_tbl_ext_arch.payment_method, attr_tbl_ext_arc h.order_line, attr_tbl_ext_arch.log_id FROM pg_temp_5.spec_user_attr_tbl_ext_arch_5563a6d9a6043 attr_tbl_ext_arch; (1 row)
We raised a support ticket at the time with AWS who confirmed that views on external tables were not supported:
This seemed like a real backwards blow, as if you could create views that spanned both external and standard database tables, you could seamlessly access your archived historic and current delta data, thereby reducing query/ETL complexity.
However, in a post from AWS just a few days ago it was announced that this functionality was now supported through the concept of late binding views. According to the Redshift documentation, with late binding views the underlying object dependencies are not checked until the the view is queried. So with the addition of the ‘with no schema binding’ view clause we can now create views across our in-database and S3 bound data. Let’s run a quick example, where we have a delta table in the database containing 1 row and an external table containing all our historic data:
create view public.attr_tbl_archive_all_vw as select *,'LATEST' as marker from public.attr_tbl_all_delta union select *,'HISTORIC' as marker from spec_user.attr_tbl_ext_arch with no schema binding;
Inspecting the DDL now shows the following –
select definition from pg_views where viewname='attr_tbl_archive_all_vw'; create view public.attr_tbl_archive_all_vw as select *,'LATEST' as marker from public.attr_tbl_all_delta union select *,'HISTORIC' as marker from spec_user.attr_tbl_ext_arch with no schema binding;
so, a quick query:
Looks good. The explain plan shows the access paths for both our objects:
and after executing, the scan info confirms both objects were read:
Great! What if we’re only interested in the latest delta data?
Suggests that we’re only reading from our delta table. Scan info:
Filtering on the marker attribute means we can limit the execution to only reading the delta table, thus allowing us to optimise query performance for certain known conditions and avoid unnecessary table reads.
So with the addition of late binding views, we are now able to simplify access to our data sets held both externally and within the database.