question on logging for `lakectl dbt create-branch...
# help
u
question on logging for
lakectl dbt create-branch-schema
u
i am invoking the command like so:
lakectl dbt create-branch-schema --branch my_branch --log-level DEBUG --log-output log.log
u
it’s failing with an exit code of 1, but the log file only has a single log line that doesn’t seem to have anything to do with the failure… all i see is that it loaded some configuration
u
any pointers on where to look next?
u
Hi @Verun Rahimtoola and welcome! 🙂 First, here is the doc referring to lakeFS and DBT integration https://docs.lakefs.io/integrations/dbt.html. And can you please send the logs?
u
the log file contains this single line:
time="2022-01-24T17:26:03-05:00" level=debug msg="loaded configuration from file" func=github.com/treeverse/lakefs/cmd/lakectl/cmd.glob..func68 file="/home/runner/work/lakeFS/lakeFS/cmd/lakectl/cmd/root.go:67" fields.file=/Users/verun.rahimtoola/.lakectl.yaml file="/home/runner/work/lakeFS/lakeFS/cmd/lakectl/cmd/root.go:67"
u
ah… hold on. i opened up my
.lakectl.yaml
file just now and the
db_location_uri
seems to be clearly wrong
u
or is that not relevant here? the
server.endpoint_url
is definitely correct
u
Can you send also your lakectl configuration file?
u
And I'd love to hear more details about your use case and how you're running lakeFS 🙂
u
hi @Lynn Rozen sorry for the delay.. i’m attaching my `.lakectl.yaml` file. the values i use for
<secret access key>
,
<access key id>
and
<endpoint url>
are definitely correct
u
the use case is the following; given a specific tag/commit of the data, we want to set up our data warehouse schema (which we manage with DBT) to reflect that data so that queries run against it, end up running against that specific commit of the data. i hope i am explaining that properly!
u
Sure, thanks! I am checking it and will get back to you.
u
Hi, I believe the metastore part should be this way:
metastore:
  type: hive
  hive:
    uri: thrift://hive-metastore:9083
and hive uri must be configured with the correct hive metastore URI. You should run the command from the root of the dbt project, or set it by using the flag 
--project-root
. Also, to be sure, is there any error printed to the lakectl output?
u
hey, sorry, no errors… i’ve updated my
~/.lakectl.yaml
file to have the metastore section as you’ve shown
u
but still the exit code is 1
u
i used
strace
to hunt for clues and i know that it does invoke the
dbt
command… and that is the command that’s failing
u
ok, so it looks like our
dbt
set up is broken
u
that’s probably why it’s failing. i can’t run
dbt
directly either… i’ll fix that and return to this issue
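A minimal sketch of that kind of check: run a command on its own and surface its exit code, so a failure in `dbt` can be told apart from a failure in `lakectl`. The helper name `run_and_report` is made up for illustration, and using `dbt debug` as the standalone check is an assumption.

```shell
# Hypothetical helper (not part of lakectl or dbt): run a command and
# print its exit code to stderr, then return that same code.
run_and_report() {
  rc=0
  "$@" || rc=$?
  echo "exit code of '$*': $rc" >&2
  return "$rc"
}

# Only attempt the check when dbt is actually installed.
if command -v dbt >/dev/null 2>&1; then
  run_and_report dbt debug || echo "dbt fails on its own; fix that first" >&2
fi
```

If the standalone run already fails, the exit code 1 from `lakectl dbt create-branch-schema` is most likely just being passed through.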
u
Ok, looking forward to hearing if it worked 🙂
u
@Lynn Rozen on our side we are working on sorting out our DBT issues. in the meantime, do you think this command is the right tool for the use case we have? ie, if you have a certain commit of the data and you want to set up tables in your warehouse against that data?
u
I believe so, your use case seems compatible for this command and lakeFS-dbt integration.
u
awesome, good to hear - thank you
u
Of course!
u
hi, so i’ve made some progress… i’m working from a container that has both
lakectl
and
dbt
working properly (individually)… and now when i invoke the command, this is the output:
bash-4.2# lakectl dbt create-branch-schema --branch verun_test --continue-on-error --skip-views --log-level DEBUG
DEBU[0000]/home/runner/work/lakeFS/lakeFS/cmd/lakectl/cmd/root.go:67 github.com/treeverse/lakefs/cmd/lakectl/cmd.glob..func68() loaded configuration from file                fields.file=/root/.lakectl.yaml file=/root/.lakectl.yaml
dbt debug succeeded with schema dbt_msk
EOF
Error executing command.
u
once again the logs aren’t being generated with any useful diagnostics. is there another location
lakectl
logs to?
u
also, the
uri
in the
metastore
section needs to be set without the
thrift://
protocol prefix, otherwise a “too many colons” error results
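As a rough sketch of that workaround, the scheme can be stripped from the URI before it is written into the config. `strip_thrift_prefix` is a made-up helper name, and the host and port below are just the example values from earlier in the thread.

```shell
# Hypothetical helper: drop a leading "thrift://" from a metastore URI.
# A value without that prefix is passed through unchanged.
strip_thrift_prefix() {
  printf '%s\n' "${1#thrift://}"
}

# Example value taken from the metastore snippet earlier in the thread.
strip_thrift_prefix "thrift://hive-metastore:9083"
```

The result is the bare `host:port` form, which is what goes under `metastore.hive.uri` in this setup.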
u
i’ll note that the command runs for quite a while before that
EOF
error so it’s definitely doing more work this time than it was…
u
the
lakectl
version i’m using is:
0.57.2
and
dbt
version is:
0.19.2
u
Hi, first I opened an issue to make the error logs more informative (I'll update it with more information as we go). Did you run the command from the root of the dbt project?
u
our dbt project is set up so that the
dbt_project.yml
file is at the base directory, and all our models etc are defined in sub-directories under a
src/
directory
u
so i had to launch
lakectl
from within the
src
directory (ie, one level below where the
dbt_project.yml
file lives)
u
So can you please try to add to the command the flag
--project-root
with the relevant root directory?
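One way to work out that directory, sketched under the assumption that the project root is the directory holding `dbt_project.yml` (the usual dbt convention). `find_dbt_project_root` is a hypothetical helper, not something lakectl ships.

```shell
# Hypothetical helper: walk up from a starting directory (default: the
# current one) until a directory containing dbt_project.yml is found.
find_dbt_project_root() {
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    if [ -f "$dir/dbt_project.yml" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Reached the filesystem root without finding the file.
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}

# Usage sketch (assumes lakectl is on PATH; branch name from this thread):
# lakectl dbt create-branch-schema --branch verun_test \
#   --project-root "$(find_dbt_project_root)"
```

Run from inside `src/`, this would resolve to the parent directory where `dbt_project.yml` lives.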
u
the root directory is where the
dbt_project.yml
file lives?
u
ok, i have just relaunched…
u
😕 same error,
EOF
u
Ok, so let's try a few things. 1. First run dbt with a schema that points to the lakeFS main branch. Then run the create-branch-schema command with your relevant branch. 2. Also, I think that maybe the error you get is an error from the metastore, so if there is a dbfs location property in your configuration file, let's try removing it.
u
yeah the metastore file doesn't have the dbfs location
u
👍 Just to make sure - you created the branch
verun_test
in lakeFS? And what is the location of
dbt_msk
schema?
u
hmm… yes I did create the branch
verun_test
already, but I'm actually not sure what the location of
dbt_msk
schema is. can you clarify what assumptions the command makes, for its proper execution?
u
is the existing schema expected to be in a certain location relative to the branch you created?
u
For proper execution, first let's create a schema with the location of the main branch, something like
lakefs://some-repo/main/
. After that run dbt, and then run the current command of
lakectl dbt create-branch-schema --branch verun_test --continue-on-error --skip-views --log-level DEBUG
u
so we already have our main branch at `lakefs://origin/master`
u
and our current data warehouse schema has that as the backing data
u
is there something else I need to do, before I can run the create-branch-schema command?
u
Can you run SHOW CREATE TABLE dbt_msk with spark sql? Does it point to the master branch in lakeFS?
u
it would be super helpful to get a log of what is being attempted behind the scenes so we can diagnose these issues better
u
Can you run SHOW CREATE TABLE dbt_msk with spark sql?
yes!
u
we are able to run normal sql queries right now using Spark SQL
u
Yes, I opened an issue for that, and we'll release a new version with more informative logs. We'll publish it once it's out. 🙂
u
awesome! thanks
u
For now, I think I'll have to dig in a little bit deeper so we can solve the issue, so I'll get back to you sometime tomorrow. Thanks for your patience and have a great evening!
u
sure, thanks for your help! chat tmrw