# help
u
Hi everyone! So I have this pipeline that extracts data from a database and sends it to lakeFS periodically. Everything has been working fine for over a month now, until I checked my Airflow dashboard and am now getting this error:
Broken DAG: [/opt/airflow/dags/dagRun.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/lakefs_provider/hooks/lakefs_hook.py", line 26, in <module>
    class LakeFSHook(BaseHook):
  File "/home/airflow/.local/lib/python3.7/site-packages/lakefs_provider/hooks/lakefs_hook.py", line 106, in LakeFSHook
    def log_commits(self, repo: str, ref: str, size: int=100) -> Iterator[Any]:
TypeError: 'ABCMeta' object is not subscriptable
u
Hi @Jude, Sorry you've been having issues. We released a new version of the airflow operators yesterday, which may have broken your code. Can you freeze your version at 0.42.0 to start off with?
u
After that, I'd appreciate it if you could open an issue with more details, specifically the Airflow and Python versions that you are running. THANKS!
u
You mean freeze to lakefs==0.42.0?
u
Freeze airflow-provider-lakeFS==0.42.0.
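For example, if your Airflow image installs providers from a requirements file (I'm guessing here -- adjust to wherever your environment declares its Python dependencies), the pin is just one line:
# requirements.txt -- pin so a rebuild can't silently pull a newer provider release
airflow-provider-lakeFS==0.42.0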
u
Not sure why your Airflow spontaneously updated its provider, but this is code from the new version, which makes it even more suspect.
u
Okay thanks❤️
u
And please let me know how you get along with this!
u
Alright
u
Update: the issue was resolved after I pinned the version you suggested. I will go ahead and open this issue on GitHub. Thanks again😎
u
Thanks for finding and opening that issue! I'm sorry to say that I can confirm it. This PR should fix it; I shall try to release this week or early next week. Once that happens you'll be able to enjoy the log commits hook in your Airflow pipelines. But I would also generally caution against allowing PyPI requirements upgrades in production. No matter how often I update prod, I prefer to do it on my own terms rather than those of some dependency.
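For the curious, my best guess at the root cause, judging by the Python 3.7 path in your traceback: the new hook's Iterator type hint appears to come from collections.abc, which only became subscriptable in Python 3.9, whereas the typing alias works on 3.7. Roughly:
# On Python 3.7/3.8 the abc class is not subscriptable:
from collections.abc import Iterator as ABCIterator
# ABCIterator[int]  -> TypeError: 'ABCMeta' object is not subscriptable

# The typing alias works on 3.7+:
from typing import Any, Iterator

def log_commits(repo: str, ref: str, size: int = 100) -> Iterator[Any]:
    ...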
u
0.43.1 (just released) should fix this. Please let me know! Thanks (and sorry).
u
Thank you for this explanation, which I find really helpful. I will have to fix this PyPI upgrade behaviour on my end to keep issues like this from recurring.
u
Hi @Ariel Shaqed (Scolnicov), I am no longer able to use the lakeFS file sensor operator. The same code worked for me while I was still running airflow-provider-lakefs version 0.43.0, but after I downgraded to v0.42.0 I now get this error. Wondering if this was also part of the changes that were made recently.
lakefs_client.exceptions.UnauthorizedException: (401)
For extra details, my code looks like this:
sense_eventData = LakeFSFileSensor(
    task_id="sense_eventData_files",
    repo='test',
    branch='main',
    lakefs_conn_id='conn_1',
    path='bronzelayer/APIs/events.parquet'
)
u
Ouch. AFAIK the only way to trigger an UnauthorizedException from the client is by using bad credentials. Sorry to bug you again, but can you re-check your credentials? Do these credentials work when you use them directly from lakectl? If you don't already have one, you'll need a ~/.lakectl.yaml file that looks like this (just fill in the 3 fields).
credentials:
  access_key_id: AKIA...
  secret_access_key: shhh...
server:
  endpoint_url: https://lakefs.example.com/api/v1
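If it's easier than installing lakectl (once that file is in place, a plain lakectl repo list is the quickest check), you can also try the same key pair from Python with the lakefs_client package the provider uses underneath. A rough sketch from memory -- depending on the client version, the host may or may not want the /api/v1 suffix:
import lakefs_client
from lakefs_client.client import LakefsClient

configuration = lakefs_client.Configuration(
    host="https://lakefs.example.com",   # your lakeFS endpoint
    username="AKIA...",                  # lakeFS access key id
    password="shhh...",                  # lakeFS secret access key
)
client = LakefsClient(configuration)

# Should list your repositories; a 401 here means the key pair itself is wrong.
print(client.repositories.list_repositories())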
HTH!
u
Alright, let me work on that
u
Quick question: I have modified the credentials in the .lakectl.yaml file to the ones I am currently using, but that didn't work. I can see your access_key_id starts with AKIA, which is what I had in there initially before I changed it to the one I am currently using in my config file.
u
You mean that when you set the new credentials in ~/.lakectl.yaml you didn't manage to use lakectl? What error do you get?
u
ERROR  [2022-06-22T10:10:32Z]lakeFS/pkg/api/auth_middleware.go:157 pkg/api.userByAuth authenticate                                  error="2 errors occurred:\n\t* built in authenticator: could not decrypt value\n\t* email authenticator: not found: no rows in result set\n\n" host="137.184.147.128:8000" method=GET path="/api/v1/repositories/test/refs/main/objects/stat?path=bronzelayer%2FWEB%2Ffootball-data.parquet" request_id=6f72101a-43c4-4964-8e23-92683f509dfc service=api_gateway service_name=rest_api user=AKIAJSEJPLE4NOPT4KEQ
u
Getting this error from the lakefs logs
u
What command did you run?
u
I didn't run any command, I manually edited the credentials in the file.
u
For me to fully understand the situation, I'd like to go step by step 🙂
1. You tried to use the lakeFS file sensor operator, but you got an UnauthorizedException.
2. In order to check your credentials, Ariel suggested that you first try to run a lakectl command with those credentials.
So, did you have the chance to use lakectl before? Did you download it? Where did you get the last error?
u
Okay, firstly, I actually didn't run any lakectl command to check the credentials. The first thing I did after he suggested that was to modify the credentials in the file.
u
Does this mean that the credentials I passed to the lakeFS client still won't work in my .lakectl.yaml file?
u
Did you change .lakefs.yaml or .lakectl.yaml?
u
yes I changed the .lakectl.yaml
u
My pipeline actually works and gets my data into the right destination in lakeFS, but I feel I will need these lakeFS operators for extra checks.
u
@Jude, Two things come to mind:
1. All lakeFS access keys currently begin with AKIA... In particular, AFAIK even on lakeFS Cloud, email+password work only for logging into the GUI -- you must define a service credentials access key for lakectl and other REST API access.
2. Credentials to access lakeFS go into .lakectl.yaml. The access key and secret key there are those that you got from lakeFS -- not your S3 credentials.
Sorry if you've covered both of these. They are just my 2 favourite mistakes to make with lakeFS 🙂, I probably get into a credentials mixup at least once a fortnight.
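One more thing worth checking, since the sensor takes its credentials from the conn_1 Airflow connection (via lakefs_conn_id) rather than from .lakectl.yaml: it can help to peek at what Airflow will actually hand the provider. A quick sketch, run inside your Airflow environment (conn_1 is just the id from your sensor code):
from airflow.hooks.base import BaseHook

# Fetch the connection the LakeFSFileSensor will use and show its non-secret fields.
conn = BaseHook.get_connection("conn_1")
print(conn.host, conn.login, conn.extra)   # avoid printing conn.password in shared logs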
u
Thank you @Ariel Shaqed (Scolnicov), I think I might be making the same mistake as well; sometimes these keys look similar, but in actual fact they are not. At least with what you just said, I now know where to look when debugging. Thanks, guys!