# dev
y
Hey @Vaibhav Kumar, I understand you are trying to install Airflow using Docker Compose and are running into an issue when adding the lakeFS Airflow provider as a requirement. Can you share the final Compose file you are trying this with?
v
Here is the curl as well
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.4.1/docker-compose.yaml'
b
Hi @Vaibhav Kumar, I think the issue is that you set the additional lakeFS package by editing the docker-compose file at the place where it is hard-coded to '' (the init container). That hard-coded empty value is there to avoid installing packages as root.
You need to pass the value as an environment variable instead, and it will be set in all the right places (except that one).
A quick way to do it is to create a .env file in the same directory with the values you want:
Copy code
_PIP_ADDITIONAL_REQUIREMENTS=airflow-provider-lakefs
I'd also recommend running
Copy code
echo -e "AIRFLOW_UID=$(id -u)" >> .env
based on the docs
let me know if it solves your docker compose up issue.
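If you want to double-check that Compose actually picks the value up from the .env, something like this should show it substituted into the rendered config (just a sanity check, not required):
Copy code
docker compose config | grep _PIP_ADDITIONAL_REQUIREMENTS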
v
Ok, so my .env will have the pip requirement and the Airflow UID, right?
And the original docker-compose file should be kept intact?
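Something like this, right? (the UID value here is just whatever id -u returns on my machine)
Copy code
AIRFLOW_UID=1000
_PIP_ADDITIONAL_REQUIREMENTS=airflow-provider-lakefs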
b
Yes, those are the env vars docker compose will use.
Don't forget to run docker compose down first.
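i.e. roughly this sequence (assuming you're in the directory with docker-compose.yaml; the airflow-init step is the one from the Airflow docs and may not be needed if the database is already initialized):
Copy code
docker compose down
docker compose up airflow-init
docker compose up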
Just to verify that what I wrote is correct: you modified the docker-compose file, right? You didn't set an env variable with the above value?
The above docker-compose doesn't include an image build spec, it just uses pre-built images, so that's what I assumed.
Unless you are trying to build a new image that includes the lakeFS package, in which case please paste the command line you used to build it.
v
I have made the changes as per your suggestions, and it worked. But now, while running the lakeFS DAG example, I get the error below
y
@Vaibhav Kumar, this looks a lot like an issue that has already been resolved - see this thread. Can you let me know which version of the lakeFS provider you are using?
@Vaibhav Kumar, I think I know what's going on here. It seems like we forgot to fix this issue for the example file that you are using.
I've opened a PR to try and handle that. The team will test it during the week. You are welcome to try it yourself and let me know if it worked.
v
Thanks, meanwhile I will try to start my work by freezing it to the 0.42 version. Hope that works.
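i.e. something like this in the .env (assuming 0.42.0 is the exact release I need; I'll check PyPI for the available versions):
Copy code
_PIP_ADDITIONAL_REQUIREMENTS=airflow-provider-lakefs==0.42.0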
y
Where are you getting the example DAG from?
Also, please share errors in textual format rather than a photo
v
https://github.com/treeverse/airflow-provider-lakeFS/blob/main/lakefs_provider/example_dags/lakefs-dag.py
Is that fine?
y
So you're using the script from the main branch with version 0.42 of the provider?
v
yes
I know the example is not from 0.42.0. Where can I find the 0.42.0 one?
y
I think it's better if you use the most up-to-date provider and modify the script according to the above PR.
a
@Vaibhav Kumar I have a Docker container with Airflow and an Airflow demo notebook (everything packaged in one container): https://github.com/treeverse/lakeFS-samples/tree/main/03-apache-spark-python-demo Also, I have a webinar tomorrow on Airflow + lakeFS. You can join: https://lakefs.io/event/troubleshoot-and-reproduce-data-with-apache-airflow/
v
Thanks for sharing this @Amit Kesarwani
a
@Vaibhav Kumar You can check my Dockerfile for any installation issues: https://github.com/treeverse/lakeFS-samples/blob/main/03-apache-spark-python-demo/Dockerfile
v
Hello, I am not familiar with the syntax below. I can tell that a file is being copied, but that's about it; here is the file for reference. Can someone help?
command: AWS_ACCESS_KEY_ID=${{ env.KEY }} AWS_SECRET_ACCESS_KEY=${{ env.SECRET }} aws s3 cp --endpoint-url=http://s3.local.lakefs.io:8000 s3://example-repo/main/path/to/_SUCCESS -
@Ariel Shaqed (Scolnicov) @Yoni Augarten Can you please help with the above question?
n
@Vaibhav Kumar just to make sure I understand you correctly: are you trying to understand what this step does, or do you have a different question?
v
Yes, first of all, my question is what this step does.
n
This step waits until the lakeFS DAG completes successfully, by polling for the _SUCCESS file, before continuing to the next steps.
v
Polling? But I see a copy command. Where is the DAG being called in this command?
n
This is just a means to an end. This step uses a GitHub Action called retry (https://github.com/nick-fields/retry), which retries the copy command that follows until it succeeds. The _SUCCESS file becomes available only at the end of the lakeFS Airflow DAG, when the file is merged to the main branch. Once that happens, the copy command succeeds and we continue to the next step. You can follow the lakeFS DAG code to understand the flow better.
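Conceptually it's equivalent to a retry loop like this (just a sketch to illustrate the idea, not the actual workflow step; the key/secret variables and the object path are placeholders):
Copy code
# keep trying the copy; it only succeeds once the DAG's final merge makes _SUCCESS visible on main
until AWS_ACCESS_KEY_ID="$LAKEFS_KEY" AWS_SECRET_ACCESS_KEY="$LAKEFS_SECRET" \
      aws s3 cp --endpoint-url=http://s3.local.lakefs.io:8000 \
      s3://example-repo/main/path/to/_SUCCESS - ; do
  sleep 10  # wait a bit before retrying
done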