#dev
Yoni Augarten

10/09/2022, 1:20 PM
Hey @Vaibhav Kumar, I understand you are trying to install Airflow using Docker Compose and are running into an issue when adding the lakeFS Airflow provider as a requirement. Can you share the final Compose file you are trying this with?
Vaibhav Kumar

10/09/2022, 1:24 PM
Here is the curl command as well:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.4.1/docker-compose.yaml'
Barak Amar

10/09/2022, 1:27 PM
Hi @Vaibhav Kumar, I think the issue is that you set the additional lakeFS package by editing the docker-compose file at the place where it is set to '' (the init container). That empty value is intentional: it is there to avoid installing packages as root.
1:27 PM
You need to pass the value as an environment variable instead, and it will be set in all the right places (except that one).
1:28 PM
A quick way to do it is to create a .env file in the same directory with the values you'd like:
_PIP_ADDITIONAL_REQUIREMENTS=airflow-provider-lakefs
1:28 PM
I'd also recommend:
echo -e "AIRFLOW_UID=$(id -u)" >> .env
based on the docs.
1:28 PM
Let me know if it solves your docker compose up issue.
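Putting Barak's suggestions together, the setup can be sketched like this (a sketch only; it assumes the downloaded docker-compose.yaml already sits in the current directory):

```shell
# Sketch of the suggested setup, assuming the official Airflow
# docker-compose.yaml is already in the current directory.
# Write the extra pip requirement and the Airflow UID into .env;
# Docker Compose reads this file automatically.
echo "_PIP_ADDITIONAL_REQUIREMENTS=airflow-provider-lakefs" > .env
echo -e "AIRFLOW_UID=$(id -u)" >> .env
cat .env

# Then (not run here): docker compose down && docker compose up
```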
Vaibhav Kumar

10/09/2022, 1:29 PM
OK, so my .env will have the pip requirement and the Airflow UID, right?
1:30 PM
And the original docker-compose file should be kept intact?
Barak Amar

10/09/2022, 1:30 PM
Yes, those are the env vars Docker Compose will use.
1:30 PM
Don't forget to run docker compose down first.
1:33 PM
Just to verify that what I wrote is true: you modified the docker-compose file, right? You didn't set an env var with the above value?
1:34 PM
The above docker-compose doesn't include an image build spec, it just uses pre-built images, so I assumed the above.
1:35 PM
Unless you are trying to build a new image that includes the lakeFS package. In that case, please paste the command line you used to build it.
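For the alternative Barak mentions here, a custom image that bakes in the provider might look like this (a hypothetical sketch, not from the thread; the base image tag is assumed to match the Compose file above, and `my-airflow-lakefs` is a made-up tag):

```shell
# Hypothetical Dockerfile for an Airflow image with the lakeFS
# provider pre-installed (base tag assumed from the Compose file).
cat > Dockerfile <<'EOF'
FROM apache/airflow:2.4.1
RUN pip install --no-cache-dir airflow-provider-lakefs
EOF
cat Dockerfile

# Then build it and point the `image:` key in docker-compose.yaml at it
# (not run here):
#   docker build -t my-airflow-lakefs .
```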
Vaibhav Kumar

10/09/2022, 2:31 PM
I have made the changes as per your suggestions, and it worked. But now, while running the lakeFS DAG example, I get the error below.
Yoni Augarten

10/09/2022, 2:41 PM
@Vaibhav Kumar, this looks a lot like an issue that has already been resolved - see this thread. Can you let me know which version of the lakeFS provider you are using?
2:45 PM
@Vaibhav Kumar, I think I know what's going on here. It seems like we forgot to fix this issue for the example file that you are using.
2:51 PM
I've opened a PR to try and handle that. The team will test it during the week. You are welcome to try it yourself and let me know if it worked.
Vaibhav Kumar

10/09/2022, 3:04 PM
Thanks. Meanwhile, I will try to start my work by pinning it to version 0.42. Hope that works.
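Pinning could be done in the same .env file by adding a version specifier to the requirement (a sketch only; it assumes 0.42.0 is the exact version string published for the provider):

```
_PIP_ADDITIONAL_REQUIREMENTS=airflow-provider-lakefs==0.42.0
```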
Yoni Augarten

10/09/2022, 5:09 PM
Where are you getting the example DAG from?
5:10 PM
Also, please share errors in textual format rather than as a photo.
Vaibhav Kumar

10/09/2022, 5:12 PM
https://github.com/treeverse/airflow-provider-lakeFS/blob/main/lakefs_provider/example_dags/lakefs-dag.py
5:12 PM
Is that fine?
Yoni Augarten

10/09/2022, 5:13 PM
So you're using the script from the main branch with version 0.42 of the provider?
Vaibhav Kumar

10/09/2022, 5:17 PM
yes
5:18 PM
I know the example is not from 0.42.0. Where can I find it?
Yoni Augarten

10/09/2022, 5:19 PM
I think it's better if you use the most up-to-date provider and modify the script according to the above PR.
Amit Kesarwani

10/11/2022, 4:21 PM
@Vaibhav Kumar I have a Docker container with Airflow and Airflow demo notebook (everything packaged in one container): https://github.com/treeverse/lakeFS-samples/tree/main/03-apache-spark-python-demo Also, I have a webinar tomorrow on Airflow + lakeFS. You can join: https://lakefs.io/event/troubleshoot-and-reproduce-data-with-apache-airflow/
Vaibhav Kumar

10/11/2022, 5:13 PM
Thanks for sharing this @Amit Kesarwani
Vaibhav Kumar

10/14/2022, 2:00 PM
Hello, I am not familiar with the syntax below. I can see that a file is being copied, but I don't understand the step; here is the file for reference. Can someone help?
command: AWS_ACCESS_KEY_ID=${{ env.KEY }} AWS_SECRET_ACCESS_KEY=${{ env.SECRET }} aws s3 cp --endpoint-url=http://s3.local.lakefs.io:8000
s3://example-repo/main/path/to/_SUCCESS -
2:55 PM
@Ariel Shaqed (Scolnicov) @Yoni Augarten Can you guys please help on the above question?
Niro

10/15/2022, 4:11 PM
@Vaibhav Kumar, just to understand you correctly: are you trying to understand what this step does, or do you have a different question?
Vaibhav Kumar

10/15/2022, 4:17 PM
First, yes, my question is what this step does.
Niro

10/15/2022, 4:22 PM
This step waits until the lakeFS DAG completes successfully, by polling for the _SUCCESS file before continuing to the next steps.
Vaibhav Kumar

10/15/2022, 5:20 PM
Polling? But I see a copy command. Where is the DAG being called in this command?
Niro

10/15/2022, 5:24 PM
This is just a means to an end. The step uses a GitHub action called retry (https://github.com/nick-fields/retry), which retries the copy command that follows until it succeeds. The _SUCCESS file becomes available only at the end of the lakeFS Airflow DAG, when the file is merged to the main branch. Once that happens, the copy command succeeds and we continue to the next step. You can follow the lakeFS DAG code to understand the flow better.
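The retry mechanism Niro describes can be sketched as a plain shell loop (illustrative only: `poll_until_success` is a made-up helper, and the real workflow polls via `aws s3 cp` against the lakeFS S3 gateway rather than checking a local file):

```shell
# Illustrative retry-until-success loop, the same idea as the
# nick-fields/retry GitHub action. poll_until_success is a made-up name.
poll_until_success() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the polled command; stop as soon as it exits successfully.
    "$@" && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# In the real workflow the polled command is the `aws s3 cp` of the
# _SUCCESS object; here we poll a local file to show the mechanism.
touch /tmp/_SUCCESS
poll_until_success 5 test -f /tmp/_SUCCESS && echo "DAG finished"
```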