Unverified Commit 5dac1cc2 authored by mvdbeek

Add tool_id index on job column

In various places we filter jobs on the tool_id column. In particular, the job cache relies on this filter. The new index speeds up the query considerably on my local instance, and I think it is necessary for the query to succeed at all on large public instances, now that we also look at jobs in public histories.
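The effect can be reproduced in miniature. The sketch below is illustrative only: it uses SQLite's `EXPLAIN QUERY PLAN` rather than PostgreSQL, with a toy `job` table, and simply shows a `tool_id` lookup flipping from a full table scan to an index search once the index exists.

```
import sqlite3

# Toy stand-in for the real job table (SQLite, not Galaxy's PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, tool_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO job (tool_id, state) VALUES (?, ?)",
    [(f"tool_{i % 100}", "ok") for i in range(1000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM job WHERE tool_id = 'tool_7'"
before = plan(query)
conn.execute("CREATE INDEX job_tool_id_idx ON job (tool_id)")
after = plan(query)

print(before)  # typically a full scan, e.g. "SCAN job"
print(after)   # typically "SEARCH job USING INDEX job_tool_id_idx (tool_id=?)"
```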

Before:
```
galaxy=# EXPLAIN ANALYZE SELECT job.id, job_to_input_dataset_1.dataset_id
FROM job JOIN (SELECT job.id AS id
FROM job JOIN history ON job.history_id = history.id JOIN job_parameter AS job_parameter_1 ON job.id = job_parameter_1.job_id JOIN job_parameter AS job_parameter_2 ON job.id = job_parameter_2.job_id JOIN job_parameter AS job_parameter_3 ON job.id = job_parameter_3.job_id
                                                                                                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1002.14..10120.13 rows=2 width=8) (actual time=120.280..120.443 rows=0 loops=1)
   Join Filter: (job_1.id = job_to_input_dataset_1.job_id)
   ->  Nested Loop  (cost=1001.85..10119.67 rows=1 width=20) (actual time=120.279..120.442 rows=0 loops=1)
         Join Filter: (job_1.id = job_parameter_2.job_id)
         ->  Nested Loop  (cost=1001.42..10118.44 rows=1 width=16) (actual time=120.279..120.441 rows=0 loops=1)
               Join Filter: (job_1.id = job_parameter_1.job_id)
               ->  Nested Loop  (cost=1001.00..10117.21 rows=1 width=12) (actual time=109.484..120.405 rows=9 loops=1)
                     ->  Nested Loop  (cost=1000.58..10108.75 rows=1 width=20) (actual time=109.437..120.341 rows=9 loops=1)
                           Join Filter: (job_parameter_3.job_id = job_1.id)
                           ->  Nested Loop  (cost=1000.29..10064.36 rows=1 width=8) (actual time=97.805..118.067 rows=450 loops=1)
                                 ->  Gather  (cost=1000.00..10056.05 rows=1 width=4) (actual time=97.660..117.294 rows=450 loops=1)
                                       Workers Planned: 2
                                       Workers Launched: 2
                                       ->  Parallel Seq Scan on job_parameter job_parameter_3  (cost=0.00..9055.95 rows=1 width=4) (actual time=72.576..108.542 rows=150 loops=3)
                                             Filter: (((name)::text = 'iterate'::text) AND (value = '"no"'::text))
                                             Rows Removed by Filter: 118083
                                 ->  Index Only Scan using job_pkey on job  (cost=0.29..8.31 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=450)
                                       Index Cond: (id = job_parameter_3.job_id)
                                       Heap Fetches: 2
                           ->  Index Scan using job_pkey on job job_1  (cost=0.29..44.38 rows=1 width=12) (actual time=0.005..0.005 rows=0 loops=450)
                                 Index Cond: (id = job.id)
                                 Filter: ((copied_from_job_id IS NULL) AND ((tool_id)::text = 'toolshed.g2.bx.psu.edu/repos/devteam/add_value/addValue/1.0.0'::text) AND (tool_version = '1.0.0'::text) AND ((state)::text = ANY ('{new,queued,waiting,running,ok}'::text[])) AND (NOT (SubPlan 3)) AND (NOT (SubPlan 1)))
                                 Rows Removed by Filter: 1
                                 SubPlan 3
                                   ->  Nested Loop  (cost=0.84..126.91 rows=7 width=0) (actual time=0.020..0.020 rows=0 loops=9)
                                         ->  Index Scan using ix_job_to_output_dataset_job_id on job_to_output_dataset  (cost=0.42..8.79 rows=14 width=4) (actual time=0.009..0.009 rows=1 loops=9)
                                               Index Cond: (job_id = job_1.id)
                                         ->  Index Scan using history_dataset_association_pkey on history_dataset_association  (cost=0.42..8.44 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=9)
                                               Index Cond: (id = job_to_output_dataset.dataset_id)
                                               Filter: deleted
                                               Rows Removed by Filter: 1
                                 SubPlan 1
                                   ->  Nested Loop  (cost=0.57..24.96 rows=1 width=0) (actual time=0.013..0.013 rows=0 loops=9)
                                         ->  Index Scan using ix_job_to_output_dataset_collection_job_id on job_to_output_dataset_collection  (cost=0.29..8.33 rows=2 width=4) (actual time=0.006..0.006 rows=1 loops=9)
                                               Index Cond: (job_id = job_1.id)
                                         ->  Index Scan using history_dataset_collection_association_pkey on history_dataset_collection_association  (cost=0.29..8.30 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=9)
                                               Index Cond: (id = job_to_output_dataset_collection.dataset_collection_id)
                                               Filter: deleted
                                               Rows Removed by Filter: 1
                     ->  Index Scan using history_pkey on history  (cost=0.42..8.44 rows=1 width=5) (actual time=0.006..0.006 rows=1 loops=9)
                           Index Cond: (id = job_1.history_id)
                           Filter: ((job_1.user_id = 1) OR published)
               ->  Index Scan using ix_job_parameter_job_id on job_parameter job_parameter_1  (cost=0.42..1.22 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=9)
                     Index Cond: (job_id = job.id)
                     Filter: (((name)::text = 'exp'::text) AND (value = '"1"'::text))
                     Rows Removed by Filter: 7
         ->  Index Scan using ix_job_parameter_job_id on job_parameter job_parameter_2  (cost=0.42..1.22 rows=1 width=4) (never executed)
               Index Cond: (job_id = job.id)
               Filter: ((value ~~ '{"values": [{"id": %, "src": "hda"}]}'::text) AND ((name)::text = 'input'::text))
   ->  Index Scan using ix_job_to_input_dataset_job_id on job_to_input_dataset job_to_input_dataset_1  (cost=0.29..0.42 rows=3 width=8) (never executed)
         Index Cond: (job_id = job.id)
 Planning Time: 14.954 ms
 Execution Time: 120.783 ms
```

After:
```
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=201.24..363.09 rows=2 width=8) (actual time=2.805..2.812 rows=0 loops=1)
   Join Filter: (job_1.id = job_to_input_dataset_1.job_id)
   ->  Nested Loop  (cost=200.94..362.63 rows=1 width=20) (actual time=2.804..2.810 rows=0 loops=1)
         Join Filter: (job_1.id = job_parameter_3.job_id)
         ->  Nested Loop  (cost=200.52..361.40 rows=1 width=16) (actual time=2.804..2.809 rows=0 loops=1)
               Join Filter: (job_1.id = job_parameter_2.job_id)
               ->  Nested Loop  (cost=200.10..360.16 rows=1 width=12) (actual time=2.487..2.777 rows=6 loops=1)
                     Join Filter: (job_1.id = job_parameter_1.job_id)
                     ->  Nested Loop  (cost=199.68..358.93 rows=1 width=8) (actual time=2.288..2.633 rows=15 loops=1)
                           ->  Nested Loop  (cost=199.25..350.48 rows=1 width=16) (actual time=2.240..2.556 rows=15 loops=1)
                                 ->  Bitmap Heap Scan on job job_1  (cost=198.96..342.17 rows=1 width=12) (actual time=2.210..2.505 rows=15 loops=1)
                                       Recheck Cond: (((tool_id)::text = 'toolshed.g2.bx.psu.edu/repos/devteam/add_value/addValue/1.0.0'::text) AND ((state)::text = ANY ('{new,queued,waiting,running,ok}'::text[])))
                                       Filter: ((copied_from_job_id IS NULL) AND (tool_version = '1.0.0'::text) AND (NOT (SubPlan 3)) AND (NOT (SubPlan 1)))
                                       Heap Blocks: exact=13
                                       ->  BitmapAnd  (cost=198.96..198.96 rows=3 width=0) (actual time=1.914..1.916 rows=0 loops=1)
                                             ->  Bitmap Index Scan on job_tool_id_idx  (cost=0.00..4.35 rows=8 width=0) (actual time=0.858..0.858 rows=19 loops=1)
                                                   Index Cond: ((tool_id)::text = 'toolshed.g2.bx.psu.edu/repos/devteam/add_value/addValue/1.0.0'::text)
                                             ->  Bitmap Index Scan on ix_job_state  (cost=0.00..194.36 rows=16656 width=0) (actual time=1.050..1.050 rows=17141 loops=1)
                                                   Index Cond: ((state)::text = ANY ('{new,queued,waiting,running,ok}'::text[]))
                                       SubPlan 3
                                         ->  Nested Loop  (cost=0.84..126.91 rows=7 width=0) (actual time=0.021..0.021 rows=0 loops=15)
                                               ->  Index Scan using ix_job_to_output_dataset_job_id on job_to_output_dataset  (cost=0.42..8.79 rows=14 width=4) (actual time=0.007..0.008 rows=1 loops=15)
                                                     Index Cond: (job_id = job_1.id)
                                               ->  Index Scan using history_dataset_association_pkey on history_dataset_association  (cost=0.42..8.44 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=15)
                                                     Index Cond: (id = job_to_output_dataset.dataset_id)
                                                     Filter: deleted
                                                     Rows Removed by Filter: 1
                                       SubPlan 1
                                         ->  Nested Loop  (cost=0.57..24.96 rows=1 width=0) (actual time=0.012..0.012 rows=0 loops=15)
                                               ->  Index Scan using ix_job_to_output_dataset_collection_job_id on job_to_output_dataset_collection  (cost=0.29..8.33 rows=2 width=4) (actual time=0.005..0.005 rows=1 loops=15)
                                                     Index Cond: (job_id = job_1.id)
                                               ->  Index Scan using history_dataset_collection_association_pkey on history_dataset_collection_association  (cost=0.29..8.30 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=15)
                                                     Index Cond: (id = job_to_output_dataset_collection.dataset_collection_id)
                                                     Filter: deleted
                                                     Rows Removed by Filter: 1
                                 ->  Index Only Scan using job_pkey on job  (cost=0.29..8.31 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=15)
                                       Index Cond: (id = job_1.id)
                                       Heap Fetches: 0
                           ->  Index Scan using history_pkey on history  (cost=0.42..8.44 rows=1 width=5) (actual time=0.004..0.004 rows=1 loops=15)
                                 Index Cond: (id = job_1.history_id)
                                 Filter: ((job_1.user_id = 1) OR published)
                     ->  Index Scan using ix_job_parameter_job_id on job_parameter job_parameter_1  (cost=0.42..1.22 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=15)
                           Index Cond: (job_id = job.id)
                           Filter: (((name)::text = 'exp'::text) AND (value = '"1"'::text))
                           Rows Removed by Filter: 7
               ->  Index Scan using ix_job_parameter_job_id on job_parameter job_parameter_2  (cost=0.42..1.22 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=6)
                     Index Cond: (job_id = job.id)
                     Filter: ((value ~~ '{"values": [{"id": %, "src": "hda"}]}'::text) AND ((name)::text = 'input'::text))
                     Rows Removed by Filter: 7
         ->  Index Scan using ix_job_parameter_job_id on job_parameter job_parameter_3  (cost=0.42..1.22 rows=1 width=4) (never executed)
               Index Cond: (job_id = job.id)
               Filter: (((name)::text = 'iterate'::text) AND (value = '"no"'::text))
   ->  Index Scan using ix_job_to_input_dataset_job_id on job_to_input_dataset job_to_input_dataset_1  (cost=0.29..0.42 rows=3 width=8) (never executed)
         Index Cond: (job_id = job.id)
 Planning Time: 12.208 ms
 Execution Time: 3.184 ms
(56 rows)
```
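The key change in the plan above is the BitmapAnd node: instead of a sequential scan, PostgreSQL intersects the row sets produced by two Bitmap Index Scans (`job_tool_id_idx` and `ix_job_state`) before touching the heap. A toy model of that intersection step (not PostgreSQL internals; the row ids are made up):

```
# Each Bitmap Index Scan yields the candidate heap rows for one predicate;
# BitmapAnd intersects them, so only rows matching both the tool_id and the
# state condition are fetched from the heap.
rows_by_tool_id = {3, 17, 42, 99, 105}         # from job_tool_id_idx
rows_by_state = {2, 3, 17, 56, 99, 200, 311}   # from ix_job_state
candidates = rows_by_tool_id & rows_by_state   # BitmapAnd
print(sorted(candidates))  # [3, 17, 99]
```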
parent 1d31d5b3
```
@@ -1461,7 +1461,7 @@ class Job(Base, JobLike, UsesCreateAndUpdateTime, Dictifiable, Serializable):
     update_time: Mapped[datetime] = mapped_column(default=now, onupdate=now, index=True, nullable=True)
     history_id: Mapped[Optional[int]] = mapped_column(ForeignKey("history.id"), index=True)
     library_folder_id: Mapped[Optional[int]] = mapped_column(ForeignKey("library_folder.id"), index=True)
-    tool_id: Mapped[Optional[str]] = mapped_column(String(255))
+    tool_id: Mapped[Optional[str]] = mapped_column(String(255), index=True)
     tool_version: Mapped[Optional[str]] = mapped_column(TEXT, default="1.0.0")
     galaxy_version: Mapped[Optional[str]] = mapped_column(String(64), default=None)
     dynamic_tool_id: Mapped[Optional[int]] = mapped_column(ForeignKey("dynamic_tool.id"), index=True)
```
"""Add index on tool_id column of job table

Revision ID: a4c3ef999ab5
Revises: 75348cfb3715
Create Date: 2025-02-05 14:55:13.348044

"""

from galaxy.model.database_object_names import build_index_name
from galaxy.model.migrations.util import (
    create_index,
    drop_index,
)

# revision identifiers, used by Alembic.
revision = "a4c3ef999ab5"
down_revision = "75348cfb3715"
branch_labels = None
depends_on = None


table_name = "job"
column_name = "tool_id"
index_name = build_index_name(table_name, column_name)


def upgrade():
    create_index(index_name, table_name, [column_name])


def downgrade():
    drop_index(index_name, table_name)
```
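The migration boils down to a reversible `CREATE INDEX` / `DROP INDEX` pair. A self-contained sketch of that symmetry (SQLite DDL here, not Alembic; `build_index_name` presumably yields `job_tool_id_idx`, the name that appears in the EXPLAIN output above):

```
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, tool_id TEXT)")

def index_names():
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'job'"
    )
    return {row[0] for row in rows}

# upgrade(): add the index
conn.execute("CREATE INDEX job_tool_id_idx ON job (tool_id)")
upgraded = index_names()

# downgrade(): remove it again
conn.execute("DROP INDEX job_tool_id_idx")
downgraded = index_names()
```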