Opened 14 months ago
Closed 13 months ago
#30753 closed enhancement (fixed)
Further improvements in method subgraph
Reported by:  dcoudert  Owned by:  

Priority:  major  Milestone:  sage9.3 
Component:  graph theory  Keywords:  
Cc:  ghkliem  Merged in:  
Authors:  David Coudert, Jonathan Kliem  Reviewers:  David Coudert 
Report Upstream:  N/A  Work issues:  
Branch:  40bacf0 (Commits, GitHub, GitLab)  Commit:  40bacf03c0633affd490bac998856618879ba271 
Dependencies:  #30665, #30769  Stopgaps: 
Description (last modified by )
We add a fast method subgraph_given_vertices
to CGraphBackend
. This method allows fast subgraph creation including copying.
Also improvements in subgraph
with a case forgotten in #30510.
Before #30665:
sage: G = graphs.Grid2dGraph(100, 100) # dynamic sparse sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 280 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy() 30.3 ms ± 72.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = G.copy(immutable=True) # static sparse sage: %timeit H = G.subgraph(V[:50], algorithm='add') 553 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy(immutable=False) 31.6 ms ± 91.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = graphs.CompleteGraph(1000) # dynamic dense sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 54.2 ms ± 54.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.copy() 527 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
sage: G = graphs.Grid2dGraph(100, 100) # dynamic sparse sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 257 µs ± 356 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy() 26.4 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = G.copy(immutable=True) # static sparse sage: %timeit H = G.subgraph(V[:50], algorithm='add') 521 µs ± 308 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy(immutable=False) 28.3 ms ± 95.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = graphs.CompleteGraph(1000) # dynamic dense sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 51.6 ms ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.copy() 429 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With this ticket:
sage: G = graphs.Grid2dGraph(100, 100) # dynamic sparse sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 80.9 µs ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) sage: %timeit H = G.copy() 15.4 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: G = G.copy(immutable=True) # static sparse sage: %timeit H = G.subgraph(V[:50], algorithm='add') 327 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy(immutable=False) 20.3 ms ± 23 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: G = graphs.CompleteGraph(1000) # dynamic dense sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 1.03 ms ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.copy() 269 ms ± 3.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Change History (32)
comment:1 Changed 14 months ago by
 Branch set to public/graphs/30753_subgraph
 Cc ghkliem added
 Commit set to 9495c5b111fb3df9b33f2f955ccbbfd44963f9f8
comment:2 Changed 14 months ago by
 Status changed from new to needs_review
comment:3 Changed 14 months ago by
Currently, the default algorithm is 'delete'
if the number of vertices of the subgraph is at least 5% of the number of vertices of the graph. But following tests suggest that 'add'
is faster when the number of vertices of the subgraph is up to 90%... so we should change the threshold to e.g. 90%. Do you agree ?
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm=None) # use add 219 µs ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 213 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='delete') 62.3 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:500], algorithm=None) # use add 1.85 ms ± 35.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:500], algorithm='add') 1.99 ms ± 70.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) sage: %timeit H = G.subgraph(V[:500], algorithm='delete') 64 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:5000], algorithm=None) # use delete but add is faster here 53 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:5000], algorithm='add') 23.7 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:5000], algorithm='delete') 52.3 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
sage: G = graphs.CompleteGraph(1000) sage: V = list(G) sage: %timeit H = G.subgraph(V[:50], algorithm=None) # use 'add' 17.9 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='add') 19.5 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) sage: %timeit H = G.subgraph(V[:50], algorithm='delete') 962 ms ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:100], algorithm=None) # use 'delete' but 'add' is faster 962 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:100], algorithm='add') 38 ms ± 828 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:100], algorithm='delete') 959 ms ± 18.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:500], algorithm=None) # use 'delete' but 'add' is faster 949 ms ± 18.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:500], algorithm='add') 283 ms ± 15.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:100], algorithm='delete') 982 ms ± 32.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:800], algorithm=None) # use 'delete' but 'add' is faster 918 ms ± 35.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:800], algorithm='add') 643 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:800], algorithm='delete') 909 ms ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:900], algorithm=None) # use 'delete' but 'add' is faster 862 ms ± 19.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) sage: %timeit H = G.subgraph(V[:900], algorithm='add') 735 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
comment:4 Changed 14 months ago by
 Commit changed from 9495c5b111fb3df9b33f2f955ccbbfd44963f9f8 to c211ff85508ac09c27285aa84c99b98d63e7f4ea
Branch pushed to git repo; I updated commit sha1. New commits:
c211ff8  trac #30753: change threshold for default algorithm

comment:5 Changed 14 months ago by
I change the threshold for default algorithm to use use 'delete'
when the subgraph has at least 90% of the vertices and 'add'
otherwise. It seems faster this way. I updated some doctests (mostly changes in orderings).
comment:6 Changed 14 months ago by
The example in this ticket might imply, that #30665 should be put back onto needs work.
I see a huge slowdown.
comment:7 Changed 14 months ago by
OK, so may be we should wait for the finalization of #30665 before changing the threshold.
comment:8 Changed 14 months ago by
I think I'm done with #30665.
However, I could open another ticket and implement a subgraph iterator in the backend: Given a list of vertices, iterate over all edges in the induced subgraph. This should be much faster.
comment:9 Changed 14 months ago by
We can give it a try, at least for simple cases. You can either do it here (the branch is public) or in a separate ticket but it might be harder to ensure that all changes are consistent (i.e., improve the situation).
comment:10 Changed 14 months ago by
I would say, it is definitely worth it. But I have only a draft now with some todos.
In particular, this should speedup the copy method:
sage: G = graphs.Grid2dGraph(100, 100) sage: V = list(G) sage: %timeit H = G.copy() 31.1 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) sage: %timeit H = G.subgraph(V[:10000], algorithm='add') 16.6 ms ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
comment:11 Changed 14 months ago by
Btw, subgraph_by_deleting
should only be used in place or if the backend has an optimized copy method (which appears to be not the case yet).
Not sure about the special subgraphs yet, which are not obtained by just reducing the number of vertices.
comment:12 Changed 14 months ago by
 Commit changed from c211ff85508ac09c27285aa84c99b98d63e7f4ea to adb2d57c09b4ea0e9abc706c37a831d909203ead
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
5f7b30d  reviewers suggestion

002ec5d  document that an iterator can be used to relabel edges

5deaa25  fix wrong usage and document it

d7c27eb  make iterator saver

86b0ba6  expose `sort_vertices`

737e0f8  lookup in bitset is much faster

07b7a8e  bitset must not be empty

957b4cc  check if vertices are empty

8dbef67  Merge branch 'u/ghkliem/graphs_improve_edge_iterator' of git://trac.sagemath.org/sage into public/graphs/30753_subgraph

adb2d57  first step towards subgraph_by_adding in backend

comment:13 Changed 14 months ago by
 Commit changed from adb2d57c09b4ea0e9abc706c37a831d909203ead to 3a42e25f9b335edbe748daef3d5d924d3d5b775f
Branch pushed to git repo; I updated commit sha1. New commits:
3a42e25  check for bitset capacity

comment:14 Changed 14 months ago by
 Status changed from needs_review to needs_work
comment:15 Changed 14 months ago by
 Dependencies set to #30665
comment:16 Changed 14 months ago by
I'm not sure about this last commit as it has a different number than in #30665. Is it normal ?
comment:17 Changed 14 months ago by
This usually happens to me when cherry picking. I can rebase at some point on #30665.
comment:18 Changed 14 months ago by
 Commit changed from 3a42e25f9b335edbe748daef3d5d924d3d5b775f to 0372941f0f4e62077b5957322d4ae56427dc06ba
comment:19 Changed 14 months ago by
 may be the method
obtain_subgraph
could be renamed likesubgraph_given_vertices
or something more explicit with respect what it actually do.  there is a line
cdef CGraph
to remove  instead of
max(b_vertices_2)
, why not using the size or capacity of bitsetactive_vertices
? or may be the size ofb_vertices
?  Somehow, before starting to add vertices to
other
, we should callrealloc
with appropriate size, thus avoiding multiple calls.  I have no clue how to deal with multiple edges here.
 Concerning methods
add
anddelete
, I fully agree that nowdelete
should be used only wheninplace is True or algorithm == 'delete'
. Theadd
method is now way faster and should be the default one, without threshold.
comment:20 Changed 14 months ago by
I'll keep that in mind.
Concerning the multiedges, unfortunately I shbould clean up dense and sparse graphs backend a bit first. Otherwise, I think it is not a problem.
If self
is mutliedge, it will simply try to make
other
multiedge. If other is dense, this will raise an error.
If I do things right, optimizing is_subgraph
afterwards should be easy. In particual this can be used to check equality fast.
comment:21 Changed 14 months ago by
So may be for the moment, we can use the new method for graphs without multiple edges only and let other cases in the todo list (i.e., raise NotImplementedError
if called on a graph with multiple edges).
It's cool if we can also optimize other methods. The work you have already done for iterating edges plus these optimizations of the subgraph method makes significant speed ups for the entire graph module :))
comment:22 Changed 14 months ago by
The current method does not even work for labeled graphs.
Give me a day or so and I'll see if I can figure things out.
comment:23 Changed 14 months ago by
 Branch changed from public/graphs/30753_subgraph to public/graphs/30753_subgraph_new
 Commit changed from 0372941f0f4e62077b5957322d4ae56427dc06ba to 75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4
This is basically working.
What is missing:
 Documentation.
 Cleanup.
 Exposure in
__init__
ofDiGraph
andGraph
to speed up e.g. copy.  (Maybe) Heuristics when to delete instead of add:
As long as there are no improved copying methods, I changed the deletion subgraph to first obtain the subgraph induced by the given vertices instead of making a copy. Depending on the input of
edges
andedge_property
, this might be better than adding, but maybe in not so many cases. And in those cases it might be more natural to obtain a subgraph first and then delete edges.
Last 10 new commits:
c71a05e  fix doctests

a1e1445  trac #30753: improve _subgraph_by_adding

19f3d62  trac #30753: change threshold for default algorithm

3306bc9  first step towards subgraph_by_adding in backend

ae9204b  reviewers comments

fd5a206  deal with labels and multiple edges

bafda55  avoid bitset must not be empty

853802a  implement subgraph_given_vertices for static sparse

159869f  use cases for subgraph

75b7de5  allow deleting of multiedges and loops

comment:24 Changed 14 months ago by
This is really nice and I observe some speed up at least when doctesting the graph module :))
comment:25 Changed 14 months ago by
 Commit changed from 75b7de5acbbc1be7772b99ae0b8efeaee0d59ba4 to cbda82c0f6674c1248338cb177273b8a24435af1
comment:26 Changed 14 months ago by
 Dependencies changed from #30665 to #30665, #30769
comment:27 Changed 14 months ago by
 Commit changed from cbda82c0f6674c1248338cb177273b8a24435af1 to 776e38f30e301f4bef172ff752633412f641e770
Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:
1a8ec00  trac #30753: change threshold for default algorithm

29b485b  first step towards subgraph_by_adding in backend

2f25e38  reviewers comments

f4f2473  deal with labels and multiple edges

fd624f2  avoid bitset must not be empty

f4bc40b  implement subgraph_given_vertices for static sparse

05c3f72  use cases for subgraph

7a4326e  allow deleting of multiedges and loops

fb96022  fix mistakes

776e38f  expose subgraph method to initialization/copying

comment:28 Changed 14 months ago by
 Commit changed from 776e38f30e301f4bef172ff752633412f641e770 to 40bacf03c0633affd490bac998856618879ba271
comment:29 Changed 14 months ago by
 Status changed from needs_work to needs_review
comment:30 Changed 14 months ago by
 Description modified (diff)
comment:31 Changed 14 months ago by
 Reviewers set to David Coudert
 Status changed from needs_review to positive_review
It's not easy to review the changes done in this ticket as it depends on two other big tickets. Nonetheless, so far all my tests are successful and show serious speed up with respect previous design. Method subgraph was a bottleneck in several of my own code.
comment:32 Changed 13 months ago by
 Branch changed from public/graphs/30753_subgraph_new to 40bacf03c0633affd490bac998856618879ba271
 Resolution set to fixed
 Status changed from positive_review to closed
I'm not sure if the threshold for selecting default algorithm is still best possible. At least, we have improved both algorithms.
New commits:
trac #30753: improve _subgraph_by_adding