Making routable graph from Point and Line shapefile using Networkx?
I have one shapefile which consists of all the streets in the study area, and another shapefile which has points representing origins and destinations.
Is it possible to find shortest path between all origins and destinations, using the links in the link shapefile?
I primarily use Python, Networkx and QGIS for my work, so it would be great to use these tools for this task.
The harder initial problem to solve is co-registering your point data with your network. These data may have come from different sources, and so some positional error is to be expected. In the absence of more complex rules governing how points should be located in the network, you can use the closest point on the network to each origin/destination as shown in this existing answer. The challenge here is finding the nearest location along the line, not just the closest vertex:
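The snapping step can be sketched in plain Python: a minimal, library-free illustration of projecting a point onto each segment of a polyline and keeping the nearest result. (Real workflows would typically use Shapely's `project()`/`interpolate()` instead; the coordinates below are invented.)

```python
from math import hypot

def nearest_point_on_segment(p, a, b):
    """Project point p onto segment a-b and return the closest point."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:                      # degenerate (zero-length) segment
        return a
    # Parameter of the projection, clamped to the segment [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def snap_to_line(p, line):
    """Return the closest point to p along a polyline (list of vertices)."""
    candidates = (nearest_point_on_segment(p, a, b)
                  for a, b in zip(line, line[1:]))
    return min(candidates, key=lambda q: hypot(q[0] - p[0], q[1] - p[1]))
```

For example, `snap_to_line((3, 1), [(0, 0), (2, 0), (2, 2)])` returns `(2.0, 1.0)` — a point along the second segment, not one of its vertices.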
Once you've done that step, you should have your original network and the newly co-registered points. From there, you can use NetworkX's read_shp() function (from the nx_shp module) to import the shapefile into a graph model.
And though it isn't documented, if you peek at the source, you'll see that if you create a shapefile with two layers, one of points and one of lines, it will use your points as the nodes, and the lines as the edges, which can be useful.
QGIS 1.8 ships with a built-in network analysis library, qgis.networkanalysis, which has functions to tie points to lines and to calculate shortest paths.
Reading shp using networkx, parallel edges are missing attributes
I am trying to use networkx.read_shp to create a MultiGraph from a shapefile. I understand that the function needs to be modified, since it normally creates a DiGraph. I attempted this:
The rest of the function stays the same. When checking the edges, I can see that where there are parallel edges, one of them has an empty dictionary instead of its attributes (despite these being present in the shapefile). I have seen a similar question before, but it seems to use an outdated version of networkx. Is there any way to make a multigraph that keeps parallel edges with their attributes? I have seen this answer NetworkX - How to create MultiDiGraph from Shapefile? but it is for a directed graph; I can try converting afterwards, but a more direct solution would be appreciated.
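The underlying cause can be illustrated without NetworkX at all: if edge attributes are stored in a mapping keyed only by the (u, v) node pair, as in a DiGraph, the second of two parallel edges overwrites the first, whereas a multigraph-style mapping adds an integer key per parallel edge. A toy sketch with plain dicts and invented attribute values:

```python
# Two parallel road segments between nodes "a" and "b", each with its own
# (hypothetical) attributes.
edges = [("a", "b", {"name": "Main St"}),
         ("a", "b", {"name": "Main St service road"})]

# DiGraph-style storage: one attribute dict per (u, v) pair, so the
# second parallel edge silently overwrites the first.
digraph_adj = {}
for u, v, attrs in edges:
    digraph_adj[(u, v)] = attrs

# MultiGraph-style storage: an extra integer key per (u, v) pair keeps
# every parallel edge together with its attributes.
multigraph_adj = {}
for u, v, attrs in edges:
    key = len(multigraph_adj.get((u, v), {}))
    multigraph_adj.setdefault((u, v), {})[key] = attrs
```

After running this, `digraph_adj` holds only the second edge's attributes, while `multigraph_adj[("a", "b")]` holds both under keys 0 and 1 — which is the behaviour a modified read_shp would need to reproduce.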
TypeError: unhashable type: 'LineString' when using ox.simplify_graph()
I have a dataset from which I have constructed a NetworkX-compatible graph. A shapefile has been converted to dictionaries for nodes and edges, which have then been converted to GeoDataFrames. From there, I have used ox.graph_from_gdfs() to create a functioning graph. The edge GeoDataFrame looks something like this (first row, simplified):
while the node GeoDataFrame looks like this:
Converting these to MultiDiGraph returns no errors:
The same data is also returned when converting back from the graph to GeoDataFrames.
However, when simplifying G, the following error is raised:
My guess would be that parts of the data in gdf_nodes and gdf_edges are not in the correct format, or that something is missing. However, I can't figure out what. I have not encountered any other errors with OSMnx apart from when using this function.
Here is a simple code to reproduce the error
I suspect there are some duplicated nodes with different IDs (see x,y for 111603 and 111604). Maybe this could be the issue?
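One way to check that suspicion is to group node IDs by their (rounded) coordinates and flag any coordinate shared by more than one ID. A minimal sketch with hypothetical node IDs and coordinates, loosely echoing the 111603/111604 pair mentioned above:

```python
from collections import defaultdict

# Hypothetical node table: id -> (x, y).
nodes = {
    111602: (24.9410, 60.1699),
    111603: (24.9521, 60.1688),
    111604: (24.9521, 60.1688),   # same coordinates, different ID
}

# Group node IDs by rounded coordinates; duplicates share a key.
by_xy = defaultdict(list)
for node_id, (x, y) in nodes.items():
    by_xy[(round(x, 6), round(y, 6))].append(node_id)

duplicates = {xy: ids for xy, ids in by_xy.items() if len(ids) > 1}
```

Any entry in `duplicates` is a candidate pair to merge (or to remap in the edge table) before simplification.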
Network analysis in Python
Finding the shortest path through a specific street network is a common GIS problem with many practical applications. Navigation apps, for example, are an everyday application in which routing algorithms are used to find the optimal route between two (or more) points.
It is also possible to perform network analysis, such as transportation routing, in Python. NetworkX is a Python module that provides tools for analyzing networks in various ways. It also contains algorithms, such as Dijkstra's algorithm and the A* algorithm, that are commonly used to find shortest paths along a transportation network.
To conduct network analysis it is, of course, necessary to have a network to analyze. The OSMnx package, which we explored in the previous tutorial, makes it easy to retrieve routable networks from OpenStreetMap for different transport modes (walking, cycling and driving). OSMnx also wraps some functionality from the networkx module to make it straightforward to conduct routing on OpenStreetMap data.
Next, we will test the routing functionality of OSMnx by finding the shortest path between two points based on drivable roads. With tiny modifications, the same analysis can be repeated for the walkable street network.
This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.
14.7. Creating a route planner for a road network
In this recipe, we build upon several techniques described in the previous recipes in order to create a simple GPS-like route planner in Python. We will retrieve California's road network data from the United States Census Bureau in order to find shortest paths in the road network graph. This will allow us to display road itineraries between any two locations in California.
You need Smopy for this recipe. You can install it with pip install git+https://github.com/rossant/smopy . In order for NetworkX to read Shapefile datasets, you also need GDAL/OGR. You can install it with conda install gdal .
At the time of this writing, gdal does not appear to work well with conda and Python 3.6. You may need to downgrade Python to Python 3.5 with conda install python=3.5 .
- We load the data (a Shapefile dataset) with NetworkX. This dataset contains detailed information about the primary roads in California. NetworkX's read_shp() function returns a graph, where each node is a geographical position, and each edge contains information about the road linking the two nodes. The data comes from the United States Census Bureau website at http://www.census.gov/geo/maps-data/data/tiger.html.
- This graph is not necessarily connected, but we need a connected graph in order to compute shortest paths. Here, we take the largest connected subgraph using the connected_component_subgraphs() function:
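The idea behind taking the largest connected subgraph can be sketched with a stdlib-only breadth-first search over an adjacency dict (a toy five-node graph here, not the California road network):

```python
from collections import deque

def connected_components(adj):
    """Yield the connected components of an undirected graph given as an
    adjacency dict {node: set_of_neighbours}."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:                      # breadth-first search
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in comp:
                    comp.add(nbr)
                    queue.append(nbr)
        seen |= comp
        yield comp

# Two components, {1, 2, 3} and {4, 5}; keep the largest.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
largest = max(connected_components(adj), key=len)
```

Note that connected_component_subgraphs() was removed in NetworkX 2.4; in recent versions the equivalent is `G.subgraph(max(nx.connected_components(G), key=len))`.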
- We define two positions (with the latitude and longitude) and find the shortest path between these two positions:
- Each edge in the graph contains information about the road, including a list of points along this road. We first create a function that returns this array of coordinates, for any edge in the graph:
- We can notably use the road path to compute its length. We first need to define a function that computes the distance between any two points in geographical coordinates:
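Such a function might look like the following haversine implementation of the orthodromic (great-circle) distance; the 6,371 km mean Earth radius is the usual spherical assumption:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in metres

def great_circle(lat1, lon1, lat2, lon2):
    """Orthodromic (great-circle) distance in metres between two
    (lat, lon) points, assuming a spherical Earth (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
```

A handy sanity check: one degree of longitude at the equator is about 111.2 km, i.e. `great_circle(0, 0, 0, 1)` should be close to 111,195 m.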
- We update our graph by computing the distance between any two connected nodes. We add this information with the distance attribute of the edges:
- The last step before we can find the shortest path in the graph is to find the two nodes in the graph that are closest to the two requested positions:
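Finding the node nearest to a requested position can be a simple brute-force scan over the node coordinates (the coordinates below are made up; for large graphs a spatial index such as a k-d tree would be preferable):

```python
def closest_node(nodes, target):
    """Return the node (a (lon, lat) tuple) closest to target.

    Squared Euclidean distance is enough for picking the minimum;
    a brute-force scan is fine for a few thousand nodes.
    """
    tx, ty = target
    return min(nodes, key=lambda n: (n[0] - tx) ** 2 + (n[1] - ty) ** 2)

# Hypothetical node positions and a requested position.
nodes = [(-121.9, 37.3), (-122.4, 37.8), (-118.2, 34.1)]
pos = (-122.3, 37.7)
```

Here `closest_node(nodes, pos)` picks `(-122.4, 37.8)`, the nearest of the three candidates.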
- Now, we use NetworkX's shortest_path() function to compute the shortest path between our two positions. We specify that the weight of every edge is the length of the road between them:
- The itinerary has been computed. The path variable contains the list of edges that form the shortest path between our two positions. Now, we can get information about the itinerary with pandas. The dataset has a few fields of interest, including the name and type (State, Interstate, and so on) of the roads:
Here is the total length of this itinerary:
- Our path contains connected nodes in the graph. Every edge between two nodes is characterized by a list of points (constituting a part of the road). Therefore, we need to define a function that concatenates the positions along every edge in the path. We have to concatenate the positions in the right order along our path. We choose the order based on the fact that the last point in an edge needs to be close to the first point in the next edge:
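That concatenation step might be sketched as follows, assuming the first segment is already oriented correctly and flipping each subsequent per-edge point list whenever its last point is the one nearest the end of the path built so far (the segment data below is invented):

```python
def stitch_path(segments):
    """Concatenate per-edge point lists into one polyline, flipping each
    segment so its first point is the one nearest the end of the path
    built so far (edge point lists may be stored in either direction)."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    path = list(segments[0])
    for seg in segments[1:]:
        # Orient the next segment: its nearer endpoint should come first.
        if d2(path[-1], seg[-1]) < d2(path[-1], seg[0]):
            seg = seg[::-1]
        path.extend(seg)
    return path

segments = [[(0, 0), (1, 0)],
            [(2, 0), (1, 0)],   # stored in the reverse direction
            [(2, 0), (3, 0)]]
```

Shared endpoints appear twice at each join; deduplicating consecutive identical points is a straightforward extra pass if needed.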
We computed the shortest path with NetworkX's shortest_path() function. Here, this function used Dijkstra's algorithm. This algorithm has a wide variety of applications, for example in network routing protocols.
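A toy version of Dijkstra's algorithm with a binary heap shows the idea behind shortest_path() (the three-node weighted graph here is a made-up example, not the road network):

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest path by Dijkstra's algorithm on a weighted graph given as
    {node: {neighbour: edge_weight}}; returns (distance, path)."""
    # Priority queue of (distance-so-far, node, path-to-node)
    queue = [(0.0, source, [source])]
    done = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return dist, path
        if node in done:
            continue
        done.add(node)
        for nbr, w in graph[node].items():
            if nbr not in done:
                heapq.heappush(queue, (dist + w, nbr, path + [nbr]))
    return float("inf"), []               # target unreachable

graph = {"a": {"b": 1, "c": 4},
         "b": {"a": 1, "c": 2},
         "c": {"a": 4, "b": 2}}
```

For this graph, `dijkstra(graph, "a", "c")` prefers the two-hop route through "b" (total weight 3) over the direct edge of weight 4.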
There are different ways to compute the geographical distance between two points. Here, we used a relatively precise formula: the orthodromic distance (also called great-circle distance), which assumes that the Earth is a perfect sphere. We could also have used a simpler formula since the distance between two successive points on a road is small.
You can find more information about shortest path problems and Dijkstra's algorithm in the following references:
An open source Java library for online and offline map matching with OpenStreetMap. Together with its extensive set of geometric and spatial functions, an in-memory map data structure, and basic machine learning functions, it is a versatile basis for scalable location-based services and spatio-temporal data analysis on the map. It is designed for use in parallel and distributed systems and therefore includes a stand-alone map matching server, and it can be used in distributed systems for map matching services in the cloud. Barefoot consists of a software library and a (Docker-based) map server that provides access to street map data from OpenStreetMap. It is flexible enough to be used in distributed cloud infrastructures as a map data server, side by side with Barefoot's stand-alone servers for offline (matcher server) and online (tracker server) map matching, or with other applications built on the Barefoot library. Access to map data is provided through a fast and flexible in-memory map data structure. Together with GeographicLib and ESRI's geometry API, it provides an extensive set of geographic and geometric operations for spatial data analysis on the map.
Magellan is a distributed execution engine for geospatial analytics on big data. It is implemented on top of Apache Spark and deeply leverages modern database techniques, such as efficient data layout, code generation, and query optimization, to optimize geospatial queries. The application developer writes standard SQL or DataFrame queries to evaluate geometric expressions, while the execution engine takes care of efficiently laying data out in memory during query processing, picking the right query plan, and optimizing query execution with cheap and efficient spatial indices, all while presenting a declarative abstraction to the developer.
Missing sections and reverse movements in STMATCH #70
I am using the python API (compiled to python3) and STMATCH.
Things are looking promising, however, I do not get perfect matches of the data: I either get missing links or unnecessary reverse movements (and/or both).
I created a standalone example for the behavior here:
Including a parameter variation and a check of the network.
- What is the best way to tune the parameters?
- Maybe build a minimizer around them? Some kind of gradient search?
- How can the reverse movements be avoided?
- How can a connected match be guaranteed?
- Is there any paper available for the STMATCH algorithm?
- In the FMM-paper, a penalty is mentioned for avoiding reverse movement. Is this still implemented?
- Will FMM with a precalculated UBODT file give better results?
- How is the visualization of the network (e.g. here: #30) done? Is this a QGis plugin?
- any other hints?
Cyang-kth commented May 14, 2020 •
The best way would be to manually label a subset of trajectories and then try various parameters to improve accuracy as much as possible.
Hyper-parameter tuning is better implemented externally to the current program.
Either preprocess the trajectory to smooth out the reverse movement (as in the example you show), or modify the program: the behaviour is mainly caused by the assumption that the vehicle must move along the direction of an edge. (One option is to add a tolerance for reverse movement.)
A match fails if the path cannot be connected. (In your example, I think it is likely that some network edges are directed rather than bidirectional.)
The paper is here https://dl.acm.org/doi/abs/10.1145/1653771.1653820
Lou Y, Zhang C, Zheng Y, et al. Map-matching for low-sampling-rate GPS trajectories[C]//Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems. 2009: 352-361.
FMM is quite similar to this one except that precomputation is used.
- In the FMM-paper, a penalty is mentioned for avoiding reverse movement. Is this still implemented?
No, that has been removed from the current implementation as it is seldom used.
I do not think so. FMM and STMATCH follow the same principle, except that one uses precomputation and the other does not.
It is directly visualized in QGIS without external plugin. You just define a directed line symbol.
I would recommend having a look at the network in QGIS first. Some parts of it may not be connected.
Cweber9843 commented May 14, 2020
Thanks for the quick reply!
I just realized that my problem really might originate from the network: in order to get directed links, I duplicated the original coordinates and exchanged source/target, but without actually reversing the line. I guess for a true directed network I will have to reverse the geometry. Thanks for nudging me in the right direction!
Cweber9843 commented May 14, 2020
Thank you again for your comments; the problem was indeed in the network. I have now reversed each geometry, so the network truly is bidirectional, rather than just having source-target pairs swapped while the geometry keeps the same direction.
STMATCH now reliably finds good matches, there are few reverse movements, and in my test sample I only see one false positive match. The algorithm is much less sensitive to the parameters as well.
I can now go on to more realistic tracks and do more testing there.
Cweber9843 commented May 15, 2020
Sorry, one more question, and I reopen this issue since I think it is somewhat related to my network, but I don't see the mistake:
When I run your notebook example for STMATCH, I get the same results as the example, especially I get a result for cpath.
For my own data and network, however, I only get the point data, not the edges:
For the green line in the plot, I look up the edge IDs in my network, but I think this is generating some artifacts, such as the lower end of the green line in the figure above.
Do you have any hints why I don't get a cpath?
I did not (yet) create my shapefile the way you described, via postGIS, but rather through geopandas and networkX.
Cyang-kth commented May 15, 2020
The reason you have an opath but an empty cpath is that two matched points are not connected in the network.
If you extract that single trajectory to a separate file traj.csv and run with a log level of 0, you will see a more detailed log of the map matching process, which may show that some points are not connected.
The log can contain hundreds of thousands of rows, so first export it to a log file and then inspect it.
Cweber9843 commented May 19, 2020
@cyang-kth , thanks again for the quick reply.
I now ran stmatch from command line, on a WGS84 network and data (UTM32 gave an SQLite error no such column: publication_date - do you want a bug report on this? The python API runs independently of the CRS.).
However, I can't see any more information on which points are not connected:
Contents of stmatch.log:
[info][stmatch_app_config.cpp:48 ] Start reading stmatch configuration from arguments
[info][stmatch_app_config.cpp:105] Finish with reading stmatch arg configuration
[info][stmatch_app_config.cpp:109] ---- Print configuration ----
[info][network_config.cpp:6 ] NetworkConfig
[info][network_config.cpp:7 ] File name: ./data/bike_centerline_v1_extract_ANVedges_source_target_rev_renamed_mini_WGS84.shp
[info][network_config.cpp:8 ] ID name: id
[info][network_config.cpp:9 ] Source name: source
[info][network_config.cpp:10 ] Target name: target
[info][gps_config.cpp:19 ] GPS format: CSV point
[info][gps_config.cpp:20 ] File name: ./data/syntetic_data_ANV_WGS84.csv
[info][gps_config.cpp:21 ] ID name: id
[info][gps_config.cpp:22 ] x name: x
[info][gps_config.cpp:23 ] y name: y
[info][gps_config.cpp:24 ] Timestamp name: timestamp
[info][result_config.cpp:30 ] ResultConfig
[info][result_config.cpp:31 ] File: mr.txt
[info][result_config.cpp:32 ] Fields: opath cpath mgeom
[info][stmatch_algorithm.cpp:22 ] STMATCHAlgorithmConfig
[info][stmatch_algorithm.cpp:23 ] k 4 radius 0.4 gps_error 0.5 vmax 80 factor 1.5
[info][stmatch_app_config.cpp:114] Log level 0-trace
[info][stmatch_app_config.cpp:115] Step 1
[info][stmatch_app_config.cpp:116] Use omp false
[info][stmatch_app_config.cpp:117] ---- Print configuration done ----
[warning][result_config.cpp:167] Overwrite existing result file mr.txt
[info][network.cpp:37 ] Read network from file ./data/bike_centerline_v1_extract_ANVedges_source_target_rev_renamed_mini_WGS84.shp
[info][network.cpp:130] Number of edges 60 nodes 28
[info][network.cpp:131] Field index: id 2 source 4 target 5
[info][network.cpp:134] Read network done.
[info][network_graph.cpp:17 ] Construct graph from network edges start
[info][network_graph.cpp:30 ] Graph nodes 28 edges 60
[info][network_graph.cpp:31 ] Construct graph from network edges end
[info][gps_reader.cpp:239] Id index 0 x index 1 y index 2 time index 3
[info][stmatch_app.cpp:33 ] Progress report step 1
[info][stmatch_app.cpp:35 ] Start to match trajectories
[info][stmatch_app.cpp:64 ] Run map matching in single thread
[info][stmatch_app.cpp:67 ] Progress 0
[info][stmatch_app.cpp:81 ] MM process finished
[info][stmatch_app.cpp:87 ] Time takes 0.002
[info][stmatch_app.cpp:88 ] Time takes excluding input 0
[info][stmatch_app.cpp:89 ] Finish map match total points 26 matched 0
[info][stmatch_app.cpp:91 ] Matched percentage: 0
[info][stmatch_app.cpp:92 ] Point match speed: 0
[info][stmatch_app.cpp:93 ] Point match speed (excluding input): -nan
[info][stmatch_app.cpp:95 ] Time takes 0.002
I am quite confident that my network is connected and bidirectional now:
The numbers in green and red are source and target node, respectively, and the numbers in black are the edge id. All edges are bidirectional, and all of them seem to be connected (e.g. node 126 is source for edge 154, but is target for 362). I checked this for all edges in this example.
A Recipe for Automatically Going From Data to Text to Reveal.js Slides
Over the last few years, I’ve experimented on and off with various recipes for creating text reports from tabular data sets (spreadsheet plugins are also starting to appear with a similar aim in mind). There are several issues associated with this, including:
- identifying what data or insight you want to report from your dataset
- (automatically deriving the insights)
- constructing appropriate sentences from the data
- organising the sentences into some sort of narrative structure
- making the sentences read well together.
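The "constructing appropriate sentences" step can be as simple as string templates over rows; a minimal sketch with hypothetical spending rows (the directorate names and figures are invented for illustration):

```python
# Hypothetical rows of transparency spending data: (directorate, amount).
rows = [("Adult Services", 1_200_000),
        ("Childrens Services", 800_000),
        ("Resources", 500_000)]

total = sum(amount for _, amount in rows)

def spend_sentence(directorate, amount):
    """Render one row of the table as a report sentence."""
    share = 100 * amount / total
    return (f"{directorate} accounted for £{amount:,} "
            f"({share:.1f}% of the total spend).")

sentences = [spend_sentence(d, a) for d, a in rows]
```

This handles the sentence-construction step; ordering the sentences into a narrative, and making them read well together, are the harder follow-on problems listed above.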
Another approach to humanising the reporting of tabular data is to generate templated webpages that review and report on the contents of a dataset; this has certain similarities to dashboard-style reporting, mixing tables and charts, although some simple templated text may also be generated to populate the page.
In a business context, reporting often happens via Powerpoint presentations. Slides within the presentation deck may include content pulled from a templated spreadsheet, which itself may automatically generate tables and charts for such reuse from a new dataset. In this case, the recipe may look something like:
In the previous couple of posts, the observant amongst you may have noticed I’ve been exploring a couple of components for a recipe that can be used to generate reveal.js browser based presentations from the 20% that account for the 80%.
The dataset I’ve been tinkering with is a set of monthly transparency spending data from the Isle of Wight Council. Recent releases have the form:
So as hinted at previously, it’s possible to use the following sort of process to automatically generate reveal.js slideshows from a Jupyter notebook with appropriately configured slide cells (actually, normal cells with an appropriate metadata element set) used as an intermediate representation.
There’s an example slideshow based on October 2016 data here. Note that some slides have “subslides”, that is, slides underneath them, so watch the arrow indicators bottom left to keep track of when they’re available. Note also that the scrolling is a bit hit and miss – ideally, a new slide would always be scrolled to the top, and for fragments inserted into a slide one at a time the slide should scroll down to follow them).
The structure of the presentation is broadly as follows:
For example, here’s a summary slide of the spends by directorate – note that we can embed charts easily enough. (The charts are styled using seaborn, so a range of alternative themes are trivially available). The separate directorate items are brought in one at a time as fragments.
The next slide reviews the capital versus expenditure revenue spend for a particular directorate, broken down by expenses type (corresponding slides are generated for all other directorates). (I also did a breakdown for each directorate by service area.)
The items listed are ordered by value, and taken together account for at least 80% of the spend in the corresponding area. Any further items contributing more than 5%(?) of the corresponding spend are also listed.
Notice that subslides are available going down from this slide, rather than across the main slides in the deck. This 1.5D structure means we can put an element of flexible narrative design into the presentation, giving the reader an opportunity to explore the data, but in a constrained way.
In this case, I generated subslides for each major contributing expenses type to the capital and revenue pots, and then added a breakdown of the major suppliers for that spending area.
This just represents a first pass at generating a 1.5D slide deck from a tabular dataset. A Pareto (80/20) heuristic is used to prioritise the information displayed, in order to account for 80% of spend in different areas, or other significant contributions.
Applying this principle repeatedly allows us to identify major spending areas, and then major suppliers within those spending areas.
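The selection rule described above (items covering at least 80% of spend, plus any further item over 5% of the total) might be sketched like this, with invented supplier figures:

```python
def pareto_items(spend, coverage=0.8, extra_threshold=0.05):
    """Select items by descending value until `coverage` of the total is
    reached, then also keep any remaining item above `extra_threshold`
    of the total (the 80% / >5% rule described above)."""
    total = sum(spend.values())
    picked, running = [], 0.0
    for name, value in sorted(spend.items(), key=lambda kv: -kv[1]):
        if running < coverage * total or value > extra_threshold * total:
            picked.append(name)
            running += value
    return picked

# Hypothetical supplier spend figures.
spend = {"Supplier A": 50, "Supplier B": 30, "Supplier C": 10,
         "Supplier D": 6, "Supplier E": 4}
```

Here `pareto_items(spend)` keeps suppliers A and B (reaching 80% of the total), then C and D (each over 5%), and drops E.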
The next step is to look at other ways of segmenting and structuring the data in order to produce reports that might actually be useful…
If you have any ideas, please let me know via the comments, or get in touch directly…
PS FWIW, it should be easy enough to run any other dataset that looks broadly like the example at the top through the same code with only a couple of minor tweaks…
Animated Flight Visualization
So far we’ve only looked at static lifeless graphics with a few rollovers for additional information. Let’s make an animated visualization that shows the active flights over time between Melbourne and Sydney in Australia.
The SVG document for this type of graphic is made up of text, lines and circles.
The dynamic parts are the time and the elements within the flight group and the data might look something like this:
To get an x position for a dynamic time we’ll need to create a time scale for each flight that maps its departure and arrival times to an x position on our chart. We can loop through our data at the start adding Date objects and scales so they’re easier to work with. Moment.js helps a lot here with date parsing and manipulation.
We can now pass our changing Date to xScale() to get an x coordinate for each flight.
Departure and arrival times are rounded to 5 minutes, so we can step through our data in 5-minute increments from the first departure to the last arrival.
Enter, Update and Exit
D3 allows you to specify transformations and transitions of elements when:
- New data points come in (Enter)
- Existing data points change (Update)
- Existing data points are removed (Exit)
The code above renders a frame every 500ms with a 5 minute time increment:
- It updates the time
- Creates a new flight group with a circle for every flight
- Updates the x/y coordinates of current flights
- Removes the flight groups when they’ve arrived
This works, but what we really want is a smooth transition between each of these frames. We can achieve this by creating a transition on any D3 selection and providing a duration and easing function before setting attributes or style properties.
For example, let’s fade in the opacity of entering flight groups.
Let’s fade out exiting flight groups.
Add a smooth transition between the x and y points.
We can also transition the time between the 5 minute increments so that it displays every minute rather than every five minutes using the tween function.