The Challenges of Big Property Data

Is the plethora of data available in the property sector making us complacent?

Jessica Christiansen-Franks, co-founder of Neighbourlytics, Commercial Proptech of the Year at the Proptech Awards 2023, discusses the use and misuse of data in real estate, the rise of arbitrary boundaries and how we need to resolve availability bias to build better proptech.

Jessica Christiansen-Franks

Wasn't really sure how to frame this. I didn't know if the morning would be full of people talking about how great tech was and I'd have to come up here and say there are some problems. But I'm very glad that Nigel set the scene for me, and I don't need to start right at the beginning with bursting the bubble that we're not necessarily getting it all right in the industry. So I'm a co-founder and CEO of Neighbourlytics. We're a data analytics startup based out of Melbourne, but I'm here all the time, and we have been operating since 2017, so I have quite a particular view on what happens when data goes wrong in the city-making space. Now, I'm a landscape architect originally. Not anymore. I'm terrible at getting plants to stay alive or telling you what to put beside the clothesline. But I take a different approach to thinking about technology in cities because I come from that unconventional background.


Jessica Christiansen-Franks

So we know tech has fundamentally changed our cities. We heard about that this morning. But if I'm here to talk to you about the role that data plays in that, I obviously can't do that without putting up the chart we've all seen 100 times before, showing the hockey stick curve of the volume of data being created all the time. And so by the year that we're in now, they estimate that we will have processed, generated and stored 120 zettabytes of data. I didn't know what that was. It's 120,000,000,000,000 GB of data. Only 10% of that is new. The rest is being stored and shared and copied and other things. But even just of the new stuff, based on the population, that means every single human, every day, is generating 5.4 GB of new data. Right? So there's a huge volume of data coming into play.


Jessica Christiansen-Franks

And, of course, those of us here in proptech, a lot of the suppliers in the room, the technology companies like mine, are looking at how to productize that data and how to get it to the right decision makers in the property sector so they can make the right decisions. So in that context, we ask the question: with all this data around and the ever-increasing volume of data, are the cities getting better yet? Now, Nigel already touched on this, which is helpful, and you can certainly see it just by doing a cursory internet search on the question. We all know, of course, there are a lot of challenges in the Australian property sector at the moment. There's a cost of living crisis. There are issues with the supply chain falling down and the organizations themselves not being able to stay operating. And then, of course, there are vacancies, and how we come out of this post-COVID world: what happens with tenancies and vacancies?


Jessica Christiansen-Franks

All of these things, the way that we're living, the way that we're using space, have changed. And, of course, you can't do a Google search about smart cities and property in Australia without coming across articles like this. Of course, we're not taking the conspiracy theorists particularly seriously, and we've got the tinfoil still here. But there has been a lot of evangelizing about smart cities and evangelizing about proptech for the last seven years. [inaudible] I'm actually going to put it on the lanyard. We actually did do a test before I started. There we go. Okay, sorry. And so this question of what tech is actually doing for cities, are we doing a better job? Of course, that's a large part of the topic that we're here to talk about today. So as a landscape architect talking to you about data and tech, I probably need to tell you a little bit about my background and why I might have a worthwhile opinion on this.

Jessica Christiansen-Franks

So this is a comedy photo of me as a child. It's not relevant for any reason other than that I got an interest in cities and how they're created when I was a kid. I was fortunate enough to be an army brat, so I grew up living all over the world. And so I was very connected to what it means to belong somewhere, how cities are very different from one another, and, when you're a newcomer coming to a place, how you participate, what it feels like to be connected and how the city itself, the urban environment itself, can shape your life. And so for that reason, I went into landscape architecture and town planning. I became disappointed quite quickly that town planning didn't really let me do that. Right? I was fascinated with the system of the city, the system of social connection, how all of those things come together to create economic success.


Jessica Christiansen-Franks

But I found in practice, as an urban designer in town planning, I was often involved in delivering things that might have been architecturally beautiful but that weren't really touching on the system of the city. I then moved more into community engagement. We don't need to talk about that, but I was very frustrated with the kind of public displeasure with property when there's so much thinking and brains and intelligence going into making great decisions. Yet this public discourse started to come up where people, the public, weren't happy with what was happening in the property sector. So I personally moved more into community engagement, and this was my first experience with data. Back, not actually that long ago, about ten years ago, when I worked at various local councils, often what we were doing was going out to the community, trying to understand what they thought, what they felt, writing it down and then counting the words.


Jessica Christiansen-Franks

And quite literally, my job was to go through the various submissions and come up with bogus statistics like this that decisions would be made on. So these often quite nuanced statements from the community about what's important to them, what they like about this development, what they want the future of their city to be like, would be boiled down to this: this many are positive, this many are negative. One time I was asked to count how many times the word "trees" appeared in the documents. Now, firstly, I was on $90,000 a year, and I was literally counting words in documents. So clearly there's an AI solution to this. But also, is it really the best we can do when it comes to understanding ourselves? And that was the frustration through which we created Neighbourlytics. The data that I needed to make better places actually didn't exist.


Jessica Christiansen-Franks

And where it did exist, it was totally bogus. And so there needed to be a better way. And so when we think about the data ecosystem, certainly seven years ago, when we started, there was a lot of data in the property sector about the physical environment itself. That's been a big part of the data ecosystem for a long time. And increasingly, there is data about thoughts and feelings. There are certainly many platforms that do a better job than counting the number of times "trees" appears in submissions. But what we're interested in is the piece in the middle: not just what's there, not just what people say, but what are the values and behaviors in how people are using and interacting with spaces. And this is the human data gap. And so briefly, before I jump into the problems with the data ecosystem, we're looking at how we can understand who the community is, where they spend time and what they love.

Jessica Christiansen-Franks

And across the last seven years, we've built out a multi-tiered platform that has data across Australia to look at a number of these things and then generate a report for each of these locations to understand key lifestyle metrics about what happens in our local neighborhoods. Now, as a landscape architect, I didn't know how complicated that would be. Probably wouldn't have started if I'd known. But here we are. And what I've learned across this journey is this: some of you might be familiar with the concept of the learning curve. The learning curve isn't just a simple curve. The concept is that there are different points in your understanding along that curve. So knowledge increases up the curve and confidence is along the bottom. When you don't know anything and you have no confidence, you're clueless at the bottom. Then you gain confidence. You can think about it like learning to ride a bike: a kid who doesn't know how to ride a bike doesn't know anything about bikes, doesn't care, is clueless. Then you start to learn, you think you know how to do it, and you can become naively confident.


Jessica Christiansen-Franks

Now, where the problem is: you go over naive confidence and get into the discouragement stage. And certainly when we started Neighbourlytics in 2017, that is where the proptech sector was when it came to data. And I believe we're at this point now. So we've had a number of years of the industry becoming naively confident, which I'll talk about, and I'm seeing us being at the beginning of tipping into this discouragingly realistic stage. And this is the challenge that we all have. If we're looking at how to make sure tech adoption is done well and our industry is making better decisions, we have to recognize the learning curve and see where we're up to. And so, again, it's a tech conference, so there's got to be a Henry Ford quote: the faster horse. We have this burden in the property sector where we've always had data, right?


Jessica Christiansen-Franks

It's just that a lot more of it is digitized now. Our sector has had data for a long time, but there are limitations with the past data. And what we've accidentally done as an industry is taken the format and questions and methods of past data and automated them, rather than realizing maybe we can measure different stuff now, maybe we can ask different questions, maybe we can use data in a different way. And so a lot of the clients we work with come to us with a series of questions that are the questions they could have asked of 20-year-old data, that they could have asked in a survey. It's not the innovative new types of questions and problem solving that they could bring to the industry now. And so they're stuck. And we have to do a lot of education to help them realize that maybe there are better questions that they could be asking.


Jessica Christiansen-Franks

And so because I've only got a short amount of time, I've tried to distill the 30,000 problems into just three, because that makes a nice presentation, and talk about the mistakes that we see made regularly in property data. And I will clarify that I'm here to talk about experience data. So we measure lifestyle, we measure property. There is of course a very mature part of the data ecosystem in Australia around economics for cities, and that's a different beast. But when it comes to actually looking at the performance of cities, how they work, how people interact with them and what they love, this is the piece that I wanted to talk to. So the first one, the first mistake made all the time, is that people are just happier with static data, right? And this is an availability bias: static data is what we always had before data was digitized, before it was being created automatically by the way that people are using cities.


Jessica Christiansen-Franks

When you had to run the population census once every five years through manual surveys, when you had to stand on a street and literally count people and look at where they were going, or count the occupancy rates of car parking and those sorts of things, it wasn't possible or straightforward to do it regularly. The data would be collected once. And we've brought along this mindset of, "Oh, we've done one, that's pretty good." And so in experience data, this sort of one-and-done snapshot idea, when we think about lifestyle, happens all the time and is extremely problematic. So think about somewhere like the Queen Street Mall in Brisbane. If you wanted to say what the experience of that place is, you could take a photo of it from Google. People might draw different images in their mind of it. It could be a way of doing a place audit and understanding what that place is like.

Jessica Christiansen-Franks

But what about what it's like at night? And what about what it's like during the Christmas market, or when there's a pop-up event or different seasonal events? And what about how it matures over time as the cost of living crisis changes Australia? So, yes, it is possible to measure lifestyle with a one-and-done approach, but it's also extremely problematic to think of anything in our city as static, because it's not, in any way, shape or form. And so to think about an example of how to change this, and the reason I'm using Neighbourlytics examples is because we don't want to be the only organization out there beating this drum. We want all of you to be encouraging the industry to be thinking differently about the kind of data that is being used and how it's changing. So the data, the digital data that many of you are using, is real time and it can track change.


Jessica Christiansen-Franks

And tracking change is actually really complex. There are lots and lots of different things that can be tracked. Sorry about the incredibly washed out maps there. If you're thinking about how dynamic cities are, there are lots of different ways that they're dynamic. So they are dynamic across different times of the day and days of the week. This is a hotspot map using mobile phone movement data, looking across the month of September just gone in the suburb of Footscray in Victoria. Where did people spend time in the day? What about at night? What about on weekdays? What about on weekends? This type of data is real time and fairly easy to use. And this is the way we should be interrogating our cities to provide insights like: what do people love at night? What's the experience like on the weekend? How is the city dynamic and changing across different seasons?
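
As a rough illustration of the day/night and weekday/weekend split described above, a minimal Python sketch might look like the following. The ping format (a grid cell ID plus a timestamp), the cell names and the 6am to 6pm definition of "day" are all assumptions made for illustration; the talk does not describe how the hotspot maps are actually built.

```python
from collections import Counter
from datetime import datetime


def time_segment(ts: datetime, day_start: int = 6, day_end: int = 18) -> tuple[str, str]:
    """Label a timestamp as (weekday|weekend, day|night).

    The 6am-6pm "day" window is an illustrative assumption.
    """
    week_part = "weekend" if ts.weekday() >= 5 else "weekday"  # Mon=0 ... Sun=6
    day_part = "day" if day_start <= ts.hour < day_end else "night"
    return week_part, day_part


def activity_by_segment(pings):
    """Count pings per (grid cell, week part, day part)."""
    counts = Counter()
    for cell_id, ts in pings:
        counts[(cell_id, *time_segment(ts))] += 1
    return counts


# Hypothetical, hand-made pings for two grid cells in one month.
pings = [
    ("cell_a", datetime(2023, 9, 4, 10, 30)),   # Monday morning
    ("cell_a", datetime(2023, 9, 4, 21, 15)),   # Monday evening
    ("cell_b", datetime(2023, 9, 9, 14, 0)),    # Saturday afternoon
    ("cell_b", datetime(2023, 9, 9, 23, 45)),   # Saturday late night
    ("cell_b", datetime(2023, 9, 10, 13, 0)),   # Sunday afternoon
]

for key, n in sorted(activity_by_segment(pings).items()):
    print(key, n)
```

Each segment's counts could then feed its own hotspot map, so the same place is described four times rather than once.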


Jessica Christiansen-Franks

Thinking about change across a year is another way of doing it. We did a lot of work across COVID, when there was lots of change overnight. Without me even putting the dates on the chart, you could probably guess that's when lockdown started. So this is a chart of the volume of activity in the suburb of Chasten in Victoria, month by month, looking at what the quantity of activity looks like. So you get sort of fairly typical time-of-year cycles, and then there was lockdown, which none of us saw coming. And then how we came out of lockdown was very nuanced. So that's thinking about how places change over time. And of course, with digital data, you can look at places and compare them very easily to one another. So we did a piece where we looked at different CBDs, regional centers and greenfield areas. You can again see not just how one place is changing over time, but how change might generally look over time and what the difference is from one place to another.


Jessica Christiansen-Franks

Then also, thinking about data being really real time, it makes it very easy to do point-to-point comparisons. When we started Neighbourlytics in 2017, we thought this would be the thing that flew off the shelf, where we can give you a snapshot into what people are talking about every day of the year, across however long you want to pay us the subscription for. And what we discovered is that although that idea cut through and people were very interested in it, they were so used to only looking at a place once at the beginning of the project that they weren't really asking the questions as the project went on, or being required to think about how it changed from December to January to February. Before the project went in, what was it like? And then once the project was in, what was it like? And once the project was established, what was it like?


Jessica Christiansen-Franks

The industry just isn't set up to think in this way yet because we have this availability bias of static data from the past. The second point is that we are so often choosing the wrong geography. Again, there's an availability bias here. A lot of the data we use in our sector, of course, is geospatial. That's how cities data works. But too often we are using the wrong geography. And the bane of my existence is statistical areas. Firstly, can somebody tell me off the top of their head what a statistical area is? Right now, could you just tell me where the boundaries of this one are? Okay, a couple of people here might actually know that. But the point is, when you live in a place, that's not what you think about, that's not what the experience of the place is. It's not how you use a space.

Jessica Christiansen-Franks

And so because of the history of statistical areas, the uniform boundaries that are used so that data sets can be matched, there's a utility to that. That's great, but it means data gets pre-aggregated and it's very difficult to then break it down and understand, site by site, project by project, what is actually important to the context of that place. And so here's a quick exercise in Springfield Lakes. So this is the postcode, 4300. That is the actual suburb of Springfield Lakes. That is the 20-minute neighborhood, so the 1 km radius. And then that's where you can walk to within a ten-minute walk of the center. So at that end, that's actually more the lived experience of that place. Take the suburb boundary itself. If you're looking at data for Springfield Lakes, look at this strange shape over here. It just happens to capture this other random piece of bushland.


Jessica Christiansen-Franks

It's not relevant to the way that space is used at all. And you can see they get cut up over time. This is a slide that I've had in a deck for years, from 2018, about the trouble with postcodes. It's so washed out. This is one postcode, and it's split up. Did you realize that so many postcodes are actually split up into multiple unconnected areas? Yet we use that as data to understand how places are working. And in fact, I checked it this morning and thought, has that changed? Yeah, it has actually changed since 2018. So that postcode boundary is different now. So again, too often we're using these pre-aggregated data sets that are totally irrelevant to the actual lived experience of neighborhoods. Think about this: this is Smith Street in Fitzroy and Collingwood. Smith Street is the center of that place. It's the heart of that community, yet it is literally on the boundary of Fitzroy and Collingwood.


Jessica Christiansen-Franks

If you wanted to understand how Smith Street is working, you'd have to be picking up two different postcodes and matching them together and then picking up other biases from other areas. It's just a very problematic way to think about place, and it doesn't need to be like that. So we always get asked, "Can I get it at a suburb boundary? Can I get it at a particular area?" Look, you don't actually want that. That's the education piece we do. We have some suburb data in our platform, but we very quickly jump into this concept of just a hyper-local data set. So look at the catchment immediately around your asset, immediately around the area of interest, and use that as the geography. And most of the data that's been used in property has the ability to do this, yet it's not part of the discourse of how we consume data.
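
To make the idea of a catchment around an asset concrete, here is a minimal Python sketch that selects points of interest within a fixed distance of a site instead of by suburb or postcode. The coordinates, place names and 400 m radius are hypothetical, and a straight-line radius is only a rough stand-in for a walking catchment, which would normally be traced along the street network.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in metres."""
    earth_radius_m = 6_371_000
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def within_catchment(places, centre, radius_m: float = 400.0):
    """Keep only the places inside a circular catchment around `centre`.

    `places` is a list of dicts with "name", "lat" and "lon" keys
    (an assumed format for this sketch).
    """
    lat0, lon0 = centre
    return [p for p in places if haversine_m(lat0, lon0, p["lat"], p["lon"]) <= radius_m]


# Hypothetical asset location and nearby places.
asset = (-37.8000, 144.9830)
places = [
    {"name": "cafe", "lat": -37.8010, "lon": 144.9830},  # roughly 110 m away
    {"name": "park", "lat": -37.8000, "lon": 144.9860},  # roughly 260 m away
    {"name": "gym", "lat": -37.8100, "lon": 144.9830},   # roughly 1.1 km away
]

nearby = within_catchment(places, asset)
print([p["name"] for p in nearby])
```

Because the filter depends only on distance from the asset, a street that sits on an administrative boundary is treated as one place rather than being split across two areas.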


Jessica Christiansen-Franks

And so, just to give an example of that, this is Penrith here in Greater Sydney. Sorry, I don't have a context photo here. We're looking at the comparison of the north to the south of the train line. So it's all one suburb boundary, it's all one statistical area, it's all sort of part of downtown Penrith. But of course, because of the train line, there's a material urban difference between the north and the south. So you can see from the photos, they do physically look quite different. And then when you look at the data, again the base map is very light, but every dot on this map is a thing in the neighborhood that people are interacting with. And north of the train line, within a five-minute walk of that super local area, there are 99 things to do. In the south, there are 1100. So the lived experience of that place is totally different, yet they're lumped together in the same statistical area.


Jessica Christiansen-Franks

Then when we look at the mix of different places, in the north there's a limit to the number of things to do. So there are a lot of lifestyle gaps in that place, whereas in the south those gaps aren't there. There are really good employment services, and there are lots of things to do from a bar and dining perspective. So again, it's a totally different lived experience that is missed if we use the wrong geographies to understand place. And then the third, and frankly one of the most problematic issues with data in the property sector, is that so often property practitioners are expecting it to be unbiased. Right? And this is actually a huge problem, because all data is biased. All data has strengths, because bias just means there are limitations or edge cases that skew it one way or another. Every single data set has that, right? Even the data set where you go and require every citizen in Australia, on the same night every five years, to fill out the same questions, that has a bias.


Jessica Christiansen-Franks

It has a participation bias of who was at home, who can speak English, who can understand the questions, who in the household filled it out. Right? So there is bias in everything. And so this discourse that we need unbiased data is incredibly limiting. Instead, what we need to be doing is recognizing that all data sets have a bias, and using that bias: understanding, therefore, what the inherent strength of that data set is and what it shouldn't be used for. And this availability bias comes from the fact that back when we didn't have much data and we only had a few data points through which to extrapolate a trend, it really did matter that you weren't just doing a survey in the street and asking everybody different questions or only stopping the people in suits. It really mattered that the integrity and sort of predictability of each input was very well known.

Jessica Christiansen-Franks

But in the world we're in now, where there's so much data, by over-cleaning it and over-curating it, we miss the ability to see the outlying pattern. And so this is what happens with the new types of data: because it's generated in different ways, if you look at it in its raw form, without it being super cleaned and sanitized, you can find other sorts of super interesting patterns. And so, in this spirit that all data has bias: in my background in urban design and placemaking, we did a lot of community engagement, almost as a way to get past the bias, like go to the people and talk to them and find those things out. But there are a lot of issues with this. Participation, or people giving their opinion, has a bias. What is front of mind when somebody is giving an opinion is often based on their most recent experience, and usually a negative one.


Jessica Christiansen-Franks

It's not about that. People aren't capable of saying just generally, my typical behavior is this. That's just not something people can do in that kind of format. There's a self selection bias, too. The people that choose to turn up to an event, the people that have the time, the people that can attend at the time that it's there, the people that can use a computer, all those sorts of things. And even just the vector of asking somebody for feedback. People are just trying to be helpful. That has a bias to it as well. There's all sorts of distortions that can happen with all the data sets that are super commonly used. So it's very important to understand what each data input is then actually good for, and then use that bias to its strength. I'll give you an example of this. So a core part of our data that we bring in is social media.


Jessica Christiansen-Franks

Now, if somebody can tell me a more biased set of data out there, I want to know what it is. That has got to be the most distorted, the most opinionated. It's the shiny life people want their friends to know that they have, or think that they have, put out in the world for everyone. So there's tons of bias that goes into what is on social media. So what does that mean it's good for? It's not good for just knowing how many people go to a cafe, because not everyone posts, not everyone talks about it, and not everyone who goes to the cafe cares enough to post about it. But it is good at a neighborhood level. I'm not talking about anything creepy with private mobile phone accounts or anything like that. This is at a neighborhood level. If you're looking at Newtown, what are the things people are choosing to post about in Newtown when they're there?


Jessica Christiansen-Franks

What was important enough to their values, about the lifestyle there, that they wanted to show their friends that they did it, or show their friends that they were there? So that's turning the bias of social media into a strength. So rather than thinking about it as something that's a really great uniform input into all the thoughts anybody's had, it is not that. It is a great way of understanding, at a whole-of-neighborhood level, what are the things in that neighborhood that people want to tell their friends about? And so, to give an example of that, our system pulls together exactly that chatter. So we pull out all the images, the posts, the things that people are saying in neighborhoods, be it in the photos that they're posting, but also the reviews that they're leaving. And we're looking at what that data can tell us about lifestyle values.


Jessica Christiansen-Franks

And so the interesting thing about social media, if we think about Instagram, is that when people are posting on Instagram, they're usually saying one of three things. They're either saying, look at where I am, look at what I'm doing or look at who I'm with. Now, all of those are fantastic questions to understand space and to understand neighborhoods. And they're all fairly dynamic. They're all affected by the seasons, they're all affected by the weather, they're affected by certain events happening in the community. And so by pulling this sort of data, which is real time, you can create metrics around the bias of social media. What are the activities in this neighborhood that people are most likely to talk about, are most proud of and are most likely to participate in? Creative activities, tattoos (this is Footscray, Victoria), activities and hobbies at home, food and drinks, particularly home dining.


Jessica Christiansen-Franks

So this sort of stuff comes through. What about the way that they are socially connected? So in Footscray, it's pet dogs mostly, followed by small social groups, and families last, whereas in other neighborhoods it's the other way around. People are very proud of hanging out with their kids in the park, and that's a big social value. Or at certain times of the year, there's more chatter about socializing in bigger groups and coming together with bigger peer groups and doing that sort of thing. And then also looking at the places that people talk about. In some neighborhoods, it is very much about the civic realm, the streetscapes, the artwork, those sorts of things. In other neighborhoods, it's more about private property, renovations, backyards. All of this comes through in social chatter. By using that bias of people wanting to show things to their friends, not only can it be done at that individual level, but, like anything digital, it can also be aggregated all together.


Jessica Christiansen-Franks

And look at: what does a typical Australian suburb talk about? And then how does any individual suburb compare to that? Here, again with Footscray, we can see creative comes through as one of the most common things that people are talking about. And this tiny black dot, because our engineers say it can't be a line, it has to be a dot for some reason, is the Australian average. So a typical Australian suburb has 4% of activity talking about creative, whereas in Footscray it's 20%, a huge part of the identity of Footscray. And that breaks down into what type of creative events, and at what time of year, and does it change across the year; that sort of thing is going to be different. And then again, the way they're socializing: they're pretty much on the Australian average with pet dogs, and much lower when it comes to family socializing. And so that's the end of the three major things we're doing wrong.
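
The comparison against the Australian average described above can be sketched in a few lines of Python. Only the 20% and 4% creative figures come from the talk; the category labels, the ten example posts and the function names are hypothetical, and how posts are actually classified into categories is not covered here.

```python
from collections import Counter


def category_shares(labels):
    """Share of posts falling into each category (empty input gives {})."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def compare_to_benchmark(local, benchmark):
    """Percentage-point style difference: local share minus benchmark share."""
    categories = sorted(set(local) | set(benchmark))
    return {c: local.get(c, 0.0) - benchmark.get(c, 0.0) for c in categories}


# Ten hypothetical, already-classified posts from one suburb:
# 2 of 10 are "creative", matching the 20% figure quoted in the talk.
local_posts = ["creative"] * 2 + ["food_and_drink"] * 3 + ["other"] * 5

# National benchmark; only the 4% creative share is from the talk.
australian_average = {"creative": 0.04}

local = category_shares(local_posts)
gap = compare_to_benchmark(local, australian_average)
print(f"creative: local {local['creative']:.0%}, gap {gap['creative']:+.0%}")
```

Reporting the gap rather than the raw share is what makes the local figure meaningful: the black dot on the chart plays the role of `australian_average` here.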


Jessica Christiansen-Franks

And I'm just going to recap them because, again, I'm not here talking to my market, I'm here talking to my peers. Right? And we are a specialist in lifestyle data and we intend to stay that way. And we're gradually integrating with a lot of the other people in this room. We want our data to be as ubiquitous as possible across the property sector, but that can only happen if we all throw out these biases of how data has been used in the past and, as an industry, we understand that it can be done better. And these are the things that we want to do. So I want you to be thinking about focusing on dynamic data so that you can actually track the things in the neighborhood. Then there are the statistical area problems [inaudible]. I can send out the photo. It's postcode 4300, if you want to find the postcode. [inaudible] data is so much more valuable.


Jessica Christiansen-Franks

And think about leveraging bias: let's call out where bias exists and use that as the strength of the data set. Thank you.